
fedebotu/ICLR2023-OpenReviewData


Crawl and Visualize ICLR 2023 OpenReview Data


→ Open full submission list here
→ Download datasets here

Description

This repository contains code to crawl and visualize data from the ICLR 2023 OpenReview submissions. Crawling is done with parallel requests sent directly to the OpenReview API, which is roughly 10-100x faster than scraping the website with Selenium. The code also saves datasets for further analysis, including all reviews and rebuttals as well as PDF metadata and extracted text.
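The parallel-request idea can be sketched with the standard library alone. This is a minimal illustration, not the repository's code: the endpoint `https://api.openreview.net/notes`, the invitation ID `ICLR.cc/2023/Conference/-/Blind_Submission`, the `offset`/`limit` paging parameters and the `notes` key in the response are assumptions based on OpenReview's v1 API and should be checked before use. The helper names `page_urls`, `fetch_json` and `crawl_parallel` are hypothetical.

```python
import json
import urllib.request
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable
from urllib.parse import urlencode

# Assumed OpenReview v1 endpoint and ICLR 2023 invitation ID; verify before use.
API_URL = "https://api.openreview.net/notes"
INVITATION = "ICLR.cc/2023/Conference/-/Blind_Submission"


def page_urls(total: int, page_size: int = 1000) -> list[str]:
    """Build one paginated query URL per page of `page_size` notes."""
    return [
        f"{API_URL}?"
        + urlencode({"invitation": INVITATION, "offset": offset, "limit": page_size})
        for offset in range(0, total, page_size)
    ]


def fetch_json(url: str) -> dict:
    """Fetch one URL and decode its JSON body (performs a network call)."""
    with urllib.request.urlopen(url, timeout=30) as resp:
        return json.load(resp)


def crawl_parallel(
    urls: Iterable[str],
    fetch: Callable[[str], dict] = fetch_json,
    workers: int = 16,
) -> list[dict]:
    """Fetch all URLs concurrently with a thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(fetch, urls))
```

For example, `pages = crawl_parallel(page_urls(4874))` followed by `notes = [n for p in pages for n in p["notes"]]` would collect every submission, assuming the response shape above. Threads suit this workload because each request spends most of its time waiting on the network, which is where the speedup over a browser-driven Selenium crawl comes from.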

Usage

First, install the requirements:

pip install -r requirements.txt

Then run the notebooks in the notebooks/ folder:

  1. 0a. Parse data.ipynb: crawl all paper metadata (title, abstract, authors, etc.), reviews, and rebuttals from OpenReview.
  2. 0b. Crawl PDF.ipynb: parse the papers' PDF files to extract the main text.
  3. 1. Plots.ipynb: visualize the data with word clouds, bar charts, and other plots.
  4. 2. Save Website.ipynb: save the website as a static HTML file.
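As a rough idea of what saving a static page involves, the sketch below renders a list of papers into a self-contained HTML table sorted by average rating. It is an illustrative assumption, not the notebook's actual output: the `title` and `rating` fields and the `render_table` helper are hypothetical.

```python
from html import escape


def render_table(papers: list[dict]) -> str:
    """Render papers (hypothetical 'title' and 'rating' keys) as a static HTML page,
    highest average rating first. Titles are HTML-escaped."""
    rows = "\n".join(
        f"<tr><td>{escape(p['title'])}</td><td>{p['rating']:.2f}</td></tr>"
        for p in sorted(papers, key=lambda p: p["rating"], reverse=True)
    )
    return (
        "<!DOCTYPE html>\n<html><body><table>\n"
        "<tr><th>Title</th><th>Avg. rating</th></tr>\n"
        f"{rows}\n"
        "</table></body></html>\n"
    )
```

Writing the result with `pathlib.Path("index.html").write_text(render_table(papers), encoding="utf-8")` yields a file that can be hosted anywhere without a server-side component.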

Statistics

  • Total submitted papers: 4874
  • Average rating: 4.94
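A statistic like the average rating can be computed roughly as follows. This sketch assumes each review's rating is stored as a string beginning with an integer (for example "6: marginally above the acceptance threshold") and averages per paper before averaging across papers; whether the figure above is a per-review or per-paper mean is not stated here, so treat both the input format and the aggregation as assumptions. The names `parse_rating` and `average_rating` are hypothetical.

```python
import re
from statistics import fmean


def parse_rating(value: str) -> int:
    """Extract the leading integer from a rating string such as '6: marginally above ...'."""
    match = re.match(r"\s*(\d+)", value)
    if match is None:
        raise ValueError(f"no numeric rating in {value!r}")
    return int(match.group(1))


def average_rating(papers: dict[str, list[str]]) -> float:
    """Mean of per-paper mean ratings; papers with no reviews are skipped."""
    per_paper = [
        fmean(parse_rating(r) for r in ratings)
        for ratings in papers.values()
        if ratings
    ]
    return fmean(per_paper)
```

Averaging per paper first gives each submission equal weight regardless of how many reviews it received; a per-review mean would instead weight heavily reviewed papers more.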

Rating Distribution

Top 50 Keywords

Keywords vs Ratings

Wordcloud

Review Lengths

Review Lengths by Rating

Review Lengths by Confidence

Paper Length (pages) vs Rating
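The keyword and review-length plots rest on simple aggregations. The sketch below shows one plausible way to compute them; it assumes keywords are available as a list of strings per paper and reviews as `(rating, text)` pairs, and it normalizes keywords by stripping whitespace and lowercasing and measures length as a whitespace word count. These choices, and the helper names `top_keywords` and `mean_length_by_rating`, are illustrative assumptions rather than the notebooks' exact method.

```python
from collections import Counter, defaultdict
from statistics import fmean
from typing import Iterable


def top_keywords(keyword_lists: Iterable[list[str]], n: int = 50) -> list[tuple[str, int]]:
    """Count normalized (stripped, lowercased) keywords across papers; return the n most common."""
    counts = Counter(
        kw.strip().lower()
        for kws in keyword_lists
        for kw in kws
        if kw.strip()
    )
    return counts.most_common(n)


def mean_length_by_rating(reviews: Iterable[tuple[int, str]]) -> dict[int, float]:
    """Group reviews by rating and return the mean word count per rating, in rating order."""
    buckets: dict[int, list[int]] = defaultdict(list)
    for rating, text in reviews:
        buckets[rating].append(len(text.split()))
    return {rating: fmean(lengths) for rating, lengths in sorted(buckets.items())}
```

The same grouping pattern works for the confidence and page-count plots by swapping the key (review confidence or paper length in pages) for the rating.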

Feedback

Feel free to open an issue or a pull request if you have any feedback or suggestions!

Acknowledgements

This repository is inspired by the following: