# ICLR Data Download Tool

This notebook downloads paper PDFs and official review data for the ICLR conference from OpenReview.

Features:
- Download ICLR paper PDFs for specified years
- Retrieve official reviews for papers
- Save data in CSV format
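
Each run writes one CSV per year. Every row is a single review with the columns `paper_title`, `review_text` and `pdf_path`, and a paper with no reviews gets one row with an empty `review_text`. Below is a minimal sketch for loading such a file back and grouping reviews by paper. It assumes that column layout. `load_reviews_by_paper` is a hypothetical helper and is not part of the notebook. It keys on titles because the CSV has no paper ID column, so two papers with identical titles would be merged.

```python
import csv
from collections import defaultdict


def load_reviews_by_paper(csv_path):
    """Group review texts by paper title (hypothetical helper).

    Assumes the columns written by this notebook: paper_title, review_text, pdf_path.
    Papers whose only row has an empty review_text map to an empty list.
    """
    grouped = defaultdict(list)
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            reviews = grouped[row["paper_title"]]  # creates the entry even if there is no review
            if row["review_text"]:
                reviews.append(row["review_text"])
    return dict(grouped)
```

For example, `load_reviews_by_paper(os.path.join(WORK_DIR, 'iclr_2021_data', 'iclr_2021_reviews.csv'))` would return a dict mapping each title to its list of review texts.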


## 1. Install Dependencies


In [1]:
%pip install openreview-py requests


Collecting openreview-py
  Downloading openreview_py-1.53.3-py3-none-any.whl.metadata (4.1 kB)
Collecting pycryptodome (from openreview-py)
  Downloading pycryptodome-3.23.0-cp37-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (3.4 kB)
Collecting Deprecated (from openreview-py)
  Downloading deprecated-1.3.1-py2.py3-none-any.whl.metadata (5.9 kB)
Collecting pylatexenc (from openreview-py)
  Downloading pylatexenc-2.10.tar.gz (162 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m162.6/162.6 kB[0m [31m3.8 MB/s[0m eta [36m0:00:00[0m
[?25h  Preparing metadata (setup.py) ... [?25l[?25hdone
Collecting tld>=0.12 (from openreview-py)
  Downloading tld-0.13.1-py2.py3-none-any.whl.metadata (10 kB)
Collecting litellm==1.76.1 (from openreview-py)
  Downloading litellm-1.76.1-py3-none-any.whl.metadata (41 kB)
[2K     [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m41.3/41.3 kB[0m [31m2.4 MB/s[0m eta [36m0:00:00[0m
Collecting fastuuid>=0.12.0 (from
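
You can confirm the install worked, and record which versions produced the data, with a small sketch based on the standard library's `importlib.metadata`. The distribution names are the ones passed to `%pip` above. `installed_versions` is a hypothetical helper name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_versions(packages):
    """Map each distribution name to its installed version string, or None if missing."""
    result = {}
    for name in packages:
        try:
            result[name] = version(name)
        except PackageNotFoundError:
            result[name] = None
    return result


print(installed_versions(["openreview-py", "requests"]))
```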

## 2. Import Libraries


In [2]:
import openreview
import requests
import os
import csv
import time
from pathlib import Path


## 3. Mount Google Drive

**Important**: Run this cell to mount Google Drive; all downloaded data is saved there.


In [3]:
from google.colab import drive
drive.mount('/content/drive')

# Create project working directory
import os
WORK_DIR = '/content/drive/MyDrive/Notebooks/AI_review'
os.makedirs(WORK_DIR, exist_ok=True)
print("✓ Google Drive mounted")
print(f"✓ Working directory: {WORK_DIR}")
print("All data will be saved to this directory")


Mounted at /content/drive
✓ Google Drive mounted
✓ Working directory: /content/drive/MyDrive/Notebooks/AI_review
All data will be saved to this directory
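
The cell above only works inside Google Colab, because `google.colab` is not available elsewhere. If you also want to run the notebook locally, one option is a fallback like the sketch below. `resolve_work_dir` and the local default `./AI_review` are hypothetical choices and are not part of the original notebook.

```python
from pathlib import Path


def resolve_work_dir(drive_dir="/content/drive/MyDrive/Notebooks/AI_review",
                     local_dir="./AI_review"):
    """Return a working directory: Google Drive on Colab, a local folder elsewhere.

    Hypothetical helper; the local default path is an assumption.
    """
    try:
        from google.colab import drive  # importable only inside Colab
    except ImportError:
        work_dir = Path(local_dir)
    else:
        drive.mount("/content/drive")
        work_dir = Path(drive_dir)
    work_dir.mkdir(parents=True, exist_ok=True)
    return str(work_dir)
```

With this helper, `WORK_DIR = resolve_work_dir()` could replace the hard-coded path, and the rest of the notebook would work unchanged.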


## 4. Define Download Function
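
The function below switches between OpenReview API v1 and v2 based on the year. The two APIs use different base URLs and invitation names. The sketch below isolates that selection as a pure helper you can inspect. It reuses the rule and strings from `download_iclr_data` exactly (v2 from 2024 onward). `openreview_settings` is a hypothetical name.

```python
def openreview_settings(year):
    """Return the API base URL and invitation IDs the notebook uses for an ICLR year.

    Hypothetical helper mirroring download_iclr_data: API v2 for 2024+, v1 before.
    """
    venue_id = f"ICLR.cc/{year}/Conference"
    if year >= 2024:
        return {
            "api": "v2",
            "baseurl": "https://api2.openreview.net",
            "submission_invitation": f"{venue_id}/-/Submission",
            # Per-paper review invitation; fill {number} with the submission number
            "review_invitation": venue_id + "/Submission{number}/-/Official_Review",
        }
    return {
        "api": "v1",
        "baseurl": "https://api.openreview.net",
        "submission_invitation": f"{venue_id}/-/Blind_Submission",
        # v1 reviews are found by filtering forum notes whose invitation contains this
        "review_invitation_substring": "Official_Review",
    }
```

For example, `openreview_settings(2024)["review_invitation"].format(number=42)` gives `ICLR.cc/2024/Conference/Submission42/-/Official_Review`.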


In [4]:
def download_iclr_data(year, max_papers, work_dir='/content/drive/MyDrive/Notebooks/AI_review'):
    """
    Download ICLR papers and review data for a given year

    Args:
        year: Year of the conference
        max_papers: Maximum number of papers to download. None means download all.
        work_dir: Working directory (default: Google Drive)
    """
    OUTPUT_DIR = os.path.join(work_dir, f'iclr_{year}_data')
    PDF_DIR = os.path.join(OUTPUT_DIR, 'pdfs')
    CSV_FILE = os.path.join(OUTPUT_DIR, f'iclr_{year}_reviews.csv')

    # Create output directory
    os.makedirs(PDF_DIR, exist_ok=True)

    # Use API v2 for 2024 and later; otherwise, use API v1
    use_api_v2 = (year >= 2024)

    # Initialize OpenReview client
    if use_api_v2:
        client = openreview.api.OpenReviewClient(baseurl='https://api2.openreview.net')
    else:
        client = openreview.Client(baseurl='https://api.openreview.net')

    # Retrieve ICLR papers
    venue_id = f'ICLR.cc/{year}/Conference'
    print(f"\n{'='*80}")
    print(f"Fetching data for {venue_id}... (API {'v2' if use_api_v2 else 'v1'})")
    print(f"{'='*80}")

    try:
        if use_api_v2:
            # API v2 uses a different invitation
            submissions = client.get_all_notes(
                invitation=f'{venue_id}/-/Submission'
            )
        else:
            submissions = client.get_all_notes(
                invitation=f'{venue_id}/-/Blind_Submission',
                details='replyCount,writable'
            )
    except Exception as e:
        print(f"✗ Failed to fetch submissions: {e}")
        return

    total_papers = len(submissions)
    papers_to_process = submissions[:max_papers] if max_papers is not None else submissions

    print(f"Found {total_papers} papers in total")
    print(f"Processing {len(papers_to_process)} papers")
    print(f"Output directory: {OUTPUT_DIR}")

    # Prepare CSV data
    csv_data = []

    # Process each paper
    for idx, paper in enumerate(papers_to_process, 1):
        paper_id = paper.id

        # Note: API v2 'content' has nested 'value'; API v1 structure is flat
        if use_api_v2:
            title = paper.content.get('title', {}).get('value', 'No Title')
            authors = paper.content.get('authors', {}).get('value', [])
            abstract = paper.content.get('abstract', {}).get('value', '')
            keywords = paper.content.get('keywords', {}).get('value', [])
            # The PDF URL is built from the paper ID below, so 'pdf' is not read here
            pdf_path = ''
        else:
            title = paper.content.get('title', 'No Title')
            authors = paper.content.get('authors', [])
            abstract = paper.content.get('abstract', '')
            keywords = paper.content.get('keywords', [])
            pdf_path = paper.content.get('pdf', '')

        print(f"\n[{idx}/{len(papers_to_process)}] Processing paper: {title[:60]}...")

        # 1. Download PDF
        pdf_filename = ''
        pdf_url = None

        if use_api_v2:
            # API v2: build PDF URL from paper ID
            pdf_url = f'https://openreview.net/pdf?id={paper_id}'
            pdf_filename = os.path.join(PDF_DIR, f'{paper_id}.pdf')
        elif pdf_path:
            # API v1: use 'pdf' field from content
            pdf_url = f'https://openreview.net{pdf_path}'
            pdf_filename = os.path.join(PDF_DIR, f'{paper_id}.pdf')

        if pdf_url:
            try:
                if not os.path.exists(pdf_filename):
                    print(f"  - Downloading PDF...")
                    response = requests.get(pdf_url, timeout=30)
                    if response.status_code == 200:
                        with open(pdf_filename, 'wb') as f:
                            f.write(response.content)
                        print(f"    ✓ PDF saved")
                    else:
                        print(f"    ✗ PDF download failed: HTTP {response.status_code}")
                        pdf_filename = ''
                else:
                    print(f"  - PDF already exists, skipping download")
            except Exception as e:
                print(f"    ✗ PDF download error: {e}")
                pdf_filename = ''

        # 2. Get official reviews
        print(f"  - Fetching reviews...")
        try:
            if use_api_v2:
                # API v2: get reviews via invitation
                reviews = client.get_all_notes(
                    forum=paper_id,
                    invitation=f'{venue_id}/Submission{paper.number}/-/Official_Review'
                )
            else:
                # API v1: fetch all notes and filter for Official_Review
                all_notes = client.get_all_notes(forum=paper_id)
                reviews = [note for note in all_notes if 'Official_Review' in note.invitation]

            print(f"    Found {len(reviews)} official reviews")

            # 3. Save each review to CSV
            if reviews:
                for review_idx, review in enumerate(reviews, 1):
                    # Field names differ by year; try several possibilities
                    if use_api_v2:
                        # API v2: content.field.value
                        review_text = (
                            review.content.get('review', {}).get('value') or
                            review.content.get('main_review', {}).get('value') or
                            review.content.get('summary', {}).get('value') or
                            review.content.get('strengths_and_weaknesses', {}).get('value') or
                            ''
                        )
                        rating = (
                            review.content.get('rating', {}).get('value') or
                            review.content.get('recommendation', {}).get('value') or
                            ''
                        )
                    else:
                        # API v1: content fields hold plain values (no 'value' wrapper)
                        review_text = (
                            review.content.get('review') or  # 2020, 2021
                            review.content.get('main_review') or  # 2022
                            review.content.get('strength_and_weaknesses') or  # 2023
                            review.content.get('summary_of_the_review') or  # alternative
                            ''
                        )
                        rating = (
                            review.content.get('rating') or  # 2020, 2021
                            review.content.get('recommendation') or  # 2022, 2023
                            ''
                        )

                    review_data = {
                        'paper_title': title,
                        'review_text': review_text,
                        'pdf_path': pdf_filename,
                    }
                    csv_data.append(review_data)
                    print(f"      Review {review_idx}: rating={rating}")
            else:
                # No reviews found: still save one row with the paper info
                csv_data.append({
                    'paper_title': title,
                    'review_text': '',
                    'pdf_path': pdf_filename,
                })
        except Exception as e:
            print(f"    ✗ Error fetching reviews: {e}")

        # Avoid sending requests too quickly
        time.sleep(0.5)

    # 4. Save to CSV file
    print(f"\n{'='*80}")
    print(f"Saving data to CSV: {CSV_FILE}")

    if csv_data:
        fieldnames = ['paper_title', 'review_text', 'pdf_path']

        with open(CSV_FILE, 'w', newline='', encoding='utf-8') as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames)
            writer.writeheader()
            writer.writerows(csv_data)

        print(f"✓ Successfully saved {len(csv_data)} records")
        print(f"✓ PDFs are saved in: {PDF_DIR}")
        print(f"✓ CSV file saved at: {CSV_FILE}")
    else:
        print("No data to save")

    print("="*80)
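
Much of the loop above exists to handle two content shapes. API v2 wraps every field as `{'value': ...}`, while API v1 stores the value directly, and review field names also change from year to year. The sketch below expresses that fallback chain compactly, using plain dicts in place of `note.content`. `field_value` and `first_field` are hypothetical helpers. The sketch assumes any dict-valued field is a v2 wrapper, which holds for the text fields read here.

```python
def field_value(content, key):
    """Return a field from a note's content, unwrapping API v2's {'value': ...} form."""
    raw = content.get(key)
    if isinstance(raw, dict):
        return raw.get("value")
    return raw


def first_field(content, keys, default=""):
    """Return the first non-empty field among keys, like the notebook's `a or b or ''` chains."""
    for key in keys:
        value = field_value(content, key)
        if value:
            return value
    return default


# Review-text fields in the order the notebook tries them for API v1
V1_REVIEW_KEYS = ["review", "main_review", "strength_and_weaknesses", "summary_of_the_review"]
```

Inside the loop, `first_field(review.content, V1_REVIEW_KEYS)` would then work for both API versions.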

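
The PDF step has two weaknesses. It skips any file that already exists, so a download interrupted mid-write leaves a truncated PDF that later runs never repair. It also saves whatever body comes back with HTTP 200. Below is a sketch of a safer save, assuming a valid file starts with the `%PDF` magic bytes. `save_pdf_atomically` is a hypothetical helper that would receive `response.content`. `os.replace` is atomic on a local POSIX filesystem; on a mounted Google Drive, treat that guarantee as best-effort.

```python
import os


def save_pdf_atomically(path, data):
    """Write PDF bytes to path via a temporary '.part' file, then rename into place.

    Hypothetical helper. Returns False and writes nothing if data does not start with %PDF.
    """
    if not data.startswith(b"%PDF"):
        return False
    tmp_path = path + ".part"
    with open(tmp_path, "wb") as f:
        f.write(data)
    os.replace(tmp_path, path)  # the final name appears only once the write has finished
    return True
```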

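
The loop throttles itself with a fixed `time.sleep(0.5)` between papers, but it gives up on a paper after the first failed request. If transient errors become a problem, retrying with exponential backoff is a common alternative. This is a generic sketch, not an OpenReview or `requests` feature. `with_retries` and its default values are assumptions.

```python
import time


def with_retries(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func(), retrying on any exception with exponentially growing delays.

    Hypothetical helper: waits base_delay, 2*base_delay, ... between attempts
    and re-raises the last error once all attempts are used up.
    """
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))
```

For example, `reviews = with_retries(lambda: client.get_all_notes(forum=paper_id))` would wrap the v1 review fetch. The `sleep` parameter exists so that tests can record delays without actually waiting.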
## 5. Configure Parameters and Run

Modify the parameters below to set which years to download and how many papers to fetch per year.

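
A small guard can catch configuration typos before a long run starts, such as a year given as a string or a zero or negative paper count. `validate_config` below is a hypothetical helper. It checks only the types and ranges described in the comments of the next cell.

```python
def validate_config(years, num_papers):
    """Raise ValueError for obviously invalid YEARS / NUM_PAPERS settings (hypothetical helper)."""
    if not years:
        raise ValueError("YEARS must list at least one year")
    for year in years:
        if not isinstance(year, int):
            raise ValueError(f"Year {year!r} must be an int, e.g. 2023")
    if num_papers is not None and (not isinstance(num_papers, int) or num_papers < 1):
        raise ValueError("NUM_PAPERS must be a positive int or None")
```

Calling `validate_config(YEARS, NUM_PAPERS)` at the top of the next cell would stop a misconfigured run immediately.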

In [5]:
# Configure parameters
YEARS = [2021, 2022, 2023]  # Years to download; several can be listed, e.g., [2020, 2021, 2022]
NUM_PAPERS = 100  # Papers to download per year; set to None to download all papers
# NUM_PAPERS = None

print("="*80)
print(f"Starting download of ICLR papers and reviews for years: {YEARS}")
print(f"Save location: {WORK_DIR}")
if NUM_PAPERS:
    print(f"Downloading {NUM_PAPERS} papers per year")
else:
    print("Downloading all papers per year")
print("="*80)

for year in YEARS:
    try:
        download_iclr_data(year, NUM_PAPERS, WORK_DIR)
    except Exception as e:
        print(f"\n✗ Download failed for year {year}: {e}")
        continue

print("\n" + "="*80)
print("Finished processing all requested years; check the log above for any ✗ errors.")
print(f"Files saved to Google Drive: {WORK_DIR}")
print("="*80)


Starting download of ICLR papers and reviews for years: [2021, 2022, 2023]
Save location: /content/drive/MyDrive/Notebooks/AI_review
Downloading 100 papers per year

Fetching data for ICLR.cc/2021/Conference... (API v1)


Getting V1 Notes: 100%|█████████▉| 2591/2594 [00:02<00:00, 1265.79it/s]


Found 2594 papers in total
Processing 100 papers
Output directory: /content/drive/MyDrive/Notebooks/AI_review/iclr_2021_data

[1/100] Processing paper: Contextual Transformation Networks for Online Continual Lear...
  - Downloading PDF...
    ✓ PDF saved
  - Fetching reviews...
    Found 4 official reviews
      Review 1: rating=7: Good paper, accept
      Review 2: rating=6: Marginally above acceptance threshold
      Review 3: rating=7: Good paper, accept
      Review 4: rating=6: Marginally above acceptance threshold

[2/100] Processing paper: Retrieval-Augmented Generation for Code Summarization via Hy...
  - Downloading PDF...
    ✓ PDF saved
  - Fetching reviews...
    Found 3 official reviews
      Review 1: rating=7: Good paper, accept
      Review 2: rating=7: Good paper, accept
      Review 3: rating=7: Good paper, accept

[3/100] Processing paper: Breaking the Expressive Bottlenecks of Graph Neural Networks...
  - Downloading PDF...
    ✓ PDF saved
  - Fetching reviews...
  

Getting V1 Notes: 100%|█████████▉| 2614/2617 [00:02<00:00, 1112.16it/s]


Found 2617 papers in total
Processing 100 papers
Output directory: /content/drive/MyDrive/Notebooks/AI_review/iclr_2022_data

[1/100] Processing paper: A Theory of Tournament Representations...
  - Downloading PDF...
    ✓ PDF saved
  - Fetching reviews...
    Found 4 official reviews
      Review 1: rating=8: accept, good paper
      Review 2: rating=5: marginally below the acceptance threshold
      Review 3: rating=5: marginally below the acceptance threshold
      Review 4: rating=6: marginally above the acceptance threshold

[2/100] Processing paper: Revisiting Design Choices in Offline Model Based Reinforceme...
  - Downloading PDF...
    ✓ PDF saved
  - Fetching reviews...
    Found 5 official reviews
      Review 1: rating=6: marginally above the acceptance threshold
      Review 2: rating=6: marginally above the acceptance threshold
      Review 3: rating=8: accept, good paper
      Review 4: rating=8: accept, good paper
      Review 5: rating=6: marginally above the acceptanc

Getting V1 Notes: 100%|█████████▉| 3788/3792 [00:03<00:00, 1124.30it/s]


Found 3792 papers in total
Processing 100 papers
Output directory: /content/drive/MyDrive/Notebooks/AI_review/iclr_2023_data

[1/100] Processing paper: Guiding Safe Exploration with Weakest Preconditions...
  - Downloading PDF...
    ✓ PDF saved
  - Fetching reviews...
    Found 4 official reviews
      Review 1: rating=6: marginally above the acceptance threshold
      Review 2: rating=6: marginally above the acceptance threshold
      Review 3: rating=6: marginally above the acceptance threshold
      Review 4: rating=8: accept, good paper

[2/100] Processing paper: An Adaptive Entropy-Regularization Framework for Multi-Agent...
  - Downloading PDF...
    ✓ PDF saved
  - Fetching reviews...
    Found 3 official reviews
      Review 1: rating=3: reject, not good enough
      Review 2: rating=8: accept, good paper
      Review 3: rating=6: marginally above the acceptance threshold

[3/100] Processing paper: AutoSparse: Towards Automated Sparse Training...
  - Downloading PDF...
    ✓ P