# Download all manuscript revisions

## Description
Given an OpenReview invitation, download pdfs of all its manuscripts, including all their revisions.

## Notes
* I don't know what the significance of the different versions are, in particular which is the one considered for peer review, which is the camera ready version, which version is the 'main' version in the forum -- I just got the main/reference distinction from the API.
* You may have to do some kind of rate limiting thing in order to not get in trouble with the OpenReview API. I haven't tested this with `LIMIT > 10`.
* Don't be alarmed if filenames appear like `main_2018-09-28T04/08/02.pdf` in Finder on your mac -- that's just a bug in Finder with displaying `:` in filenames.

In [6]:
from datetime import datetime
import openreview
import os
import tqdm

# Change these values according to your needs
INVITATION = 'ICLR.cc/2019/Conference/-/Blind_Submission'
OUTPUT_DIR = "./ICLR2019_pdfs"
LIMIT = 1 # Number of papers to download all revisions for

# A client is required for any OpenReview API actions
guest_client = openreview.Client(baseurl='https://api.openreview.net')

In [7]:
def get_pdf_filename(forum_dir, timestamp, is_reference):
    """Produce a filename for the pdf with a human readable timestamp.
    
        Args:
            forum_dir: Directory to write pdfs to
            timestamp: Unix timestamp from note.tcdate in OR Note object
            is_reference: follows is_reference value of OR API.
        
        Returns:
            pdf filename with timestamps in forum_dir
    """
    nice_timestamp = datetime.fromtimestamp(
        timestamp/1000).strftime('%Y-%m-%dT%H_%M_%S') # Something human-readable for the file name
    if is_reference:
        main_or_revision =  "revision"
    else:
        main_or_revision =  "main"
    return forum_dir + "/{0}_{1}.pdf".format(main_or_revision, nice_timestamp)

def write_pdf_to_file(guest_client, forum_dir, note):
    """Get pdf of manuscript and write to an appropriately named file.
        Args:
            guest_client: OR API client
            forum_dir: Directory to write pdfs to
            note: OR API Note object
        Returns:
            None
    """
    is_reference = not (note.id == note.forum)
    pdf_binary = guest_client.get_pdf(note.id, is_reference=is_reference)
    with open(get_pdf_filename(forum_dir, note.tcdate, is_reference), 'wb') as file_handle:
        file_handle.write(pdf_binary)

In [8]:
pdfs_dir = OUTPUT_DIR + "/"
os.makedirs(pdfs_dir, exist_ok=True)

for i, forum_note in enumerate(openreview.tools.iterget_notes(
        guest_client, invitation=INVITATION)):
    forum_dir = pdfs_dir + forum_note.id
    os.makedirs(forum_dir, exist_ok=True)
    write_pdf_to_file(guest_client, forum_dir, forum_note)
    for revision in tqdm.tqdm(guest_client.get_references(
            referent=forum_note.id, original=True),
            desc="Getting revisions for {0}".format(forum_note.id)):
        try:
            write_pdf_to_file(guest_client, forum_dir, revision)
        except openreview.OpenReviewException:
            continue
    if i == LIMIT + 1:
        break

Getting revisions for rJl0r3R9KX: 100%|█████████████| 8/8 [00:07<00:00,  1.02it/s]
Getting revisions for SylCrnCcFX: 100%|███████████| 10/10 [00:24<00:00,  2.50s/it]
