# Download all manuscript revisions

## Description
Given an OpenReview invitation, download pdfs of all its manuscripts, including all their revisions.

## Notes
* OpenReview allows authors to submit new versions of their manuscripts at various times in the peer review process. As I understand it, some uploads simply overwrite the most recent version, while others are uploaded as official revisions.
* By requesting revisions (OpenReview calls them 'references'), we get all uploads officially labeled as revisions by the author at the time of uploading.
* You may have to do some kind of rate limiting thing in order to not get in trouble with the OpenReview API. I haven't tested this with `LIMIT > 10`.

In [1]:
from datetime import datetime
import openreview
import os
import tqdm

# Change these values according to your needs
INVITATION = 'ICLR.cc/2019/Conference/-/Blind_Submission'
OUTPUT_DIR = "./ICLR2019_pdfs"
LIMIT = 10 # Number of papers to download all revisions for

# A client is required for any OpenReview API actions
guest_client = openreview.Client(baseurl='https://api.openreview.net')

In [2]:
def get_pdf_filename(forum_dir, timestamp, is_reference):
    """Produce a filename for the pdf with a human readable timestamp.
    
        Args:
            forum_dir: Directory to write pdfs to
            timestamp: Unix timestamp from note.tcdate in OR Note object
            is_reference: follows is_reference value of OR API.
        
        Returns:
            pdf filename with timestamps in forum_dir
    """
    nice_timestamp = datetime.fromtimestamp(
        timestamp/1000).strftime('%Y-%m-%dT%H_%M_%S') # Something human-readable for the file name
    main_or_revision =  "revision" if is_reference else 'main'
    return f'{forum_dir}/{main_or_revision}_{nice_timestamp}'

def write_pdf_to_file(guest_client, forum_dir, note):
    """Get pdf of manuscript and write to an appropriately named file.
        Args:
            guest_client: OR API client
            forum_dir: Directory to write pdfs to
            note: OR API Note object
        Returns:
            None
    """
    is_reference = note.referent is not None
    pdf_binary = guest_client.get_pdf(note.id, is_reference=is_reference)
    with open(get_pdf_filename(forum_dir, note.tcdate, is_reference), 'wb') as file_handle:
        file_handle.write(pdf_binary)

In [None]:
pdfs_dir = OUTPUT_DIR + "/"
os.makedirs(pdfs_dir, exist_ok=True)

for i, forum_note in enumerate(openreview.tools.iterget_notes(
        guest_client, invitation=INVITATION)):
    forum_dir = pdfs_dir + forum_note.id
    os.makedirs(forum_dir, exist_ok=True)
    write_pdf_to_file(guest_client, forum_dir, forum_note)
    for revision in tqdm.tqdm(guest_client.get_references(
            referent=forum_note.id, original=True),
            desc="Getting revisions for {0}".format(forum_note.id)):
        try:
            write_pdf_to_file(guest_client, forum_dir, revision)
        except openreview.OpenReviewException as e:
            print(f'{revision.id}: {e.args[0]["message"]}')
            continue
    if i == LIMIT + 1:
        break

Getting revisions for rJl0r3R9KX:  12%|█▋           | 1/8 [00:00<00:01,  4.84it/s]

rtZtjY_fk9: The Pdf file was not found


Getting revisions for rJl0r3R9KX: 100%|█████████████| 8/8 [00:04<00:00,  1.64it/s]
Getting revisions for SylCrnCcFX:  10%|█▏          | 1/10 [00:00<00:01,  4.86it/s]

rZ4W1cOzJ5: The Pdf file was not found


Getting revisions for SylCrnCcFX: 100%|███████████| 10/10 [00:11<00:00,  1.18s/it]
Getting revisions for H1xAH2RqK7:  33%|████▎        | 1/3 [00:00<00:00,  4.60it/s]

BrBNIO_M15: The Pdf file was not found


Getting revisions for H1xAH2RqK7: 100%|█████████████| 3/3 [00:01<00:00,  2.08it/s]
Getting revisions for HJeABnCqKQ:  33%|████▎        | 1/3 [00:00<00:00,  4.91it/s]

rUqtZ_Ozkc: The Pdf file was not found


Getting revisions for HJeABnCqKQ: 100%|█████████████| 3/3 [00:01<00:00,  1.58it/s]
Getting revisions for SyVpB2RqFX:   9%|█           | 1/11 [00:00<00:02,  4.88it/s]

SVg38GBp3f: The Pdf file was not found


Getting revisions for SyVpB2RqFX:  18%|██▏         | 2/11 [00:00<00:01,  4.82it/s]

rMfttddzJ9: The Pdf file was not found


Getting revisions for SyVpB2RqFX: 100%|███████████| 11/11 [00:06<00:00,  1.82it/s]
Getting revisions for SJf6BhAqK7: 100%|█████████████| 3/3 [00:01<00:00,  1.96it/s]
Getting revisions for H1faSn0qY7:  20%|██▌          | 1/5 [00:00<00:00,  4.77it/s]

r89l_uuG15: The Pdf file was not found


Getting revisions for H1faSn0qY7: 100%|█████████████| 5/5 [00:02<00:00,  2.39it/s]
Getting revisions for HJgTHnActQ:  33%|████▎        | 1/3 [00:00<00:00,  4.92it/s]

BWEBK__f1c: The Pdf file was not found


Getting revisions for HJgTHnActQ: 100%|█████████████| 3/3 [00:02<00:00,  1.10it/s]
Getting revisions for HylTBhA5tQ: 100%|███████████| 13/13 [00:06<00:00,  2.15it/s]
Getting revisions for B1gTShAct7:   9%|█           | 1/11 [00:00<00:02,  4.90it/s]

H5zeuuGyc: The Pdf file was not found


Getting revisions for B1gTShAct7: 100%|███████████| 11/11 [00:08<00:00,  1.33it/s]
Getting revisions for HJehSnCcFX: 100%|█████████████| 3/3 [00:01<00:00,  1.61it/s]
Getting revisions for ryxnHhRqFm:  25%|███▎         | 1/4 [00:00<00:00,  4.89it/s]