Doesn't work for ICLR 2024? #3

Open
NoviScl opened this issue May 9, 2024 · 1 comment

NoviScl commented May 9, 2024

Great repo!

I noticed that it doesn't seem to work for ICLR 2024. Here is what I tried:

```python
# Repository imports (module names assumed from the repo's README example)
from scraper import Scraper
from extract import Extractor
from filters import title_filter, keywords_filter, abstract_filter
from selector import Selector

years = [
    '2024'
]

conferences = [
    'ICLR'
]
keywords = [
    'language model'
]

def modify_paper(paper):
    # Turn the forum id and relative PDF path into absolute OpenReview URLs
    paper.forum = f"https://openreview.net/forum?id={paper.forum}"
    paper.content['pdf'] = f"https://openreview.net{paper.content['pdf']}"
    return paper

# what fields to extract
extractor = Extractor(fields=['forum'], subfields={'content':['title', 'keywords', 'abstract', 'pdf', 'match']})

# if you want to select papers manually among the scraped papers
# selector = Selector()

# select all scraped papers
selector = None

scraper = Scraper(conferences=conferences, years=years, keywords=keywords, extractor=extractor, fpath='examples.csv', fns=[modify_paper], selector=selector)

# adding filters to filter on
scraper.add_filter(title_filter)
scraper.add_filter(keywords_filter)
scraper.add_filter(abstract_filter)

scraper()
```

But it works fine when I change the year from 2024 to 2023. Any idea why?

@simra-shahid

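The OpenReview API has two versions: the original API (v1, `https://api.openreview.net`) and API v2 (`https://api2.openreview.net`). This wrapper is built on v1, which does not contain newer venues; ICLR 2024 is hosted on API v2. If you want the ICLR 2024 papers, you can try this code with a v2 client: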

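```python
import openreview
from venue import get_venues  # helper from this repository

# API v2 client; replace the placeholder credentials with your own
client = openreview.api.OpenReviewClient(
    baseurl='https://api2.openreview.net',
    username='<your username>',
    password='<your password>',
)

papers_with_new_api = {}
venues = get_venues(client, ["ICLR"], ["2024"])
for venue_id in venues:
    try:
        venue_group = client.get_group(venue_id)
        submission_name = venue_group.content['submission_name']['value']
        submissions = client.get_all_notes(invitation=f'{venue_id}/-/{submission_name}', details='replies')
        review_name = venue_group.content['review_name']['value']
        # Keep only the replies posted under each submission's review invitation
        reviews = [
            openreview.api.Note.from_json(reply)
            for s in submissions
            for reply in s.details['replies']
            if f'{venue_id}/{submission_name}{s.number}/-/{review_name}' in reply['invitations']
        ]
        if reviews:
            print(f"{venue_id}: Submissions: {len(submissions)}, Reviews - {len(reviews)}")
            papers_with_new_api[venue_id] = {
                'submissions': submissions,
                'reviews': reviews
            }
    except Exception as e:
        # Groups lacking submission/review settings fail here; report and skip them
        print(f"Skipping {venue_id}: {e}")
```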
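One practical difference between the two APIs is the shape of note content: API v1 stores plain values (`content['title'] == 'A title'`), whereas API v2 wraps each field as `{'value': ...}`. Helpers written for v1, such as `modify_paper` above, therefore need adjusting before they can be applied to the v2 `submissions`. A rough sketch, assuming each submission exposes `forum` and `content` like the openreview-py `Note` objects returned above. `content_value`, `submission_to_row` and `matches_keyword` are hypothetical helpers that only approximate the repo's `modify_paper` and filters, and the demo uses stand-in objects with made-up values instead of live API calls:

```python
import csv
import io
from types import SimpleNamespace


def content_value(content, key):
    """Return a field from an API v1 note (plain value) or an API v2 note ({'value': ...})."""
    field = content.get(key)
    if isinstance(field, dict) and 'value' in field:
        return field['value']
    return field


def submission_to_row(note):
    """Like modify_paper: absolute forum and PDF URLs, plus title and abstract."""
    pdf = content_value(note.content, 'pdf')
    return {
        'forum': f"https://openreview.net/forum?id={note.forum}",
        'title': content_value(note.content, 'title') or '',
        'abstract': content_value(note.content, 'abstract') or '',
        'pdf': f"https://openreview.net{pdf}" if pdf else '',
    }


def matches_keyword(row, keyword):
    """Simplified stand-in for the title/abstract filters: case-insensitive substring match."""
    text = f"{row['title']} {row['abstract']}".lower()
    return keyword.lower() in text


# Stand-ins for the openreview.api.Note objects in papers_with_new_api[venue_id]['submissions']
submissions = [
    SimpleNamespace(forum='abc123', content={
        'title': {'value': 'Scaling a Language Model'},
        'abstract': {'value': 'We study how model size affects quality.'},
        'pdf': {'value': '/pdf/abc123.pdf'},
    }),
    SimpleNamespace(forum='def456', content={
        'title': {'value': 'Graph Networks'},
        'abstract': {'value': 'A study of message passing.'},
    }),
]

rows = [row for row in map(submission_to_row, submissions)
        if matches_keyword(row, 'language model')]

# Write the matching rows as CSV (to an in-memory buffer for this demo)
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=['forum', 'title', 'abstract', 'pdf'])
writer.writeheader()
writer.writerows(rows)
print(buffer.getvalue())
```

To write a file instead, pass `open('examples.csv', 'w', newline='')` to `csv.DictWriter` in place of the `StringIO` buffer, and iterate over the real `submissions` collected above instead of the stand-ins.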
