- Pandoc conversion from HTML (via neurips.cc) to markdown, manually trimmed ⇢ AcceptedPapersInitial.md
- Markdown converted to pandas DataFrame accepted_papers by parse_accepts.py
- pandas DataFrame exported to TSV ⇢ accepted_paper_listings.tsv (writing to this file is switched off by default by a parameter in main.py, to avoid accidental overwrites)
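The "off by default" export step could be sketched roughly as below. This is only an illustration of the idea, not the code in this repo: the function name guarded_write_tsv and its overwrite parameter are made up for the example, and the real switch lives in main.py.

```python
from pathlib import Path

import pandas as pd


def guarded_write_tsv(df, path, overwrite=False):
    """Write df to a tab-separated file (hypothetical helper, not the repo's API).

    Returns True if the file was written, False if an existing file was
    left untouched because overwrite was not requested.
    """
    path = Path(path)
    if path.exists() and not overwrite:
        # Refuse to clobber a TSV that may have been hand-corrected
        return False
    df.to_csv(path, sep="\t", index=False)
    return True
```

Calling it a second time on the same path does nothing unless overwrite=True is passed, which is the behaviour you want once the TSV has been checked by hand.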
To simply view the listings in the less pager:
column -t -s $'\t' neurips2020proc/data/accepted_paper_listings.tsv | less -S
Otherwise, drop into a shell with the dataset by running
python -im neurips2020proc
⇣
Parsed ⠶ List of 1900 papers
- This is the output from main.py, which calls:
  - src.parse_accepts.parse_listings, which parses the markdown listings (which came from pandoc)
  - [disabled] src.util.tsv_writer.write_df_to_tsv, to create accepted_paper_listings.tsv
From this shell, the PaperList object (a list of Paper objects) can be inspected or processed:
for p in accepted_paper_list: print(p)
⇣
Neverova et al. — "Continuous Surface Embeddings"
Krishnan et al. — "Improving model calibration with accuracy versus uncertainty optimization"
Li et al. — "Few-shot Image Generation via Self-Adaptation"
Simsekli et al. — "Hausdorff Dimension, Heavy Tails, and Generalization in Neural Networks"
De Bortoli et al. — "Quantitative Propagation of Chaos for SGD in Wide Neural Networks"
Mendler-Dünner et al. — "Stochastic Optimization for Performative Prediction"
...
This is useful to look over, but:
- there are no paper links
- the object is not a dataframe
To assist with processing, the PaperList class provides an as_df method to produce a pandas.DataFrame representation from the objects. This simply merges the dataframes produced by the Paper class, so it should work nicely with filtering operations.
accepted_paper_list.as_df()
⇣
title authors affiliations
0 Continuous Surface Embeddings [Natalia Neverova, David Novotny, Marc Szafran... [[Facebook AI Research], [Facebook AI Research...
1 Improving model calibration with accuracy vers... [Ranganath Krishnan, Omesh Tickoo] [[Intel Labs], [Intel]]
2 Few-shot Image Generation via Self-Adaptation [Yijun Li, Richard Zhang, Jingwan (Cynthia) Lu... [[Adobe Research], [Adobe], [Adobe Research], ...
3 Hausdorff Dimension, Heavy Tails, and Generali... [Umut Simsekli, Ozan Sener, George Deligiannid... [[Institut Polytechnique de Paris, University ...
4 Quantitative Propagation of Chaos for SGD in W... [Valentin De Bortoli, Alain Durmus, Xavier Fon... [[ENS Paris-Saclay], [ENS Paris Saclay], [ENS ...
... ... ... ...
1895 Distribution-free binary classification: predi... [Chirag Gupta, Aleksandr Podkopaev, Aaditya Ra... [[Carnegie Mellon University], [Carnegie Mello...
1896 Lipschitz Bounds and Provably Robust Training ... [Vishaal Krishnan, Abed AlRahman Al Makdah, Fa... [[University of California, Riverside], [Unive...
1897 Agnostic Learning with Multiple Objectives [Corinna Cortes, Mehryar Mohri, Javier Gonzalv... [[Google Research], [Courant Inst. of Math. Sc...
1898 Model Class Reliance for Random Forests [Gavin Smith, Roberto Mansilla, James Goulding] [[University of Nottingham], [University of No...
1899 Mitigating Local Identifiability in Probabilis... [Shib Dasgupta, Michael Boratko, Dongxu Zhang,... [[University of Massachusetts Amherst], [UMass...
[1900 rows x 3 columns]
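For a rough idea of how an as_df method like this can be put together, here is a minimal sketch. The class bodies are assumptions written for illustration (the real Paper and PaperList classes in this repo may differ), and pd.concat is my guess at one way to do the merging.

```python
import pandas as pd


class Paper:
    """Illustrative stand-in for the repo's Paper class (fields are assumed)."""

    def __init__(self, title, authors, affiliations):
        self.title = title
        self.authors = authors            # list of author names
        self.affiliations = affiliations  # one list of institutions per author

    def as_df(self):
        # Wrap each value in a list so pandas builds a single row and
        # keeps the list-valued cells intact
        return pd.DataFrame(
            {
                "title": [self.title],
                "authors": [self.authors],
                "affiliations": [self.affiliations],
            }
        )


class PaperList(list):
    """Illustrative stand-in: a list of Paper objects with a DataFrame view."""

    def as_df(self):
        # Stack the one-row frames and renumber the rows from 0
        return pd.concat([p.as_df() for p in self], ignore_index=True)
```

With a frame like this, filtering is ordinary pandas, e.g. `df[df["title"].str.contains("Graph")]`. Note that pd.concat raises a ValueError on an empty list, so an empty PaperList would need a special case.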
Lastly, to view all the papers in your system pager, run:
dfpager(accepted_paper_list.as_df())
This uses the pydoc.pager function (an undocumented part of the Python standard library), which picks the pager from your $PAGER environment variable, so you should set it. My bashrc has it set with:
# for Python with pydoc.pager in ~/.pythonrc :: listpager()
export PAGER='less -S'
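A helper along the lines of dfpager can be sketched in a few lines. The names df_to_pager_text and dfpager_sketch are invented for this example and are not the repo's implementation; the only library calls used are DataFrame.to_string and pydoc.pager.

```python
import pydoc

import pandas as pd


def df_to_pager_text(df):
    # to_string() renders every row, unlike the truncated default repr
    return df.to_string()


def dfpager_sketch(df):
    # Hand the full text to whichever pager pydoc selects
    pydoc.pager(df_to_pager_text(df))
```

With PAGER set to 'less -S' as above, long rows are not wrapped and can be scrolled sideways.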
- See here for my notes on pydoc.pager usage with Python lists and pandas DataFrames.
- I tried to clean it up (see bug_fixes.py) but there may be some typos I didn't see.
- The main cleanup I did was to separate out distinct research institutions with a / where they were ambiguously separated by a comma (modified before parsing by a check against affils_bugfix_dict in the Affiliations class in authors.py)
- There's one name in what looks like Chinese Unicode characters
- There's one entry with name and affiliation identical (this is an error in the listing, not my parsing of it!)
- This is the paper on 'Deep Graph Pose' by "The International Brain Laboratory The International Brain Laboratory (The International Brain Laboratory)"
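The comma cleanup can be pictured with a small lookup-based sketch. Everything here is illustrative: the dictionary entry uses placeholder institution names, and the helper names are made up rather than taken from authors.py.

```python
# Hypothetical fix table: raw affiliation string -> string with an explicit "/"
# (placeholder names, not real entries from affils_bugfix_dict)
AFFIL_FIXES_SKETCH = {
    "Institute A, University B": "Institute A / University B",
}


def fix_affiliation(raw, fixes=None):
    """Return the corrected string if raw is a known ambiguous case, else raw."""
    if fixes is None:
        fixes = AFFIL_FIXES_SKETCH
    return fixes.get(raw, raw)


def split_institutions(affil):
    """Split on "/" only, so commas inside an institution name are preserved."""
    return [part.strip() for part in affil.split("/")]
```

The point of a lookup over blindly splitting on commas is that a name such as "University of California, Riverside" is a single institution and comes through unchanged.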
Usually this kind of preprocessing precedes analysis of submissions per-company/academic institution, for which you'd have to deduplicate more carefully.
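As a starting point for that kind of per-institution count, something like the following sketch might do. It assumes the affiliations column holds one list of institution names per author, as in the as_df output above; the function name is made up for the example.

```python
import pandas as pd


def papers_per_institution(df):
    """Count papers per institution name (exact string match only)."""
    pairs = df[["title", "affiliations"]]
    # First explode: one row per author; second explode: one row per institution
    pairs = pairs.explode("affiliations").reset_index(drop=True)
    pairs = pairs.explode("affiliations").reset_index(drop=True)
    # Authors with no listed institution leave NaN rows after exploding
    pairs = pairs.dropna(subset=["affiliations"])
    # Count each paper once per institution, however many authors share it
    pairs = pairs.drop_duplicates()
    return pairs["affiliations"].value_counts()
```

Since this matches names exactly, variants such as "Intel Labs" and "Intel", or "ENS Paris-Saclay" and "ENS Paris Saclay", are counted separately, which is where the more careful deduplication comes in.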
It'd be useful to separate out these listings by category/topic, but for this you may need the abstracts.
The papers for many (all?) of these acceptances appear to be online, but I assume the links will be posted shortly, so I don't know if it's worth sourcing them or just waiting.