Publicly available data for Paperscape
Switch branches/tags
Clone or download
Fetching latest commit…
Cannot retrieve the latest commit at this time.
Permalink
Failed to load latest commit information.
LICENSE
README.md
pscp-1991.csv
pscp-1992.csv
pscp-1993.csv
pscp-1994.csv
pscp-1995.csv
pscp-1996.csv
pscp-1997.csv
pscp-1998.csv
pscp-1999.csv
pscp-2000.csv
pscp-2001.csv
pscp-2002.csv
pscp-2003.csv
pscp-2004.csv
pscp-2005.csv
pscp-2006.csv
pscp-2007.csv
pscp-2008.csv
pscp-2009.csv
pscp-2010.csv
pscp-2011.csv
pscp-2012.csv
pscp-2013.csv
pscp-2014.csv
pscp-2015.csv
pscp-2016.csv
pscp-2017.csv

README.md

Paperscape data

Publicly available graph data for Paperscape (http://paperscape.org) in CSV format. Includes authors, title and references.

If you use this data, please give us credit by citing the following BibTeX entry:

@misc{paperscape,
    author = {Damien P. George and Robert Knegjens},
    title = {Paperscape},
    howpublished = {\url{http://paperscape.org}},
    doi = {10.5281/zenodo.10052},
}

Extraction success

These numbers correspond to the data set of 1991 to the end of 2017 (v2017.12 - released March 18th 2018). There were 1,219,522 entries with a total of 38,464,630 references, of which we were able to match 14,726,797 with an arXiv id (38%). Note that not every reference has a corresponding arXiv id, nor have we successfully identified every reference that does. Our algorithms are currently most successful for high energy physics categories, which can be seen in the following table summarising the extraction success per category:

Category Papers ArXiv refs Total refs Avg. arXiv Avg. total Ratio
hep-lat 13784 299121 364349 21.70 26.43 82.10%
hep-ph 96173 3332496 4114100 34.65 42.78 81.00%
hep-th 76801 2270045 2820872 29.56 36.73 80.47%
hep-ex 16553 339068 496602 20.48 30.00 68.28%
gr-qc 38488 867542 1338659 22.54 34.78 64.81%
nucl-th 24253 567751 901158 23.41 37.16 63.00%
nucl-ex 8060 149562 264224 18.56 32.78 56.60%
quant-ph 58103 681014 1779232 11.72 30.62 38.28%
astro-ph 192382 3469708 9109172 18.04 47.35 38.09%
cond-mat 196357 1726730 6218230 8.8 31.67 27.77%
q-alg 1177 5033 18699 4.3 15.89 26.92%
math-ph 21735 152243 584801 7.0 26.91 26.03%
solv-int 844 3756 16294 4.5 19.31 23.05%
atom-ph 68 221 1484 3.2 21.82 14.89%
physics 77710 276153 1943616 3.6 25.01 14.21%
nlin 13044 51562 373383 4.0 28.62 13.81%
acc-phys 47 68 516 1.4 10.98 13.18%
chao-dyn 1770 4282 39003 2.4 22.04 10.98%
math 242462 413518 5081904 1.7 20.96 8.14%
adap-org 306 367 4703 1.2 15.37 7.80%
funct-an 320 325 4528 1.0 14.15 7.18%
alg-geom 1209 899 12633 0.74 10.45 7.12%
dg-ga 562 510 7791 0.91 13.86 6.55%
supr-con 69 202 3140 2.9 45.51 6.43%
patt-sol 452 632 9960 1.4 22.04 6.35%
comp-gas 140 129 2057 0.92 14.69 6.27%
q-fin 4897 7199 122354 1.5 24.99 5.88%
chem-ph 129 204 3503 1.6 27.16 5.82%
mtrl-th 165 217 4729 1.3 28.66 4.59%
plasm-ph 28 13 327 0.46 11.68 3.98%
cs 104475 79712 2092204 0.76 20.03 3.81%
stat 13333 13314 356259 1.0 26.72 3.74%
q-bio 12708 13122 364294 1.0 28.67 3.60%
bayes-an 11 2 91 0.18 8.27 2.20%
cmp-lg 894 75 9467 0.084 10.59 0.79%
ao-sci 13 2 292 0.15 22.46 0.68%

License

This Paperscape data is made available under the Open Database License: http://opendatacommons.org/licenses/odbl/1.0/. Any rights in individual contents of the database are licensed under the Database Contents License: http://opendatacommons.org/licenses/dbcl/1.0/ - See more at: http://opendatacommons.org/licenses/odbl/#sthash.INIaa7OJ.dpuf