
Releases: jannisborn/paperscraper

v0.2.13

23 Jun 09:49

What's Changed

Full Changelog: v0.2.12...v0.2.13

v0.2.12

28 May 07:00
d6b893d

What's Changed

  • chore(deps): bump requests from 2.31.0 to 2.32.0 by @dependabot in #42
  • add retry logic in XRXivApi to tackle request timed out by @memray in #43

Full Changelog: v0.2.11...v0.2.12

v0.2.11

22 Feb 07:57
8cca29c

What's Changed

Full Changelog: v0.2.10...v0.2.11

Impact factor restoration

11 Jan 22:47
95052bd

v0.2.9 was broken because the dependencies of paperscraper.impact were not shipped via PyPI (installation from source was fine).
This release fixes the packaging and expands the tests to catch such cases in the future.

What's Changed

Full Changelog: v0.2.9...v0.2.10

Impact factor integration

24 Dec 14:10
0ff9218

Fuzzy search of impact factors for journals.
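
For illustration, a rough sketch of the fuzzy journal lookup. The Impactor class follows the README; the threshold and sort_by argument names are assumptions and may differ:

from paperscraper.impact import Impactor

impactor = Impactor()
# Fuzzy-match a journal name and return candidate journals with their impact factors
results = impactor.search("Nature Comm", threshold=85, sort_by="impact")
print(results)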

What's Changed

Full Changelog: v0.2.8...v0.2.9

v0.2.8

08 Dec 05:24
db4f0c1

What's Changed

Full Changelog: v0.2.7...v0.2.8

v0.2.7

10 May 22:09

What's Changed

A bugfix for Windows users who were prevented from querying the chemrxiv API.

Full Changelog: v0.2.6...v0.2.7

v0.2.6

07 May 22:39
f1d1d85

What's Changed

  • Save DOIs from arxiv papers by @jannisborn in #27
    --> This also makes it possible to scrape PDFs from arxiv metadata (see the sketch below)
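
A minimal sketch of going from arxiv metadata to PDFs. The function names follow the README, but the keyword arguments (output_filepath, pdf_path, key_to_save) are assumptions:

from paperscraper.arxiv import get_and_dump_arxiv_papers
from paperscraper.pdf import save_pdf_from_dump

# Metadata search on arxiv; DOIs are now saved alongside the other fields
query = [["machine learning"], ["molecule", "chemistry"]]
get_and_dump_arxiv_papers(query, output_filepath="ml_chemistry.jsonl")

# Download one PDF per DOI found in the dump (argument names assumed from the README)
save_pdf_from_dump("ml_chemistry.jsonl", pdf_path="pdfs/", key_to_save="doi")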

Full Changelog: v0.2.5...v0.2.6

v0.2.5

19 Apr 16:30
04b10c9

What's Changed

  • Extract records from biorxiv and medrxiv based on start date and end date by @achouhan93 in #24
  • Extract records from chemrxiv based on start date and end date by @achouhan93 and @jannisborn in #25

EXAMPLE

Since v0.2.5, paperscraper also allows scraping {med/bio/chem}rxiv for specific date ranges!

from paperscraper.get_dumps import medrxiv

medrxiv(begin_date="2023-04-01", end_date="2023-04-08")

But watch out: the resulting .jsonl file will be labelled according to the current date, and all your subsequent searches will be based on this file only. If you use this option, keep an eye on the source files (paperscraper/server_dumps/*jsonl) to make sure they contain the metadata for all papers you are interested in.
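
Once the dump exists, subsequent keyword searches run against that file. A minimal sketch, assuming the XRXivQuery helper described in the README; the dump filename below is hypothetical:

from paperscraper.xrxiv.xrxiv_query import XRXivQuery

# The dump filename carries the date on which it was created (hypothetical path)
querier = XRXivQuery("paperscraper/server_dumps/medrxiv_2023-04-08.jsonl")
query = [["covid"], ["imaging", "x-ray"]]  # AND across sublists, OR within a sublist
querier.search_keywords(query, output_filepath="covid_imaging.jsonl")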

Full Changelog: v0.2.4...v0.2.5

v0.2.4

03 Aug 15:11

v0.2.4 - release summary

  1. Support for scraping PDFs
  2. Harmonize return types of scraper classes to pd.DataFrame rather than List[Dict].

1. Scraping PDFs
v0.2.4 now supports downloading PDFs. The core function is paperscraper.pdf.save_pdf, which receives a dictionary with the key doi and downloads the PDF for the desired DOI. There is also a wrapper function, paperscraper.pdf.save_pdf_from_dump, that can be called with the filepath of a .jsonl file previously obtained from a metadata search; it downloads the PDFs for all papers in that dump. Examples are given in the README.
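
A minimal sketch of both entry points, following the README; the filepath, pdf_path and key_to_save argument names are assumptions:

from paperscraper.pdf import save_pdf, save_pdf_from_dump

# Single paper: pass a dict containing the DOI
save_pdf({"doi": "10.48550/arXiv.2207.03928"}, filepath="paper.pdf")

# Bulk: download a PDF for every DOI in a previously dumped .jsonl file
save_pdf_from_dump("medrxiv_covid_ai_imaging.jsonl", pdf_path="pdfs/", key_to_save="doi")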

Thanks to @daenuprobst for suggestions!

2. Return types
With this version, it is ensured that all scraper classes return the results in a pandas DataFrame (one paper per row) as opposed to a list of dictionaries (one paper per dict).
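
For example, a keyword search now comes back as a DataFrame rather than a list of dicts. A sketch assuming the get_arxiv_papers helper; the function name and arguments are taken from the README and not verified here:

from paperscraper.arxiv import get_arxiv_papers

df = get_arxiv_papers("machine learning AND molecules", max_results=20)
print(type(df))                     # pandas DataFrame, one paper per row
print(df[["title", "doi"]].head())  # columns hold the scraped metadata fields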

Full Changelog: v0.2.3...v0.2.4