Releases: jannisborn/paperscraper
v0.2.13
v0.2.12
What's Changed
- chore(deps): bump requests from 2.31.0 to 2.32.0 by @dependabot in #42
- Add retry logic in XRXivApi to handle request timeouts by @memray in #43
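The retry behavior mentioned above can be sketched generically. The helper below is an illustrative example only (not paperscraper's actual `XRXivApi` implementation, whose internals are not shown here): it retries a flaky callable with exponential backoff when a timeout occurs.

```python
import time

def with_retries(func, max_retries=3, backoff=0.1):
    """Call func(), retrying on TimeoutError with exponential backoff.

    Illustrative sketch only -- not the actual XRXivApi implementation.
    """
    for attempt in range(max_retries):
        try:
            return func()
        except TimeoutError:
            if attempt == max_retries - 1:
                raise  # exhausted all retries, propagate the error
            time.sleep(backoff * 2 ** attempt)

# A flaky stand-in for a network call: fails twice, then succeeds.
calls = {"n": 0}
def flaky_request():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("request timed out")
    return "ok"

result = with_retries(flaky_request)
```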
Full Changelog: v0.2.11...v0.2.12
v0.2.11
What's Changed
- fix: lower default max_results in PubMed by @jannisborn in #41
Full Changelog: v0.2.10...v0.2.11
Impact factor restoration
0.2.9 was broken because the dependencies of paperscraper.impact
were not shipped via PyPI (installation from source worked fine).
This release fixes that and expands the tests to catch such cases in the future.
What's Changed
- Hotfix by @jannisborn in #39
Full Changelog: v0.2.9...v0.2.10
Impact factor integration
Fuzzy search of impact factors by journal name
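Fuzzy matching of journal names can be illustrated with the standard library's `difflib`. This is a sketch only: the journal table and function name below are hypothetical, not paperscraper's actual API or data.

```python
from difflib import get_close_matches

# Hypothetical journal -> impact factor table, for illustration only.
IMPACT_FACTORS = {
    "Nature Communications": 16.6,
    "Journal of Chemical Information and Modeling": 5.6,
    "Bioinformatics": 5.8,
}

def fuzzy_impact_factor(query, cutoff=0.4):
    """Return (matched journal, impact factor) for the closest journal name,
    or None if nothing is similar enough."""
    matches = get_close_matches(query, IMPACT_FACTORS, n=1, cutoff=cutoff)
    if not matches:
        return None
    return matches[0], IMPACT_FACTORS[matches[0]]

# A partial query still resolves to the full journal name.
match = fuzzy_impact_factor("Nature Comm")
```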
What's Changed
- Impact factor by @jannisborn in #37
Full Changelog: v0.2.8...v0.2.9
v0.2.8
What's Changed
- Graceful handling of connection errors by @jannisborn in #35
- chore(deps): bump requests from 2.24.0 to 2.31.0 by @dependabot in #30
New Contributors
- @dependabot made their first contribution in #30
Full Changelog: v0.2.7...v0.2.8
v0.2.7
What's Changed
- fix: OS agnostic urljoining by @jannisborn in #29
Fixes a bug that prevented Windows users from querying the chemrxiv API
Full Changelog: v0.2.6...v0.2.7
v0.2.6
What's Changed
- Save DOIs from arxiv papers by @jannisborn in #27
This also makes it possible to scrape PDFs from arxiv metadata
Full Changelog: v0.2.5...v0.2.6
v0.2.5
What's Changed
- Extract records from biorxiv and medrxiv based on start date and end date by @achouhan93 in #24
- Extract records from chemrxiv based on start date and end date by @achouhan93 and @jannisborn in #25
EXAMPLE
Since v0.2.5, paperscraper also allows scraping {med/bio/chem}rxiv for specific date ranges:
`medrxiv(begin_date="2023-04-01", end_date="2023-04-08")`
But watch out: the resulting `.jsonl` file will be labelled according to the current date, and all your subsequent searches will be based on this file only. If you use this option, you might want to keep an eye on the source files (`paperscraper/server_dumps/*jsonl`) to ensure they contain the metadata for all papers you're interested in.
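One way to "keep an eye" on a server dump is to load it and check which dates it covers. The sketch below builds a tiny mock dump and inspects it with pandas; the field names (`"title"`, `"doi"`, `"date"`) and the dump filename are assumptions for illustration, not the exact schema paperscraper writes.

```python
import json
import os
import tempfile

import pandas as pd

# Write a tiny mock dump; real dumps live under paperscraper/server_dumps/.
# The record fields here are illustrative assumptions.
records = [
    {"title": "Paper A", "doi": "10.1101/0001", "date": "2023-04-02"},
    {"title": "Paper B", "doi": "10.1101/0002", "date": "2023-04-07"},
]
path = os.path.join(tempfile.mkdtemp(), "medrxiv_2023-04-08.jsonl")
with open(path, "w") as f:
    for rec in records:
        f.write(json.dumps(rec) + "\n")

# Load the dump (one JSON object per line) and check its date coverage.
df = pd.read_json(path, lines=True, convert_dates=False)
coverage = (df["date"].min(), df["date"].max())
```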
New Contributors
- @achouhan93 made their first contribution in #24
Full Changelog: v0.2.4...v0.2.5
v0.2.4
v0.2.4 - release summary
- Support for scraping PDFs
- Harmonize return types of scraper classes to `pd.DataFrame` rather than `List[Dict]`.
1. Scraping PDFs
v0.2.4 now supports downloading PDFs. The core function is `paperscraper.pdf.save_pdf`, which receives a dictionary with the key `doi` and downloads the PDF for the desired DOI. There's also a wrapper function, `paperscraper.pdf.save_pdf_from_dump`, that can be called with the filepath of a `.jsonl` file previously obtained in a metadata search; it downloads the PDFs for all papers in that file. Examples are given in the README.
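The dump-to-PDFs pattern described above can be sketched locally. The code below is an illustrative mock, not paperscraper's implementation: the real `save_pdf` downloads a PDF over the network, so here a stub stands in for it, and the loop mimics what a `save_pdf_from_dump`-style wrapper does with each line of the `.jsonl` file.

```python
import json
import os
import tempfile

def save_pdf_stub(paper_metadata, filepath):
    """Stand-in for paperscraper.pdf.save_pdf: the real function downloads
    the PDF for paper_metadata["doi"]; this stub just records the call."""
    with open(filepath, "w") as f:
        f.write(f"PDF for {paper_metadata['doi']}")

def save_pdfs_from_dump(dump_path, out_dir, saver=save_pdf_stub):
    """Mimic the save_pdf_from_dump pattern: one saver call per dump entry."""
    os.makedirs(out_dir, exist_ok=True)
    saved = []
    with open(dump_path) as f:
        for i, line in enumerate(f):
            meta = json.loads(line)
            out = os.path.join(out_dir, f"paper_{i}.pdf")
            saver({"doi": meta["doi"]}, out)
            saved.append(out)
    return saved

# Build a two-entry mock dump and "download" its PDFs.
tmp = tempfile.mkdtemp()
dump = os.path.join(tmp, "dump.jsonl")
with open(dump, "w") as f:
    f.write(json.dumps({"doi": "10.1101/0001"}) + "\n")
    f.write(json.dumps({"doi": "10.1101/0002"}) + "\n")
saved = save_pdfs_from_dump(dump, os.path.join(tmp, "pdfs"))
```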
Thanks to @daenuprobst for suggestions!
2. Return types
With this version, all scraper classes return their results as a pandas DataFrame (one paper per row) rather than a list of dictionaries (one paper per dict).
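The difference between the two return types can be shown with plain pandas; the paper records below are made up for illustration.

```python
import pandas as pd

# Pre-v0.2.4 style: a list of dictionaries, one paper per dict.
papers = [
    {"title": "Paper A", "doi": "10.1101/0001"},
    {"title": "Paper B", "doi": "10.1101/0002"},
]

# v0.2.4 style: the same records as a DataFrame, one paper per row.
df = pd.DataFrame(papers)
```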
Full Changelog: v0.2.3...v0.2.4