This repository has been archived by the owner on Mar 23, 2021. It is now read-only.

Retrieve and extract citations from Crossref data.

Prerequisites

  • Python 2 or 3 (Python 3 preferred)

Setup

pip install -r requirements.txt

Data Retrieval

Data is retrieved via Crossref's Works API (doc).

The download starts with the cursor *. The data/crossref-works.zip.meta file stores the next cursor to use, so the download can be resumed if it is interrupted for any reason (which is likely). The download currently takes at least 90 hours. It cannot be run in parallel because each cursor is only returned in the response to the previous request.
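The resumable, cursor-based download can be sketched in Python 3 as follows. This is a minimal illustration, not the repository's download_crossref_works.sh. It assumes that the meta file holds nothing but the next cursor string and that each raw response is stored as one zip entry. The function names, the page-NNNNNNNN.json entry naming and the fetch parameter are hypothetical. The cursor and rows query parameters, and the message.items and message.next-cursor response fields, are part of Crossref's deep-paging API.

```python
import json
import os
import urllib.parse
import urllib.request
import zipfile

# Crossref Works endpoint; "cursor" and "rows" are its deep-paging parameters.
API_URL = "https://api.crossref.org/works"


def fetch_page(cursor, rows=1000):
    """Fetch one raw page (bytes) of the Works API for the given cursor."""
    query = urllib.parse.urlencode({"cursor": cursor, "rows": rows})
    with urllib.request.urlopen(API_URL + "?" + query) as response:
        return response.read()


def read_cursor(meta_path):
    """Return the saved cursor (hypothetical meta format), or "*" to start over."""
    if os.path.exists(meta_path):
        with open(meta_path) as f:
            saved = f.read().strip()
            if saved:
                return saved
    return "*"


def download(zip_path, meta_path, fetch=fetch_page):
    """Append raw pages to zip_path, saving the next cursor after each page."""
    cursor = read_cursor(meta_path)
    while True:
        raw = fetch(cursor)
        message = json.loads(raw.decode("utf-8"))["message"]
        if not message["items"]:
            break  # an empty page marks the end of the result set
        with zipfile.ZipFile(zip_path, "a", compression=zipfile.ZIP_DEFLATED) as zf:
            # Name entries by the current entry count so resumed runs do not collide.
            zf.writestr("page-%08d.json" % len(zf.namelist()), raw)
        # Pages are strictly sequential: the next cursor is only known
        # once the previous response has arrived.
        cursor = message["next-cursor"]
        with open(meta_path, "w") as f:
            f.write(cursor)
```

Calling download("data/crossref-works.zip", "data/crossref-works.zip.meta") starts from * the first time and continues from the saved cursor afterwards. If the process dies between writing a page and saving its cursor, that page is fetched again on resume. A downstream step would need to tolerate the duplicate.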

To start or resume the download run:

./download_crossref_works.sh

The files data/crossref-works.zip and data/crossref-works.zip.meta will be created and updated as the download progresses. crossref-works.zip will contain files with the raw API responses.
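The sketch below shows how the raw responses could be read back out of the archive. It assumes that every zip entry holds exactly one JSON response of the Works API, with its records under message.items. The entry layout written by the actual download script is not documented here, and iter_works is a hypothetical helper.

```python
import json
import zipfile


def iter_works(zip_path):
    """Yield every work record stored in a zip of raw Works API responses."""
    with zipfile.ZipFile(zip_path) as zf:
        # Sorted names keep the pages in download order for page-N style names.
        for name in sorted(zf.namelist()):
            response = json.loads(zf.read(name).decode("utf-8"))
            # Each raw response wraps its records in message.items.
            for item in response["message"]["items"]:
                yield item
```

Because this is a generator, only one page has to be held in memory at a time. That matters for an archive covering the whole Works collection.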

Extract Citations

Run:

./extract_citations_from_crossref_works.sh

This creates data/crossref-works-citations.csv.gz, a gzip-compressed CSV file with the following columns:

  • citing_doi
  • cited_doi
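The following sketch shows one way to derive the citing_doi / cited_doi pairs from Crossref work records. It relies on the record's own DOI field and on its optional reference list, whose entries carry a DOI field only when one was deposited or matched. It is not the logic of extract_citations_from_crossref_works.sh. In particular, it does no DOI normalisation or de-duplication, and iter_citations and write_citations_csv are hypothetical names.

```python
import csv
import gzip


def iter_citations(works):
    """Yield (citing_doi, cited_doi) pairs from Crossref work records."""
    for work in works:
        citing = work.get("DOI")
        # "reference" is only present when references were deposited, and
        # only references that carry a DOI can become a citation pair.
        for ref in work.get("reference", []):
            cited = ref.get("DOI")
            if citing and cited:
                yield citing, cited


def write_citations_csv(works, out_path):
    """Write a citing_doi,cited_doi header and rows to a gzip-compressed CSV."""
    with gzip.open(out_path, "wt", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["citing_doi", "cited_doi"])
        for row in iter_citations(works):
            writer.writerow(row)
```

References given only as unstructured text (no DOI) are skipped, so the resulting file covers DOI-to-DOI citations only.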