This repository creates user-friendly (tidy) TSVs of data from Scopus and Journal Metrics and converts data to NLM journal IDs for PubMed integration. Data pulled from Scopus include journal subject areas and open access status. Data pulled from Journal Metrics include three measures of journal prestige (CiteScore, SJR, SNIP) and a Scopus–ISSN mapping.
Execution is performed by running notebooks in the following order:
1. process-titles.ipynb to process the raw Scopus title list
2. process-metrics.ipynb to process the raw Journal Metrics download
3. pubmed.ipynb to convert Scopus IDs to NLM journal IDs
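
The ordering above can be scripted. The following is a minimal sketch, assuming that Jupyter with nbconvert is available in the active environment and that the notebooks sit in the current working directory. The helper names build_command and run_all are hypothetical and are not part of this repository.

```python
import subprocess

# Notebook order as listed above.
NOTEBOOKS = ["process-titles.ipynb", "process-metrics.ipynb", "pubmed.ipynb"]


def build_command(notebook):
    """Return an nbconvert command that executes one notebook in place."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        "--inplace",
        notebook,
    ]


def run_all(notebooks=NOTEBOOKS):
    """Execute each notebook in order (hypothetical helper)."""
    for notebook in notebooks:
        # check=True raises CalledProcessError, so a failing notebook
        # stops the run before later notebooks are started.
        subprocess.run(build_command(notebook), check=True)
```

Calling run_all() would then execute the three notebooks sequentially. Stopping at the first failure matters here because each later notebook depends on outputs of the earlier ones.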
The data directory contains the following tidy outputs:
- titles.tsv: IDs and names for titles in Scopus
- title-attributes.tsv: attributes for Scopus titles such as open access status, publisher, and active status (excludes conference proceedings)
- publishers.tsv: number of journals per publisher as well as URL-friendly slugs. Redundant or misspelled publisher names can be manually fixed in publisher-name-patches.tsv.
- title-top-levels.tsv: top-level subject categories for each Scopus title
- asjc-codes.tsv: ASJC (All Science Journal Classification) code definitions
- subject-areas.tsv: ASJC subject areas for each Scopus title
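
For readers unfamiliar with URL-friendly slugs, the sketch below shows one common way to derive a slug from a publisher name. It is an illustration only: the function slugify is hypothetical, and the rule this repository actually uses to produce the slugs in publishers.tsv may differ.

```python
import re
import unicodedata


def slugify(name):
    """Turn a display name into a lowercase, hyphen-separated slug.

    Hypothetical helper; not necessarily the rule used by this repository.
    """
    # Decompose accented characters, then drop anything outside ASCII.
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    # Collapse every run of non-alphanumeric characters into one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", ascii_name.lower())
    # Remove hyphens left over at either end.
    return slug.strip("-")
```

With this rule, a made-up name such as "Example Press & Co." becomes "example-press-co".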
The data directory also contains the following mapping and metrics outputs:
- issn.tsv: a mapping between Scopus titles and ISSNs
- pubmed-map.tsv: a Scopus–NLM journal mapping
- metrics.tsv.gz: metrics for Scopus journals
- pubmed-metrics.tsv.gz: metrics for PubMed journals
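
The outputs listed above are tab-separated, and some are gzip-compressed. The following is a minimal sketch for loading either kind using only the Python standard library. The function read_tsv is hypothetical, and the sketch assumes each file is UTF-8 encoded with a header row; column names are taken from the file rather than assumed here.

```python
import csv
import gzip


def read_tsv(path):
    """Read a plain or gzip-compressed TSV into a list of row dictionaries.

    Hypothetical helper; assumes UTF-8 text with a header row.
    """
    # Pick the opener from the file extension.
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, mode="rt", encoding="utf-8", newline="") as handle:
        # DictReader uses the first row as the field names.
        return list(csv.DictReader(handle, delimiter="\t"))
```

For example, read_tsv("data/metrics.tsv.gz") would return one dictionary per row, keyed by whatever column names the file defines. All values are returned as strings, so numeric columns need explicit conversion.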
This repository is built from the following publicly available inputs in download:
- Scopus_Source_List.xlsx: Scopus title list
- CiteScore_Metrics_2011-2016_Download_21Jun2017.xlsx: Journal Metrics
- pubmed-journals.tsv: PubMed journal information
See download.sh, which downloads local copies of the inputs, for the source URLs.
This repository uses conda to manage its environment as specified in environment.yml.
Install the environment with:
conda env create --file=environment.yml
Then use source activate scopus and source deactivate to activate or deactivate the environment. On Windows, use activate scopus and deactivate instead.
All original work in this repository is dedicated to the public domain under CC0 1.0 Universal. Note that this repository incorporates publicly available datasets that were not explicitly released with a public license. The authors of this repository claim no ownership of this content.