
Research Software for "Open Science for Social Science and Humanities: Open Access availability and distribution across disciplines and Countries in OpenCitations Meta", a research project carried out in the context of the Open Science course, a.a. 2022/2023.


open-sci/2022-2023-playarists-code



2022-2023-Playarist Software

The repository of the Playarists team for the Open Science course, a.a. 2022/2023.

Source data

How to run the software

To reuse our program, first install the dependencies listed in requirements.txt:

pip install -r requirements.txt

Users can run the program by cloning the repository and launching it from a shell. The command to launch the program is:

python run_workflow.py --batch_size number_batch_size --max_workers number_max_workers --oc_meta path_to_OC_Meta_folder --erih_plus path_to_erih_plus.csv --doaj path_to_doaj.csv

All the parameters already have default values. However, users are strongly advised to adjust them to their system specifications (e.g. --batch_size 100 --max_workers 4) and to the names and locations of the downloaded datasets (--oc_meta, --erih_plus, --doaj). If no parameter needs to be changed, the command is simply:
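To make the mapping between these flags and Python values concrete, here is a minimal argparse sketch that accepts the same five options. It is not the repository's own parser: the default values below are hypothetical placeholders (the batch size and worker count reuse the example values above, the file names are invented), and the real defaults are whatever run_workflow.py defines.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the flags documented for run_workflow.py."""
    parser = argparse.ArgumentParser(description="Run the Playarists workflow (sketch).")
    # Placeholder defaults: NOT necessarily the values used by run_workflow.py.
    parser.add_argument("--batch_size", type=int, default=100,
                        help="batch size")
    parser.add_argument("--max_workers", type=int, default=4,
                        help="number of parallel workers")
    parser.add_argument("--oc_meta", default="oc_meta",
                        help="path to the OpenCitations Meta folder")
    parser.add_argument("--erih_plus", default="erih_plus.csv",
                        help="path to the ERIH PLUS CSV file")
    parser.add_argument("--doaj", default="doaj.csv",
                        help="path to the DOAJ CSV file")
    return parser
```

Because `type=int` is set on the two numeric flags, a value such as `--batch_size ten` is rejected with a usage error instead of being passed on as a string; any flag left out falls back to its default.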

python run_workflow.py

Naming convention of datasets

For the sake of clarity, we have used some predefined labels to identify the main entities of our research. They are introduced below, grouped by the resulting CSV file in which they appear.

SSH_Publications_in_OC_Meta_and_Open_Access_status.csv:

OC_omid, issn, EP_id, Publications_in_venue, Open_Access

Each publication venue is associated with its identifier [issn]. Items included in OpenCitations Meta and/or ERIH PLUS are also identified by their internal IDs, [OC_omid] and/or [EP_id] respectively. [Publications_in_venue] is the number of publications in each venue, while [Open_Access] states whether the venue's Open Access status is True or Unknown.
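As an illustration of how this file can be queried, the sketch below counts venues and publications for each Open_Access value using only the standard library. It assumes a comma-separated file with the header listed above and one row per venue; the function name `summarize_open_access` is hypothetical and not part of the software.

```python
import csv
from collections import Counter


def summarize_open_access(lines):
    """Count venues and publications for each Open_Access value ("True" or "Unknown").

    `lines` is any iterable of CSV text lines, e.g. an open file object.
    Assumes one row per venue and comma-separated values.
    """
    venues = Counter()
    publications = Counter()
    for row in csv.DictReader(lines):
        status = row["Open_Access"]
        venues[status] += 1
        # Counts are read as text; float() also accepts values such as "12.0".
        publications[status] += int(float(row["Publications_in_venue"] or 0))
    return {"venues": dict(venues), "publications": dict(publications)}
```

To use it on the real output, pass an open file handle, e.g. `summarize_open_access(open("SSH_Publications_in_OC_Meta_and_Open_Access_status.csv", newline="", encoding="utf-8"))`.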

SSH_Publications_by_Discipline.csv:

Discipline, Journal_count, Publication_count

Each discipline is identified by its label [Discipline] and associated with the number of journals and publications referring to it, [Journal_count] and [Publication_count] respectively.
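A typical question for the per-discipline table is which disciplines account for the most publications. The sketch below ranks them, assuming a comma-separated file with the three columns above; `top_disciplines` is a hypothetical helper, not a function of the software.

```python
import csv


def top_disciplines(lines, n=5):
    """Return the n (Discipline, Publication_count, Journal_count) tuples with most publications."""
    rows = [
        (r["Discipline"], int(float(r["Publication_count"])), int(float(r["Journal_count"])))
        for r in csv.DictReader(lines)
    ]
    # Highest publication count first; ties are broken alphabetically by discipline.
    rows.sort(key=lambda t: (-t[1], t[0]))
    return rows[:n]
```

Sorting on the tuple `(-count, name)` keeps the ranking deterministic when two disciplines have the same number of publications.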

SSH_Publications_by_Country.csv:

Country, Journal_count, Publication_count

Each country of publication is associated with its name [Country] and the number of journals and publications published there, [Journal_count] and [Publication_count] respectively.
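Countries with many journals naturally accumulate more publications, so raw counts are hard to compare. One simple normalization, which is our illustration and not a metric computed by the software, is the average number of publications per journal. The sketch assumes a comma-separated file with the three columns above; `publications_per_journal` is a hypothetical name.

```python
import csv


def publications_per_journal(lines):
    """Map each Country to Publication_count / Journal_count, highest ratio first."""
    ratios = {}
    for r in csv.DictReader(lines):
        journals = int(float(r["Journal_count"]))
        if journals == 0:
            continue  # skip rows with no journals to avoid division by zero
        ratios[r["Country"]] = int(float(r["Publication_count"])) / journals
    # dict preserves insertion order, so the result is ordered by ratio.
    return dict(sorted(ratios.items(), key=lambda kv: kv[1], reverse=True))
```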

Running the software also produces the following CSV files:

  • meta_coverage_us: The dataset SSH_Publications_in_OC_Meta_and_Open_Access_status.csv filtered to contain only US Journals

  • meta_coverage_eu: The dataset SSH_Publications_in_OC_Meta_and_Open_Access_status.csv filtered to contain only EU Journals

  • us_data: For US Journals covered in OpenCitations Meta, contains the columns:

EP_id, Publications_in_venue, Original Title, Country of Publication, ERIH PLUS Disciplines, disc_count

The correlation between Publications_in_venue and the number of disciplines (disc_count) is visualized in the scatterplot "scatter_correlation_ndisc_npub_US".

  • eu_data: For UK Journals covered in OpenCitations Meta, contains the columns:

EP_id, Publications_in_venue, Original Title, Country of Publication, ERIH PLUS Disciplines, disc_count

The correlation between Publications_in_venue and the number of disciplines (disc_count) is visualized in the scatterplot "scatter_correlation_ndisc_npub_UK".

  • us_disciplines_count: contains the count of US Journals and their Publications for each Discipline
Discipline, Journal_count, Publication_count
  • eu_disciplines_count: contains the count of UK Journals and their Publications for each Discipline
Discipline, Journal_count, Publication_count

Extra
