# Report: Library coverage of SJR titles

#### Description
This notebook compares one or more lists of journal titles from __[Scimago Journal Ranking reports (SJR)](https://www.scimagojr.com/)__ against your library's journal holdings in Primo. It automatically generates a spreadsheet showing the full-text availability of every title in the report.

#### Dependencies & requirements
This notebook queries the Alma link resolver, so it will only work for libraries that use Alma with Primo.

To run this notebook, you will need:
* Python 3
* Jupyter Notebooks
* The open source packages that are loaded below

#### Notes & disclaimer
This code may not be perfect, so it is worth double-checking the results. The quality of the underlying metadata can also introduce errors:
* The ISSNs provided by the SJR report may not match up with the ISSNs in your MARC records
* The coverage availability statements are pulled from the link resolver, so they are only as good as your electronic records in Alma

I always welcome collaboration. If this work can be improved, please reach out!

#### Author
Created by Roger Reka and last updated 9 February 2023.

***

# Setup

### Base URL
The first thing you need to do to set up this notebook is to identify the base URL for your Alma link resolver. It should look something like this: `https://ca01.alma.exlibrisgroup.com/view/uresolver/01UTON_UW/openurl?`.

Once you have it, go to the `config.py` file and enter it in the `base_URL` field.

You only have to do this once.
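As a rough illustration, the relevant part of `config.py` might look like the sketch below. The URL is the example from above, not a value to reuse, and the sanity check is an optional addition that is not part of the original file.

```python
# config.py (sketch): the helper functions are assumed to read this value.
# Replace the example below with the link resolver URL for your own institution.
base_URL = "https://ca01.alma.exlibrisgroup.com/view/uresolver/01UTON_UW/openurl?"

# Optional sanity check (an addition, not in the original file): the URL should
# end with "openurl?" so that query parameters can be appended directly.
assert base_URL.endswith("openurl?"), "base_URL should end with 'openurl?'"
```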

### Install the required packages
This notebook uses two open source Python packages that you will need to install into your environment if you don't have them already:

* `pandas`
* `requests`

The other modules it imports (`xml.etree.ElementTree`, `re`, `glob`, and `datetime`) are part of the Python standard library and need no installation.

You only have to do this once (aside from updates).
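One quick way to confirm that the environment is ready is to check whether each module imported by this notebook can be located. The sketch below is an optional helper, not part of the original notebook; `find_missing` is a hypothetical name.

```python
import importlib.util

# Modules imported by this notebook. Only pandas and requests are third-party;
# the rest ship with Python.
modules = ["pandas", "requests", "xml.etree.ElementTree", "re", "glob", "datetime"]

def find_missing(names):
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised when a parent package of a dotted name does not exist.
            found = False
        if not found:
            missing.append(name)
    return missing

missing = find_missing(modules)
print("Missing modules:", missing if missing else "none")
```

Any name reported as missing would need to be installed before running the cells below.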

***

## Analysis
This is the start of the actual report analysis. 

### Grab your SJR files
Go to the __[Scimago Journal Ranking reports (SJR)](https://www.scimagojr.com/journalrank.php)__ webpage and download the ranking lists that you are interested in.

Place these files in the `/data` folder.

### Run the code
From this section onwards, you can run all the cells below until the report is generated.

In [14]:
# Load the required Python packages. Note: you will have to install these first if you have not already done so.

import pandas as pd
import requests
import xml.etree.ElementTree as ElementTree
import re
import glob
import datetime

In [2]:
# Load the functions from the associated Python file

import autoCollectionsFunctions as cf

In [3]:
# Find all the CSV files in the directory

files = glob.glob('data/*.csv')
files

['data\\scimagojr 2021  Subject Category - Inorganic Chemistry.csv',
 'data\\scimagojr 2021  Subject Category - Organic Chemistry.csv']

### Prepare the data
This section will prepare the data for querying by combining all the data together into one dataframe, and identifying one ISSN for use in querying the link resolver.

In [4]:
# Create an empty dataframe
df_journals = pd.DataFrame()

In [5]:
# For every CSV file, read the named columns (first 'nrows' rows only) and append them to the df_journals dataframe.
# Edit the 'nrows' value to select how many rows from each file should be included. Default is the first 50.

for file in files:
    df_temp = pd.read_csv(file, sep=';', usecols=['Rank', 'Title', 'Type', 'Issn', 'SJR'], nrows=50)
    # Also, add the name of the file to each row
    df_temp['Category'] = file
    df_journals = pd.concat([df_journals, df_temp], sort=False)

In [6]:
# For rows with multiple ISSNs (separated by commas), keep only the first ISSN

df_journals['q_issn'] = df_journals['Issn'].str.split(',').str[0]
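Since the whole report depends on a single ISSN per title, it can help to confirm that the value kept is well formed. The sketch below is an optional addition, not part of the notebook: `first_issn` and `is_valid_issn` are hypothetical helpers that repeat the "keep the first ISSN" step in plain Python and then verify the standard ISSN check digit.

```python
import re

def first_issn(raw):
    """Return the first ISSN from a comma-separated string, without hyphens or spaces."""
    first = str(raw).split(",")[0]
    return re.sub(r"[\s-]", "", first).upper()

def is_valid_issn(issn):
    """Check the ISSN check digit (8 characters; the last may be 'X')."""
    if not re.fullmatch(r"\d{7}[\dX]", issn):
        return False
    # Weights 8 down to 2 are applied to the first seven digits.
    total = sum(int(d) * w for d, w in zip(issn[:7], range(8, 1, -1)))
    check = (11 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return issn[7] == expected

print(first_issn("00108545, 18733840"), is_valid_issn("00108545"))
```

In the notebook, something like `df_journals['q_issn'].map(is_valid_issn)` could flag rows worth checking by hand. A valid check digit does not guarantee that the ISSN matches the one in your MARC records.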

In [7]:
# Select only the columns we need (remove the original ISSN column)

df_journals = df_journals[['Rank', 'Title', 'Type', 'q_issn', 'SJR', 'Category']]

In [8]:
df_journals

Unnamed: 0,Rank,Title,Type,q_issn,SJR,Category
0,1,Coordination Chemistry Reviews,journal,00108545,4442,data\scimagojr 2021 Subject Category - Inorga...
1,2,ACS Macro Letters,journal,21611653,1705,data\scimagojr 2021 Subject Category - Inorga...
2,3,Macromolecules,journal,00249297,1504,data\scimagojr 2021 Subject Category - Inorga...
3,4,Ultrasonics Sonochemistry,journal,13504177,1414,data\scimagojr 2021 Subject Category - Inorga...
4,5,Inorganic Chemistry Frontiers,journal,20521545,1316,data\scimagojr 2021 Subject Category - Inorga...
...,...,...,...,...,...,...
45,46,European Journal of Organic Chemistry,journal,10990690,0738,data\scimagojr 2021 Subject Category - Organi...
46,47,ChemMedChem,journal,18607187,0735,data\scimagojr 2021 Subject Category - Organi...
47,48,Journal of Flow Chemistry,journal,20630212,0735,data\scimagojr 2021 Subject Category - Organi...
48,49,Bioorganic Chemistry,journal,00452068,0728,data\scimagojr 2021 Subject Category - Organi...


### Query the link resolver
This section checks every row of the dataframe above by querying its ISSN against the Alma link resolver. The link resolver returns its response as structured XML, from which the code parses out the relevant coverage data.
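The helper in `autoCollectionsFunctions` is not shown here, so the following is only a minimal sketch of the general idea: build a query URL from the base URL and an ISSN, then pull package names and coverage statements out of an XML response. Several things are assumptions: the `svc_dat=CTO` parameter (assumed here to request an XML response), the function names, and above all the sample XML, which is a simplified made-up structure and not the real Alma response schema. No network request is made.

```python
import xml.etree.ElementTree as ElementTree
from urllib.parse import urlencode

# Example base URL from the Setup section; use your own institution's value.
BASE_URL = "https://ca01.alma.exlibrisgroup.com/view/uresolver/01UTON_UW/openurl?"

def build_query_url(issn, base_url=BASE_URL):
    """Append OpenURL query parameters for one ISSN (parameter choice is an assumption)."""
    params = {"svc_dat": "CTO", "rft.issn": issn}
    return base_url + urlencode(params)

# Made-up, simplified response used only to illustrate the parsing step.
SAMPLE_XML = """
<services>
  <service type="getFullTxt">
    <package>Example Package A</package>
    <coverage>Available from 1995</coverage>
  </service>
  <service type="getFullTxt">
    <package>Example Package B</package>
    <coverage>Available from 2001 until 2015</coverage>
  </service>
</services>
"""

def parse_coverage(xml_text):
    """Return {package name: coverage statement} for full-text services."""
    root = ElementTree.fromstring(xml_text)
    coverage = {}
    for service in root.findall("service"):
        if service.get("type") == "getFullTxt":
            coverage[service.findtext("package")] = service.findtext("coverage")
    return coverage

print(build_query_url("00108545"))
print(parse_coverage(SAMPLE_XML))
```

The dictionary shape (package name mapped to a coverage statement) mirrors what appears in the `coverage` column further down. A real response would need the element names and namespaces of the actual resolver output.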

In [9]:
# Search the OpenURL link resolver to find the coverage for these journals

df_journals[['availability', 'coverage']] = df_journals.apply(cf.searchOpenURL, axis=1)

Full-text available for 00108545
Full-text available for 21611653
Full-text available for 00249297
Full-text available for 13504177
Full-text available for 20521545
Full-text available for 18673880
Full-text available for 20532733
Full-text available for 14220067
Full-text available for 00201669
Full-text available for 15206041
Full-text available for 15653633
Full-text available for 02603594
Full-text available for 14779226
Full-text available for 15590720
Full-text available for 25901478
Full-text available for 02682575
Full-text available for 09498257
Full-text available for 0946672X
Full-text available for 0018019X
Full-text available for 09253467
Full-text available for 02682605
Full-text available for 1095726X
Full-text available for 01620134
Full-text available for 10990682
Full-text available for 0030770X
Full-text available for 23046740
Full-text not available for 03010074
Full-text available for 00222860
Full-text available for 13877003
Full-text available for 20734352
Full-t

In [10]:
# Update the availability statements based on the coverage dates (e.g. an embargo, or coverage that does not run to the present)

df_journals[['availability']] = df_journals.apply(cf.coverageStatement_availParser, axis=1)
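To give a sense of what this step could involve, here is a hedged sketch of one way to turn coverage statements into the three availability labels seen in the summary below. It is not the code in `coverageStatement_availParser`: the function name `classify_availability`, the assumed wording of the statements ("until" for an end date, "not available" for an embargo), and the rule itself are all illustrative assumptions.

```python
def classify_availability(coverage):
    """Classify a {package: coverage statement} dict into one availability label."""
    if not coverage:
        return "No full-text available"
    for statement in coverage.values():
        text = statement.lower()
        has_end_date = "until" in text        # coverage stops before the present
        has_embargo = "not available" in text  # most recent period is withheld
        if not has_end_date and not has_embargo:
            # At least one package runs to the present with no embargo.
            return "Full-text available to present"
    return "Full-text available, but not complete"

examples = [
    {},
    {"Example Package A": "Available from 1995"},
    {"Example Package B": "Available from 2001 until 2015"},
    {"Example Package C": "Available from 1990 Most recent 1 year(s) not available"},
]
for coverage in examples:
    print(classify_availability(coverage))
```

Under this rule, a title counts as available to the present as soon as any one package has open-ended, unembargoed coverage, even if other packages are incomplete.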

In [11]:
df_journals

Unnamed: 0,Rank,Title,Type,q_issn,SJR,Category,availability,coverage
0,1,Coordination Chemistry Reviews,journal,00108545,4442,data\scimagojr 2021 Subject Category - Inorga...,Full-text available to present,{'CRKN Elsevier Additional Journals': 'Availab...
1,2,ACS Macro Letters,journal,21611653,1705,data\scimagojr 2021 Subject Category - Inorga...,Full-text available to present,{'CRKN American Chemical Society Journals': 'A...
2,3,Macromolecules,journal,00249297,1504,data\scimagojr 2021 Subject Category - Inorga...,Full-text available to present,{'CRKN American Chemical Society Journals': 'A...
3,4,Ultrasonics Sonochemistry,journal,13504177,1414,data\scimagojr 2021 Subject Category - Inorga...,Full-text available to present,{'Elsevier SD Freedom Collection Journals [SCF...
4,5,Inorganic Chemistry Frontiers,journal,20521545,1316,data\scimagojr 2021 Subject Category - Inorga...,Full-text available to present,{'CRKN Royal Society of Chemistry Journals Gol...
...,...,...,...,...,...,...,...,...
45,46,European Journal of Organic Chemistry,journal,10990690,0738,data\scimagojr 2021 Subject Category - Organi...,Full-text available to present,{'CRKN Wiley Online Library': 'Available from ...
46,47,ChemMedChem,journal,18607187,0735,data\scimagojr 2021 Subject Category - Organi...,Full-text available to present,{'CRKN Wiley Online Library': 'Available from ...
47,48,Journal of Flow Chemistry,journal,20630212,0735,data\scimagojr 2021 Subject Category - Organi...,Full-text available to present,{'Canadian Research Knowledge Network Springer...
48,49,Bioorganic Chemistry,journal,00452068,0728,data\scimagojr 2021 Subject Category - Organi...,Full-text available to present,{'CRKN Elsevier Academic Press Journals': 'Ava...


In [12]:
# Summary

df_journals.availability.value_counts()

Full-text available to present           92
No full-text available                    6
Full-text available, but not complete     2
Name: availability, dtype: int64
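The same summary can be expressed as percentages, which is often easier to quote in a collections report. The sketch below rebuilds a small series from the counts shown above so that it runs on its own. In the notebook, the same calls could be applied to `df_journals.availability` directly.

```python
import pandas as pd

# Stand-in for df_journals.availability, rebuilt from the counts above.
availability = pd.Series(
    ["Full-text available to present"] * 92
    + ["No full-text available"] * 6
    + ["Full-text available, but not complete"] * 2
)

# normalize=True returns proportions instead of raw counts.
shares = availability.value_counts(normalize=True).mul(100).round(1)
print(shares)
```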

In [25]:
# Export the results into a CSV file

df_journals.to_csv('results/SJR_rankings_report_{}.csv'.format(datetime.datetime.now().strftime("%Y_%m_%d_%H%M%S")))
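Note that `to_csv` raises an error if the `results` folder does not exist yet. A small helper along the following lines, which is an optional addition and uses the hypothetical name `build_report_path`, would create the folder first and return the same kind of timestamped file name.

```python
from datetime import datetime
from pathlib import Path

def build_report_path(folder="results", now=None):
    """Create the output folder if needed and return a timestamped CSV path."""
    now = now or datetime.now()
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    file_name = "SJR_rankings_report_{}.csv".format(now.strftime("%Y_%m_%d_%H%M%S"))
    return out_dir / file_name
```

The export cell could then be written as `df_journals.to_csv(build_report_path())`.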