# Scopus 2017 - CiteScore, SNIP and SJR

In [Scopus](https://www.scopus.com/sources),
we can download a single spreadsheet workbook
with all the data they have (titles and metrics)
regarding their free journal rankings and metrics,
provided we're signed in.
As of 2018-09-21, it's a 38MB XLSX file
with one spreadsheet of metrics for each year.

In [1]:
```python
import openpyxl
import pandas as pd
import seaborn as sns
pd.options.display.max_colwidth = 200 # Default is 50
pd.options.display.max_rows = 200 # Default is 60
%matplotlib inline
```

## Opening the Excel File in Pandas

Pandas has a `read_excel` function that can use `xlrd`
to read a spreadsheet from an old XLS file,
loading its data into a Pandas DataFrame.
However, we're not going to use it.
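For comparison, here is roughly what the `read_excel` route could look like.
It's a minimal sketch, assuming a recent pandas version,
where XLSX files are read through openpyxl
(so that library is needed either way).
The workbook is a tiny made-up stand-in built in memory,
laid out like the Scopus sheets: an informational first row,
then the header row, then the data.
`header=1` tells pandas to take the column names from the second row.

```python
from io import BytesIO

import openpyxl
import pandas as pd

# Made-up stand-in for one Scopus sheet: an info row, a header row, then data.
wb = openpyxl.Workbook()
ws = wb.active
ws.title = "2017 All"
ws.append(["Info line (placeholder)"])
ws.append(["Scopus SourceID", "Title", "CiteScore"])
ws.append([1, "Journal A", 10.5])
ws.append([2, "Journal B", 3.25])

buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

# header=1: use the second row (0-based index 1) as column names,
# which skips the informational first row.
df = pd.read_excel(buffer, sheet_name="2017 All", header=1, engine="openpyxl")
print(df)
```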

To open an Office Open XML (OOXML) spreadsheet workbook from Microsoft Excel
(a.k.a. XLSX) in Python, we'll need another library.
[There's a web page](http://www.python-excel.org)
listing the packages created to deal with MS Excel files,
and it recommends the
[openpyxl](https://openpyxl.readthedocs.io)
library for XLSX files like the one we've got.
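A minimal openpyxl sketch, using a made-up two-sheet workbook
saved to an in-memory buffer instead of the real download.
`load_workbook` also accepts `read_only=True`,
a mode openpyxl provides for large files such as this 38MB one;
a read-only workbook keeps its source open until `close()` is called.
Whether read-only mode actually pays off for this file is an assumption to check.

```python
from io import BytesIO

import openpyxl

# Hypothetical stand-in for the Scopus download: two yearly sheets, made-up contents.
wb = openpyxl.Workbook()
wb.active.title = "2017 All"
wb.create_sheet("2016 All")
wb["2017 All"].append(["Scopus SourceID", "Title"])
wb["2017 All"].append([1, "Journal A"])

buffer = BytesIO()
wb.save(buffer)
buffer.seek(0)

# Stream the cells instead of loading the whole cell tree into memory.
loaded = openpyxl.load_workbook(buffer, read_only=True)
sheet_names = loaded.sheetnames
rows = [tuple(row) for row in loaded["2017 All"].iter_rows(values_only=True)]
loaded.close()  # release the underlying file/buffer

print(sheet_names)  # ['2017 All', '2016 All']
print(rows)         # [('Scopus SourceID', 'Title'), (1, 'Journal A')]
```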

Which spreadsheets are in the Scopus spreadsheet workbook?

In [2]:
```python
wb = openpyxl.load_workbook("CiteScore_Metrics_2011-2017_Download_25May2018.xlsx")
wb.sheetnames
```

```
['About CiteScore',
 '2017 All',
 'Sheet1',
 '2016 All',
 '2015 All',
 '2014 All',
 '2013 All',
 '2012 All',
 '2011 All',
 'ASJC Codes']
```
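Every yearly worksheet in the list above is named `<year> All`,
next to three other sheets (`About CiteScore`, `Sheet1` and `ASJC Codes`).
Assuming that naming pattern holds,
a small plain-Python sketch can pick out the yearly sheets by name:

```python
import re

# Sheet names copied from the output above.
sheetnames = ["About CiteScore", "2017 All", "Sheet1", "2016 All", "2015 All",
              "2014 All", "2013 All", "2012 All", "2011 All", "ASJC Codes"]

# Assumption: yearly metrics sheets are exactly "<4-digit year> All".
year_sheet = re.compile(r"^(\d{4}) All$")

sheets_by_year = {}
for name in sheetnames:
    match = year_sheet.match(name)
    if match:
        sheets_by_year[int(match.group(1))] = name

print(sorted(sheets_by_year))  # [2011, 2012, 2013, 2014, 2015, 2016, 2017]
```

With the loaded workbook, `wb[sheets_by_year[2017]]` would be the same worksheet as `wb["2017 All"]`.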

For now, we're mainly interested in the 2017 worksheet (`2017 All`).
Let's see it.

In [3]:
```python
ws2017 = wb["2017 All"]
```

The openpyxl [documentation](https://openpyxl.readthedocs.io/en/2.6/pandas.html)
explains how to convert such a worksheet object
to a Pandas DataFrame instance (as well as the other way around).
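As a sketch of the other direction (DataFrame to worksheet),
the rows can be appended one by one with `Worksheet.append`.
The linked documentation describes a helper for this,
`openpyxl.utils.dataframe.dataframe_to_rows`;
the plain loop below is a substitute that avoids relying on it,
and the data is made up.

```python
import openpyxl
import pandas as pd

# Made-up data, not taken from the Scopus file.
df = pd.DataFrame({"Title": ["Journal A", "Journal B"], "CiteScore": [10.5, 3.25]})

# DataFrame -> worksheet: the header row first, then one row per record.
wb = openpyxl.Workbook()
ws = wb.active
ws.append(list(df.columns))
for record in df.itertuples(index=False):
    ws.append(list(record))

# Worksheet -> DataFrame, using the same generator pattern as this notebook.
rows = ws.values
header = next(rows)
round_trip = pd.DataFrame(list(rows), columns=header)
print(round_trip)
```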

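In [4]:
```python
data_gen = ws2017.values  # generator yielding one tuple of cell values per row
info = next(data_gen)     # 1st row: a note on when the metrics were calculated
header, *data = data_gen  # 2nd row: column names; remaining rows: the data
scopus2017 = pd.DataFrame(data, columns=header).dropna(how="all")  # drop fully empty rows
```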
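The same steps can be wrapped in a function and tried on a toy worksheet.
`worksheet_to_dataframe` is a hypothetical helper, not part of openpyxl or pandas,
and it assumes the layout seen here:
a free-text first row, the header in the second row,
and possibly some fully empty rows, which `dropna(how="all")` removes.

```python
import openpyxl
import pandas as pd


def worksheet_to_dataframe(ws):
    """Hypothetical helper: the notebook's conversion steps in one function.

    Assumes row 1 holds a free-text info line and row 2 the column names.
    """
    rows = ws.values          # generator of tuples, one per row
    info = next(rows)         # first row: informational line
    header, *data = rows      # second row: header; the rest: data
    df = pd.DataFrame(data, columns=header).dropna(how="all")
    return info[0], df


# Toy worksheet with made-up values and one fully blank row in the middle.
wb = openpyxl.Workbook()
ws = wb.active
ws.append(["Info line (placeholder)"])
ws.append(["Title", "CiteScore"])
ws.append(["Journal A", 10.5])
ws.append([None, None])
ws.append(["Journal B", 3.25])

info, df = worksheet_to_dataframe(ws)
print(info)      # Info line (placeholder)
print(df.shape)  # (2, 2)
```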

In [5]:
```python
print(info[0])
print(scopus2017.shape)
scopus2017.head().T
```

```
CiteScore metrics calculated using data from 30 April, 2018. SNIP and SJR calculated using data from 30 April, 2018
(50182, 21)
```


| | 0 | 1 | 2 | 3 | 4 |
|---|---|---|---|---|---|
| Scopus SourceID | 28773 | 28773 | 19434 | 19434 | 19434 |
| Title | Ca-A Cancer Journal for Clinicians | Ca-A Cancer Journal for Clinicians | MMWR. Recommendations and reports : Morbidity and mortality weekly report. Recommendations and reports / Centers for Disease Control | MMWR. Recommendations and reports : Morbidity and mortality weekly report. Recommendations and reports / Centers for Disease Control | MMWR. Recommendations and reports : Morbidity and mortality weekly report. Recommendations and reports / Centers for Disease Control |
| CiteScore | 130.47 | 130.47 | 63.12 | 63.12 | 63.12 |
| Percentile | 99 | 99 | 99 | 99 | 99 |
| Citation Count | 16961 | 16961 | 1010 | 1010 | 1010 |
| Scholarly Output | 130 | 130 | 16 | 16 | 16 |
| Percent Cited | 70 | 70 | 100 | 100 | 100 |
| SNIP | 88.164 | 88.164 | 32.534 | 32.534 | 32.534 |
| SJR | 61.786 | 61.786 | 34.638 | 34.638 | 34.638 |
| RANK | 1 | 1 | 1 | 1 | 1 |


The first five entries refer to just two journals.
This duplication makes it clear we'll need some cleaning
before we can use this data.
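Nothing in the fields shown above tells the repeated rows apart,
so a first step could be measuring the duplication:
how many rows each `Scopus SourceID` contributes
and how many distinct journals there are.
The sketch below uses the first five rows from the table,
reduced to three columns with shortened titles.
The `drop_duplicates` step assumes the copies of a journal agree on every column kept;
if some hidden column differs between them, this would silently keep only its first value.

```python
import pandas as pd

# First five rows from the table above, reduced to three columns (titles shortened).
scopus = pd.DataFrame({
    "Scopus SourceID": [28773, 28773, 19434, 19434, 19434],
    "Title": ["Ca-A Cancer Journal for Clinicians"] * 2
             + ["MMWR. Recommendations and reports"] * 3,
    "CiteScore": [130.47, 130.47, 63.12, 63.12, 63.12],
})

# How many rows does each journal contribute?
copies = scopus["Scopus SourceID"].value_counts()
print(copies)

# How many distinct journals are there?
print(scopus["Scopus SourceID"].nunique())  # 2

# One row per journal, keeping the first occurrence (see the assumption above).
journals = scopus.drop_duplicates(subset="Scopus SourceID")
print(journals)
```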