
## Which journals have published articles on reproducible computational research (RCR) and metadata?
We want to rank journals by their prior interest in reproducible research and metadata. Because PubMed results are paginated and the Python packages `eutils` and `entrezpy` fell short for this task, I ran the searches on Lens.org instead and exported all columns except `abstract`, using the following search terms:

` reproducible research OR ( reproducibility OR reproducible computational research ) `

` metadata `

Export appears limited to 50k records.

In [1]:
import pandas as pd
pd.set_option('display.max_rows', None)

In [4]:
metadata=pd.read_csv('../data/lens/metadata.lens.csv.gz', compression='gzip')
rcr=pd.read_csv('../data/lens/rcr.lens.csv.gz', compression='gzip')

In [5]:
# pandas is kind of awkward compared to dplyr; count Lens IDs per journal
# https://nbviewer.jupyter.org/gist/TomAugspurger/6e052140eaa5fdb6e8c0
# Note: "count" tallies non-null Lens IDs, which equals the number of unique IDs
# only if the export contains no duplicate Lens IDs.
meta_jrnls = metadata.groupby(['Source Title']).agg({"Lens ID": "count"}).rename(columns={"Lens ID": "metadata_cnt"})
rcr_jrnls = rcr.groupby(['Source Title']).agg({"Lens ID": "count"}).rename(columns={"Lens ID": "rcr_cnt"})
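
The comment above speaks of unique Lens IDs, while the aggregation uses `count`. As a minimal sketch of the difference, assuming a made-up toy frame in place of the real export (the IDs and journal names below are hypothetical):

```python
import pandas as pd

# Toy stand-in for a Lens export; all values are invented for illustration.
toy = pd.DataFrame({
    "Lens ID": ["a1", "a2", "a2", "b1", None],
    "Source Title": ["J1", "J1", "J1", "J2", "J2"],
})

by_journal = toy.groupby("Source Title")["Lens ID"]
counts = by_journal.count()     # non-null rows per journal, duplicates included
uniques = by_journal.nunique()  # distinct non-null IDs per journal

print(pd.DataFrame({"count": counts, "nunique": uniques}))
```

If the two columns agree on the real data, the export has no duplicate IDs and either aggregation gives the same journal counts.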


## Top journals for metadata in Lens
Let's rank journals by article count and min-max scale the ranks to [0, 1], so that the metadata and RCR results are roughly comparable despite their different sizes.

In [6]:
from sklearn.preprocessing import MinMaxScaler

# Rank journals by article count (ties share the average rank),
# then min-max scale the ranks to the range [0, 1]
meta_jrnls['meta_rank'] = meta_jrnls['metadata_cnt'].rank()
scaler = MinMaxScaler()
meta_jrnls['meta_scaled'] = scaler.fit_transform(meta_jrnls['meta_rank'].values.reshape(-1, 1))
meta_jrnls.sort_values("metadata_cnt", ascending=False).head(n=100)

Source Title,metadata_cnt,meta_rank,meta_scaled
Lecture Notes in Computer Science,867,10619.0,1.0
Scientific Data,284,10618.0,0.999861
"International Journal of Metadata, Semantics and Ontologies",253,10617.0,0.999722
Communications in Computer and Information Science,245,10616.0,0.999583
Metadata and Semantic Research,222,10615.0,0.999444
Procedia Computer Science,213,10614.0,0.999305
Journal of Library Metadata,198,10613.0,0.999166
Research and Advanced Technology for Digital Libraries,195,10612.0,0.999028
Cataloging & Classification Quarterly,182,10611.0,0.998889
D-lib Magazine,124,10609.5,0.99868
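
To make the scaled rank concrete, here is a minimal sketch on invented counts (the journal labels `A` to `F` are hypothetical). It uses plain pandas arithmetic, which should match `MinMaxScaler` with its default `feature_range` of (0, 1):

```python
import pandas as pd

# Invented article counts per journal, for illustration only.
counts = pd.Series({"A": 10, "B": 3, "C": 3, "D": 1, "E": 1, "F": 1}, name="cnt")

# Series.rank() defaults to method="average", so tied counts share the mean rank.
rank = counts.rank()

# Min-max scaling: (x - min) / (max - min).
scaled = (rank - rank.min()) / (rank.max() - rank.min())

print(pd.DataFrame({"cnt": counts, "rank": rank, "scaled": scaled}))
```

Because many journals tie at a count of 1, the lowest rank is the average over that whole tied group rather than 1. That is why the real tables show fractional ranks such as 10609.5.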


## Top journals for reproducible research in Lens

In [7]:
# Same procedure for the RCR counts; the scaler is refit on the RCR ranks
rcr_jrnls['rcr_rank'] = rcr_jrnls['rcr_cnt'].rank()
rcr_jrnls['rcr_scaled'] = scaler.fit_transform(rcr_jrnls['rcr_rank'].values.reshape(-1, 1))
rcr_jrnls.sort_values("rcr_cnt", ascending=False).head(n=100)

Source Title,rcr_cnt,rcr_rank,rcr_scaled
Genome Biology,130,565.0,1.0
Frontiers in Neuroinformatics,27,564.0,0.997155
PLOS ONE,26,563.0,0.99431
BMC Bioinformatics,24,562.0,0.991465
F1000Research,19,561.0,0.98862
Computing in Science and Engineering,16,560.0,0.985775
Bioinformatics,15,559.0,0.98293
PeerJ,14,558.0,0.980085
GigaScience,11,557.0,0.97724
Biostatistics,10,556.0,0.974395


In [8]:
# Inner join: keep only journals that appear in both result sets
jrnls = meta_jrnls.merge(rcr_jrnls, how='inner', on='Source Title')

In [9]:
jrnls['total'] = jrnls['rcr_scaled'] + jrnls['meta_scaled']

In [10]:
jrnls.sort_values("total",ascending=False).head(n=100)

Source Title,metadata_cnt,meta_rank,meta_scaled,rcr_cnt,rcr_rank,rcr_scaled,total
Frontiers in Neuroinformatics,56,10579.0,0.994443,27,564.0,0.997155,1.991598
BMC Bioinformatics,86,10596.5,0.996874,24,562.0,0.991465,1.988339
PLOS ONE,46,10561.5,0.992012,26,563.0,0.99431,1.986322
F1000Research,41,10549.5,0.990345,19,561.0,0.98862,1.978965
Bioinformatics,50,10570.5,0.993262,15,559.0,0.98293,1.976192
PeerJ,27,10506.0,0.984301,14,558.0,0.980085,1.964387
Genome Biology,14,10341.5,0.961448,130,565.0,1.0,1.961448
GigaScience,20,10446.5,0.976035,11,557.0,0.97724,1.953275
SSRN Electronic Journal,52,10577.0,0.994165,7,547.5,0.950213,1.944378
Journal of Biomedical Informatics,51,10574.0,0.993748,6,543.5,0.938834,1.932582
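
The inner merge keeps only journals present in both exports, so a journal that ranks highly in just one of them never reaches the combined table. A minimal sketch of how to see what gets dropped, assuming toy frames with `Source Title` as an ordinary column (the journal names and scores are invented):

```python
import pandas as pd

# Invented scaled ranks, for illustration only.
meta = pd.DataFrame({"Source Title": ["J1", "J2", "J3"],
                     "meta_scaled": [1.0, 0.5, 0.0]})
rcr = pd.DataFrame({"Source Title": ["J2", "J4"],
                    "rcr_scaled": [1.0, 0.0]})

# Inner join plus summed score, mirroring the cells above.
both = meta.merge(rcr, how="inner", on="Source Title")
both["total"] = both["meta_scaled"] + both["rcr_scaled"]

# Outer join with indicator=True adds a `_merge` column that
# labels each row as "left_only", "right_only", or "both".
outer = meta.merge(rcr, how="outer", on="Source Title", indicator=True)
dropped = outer[outer["_merge"] != "both"]

print(both)
print(dropped[["Source Title", "_merge"]])
```

Inspecting `dropped` on the real data would show whether any journal with a high count in one search is excluded only because it is absent from the other.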
