# CORD-19 Collect SCOPUS data

This Jupyter notebook collects additional data via the Elsevier SCOPUS API to enrich the analysis.

First, the relevant packages are imported into the notebook.

In [1]:
import numpy as np
import pandas as pd
import csv
import ast
import collections
import matplotlib.pyplot as plt
import re
import time
import json
from urllib.parse import urlparse
from collections import Counter
from elsapy.elsclient import ElsClient
from elsapy.elsdoc import FullDoc, AbsDoc
from elsapy.elssearch import ElsSearch
from pybtex.database import parse_file, BibliographyData, Entry

Read the CORD-19 CSV file and store its contents in a DataFrame.

In [2]:
CORD19_CSV = pd.read_csv('../data/cord-19/CORD19_software_mentions.csv')

Check the length of the column containing the DOIs.

In [3]:
len(CORD19_CSV['doi'])

77448

Display the doi column to check for inconsistencies such as NaN values. The presence of NaNs requires special handling in the following steps.

In [4]:
doi = CORD19_CSV['doi']
doi

0                                 NaN
1          10.1016/j.regg.2021.01.002
2           10.1016/j.rec.2020.08.002
3        10.1016/j.vetmic.2006.11.026
4                   10.3390/v12080849
                     ...             
77443      10.1007/s11229-020-02869-9
77444                             NaN
77445     10.1101/2020.05.13.20100206
77446      10.1007/s42991-020-00052-8
77447     10.1101/2020.09.14.20194670
Name: doi, Length: 77448, dtype: object

Create a Series containing solely the unique DOIs; value_counts() ignores NaNs by default. The unique values must be sorted, otherwise the order, and with it the index positions used later, may differ after each restart of the notebook.

In [5]:
doi_counted = doi.value_counts().sort_index(ascending=True)
doi_counted

10.1001/jamainternmed.2020.1369       1
10.1001/jamanetworkopen.2020.16382    1
10.1001/jamanetworkopen.2020.17521    1
10.1001/jamanetworkopen.2020.20485    1
10.1001/jamanetworkopen.2020.24984    1
                                     ..
10.9745/ghsp-d-20-00115               1
10.9745/ghsp-d-20-00171               1
10.9745/ghsp-d-20-00218               1
10.9758/cpn.2020.18.4.607             1
10.9781/ijimai.2020.02.002            1
Name: doi, Length: 74302, dtype: int64
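The effect of `value_counts()` followed by `sort_index()` can be illustrated on a few DOIs. The Series below is a hypothetical toy example, not CORD-19 data: the NaN is dropped by `value_counts()` (its default is `dropna=True`), duplicates are counted, and sorting the index gives the same lexicographic order on every run.

```python
import numpy as np
import pandas as pd

# Hypothetical DOI column with one missing value and one duplicate
doi = pd.Series(["10.3390/v12080849", np.nan, "10.1016/j.rec.2020.08.002", "10.3390/v12080849"])

# value_counts() drops NaN by default; sort_index() makes the order deterministic
doi_counted = doi.value_counts().sort_index(ascending=True)

print(doi.isna().sum())          # number of missing DOIs
print(list(doi_counted.index))   # unique DOIs in lexicographic order
print(doi_counted.tolist())      # occurrences per unique DOI
```

Because later cells address DOIs by their position in `doi_counted`, a stable order is what makes resuming an interrupted fetch possible.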

The following function retrieves additional paper information for a given DOI from the Scopus API.

In [6]:
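#Code adapted from https://github.com/ElsevierDev/elsapy/blob/master/exampleProg.py
def fetch_scopus_api(client, doi):
    """Obtain additional paper information from Scopus by DOI.
    Returns the abstract record as a dictionary, or None if no document was found.
    """
    # Search Scopus for the document with this DOI
    doc_srch = ElsSearch("DOI(" + doi + ")", 'scopus')
    doc_srch.execute(client, get_all=True)
    try:
        # 'dc:identifier' has the form 'SCOPUS_ID:<id>'; an empty result set lacks this key
        scopus_id = doc_srch.results[0]["dc:identifier"].split(":")[1]
    except (IndexError, KeyError):
        return None
    # Retrieve the abstract record via its Scopus ID
    scp_doc = AbsDoc(scp_id=scopus_id)
    if scp_doc.read(client):
        # Note: elsapy's write() stores the unfiltered record as a local JSON file
        scp_doc.write()
    else:
        print("Read document failed.")
    return scp_doc.data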
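The Scopus ID extraction in `fetch_scopus_api` relies on the first search result carrying a `dc:identifier` of the form `SCOPUS_ID:<number>`, which is what `split(":")[1]` assumes. The network-free sketch below isolates that step in a hypothetical helper `extract_scopus_id`; the result dictionaries are hand-written and only assumed to mirror the shape of `ElsSearch.results` (in particular, that an empty result set yields an entry without `dc:identifier`).

```python
def extract_scopus_id(results):
    """Hypothetical helper: return the numeric Scopus ID of the first search result, or None."""
    if not results:
        return None
    # Assumption: a hit carries 'dc:identifier' like 'SCOPUS_ID:85083171050'
    identifier = results[0].get("dc:identifier")
    if not identifier or ":" not in identifier:
        return None
    return identifier.split(":", 1)[1]


# Hand-written examples, not real API responses
hit = [{"dc:identifier": "SCOPUS_ID:85083171050"}]
no_hit = [{"error": "Result set was empty"}]

print(extract_scopus_id(hit))     # 85083171050
print(extract_scopus_id(no_hit))  # None
print(extract_scopus_id([]))      # None
```

Handling the missing key explicitly keeps unrelated errors (for example a `KeyboardInterrupt` during a long run) from being silently turned into `None`.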

Next, the configuration file, which contains the API key, is loaded. For further information, visit: https://github.com/ElsevierDev/elsapy/blob/master/CONFIG.md

In [7]:
# Load the configuration file containing the API key; the file is closed automatically
with open("config.json") as con_file:
    config = json.load(con_file)
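The notebook reads the key from the `apikey` entry of `config.json`. The sketch below wraps this in a hypothetical `load_api_key` helper that fails early with a clear message if the entry is missing; it writes a throwaway config with a placeholder value, not a real key.

```python
import json
import os
import tempfile


def load_api_key(path):
    """Hypothetical helper: read the 'apikey' entry from a JSON config file."""
    with open(path) as con_file:
        config = json.load(con_file)
    if not config.get("apikey"):
        raise KeyError(f"No 'apikey' entry found in {path}")
    return config["apikey"]


# Write a throwaway config with a placeholder key (not a real key)
with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "config.json")
    with open(path, "w") as f:
        json.dump({"apikey": "YOUR_API_KEY"}, f)
    api_key = load_api_key(path)

print(api_key)
```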

The client is then initialized with the API key. The API key must be valid to fetch data; otherwise, the SCOPUS API responds with an authorisation/access error.

In [8]:
client = ElsClient(config['apikey'])

For demonstration purposes, the following cell shows which data the Scopus API returns.

In [9]:
return_example = fetch_scopus_api(client, '10.1016/j.dsx.2020.04.012')
print(json.dumps(return_example, indent=2))

{
  "affiliation": [
    {
      "affiliation-city": "New Delhi",
      "affilname": "Jamia Hamdard",
      "affiliation-country": "India"
    },
    {
      "affiliation-city": "New Delhi",
      "affilname": "Jamia Millia Islamia",
      "affiliation-country": "India"
    },
    {
      "affiliation-city": "New Delhi",
      "affilname": "Indraprastha Apollo Hospitals",
      "affiliation-country": "India"
    }
  ],
  "coredata": {
    "srctype": "j",
    "eid": "2-s2.0-85083171050",
    "pubmed-id": "32305024",
    "prism:coverDate": "2020-07-01",
    "prism:aggregationType": "Journal",
    "prism:url": "https://api.elsevier.com/content/abstract/scopus_id/85083171050",
    "dc:creator": {
      "author": [
        {
          "ce:given-name": "Raju",
          "preferred-name": {
            "ce:given-name": "Raju",
            "ce:initials": "R.",
            "ce:surname": "Vaishya",
            "ce:indexed-name": "Vaishya R."
          },
          "@seq": "1",
          "ce:init

The further analysis is based on data such as the sample fetched above. For this purpose, two notebooks analyse the data linked to:
<ul>
  <li>affiliation (CORD-19-analyse-affiliation-data-CS5099)</li>
  <li>coredata (CORD-19-analyse-coredata-CS5099)</li>
</ul>    

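As the sample response above and the stored records further below show, `affiliation` is sometimes a single dictionary and sometimes a list of dictionaries, while missing data is `None`. A hypothetical helper such as `as_affiliation_list` (not part of the original notebooks) can normalise these shapes before, for example, counting countries; the records below are hand-written.

```python
from collections import Counter


def as_affiliation_list(affiliation):
    """Hypothetical helper: return an affiliation entry as a list of dictionaries."""
    if affiliation is None:
        return []
    if isinstance(affiliation, dict):
        return [affiliation]
    return list(affiliation)


# Hand-written records in the three shapes seen in the stored data
records = [
    [{"affiliation-country": "India"}, {"affiliation-country": "India"}],
    {"affiliation-country": "Turkey"},
    None,
]

countries = Counter(
    entry.get("affiliation-country")
    for record in records
    for entry in as_affiliation_list(record)
)
print(countries)
```
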
Due to the ethical guidelines of this project, the retrieved data is not stored completely in the directory. Before the fetched data is stored, the following information is removed from the response:
<ul>
  <li>"affilname" and "affiliation-city"</li>
  <li>"dc:creator"</li>
</ul>  
The following functions therefore transform the fetched information into a form that complies with these guidelines.

In [10]:
def remove_unethical_entries(json_holder):
    """
    This function receives a JSON object and removes ethically sensitive keys from it in place.
    A single record is a dict; multiple affiliations arrive as a list of dicts.
    The cleaned object is returned; anything else (None, "NaN", float NaN) is returned as None.
    """
    sensitive_keys = ('affiliation-city', 'affilname', 'dc:creator')

    if isinstance(json_holder, dict):
        records = [json_holder]
    elif isinstance(json_holder, list):
        records = json_holder
    else:
        # None, "NaN" and similar placeholders carry no usable data
        return None

    for record in records:
        if isinstance(record, dict):
            for key in sensitive_keys:
                # pop() with a default does nothing if the key is absent
                record.pop(key, None)
    return json_holder
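`remove_unethical_entries` removes the sensitive keys only at the top level of each record, which matches the response structure shown above. If these keys could also appear at deeper nesting levels (an assumption, not something observed in the data shown here), a recursive variant that returns a cleaned copy instead of mutating its input could look like this; `redact` and `REDACTED_KEYS` are hypothetical names and the record uses placeholder values.

```python
REDACTED_KEYS = {"affiliation-city", "affilname", "dc:creator"}


def redact(obj, keys=REDACTED_KEYS):
    """Return a copy of a JSON-like object with the given keys removed at every depth."""
    if isinstance(obj, dict):
        return {k: redact(v, keys) for k, v in obj.items() if k not in keys}
    if isinstance(obj, list):
        return [redact(item, keys) for item in obj]
    return obj  # strings, numbers and None are returned unchanged


record = {
    "affiliation": [{"affiliation-city": "Example City", "affilname": "Example University",
                     "affiliation-country": "India"}],
    "coredata": {"srctype": "j", "dc:creator": {"author": []}},
}
cleaned = redact(record)
print(cleaned)
```

Returning a copy leaves the raw response untouched, which can help when the same response is inspected before and after cleaning.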

In [11]:
def transform_to_ethical_correct_data(df):
    """
    This function receives a DataFrame and applies remove_unethical_entries() to the
    'affiliation' and 'coredata' cells of every row.
    The resulting, ethically correct DataFrame is returned to the caller.
    """
    len_df_index = len(df.index)
    for position, index in enumerate(df.index, start=1):
        print("Progress: " + str(position) + "/" + str(len_df_index) + " Index position: " + str(index) + " (Data cleaning according to ethical guidelines.)")
        # .at sets a single cell on df itself; chained indexing like df['col'][index] = ... may modify a copy
        df.at[index, 'affiliation'] = remove_unethical_entries(df.at[index, 'affiliation'])
        df.at[index, 'coredata'] = remove_unethical_entries(df.at[index, 'coredata'])
    return df
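Assigning cell by cell through chained indexing such as `df['affiliation'][idx] = ...` may not write back to `df` (pandas warns about chained assignment, and under copy-on-write it has no effect). An alternative is to transform each column as a whole with `Series.map`, which calls the function once per cell, including `None` cells. The sketch uses a toy DataFrame and a simplified stand-in cleaner, `drop_affilname`, which is hypothetical and only removes one key.

```python
import pandas as pd


def drop_affilname(value):
    """Simplified stand-in cleaner: remove 'affilname' from a dict or from each dict in a list."""
    if isinstance(value, dict):
        return {k: v for k, v in value.items() if k != "affilname"}
    if isinstance(value, list):
        return [drop_affilname(item) for item in value]
    return value  # None stays None


df = pd.DataFrame({
    "affiliation": [
        [{"affilname": "Example University", "affiliation-country": "India"}],
        {"affilname": "Example Hospital", "affiliation-country": "Turkey"},
        None,
    ]
})

# Replace the whole column at once instead of assigning cell by cell
df["affiliation"] = df["affiliation"].map(drop_affilname)
print(df)
```

The trade-off is that per-row progress output, as printed by the notebook, is no longer available.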

Next, the SCOPUS API information fetched in previous runs is read from disk for further processing.

In [12]:
df_current_extra_info = pd.DataFrame()
bool_show_df = False
try:
    df_current_extra_info = pd.read_pickle('extra_info_CS5099.pkl')
    bool_show_df = True
except FileNotFoundError:
    # No data has been fetched and stored yet, so there is nothing to show
    print("The DataFrame is empty")
    bool_show_df = False
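Reading the pickle can also be wrapped in a small helper. The sketch below uses a hypothetical `load_or_empty` function and a temporary file: it catches only the missing-file case and returns an empty DataFrame that already has the two expected columns, so later code can rely on them.

```python
import os
import tempfile

import pandas as pd


def load_or_empty(path, columns=("affiliation", "coredata")):
    """Hypothetical helper: load a pickled DataFrame, or return an empty one with the given columns."""
    try:
        return pd.read_pickle(path)
    except FileNotFoundError:
        return pd.DataFrame(columns=list(columns))


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "extra_info.pkl")
    empty = load_or_empty(path)  # the file does not exist yet
    pd.DataFrame({"affiliation": [None], "coredata": [{"srctype": "j"}]}).to_pickle(path)
    loaded = load_or_empty(path)

print(empty.empty, list(empty.columns))
print(len(loaded))
```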

If previously fetched information is available, it is shown below.

In [13]:
if bool_show_df:
    print(df_current_extra_info)
else:
    print("There is no fetched information available.")

                                             affiliation  \
0      [{'affiliation-country': 'United States'}, {'a...   
1      [{'affiliation-country': 'United States'}, {'a...   
2      [{'affiliation-country': 'United States'}, {'a...   
3      [{'affiliation-country': 'United States'}, {'a...   
4      [{'affiliation-country': 'United States'}, {'a...   
...                                                  ...   
74297           {'affiliation-country': 'United States'}   
74298  [{'affiliation-country': 'United States'}, {'a...   
74299           {'affiliation-country': 'United States'}   
74300  [{'affiliation-country': 'Turkey'}, {'affiliat...   
74301                                               None   

                                                coredata  
0      {'srctype': 'j', 'eid': '2-s2.0-85083266658', ...  
1      {'srctype': 'j', 'prism:issueIdentifier': '7',...  
2      {'srctype': 'j', 'prism:issueIdentifier': '8',...  
3      {'srctype': 'j', 'prism:issueIdentif

In [14]:
def contains_only_None(dic):
    """
    This function inspects a dictionary and returns True if it solely contains None values
    (an empty dictionary also returns True).
    """
    return all(value is None for value in dic.values())

In [15]:
def transpose_and_structure(df):
    """
    This function receives a DataFrame, transposes it so that each fetched DOI becomes a row,
    and makes sure that the columns 'affiliation' and 'coredata' exist.
    """
    df = df.T
    if 'affiliation' not in df.columns:
        df['affiliation'] = None
    if 'coredata' not in df.columns:
        df['coredata'] = None
    return df

In [16]:
def append_fetched_data_to_df(df_current_extra_info, dic):
    """
    This function inserts newly fetched data into, or appends it to, the DataFrame containing Scopus data.
    Existing rows are only overwritten if the newly fetched row contains at least one value,
    so previously retrieved data is never replaced by None.
    df_current_extra_info holds the current entries read from the disk.
    dic maps index positions to the newly fetched Scopus responses (or None).
    """
    if contains_only_None(dic):
        # A dictionary with solely None values becomes a DataFrame of None placeholders
        df_newly_fetched_transposed = pd.DataFrame(np.empty((len(dic), 2), dtype=object),
                                                   columns=['affiliation', 'coredata'],
                                                   index=list(dic.keys()))
    else:
        # Prior to appending, the dictionary is transformed into a DataFrame
        df_newly_fetched = pd.DataFrame(dic)
        # Transpose so that each fetched DOI becomes a row
        df_newly_fetched_transposed = transpose_and_structure(df_newly_fetched)
        # Modify the data according to the ethical guidelines
        df_newly_fetched_transposed = transform_to_ethical_correct_data(df_newly_fetched_transposed)
    print(df_newly_fetched_transposed)

    # Overwrite existing rows, but only with rows that contain new information
    is_existing = df_newly_fetched_transposed.index.isin(df_current_extra_info.index)
    for index, row in df_newly_fetched_transposed[is_existing].iterrows():
        if row.notna().any():
            df_current_extra_info.loc[index] = row

    # Append new rows; DataFrame.append was removed in pandas 2.0, so pd.concat is used
    new_rows = df_newly_fetched_transposed[~is_existing]
    if not new_rows.empty:
        df_current_extra_info = pd.concat([df_current_extra_info, new_rows])

    # Returning the DataFrame with newly fetched entries
    return df_current_extra_info
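The update-or-append step can also be expressed without a row loop. The sketch below, using a hypothetical helper `upsert_rows` and toy data, keeps a newly fetched row if it contains at least one non-missing value or if its index is new, drops the old rows it replaces, and concatenates, so existing data is never overwritten by an all-`None` refetch.

```python
import pandas as pd


def upsert_rows(current, new):
    """Hypothetical helper: replace rows of `current` with informative rows of `new`, append unseen rows."""
    informative = new.notna().any(axis=1)
    unseen = ~new.index.isin(current.index)
    keep_new = new[informative | unseen]
    keep_current = current[~current.index.isin(keep_new.index)]
    return pd.concat([keep_current, keep_new]).sort_index()


current = pd.DataFrame(
    {"affiliation": [{"affiliation-country": "India"}, None],
     "coredata": [{"srctype": "j"}, None]},
    index=[0, 1],
)
new = pd.DataFrame(
    {"affiliation": [None, {"affiliation-country": "Turkey"}, None],
     "coredata": [None, {"srctype": "j"}, None]},
    index=[0, 1, 2],
)

combined = upsert_rows(current, new)
print(combined)
```

Row 0 keeps its old values because the refetch returned nothing, row 1 is filled in, and row 2 is appended as an empty placeholder.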

The following function stores the newly fetched information on disk.

In [17]:
def store_df_columns(df):
    """
    This function stores a DataFrame to a local file on the disk. 
    """
    df.to_pickle('extra_info_CS5099.pkl')
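Because the pickle is rewritten after every batch of a long-running fetch, an interruption during writing could leave a truncated file behind. One common safeguard, shown here as a hypothetical `store_df_atomically` variant rather than the notebook's method, writes to a temporary file first and then swaps it in with `os.replace`, which is atomic on POSIX systems when both paths are on the same file system.

```python
import os
import tempfile

import pandas as pd


def store_df_atomically(df, path):
    """Hypothetical variant of store_df_columns: pickle to a temporary file, then swap it in."""
    tmp_path = path + ".tmp"
    df.to_pickle(tmp_path)
    # On POSIX, os.replace is atomic, so readers see either the old or the new file
    os.replace(tmp_path, path)


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "extra_info_CS5099.pkl")
    store_df_atomically(pd.DataFrame({"coredata": [{"srctype": "j"}]}), path)
    restored = pd.read_pickle(path)
    leftover = os.path.exists(path + ".tmp")

print(restored.at[0, "coredata"], leftover)
```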

The length of the DataFrame containing the current information is stored in a variable. It serves as the start index of the fetching loop below, so that DOIs fetched in earlier runs are skipped.

In [18]:
len_df_current_extra_info = len(df_current_extra_info)
len_df_current_extra_info

74302

Subsequently, the SCOPUS API is queried for every DOI that has not been fetched yet, and the responses are stored in a dictionary.
The progress is shown by printing the index position and DOI of the latest fetch.

In [19]:
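dict_new_extra_info = dict()
len_dois = len(doi_counted)
BATCH_SIZE = 100

def trigger_fetching():
    """
    Fetch the Scopus data for all DOIs which are not stored yet and save the results every BATCH_SIZE fetches.
    """
    threshold = 0
    for i in range(len_df_current_extra_info, len_dois):
        dict_new_extra_info[i] = fetch_scopus_api(client, doi_counted.index[i])
        print("Fetching index position: " + str(i) + " -> " + doi_counted.index[i])
        threshold = threshold + 1
        if threshold == BATCH_SIZE:
            df_combined_extra_info = append_fetched_data_to_df(df_current_extra_info, dict_new_extra_info)
            store_df_columns(df_combined_extra_info)
            threshold = 0
            print("New batch saved.")
    # Save the final, incomplete batch as well
    if threshold > 0:
        df_combined_extra_info = append_fetched_data_to_df(df_current_extra_info, dict_new_extra_info)
        store_df_columns(df_combined_extra_info)
        print("Last batch saved.")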
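The batching pattern used by `trigger_fetching` can be isolated from the API calls. In the sketch below, `fetch_in_batches` is a hypothetical helper, `str.upper` stands in for the API call, and `saved.append` stands in for writing to disk: results are accumulated, a checkpoint is written every `batch_size` items, fetching resumes at `start`, and a final partial batch is flushed so that the last few results are not lost.

```python
def fetch_in_batches(keys, fetch, save, batch_size=100, start=0):
    """Fetch keys[start:], calling save(results) after every batch_size items and once at the end."""
    results = {}
    pending = 0
    for position in range(start, len(keys)):
        results[position] = fetch(keys[position])
        pending += 1
        if pending == batch_size:
            save(dict(results))  # checkpoint everything fetched so far
            pending = 0
    if pending:
        save(dict(results))  # flush the final, partial batch
    return results


saved = []
keys = ["doi-a", "doi-b", "doi-c", "doi-d", "doi-e"]
# Resume at position 1, as if position 0 had been stored in an earlier run
results = fetch_in_batches(keys, fetch=str.upper, save=saved.append, batch_size=3, start=1)
print([len(snapshot) for snapshot in saved])
```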

The SCOPUS API fetching process is then triggered.

In [20]:
trigger_fetching()

The existing and newly fetched records are combined into one DataFrame and displayed.

In [21]:
df_combined_extra_info = append_fetched_data_to_df(df_current_extra_info, dict_new_extra_info)
df_combined_extra_info

Empty DataFrame
Columns: [affiliation, coredata]
Index: []


                                             affiliation                                           coredata
0      [{'affiliation-country': 'United States'}, {'a...  {'srctype': 'j', 'eid': '2-s2.0-85083266658', ...
1      [{'affiliation-country': 'United States'}, {'a...  {'srctype': 'j', 'prism:issueIdentifier': '7',...
2      [{'affiliation-country': 'United States'}, {'a...  {'srctype': 'j', 'prism:issueIdentifier': '8',...
3      [{'affiliation-country': 'United States'}, {'a...  {'srctype': 'j', 'prism:issueIdentifier': '9',...
4      [{'affiliation-country': 'United States'}, {'a...  {'srctype': 'j', 'prism:issueIdentifier': '11'...
...                                                  ...                                                ...
74297           {'affiliation-country': 'United States'}  {'srctype': 'j', 'eid': '2-s2.0-85092678139', ...
74298  [{'affiliation-country': 'United States'}, {'a...  {'srctype': 'j', 'eid': '2-s2.0-85087468210', ...
74299           {'affiliation-country': 'United States'}  {'srctype': 'j', 'eid': '2-s2.0-85092677974', ...
74300  [{'affiliation-country': 'Turkey'}, {'affiliat...  {'srctype': 'j', 'prism:issueIdentifier': '4',...


To verify that the returned None values are caused by non-existent data, and not by an invalid API key or by information that only became available later, the following function re-fetches the entries which could not be fetched previously.

In [22]:
def enrich_data(start=70000, stop=None):
    """
    This function queries the Scopus API again, but solely for entries whose affiliation and/or coredata is None.
    start and stop restrict the index range to re-check; the default start=70000 reproduces the run shown below,
    use start=0 to re-check all entries.
    """
    df_current_extra_info_checker = df_combined_extra_info
    if stop is None:
        stop = len(df_current_extra_info_checker)

    dict_new_extra_info_checker = dict()
    for i in range(start, stop):
        # Fetch entries which miss information for affiliation and/or coredata
        if (df_current_extra_info_checker.at[i, 'affiliation'] is None
                or df_current_extra_info_checker.at[i, 'coredata'] is None):
            dict_new_extra_info_checker[i] = fetch_scopus_api(client, doi_counted.index[i])
            print("Fetched again index position: " + str(i) + " -> " + doi_counted.index[i])

    # Check if at least one of the fetched values is not None, otherwise the process is finished
    if contains_only_None(dict_new_extra_info_checker):
        print("The Scopus API did not return new information for the existing None values.")
    else:
        # There is new information to insert into the existing DataFrame
        df_combined_extra_info_fetched_again = append_fetched_data_to_df(df_current_extra_info_checker, dict_new_extra_info_checker)
        store_df_columns(df_combined_extra_info_fetched_again)
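The positions that `enrich_data` re-fetches can also be selected up front with a vectorised mask. The sketch below uses a toy DataFrame whose index labels are made up; note that `isna()` treats both `None` and `NaN` as missing, which is slightly broader than an `is None` check.

```python
import pandas as pd

df = pd.DataFrame(
    {"affiliation": [{"affiliation-country": "India"}, None, {"affiliation-country": "Turkey"}],
     "coredata": [{"srctype": "j"}, {"srctype": "j"}, None]},
    index=[70000, 70001, 70002],
)

# Rows where affiliation and/or coredata is missing
needs_refetch = df["affiliation"].isna() | df["coredata"].isna()
start = 70000  # mirrors the starting position used in enrich_data
positions = df[needs_refetch & (df.index >= start)].index.tolist()
print(positions)
```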

Print the number of unique DOIs and the length of the DataFrame holding the latest entries.

In [23]:
len_dois = len(doi_counted)
len_dois

74302

In [24]:
len_data = len(df_combined_extra_info)
len_data 

74302

The next cell invokes enrich_data() only after the SCOPUS API has been queried once for all DOIs.
In that case, the DataFrame holding the latest fetched information has the same length as the number of unique DOIs.

In [25]:
if len_dois == len_data:
    enrich_data()
else:
    print("There are DOIs which have not been fetched from the Scopus API yet.")

Fetched again index position: 70120 -> 10.3390/ijms16035072
Fetched again index position: 70327 -> 10.3390/ijms21228776
Fetched again index position: 70354 -> 10.3390/ijms21249662
Fetched again index position: 70391 -> 10.3390/ijns6040081
Fetched again index position: 70399 -> 10.3390/jcm10010004
Fetched again index position: 70400 -> 10.3390/jcm10010024
Fetched again index position: 70401 -> 10.3390/jcm10010028
Fetched again index position: 70402 -> 10.3390/jcm10010032
Fetched again index position: 70403 -> 10.3390/jcm10010038
Fetched again index position: 70404 -> 10.3390/jcm10010039
Fetched again index position: 70405 -> 10.3390/jcm10010050
Fetched again index position: 70406 -> 10.3390/jcm10010052
Fetched again index position: 70407 -> 10.3390/jcm10010071
Fetched again index position: 70408 -> 10.3390/jcm10010084
Fetched again index position: 70409 -> 10.3390/jcm10010133
Fetched again index position: 70410 -> 10.3390/jcm10010156
Fetched again index position: 70411 -> 10.3390/jcm100

Fetched again index position: 70537 -> 10.3390/jcm9092847
Fetched again index position: 70538 -> 10.3390/jcm9092872
Fetched again index position: 70539 -> 10.3390/jcm9092875
Fetched again index position: 70540 -> 10.3390/jcm9092879
Fetched again index position: 70541 -> 10.3390/jcm9092895
Fetched again index position: 70542 -> 10.3390/jcm9092906
Fetched again index position: 70543 -> 10.3390/jcm9092925
Fetched again index position: 70544 -> 10.3390/jcm9092935
Fetched again index position: 70545 -> 10.3390/jcm9092943
Fetched again index position: 70546 -> 10.3390/jcm9092953
Fetched again index position: 70547 -> 10.3390/jcm9092961
Fetched again index position: 70548 -> 10.3390/jcm9092976
Fetched again index position: 70549 -> 10.3390/jcm9092986
Fetched again index position: 70550 -> 10.3390/jcm9092990
Fetched again index position: 70551 -> 10.3390/jcm9093000
Fetched again index position: 70552 -> 10.3390/jcm9093002
Fetched again index position: 70553 -> 10.3390/jcm9093014
Fetched again 

Fetched again index position: 71459 -> 10.3390/pharmacy8030165
Fetched again index position: 71460 -> 10.3390/pharmacy8040177
Fetched again index position: 71461 -> 10.3390/pharmacy8040184
Fetched again index position: 71462 -> 10.3390/pharmacy8040185
Fetched again index position: 71463 -> 10.3390/pharmacy8040192
Fetched again index position: 71464 -> 10.3390/pharmacy8040194
Fetched again index position: 71465 -> 10.3390/pharmacy8040195
Fetched again index position: 71466 -> 10.3390/pharmacy8040199
Fetched again index position: 71467 -> 10.3390/pharmacy8040207
Fetched again index position: 71468 -> 10.3390/pharmacy8040208
Fetched again index position: 71469 -> 10.3390/pharmacy8040210
Fetched again index position: 71470 -> 10.3390/pharmacy8040216
Fetched again index position: 71471 -> 10.3390/pharmacy8040217
Fetched again index position: 71472 -> 10.3390/pharmacy8040221
Fetched again index position: 71473 -> 10.3390/pharmacy8040225
Fetched again index position: 71474 -> 10.3390/pharmacy

Fetched again index position: 72997 -> 10.4103/sja.sja_253_20
Fetched again index position: 72998 -> 10.4103/sjmms.sjmms_542_20
Fetched again index position: 72999 -> 10.4103/sjmms.sjmms_630_20
Fetched again index position: 73000 -> 10.4103/sjmms.sjmms_731_20
Fetched again index position: 73002 -> 10.4110/in.2012.12.1.8
Fetched again index position: 73003 -> 10.4110/in.2015.15.2.51
Fetched again index position: 73008 -> 10.4137/bci.s30379
Fetched again index position: 73009 -> 10.4137/becb.s10886
Fetched again index position: 73040 -> 10.4155/fdd-2020-0027
Fetched again index position: 73055 -> 10.4161/21690693.2014.970499
Fetched again index position: 73057 -> 10.4161/bact.20693
Fetched again index position: 73059 -> 10.4161/dish.25216
Fetched again index position: 73062 -> 10.4161/idp.24684
Fetched again index position: 73063 -> 10.4161/idp.27454
Fetched again index position: 73069 -> 10.4161/rna.8.2.14991
Fetched again index position: 73082 -> 10.4172/2161-0517.s2-004
Fetched again 

Fetched again index position: 74087 -> 10.7759/cureus.10360
Fetched again index position: 74088 -> 10.7759/cureus.10366
Fetched again index position: 74089 -> 10.7759/cureus.10373
Fetched again index position: 74090 -> 10.7759/cureus.10383
Fetched again index position: 74091 -> 10.7759/cureus.10384
Fetched again index position: 74092 -> 10.7759/cureus.10400
Fetched again index position: 74093 -> 10.7759/cureus.10402
Fetched again index position: 74094 -> 10.7759/cureus.10413
Fetched again index position: 74095 -> 10.7759/cureus.10423
Fetched again index position: 74096 -> 10.7759/cureus.10452
Fetched again index position: 74097 -> 10.7759/cureus.10453
Fetched again index position: 74098 -> 10.7759/cureus.10487
Fetched again index position: 74099 -> 10.7759/cureus.10501
Fetched again index position: 74100 -> 10.7759/cureus.10523
Fetched again index position: 74101 -> 10.7759/cureus.10555
Fetched again index position: 74102 -> 10.7759/cureus.10563
Fetched again index position: 74103 -> 1

Fetched again index position: 74224 -> 10.7759/cureus.8460
Fetched again index position: 74225 -> 10.7759/cureus.8501
Fetched again index position: 74226 -> 10.7759/cureus.8530
Fetched again index position: 74227 -> 10.7759/cureus.8550
Fetched again index position: 74228 -> 10.7759/cureus.8558
Fetched again index position: 74229 -> 10.7759/cureus.8582
Fetched again index position: 74230 -> 10.7759/cureus.8622
Fetched again index position: 74231 -> 10.7759/cureus.8632
Fetched again index position: 74232 -> 10.7759/cureus.8679
Fetched again index position: 74233 -> 10.7759/cureus.8712
Fetched again index position: 74234 -> 10.7759/cureus.8726
Fetched again index position: 74235 -> 10.7759/cureus.8781
Fetched again index position: 74236 -> 10.7759/cureus.8791
Fetched again index position: 74237 -> 10.7759/cureus.8800
Fetched again index position: 74238 -> 10.7759/cureus.8821
Fetched again index position: 74239 -> 10.7759/cureus.8864
Fetched again index position: 74240 -> 10.7759/cureus.88

Progress: 146/748 Index position: 70541 (Data cleaning according to ethical guidelines.)
Progress: 147/748 Index position: 70542 (Data cleaning according to ethical guidelines.)
Progress: 148/748 Index position: 70543 (Data cleaning according to ethical guidelines.)
Progress: 149/748 Index position: 70544 (Data cleaning according to ethical guidelines.)
Progress: 150/748 Index position: 70545 (Data cleaning according to ethical guidelines.)
Progress: 151/748 Index position: 70546 (Data cleaning according to ethical guidelines.)
Progress: 152/748 Index position: 70547 (Data cleaning according to ethical guidelines.)
Progress: 153/748 Index position: 70548 (Data cleaning according to ethical guidelines.)
Progress: 154/748 Index position: 70549 (Data cleaning according to ethical guidelines.)
Progress: 155/748 Index position: 70550 (Data cleaning according to ethical guidelines.)
Progress: 156/748 Index position: 70551 (Data cleaning according to ethical guidelines.)
Progress: 157/748 Ind

Progress: 305/748 Index position: 71731 (Data cleaning according to ethical guidelines.)
Progress: 306/748 Index position: 71843 (Data cleaning according to ethical guidelines.)
Progress: 307/748 Index position: 71905 (Data cleaning according to ethical guidelines.)
Progress: 308/748 Index position: 72325 (Data cleaning according to ethical guidelines.)
Progress: 309/748 Index position: 72344 (Data cleaning according to ethical guidelines.)
Progress: 310/748 Index position: 72357 (Data cleaning according to ethical guidelines.)
Progress: 311/748 Index position: 72523 (Data cleaning according to ethical guidelines.)
Progress: 312/748 Index position: 72524 (Data cleaning according to ethical guidelines.)
Progress: 313/748 Index position: 72535 (Data cleaning according to ethical guidelines.)
Progress: 314/748 Index position: 72537 (Data cleaning according to ethical guidelines.)
Progress: 315/748 Index position: 72538 (Data cleaning according to ethical guidelines.)
Progress: 316/748 Ind

Progress: 474/748 Index position: 73346 (Data cleaning according to ethical guidelines.)
Progress: 475/748 Index position: 73353 (Data cleaning according to ethical guidelines.)
Progress: 476/748 Index position: 73361 (Data cleaning according to ethical guidelines.)
Progress: 477/748 Index position: 73362 (Data cleaning according to ethical guidelines.)
Progress: 478/748 Index position: 73363 (Data cleaning according to ethical guidelines.)
Progress: 479/748 Index position: 73372 (Data cleaning according to ethical guidelines.)
Progress: 480/748 Index position: 73373 (Data cleaning according to ethical guidelines.)
Progress: 481/748 Index position: 73374 (Data cleaning according to ethical guidelines.)
Progress: 482/748 Index position: 73375 (Data cleaning according to ethical guidelines.)
Progress: 483/748 Index position: 73376 (Data cleaning according to ethical guidelines.)
Progress: 484/748 Index position: 73380 (Data cleaning according to ethical guidelines.)
Progress: 485/748 Ind

Progress: 607/748 Index position: 74149 (Data cleaning according to ethical guidelines.)
Progress: 608/748 Index position: 74150 (Data cleaning according to ethical guidelines.)
Progress: 609/748 Index position: 74151 (Data cleaning according to ethical guidelines.)
Progress: 610/748 Index position: 74152 (Data cleaning according to ethical guidelines.)
Progress: 611/748 Index position: 74153 (Data cleaning according to ethical guidelines.)
Progress: 612/748 Index position: 74154 (Data cleaning according to ethical guidelines.)
Progress: 613/748 Index position: 74155 (Data cleaning according to ethical guidelines.)
Progress: 614/748 Index position: 74156 (Data cleaning according to ethical guidelines.)
Progress: 615/748 Index position: 74157 (Data cleaning according to ethical guidelines.)
Progress: 616/748 Index position: 74158 (Data cleaning according to ethical guidelines.)
Progress: 617/748 Index position: 74159 (Data cleaning according to ethical guidelines.)
Progress: 618/748 Ind

                                                coredata affiliation
70120  {'srctype': 'j', 'prism:issueIdentifier': '3',...        None
70327  {'srctype': 'j', 'eid': '2-s2.0-85096328767', ...        None
70354  {'srctype': 'j', 'eid': '2-s2.0-85098065913', ...        None
70391                                               None        None
70399                                               None        None
...                                                  ...         ...
74286                                               None        None
74287                                               None        None
74288                                               None        None
74293                                               None        None
74301                                               None        None

[748 rows x 2 columns]
