# CORD-19 Collect SCOPUS data

This notebook collects additional data via the Elsevier SCOPUS API to enrich the analysis.
https://dev.elsevier.com/

First, relevant packages must be imported into the notebook.

In [1]:
import numpy as np
import pandas as pd
import csv
import ast
import collections
import matplotlib.pyplot as plt
import re
import time
import json
from urllib.parse import urlparse
from collections import Counter
from elsapy.elsclient import ElsClient
from elsapy.elsdoc import FullDoc, AbsDoc
from elsapy.elssearch import ElsSearch
from pybtex.database import parse_file, BibliographyData, Entry

Retrieve columns from the CORD19 CSV and store the data to a variable.

In [2]:
CORD19_CSV = pd.read_csv('../data/cord-19/CORD19_software_mentions.csv')

Check the length of the column containing the DOIs.

In [3]:
len(CORD19_CSV['doi'])

77448

Display the doi column to check for inconsistencies such as NaN values. Any NaN values need to be handled explicitly in the following steps.

In [4]:
doi = CORD19_CSV['doi']
doi

0                                 NaN
1          10.1016/j.regg.2021.01.002
2           10.1016/j.rec.2020.08.002
3        10.1016/j.vetmic.2006.11.026
4                   10.3390/v12080849
                     ...             
77443      10.1007/s11229-020-02869-9
77444                             NaN
77445     10.1101/2020.05.13.20100206
77446      10.1007/s42991-020-00052-8
77447     10.1101/2020.09.14.20194670
Name: doi, Length: 77448, dtype: object
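
Before deduplicating, it can help to quantify how many entries are missing or repeated. The following is a minimal sketch on a toy series; the DOI values are invented for illustration and are not taken from the CORD-19 data.

```python
import numpy as np
import pandas as pd

# Toy stand-in for CORD19_CSV['doi']; the values are invented for illustration
toy_doi = pd.Series([np.nan, "10.1000/a", "10.1000/b", "10.1000/a", np.nan])

n_rows = len(toy_doi)                  # counts every row, including NaN
n_missing = int(toy_doi.isna().sum())  # rows without a DOI
n_unique = toy_doi.nunique()           # distinct DOIs, NaN excluded by default

print(n_rows, n_missing, n_unique)
```

The difference between `len` and `nunique` is the number of rows dropped by the deduplication step.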

Create a series of the unique DOIs, excluding NaN values. It is important to sort the unique values; otherwise, their order can differ after restarting the notebook, which matters because later cells address the DOIs by position.

In [5]:
doi_counted = doi.value_counts().sort_index(ascending=True)
doi_counted

10.1001/jamainternmed.2020.1369       1
10.1001/jamanetworkopen.2020.16382    1
10.1001/jamanetworkopen.2020.17521    1
10.1001/jamanetworkopen.2020.20485    1
10.1001/jamanetworkopen.2020.24984    1
                                     ..
10.9745/ghsp-d-20-00115               1
10.9745/ghsp-d-20-00171               1
10.9745/ghsp-d-20-00218               1
10.9758/cpn.2020.18.4.607             1
10.9781/ijimai.2020.02.002            1
Name: doi, Length: 74302, dtype: int64
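
The behaviour relied on above can be checked on a toy series. This is a small sketch with invented DOI values: `value_counts()` drops NaN by default and orders by frequency, and `sort_index()` then orders the result by DOI so that positions stay stable.

```python
import numpy as np
import pandas as pd

# Invented values; two NaN entries and one duplicated DOI
toy_doi = pd.Series(["10.1000/b", np.nan, "10.1000/a", "10.1000/b", np.nan])

# NaN is dropped (dropna=True is the default), then the index is sorted
toy_counted = toy_doi.value_counts().sort_index(ascending=True)

print(list(toy_counted.index))
print(list(toy_counted.values))
```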

The following function retrieves the requested information from the Scopus API.

In [6]:
#Code adapted from https://github.com/ElsevierDev/elsapy/blob/master/exampleProg.py
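def fetch_scopus_api(client, doi):
    """Obtain additional paper information from Scopus by DOI.
    Returns None if the search yields no usable result.
    """
    #Search Scopus for the document with the given DOI
    doc_srch = ElsSearch("DOI("+doi+")",'scopus')
    doc_srch.execute(client, get_all = True)
    try:
        #"dc:identifier" has the form "SCOPUS_ID:<id>"; keep the part after the colon
        scopus_id=doc_srch.results[0]["dc:identifier"].split(":")[1]
        scp_doc = AbsDoc(scp_id = scopus_id)
        if scp_doc.read(client):
            scp_doc.write()
        else:
            print ("Read document failed.")
        return scp_doc.data
    except Exception:
        #The first search result carries no usable identifier
        return None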
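
The identifier handling inside the function can be tried offline. The sketch below uses a hypothetical helper, `extract_scopus_id`, and hand-written stand-ins for `doc_srch.results`; the shape of the empty result is an assumption, and no request is sent.

```python
def extract_scopus_id(results):
    """Hypothetical helper: return the Scopus ID of the first search result, or None."""
    if not results:
        return None
    identifier = results[0].get("dc:identifier")
    if identifier is None:
        return None
    # "SCOPUS_ID:85083171050" -> "85083171050"
    return identifier.split(":")[1]

# Hand-written stand-ins for doc_srch.results (assumed shapes, not live API output)
found = [{"dc:identifier": "SCOPUS_ID:85083171050"}]
not_found = [{"error": "Result set was empty"}]

print(extract_scopus_id(found))
print(extract_scopus_id(not_found))
```

Checking for the key explicitly makes the "nothing found" case visible instead of relying on a caught exception.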

Next, the configuration file, which contains the API key, is loaded. For further information, visit the following website: https://github.com/ElsevierDev/elsapy/blob/master/CONFIG.md

In [7]:
#The with statement closes the file even if parsing fails
with open("config.json") as con_file:
    config = json.load(con_file)
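
As a rough illustration of the expected layout, the sketch below writes a throwaway config file with a placeholder key and reads it back. The helper name `load_api_key` and the placeholder value are assumptions made for this example; only the `apikey` entry is used by this notebook.

```python
import json
import os
import tempfile

def load_api_key(path):
    """Hypothetical helper: read the 'apikey' entry from a JSON config file."""
    with open(path) as con_file:
        config = json.load(con_file)
    if not config.get("apikey"):
        raise ValueError("config file has no 'apikey' entry")
    return config["apikey"]

# Write a throwaway config with a placeholder key to show the expected layout
with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "config.json")
    with open(example_path, "w") as handle:
        json.dump({"apikey": "YOUR-API-KEY"}, handle)
    api_key = load_api_key(example_path)

print(api_key)
```

Failing early on a missing key gives a clearer message than an authorisation error from the API later on.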

Next, the client is initialized with the API key. The API key must be valid to fetch data; otherwise, the SCOPUS API responds with an authorisation/access error.

In [8]:
client = ElsClient(config['apikey'])

For demonstration purposes, the following cells show which data is returned by the Scopus API.

In [9]:
return_example = fetch_scopus_api(client, '10.1016/j.dsx.2020.04.012')
print(json.dumps(return_example, indent=2))

{
  "affiliation": [
    {
      "affiliation-city": "New Delhi",
      "affilname": "Jamia Hamdard",
      "affiliation-country": "India"
    },
    {
      "affiliation-city": "New Delhi",
      "affilname": "Jamia Millia Islamia",
      "affiliation-country": "India"
    },
    {
      "affiliation-city": "New Delhi",
      "affilname": "Indraprastha Apollo Hospitals",
      "affiliation-country": "India"
    }
  ],
  "coredata": {
    "srctype": "j",
    "eid": "2-s2.0-85083171050",
    "pubmed-id": "32305024",
    "prism:coverDate": "2020-07-01",
    "prism:aggregationType": "Journal",
    "prism:url": "https://api.elsevier.com/content/abstract/scopus_id/85083171050",
    "dc:creator": {
      "author": [
        {
          "ce:given-name": "Raju",
          "preferred-name": {
            "ce:given-name": "Raju",
            "ce:initials": "R.",
            "ce:surname": "Vaishya",
            "ce:indexed-name": "Vaishya R."
          },
          "@seq": "1",
          "ce:init

Further analysis is conducted on the kind of data fetched above.
Two separate notebooks analyse the data linked to:
<ul>
  <li>affiliation (CORD-19-analyse-affiliation-data-CS5099)</li>
  <li>coredata (CORD-19-analyse-coredata-CS5099)</li>
</ul>    

Due to the ethical guidelines of this project, the retrieved data is not stored completely in the directory. Before storing fetched data, the following information is removed from the response:
<ul>
  <li>"affilname" and "affiliation-city"</li>
  <li>"dc:creator"</li>
</ul>  
Therefore, the following functions transform the fetched information into an ethically correct form.

In [10]:
#Keys that must not be stored according to the ethical guidelines
SENSITIVE_KEYS = ('affiliation-city', 'affilname', 'dc:creator')

def remove_unethical_entries(json_holder):
    """
    This function receives a parsed JSON value (a dict or a list of dicts) and removes ethically sensitive keys from it.
    The value is modified in place and returned to the caller. None is returned for any other input, such as None or "NaN".
    """
    #A single JSON object
    if isinstance(json_holder, dict):
        for key in SENSITIVE_KEYS:
            json_holder.pop(key, None)
        return json_holder

    #A JSON array of objects
    if isinstance(json_holder, list):
        for element in json_holder:
            for key in SENSITIVE_KEYS:
                element.pop(key, None)
        return json_holder

    #None, the string "NaN", or any other unexpected value
    return None
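
To make the effect concrete, the sketch below applies the same key removal to the first affiliation entry of the example response shown earlier and to a shortened `coredata` dictionary. `strip_sensitive_keys` is a simplified, hypothetical stand-in that returns new objects instead of modifying its input in place.

```python
SENSITIVE_KEYS = ("affiliation-city", "affilname", "dc:creator")

def strip_sensitive_keys(value):
    """Simplified stand-in for remove_unethical_entries (hypothetical name)."""
    if isinstance(value, dict):
        return {k: v for k, v in value.items() if k not in SENSITIVE_KEYS}
    if isinstance(value, list):
        return [strip_sensitive_keys(item) for item in value]
    return None

# First affiliation entry of the example response shown earlier
affiliation = [
    {"affiliation-city": "New Delhi", "affilname": "Jamia Hamdard", "affiliation-country": "India"}
]
# Shortened coredata; the author list is left empty for illustration
coredata = {"srctype": "j", "dc:creator": {"author": []}}

print(strip_sensitive_keys(affiliation))
print(strip_sensitive_keys(coredata))
```

Only the country survives for affiliations, which matches the stored DataFrame shown further below.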

In [11]:
def transform_to_ethical_correct_data(df):
    """
    This function receives a DataFrame and invokes remove_unethical_entries() on the 'affiliation' and 'coredata' value of every row.
    The output is an ethically correct DataFrame, which is returned to the caller.
    """
    total = len(df.index)
    for position, idx in enumerate(df.index, start=1):
        print("Progress: "+str(position)+"/"+str(total)+" Index position: "+str(idx)+" (Data cleaning according to ethical guidelines.)")
        #.at avoids the chained assignment df[col][idx] = ..., which pandas may apply to a copy
        df.at[idx, 'affiliation'] = remove_unethical_entries(df.at[idx, 'affiliation'])
        df.at[idx, 'coredata'] = remove_unethical_entries(df.at[idx, 'coredata'])
    return df
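
The row-by-row loop can also be expressed column-wise. The following sketch uses `Series.map` with a hypothetical stand-in cleaner, `drop_sensitive`, on a toy DataFrame; it is one possible alternative under these assumptions, not the approach used in this notebook, and it omits the progress output.

```python
import pandas as pd

SENSITIVE_KEYS = ("affiliation-city", "affilname", "dc:creator")

def drop_sensitive(value):
    """Hypothetical stand-in cleaner for a single cell value."""
    if isinstance(value, dict):
        return {k: v for k, v in value.items() if k not in SENSITIVE_KEYS}
    if isinstance(value, list):
        return [drop_sensitive(item) for item in value]
    return None

# Toy data: one complete row and one row without any fetched information
toy = pd.DataFrame({
    "affiliation": [[{"affilname": "X", "affiliation-country": "India"}], None],
    "coredata": [{"srctype": "j", "dc:creator": {}}, None],
})

# Apply the cleaner to every cell of both columns
for column in ("affiliation", "coredata"):
    toy[column] = toy[column].map(drop_sensitive)

print(toy.loc[0, "affiliation"], toy.loc[0, "coredata"])
```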

The SCOPUS API information that has already been fetched is read from the disk for further processing.

In [12]:
df_current_extra_info = pd.DataFrame()
bool_show_df = False
try:
    df_current_extra_info = pd.read_pickle('extra_info_CS5099.pkl')
    bool_show_df = True
except Exception:
    #Nothing could be read from the disk, so there is no fetched data to show
    print("The DataFrame is empty")
    bool_show_df = False
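
The load-or-start-empty pattern can be exercised without touching the real pickle file. This is a minimal sketch that works in a temporary directory; the helper name `load_previous_results` and the file name `toy_extra_info.pkl` are assumptions made for this example.

```python
import os
import tempfile
import pandas as pd

def load_previous_results(path):
    """Hypothetical helper: return (DataFrame, found_on_disk)."""
    try:
        return pd.read_pickle(path), True
    except FileNotFoundError:
        # Nothing has been stored yet: start with an empty frame
        return pd.DataFrame(columns=["affiliation", "coredata"]), False

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "toy_extra_info.pkl")
    first_df, first_found = load_previous_results(path)    # file does not exist yet
    pd.DataFrame({"affiliation": [None], "coredata": [None]}).to_pickle(path)
    second_df, second_found = load_previous_results(path)  # file exists now

print(first_found, len(first_df), second_found, len(second_df))
```

Returning the flag together with the DataFrame keeps the "nothing stored yet" case distinguishable from a stored but empty result.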

If fetched information is available, it will be shown below. The data is read from the disk. 

In [13]:
if bool_show_df:
    print(df_current_extra_info)
else:
    print("There is no fetched information available.")

                                             affiliation  \
0      [{'affiliation-country': 'United States'}, {'a...   
1      [{'affiliation-country': 'United States'}, {'a...   
2      [{'affiliation-country': 'United States'}, {'a...   
3      [{'affiliation-country': 'United States'}, {'a...   
4      [{'affiliation-country': 'United States'}, {'a...   
...                                                  ...   
74297           {'affiliation-country': 'United States'}   
74298  [{'affiliation-country': 'United States'}, {'a...   
74299           {'affiliation-country': 'United States'}   
74300  [{'affiliation-country': 'Turkey'}, {'affiliat...   
74301                                               None   

                                                coredata  
0      {'srctype': 'j', 'eid': '2-s2.0-85083266658', ...  
1      {'srctype': 'j', 'prism:issueIdentifier': '7',...  
2      {'srctype': 'j', 'prism:issueIdentifier': '8',...  
3      {'srctype': 'j', 'prism:issueIdentif

In [14]:
def contains_only_None(dic):
    """
    This function inspects a dictionary and returns True if it solely contains None values.
    """
    return all(value is None for value in dic.values())

In [15]:
def transpose_and_structure(df):
    """
    This function receives a DataFrame, transposes it, and ensures that the columns 'affiliation' and 'coredata' exist.
    """
    df = df.T
    if 'affiliation' not in df.columns:
        df['affiliation'] = None
    if 'coredata' not in df.columns:
        df['coredata'] = None
    return df
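
The reshaping step is easier to follow on a toy input. In the sketch below, the responses are invented and keyed by index position, as in the fetching loop; neither of them carries an `affiliation` entry, so the column is added afterwards.

```python
import pandas as pd

# Toy responses keyed by index position; both lack an 'affiliation' entry
toy_responses = {
    0: {"coredata": {"srctype": "j"}},
    1: {"coredata": {"srctype": "b"}},
}

toy_df = pd.DataFrame(toy_responses)  # one column per index position
toy_df = toy_df.T                     # one row per index position
if "affiliation" not in toy_df.columns:
    toy_df["affiliation"] = None      # add the missing column

print(toy_df.shape)
print(sorted(toy_df.columns))
```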

In [16]:
def append_fetched_data_to_df(df_current_extra_info, dic):
    """
    This function appends or inserts newly fetched data to the DataFrame containing scopus data.
    Moreover, this function replaces None values with retrieved data. 
    df_current_extra_info holds the current entries read from the disk
    dic holds the newly fetched API responses, keyed by index position
    df_newly_fetched_transposed holds the newly fetched information in a form ready to be inserted/appended
    """
    
    #Checking if the dictionary contains values to append or insert to the existing information
    if contains_only_None(dic):
        #If the dictionary contains solely None values, placeholder rows are prepared for appending/inserting
        #(the parameter dic is used here instead of the global dict_new_extra_info)
        placeholder_entries = pd.DataFrame(np.empty((len(dic),2),dtype=object),columns=['affiliation','coredata'], index=list(dic.keys()))
        df_newly_fetched_transposed = placeholder_entries
        print(df_newly_fetched_transposed)
    else:
        #Prior to appending, the dictionary is transformed to a DataFrame
        df_newly_fetched = pd.DataFrame(dic)
        #For readability, the DataFrame is transposed
        df_newly_fetched_transposed = transpose_and_structure(df_newly_fetched)
        #The data is modified according to the ethical guidelines
        df_newly_fetched_transposed = transform_to_ethical_correct_data(df_newly_fetched_transposed)
        print(df_newly_fetched_transposed)
    
    #Insert newly fetched rows which were previously not successfully inserted/appended
    for index, row in df_newly_fetched_transposed.iterrows():
        #Insert into the current extra info DataFrame because the row already exists
        if index in df_current_extra_info.index and row is not None:
            df_current_extra_info.loc[index] = row
        #Append to the current extra info DataFrame because the row is new
        if index not in df_current_extra_info.index:
            #DataFrame.append was removed in pandas 2.0; pd.concat is its replacement
            df_current_extra_info = pd.concat([df_current_extra_info, row.to_frame().T], ignore_index=True)
            
    #Returning the DataFrame with newly fetched entries
    return df_current_extra_info
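
The insert-or-append logic can be isolated in a few lines. The sketch below uses a hypothetical helper, `upsert_rows`, and plain strings instead of API responses so that the result is easy to inspect; it keeps the index labels of the new rows, which is an assumption that differs from the `ignore_index=True` behaviour above.

```python
import pandas as pd

def upsert_rows(current, new_rows):
    """Hypothetical helper: overwrite rows whose index already exists, append the rest."""
    result = current.copy()
    existing = [idx for idx in new_rows.index if idx in result.index]
    fresh = [idx for idx in new_rows.index if idx not in result.index]
    for idx in existing:
        for column in new_rows.columns:
            result.at[idx, column] = new_rows.at[idx, column]
    # Rows with unknown index labels are added at the end
    return pd.concat([result, new_rows.loc[fresh]])

# Row 1 exists but is empty; row 2 is new
current = pd.DataFrame({"affiliation": ["old-0", None], "coredata": ["old-0", None]}, index=[0, 1])
new_rows = pd.DataFrame({"affiliation": ["new-1", "new-2"], "coredata": ["new-1", "new-2"]}, index=[1, 2])

combined = upsert_rows(current, new_rows)
print(combined)
```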

The following function stores the newly fetched information on the disk.

In [17]:
def store_df_columns(df):
    """
    This function stores a DataFrame to a local file on the disk. 
    """
    df.to_pickle('extra_info_CS5099.pkl')

The length of the DataFrame containing the current information is assigned to a variable. It serves as the starting index of the while loop below, so that fetching continues after the entries that are already stored.

In [18]:
len_df_current_extra_info = len(df_current_extra_info)
len_df_current_extra_info

74302

Subsequently, the data is fetched from the SCOPUS API and stored in a dictionary.
In addition, the print function shows the state of the process by displaying the most recently fetched entry.

In [19]:
dict_new_extra_info = dict()
len_dois = len(doi_counted)

def trigger_fetching():
    threshold = 0 
    i = len_df_current_extra_info
    while i < len_dois: 
        dict_new_extra_info[i] = fetch_scopus_api(client, doi_counted.index[i])
        print("Fetching index position: " + str(i) + " -> " +  doi_counted.index[i])
        i = i + 1 
        threshold = threshold + 1
        if threshold > 99:
            df_combined_extra_info = append_fetched_data_to_df(df_current_extra_info, dict_new_extra_info)
            store_df_columns(df_combined_extra_info)
            threshold = 0
            print("New batch saved.")
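
The resume-and-checkpoint pattern of the loop above can be sketched without network access. `fetch_in_batches` and its parameters are hypothetical; the fetch and save steps are passed in as functions so that simple stand-ins can replace the API call and the disk write.

```python
def fetch_in_batches(keys, start, fetch, save, batch_size=100):
    """Hypothetical sketch: fetch keys[start:] and call save() after every full batch."""
    fetched = {}
    pending = 0
    for position in range(start, len(keys)):
        fetched[position] = fetch(keys[position])
        pending += 1
        if pending >= batch_size:
            save(dict(fetched))  # checkpoint everything fetched so far
            pending = 0
    return fetched

# Stand-ins so the sketch runs without network access or disk writes
toy_keys = ["10.1000/a", "10.1000/b", "10.1000/c", "10.1000/d", "10.1000/e"]
checkpoints = []
result = fetch_in_batches(toy_keys, start=1, fetch=str.upper, save=checkpoints.append, batch_size=2)

print(sorted(result))
print(len(checkpoints))
```

As in the cell above, a final partial batch is returned but not saved by the loop itself, so the caller has to combine and store it afterwards.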

Next, the SCOPUS API fetching process is triggered.

In [20]:
trigger_fetching()

The existing and newly fetched information is combined into one DataFrame and shown.

In [21]:
df_combined_extra_info = append_fetched_data_to_df(df_current_extra_info, dict_new_extra_info)
df_combined_extra_info

Empty DataFrame
Columns: [affiliation, coredata]
Index: []


                                             affiliation                                           coredata
0      [{'affiliation-country': 'United States'}, {'a...  {'srctype': 'j', 'eid': '2-s2.0-85083266658', ...
1      [{'affiliation-country': 'United States'}, {'a...  {'srctype': 'j', 'prism:issueIdentifier': '7',...
2      [{'affiliation-country': 'United States'}, {'a...  {'srctype': 'j', 'prism:issueIdentifier': '8',...
3      [{'affiliation-country': 'United States'}, {'a...  {'srctype': 'j', 'prism:issueIdentifier': '9',...
4      [{'affiliation-country': 'United States'}, {'a...  {'srctype': 'j', 'prism:issueIdentifier': '11'...
...                                                  ...                                                ...
74297           {'affiliation-country': 'United States'}  {'srctype': 'j', 'eid': '2-s2.0-85092678139', ...
74298  [{'affiliation-country': 'United States'}, {'a...  {'srctype': 'j', 'eid': '2-s2.0-85087468210', ...
74299           {'affiliation-country': 'United States'}  {'srctype': 'j', 'eid': '2-s2.0-85092677974', ...
74300  [{'affiliation-country': 'Turkey'}, {'affiliat...  {'srctype': 'j', 'prism:issueIdentifier': '4',...


The next step verifies that the returned None values are due to non-existent data and not to an invalid API key or to information that only became available later. To this end, the following function fetches again the specific entries which could not be fetched previously.

In [22]:
def enrich_data():
    """
    This function queries the scopus API again and solely asks for information which previously returned None. 
    """
    #Create a Series containing the DOIs which are used to query the API
    ser_doi = pd.Series(doi_counted.index[:len_data])
    df_current_extra_info_checker = df_combined_extra_info
    
    #Fetching solely None entries 
    len_df_current_extra_info_checker = len(df_current_extra_info_checker)
    dict_new_extra_info_checker = dict()
    i = 0
    #To fetch a specific range only, uncomment and adjust the next two lines.
    #i = 70000
    #len_df_current_extra_info_checker = 5000
    while i < len_df_current_extra_info_checker:
        #Fetch entries which miss information for affiliation and/or coredata 
        if df_current_extra_info_checker['affiliation'][i] is None or df_current_extra_info_checker['coredata'][i] is None:
            dict_new_extra_info_checker[i] = fetch_scopus_api(client, ser_doi[i])
            print("Fetched again index position: " + str(i) + " -> " +  ser_doi[i])
        i = i + 1

    #Check if at least one of the fetched values is not None, otherwise the process is finished
    if contains_only_None(dict_new_extra_info_checker):
        print("The scopus API did not return new information for existing None values.")
    else:
        #There is new information to insert into the existing DataFrame
        df_combined_extra_info_fetched_again  = append_fetched_data_to_df(df_current_extra_info, dict_new_extra_info_checker)
        store_df_columns(df_combined_extra_info_fetched_again)
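
The selection of entries to fetch again can also be written as a boolean mask. This is a small sketch on invented toy data, offered as an alternative to the explicit while loop rather than as the method used here.

```python
import pandas as pd

# Toy data: row 1 has no information at all, row 2 lacks coredata
toy = pd.DataFrame({
    "affiliation": [{"affiliation-country": "India"}, None, {"affiliation-country": "Turkey"}],
    "coredata": [{"srctype": "j"}, None, None],
})

# True where affiliation and/or coredata is missing
needs_refetch = toy["affiliation"].isna() | toy["coredata"].isna()
positions_to_refetch = list(toy.index[needs_refetch])

print(positions_to_refetch)
```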

Print the number of existing DOIs and the length of the DataFrame holding the latest entries.

In [23]:
len_dois = len(doi_counted)
len_dois

74302

In [24]:
len_data = len(df_combined_extra_info)
len_data 

74302

The next cell invokes the enrich_data() function once the SCOPUS API has been queried for all DOIs. This is the case when the DataFrame holding the latest fetched information has the same length as the number of existing DOIs.

In [25]:
if len_dois == len_data:
    enrich_data()
else:
    print("There are entries which are not fetched yet from the scopus API.")

Fetched again index position: 13 -> 10.1002/0471142700.nc0430s27
Fetched again index position: 15 -> 10.1002/0471142735.im0700s95
Fetched again index position: 16 -> 10.1002/0471142735.im0700s97
Fetched again index position: 43 -> 10.1002/2211-5463.13058
Fetched again index position: 52 -> 10.1002/acr.24487
Fetched again index position: 53 -> 10.1002/acr2.11148
Fetched again index position: 54 -> 10.1002/acr2.11164
Fetched again index position: 55 -> 10.1002/acr2.11174
Fetched again index position: 56 -> 10.1002/acr2.11207
Fetched again index position: 68 -> 10.1002/adtp.202000034
Fetched again index position: 69 -> 10.1002/adtp.202000129
Fetched again index position: 85 -> 10.1002/aepp.13096
Fetched again index position: 86 -> 10.1002/aepp.13100
Fetched again index position: 88 -> 10.1002/aepp.13128
Fetched again index position: 99 -> 10.1002/aisy.202000070
Fetched again index position: 108 -> 10.1002/ajh.21744
Fetched again index position: 142 -> 10.1002/ame2.12036
Fetched again inde

Fetched again index position: 1335 -> 10.1002/lary.28896
Fetched again index position: 1347 -> 10.1002/leap.1314
Fetched again index position: 1348 -> 10.1002/leap.1332
Fetched again index position: 1349 -> 10.1002/lim2.1
Fetched again index position: 1350 -> 10.1002/lio2.384
Fetched again index position: 1351 -> 10.1002/lio2.386
Fetched again index position: 1352 -> 10.1002/lio2.389
Fetched again index position: 1353 -> 10.1002/lio2.439
Fetched again index position: 1354 -> 10.1002/lio2.445
Fetched again index position: 1355 -> 10.1002/lio2.484
Fetched again index position: 1356 -> 10.1002/lio2.498
Fetched again index position: 1357 -> 10.1002/lio2.507
Fetched again index position: 1362 -> 10.1002/lt.20208
Fetched again index position: 1383 -> 10.1002/mco2.14
Fetched again index position: 1391 -> 10.1002/mds3.10112
Fetched again index position: 1422 -> 10.1002/nau.22577
Fetched again index position: 1423 -> 10.1002/nau.23386
Fetched again index position: 1480 -> 10.1002/path.171141040

Fetched again index position: 1968 -> 10.1007/4-431-26925-8_1
Fetched again index position: 1991 -> 10.1007/82_2012_239
Fetched again index position: 1994 -> 10.1007/82_2012_259
Fetched again index position: 1995 -> 10.1007/82_2012_263
Fetched again index position: 1997 -> 10.1007/82_2012_271
Fetched again index position: 1999 -> 10.1007/82_2013_317
Fetched again index position: 2006 -> 10.1007/978-0-230-55477-1_14
Fetched again index position: 2007 -> 10.1007/978-0-230-55477-1_4
Fetched again index position: 2020 -> 10.1007/978-0-387-25842-3_2
Fetched again index position: 2021 -> 10.1007/978-0-387-29844-3_12
Fetched again index position: 2022 -> 10.1007/978-0-387-29845-0_1
Fetched again index position: 2046 -> 10.1007/978-0-387-34163-7_17
Fetched again index position: 2048 -> 10.1007/978-0-387-46364-3_2
Fetched again index position: 2049 -> 10.1007/978-0-387-49009-0_10
Fetched again index position: 2053 -> 10.1007/978-0-387-68825-1_1
Fetched again index position: 2056 -> 10.1007/978-

Fetched again index position: 2274 -> 10.1007/978-1-4939-6966-1_4
Fetched again index position: 2278 -> 10.1007/978-1-4939-7431-3_9
Fetched again index position: 2285 -> 10.1007/978-1-4939-9034-4_45
Fetched again index position: 2286 -> 10.1007/978-1-4939-9034-4_47
Fetched again index position: 2290 -> 10.1007/978-1-59259-785-7_29
Fetched again index position: 2291 -> 10.1007/978-1-59259-880-9_21
Fetched again index position: 2292 -> 10.1007/978-1-59259-921-9_10
Fetched again index position: 2298 -> 10.1007/978-1-59745-127-7_17
Fetched again index position: 2308 -> 10.1007/978-1-59745-240-3_1
Fetched again index position: 2309 -> 10.1007/978-1-59745-240-3_2
Fetched again index position: 2311 -> 10.1007/978-1-59745-326-4_13
Fetched again index position: 2312 -> 10.1007/978-1-59745-326-4_15
Fetched again index position: 2313 -> 10.1007/978-1-59745-326-4_16
Fetched again index position: 2320 -> 10.1007/978-1-59745-501-5_21
Fetched again index position: 2321 -> 10.1007/978-1-59745-501-5_8


Fetched again index position: 2869 -> 10.1007/978-3-030-47150-7_14
Fetched again index position: 2890 -> 10.1007/978-3-030-47276-4_11
Fetched again index position: 3023 -> 10.1007/978-3-030-47756-1_1
Fetched again index position: 3024 -> 10.1007/978-3-030-48055-4_4
Fetched again index position: 3061 -> 10.1007/978-3-030-48114-8_2
Fetched again index position: 3062 -> 10.1007/978-3-030-48114-8_8
Fetched again index position: 3064 -> 10.1007/978-3-030-48270-1_5
Fetched again index position: 3065 -> 10.1007/978-3-030-48283-1_3
Fetched again index position: 3073 -> 10.1007/978-3-030-48618-1_1
Fetched again index position: 3074 -> 10.1007/978-3-030-48618-1_10
Fetched again index position: 3075 -> 10.1007/978-3-030-48618-1_11
Fetched again index position: 3076 -> 10.1007/978-3-030-48618-1_12
Fetched again index position: 3077 -> 10.1007/978-3-030-48618-1_2
Fetched again index position: 3078 -> 10.1007/978-3-030-48618-1_6
Fetched again index position: 3079 -> 10.1007/978-3-030-48618-1_7
Fetch

Fetched again index position: 4676 -> 10.1007/978-3-319-30355-0_8
Fetched again index position: 4677 -> 10.1007/978-3-319-30379-6_31
Fetched again index position: 4678 -> 10.1007/978-3-319-30469-4_4
Fetched again index position: 4679 -> 10.1007/978-3-319-30472-4_3
Fetched again index position: 4680 -> 10.1007/978-3-319-30472-4_5
Fetched again index position: 4681 -> 10.1007/978-3-319-30723-7_17
Fetched again index position: 4684 -> 10.1007/978-3-319-32564-4_1
Fetched again index position: 4685 -> 10.1007/978-3-319-32564-4_5
Fetched again index position: 4688 -> 10.1007/978-3-319-33900-9_13
Fetched again index position: 4689 -> 10.1007/978-3-319-33919-1_26
Fetched again index position: 4694 -> 10.1007/978-3-319-40262-8_5
Fetched again index position: 4697 -> 10.1007/978-3-319-41981-7_10
Fetched again index position: 4698 -> 10.1007/978-3-319-42217-6_7
Fetched again index position: 4699 -> 10.1007/978-3-319-42252-7_9
Fetched again index position: 4700 -> 10.1007/978-3-319-42792-8_13
Fetc

Fetched again index position: 4885 -> 10.1007/978-3-540-69480-9_17
Fetched again index position: 4892 -> 10.1007/978-3-540-70995-4_2
Fetched again index position: 4897 -> 10.1007/978-3-540-72108-6_25
Fetched again index position: 4898 -> 10.1007/978-3-540-72296-0_17
Fetched again index position: 4899 -> 10.1007/978-3-540-72296-0_38
Fetched again index position: 4900 -> 10.1007/978-3-540-72296-0_65
Fetched again index position: 4901 -> 10.1007/978-3-540-72296-0_80
Fetched again index position: 4902 -> 10.1007/978-3-540-72296-0_84
Fetched again index position: 4917 -> 10.1007/978-3-540-74083-4_9
Fetched again index position: 4919 -> 10.1007/978-3-540-74891-5_7
Fetched again index position: 4922 -> 10.1007/978-3-540-75837-2_3
Fetched again index position: 4923 -> 10.1007/978-3-540-75863-1_11
Fetched again index position: 4924 -> 10.1007/978-3-540-76460-1_96
Fetched again index position: 4925 -> 10.1007/978-3-540-76839-5_20
Fetched again index position: 4929 -> 10.1007/978-3-540-78392-3_4


Progress: 91/824 Index position: 573 (Data cleaning according to ethical guidelines.)
Progress: 92/824 Index position: 574 (Data cleaning according to ethical guidelines.)
Progress: 93/824 Index position: 575 (Data cleaning according to ethical guidelines.)
Progress: 94/824 Index position: 576 (Data cleaning according to ethical guidelines.)
Progress: 95/824 Index position: 577 (Data cleaning according to ethical guidelines.)
Progress: 96/824 Index position: 578 (Data cleaning according to ethical guidelines.)
Progress: 97/824 Index position: 579 (Data cleaning according to ethical guidelines.)
Progress: 98/824 Index position: 580 (Data cleaning according to ethical guidelines.)
Progress: 99/824 Index position: 581 (Data cleaning according to ethical guidelines.)
Progress: 100/824 Index position: 582 (Data cleaning according to ethical guidelines.)
Progress: 101/824 Index position: 583 (Data cleaning according to ethical guidelines.)
Progress: 102/824 Index position: 584 (Data cleaning

Progress: 194/824 Index position: 1602 (Data cleaning according to ethical guidelines.)
Progress: 195/824 Index position: 1603 (Data cleaning according to ethical guidelines.)
Progress: 196/824 Index position: 1604 (Data cleaning according to ethical guidelines.)
Progress: 197/824 Index position: 1605 (Data cleaning according to ethical guidelines.)
Progress: 198/824 Index position: 1606 (Data cleaning according to ethical guidelines.)
Progress: 199/824 Index position: 1607 (Data cleaning according to ethical guidelines.)
Progress: 200/824 Index position: 1611 (Data cleaning according to ethical guidelines.)
Progress: 201/824 Index position: 1614 (Data cleaning according to ethical guidelines.)
Progress: 202/824 Index position: 1618 (Data cleaning according to ethical guidelines.)
Progress: 203/824 Index position: 1624 (Data cleaning according to ethical guidelines.)
Progress: 204/824 Index position: 1626 (Data cleaning according to ethical guidelines.)
Progress: 205/824 Index position

Progress: 328/824 Index position: 2102 (Data cleaning according to ethical guidelines.)
Progress: 329/824 Index position: 2103 (Data cleaning according to ethical guidelines.)
Progress: 330/824 Index position: 2104 (Data cleaning according to ethical guidelines.)
Progress: 331/824 Index position: 2105 (Data cleaning according to ethical guidelines.)
Progress: 332/824 Index position: 2106 (Data cleaning according to ethical guidelines.)
Progress: 333/824 Index position: 2107 (Data cleaning according to ethical guidelines.)
Progress: 334/824 Index position: 2108 (Data cleaning according to ethical guidelines.)
Progress: 335/824 Index position: 2115 (Data cleaning according to ethical guidelines.)
Progress: 336/824 Index position: 2116 (Data cleaning according to ethical guidelines.)
Progress: 337/824 Index position: 2117 (Data cleaning according to ethical guidelines.)
Progress: 338/824 Index position: 2118 (Data cleaning according to ethical guidelines.)
Progress: 339/824 Index position

Progress: 471/824 Index position: 2408 (Data cleaning according to ethical guidelines.)
Progress: 472/824 Index position: 2409 (Data cleaning according to ethical guidelines.)
Progress: 473/824 Index position: 2410 (Data cleaning according to ethical guidelines.)
Progress: 474/824 Index position: 2411 (Data cleaning according to ethical guidelines.)
Progress: 475/824 Index position: 2412 (Data cleaning according to ethical guidelines.)
Progress: 476/824 Index position: 2413 (Data cleaning according to ethical guidelines.)
Progress: 477/824 Index position: 2414 (Data cleaning according to ethical guidelines.)
Progress: 478/824 Index position: 2415 (Data cleaning according to ethical guidelines.)
Progress: 479/824 Index position: 2421 (Data cleaning according to ethical guidelines.)
Progress: 480/824 Index position: 2423 (Data cleaning according to ethical guidelines.)
Progress: 481/824 Index position: 2425 (Data cleaning according to ethical guidelines.)
Progress: 482/824 Index position

Progress: 589/824 Index position: 4548 (Data cleaning according to ethical guidelines.)
Progress: 590/824 Index position: 4549 (Data cleaning according to ethical guidelines.)
Progress: 591/824 Index position: 4550 (Data cleaning according to ethical guidelines.)
Progress: 592/824 Index position: 4552 (Data cleaning according to ethical guidelines.)
Progress: 593/824 Index position: 4553 (Data cleaning according to ethical guidelines.)
Progress: 594/824 Index position: 4554 (Data cleaning according to ethical guidelines.)
Progress: 595/824 Index position: 4555 (Data cleaning according to ethical guidelines.)
Progress: 596/824 Index position: 4556 (Data cleaning according to ethical guidelines.)
Progress: 597/824 Index position: 4557 (Data cleaning according to ethical guidelines.)
Progress: 598/824 Index position: 4558 (Data cleaning according to ethical guidelines.)
Progress: 599/824 Index position: 4559 (Data cleaning according to ethical guidelines.)
Progress: 600/824 Index position

Progress: 745/824 Index position: 4839 (Data cleaning according to ethical guidelines.)
Progress: 746/824 Index position: 4841 (Data cleaning according to ethical guidelines.)
Progress: 747/824 Index position: 4842 (Data cleaning according to ethical guidelines.)
Progress: 748/824 Index position: 4843 (Data cleaning according to ethical guidelines.)
Progress: 749/824 Index position: 4844 (Data cleaning according to ethical guidelines.)
Progress: 750/824 Index position: 4845 (Data cleaning according to ethical guidelines.)
Progress: 751/824 Index position: 4846 (Data cleaning according to ethical guidelines.)
Progress: 752/824 Index position: 4847 (Data cleaning according to ethical guidelines.)
Progress: 753/824 Index position: 4848 (Data cleaning according to ethical guidelines.)
Progress: 754/824 Index position: 4849 (Data cleaning according to ethical guidelines.)
Progress: 755/824 Index position: 4850 (Data cleaning according to ethical guidelines.)
Progress: 756/824 Index position