# CORD-19-collect-scopus-data

This Jupyter notebook collects additional data from Scopus to extend the CORD-19 dataset: 
https://datadryad.org/stash/dataset/doi:10.5061/dryad.vmcvdncs0

First, the required packages are imported into the notebook.

In [1]:
import ast
import collections
import csv
import datetime
import json
import re
import time  # for sleep
from collections import Counter
from urllib.parse import urlparse

import Levenshtein as lev
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
from elsapy.elsclient import ElsClient
from elsapy.elsdoc import AbsDoc, FullDoc
from elsapy.elssearch import ElsSearch
from fuzzywuzzy import fuzz
from pybtex.database import BibliographyData, Entry, parse_file

Load the dataset and assign it to a variable.

In [2]:
CORD19_CSV = pd.read_csv('../data/cord-19/CORD19_software_mentions.csv')

Check the length of the column containing the DOIs.

In [3]:
len(CORD19_CSV['doi'])

77448

Display the doi column to check for inconsistencies such as NaN values.

In [4]:
doi = CORD19_CSV['doi']
doi

0                                 NaN
1          10.1016/j.regg.2021.01.002
2           10.1016/j.rec.2020.08.002
3        10.1016/j.vetmic.2006.11.026
4                   10.3390/v12080849
                     ...             
77443      10.1007/s11229-020-02869-9
77444                             NaN
77445     10.1101/2020.05.13.20100206
77446      10.1007/s42991-020-00052-8
77447     10.1101/2020.09.14.20194670
Name: doi, Length: 77448, dtype: object

Create a Series of the unique DOIs together with their counts; value_counts() leaves out NaN values by default. It is important to sort the result by index. Otherwise, the order can differ after each restart of the notebook, and a position would no longer refer to the same DOI. 

In [5]:
doi_counted = doi.value_counts().sort_index(ascending=True)
doi_counted

10.1001/jamainternmed.2020.1369       1
10.1001/jamanetworkopen.2020.16382    1
10.1001/jamanetworkopen.2020.17521    1
10.1001/jamanetworkopen.2020.20485    1
10.1001/jamanetworkopen.2020.24984    1
                                     ..
10.9745/ghsp-d-20-00115               1
10.9745/ghsp-d-20-00171               1
10.9745/ghsp-d-20-00218               1
10.9758/cpn.2020.18.4.607             1
10.9781/ijimai.2020.02.002            1
Name: doi, Length: 74302, dtype: int64
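
To illustrate the step above, here is a minimal sketch on a toy Series. The DOIs are made up for illustration and are not taken from the dataset. It assumes only pandas and shows that value_counts() skips NaN entries by default and that sort_index() yields an order that depends solely on the DOI strings.

```python
import numpy as np
import pandas as pd

# Toy data (hypothetical DOIs), including a duplicate and two missing values
toy_doi = pd.Series(["10.2000/b", np.nan, "10.1000/a", "10.2000/b", np.nan])

# value_counts() drops NaN by default; sort_index() orders the result by DOI
toy_counted = toy_doi.value_counts().sort_index(ascending=True)

print(toy_counted)
print(len(toy_counted))  # number of unique, non-missing DOIs
```

Because the positions in the sorted index are later used as row positions, a stable order is what makes an interrupted run resumable.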

The following function retrieves the requested information from the Scopus API. (https://api.elsevier.com/content/search/scopus?query=DOI(10.1109/MCOM.2016.7509373)&apiKey=6d485ef1fe1408712f37e8a783a285a4)

In [6]:
# Adapted from https://github.com/ElsevierDev/elsapy/blob/master/exampleProg.py
def fetch_scopus_api(client, doi):
    """Obtain additional paper information from Scopus by DOI.

    Returns the data of the abstract document, or None if nothing could be retrieved.
    """
    # Search Scopus for the document with the given DOI
    doc_srch = ElsSearch("DOI(" + doi + ")", 'scopus')
    doc_srch.execute(client, get_all=True)
    try:
        # Keep only the part of "dc:identifier" after the colon
        scopus_id = doc_srch.results[0]["dc:identifier"].split(":")[1]
        scp_doc = AbsDoc(scp_id=scopus_id)
        if scp_doc.read(client):
            # Write the retrieved document to disk
            scp_doc.write()
        else:
            print("Read document failed.")
        return scp_doc.data
    except Exception:
        # No usable search result, e.g. the DOI is unknown to Scopus
        return None
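
The two string operations inside the function can be tried without network access. The sketch below is only an illustration: the helper names build_doi_query and extract_scopus_id are hypothetical, and the shape of the sample results (an identifier of the form "SCOPUS_ID:<digits>" and an "error" entry for an empty result set) is an assumption about what the search returns.

```python
def build_doi_query(doi):
    """Build the Scopus search expression for a single DOI."""
    return "DOI(" + doi + ")"


def extract_scopus_id(results):
    """Return the part of "dc:identifier" after the colon, or None."""
    if not results:
        return None
    identifier = results[0].get("dc:identifier")
    if identifier is None:
        return None
    # Assumed format: "SCOPUS_ID:<digits>"
    _, _, scopus_id = identifier.partition(":")
    return scopus_id or None


# Hypothetical search results, reduced to the field used by fetch_scopus_api
found = [{"dc:identifier": "SCOPUS_ID:85083171050"}]
not_found = [{"error": "Result set was empty"}]

print(build_doi_query("10.1016/j.dsx.2020.04.012"))
print(extract_scopus_id(found))
print(extract_scopus_id(not_found))
```

Checking for the missing key explicitly, instead of relying on a broad except clause, makes it easier to tell an unknown DOI apart from other failures.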

The configuration file is set up to contain the API key. Further information: https://github.com/ElsevierDev/elsapy/blob/master/CONFIG.md

In [7]:
with open("config.json") as con_file:
    config = json.load(con_file)
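
As a sketch of what the configuration step expects, the snippet below writes a throwaway config file with a placeholder key and reads it back. The helper load_api_key is hypothetical and not part of elsapy; the only assumption about the file is that it is a JSON object with an "apikey" entry, which is the entry the notebook reads.

```python
import json
import os
import tempfile


def load_api_key(path):
    """Read a JSON config file and return its "apikey" entry."""
    with open(path) as config_file:
        config = json.load(config_file)
    if not config.get("apikey"):
        raise ValueError("config file does not contain an 'apikey' entry")
    return config["apikey"]


# Write a throwaway config file with a placeholder key, then read it back
with tempfile.TemporaryDirectory() as tmp_dir:
    example_path = os.path.join(tmp_dir, "config.json")
    with open(example_path, "w") as config_file:
        json.dump({"apikey": "YOUR_API_KEY"}, config_file)
    example_key = load_api_key(example_path)

print(example_key)
```

Keeping the key in a separate file that is excluded from version control avoids publishing it together with the notebook.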

The client is then initialized with the API key.

In [8]:
client = ElsClient(config['apikey'])

For demonstration purposes, the following cell shows which data is returned by the Scopus API. 

In [9]:
return_example = fetch_scopus_api(client, '10.1016/j.dsx.2020.04.012')
print(json.dumps(return_example, indent=2))

{
  "affiliation": [
    {
      "affiliation-city": "New Delhi",
      "affilname": "Jamia Hamdard",
      "affiliation-country": "India"
    },
    {
      "affiliation-city": "New Delhi",
      "affilname": "Jamia Millia Islamia",
      "affiliation-country": "India"
    },
    {
      "affiliation-city": "New Delhi",
      "affilname": "Indraprastha Apollo Hospitals",
      "affiliation-country": "India"
    }
  ],
  "coredata": {
    "srctype": "j",
    "eid": "2-s2.0-85083171050",
    "pubmed-id": "32305024",
    "prism:coverDate": "2020-07-01",
    "prism:aggregationType": "Journal",
    "prism:url": "https://api.elsevier.com/content/abstract/scopus_id/85083171050",
    "dc:creator": {
      "author": [
        {
          "ce:given-name": "Raju",
          "preferred-name": {
            "ce:given-name": "Raju",
            "ce:initials": "R.",
            "ce:surname": "Vaishya",
            "ce:indexed-name": "Vaishya R."
          },
          "@seq": "1",
          "ce:init

Based on the returned data, further analysis can be conducted. For this purpose, two notebooks are created to analyse the data linked to: 
<ul>
  <li>affiliation</li>
  <li>coredata</li>
</ul>    
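
When analysing the affiliation data, note that an entry can be a list of dictionaries or a single dictionary; both shapes occur in the stored data of this notebook. The following sketch shows one possible way to handle both, assuming the keys shown in the example output. The function name and the single-dictionary sample are hypothetical.

```python
def affiliation_countries(affiliation):
    """Return the countries of an affiliation entry as a list."""
    if affiliation is None:
        return []
    # A single affiliation may arrive as a dict rather than a one-element list
    entries = [affiliation] if isinstance(affiliation, dict) else list(affiliation)
    return [entry.get("affiliation-country") for entry in entries]


several = [
    {"affiliation-city": "New Delhi", "affilname": "Jamia Hamdard", "affiliation-country": "India"},
    {"affiliation-city": "New Delhi", "affilname": "Jamia Millia Islamia", "affiliation-country": "India"},
]
# Hypothetical single-affiliation entry
single = {"affiliation-city": "Example City", "affilname": "Example Institute", "affiliation-country": "Exampleland"}

print(affiliation_countries(several))
print(affiliation_countries(single))
print(affiliation_countries(None))
```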

The already fetched coredata and affiliation are read and combined into a DataFrame for further processing.

In [10]:
df_current_extra_info = pd.DataFrame()
try:
    read_affiliation = pd.read_pickle('extra_info_affiliation_CS.pkl')
    read_coredata = pd.read_pickle('extra_info_coredata_CS.pkl')
    df_current_extra_info['affiliation'] = read_affiliation
    df_current_extra_info['coredata'] = read_coredata
except FileNotFoundError:
    # Nothing has been stored yet, so the DataFrame stays empty
    print("The DataFrame is empty")

The length of the DataFrame containing the current information is assigned to a variable. 
This length is used as the starting index of the while loop that fetches the remaining DOIs. 

In [11]:
len_df_current_extra_info = len(df_current_extra_info)
len_df_current_extra_info

74302

In [12]:
df_current_extra_info

Unnamed: 0,affiliation,coredata
0,"[{'affiliation-city': 'Palo Alto', 'affilname'...","{'srctype': 'j', 'eid': '2-s2.0-85083266658', ..."
1,"[{'affiliation-city': 'Seattle', 'affilname': ...","{'srctype': 'j', 'prism:issueIdentifier': '7',..."
2,"[{'affiliation-city': 'Cambridge', 'affilname'...","{'srctype': 'j', 'prism:issueIdentifier': '8',..."
3,"[{'affiliation-city': 'Madison', 'affilname': ...","{'srctype': 'j', 'prism:issueIdentifier': '9',..."
4,"[{'affiliation-city': 'Los Angeles', 'affilnam...","{'srctype': 'j', 'prism:issueIdentifier': '11'..."
...,...,...
74297,"{'affiliation-city': 'Rockville', 'affilname':...","{'srctype': 'j', 'eid': '2-s2.0-85092678139', ..."
74298,"[{'affiliation-city': 'Glastonbury', 'affilnam...","{'srctype': 'j', 'eid': '2-s2.0-85087468210', ..."
74299,"{'affiliation-city': 'Newark', 'affilname': 'U...","{'srctype': 'j', 'eid': '2-s2.0-85092677974', ..."
74300,"[{'affiliation-city': 'Istanbul', 'affilname':...","{'srctype': 'j', 'prism:issueIdentifier': '4',..."


In [13]:
def contains_only_None(dic):
    """
    This function inspects a dictionary and returns True if it solely contains None values.
    An empty dictionary also yields True.
    """
    return all(value is None for value in dic.values())
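
A quick check of the edge cases may help here. The function is re-declared in an equivalent form so that the snippet runs on its own; the sample dictionaries are made up for illustration.

```python
def contains_only_None(dic):
    """Return True if every value of the dictionary is None."""
    return all(value is None for value in dic.values())


print(contains_only_None({0: None, 1: None}))              # True
print(contains_only_None({0: None, 1: {"coredata": {}}}))  # False
print(contains_only_None({}))                              # True
```

The last case matters in practice: when no new DOI is fetched, the dictionary is empty and is treated like a dictionary that holds only None values.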

In [14]:
def append_fetched_data_to_df(df_current_extra_info, dic):
    """
    This function appends or inserts newly fetched data to the DataFrame containing Scopus data.
    """
    # df_current_extra_info -> holds the latest data; new data needs to be appended to it
    # df_newly_fetched_transposed -> holds newly fetched data; needs to be inserted or appended

    if contains_only_None(dic):
        # Nothing usable was fetched: create one empty placeholder row per position
        placeholder_entries = pd.DataFrame(np.empty((len(dic), 2), dtype=object), columns=['affiliation', 'coredata'], index=list(dic.keys()))
        df_newly_fetched_transposed = placeholder_entries
        print(placeholder_entries)
    else:
        # Prior to appending, the dictionary is transformed to a DataFrame
        df_newly_fetched = pd.DataFrame(dic)
        # Transpose so that each fetched position becomes a row
        df_newly_fetched_transposed = df_newly_fetched.T
        print(df_newly_fetched_transposed)

    for index, row in df_newly_fetched_transposed.iterrows():
        if index in df_current_extra_info.index:
            # The row already exists: replace it only if new data was fetched
            if row.affiliation is not None:
                df_current_extra_info.loc[index] = row
        else:
            # The row is new: append it
            # (DataFrame.append was removed in pandas 2.0; pd.concat replaces it)
            df_current_extra_info = pd.concat([df_current_extra_info, row.to_frame().T], ignore_index=True)

    # Return the DataFrame with inserted and replaced rows
    return df_current_extra_info

Each of the two DataFrame columns is stored as a Series object, and each Series is written to its own pkl file. This keeps every file below 100 MB, which allows uploading to GitHub.

In [15]:
def store_df_columns(df):
    ser_affiliation = df['affiliation']
    ser_coredata = df['coredata']
    ser_affiliation.to_pickle('extra_info_affiliation_CS.pkl')
    ser_coredata.to_pickle('extra_info_coredata_CS.pkl')
    return ser_affiliation, ser_coredata
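
Whether a pickled column actually stays below the size limit can be checked right after writing it. The following is a sketch under assumptions: the helper pickle_and_check and the toy Series are hypothetical, and the limit is taken to be 100 MiB per file.

```python
import os
import tempfile

import pandas as pd

SIZE_LIMIT_BYTES = 100 * 1024 * 1024  # assumed limit of 100 MiB per file


def pickle_and_check(series, path, limit=SIZE_LIMIT_BYTES):
    """Pickle a Series and return its file size and whether it is below the limit."""
    series.to_pickle(path)
    size = os.path.getsize(path)
    return size, size < limit


# Toy column shaped like the affiliation data (hypothetical values)
toy = pd.Series([[{"affilname": "Example Institute"}], None], name="affiliation")

with tempfile.TemporaryDirectory() as tmp_dir:
    toy_path = os.path.join(tmp_dir, "toy_affiliation.pkl")
    toy_size, toy_ok = pickle_and_check(toy, toy_path)
    restored = pd.read_pickle(toy_path)

print(toy_size, toy_ok)
print(restored.tolist() == toy.tolist())
```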

In [16]:
# placeholder_entries = pd.DataFrame(np.empty((4,2),dtype=object),columns=['affiliation','coredata'])

In [17]:
# placeholder_entries

Subsequently, the fetched Scopus data is stored in a dictionary. The print function shows the progress by displaying the most recently fetched position and DOI. 

In [18]:
%%time
dict_new_extra_info = dict()
len_dois = len(doi_counted)
def trigger_fetching():
    threshold = 0 
    i = len_df_current_extra_info
    while i < len_dois: #-> upto modified, normally len_dois
        dict_new_extra_info[i] = fetch_scopus_api(client, doi_counted.index[i])
        print("Position fetched: " + str(i) + " -> " +  doi_counted.index[i])
        i = i + 1 
#         threshold = threshold + 1
#         if threshold > 99:
#             df_combined_extra_info = append_fetched_data_to_df(df_current_extra_info, dict_new_extra_info)
#             stored_series = store_df_columns(df_combined_extra_info)
#             threshold = 0
#             print("batch saved")
trigger_fetching()

Wall time: 6.02 ms
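
The commented-out threshold logic in the cell above hints at saving intermediate results in batches. A minimal sketch of that idea follows; it is not the notebook's implementation. The function fetch_with_checkpoints is hypothetical, fetch stands in for a call such as fetch_scopus_api, and a plain pickle file replaces the two pkl files used here.

```python
import os
import pickle
import tempfile


def fetch_with_checkpoints(dois, fetch, checkpoint_path, batch_size=100):
    """Fetch every DOI once, resuming from the checkpoint file if it exists."""
    results = []
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path, "rb") as checkpoint_file:
            results = pickle.load(checkpoint_file)

    # The number of stored results is the starting position
    for position in range(len(results), len(dois)):
        results.append(fetch(dois[position]))
        if len(results) % batch_size == 0:
            with open(checkpoint_path, "wb") as checkpoint_file:
                pickle.dump(results, checkpoint_file)

    # Final save so that an incomplete last batch is not lost
    with open(checkpoint_path, "wb") as checkpoint_file:
        pickle.dump(results, checkpoint_file)
    return results


# Stand-in for the API call; records which DOIs were requested
requested = []


def fake_fetch(doi):
    requested.append(doi)
    return {"doi": doi}


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "checkpoint.pkl")
    first_run = fetch_with_checkpoints(["a", "b", "c"], fake_fetch, path, batch_size=2)
    second_run = fetch_with_checkpoints(["a", "b", "c", "d"], fake_fetch, path, batch_size=2)

print(requested)  # each DOI is requested only once across both runs
```

With this pattern an interruption costs at most one batch of requests, at the price of rewriting the checkpoint file after every batch.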


The following cell is useful when the process above is interrupted. It narrows the dictionary of fetched information down to the useful entries. 

In [19]:
# def save_new_extra_info(len_df_current_extra_info, upto):
#     """
#     This function is used to separate successful API calls from API calls which were prevented due to an invalid API key.
#     As a result, this function returns a range of valid entries up to the given parameter. 
#     """
#     dict_new_extra_info_saver = dict()
#     i = len_df_current_extra_info
#     while i < upto:
#         #print("Position: " + str(i) + " -> " +  doi_counted.index[i])
#         dict_new_extra_info_saver[i] = dict_new_extra_info[i]
#         i = i + 1 
#     return dict_new_extra_info_saver
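
The same narrowing can be written more compactly with a dictionary comprehension. This is a sketch with a hypothetical function name and made-up sample data; unlike the commented-out version, it skips positions that are absent from the dictionary instead of raising a KeyError.

```python
def keep_entries(fetched, start, upto):
    """Return the entries of fetched whose position lies in [start, upto)."""
    return {position: fetched[position] for position in range(start, upto) if position in fetched}


# Hypothetical state after an interruption: positions 12 and 13 failed
fetched = {10: {"coredata": "a"}, 11: {"coredata": "b"}, 12: None, 13: None}
usable = keep_entries(fetched, start=10, upto=12)

print(usable)
```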

The existing and the newly fetched information is combined into one DataFrame. 

In [20]:
df_combined_extra_info = append_fetched_data_to_df(df_current_extra_info, dict_new_extra_info)
df_combined_extra_info

Empty DataFrame
Columns: [affiliation, coredata]
Index: []


Unnamed: 0,affiliation,coredata
0,"[{'affiliation-city': 'Palo Alto', 'affilname'...","{'srctype': 'j', 'eid': '2-s2.0-85083266658', ..."
1,"[{'affiliation-city': 'Seattle', 'affilname': ...","{'srctype': 'j', 'prism:issueIdentifier': '7',..."
2,"[{'affiliation-city': 'Cambridge', 'affilname'...","{'srctype': 'j', 'prism:issueIdentifier': '8',..."
3,"[{'affiliation-city': 'Madison', 'affilname': ...","{'srctype': 'j', 'prism:issueIdentifier': '9',..."
4,"[{'affiliation-city': 'Los Angeles', 'affilnam...","{'srctype': 'j', 'prism:issueIdentifier': '11'..."
...,...,...
74297,"{'affiliation-city': 'Rockville', 'affilname':...","{'srctype': 'j', 'eid': '2-s2.0-85092678139', ..."
74298,"[{'affiliation-city': 'Glastonbury', 'affilnam...","{'srctype': 'j', 'eid': '2-s2.0-85087468210', ..."
74299,"{'affiliation-city': 'Newark', 'affilname': 'U...","{'srctype': 'j', 'eid': '2-s2.0-85092677974', ..."
74300,"[{'affiliation-city': 'Istanbul', 'affilname':...","{'srctype': 'j', 'prism:issueIdentifier': '4',..."


In [21]:
#too big for GitHub
#df_combined_extra_info.to_csv('extra_info_CS5099.csv', sep='\t')

The two columns of the combined DataFrame are now stored, each as a Series in its own pkl file, so that no file exceeds 100 MB and GitHub uploads remain possible.

In [22]:
stored_series = store_df_columns(df_combined_extra_info)
stored_series[0]

0        [{'affiliation-city': 'Palo Alto', 'affilname'...
1        [{'affiliation-city': 'Seattle', 'affilname': ...
2        [{'affiliation-city': 'Cambridge', 'affilname'...
3        [{'affiliation-city': 'Madison', 'affilname': ...
4        [{'affiliation-city': 'Los Angeles', 'affilnam...
                               ...                        
74297    {'affiliation-city': 'Rockville', 'affilname':...
74298    [{'affiliation-city': 'Glastonbury', 'affilnam...
74299    {'affiliation-city': 'Newark', 'affilname': 'U...
74300    [{'affiliation-city': 'Istanbul', 'affilname':...
74301                                                 None
Name: affiliation, Length: 74302, dtype: object

In [23]:
stored_series[1]

0        {'srctype': 'j', 'eid': '2-s2.0-85083266658', ...
1        {'srctype': 'j', 'prism:issueIdentifier': '7',...
2        {'srctype': 'j', 'prism:issueIdentifier': '8',...
3        {'srctype': 'j', 'prism:issueIdentifier': '9',...
4        {'srctype': 'j', 'prism:issueIdentifier': '11'...
                               ...                        
74297    {'srctype': 'j', 'eid': '2-s2.0-85092678139', ...
74298    {'srctype': 'j', 'eid': '2-s2.0-85087468210', ...
74299    {'srctype': 'j', 'eid': '2-s2.0-85092677974', ...
74300    {'srctype': 'j', 'prism:issueIdentifier': '4',...
74301                                                 None
Name: coredata, Length: 74302, dtype: object

Verify that the returned None values are due to nonexistent data and not to an invalid API key.

In [24]:
def enrich_data():
    """
    This function queries the Scopus API again, solely for entries which previously returned None.
    """
    # Add a new column to the DataFrame containing the DOIs which are used to query the API
    ser_doi = pd.Series(doi_counted.index[:len_data])
    df_current_extra_info_checker = df_combined_extra_info
    df_current_extra_info_checker['doi'] = ser_doi

    # Fetch solely None entries; this run is restricted to positions 40000 to 44999
    dict_new_extra_info_checker = dict()
    len_df_current_extra_info_checker = 45000
    i = 40000
    while i < len_df_current_extra_info_checker:
        if df_current_extra_info_checker['affiliation'][i] is None:
            dict_new_extra_info_checker[i] = fetch_scopus_api(client, ser_doi[i])
            print("Position fetched again: " + str(i) + " -> " + ser_doi[i])
        i = i + 1
    # Check if at least one value is not None, otherwise the process is finished
    if contains_only_None(dict_new_extra_info_checker):
        print("The Scopus API did not return new information for existing None values.")
    else:
        # There is new information to insert into the existing DataFrame
        df_combined_extra_info_fetched_again = append_fetched_data_to_df(df_current_extra_info, dict_new_extra_info_checker)
        store_df_columns(df_combined_extra_info_fetched_again)
        # Show an extract of the newly fetched and inserted data
#         df_combined_extra_info_fetched_again['check_doi'] = ser_doi
#         df_combined_extra_info_fetched_again.head(30)
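
Selecting the positions that need a second request can also be expressed with a boolean mask instead of a while loop. The sketch below uses a toy DataFrame with made-up DOIs and affiliations; the window bounds start and stop are hypothetical stand-ins for the fixed range used in enrich_data.

```python
import pandas as pd

toy = pd.DataFrame({
    "doi": ["10.1000/a", "10.1000/b", "10.1000/c", "10.1000/d"],
    "affiliation": pd.Series(
        [[{"affilname": "Example Institute"}], None, None, {"affilname": "Example Lab"}],
        dtype=object,
    ),
})

# isna() treats None in an object column as missing
missing = toy["affiliation"].isna()

# Restrict the selection to a window of positions
start, stop = 1, 3
in_window = (toy.index >= start) & (toy.index < stop)
to_refetch = toy.loc[missing & in_window, "doi"]

print(to_refetch.to_dict())
```

The resulting Series maps each position to the DOI that should be requested again, which is the information the loop in enrich_data prints one line at a time.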

In [25]:
len_dois = len(doi_counted)
len_dois

74302

In [26]:
len_data = len(stored_series[0])
len_data 

74302

In [27]:
if len_dois == len_data:
    enrich_data()
else:
    print("There are entries which are not yet fetched from the scopus API.")

Position fetched again: 40270 -> 10.1097/nen.0b013e3181a407ee
Position fetched again: 40274 -> 10.1097/or9.0000000000000032
Position fetched again: 40308 -> 10.1097/pq9.0000000000000242
Position fetched again: 40309 -> 10.1097/pq9.0000000000000255
Position fetched again: 40310 -> 10.1097/pq9.0000000000000257
Position fetched again: 40311 -> 10.1097/pq9.0000000000000258
Position fetched again: 40312 -> 10.1097/pq9.0000000000000259
Position fetched again: 40313 -> 10.1097/pq9.0000000000000260
Position fetched again: 40314 -> 10.1097/pq9.0000000000000263
Position fetched again: 40315 -> 10.1097/pq9.0000000000000264
Position fetched again: 40316 -> 10.1097/pq9.0000000000000265
Position fetched again: 40317 -> 10.1097/pq9.0000000000000266
Position fetched again: 40318 -> 10.1097/pq9.0000000000000267
Position fetched again: 40319 -> 10.1097/pq9.0000000000000268
Position fetched again: 40320 -> 10.1097/pq9.0000000000000269
Position fetched again: 40321 -> 10.1097/pq9.0000000000000270
Position

Position fetched again: 40679 -> 10.1101/2020.01.29.925354
Position fetched again: 40680 -> 10.1101/2020.01.29.925867
Position fetched again: 40681 -> 10.1101/2020.01.30.20019877
Position fetched again: 40682 -> 10.1101/2020.01.30.926477
Position fetched again: 40683 -> 10.1101/2020.01.30.927889
Position fetched again: 40684 -> 10.1101/2020.01.31.20019265
Position fetched again: 40685 -> 10.1101/2020.01.31.928796
Position fetched again: 40686 -> 10.1101/2020.01.31.929042
Position fetched again: 40687 -> 10.1101/2020.01.31.929547
Position fetched again: 40688 -> 10.1101/2020.01.31.929695
Position fetched again: 40689 -> 10.1101/2020.02.01.930537
Position fetched again: 40690 -> 10.1101/2020.02.03.20019497
Position fetched again: 40691 -> 10.1101/2020.02.03.20020206
Position fetched again: 40692 -> 10.1101/2020.02.03.20020263
Position fetched again: 40693 -> 10.1101/2020.02.03.20020289
Position fetched again: 40694 -> 10.1101/2020.02.03.931766
Position fetched again: 40695 -> 10.1101/202

Position fetched again: 40816 -> 10.1101/2020.02.24.20027623
Position fetched again: 40817 -> 10.1101/2020.02.24.963348
Position fetched again: 40818 -> 10.1101/2020.02.25.20024398
Position fetched again: 40819 -> 10.1101/2020.02.25.20024711
Position fetched again: 40820 -> 10.1101/2020.02.25.20027433
Position fetched again: 40821 -> 10.1101/2020.02.25.20027755
Position fetched again: 40822 -> 10.1101/2020.02.25.960930
Position fetched again: 40823 -> 10.1101/2020.02.25.965434
Position fetched again: 40824 -> 10.1101/2020.02.25.965582
Position fetched again: 40825 -> 10.1101/2020.02.26.20026971
Position fetched again: 40826 -> 10.1101/2020.02.26.20027938
Position fetched again: 40827 -> 10.1101/2020.02.26.20028191
Position fetched again: 40828 -> 10.1101/2020.02.26.20028225
Position fetched again: 40829 -> 10.1101/2020.02.26.20028308
Position fetched again: 40830 -> 10.1101/2020.02.26.20028373
Position fetched again: 40831 -> 10.1101/2020.02.26.961938
Position fetched again: 40832 -> 1

Position fetched again: 40952 -> 10.1101/2020.03.12.20027185
Position fetched again: 40953 -> 10.1101/2020.03.12.20034678
Position fetched again: 40954 -> 10.1101/2020.03.12.20034686
Position fetched again: 40955 -> 10.1101/2020.03.12.20034728
Position fetched again: 40956 -> 10.1101/2020.03.12.20034736
Position fetched again: 40957 -> 10.1101/2020.03.12.20034793
Position fetched again: 40958 -> 10.1101/2020.03.12.20035048
Position fetched again: 40959 -> 10.1101/2020.03.12.988246
Position fetched again: 40960 -> 10.1101/2020.03.12.988634
Position fetched again: 40961 -> 10.1101/2020.03.13.20033290
Position fetched again: 40962 -> 10.1101/2020.03.13.20034082
Position fetched again: 40963 -> 10.1101/2020.03.13.20034496
Position fetched again: 40964 -> 10.1101/2020.03.13.20035345
Position fetched again: 40965 -> 10.1101/2020.03.13.20035568
Position fetched again: 40966 -> 10.1101/2020.03.13.990242
Position fetched again: 40967 -> 10.1101/2020.03.13.990267
Position fetched again: 40968 ->

Position fetched again: 41088 -> 10.1101/2020.03.22.20040642
Position fetched again: 41089 -> 10.1101/2020.03.22.20040758
Position fetched again: 41090 -> 10.1101/2020.03.22.20040782
Position fetched again: 41091 -> 10.1101/2020.03.22.20040832
Position fetched again: 41092 -> 10.1101/2020.03.22.20040899
Position fetched again: 41093 -> 10.1101/2020.03.22.20040915
Position fetched again: 41094 -> 10.1101/2020.03.22.20040949
Position fetched again: 41095 -> 10.1101/2020.03.22.20040964
Position fetched again: 41096 -> 10.1101/2020.03.22.20041061
Position fetched again: 41097 -> 10.1101/2020.03.22.20041145
Position fetched again: 41098 -> 10.1101/2020.03.22.20041244
Position fetched again: 41099 -> 10.1101/2020.03.22.20041277
Position fetched again: 41100 -> 10.1101/2020.03.23.002931
Position fetched again: 41101 -> 10.1101/2020.03.23.004176
Position fetched again: 41102 -> 10.1101/2020.03.23.004580
Position fetched again: 41103 -> 10.1101/2020.03.23.20034058
Position fetched again: 41104 

Position fetched again: 41224 -> 10.1101/2020.03.29.008631
Position fetched again: 41225 -> 10.1101/2020.03.29.009464
Position fetched again: 41226 -> 10.1101/2020.03.29.013342
Position fetched again: 41227 -> 10.1101/2020.03.29.013490
Position fetched again: 41228 -> 10.1101/2020.03.29.014209
Position fetched again: 41229 -> 10.1101/2020.03.29.014381
Position fetched again: 41230 -> 10.1101/2020.03.29.014407
Position fetched again: 41231 -> 10.1101/2020.03.29.014761
Position fetched again: 41232 -> 10.1101/2020.03.29.20039693
Position fetched again: 41233 -> 10.1101/2020.03.29.20041962
Position fetched again: 41234 -> 10.1101/2020.03.29.20044461
Position fetched again: 41235 -> 10.1101/2020.03.29.20045187
Position fetched again: 41236 -> 10.1101/2020.03.29.20046565
Position fetched again: 41237 -> 10.1101/2020.03.29.20046789
Position fetched again: 41238 -> 10.1101/2020.03.29.20046862
Position fetched again: 41239 -> 10.1101/2020.03.29.20046870
Position fetched again: 41240 -> 10.1101

Position fetched again: 41360 -> 10.1101/2020.04.03.020602
Position fetched again: 41361 -> 10.1101/2020.04.03.022723
Position fetched again: 41362 -> 10.1101/2020.04.03.022939
Position fetched again: 41363 -> 10.1101/2020.04.03.023135
Position fetched again: 41364 -> 10.1101/2020.04.03.024075
Position fetched again: 41365 -> 10.1101/2020.04.03.024521
Position fetched again: 41366 -> 10.1101/2020.04.03.024539
Position fetched again: 41367 -> 10.1101/2020.04.03.024885
Position fetched again: 41368 -> 10.1101/2020.04.03.20043992
Position fetched again: 41369 -> 10.1101/2020.04.03.20047175
Position fetched again: 41370 -> 10.1101/2020.04.03.20047977
Position fetched again: 41371 -> 10.1101/2020.04.03.20048389
Position fetched again: 41372 -> 10.1101/2020.04.03.20048868
Position fetched again: 41373 -> 10.1101/2020.04.03.20049734
Position fetched again: 41374 -> 10.1101/2020.04.03.20051649
Position fetched again: 41375 -> 10.1101/2020.04.03.20051706
Position fetched again: 41376 -> 10.1101

Position fetched again: 41496 -> 10.1101/2020.04.07.20056432
Position fetched again: 41497 -> 10.1101/2020.04.07.20056739
Position fetched again: 41498 -> 10.1101/2020.04.07.20056754
Position fetched again: 41499 -> 10.1101/2020.04.07.20056788
Position fetched again: 41500 -> 10.1101/2020.04.07.20056804
Position fetched again: 41501 -> 10.1101/2020.04.07.20056937
Position fetched again: 41502 -> 10.1101/2020.04.07.20057216
Position fetched again: 41503 -> 10.1101/2020.04.07.20057224
Position fetched again: 41504 -> 10.1101/2020.04.07.20057299
Position fetched again: 41505 -> 10.1101/2020.04.07.20057356
Position fetched again: 41506 -> 10.1101/2020.04.08.013516
Position fetched again: 41507 -> 10.1101/2020.04.08.031203
Position fetched again: 41508 -> 10.1101/2020.04.08.031435
Position fetched again: 41509 -> 10.1101/2020.04.08.031526
Position fetched again: 41510 -> 10.1101/2020.04.08.031856
Position fetched again: 41511 -> 10.1101/2020.04.08.031948
Position fetched again: 41512 -> 10.

Position fetched again: 41632 -> 10.1101/2020.04.12.20061929
Position fetched again: 41633 -> 10.1101/2020.04.12.20062380
Position fetched again: 41634 -> 10.1101/2020.04.12.20062497
Position fetched again: 41635 -> 10.1101/2020.04.12.20062604
Position fetched again: 41636 -> 10.1101/2020.04.12.20062661
Position fetched again: 41637 -> 10.1101/2020.04.12.20062794
Position fetched again: 41638 -> 10.1101/2020.04.12.20062869
Position fetched again: 41639 -> 10.1101/2020.04.12.20062893
Position fetched again: 41640 -> 10.1101/2020.04.12.20062943
Position fetched again: 41641 -> 10.1101/2020.04.13.031245
Position fetched again: 41642 -> 10.1101/2020.04.13.036079
Position fetched again: 41643 -> 10.1101/2020.04.13.038620
Position fetched again: 41644 -> 10.1101/2020.04.13.038687
Position fetched again: 41645 -> 10.1101/2020.04.13.038752
Position fetched again: 41646 -> 10.1101/2020.04.13.039198
Position fetched again: 41647 -> 10.1101/2020.04.13.039263
Position fetched again: 41648 -> 10.11

Position fetched again: 41768 -> 10.1101/2020.04.16.20067504
Position fetched again: 41769 -> 10.1101/2020.04.16.20067611
Position fetched again: 41770 -> 10.1101/2020.04.16.20067645
Position fetched again: 41771 -> 10.1101/2020.04.16.20067728
Position fetched again: 41772 -> 10.1101/2020.04.16.20067751
Position fetched again: 41773 -> 10.1101/2020.04.16.20067835
Position fetched again: 41774 -> 10.1101/2020.04.16.20067975
Position fetched again: 41775 -> 10.1101/2020.04.16.20068163
Position fetched again: 41776 -> 10.1101/2020.04.16.20068205
Position fetched again: 41777 -> 10.1101/2020.04.16.20068213
Position fetched again: 41778 -> 10.1101/2020.04.16.20068312
Position fetched again: 41779 -> 10.1101/2020.04.16.20068379
Position fetched again: 41780 -> 10.1101/2020.04.16.20068403
Position fetched again: 41781 -> 10.1101/2020.04.16.20068411
Position fetched again: 41782 -> 10.1101/2020.04.17.042366
Position fetched again: 41783 -> 10.1101/2020.04.17.044743
Position fetched again: 4178

Position fetched again: 41904 -> 10.1101/2020.04.20.20073031
Position fetched again: 41905 -> 10.1101/2020.04.20.20073056
Position fetched again: 41906 -> 10.1101/2020.04.20.20073098
Position fetched again: 41907 -> 10.1101/2020.04.20.20073130
Position fetched again: 41908 -> 10.1101/2020.04.20.20073213
Position fetched again: 41909 -> 10.1101/2020.04.20.20073288
Position fetched again: 41910 -> 10.1101/2020.04.20.20073338
Position fetched again: 41911 -> 10.1101/2020.04.21.050633
Position fetched again: 41912 -> 10.1101/2020.04.21.051201
Position fetched again: 41913 -> 10.1101/2020.04.21.051912
Position fetched again: 41914 -> 10.1101/2020.04.21.052084
Position fetched again: 41915 -> 10.1101/2020.04.21.052209
Position fetched again: 41916 -> 10.1101/2020.04.21.052639
Position fetched again: 41917 -> 10.1101/2020.04.21.053009
Position fetched again: 41918 -> 10.1101/2020.04.21.053058
Position fetched again: 41919 -> 10.1101/2020.04.21.053199
Position fetched again: 41920 -> 10.1101/2

Position fetched again: 42040 -> 10.1101/2020.04.24.20077891
Position fetched again: 42041 -> 10.1101/2020.04.24.20077933
Position fetched again: 42042 -> 10.1101/2020.04.24.20077966
Position fetched again: 42043 -> 10.1101/2020.04.24.20078006
Position fetched again: 42044 -> 10.1101/2020.04.24.20078113
Position fetched again: 42045 -> 10.1101/2020.04.24.20078204
Position fetched again: 42046 -> 10.1101/2020.04.24.20078238
Position fetched again: 42047 -> 10.1101/2020.04.24.20078287
Position fetched again: 42048 -> 10.1101/2020.04.24.20078303
Position fetched again: 42049 -> 10.1101/2020.04.24.20078477
Position fetched again: 42050 -> 10.1101/2020.04.24.20078485
Position fetched again: 42051 -> 10.1101/2020.04.24.20078568
Position fetched again: 42052 -> 10.1101/2020.04.24.20078576
Position fetched again: 42053 -> 10.1101/2020.04.24.20078584
Position fetched again: 42054 -> 10.1101/2020.04.24.20078691
Position fetched again: 42055 -> 10.1101/2020.04.24.20078741
Position fetched again: 

Position fetched again: 42176 -> 10.1101/2020.04.28.20081687
Position fetched again: 42177 -> 10.1101/2020.04.28.20081844
Position fetched again: 42178 -> 10.1101/2020.04.28.20082453
Position fetched again: 42179 -> 10.1101/2020.04.28.20082644
Position fetched again: 42180 -> 10.1101/2020.04.28.20082669
Position fetched again: 42181 -> 10.1101/2020.04.28.20082735
Position fetched again: 42182 -> 10.1101/2020.04.28.20082743
Position fetched again: 42183 -> 10.1101/2020.04.28.20082784
Position fetched again: 42184 -> 10.1101/2020.04.28.20082966
Position fetched again: 42185 -> 10.1101/2020.04.28.20083048
Position fetched again: 42186 -> 10.1101/2020.04.28.20083113
Position fetched again: 42187 -> 10.1101/2020.04.28.20083147
Position fetched again: 42188 -> 10.1101/2020.04.28.20083154
Position fetched again: 42189 -> 10.1101/2020.04.28.20083261
Position fetched again: 42190 -> 10.1101/2020.04.28.20083279
Position fetched again: 42191 -> 10.1101/2020.04.28.20083295
Position fetched again: 

Position fetched again: 42312 -> 10.1101/2020.05.01.20088237
Position fetched again: 42313 -> 10.1101/2020.05.02.043554
Position fetched again: 42314 -> 10.1101/2020.05.02.071506
Position fetched again: 42315 -> 10.1101/2020.05.02.071811
Position fetched again: 42316 -> 10.1101/2020.05.02.072439
Position fetched again: 42317 -> 10.1101/2020.05.02.073320
Position fetched again: 42318 -> 10.1101/2020.05.02.074021
Position fetched again: 42319 -> 10.1101/2020.05.02.20080390
Position fetched again: 42320 -> 10.1101/2020.05.02.20084947
Position fetched again: 42321 -> 10.1101/2020.05.02.20088013
Position fetched again: 42322 -> 10.1101/2020.05.02.20088336
Position fetched again: 42323 -> 10.1101/2020.05.02.20088344
Position fetched again: 42324 -> 10.1101/2020.05.02.20088427
Position fetched again: 42325 -> 10.1101/2020.05.02.20088492
Position fetched again: 42326 -> 10.1101/2020.05.02.20088591
Position fetched again: 42327 -> 10.1101/2020.05.02.20088666
Position fetched again: 42328 -> 10.

KeyboardInterrupt: 