# Tweets from bioRxiv and medRxiv

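This notebook queries Crossref Event Data for Twitter events that reference DOIs with a single prefix (here `10.1101`, the prefix seen on the bioRxiv and medRxiv preprints below) over the past 7 days, then ranks the DOIs by tweet count.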

In [11]:
import sys
sys.path.insert(0, '..')

import pandas # data analysis library
import json # parse the saved query results
import mrced2 # module to run event data queries
import os # some file manipulations
import math # some number manipulations
import altair.vegalite.v3 as alt # some data visualizations
from IPython.display import Markdown as md # some markdown manipulations
from datetime import datetime, timedelta # some date manipulations

In [12]:
email = "martin@front-matter.io"
prefix = "10.1101"
start_date = (datetime.today() - timedelta(7)).strftime('%Y-%m-%d')
end_date = datetime.today().strftime('%Y-%m-%d')

In [13]:
ed = mrced2.eventData(email = email)
ed.buildQuery({'obj-id.prefix' : prefix, 'source': 'twitter', 'rows': 0,'from-occurred-date' : start_date, 'until-occurred-date' : end_date})

https://api.eventdata.crossref.org/v1/events?mailto=Anonymous&obj-id.prefix=10.1101&source=twitter&rows=0&from-occurred-date=2021-05-01&until-occurred-date=2021-05-08
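
The printed URL is an ordinary query string on the Event Data `/v1/events` endpoint. As a rough sketch of how such a URL can be assembled with the standard library alone, the following reproduces it; the helper name `build_events_url` is hypothetical and not part of `mrced2`, and the parameter names and values are copied from the URL above.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.eventdata.crossref.org/v1/events"

def build_events_url(params, mailto="Anonymous"):
    """Hypothetical helper: join the base URL and the filters into one query URL."""
    query = {"mailto": mailto}   # contact parameter comes first, as in the output above
    query.update(params)         # then the filters, in the order given
    return BASE_URL + "?" + urlencode(query)

url = build_events_url({
    "obj-id.prefix": "10.1101",
    "source": "twitter",
    "rows": 0,                   # rows=0 asks only for the hit count, not the events
    "from-occurred-date": "2021-05-01",
    "until-occurred-date": "2021-05-08",
})
print(url)
```

This is only meant to make the query parameters visible; how `mrced2` builds the URL internally may differ.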


In [14]:
ed.runQuery(retry = 5)

Event Data query started...
API query complete  200
output file written to 1101/tweets.json


In [15]:
pages = math.ceil(ed.events.getHits() / 1000)

11416 events found
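
With 11416 hits and 1000 rows per request, the last page is only partly filled, so the division has to be rounded up. A minimal sketch of that arithmetic, where `pages_needed` is a hypothetical helper name used only for illustration:

```python
import math

def pages_needed(total_hits, rows_per_page=1000):
    """Hypothetical helper: number of requests needed to fetch every hit."""
    return math.ceil(total_hits / rows_per_page)

print(pages_needed(11416))  # 12: eleven full pages plus one page with 416 events
```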


In [16]:
email = "martin@front-matter.io"
prefix = "10.1101"
start_date = (datetime.today() - timedelta(7)).strftime('%Y-%m-%d')
end_date = datetime.today().strftime('%Y-%m-%d')

# fetch all result pages for the search
ed = mrced2.eventData(email = email)
ed.getAllPages(pages, {'rows': 1000, 'obj-id.prefix' : prefix, 'source': 'twitter', 'from-occurred-date' : start_date, 'until-occurred-date' : end_date}, fileprefix = '1101/tweets_') 

https://api.eventdata.crossref.org/v1/events?mailto=Anonymous&rows=1000&obj-id.prefix=10.1101&source=twitter&from-occurred-date=2021-05-01&until-occurred-date=2021-05-08
Event Data query started...
API query complete  200
output file written to 1101/tweets_0000.json
https://api.eventdata.crossref.org/v1/events?mailto=Anonymous&cursor=5a200954-032b-4fe9-80f9-f3d40125a038&rows=1000&obj-id.prefix=10.1101&source=twitter&from-occurred-date=2021-05-01&until-occurred-date=2021-05-08
Event Data query started...
API query complete  200
output file written to 1101/tweets_0001.json
https://api.eventdata.crossref.org/v1/events?mailto=Anonymous&cursor=e020c146-f47b-48ec-96af-dc476f4c6172&rows=1000&obj-id.prefix=10.1101&source=twitter&from-occurred-date=2021-05-01&until-occurred-date=2021-05-08
Event Data query started...
API query complete  200
output file written to 1101/tweets_0002.json
https://api.eventdata.crossref.org/v1/events?mailto=Anonymous&cursor=51732175-91ef-44ed-85a3-bf306feaaad4&rows=

Merge the downloaded result pages into a single file so that the properties of the events can be examined.

In [17]:
jd1 = mrced2.eventRecord() # instance of a class to interpret the events
files = os.listdir('1101') # get all the filenames

jd1.mergeJsons(files, folder = '1101') # load the json event data from multiple files

failed to load .DS_Store
failed to load preprint_tweets_no_abstract.csv
failed to load .gitkeep
failed to load tweets_sorted.json
failed to load preprint_tweets2021-05-08 09:25:56.363439.csv
failed to load preprint_tweets.csv
failed to load preprint_tweets_2021-05-08.csv
output file written to 1101/tweets.json
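
The "failed to load" messages come from passing every entry of the folder to the merge step, including files that are not paged results. One possible way to avoid them is to select only the page files first. The sketch below assumes the page files follow the `tweets_0000.json` naming seen in the output of the paging step; `select_page_files` is a hypothetical helper, and the listing is sample data based on the filenames above.

```python
import re

# Page files are assumed to look like tweets_0000.json, tweets_0001.json, ...
PAGE_FILE = re.compile(r"^tweets_\d+\.json$")

def select_page_files(filenames):
    """Hypothetical helper: keep only paged result files, in page order."""
    return sorted(name for name in filenames if PAGE_FILE.match(name))

listing = [
    ".DS_Store",
    "tweets_0001.json",
    "preprint_tweets.csv",
    "tweets_sorted.json",   # starts with tweets_ but is not a numbered page
    "tweets_0000.json",
    ".gitkeep",
    "tweets.json",          # the merged output itself
]
print(select_page_files(listing))
```

The filtered list could then be passed to `jd1.mergeJsons(...)` in place of the raw `os.listdir('1101')` result.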


In [18]:
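with open("1101/tweets.json") as f: # close the file once it is parsed
    js = json.load(f)
df = pandas.json_normalize(js, record_path=['message', 'events']) # one row per event
# count events per DOI and sort with the most tweeted DOI first
gdf = df.groupby(['obj_id']).size().reset_index(name='count').sort_values('count', ascending=False)
cdf = gdf[gdf['count'] >= 5] # keep DOIs tweeted at least 5 times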
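
The cell above counts events per `obj_id`, sorts by count, and keeps DOIs with at least 5 events. For readers who want to see that logic without pandas, here is a small standard-library sketch of the same steps; the function name `most_tweeted` and the two DOIs are made up for illustration.

```python
from collections import Counter

def most_tweeted(events, min_count=5):
    """Hypothetical helper: count events per obj_id, keep those at or above min_count."""
    counts = Counter(event["obj_id"] for event in events)
    # most_common() returns (obj_id, count) pairs sorted by descending count
    return [(doi, n) for doi, n in counts.most_common() if n >= min_count]

# Made-up sample: one DOI with 6 events, another with 2
events = (
    [{"obj_id": "https://doi.org/10.1101/example.a"}] * 6
    + [{"obj_id": "https://doi.org/10.1101/example.b"}] * 2
)
print(most_tweeted(events))
```

Only the first DOI passes the threshold of 5, which mirrors the `gdf['count'] >= 5` filter.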

In [19]:
email = "martin@front-matter.io"
rest = mrced2.restApi(email = email)

data = []

for index, row in cdf.iterrows():
    rest.runQuery(row)
    if rest.work is not None and rest.work["posted"] >= start_date:
        data.append(rest.work)
    
tdf = pandas.DataFrame(data, columns=['doi','tweets','archive','subject-area','covid','title','authors','abstract','posted'])
tdf.to_csv('1101/preprint_tweets_' + datetime.today().strftime('%Y-%m-%d') + '.csv')

tdf.head(50)

REST API query started for 10.1101/2021.04.10.21255248...
REST API query complete  200
REST API query started for 10.1101/2020.04.30.066209...
REST API query complete  200
REST API query started for 10.1101/2021.02.18.21251986...
REST API query complete  200
REST API query started for 10.1101/2021.02.27.433180...
REST API query complete  200
REST API query started for 10.1101/2021.04.29.442030...
REST API query complete  200
REST API query started for 10.1101/2021.04.29.441939...
REST API query complete  200
REST API query started for 10.1101/2021.03.11.21253225...
REST API query complete  200
REST API query started for 10.1101/2021.04.27.441510...
REST API query complete  200
REST API query started for 10.1101/2021.04.26.21256152...
REST API query complete  200
REST API query started for 10.1101/2021.01.29.21250653...
REST API query complete  200
REST API query started for 10.1101/2021.03.11.21253275...
REST API query complete  200
REST API query started for 10.1101/2021.03.20.2125397

| | doi | tweets | archive | subject-area | covid | title | authors | abstract | posted |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 10.1101/2021.05.02.442312 | 134 | bioRxiv | Microbiology | False | Aminoglycoside antibiotics inhibit phage infec... | Larissa Kever, Aël Hardy, Tom Luthe, Max Hünne... | `<p>`In response to viral predation, bacteria ha... | 2021-05-02 |
| 1 | 10.1101/2021.05.01.442252 | 64 | bioRxiv | Cancer Biology | False | Mesenchymal Lineage Heterogeneity Underlies No... | Erin J. Helms, Mark W. Berry, R. Crystal Chaw,... | `<p>`Cancer-associated fibroblast (CAF) heteroge... | 2021-05-02 |
| 2 | 10.1101/2021.05.02.442311 | 54 | bioRxiv | Microbiology | False | Vibrio cholerae biofilm dispersal regulator ca... | Praveen K. Singh, Daniel K.H. Rode, Pauline Bu... | `<p>`The extracellular matrix is a defining feat... | 2021-05-02 |
| 3 | 10.1101/2021.05.06.442916 | 41 | bioRxiv | Microbiology | True | Identification of DAXX As A Restriction Factor... | Alice Mac Kain, Ghizlane Maarifi, Sophie-Marie... | `<p>`While interferon restricts SARS-CoV-2 repli... | 2021-05-06 |
| 4 | 10.1101/2021.05.02.442342 | 41 | bioRxiv | Molecular Biology | False | Cytosolic aggregation of mitochondrial protein... | Urszula Nowicka, Piotr Chroscicki, Karen Stroo... | `<p>`Mitochondria are organelles with their own ... | 2021-05-02 |
| 5 | 10.1101/2021.05.01.442281 | 40 | bioRxiv | Bioinformatics | False | RNA splicing programs define tissue compartmen... | Julia Eve Olivieri, Roozbeh Dehghannasiri, Pet... | `<p>`More than 95% of human genes are alternativ... | 2021-05-02 |
| 6 | 10.1101/2021.05.01.441648 | 38 | bioRxiv | Neuroscience | False | A multisensory circuit for gating intense aver... | Arun Asok, Félix Leroy, Cameron Parro, Christo... | `<p>`The ventral hippocampus (vHPC) is critical ... | 2021-05-02 |
| 7 | 10.1101/2021.05.03.442461 | 31 | bioRxiv | Plant Biology | False | Misregulation of MYB16 causes stomatal cluster... | Shao-Li Yang, Ngan Tran, Meng-Ying Tsai, Chin-... | `<p>`Stomata and leaf cuticle regulate water eva... | 2021-05-03 |
| 8 | 10.1101/2021.05.05.442780 | 26 | bioRxiv | Microbiology | True | Prior aerosol infection with lineage A SARS-Co... | Claude Kwe Yinda, Julia R. Port, Trenton Bushm... | `<p>`The circulation of SARS-CoV-2 has resulted ... | 2021-05-05 |
| 9 | 10.1101/2021.04.28.21256261 | 21 | medRxiv | Epidemiology | True | Aspirin and NSAID use and the risk of COVID-19 | David A. Drew, Chuan-Guo Guo, Karla A. Lee, Lo... | `<p>`Early reports raised concern that use of no... | 2021-05-02 |


### Tweets of bioRxiv and medRxiv preprints

In [20]:
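num_rows = tdf['archive'].count()
# count by label rather than by position in value_counts(), so the numbers
# stay correct even if medRxiv outnumbers bioRxiv or a category is missing
num_covid = int(tdf['covid'].eq(True).sum())
num_biorxiv = int(tdf['archive'].eq('bioRxiv').sum())
num_medrxiv = int(tdf['archive'].eq('medRxiv').sum())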
end_date = datetime.today().strftime('%Y-%m-%d')
max_count = tdf['tweets'].max()

md('{} preprints (including {} covering SARS-CoV-2, {} from bioRxiv and {} from medRxiv) published in the last 7 days before {} had been tweeted at least 5 times (maximum {}).'.format(num_rows, num_covid, num_biorxiv, num_medrxiv, end_date, max_count))

57 preprints (including 12 covering SARS-CoV-2, 48 from bioRxiv and 9 from medRxiv) published in the last 7 days before 2021-05-08 had been tweeted at least 5 times (maximum 134).