# Tweets from bioRxiv and medRxiv

Find and display events from a single DOI prefix in a specified period of time, and find the most tweeted DOIs.

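A single publisher can generate more events than one API response returns, so the results have to be retrieved page by page. The cells below first run a query with rows set to 0 to get the total number of events, then make consecutive queries to fetch every page of results.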

In [61]:
import sys
sys.path.insert(0, '..')

import pandas # data analysis library
import json
import mrced2 # module to run event data queries
import os # some file manipulations
import math # some number manipulations
from IPython.display import Markdown as md # some markdown manipulations
from datetime import datetime, timedelta # some date manipulations

In [62]:
email = "martin@front-matter.io"
prefix = "10.1101"
start_date = (datetime.today() - timedelta(7)).strftime('%Y-%m-%d')
end_date = datetime.today().strftime('%Y-%m-%d')

In [63]:
ed = mrced2.eventData(email = email)
ed.buildQuery({'obj-id.prefix' : prefix, 'source': 'twitter', 'rows': 0,'from-occurred-date' : start_date, 'until-occurred-date' : end_date})

https://api.eventdata.crossref.org/v1/events?mailto=Anonymous&obj-id.prefix=10.1101&source=twitter&rows=0&from-occurred-date=2021-04-24&until-occurred-date=2021-05-01
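
The printed URL shows how the query dictionary maps onto request parameters. As a rough illustration, the same URL can be assembled with the standard library alone. The helper name `build_event_data_url` is hypothetical and is not part of `mrced2`; how `buildQuery` works internally is not shown in this notebook.

```python
from urllib.parse import urlencode

# Endpoint as it appears in the printed query URL.
BASE_URL = "https://api.eventdata.crossref.org/v1/events"


def build_event_data_url(filters, mailto="Anonymous"):
    """Hypothetical helper: append URL-encoded filters to the endpoint."""
    params = {"mailto": mailto}
    params.update(filters)  # dicts keep insertion order, so the URL does too
    return BASE_URL + "?" + urlencode(params)


url = build_event_data_url({
    "obj-id.prefix": "10.1101",
    "source": "twitter",
    "rows": 0,  # ask for the hit count only, no events
    "from-occurred-date": "2021-04-24",
    "until-occurred-date": "2021-05-01",
})
print(url)
```

With these values the result is identical to the URL printed by the cell.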


In [64]:
ed.runQuery(retry = 5)

Event Data query started...
API query complete  200
output file written to 1101/tweets.json


In [65]:
pages = math.ceil(ed.events.getHits() / 1000)

15377 events found
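
The page count is just the hit count divided by the page size, rounded up. A small sketch of that arithmetic, using the 15377 hits reported above and the page size of 1000 used in the next cell (the function name `page_count` is illustrative only):

```python
import math


def page_count(total_hits, rows_per_page=1000):
    """Number of requests needed to fetch total_hits events."""
    return math.ceil(total_hits / rows_per_page)


pages = page_count(15377)  # 15377 / 1000 = 15.377, rounded up
print(pages)  # 16
```

Rounding up matters here: 15 pages would leave the last 377 events unfetched.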


In [66]:
email = "martin@front-matter.io"
prefix = "10.1101"
start_date = (datetime.today() - timedelta(7)).strftime('%Y-%m-%d')
end_date = datetime.today().strftime('%Y-%m-%d')

# fetch all result pages for the search
ed = mrced2.eventData(email = email)
ed.getAllPages(pages, {'rows': 1000, 'obj-id.prefix' : prefix, 'source': 'twitter', 'from-occurred-date' : start_date, 'until-occurred-date' : end_date}, fileprefix = '1101/tweets_') 

https://api.eventdata.crossref.org/v1/events?mailto=Anonymous&rows=1000&obj-id.prefix=10.1101&source=twitter&from-occurred-date=2021-04-24&until-occurred-date=2021-05-01
Event Data query started...
API query complete  200
output file written to 1101/tweets_0000.json
https://api.eventdata.crossref.org/v1/events?mailto=Anonymous&cursor=21787977-9896-43b7-8bdd-11416641b479&rows=1000&obj-id.prefix=10.1101&source=twitter&from-occurred-date=2021-04-24&until-occurred-date=2021-05-01
Event Data query started...
API query complete  200
output file written to 1101/tweets_0001.json
https://api.eventdata.crossref.org/v1/events?mailto=Anonymous&cursor=f3600d63-f24a-4700-9cde-785ef0e2af8b&rows=1000&obj-id.prefix=10.1101&source=twitter&from-occurred-date=2021-04-24&until-occurred-date=2021-05-01
Event Data query started...
API query complete  200
output file written to 1101/tweets_0002.json
https://api.eventdata.crossref.org/v1/events?mailto=Anonymous&cursor=8f054341-75e3-4185-93a2-7ddc2b31d9b9&rows=
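
Each request after the first carries a cursor parameter, which is how the API hands over the next page. The loop below is a minimal sketch of that pattern, not the implementation of `getAllPages`. It assumes each response carries the cursor for the following page under `message` as `next-cursor`, and it takes a hypothetical `fetch_page` callable so that it runs without network access.

```python
def fetch_all_events(fetch_page, params, max_pages):
    """Follow cursors page by page and collect all events.

    fetch_page is a hypothetical callable: it takes a dict of query
    parameters and returns the decoded JSON response as a dict.
    """
    events = []
    cursor = None
    for _ in range(max_pages):
        query = dict(params)
        if cursor:
            query["cursor"] = cursor  # continue where the last page ended
        message = fetch_page(query)["message"]
        events.extend(message["events"])
        cursor = message.get("next-cursor")  # assumed field name
        if not cursor:
            break  # no further pages
    return events


# Offline stand-in for the API, with two small made-up pages.
def fake_fetch(query):
    pages = {
        None: {"events": [{"id": "a"}, {"id": "b"}], "next-cursor": "c1"},
        "c1": {"events": [{"id": "c"}], "next-cursor": None},
    }
    return {"message": pages[query.get("cursor")]}


events = fetch_all_events(fake_fetch, {"rows": 1000, "source": "twitter"}, max_pages=16)
print(len(events))  # 3
```

Passing the fetch function in keeps the paging logic separate from the HTTP client, so retries or rate limiting can be added in one place.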

Initialise an event record and merge the downloaded pages into one file, so that the properties of the results can be inspected.

In [67]:
jd1 = mrced2.eventRecord() # instance of a class to interpret the events
files = os.listdir('1101') # get all the filenames

jd1.mergeJsons(files, folder = '1101') # load the json event data from multiple files

failed to load .DS_Store
failed to load .gitkeep
failed to load tweets_sorted.json
output file written to 1101/tweets.json
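
The "failed to load" messages appear because `os.listdir` returns every file in the folder, not only the downloaded pages. One way to avoid them is to select the page files by name before merging. The sketch below does this with the standard library; it assumes the `tweets_NNNN.json` naming seen in the output above and is a stand-in for, not a description of, `mergeJsons`.

```python
import json
import os
import re
import tempfile

PAGE_FILE = re.compile(r"^tweets_\d{4}\.json$")  # e.g. tweets_0000.json


def merge_event_pages(folder):
    """Concatenate message.events from every numbered page file."""
    merged = []
    for name in sorted(os.listdir(folder)):
        if not PAGE_FILE.match(name):
            continue  # skips .DS_Store, .gitkeep, tweets_sorted.json, ...
        with open(os.path.join(folder, name)) as handle:
            merged.extend(json.load(handle)["message"]["events"])
    return merged


# Self-contained demo in a temporary folder with made-up events.
with tempfile.TemporaryDirectory() as folder:
    for index, ids in enumerate([["a", "b"], ["c"]]):
        page = {"message": {"events": [{"id": i} for i in ids]}}
        with open(os.path.join(folder, f"tweets_{index:04d}.json"), "w") as handle:
            json.dump(page, handle)
    open(os.path.join(folder, ".gitkeep"), "w").close()  # non-JSON file
    merged = merge_event_pages(folder)

print(len(merged))  # 3
```

Sorting the file names also keeps the events in download order, which `os.listdir` alone does not guarantee.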


In [71]:
# load the merged events and flatten them into one row per event
with open("1101/tweets.json") as f:
    js = json.load(f)
df = pandas.json_normalize(js, record_path=['message', 'events'])
# drop columns that are not needed
df = df.drop(columns=[
    'license', 'source_token', 'evidence_record', 'terms', 'action',
    'relation_type_id', 'subj.title', 'subj.pid', 'subj.url',
    'subj.alternative-id', 'obj.pid', 'obj.method', 'obj.verification',
])

In [72]:
df.head(5)

Unnamed: 0,obj_id,occurred_at,subj_id,id,source_id,timestamp,subj.issued,subj.author.url,subj.original-tweet-url,subj.original-tweet-author,obj.url
0,https://doi.org/10.1101/2021.03.04.430128,2021-04-27T23:28:48Z,twitter://status?id=1387187002806833153,9ae7ad64-956b-4102-9e97-05af72eb99b9,twitter,2021-04-28T00:11:36Z,2021-04-27T23:28:48.000Z,twitter://user?screen_name=MakingMoneyFast,twitter://status?id=1387187002806833153,,https://www.biorxiv.org/content/10.1101/2021.0...
1,https://doi.org/10.1101/2021.04.27.441512,2021-04-28T00:04:23Z,twitter://status?id=1387195954277888003,fbe49e58-b9c7-4820-a2ae-71017dc49bfa,twitter,2021-04-28T00:11:39Z,2021-04-28T00:04:23.000Z,twitter://user?screen_name=NorthernGlory2,twitter://status?id=1387194686662746116,twitter://user?screen_name=Covid19Crusher,https://www.biorxiv.org/content/10.1101/2021.0...
2,https://doi.org/10.1101/2021.04.21.21255807,2021-04-27T22:58:27Z,twitter://status?id=1387179364098453504,75e4a2f1-59a4-4e0a-bf58-aaa556947019,twitter,2021-04-28T00:11:39Z,2021-04-27T22:58:27.000Z,twitter://user?screen_name=PietLekkerkerk,twitter://status?id=1387176549657501697,twitter://user?screen_name=mkeulemans,https://www.medrxiv.org/content/10.1101/2021.0...
3,https://doi.org/10.1101/2021.04.26.441528,2021-04-28T00:04:21Z,twitter://status?id=1387195947336097795,d4d9a64e-a295-459c-8a3e-cc784dddb726,twitter,2021-04-28T00:11:40Z,2021-04-28T00:04:21.000Z,twitter://user?screen_name=AgneeshBarua,twitter://status?id=1387195729672691713,twitter://user?screen_name=SashaMikheyev,https://www.biorxiv.org/content/10.1101/2021.0...
4,https://doi.org/10.1101/2021.04.23.441128,2021-04-27T22:58:19Z,twitter://status?id=1387179332016041989,559bc6b0-26fb-499c-8b4e-3cefd774210b,twitter,2021-04-28T00:11:43Z,2021-04-27T22:58:19.000Z,twitter://user?screen_name=ivan_skelin,twitter://status?id=1387034052855750664,twitter://user?screen_name=SaxeLab,https://www.biorxiv.org/content/10.1101/2021.0...
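
Since every row is one tweet and `obj_id` holds the DOI it points to, finding the most tweeted DOIs comes down to counting rows per `obj_id`. The sketch below shows the idea on plain event dictionaries with `collections.Counter`; on the DataFrame above, `df['obj_id'].value_counts()` should give the same ranking. The function name and the sample DOIs are placeholders, not results from this query.

```python
from collections import Counter


def most_tweeted(events, top_n=10):
    """Return (doi_url, tweet_count) pairs, most tweeted first."""
    counts = Counter(event["obj_id"] for event in events)
    return counts.most_common(top_n)


# Illustrative events only; these DOIs are placeholders.
sample_events = [
    {"obj_id": "https://doi.org/10.1101/example-a"},
    {"obj_id": "https://doi.org/10.1101/example-b"},
    {"obj_id": "https://doi.org/10.1101/example-a"},
]
ranking = most_tweeted(sample_events)
print(ranking)
```

Note that this counts every event, so retweets (rows where `subj.original-tweet-author` is filled in) contribute to the totals as well.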
