Author: Wolfgang Mueller
Date: February 2018

# This tool creates a file with all publication information from LiSyM SEEK


Source for the JSON access code: https://stackoverflow.com/questions/35120250/python-3-get-and-parse-json-api
The rest was worked out by reading the pretty-printed JSON and the standard Python documentation.


In [1]:
import urllib.request
import json
import datetime

# get a json of all publications
def getAllPublications():
    url = 'https://seek.lisym.org/publications.json'
    req = urllib.request.Request(url)
    r = urllib.request.urlopen(req).read()
    cont = json.loads(r.decode('utf-8'))

    return cont['data']


def getPublication(id):
    fullPublicationUrl = "https://seek.lisym.org/publications/%s.json" % id
    fullPublicationRequest = urllib.request.Request(fullPublicationUrl)
    fullPublicationText = urllib.request.urlopen(fullPublicationRequest).read()
    fullPublicationJson = json.loads(fullPublicationText.decode('utf-8'))
    return fullPublicationJson['data']

def parsePublication(data):
    attributes = data['attributes']
    authors =         attributes['authors']
    title =           attributes['title']
    journal =         attributes['journal']
    citation =        attributes['citation']
    published_date =  attributes['published_date']
    projects = data['relationships']['projects']


    return ({
          'title'  : title,
          'authors' : authors,
          'citation': citation,
          'journal': journal,
          'published_date' : published_date,
          'projects' : projects
        })


allPublicationsData = getAllPublications()

# fetch the full record of every publication in the listing and parse it
publications = []

for item in allPublicationsData:
    fullPublicationData = getPublication(item["id"])
    publications.append(parsePublication(fullPublicationData))

# number of publications retrieved
print(len(publications))
# now "publications" contains a list of dicts (hashes, in Perl terms) with publication information

86
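
parsePublication above indexes every key directly, so a record that lacks one of them would raise a KeyError. A minimal sketch of a more tolerant variant follows. The name `parsePublicationSafe` and the sample record are made up for illustration; only the key names are taken from the code above.

```python
def parsePublicationSafe(data):
    # hypothetical variant of parsePublication: missing keys become None or empty
    attributes = data.get('attributes', {})
    relationships = data.get('relationships', {})
    return {
        'title': attributes.get('title'),
        'authors': attributes.get('authors') or [],
        'citation': attributes.get('citation'),
        'journal': attributes.get('journal'),
        'published_date': attributes.get('published_date'),
        'projects': relationships.get('projects') or {'data': []},
    }

# made-up record, shaped like the JSON read above but without citation and projects
sample = {'attributes': {'title': 'Example title',
                         'authors': ['A. Author'],
                         'journal': 'Example J'}}
parsed = parsePublicationSafe(sample)
print(parsed['citation'], parsed['projects'])
```

The empty `{'data': []}` default keeps the shape that createProjectReport iterates over, so a publication without projects simply prints no project titles.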


In [2]:
# How to match in Python (I am still a Perl person :-/): https://docs.python.org/3/library/re.html#match-objects
import re

# A method to extract the year from citation info
def getYear(s):
    if(s):
        found = re.search('20[0-9][0-9]',s)
        # re.search returns None if the citation contains no 20xx year
        if(found):
            return int(found.group(0))
    # 0 if the string is empty or contains no year
    return 0

# get the date from a hash
# needed for sorting by date
def extractDate(contentHash):
    if(contentHash['published_date']):
        return datetime.datetime.strptime(contentHash['published_date'],'%Y-%m-%d')
    else:
        # now try to infer the year from citation
        year = getYear(contentHash['citation'])
        if(year):
            return datetime.datetime(year, 1, 1)
        # fall back to 1970-01-01 if no date can be determined
        return datetime.datetime(1970, 1, 1)
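
The pattern '20[0-9][0-9]' in getYear matches anywhere in the string, so it could also pick up four digits inside a longer number such as an article identifier. A sketch of a stricter variant using word boundaries follows. `findYear` is a hypothetical name, accepting 19xx years is an assumption, and the third sample string is invented.

```python
import re

# a four-digit year starting with 19 or 20 that is not part of a longer word or number
YEAR_PATTERN = re.compile(r'\b(?:19|20)[0-9]{2}\b')

def findYear(citation):
    # returns the first year found, or None if there is none
    if not citation:
        return None
    found = YEAR_PATTERN.search(citation)
    return int(found.group(0)) if found else None

print(findYear('Magn Reson Med. 2018 Mar;79(3):1325-1333.'))  # year present
print(findYear('Sci Rep 7(1) : 275'))                         # no year at all
print(findYear('Example J 1(2) : e1002017'))                  # digits inside an identifier
```

This still cannot tell a year from a page number that happens to look like one (for example a page range starting at 2005), so the result remains a guess that is worth checking by hand.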


In [3]:
# Sort publications by date 

# https://docs.python.org/3.6/tutorial/datastructures.html?highlight=data%20structures
# sorting lists
publications.sort(key=extractDate,reverse=True)
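
To illustrate what the sort key does, here is a small self-contained sketch with invented records. `dateKey` is a simplified stand-in for extractDate (it skips the citation lookup), so this is an illustration under that assumption and not the notebook's own code.

```python
import datetime

def dateKey(record):
    # simplified stand-in for extractDate: no citation fallback
    if record['published_date']:
        return datetime.datetime.strptime(record['published_date'], '%Y-%m-%d')
    return datetime.datetime(1970, 1, 1)

# invented records
records = [
    {'title': 'older', 'published_date': '2017-12-04'},
    {'title': 'undated', 'published_date': None},
    {'title': 'newer', 'published_date': '2018-02-07'},
]

# newest first; undated records map to 1970 and therefore end up last
records.sort(key=dateKey, reverse=True)
order = [r['title'] for r in records]
print(order)
```

Because the fallback date is older than any real publication date, records without date information collect at the end of the list when sorting with reverse=True.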


In [5]:
# create an Excel file from the sorted publications
import pandas as pd
titles = []
authors = []
journals = []
years = []
citations = []
for paper in publications:
    titles.append(paper['title'])
    journals.append(paper['journal'])
    years.append(extractDate(paper))
    authors.append(", ".join(paper['authors']))
    citations.append(paper['citation'])
    
excel_pubs = pd.DataFrame({'Title': titles, 'Published date': years, 'Journal': journals, 'Authors': authors, 'Citation': citations})
excel_pubs = excel_pubs[['Title', 'Journal','Published date','Authors', 'Citation']]
# the with block saves and closes the file on exit (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter('publications_12_2_2018.xlsx', engine='xlsxwriter') as writer:
    excel_pubs.to_excel(writer, sheet_name='Publications', index=False)
excel_pubs

Unnamed: 0,Title,Journal,Published date,Authors,Citation
0,Could inherited predisposition drive non-obese...,JHG,2018-02-07,"Marcin Krawczyk, Heike Bantel, Monika Rau, Jör...",
1,Effects of Gene Variants Controlling\r\nVitami...,Digestion,2018-02-07,"Malgorzata Jamka, Anita Arslanow, Annika Bohne...",
2,Ethanol sensitizes hepatocytes for TGF-β-trigg...,Cell Death Dis,2018-02-01,"Haristi Gaitantzi, Christoph Meyer, Pia Rakocz...",Cell Death Dis 9(2) : 113
3,Tomoelastography of the prostate using multifr...,Magn Reson Med,2018-01-01,"F. Dittmann, R. Reiter, Jing Guo, M. Haas, P. ...",Magn Reson Med. 2018 Mar;79(3):1325-1333. doi:...
4,A compact 0.5 T MR elastography device and its...,Magn Reson Med,2018-01-01,"J. Braun, H. Tzschatzsch, C. Korting, A. Ariza...",Magn Reson Med. 2018 Jan;79(1):470-478. doi: 1...
5,Model prediction and validation of an order me...,Bulletin of Mathematical Biology,2017-12-28,"Stefan Hoehme, Francois Bertaux, William Weens...",
6,pSSAlib: The partial-propensity stochastic che...,PLoS Comput Biol,2017-12-04,"Oleksandr Ostrenko, Pietro Incardona, Rajesh R...",PLoS Comput Biol 13(12) : e1005865
7,Translational learning from clinical studies p...,npj Syst Biol Appl,2017-12-01,"Markus Krauss, Ute Hofmann, Clemens Schafmayer...",npj Syst Biol Appl 3(1) : 1018
8,Physiologically-based modelling in mice sugges...,Sci Rep,2017-12-01,"Arne Schenk, Ahmed Ghallab, Ute Hofmann, Reham...",Sci Rep 7(1) : 275
9,Physiologically-based modelling in mice sugges...,Sci Rep,2017-12-01,"Arne Schenk, Ahmed Ghallab, Ute Hofmann, Reham...",Sci Rep 7(1) : 275
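
Rows 8 and 9 in the table above are identical, which suggests that the same publication is listed twice. Where the duplicate comes from is not known here. A sketch of removing such duplicates before the export follows, assuming that title plus journal identify a publication. `dropDuplicates` is a hypothetical helper and the records are invented.

```python
def dropDuplicates(papers):
    # keep the first occurrence of each (title, journal) pair, preserving order
    seen = set()
    unique = []
    for paper in papers:
        key = (paper['title'], paper['journal'])
        if key not in seen:
            seen.add(key)
            unique.append(paper)
    return unique

# invented records with one repeated entry
papers = [
    {'title': 'Paper A', 'journal': 'Sci Rep'},
    {'title': 'Paper A', 'journal': 'Sci Rep'},
    {'title': 'Paper B', 'journal': 'Digestion'},
]
unique = dropDuplicates(papers)
print(len(unique))
```

Applied to the sorted `publications` list, this would keep the date order intact, since only later repeats are dropped.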


In [30]:
# Create a "report" concerning one publication

projectCache = dict()

def getProject(id):
    # serve repeated requests for the same project from the cache
    if(id in projectCache):
        return projectCache[id]

    fullUrl = "https://seek.lisym.org/projects/%s.json" % id
    fullRequest = urllib.request.Request(fullUrl)
    fullText = urllib.request.urlopen(fullRequest).read()
    fullJson = json.loads(fullText.decode('utf-8'))
    projectCache[id] = fullJson['data']
    return projectCache[id]

# print the title of every project a publication is linked to
def createProjectReport(projectData):
    for project in projectData['data']:
        title = getProject(project['id'])['attributes']['title']
        print(title)
    

def createReport(contentHash):
    print()
    print('------------------------------------------------------------------------------')
    print(contentHash['title'])
    print(', '.join(contentHash['authors']))
    if(contentHash['citation']):
        print(contentHash['citation'])
    else:
        print("---> No citation information, please check")
    print(contentHash['journal'])
    if(contentHash['published_date']):
        print(contentHash['published_date'])
    else:
        print(getYear(contentHash['citation']))
        
    
    createProjectReport(contentHash['projects'])
print("read")

read
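
The projectCache dictionary above is a hand-written memoization. As an alternative sketch, the standard library's functools.lru_cache does the same bookkeeping. `loadProject` is a made-up stand-in for the HTTP request so that the example runs offline, and `getProjectCached` is a hypothetical name.

```python
import functools

calls = []

def loadProject(projectId):
    # stand-in for the download; records every call so cache hits become visible
    calls.append(projectId)
    return {'attributes': {'title': 'Project %s' % projectId}}

@functools.lru_cache(maxsize=None)
def getProjectCached(projectId):
    # the body runs only once per distinct projectId
    return loadProject(projectId)

for projectId in ['3', '3', '5', '3']:
    getProjectCached(projectId)

print(calls)
```

Note that lru_cache hands back the same dictionary object on every hit, so callers should treat the returned project data as read-only.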


In [31]:
# now iterate over all publications
# and create a small report for each of them
for contentHash in publications:
    createReport(contentHash)


------------------------------------------------------------------------------
Could inherited predisposition drive non-obese fatty liver disease? Results from German tertiary referral centers
Marcin Krawczyk, Heike Bantel, Monika Rau, Jörn M Schattenberg, Frank Grünhage, Anita Pathil, Münnever Demir, Johannes Kluwe, Tobias Boettler, Susanne Weber, Andreas Geier, Frank Lammert
---> No citation information, please check
JHG
2018-02-07
Regeneration and Repair in Acute-on-Chronic Liver Failure (LiSyM-ACLF - Pillar III)

------------------------------------------------------------------------------
Effects of Gene Variants Controlling
Vitamin D Metabolism and Serum
Levels on Hepatic Steatosis
Malgorzata Jamka, Anita Arslanow, Annika Bohner, Marcin Krawczyk, Susanne Weber, Frank Grünhage, Frank Lammert, Caroline S Stokes
---> No citation information, please check
Digestion
2018-02-07
Regeneration and Repair in Acute-on-Chronic Liver Failure (LiSyM-ACLF - Pillar III)

----------------------