
# The NHSX Analytics Unit introduction to Python Session 3

---

This Real Python article is very useful and recommended: https://realpython.com/python-api/

Application Programming Interfaces (APIs) allow different systems to exchange (send or receive) data. This could be used to automatically update a record/database or to send a data extract.

Currently, APIs are mostly used in software and app development to pass data and messages smoothly. They also allow granular access to many large databases for analytical exploration. However, in my experience these are often databases that are already well managed and accessible by other means, so API skills learnt in training can go unused in day-to-day work. This shouldn't deter us: APIs are increasingly becoming a standard way of accessing data across government and in the NHS. In particular, NHSX should be leading the call for more accessible data through APIs, so understanding how to build and use them is important.

## Example well-known APIs

APIs are all around us but are often hidden away, doing the legwork that makes data flow smoothly: weather apps on your phone, PayPal, and logging in through Google all use APIs.

![Commonly used APIs](https://github.com/nhs-pycom/coding-club-apis/blob/main/images/commonUses.png?raw=1)

Take a look at:

- https://any-api.com/
- https://github.com/public-apis/public-apis (note: US focussed)

**Top 25 of the 50 most used APIs** *(according to rapidapi.com in April 2021)*

Many of these let websites and apps quickly update to the latest information based on search criteria or location (except #13).

1. Skyscanner Flight Search

2. Open Weather Map

3. API-FOOTBALL

4. The Cocktail DB

5. REST Countries v1

6. Yahoo Finance

7. Love Calculator

8. URL Shortener Service

9. NasaAPI

10. Numbers

11. musiXmatch

12. SYSTRAN.io – Translation and NLP

13. Chuck Norris

14. Hearthstone

15. Currency Exchange

16. Breaking News

17. Booking

18. Free NBA

19. Deezer

20. Email Validator

21. Urban Dictionary

22. Pokemon Go

23. Recipe – Food – Nutrition

24. Investors Exchange (IEX) Trading

25. Movie Database (IMDB Alternative)


**Healthcare APIs**

The lists I found whilst searching GitHub and Google are mainly US-based. NHS Digital has a range of APIs, listed here: https://digital.nhs.uk/developer/api-catalogue. Many of these are for passing records securely between services and for the security around doing so.


## Types of API 

There are four commonly used API types:

- Open/External/Public: Can be either completely open or require an API key
- Internal: Hidden from external users
- Partner: Similar to Open APIs but use a third-party API gateway to manage access
- Composite: Access to several endpoints at once (useful for development)

There are four standard sets of rules (protocols or architectural styles) in common use:

- REST
- RPC
- SOAP
- GraphQL (created by Facebook)

See here for more info: https://apifriends.com/api-creation/different-types-apis/ & https://www.altexsoft.com/blog/soap-vs-rest-vs-graphql-vs-rpc/

Today we will focus on REST APIs

### REST API - Terminology

![Terminology](https://github.com/nhs-pycom/coding-club-apis/blob/main/images/terminology.png?raw=1)

The API itself defines accessible endpoints and valid request and response formats

### REST API Commands

- POST - Create
- GET - Read
- PUT - Update
- DELETE - Delete

*Note: there are others (e.g. PATCH) not covered here.*
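As a rough illustration of how these methods map onto CRUD operations, the sketch below builds (but never sends) one request per method using the standard library's `urllib.request`, one of the alternatives to requests listed in the imports cell. The endpoint `https://example.com/api/films` and the payloads are hypothetical.

```python
import json
import urllib.request

# Hypothetical endpoint, used only for illustration: nothing is sent over the network
BASE_URL = "https://example.com/api/films"

# The CRUD operation each HTTP method conventionally performs in a REST API
CRUD = {"POST": "Create", "GET": "Read", "PUT": "Update", "DELETE": "Delete"}


def build_request(method, url, payload=None):
    """Build, but do not send, an HTTP request with an optional JSON body."""
    data = None
    headers = {}
    if payload is not None:
        data = json.dumps(payload).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


planned = [
    build_request("POST", BASE_URL, {"fav_film": "Sound of Music"}),     # create a record
    build_request("GET", BASE_URL + "/1"),                               # read record 1
    build_request("PUT", BASE_URL + "/1", {"fav_film": "Mary Poppins"}), # update record 1
    build_request("DELETE", BASE_URL + "/1"),                            # delete record 1
]

for req in planned:
    print(req.get_method(), req.full_url, "->", CRUD[req.get_method()])
```

With requests the equivalents are `requests.post`, `requests.get`, `requests.put` and `requests.delete`, which build and send the request in one step.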

## Benefits of APIs

- Security for underlying database
- Consistency of output
- Separating the frontend from the backend allows for interoperability
- Development behind the API without disrupting users or forcing new releases

See also NHS England's Open API Architecture Policy: https://www.england.nhs.uk/publication/open-api-architecture-policy/

### Side note on http vs https
Whilst most endpoints use https, some still use http. https is the encrypted version of http communication. Never send any sensitive or work data over an http connection.
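As a hedged illustration, a small guard like the hypothetical `require_https` helper below (not part of any library) can stop data being sent to a plain-http endpoint by mistake. The ISS API used in Task 1 is served over http, which is fine for public, non-sensitive data.

```python
from urllib.parse import urlparse


def is_https(url):
    """Return True if the URL uses the encrypted https scheme."""
    return urlparse(url).scheme.lower() == "https"


def require_https(url):
    """Raise ValueError unless the URL uses https; otherwise return it unchanged."""
    if not is_https(url):
        raise ValueError(f"Refusing to send data over an unencrypted connection: {url}")
    return url


print(is_https("https://api.stackexchange.com/2.2/questions"))  # True
print(is_https("http://api.open-notify.org/iss-now.json"))       # False
```

Calling `require_https(url)` just before any `requests.post(...)` that carries work data makes the mistake fail loudly instead of silently.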

## JSON format

The response most commonly comes in JavaScript Object Notation (JSON). This is a hierarchical set of key-value pairs, similar to a Python dictionary.



In [None]:
#Example JSON layout
 
# {
#     "firstName": "Duke",
#     "lastName": "Java",
#     "age": 18,
#     "streetAddress": "100 Internet Dr",
#     "city": "JavaTown",
#     "state": "JA",
#     "postalCode": "12345",
#     "phoneNumbers": [
#         { "Mobile": "111-111-1111" },
#         {"Home": "222-222-2222" }
#     ]
# }

- A set of key-value pairs is called an object.

- Within an object, a key's value can be an array of further key-value pairs (objects).

    - {} enclose objects
    - , separates pairs within an object (and items within an array)
    - : separates keys and values
    - [] enclose arrays

- Objects can contain arrays which in turn can contain further objects or arrays and so on.  This means that we can end up with fairly complex tree structures. 
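To see how this maps onto Python, here is a shortened copy of the example above parsed with the built-in `json` module: objects become dictionaries, arrays become lists, and nested values are reached by chaining keys and indexes.

```python
import json

# A shortened copy of the example JSON, as a string (as it would arrive from an API)
text = """
{
    "firstName": "Duke",
    "lastName": "Java",
    "age": 18,
    "city": "JavaTown",
    "phoneNumbers": [
        {"Mobile": "111-111-1111"},
        {"Home": "222-222-2222"}
    ]
}
"""

person = json.loads(text)            # JSON object -> Python dict

print(type(person))                  # <class 'dict'>
print(person["age"] + 1)             # numbers stay numbers: 19
print(type(person["phoneNumbers"]))  # JSON array -> Python list

# Walk the tree: the list holds objects, each with a single key-value pair
for phone in person["phoneNumbers"]:
    for kind, number in phone.items():
        print(kind, number)
```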


# Practical


Steps: 

- Choose the API to work with
- Read the API documentation (this takes the most time)
- Start with a small piece of code, then build in more features.

Using the Python requests package, the code required is minimal (especially compared to other languages such as Java). We will also need to import json and pprint to view the responses in a readable format.

In [3]:
import requests
## Alternatives to requests:
# import urllib.request   (part of the standard library)
# import pycurl
# Postman is also popular for exploring APIs, but it is a standalone app, not a Python package
import json
import pprint
import pandas as pd
#!pip install xmltodict
import xmltodict
from xml.etree import ElementTree
#!pip install pymed
from pymed import PubMed


## Task 1: Find the ISS and who is currently on it
*from https://medium.com/quick-code/absolute-beginners-guide-to-slaying-apis-using-python-7b380dc82236*

In [None]:
request = requests.get('http://api.open-notify.org/iss-now.json')
print(request.status_code)

200


If a request returns status code 200, everything is OK; if it returns 404, the page or resource was not found.

**Status code**
- 200 "OK"	Your request was successful!
- 201 "Created"	Your request was accepted and the resource was created.
- 400 "Bad Request"	Your request is either wrong or missing some information.
- 401 "Unauthorized"	Your request requires some additional permissions.
- 404 "Not Found"	The requested resource does not exist.
- 405 "Method Not Allowed"	The endpoint does not allow for that specific HTTP method.
- 500 "Internal Server Error"	Your request wasn’t expected and probably broke something on the server side.
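Python's standard library already knows these codes: `http.HTTPStatus` maps a number to its standard phrase. The hypothetical helper below also sorts codes into the usual classes (2xx success, 4xx client error, 5xx server error). With requests, `response.ok` is True for any code below 400 and `response.raise_for_status()` raises an exception for 4xx and 5xx responses, which is often simpler than checking codes by hand.

```python
from http import HTTPStatus


def describe_status(code):
    """Return a short description of an HTTP status code and its class."""
    phrase = HTTPStatus(code).phrase
    if 200 <= code < 300:
        category = "success"
    elif 400 <= code < 500:
        category = "client error (check your request)"
    elif 500 <= code < 600:
        category = "server error (problem on the server side)"
    else:
        category = "informational or redirect"
    return f"{code} {phrase}: {category}"


for code in [200, 201, 400, 401, 404, 405, 500]:
    print(describe_status(code))
```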

To see the content which has been returned:

In [None]:
print(request.text)

{"message": "success", "timestamp": 1620309160, "iss_position": {"longitude": "-26.7771", "latitude": "-37.5550"}}


In [None]:
print(request.json())

{'message': 'success', 'timestamp': 1620309160, 'iss_position': {'longitude': '-26.7771', 'latitude': '-37.5550'}}


To get only the latitude and longitude, we can select the "iss_position" key:

In [None]:
print(request.json()['iss_position'])

{'longitude': '-26.7771', 'latitude': '-37.5550'}
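Note that the coordinates come back as strings and the timestamp as a Unix time (seconds since 1 January 1970, UTC). A minimal sketch of converting them, using the response text printed above so it runs without a network call:

```python
import json
from datetime import datetime, timezone

# Response text copied from the output above (normally this would be request.text)
raw = ('{"message": "success", "timestamp": 1620309160, '
       '"iss_position": {"longitude": "-26.7771", "latitude": "-37.5550"}}')

data = json.loads(raw)

# The API returns coordinates as strings, so convert them to numbers first
latitude = float(data["iss_position"]["latitude"])
longitude = float(data["iss_position"]["longitude"])

# The timestamp is a Unix time in seconds; interpret it as UTC
when = datetime.fromtimestamp(data["timestamp"], tz=timezone.utc)

print(f"At {when:%Y-%m-%d %H:%M:%S} UTC the ISS was at lat {latitude}, lon {longitude}")
```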


If we wanted, we could now combine this with a geocoding API to show the position on a map. I haven't done this here as it requires an API key, but keys are publicly available if you want a go as a learning exercise.

For the moment take a look at the documentation here: http://open-notify.org/Open-Notify-API/People-In-Space/ and spend **5-10 mins** trying to work out who is on the ISS right now.

In [None]:
#CODE IN HERE
request_names = requests.get('http://api.open-notify.org/astros.json')
print(request_names.status_code)



200


In [None]:
print(request_names.json())

{'number': 7, 'message': 'success', 'people': [{'name': 'Mark Vande Hei', 'craft': 'ISS'}, {'name': 'Oleg Novitskiy', 'craft': 'ISS'}, {'name': 'Pyotr Dubrov', 'craft': 'ISS'}, {'name': 'Thomas Pesquet', 'craft': 'ISS'}, {'name': 'Megan McArthur', 'craft': 'ISS'}, {'name': 'Shane Kimbrough', 'craft': 'ISS'}, {'name': 'Akihiko Hoshide', 'craft': 'ISS'}]}


In [None]:
print(request_names.json()['people'])

[{'name': 'Mark Vande Hei', 'craft': 'ISS'}, {'name': 'Oleg Novitskiy', 'craft': 'ISS'}, {'name': 'Pyotr Dubrov', 'craft': 'ISS'}, {'name': 'Thomas Pesquet', 'craft': 'ISS'}, {'name': 'Megan McArthur', 'craft': 'ISS'}, {'name': 'Shane Kimbrough', 'craft': 'ISS'}, {'name': 'Akihiko Hoshide', 'craft': 'ISS'}]
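One possible answer to the exercise is to pull out just the names with a list comprehension, optionally filtering on the `craft` field that each person carries. It is shown here on a trimmed copy of the response above so it runs offline; in the notebook you would use `request_names.json()` instead.

```python
# Trimmed copy of the response printed above (normally: request_names.json())
astros = {
    "message": "success",
    "people": [
        {"name": "Mark Vande Hei", "craft": "ISS"},
        {"name": "Oleg Novitskiy", "craft": "ISS"},
        {"name": "Thomas Pesquet", "craft": "ISS"},
    ],
}

# Keep only the people whose craft is the ISS, then take their names
on_iss = [person["name"] for person in astros["people"] if person["craft"] == "ISS"]

print(len(on_iss), "people on the ISS:")
for name in on_iss:
    print("-", name)
```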


NASA has some great APIs, for instance ones that return the Astronomy Picture of the Day or public images from the Mars rovers. Again, these need a free sign-up to get a key before use: https://api.nasa.gov/

## Task 2: Search Stack Overflow Questions

Let's try a slightly more complicated request now. This time we will use the API provided by Stack Exchange (which runs Stack Overflow) to find relevant questions.

The API documentation can be found here: https://api.stackexchange.com/docs

This time, when making the request, we want the response sorted to our preference and perhaps filtered by specific search criteria. This can be done through the URL, which also reduces the amount of data being requested.

The format for this is the same as in any URL with search parameters that you may see (for instance after a Google search or when scanning through a clickbait article).
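Everything after the `?` is the query string: `key=value` pairs joined with `&`. The standard library can build and take apart these strings, which shows that the URL used below is just the three parameters `order`, `sort` and `site`:

```python
from urllib.parse import urlencode, urlparse, parse_qs

base = "http://api.stackexchange.com/2.2/questions"
parameters = {"order": "desc", "sort": "activity", "site": "stackoverflow"}

# Build the query string from a dictionary: pairs joined by '&', values URL-encoded
url = base + "?" + urlencode(parameters)
print(url)

# Go the other way: split a URL back into its query parameters
query = parse_qs(urlparse(url).query)
print(query)  # each value comes back as a list, because a key may repeat
```

Passing `params=parameters` to `requests.get`, as a couple of cells further on, builds the same URL for you, and `response.url` shows the final URL that was requested.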

In [None]:
url = 'http://api.stackexchange.com/2.2/questions?order=desc&sort=activity&site=stackoverflow'
response = requests.get(url)
print(response.status_code)

200


It can be useful at times to see the headers of an API request or response. Headers carry metadata about the exchange rather than the data itself. Here we can see that the response content is JSON (content-type), that it was gzip-compressed (content-encoding), how long it is (content-length), and lots of other bits and bobs.

In [None]:
print(response.headers)

{'cache-control': 'private', 'content-length': '5313', 'content-type': 'application/json; charset=utf-8', 'content-encoding': 'gzip', 'strict-transport-security': 'max-age=15552000', 'access-control-allow-origin': '*', 'access-control-allow-methods': 'GET, POST', 'access-control-allow-credentials': 'false', 'x-content-type-options': 'nosniff', 'x-request-guid': '24cedb39-d330-4fce-9b6e-d2040a87ce58', 'content-security-policy': "upgrade-insecure-requests; frame-ancestors 'self' https://stackexchange.com", 'date': 'Thu, 06 May 2021 14:08:17 GMT'}


A better way to set up the request is to separate the parameters from the URL so they can be easily changed by a user (e.g. through a GUI)

In [None]:
url2 = 'http://api.stackexchange.com/2.2/questions'

parameters = {
    'order':'desc',
    'sort':'activity',
    'site':'stackoverflow',
}

response = requests.get(url2, params=parameters)
print(response.status_code)

200


This should give the same result, which you can check if you want (for example, response.url shows the full URL that requests built from the parameters).

We now want to print out the response and find questions of interest.

In [None]:
print(response.json()['items'])

[{'tags': ['sql', 'sql-server'], 'owner': {'reputation': 1, 'user_id': 15853751, 'user_type': 'registered', 'profile_image': 'https://www.gravatar.com/avatar/de668bfc5e171ee9a35cbb2e11ae8c99?s=128&d=identicon&r=PG&f=1', 'display_name': 'Andreasx23', 'link': 'https://stackoverflow.com/users/15853751/andreasx23'}, 'is_answered': False, 'view_count': 14, 'answer_count': 0, 'score': -1, 'last_activity_date': 1620310148, 'creation_date': 1620309439, 'last_edit_date': 1620310148, 'question_id': 67419641, 'content_license': 'CC BY-SA 4.0', 'link': 'https://stackoverflow.com/questions/67419641/select-userid-and-sum-of-total-amount-of-books-a-user-has-made-on-all-loans', 'title': 'Select userid and sum of total amount of books a user has made on all loans'}, {'tags': ['python', 'django', 'csv'], 'owner': {'reputation': 1, 'user_id': 15853468, 'user_type': 'registered', 'profile_image': 'https://www.gravatar.com/avatar/2968f0b6bd965209ad8d6a458c140eef?s=128&d=identicon&r=PG&f=1', 'display_name':

Only the list of questions (the "items" key) has been printed, and it is hard to read. A nicer way of printing JSON is to use **pprint**.

In [None]:
pprint.pprint(response.json())

You may have expected more questions to be returned than you'll see here. The limited number is due to paging. For Stack Exchange, page starts at (and defaults to) 1, and pagesize can be any value between 0 and 100, defaulting to 30. The Stack Exchange documentation has a section on paging and on how to return the total number of results, but as paging exists to avoid overloading the API, and we don't really need all the results, we'll stick with the defaults.
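If you did need every result, the usual pattern is to keep requesting the next page until the API says there are no more. Stack Exchange responses include a `has_more` flag alongside `items`, so a loop might look like the sketch below. `fetch_page` is a hypothetical stand-in for a function that calls `requests.get(url, params={..., 'page': page, 'pagesize': pagesize}).json()`; it is passed in so the loop can be tried offline, and `max_pages` is a safety cap so a mistake cannot hammer the server.

```python
def fetch_all(fetch_page, pagesize=100, max_pages=5):
    """Collect items page by page until has_more is False or max_pages is reached.

    fetch_page(page, pagesize) must return a dict shaped like a Stack Exchange
    response: {"items": [...], "has_more": True/False}.
    """
    items = []
    for page in range(1, max_pages + 1):  # Stack Exchange pages start at 1
        body = fetch_page(page, pagesize)
        items.extend(body["items"])
        if not body.get("has_more", False):
            break
    return items


# Hypothetical fake API holding 7 questions, so the loop can run offline
FAKE_QUESTIONS = [{"question_id": n} for n in range(1, 8)]


def fake_fetch_page(page, pagesize):
    start = (page - 1) * pagesize
    chunk = FAKE_QUESTIONS[start:start + pagesize]
    return {"items": chunk, "has_more": start + pagesize < len(FAKE_QUESTIONS)}


print(len(fetch_all(fake_fetch_page, pagesize=3)))
```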

APIs will also often limit the rate of requests, or "throttle" them (e.g. a maximum number of requests per second), to avoid abuse or overloading.
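A simple way to stay under a rate limit is to leave a minimum gap between your own requests. The hypothetical `Throttle` class below is a minimal sketch of that idea, not a feature of requests. For example, the NCBI response headers later in this notebook report `X-RateLimit-Limit: 3`, i.e. about three requests per second without an API key.

```python
import time


class Throttle:
    """Ensure at least 1 / max_per_second seconds pass between calls to wait()."""

    def __init__(self, max_per_second):
        self.min_interval = 1.0 / max_per_second
        self.last_call = None

    def wait(self):
        now = time.monotonic()
        if self.last_call is not None:
            remaining = self.min_interval - (now - self.last_call)
            if remaining > 0:
                time.sleep(remaining)  # pause just long enough to respect the limit
        self.last_call = time.monotonic()


throttle = Throttle(max_per_second=3)
start = time.monotonic()
for page in range(1, 4):
    throttle.wait()  # call before every request, e.g. just before requests.get(...)
    print(f"request {page} sent after {time.monotonic() - start:.2f} s")
```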

Use a for loop to run through the items and print only those meeting a certain condition

In [None]:
term = [" r "," R ", "python", "Python","SQL"]

for data in response.json()['items']:
  if any(x in data['title'] for x in term):
    print(data['title'])
    print(data['link'])
    print()

Oracle PLSQL return value when calling function split_string
https://stackoverflow.com/questions/67419314/oracle-plsql-return-value-when-calling-function-split-string
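Notice that the only match was an Oracle PL/SQL question: `"SQL" in title` is a plain substring test, so it also matches inside "PLSQL", and the `" r "` trick misses an R at the very start or end of a title. A sketch of a stricter filter using regular expressions, matching whole words only and ignoring case; the last two titles in the list are made up for illustration.

```python
import re

TERMS = ["r", "python", "sql"]

# \b marks a word boundary, so "sql" no longer matches inside "PLSQL";
# re.IGNORECASE removes the need to list "Python" and "python" separately
PATTERN = re.compile(r"\b(?:" + "|".join(map(re.escape, TERMS)) + r")\b", re.IGNORECASE)


def title_matches(title):
    """Return True if the title contains any of TERMS as a whole word."""
    return PATTERN.search(title) is not None


titles = [
    "Oracle PLSQL return value when calling function split_string",
    "Select userid and sum of total amount of books a user has made on all loans",
    "R ggplot2 legend position",         # hypothetical
    "How to read a CSV file in Python",  # hypothetical
]
for title in titles:
    print(title_matches(title), title)
```

In the notebook loop this would replace the `any(...)` test: `if title_matches(data['title']):`.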



Now spend **5-10 mins** trying to find the "answer with the most votes along with the original question".

In [None]:
# CODE IN HERE
url3 = 'http://api.stackexchange.com/2.2/questions'

parameters = {
    'order':'desc',
    'sort':'votes',
    'site':'stackoverflow',
}

response2 = requests.get(url3, params=parameters)
print(response2.status_code)



200


In [None]:
# Pick the highest-scoring question on this page of results
top = max(response2.json()['items'], key=lambda item: item['score'])

print(top['score'])
print(top['owner']['display_name'])  # the user who asked the question
print(top['question_id'])

25476
GManNickG
11227809


In [None]:
pprint.pprint(response2.json())

If the Stack Overflow example was a bit vanilla for you, have a look at https://thedogapi.com/ or https://thecatapi.com/, which I hear are really good examples of well-documented APIs. They do require a sign-up though, so I haven't used them here for time reasons.

## Task 3: "Post" an update

Extracting data will be the most common use for data users. However, it may also be useful to see how to post data to a server.

For this we need a server to post to. I'll use RequestBin, hosted by Pipedream. Specifically, https://requestbin.com/r/encygohnki5lb (note: this probably won't be available after the initial session, but it's easy to generate your own). Documentation: https://requestbin.com/docs/#examining-requests



In [None]:
url_pipedream = "https://encygohnki5lb.x.pipedream.net/training/AU/"
mydict = {
    'fav_film': 'Sound of Music',
    'fav_scene': 'Opening scene',
}

Post your data to the requestbin

In [None]:
requests.post(url_pipedream, data = mydict)

<Response [200]>
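`data=mydict` sends the dictionary form-encoded (the same `key=value&...` format as a query string), which is how an HTML form submits. Many APIs expect a JSON body instead, which requests sends if you use `json=mydict`, also setting the `Content-Type: application/json` header. The two bodies can be compared offline with the standard library:

```python
import json
from urllib.parse import urlencode

mydict = {
    'fav_film': 'Sound of Music',
    'fav_scene': 'Opening scene',
}

# Essentially what requests.post(url, data=mydict) sends
# (Content-Type: application/x-www-form-urlencoded)
form_body = urlencode(mydict)

# Essentially what requests.post(url, json=mydict) sends (Content-Type: application/json)
json_body = json.dumps(mydict)

print(form_body)
print(json_body)
```

Whichever you choose, check the API documentation for the format it expects.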

We should now be able to see each of the posts in the requestbin log.  Feel free to have a go at "GET"ing the data back again. 

# Note on Building an API

Setting up a simple POST and GET pair is fairly straightforward, but developing a fully functioning API that meets all user and REST requirements is a much larger task.

Roughly we need to:
- create a server, or an app that can run on a server
- define a series of endpoints
- for each endpoint, define the GET, POST, PUT and DELETE functions
  - GET: This usually consists of converting a datasource into a dictionary that can be returned alongside the code "200"
  - POST/PUT: This requires a set of required fields to be defined with a series of if statements to check for duplicate or invalid entries. 
  - DELETE: Required fields and if statements to check the record exists in order to delete it
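The steps above can be sketched without any web framework. The hypothetical handlers below work on an in-memory dictionary standing in for a database table; each returns a `(status_code, body)` pair, roughly what a Flask or Django view would turn into an HTTP response. The field names and the use of 409 Conflict for duplicates are illustrative choices, not requirements.

```python
# In-memory stand-in for a database table, keyed by record id
RECORDS = {}
REQUIRED_FIELDS = {"id", "fav_film"}


def handle_get():
    """GET: return every record as a dictionary, with status 200."""
    return 200, {"data": list(RECORDS.values())}


def handle_post(payload):
    """POST: check required fields and reject duplicates before creating."""
    missing = REQUIRED_FIELDS - set(payload)
    if missing:
        return 400, {"message": f"Missing fields: {sorted(missing)}"}
    if payload["id"] in RECORDS:
        return 409, {"message": f"Record {payload['id']} already exists"}
    RECORDS[payload["id"]] = dict(payload)
    return 201, {"data": RECORDS[payload["id"]]}


def handle_put(payload):
    """PUT: check required fields, then update an existing record."""
    missing = REQUIRED_FIELDS - set(payload)
    if missing:
        return 400, {"message": f"Missing fields: {sorted(missing)}"}
    if payload["id"] not in RECORDS:
        return 404, {"message": f"Record {payload['id']} not found"}
    RECORDS[payload["id"]] = dict(payload)
    return 200, {"data": RECORDS[payload["id"]]}


def handle_delete(record_id):
    """DELETE: check the record exists, then remove it."""
    if record_id not in RECORDS:
        return 404, {"message": f"Record {record_id} not found"}
    del RECORDS[record_id]
    return 200, {"message": f"Record {record_id} deleted"}


print(handle_post({"id": 1, "fav_film": "Sound of Music"}))  # created
print(handle_post({"id": 1, "fav_film": "Sound of Music"}))  # duplicate
print(handle_post({"fav_film": "Mary Poppins"}))             # missing id
print(handle_delete(2))                                      # unknown id
print(handle_get())
```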

The trouble is that usable datasets have many fields with specific conditions that need to be met, and to make a useful API we would need to define a whole series of endpoints. So building an API is perhaps more time-consuming than difficult.

In Python, the most common tools used to create an API are Flask and Django. Here is a good walkthrough of creating an API in Flask: https://towardsdatascience.com/the-right-way-to-build-an-api-with-python-cd08ab285f8f, including the full code: https://gist.github.com/jamescalam/0b309d275999f9df26fa063602753f73

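# Extra: Searching PubMed with NCBI E-utilities

The cells below explore NCBI's E-utilities API (esearch, esummary and elink), both directly with requests and through the pymed and Biopython (Bio.Entrez) packages, to find PubMed and PubMed Central articles with an NHSX affiliation. Responses are XML by default; some endpoints return JSON with retmode=json.
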
In [None]:
# esearch: search PubMed Central (db=pmc) for articles with an NHSX affiliation
urlpm = 'https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi'
parameterspm = {
    'db':'pmc',
    'term':'nhsx[affiliation]',
}

# requests appends the parameters to the URL as ?db=pmc&term=... (URL-encoded)
responsepm = requests.get(urlpm, params=parameterspm)
print(responsepm.status_code)

200


In [None]:
responsepm.headers

{'Date': 'Thu, 06 May 2021 15:13:34 GMT', 'Server': 'Finatra', 'Strict-Transport-Security': 'max-age=31536000; includeSubDomains; preload', 'Content-Security-Policy': 'upgrade-insecure-requests', 'Cache-Control': 'private', 'NCBI-PHID': '939B7C11D2DCB035000045D8C11E7853.1.1.m_1', 'X-RateLimit-Remaining': '2', 'NCBI-SID': '8D52E970B7DB7682_CF9DSID', 'content-encoding': 'gzip', 'X-RateLimit-Limit': '3', 'Access-Control-Allow-Origin': '*', 'Content-Type': 'text/xml; charset=UTF-8', 'Access-Control-Expose-Headers': 'X-RateLimit-Limit,X-RateLimit-Remaining', 'Set-Cookie': 'ncbi_sid=8D52E970B7DB7682_CF9DSID; domain=.nih.gov; path=/; expires=Fri, 06 May 2022 15:13:35 GMT', 'X-UA-Compatible': 'IE=Edge', 'X-XSS-Protection': '1; mode=block', 'Keep-Alive': 'timeout=4, max=40', 'Connection': 'Keep-Alive', 'Transfer-Encoding': 'chunked'}

In [None]:
dict_data = xmltodict.parse(responsepm.content)
print(dict_data)

OrderedDict([('eSearchResult', OrderedDict([('Count', '4'), ('RetMax', '4'), ('RetStart', '0'), ('IdList', OrderedDict([('Id', ['7571731', '7331656', '7575286', '6971955'])])), ('TranslationSet', None), ('TranslationStack', OrderedDict([('TermSet', OrderedDict([('Term', 'nhsx[affiliation]'), ('Field', 'affiliation'), ('Count', '4'), ('Explode', 'N')])), ('OP', 'GROUP')])), ('QueryTranslation', 'nhsx[affiliation]')]))])


In [None]:
tree = ElementTree.fromstring(responsepm.content)
print(tree)

{}
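Printing an ElementTree `Element` does not show its contents; you have to navigate the tree. A sketch of pulling the count and IDs out with ElementTree, using an abbreviated eSearchResult document reconstructed from the xmltodict output above so it runs offline (in the notebook, pass `responsepm.content` instead):

```python
from xml.etree import ElementTree

# Abbreviated eSearchResult, reconstructed from the parsed output above
xml_text = b"""<?xml version="1.0" encoding="UTF-8" ?>
<eSearchResult>
  <Count>4</Count>
  <RetMax>4</RetMax>
  <RetStart>0</RetStart>
  <IdList>
    <Id>7571731</Id>
    <Id>7331656</Id>
    <Id>7575286</Id>
    <Id>6971955</Id>
  </IdList>
</eSearchResult>"""

root = ElementTree.fromstring(xml_text)  # the root element is <eSearchResult>

count = int(root.findtext("Count"))                 # text of the first <Count> child
ids = [el.text for el in root.findall("IdList/Id")]  # every <Id> inside <IdList>

print(root.tag, count, ids)
```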


In [None]:
urlid='https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi?db=pmc&id=3539452&retmode=json'
responseid = requests.get(urlid)
print(responseid.status_code)
print(responseid.json())

200
{'header': {'type': 'esummary', 'version': '0.3'}, 'result': {'uids': ['3539452'], '3539452': {'uid': '3539452', 'pubdate': '2012 Feb 23', 'epubdate': '2012 Feb 23', 'printpubdate': '2013 Feb', 'source': 'Cereb Cortex', 'authors': [{'name': 'Kind PC', 'authtype': 'Author'}, {'name': 'Sengpiel F', 'authtype': 'Author'}, {'name': 'Beaver CJ', 'authtype': 'Author'}, {'name': 'Crocker-Buque A', 'authtype': 'Author'}, {'name': 'Kelly GM', 'authtype': 'Author'}, {'name': 'Matthews RT', 'authtype': 'Author'}, {'name': 'Mitchell DE', 'authtype': 'Author'}], 'title': 'The Development and Activity-Dependent Expression of Aggrecan in the Cat Visual Cortex', 'volume': '23', 'issue': '2', 'pages': '349-360', 'articleids': [{'idtype': 'pmid', 'value': '22368089'}, {'idtype': 'doi', 'value': '10.1093/cercor/bhs015'}, {'idtype': 'pmcid', 'value': 'PMC3539452'}], 'fulljournalname': 'Cerebral Cortex (New York, NY)', 'sortdate': '2012/02/23 00:00', 'pmclivedate': '2014/02/01'}}}



In [None]:
# Adapted from https://stackoverflow.com/questions/57053378/query-pubmed-with-python-how-to-get-all-article-details-from-query-to-pandas-d
pubmed = PubMed(tool="PubMedSearcher", email="myemail@ccc.com")  # use your own email address
results = pubmed.query('nhsx[affiliation]', max_results=500)
articleList = []
articleInfo = []

for article in results:
    # Each result is either a PubMedArticle or a PubMedBookArticle;
    # convert it to a dictionary with its toDict() method
    articleDict = article.toDict()
    articleList.append(articleDict)

# Build a list of dict records holding the article details fetched from the PubMed API
for article in articleList:
    # pubmed_id sometimes holds several IDs separated by newlines; the first is the article's own ID
    pubmedId = article['pubmed_id'].partition('\n')[0]
    # Book articles may lack some of these fields (e.g. keywords), so .get() avoids a KeyError
    articleInfo.append({'pubmed_id': pubmedId,
                        'title': article.get('title'),
                        'keywords': article.get('keywords'),
                        'journal': article.get('journal'),
                        'abstract': article.get('abstract'),
                        'conclusions': article.get('conclusions'),
                        'methods': article.get('methods'),
                        'results': article.get('results'),
                        'copyrights': article.get('copyrights'),
                        'doi': article.get('doi'),
                        'publication_date': article.get('publication_date'),
                        'authors': article.get('authors')})

# Generate a pandas DataFrame from the list of dictionaries
articlesPD = pd.DataFrame.from_dict(articleInfo)
# articlesPD.to_csv('export_dataframe.csv', index=False)  # optionally export to CSV

# Print the first 10 rows of the dataframe, then selected columns
print(articlesPD.head(10))
print(articlesPD['title'])
print(articlesPD['keywords'])
print(articleList)

  pubmed_id  ...                                            authors
0  33094226  ...  [{'lastname': 'Maguire', 'firstname': 'James',...
1  32702587  ...  [{'lastname': 'Morley', 'firstname': 'Jessica'...
2  32672131  ...  [{'lastname': 'Goldacre', 'firstname': 'Ben', ...
3  32616598  ...  [{'lastname': 'Jacob', 'firstname': 'Joseph', ...
4  32010451  ...  [{'lastname': 'Robbins', 'firstname': 'Tim', '...

[5 rows x 12 columns]
0            Digital health - a trainee's perspective.
1    The ethics of AI in health care: A mapping rev...
2    Bringing NHS data analysis into the 21st century.
3    Using imaging to combat a pandemic: rationale ...
4    Supporting early clinical careers in digital h...
Name: title, dtype: object
0                     [QI, Trainee, digital, training]
1    [Artificial intelligence, Ethics, Health polic...
2                                                   []
3                                                   []
4                                              

In [5]:
# The PyPI package that provides "from Bio import ..." is biopython
!pip install biopython
from Bio import Entrez


In [4]:
# Adapted from https://gist.github.com/mcfrank/c1ec74df1427278cbe53
from Bio import Entrez

Entrez.email = "martina.fonseca@nhsx.nhs.uk"  # NCBI asks every user to supply a contact email

def get_abstract(pmid):
    """Return the abstract of a PubMed article as plain text."""
    handle = Entrez.efetch(db='pubmed', id=pmid, retmode='text', rettype='abstract')
    return handle.read()

def get_links_id(pmid):
    """Return the IDs of PubMed articles linked to pmid as similar ("pubmed_pubmed")."""
    links = Entrez.elink(dbfrom="pubmed", id=pmid, linkname="pubmed_pubmed")
    record = Entrez.read(links)
    records = record[0]['LinkSetDb'][0]['Link']
    return [link['Id'] for link in records]

def get_links_term(term):
    """Search PubMed for term and return the parsed search record (up to 1000 IDs)."""
    links = Entrez.esearch(db="pubmed", retmax=1000, term=term)
    return Entrez.read(links)

In [1]:
#similar to https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi?db=pmc&term=eye&tool=my_tool&email=my_email@example.com
myrecord=get_links_term('nhsx[affiliation]')
print(myrecord[u'IdList'])


In [None]:
handle = Entrez.esummary(db='pubmed',id='33094226',retmode="json")
mysummary=handle.read()
print(mysummary)

b'{"header":{"type":"esummary","version":"0.3"},"result":{"uids":["33094226"],"33094226":{"uid":"33094226","pubdate":"2020 Oct","epubdate":"","source":"Future Healthc J","authors":[{"name":"Maguire J","authtype":"Author","clusterid":""}],"lastauthor":"Maguire J","title":"Digital health - a trainee\'s perspective.","sorttitle":"digital health a trainee s perspective","volume":"7","issue":"3","pages":"202-203","lang":["eng"],"nlmuniqueid":"101711246","issn":"2514-6645","essn":"2514-6653","pubtype":["Journal Article"],"recordstatus":"PubMed","pubstatus":"4","articleids":[{"idtype":"pubmed","idtypen":1,"value":"33094226"},{"idtype":"doi","idtypen":3,"value":"10.7861/fhj.dig-2020-trai"},{"idtype":"pii","idtypen":4,"value":"futurehealth"},{"idtype":"pmc","idtypen":8,"value":"PMC7571731"},{"idtype":"rid","idtypen":8,"value":"33094226"},{"idtype":"eid","idtypen":8,"value":"33094226"},{"idtype":"pmcid","idtypen":5,"value":"pmc-id: PMC7571731;"}],"history":[{"pubstatus":"entrez","date":"2020/10/
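`handle.read()` returns raw bytes here, even though they contain JSON. `json.loads` accepts bytes directly, after which the summary can be navigated like any other dictionary: the "result" object is keyed by ID, and "uids" lists the IDs returned. A sketch using a shortened copy of the output above (in the notebook, pass `mysummary`):

```python
import json

# Shortened copy of the bytes printed above (normally: mysummary = handle.read())
mysummary = (b'{"header":{"type":"esummary","version":"0.3"},'
             b'"result":{"uids":["33094226"],"33094226":{"uid":"33094226",'
             b'"pubdate":"2020 Oct","source":"Future Healthc J",'
             b'"authors":[{"name":"Maguire J","authtype":"Author","clusterid":""}],'
             b'"title":"Digital health - a trainee\'s perspective."}}}')

summary = json.loads(mysummary)  # json.loads accepts bytes as well as str

# Results are keyed by ID; "uids" lists the IDs that were returned
for uid in summary["result"]["uids"]:
    article = summary["result"][uid]
    authors = ", ".join(a["name"] for a in article["authors"])
    print(article["pubdate"], article["source"], "|", article["title"], "|", authors)
```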

In [None]:
print(get_links_id("33094226"))

['33094226', '21781388', '29724664', '21210246', '29400273', '26129814', '33582730', '29297319', '31456090', '32417926', '32986342', '24475764', '22297784', '27191838', '33076893', '24294678', '24052551', '30394272', '26447007', '28891661', '17538823', '32934977', '23988632', '31133906', '15618086', '28952671', '12430196', '11328522', '28110854', '27836238', '26734408', '30983476', '20236792', '22101103', '12965165', '31182093', '23672468', '29553669', '28549464', '29792259', '12835875', '27606391', '14741793', '29653248', '11904253', '29334953', '15659905', '29062535', '25901798', '23342404', '32628374', '29902096', '30979635', '10554729', '30800886', '29304729', '24807942', '20128719', '9479334', '25073019', '22139309', '27606388', '31429410', '29174856', '19686251', '31304327', '32965233', '27506900', '24988421', '29313497', '25723379', '24720058', '25833386', '21549984', '30270102', '19473852', '31763205', '24797842', '27532314', '33687289', '12028088', '31038467', '7655808', '2316

In [None]:
links = Entrez.elink(dbfrom="pubmed", id="21876726", linkname="pubmed_pmc_refs")
print(links.read())

b'<?xml version="1.0" encoding="UTF-8" ?>\n<!DOCTYPE eLinkResult PUBLIC "-//NLM//DTD elink 20101123//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20101123/elink.dtd">\n<eLinkResult>\n\n  <LinkSet>\n    <DbFrom>pubmed</DbFrom>\n    <IdList>\n      <Id>21876726</Id>\n    </IdList>\n    <LinkSetDb>\n      <DbTo>pmc</DbTo>\n      <LinkName>pubmed_pmc_refs</LinkName>\n      \n        <Link>\n\t\t\t\t<Id>8071896</Id>\n\t\t\t</Link>\n        <Link>\n\t\t\t\t<Id>8025741</Id>\n\t\t\t</Link>\n        <Link>\n\t\t\t\t<Id>8023533</Id>\n\t\t\t</Link>\n        <Link>\n\t\t\t\t<Id>7998656</Id>\n\t\t\t</Link>\n        <Link>\n\t\t\t\t<Id>7994852</Id>\n\t\t\t</Link>\n        <Link>\n\t\t\t\t<Id>7958126</Id>\n\t\t\t</Link>\n        <Link>\n\t\t\t\t<Id>7944561</Id>\n\t\t\t</Link>\n        <Link>\n\t\t\t\t<Id>7944388</Id>\n\t\t\t</Link>\n        <Link>\n\t\t\t\t<Id>7906989</Id>\n\t\t\t</Link>\n        <Link>\n\t\t\t\t<Id>7905690</Id>\n\t\t\t</Link>\n        <Link>\n\t\t\t\t<Id>7901554</Id>\n\t\t\t</Lin

In [None]:
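# The links handle above has already been read, and reading it again returns nothing,
# so request the XML again before parsing it into a dictionary
dict_data = xmltodict.parse(Entrez.elink(dbfrom="pubmed", id="21876726", linkname="pubmed_pmc_refs").read())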

In [None]:
handle = Entrez.einfo()
result = handle.read()
handle.close()
print(result)

b'<?xml version="1.0" encoding="UTF-8" ?>\n<!DOCTYPE eInfoResult PUBLIC "-//NLM//DTD einfo 20190110//EN" "https://eutils.ncbi.nlm.nih.gov/eutils/dtd/20190110/einfo.dtd">\n<eInfoResult>\n<DbList>\n\n\t<DbName>pubmed</DbName>\n\t<DbName>protein</DbName>\n\t<DbName>nuccore</DbName>\n\t<DbName>ipg</DbName>\n\t<DbName>nucleotide</DbName>\n\t<DbName>structure</DbName>\n\t<DbName>genome</DbName>\n\t<DbName>annotinfo</DbName>\n\t<DbName>assembly</DbName>\n\t<DbName>bioproject</DbName>\n\t<DbName>biosample</DbName>\n\t<DbName>blastdbinfo</DbName>\n\t<DbName>books</DbName>\n\t<DbName>cdd</DbName>\n\t<DbName>clinvar</DbName>\n\t<DbName>gap</DbName>\n\t<DbName>gapplus</DbName>\n\t<DbName>grasp</DbName>\n\t<DbName>dbvar</DbName>\n\t<DbName>gene</DbName>\n\t<DbName>gds</DbName>\n\t<DbName>geoprofiles</DbName>\n\t<DbName>homologene</DbName>\n\t<DbName>medgen</DbName>\n\t<DbName>mesh</DbName>\n\t<DbName>ncbisearch</DbName>\n\t<DbName>nlmcatalog</DbName>\n\t<DbName>omim</DbName>\n\t<DbName>orgtrack</Db

In [None]:
# https://biopython.readthedocs.io/en/latest/chapter_entrez.html
from Bio import Entrez
handle = Entrez.einfo()
record = Entrez.read(handle)
print(record)

{'DbList': ['pubmed', 'protein', 'nuccore', 'ipg', 'nucleotide', 'structure', 'genome', 'annotinfo', 'assembly', 'bioproject', 'biosample', 'blastdbinfo', 'books', 'cdd', 'clinvar', 'gap', 'gapplus', 'grasp', 'dbvar', 'gene', 'gds', 'geoprofiles', 'homologene', 'medgen', 'mesh', 'ncbisearch', 'nlmcatalog', 'omim', 'orgtrack', 'pmc', 'popset', 'proteinclusters', 'pcassay', 'protfam', 'biosystems', 'pccompound', 'pcsubstance', 'seqannot', 'snp', 'sra', 'taxonomy', 'biocollections', 'gtr']}


In [None]:
record.keys()

dict_keys(['DbList'])

In [None]:
record["DbList"]

['pubmed', 'protein', 'nuccore', 'ipg', 'nucleotide', 'structure', 'genome', 'annotinfo', 'assembly', 'bioproject', 'biosample', 'blastdbinfo', 'books', 'cdd', 'clinvar', 'gap', 'gapplus', 'grasp', 'dbvar', 'gene', 'gds', 'geoprofiles', 'homologene', 'medgen', 'mesh', 'ncbisearch', 'nlmcatalog', 'omim', 'orgtrack', 'pmc', 'popset', 'proteinclusters', 'pcassay', 'protfam', 'biosystems', 'pccompound', 'pcsubstance', 'seqannot', 'snp', 'sra', 'taxonomy', 'biocollections', 'gtr']

In [None]:
handle = Entrez.esearch(db="pubmed", term="nhsx[affiliation]")
record = Entrez.read(handle)
# record is dict-like: record["Count"] holds the number of hits, record["IdList"] their PubMed IDs

for pmid in record["IdList"]:
    print(pmid)

33094226
32702587
32672131
32616598
32010451


In [None]:
#https://biopython.readthedocs.io/en/latest/chapter_entrez.html#sec-elink-citations
from Bio import Entrez
Entrez.email = "martina.fonseca@nhsx.nhs.uk"  # Always tell NCBI who you are
pmid = "32672131"
results = Entrez.read(Entrez.elink(dbfrom="pubmed", db="pmc",LinkName="pubmed_pmc_refs", id=pmid))
pmc_ids = [cell["Id"] for cell in results[0]["LinkSetDb"][0]["Link"]]
print(pmc_ids)
print(results[0]['LinkSetDb'][0]["Link"])
print(len(pmc_ids))

['7754812']
[{'Id': '7754812'}]
1


In [None]:
results2 = Entrez.read(Entrez.elink(dbfrom="pmc", db="pubmed", LinkName="pmc_pubmed",id=",".join(pmc_ids)))
pubmed_ids = [link["Id"] for link in results2[0]["LinkSetDb"][0]["Link"]]
print(pubmed_ids)
print(len(pubmed_ids))

['33054587']
1
