# Requests and bokeh

We will combine last week's `requests` material with this week's bokeh code: fetch data from the EMDB, load it into a dataframe, then make an interactive plot with bokeh.

In [1]:
from io import StringIO

import numpy as np
import pandas as pd
import requests

from bokeh.plotting import figure
from bokeh.io import output_notebook, show
from bokeh.palettes import Spectral3

output_notebook()

In [2]:
search_term = '*'
criteria = 'AND structure_determination_method:"singleparticle" AND overall_molecular_weight:{1 TO 150000]&fl=emdb_id,title,structure_determination_method,resolution,resolution_method,fitted_pdbs,current_status,deposition_date,map_release_date,primary_citation_author_string,primary_citation_title,xref_DOI,xref_PUBMED,primary_citation_year,primary_citation_journal_name,sample_info_string,microscope_name,illumination_mode,imaging_mode,electron_source,specimen_holder_name,segmentation_filename,slice_filename,additional_map_filename,half_map_filename,software,overall_molecular_weight,xref_UNIPROTKB,xref_CPX,xref_EMPIAR,xref_PFAM,xref_CATH,xref_GO,xref_INTERPRO,xref_CHEBI,xref_CHEMBL,xref_DRUGBANK,xref_PDBEKB,xref_ALPHAFOLD'
emdb_search = f'https://www.ebi.ac.uk/emdb/api/search/{search_term}{criteria}'
r = requests.get(emdb_search, headers={'accept': 'text/csv'})
r.raise_for_status()  # fail early if the request did not succeed
df = pd.read_csv(StringIO(r.text))
df.iloc[:2]

Unnamed: 0,emdb_id,title,structure_determination_method,resolution,resolution_method,fitted_pdbs,current_status,deposition_date,map_release_date,primary_citation_author_string,...,xref_CATH,xref_GO,xref_INTERPRO,xref_CHEBI,xref_CHEMBL,xref_DRUGBANK,xref_PDBEKB,xref_ALPHAFOLD,AND,database
0,EMD-27388,Cryo-EM structure of the zebrafish two pore do...,singleParticle,3.4,FSC 0.143 CUT-OFF,8de9,REL,2022-06-20T00:00:00Z,2023-03-08T00:00:00Z,"Schmidpeter PAM, Petroff 2nd JT, Khajoueinejad...",...,,,,29103,,DB01345,,,,
1,EMD-31627,Cryo EM structure of lysosomal ATPase,singleParticle,3.6,FSC 0.143 CUT-OFF,7fjq,REL,2021-08-04T00:00:00Z,2023-03-08T00:00:00Z,"Chen X, Zhou M, Zhang S, Yin J, Zhang P, Xuan ...",...,,,,15746,CHEMBL23194,DB00127,Q9NQ11,Q9NQ11,,


In [5]:
p = figure(width=670, height=400,
           title="Resolution distribution in subset of the EMDB")

bins = np.linspace(0, 25, num=51)

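# British spelling is not very popular online (bokeh's arguments are spelled "color"), you have been warned!
# Spectral3 is a list of three colours, so each keyword gets its own colour.
for keyword, colour in zip(["human", "cryo", "coli"], Spectral3):
    # Case-sensitive substring match on the title; missing titles count as no match.
    df_subset = df[df['title'].str.contains(keyword, regex=False, na=False)]
    res_data = df_subset['resolution'].dropna()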
    hist, edges = np.histogram(res_data, density=False, bins=bins)
    p.quad(top=hist, bottom=0, left=edges[:-1], right=edges[1:],
           fill_color=colour, line_color=colour, alpha=0.5,
           legend_label=f"Subset matching '{keyword}'")

# Some simple magic: this lets us click on legend entries to mute (fade) them!
# Use "hide" instead of "mute" to hide them completely.
p.legend.click_policy = "mute"

show(p)
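
The histogram cell above passes `edges[:-1]` and `edges[1:]` to `p.quad` because `np.histogram` returns one more bin edge than it returns counts. A minimal sketch of that relationship, using made-up values rather than EMDB data:

```python
import numpy as np

# Illustrative values only, not real resolutions from the EMDB.
values = np.array([1.2, 2.8, 3.1, 3.4, 7.9])
bins = np.linspace(0, 10, num=6)  # 6 edges define 5 bins of width 2

hist, edges = np.histogram(values, bins=bins)

# Each bar spans from one edge to the next, so the left sides are all
# edges except the last, and the right sides are all edges except the first.
left, right = edges[:-1], edges[1:]

print(hist)   # counts per bin
print(left)   # left side of each bar
print(right)  # right side of each bar
```

The same logic explains `num=51` in the cell above: 51 edges between 0 and 25 give 50 bins, each 0.5 wide.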

### Practical

This code can be adapted relatively easily to other plot types and to data from other APIs. In the practical, you are asked to make a plot using data from UniProt. Start with the line below and adapt the code above (I recommend making a copy of this notebook, or at least creating more cells below).

In [240]:
r = requests.get("https://rest.uniprot.org/uniprotkb/search?query=muscle&format=tsv&fields=accession,id,organism_name,mass,length&size=100")
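
As a possible first step for the practical, the response above asks for `format=tsv`, so its text should be tab-separated and can be loaded much like the EMDB CSV was. The sketch below assumes that; the helper name `tsv_to_dataframe` is made up for illustration and is not part of pandas or requests.

```python
from io import StringIO

import pandas as pd


def tsv_to_dataframe(text):
    """Parse tab-separated text into a DataFrame.

    Hypothetical helper for this notebook: same idea as the EMDB cell,
    but with a tab separator instead of the default comma.
    """
    return pd.read_csv(StringIO(text), sep="\t")
```

With the request above, this could be used as `df_uniprot = tsv_to_dataframe(r.text)`. Check `df_uniprot.columns` before plotting, since the column names in the response may not match the field names used in the URL.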