# Find-A-Bug API Demo

In this demo, I showcase some of the features of the Find-A-Bug API. Before using the API, install it (see the installation guide in the `documentation.ipynb` file) and make sure the package is located in the working directory. To use the API, you need to be on campus Wi-Fi or connected through a VPN.

In [1]:
# Import the API.
import fabapi

First, we can use the `mode` function to figure out which KO groups occur most frequently; no filters are specified in this query, so it scans the entire database.

In [51]:
result = fabapi.mode('ko', as_df=False)

GET http://microbes.gps.caltech.edu:8000/mode/ko/


In [54]:
print(result.text)

6 results in 297.2759678410366 seconds

SELECT gtdb_r207_annotations_kegg.ko, count(gtdb_r207_annotations_kegg.ko) AS frequency 
FROM gtdb_r207_annotations_kegg GROUP BY gtdb_r207_annotations_kegg.ko ORDER BY frequency DESC
 LIMIT :param_1

--------------------
ko,frequency
K03088,557050
K02014,331385
K02004,329854
K03406,306925
K01990,283772



Each of these KO groups represents proteins in one of the following categories. (*How might the frequency of different KO groups change if we only looked at archaea, or only at bacteria?*)

1. K03088: RNA polymerase sigma-70 factor, ECF subfamily
2. K02014: iron complex outermembrane recepter protein
3. K02004: putative ABC transport system permease protein
4. K03406: methyl-accepting chemotaxis protein
5. K01990: ABC-2 type transport system ATP-binding protein

In [48]:
archaea_result = fabapi.mode('ko', where={'ncbi_domain':'d__Archaea'}, as_df=False)

KeyboardInterrupt: 

In [None]:
bacteria_result = fabapi.mode('ko', where={'ncbi_domain':'d__Bacteria'}, as_df=False)

For this demo, I will focus on bacteria in the genus *Rickettsia*. The first question I had was: How do common KO groups within this genus compare to the overall most common KO groups? To investigate this, we can first obtain the following information from the database, specifically for organisms in the genus *Rickettsia*: `ko`, `gene_name`, `genome_id`, `ncbi_genus`. This is achieved using the query below. 

In [16]:
result = fabapi.get(['ko', 'gene_name', 'genome_id', 'ncbi_genus'], where={'ncbi_genus':'g__Rickettsia'}, as_df=False)

GET http://microbes.gps.caltech.edu:8000/ko+gene_name+genome_id+ncbi_genus/ncbi_genus=eq;g__Rickettsia


In [None]:
# Convert the text of the requests.Response object into a pandas DataFrame using the to_df function.
df = fabapi.to_df(result.text)

In [22]:
rickettsia_mode = df['ko'].mode()[0]
rickettsia_mode_count = len(df[df['ko'] == rickettsia_mode])
print(f'The most frequent KO group in the genus Rickettsia is {rickettsia_mode}. It occurs {rickettsia_mode_count} times.')

The most frequent KO group in the genus Rickettsia is K07498. It occurs 143 times.
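`Series.mode` reports only the single most frequent value. To see the runners-up as well, `value_counts` ranks every KO group at once. The sketch below runs on a toy DataFrame; its rows are made-up stand-ins, not records from the database, and `toy_df` is a name I introduce for illustration.

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by the Rickettsia query.
# These rows are illustrative only, not real database records.
toy_df = pd.DataFrame({
    'ko': ['K07498', 'K07498', 'K03088', 'K07498', 'K02014', 'K03088'],
    'gene_name': [f'gene_{i}' for i in range(6)],
})

# value_counts sorts KO groups by descending frequency.
ko_counts = toy_df['ko'].value_counts()
top_ko = ko_counts.index[0]

print(ko_counts.head(3))
print(f'Most frequent: {top_ko} ({ko_counts.iloc[0]} occurrences)')
```

Applied to the real `df`, `df['ko'].value_counts().head(5)` would give a top-five list that can be compared directly against the database-wide list above.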


After looking into [this KO group](https://www.genome.jp/dbget-bin/www_bget?ko:K07498), I found that it refers to a class of proteins called "putative transposases". A transposase is any of a class of enzymes capable of binding to the end of a transposon and catalysing its movement to another part of a genome. (*Why would transposases be highly represented in Rickettsia? Is it just that these are the only annotated proteins, and that Rickettsia is not well-characterized?*)

In [49]:
result = fabapi.count('ko', where={'ko':'K07498'}, as_df=False)

GET http://microbes.gps.caltech.edu:8000/count/ko/ko=eq;K07498


In [24]:
print(result.text)

2 results in 274.5064634149894 seconds

----------
ko
12762
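Putting the two counts side by side gives a rough sense of how much of the database-wide K07498 total comes from *Rickettsia*. The numbers below are copied from the outputs in this demo; the variable names are my own, and the figure is only approximate if the *Rickettsia* DataFrame contains repeated rows.

```python
# Counts copied from the outputs in this demo.
rickettsia_count = 143   # K07498 rows in the Rickettsia DataFrame
database_count = 12762   # K07498 rows across the whole database

# Fraction of all K07498 annotations that fall within Rickettsia.
share = rickettsia_count / database_count
print(f'Rickettsia accounts for {share:.2%} of K07498 annotations')
```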



In [26]:
gene_names = df[df['ko'] == rickettsia_mode]['gene_name']
gene_names.head()

15      NZ_JAFEMD010000105.1_1
241     NZ_JAFEMD010000126.1_1
318     NZ_JAFEMD010000131.1_2
323     NZ_JAFEMD010000135.1_3
817    NZ_JAFEMD010000019.1_23
Name: gene_name, dtype: object

In [33]:
sequence_df = fabapi.get(['gene_name','sequence'], where={'gene_name':list(gene_names)[:10]}, verbose=False)

In [None]:
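# Iterate over the returned sequences directly. Indexing by len(gene_names)
# would run past the end of sequence_df, which only covers the first 10 gene names.
for sequence in sequence_df['sequence']:
    print(sequence)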

Interestingly, searching these amino acid sequences on BLAST does not seem to give any hits for bacteria in the genus *Rickettsia*. The closest match for the first sequence, for example, is an IS6 family transposase belonging to *Microvirga ossetica*. This agrees with the [result](https://www.ncbi.nlm.nih.gov/nuccore/NZ_JAFEMD010000126.1) of searching the `gene_name` in the NCBI nuccore browser.

In [41]:
result = fabapi.get(['gene_name', 'genome_id'], where={'gene_name':list(gene_names)[:10]}, as_df=False)

GET http://microbes.gps.caltech.edu:8000/gene_name+genome_id/gene_name=eq;NZ_JAFEMD010000105.1_1+gene_name=eq;NZ_JAFEMD010000126.1_1+gene_name=eq;NZ_JAFEMD010000131.1_2+gene_name=eq;NZ_JAFEMD010000135.1_3+gene_name=eq;NZ_JAFEMD010000019.1_23+gene_name=eq;NZ_JAFEMD010000019.1_45+gene_name=eq;NZ_JAFEMD010000202.1_1+gene_name=eq;NZ_JAFEMD010000224.1_1+gene_name=eq;NZ_JAFEMD010000024.1_92+gene_name=eq;NZ_JAFEMD010000026.1_10
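The logged requests suggest a simple URL pattern: the requested fields joined by `+`, a `/`, then one `field=eq;value` filter per value, also joined by `+`. The sketch below reproduces that pattern. `build_query_url` is a hypothetical helper inferred only from the URLs printed in this demo; it is not part of `fabapi` and may not match how the package actually builds its requests.

```python
# Hypothetical helper: NOT part of fabapi. It only mimics the URL pattern
# seen in the logged GET requests in this demo.
BASE_URL = 'http://microbes.gps.caltech.edu:8000'


def build_query_url(fields, where):
    """Join fields with '+', and write each filter as 'field=eq;value'."""
    field_part = '+'.join(fields)
    filters = []
    for field, values in where.items():
        # A list of values becomes one 'eq' filter per value.
        if not isinstance(values, (list, tuple)):
            values = [values]
        filters.extend(f'{field}=eq;{value}' for value in values)
    return f'{BASE_URL}/{field_part}/' + '+'.join(filters)


url = build_query_url(
    ['gene_name', 'genome_id'],
    {'gene_name': ['NZ_JAFEMD010000105.1_1', 'NZ_JAFEMD010000126.1_1']},
)
print(url)
```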


In [43]:
print(result.text)

14 results in 346.4149383539334 seconds

----------
gene_name,genome_id
NZ_JAFEMD010000105.1_1,RS_GCF_016892765.1
NZ_JAFEMD010000126.1_1,RS_GCF_016892765.1
NZ_JAFEMD010000131.1_2,RS_GCF_016892765.1
NZ_JAFEMD010000135.1_3,RS_GCF_016892765.1
NZ_JAFEMD010000019.1_23,RS_GCF_016892765.1
NZ_JAFEMD010000019.1_45,RS_GCF_016892765.1
NZ_JAFEMD010000019.1_45,RS_GCF_016892765.1
NZ_JAFEMD010000202.1_1,RS_GCF_016892765.1
NZ_JAFEMD010000202.1_1,RS_GCF_016892765.1
NZ_JAFEMD010000224.1_1,RS_GCF_016892765.1
NZ_JAFEMD010000224.1_1,RS_GCF_016892765.1
NZ_JAFEMD010000024.1_92,RS_GCF_016892765.1
NZ_JAFEMD010000026.1_10,RS_GCF_016892765.1
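The response above lists several genes twice (for example `NZ_JAFEMD010000019.1_45`), and every gene maps to the same genome, `RS_GCF_016892765.1`. The cause of the repeated rows is not established here. A minimal sketch of collapsing them with pandas, using a few rows copied from that output:

```python
import io

import pandas as pd

# A few rows copied from the response above, including the repeated ones.
csv_text = """gene_name,genome_id
NZ_JAFEMD010000019.1_23,RS_GCF_016892765.1
NZ_JAFEMD010000019.1_45,RS_GCF_016892765.1
NZ_JAFEMD010000019.1_45,RS_GCF_016892765.1
NZ_JAFEMD010000202.1_1,RS_GCF_016892765.1
NZ_JAFEMD010000202.1_1,RS_GCF_016892765.1
"""

rows = pd.read_csv(io.StringIO(csv_text))

# Drop rows that are identical in every column, then list the genomes left.
unique_rows = rows.drop_duplicates().reset_index(drop=True)
genome_ids = unique_rows['genome_id'].unique()

print(f'{len(rows)} rows -> {len(unique_rows)} unique rows')
print(f'Genomes represented: {list(genome_ids)}')
```

If the same repeats are present in the *Rickettsia* DataFrame, deduplicating before counting would lower the per-KO totals reported earlier, so it is worth checking before comparing frequencies.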

