# Radhika Mardikar, Xinxin Mo

## Step 1:
Using the KEGG pathway maps, we select four enzymes each from glycolysis, the TCA cycle, and the pentose phosphate pathway.
Glycolysis: hexokinase 1, phosphoglucose isomerase, phosphofructokinase, fructose-bisphosphate aldolase (https://www.ebi.ac.uk/interpro/potm/2004_2/Page2.htm)
TCA: citrate synthase, aconitase, isocitrate dehydrogenase, alpha-ketoglutarate dehydrogenase (https://www.news-medical.net/life-sciences/Krebs-Cycle-Enzymes.aspx)
Pentose phosphate: transketolase, transaldolase, lactonase, phosphopentose isomerase (https://mcb.berkeley.edu/labs/krantz/mcb102/lect_S2008/MCB102-SPRING2008-LECTURE5-PENTOSE.pdf)
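
One way to keep the pathway label attached to each search is to store the EC numbers in a dictionary keyed by pathway and build the Entrez query strings from it. The sketch below is only an illustration: the names `PATHWAY_EC`, `ORGANISMS`, and `build_queries` are hypothetical, and the grouping of EC numbers by pathway simply follows the comment in the code cell of this notebook rather than a checked KEGG assignment.

```python
# Grouping taken from the comment in the notebook's enzyme list (an assumption
# of this sketch, not a verified pathway assignment).
PATHWAY_EC = {
    "glycolysis": ["2.7.1.1", "5.4.2.2", "3.1.3.11", "3.1.3.9"],
    "pentose phosphate": ["5.3.1.9", "3.1.3.11", "2.7.1.11", "1.1.5.9"],
    "TCA": ["1.2.7.1", "1.2.4.1", "1.1.1.27", "1.1.1.37"],
}
ORGANISMS = ["Homo sapiens", "Drosophila melanogaster", "Escherichia coli"]


def build_queries(pathway_ec, organisms):
    """Return one (organism, pathway, ec, term) tuple per organism/EC pair."""
    queries = []
    for org in organisms:
        for pathway, ec_numbers in pathway_ec.items():
            for ec in ec_numbers:
                # Same term format as the notebook: organism filter + EC number.
                queries.append((org, pathway, ec, f"{org}[ORGN]{ec}"))
    return queries


queries = build_queries(PATHWAY_EC, ORGANISMS)
print(len(queries))   # 3 organisms x 12 EC numbers
print(queries[0][3])  # first search term
```

Each tuple's last element can be passed as the `term` of a search, while the pathway name stays available for later bookkeeping.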

In [74]:
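from Bio import Entrez
from Bio import SeqIO
import sqlite3
Entrez.email = "rmardikar@berkeley.edu"
# EC numbers to query. First row: glycolysis, second: pentose phosphate
# pathway, third: TCA cycle. 3.1.3.11 is listed in both of the first two
# rows, so it is searched twice per organism.
enzymelist = ["2.7.1.1", "5.4.2.2", "3.1.3.11", "3.1.3.9",
              "5.3.1.9", "3.1.3.11", "2.7.1.11", "1.1.5.9",
              "1.2.7.1", "1.2.4.1", "1.1.1.27", "1.1.1.37"]
organismlist = ['Homo sapiens', 'Drosophila melanogaster', "Escherichia coli"]
idlist = []

# Search the nucleotide database for the top hit of each organism/enzyme pair.
for org in organismlist:
    for enzyme in enzymelist:
        term = org + '[ORGN]' + enzyme
        handle = Entrez.esearch(db="nucleotide",
                                term=term,
                                idtype='acc',
                                sort='relevance',
                                retmax=1)
        print(term)
        record = Entrez.read(handle)
        handle.close()
        # A search can return no hits; extend() skips those, so efetch is
        # never called with an empty id ("Supplied id parameter is empty.").
        idlist.extend(record["IdList"])

# Fetch each hit as FASTA. Whole-genome contigs are large, so printing every
# record in full can exceed the notebook's IOPub data rate limit.
for i in idlist:
    handle = Entrez.efetch(db="nucleotide", id=i, rettype='fasta', retmode='text')
    print(handle.read())
    handle.close()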

Homo sapiens[ORGN]2.7.1.1
Homo sapiens[ORGN]5.4.2.2
Homo sapiens[ORGN]3.1.3.11
Homo sapiens[ORGN]3.1.3.9
Homo sapiens[ORGN]5.3.1.9
Homo sapiens[ORGN]3.1.3.11
Homo sapiens[ORGN]2.7.1.11
Homo sapiens[ORGN]1.1.5.9
Homo sapiens[ORGN]1.2.7.1
Homo sapiens[ORGN]1.2.4.1
Homo sapiens[ORGN]1.1.1.27
Homo sapiens[ORGN]1.1.1.37
Drosophila melanogaster[ORGN]2.7.1.1
Drosophila melanogaster[ORGN]5.4.2.2
Drosophila melanogaster[ORGN]3.1.3.11
Drosophila melanogaster[ORGN]3.1.3.9
Drosophila melanogaster[ORGN]5.3.1.9
Drosophila melanogaster[ORGN]3.1.3.11
Drosophila melanogaster[ORGN]2.7.1.11
Drosophila melanogaster[ORGN]1.1.5.9
Drosophila melanogaster[ORGN]1.2.7.1
Drosophila melanogaster[ORGN]1.2.4.1
Drosophila melanogaster[ORGN]1.1.1.27
Drosophila melanogaster[ORGN]1.1.1.37
Escherichia coli[ORGN]2.7.1.1
Escherichia coli[ORGN]5.4.2.2
Escherichia coli[ORGN]3.1.3.11
Escherichia coli[ORGN]3.1.3.9
Escherichia coli[ORGN]5.3.1.9
Escherichia coli[ORGN]3.1.3.11
Escherichia coli[ORGN]2.7.1.11
Escherichia coli[ORGN

>AK303771.1 Homo sapiens cDNA FLJ51443 complete cds, highly similar to Glucose-6-phosphatase (EC 3.1.3.9)
ATAGCAGAGCAATCACCACCAAGCCTGGAATAACTGCAAGGGCTCTGCTGACATCTTCCTGAGGTGCCAA
GGAAATGAGGATGGAGGAAGGAATGAATGTTCTCCATGACTTTGGGATCCAGTCAACACATTACCTCCAG
GTGAATTACCAAGACTCCCAGGACTGGTTCATCTTGGTGTCCGTGATCGCAGACCTCAGGAATGCCTTCT
ACGTCCTCTTCCCCATCTGGTTCCATCTTCAGGAAGCTGTGGGCATTAAACTCCTTTGGGTAGCTGTGAT
TGGAGACTGGCTCAACCTCGTCTTTAAGTGGATTCTCTTTGGACAGCGTCCATACTGGTGGGTTTTGGAT
ACTGACTACTACAGCAACACTTCCGTGCCCCTGATAAAGCAGTTCCCTGTAACCCGTGAGACTGGACCAG
GGAAAGATAAAGCCGACCTACAGATTTCGGTGCTTGAATGTCATTTTGTGGTTGGGATTCTGGGCTGTGC
AGCTGAATGTCTGTCTGTCACGAATCTACCTTGCTGCTCATTTTCCTCATCAAGTTGTTGCTGGAGTCCT
GTCAGGCATTGCTGTTGCAGAAACTTTCAGCCACATCCACAGCATCTATAATGCCAGCCTCAAGAAATAT
TTTCTCATTACCTTCTTCCTGTTCAGCTTCGCCATCGGATTTTATCTGCTGCTCAAGGGACTGGGTGTAG
ACCTCCTGTGGACTCTGGAGAAAGCCCAGAGGTGGTGCGAGCAGCCAGAATGGGTCCACATTGACACCAC
ACCCTTTGCCAGCCTCCTCAAGAACCTGGGCACGCTCTTTGGCCTGGGGCTGGCTCTCAACTCCAGCATG
TACAGGGAGAGCTGCAAGGGGAAACTCAGCAAGTGGCTCCCA

>AK298834.1 Homo sapiens cDNA FLJ55792 complete cds, highly similar to L-lactate dehydrogenase A chain (EC 1.1.1.27)
AAGTTAATGGCTTTTCTGCACGTATCTCTGGTGTTTACTTGAGAAGCCTGGCTGTGTCCTTGCTGTAGGA
GCCGGAGTAGCTCAGAGTGATCTTGTCTGAGGAAAGGCCAGCCCCACTTGGGGTTAATAAACCGCGATGG
GTGAACCCTCAGGAGGCTATACTTACACCCAAACGTCGATATTCCTTTTCCACGCTAAGATTCCTTTTGG
TTCCAAGTCCAATATGGCAACTCTAAAGGATCAGCTGATTTATAATCTTCTAAAGGAAGAACAGACCCCC
CAGAATAAGATTACAGTTGTTGGGGTTGGTGCTGTTGGCATGGCCTGTGCCATCAGTATCTTAATGAAGG
ACTTGGCAGATGAACTTGCTCTTGTTGATGTCATCGAAGACAAATTGAAGGGAGAGATGATGGATCTCCA
ACATGGCAGCCTTTTCCTTAGAACACCAAAGATTGTCTCTGGCAAAGACTATAATGTAACTGCAAACTCC
AAGCTGGTCATTATCACGGCTGGGGCACGTCAGCAAGAGGGAGAAAGCCGTCTTAATTTGGTCCAGCGTA
ACGTGAACATCTTTAAATTCATCATTCCTAATGTTGTAAAATACAGCCCAAACTGCAAGTTGCTTATTGT
TTCAAATCCAGTGGATATCTTGACCTACGTGGCTTGGAAGATAAGTGGTTTTCCCAAAAACCGTGTTATT
GGAAGTGGTTGCAATCTGGATTCAGCCCGATTCCGTTACCTGATGGGGGAAAGGCTGGGAGTTCACCCAT
TAAGCTGTCATGGGTGGGTCCTTGGGGAACATGGAGATTCCAGTGTGCCTGTATGGAGTGGAATGAATGT
TGCTGGTGTCTCTCTGAAGACTCTGCACCCA

IOPub data rate exceeded.
The notebook server will temporarily stop sending output
to the client in order to avoid crashing it.
To change this limit, set the config variable
`--NotebookApp.iopub_data_rate_limit`.

Current values:
NotebookApp.iopub_data_rate_limit=1000000.0 (bytes/sec)
NotebookApp.rate_limit_window=3.0 (secs)



Supplied id parameter is empty.

Supplied id parameter is empty.



Supplied id parameter is empty.

>NZ_PDAR01000013.1 Escherichia coli strain 2012C-4704 NODE_13_length_126606_cov_36.0443_ID_21952, whole genome shotgun sequence
TTATTAAACCTGCCAAAAATATTATTATTTGGCAGGTTTAATTTCTTAACGCAAATATAAAACACAAAAT
TACAACATTTAAATAACAAAACAGCATCTAATATGGCGCTATTCATAATTAATGATTTATTATTTGGGGT
ATGATCGTTTTTTGTTGATCTTCTTCACAGATTATAGCCATTTCATGGATAGAATAACTCTACCTTCAAC
TGACACAGCAAGAGGTAAAGGTAAATGGAAAATAATAACCGCTTAATGCCTCATATAAGGCGGACAACCC
ATATCATGAAGTTTGCCCATCGTAATAGTTTTGACTTTCATTTCTTTAATGCCCGCTAGTCTTCTGACTA
AAGGGCACCCCAACGTACAGGTCTCCCTGACTTTAAGCATTACAGGTTAATACCTGTATTCCTCGGTGCT
CATATACTGCTAACCCTTTTAAACTCTAAATAATTCGAGTCGCAGCACTTGCAACTTGAGGTATGACGAG
TATAGCCAGTTACCGGGCTGGTCTGGGTTATTGCATCTGCAAAAAGCAAACTACTGATTTATTTATCAGC
GGTGGAGCTTTGCTTTTTTTCCGGCGTGATCGATTTCTCCTTTGAGAAATTGAGGACCTGCTATTACCTG
AAATAAAGAGATGAACAAAATGTCAGAATTAAAAATTGCCGTTAGTCGTTCTTGCCCGGATTGTTTTTCC
ACTCATCGTGCATGCGTGAATATAGACGAAAGTAATTATATTGACGTTGCCGCCATTATTTTATCAGTCA
GTGATGTTGAACGTGGAAAACTCGATGAAATAGACGCTACTGGCTATGACATTCCTGT

>X12545.1 E.coli mRNA for fructose-1,6-bisphosphatase (EC 3.1.3.11)
CTCGAGCTGGCTGGCATCGTACCCCTGAATCAGCTCAATGCCTTGCTTCTCAAGTAAGGTGCTCATCGGC
GGATACACATTGGCGTCCGAACCCGTTACTTCATGGCCTAACTGGCGCGCCAGCATCGCCAGACCGCCCA
TAAATGTGCCACAAATTCCTAAAATATGAATGCGCATACGTCACTATCCTTTTTTAATCTGCCGCTCATT
TTACGCATATGTCGGGCAAGTGAGAAACGCATTTCAGGATAATCCGTAATTTGCTGGCGCGATTCACCTG
AGCGCAACATTGAGATTATTTGTTAAGATTGTTGCGGTCGCTTTACTCCATAAACATTGCAGGGAAAGTT
TTATGAAAACGTTAGGTGAATTTATTGTCGAAAAGCAGCACGAGTTTTCTCATGCTACCGGTGAGCTCAC
TGCTTTGCTGTCGGCAATAAAACTGGGCGCCAAGATTATCCATCGCGATATCAACAAAGCAGGACTGGTT
GATATCCTGGGTGCCAGCGGTGCTGAGAACGTGCAGGGCGAGGTTCAGCAGAAACTCGACTTGTTCGCTA
ATGAAAAACTGAAAGCCGCACTGAAAGCACGCGATATCGTTGCGGGCATTGCCTCTGAAGAAGAAGATGA
GATTGTCGTCTTTGAAGGCTGTGAACACGCAAAATACGTGGTGCTGATGGACCCCCTGGATGGCTCGTCC
AACATCGATGTTAACGTCTCTGTCGGTACCATTTTCTCCATCTACCGCCGCGTTACGCCTGTTGGCACGC
CGGTAACGGAAGAAGATTTCCTCCAGCCTGGTAACAAACAGGTTGCGGCAGGTTACGTGGTATACGGCTC
CTCTACCATGCTGGTTTACACCACCGGATGCGGTGTTCACGCCTTTACTTACGATCCTTCGCTCGGCGTT
TTCTGCCTG

Supplied id parameter is empty.

>UINN01000067.1 Escherichia coli strain VREC0288 genome assembly, contig: ERS784219SCcontig000067, whole genome shotgun sequence
CATTTTTACAGTCAGAGGAATACGGATAACAGCATAACGTTTGCCGTAAACGGGCAGATGGTTCCATCAA
ACTACGGAAACTTTGATGCCCGCTATCAAACCAAAACAGGCGGTGTGCAGGATGTGCGTCTGGGCAGTGC
CATTGGTATTGGGCGTGGCGGGAATGCACCATCAGGTCACCTTATCAGCGGTCTTGATGGTGGTGAAAGT
ATGGACTGGGCCAATGCCCGCCCGGTGCAGGTTCTGATTAATGGCGTCTGGCGGAATGTAGCGAGTTTGT
AATTATGATGCACTTAAAAAATATTACGGCACAAAATCCCAAAACAATTGAGCAATACCAGCTGGCGCGA
CAGCATAAATTTTTATTGTGGCTGTTCTCCGATGATGGTCAGGAATGGCACGAAGCCCAGGAAAAATTTC
AGCCAGACACTCTGAAAGTTATTTATGTTGAAACTGGCGAAGTGGTCTGGGTCGGAAAAGACATCACCTC
AATCTGCCCGGAAAATAAAAGCGTGATTGAGCTACCGGATATTACCGCCAATCGTCGCATCGAGGCGTCG
GGTTACTGGTTCTACCGCAATGATGAATTTGTTTTTAATTACAAATTAAAAGCAGAAGACGAGCGTGATG
CACTGTTAAAACAGGCCAGCATCATGACCAGCGAATGGGAAAAAGACCTGCTGCTGGGATTAATCAGTGA
CGAAGACAGGGAGAAACTGAAAGCGTACCGCATTTACGCGAAATCGCTGCAGGCGATGGATTTCAACATT
ATCACTGATAAAACCTCATATAACGCCATTGAATGGCCCGTCTCTCCGGAAGCCTCT

>QSVS01000006.1 Escherichia coli strain OM02-11AC OM02-11AC.Scaf6, whole genome shotgun sequence
TTGTGTGAAGTGATTCACATCCGCCGTGTCGATGGAGGCGCATTATAGGGAGTCGTTTCAGGAAGACAAG
CGGAAAAATGCATTTTTATTTCAACCGCTCATCTTTTAATCATCACGCCGGTTTTTGCTGCTTTTTTATC
GCTTGCGGAAGGTCTGCCAGACTATTTAACACCCAATCCGCCGCGTTTTCTGCTTCAGGCGTAATAGGTT
TACCCGTACGCACCAGCACTTTTGTTCCCACGCTCGCCGCAGCCGCTGCCTGCATATCTTCTAATTTATC
GCCCACCATATAAGAAGCGGCCATATCAATATGCAAATAATCGCGTGCTGACAAAAACATCCCCGGATGT
GGTTTGCGGCAGTCGCAGACCTGGCGAAACTCTTCAACACTACCCTGCGGATGATGCGGGCAATAATAGA
TGCCATCCAGATCGACATCGCGGTCCGCCAGCGACCAGTCCATCCACTCGGTCAGCGTTTCAAACTGTGC
TTCAGTAAATTTACCGCGAGCAATGCCAGACTGGTTGGTTACTACCACCAGCGCAAAGCCCATTTTTTTT
AGCTCGCGCATGGCGTCAATAACACCGTCGATAAATTCAAAGTTGTCGATCTCATGGACATAGCCGTGAT
CAACATTAATGGTGCCATCACGGTCAAGAAAAATTGCGGGTACGCTCTTCGCCACCTTTTATAGCTCCTT
AATAAGGCATGTGACGCTAGTATCGCATGTTTCGACCTGCAAGAAAGTGCTCTTCGCATAAACCTGATTG
ATTTAGACGTCTGGATGCCTTAACATCCATTTCATTGACGGCGATGCCCGTTCCAGGCATTCGAAATGCC
ACGACTAACTTAATGACGATAATAAATAATCAATGATAAAACTTTCGAATA

>NZ_QRBM01000034.1 Escherichia coli strain 1-RC-17-04352 04352_S22_R1_001_contig00034, whole genome shotgun sequence
TTCTGCCTGTCTCGTGGGATCGGAGATGTGTATAAGAGACAGGGCCAACGCTGTAAAAAAGCGTAACGGG
CAAAGATCCCTGCCATCACTGCGTTTAGCGGCCAGAAAAGCGAAAGTGCCTCGACCAGACGCAGCATCGC
CCCGACAAAATAAAAAAGCGTGGTGATCAGAAAAATCACCGTCGCATTGATGAACGGCTGATCATTTCGC
AACAGGCGAAATGTAGGCGAAAAAAGAGCGCGCATTACATATCAATCCCAGGTAAGGATGCAGCAAGTCA
TTTCGACGCGGCAAAACAGCGTATGGCTTAATGTAATGCATTTACAACAGCCTGGTAACATGCGTTTACC
GCTTAATTCATTAATTTTATGAATTCATTTGCAGTACCAGCCAGTGGTAAATAACGCCGACGATATAGCG
CTGCTGCACCTCAGTAAGCTCAGCGCCGGCTGGAATAGCGTGCCATTTTGCCAGACATCCCCGACAGCAG
GTAGCCGTGGCATGCTGGGCAATAAAGACCGGATGACCGCGCATCGGCGTCTGTTTGCCGTCATTATGCG
GTAACGCCGGGGCCAGACGCCGGGCAACAAAGTCAGCCGCGTGTGTCGCGATCGTTTCGGCCCCTTTTTC
CATGCAGTACTGGCGCTCTTTCGCCCCCAGTCGAAAACGGGAACGAAAAGAGGAGCGGGCAAGCCGCGCA
AACAAAGCGTCGTATTCAGACATCAGTACACTGAGCGCCCCAGTTGCTGGGTGAGCGCCTCCAGTAAGGC
AATCCCCGCCAGTGAATTTCCGGCCTCGTCCAGTTCCGGGCTCCAGACCGCAATGGCCATCTCTTGCGGC
ACAATCGCCACAATTCCGCCGCCCACGCCCG

>UINN01000009.1 Escherichia coli strain VREC0288 genome assembly, contig: ERS784219SCcontig000009, whole genome shotgun sequence
GTCCTGAAGGAACGTTGAAGACGACGACGTTGATAGGCCGGGTGTGTAAGCGCAGCGATGCGTTGAGCTA
ACCGGTACTAATGAACCGTGAGGCTTAACCTTACAACGCCGAAGATGTTTTGGCGGATTGAGAGAAGATT
TTCAGCCTGATACAGATTAAATCGACAGGTCATAACGAGACGTGTTGATAAAACAGAATTTGCCTGGCGG
CCTTAGCGCGGTGGTCCCACCTGACCCCATGCCGAACTCAGAAGTGAAACGCCGTAGCGCCGATGGTAAG
TAGGGAACTGCCAGGCATCAAATTTAGAGTGCTGATATGGCTCAGTTGGTAGAGCGCACCCTTGGTAAGG
GTGAGGTCCCCAGTTCGACTCTGGGTATCAGCACCACTTTTTAGGTTAAAGTTCGGCAGATTAGAAAAGA
ATTTGTCTGGCGGCAGTAGCGCGGTGGTCCCACCTGACCCCATGCCGAACTCAGAAGTGAAACGCCGTAG
CGCCGATGGTAGTGTGGGGTCTCCCCATGCGAGAGTAGGGAACTGCCAGACATCAAATAAAACAAAAGGC
TCAGTCGGAAGACTGGGCCTTTTGTTTTATCTGTTGTTTGTCGGTGAACACTCTCCCGAGTAGGACAAAT
CCGCCGGGAGCGGATTTGAACGTTGCGAAGCAACGGCCCGGAGGGTGGCGGGCAGGACGCCCGCCATAAA
CTGCCAGATATCAAATCAAGCGAAAGGCCATCCGAAAGAATGGCCTTTTTGCTTTTCGAACTAACATTCA
ATTAATGGATTACCTGCGATAAAAATGCCCTCGTACGCTCTGATTTAGGATGCGCAAAAAATTCATCAGG
TGCAGCTTGCTCCACTATT
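
Printing every FASTA record in full is what triggers the "IOPub data rate exceeded" message above. A possible workaround, sketched here with the standard library only, is to print a one-line summary per record instead of the whole sequence. The helpers `parse_fasta` and `summarize` are hypothetical names, and the two records in `toy_text` are made-up toy data, not sequences from the search.

```python
def parse_fasta(text):
    """Split FASTA-formatted text into (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def summarize(records, preview=30):
    """Return one short line per record: accession, length, sequence preview."""
    lines = []
    for header, seq in records:
        accession = header.split()[0]
        tail = "..." if len(seq) > preview else ""
        lines.append(f"{accession}\t{len(seq)} bp\t{seq[:preview]}{tail}")
    return lines


# Toy input for illustration only.
toy_text = ">toy1 first toy record\nACGTAC\nGTTT\n\n>toy2 second toy record\nGGCC\n"
records = parse_fasta(toy_text)
summary = summarize(records, preview=4)
for line in summary:
    print(line)
```

In the notebook, the text returned by `handle.read()` could be passed to `parse_fasta` in place of `toy_text`, so only the summaries are sent to the output area.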

In [76]:
"""
from Bio import Entrez
Entrez.email = 'xinxinmo@berkeley.edu'
handle = Entrez.esearch(db='nucleotide',
                        term='homo sapiens[ORGN]'+'5.4.2.2',
                        sort='relevance',
                        idtype='acc',
                        retmax=1)
for i in Entrez.read(handle)['IdList']:
    handle = Entrez.efetch(db='nucleotide', id=i, rettype='fasta', retmode='text')
"""

In [61]:
import sqlite3
conn = sqlite3.connect('my.db')
c = conn.cursor()
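# IF NOT EXISTS lets the cell be re-run against an existing my.db.
c.execute("""CREATE TABLE IF NOT EXISTS genes (id INT, name TEXT, description TEXT, organism TEXT, chromosome TEXT, start INT, end INT, strand VARCHAR(1));""")
# One value per listed column, including the organism; placeholders keep the
# values out of the SQL string.
c.execute("""INSERT INTO genes (id, name, description, organism, chromosome, start, end, strand)
                                VALUES (?, ?, ?, ?, ?, ?, ?, ?);""",
          (58341, "BRCA1", "Breast Cancer 1", "Homo sapiens", "chr17", 43033295, 43170245, '-'))
conn.commit()
c.execute("SELECT * FROM genes WHERE name = 'BRCA1';")
print(c.fetchone())

(58341, 'BRCA1', 'Breast Cancer 1', 'Homo sapiens', 'chr17', 43033295, 43170245, '-')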
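
The same pattern could be used to record the sequences found in Step 1. The following is a minimal sketch, assuming a hypothetical `sequences` table; the accessions, organisms, and EC numbers are copied from FASTA headers printed earlier in this notebook, and an in-memory database is used so that `my.db` is not touched.

```python
import sqlite3

# Values copied from the FASTA headers in the Step 1 output; the table layout
# itself is hypothetical.
hits = [
    ("AK303771.1", "Homo sapiens", "3.1.3.9"),
    ("AK298834.1", "Homo sapiens", "1.1.1.27"),
    ("X12545.1", "Escherichia coli", "3.1.3.11"),
]

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
c = conn.cursor()
c.execute("""CREATE TABLE sequences (
                 accession TEXT PRIMARY KEY,
                 organism  TEXT,
                 ec_number TEXT);""")
# executemany inserts one row per tuple, using placeholders for every value.
c.executemany("INSERT INTO sequences VALUES (?, ?, ?);", hits)
conn.commit()

# Count how many sequences were stored for each organism.
c.execute("SELECT organism, COUNT(*) FROM sequences GROUP BY organism ORDER BY organism;")
counts = c.fetchall()
print(counts)
conn.close()
```

Making `accession` the primary key means inserting the same hit twice raises an error instead of silently duplicating the row, which matters here because one EC number is searched twice per organism.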
