## Procedural Notes  

### Enzyme/Pathway Selection  
KEGG Glycolysis / Gluconeogenesis reference pathway:  

The enzymes selected for each pathway, listed by KEGG Orthology (KO) identifier, are:  
Glycolysis: K00844, K11645, K01689, K16370  
Citric Acid Cycle: K01647, K00030, K01900, K00116  
Pentose Phosphate Pathway: K13937, K00036, K01807, K06859  



In [1]:
import sqlite3
conn = sqlite3.connect('my.db')
c = conn.cursor()

# Drop any tables left over from a previous run; IF EXISTS keeps the first run from failing.
c.execute("""DROP TABLE IF EXISTS genes""")
c.execute("""DROP TABLE IF EXISTS pathways""")
c.execute("""DROP TABLE IF EXISTS enzymes""")

c.execute("""CREATE TABLE genes (id INT PRIMARY KEY ASC, name TEXT, description TEXT, organism TEXT, nucleotide_sequence TEXT,
chromosome TEXT, start INT, end INT, strand VARCHAR(1), translated_sequence TEXT);""")

c.execute("""CREATE TABLE pathways (id INT PRIMARY KEY ASC, name TEXT, description TEXT);""")

# EC numbers such as 1.1.1.41 are dotted strings, so ec_number is declared as TEXT.
c.execute("""CREATE TABLE enzymes (id INT PRIMARY KEY ASC, name TEXT, function TEXT, ec_number TEXT);""")

conn.commit()
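
SQLite is permissive about column definitions, so a typo in a `CREATE TABLE` statement can pass without raising an error. A quick way to confirm that a table has the expected columns is `PRAGMA table_info`. The sketch below is a minimal check under a few assumptions: it uses an in-memory database instead of `my.db`, it recreates only the intended ten-column `genes` table, and the helper `column_names` is a hypothetical name introduced here for illustration.

```python
import sqlite3

# In-memory database so this check never touches my.db.
conn = sqlite3.connect(":memory:")
c = conn.cursor()

# Intended genes schema (assumed to have ten columns).
c.execute("""CREATE TABLE genes (id INT PRIMARY KEY ASC, name TEXT, description TEXT,
organism TEXT, nucleotide_sequence TEXT, chromosome TEXT, start INT, end INT,
strand VARCHAR(1), translated_sequence TEXT);""")


def column_names(cursor, table):
    """Return the column names SQLite actually created for `table`."""
    # PRAGMA table_info yields one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    return [row[1] for row in cursor.execute(f"PRAGMA table_info({table})")]


genes_columns = column_names(c, "genes")
print(genes_columns)
conn.close()
```

If a column such as `chromosome` is missing from the printed list, the `CREATE TABLE` statement is the first place to look.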


### Query the name and ec_number of the selected Enzymes from KEGG
1. Create a file named `sample.list` in the current directory containing the following lines:  
K00844   
K11645   
K01689  
K16370  
K01647  
K00030  
K01900  
K00116  
K13937  
K00036  
K01807  
K06859  

2. In the terminal execute: `curl -g -s -S http://rest.kegg.jp/list/ko | grep -f sample.list > sample_extracted.table.txt`

3. Inspect the output file `sample_extracted.table.txt`:  
ko:K00030	IDH3; isocitrate dehydrogenase (NAD+) [EC:1.1.1.41]  
ko:K00036	G6PD, zwf; glucose-6-phosphate 1-dehydrogenase [EC:1.1.1.49 1.1.1.363]  
ko:K00116	mqo; malate dehydrogenase (quinone) [EC:1.1.5.4]  
ko:K00844	HK; hexokinase [EC:2.7.1.1]  
ko:K01647	CS, gltA; citrate synthase [EC:2.3.3.1]  
ko:K01689	ENO, eno; enolase [EC:4.2.1.11]  
ko:K01807	rpiA; ribose 5-phosphate isomerase A [EC:5.3.1.6]  
ko:K01900	LSC2; succinyl-CoA synthetase beta subunit [EC:6.2.1.4 6.2.1.5]  
ko:K06859	pgi1; glucose-6-phosphate isomerase, archaeal [EC:5.3.1.9]  
ko:K11645	fbaB; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]  
ko:K13937	H6PD; hexose-6-phosphate dehydrogenase [EC:1.1.1.47 3.1.1.31]  
ko:K16370	pfkB; 6-phosphofructokinase 2 [EC:2.7.1.11]  
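
Each line of the extracted table can be split into the fields that the `enzymes` table expects. The following is a minimal sketch, not necessarily the loading code used in this notebook. It makes several assumptions: the lines follow the `ko:<ID>`, whitespace, `<name>; <function> [EC:<numbers>]` layout shown above; `parse_ko_line` and `KO_LINE` are hypothetical names; `id` is filled with a running integer; `ec_number` is stored as text; and an in-memory database stands in for `my.db`.

```python
import re
import sqlite3

# One line of sample_extracted.table.txt, e.g.
# "ko:K00844<TAB>HK; hexokinase [EC:2.7.1.1]"
KO_LINE = re.compile(
    r"^ko:(?P<ko>K\d{5})\s+(?P<name>[^;]+);\s*"
    r"(?P<function>.*?)\s*(?:\[EC:(?P<ec>[^\]]+)\])?\s*$"
)


def parse_ko_line(line):
    """Split one KO line into (ko_id, name, function, ec_number), or None."""
    match = KO_LINE.match(line)
    if match is None:
        return None
    return (
        match.group("ko"),
        match.group("name").strip(),
        match.group("function"),
        match.group("ec"),  # None when the line carries no [EC:...] tag
    )


# Two lines copied from the output above.
sample_lines = [
    "ko:K00844\tHK; hexokinase [EC:2.7.1.1]",
    "ko:K00036\tG6PD, zwf; glucose-6-phosphate 1-dehydrogenase [EC:1.1.1.49 1.1.1.363]",
]
records = [r for r in map(parse_ko_line, sample_lines) if r is not None]

# Assumption: in-memory stand-in for my.db, with ec_number stored as TEXT.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE enzymes (id INT PRIMARY KEY ASC, name TEXT, function TEXT, ec_number TEXT)"
)
conn.executemany(
    "INSERT INTO enzymes (id, name, function, ec_number) VALUES (?, ?, ?, ?)",
    [(i, name, function, ec) for i, (_ko, name, function, ec) in enumerate(records, start=1)],
)
rows = conn.execute(
    "SELECT id, name, function, ec_number FROM enzymes ORDER BY id"
).fetchall()
print(rows)
conn.close()
```

Enzymes with more than one EC number, such as K00036, keep all of them in a single space-separated string here; splitting them into separate rows would be an alternative design.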


In [13]:
from Bio import Entrez
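# Entrez.esearch(db, term, sort) searches a database and returns the matching IDs.
# Entrez.efetch(db, id, rettype, retmode) downloads the records for the given IDs.
# Used for enzymes and genes only; Entrez doesn't have pathway info.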




Entrez.email = 'ych323@berkeley.edu'
handle = Entrez.esearch(db = 'nucleotide',
                       term = 'homo sapiens[ORGN] G6PD',
                       sort= 'relevance',
                       idtype= 'acc')

fetched_dict = Entrez.read(handle)
print(fetched_dict)

# Fetch the second hit (index 1) from the search results as FASTA.
handle = Entrez.efetch(db = 'nucleotide', id = fetched_dict["IdList"][1], rettype = 'fasta', retmode = 'text')
print(handle.read())
handle.close()
# TODO: Make a function to iterate through the query
# for i in fetched_dict["IdList"]:
#     handle = Entrez.efetch(db = 'nucleotide', id = i, rettype = 'fasta', retmode = 'text')
#     print(handle.read())

DictElement({'Count': '303', 'RetMax': '20', 'RetStart': '0', 'IdList': ['L44140.1', 'KJ896841.1', 'S64462.1', 'S58359.1', 'NM_000402.4', 'NM_001360016.1', 'NM_001042351.2', 'AB376963.1', 'X55448.1', 'DQ173568.1', 'DQ839546.1', 'DQ832766.1', 'M12996.1', 'MG772799.1', 'DQ173642.1', 'DQ173641.1', 'DQ173640.1', 'DQ173639.1', 'DQ173638.1', 'DQ173637.1'], 'TranslationSet': [DictElement({'From': 'homo sapiens[ORGN]', 'To': '"Homo sapiens"[Organism]'}, attributes={})], 'TranslationStack': [DictElement({'Term': '"Homo sapiens"[Organism]', 'Field': 'Organism', 'Count': '16629335', 'Explode': 'Y'}, attributes={}), DictElement({'Term': 'G6PD[All Fields]', 'Field': 'All Fields', 'Count': '41938', 'Explode': 'N'}, attributes={}), 'AND'], 'QueryTranslation': '"Homo sapiens"[Organism] AND G6PD[All Fields]'}, attributes={})
>KJ896841.1 Synthetic construct Homo sapiens clone ccsbBroadEn_06235 G6PD gene, encodes complete protein
GTTCGTTGCAACAAATTGATGAGCAATGCTTTTTTATAATGCCAACTTTGTACAAAAAAGTTGGCATGGC
AGAG