# Data from Yixin
## Xylitol strain
### Knock outs
* focA, https://www.uniprot.org/uniprot/P0AC23
* pflB, https://www.uniprot.org/uniprot/P09373
* ldhA, https://www.uniprot.org/uniprot/P52643
* adhE, https://www.uniprot.org/uniprot/P0A9Q7
* xylAB, https://www.uniprot.org/uniprot/D1MDQ9
* frdA, https://www.uniprot.org/uniprot/P00363
* cyoB, https://www.uniprot.org/uniprot/P0ABI8
* appB, https://www.uniprot.org/uniprot/P26458

### Missing reaction(s)
* D-xylose + NADPH + H+ -> xylitol + NADP+ (D-xylose reductase, EC 1.1.1.307), https://www.brenda-enzymes.org/enzyme.php?ecno=1.1.1.307

## Isobutyric acid strain
### Knock outs
* focA, https://www.uniprot.org/uniprot/P0AC23
* pflB, https://www.uniprot.org/uniprot/P09373
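* aceEF, https://www.uniprot.org/uniprot/P0ACL9 (this accession resolves to pdhR, the pyruvate dehydrogenase complex repressor, rather than aceE or aceF)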
* poxB, https://www.uniprot.org/uniprot/P07003
* tdcE, https://www.uniprot.org/uniprot/P42632
* pflDC (no UniProt accession provided)
* deoC, https://www.uniprot.org/uniprot/P0A6L0
* ydbK, https://www.uniprot.org/uniprot/P52647
* yqhD, https://www.uniprot.org/uniprot/Q46856
* araBA (no UniProt accession provided)
* xylAB, https://www.uniprot.org/uniprot/D1MDQ9


### Missing reaction(s)
* isobutyraldehyde -> isobutyric acid (enzyme/gene name not yet identified)

In [1]:
from bioservices import UniProt
import io
import pandas as pd
import numpy as np
service = UniProt()

# Find knockouts for xylitol strain

In [2]:
df_xylitol = pd.read_csv(
    'data/data_ko_xylitol_strain.txt',
    sep=',',
    header=None,
    skiprows=1,  # skip the first line of the file
    names=['gene', 'uniprot', 'strain'],
)
df_xylitol['strain'] = 'xylytol_mutant'
df_xylitol

Unnamed: 0,gene,uniprot,strain
0,pflB,https://www.uniprot.org/uniprot/P09373,xylytol_mutant
1,ldhA,https://www.uniprot.org/uniprot/P52643,xylytol_mutant
2,adhE,https://www.uniprot.org/uniprot/P0A9Q7,xylytol_mutant
3,xylAB,https://www.uniprot.org/uniprot/D1MDQ9,xylytol_mutant
4,frdA,https://www.uniprot.org/uniprot/P00363,xylytol_mutant
5,cyoB,https://www.uniprot.org/uniprot/P0ABI8,xylytol_mutant
6,appB,https://www.uniprot.org/uniprot/P26458,xylytol_mutant
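
The table above holds seven genes, whereas the knock-out list at the top of this notebook names eight. A set comparison such as the sketch below makes that kind of gap explicit. Both gene lists are copied by hand from this notebook, and the variable names are illustrative only.

```python
# Genes named in the "Xylitol strain" knock-out list (copied by hand)
documented = ["focA", "pflB", "ldhA", "adhE", "xylAB", "frdA", "cyoB", "appB"]
# Genes present in the loaded table shown above (copied by hand)
loaded = ["pflB", "ldhA", "adhE", "xylAB", "frdA", "cyoB", "appB"]

# Set differences in both directions
missing = sorted(set(documented) - set(loaded))
unexpected = sorted(set(loaded) - set(documented))

print("documented but not loaded:", missing)
print("loaded but not documented:", unexpected)
```

Here focA is reported as documented but not loaded. One possible cause, not verified here, is that the `skiprows` argument skips a first line that holds data rather than a header.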


In [3]:
# Columns to retrieve from UniProt (the same for every query)
columnlist = "id,entry name,genes(PREFERRED),genes(ALTERNATIVE),protein names,go(molecular function),comment(CATALYTIC ACTIVITY)"
frames = []
for url in df_xylitol.uniprot:
    # The accession is the last path segment of the UniProt URL
    uniprot_id = url.split('/')[-1]
    # Make a query string for a single entry
    query = "accession:" + uniprot_id
    # Run the remote search and parse the tab-separated result
    result = service.search(query, frmt="tab", columns=columnlist)
    frames.append(pd.read_table(io.StringIO(result)))
df = pd.concat(frames, ignore_index=True)
# Rows are aligned by position, which assumes one result row per accession
df_xylitol = pd.merge(df_xylitol, df, left_index=True, right_index=True)
df_xylitol

Unnamed: 0,gene,uniprot,strain,Entry,Entry name,Gene names (primary ),Gene names (synonym ),Protein names,Gene ontology (molecular function),Catalytic activity
0,pflB,https://www.uniprot.org/uniprot/P09373,xylytol_mutant,P09373,PFLB_ECOLI,pflB,pfl,Formate acetyltransferase 1 (EC 2.3.1.54) (Pyr...,formate C-acetyltransferase activity [GO:0008861],CATALYTIC ACTIVITY: Reaction=acetyl-CoA + form...
1,ldhA,https://www.uniprot.org/uniprot/P52643,xylytol_mutant,P52643,LDHD_ECOLI,ldhA,hslI htpH,D-lactate dehydrogenase (D-LDH) (EC 1.1.1.28) ...,D-lactate dehydrogenase activity [GO:0008720];...,CATALYTIC ACTIVITY: Reaction=(R)-lactate + NAD...
2,adhE,https://www.uniprot.org/uniprot/P0A9Q7,xylytol_mutant,P0A9Q7,ADHE_ECOLI,adhE,ana,Aldehyde-alcohol dehydrogenase [Includes: Alco...,acetaldehyde dehydrogenase (acetylating) activ...,CATALYTIC ACTIVITY: Reaction=a primary alcohol...
3,xylAB,https://www.uniprot.org/uniprot/D1MDQ9,xylytol_mutant,D1MDQ9,D1MDQ9_ECOLX,xylAB,xylB,Xylulose kinase (Xylulokinase) (EC 2.7.1.17) (...,ATP binding [GO:0005524]; xylulokinase activit...,CATALYTIC ACTIVITY: Reaction=ATP + D-xylulose ...
4,frdA,https://www.uniprot.org/uniprot/P00363,xylytol_mutant,P00363,FRDA_ECOLI,frdA,,Fumarate reductase flavoprotein subunit (EC 1....,electron transfer activity [GO:0009055]; FAD b...,CATALYTIC ACTIVITY: Reaction=a quinone + succi...
5,cyoB,https://www.uniprot.org/uniprot/P0ABI8,xylytol_mutant,P0ABI8,CYOB_ECOLI,cyoB,,Cytochrome bo(3) ubiquinol oxidase subunit 1 (...,copper ion binding [GO:0005507]; cytochrome bo...,CATALYTIC ACTIVITY: Reaction=2 a ubiquinol + n...
6,appB,https://www.uniprot.org/uniprot/P26458,xylytol_mutant,P26458,APPB_ECOLI,appB,cbdB cyxB,Cytochrome bd-II ubiquinol oxidase subunit 2 (...,electron transfer activity [GO:0009055]; metal...,CATALYTIC ACTIVITY: Reaction=2 a ubiquinol + n...
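
The lookup takes the last path segment of each URL as the accession. A small validation step can catch malformed URLs before a query is sent. The sketch below uses the accession pattern given in the UniProt help pages; the function name `accession_from_url` is a hypothetical helper, not part of bioservices.

```python
import re

# Accession pattern as documented in the UniProt help pages
ACCESSION_RE = re.compile(
    r"[OPQ][0-9][A-Z0-9]{3}[0-9]|[A-NR-Z][0-9](?:[A-Z][A-Z0-9]{2}[0-9]){1,2}"
)

def accession_from_url(url):
    """Return the accession at the end of a UniProt URL, or None if it looks malformed."""
    candidate = url.rstrip("/").split("/")[-1]
    return candidate if ACCESSION_RE.fullmatch(candidate) else None

print(accession_from_url("https://www.uniprot.org/uniprot/P09373"))
print(accession_from_url("https://www.uniprot.org/uniprot/D1MDQ9"))
print(accession_from_url("https://www.uniprot.org/uniprot/"))
```

Rows for which the helper returns `None` could be skipped or reported instead of being queried.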


# Find knockouts for isobutyric acid strain

In [4]:
df_isobutyric = pd.read_csv(
    'data/data_ko_isobutyric-acid_strain.txt',
    sep=',',
    header=None,
    skiprows=1,  # skip the first line of the file
    names=['gene', 'uniprot', 'strain'],
)
df_isobutyric['strain'] = 'isobutyric_mutant'
df_isobutyric

Unnamed: 0,gene,uniprot,strain
0,pflB,https://www.uniprot.org/uniprot/P09373,isobutyric_mutant
1,aceEF,https://www.uniprot.org/uniprot/P0ACL9,isobutyric_mutant
2,poxB,https://www.uniprot.org/uniprot/P07003,isobutyric_mutant
3,tdcE,https://www.uniprot.org/uniprot/P42632,isobutyric_mutant
4,pflDC,,isobutyric_mutant
5,deoC,https://www.uniprot.org/uniprot/P0A6L0,isobutyric_mutant
6,ydbK,https://www.uniprot.org/uniprot/P52647,isobutyric_mutant
7,yqhD,https://www.uniprot.org/uniprot/Q46856,isobutyric_mutant
8,araBA,,isobutyric_mutant
9,xylAB,https://www.uniprot.org/uniprot/D1MDQ9,isobutyric_mutant


In [5]:
df_isobutyric = df_isobutyric.dropna().reset_index(drop=True)
df_isobutyric

Unnamed: 0,gene,uniprot,strain
0,pflB,https://www.uniprot.org/uniprot/P09373,isobutyric_mutant
1,aceEF,https://www.uniprot.org/uniprot/P0ACL9,isobutyric_mutant
2,poxB,https://www.uniprot.org/uniprot/P07003,isobutyric_mutant
3,tdcE,https://www.uniprot.org/uniprot/P42632,isobutyric_mutant
4,deoC,https://www.uniprot.org/uniprot/P0A6L0,isobutyric_mutant
5,ydbK,https://www.uniprot.org/uniprot/P52647,isobutyric_mutant
6,yqhD,https://www.uniprot.org/uniprot/Q46856,isobutyric_mutant
7,xylAB,https://www.uniprot.org/uniprot/D1MDQ9,isobutyric_mutant
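
`dropna()` removes rows silently, so it can help to record which genes were lost. The following is a dependency-free sketch using (gene, URL) pairs copied by hand from the unfiltered table, with `None` standing in for a missing URL; the variable names are illustrative.

```python
# (gene, UniProt URL) pairs from the unfiltered table; None marks a missing URL
rows = [
    ("pflB", "https://www.uniprot.org/uniprot/P09373"),
    ("aceEF", "https://www.uniprot.org/uniprot/P0ACL9"),
    ("poxB", "https://www.uniprot.org/uniprot/P07003"),
    ("tdcE", "https://www.uniprot.org/uniprot/P42632"),
    ("pflDC", None),
    ("deoC", "https://www.uniprot.org/uniprot/P0A6L0"),
    ("ydbK", "https://www.uniprot.org/uniprot/P52647"),
    ("yqhD", "https://www.uniprot.org/uniprot/Q46856"),
    ("araBA", None),
    ("xylAB", "https://www.uniprot.org/uniprot/D1MDQ9"),
]

# Split the genes by whether an accession URL is available
dropped = [gene for gene, url in rows if not url]
kept = [gene for gene, url in rows if url]

print("dropped (no UniProt URL):", dropped)
print("kept:", len(kept), "genes")
```

The dropped genes (pflDC and araBA) still belong to the strain's knock-out set even though no UniProt annotation is retrieved for them.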


In [6]:
# Columns to retrieve from UniProt (the same for every query)
columnlist = "id,entry name,genes(PREFERRED),genes(ALTERNATIVE),protein names,go(molecular function),comment(CATALYTIC ACTIVITY)"
frames = []
for url in df_isobutyric.uniprot:
    # The accession is the last path segment of the UniProt URL
    uniprot_id = url.split('/')[-1]
    # Make a query string for a single entry
    query = "accession:" + uniprot_id
    # Run the remote search and parse the tab-separated result
    result = service.search(query, frmt="tab", columns=columnlist)
    frames.append(pd.read_table(io.StringIO(result)))
df = pd.concat(frames, ignore_index=True)
# Rows are aligned by position, which assumes one result row per accession
df_isobutyric = pd.merge(df_isobutyric, df, left_index=True, right_index=True)
df_isobutyric

Unnamed: 0,gene,uniprot,strain,Entry,Entry name,Gene names (primary ),Gene names (synonym ),Protein names,Gene ontology (molecular function),Catalytic activity
0,pflB,https://www.uniprot.org/uniprot/P09373,isobutyric_mutant,P09373,PFLB_ECOLI,pflB,pfl,Formate acetyltransferase 1 (EC 2.3.1.54) (Pyr...,formate C-acetyltransferase activity [GO:0008861],CATALYTIC ACTIVITY: Reaction=acetyl-CoA + form...
1,aceEF,https://www.uniprot.org/uniprot/P0ACL9,isobutyric_mutant,P0ACL9,PDHR_ECOLI,pdhR,aceC genA yacB,Pyruvate dehydrogenase complex repressor,DNA binding [GO:0003677]; DNA-binding transcri...,
2,poxB,https://www.uniprot.org/uniprot/P07003,isobutyric_mutant,P07003,POXB_ECOLI,poxB,,Pyruvate dehydrogenase [ubiquinone] (EC 1.2.5....,flavin adenine dinucleotide binding [GO:005066...,CATALYTIC ACTIVITY: Reaction=a ubiquinone + H2...
3,tdcE,https://www.uniprot.org/uniprot/P42632,isobutyric_mutant,P42632,TDCE_ECOLI,tdcE,yhaS,PFL-like enzyme TdcE (Keto-acid formate acetyl...,2-ketobutyrate formate-lyase activity [GO:0043...,CATALYTIC ACTIVITY: Reaction=2-oxobutanoate + ...
4,deoC,https://www.uniprot.org/uniprot/P0A6L0,isobutyric_mutant,P0A6L0,DEOC_ECOLI,deoC,dra thyR,Deoxyribose-phosphate aldolase (DERA) (EC 4.1....,deoxyribose-phosphate aldolase activity [GO:00...,CATALYTIC ACTIVITY: Reaction=2-deoxy-D-ribose ...
5,ydbK,https://www.uniprot.org/uniprot/P52647,isobutyric_mutant,P52647,NIFJ_ECOLI,ydbK,,Probable pyruvate-flavodoxin oxidoreductase (E...,"4 iron, 4 sulfur cluster binding [GO:0051539];...",CATALYTIC ACTIVITY: Reaction=CoA + 2 H(+) + ox...
6,yqhD,https://www.uniprot.org/uniprot/Q46856,isobutyric_mutant,Q46856,YQHD_ECOLI,yqhD,,Alcohol dehydrogenase YqhD (EC 1.1.1.-),alcohol dehydrogenase (NADP+) activity [GO:000...,CATALYTIC ACTIVITY: Reaction=a primary alcohol...
7,xylAB,https://www.uniprot.org/uniprot/D1MDQ9,isobutyric_mutant,D1MDQ9,D1MDQ9_ECOLX,xylAB,xylB,Xylulose kinase (Xylulokinase) (EC 2.7.1.17) (...,ATP binding [GO:0005524]; xylulokinase activit...,CATALYTIC ACTIVITY: Reaction=ATP + D-xylulose ...
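
Comparing the requested gene name with the primary gene name returned by UniProt is a cheap consistency check on the accession list. A minimal sketch, with the pairs copied by hand from the table above and a plain case-insensitive comparison as the (assumed) matching rule:

```python
# (requested gene, primary gene name reported by UniProt), copied from the table above
pairs = [
    ("pflB", "pflB"),
    ("aceEF", "pdhR"),
    ("poxB", "poxB"),
    ("tdcE", "tdcE"),
    ("deoC", "deoC"),
    ("ydbK", "ydbK"),
    ("yqhD", "yqhD"),
    ("xylAB", "xylAB"),
]

# Flag rows where the UniProt entry is annotated under a different gene name
mismatches = [(req, prim) for req, prim in pairs if req.lower() != prim.lower()]

for req, prim in mismatches:
    print(f"{req}: UniProt entry is annotated as {prim}")
```

For this table the check flags aceEF, whose accession P0ACL9 is annotated as pdhR (pyruvate dehydrogenase complex repressor). That would explain the empty catalytic activity for this row.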


# Check Reaction

In [12]:
reaction_data = []
for gene, activity in zip(df_xylitol.gene, df_xylitol['Catalytic activity']):
    if isinstance(activity, str):
        # Keep only the first reaction and strip the UniProt prefix
        reaction = activity.split(';')[0].replace('CATALYTIC ACTIVITY: Reaction=', '')
        print(gene + ':', reaction)
    else:
        # A missing annotation is read as NaN (a float), not a string
        reaction = np.nan
        print(gene + ':', 'reaction not available')
    reaction_data.append(reaction)
df_xylitol = df_xylitol.assign(Reaction=reaction_data)

pflB: acetyl-CoA + formate = CoA + pyruvate
ldhA: (R)-lactate + NAD(+) = H(+) + NADH + pyruvate
adhE: a primary alcohol + NAD(+) = an aldehyde + H(+) + NADH
xylAB: ATP + D-xylulose = ADP + D-xylulose 5-phosphate + H(+)
frdA: a quinone + succinate = a quinol + fumarate
cyoB: 2 a ubiquinol + n H(+)(in) + O2 = 2 a ubiquinone + n H(+)(out) + 2 H2O
appB: 2 a ubiquinol + n H(+)(in) + O2 = 2 a ubiquinone + n H(+)(out) + 2 H2O
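
The reaction strings printed above can be split into substrates and products for later comparison with model reactions. The parser below is a sketch that assumes the `A + B = C + D` layout seen in this output, with an optional leading integer or the symbol `n` as coefficient; `parse_side` and `parse_reaction` are hypothetical helper names.

```python
import re

def parse_side(side):
    """Split one side of a reaction string into {compound: coefficient}."""
    compounds = {}
    for term in side.split(" + "):
        term = term.strip()
        # A leading integer (or the symbol "n") followed by a space is a coefficient
        match = re.match(r"^(\d+|n) (.+)$", term)
        if match:
            coeff, name = match.group(1), match.group(2)
            coeff = int(coeff) if coeff.isdigit() else coeff
        else:
            coeff, name = 1, term
        compounds[name] = coeff
    return compounds

def parse_reaction(text):
    """Return (substrates, products) for a string of the form 'A + B = C + D'."""
    left, right = text.split(" = ")
    return parse_side(left), parse_side(right)

substrates, products = parse_reaction(
    "2 a ubiquinol + n H(+)(in) + O2 = 2 a ubiquinone + n H(+)(out) + 2 H2O"
)
print(substrates)
print(products)
```

Names that merely start with a digit, such as 2-oxobutanoate, are left intact because the coefficient must be followed by a space.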


In [13]:
df_xylitol.to_csv('data/df_xylitol_knockouts.csv')

In [14]:
reaction_data = []
for gene, activity in zip(df_isobutyric.gene, df_isobutyric['Catalytic activity']):
    if isinstance(activity, str):
        # Keep only the first reaction and strip the UniProt prefix
        reaction = activity.split(';')[0].replace('CATALYTIC ACTIVITY: Reaction=', '')
        print(gene + ':', reaction)
    else:
        # A missing annotation is read as NaN (a float), not a string
        reaction = np.nan
        print(gene + ':', 'reaction not available')
    reaction_data.append(reaction)
df_isobutyric = df_isobutyric.assign(Reaction=reaction_data)

pflB: acetyl-CoA + formate = CoA + pyruvate
aceEF: reaction not available
poxB: a ubiquinone + H2O + pyruvate = a ubiquinol + acetate + CO2
tdcE: 2-oxobutanoate + CoA = formate + propanoyl-CoA
deoC: 2-deoxy-D-ribose 5-phosphate = acetaldehyde + D-glyceraldehyde 3-phosphate
ydbK: CoA + 2 H(+) + oxidized [flavodoxin] + pyruvate = acetyl-CoA + CO2 + reduced [flavodoxin]
yqhD: a primary alcohol + NADP(+) = an aldehyde + H(+) + NADPH
xylAB: ATP + D-xylulose = ADP + D-xylulose 5-phosphate + H(+)


In [15]:
df_isobutyric.to_csv('data/df_isobutyric_knockouts.csv')