## Project Objective / Business Relevance

*Pseudomonas aeruginosa* frequently infects hospitalized patients and is associated with high morbidity and mortality. With antibiotic resistance emerging as a major obstacle to effective *P. aeruginosa* treatment, innovative testing methods are in high demand to better inform drug prescriptions.

The aim of this project is to build a classification model that accurately predicts the susceptibility of *P. aeruginosa* isolates to the commonly used drug tobramycin. The model will be trained on sequences of *orfN*, a gene that has been shown to mutate first to confer tobramycin resistance in the bacterium.

This model will hopefully serve as the basis for a rapid antimicrobial susceptibility testing (AST) method. Using only one gene means significantly less data will be required compared to other proposed methods.

In [95]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [96]:
# pd.set_option("display.max_rows", None)
# pd.set_option("display.max_columns", None)
pd.reset_option("display.max_rows")
pd.reset_option("display.max_columns")

## Importing and Cleaning Data

Data were obtained from two different sources.

1. The *orfN* gene sequence data for each isolate was obtained from the BV-BRC database using the reference sequence locus tag “PA14_23460”: https://www.bv-brc.org/view/Feature/PATRIC.208963.12.NC_008463.CDS.2040149.2041165.fwd

2. The tobramycin resistance phenotype data was obtained from the “Dataset EV1” file in Khaledi et al. (2020): https://www.embopress.org/doi/full/10.15252/emmm.201910264

In [97]:
# Importing gene sequence data
seq_df = pd.read_csv('BV-BRC_Allstrains.csv')
seq_df

Unnamed: 0,Genome,Unnamed: 1,Unnamed: 2,Bv-BRC Strains,RefSeq Locus Tag,Alt Locus Tag,Feature ID,Annotation,Feature Type,Start,...,Length,Strand,FIGfam ID,PATRIC genus-specific families (PLfams),PATRIC cross-genus families (PGfams),Protein ID,AA Length,Gene Symbol,Product,GO
0,Pseudomonas,aeruginosa,strain,CF592_Iso2,,,PATRIC.287.12562.287.12562.con.0005.CDS.335292...,PATRIC,CDS,335292,...,1026,-,,PLF_286_00001745,PGF_00780840,,341,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
1,Pseudomonas,aeruginosa,strain,CF609_Iso3,,,PATRIC.287.12555.287.12555.con.0005.CDS.2376.3...,PATRIC,CDS,2376,...,1041,+,,PLF_286_00001745,PGF_00780840,,346,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
2,Pseudomonas,aeruginosa,strain,CH2500,,,PATRIC.287.12774.287.12774.con.0041.CDS.41.105...,PATRIC,CDS,41,...,1017,-,,PLF_286_00001745,PGF_00780840,,338,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
3,Pseudomonas,aeruginosa,strain,CH2527,,,PATRIC.287.12776.287.12776.con.0001.CDS.811102...,PATRIC,CDS,811102,...,1026,-,,PLF_286_00001745,PGF_00780840,,341,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
4,Pseudomonas,aeruginosa,strain,CH2543,,,PATRIC.287.12777.287.12777.con.0002.CDS.38770....,PATRIC,CDS,38770,...,1026,+,,PLF_286_00001745,PGF_00780840,,341,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
376,Pseudomonas,aeruginosa,strain,ZG5089456,,,PATRIC.287.12548.287.12548.con.0003.CDS.404676...,PATRIC,CDS,404676,...,1020,-,,PLF_286_00001745,PGF_00780840,,339,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
377,Pseudomonas,aeruginosa,strain,ZG8006959,,,PATRIC.287.12550.287.12550.con.0003.CDS.410460...,PATRIC,CDS,410460,...,1026,-,,PLF_286_00001745,PGF_00780840,,341,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
378,Pseudomonas,aeruginosa,strain,ZG8038581181,,,PATRIC.287.12531.287.12531.con.0002.CDS.45243....,PATRIC,CDS,45243,...,885,+,,PLF_286_00001745,PGF_00780840,,294,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
379,Pseudomonas,aeruginosa,strain,ZG8510487,,,PATRIC.287.12547.287.12547.con.0003.CDS.402424...,PATRIC,CDS,402424,...,1020,-,,PLF_286_00001745,PGF_00780840,,339,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...


In [98]:
# Importing resistance phenotype data; only the needed columns and rows of the Excel sheet are read
phen_df = pd.read_csv('BactomeResistanceData.csv', usecols=[1, 2, 3, 4], nrows=377)
phen_df

Unnamed: 0,Isolate,Supplier (Geographic origin),TOB,MIC*
0,CF592_Iso2,University Hospital Essen (Essen),S,2.0
1,CF609_Iso3,University Hospital Essen (Essen),I,8.0
2,CH2500,Charité Berlin (Berlin),S,0.5
3,CH2527,Charité Berlin (Berlin),S,0.5
4,CH2543,Charité Berlin (Berlin),R,512.0
...,...,...,...,...
372,ZG5089456,Private practice laboratory (Leipzig),R,128.0
373,ZG8006959,Private practice laboratory (Leipzig),R,128.0
374,ZG8038581181,Private practice laboratory (Chemnitz),R,1.0
375,ZG8510487,Private practice laboratory (Leipzig),R,256.0


The datasets are cross-referenced to see for which isolates we have both types of data.

In [99]:
# Dropping isolates from seq_df that have no phenotype data
filt = seq_df['Bv-BRC Strains'].isin(phen_df['Isolate'])
seq_df = seq_df[filt]
seq_df

Unnamed: 0,Genome,Unnamed: 1,Unnamed: 2,Bv-BRC Strains,RefSeq Locus Tag,Alt Locus Tag,Feature ID,Annotation,Feature Type,Start,...,Length,Strand,FIGfam ID,PATRIC genus-specific families (PLfams),PATRIC cross-genus families (PGfams),Protein ID,AA Length,Gene Symbol,Product,GO
0,Pseudomonas,aeruginosa,strain,CF592_Iso2,,,PATRIC.287.12562.287.12562.con.0005.CDS.335292...,PATRIC,CDS,335292,...,1026,-,,PLF_286_00001745,PGF_00780840,,341,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
1,Pseudomonas,aeruginosa,strain,CF609_Iso3,,,PATRIC.287.12555.287.12555.con.0005.CDS.2376.3...,PATRIC,CDS,2376,...,1041,+,,PLF_286_00001745,PGF_00780840,,346,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
2,Pseudomonas,aeruginosa,strain,CH2500,,,PATRIC.287.12774.287.12774.con.0041.CDS.41.105...,PATRIC,CDS,41,...,1017,-,,PLF_286_00001745,PGF_00780840,,338,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
3,Pseudomonas,aeruginosa,strain,CH2527,,,PATRIC.287.12776.287.12776.con.0001.CDS.811102...,PATRIC,CDS,811102,...,1026,-,,PLF_286_00001745,PGF_00780840,,341,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
4,Pseudomonas,aeruginosa,strain,CH2543,,,PATRIC.287.12777.287.12777.con.0002.CDS.38770....,PATRIC,CDS,38770,...,1026,+,,PLF_286_00001745,PGF_00780840,,341,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
376,Pseudomonas,aeruginosa,strain,ZG5089456,,,PATRIC.287.12548.287.12548.con.0003.CDS.404676...,PATRIC,CDS,404676,...,1020,-,,PLF_286_00001745,PGF_00780840,,339,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
377,Pseudomonas,aeruginosa,strain,ZG8006959,,,PATRIC.287.12550.287.12550.con.0003.CDS.410460...,PATRIC,CDS,410460,...,1026,-,,PLF_286_00001745,PGF_00780840,,341,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
378,Pseudomonas,aeruginosa,strain,ZG8038581181,,,PATRIC.287.12531.287.12531.con.0002.CDS.45243....,PATRIC,CDS,45243,...,885,+,,PLF_286_00001745,PGF_00780840,,294,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
379,Pseudomonas,aeruginosa,strain,ZG8510487,,,PATRIC.287.12547.287.12547.con.0003.CDS.402424...,PATRIC,CDS,402424,...,1020,-,,PLF_286_00001745,PGF_00780840,,339,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...


There are 4 isolates in the seq_df dataframe that we do not have phenotype data for. These isolates are dropped.
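
The cross-reference can also be checked independently of pandas with plain sets. The sketch below uses a handful of isolate names; `NO_PHENOTYPE_1` is a made-up placeholder, and the variable names are illustrative rather than taken from the notebook.

```python
# Toy isolate lists; "NO_PHENOTYPE_1" is a hypothetical name used only for illustration.
seq_isolates = ["CF592_Iso2", "CF609_Iso3", "CH2500", "NO_PHENOTYPE_1"]
phen_isolates = ["CF592_Iso2", "CF609_Iso3", "CH2500"]

seq_set, phen_set = set(seq_isolates), set(phen_isolates)

# Isolates with both a sequence and a phenotype label
shared = sorted(seq_set & phen_set)
# Isolates with a sequence but no phenotype label (these are the ones dropped)
missing_phenotype = sorted(seq_set - phen_set)
# Isolates with a phenotype label but no sequence
missing_sequence = sorted(phen_set - seq_set)

print(f"shared: {len(shared)}")
print(f"sequence only: {missing_phenotype}")
print(f"phenotype only: {missing_sequence}")
```

Checking the difference in both directions is a cheap way to confirm that the two tables line up before any rows are removed.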

The isolates with “intermediate” susceptibility to tobramycin are dropped to broaden the gap between resistant and susceptible isolates. A gene length limit is also imposed to limit sequence variability: a window of ±30% of the reference sequence length (1017 base pairs) was tried first and is left commented out below, and the tighter window of ±36 base pairs (981 to 1053) is the one applied.
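
To make the length limits concrete, here is a small sketch of the arithmetic behind a relative window and an absolute window around the 1017 bp reference. The helper names `length_bounds` and `within` are illustrative and not part of the notebook; the sample lengths are values that appear in the sequence table.

```python
REF_LENGTH = 1017  # reference orfN length in base pairs


def length_bounds(ref_length, *, fraction=None, absolute=None):
    """Return (lower, upper) length limits around ref_length.

    Exactly one of `fraction` (relative margin) or `absolute`
    (margin in base pairs) must be given.
    """
    if (fraction is None) == (absolute is None):
        raise ValueError("give exactly one of fraction or absolute")
    margin = ref_length * fraction if fraction is not None else absolute
    return ref_length - margin, ref_length + margin


def within(length, bounds):
    """True if length lies inside the inclusive (lower, upper) bounds."""
    lower, upper = bounds
    return lower <= length <= upper


wide = length_bounds(REF_LENGTH, fraction=0.3)   # roughly 711.9 to 1322.1 bp
narrow = length_bounds(REF_LENGTH, absolute=36)  # 981 to 1053 bp

for length in (789, 885, 1017, 1053):
    print(length, within(length, wide), within(length, narrow))
```

A 30% window keeps sequences as short as 789 bp, whereas a 36 bp window excludes them, which is the trade-off between sample size and sequence variability.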

In [100]:
# Dropping isolates with I from phen_df
i_to_drop = phen_df[phen_df['TOB'] == 'I']
phen_df.drop(i_to_drop.index, axis=0, inplace=True)

# Dropping corresponding isolates from seq_df
i_filter = seq_df['Bv-BRC Strains'].isin(i_to_drop['Isolate'])
seq_df = seq_df[~i_filter]

assert len(phen_df) == len(seq_df)

In [101]:
# # Creating bounds for gene length filtering
# lower_lim = 1017 - 0.3 * 1017
# upper_lim = 1017 + 0.3 * 1017

# # Dropping isolates outside of length bounds from seq_df
# len_to_drop = seq_df[(seq_df['Length'] > upper_lim) | (seq_df['Length'] < lower_lim)]
# seq_df = seq_df.drop(len_to_drop.index, axis=0)

# # Dropping corresponding isolates from phen_df
# len_filter = phen_df['Isolate'].isin(len_to_drop['Bv-BRC Strains'])
# phen_df = phen_df[~len_filter]

# print(f"Length of phen_df: {len(phen_df)}, length of seq_df: {len(seq_df)}")

In [102]:
# Trying to use smaller range
# Creating bounds for gene length filtering
lower_lim = 1017 - 36
upper_lim = 1017 + 36

# Dropping isolates outside of length bounds from seq_df
len_to_drop = seq_df[(seq_df['Length'] > upper_lim) | (seq_df['Length'] < lower_lim)]
seq_df = seq_df.drop(len_to_drop.index, axis=0)

# Dropping corresponding isolates from phen_df
len_filter = phen_df['Isolate'].isin(len_to_drop['Bv-BRC Strains'])
phen_df = phen_df[~len_filter]

print(f"Length of phen_df: {len(phen_df)}, length of seq_df: {len(seq_df)}")

Length of phen_df: 334, length of seq_df: 334


This cleaning leaves 334 isolates (367 before the length window was tightened to 981-1053 bp). The next step is multiple sequence alignment (MSA), run on the BV-BRC website. This uses the MAFFT aligner to align the isolate gene sequences as closely as possible to the PA14 reference *orfN* sequence taken from the Pseudomonas Genome Database: https://www.pseudomonas.com/feature/show/?id=1654623&view=sequence

**Possible improvements: running the MSA with a consensus sequence as the reference, or without a reference sequence.**
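
For the consensus idea, one simple way to derive a consensus from already-aligned sequences is a per-column majority vote. The sketch below is an assumption about how that could be done, not the method MAFFT or BV-BRC uses; the function name `consensus` and the toy alignment are illustrative.

```python
from collections import Counter


def consensus(aligned, gap="-"):
    """Majority-vote consensus of equal-length aligned sequences.

    Gaps are ignored when voting; a column made only of gaps stays a gap.
    Ties go to the base seen first in the column.
    """
    if len({len(seq) for seq in aligned}) != 1:
        raise ValueError("aligned sequences must all have the same length")
    out = []
    for column in zip(*aligned):
        counts = Counter(base.lower() for base in column if base != gap)
        out.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(out)


toy_alignment = ["atgaat", "atgatc", "atgaa-", "---aat"]
print(consensus(toy_alignment))
```

Lowercasing before the vote keeps the uppercase reference row and the lowercase isolate rows from being counted as different bases.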

In [103]:
from Bio import AlignIO
# Load MSA file in FASTA format
msa = AlignIO.read('BVBRC_msa_refPA14.fasta', 'fasta')
msa

<<class 'Bio.Align.MultipleSeqAlignment'> instance (369 records of length 1073) at 12a312490>

In [80]:
# Build a dataframe with one row per aligned sequence and one column per alignment position
# (building it in a single call avoids inserting columns one at a time, then transposing)
msa_df = pd.DataFrame([list(record.seq) for record in msa])
pd.reset_option("display.max_rows")
pd.reset_option("display.max_columns")
msa_df


Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1063,1064,1065,1066,1067,1068,1069,1070,1071,1072
0,-,-,-,A,T,G,A,A,C,T,...,-,-,-,-,-,T,A,T,A,G
1,-,-,-,a,t,g,a,a,t,c,...,-,-,-,-,g,c,t,t,g,a
2,a,t,g,a,t,g,a,t,c,t,...,-,-,-,-,-,c,c,t,a,a
3,a,t,g,g,a,a,g,a,a,t,...,-,-,a,g,g,c,c,t,a,a
4,a,t,g,g,a,a,g,a,a,t,...,-,-,a,g,g,c,c,t,a,a
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
364,a,t,g,a,t,g,a,a,t,c,...,-,-,-,-,g,c,t,t,g,a
365,a,t,g,a,t,g,a,t,c,t,...,-,-,-,-,-,c,c,t,a,a
366,a,t,g,g,a,a,g,a,a,t,...,-,-,a,g,g,c,c,t,a,a
367,a,t,g,a,t,g,a,a,t,c,...,-,-,-,-,g,c,t,t,g,a


In [81]:
# Set the index labels to the BRC IDs for each isolate
msa_df.index = [rec.id for rec in msa]
msa_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1063,1064,1065,1066,1067,1068,1069,1070,1071,1072
reference_seq,-,-,-,A,T,G,A,A,C,T,...,-,-,-,-,-,T,A,T,A,G
fig:287.12739.peg.662,-,-,-,a,t,g,a,a,t,c,...,-,-,-,-,g,c,t,t,g,a
fig:287.12529.peg.1653,a,t,g,a,t,g,a,t,c,t,...,-,-,-,-,-,c,c,t,a,a
fig:287.12511.peg.818,a,t,g,g,a,a,g,a,a,t,...,-,-,a,g,g,c,c,t,a,a
fig:287.12550.peg.1478,a,t,g,g,a,a,g,a,a,t,...,-,-,a,g,g,c,c,t,a,a
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
fig:287.12728.peg.1837,a,t,g,a,t,g,a,a,t,c,...,-,-,-,-,g,c,t,t,g,a
fig:287.12488.peg.43,a,t,g,a,t,g,a,t,c,t,...,-,-,-,-,-,c,c,t,a,a
fig:287.12507.peg.1222,a,t,g,g,a,a,g,a,a,t,...,-,-,a,g,g,c,c,t,a,a
fig:287.12497.peg.3325,a,t,g,a,t,g,a,a,t,c,...,-,-,-,-,g,c,t,t,g,a


In [82]:
# Importing dataframe with corresponding BRC IDs and strain names
mapping_df = pd.read_csv('BV-BRC ID Data.csv', usecols=['Strain', 'BRC ID'])
# Creating dictionary that maps BRC IDs to strain names
mapping_dict = dict(zip(mapping_df['BRC ID'], mapping_df['Strain']))
# Rename msa_df index with mapping dictionary
msa_df.index = msa_df.index.map(mapping_dict)
msa_df

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1063,1064,1065,1066,1067,1068,1069,1070,1071,1072
Reference PA14,-,-,-,A,T,G,A,A,C,T,...,-,-,-,-,-,T,A,T,A,G
PSAE1984,-,-,-,a,t,g,a,a,t,c,...,-,-,-,-,g,c,t,t,g,a
ZG205565,a,t,g,a,t,g,a,t,c,t,...,-,-,-,-,-,c,c,t,a,a
MHH17441,a,t,g,g,a,a,g,a,a,t,...,-,-,a,g,g,c,c,t,a,a
ZG8006959,a,t,g,g,a,a,g,a,a,t,...,-,-,a,g,g,c,c,t,a,a
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ESP084,a,t,g,a,t,g,a,a,t,c,...,-,-,-,-,g,c,t,t,g,a
CH5363,a,t,g,a,t,g,a,t,c,t,...,-,-,-,-,-,c,c,t,a,a
CH5695,a,t,g,g,a,a,g,a,a,t,...,-,-,a,g,g,c,c,t,a,a
CH5548,a,t,g,a,t,g,a,a,t,c,...,-,-,-,-,g,c,t,t,g,a
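
One thing worth checking after the renaming step: when a pandas index is mapped through a dictionary, any label without a matching key becomes `NaN`, so a BRC ID missing from the mapping file would silently lose its name. A minimal sketch of that check in plain Python follows; the first three ID and name pairs are read off the outputs above, while `fig:0.0.peg.0` is a made-up ID used only to trigger the unmapped case.

```python
# Known pairs, read off the dataframe outputs above
mapping = {
    "reference_seq": "Reference PA14",
    "fig:287.12739.peg.662": "PSAE1984",
    "fig:287.12529.peg.1653": "ZG205565",
}

# "fig:0.0.peg.0" is hypothetical and deliberately absent from the mapping
alignment_ids = [
    "reference_seq",
    "fig:287.12739.peg.662",
    "fig:287.12529.peg.1653",
    "fig:0.0.peg.0",
]

# IDs that would end up without a strain name
unmapped = [seq_id for seq_id in alignment_ids if seq_id not in mapping]

# Rename, but fall back to the original ID instead of losing the label
renamed = [mapping.get(seq_id, seq_id) for seq_id in alignment_ids]

print("unmapped:", unmapped)
print("renamed:", renamed)
```

If `unmapped` is non-empty on the real data, those rows need attention before the sequences are joined to the phenotype labels.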


In [106]:
# Gaps (missing values) are present as '-' in the alignment; convert them to NaN
msa_df.replace('-', np.nan, inplace=True)

In [107]:
msa_df.loc[:, msa_df.isnull().any()]

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1063,1064,1065,1066,1067,1068,1069,1070,1071,1072
Reference PA14,,,,A,T,G,A,A,C,T,...,,,,,,T,A,T,A,G
PSAE1984,,,,a,t,g,a,a,t,c,...,,,,,g,c,t,t,g,a
ZG205565,a,t,g,a,t,g,a,t,c,t,...,,,,,,c,c,t,a,a
MHH17441,a,t,g,g,a,a,g,a,a,t,...,,,a,g,g,c,c,t,a,a
ZG8006959,a,t,g,g,a,a,g,a,a,t,...,,,a,g,g,c,c,t,a,a
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ESP084,a,t,g,a,t,g,a,a,t,c,...,,,,,g,c,t,t,g,a
CH5363,a,t,g,a,t,g,a,t,c,t,...,,,,,,c,c,t,a,a
CH5695,a,t,g,g,a,a,g,a,a,t,...,,,a,g,g,c,c,t,a,a
CH5548,a,t,g,a,t,g,a,a,t,c,...,,,,,g,c,t,t,g,a


In [90]:
# Count the columns that contain at least one missing value (512 here).
# Gaps were converted to NaN above, so test for nulls rather than for '-'.
n_cols_with_na = int(msa_df.isnull().any().sum())
n_cols_with_na

512

In [108]:
# Rows that contain at least one missing value
msa_df[msa_df.isnull().any(axis=1)]

Unnamed: 0,0,1,2,3,4,5,6,7,8,9,...,1063,1064,1065,1066,1067,1068,1069,1070,1071,1072
Reference PA14,,,,A,T,G,A,A,C,T,...,,,,,,T,A,T,A,G
PSAE1984,,,,a,t,g,a,a,t,c,...,,,,,g,c,t,t,g,a
ZG205565,a,t,g,a,t,g,a,t,c,t,...,,,,,,c,c,t,a,a
MHH17441,a,t,g,g,a,a,g,a,a,t,...,,,a,g,g,c,c,t,a,a
ZG8006959,a,t,g,g,a,a,g,a,a,t,...,,,a,g,g,c,c,t,a,a
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
ESP084,a,t,g,a,t,g,a,a,t,c,...,,,,,g,c,t,t,g,a
CH5363,a,t,g,a,t,g,a,t,c,t,...,,,,,,c,c,t,a,a
CH5695,a,t,g,g,a,a,g,a,a,t,...,,,a,g,g,c,c,t,a,a
CH5548,a,t,g,a,t,g,a,a,t,c,...,,,,,g,c,t,t,g,a


Let's see whether limiting the allowable sequence length lowers the number of missing values.

Limiting the sequence length to 981-1053 bp (the 1017 bp reference ± 36 bp) will remove 33 rows.

The MSA will have to be redone without these sequences.
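
The intuition behind this can be illustrated on a toy alignment: a truncated sequence forces gaps into every column it does not cover, so removing it lowers the number of gapped columns. This is only a sketch with made-up sequences and an illustrative helper name (`gap_columns`); on the real data the alignment has to be recomputed after filtering, since the aligner may place gaps differently.

```python
def gap_columns(aligned, gap="-"):
    """Indices of alignment columns in which at least one sequence has a gap."""
    return [i for i, column in enumerate(zip(*aligned)) if gap in column]


# Made-up aligned sequences; "short_1" stands in for a truncated gene
toy = {
    "full_1": "atgaatcc",
    "full_2": "atgatccc",
    "short_1": "atga----",
}

before = gap_columns(toy.values())
after = gap_columns(seq for name, seq in toy.items() if name != "short_1")

print(f"gapped columns before: {len(before)}, after: {len(after)}")
```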

In [92]:
# Summary statistics for the numeric columns, including gene length
seq_df.describe()

Unnamed: 0,Alt Locus Tag,Start,End,Length,Protein ID,AA Length,Gene Symbol
count,0.0,367.0,367.0,367.0,0.0,367.0,0.0
mean,,263934.6,264947.7,1014.065395,,337.021798,
std,,291140.1,291137.1,44.896007,,14.965336,
min,,41.0,1057.0,789.0,,262.0,
25%,,40285.5,41315.0,1020.0,,339.0,
50%,,46536.0,47555.0,1026.0,,341.0,
75%,,442711.5,443730.5,1026.0,,341.0,
max,,1149773.0,1150813.0,1053.0,,350.0,


In [94]:
# Isolates whose gene length exceeds the 981 bp lower bound
seq_df[seq_df['Length'] > 981]

Unnamed: 0,Genome,Unnamed: 1,Unnamed: 2,Bv-BRC Strains,RefSeq Locus Tag,Alt Locus Tag,Feature ID,Annotation,Feature Type,Start,...,Length,Strand,FIGfam ID,PATRIC genus-specific families (PLfams),PATRIC cross-genus families (PGfams),Protein ID,AA Length,Gene Symbol,Product,GO
0,Pseudomonas,aeruginosa,strain,CF592_Iso2,,,PATRIC.287.12562.287.12562.con.0005.CDS.335292...,PATRIC,CDS,335292,...,1026,-,,PLF_286_00001745,PGF_00780840,,341,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
2,Pseudomonas,aeruginosa,strain,CH2500,,,PATRIC.287.12774.287.12774.con.0041.CDS.41.105...,PATRIC,CDS,41,...,1017,-,,PLF_286_00001745,PGF_00780840,,338,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
3,Pseudomonas,aeruginosa,strain,CH2527,,,PATRIC.287.12776.287.12776.con.0001.CDS.811102...,PATRIC,CDS,811102,...,1026,-,,PLF_286_00001745,PGF_00780840,,341,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
4,Pseudomonas,aeruginosa,strain,CH2543,,,PATRIC.287.12777.287.12777.con.0002.CDS.38770....,PATRIC,CDS,38770,...,1026,+,,PLF_286_00001745,PGF_00780840,,341,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
5,Pseudomonas,aeruginosa,strain,CH2560,,,PATRIC.287.12779.287.12779.con.0003.CDS.449170...,PATRIC,CDS,449170,...,1026,-,,PLF_286_00001745,PGF_00780840,,341,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
375,Pseudomonas,aeruginosa,strain,ZG5051896,,,PATRIC.287.12546.287.12546.con.0022.CDS.11785....,PATRIC,CDS,11785,...,1041,-,,PLF_286_00001745,PGF_00780840,,346,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
376,Pseudomonas,aeruginosa,strain,ZG5089456,,,PATRIC.287.12548.287.12548.con.0003.CDS.404676...,PATRIC,CDS,404676,...,1020,-,,PLF_286_00001745,PGF_00780840,,339,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
377,Pseudomonas,aeruginosa,strain,ZG8006959,,,PATRIC.287.12550.287.12550.con.0003.CDS.410460...,PATRIC,CDS,410460,...,1026,-,,PLF_286_00001745,PGF_00780840,,341,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
379,Pseudomonas,aeruginosa,strain,ZG8510487,,,PATRIC.287.12547.287.12547.con.0003.CDS.402424...,PATRIC,CDS,402424,...,1020,-,,PLF_286_00001745,PGF_00780840,,339,,Undecaprenyl-phosphate alpha-N-acetylglucosami...,GO:0036380|UDP-N-acetylglucosamine-undecapreny...
