### Example 02

Establishment and characterization of persistent *Pseudomonas aeruginosa* infections in air–liquid interface cultures of human airway epithelial cells.

doi: https://doi.org/10.1128/iai.00603-24

**Ref:** Bouheraoua S, Cleeves S, Preusse M, Müsken M, Braubach P, Fuchs M, Falk C, Sewald K, Häussler S. 2025. Establishment and characterization of persistent Pseudomonas aeruginosa infections in air–liquid interface cultures of human airway epithelial cells. Infect Immun 93:e00603-24.

- This example uses the Calu-3 PAO1 Day 5 vs Inoculum comparison.

Objective: extract the background, upregulated, and downregulated gene lists for functional enrichment analysis.

In [89]:
## Get the data
import pandas as pd
df = pd.read_excel("iai.00603-24-s0006.xlsx", sheet_name='Calu-3 PAO1 Day 5 vs Inoculum')

In [90]:
# The first data row holds the real column names; promote it to the header
new_header = df.iloc[0]

df = df[1:]
df.columns = new_header
# df

In [45]:
len(df)

1469
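
Promoting a data row to the header usually leaves every column with the `object` dtype, so it can help to coerce the numeric columns before filtering on them. The following is a minimal sketch on a toy frame, not the supplementary file itself; `promote_first_row` is a hypothetical helper name, and the toy values are invented for illustration. Passing `header=1` to `pd.read_excel` would likely give a similar result, assuming the real header sits on the second row of the sheet.

```python
import pandas as pd

# Toy frame mimicking a sheet whose real header sits in the first data row
# (values are illustrative, not taken from the supplementary table).
raw = pd.DataFrame(
    [
        ["Locus Tag", "Gene name", "log2FC", "FDR"],
        ["PA0001", "dnaA", 1.8, 0.001],
        ["PA0002", None, -2.3, 0.01],
    ]
)


def promote_first_row(frame: pd.DataFrame, numeric_cols: list[str]) -> pd.DataFrame:
    """Use row 0 as the header and coerce the listed columns to numbers."""
    out = frame.iloc[1:].reset_index(drop=True)
    out.columns = frame.iloc[0].tolist()
    for col in numeric_cols:
        # errors="coerce" turns unparseable cells into NaN instead of raising
        out[col] = pd.to_numeric(out[col], errors="coerce")
    return out


table = promote_first_row(raw, ["log2FC", "FDR"])
print(table.dtypes)
```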

In [91]:
## Check for missing values in Locus Tag (used as the gene ID)
len(df[df["Locus Tag"].isna()])

0

In [47]:
len(df[df["Locus Tag"].duplicated()])

0

#### Background genes

All genes detected in the RNA-Seq analysis, without duplicates.

In [92]:
# Unique locus tags, in order of first appearance
genes_background = df["Locus Tag"].drop_duplicates().tolist()
len(genes_background)

1469

In [93]:
with open('background_ex2.txt', 'w') as file:
    for gene in genes_background:
        file.write(f"{gene}\n")

In [94]:
print(len(genes_background))
with open("background_ex2.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()
print(len(lines))

1469
1469
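
The write-then-recount pattern above recurs for every gene list in this example, so it could be wrapped in one helper. This is a sketch only: `write_gene_list` is a hypothetical name, and the identifiers and temporary file stand in for `df["Locus Tag"]` and the real output paths.

```python
from pathlib import Path
import tempfile


def write_gene_list(genes, path):
    """Write one identifier per line and return how many lines were written."""
    unique = list(dict.fromkeys(genes))  # drops duplicates, keeps first-seen order
    Path(path).write_text("".join(f"{g}\n" for g in unique), encoding="utf-8")
    # Read the file back and compare, mirroring the manual line count check
    written = Path(path).read_text(encoding="utf-8").splitlines()
    if written != [str(g) for g in unique]:
        raise ValueError(f"round-trip mismatch for {path}")
    return len(written)


# Toy identifiers standing in for df["Locus Tag"]
with tempfile.TemporaryDirectory() as tmp:
    n_written = write_gene_list(["PA0001", "PA0002", "PA0001"], Path(tmp) / "demo.txt")
print(n_written)
```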


#### Upregulated genes

All genes with log2FC > 1 and FDR < 0.05.

In [95]:
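# .copy() gives an independent frame, so the later 'Gene name' assignment
# does not trigger a SettingWithCopyWarning
upregulated = df[(df['log2FC'] > 1) & (df['FDR'] < 0.05)].copy()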
# upregulated

In [52]:
len(upregulated[upregulated["Locus Tag"].duplicated()])

0

In [53]:
upregulated = upregulated.drop_duplicates(subset='Locus Tag')
len(upregulated)

731
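
The same cutoffs (|log2FC| > 1, FDR < 0.05) define both the upregulated and the downregulated sets, so the two filters can be expressed as one function. The sketch below runs on a toy frame with invented values; `split_by_threshold` and its parameter names are assumptions for illustration, not part of the original analysis.

```python
import pandas as pd


def split_by_threshold(table, lfc=1.0, fdr=0.05, lfc_col="log2FC", fdr_col="FDR"):
    """Return (up, down) copies of the rows passing the significance cutoffs."""
    significant = table[fdr_col] < fdr
    up = table[significant & (table[lfc_col] > lfc)].copy()
    down = table[significant & (table[lfc_col] < -lfc)].copy()
    return up, down


# Toy data: one clear up gene, one clear down gene, two that fail a cutoff
toy = pd.DataFrame(
    {
        "Locus Tag": ["PA0001", "PA0002", "PA0003", "PA0004"],
        "log2FC": [2.5, -1.7, 1.2, 0.4],
        "FDR": [0.001, 0.02, 0.20, 0.01],
    }
)
up, down = split_by_threshold(toy)
print(up["Locus Tag"].tolist(), down["Locus Tag"].tolist())
```

Returning copies keeps later column assignments on `up` or `down` from touching the source table.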

In [54]:
with open('upregulated_ex2.txt', 'w') as file:
    for gene in upregulated["Locus Tag"]:
        file.write(f"{gene}\n")

In [55]:
print(len(upregulated))
with open("upregulated_ex2.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()
print(len(lines))

731
731


##### Upregulated gene names

In [56]:
len(upregulated[upregulated["Gene name"].isna()])

481

In [57]:
from ResPathExplorer.mapper_KeggFunctions import get_gene_name_by_kegg_id
list_gene_ids = upregulated[upregulated["Gene name"].isna()]['Locus Tag'].tolist()

dict_g = {}
not_found_list = []

for id_g in list_gene_ids:
    id_go = "pae:" + id_g

    try:
        name = get_gene_name_by_kegg_id(id_go)

        if name:
            dict_g[id_g] = name
        else:
            dict_g[id_g] = ""

    except Exception:
        # The lookup raised an error; record the ID for manual review
        not_found_list.append(id_g)


In [96]:
# dict_g

In [59]:
not_found_list

['PA1427',
 'PA0805.1',
 'PA1426',
 'PA5471.1',
 'PA3991',
 'PA0852.1',
 'PA0708.1',
 'PA3218',
 'PA0717',
 'PA0980',
 'PA4028',
 'PA3090']

In [61]:
filtered_dict_g = {k: v for k, v in dict_g.items() if v != ""}
filtered_dict_g

{}
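
Since the same lookup loop is run again for the downregulated set, it may be convenient to wrap it in a function that takes the lookup callable as an argument. The sketch below assumes, as the loop above implies, that the lookup returns a name (possibly empty) and raises when an ID cannot be resolved. `fetch_names` and `fake_lookup` are hypothetical names, `fake_lookup` only stands in for `get_gene_name_by_kegg_id`, and `PA9999` is an invented tag used to trigger the failure path.

```python
def fetch_names(locus_tags, lookup, prefix="pae:"):
    """Split tags into resolved names and tags whose lookup raised."""
    names, failed = {}, []
    for tag in locus_tags:
        try:
            name = lookup(prefix + tag)
        except Exception:
            failed.append(tag)
            continue
        if name:  # skip empty or None results instead of storing ""
            names[tag] = name
    return names, failed


# Hypothetical stand-in for get_gene_name_by_kegg_id, for illustration only
def fake_lookup(kegg_id):
    known = {"pae:PA4389": "speA", "pae:PA0001": ""}
    return known[kegg_id]  # raises KeyError for unknown IDs


names, failed = fetch_names(["PA4389", "PA0001", "PA9999"], fake_lookup)
print(names, failed)
```

With the real function, the call would presumably look like `fetch_names(list_gene_ids, get_gene_name_by_kegg_id)`.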

In [62]:
upregulated['Gene name'] = upregulated.apply(
    lambda row: filtered_dict_g[row['Locus Tag']] if row['Locus Tag'] in filtered_dict_g else row['Gene name'],
    axis=1)
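
A vectorized alternative to the row-wise `apply` is to map the locus tags through the dictionary and use the result only where a name is missing. This is a sketch on toy data with illustrative values; it gives the same result as the `apply` version as long as the dictionary only contains tags whose name was missing, which is how `filtered_dict_g` is built here.

```python
import pandas as pd

# Toy frame and dictionary standing in for the filtered table and filtered_dict_g
toy = pd.DataFrame(
    {
        "Locus Tag": ["PA4389", "PA0001", "PA0002"],
        "Gene name": [None, "dnaA", None],
    }
)
recovered = {"PA4389": "speA"}

# Keep existing names; fill only the gaps with the looked-up names.
# Tags absent from the dictionary map to NaN and therefore stay missing.
toy["Gene name"] = toy["Gene name"].fillna(toy["Locus Tag"].map(recovered))
print(toy["Gene name"].tolist())
```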

In [83]:
upregulated.to_excel("df_genesup.xlsx")

In [65]:
gene_list = upregulated['Gene name'].dropna().unique().tolist()
len(gene_list)

250

In [66]:
with open('upregulatedGName_ex2.txt', 'w') as file:
    for gene in gene_list:
        file.write(f"{gene}\n")

In [67]:
with open("upregulatedGName_ex2.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()
print(len(lines))

250


#### Downregulated genes

All genes with log2FC < -1 and FDR < 0.05.

In [97]:
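# .copy() gives an independent frame, so the later 'Gene name' assignment
# does not trigger a SettingWithCopyWarning
downregulated = df[(df['log2FC'] < -1) & (df['FDR'] < 0.05)].copy()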
# downregulated

In [69]:
len(downregulated[downregulated["Locus Tag"].duplicated()])

0

In [70]:
downregulated = downregulated.drop_duplicates(subset='Locus Tag')
len(downregulated)

694

In [71]:
with open('downregulated_ex2.txt', 'w') as file:
    for gene in downregulated["Locus Tag"]:
        file.write(f"{gene}\n")

In [72]:
print(len(downregulated))
with open("downregulated_ex2.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()
print(len(lines))

694
694


##### Downregulated gene names

In [73]:
len(downregulated[downregulated["Gene name"].isna()])

299

In [74]:
list_gene_ids = downregulated[downregulated["Gene name"].isna()]['Locus Tag'].tolist()

dict_g = {}
not_found_list = []

for id_g in list_gene_ids:
    id_go = "pae:" + id_g

    try:
        name = get_gene_name_by_kegg_id(id_go)

        if name:
            dict_g[id_g] = name
        else:
            dict_g[id_g] = ""

    except Exception:
        # The lookup raised an error; record the ID for manual review
        not_found_list.append(id_g)

In [98]:
# dict_g

In [76]:
filtered_dict_g = {k: v for k, v in dict_g.items() if v != ""}
filtered_dict_g

{'PA4389': 'speA'}

In [77]:
downregulated['Gene name'] = downregulated.apply(
    lambda row: filtered_dict_g[row['Locus Tag']] if row['Locus Tag'] in filtered_dict_g else row['Gene name'],
    axis=1)

In [78]:
gene_list = downregulated['Gene name'].dropna().unique().tolist()
len(gene_list)

396

In [79]:
with open('downregulatedGName_ex2.txt', 'w') as file:
    for gene in gene_list:
        file.write(f"{gene}\n")

In [80]:
with open("downregulatedGName_ex2.txt", "r", encoding="utf-8") as f:
    lines = f.readlines()
print(len(lines))

396


In [84]:
downregulated.to_excel("df_genesdown.xlsx")