# A comparative genomic analysis of Pseudomonas aeruginosa strains isolated from oil-contaminated environments in Peru (Script Report)

### Notebook by: reymonera (Camila Castillo-Vilcahuaman)

This notebook documents the bioinformatics workflow used in the study "A comparative genomic analysis of Pseudomonas aeruginosa strains isolated from oil-contaminated environments in Peru". Its aim is to make our code reproducible. Note that the notebook begins after the assembly of the genomes used in the paper.

First, a quick test that shell commands run in this environment.

In [1]:
%%sh
ls

LICENSE
Pseudomonas_notebook.ipynb
README.md


Next, we set up the environments used in this project. All environments were managed with `conda`, and I normally keep each program in its own environment. If your environments are organized differently, adjust the code here accordingly.

In [12]:
%%sh
pwd
pip install rpy2

/home/marlen/Documentos/program/pseudomonas_notebook


In [1]:
# Load the rpy2 extension so that R code can be executed in this notebook.
%load_ext rpy2.ipython



## ANI

ANI was computed with `pyani`, run from its `conda` environment. A directory was created containing the genomes available in public repositories plus the strain sequenced for this study. This directory was used as the database.

In [10]:
!pip install biopython

from Bio import Entrez

# NCBI asks for a contact address with E-utilities requests.
# Placeholder value: replace it with your own email.
Entrez.email = "your.name@example.org"

# List of NCBI assembly accessions
assemblies = [
    "GCA_024706435",
    "GCA_025210085",
    "GCA_025209735",
    "GCA_025210135",
    "GCA_025209925",
    "GCA_025380225",
    "GCA_000006765",
    "CP046602",
    "GCA_001516365",
]

# Function that downloads a FASTA file using Biopython
def download_fasta(assembly):
    try:
        # Search NCBI for the accession
        search_handle = Entrez.esearch(db="assembly", term=assembly)
        record = Entrez.read(search_handle)
        search_handle.close()

        # Take the first UID returned by the search
        assembly_id = record["IdList"][0]

        # Fetch the sequence in FASTA format.
        # NOTE: the UID returned by the assembly database is passed to nuccore
        # unchanged; the two databases do not share UIDs, which may explain the
        # HTTP 400 errors in the output below.
        fetch_handle = Entrez.efetch(db="nuccore", id=assembly_id, rettype="fasta", retmode="text")
        fasta_data = fetch_handle.read()
        fetch_handle.close()

        # Save the FASTA file
        with open(f"{assembly}.fna", "w") as fasta_file:
            fasta_file.write(fasta_data)

        print(f"Download completed for {assembly}")

    except Exception as e:
        print(f"Error downloading {assembly}: {e}")

# Download every assembly in the list
for assembly in assemblies:
    download_fasta(assembly)

!mkdir db_pseudomonas
!mv *.fna db_pseudomonas/
!ls db_pseudomonas/

Error downloading GCA_024706435: HTTP Error 400: Bad Request
Download completed for GCA_025210085
Error downloading GCA_025209735: HTTP Error 400: Bad Request
Download completed for GCA_025210135
Download completed for GCA_025209925
Download completed for GCA_025380225
Error downloading GCA_000006765: HTTP Error 400: Bad Request
Download completed for CP046602
Download completed for GCA_001516365
CP046602.fna	   GCA_025209925.fna  GCA_025210135.fna
GCA_001516365.fna  GCA_025210085.fna  GCA_025380225.fna
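
Since three of the nine downloads failed, it may be worth checking what actually landed in `db_pseudomonas` before running ANI. The following is a minimal sketch, not part of the original workflow: the helper names `fasta_summary` and `summarize_directory` are hypothetical, and the toy file exists only to demonstrate the functions. It counts the records and total sequence length of each `.fna` file, so that an empty or suspiciously short file stands out.

```python
import tempfile
from pathlib import Path


def fasta_summary(path):
    """Return (number of records, total sequence length) for one FASTA file."""
    n_records = 0
    total_length = 0
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                n_records += 1          # header line: a new record starts
            else:
                total_length += len(line)  # sequence line: count its bases
    return n_records, total_length


def summarize_directory(directory, pattern="*.fna"):
    """Map each FASTA file name in `directory` to its (records, length) summary."""
    return {p.name: fasta_summary(p) for p in sorted(Path(directory).glob(pattern))}


# Demonstration on a toy file (made-up sequences, not real genome data).
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "toy.fna").write_text(">contig_1\nACGTACGT\n>contig_2\nGGCC\n")
    demo_summary = summarize_directory(tmp)

print(demo_summary)
```

In the notebook, `summarize_directory("db_pseudomonas")` would give the same kind of summary for the downloaded genomes.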


In [13]:
!average_nucleotide_identity.py -i db_pseudomonas -o ANIm_output -m ANIm -g
!average_nucleotide_identity.py -i db_pseudomonas -o ANIb_output -m ANIb -g

no change     /home/marlen/miniforge-pypy3/condabin/conda
no change     /home/marlen/miniforge-pypy3/bin/conda
no change     /home/marlen/miniforge-pypy3/bin/conda-env
no change     /home/marlen/miniforge-pypy3/bin/activate
no change     /home/marlen/miniforge-pypy3/bin/deactivate
no change     /home/marlen/miniforge-pypy3/etc/profile.d/conda.sh
no change     /home/marlen/miniforge-pypy3/etc/fish/conf.d/conda.fish
no change     /home/marlen/miniforge-pypy3/shell/condabin/Conda.psm1
no change     /home/marlen/miniforge-pypy3/shell/condabin/conda-hook.ps1
no change     /home/marlen/miniforge-pypy3/lib/python3.9/site-packages/xontrib/conda.xsh
no change     /home/marlen/miniforge-pypy3/etc/profile.d/conda.csh
no change     /home/marlen/.bashrc
No action taken.
/bin/bash: line 1: activate: No such file or directory

CondaError: Run 'conda init' before 'conda activate'

/bin/bash: line 1: average_nucleotide_identity.py: command not found
/bin/bash: line 1: average_nucleotide
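
The log above shows `conda activate` failing inside the notebook's subshell, so `average_nucleotide_identity.py` is never found. One possible workaround, sketched here under assumptions, is `conda run -n <env>`, which executes a command inside a named environment without activating it. The environment name `pyani` and the helper `conda_run_command` are hypothetical; substitute the name of the environment where pyani is actually installed.

```python
import shlex


def conda_run_command(env_name, command):
    """Build the argument list that runs `command` inside the conda environment `env_name`."""
    return ["conda", "run", "-n", env_name, *shlex.split(command)]


# "pyani" is an assumed environment name, not taken from the original notebook.
cmd = conda_run_command(
    "pyani",
    "average_nucleotide_identity.py -i db_pseudomonas -o ANIm_output -m ANIm -g",
)

# Print the command line; pass `cmd` to subprocess.run(cmd, check=True) to execute it.
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can be handed to `subprocess.run` without going through a shell.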

The heatmap used in the paper was then drawn in R with the `pheatmap` library.

In [2]:
%%R
getwd()

[1] "/home/marlen/Documentos/program/pseudomonas_notebook"


In [None]:
%%R
#Install the package
install.packages("pheatmap")

#Load the library
library(pheatmap)

#Import the matrix produced by pyani
matrix_ani <- read.csv('ANIb_output/ANIb_percentage_identity.tab', row.names=1, sep="\t")

#Draw the heatmap with pheatmap
pheatmap(matrix_ani)

#Draw a colorblind-friendly heatmap
pheatmap(matrix_ani, color = hcl.colors(50, "Blue-Red 3"))
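
Alongside the heatmap, a ranked list of genome pairs can make the same matrix easier to read. The following Python sketch is an optional addition, not part of the original analysis: the function names are hypothetical and the toy matrix contains made-up values. It parses a tab-separated identity matrix with the layout that the R cell reads (row names in the first column, column names in the first row) and ranks each pair by the mean of its two directions, in case the two directions differ.

```python
import csv
import io


def read_identity_matrix(handle):
    """Parse a tab-separated square matrix into a dict of dicts of floats."""
    reader = csv.reader(handle, delimiter="\t")
    columns = next(reader)[1:]  # first header cell is the empty corner
    matrix = {}
    for row in reader:
        if row:  # skip blank lines
            matrix[row[0]] = dict(zip(columns, map(float, row[1:])))
    return matrix


def pairs_by_identity(matrix):
    """Return (genome_a, genome_b, mean identity) tuples, highest identity first."""
    names = sorted(matrix)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            # Average both directions so each pair is reported once.
            pairs.append((a, b, (matrix[a][b] + matrix[b][a]) / 2))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


# Toy matrix with made-up values, only to demonstrate the functions.
toy = "\tA\tB\tC\nA\t1.0\t0.99\t0.90\nB\t0.97\t1.0\t0.88\nC\t0.90\t0.86\t1.0\n"
pairs = pairs_by_identity(read_identity_matrix(io.StringIO(toy)))

for a, b, identity in pairs:
    print(f"{a}\t{b}\t{identity:.3f}")
```

With the real output, the same functions could be applied to `open('ANIb_output/ANIb_percentage_identity.tab')`.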