# ***<font color="RoyalBlue">Execution of NCBI AMRFinderPlus</font>***

This procedure uses NCBI AMRFinderPlus to detect antimicrobial resistance genes, pathogenicity-related genes, and other genetic elements in assembled sequences.
For details on NCBI AMRFinderPlus, please refer to the following website:
https://github.com/ncbi/amr

<br>
<br>

---

# ***<u><font color="RoyalBlue">Analysis Procedure</font></u>***

## 1. Upload Input Files

Upload FASTA files (assembled sequences) to the analysis project folder.  
You can upload files using the ⬆︎ button at the top of the left sidebar, or by dragging and dropping files into the open analysis project folder.
<br>

## 2. Create a Strain List

Create a strain list using one of the following methods:

### 2.1. Create in Jupyter Lab
- Click the "<font color="Tomato">+</font>" button at the top of the left sidebar → select "<font color="Tomato">Text file</font>" from "<font color="Tomato">Other</font>" to create a new file.
- After editing is complete, press Ctrl+s (Mac: Command+s) to save the file with a desired filename.
### 2.2. Create using Notepad, Text Editor, Vim, etc., and copy to the analysis folder.
- A file created on Windows uses CRLF line endings instead of LF, so the analysis may not work correctly.
- Please create the file on Linux or Mac.
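
If a list was created on Windows anyway, its line endings can be converted before use. The following is a minimal sketch, not part of the provided scripts: the helper name `normalize_line_endings` is hypothetical, and it assumes the list is a small plain-text file that can be rewritten in place.

```python
from pathlib import Path


def normalize_line_endings(path):
    """Rewrite a text file in place with Unix (LF) line endings.

    Hypothetical helper. Returns True if the file content was changed.
    """
    raw = Path(path).read_bytes()
    # Convert Windows (CRLF) and bare CR line breaks to LF
    fixed = raw.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
    # Drop leading/trailing blank lines and end with exactly one newline
    fixed = fixed.strip(b"\n") + b"\n"
    changed = fixed != raw
    if changed:
        Path(path).write_bytes(fixed)
    return changed


# Example call (the filename is whatever you copied into the analysis folder):
# normalize_line_endings("list_amrfinder.txt")
```
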
### 2.3. Create using the following command.
- After "<font color="Tomato">file_name = </font>", enter the list filename enclosed in double quotes (") (example: "filename")
- After "<font color="Tomato">user_input = ("""</font>", enter the strain list (one strain name per line).

<font color="Tomato">────────────── ↓↓↓ ***Execute Command*** ↓↓↓ ──────────────</font>

In [None]:
# Enter the list filename
file_name = "list_amrfinder.txt"

# Paste the strain list
user_input = ("""
A0001
A0002
A0003
A0004
A0005
A0006
A0007
A0008
"""
)

# Remove leading/trailing newlines and add a newline at the end
user_input = user_input.strip("\n") + "\n"

# Write the input to a file
with open(file_name, "w", newline="\n") as file:
    file.write(user_input)
!echo 'Complete!'

<font color="Tomato">────────────────────────────────────────────</font>
<br>
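
Before moving on, it can help to confirm that every strain in the list has a matching FASTA file. The sketch below assumes that the input for each strain is named `<strain name><extension>` (for example `A0001.fasta`) and sits in the same folder as the list. This naming is inferred from the list and extension parameters and is not confirmed by `amrfinder_batch.sh` itself. The function name `find_missing_fasta` is hypothetical.

```python
from pathlib import Path


def find_missing_fasta(list_file, extension=".fasta", folder="."):
    """Return strain names that have no <strain><extension> file in folder.

    Hypothetical helper; the <strain><extension> naming is an assumption.
    """
    text = Path(list_file).read_text()
    # One strain name per line; ignore blank lines and surrounding whitespace
    strains = [line.strip() for line in text.splitlines() if line.strip()]
    return [
        strain
        for strain in strains
        if not (Path(folder) / f"{strain}{extension}").is_file()
    ]


# Example call:
# print(find_missing_fasta("list_amrfinder.txt", ".fasta"))
```

An empty list means that every strain has a corresponding file under the assumed naming.
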

## 3. Execute amrfinder_batch.sh

- After "<font color="Tomato">file_name = </font>", enter the strain list filename created in the previous step, enclosed in double quotes (").
- If you need to modify any parameters listed under "<font color="Tomato">Set Parameters</font>", modify them and execute the command.
    - You can also execute the command with the default parameters.

<font color="Tomato">────────────── ↓↓↓ ***Execute Command*** ↓↓↓ ──────────────</font>

In [None]:
####################################################
# Set Parameters
####################################################
# Enter the strain list filename created in step 2
file_name = "list_amrfinder.txt"
# Enter the FASTA file extension
file_extension = ".fasta"
# Enter the number of threads
threads = "8"
# To use the plus feature (detects pathogenicity genes, metal resistance genes, etc., in addition to antimicrobial resistance genes),
# enter "1"; to disable, enter "0"
plus = "1"
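# To specify a genus/species, uncomment exactly one line from the following list.
# If the genus/species is unknown or not in the list, leave all lines commented (accuracy may be reduced).
# Clear any species left over from a previous run of this cell, so that
# re-commenting a line really disables the species option.
globals().pop("species", None)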
# species = "Acinetobacter_baumannii"
# species = "Bordetella_pertussis"
# species = "Burkholderia_cepacia"
# species = "Burkholderia_mallei"
# species = "Burkholderia_pseudomallei"
# species = "Campylobacter"
# species = "Citrobacter_freundii"
# species = "Clostridioides_difficile"
# species = "Corynebacterium_diphtheriae"
# species = "Enterobacter_asburiae"
# species = "Enterobacter_cloacae"
# species = "Enterococcus_faecalis"
# species = "Enterococcus_faecium"
# species = "Escherichia"
# species = "Haemophilus_influenzae"
# species = "Helicobacter_pylori"
# species = "Klebsiella_oxytoca"
# species = "Klebsiella_pneumoniae"
# species = "Neisseria_gonorrhoeae"
# species = "Neisseria_meningitidis"
# species = "Pseudomonas_aeruginosa"
# species = "Salmonella"
# species = "Serratia_marcescens"
# species = "Staphylococcus_aureus"
# species = "Staphylococcus_pseudintermedius"
# species = "Streptococcus_agalactiae"
# species = "Streptococcus_pneumoniae"
# species = "Streptococcus_pyogenes"
# species = "Vibrio_cholerae"
# species = "Vibrio_parahaemolyticus"
# species = "Vibrio_vulnificus"


####################################################
# Run amrfinder_batch.sh
# **Do not modify the following**
####################################################
# Build the optional flags from the parameters above
plus_option = "--plus " if plus == "1" else ""
species_option = f"-s {species} " if 'species' in locals() else ""

!bash amrfinder_batch.sh \
    -i $file_name \
    -e $file_extension \
    -t $threads $plus_option$species_option\
    | tee amrfinder_batch.log
!echo 'Complete!'

<font color="Tomato">────────────────────────────────────────────</font>

<br>

## 4. After Execution

When the run finishes, a folder named `amrfinder_[date]_[time]_[list_name]` is created. It contains the following:

- `[strain_name]_out.tsv`: Result file when the "plus" option is disabled.
- `[strain_name]_plus_out.tsv`: Result file when the "plus" option is enabled.
- `log` folder: Logs are collected in this folder. Please refer to it if the analysis does not work as expected.
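
To review all strains at once, the per-strain result files can be merged into a single table. This is a minimal sketch under stated assumptions: the function name `combine_results` and the added `Strain` column are hypothetical, the strain name is taken from the filename by removing the `_plus_out.tsv` or `_out.tsv` suffix, and all result files are assumed to share the same header line.

```python
from pathlib import Path


def combine_results(result_dir, combined_path):
    """Merge every *_out.tsv in result_dir into one TSV with a 'Strain' column.

    Hypothetical helper. Returns the number of data rows written.
    """
    combined_path = Path(combined_path)
    # List the inputs first and never read the combined file itself
    files = sorted(
        p for p in Path(result_dir).glob("*_out.tsv")
        if p.resolve() != combined_path.resolve()
    )
    header_written = False
    rows_written = 0
    with open(combined_path, "w") as out:
        for tsv in files:
            # Check the longer "_plus_out.tsv" suffix before "_out.tsv"
            if tsv.name.endswith("_plus_out.tsv"):
                suffix = "_plus_out.tsv"
            else:
                suffix = "_out.tsv"
            strain = tsv.name[: -len(suffix)]
            lines = tsv.read_text().splitlines()
            if not lines:
                continue  # skip empty files
            if not header_written:
                out.write("Strain\t" + lines[0] + "\n")
                header_written = True
            for line in lines[1:]:
                if line:
                    out.write(strain + "\t" + line + "\n")
                    rows_written += 1
    return rows_written


# Example call (replace the folder name with the one created by your run):
# combine_results("amrfinder_[date]_[time]_[list_name]", "amrfinder_combined.tsv")
```
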

## 5. Output Format Description

The columns of the result files (TSV format) are described below:

| Column Name | Description |
|------|------|
| **Protein id** | ID obtained from the FASTA header line of the protein or DNA sequence |
| **Contig id** | Contig name |
| **Start** | Gene start position on the DNA contig (1-based) |
| **Stop** | Gene end position on the DNA contig (1-based) |
| **Strand** | Sequence direction ("+" = forward strand, "-" = reverse strand) |
| **Element symbol** | Gene symbol or SNP definition. For point mutations, the gene symbol and mutation definition are joined with "_" |
| **Element name** | Full name of protein, RNA, or point mutation |
| **Scope** | Database scope: "core" (resistance effect expected) or "plus" (additional genes of interest) |
| **Type** | Functional category: "AMR" (antimicrobial resistance), "STRESS" (stress response), or "VIRULENCE" (pathogenicity) |
| **Subtype** | Further detailed classification of the type |
| **Class** | For AMR genes, the drug class to which the gene contributes resistance |
| **Subclass** | More specific drug or drugs within the drug class, when known |
| **Method** | Hit type (ALLELE, EXACT, BLAST, PARTIAL, HMM, INTERNAL_STOP, POINT, etc.) |
| **Target length** | Length of query protein or gene (amino acids or nucleotides) |
| **Reference sequence length** | Length of the reference sequence in the database (reported when a BLAST hit is detected) |
| **% Coverage of reference** | Percentage of the reference sequence covered by the BLAST hit (%) |
| **% Identity to reference** | Percent identity to the reference protein or nucleotide sequence (%) |
| **Alignment length** | Length of BLAST alignment (amino acids or nucleotides) |
| **Closest reference accession** | RefSeq accession number of the BLAST hit |
| **Closest reference name** | Full name of the reference sequence of the BLAST hit |
| **HMM accession** | HMM accession number (for HMM search), NA otherwise |
| **HMM description** | Family name associated with HMM (for HMM search), NA otherwise |
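
The columns above can be used to narrow a result file down to the hits of interest. The following is a minimal sketch that keeps rows of one **Type** whose identity and coverage reach given thresholds. The function name `filter_hits` is hypothetical, and the default thresholds of 90 are illustrative values only, not recommended cutoffs. Rows with non-numeric values in either percentage column are skipped.

```python
def filter_hits(tsv_path, element_type="AMR", min_identity=90.0, min_coverage=90.0):
    """Return result rows (as dicts) of one Type that meet both thresholds.

    Hypothetical helper; column names follow the table above.
    """
    with open(tsv_path) as handle:
        lines = handle.read().splitlines()
    if not lines:
        return []
    header = lines[0].split("\t")
    hits = []
    for line in lines[1:]:
        if not line:
            continue
        row = dict(zip(header, line.split("\t")))
        if row.get("Type") != element_type:
            continue
        try:
            identity = float(row["% Identity to reference"])
            coverage = float(row["% Coverage of reference"])
        except (KeyError, ValueError):
            # Missing column or non-numeric value such as "NA"
            continue
        if identity >= min_identity and coverage >= min_coverage:
            hits.append(row)
    return hits


# Example call:
# for hit in filter_hits("A0001_plus_out.tsv"):
#     print(hit["Element symbol"], hit["Class"], hit["Method"])
```
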