# <center>Exploring Antimicrobial Resistance (AMR) genes within wild and domestic animal populations</center>


**Paper**: Skarżyńska M, Leekitcharoenphon P, Hendriksen RS, Aarestrup FM, Wasyl D (2020) A metagenomic glimpse into the gut of wild and domestic animals: Quantification of antimicrobial resistance and more. PLoS ONE 15(12):e0242987. https://doi.org/10.1371/journal.pone.0242987

Background information...

<center><div style="max-width:800px">
    
![image.png](attachment:image.png)    
    
</div></center>


Learning Objectives...

Learning the basics of bioinformatics

Understanding Quality Control metrics

Analyzing, comparing, and interpreting microbiome taxonomic composition

Using databases to characterize antimicrobial resistant genes within metagenomic samples



## <center>Bioinformatic Workflow</center> 

<center><div style="max-width:250px">
    
![image.png](attachment:image.png)
    
</div></center>

## <center>Part 0A: Sign into the BV-BRC and access exercise material</center>

At this point in time, you should have already created an account on the [BV-BRC website](https://www.bv-brc.org/). If you haven't, please do so now!

You can access all the necessary material through this [workspace](https://www.bv-brc.org/workspace/jsheriff@bvbrc/BIOS450_AMR_Exercise). You must be signed in **first** to access the workspace. Please let us know if any issues arise.


### Copying the sequences to your own workspace

While you can access publicly-available workspaces, you should copy the "Exercise Material" folder over to your own workspace **first** before beginning your analyses. You can do this by clicking the folder and selecting "Copy" on the sidebar found on the right-hand side of your screen.

<center><div style="max-width:800px">
    
![image.png](attachment:image.png)
    
</div></center>

**Next**, you will need to select which folder/workspace you would like as the destination for the copied folder. This destination could be your home workspace, or a workspace you created for this assignment.

<center><div style="max-width:800px">
    
![image.png](attachment:image-4.png)
    
</div></center>

If you decide not to choose any folder shown on the Copy menu, the copied folder will automatically be placed in your home workspace.

<center><div style="max-width:800px">
    
![image.png](attachment:image-3.png)
    
</div></center>

## <center>Part 0B: FASTQ Utilities: Quality control and trimming</center>

Before starting any bioinformatic analysis, it is essential to understand the quality of the FASTQ reads you intend to use. Quality control and read trimming are necessary steps used to remove low-quality reads, adapter sequences, and contaminants prior to downstream analyses. Doing so can ensure the presence of only high-quality sequence data, allowing for more accurate and reliable results.  

### FASTQ Utilities Service

Quality control and trimming of your sequence data can be performed with the FASTQ Utilities Service, which is accessed under the Tools and Services drop-down menu, within the Utilities section. 

<center><div style="max-width:800px">
    
![image.png](attachment:image.png)
    
</div></center>

#### Parameters

Under this section, you can select the **Output Folder** where you would like the FASTQ Utilities Service to save your results. Under **Output Name**, choose a unique title that will help you distinguish the results of this service from the results you will receive later in the pipeline.

#### Pipeline

The FASTQ Utilities Service allows us to run multiple pipelines under a single job submission, rather than submitting a separate job for each pipeline. The pipelines are selected from the drop-down menu and run in the order in which they are added. It is important to add each pipeline (by choosing your option from the menu and clicking the **add** button) in the following order:

1. Paired_Filter: Used to ensure only paired reads are maintained within the dataset, helping to maintain synchronicity between paired-end sequence files.

2. FastQC: Uses the [FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc) program to perform quality checks on raw sequencing data from high throughput sequencing pipelines.

3. Trim: Uses [Trim Galore](http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/) to find and remove adapter sequences from your raw reads.

4. FastQC: Performs quality checks on your **trimmed** sequencing data for comparison with the initial raw reads.
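To make the idea behind the Paired_Filter step more concrete, here is a minimal Python sketch that keeps only reads whose identifier appears in both the forward and reverse files. It is an illustration under simplifying assumptions (reads held in memory as `(read_id, sequence)` tuples, and a hypothetical helper named `filter_paired`), not the implementation that the BV-BRC service runs.

```python
def filter_paired(forward_reads, reverse_reads):
    """Keep only reads whose ID occurs in both the R1 and R2 lists.

    Each read is a (read_id, sequence) tuple. The order of the forward
    list is preserved so the two outputs stay synchronized.
    """
    reverse_by_id = dict(reverse_reads)
    kept_forward, kept_reverse = [], []
    for read_id, seq in forward_reads:
        if read_id in reverse_by_id:
            kept_forward.append((read_id, seq))
            kept_reverse.append((read_id, reverse_by_id[read_id]))
    return kept_forward, kept_reverse


# Toy data (made up): "r2" has no mate in R2, and "r4" has no mate in R1.
r1 = [("r1", "ACGT"), ("r2", "GGCC"), ("r3", "TTAA")]
r2 = [("r1", "TGCA"), ("r3", "AATT"), ("r4", "CCGG")]

paired_r1, paired_r2 = filter_paired(r1, r2)
print([read_id for read_id, _ in paired_r1])
print([read_id for read_id, _ in paired_r2])
```

After filtering, both lists contain the same read IDs in the same order, which is what downstream paired-end tools generally expect.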

#### Paired read library

Here, you will select the sequence data on which the above pipelines will run. Each sample has one forward (R1) and one reverse (R2) paired-end read file. Forward reads should be selected under **Read File 1**, and reverse reads should be selected under **Read File 2**. Once completed, click the **arrow** button in the right-hand corner to add these files to your **Selected Libraries**. Repeat this for all four samples and click **Submit** once complete.

### Results



## <center>Part 1: Taxonomic Classification Service (TCS)</center>

The BV-BRC Taxonomic Classification Service is a useful tool for exploring the microbial composition of metagenomic samples. With it, we can compare the relative abundance of taxa that account for at least 1% of the total read hits across the domestic and wild animal populations in the study. Additionally, the TCS returns quality control metrics for the raw reads of each sample and gives us insight into the structure of each microbial community through alpha- and beta-diversity metrics.
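The two ideas in this paragraph, relative abundance with a 1% cutoff and alpha diversity, can be sketched in a few lines of Python. The read counts below are hypothetical and are not taken from the study. The Shannon index is shown as one common alpha-diversity metric, and whether the service reports this exact metric is an assumption.

```python
import math


def relative_abundance(read_counts):
    """Convert raw read counts per taxon into percentages of the total."""
    total = sum(read_counts.values())
    return {taxon: 100.0 * n / total for taxon, n in read_counts.items()}


def shannon_index(read_counts):
    """Shannon diversity H = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(read_counts.values())
    return -sum((n / total) * math.log(n / total)
                for n in read_counts.values() if n > 0)


# Hypothetical read counts for one sample (not data from the study).
counts = {"Firmicutes": 600, "Bacteroidetes": 300,
          "Proteobacteria": 95, "Spirochaetes": 5}

percent = relative_abundance(counts)
# Keep only taxa that account for at least 1% of the total read hits.
abundant = {taxon: pct for taxon, pct in percent.items() if pct >= 1.0}

print(abundant)
print(round(shannon_index(counts), 3))
```

In this toy sample, the rarest phylum falls below the 1% cutoff and is dropped from the comparison, while the Shannon index summarizes both how many taxa are present and how evenly the reads are spread among them.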

<center><div style="max-width:800px">
    
![image.png](attachment:857f13d0-dc0c-42e9-8dc5-cd7be93ca4ac.png)

</div></center>

#### Input File 

To use this service, raw FASTQ files must be entered as input files, either as single or paired-end reads, or they can be accessed directly from the NCBI database with an SRA Accession number. Regardless of the input type, you should choose a sample identifier that will easily distinguish the samples from one another.

Before a second sample can be added, you will need to click the **arrow** in the top-right corner to add your input file to the **Selected Libraries**.

<center><div style="max-width:800px">
    
![image.png](attachment:e6cc0a29-0bc4-4732-9e81-378053493cbe.png)

</div></center>

#### Parameters

Adjusting the parameters prior to submitting the job allows us to control **how** the service should analyze our samples. Since we are working with metagenomic samples, **Whole Genome Sequencing** should be selected under **Sequencing Type**. We will perform a **Microbiome Analysis** and will use the **BV-BRC Database** as our reference database. Filtering host reads is optional, but it may be useful to filter out **Homo sapiens** reads from your dataset. The **Confidence Interval** should be set to **0.1**, and the **Output Folder** and **Output Name** should be selected appropriately. Click **Submit** once complete.

### Job Results

The status of your job submission can be viewed by either clicking the **Jobs** tab in the bottom-right corner of your screen or by clicking **My Jobs** under the **Workspaces** drop-down menu.

<center><div style="max-width:800px">
    
![image.png](attachment:b27f8700-c82f-4c80-b557-ca0161faec75.png)

</div></center>

#### Raw read quality scores

While the FASTQ Utilities Service is a tool developed specifically for processing raw sequencing data, the quality control metrics for each sample can also be viewed within the sample's own **folder** in the TCS results. After opening the sample folder, you can view the QC reports by selecting the **fastqc_results** folder and choosing either the **host_removed_reads** or the **raw_reads** folder. All job results that can be viewed directly within your web browser are saved with a **.html** extension.

<center><div style="max-width:800px">

![image.png](attachment:55eb1d19-92d1-433a-83a0-651d835c4f3c.png)

</div></center>

Before getting too deep into your analysis, it's important to first ensure the quality of your data is good enough to provide you with reliable and accurate insights into your microbiome sample. To do this, you should view the **per base sequence quality** graphic for your forward and reverse (e.g., R1 and R2) reads.  

This report gives you the average (the blue line), median (the red line), and overall distribution (the yellow box plot) of quality scores (the y-axis) for all of the reads within your files at each nucleotide position (the x-axis). Having a "good" quality score, or being within the "green zone" of values higher than 28, can be interpreted as a nucleotide position having a low probability of representing a sequencing error (a good thing!). It is normal for quality scores to drop slightly near the end of your reads, but as long as the **average** quality scores remain above 28, there is no need to make any adjustments to your raw reads before continuing your bioinformatic workflow. 
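The link between a quality score and the probability of a sequencing error follows the standard Phred definition, Q = -10 log10(P). The short Python sketch below converts scores to error probabilities and decodes a FASTQ quality string. The helper names are hypothetical, and the Phred+33 encoding is an assumption that holds for most modern Illumina data but should be checked for your own files.

```python
def phred_to_error_probability(q):
    """Probability that a base call is wrong, given its Phred score Q."""
    return 10 ** (-q / 10)


def decode_quality_string(quality, offset=33):
    """Decode a FASTQ quality line into Phred scores (assumes Phred+33)."""
    return [ord(ch) - offset for ch in quality]


# Error probability for a few common quality thresholds.
for q in (10, 20, 28, 30):
    print(q, phred_to_error_probability(q))

# Decode a short, made-up quality string and report its mean score.
scores = decode_quality_string("II=5")
print(scores, sum(scores) / len(scores))
```

A score of 28 corresponds to an error probability of roughly 0.16%, or fewer than 2 expected errors per 1,000 base calls, which is why it is used as the boundary of the "green zone".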

## <center>&#128187; Task 1: TCS Data Visualization</center>

While there are a multitude of informative outputs provided by the TCS, the **Taxonomic-Classification-Service-BVBRC_multiqc_report.html** file contains interactive plots and diagrams used to illustrate the taxonomic composition, quality, and general statistics of your metagenomic samples. For your first task, you should access this document and:

> 1) Using the results from the **Bracken** computational tool, create a **stacked bar graph** that illustrates the **percent abundance** (or the relative abundance) of the top 5 **phyla** for each sample. You can save this graph by using the **Export Plot** button to download your graph as a PNG file.
> 2) For **each sample**, identify the top **bacterial** phylum and include its **percent abundance**.
>> NOTE: *Chordata* is not a bacterial phylum!

## <center>Part 2: Metagenomic Read Mapping Service (MRMS)</center>

The MRMS is a valuable resource for researchers interested in identifying antimicrobial resistance genes present within metagenomic samples. To do this, the MRMS uses [k-mer alignment](https://bmcbioinformatics.biomedcentral.com/articles/10.1186/s12859-018-2336-6) to align sequence data to reference genes within the [Comprehensive Antibiotic Resistance Database (CARD)](https://pubmed.ncbi.nlm.nih.gov/31665441/). Not only can this service determine the number of different AMR genes present within a metagenomic sample, but it can also provide insight into the abundance of each gene based on the mapped sequencing **depth** (i.e., the number of reads that align with a specific AMR reference gene).
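As a rough intuition for k-mer based mapping, the toy Python sketch below assigns each read to the reference gene with which it shares the most k-mers and then counts reads per gene. The sequences and the helper names (`kmers`, `assign_reads`) are made up for illustration. The actual KMA algorithm used by the service is considerably more sophisticated, so this should be read as a simplified picture and not as a description of how the MRMS works internally.

```python
from collections import Counter


def kmers(sequence, k):
    """Return the set of all substrings of length k in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def assign_reads(reads, reference_genes, k=4):
    """Assign each read to the reference gene sharing the most k-mers.

    Reads that share no k-mer with any gene are left unassigned.
    Returns a Counter mapping gene name to number of assigned reads.
    """
    gene_kmers = {name: kmers(seq, k) for name, seq in reference_genes.items()}
    hits = Counter()
    for read in reads:
        read_kmers = kmers(read, k)
        shared = {name: len(read_kmers & ks) for name, ks in gene_kmers.items()}
        best_gene = max(shared, key=shared.get)
        if shared[best_gene] > 0:
            hits[best_gene] += 1
    return hits


# Hypothetical reference genes and reads (made-up sequences, not CARD entries).
genes = {
    "geneA": "ATGGCGTACGTTAGC",
    "geneB": "ATGCCCTTTGGGAAA",
}
reads = ["GCGTACGT", "TACGTTAG", "CCTTTGGG", "TTTTTTTT"]

read_counts = assign_reads(reads, genes)
print(read_counts)
```

Genes that attract more reads in such a count are, loosely speaking, more abundant in the sample, which is the intuition behind comparing depth values between samples.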

<center><div style="max-width:800px">

![image.png](attachment:1d965a08-5f06-410d-a1ee-a7701d4c5d94.png)

</div></center>

#### Input File 

Similar to the TCS, raw FASTQ files must be entered as input files, either as single or paired-end reads, or they can be accessed directly from the NCBI database with an SRA Accession number. However, an MRMS job submission should only consist of **one** sample at a time since this service will survey all metagenomic reads collectively from every sample listed within the **selected libraries**.

#### Parameters

For the parameters, **Predefined List** should be selected under **Gene Set Type**, and **CARD** should be chosen as the **Predefined Gene Set Name**. As in the previous analysis, **Output Folder** and **Output Name** should be selected appropriately. Click **Submit** once complete.

### Job Result

To simplify your learning experience using this BV-BRC resource, the original MRMS outputs have been converted to CSV files that can be easily interpreted and viewed directly within the BV-BRC Workspace. The files can be found within the **MRM_CSV_FILES** folder with the rest of the **Exercise Material**.
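If you download the CSV files, a comparison between two samples could be sketched in Python as follows. The column names `gene` and `depth`, the placeholder gene names, and the values are all hypothetical. Check the header row of the real files in **MRM_CSV_FILES** and adjust the names accordingly.

```python
import csv
import io


def load_gene_depths(csv_text):
    """Read CSV text with 'gene' and 'depth' columns into a {gene: depth} dict."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["gene"]: float(row["depth"]) for row in reader}


# Hypothetical file contents; the real files may use different columns.
wild_csv = "gene,depth\ngene_1,12.5\ngene_2,8.0\n"
domestic_csv = "gene,depth\ngene_1,30.1\ngene_2,15.2\ngene_3,4.4\n"

wild = load_gene_depths(wild_csv)
domestic = load_gene_depths(domestic_csv)

# Compare which genes were detected in each sample.
shared = sorted(set(wild) & set(domestic))
only_domestic = sorted(set(domestic) - set(wild))
only_wild = sorted(set(wild) - set(domestic))

print("Shared genes:", shared)
print("Only in domestic sample:", only_domestic)
print("Only in wild sample:", only_wild)
```

To work with a downloaded file instead of inline text, read it with `open(path, newline="")` and pass the file object to `csv.DictReader` directly.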

## <center>&#128187; Task 2: Comparing the AMR gene population between wild and domestic animals</center>



## <center>Part 3: Find and analyze your own metagenomic sample</center>

Cat
Dog
Ferret
Horse
Deer
Fox
Boar
Pigs
Rats
Mice
Monkey
Raccoons
Opossums
Rabbits
Bears
Rhino
Zebra
Elephants
Armadillo
Sheep
Capybara
Cheetah
Chimpanzees
Gorilla
Panda
Marmoset
Kangaroos
Red Panda
Squirrels
Somali wild ass


## Resources

Skarżyńska M, Leekitcharoenphon P, Hendriksen RS, Aarestrup FM, Wasyl D (2020) A metagenomic glimpse into the gut of wild and domestic animals: Quantification of antimicrobial resistance and more. PLoS ONE 15(12):e0242987. https://doi.org/10.1371/journal.pone.0242987

Wattam, A. R., Bowers, N., Brettin, T., Conrad, N., Cucinell, C., Davis, J. J., Dickerman, A. W., Dietrich, E. M., Kenyon, R. W., Machi, D., Mao, C., Nguyen, M., Olson, R. D., Overbeek, R., Parrello, B., Pusch, G. D., Shukla, M., Stevens, R. L., Vonstein, V., & Warren, A. S. (2024). Comparative Genomic Analysis of Bacterial Data in BV-BRC: An Example Exploring Antimicrobial Resistance. In J. C. Setubal, P. F. Stadler, & J. Stoye (Eds.), Comparative Genomics (Vol. 2802, pp. 547–571). Springer US. https://doi.org/10.1007/978-1-0716-3838-5_18

Andrews S. (2010). FastQC: a quality control tool for high throughput sequence data. Available online at: http://www.bioinformatics.babraham.ac.uk/projects/fastqc

Krueger, F. (2012). Trim Galore: a wrapper tool around Cutadapt and FastQC to consistently apply quality and adapter trimming to FastQ files, with some extra functionality for MspI-digested RRBS-type (Reduced Representation Bisulfite-Seq) libraries. URL http://www.bioinformatics.babraham.ac.uk/projects/trim_galore/

Clausen, P.T., F.M. Aarestrup, and O. Lund, Rapid and precise alignment of raw reads against redundant databases with KMA. BMC bioinformatics, 2018. 19(1): p. 307.

Alcock, B.P., et al., CARD 2020: antibiotic resistome surveillance with the comprehensive antibiotic resistance database. Nucleic acids research, 2020. 48(D1): p. D517-D525.