# Project 2

## Scientific Question: Can M. vaccae be used to help prevent mental health disorders such as anxiety and PTSD?

A novel anti-inflammatory molecule, 1,2,3-tri [Z-hexadecenoyl] glycerol, was derived from Mycobacterium vaccae, and is responsible for the soil-dwelling bacterium's unique anti-inflammatory properties (Smith et al., 2019).

Previous studies have shown that immunization with M. vaccae prevents stress-induced and inflammatory responses in several experimental settings. However, little was known about the effects of M. vaccae on "anxiety- and fear-related defensive behavioral responses" (Smith et al., 2019).

Since the discovery of the novel anti-inflammatory molecule and its effects, many studies have shifted their focus to the effects of immunization with M. vaccae on stress-coping behaviors and fear-conditioned responses. In addition, several studies focus on the protective effects of M. vaccae in a mouse model of tuberculosis (Gong et al., 2020).

Multiple studies have examined M. vaccae and its effects. Sequence data for Mycobacterium vaccae can be found in NCBI's Sequence Read Archive. As stated on its website:

"The Sequence Read Archive (SRA) stores raw sequence data from "next-generation" sequencing technologies including Illumina, 454, IonTorrent, Complete Genomics, PacBio and OxfordNanopores. In addition to raw sequence data, SRA now stores alignment information in the form of read placements on a reference sequence."



## Scientific Hypothesis: If M. vaccae has protective effects such as inhibiting inflammation, then the bacterium can be used to prevent mental health disorders such as anxiety and PTSD.

RNA-seq analysis was conducted to analyze M. vaccae's effects by identifying which genes were differentially expressed in certain inflammatory pathways. Heat maps were then used to display the differentially expressed genes and to link them to either anti-inflammatory or inflammatory pathways.

To test the scientific question and hypothesis, RNA-seq analysis was performed using data from SRA. SRA holds a large volume of raw sequence data, and every sequencing run is assigned an accession number for identification. The accession numbers corresponding to the M. vaccae data were retrieved, and the corresponding reads were downloaded as FASTQ files. From there, RNA-seq analysis was conducted to check for differentially expressed genes. To perform the analysis, pyrpipe was installed and loaded in a Jupyter notebook.
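
Differential expression between two conditions is commonly summarized as a log2 fold change. The sketch below illustrates that calculation only; the function name `log2_fold_change`, the pseudocount of 1, the gene labels, and the counts are all assumptions made up for illustration and are not results from this project.

```python
import math

def log2_fold_change(treated, control, pseudocount=1.0):
    """Return the log2 ratio of treated to control expression.

    The pseudocount (an assumed convention here) avoids division by zero
    when a gene has no reads in one condition.
    """
    return math.log2((treated + pseudocount) / (control + pseudocount))

# Made-up counts for illustration only: (treated, control) per placeholder gene.
example_counts = {"gene_a": (7, 1), "gene_b": (3, 3), "gene_c": (0, 3)}

for gene, (treated, control) in example_counts.items():
    # Positive values mean higher expression in the treated condition.
    print(gene, log2_fold_change(treated, control))
```

A positive value indicates higher expression in the treated condition and a negative value indicates lower expression. A real analysis would also need replicates and a statistical test before calling a gene differentially expressed.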



## Part 1: Load the packages

- pyrpipe: a Python package created specifically for RNA-seq workflows

In [2]:
pip install pyrpipe --upgrade

Requirement already up-to-date: pyrpipe in c:\users\owner\anaconda3\lib\site-packages (0.0.5)



In [3]:
conda install -c bioconda pyrpipe

Collecting package metadata (current_repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... failed with repodata from current_repodata.json, will retry with next repodata source.
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... failed with initial frozen solve. Retrying with flexible solve.
Solving environment: ...working... 
Found conflicts! Looking for incompatible packages.
This can take several minutes.  Press CTRL-C to abort.
failed
Note: you may need to restart the kernel to use updated packages.





UnsatisfiableError: The following specifications were found to be incompatible with each other:

Output in format: Requested package -> Available versions



In [5]:
from pyrpipe import sra
from pyrpipe import mapping
from pyrpipe import assembly
from pyrpipe import qc
from pyrpipe import tools

OSError: no library called "cairo" was found
no library called "libcairo-2" was found
cannot load library 'libcairo.so.2': error 0x7e
cannot load library 'libcairo.2.dylib': error 0x7e
cannot load library 'libcairo-2.dll': error 0x7e
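
The import error above indicates that a native Cairo graphics library could not be located on this machine; it is a missing system dependency rather than a problem in the notebook code. As a minimal diagnostic sketch, the standard-library function `ctypes.util.find_library` can report whether the system loader sees such a library. The helper name `find_cairo` and the list of candidate names are assumptions chosen to mirror the names in the error message.

```python
from ctypes.util import find_library

def find_cairo(candidates=("cairo", "cairo-2", "libcairo-2")):
    """Return the first Cairo library the system loader can locate, else None."""
    for name in candidates:
        found = find_library(name)
        if found:
            return found
    return None

location = find_cairo()
if location is None:
    print("Cairo was not found; it must be installed before the import can succeed.")
else:
    print("Cairo found:", location)
```

If this reports that Cairo was not found, installing the library through the system or environment package manager and restarting the kernel is the usual next step.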

## Part 2: Load in the data and perform Bioinformatics Analyses (RNA-seq)

In [None]:
# SRR accessions used in the data

In [None]:
runs = ['SRR8510482', 'SRR8510481', 'SRR8510480']
workingDir = ''
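
Before any download is attempted, it can help to confirm that each identifier looks like an SRA run accession (the prefix `SRR`, `ERR`, or `DRR` followed by digits). The following is a small sketch of such a check; `is_run_accession` is a hypothetical helper and is not part of pyrpipe.

```python
import re

# Run accessions start with SRR, ERR, or DRR followed by digits.
RUN_ACCESSION = re.compile(r"[SED]RR\d+")

def is_run_accession(accession):
    """Return True if the string is shaped like an SRA run accession."""
    return RUN_ACCESSION.fullmatch(accession) is not None

runs = ['SRR8510482', 'SRR8510481', 'SRR8510480']
invalid = [r for r in runs if not is_run_accession(r)]
print("Invalid accessions:", invalid)
```

This only checks the format of the identifier. It does not confirm that the run exists in SRA or that it belongs to the intended study.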

In [None]:
# create SRA objects for the SRR accessions and download their reads as FASTQ

In [None]:
sraObs = []
for x in runs:
    ob = sra.SRA(x, workingDir)
    # keep only the runs whose FASTQ download succeeded
    if ob.download_fastq():
        sraObs.append(ob)

In [None]:
# create a Trim Galore object

In [None]:
tgOptions = {"--cores": "10"}
tg = qc.Trimgalore()

In [None]:
# perform quality filtering using Trim Galore

In [None]:
for ob in sraObs:
    # pass the Trim Galore options defined above
    ob.perform_qc(tg, **tgOptions)

In [None]:
# create a STAR aligner object and a StringTie object

In [None]:
starParams = {'--outFilterType' : 'BySJout' , '--outSAMtype' : 'BAM SortedByCoordinate'}

star = mapping.Star(index = 'index')
st = assembly.Stringtie()

In [None]:
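# align each run with STAR, then assemble transcripts from the resulting BAM with StringTie

In [None]:
for ob in sraObs:
    bam = star.perform_alignment(ob, **starParams)
    # positional arguments must come before keyword arguments
    st.perform_assembly(bam, reference_gtf='ref.gtf')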
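
The cells above wrap external command-line programs, so they can only run if those programs are installed and on the `PATH`. The sketch below checks for them with the standard-library `shutil.which`. The executable names (`fasterq-dump`, `trim_galore`, `STAR`, `stringtie`) are the usual ones for these tools but are assumed here, and `missing_tools` is a hypothetical helper, not part of pyrpipe.

```python
import shutil

def missing_tools(names):
    """Return the tool names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]

# Assumed executable names for the tools used in this workflow.
required = ["fasterq-dump", "trim_galore", "STAR", "stringtie"]

missing = missing_tools(required)
if missing:
    print("Not found on PATH:", ", ".join(missing))
else:
    print("All required tools were found.")
```

Running a check like this before the download and alignment steps makes a missing dependency visible immediately, instead of partway through a long run.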