# Install Samtools
```bash
sudo apt-get install samtools
```

# Install Cufflinks
```bash
sudo apt-get install cufflinks
```
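After installing, it can help to confirm that both executables are on the `PATH` before running the notebook. The helper below is a minimal sketch; the function name `check_tools` is hypothetical, and it only checks that each command can be found, not which version was installed.

```bash
# Report which of the named commands can be found on the PATH.
# Returns non-zero if at least one of them is missing.
check_tools() {
  local missing=0 tool
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool ($(command -v "$tool"))"
    else
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage:
#   check_tools samtools cufflinks
```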


## Samtools: Sorting and Indexing BAM Files

`Samtools` is a suite of programs for interacting with high-throughput sequencing data. It is primarily used for processing and analyzing alignment files in BAM (Binary Alignment Map) format. In this notebook, we use `samtools` for two key tasks:

1. **Sorting BAM Files**: Sorting the alignments by genomic coordinate (reference sequence, then position). Coordinate-sorted input is required for indexing and by downstream tools such as Cufflinks, variant callers and genome browsers.
   
2. **Indexing BAM Files**: Creating an index (a `.bai` file stored next to the BAM) for each sorted BAM file, which allows fast retrieval of alignments from specific regions of the genome.
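Whether a BAM file is already coordinate-sorted can usually be read from its header: `samtools sort` records the sort order in the `@HD` line as `SO:coordinate`. The function below is a hypothetical helper that checks a SAM header supplied on standard input. It trusts the header rather than scanning every record, so a file whose header is wrong would not be caught.

```bash
# Succeeds if the SAM header read from stdin declares coordinate sort order,
# i.e. an @HD line contains the tag SO:coordinate. Only the header is inspected.
is_coordinate_sorted() {
  awk -F'\t' '
    $1 == "@HD" { for (i = 2; i <= NF; i++) if ($i == "SO:coordinate") found = 1 }
    END { exit !found }
  '
}

# Usage:
#   samtools view -H sorted_sample.bam | is_coordinate_sorted && echo "coordinate-sorted"
```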

### Commands Used

- **Sorting**: The `samtools sort` command is used to sort BAM files by genomic coordinates.
- **Indexing**: The `samtools index` command creates an index for the sorted BAM files.
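With several samples, the two commands are typically run once per BAM file. The functions below are a minimal sketch of that loop; the `.sorted.bam` naming scheme, the default of 4 threads and the function names are assumptions for illustration. In `samtools sort`, `-@` sets the number of sorting and compression threads.

```bash
# Coordinate-sort one BAM and index the result.
# Hypothetical naming scheme: sample.bam -> sample.sorted.bam (+ sample.sorted.bam.bai).
sort_and_index() {
  local in_bam="$1"
  local threads="${2:-4}"                 # assumed default thread count
  local out_bam="${in_bam%.bam}.sorted.bam"

  samtools sort -@ "$threads" -o "$out_bam" "$in_bam" || return 1
  samtools index "$out_bam" || return 1   # writes ${out_bam}.bai
  echo "$out_bam"                         # print the sorted file's name
}

# Run sort_and_index on every *.bam in the current directory,
# skipping files that already look like sorted outputs.
sort_and_index_all() {
  local bam
  for bam in *.bam; do
    [ -e "$bam" ] || continue             # no .bam files: the glob stays literal
    case "$bam" in
      *.sorted.bam) continue ;;
    esac
    sort_and_index "$bam" "${1:-4}" || return 1
  done
}
```

Deriving the output name from the input keeps each sorted file paired with its source, and skipping `*.sorted.bam` makes it safe to rerun the loop in the same directory.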

### Example Usage

Here is an example of how `samtools` commands are used in this notebook:

```bash
# Sort alignments by genomic coordinate
samtools sort -o sorted_sample.bam unsorted_sample.bam

# Index the sorted BAM (creates sorted_sample.bam.bai)
samtools index sorted_sample.bam
```


## Cufflinks: Transcript Assembly and Quantification

`Cufflinks` assembles transcripts and estimates their abundances from RNA-Seq alignments; the wider Cufflinks suite also includes tools such as Cuffdiff for testing differential expression and regulation between samples. Cufflinks takes aligned RNA-Seq reads and assembles them into a parsimonious set of transcripts. It is particularly useful for identifying new isoforms and quantifying known transcripts, providing a measure of gene expression levels.

### Key Functions of Cufflinks

1. **Transcript Assembly**: `Cufflinks` assembles RNA-Seq alignments into a set of transcripts, which can include both known and novel isoforms. Assembly can run from the alignments alone or be guided by a reference annotation (`-g`/`--GTF-guide`).
   
2. **Quantification**: It estimates the expression level of each transcript in Fragments Per Kilobase of transcript per Million mapped fragments (FPKM), which normalizes for both transcript length and sequencing depth.
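In its idealized form, FPKM for transcript $i$ can be written as below, where $X_i$ is the number of fragments assigned to the transcript, $l_i$ its length in base pairs and $N$ the total number of mapped fragments. Cufflinks assigns fragments to isoforms probabilistically and uses an effective transcript length, so its reported values follow this shape rather than this exact arithmetic.

$$
\mathrm{FPKM}_i = \frac{X_i}{\left(\frac{l_i}{10^3}\right)\left(\frac{N}{10^6}\right)} = \frac{X_i \times 10^9}{l_i \times N}
$$

For example, with illustrative numbers of 500 fragments on a 2,000 bp transcript in a library of 20 million mapped fragments:

$$
\mathrm{FPKM} = \frac{500 \times 10^9}{2000 \times 2 \times 10^7} = \frac{5 \times 10^{11}}{4 \times 10^{10}} = 12.5
$$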

### Commands Used

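- **Assembly and Quantification**: The `cufflinks` command takes coordinate-sorted alignments (BAM format) and, depending on the options given, assembles transcripts and/or quantifies their expression levels.
- **Reference Annotations**: The `-G` (`--GTF`) option quantifies expression against the transcripts in a reference annotation only; in this mode Cufflinks does not assemble novel transcripts. To let the annotation guide assembly while still reporting novel isoforms, use `-g` (`--GTF-guide`) instead.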
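When there are several sorted BAM files, each sample is usually written to its own output directory so that Cufflinks' output files (such as `genes.fpkm_tracking` and `isoforms.fpkm_tracking`) do not overwrite each other. The function below is a hypothetical wrapper; the `cufflinks_out/<sample>` layout and the sample-naming rule are assumptions, and it uses only the `-G`, `-o` and `-p` options shown in this section.

```bash
# Quantify each sorted BAM against a reference annotation, writing
# one output directory per sample: cufflinks_out/<sample>/
# Hypothetical naming rule: sample1.sorted.bam -> sample1
run_cufflinks() {
  local gtf="$1" threads="$2" bam sample
  shift 2
  for bam in "$@"; do
    sample=$(basename "$bam")
    sample=${sample%.bam}
    sample=${sample%.sorted}
    # -G: quantify annotated transcripts only (use -g to also assemble novel isoforms)
    cufflinks -G "$gtf" -p "$threads" -o "cufflinks_out/$sample" "$bam" || return 1
  done
}

# Usage:
#   run_cufflinks Homo_sapiens.GRCh38.112.gtf 8 sample1.sorted.bam sample2.sorted.bam
```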

### Example Usage

Here is an example of how `cufflinks` commands are used in this notebook:

```bash
# Quantify annotated transcripts (-G) using 8 threads (-p 8); results go to /path/to/output/
cufflinks -G /path/to/annotation.gtf -o /path/to/output/ -p 8 sorted_sample.bam
```


## Annotation File Selection

For this analysis, we have selected an annotation file from Ensembl for the human genome. Ensembl provides comprehensive, up-to-date annotations for various species, which include information on genes, transcripts, exons, and other genomic features.

### Annotation File Used

- **Species**: Homo sapiens (human)
- **Genome Version**: GRCh38
- **Ensembl Release**: 112
- **Annotation File**: `Homo_sapiens.GRCh38.112.gtf.gz`

### Why This File?

The chosen annotation file, `Homo_sapiens.GRCh38.112.gtf.gz`, comes from Ensembl release 112 for the GRCh38 assembly. It includes detailed gene and transcript information, which is essential for accurate RNA-Seq quantification. Pinning a specific release keeps the analysis reproducible and based on the most recent human annotations available at the time of writing. The reads should be aligned to the same assembly (GRCh38) so that alignment coordinates match the annotation.
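To see what the file contains before using it, the feature types in column 3 of the GTF (such as `gene`, `transcript` and `exon`) can be tallied directly from the compressed file. This is a minimal sketch; the function name is hypothetical, and it assumes the gzip-compressed, tab-separated layout of the Ensembl download, where header lines start with `#`.

```bash
# Tally the feature types (GTF column 3) in a gzip-compressed GTF,
# skipping header lines that start with '#'.
# Output: <feature><TAB><count>, sorted by feature name.
count_gtf_features() {
  gzip -dc "$1" \
    | awk -F'\t' '!/^#/ { n[$3]++ } END { for (f in n) print f "\t" n[f] }' \
    | LC_ALL=C sort
}

# Usage:
#   count_gtf_features Homo_sapiens.GRCh38.112.gtf.gz
```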

### How to Download the File

To download this annotation file, you can follow these steps:

1. **Visit Ensembl**: Go to the [Ensembl website](https://www.ensembl.org).
2. **Select Species**: Choose "Human" from the list of species.
3. **Navigate to Downloads**: Click on "Download" under the "Data" section.
4. **Find the GTF File**: Look for the GTF file under the "Genes" section for the GRCh38 release.
5. **Download**: Click on the link for `Homo_sapiens.GRCh38.112.gtf.gz` to download the annotation file.
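The same file can also be fetched from the command line. The sketch below assumes the Ensembl FTP directory layout (`https://ftp.ensembl.org/pub/release-<N>/gtf/homo_sapiens/`); the function names are hypothetical and `curl` is used for the download. It checks the archive with `gzip -t` and writes an uncompressed copy next to the original.

```bash
# Build the URL of the human GRCh38 GTF for a given Ensembl release
# (assumes the standard Ensembl FTP layout; defaults to release 112).
ensembl_gtf_url() {
  local release="${1:-112}"
  echo "https://ftp.ensembl.org/pub/release-${release}/gtf/homo_sapiens/Homo_sapiens.GRCh38.${release}.gtf.gz"
}

# Download the GTF, verify the gzip archive, and write an uncompressed copy.
download_gtf() {
  local url file
  url=$(ensembl_gtf_url "$1")
  file=$(basename "$url")
  curl -fL -o "$file" "$url" || return 1   # -f: fail on HTTP errors, -L: follow redirects
  gzip -t "$file" || return 1              # integrity check of the compressed file
  gzip -dc "$file" > "${file%.gz}"         # keep the .gz and write the plain .gtf
}

# Usage:
#   download_gtf 112
```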

With this annotation file, Cufflinks can quantify annotated transcripts consistently across samples, providing a reliable basis for downstream analyses such as differential expression and gene regulation studies.
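Quantification also depends on the reference sequence names in the BAM header matching the first column of the GTF. Ensembl GTFs name chromosomes `1`, `2`, `X`, whereas some other references use `chr1`, `chr2`, `chrX`; a mismatch can leave Cufflinks with no overlapping annotations. The helper below is a hypothetical sketch that lists BAM reference names that never appear in the GTF. Some unannotated scaffolds may legitimately show up in its output; the warning sign is when the main chromosomes do.

```bash
# Print reference names from the BAM header (@SQ SN:<name>) that never
# appear in column 1 of the gzip-compressed GTF.
contigs_missing_from_gtf() {
  local bam="$1" gtf_gz="$2" tmp
  tmp=$(mktemp -d) || return 1

  # Reference names declared in the BAM header
  samtools view -H "$bam" \
    | awk -F'\t' '$1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i ~ /^SN:/) print substr($i, 4) }' \
    | LC_ALL=C sort -u > "$tmp/bam_contigs"

  # Sequence names used in the annotation (header lines skipped)
  gzip -dc "$gtf_gz" | awk -F'\t' '!/^#/ { print $1 }' | LC_ALL=C sort -u > "$tmp/gtf_contigs"

  # Names present only in the BAM header
  LC_ALL=C comm -23 "$tmp/bam_contigs" "$tmp/gtf_contigs"
  rm -rf "$tmp"
}

# Usage:
#   contigs_missing_from_gtf sorted_sample.bam Homo_sapiens.GRCh38.112.gtf.gz
```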

---

