## Return of the owl: How many reads map to a gene?
![image.png](attachment:image.png)


Now that we have our reads aligned to the genome, we want to get a sense of where these reads are mapping. Specifically, at the gene level, do we see more or fewer reads? This can give us a first indication of over- or under-expression of a gene. To find out, we can count the number of reads that map to each gene. One way to do this is with the **featureCounts** tool.

Try to find the user documentation for featureCounts. Googling "user manual featurecounts" should help. If not, you might find [this man page](http://manpages.org/featurecounts) helpful. featureCounts is part of a larger package called Subread, so its documentation is also in the [Subread user guide](http://bioinf.wehi.edu.au/subread-package/SubreadUsersGuide.pdf).

To begin, as usual, let's check that you have featureCounts installed properly:

`which featureCounts` --> This should yield `~/miniconda3/bin/featureCounts`

If featureCounts is not installed (though it should be - see [installations](https://github.com/jvtalwar/2021-MSTP-Bioinformatics-Bootcamp/tree/master/Day_0_Setup/Installations) section 4.4), install it now with conda:

`conda install -c bioconda subread`
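The check above can also be wrapped in a small helper, so the result is explicit either way. This is only a sketch: `check_tool` is a made-up name for this illustration, not part of Subread or conda.

```shell
# check_tool is a hypothetical helper name used only for this illustration.
# It reports whether a command is available on your PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
    else
        echo "$1 not found"
    fi
}

check_tool featureCounts
```

If featureCounts is found, `featureCounts -v` prints the installed version, which is worth noting down alongside your results.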

Refer to the Subread manual to find the command you want to run: scroll down to section 6, "Read summarization". The package also contains an aligner, but, as you hopefully recall, we used STAR instead. So we will skip that part and use only the part of the package that quantifies reads mapping to genes.

Discuss the arguments/flags you can use. Maybe break into groups of 2 or 3 (depending on how easy this is to do over Zoom)? 

**Hint:** Notice you can count multiple files at the same time! Use this to your advantage to "count" all the bam files at once. Try not to look below until after discussion.

### Write a .sh script with your command. We will use 1 node, 2 processors per node, and a 1 hour walltime

Put this output, which is a form of processed data, somewhere meaningful (i.e., where YOU know it is)! Suggestions include making a new folder called featureCounts (I'll be doing that) or placing it in another folder with featureCounts in the filename. You have complete discretion here, but make sure you understand your own setup/organizational system.
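If you go with a dedicated folder, it is safest to create it before the job runs. A minimal sketch, assuming the results go under `~/scratch/featureCounts` and that any scheduler log folders sit in your home directory (`RESULTS_DIR` is just an illustrative variable name; adjust the paths to your own layout):

```shell
# Assumption: results live under ~/scratch/featureCounts, matching the suggestion above.
# RESULTS_DIR is an illustrative variable name, not something the tools require.
RESULTS_DIR="$HOME/scratch/featureCounts"

# -p creates missing parent folders and does not complain if the folder already exists
mkdir -p "$RESULTS_DIR"

# Optional: folders for the scheduler's output and error logs, if you keep them separate
mkdir -p "$HOME/out" "$HOME/err"

# Confirm the results folder is there
ls -d "$RESULTS_DIR"
```

Creating the folders up front avoids relying on featureCounts or the scheduler to create them for you, which they may not do.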

Here is my completed script:

 - **Note:** Here I have given specific paths for my output and error files. This is an organizational choice. You can also simply run this with just _featureCounts.out_ and _featureCounts.err_.

`#!/bin/bash`<br>
`#PBS -q hotel`<br>
`#PBS -N FeatureCountDooku`<br>
`#PBS -l nodes=1:ppn=2`<br>
`#PBS -l walltime=1:00:00`<br>
`#PBS -o /home/ucsd-train##/out/featureCounts.out`<br> 
`#PBS -e /home/ucsd-train##/err/featureCounts.err`<br>

`featureCounts -a ~/scratch/annotations/hg19/gencode.v19.annotation.gtf -G ~/scratch/annotations/hg19/allchrom.fa -o ~/scratch/featureCounts/hangauer.results.counts ~/scratch/star_alignment/*sorted.bam`
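Once the job has finished, it helps to take a quick look at the results. The file given to `-o` is tab-delimited: it starts with a comment line beginning with `#` that records the command used, followed by a header row and then one row per gene with the columns Geneid, Chr, Start, End, Strand, Length, and one count column per BAM file. featureCounts also writes a companion file with `.summary` appended to the output name, tallying how many reads were assigned or left unassigned. The sketch below trims the output down to gene IDs and counts; `counts_to_matrix` is a hypothetical helper name, and the path in the usage comment assumes the `-o` value from the script above.

```shell
# counts_to_matrix is a hypothetical helper name used only for this illustration.
# It drops the leading "#" comment line, then keeps Geneid (column 1)
# and the per-BAM count columns (column 7 onward).
counts_to_matrix() {
    grep -v '^#' "$1" | cut -f1,7-
}

# Usage (assumes the -o path from the script above):
#   counts_to_matrix ~/scratch/featureCounts/hangauer.results.counts | head
#   cat ~/scratch/featureCounts/hangauer.results.counts.summary
```

Separately, because the job requests two processors, passing `-T 2` would let featureCounts use both; by default it runs with a single thread.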