🧪 Lesson 4: Summarize Read Counts with featureCounts
🎯 Goal - Use featureCounts to count how many reads map to each gene in Clostridium autoethanogenum.

📁 Prerequisites Recap:
You have a bam_files/ folder with aligned .bam files.
You have the Clostridium autoethanogenum genome annotation (.gtf) from NCBI RefSeq (assembly GCF_000484505.2). If not, Step 1 below downloads it.

✅ STEP 1: Download the Annotation File
Download and decompress the GTF if you have not already done so:

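mkdir -p reference
# Download the RefSeq GTF annotation straight into reference/ (-P sets the target folder, so no cd is needed)
wget -P reference https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/484/505/GCF_000484505.2_ASM48450v2/GCF_000484505.2_ASM48450v2_genomic.gtf.gz
# Decompress; this leaves reference/GCF_000484505.2_ASM48450v2_genomic.gtf
gunzip reference/GCF_000484505.2_ASM48450v2_genomic.gtf.gz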
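Because Step 2 counts reads over "gene" features grouped by the gene_id attribute, it can be worth confirming that the downloaded GTF actually contains both. A minimal sketch using standard Unix tools; count_gtf_feature_types and count_gene_id_lines are hypothetical helper names, not part of any package:

```shell
# Hypothetical helper: tally the feature types (column 3) in a GTF file,
# skipping comment lines that start with '#', most frequent type first.
count_gtf_feature_types() {
  grep -v '^#' "$1" | cut -f3 | sort | uniq -c | sort -k1,1nr
}

# Hypothetical helper: count data lines whose attribute column (9) carries a gene_id.
count_gene_id_lines() {
  grep -v '^#' "$1" | cut -f9 | grep -c 'gene_id "'
}

# Run from the rnaseq_project folder; does nothing if the GTF is not there yet.
GTF=reference/GCF_000484505.2_ASM48450v2_genomic.gtf
if [ -f "$GTF" ]; then
  count_gtf_feature_types "$GTF"
  echo "lines with gene_id: $(count_gene_id_lines "$GTF")"
fi
```

If "gene" does not appear in the tally, the -t gene setting in Step 2 would assign no reads, so this is a cheap check before counting.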

✅ STEP 2: Count Reads Using featureCounts
Save the following as count_reads_featureCounts.sh in the rnaseq_project folder. It runs featureCounts on all BAM files in a single call, using 8 threads:

In [None]:
#!/bin/bash
# File: count_reads_featureCounts.sh
# Location: rnaseq_project folder
# Description: Count gene-level reads from BAMs

mkdir -p counts

featureCounts \
  -T 8 \
  -p \
  -t gene \
  -g gene_id \
  -a reference/GCF_000484505.2_ASM48450v2_genomic.gtf \
  -o counts/gene_counts.txt \
  bam_files/*.bam

🧠 Explanation:

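-T 8: use 8 threads
-p: the input reads are paired-end. In featureCounts v2.0.2 and later (this lesson uses v2.1.1), -p on its own still counts reads; add --countReadPairs if you want each read pair (fragment) counted once
-t gene: count reads over "gene" features (column 3 of the GTF)
-g gene_id: group features by the gene_id attribute, giving one row per gene
-a: path to the annotation file
-o: output file (featureCounts also writes a .summary file next to it)
bam_files/*.bam: all input BAMs; each one becomes a count column
No -s option is given, so the libraries are treated as unstranded (the default, -s 0); use -s 1 or -s 2 if your library prep is stranded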
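A missing GTF or an empty bam_files/ folder makes featureCounts fail with less obvious errors. Below is a small pre-flight check you could run before the script. It is sketched as a hypothetical check_fc_inputs function and assumes the folder layout used in this lesson:

```shell
# Hypothetical pre-flight check: succeed only if the annotation file exists
# and is non-empty, and the BAM folder holds at least one .bam file.
check_fc_inputs() {
  gtf=$1
  bam_dir=$2
  if [ ! -s "$gtf" ]; then
    echo "Missing or empty annotation: $gtf" >&2
    return 1
  fi
  # Count .bam files with find rather than relying on glob expansion
  n_bam=$(find "$bam_dir" -maxdepth 1 -type f -name '*.bam' 2>/dev/null | wc -l | tr -d ' ')
  if [ "$n_bam" -eq 0 ]; then
    echo "No .bam files found in $bam_dir" >&2
    return 1
  fi
  echo "OK: $n_bam BAM file(s) in $bam_dir"
}

# Example call from the rnaseq_project folder:
# check_fc_inputs reference/GCF_000484505.2_ASM48450v2_genomic.gtf bam_files && bash count_reads_featureCounts.sh
```

Chaining with && means the counting script only starts when both inputs look sane.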

Make it executable and run the script:

!chmod +x count_reads_featureCounts.sh
!bash count_reads_featureCounts.sh


       [44;37m =====      [0m[36m   / ____| |  | |  _ \|  __ \|  ____|   /\   |  __ \ 
       [44;37m   =====    [0m[36m  | (___ | |  | | |_) | |__) | |__     /  \  | |  | |
       [44;37m     ====   [0m[36m   \___ \| |  | |  _ <|  _  /|  __|   / /\ \ | |  | |
       [44;37m       ==== [0m[36m   ____) | |__| | |_) | | \ \| |____ / ____ \| |__| |
	  v2.1.1

||  [0m                                                                          ||
||             Input files : [36m15 BAM files  [0m [0m                                  ||
||  [0m                                                                          ||
||                           [36mSRR23426208.bam[0m [0m                                 ||
||                           [36mSRR23426209.bam[0m [0m                                 ||
||                           [36mSRR23426210.bam[0m [0m                                 ||
||                           [36mSRR23426211.bam[0m [0m                        

✅ STEP 3: Inspect Output

You should now have:

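counts/gene_counts.txt: the count matrix (tab-delimited; one row per gene, one count column per BAM)

counts/gene_counts.txt.summary: per-BAM tally of reads that were assigned to a gene or left unassigned, broken down by reason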
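The .summary file has a Status header row followed by one row per category (Assigned plus the various Unassigned_* reasons), with one column per BAM. A quick way to see what fraction of reads was assigned in each sample is to divide the Assigned row by the column total. A minimal awk sketch; assigned_rate is a hypothetical helper name:

```shell
# Hypothetical helper: for each BAM column of a featureCounts .summary file,
# print the BAM name, the Assigned count, and Assigned as a % of all rows summed.
assigned_rate() {
  awk -F'\t' '
    NR == 1 { for (i = 2; i <= NF; i++) name[i] = $i; ncol = NF; next }
    {
      for (i = 2; i <= NF; i++) {
        total[i] += $i                       # every status row adds to the column total
        if ($1 == "Assigned") assigned[i] = $i
      }
    }
    END {
      for (i = 2; i <= ncol; i++) {
        pct = (total[i] > 0) ? 100 * assigned[i] / total[i] : 0
        printf "%s\t%.0f\t%.1f%%\n", name[i], assigned[i], pct
      }
    }
  ' "$1"
}

# Run from the rnaseq_project folder once Step 2 has finished.
SUMMARY=counts/gene_counts.txt.summary
if [ -f "$SUMMARY" ]; then
  assigned_rate "$SUMMARY"
fi
```

A sample with a much lower percentage than the others is worth a closer look at its Unassigned_* rows.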

Check the matrix:

!head counts/gene_counts.txt

# Program:featureCounts v2.1.1; Command:"featureCounts" "-T" "8" "-p" "-t" "gene" "-g" "gene_id" "-a" "reference/GCF_000484505.2_ASM48450v2_genomic.gtf" "-o" "counts/gene_counts.txt" "bam_files/SRR23426208.bam" "bam_files/SRR23426209.bam" "bam_files/SRR23426210.bam" "bam_files/SRR23426211.bam" "bam_files/SRR23426212.bam" "bam_files/SRR23426213.bam" "bam_files/SRR23426214.bam" "bam_files/SRR23426215.bam" "bam_files/SRR23426216.bam" "bam_files/SRR23426217.bam" "bam_files/SRR23426218.bam" "bam_files/SRR23426219.bam" "bam_files/SRR23426220.bam" "bam_files/SRR23426221.bam" "bam_files/SRR23426222.bam" 
Geneid	Chr	Start	End	Strand	Length	bam_files/SRR23426208.bam	bam_files/SRR23426209.bam	bam_files/SRR23426210.bam	bam_files/SRR23426211.bam	bam_files/SRR23426212.bam	bam_files/SRR23426213.bam	bam_files/SRR23426214.bam	bam_files/SRR23426215.bam	bam_files/SRR23426216.bam	bam_files/SRR23426217.bam	bam_files/SRR23426218.bam	bam_files/SRR23426219.bam	bam_files/SRR23426220.bam	bam_files/SRR23426221.b
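As the output shows, the first line of gene_counts.txt is a # comment recording the command, and columns 2 to 6 (Chr, Start, End, Strand, Length) are annotation rather than counts. Many downstream tools expect only gene IDs plus one count column per sample. One way to derive that is sketched below, assuming you want headers such as bam_files/SRR23426208.bam shortened to the SRR accession. The clean_counts function and the output name counts/gene_count_matrix.tsv are hypothetical:

```shell
# Hypothetical helper: turn featureCounts output into a plain gene x sample matrix.
clean_counts() {
  awk -F'\t' 'BEGIN { OFS = "\t" }
    /^#/ { next }                      # skip the featureCounts command comment line
    !seen++ {                          # first remaining line is the header row
      for (i = 7; i <= NF; i++) {
        sub(/^.*\//, "", $i)           # drop the directory prefix, e.g. bam_files/
        sub(/\.bam$/, "", $i)          # drop the .bam extension
      }
    }
    {
      out = $1                         # Geneid
      for (i = 7; i <= NF; i++) out = out OFS $i   # count columns only
      print out
    }' "$1"
}

# Run from the rnaseq_project folder.
if [ -f counts/gene_counts.txt ]; then
  clean_counts counts/gene_counts.txt > counts/gene_count_matrix.tsv
fi
```

The original gene_counts.txt is left untouched, so the gene lengths in column 6 remain available if you later need length-normalized values.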

🔁 STEP 4: Git Update
Let's save your work so far to GitHub. Run these commands from the parent folder (BifrostOmics):
git add rnaseq_project/counts/
git add rnaseq_project/reference/GCF_000484505.2_ASM48450v2_genomic.gtf
git commit -m "Lesson 4 complete: Gene counts generated with featureCounts"
git push