RNA-seq Bioinformatics Pipelines

This repository contains two Python-based RNA-seq analysis pipelines:

Part 1: From raw FASTQ files to read counts using HISAT2 and HTSeq
Part 2: From read counts to differentially expressed genes (DEGs) using DESeq2 via R

Part 1: RNA-seq Reads Count Pipeline

Script: RNA-seq-ReadsCount.py
This pipeline processes paired-end RNA-seq FASTQ files to produce read count matrices using standard bioinformatics tools.

Steps:

Quality Control (FastQC) – Checks raw FASTQ quality
Genome Alignment (HISAT2) – Aligns reads to the reference genome
Read Counting (HTSeq-count) – Counts reads mapped to annotated genes
(Optional) PCA clustering analysis
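
As a rough illustration of how the steps above chain together for one sample, the sketch below only builds the command lines and does not run them. The function name build_commands, the intermediate file names, and the unstranded setting (-s no) are assumptions made for this example. They are not taken from RNA-seq-ReadsCount.py, which may invoke the tools differently.

```python
from pathlib import Path


def build_commands(sample, r1, r2, index_prefix, gtf, out_dir):
    """Return per-sample command lines as argument lists (nothing is executed)."""
    out = Path(out_dir)
    # Hypothetical intermediate file names, chosen for this sketch only
    sam = out / "aligned_bam_files" / f"{sample}.sam"
    bam = out / "aligned_bam_files" / f"{sample}.sorted.bam"
    return [
        # 1. Quality control on both mates
        ["fastqc", "-o", str(out / "FastQC_reports"), r1, r2],
        # 2. Paired-end alignment against the HISAT2 index
        ["hisat2", "-x", index_prefix, "-1", r1, "-2", r2, "-S", str(sam)],
        # Coordinate-sort and index the alignments
        ["samtools", "sort", "-o", str(bam), str(sam)],
        ["samtools", "index", str(bam)],
        # 3. Gene-level counting; "-s no" assumes an unstranded library
        ["htseq-count", "-f", "bam", "-r", "pos", "-s", "no", str(bam), gtf],
    ]


commands = build_commands(
    "sample1", "sample1_1.fq.gz", "sample1_2.fq.gz",
    "/path/to/hisat2_index_prefix", "/path/to/annotation.gtf", "hisat-count-dir",
)
for command in commands:
    print(" ".join(command))
```

Each list could be passed to subprocess.run(command, check=True). Note that htseq-count writes its counts to standard output, so that step would need its output redirected to a file.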

Requirements

Conda-installed tools:

conda install -c bioconda fastqc
conda install -c bioconda hisat2
conda install -c bioconda samtools
conda install -c bioconda htseq

Python packages:

pip install pandas scikit-learn matplotlib seaborn

⚠️ If you are on a managed system, use a virtual environment:

python3 -m venv env
source env/bin/activate
pip install pandas scikit-learn matplotlib seaborn

How to Run

python RNA-seq-ReadsCount.py -i Fastq -o hisat-count-dir -g /path/to/hisat2_index_prefix -a /path/to/annotation.gtf

Input Files

FastQ/: Folder containing paired-end .fq.gz files (e.g., sample1_1.fq.gz, sample1_2.fq.gz)
HISAT2-indexed reference genome (with .ht2 files)
Gene annotation file (GTF or GFF3)
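
The paired-end naming convention above (sample1_1.fq.gz, sample1_2.fq.gz) can be checked before any alignment starts. The following is a minimal sketch under that assumption. The helper pair_fastq and its regular expression are hypothetical and may not match how the script discovers its inputs.

```python
import re
from collections import defaultdict

# Matches names such as sample1_1.fq.gz / sample1_2.fq.gz
MATE_PATTERN = re.compile(r"^(?P<sample>.+)_(?P<mate>[12])\.fq\.gz$")


def pair_fastq(filenames):
    """Group file names into {sample: (mate1, mate2)}, rejecting incomplete pairs."""
    mates = defaultdict(dict)
    for name in filenames:
        match = MATE_PATTERN.match(name)
        if match:  # ignore anything that does not follow the naming scheme
            mates[match["sample"]][match["mate"]] = name
    pairs = {}
    for sample, found in sorted(mates.items()):
        if set(found) != {"1", "2"}:
            raise ValueError(f"{sample}: expected both _1 and _2 files")
        pairs[sample] = (found["1"], found["2"])
    return pairs


print(pair_fastq(["sample1_2.fq.gz", "sample1_1.fq.gz", "notes.txt"]))
```

Failing early on a missing mate is a design choice for this sketch: it avoids discovering the problem only after the earlier samples have been aligned.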

Output Files

FastQC_reports/: Quality reports per FASTQ

aligned_bam_files/: Aligned, sorted, and indexed BAM files

htseq_counts/: Gene-level read count files

merged_counts_matrix.csv: Combined count matrix

pca_plot.png: Optional PCA clustering plot
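
To illustrate how per-sample htseq-count files could be combined into a single matrix, here is a standard-library sketch. It assumes the default two-column, tab-separated htseq-count output, whose trailing summary rows start with "__" (for example __no_feature). The function merge_counts is hypothetical, and the script itself may well use pandas instead.

```python
import csv
import io


def merge_counts(count_files):
    """Merge {sample: file-like} htseq-count outputs into (header, rows)."""
    samples = list(count_files)
    table = {}  # gene_id -> {sample: count}
    for sample, handle in count_files.items():
        for gene_id, count in csv.reader(handle, delimiter="\t"):
            if gene_id.startswith("__"):  # skip summary rows such as __no_feature
                continue
            table.setdefault(gene_id, {})[sample] = int(count)
    header = ["gene_id"] + samples
    rows = [
        [gene] + [counts.get(sample, 0) for sample in samples]
        for gene, counts in sorted(table.items())
    ]
    return header, rows


# Small in-memory stand-ins for two count files
demo = {
    "sample1": io.StringIO("geneA\t10\ngeneB\t0\n__no_feature\t5\n"),
    "sample2": io.StringIO("geneA\t7\ngeneB\t3\n__no_feature\t2\n"),
}
header, rows = merge_counts(demo)
print(header)
for row in rows:
    print(row)
```

In practice the dictionary values would be open file handles for the files in htseq_counts/, and the result could be written out with csv.writer.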

Part 2: Differential Expression Analysis (DEG)

Script: run-DEG-analysis.py

This pipeline performs differential gene expression analysis using DESeq2 via R, then generates summary files and heatmaps.

Requirements

Conda-installed tools:

conda install -c bioconda bioconductor-deseq2
conda install -c conda-forge rpy2
conda install pandas matplotlib seaborn scikit-learn

⚠️ Make sure the installed R and rpy2 versions are compatible

How to Run

python run-DEG-analysis.py counts.txt control1,control2,control3 case1_rep1,case1_rep2,case1_rep3 ...

Input Files

• counts.txt: Gene expression matrix (rows = genes, columns = samples), which is generated by RNA-seq-ReadsCount.py

• Sample names must match between the counts.txt header and the sample lists passed to this script

• The first group (group1) is the control; every other group is a test condition that is compared with group1

• All replicates are comma-separated
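
The input rules above (comma-separated replicates, first group as control, names matching the counts header) can be expressed as a small validation step. This is a sketch under those stated rules only. parse_groups is a hypothetical helper, not a function from run-DEG-analysis.py.

```python
def parse_groups(group_args, header_samples):
    """Split comma-separated replicate lists and pair each test group with the control."""
    if len(group_args) < 2:
        raise ValueError("need one control group and at least one test group")
    groups = [
        [name.strip() for name in arg.split(",") if name.strip()]
        for arg in group_args
    ]
    # Every sample named on the command line must exist in the counts header
    known = set(header_samples)
    missing = [name for group in groups for name in group if name not in known]
    if missing:
        raise ValueError(f"samples not found in counts header: {missing}")
    control, tests = groups[0], groups[1:]
    # One (control, test) comparison per test group
    return [(control, test) for test in tests]


header = ["control1", "control2", "case1_rep1", "case1_rep2"]
for control, test in parse_groups(["control1,control2", "case1_rep1,case1_rep2"], header):
    print(control, "vs", test)
```

With more test groups on the command line, the function returns one comparison per group, each against the same control.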

Output Files

All results are saved in the DEG_results/ folder:

*_filtered.csv: Filtered DEGs with padj < 0.05 and |log2FC| ≥ 1

*_unfiltered.csv: All DESeq2 results

*_heatmap.png: Heatmap of all filtered DEGs

summary_all_DEG.csv: Combined results from all group comparisons

summary_filtered_DEG.csv: Combined filtered DEGs from all comparisons
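
The filtering rule for *_filtered.csv (padj < 0.05 and |log2FC| ≥ 1) can be sketched as below. The column names padj and log2FoldChange are the ones DESeq2 uses in its results table. The helper names and the handling of NA values are assumptions for this example and may differ from what the script does.

```python
import math


def _to_float(value):
    """Convert a CSV field to float; 'NA', empty, or missing values become NaN."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return math.nan


def filter_degs(rows, padj_cutoff=0.05, lfc_cutoff=1.0):
    """Keep rows with padj < padj_cutoff and |log2FoldChange| >= lfc_cutoff."""
    kept = []
    for row in rows:
        padj = _to_float(row.get("padj"))
        lfc = _to_float(row.get("log2FoldChange"))
        # Comparisons with NaN are False, so NA rows drop out automatically
        if padj < padj_cutoff and abs(lfc) >= lfc_cutoff:
            kept.append(row)
    return kept


results = [
    {"gene": "geneA", "log2FoldChange": "2.3", "padj": "0.01"},
    {"gene": "geneB", "log2FoldChange": "3.0", "padj": "0.2"},
    {"gene": "geneC", "log2FoldChange": "-1.5", "padj": "NA"},
    {"gene": "geneD", "log2FoldChange": "-1.0", "padj": "0.04"},
]
print([row["gene"] for row in filter_degs(results)])
```

Rows could come from csv.DictReader over an *_unfiltered.csv file. DESeq2 reports padj as NA for some genes (for example those removed by independent filtering), which is why NA values are excluded here rather than treated as zero.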

Notes

Supports any number of test groups, each compared independently to the control
DESeq2 handles normalization and dispersion estimation automatically
Heatmaps show expression levels of all filtered DEGs

Suggested Workflow

Run RNA-seq-ReadsCount.py to generate counts.txt
Use run-DEG-analysis.py to analyze DEGs and visualize heatmaps
