Genome Analysis Pipeline

This repository contains scripts and tools for analyzing whole genome sequencing data using GATK4. The pipeline includes several steps, such as quality control analysis, alignment, variant calling, variant filtering, and functional analysis.

Getting Started

To get started with the genome analysis pipeline, follow these instructions:

Clone this repository: git clone https://github.com/edielam/Genome-Analysis-Pipeline.git
Install the required dependencies (see the Dependencies section for details).
Prepare your input data, including raw sequencing reads, reference genome, and known variant sites.
Customize the pipeline by updating the configuration files and script parameters according to your analysis requirements.
Execute the pipeline by running the provided scripts in the specified order.

Dependencies

The following dependencies are required to run the genome analysis pipeline:

GATK4
FastQC
BWA
Samtools
Picard
VCFtools
Variant Effect Predictor (VEP)
Python (with pandas, matplotlib, Biopython, and any other libraries imported by the scripts)

Ensure that these dependencies are installed and accessible in your environment before running the pipeline.
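
As a quick pre-flight check, the sketch below tests whether each command-line tool can be found on your PATH. The executable names in REQUIRED_TOOLS are assumptions, not names confirmed by this repository; for example, Picard is often installed as a JAR file rather than a `picard` wrapper, so adjust the list to match your environment.

```python
import shutil

# Assumed executable names (not confirmed by this repository);
# edit the list to match how the tools are installed on your system.
REQUIRED_TOOLS = ["gatk", "fastqc", "bwa", "samtools", "picard", "vcftools", "vep"]


def find_missing_tools(tools=REQUIRED_TOOLS):
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = find_missing_tools()
    if missing:
        print("Missing from PATH: " + ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

This only confirms that an executable with that name exists; it does not check versions or whether the Python libraries are importable.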

Usage

The pipeline is divided into several steps, each implemented in a separate script. Follow the instructions in each script to perform the specific analysis step.

QC Analysis, Alignment, and Mark Duplicates: Run qc-analysis_alignment_and_markduplicates.sh to perform quality control analysis, read alignment, and duplicate marking in a single step.
BQSR and Variant Calling: Execute bqsr_and_variant-calling.sh to perform Base Quality Score Recalibration (BQSR) and variant calling using GATK4's HaplotypeCaller.
Variant Filtering and Annotation: Use variant-filtering-and_annotation.sh to filter variant calls based on quality, coverage, and other criteria, and annotate the variants using Variant Effect Predictor (VEP).
Functional Analysis: Utilize functional_analysis.py to perform functional analysis, including gene prioritization and pathway enrichment. Use protein_sequence_generation.py for protein sequence generation.
Visualization: Use visualization_of_variants.py for generating visuals to be included in your report.

Refer to the individual scripts and their respective README files for detailed instructions on usage, inputs, and outputs.
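
The order of the steps above can be captured in a small driver. This is a minimal sketch, assuming that each script can be run from the repository root without command-line arguments and that paths are configured inside the scripts themselves; neither assumption is confirmed here, so check each script before running it for real. The run_pipeline function and its dry_run parameter are hypothetical helpers, not part of the repository.

```python
import subprocess
import sys

# Script order taken from the Usage section; the interpreter for each
# script is an assumption based on its file extension.
PIPELINE_STEPS = [
    ("bash", "qc-analysis_alignment_and_markduplicates.sh"),
    ("bash", "bqsr_and_variant-calling.sh"),
    ("bash", "variant-filtering-and_annotation.sh"),
    (sys.executable, "functional_analysis.py"),
    (sys.executable, "protein_sequence_generation.py"),
    (sys.executable, "visualization_of_variants.py"),
]


def run_pipeline(repo_dir=".", dry_run=True):
    """Run each step in order, stopping at the first failure.

    With dry_run=True the commands are only printed, not executed.
    Returns the list of commands in execution order.
    """
    commands = []
    for interpreter, script in PIPELINE_STEPS:
        cmd = [interpreter, script]
        commands.append(cmd)
        print("->", " ".join(cmd))
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later steps never run on incomplete output.
            subprocess.run(cmd, cwd=repo_dir, check=True)
    return commands


if __name__ == "__main__":
    run_pipeline(dry_run=True)
```

Keeping dry_run=True as the default lets you confirm the order before launching the long-running alignment and variant calling steps.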

Scripts

qc-analysis_alignment_and_markduplicates.sh: Script for performing quality control analysis, alignment, and marking duplicates.
bqsr_and_variant-calling.sh: Script for performing Base Quality Score Recalibration (BQSR) and variant calling.
variant-filtering-and_annotation.sh: Script for filtering variant calls based on quality, coverage, and other criteria, and annotating variants using VEP.
functional_analysis.py: Python script for performing functional analysis, including gene prioritization and pathway enrichment.
visualization_of_variants.py: Python script for generating visualizations of variants, such as bar charts and network plots, using annotated variant data.
protein_sequence_generation.py: Python script for generating protein sequences incorporating gene variants using Biopython.
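
To illustrate what filtering variant calls on quality and coverage means in practice, here is a minimal pure-Python sketch. It is not the method used by variant-filtering-and_annotation.sh, whose actual tools and thresholds are defined in the script itself. The function names, the thresholds (min_qual=30.0, min_depth=10), and the example records are all made up for illustration, and the sketch assumes that coverage is stored in the INFO field under the DP key.

```python
def passes_filters(vcf_line, min_qual=30.0, min_depth=10):
    """Return True if a VCF data line meets the QUAL and INFO/DP thresholds."""
    # VCF columns: CHROM POS ID REF ALT QUAL FILTER INFO ...
    fields = vcf_line.rstrip("\n").split("\t")
    qual, info = fields[5], fields[7]
    if qual == ".":
        return False  # missing quality is treated as a failure here
    # INFO is a semicolon-separated list of KEY=VALUE pairs and bare flags.
    info_map = dict(item.split("=", 1) for item in info.split(";") if "=" in item)
    depth = info_map.get("DP")
    if depth is None:
        return False  # no depth annotation, so coverage cannot be checked
    return float(qual) >= min_qual and int(depth) >= min_depth


def filter_vcf_lines(lines, min_qual=30.0, min_depth=10):
    """Yield header lines unchanged and data lines that pass the filters."""
    for line in lines:
        if line.startswith("#") or passes_filters(line, min_qual, min_depth):
            yield line


# Made-up records for illustration only.
example = [
    "##fileformat=VCFv4.2\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "chr1\t100\t.\tA\tG\t50.0\t.\tDP=25;AF=0.5\n",  # passes both thresholds
    "chr1\t200\t.\tC\tT\t12.3\t.\tDP=40\n",         # fails on quality
    "chr1\t300\t.\tG\tA\t99.0\t.\tDP=4\n",          # fails on depth
]
kept = list(filter_vcf_lines(example))
```

For real data, a dedicated tool such as GATK4 or VCFtools (both listed under Dependencies) is the better choice, since they handle multi-allelic sites, per-sample fields, and compressed input.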

Contributing

Contributions are welcome! If you have suggestions, bug reports, or feature requests, please open an issue or submit a pull request.

License

This project is licensed under the MIT License.
