Unlocking Genomic Insights: Efficient Population Analysis using Snakemake

This GitHub repository describes how to run a Snakemake workflow for population genomics analysis of whole-genome resequencing data. The Snakefile provided here parallelizes the steps involved in these analyses, which makes the workflow more efficient and shortens the overall run time.

Current Workflow: The current workflow covers the process from short-read data processing through to population structure analysis, i.e. PCA (Principal Component Analysis) plotting. Snakemake is used to streamline and automate these tasks.

Future Updates: Stay tuned for future updates, which will include additional analyses such as Fst analysis and admixture analysis. Feel free to explore the repository, and don't hesitate to reach out if you have any questions or suggestions!

This Snakemake workflow was created using commands from Elahe Parvizi's GitHub repositories.

FOR SNAKEMAKE RUN

Step 1: Project Folder

Create a project folder and give it a meaningful Project_ID.

mkdir <project-id>

Step 2: Copy Files into Project Folder

Copy the following files into the project folder:

Snakefile
Stats.r
config.yaml

Step 3: Create a Sub-folder "01_Data"

Inside the project folder, create a sub-folder named 01_Data.

mkdir <project-id>/01_Data
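Steps 1 to 3 can also be scripted. The following is a minimal Python sketch, not part of the repository: the helper name setup_project and its parameters are hypothetical, and it assumes only the layout described above (a project folder that holds the workflow files and a 01_Data sub-folder).

```python
import shutil
from pathlib import Path


def setup_project(project_id, workflow_files, base_dir="."):
    """Create <project-id>/ with a 01_Data sub-folder and copy workflow files in.

    setup_project is a hypothetical helper, not something the repository ships.
    """
    project = Path(base_dir) / project_id
    data_dir = project / "01_Data"
    # parents=True also creates the project folder; exist_ok=True makes re-runs safe
    data_dir.mkdir(parents=True, exist_ok=True)
    for src in workflow_files:
        # Copy each workflow file into the top level of the project folder
        shutil.copy2(src, project / Path(src).name)
    return project
```

For example, setup_project("my_project", ["Snakefile", "config.yaml"]) would create my_project/01_Data and copy the two listed files into my_project, provided they exist in the current directory.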

Step 4: Copy Sample Files to 01_Data

Copy the sample files into the 01_Data folder.

cp *.fastq <project-id>/01_Data

Ensure that the FASTQ files are named according to the following pattern, for example: Featherston_01.fastq, Featherston_02.fastq, Mosburn_01.fastq

Explanation of Example

To help you understand how to label the files correctly:

Each file should have a name followed by an underscore and a two-digit number.
The name represents a specific population or sample, such as "Featherston" or "Mosburn".
The two-digit number distinguishes different files from the same population or sample.
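The naming rule above can be checked before the workflow is started. This is a small sketch under stated assumptions rather than code from the repository: the helper name parse_sample_name is hypothetical, population names are assumed to contain only letters and digits, and the extension defaults to .fastq as in the examples.

```python
import re


def parse_sample_name(filename, extension=".fastq"):
    """Return (population, number) if filename is <Name>_<NN><extension>, else None.

    Assumption: <Name> is made of letters and digits only.
    """
    pattern = re.compile(r"([A-Za-z0-9]+)_(\d{2})" + re.escape(extension))
    match = pattern.fullmatch(filename)
    if match is None:
        return None
    return match.group(1), match.group(2)


# Report which file names follow the rule and which do not
for name in ["Featherston_01.fastq", "Mosburn_01.fastq", "Mosburn_1.fastq"]:
    result = parse_sample_name(name)
    print(name, "->", result if result else "does not match the pattern")
```

The last name in the loop fails the check because its number has one digit instead of two.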

Step 5: Use Config File to Add Additional Information

Use the config.yaml file to supply the additional settings the workflow requires.

config.yaml content for snakemake workflow (Example file)

##### Sequencing platform info (mostly keep this constant)
PL: "Illumina"
PM: "HISEQ"

##### Assign threads 
THREADS: 16

##### Provide path to reference file (ensure the reference is indexed with the bwa index command and is available at the path provided)
fasta_path: /path/to/.fasta

######## Variant calling filter parameters ########
min_MQ: 20
min_BQ: 20
vcf_name: 'M_aethio_MOSS_LIN_FEA' #Used to assign names to output files generated in most steps
MAF: 'MAF > 0.05'

######## PLINK parameters #################
GENO: 0.1
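
A quick sanity check of the settings can catch a missing key before a long run. The sketch below is illustrative only: check_config is a hypothetical helper, the key list is copied from the example file above, and the range checks are assumptions (THREADS as a positive integer, GENO as a fraction between 0 and 1, on the assumption that it is used as a PLINK missingness threshold).

```python
REQUIRED_KEYS = (
    "PL", "PM", "THREADS", "fasta_path",
    "min_MQ", "min_BQ", "vcf_name", "MAF", "GENO",
)


def check_config(config):
    """Return a list of problems found in an already-loaded config dict."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    threads = config.get("THREADS")
    if threads is not None and (not isinstance(threads, int) or threads < 1):
        problems.append("THREADS should be a positive integer")
    geno = config.get("GENO")
    if geno is not None and not (isinstance(geno, (int, float)) and 0 <= geno <= 1):
        problems.append("GENO should be a number between 0 and 1")
    return problems


# Values mirror the example config.yaml shown above
example = {
    "PL": "Illumina", "PM": "HISEQ", "THREADS": 16,
    "fasta_path": "/path/to/.fasta",
    "min_MQ": 20, "min_BQ": 20,
    "vcf_name": "M_aethio_MOSS_LIN_FEA",
    "MAF": "MAF > 0.05", "GENO": 0.1,
}
print(check_config(example))
```

In practice the dict would come from parsing config.yaml, for instance with yaml.safe_load from the PyYAML package if it is installed.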

Step 6: Run Snakemake

Navigate to the project folder in your terminal.

Perform a dry run with the "-n" flag to test the workflow without executing any steps:

snakemake --configfile=config.yaml --cores 8 -n

For the actual run, use the following:

snakemake --configfile=config.yaml --cores 8
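
If the run is launched from a script, the two commands above can be assembled from one place so that the dry run and the real run never drift apart. This is a minimal sketch; snakemake_command is a hypothetical helper, and it uses only the flags already shown in this step (--configfile, --cores, -n).

```python
import shlex


def snakemake_command(configfile="config.yaml", cores=8, dry_run=False):
    """Build the snakemake argument list used in Step 6."""
    cmd = ["snakemake", f"--configfile={configfile}", "--cores", str(cores)]
    if dry_run:
        # -n asks Snakemake to report what would run without executing it
        cmd.append("-n")
    return cmd


# Show the dry-run command first, then the command for the real run
print(shlex.join(snakemake_command(dry_run=True)))
print(shlex.join(snakemake_command()))
```

The returned list can be passed to subprocess.run(cmd, check=True) from inside the project folder.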

meeranhussain/Population_genomic_analysis
