Snakemake pipeline to estimate dN/dS from sorghum pan-transcriptome data

Requirements

Software

Snakemake version xx
MAFFT version xx
pal2nal version xx
CODEML version xx
R version xx

Files

Sample file with a list of orthogroups to be analyzed (the orthogroup IDs follow the pattern OGXXXXXXX)
CDS FASTA for the pan-transcriptome (./../data/cds_pergenotype/all.cds.fasta)
Protein orthogroup FASTA files (from OrthoFinder: OGXXXXXXX.fa)
Pan-transcriptome classification table (in this case ../../sorghumxxxxx/xxxx/panTranscriptomeClassificationTable_I2.7.tsv)
Template configuration file for CODEML: template_CODEML.ctl
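
As a rough illustration of the sample file described above, the following Python sketch reads a list of orthogroup IDs and rejects any entry that does not match the OGXXXXXXX pattern. It assumes one ID per line and that each X stands for a digit; the function name `read_orthogroups` is hypothetical and is not taken from the pipeline code.

```python
import re
from pathlib import Path

# Assumption: OGXXXXXXX means "OG" followed by seven digits, e.g. OG0000012.
OG_PATTERN = re.compile(r"OG\d{7}")


def read_orthogroups(path):
    """Return the orthogroup IDs listed in a sample file, one ID per line.

    Blank lines are skipped. A malformed ID raises ValueError, so a typo in
    the sample file is caught before any alignment is started.
    """
    orthogroups = []
    lines = Path(path).read_text().splitlines()
    for line_number, line in enumerate(lines, start=1):
        og_id = line.strip()
        if not og_id:
            continue
        if not OG_PATTERN.fullmatch(og_id):
            raise ValueError(
                f"line {line_number}: unexpected orthogroup ID {og_id!r}"
            )
        orthogroups.append(og_id)
    return orthogroups
```

A list obtained this way could then be used to derive the per-orthogroup file names (OGXXXXXXX.fa) that the workflow expects.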

Overview

This repository calculates the ratio of nonsynonymous to synonymous substitution rates (dN/dS) for orthogroups in the sorghum pan-transcriptome. The workflow consists of the following steps:

Protein Orthogroups Identification: We begin by utilizing the protein orthogroups identified in the sorghum pan-transcriptome.
Protein Alignment (MAFFT): For each orthogroup, we perform protein alignment using MAFFT.
Nucleotide Alignment (pal2nal): Using the CDS sequences and the protein IDs in each orthogroup, we build a CDS FASTA for each orthogroup (the IDs must be identical between the CDS and the orthogroup proteins). From the protein alignment and the corresponding CDS FASTA, we generate a protein-guided nucleotide alignment with pal2nal.
dN/dS Calculation (CODEML): Running CODEML with runmode = -2 (pairwise comparison), we calculate the dN/dS ratio from the protein-guided nucleotide alignment.
Data Aggregation: We aggregate the results into a table, providing dN/dS values for each orthogroup and classifying them into pan-transcriptome categories (hard-core, soft-core, exclusive, and accessory).
Result Visualization: Finally, an R script plots the calculated dN/dS ratios.
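
The CDS extraction in the nucleotide alignment step above can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the pipeline's actual implementation: the helper names `parse_fasta` and `cds_for_orthogroup` are hypothetical, and the sequence ID is assumed to be the first whitespace-delimited token of each FASTA header.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence ID to sequence.

    Assumption: the ID is the first whitespace-delimited token after '>'.
    """
    records = {}
    current_id = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("FASTA header without an ID")
            current_id = header[0]
            records[current_id] = []
        elif current_id is not None:
            records[current_id].append(line)
    return {seq_id: "".join(chunks) for seq_id, chunks in records.items()}


def cds_for_orthogroup(protein_fasta_text, cds_records):
    """Return FASTA text holding the CDS of every protein in one orthogroup.

    Records are written in the order of the protein FASTA. A protein ID with
    no matching CDS raises KeyError, because the protein and CDS IDs are
    required to be identical.
    """
    protein_ids = list(parse_fasta(protein_fasta_text))
    missing = [pid for pid in protein_ids if pid not in cds_records]
    if missing:
        raise KeyError("no CDS found for: " + ", ".join(missing))
    return "".join(f">{pid}\n{cds_records[pid]}\n" for pid in protein_ids)
```

In this sketch, `cds_records` would come from calling `parse_fasta` once on the pan-transcriptome CDS FASTA, and `cds_for_orthogroup` would then be called for each OGXXXXXXX.fa file. Keeping the protein order is a design choice made here so that the CDS file lines up record by record with the protein alignment.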

Feel free to explore the code and adapt the workflow based on your specific needs. Contributions and suggestions are welcome!
