popgen_stats_by_gene

Scripts used to generate population genetic stats (Dxy, Fst, Pi) for each gene in an annotation file.

Dxy Scripts Order

array_make_vcf.sh #This uses vcftools to make vcfs and frq.count files for each population
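As a rough illustration of what the frq.count files contain, here is a minimal sketch of a parser for one data line. It assumes the usual tab-separated layout written by the vcftools --counts option (CHROM, POS, N_ALLELES, N_CHR, then one ALLELE:COUNT field per allele). The function name and the example scaffold name are made up for illustration, and the scripts in this repo may read the files differently.

```python
def parse_frq_count_line(line):
    """Parse one data line of a vcftools .frq.count file (hypothetical helper).

    Assumed columns: CHROM, POS, N_ALLELES, N_CHR, then ALLELE:COUNT fields.
    """
    fields = line.rstrip("\n").split("\t")
    chrom = fields[0]
    pos = int(fields[1])
    n_chr = int(fields[3])
    counts = {}
    for item in fields[4:]:
        # Split on the last colon so the allele string is kept whole.
        allele, count = item.rsplit(":", 1)
        counts[allele] = int(count)
    return chrom, pos, n_chr, counts


# Made-up example line, not taken from the real data.
example = "scaffold_1\t1204\t2\t20\tA:15\tG:5"
print(parse_frq_count_line(example))
```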

array_Dxy.sh

Commands list to use: All_351_Dxy_Commands.txt OR All_351_Dxy_Commands_Un.txt

#First uses Cave_fish_Dxy.py, which takes two count files and calculates pairwise Dxy at each site in the genome

#Second uses Dxy_Summary_stats.py, which takes the pairwise _Dxy.txt file generated in the first step (with Cave_fish_Dxy.py) and gives a summary table (summary_Dxy.txt)

#Third uses Fixed_Differences_FASTSITES.py, which takes a pairwise _Dxy.txt file and finds fixed sites (fixed_differences.txt)
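To make the per-site calculation concrete, here is a minimal sketch of Dxy from two sets of allele counts, using the standard definition: the probability that one allele drawn from each population differs. This is not the code in Cave_fish_Dxy.py. The function names are hypothetical, and treating Dxy = 1 as a fixed difference is an assumption about how Fixed_Differences_FASTSITES.py defines fixed sites.

```python
def site_dxy(counts_a, counts_b):
    """Per-site Dxy from two allele-count dicts, e.g. {"A": 15, "G": 5}.

    Returns None when either population has no called alleles at the site.
    """
    n_a = sum(counts_a.values())
    n_b = sum(counts_b.values())
    if n_a == 0 or n_b == 0:
        return None
    # Probability that the two sampled alleles are identical.
    same = sum(counts_a[al] * counts_b.get(al, 0) for al in counts_a)
    return 1.0 - same / float(n_a * n_b)


def is_fixed_difference(counts_a, counts_b):
    """True when the two populations share no alleles at the site (Dxy == 1)."""
    return site_dxy(counts_a, counts_b) == 1.0


print(site_dxy({"A": 15, "G": 5}, {"A": 5, "G": 15}))
print(is_fixed_difference({"A": 20, "G": 0}, {"A": 0, "G": 10}))
```

For a biallelic site this reduces to p1(1 - p2) + p2(1 - p1), where p1 and p2 are the frequencies of the same allele in the two populations.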

Load Python 2 before running these scripts: module load python2

array_Dxy_StatsFixed.sh

Commands list to use: All_351_Commands_Dxy_Summary_per_Gene_ID_gzip.txt OR All_351_Commands_Dxy_Summary_per_Gene_ID_gzip_Un.txt

#Uses Dxy_Summary_per_gene_ensemblGTF.py, which takes AllScaffLengths.txt, a pairwise _Dxy.txt file, and Astyanax_mexicanus-2.0.ensembl101_NCBI_renamed.gtf, and gives Dxy-by-gene for a population pair
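The per-gene step can be pictured as grouping per-site values by the gene interval they fall in. The sketch below assumes that "Dxy-by-gene" means the mean of per-site Dxy over sites inside each gene. The actual script may normalise differently (for example by gene length), and the function name, gene IDs, and coordinates here are hypothetical.

```python
def dxy_by_gene(sites, genes):
    """Mean per-site Dxy for each gene (hypothetical helper).

    sites: iterable of (scaffold, position, dxy)
    genes: list of (gene_id, scaffold, start, end), 1-based inclusive coordinates
    Genes with no sites get None.
    """
    totals = {gene_id: [0.0, 0] for gene_id, _, _, _ in genes}
    for scaffold, pos, dxy in sites:
        # Simple nested scan; fine for a sketch, slow for a whole genome.
        for gene_id, g_scaffold, start, end in genes:
            if scaffold == g_scaffold and start <= pos <= end:
                totals[gene_id][0] += dxy
                totals[gene_id][1] += 1
    return {g: (s / n if n else None) for g, (s, n) in totals.items()}


genes = [("geneA", "scaffold_1", 100, 200), ("geneB", "scaffold_1", 300, 400)]
sites = [
    ("scaffold_1", 150, 0.5),
    ("scaffold_1", 180, 1.0),
    ("scaffold_1", 250, 0.2),  # intergenic, ignored
    ("scaffold_2", 150, 0.9),  # different scaffold, ignored
]
print(dxy_by_gene(sites, genes))
```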

array_Dxy_small_pops.sh #This takes the commands list ALL_Japlin_Micos_Commands.txt, which contains commands to run Cave_fish_Dxy_small_pops.py with AllScaffLengths.txt and the COUNT_* files for two populations. (Array 1-75)

#Use this script for the smaller populations (Micos, Toro, Japlin) because they have more sites with missing data that were not filtered out of the whole data set. Earlier in the pipeline I applied a filter that removed any site with >20% missing data in any population that had 4+ samples; for Japlin and Micos n=1, and for Toro n=3.
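The reason the small populations need separate handling follows from the filter rule itself, which can be sketched as below. This is an illustration of the rule as described here, not the filtering code used earlier in the pipeline. The function name, the default parameter names, and the population "PopA" are hypothetical.

```python
def keep_site(called_by_pop, samples_by_pop, min_samples=4, max_missing=0.20):
    """Return True if the site passes the missing-data filter.

    A site is dropped when any population with at least `min_samples` samples
    has more than `max_missing` of its samples uncalled. Populations below
    `min_samples` are never checked, so their missing sites pass through.
    """
    for pop, n_samples in samples_by_pop.items():
        if n_samples < min_samples:
            continue  # e.g. Japlin (n=1), Micos (n=1), Toro (n=3)
        missing = 1.0 - called_by_pop.get(pop, 0) / float(n_samples)
        if missing > max_missing:
            return False
    return True


samples = {"PopA": 10, "Toro": 3, "Japlin": 1}
# Japlin has no call at this site, yet the site is kept: Japlin is not checked.
print(keep_site({"PopA": 9, "Toro": 3, "Japlin": 0}, samples))
# PopA is missing 30% of its samples, so the site is removed.
print(keep_site({"PopA": 7, "Toro": 3, "Japlin": 1}, samples))
```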

———————————————————————————————————————

Fst Scripts Order

Generate_Fst_Commands.txt #Takes list of each pair of populations (Amex_Pop_Pairs.txt) and generates Fst_Commands.txt for use in array_Fst.sh
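A minimal sketch of how a commands list could be built from a list of population pairs follows. It assumes each line of the pairs file holds two whitespace-separated population names, and it uses a placeholder VCF name (all_samples.vcf.gz). Neither assumption is confirmed by this repo. The vcftools options shown (--gzvcf, --weir-fst-pop, --out) are standard, and vcftools appends .weir.fst to the --out prefix.

```python
def fst_command(pop_a, pop_b, vcf="all_samples.vcf.gz"):
    """Build one vcftools Fst command line (vcf name is a placeholder)."""
    return (
        "vcftools --gzvcf {vcf} "
        "--weir-fst-pop {a}_samples.txt --weir-fst-pop {b}_samples.txt "
        "--out {a}_{b}_Fst.txt"
    ).format(vcf=vcf, a=pop_a, b=pop_b)


def commands_from_pairs(lines):
    """Turn 'PopA PopB' lines into commands, skipping malformed lines."""
    commands = []
    for line in lines:
        parts = line.split()
        if len(parts) != 2:
            continue
        commands.append(fst_command(parts[0], parts[1]))
    return commands


for cmd in commands_from_pairs(["Micos Toro", "", "Japlin Micos"]):
    print(cmd)
```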

array_Fst.sh #Commands list to use: Fst_Commands.txt #Takes the big vcf and calculates pairwise Fst between two populations with vcftools and two sample lists (*_samples.txt)

array_sed.sh #Commands list to use: fix_commands.txt #Replaces undefined values ("-nan") in _Fst.txt.weir.fst with 0.
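For reference, the same substitution can be sketched in Python. This is an equivalent illustration, not the sed command from fix_commands.txt. It assumes the .weir.fst file is tab-separated, and the function name is hypothetical.

```python
def fix_fst_line(line):
    """Replace any field that is exactly '-nan' with '0' in one .weir.fst line."""
    fields = line.rstrip("\n").split("\t")
    # Only whole fields are replaced, so negative estimates stay untouched.
    fields = ["0" if field == "-nan" else field for field in fields]
    return "\t".join(fields)


print(fix_fst_line("scaffold_1\t10\t-nan\n"))
print(fix_fst_line("scaffold_1\t11\t-0.0123\n"))
```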

array_Fst_by_GeneID.sh #Commands list to use: Fst_by_gene_commands_gzip.txt #Uses FST_Summary_by_geneID.py, which takes AllScaffLengths.txt, a pairwise _Fst.txt.weir.fst_fixed file, and the gtf (Astyanax_mexicanus-2.0.ensembl101_NCBI_renamed.gtf), and produces the file *_Fst_by_geneID.txt for a population pair
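Both per-gene scripts need gene coordinates from the GTF. Here is a minimal sketch of pulling them out, assuming the standard nine-column GTF layout with a gene_id attribute in the last column. Whether FST_Summary_by_geneID.py uses "gene" features or some other feature type is an assumption, and the function name and example gene ID are made up.

```python
import re

# Matches the gene_id attribute in the ninth GTF column.
GENE_ID = re.compile(r'gene_id "([^"]+)"')


def parse_gtf_gene(line):
    """Return (gene_id, scaffold, start, end) for a 'gene' line, else None."""
    if line.startswith("#"):
        return None  # header or comment line
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 9 or fields[2] != "gene":
        return None
    match = GENE_ID.search(fields[8])
    if match is None:
        return None
    return match.group(1), fields[0], int(fields[3]), int(fields[4])


gene_line = 'scaffold_1\tensembl\tgene\t100\t200\t.\t+\t.\tgene_id "GENE0001"; gene_biotype "protein_coding";'
print(parse_gtf_gene(gene_line))
```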

rachelmoran28/popgen_stats_by_gene

Folders and files

Latest commit

History

Repository files navigation

popgen_stats_by_gene

Commands list to use: All_351_Dxy_Commands.txt OR All_351_Dxy_Commands_Un.txt

Use module load python2

Commands list to use: All_351_Commands_Dxy_Summary_per_Gene_ID_gzip.txt OR All_351_Commands_Dxy_Summary_per_Gene_ID_gzip_Un.txt

About

Resources

Stars

Watchers

Forks

Releases 1

Packages 0

Languages

Packages