qc html #195

jasonwjohns · 2024-05-16T19:49:52Z

Hey guys,

It's your favorite noob here. I took the filtered.vcf file from the snpArcher output, pruned it for LD using plink, and now I'm trying to feed it back through the qc module to re-produce the nice QC figures you guys coded. If all else fails I can produce my own figures, but your workflow is much cleaner than anything I would write.

I changed the Snakefile a bit, as you can see below. Namely, I deleted the subsample_snps rule and adjusted the plink rule to call the right file.

The qc module runs well until it gets to producing the maps, which is the output I'm most interested in at this point. My workingdir has the config and results directories in it, and the results directory has both the data and summary_stats directories. My command line and output with the error message is below. If I comment out lines 531-573 on the qc_dashboard_interactive.Rmd file the module runs to completion, but neither map is produced.

Thank you in advance for any help you can give!
Jason

`
(snparcher) johns@ qc2 % snakemake -s /Users/johns/snpArcher/workflow/modules/qc2/Snakefile -d . --profile /Users/johns/snpArcher/profiles/default --cores 22
Using profile /Users/johns/snpArcher/profiles/default for setting default command line arguments.
Assuming unrestricted shared filesystem usage.
Building DAG of jobs...
Your conda installation is not configured to use strict channel priorities. This is however crucial for having robust and correct environments (for details, see https://conda-forge.org/docs/user/tipsandtricks.html). Please consider to configure strict priorities by executing 'conda config --set channel_priority strict'.
Using shell: /opt/homebrew/bin/bash
Provided cores: 22
Rules claiming more threads will be scaled down.
Job stats:
job count

all 1
qc_plots 1
total 2

Select jobs to execute...
Execute 1 jobs...

[Thu May 16 12:06:54 2024]
localrule qc_plots:
input: results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.eigenvec, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.eigenval, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.idepth, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.dist, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.dist.id, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.king, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.imiss, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.3.Q, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.2.Q, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia_fai_tmp.txt, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.bed, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.bim, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.fam, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia_bam_sumstats.txt, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.FILTER.summary, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.het, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.fna.fai, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.coords.txt
output: results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia_qc.html
jobid: 1
reason: Updated input files: results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.coords.txt
wildcards: refGenome=20240328.dmAquExim1.NCBI.hap2, prefix=32-Aquilegia
resources: mem_mb=2000, mem_mib=1908, disk_mb=1000, disk_mib=954, tmpdir=/var/folders/xn/7zzgchrs3n749xgsx_qxz6vm0000gt/T, mem_mb_reduced=1800

Activating conda environment: .snakemake/conda/e70e9ce0dad747af639b891b79732a30_

processing file: qc_dashboard_interactive.Rmd
|............................................... | 92% [unnamed-chunk-15]
Quitting from lines 531-575 [unnamed-chunk-15] (qc_dashboard_interactive.Rmd)
Error in is.finite():
! default method not implemented for type 'list'
Backtrace:

plotly::ggplotly(...)
plotly:::ggplotly.ggplot(...)
plotly::gg2list(...)
plotly (local) ggplotly_build(p)
base::lapply(data, ggfun("scales_transform_df"), scales = scales)
...
ggplot2 (local) f(..., self = self)
base::lapply(df[aesthetics], self$transform)
ggplot2 (local) FUN(X[[i]], ...)
ggplot2 (local) f(..., self = self)
ggplot2:::check_transformation(x, new_x, self$scale_name, axis)

Execution halted
RuleException:
CalledProcessError in file /Users/johns/snpArcher/workflow/modules/qc2/Snakefile, line 172:
Command 'source /Applications/miniforge3/bin/activate '/Users/johns/temp/ccgp_popgen/qc2/.snakemake/conda/e70e9ce0dad747af639b891b79732a30_'; set -euo pipefail; Rscript --vanilla /Users/johns/temp/ccgp_popgen/qc2/.snakemake/scripts/tmppnkqgn4z.qc_dashboard_render.R' returned non-zero exit status 1.
[Thu May 16 12:07:20 2024]
Error in rule qc_plots:
jobid: 1
input: results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.eigenvec, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.eigenval, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.idepth, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.dist, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.dist.id, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.king, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.imiss, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.3.Q, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.2.Q, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia_fai_tmp.txt, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.bed, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.bim, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.fam, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia_bam_sumstats.txt, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.FILTER.summary, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.het, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.fna.fai, results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia.coords.txt
output: results/20240328.dmAquExim1.NCBI.hap2/QC/32-Aquilegia_qc.html
conda-env: /Users/johns/temp/ccgp_popgen/qc2/.snakemake/conda/e70e9ce0dad747af639b891b79732a30_

Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: .snakemake/log/2024-05-16T120653.069723.snakemake.log
WorkflowError:
At least one job did not complete successfully.

`

Altered Snakefile:
`
configfile: "config/config.yaml"
include: "common.smk"

samples = snparcher_utils.parse_sample_sheet(config)
REFGENOME = samples['refGenome'].unique().tolist()

rule all:
input:
expand("results/{refGenome}/QC/{prefix}_qc.html", refGenome=REFGENOME, prefix=config['final_prefix'])

rule check_fai:
"""
checks fai file for numeric first column, then do not run plink and rest of workflow if they are all numeric
"""
input:
vcf = "results/{refGenome}/{prefix}_raw.vcf.gz",
fai = "results/{refGenome}/data/genome/{refGenome}.fna.fai",
output:
faiResult = "results/{refGenome}/QC/{prefix}_fai_tmp.txt"
resources:
mem_mb = lambda wildcards, attempt: attempt * 1000
run:
check_contig_names(input.fai, output.faiResult)

rule vcftools_individuals:
input:
vcf = "results/{refGenome}/{prefix}_raw.vcf.gz"
output:
depth = "results/{refGenome}/QC/{prefix}.idepth",
miss = "results/{refGenome}/QC/{prefix}.imiss",
samps = "results/{refGenome}/QC/{prefix}.samps.txt",
summ = "results/{refGenome}/QC/{prefix}.FILTER.summary",
het = "results/{refGenome}/QC/{prefix}.het"
conda:
"envs/vcftools_individuals.yml"
params:
prefix = lambda wc, input: os.path.join(input.vcf.rsplit("/", 1)[0], "QC", wc.prefix),
min_depth = config["min_depth"]
log:
"logs/{refGenome}/QC/vcftools_individuals/{prefix}.txt"
resources:
mem_mb = lambda wildcards, attempt: attempt * 4000
shell:
"""
vcftools --gzvcf {input.vcf} --FILTER-summary --out {params.prefix} &> {log}
vcftools --gzvcf {input.vcf} --out {params.prefix} --depth &>> {log}
vcftools --gzvcf {input.vcf} --out {params.prefix} --het &>> {log}
vcftools --gzvcf {input.vcf} --out {params.prefix} --missing-indv &>> {log}
tail -n +2 {output.depth} | awk '$3>{params.min_depth} {{print $1}}'> {output.samps} 2>> {log}
"""

rule plink:
"""
Call plink PCA.
"""
input:
vcf = "results/{refGenome}/QC/{prefix}_filtered.vcf.gz",
faiResult = "results/{refGenome}/QC/{prefix}_fai_tmp.txt"
params:
prefix = lambda wc, input: input.vcf.replace("_filtered.vcf.gz", "")
output:
bed = "results/{refGenome}/QC/{prefix}.bed",
bim = "results/{refGenome}/QC/{prefix}.bim",
fam = "results/{refGenome}/QC/{prefix}.fam",
eigenvec = "results/{refGenome}/QC/{prefix}.eigenvec",
eigenval = "results/{refGenome}/QC/{prefix}.eigenval",
dist = "results/{refGenome}/QC/{prefix}.dist",
distid = "results/{refGenome}/QC/{prefix}.dist.id",
king = "results/{refGenome}/QC/{prefix}.king"
conda:
"envs/plink.yml"
log:
"logs/{refGenome}/QC/plink/{prefix}.txt"
resources:
mem_mb = lambda wildcards, attempt: attempt * 2000
shell:
#plink 2 for king relatedness matrix (robust to structure) and plink 1.9 for distance matrix
"""
plink2 --vcf {input.vcf} --pca 10 --out {params.prefix} --allow-extra-chr --autosome-num 95 --make-bed --make-king square --const-fid --bad-freqs &> {log}
plink --vcf {input.vcf} --out {params.prefix} --allow-extra-chr --autosome-num 95 --distance square --const-fid &>> {log}
"""

rule setup_admixture:
"""
admixture requires all chromosome names to be integers, this sets them to be 1:n
"""
input:
bim = "results/{refGenome}/QC/{prefix}.bim",
fai = "results/{refGenome}/data/genome/{refGenome}.fna.fai",
output:
bim = "results/{refGenome}/QC/{prefix}.bim_fixed",
bim_back = "results/{refGenome}/QC/{prefix}.bim.orig"
script:
"scripts/contigs4admixture.py"

rule admixture:
"""
Call Admixture. First, make a bim file that has no charecters in the chromosomes
"""
input:
bed = "results/{refGenome}/QC/{prefix}.bed",
bim = "results/{refGenome}/QC/{prefix}.bim",
fam = "results/{refGenome}/QC/{prefix}.fam",
bim_fixed = "results/{refGenome}/QC/{prefix}.bim_fixed",
bim_back = "results/{refGenome}/QC/{prefix}.bim.orig"
output:
admix = "results/{refGenome}/QC/{prefix}.3.Q",
admix2 = "results/{refGenome}/QC/{prefix}.2.Q"
params:
outdir = lambda wc, input: input.bed.rsplit("/", 1)[0]
log:
"logs/{refGenome}/QC/admixture/{prefix}.txt"
resources:
mem_mb = lambda wildcards, attempt: attempt * 4000
conda:
"envs/admixture.yml"
shell:
"""
mv {input.bim_fixed} {input.bim} 2> {log}

    admixture {input.bed} 2 &>> {log}
    admixture {input.bed} 3 &>> {log}

    mv "{wildcards.prefix}".2.* {params.outdir} &>> {log}
    mv "{wildcards.prefix}".3.* {params.outdir} &>> {log}
    """

rule generate_coords_file:
output:
"results/{refGenome}/QC/{prefix}.coords.txt"
run:
out_df = samples.loc[(samples['refGenome'] == wildcards.refGenome)][["BioSample", "long", "lat"]]
out_df.drop_duplicates("BioSample", inplace=True)
out_df.dropna(subset=["long", "lat"], thresh=1, inplace=True)
out_df.to_csv(output[0], index=False, sep="\t", header=False)

rule qc_plots:
"""
Call plotting script
"""
input:
eigenvec = "results/{refGenome}/QC/{prefix}.eigenvec",
eigenval = "results/{refGenome}/QC/{prefix}.eigenval",
depth = "results/{refGenome}/QC/{prefix}.idepth",
dist = "results/{refGenome}/QC/{prefix}.dist",
distid = "results/{refGenome}/QC/{prefix}.dist.id",
king = "results/{refGenome}/QC/{prefix}.king",
miss = "results/{refGenome}/QC/{prefix}.imiss",
admix3 = "results/{refGenome}/QC/{prefix}.3.Q",
admix2 = "results/{refGenome}/QC/{prefix}.2.Q",
faiResult = "results/{refGenome}/QC/{prefix}_fai_tmp.txt",
bed = "results/{refGenome}/QC/{prefix}.bed",
bim = "results/{refGenome}/QC/{prefix}.bim",
fam = "results/{refGenome}/QC/{prefix}.fam",
sumstats = "results/{refGenome}/QC/{prefix}_bam_sumstats.txt",
summ = "results/{refGenome}/QC/{prefix}.FILTER.summary",
het = "results/{refGenome}/QC/{prefix}.het",
fai = "results/{refGenome}/QC/{prefix}.fna.fai",
coords = get_coords_if_available
params:
prefix = lambda wc, input: input.het[:-4],
nClusters = config['nClusters'],
GMKey = config['GoogleAPIKey']
resources:
mem_mb = lambda wildcards, attempt: attempt * 2000
output:
qcpdf = "results/{refGenome}/QC/{prefix}_qc.html"
conda:
"envs/qc.yml"
script:
"scripts/qc_dashboard_render.R"
`

The text was updated successfully, but these errors were encountered:

cademirch · 2024-05-17T16:42:06Z

Hi Jason,

Happy to help - for the future, could you please wrap your code/errors in triple back ticks to make a code block? It makes it much easier to read :)

As for the error, I'm not super sure. It may be easiest if you can supply the pruned VCF and other files so I can try to run the workflow and diagnose. Lmk if that would work for you.

jasonwjohns · 2024-05-17T17:49:47Z

Hi Cade, I will *absolutely* do that. Thank you for passing along that trick, and I'm sorry that was such a mess to read. I *think I dropped all of the necessary directories/files in this Google Drive folder <https://drive.google.com/drive/folders/1HzLVYrrwoeEovvaixlpxeBEhIKlBTeNL?usp=drive_link>. As you'll recognize, the 'qc2' directory comes from snpArcher/workflow/modules/qc2. Of course, let me know if I missed anything! Thank you so much! Jason

…

On Fri, May 17, 2024 at 9:42 AM Cade Mirchandani ***@***.***> wrote: Hi Jason, Happy to help - for the future, could you please wrap your code/errors in triple back ticks to make a code block <https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/creating-and-highlighting-code-blocks#fenced-code-blocks>? It makes it much easier to read :) As for the error, I'm not super sure. It may be easiest if you can supply the pruned VCF and other files so I can try to run the workflow and diagnose. Lmk if that would work for you. — Reply to this email directly, view it on GitHub <#195 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AIYTVHEMHI6BKMZMSW4SKL3ZCYXPHAVCNFSM6AAAAABH23OCKKVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDCMJXHE4DGNRZGY> . You are receiving this because you authored the thread.Message ID: ***@***.***>

-- *Postdoctoral Researcher* One People One Reef <https://onepeopleonereef.org/> - Bernardi Lab, UC Santa Cruz California Conservation Genomics Project <https://www.ccgproject.org/> - Hodges Lab, UC Santa Barbara

cademirch · 2024-05-17T17:55:28Z

Thanks Jason. I'll try to take a look today if I get a chance!

jasonwjohns · 2024-05-17T18:10:06Z

You’re the man! Thanks Cade, and no rush of course.JasonOn May 17, 2024, at 10:55, Cade Mirchandani ***@***.***> wrote: Thanks Jason. I'll try to take a look today if I get a chance! —Reply to this email directly, view it on GitHub, or unsubscribe.You are receiving this because you authored the thread.Message ID: ***@***.***>

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

qc html #195

qc html #195

jasonwjohns commented May 16, 2024

cademirch commented May 17, 2024

jasonwjohns commented May 17, 2024 via email

cademirch commented May 17, 2024

jasonwjohns commented May 17, 2024 via email

qc html #195

qc html #195

Comments

jasonwjohns commented May 16, 2024

cademirch commented May 17, 2024

jasonwjohns commented May 17, 2024 via email

cademirch commented May 17, 2024

jasonwjohns commented May 17, 2024 via email