UnboundLocalError local variable 'sample_column' referenced before assignment #6

Closed
colindaven opened this issue Feb 29, 2016 · 3 comments


@colindaven

Have you seen this error? It seems most files were produced, but the VCF is empty. Could it have something to do with the lack of IDs in the VCF? I am rerunning with `--unique_ids 1 --unphased_vars 1`, as the goal is gene-level ASE.

Output:
out.allele_config.txt
out.allelic_counts.txt
out.haplotypes.txt
out.haplotypic_counts.txt
out.variant_connections.txt
out.vcf

out.vcf contains only a header and no data:

```
##FORMAT=<ID=PB,NUMBER=1,TYPE=String,Description="phASER Local Block">
##FORMAT=<ID=PI,NUMBER=1,TYPE=String,Description="phASER Local Block Index (unique for each block)">
##FORMAT=<ID=PW,NUMBER=1,TYPE=String,Description="phASER Genome Wide Genotype">
##FORMAT=<ID=PC,NUMBER=1,TYPE=String,Description="phASER Genome Wide Confidence">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT K564
```
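A quick way to confirm that a VCF like out.vcf holds only header lines is to count lines starting with `#` against non-empty data records. This is a minimal sketch, not part of phASER; `count_vcf_lines` and `count_vcf_records` are hypothetical helper names, and gzip support is an added convenience.

```python
import gzip


def count_vcf_lines(lines):
    """Count header lines (starting with '#') and non-empty data records."""
    header_lines = 0
    data_records = 0
    for line in lines:
        if line.startswith("#"):
            header_lines += 1
        elif line.strip():
            data_records += 1
    return header_lines, data_records


def count_vcf_records(path):
    """Open a plain or gzip/bgzip-compressed VCF and count its lines."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        return count_vcf_lines(handle)
```

For example, `count_vcf_records("out.vcf")` returning a record count of 0 means the file is header-only, as described above.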

```
python phaser.py --bam in.bam --vcf in.vcf --o test1_out --sample testsample --threads 48 --mapq 10 --baseq 10 --pass_only 0

          Welcome to phASER v0.2

Author: Stephane Castel (scastel@nygenome.org)

#1. Loading heterozygous variants into intervals...

     loading VCF into memory...
     parsing VCF...
          371607 total heterozygous variants, 0 indels excluded, 0 blacklisted variants
     creating genomic intervals...

#2. Retrieving reads that overlap heterozygous sites...

     Reads are being written to disk, this will impact performance.
     file: K564_2DB4008_Rep1_6.bam
          minimum mapq: 10
          retrieved 8418748 reads

          using alignment score cutoff of 188
          splitting reads into 84 files with 100000 reads
          assigning reads to variants...

#3. Identifying connected variants...

     sequencing noise level estimated at 0.006877
     24323 variant connections dropped because of conflicting configurations (threshold = 0.010000)
     68195 variants covered by at least 1 read

#4. Identifying haplotype blocks...
#5. Phasing blocks...

     phasing large (>15 variants) blocks...
     identifying haplotypes with most support...

#6. Outputting haplotypes...
#7. Outputting phased VCF...

Traceback (most recent call last):
  File "/home/bioinformatics/NAS01/programs/phaser/phaser/phaser/phaser.py", line 1756, in <module>
    main();
  File "/home/bioinformatics/NAS01/programs/phaser/phaser/phaser/phaser.py", line 1012, in main
    write_vcf();
  File "/home/bioinformatics/NAS01/programs/phaser/phaser/phaser/phaser.py", line 1123, in write_vcf
    genotype = list(vcf_columns[sample_column].split(":")[gt_index]);
UnboundLocalError: local variable 'sample_column' referenced before assignment
```

@secastel
Owner

Hmm, it seems that when regenerating the VCF with phASER phasing, it couldn't find the sample specified in `--sample`, which is weird since it would have had to find it originally to load the heterozygous sites. I made a change that might fix this issue, but I'm not sure since I can't test it. If you wouldn't mind, could you please try again with the most recent version (v0.3, just committed)?
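The error in the traceback is the classic Python pattern where a variable is assigned only inside a conditional that never fires, so reading it later raises `UnboundLocalError`. The sketch below is hypothetical: it is not phASER's actual `write_vcf()` code, and `find_sample_column`, `genotype_buggy` and `genotype_fixed` are illustrative names. It shows how the error arises when the `--sample` name is missing from the `#CHROM` header, and how failing early with a clear message avoids it.

```python
def genotype_buggy(header_line, record_line, sample):
    """Mimics the failure: sample_column is only set if the sample matches."""
    for index, name in enumerate(header_line.rstrip("\n").split("\t")):
        if name == sample:
            sample_column = index
    vcf_columns = record_line.rstrip("\n").split("\t")
    gt_index = vcf_columns[8].split(":").index("GT")
    # Raises UnboundLocalError when `sample` never matched a header column.
    return list(vcf_columns[sample_column].split(":")[gt_index])


def find_sample_column(header_line, sample):
    """Return the 0-based index of `sample` in a tab-separated #CHROM line."""
    columns = header_line.rstrip("\n").split("\t")
    # Sample columns follow the 9 fixed columns (#CHROM ... FORMAT).
    for index in range(9, len(columns)):
        if columns[index] == sample:
            return index
    # Fail loudly here instead of hitting an UnboundLocalError later on.
    raise ValueError(f"sample {sample!r} not in VCF header; found {columns[9:]}")


def genotype_fixed(header_line, record_line, sample):
    """Same lookup, but with an explicit check for the sample column."""
    sample_column = find_sample_column(header_line, sample)
    vcf_columns = record_line.rstrip("\n").split("\t")
    gt_index = vcf_columns[8].split(":").index("GT")
    return list(vcf_columns[sample_column].split(":")[gt_index])
```

With a header whose only sample column is `K564`, `genotype_fixed(header, record, "K564")` on a record with genotype `0|1` returns `['0', '|', '1']`, while asking for any other sample name raises a `ValueError` that names the columns actually present.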

Also, a side note: I see that you are running phASER with 48 threads. This is probably too many and will result in decreased performance. The reason is that there is some overhead in splitting the data into parts to be used by multiple processes. For large files this overhead is small compared to the speedup gained from parallelization; however, at some point the cost of adding another process outweighs the benefit. I hope this makes sense. I'd try running with 16 threads and see whether that decreases the runtime.
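To make the trade-off concrete, here is a toy cost model. It is an assumption for illustration, not a measurement of phASER: runtime = serial work + divisible work / n + a fixed overhead per process × n. Under that model the runtime is smallest near n = sqrt(divisible work / overhead), and each process beyond that point makes the run slower. All numbers below are made up.

```python
import math


def estimated_runtime(threads, serial_s, parallel_s, overhead_per_thread_s):
    """Toy model: fixed serial work + divisible work + linear per-process overhead."""
    return serial_s + parallel_s / threads + overhead_per_thread_s * threads


def analytic_optimum(parallel_s, overhead_per_thread_s):
    """Continuous minimizer of parallel_s / n + overhead * n, i.e. sqrt(P / c)."""
    return math.sqrt(parallel_s / overhead_per_thread_s)


def best_thread_count(max_threads, serial_s, parallel_s, overhead_per_thread_s):
    """Integer thread count in [1, max_threads] with the lowest modelled runtime."""
    return min(
        range(1, max_threads + 1),
        key=lambda n: estimated_runtime(n, serial_s, parallel_s, overhead_per_thread_s),
    )
```

With made-up inputs of 60 s serial work, 1024 s divisible work and 4 s overhead per process, `best_thread_count(48, 60.0, 1024.0, 4.0)` picks 16, and the modelled runtime at 48 processes (about 273 s) is well above that at 16 (188 s). Real overheads depend on file size and I/O, so timing a few thread counts on your own data is the reliable check.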

@secastel
Owner

secastel commented Mar 1, 2016

One other thing, there was a bug in phaser_gene_ae.py which prevented counts from singleton variants from being included. So I would pull the latest commit and run the script again.

@colindaven
Author

Thanks for the threading tips, I have changed to 16.

Version 0.3 fixed this issue, thanks. The VCF is now output successfully without any errors, and it also works fine in the next step. Thank you.

However, it seems the data I am using is not really appropriate for getting proper results (our fault; this is just the first of many tests). Apparently the heterozygosity is far too low. I need to create/impute/genotype the relevant offspring and try it on them for a full test. Thanks for all your help and for the nice program.
