UnboundLocalError local variable 'sample_column' referenced before assignment #6

colindaven · 2016-02-29T08:57:16Z

Have you seen this error? Most of the output files were produced, but the VCF is empty. Could it have something to do with the lack of IDs in the VCF? I am rerunning with "--unique_ids 1 --unphased_vars 1", as the goal is gene level ASE.

Output:
out.allele_config.txt
out.allelic_counts.txt
out.haplotypes.txt
out.haplotypic_counts.txt
out.variant_connections.txt
out.vcf

out.vcf only contains a header, no data

"

FORMAT=<ID=PB,NUMBER=1,TYPE=String,Description="phASER Local Block">

FORMAT=<ID=PI,NUMBER=1,TYPE=String,Description="phASER Local Block Index (unique for each block)">

FORMAT=<ID=PW,NUMBER=1,TYPE=String,Description="phASER Genome Wide Genotype">

FORMAT=<ID=PC,NUMBER=1,TYPE=String,Description="phASER Genome Wide Confidence">

CHROM POS ID REF ALT QUAL FILTER INFO FORMAT K564

"

``python phaser.py --bam in.bam --vcf in.vcf --o test1_out --sample testsample --threads 48 --mapq 10 --baseq 10 --pass_only 0

          Welcome to phASER v0.2

Author: Stephane Castel (scastel@nygenome.org)

#1. Loading heterozygous variants into intervals...

 loading VCF into memory...
 parsing VCF...
      371607 total heterozygous variants, 0 indels excluded, 0 blacklisted variants
 creating genomic intervals...

#2. Retrieving reads that overlap heterozygous sites...

 Reads are being written to disk, this will impact performance.
 file: K564_2DB4008_Rep1_6.bam
      minimum mapq: 10
      retrieved 8418748 reads

      using alignment score cutoff of 188
      splitting reads into 84 files with 100000 reads
      assigning reads to variants...

#3. Identifying connected variants...

 sequencing noise level estimated at 0.006877
 24323 variant connections dropped because of conflicting configurations (threshold = 0.010000)
 68195 variants covered by at least 1 read

#4. Identifying haplotype blocks...
#5. Phasing blocks...

 phasing large (>15 variants) blocks...
 identifying haplotypes with most support...

#6. Outputting haplotypes...
#7. Outputting phased VCF...

Traceback (most recent call last):
File "/home/bioinformatics/NAS01/programs/phaser/phaser/phaser/phaser.py", line 1756, in <module>
main();
File "/home/bioinformatics/NAS01/programs/phaser/phaser/phaser/phaser.py", line 1012, in main
write_vcf();
File "/home/bioinformatics/NAS01/programs/phaser/phaser/phaser/phaser.py", line 1123, in write_vcf
genotype = list(vcf_columns[sample_column].split(":")[gt_index]);
UnboundLocalError: local variable 'sample_column' referenced before assignment
``


secastel · 2016-02-29T20:58:25Z

Hmm, it seems that when regenerating the VCF with phASER phasing, it couldn't find the sample specified in --sample, which is weird since it would have had to find it originally to load the heterozygous sites. I made a change which might fix this issue, but I'm not sure since I can't test it. If you wouldn't mind, could you please try again with the most recent version (v0.3, just committed)?
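For anyone reading along, here is a minimal sketch of how this kind of UnboundLocalError can be avoided in general. It is not the actual phASER code or the v0.3 fix; the function name `find_sample_column` is hypothetical, and it assumes a tab-separated `#CHROM` header line with the nine fixed VCF columns followed by sample names. The idea is to fail with a clear message when the sample is missing, instead of leaving the column index unassigned and crashing later.

```python
def find_sample_column(header_line, sample):
    """Return the column index of `sample` in a VCF #CHROM header line.

    Hypothetical helper for illustration only (not part of phASER).
    Raises ValueError if the sample is absent, so the index can never
    be used while unassigned.
    """
    columns = header_line.rstrip("\n").lstrip("#").split("\t")
    # Sample names follow the nine fixed VCF columns (CHROM .. FORMAT).
    samples = columns[9:]
    if sample not in samples:
        raise ValueError(
            "sample %r not found in VCF header; available: %s"
            % (sample, ", ".join(samples) or "none")
        )
    return 9 + samples.index(sample)


# Header line modeled on the one in the report above (tabs assumed).
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tK564"
sample_column = find_sample_column(header, "K564")
```

With a check like this, a mismatch between the --sample value and the sample names in the VCF header would surface as an explicit error naming the available samples.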

Also, a side note: I see that you are running phASER with 48 threads. This is probably too many and will result in decreased performance. The reason is that there is some overhead in splitting data into parts to be used by multiple processes. For large files the overhead is small compared to the speedup gained from parallelization, but at some point the cost of adding another process outweighs the advantage. I hope this makes sense. I'd try running with 16 threads and seeing if that decreases the runtime.
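The trade-off described above can be illustrated with a toy cost model. This is only a sketch: the function `estimated_runtime` and all of its constants are invented for illustration and are not measurements of phASER. It assumes a fixed serial cost, a parallel part that divides evenly among workers, and a per-worker overhead that grows linearly with the number of workers.

```python
def estimated_runtime(workers, parallel_work=1000.0,
                      per_worker_overhead=2.0, serial_work=50.0):
    """Toy runtime model in arbitrary units (all constants are made up).

    serial_work          : cost that does not shrink with more workers
    parallel_work        : cost that is divided among the workers
    per_worker_overhead  : cost of splitting data for each extra worker
    """
    return serial_work + parallel_work / workers + per_worker_overhead * workers


# Scan 1..48 workers and pick the count with the lowest modeled runtime.
best_workers = min(range(1, 49), key=estimated_runtime)
runtime_at_best = estimated_runtime(best_workers)
runtime_at_48 = estimated_runtime(48)
```

In this model the runtime first drops as workers are added and then rises again once the overhead term dominates, so the largest thread count is not the fastest. The real optimum depends on the data and the machine, so timing a few runs at different --threads values is the only reliable way to find it.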

secastel · 2016-03-01T04:03:53Z

One other thing, there was a bug in phaser_gene_ae.py which prevented counts from singleton variants from being included. So I would pull the latest commit and run the script again.

colindaven · 2016-03-02T13:09:10Z

Thanks for the threading tips, I have changed to 16.

The recent version 0.3 fixed this issue, thanks. I now have a VCF being successfully output without any errors. This also works fine in the next step. So, thank you.

However, it seems the data I am using is not really appropriate (fault on our side) to get really proper results (this is just the first test of many). Apparently the heterozygosity is way too low. I need to create/impute/genotype the relevant offspring and try it on them for a full test. Thanks for all your help and the nice program though.

colindaven closed this as completed Mar 2, 2016

colindaven mentioned this issue Nov 4, 2016

unsupported operand type(s) for +=: 'int' and 'str' #15

Closed

everestial mentioned this issue Jun 20, 2018

Additional Commits #47

Merged
