@chrisvittal
Collaborator

In generated code, we keep reference genomes as static fields on container classes.

Every task wrote to this field, but it did so before the reference genome was in a good state (heal had not yet been called), so tasks could read a reference genome that was not ready for use. Reordering the two steps so that heal runs before the store fixes the issue.
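The ordering problem can be sketched in a few lines of Python. This is only an illustration, not the generated code or the actual patch: `ReferenceGenome`, `Container`, the `liftover_maps` attribute, and both setup functions are hypothetical stand-ins, and what heal restores here is an assumption made for the example. The `observe` callback simulates another task reading the shared field between the two steps.

```python
class ReferenceGenome:
    """Hypothetical stand-in for a reference genome that needs healing."""

    def __init__(self, name):
        self.name = name
        self.liftover_maps = None  # assumed state that is missing until heal() runs

    def heal(self):
        # Assumption for this sketch: heal() restores the missing state.
        self.liftover_maps = {}

    def is_usable(self):
        return self.liftover_maps is not None


class Container:
    """Hypothetical stand-in for a generated container class."""

    rg = None  # shared "static" field, visible to every task


def setup_store_then_heal(rg, observe):
    Container.rg = rg  # published before it is in a good state
    observe()          # a concurrent task may read the field here
    rg.heal()


def setup_heal_then_store(rg, observe):
    rg.heal()          # bring the object into a good state first
    observe()          # a concurrent task may read the field here
    Container.rg = rg  # only a healed object is ever published


def saw_bad_state(setup):
    """Return True if a reader could observe a published but unhealed object."""
    Container.rg = None
    seen = []

    def observe():
        rg = Container.rg
        seen.append(rg is not None and not rg.is_usable())

    setup(ReferenceGenome("GRCh38"), observe)
    return seen[0]


print(saw_bad_state(setup_store_then_heal))  # True
print(saw_bad_state(setup_heal_then_store))  # False
```

With the store first, a reader can see the object before heal has run; with heal first, the shared field never holds an unhealed object.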

closes #10722

@chrisvittal
Collaborator Author

Tested using modified code from Lindsay Liang on Zulip:

import hail as hl

hl.init(log='hail.log')

# Register a GRCh37 -> GRCh38 liftover chain file.
rg37 = hl.get_reference('GRCh37')
rg38 = hl.get_reference('GRCh38')
rg37.add_liftover('gs://hail-common/references/grch37_to_grch38.over.chain.gz', rg38)

# Lift the gnomAD exomes sites table over to GRCh38 and re-key it by the new locus.
gnomad_ht = hl.read_table('gs://gcp-public-data--gnomad/release/2.1.1/ht/exomes/gnomad.exomes.r2.1.1.sites.ht')
gnomad_ht = gnomad_ht.annotate(new_locus=hl.liftover(gnomad_ht.locus, 'GRCh38'))
gnomad_ht = gnomad_ht.key_by(locus=gnomad_ht.new_locus, alleles=gnomad_ht.alleles)

# Simulate a GRCh38 dataset and annotate it from the lifted-over table.
mt = hl.balding_nichols_model(3, 100, 10_000, reference_genome='GRCh38')
mt = mt.annotate_entries(AD=hl.zeros(hl.len(mt.alleles)))
mt = mt.annotate_rows(
    gnomad_non_neuro_AF=gnomad_ht.index(mt.row_key).freq[hl.eval(gnomad_ht.freq_index_dict["non_neuro"])].AF)
mt = mt.annotate_entries(
    pAB=hl.or_missing(mt.GT.is_het(),
                      hl.binom_test(mt.AD[1], hl.sum(mt.AD), 0.5, 'two-sided')))

# Force evaluation of every row so that all tasks run.
mt._force_count_rows()

This faithfully reproduced the issue. Before the fix, pretty much every task failed at least once because it read bad state; with the fix, no tasks failed.

Development

Successfully merging this pull request may close these issues.

[query] Liftover NoSuchElementException