Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hail] add contig_recoding to import_bed and import_locus_intervals #7013

Merged
merged 9 commits into from Sep 9, 2019

Conversation

@danking
Copy link
Collaborator

commented Sep 6, 2019

No description provided.

def recode_contig(x):
if contig_recoding is None:
return x
return contig_recoding[x]

This comment has been minimized.

Copy link
@jigold

jigold Sep 6, 2019

Collaborator

This should be the equivalent of .get. The dictionary of recordings doesn't have to have all contigs.

This comment has been minimized.

Copy link
@danking

danking Sep 7, 2019

Author Collaborator

Good catch, I added tests of both cases where the file has a mix of chr22 and 22.

def recode_contig(x):
if contig_recoding is None:
return x
return contig_recoding[x]

This comment has been minimized.

Copy link
@jigold

jigold Sep 6, 2019

Collaborator

See above.

reference_genome='GRCh38')
t._force_count()
self.assertTrue(t.count() == 2)
self.assertEqual(t.interval.dtype.point_type, hl.tstruct(contig=hl.tstr, position=hl.tint32))

This comment has been minimized.

Copy link
@jigold

jigold Sep 6, 2019

Collaborator

Shouldn't this type be GRCh38 locus?

This comment has been minimized.

Copy link
@danking

danking Sep 7, 2019

Author Collaborator

Fixed.

def test_import_locus_intervals_recoding(self):
interval_file = resource('annotinterall.grch38.no.chr.interval_list')
t = hl.import_locus_intervals(interval_file,
contig_recoding={str(i): f'chr{i}' for i in [*range(22), 'X', 'Y', 'MT']},

This comment has been minimized.

Copy link
@jigold

jigold Sep 6, 2019

Collaborator

I'd take out MT here. I'm not sure chrMT is a thing in GRCh38.

This comment has been minimized.

Copy link
@danking

danking Sep 7, 2019

Author Collaborator

Changed to M

with open(bed_file) as f:
for line in f:
if len(line.strip()) != 0:
try:

This comment has been minimized.

Copy link
@jigold

jigold Sep 6, 2019

Collaborator

Is this just a complicated way to count the lines in the file? I like your tests above better.

This comment has been minimized.

Copy link
@danking

danking Sep 7, 2019

Author Collaborator

I copied this from a test you added a year ago above ;). I hardcoded the expected count and added the type check.

danking added 3 commits Sep 7, 2019
fix
fix
bed = hl.import_bed(bed_file,
reference_genome='GRCh38',
contig_recoding={str(i): f'chr{i}' for i in [*range(1, 23), 'X', 'Y', 'M']})
self.assertEqual(bed._force_count(), 3)

This comment has been minimized.

Copy link
@jigold

jigold Sep 8, 2019

Collaborator

Shouldn't this be 5?

contig_recoding={str(i): f'chr{i}' for i in [*range(1, 23), 'X', 'Y', 'M']},
reference_genome='GRCh38')
t._force_count()
self.assertTrue(t.count() == 3)

This comment has been minimized.

Copy link
@jigold

jigold Sep 8, 2019

Collaborator

You can combine these two lines as you did below.

This comment has been minimized.

Copy link
@tpoterba

tpoterba Sep 8, 2019

Collaborator

also, don't use assertTrue. use assertEqual, but standard python assert t.count() == 3 is preferred with pytest

This comment has been minimized.

Copy link
@danking

danking Sep 9, 2019

Author Collaborator

I removed every occurrence of assertTrue(... == ...) from this file.

danking added 3 commits Sep 9, 2019
@danking danking dismissed jigold’s stale review Sep 9, 2019

addressed

@jigold
jigold approved these changes Sep 9, 2019
fix
@danking danking merged commit 9ba2bea into hail-is:master Sep 9, 2019
1 check passed
1 check passed
ci-test success
Details
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
3 participants
You can’t perform that action at this time.