Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[hail] add contig_recoding to import_bed and import_locus_intervals #7013

Merged
merged 9 commits into from Sep 9, 2019

Conversation

@danking
Copy link
Collaborator

@danking danking commented Sep 6, 2019

No description provided.

def recode_contig(x):
if contig_recoding is None:
return x
return contig_recoding[x]
Copy link
Collaborator

@jigold jigold Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be the equivalent of .get. The dictionary of recordings doesn't have to have all contigs.

Copy link
Collaborator Author

@danking danking Sep 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, I added tests of both cases where the file has a mix of chr22 and 22.

def recode_contig(x):
if contig_recoding is None:
return x
return contig_recoding[x]
Copy link
Collaborator

@jigold jigold Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See above.

reference_genome='GRCh38')
t._force_count()
self.assertTrue(t.count() == 2)
self.assertEqual(t.interval.dtype.point_type, hl.tstruct(contig=hl.tstr, position=hl.tint32))
Copy link
Collaborator

@jigold jigold Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this type be GRCh38 locus?

Copy link
Collaborator Author

@danking danking Sep 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

def test_import_locus_intervals_recoding(self):
interval_file = resource('annotinterall.grch38.no.chr.interval_list')
t = hl.import_locus_intervals(interval_file,
contig_recoding={str(i): f'chr{i}' for i in [*range(22), 'X', 'Y', 'MT']},
Copy link
Collaborator

@jigold jigold Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd take out MT here. I'm not sure chrMT is a thing in GRCh38.

Copy link
Collaborator Author

@danking danking Sep 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to M

with open(bed_file) as f:
for line in f:
if len(line.strip()) != 0:
try:
Copy link
Collaborator

@jigold jigold Sep 6, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this just a complicated way to count the lines in the file? I like your tests above better.

Copy link
Collaborator Author

@danking danking Sep 7, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I copied this from a test you added a year ago above ;). I hardcoded the expected count and added the type check.

bed = hl.import_bed(bed_file,
reference_genome='GRCh38',
contig_recoding={str(i): f'chr{i}' for i in [*range(1, 23), 'X', 'Y', 'M']})
self.assertEqual(bed._force_count(), 3)
Copy link
Collaborator

@jigold jigold Sep 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be 5?

contig_recoding={str(i): f'chr{i}' for i in [*range(1, 23), 'X', 'Y', 'M']},
reference_genome='GRCh38')
t._force_count()
self.assertTrue(t.count() == 3)
Copy link
Collaborator

@jigold jigold Sep 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You can combine these two lines as you did below.

Copy link
Collaborator

@tpoterba tpoterba Sep 8, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, don't use assertTrue. use assertEqual, but standard python assert t.count() == 3 is preferred with pytest

Copy link
Collaborator Author

@danking danking Sep 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed every occurrence of assertTrue(... == ...) from this file.

@danking danking dismissed jigold’s stale review Sep 9, 2019

addressed

jigold
jigold approved these changes Sep 9, 2019
@danking danking merged commit 9ba2bea into hail-is:master Sep 9, 2019
1 check passed
@danking danking deleted the foo branch Dec 18, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

3 participants