
[hail] Implement concordance in Python. #6224

Merged
danking merged 4 commits into hail-is:master from tpoterba:concordance-python on May 31, 2019


@tpoterba
Contributor

This is unfortunately about 2x slower than the existing implementation, partly because
the column and global concordance calculations are not fused,
and partly because AggArrayPerElement seems quite slow right now
and is dragging down the per-sample concordance.
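For context, a concordance table counts how often each pair of genotype categories (left dataset vs. right dataset) occurs. The plain-Python sketch below is not Hail's implementation; it only illustrates one way to "fuse" the column and global calculations: a single pass fills the per-sample matrices, and the global matrix is derived by summing them instead of running a second aggregation. It assumes the five-category order described in Hail's `concordance` documentation (0 = no data, 1 = no call, 2 = hom ref, 3 = het, 4 = hom var), genotypes given as alt-allele counts with `None` for a missing call, and the same sample order in both inputs. `concordance_tables` and `category` are hypothetical names.

```python
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed category order (from Hail's concordance docs, not from this PR).
NO_DATA, NO_CALL, HOM_REF, HET, HOM_VAR = range(5)

Genotypes = Sequence[Optional[int]]  # alt-allele count per sample; None = no call
Matrix = List[List[int]]


def category(gts: Optional[Genotypes], s: int) -> int:
    """Classify sample s at one variant in one dataset."""
    if gts is None:  # variant absent from this dataset
        return NO_DATA
    n_alt = gts[s]
    if n_alt is None:  # variant present, genotype call missing
        return NO_CALL
    return (HOM_REF, HET, HOM_VAR)[n_alt]


def empty_matrix() -> Matrix:
    return [[0] * 5 for _ in range(5)]


def concordance_tables(left: Dict[str, Genotypes],
                       right: Dict[str, Genotypes],
                       n_samples: int) -> Tuple[Matrix, List[Matrix]]:
    """One pass over the union of variants fills the per-sample matrices;
    the global matrix is their element-wise sum, so no second pass is needed."""
    per_sample = [empty_matrix() for _ in range(n_samples)]
    for variant in left.keys() | right.keys():  # full outer join on variants
        l_gts, r_gts = left.get(variant), right.get(variant)
        for s in range(n_samples):
            per_sample[s][category(l_gts, s)][category(r_gts, s)] += 1
    global_conc = empty_matrix()
    for m in per_sample:
        for i in range(5):
            for j in range(5):
                global_conc[i][j] += m[i][j]
    return global_conc, per_sample


# Example: two samples; v1 is in both datasets, v2 only left, v3 only right.
left = {"v1": [0, 1], "v2": [2, None]}
right = {"v1": [0, 2], "v3": [1, 1]}
global_conc, per_sample = concordance_tables(left, right, n_samples=2)
print(global_conc[NO_DATA][HET])  # 2: v3 is missing on the left, het on the right
```

In this sketch every (variant, sample) pair lands in exactly one cell, so the global matrix here sums to 3 variants x 2 samples = 6.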

Comment thread on hail/python/hail/methods/qc.py (outdated)

def concordance2(left, right, *, _localize_global_statistics=True):
    print('conc2')

Member

The print is clearly debugging code. Is concordance2 itself debugging code too?

Contributor Author

Whoops! Yes, the whole function can be removed.

Member

@patrick-schultz left a comment

Just one thought, which you're free to ignore. Otherwise looks good.


lit = hl.literal(included, dtype=hl.tset(hl.tstr))
left = left.filter_cols(lit.contains(left.col_key[0]))
right = right.filter_cols(lit.contains(right.col_key[0]))

Member

This might be faster using semi_join_cols. I assume the semi-join is an ordered merge, which is linear, whereas this is O(n log n), since it does a binary search for each column.

Contributor Author

Semi-join can't do an ordered merge; we don't store the columns in sorted order.

Member

Oh of course. Never mind!
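The trade-off discussed in this thread can be sketched in plain Python. This is not Hail code: `filter_by_binary_search` models the reviewer's description of the set-`contains` filter (one binary search per column), and `filter_by_merge` models an ordered-merge semi-join. Both function names are hypothetical, and whether Hail's set lookup is actually a binary search is taken from the comment above, not verified here.

```python
import bisect
from typing import List


def filter_by_binary_search(cols: List[str], included_sorted: List[str]) -> List[str]:
    """One O(log m) probe per column, O(n log m) total.
    Works for columns in any order and preserves that order."""
    kept = []
    for c in cols:
        k = bisect.bisect_left(included_sorted, c)
        if k < len(included_sorted) and included_sorted[k] == c:
            kept.append(c)
    return kept


def filter_by_merge(cols: List[str], included_sorted: List[str]) -> List[str]:
    """Ordered merge with two cursors, O(n + m), but only correct
    when `cols` is itself sorted by key."""
    kept = []
    i = j = 0
    while i < len(cols) and j < len(included_sorted):
        if cols[i] == included_sorted[j]:
            kept.append(cols[i])
            i += 1  # keep j so duplicate column keys also match
        elif cols[i] < included_sorted[j]:
            i += 1
        else:
            j += 1
    return kept


included = ["s2", "s4"]  # the key set, kept sorted
print(filter_by_merge(["s1", "s2", "s3", "s4"], included))  # ['s2', 's4']
print(filter_by_binary_search(["s4", "s1", "s2"], included))  # ['s4', 's2']
print(filter_by_merge(["s4", "s1", "s2"], included))  # ['s4']: 's2' is missed
```

On unsorted columns the merge silently drops matches. Sorting the columns first would make it correct again, but that sort is itself O(n log n) and reorders the columns, so the merge loses its advantage over the per-column binary search.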

@danking merged commit 9d103e1 into hail-is:master on May 31, 2019

3 participants