mash matrix #66

Closed
wants to merge 34 commits into from

kloetzl (Contributor) commented Oct 26, 2017

I was frustrated with all the existing ways of producing a tree from mash distances, so I created an extra command: mash matrix. As the name suggests, given a set of sequences it produces a distance matrix (see issue #9).

Note that I tailored it specifically to my needs. This includes outputting sequence names (compare issue #13) and removing a bunch of "features" I don't care about, such as p-values and static libcapnp; reading sketches might also be broken. You might have to cherry-pick or adapt some of the changes. Also, you will have to extend the documentation.

This new tool is bloody fast: it takes just 47 seconds to compute a distance matrix for all ~2600 E. coli genomes from Ensembl.

Previously, at every position, all 21 or so characters forming the kmer were checked for validity, so each valid character was checked over and over again. I turned this into a strictly linear algorithm.

kloetzl (Contributor, Author) commented Jul 8, 2018

Whoops, I should not have based the PR on the master branch. It was only supposed to go up to c885abf. (You could also pull 08fb829 while you're at it.)

Using a global PARAMETERS object now.
Thanks to @sebhtml for correcting me here. We now only hash the kmer with the lower lexicographic ordering. This also reduces the total number of kmers hashed.

kloetzl (Contributor, Author) commented Mar 18, 2019

I am closing this PR now. It has become too much of a mess for too little gain, now that there is a mash triangle command. I might open pull requests for individual changes later on.

kloetzl closed this Mar 18, 2019
2 participants