-
-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Panther algorithm per #3849 #3886
Conversation
Hello @mrecachinas! Thanks for updating this PR.
Please install and run psf/black. Comment last updated at 2020-09-05 14:56:05 UTC |
There is only a reason to keep a Thanks for this! I think there could be demand for Panther++. Let me think about where small tweaks -- typo "ictionary" in docs for |
This also adds an example in the docstring for generate_random_paths and fixes the relevant tests.
@dschult Fixed the typo and moved Let me know if there's anything else. In the meantime, I'll finish my implementation of Panther++. |
Change generate_random_paths tests with new API for generate_random_paths
Any news guys? Will it be released in the next version? |
@dschult Think I addressed everything. Let me know if there's anything else you would like changed! Also, I'll submit another PR for Panther++ once I finish it. |
@mrecachinas great job! Waiting for the panther++ algorithm also... |
Quick question: In the docs, delta is: What are R and phi? You can answer here-- but if you prefer, you can add more to that doc string. |
@dschult according to the original Panther article, ref: Zhang, J., Tang, J., Ma, C., Tong, H., Jing, Y., & Li, J. (2015). Panther: Fast Top-k Similarity Search on Large Networks. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1445–1454. https://doi.org/10.1145/2783258.2783267 |
Good Thanks! But what is S... ? I don't think these symbols are defined in the doc_string.... only in the article. |
Sorry - S is just the similarity adjacency matrix. Updating the docstring. |
@jarrodmillman Let me know if I've addressed your change requests. |
@mrecachinas Sorry for the delay. Ross or I will take a look very soon! We are planning to release 2.6 soonish and I want to get this in first. Thanks for your patience. Despite our slow response time, we would also like to get Panther++ in soon. So I hope you are still interested in contributing it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I went ahead an made some small touch-ups in rossbar/networkx@0973646c. I tried to push these to your branch (to spare you the spam via the github suggestion feature) but was unable to do so, likely because it seems like your fork originated from somewhere other than networkx/networkx
. I wasn't able to PR against your fork either, so if you could apply those changes via cherry-pick that'd be great. If you're not comfortable doing so, you can either:
- Add rossbar as a collaborator to your fork and I'll open a PR against your fork or
- We just put this in and I can make a follow-on PR
Generally this LGTM - I'm not very familiar with the algorithm so my opinion shouldn't be weighted too heavily but the implementation seems good! I too was struck by the question of whether there is a better place for generate_random_paths
, but I don't have any ideas and I don't think that should be a sticking point. Thanks @mrecachinas !
@jarrodmillman I'll dust off and finish the Panther++ I started a few months ago (but I'll put it in a separate PR). @rossbar Ahh, I think I forked someone else's fork for a previous PR (adding SimRank, IIRC), so yeah that's probably the issue. I'll cherry-pick if you're good with it. |
* MAINT: use pre-defined local variable.
* Add initial pass at panther algorithm * Fix example imports and references * Fix pep8 issues * Remove unnecessary compare in panther conditional * Fix doctest failure in generate_random_paths * Fix n_choose_k when n == k * Add tests for panther, n_choose_k, generate_random_paths * Clean up panther_similarity docstring * Fix panther and generate_random_paths to return node names * Fix pep8 issue in similarity.py * Fix typo in generate_random_paths docstring * Add small microoptimizations in calculating transition probs * Handle k > n in n_choose_k * Change generate_random_paths to accept optional index_map vs return it This also adds an example in the docstring for generate_random_paths and fixes the relevant tests. * Rename v to source in panther_similarity * Change doc for c in panther_similarity * Change generate_random_paths to return generator instead of list Change generate_random_paths tests with new API for generate_random_paths * Increase random path sample size in docstring example * Fix docstring example for generate_random_paths * Update similarity.py per suggestions - Remove `n_choose_k` from `__all__` - Compute `1 / sample_size` once, and use that value in the loop - Replace `setdefault` with if-block * Add panther_similarity funcs to similarity.rst * Remove nx import from docstring in similarity.py * Fix typo in generate_random_paths indexmap -> index_map * Remove nx prefix from n_choose_k docstring examples * Fix unit test for n_choose_k by importing it directly * Add more context to panther docstring * Rename n_choose_k to _n_choose_k and add deprecation note * * DOC: minor documentation touch-ups. * MAINT: use pre-defined local variable. * Run black on test_similarity.py * Remove second setdefault in generate_random_paths * Optimize panther using the inverted vertex-path index map * Fix variable shadowing in panther similarity * Remove debug printing in panther_similarity Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
This addresses #3849.
A couple remaining questions:
generate_random_paths
? It’s a generic Markovian random walk that could be used elsewhere. Alternatively, if this already exists or there's a more efficient approach, I'm all ears.numpy
for a bit of this. Is there a desire to have a pure Python version as well (likesimrank_similarity
vssimrank_similarity_numpy
)?n_choose_k
be elsewhere or perhaps should we usescipy.special.binom
instead?scipy.spatial
)?