Add Panther algorithm per #3849 #3886

mrecachinas · 2020-03-31T23:03:28Z

This addresses #3849.

A couple remaining questions:

Is there a better home for generate_random_paths? It’s a generic Markovian random walk that could be used elsewhere. Alternatively, if this already exists or there's a more efficient approach, I'm all ears.
I used numpy for a bit of this. Is there a desire to have a pure Python version as well (like simrank_similarity vs simrank_similarity_numpy)?
Should n_choose_k be elsewhere or perhaps should we use scipy.special.binom instead?
Do we want Panther++ added (requires a KD-tree, which we can use from scipy.spatial)?
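For context on the first question, a generic Markovian random walk can be sketched in a few lines of pure Python. This is a minimal illustration only: the helper name `random_walks`, the adjacency-dict input, and the uniform choice of start node and neighbor are assumptions, not the implementation in this PR.

```python
import random


def random_walks(adj, sample_size, path_length=5, seed=None):
    """Yield `sample_size` unweighted random walks over an adjacency dict.

    Hypothetical helper for illustration; not the function added in this PR.
    """
    rng = random.Random(seed)
    nodes = list(adj)
    for _ in range(sample_size):
        node = rng.choice(nodes)  # uniform random start node
        path = [node]
        for _ in range(path_length):
            neighbors = adj[node]
            if not neighbors:  # dead end: stop this walk early
                break
            # Markov property: the next node depends only on the current one.
            node = rng.choice(neighbors)
            path.append(node)
        yield path


# Small undirected graph given as node -> list of neighbors.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
walks = list(random_walks(adj, sample_size=4, path_length=3, seed=0))
```

Because nothing here is specific to similarity search, a helper of this shape could plausibly live in a more general module, which is the point of the question.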

pep8speaks · 2020-03-31T23:03:32Z

Hello @mrecachinas! Thanks for updating this PR.

In the file networkx/algorithms/similarity.py:

Line 1460:1: W293 blank line contains whitespace
Line 1516:89: E501 line too long (92 > 88 characters)
Line 1517:54: W605 invalid escape sequence '\p'
Line 1518:47: W605 invalid escape sequence '\s'
Line 1642:89: E501 line too long (98 > 88 characters)

Please install and run psf/black.

Comment last updated at 2020-09-05 14:56:05 UTC

dschult · 2020-05-20T01:47:17Z

n_choose_k is nicely expressed in Python 3.8+ as math.comb(n, k).
But we probably need to keep something around until we stop supporting Python 3.7.

There is only a reason to keep a numpy version and plain python if there is an advantage to each approach. For example, the algorithm used might be different with one using matrix manipulations and one using graph search techniques, or similar.

Thanks for this! I think there could be demand for Panther++. Let me think about where generate_random_paths might fit better.

small tweaks -- typo "ictionary" in docs for generate_random_paths. And could the example do something with random_path to show that it has two outputs and what someone might want to do with them? Perhaps you could return just the paths and have an optional argument index_map that could be filled if supplied. ??

mrecachinas · 2020-05-20T11:09:36Z

@dschult Fixed the typo and moved index_map to be an optional kwarg that, if provided, will be populated. I also added an example in the docstring illustrating that.

Let me know if there's anything else. In the meantime, I'll finish my implementation of Panther++.
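The "optional kwarg that gets populated" pattern described above can be sketched as follows. This is a simplified stand-in that takes precomputed paths; the function name `paths_with_index` is hypothetical, and the inverted index shape (node mapped to a set of path indices) is an assumption based on this thread.

```python
def paths_with_index(paths, index_map=None):
    """Yield each path; if `index_map` is given, fill it as a side effect.

    `index_map` maps node -> set of indices of the paths containing it.
    Hypothetical helper for illustration only.
    """
    for path_idx, path in enumerate(paths):
        if index_map is not None:
            for node in path:
                if node in index_map:
                    index_map[node].add(path_idx)
                else:
                    index_map[node] = {path_idx}
        yield path


index_map = {}
paths = list(paths_with_index([[0, 1, 2], [2, 3], [1, 0]], index_map=index_map))

# Look up every path that visits node 2 without scanning all paths.
paths_with_node_2 = [paths[i] for i in sorted(index_map[2])]
```

Note that in this sketch the function is a generator, so `index_map` is only fully populated after the paths have been consumed.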

storopoli · 2020-09-02T07:54:47Z

Any news guys? Will it be released in the next version?

mrecachinas · 2020-09-03T17:22:12Z

@dschult Think I addressed everything. Let me know if there's anything else you would like changed!

Also, I'll submit another PR for Panther++ once I finish it.

storopoli · 2020-09-03T17:31:51Z

@mrecachinas great job! Waiting for the panther++ algorithm also...

dschult · 2020-09-03T19:00:53Z

Quick question: In the docs, delta is:
The probability that 𝑆 is not an epsilon-approximation to (R, phi)

What are R and phi?

You can answer here-- but if you prefer, you can add more to that doc string.

storopoli · 2020-09-03T19:47:08Z

@dschult according to the original Panther article, $R$ is the number of random paths and $\phi$ is the probability that an element is sampled from a set A $\subseteq$ D, where D is the domain.

ref: Zhang, J., Tang, J., Ma, C., Tong, H., Jing, Y., & Li, J. (2015). Panther: Fast Top-k Similarity Search on Large Networks. Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 1445–1454. https://doi.org/10.1145/2783258.2783267

dschult · 2020-09-03T20:19:55Z

Good Thanks!

But what is S... ? I don't think these symbols are defined in the doc_string.... only in the article.
The code refers to "S", but it is an array of values, so it is hard to see how it could be epsilon-close to a 2-tuple (R, phi)...

mrecachinas · 2020-09-04T00:14:21Z

Sorry - S is just the similarity adjacency matrix. Updating the docstring.
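For readers following the epsilon/delta discussion, here is a minimal sketch of how the two parameters might translate into a number of random paths. It assumes the bound has the form R = (c / eps^2) * (log2(C(T, 2)) + 1 + ln(1 / delta)), with T the path length, as described in the Panther article cited above. The function name and the default values are illustrative assumptions, not confirmed by this thread.

```python
import math


def panther_sample_size(path_length, eps, delta=0.1, c=0.5):
    """Number of random paths R for error bound `eps` and failure prob. `delta`.

    Assumed form of the bound from the Panther article; hypothetical helper.
    Requires Python 3.8+ for math.comb.
    """
    t_choose_2 = math.comb(path_length, 2)
    return int((c / eps**2) * (math.log2(t_choose_2) + 1 + math.log(1 / delta)))


# A smaller delta (less tolerated chance of a bad estimate) needs more paths.
loose = panther_sample_size(path_length=5, eps=0.1, delta=0.1)
strict = panther_sample_size(path_length=5, eps=0.1, delta=0.01)
```

Read this way, delta is the tolerated probability that the sampled similarity estimate falls outside the epsilon error bound, so lowering either parameter increases the sample size.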

mrecachinas · 2020-11-23T21:50:09Z

@jarrodmillman Let me know if I've addressed your change requests.

jarrodmillman · 2020-11-24T21:20:50Z

@mrecachinas Sorry for the delay. Ross or I will take a look very soon! We are planning to release 2.6 soonish and I want to get this in first. Thanks for your patience.

Despite our slow response time, we would also like to get Panther++ in soon. So I hope you are still interested in contributing it!

rossbar

I went ahead and made some small touch-ups in rossbar/networkx@0973646c. I tried to push these to your branch (to spare you the spam via the github suggestion feature) but was unable to do so, likely because it seems like your fork originated from somewhere other than networkx/networkx. I wasn't able to PR against your fork either, so if you could apply those changes via cherry-pick that'd be great. If you're not comfortable doing so, you can either:

Add rossbar as a collaborator to your fork and I'll open a PR against your fork or
We just put this in and I can make a follow-on PR

Generally this LGTM - I'm not very familiar with the algorithm so my opinion shouldn't be weighted too heavily but the implementation seems good! I too was struck by the question of whether there is a better place for generate_random_paths, but I don't have any ideas and I don't think that should be a sticking point. Thanks @mrecachinas !

mrecachinas · 2020-11-25T01:48:19Z

@jarrodmillman I'll dust off and finish the Panther++ I started a few months ago (but I'll put it in a separate PR).

@rossbar Ahh, I think I forked someone else's fork for a previous PR (adding SimRank, IIRC), so yeah that's probably the issue. I'll cherry-pick if you're good with it.

* Add initial pass at panther algorithm
* Fix example imports and references
* Fix pep8 issues
* Remove unnecessary compare in panther conditional
* Fix doctest failure in generate_random_paths
* Fix n_choose_k when n == k
* Add tests for panther, n_choose_k, generate_random_paths
* Clean up panther_similarity docstring
* Fix panther and generate_random_paths to return node names
* Fix pep8 issue in similarity.py
* Fix typo in generate_random_paths docstring
* Add small microoptimizations in calculating transition probs
* Handle k > n in n_choose_k
* Change generate_random_paths to accept optional index_map vs return it
  This also adds an example in the docstring for generate_random_paths and fixes the relevant tests.
* Rename v to source in panther_similarity
* Change doc for c in panther_similarity
* Change generate_random_paths to return generator instead of list
  Change generate_random_paths tests with new API for generate_random_paths
* Increase random path sample size in docstring example
* Fix docstring example for generate_random_paths
* Update similarity.py per suggestions
  - Remove `n_choose_k` from `__all__`
  - Compute `1 / sample_size` once, and use that value in the loop
  - Replace `setdefault` with if-block
* Add panther_similarity funcs to similarity.rst
* Remove nx import from docstring in similarity.py
* Fix typo in generate_random_paths
  indexmap -> index_map
* Remove nx prefix from n_choose_k docstring examples
* Fix unit test for n_choose_k by importing it directly
* Add more context to panther docstring
* Rename n_choose_k to _n_choose_k and add deprecation note
* DOC: minor documentation touch-ups.
* MAINT: use pre-defined local variable.
* Run black on test_similarity.py
* Remove second setdefault in generate_random_paths
* Optimize panther using the inverted vertex-path index map
* Fix variable shadowing in panther similarity
* Remove debug printing in panther_similarity

Co-authored-by: Ross Barnowski <rossbar@berkeley.edu>
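Two of the items above (computing `1 / sample_size` once and using the inverted vertex-path index) can be illustrated with a small path co-occurrence sketch. This is an assumption about the general technique, not the merged code: the function name `cooccurrence_similarity` is hypothetical, and counting each node at most once per path is a simplification chosen here.

```python
def cooccurrence_similarity(paths, source):
    """Score nodes by the fraction of sampled paths they share with `source`.

    Hypothetical helper; a simplified stand-in for a Panther-style estimate.
    """
    inv_sample_size = 1 / len(paths)  # computed once, reused in the loop

    # Inverted index: node -> set of indices of the paths that contain it.
    index_map = {}
    for path_idx, path in enumerate(paths):
        for node in set(path):
            if node in index_map:
                index_map[node].add(path_idx)
            else:
                index_map[node] = {path_idx}

    # Only visit paths that contain `source` instead of scanning every path.
    scores = {}
    for path_idx in index_map.get(source, ()):
        for node in set(paths[path_idx]):
            if node != source:
                scores[node] = scores.get(node, 0.0) + inv_sample_size
    return scores


paths = [[0, 1, 2], [2, 3], [1, 0], [3, 2, 1]]
scores = cooccurrence_similarity(paths, source=1)
```

The index lookup keeps the work proportional to the number of paths through the source node, which is the motivation for the optimization.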

Add initial pass at panther algorithm

8179832

mrecachinas and others added 6 commits April 1, 2020 00:23

Fix example imports and references

0ab7d04

Fix pep8 issues

8e41ffb

Remove unnecessary compare in panther conditional

3ec8fd2

Fix doctest failure in generate_random_paths

7aa75d9

Fix n_choose_k when n == k

ce2b246

Add tests for panther, n_choose_k, generate_random_paths

5ff061f

mrecachinas changed the title ~~[WIP] Add Panther algorithm per #3849~~ Add Panther algorithm per #3849 Apr 1, 2020

Michael Recachinas and others added 3 commits April 1, 2020 16:46

Clean up panther_similarity docstring

896f562

Fix panther and generate_random_paths to return node names

998939a

Fix pep8 issue in similarity.py

19ba08b

mrecachinas mentioned this pull request May 19, 2020

Panther Algorithm for Fast top-k Similarity Search #3849

Closed

Michael Recachinas added 4 commits May 20, 2020 11:25

Fix typo in generate_random_paths docstring

3008229

Add small microoptimizations in calculating transition probs

cdfae0b

Handle k > n in n_choose_k

041b480

Change generate_random_paths to accept optional index_map vs return it

1983c2c

This also adds an example in the docstring for generate_random_paths and fixes the relevant tests.

dschult reviewed Jul 5, 2020

dschult reviewed Jul 5, 2020

dschult reviewed Jul 5, 2020

dschult reviewed Jul 5, 2020

mrecachinas added 6 commits July 5, 2020 19:48

Rename v to source in panther_similarity

81e2307

Change doc for c in panther_similarity

221da98

Change generate_random_paths to return generator instead of list

49302de

Change generate_random_paths tests with new API for generate_random_paths

Increase random path sample size in docstring example

b201fe8

Merge branch 'master' into feature/panther

799ca8c

Fix docstring example for generate_random_paths

3b3b285

mrecachinas added 3 commits September 3, 2020 09:54

Fix typo in generate_random_paths

fc3190d

indexmap -> index_map

Remove nx prefix from n_choose_k docstring examples

634e5bd

Fix unit test for n_choose_k by importing it directly

a432b46

dschult added the type: Enhancements label Sep 3, 2020

dschult added this to the networkx-2.6 milestone Sep 3, 2020

Add more context to panther docstring

cb037f9

jarrodmillman requested changes Sep 5, 2020

Rename n_choose_k to _n_choose_k and add deprecation note

6835d0d

dschult approved these changes Sep 5, 2020

mrecachinas requested a review from jarrodmillman September 7, 2020 04:06

rossbar self-requested a review November 24, 2020 07:16

rossbar approved these changes Nov 25, 2020

rossbar and others added 6 commits November 24, 2020 21:04

* DOC: minor documentation touch-ups.

7d58f60

* MAINT: use pre-defined local variable.

Run black on test_similarity.py

ab78371

Remove second setdefault in generate_random_paths

afa74f7

Optimize panther using the inverted vertex-path index map

e8bddb5

Fix variable shadowing in panther similarity

6e4e353

Remove debug printing in panther_similarity

deff1c3

jarrodmillman merged commit a1cad29 into networkx:master Nov 28, 2020

mrecachinas mentioned this pull request Nov 29, 2020

[WIP] Add first pass at panther++ #4400

Open

5 tasks
