[REVIEW] Personalized Page Rank#300
[REVIEW] Personalized Page Rank#300BradReesWork merged 1 commit intorapidsai:branch-0.8from kaatish:fea-personalized-pr
Conversation
afender
left a comment
There was a problem hiding this comment.
Personalized Page Rank looks great, surgically inserted in the exiting Pagerank code and nicely exposed with the Nx-like API 👍
I'd just recommend a couple of python testing additions to cover cyber use case and the newly added "guess" feature
| * @Param[in] has_guess This parameter is used to notify cuGRAPH if it should use a user-provided initial guess. False means the user doesn't have a guess, in this case cuGRAPH will use a uniform vector set to 1/V. | ||
| * If the value is True, cuGRAPH will read the pagerank parameter and use this as an initial guess. | ||
| * The initial guess must not be the vector of 0s. Any value other than 1 or 0 is treated as an invalid value. | ||
| * @Param[in] pagerank (optional) Initial guess if has_guess=true |
There was a problem hiding this comment.
We should test the newly exposed guess support at the python level too
There was a problem hiding this comment.
Discussed on slack. Plan :
Since we run cugraph after Nx we could provide Nx output as guess to cugraph and expect to converge in 1 or 2 iterations.
| MAX_ITERATIONS = [500] | ||
| TOLERANCE = [1.0e-06] | ||
| ALPHA = [0.85] | ||
| PERSONALIZATION_PERC = [0, 10, 50] |
There was a problem hiding this comment.
I think we should add a comment explaining how these parameters impact the way the personalization is generated in networkx_call .
| raise TypeError('Shape is not square') | ||
|
|
||
| personalization = None | ||
| if personalization_perc != 0: |
There was a problem hiding this comment.
- We should explain how the personalization vector is set. I'm not completely sure about what's in there in the end.
- In cyber, some users don't need values. They just have a set of vertex they want to parametrize. We would need to add a test for that where
val[i] = 1.0/num_personalized_verticesifiis parametrized and0.0otherwise.
There was a problem hiding this comment.
- Fixed here
- That case is already tested in personalization=None
| int prsLen = 0; | ||
| GDF_REQUIRE((personalization_subset == nullptr) == (personalization_values == nullptr), GDF_INVALID_API_CALL); | ||
| if (personalization_subset != nullptr) { | ||
| has_personalization = true; |
There was a problem hiding this comment.
What happens when personalization_subset != nullptr but its size is 0? Does it run the regular PageRank?
There was a problem hiding this comment.
Good catch. Right now it will still try to run personalized page rank. This needs to be fixed to revert to normal pagerank.
cpp/src/link_analysis/pagerank.cu
Outdated
| fill(n, b, randomProbability); | ||
| if (has_personalization) { | ||
| fill(n, b, static_cast<ValueType>(0)); | ||
| scatter(prsLen, prsVal, b, prsVtx); |
There was a problem hiding this comment.
Should b sum to one from the mathematical perspective of the problem?
If so, what if the user's values don't? Should we normalize by nrm1 when it is larger or fill the rest of the vector with the correct uniform value when it is smaller? We could also return an error.
There was a problem hiding this comment.
Discussed on slack with Aatish and Haekyu. We may normalize the input.
if |Q| == 1.0: continue
else if Q == 0.0 <- uniform dist for all nodes
else Q <- Q / |Q|
BradReesWork
left a comment
There was a problem hiding this comment.
Looks great. Just need to address Alex's comments
No description provided.