[REVIEW] Personalized Page Rank #300

Merged: BradReesWork merged 1 commit into rapidsai:branch-0.8 from kaatish:fea-personalized-pr on May 31, 2019

@kaatish (Collaborator) commented May 24, 2019

No description provided.

@kaatish changed the title from "Personalized Page Rank" to "[WIP] Personalized Page Rank" on May 24, 2019
@kaatish changed the title from "[WIP] Personalized Page Rank" to "[REVIEW] Personalized Page Rank" on May 24, 2019
@afender self-requested a review on May 24, 2019 at 18:38
@afender (Member) left a comment

Personalized PageRank looks great: it is surgically inserted into the existing PageRank code and nicely exposed through the Nx-like API 👍

I'd just recommend a couple of Python test additions to cover the cyber use case and the newly added "guess" feature.

* @Param[in] has_guess This parameter is used to notify cuGRAPH if it should use a user-provided initial guess. False means the user doesn't have a guess, in this case cuGRAPH will use a uniform vector set to 1/V.
* If the value is True, cuGRAPH will read the pagerank parameter and use this as an initial guess.
* The initial guess must not be the vector of 0s. Any value other than 1 or 0 is treated as an invalid value.
* @Param[in] pagerank (optional) Initial guess if has_guess=true
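The doc comment above defines how the starting vector is chosen. As a rough illustration only, here is a Python sketch of those rules; `initial_pagerank_vector` is a hypothetical helper, not part of cuGraph, and it simply rejects an all-zeros guess as the comment requires:

```python
from typing import List, Optional


def initial_pagerank_vector(num_vertices: int,
                            has_guess: bool,
                            guess: Optional[List[float]] = None) -> List[float]:
    """Hypothetical helper mirroring the documented has_guess semantics."""
    if num_vertices <= 0:
        raise ValueError("graph must have at least one vertex")
    if not has_guess:
        # No user guess: start from the uniform vector, every entry 1/V.
        return [1.0 / num_vertices] * num_vertices
    if guess is None or len(guess) != num_vertices:
        raise ValueError("guess must have exactly one value per vertex")
    if all(x == 0.0 for x in guess):
        # The doc comment forbids the vector of 0s as an initial guess.
        raise ValueError("initial guess must not be the zero vector")
    # Otherwise use the caller's values as the starting point.
    return list(guess)


print(initial_pagerank_vector(4, False))
```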
Member

We should also test the newly exposed guess support at the Python level.

Member

Discussed on Slack. Plan:
Since we run cuGraph after Nx, we could pass the Nx output to cuGraph as the initial guess and expect it to converge in 1 or 2 iterations.
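The warm-start idea can be illustrated with a small pure-Python power iteration that stands in for both libraries (no NetworkX or cuGraph calls are made; `pagerank` and its `start` argument are hypothetical names). If the starting vector is already converged, the first update moves it by at most alpha times the previous step, so the warm run stops after a single iteration:

```python
from typing import Dict, List, Optional, Tuple


def pagerank(adj: Dict[int, List[int]], n: int, alpha: float = 0.85,
             tol: float = 1.0e-6, max_iter: int = 500,
             start: Optional[List[float]] = None) -> Tuple[List[float], int]:
    """Power-iteration PageRank on vertices 0..n-1.

    adj maps a vertex to its out-neighbours; vertices with no out-edges are
    dangling and spread their rank uniformly. Returns (ranks, iterations).
    """
    # Initial vector: the caller's guess if given, otherwise uniform 1/n.
    x = list(start) if start is not None else [1.0 / n] * n
    for it in range(1, max_iter + 1):
        dangling = sum(x[v] for v in range(n) if not adj.get(v))
        # Teleport term plus the rank redistributed from dangling vertices.
        nxt = [(1.0 - alpha) / n + alpha * dangling / n] * n
        for u, outs in adj.items():
            for w in outs:
                nxt[w] += alpha * x[u] / len(outs)
        err = sum(abs(a - b) for a, b in zip(nxt, x))  # L1 change
        x = nxt
        if err < n * tol:  # L1 test scaled by n, like NetworkX's pagerank
            return x, it
    return x, max_iter


graph = {0: [1, 2], 1: [2], 2: [0], 3: [2]}
cold, cold_iters = pagerank(graph, 4)              # stand-in for the Nx run
warm, warm_iters = pagerank(graph, 4, start=cold)  # reuse it as the guess
print(cold_iters, warm_iters)
```

In the real test, the NetworkX result would play the role of `cold`; because the two libraries may differ slightly in tolerance handling, "1 or 2 iterations" is the safer expectation there.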

MAX_ITERATIONS = [500]
TOLERANCE = [1.0e-06]
ALPHA = [0.85]
PERSONALIZATION_PERC = [0, 10, 50]
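The body of `networkx_call` is not shown here, so the following is only a hypothetical sketch of one plausible way a percentage like PERSONALIZATION_PERC could drive the personalization: 0 disables it, otherwise that percentage of vertices is sampled and given random positive weights. `make_personalization` and its sampling rule are assumptions; the returned dict has the node-to-weight shape that NetworkX's `pagerank` accepts for `personalization`.

```python
import random
from typing import Dict, List, Optional


def make_personalization(vertices: List[int], perc: int,
                         seed: int = 0) -> Optional[Dict[int, float]]:
    """Hypothetical generator: perc% of vertices get random positive weights."""
    if perc == 0:
        return None  # no personalization: plain PageRank
    rng = random.Random(seed)  # seeded so the test is reproducible
    count = max(1, int(len(vertices) * perc / 100.0))
    chosen = rng.sample(vertices, count)
    # Small offset keeps every sampled weight strictly positive.
    return {v: rng.random() + 1e-3 for v in chosen}


print(make_personalization(list(range(10)), 50))
```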
Member

I think we should add a comment explaining how these parameters affect the way the personalization is generated in networkx_call.

Collaborator Author

Fixed here

raise TypeError('Shape is not square')

personalization = None
if personalization_perc != 0:
Member

  1. We should explain how the personalization vector is set. I'm not completely sure what it contains in the end.
  2. In cyber, some users don't need values. They just have a set of vertices they want to personalize. We should add a test for that case, where val[i] = 1.0/num_personalized_vertices if vertex i is personalized and 0.0 otherwise.
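The value-less cyber case in point 2 reduces to a dense vector where each of the k chosen vertices gets 1/k. A minimal sketch; `uniform_personalization` is a hypothetical helper, not a cuGraph function:

```python
from typing import Iterable, List


def uniform_personalization(n: int, subset: Iterable[int]) -> List[float]:
    """val[i] = 1/k if vertex i is in the subset (k vertices), else 0.0."""
    chosen = set(subset)  # de-duplicate so the weights still sum to 1
    if not chosen:
        raise ValueError("personalization subset must be non-empty")
    if any(v < 0 or v >= n for v in chosen):
        raise ValueError("vertex id out of range")
    weight = 1.0 / len(chosen)
    return [weight if i in chosen else 0.0 for i in range(n)]


print(uniform_personalization(5, [1, 3]))
```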

Collaborator Author

  1. Fixed here
  2. That case is already tested in personalization=None

int prsLen = 0;
GDF_REQUIRE((personalization_subset == nullptr) == (personalization_values == nullptr), GDF_INVALID_API_CALL);
if (personalization_subset != nullptr) {
has_personalization = true;
Member

What happens when personalization_subset != nullptr but its size is 0? Does it run the regular PageRank?

Collaborator Author

Good catch. Right now it still tries to run personalized PageRank. This needs to be fixed so that it falls back to regular PageRank.
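The requested fallback can be expressed as a small guard. This is a Python rendering with hypothetical names, not the actual C++ fix: it keeps the "both arrays or neither" rule from the GDF_REQUIRE check and treats an empty subset as regular PageRank.

```python
from typing import Optional, Sequence, Tuple


def resolve_personalization(subset: Optional[Sequence[int]],
                            values: Optional[Sequence[float]]) -> Tuple[bool, int]:
    """Return (has_personalization, prs_len) for the given inputs."""
    # Same rule as the GDF_REQUIRE check: both arrays or neither.
    if (subset is None) != (values is None):
        raise ValueError("subset and values must both be given or both omitted")
    if subset is None:
        return False, 0
    if len(subset) != len(values):
        raise ValueError("subset and values must have the same length")
    prs_len = len(subset)
    # An empty subset falls back to regular (non-personalized) PageRank.
    return prs_len > 0, prs_len


print(resolve_personalization([], []))
```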

Collaborator Author

Fixed here

fill(n, b, randomProbability);
if (has_personalization) {
fill(n, b, static_cast<ValueType>(0));
scatter(prsLen, prsVal, b, prsVtx);
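The `fill`/`scatter` sequence above builds the teleport vector `b`. A Python rendering, assuming `randomProbability` is 1/n (its definition is not shown in this excerpt); `teleport_vector` and its parameters are hypothetical names:

```python
from typing import List, Optional, Sequence


def teleport_vector(n: int,
                    prs_vtx: Optional[Sequence[int]] = None,
                    prs_val: Optional[Sequence[float]] = None) -> List[float]:
    random_probability = 1.0 / n  # assumed value of randomProbability
    b = [random_probability] * n  # fill(n, b, randomProbability)
    if prs_vtx:  # has_personalization
        if prs_val is None or len(prs_val) != len(prs_vtx):
            raise ValueError("need one value per personalized vertex")
        b = [0.0] * n  # fill(n, b, 0)
        for v, val in zip(prs_vtx, prs_val):
            b[v] = val  # scatter(prsLen, prsVal, b, prsVtx)
    return b


print(teleport_vector(4, [1, 3], [0.2, 0.8]))
```

Note that `b` only sums to one if the user's values do, which is exactly the question raised in the next comment.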
Member

Should b sum to one from the mathematical perspective of the problem?
If so, what if the user's values don't? Should we normalize by nrm1 when it is larger or fill the rest of the vector with the correct uniform value when it is smaller? We could also return an error.

Member

Discussed on Slack with Aatish and Haekyu. We may normalize the input as follows:

if |Q| == 1.0: use Q unchanged
else if |Q| == 0.0: Q <- uniform distribution over all nodes (1/V each)
else: Q <- Q / |Q|
where |Q| is the L1 norm of the personalization vector.
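The normalization rule above, sketched in Python under the assumption that Q is the dense personalization vector over all n vertices; `normalize_personalization` is a hypothetical name:

```python
from typing import List


def normalize_personalization(q: List[float]) -> List[float]:
    """Apply the agreed rule: keep, replace with uniform, or divide by |Q|."""
    norm = sum(abs(x) for x in q)  # |Q|, the L1 norm
    if norm == 1.0:
        # Exact comparison as in the rule; such vectors are kept as-is.
        return list(q)
    if norm == 0.0:
        # All-zero input: uniform distribution over all nodes.
        return [1.0 / len(q)] * len(q)
    return [x / norm for x in q]


print(normalize_personalization([1.0, 3.0]))
```

Dividing by |Q| when it is already very close to 1 would be harmless, so a tolerance-based check is an equally reasonable design choice.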

Collaborator Author

Fixed here

@BradReesWork (Member) left a comment

Looks great. Just need to address Alex's comments.

@BradReesWork merged commit 7c14d69 into rapidsai:branch-0.8 on May 31, 2019

3 participants