-
Notifications
You must be signed in to change notification settings - Fork 196
WIP: Personalized PageRank #669
Conversation
|
|
||
|
|
||
| List<Node> sourceNodes = configuration.get("sourceNodes", new ArrayList<>()); | ||
| Stream<Long> sourceNodeIds = sourceNodes.stream().map(Node::getId); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could use mapToLong to avoid boxing to Long instances
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have updated this now
| concurrency, | ||
| idMapping.nodeCount(), | ||
| dampingFactor, | ||
| sourceNodeIds.map(graph::toMappedNodeId).collect(Collectors.toList()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since this is Huge, we should use toHugeMappedNodeId to get the internal long id.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, a Node may not have been mapped if there was any Label/Type filtering, so maybe add a filter(mappedId -> mappedId != -1L) to remove the nodes that we don't know about.
| concurrency, | ||
| idMapping.nodeCount(), | ||
| dampingFactor, | ||
| sourceNodeIds.map(graph::toMappedNodeId).collect(Collectors.toList()), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we use the LongStream here, toArray would probably better.
| .toArray(); | ||
|
|
||
| for (long sourceNodeId : partitionSourceNodeIds) { | ||
| partitionRank[Math.toIntExact(sourceNodeId - this.startNode)] = alpha; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@knutwalker is this the correct way to find the place in the array to update?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mneedham yes, that's fine; we usually have (int) (nodeId - startNode) for local nodes or (int) (nodeId - starts[partitionIndex]) for nodes in a different partition.
|
LGTM |
* test showing just normal PageRank * PPR test WIP * wip * more wip * add calculation to huge as well * this factory doesn't seem to listen to my undirected suggestion * Fixing boxing * fix imports * fixing a bug in where we set alpha. Offset should be based on our startNode * filter unmapped nodes * updating docs to mention PPR * oops we don't need this
No description provided.