New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Ditch ordered mapping #141
Conversation
I mean they passed before I added the assertion... 😅 |
The red is |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The code looks great to me!
Unfortunately, I guess. Given the benchmark it would have been a better news if I spotted something terrible in this... :)
I can kinda, reluctantly, accept that this can be slower, although even that I find unexpected. But how come we don't see a bigger memory impact? How did you measure memory usage?
@@ -38,12 +38,12 @@ func init() { | |||
defer networkit.DeletePartition(p) | |||
vs := &VertexSet{} | |||
vs.MappingToUnordered = make([]int64, p.NumberOfSubsets()) | |||
vs.MappingToOrdered = make(map[int64]SphynxId) | |||
mappingToOrdered := make(map[int64]SphynxId) | |||
ss := p.GetSubsetIdsVector() | |||
defer networkit.DeleteUint64Vector(ss) | |||
for i := range vs.MappingToUnordered { | |||
vs.MappingToUnordered[i] = int64(ss.Get(i)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we know that ss.Get(i)
is monotonous is i
? If so, let's add a comment stating this somewhere.
@@ -25,6 +25,7 @@ func doStripDuplicateEdgesFromBundle(es *EdgeBundle) *EdgeBundle { | |||
uniqueBundle.EdgeMapping[i] = id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we just use i
instead of id
here? Would that violate some contract? Then I guess we wouldn't need to sort.
rows[j].Src = int64(i) | ||
j++ | ||
} else { | ||
i++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding an assertion for vs1.MappingToUnordered[i] < rows[j].Src
here. Or if we are worried it has performance price then maybe just a comment.
rows[j].Dst = int64(i) | ||
j++ | ||
} else { | ||
i++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here.
if vs.MappingToUnordered[i] == row.Field(idIndex).Int() { | ||
values.Index(i).Set(row.Field(valueIndex)) | ||
defined.Index(i).Set(true) | ||
j++ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same applies here: add a comment or an assertion in an else branch.
|
||
func doVertexSetIntersection(vertexSets []*VertexSet) (intersection *VertexSet, firstEmbedding *EdgeBundle) { | ||
mergeVertices := make(MergeVertexEntrySlice, len(vertexSets[0].MappingToUnordered)) | ||
mergeVertices := make([]MergeVertexEntry, len(vertexSets[0].MappingToUnordered)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could avoid this map as well via a multi-merge in a single nested loop. But no need to it in this PR.
return vs.MappingToOrdered | ||
type VertexSet struct { | ||
// This slice contains the Spark IDs in ascending order. | ||
MappingToUnordered []int64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! At least VertexSet
will not lie about its estimated memory usage.
Thanks for the comments! (Hi Gabor!)
I restarted Sphynx (but not LynxKite). Then I changed the random seed and waited for the average shortest path to get computed in this workspace: Then I looked at "RES" in I've set up
While this profile is talking about 1 GB, |
Awesome! I don't pretend I understand all of this, but here's some remarks:
|
Thanks for the comments, Gabor! I checked with But a large part of sorting is the reflection! (In the central area.) I switched to a fast concurrent sort library (https://github.com/jfcg/sorty) and avoided reflection during sorting. It's much better: I can barely see sorting on this chart now. (There's a bit on the very left.) This is reflected in the overall speed. The slight slowdown at least is gone. Is this good to merge? Am I missing any compatibility issues? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, looks great!
The idea (from @xandrew-lynx) being that
MappingToOrdered
takes up a lot of memory. The tests seem to be passing locally. I haven't measured the impact on memory use yet. I also haven't thought backward compatibility entirely through.