Ditch ordered mapping #141

darabos · 2021-02-11T18:17:28Z

The idea (from @xandrew-lynx) being that MappingToOrdered takes up a lot of memory. The tests seem to be passing locally. I haven't measured the impact on memory use yet. I also haven't thought backward compatibility entirely through.

darabos · 2021-02-11T18:57:56Z

I mean they passed before I added the assertion... 😅

darabos · 2021-02-12T19:41:37Z

I did some testing.

The red is master, blue is this PR. The memory use decreased a bit, but the runtime increased a bit more. It's just 4 runs each with 8 different random graphs with 10,000,000 vertices and 30,000,000 edges. We compute shortest path on it. (This was one of the benchmark problems I used on the 256 GB test.)

xandrew-lynx

The code looks great to me!

Unfortunately, I guess. Given the benchmark, it would have been better news if I had spotted something terrible in this... :)

I can kinda, reluctantly, accept that this can be slower, although even that I find unexpected. But how come we don't see a bigger memory impact? How did you measure memory usage?

xandrew-lynx · 2021-02-15T07:23:03Z

sphynx/lynxkite-sphynx/networkit_community_detection.go

@@ -38,12 +38,12 @@ func init() {
 			defer networkit.DeletePartition(p)
 			vs := &VertexSet{}
 			vs.MappingToUnordered = make([]int64, p.NumberOfSubsets())
-			vs.MappingToOrdered = make(map[int64]SphynxId)
+			mappingToOrdered := make(map[int64]SphynxId)
 			ss := p.GetSubsetIdsVector()
 			defer networkit.DeleteUint64Vector(ss)
 			for i := range vs.MappingToUnordered {
 				vs.MappingToUnordered[i] = int64(ss.Get(i))


Do we know that ss.Get(i) is monotonic in i? If so, let's add a comment stating this somewhere.

xandrew-lynx · 2021-02-15T07:35:14Z

sphynx/lynxkite-sphynx/strip_duplicate_edges_from_bundle.go

@@ -25,6 +25,7 @@ func doStripDuplicateEdgesFromBundle(es *EdgeBundle) *EdgeBundle {
 		uniqueBundle.EdgeMapping[i] = id


Can't we just use i instead of id here? Would that violate some contract? Then I guess we wouldn't need to sort.

xandrew-lynx · 2021-02-15T07:46:40Z

sphynx/lynxkite-sphynx/unordered_disk_io.go

+				rows[j].Src = int64(i)
+				j++
+			} else {
+				i++


Consider adding an assertion for vs1.MappingToUnordered[i] < rows[j].Src here. Or, if we are worried about the performance cost, maybe just a comment.
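For illustration, here is a minimal standalone sketch of the sorted merge with the suggested check in the else branch. It assumes both inputs are ascending. `remapSorted` and its parameters are hypothetical names, not the code in unordered_disk_io.go, and the check returns an error instead of panicking.

```go
package main

import "fmt"

// remapSorted replaces every ID in ids with its index in mapping.
// Assumptions of this sketch: mapping is ascending without duplicates,
// ids is ascending (duplicates allowed), and every ID occurs in mapping.
func remapSorted(mapping []int64, ids []int64) ([]int64, error) {
	out := make([]int64, len(ids))
	i, j := 0, 0
	for i < len(mapping) && j < len(ids) {
		switch {
		case mapping[i] == ids[j]:
			out[j] = int64(i)
			j++
		case mapping[i] < ids[j]:
			// The expected "else" case: the mapping is behind, so advance it.
			i++
		default:
			// The assertion: mapping[i] > ids[j] means the ID is missing
			// from the mapping or one of the inputs is not sorted.
			return nil, fmt.Errorf("id %d at position %d not found in mapping", ids[j], j)
		}
	}
	if j < len(ids) {
		return nil, fmt.Errorf("id %d at position %d is larger than every mapped id", ids[j], j)
	}
	return out, nil
}

func main() {
	mapping := []int64{10, 20, 30, 40}
	indices, err := remapSorted(mapping, []int64{10, 10, 30, 40})
	fmt.Println(indices, err)
	_, err = remapSorted(mapping, []int64{10, 25})
	fmt.Println(err)
}
```

The extra comparison only runs when the two IDs differ, so its cost should be small next to the reflection calls in the real loop, though that is untested here.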

xandrew-lynx · 2021-02-15T07:48:42Z

sphynx/lynxkite-sphynx/unordered_disk_io.go

+				rows[j].Dst = int64(i)
+				j++
+			} else {
+				i++


xandrew-lynx · 2021-02-15T07:55:53Z

sphynx/lynxkite-sphynx/unordered_disk_io.go

+			if vs.MappingToUnordered[i] == row.Field(idIndex).Int() {
+				values.Index(i).Set(row.Field(valueIndex))
+				defined.Index(i).Set(true)
+				j++


Same applies here: add a comment or an assertion in an else branch.

xandrew-lynx · 2021-02-15T08:00:51Z

sphynx/lynxkite-sphynx/vertex_set_intersection.go


 func doVertexSetIntersection(vertexSets []*VertexSet) (intersection *VertexSet, firstEmbedding *EdgeBundle) {
-	mergeVertices := make(MergeVertexEntrySlice, len(vertexSets[0].MappingToUnordered))
+	mergeVertices := make([]MergeVertexEntry, len(vertexSets[0].MappingToUnordered))


We could avoid this map as well via a multi-merge in a single nested loop. But no need to do it in this PR.
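A rough sketch of what such a multi-merge could look like, assuming every input slice is ascending and free of duplicates. `intersectSorted` is a hypothetical helper; it returns only the common IDs and does not build the `firstEmbedding` edge bundle that `doVertexSetIntersection` returns.

```go
package main

import "fmt"

// intersectSorted returns the IDs that occur in every input slice.
// Assumption of this sketch: each slice is ascending without duplicates.
func intersectSorted(sets [][]int64) []int64 {
	if len(sets) == 0 {
		return nil
	}
	pos := make([]int, len(sets)) // one cursor per input slice
	var result []int64
	for _, candidate := range sets[0] {
		inAll := true
		for k := 1; k < len(sets); k++ {
			// Advance cursor k past everything smaller than the candidate.
			for pos[k] < len(sets[k]) && sets[k][pos[k]] < candidate {
				pos[k]++
			}
			if pos[k] == len(sets[k]) {
				// Set k is exhausted, so no later candidate can match.
				return result
			}
			if sets[k][pos[k]] != candidate {
				inAll = false
				break
			}
		}
		if inAll {
			result = append(result, candidate)
		}
	}
	return result
}

func main() {
	sets := [][]int64{{1, 3, 5, 7}, {3, 4, 5, 8}, {0, 3, 5}}
	fmt.Println(intersectSorted(sets))
}
```

Each cursor only moves forward, so the total work is linear in the combined length of the inputs and no map is allocated.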

olahg · 2021-02-15T17:56:21Z

sphynx/lynxkite-sphynx/types.go

-	return vs.MappingToOrdered
+type VertexSet struct {
+	// This slice contains the Spark IDs in ascending order.
+	MappingToUnordered []int64


Cool! At least VertexSet will not lie about its estimated memory usage.

darabos · 2021-02-15T19:15:25Z

Thanks for the comments! (Hi Gabor!)

> I can kinda, reluctantly, accept that this can be slower, although even that I find unexpected. But how come we don't see a bigger memory impact? How did you measure memory usage?

I restarted Sphynx (but not LynxKite). Then I changed the random seed and waited for the average shortest path to get computed in this workspace:

Then I looked at "RES" in top. You can see the data I collected in the same benchmarking spreadsheet I used last week in the "no mapping" sheet.

I've set up pprof now. (Click to enlarge.)

Before	After

While this profile is talking about 1 GB, top shows 6.4 GB RES. I guess that's just fallout from memory management for GC.
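One way to look at the gap between the heap profile and what the OS reports is to read the runtime's own counters. The sketch below is not part of this PR. It compares `HeapAlloc` (bytes in allocated heap objects) with `Sys` (total bytes the Go runtime has obtained from the OS) from `runtime.MemStats`. `heapVersusOS` and `sink` are hypothetical names.

```go
package main

import (
	"fmt"
	"runtime"
)

// sink keeps the allocations below from being optimized away.
var sink []int64

// heapVersusOS returns the bytes held by allocated heap objects and the
// total bytes the Go runtime has obtained from the operating system.
func heapVersusOS() (liveHeap, fromOS uint64) {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return m.HeapAlloc, m.Sys
}

func main() {
	// Produce some garbage so that the two numbers can diverge.
	for i := 0; i < 100; i++ {
		sink = make([]int64, 1<<20)
	}
	live, fromOS := heapVersusOS()
	fmt.Printf("heap objects: %d MiB, obtained from OS: %d MiB\n", live>>20, fromOS>>20)
}
```

`Sys` is not the same thing as RES in top, but a large difference between the two counters would be consistent with the guess that the extra memory is garbage collector overhead rather than live data.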

olahg · 2021-02-16T08:31:40Z

Awesome! I don't pretend I understand all of this, but here are some remarks:

- If it's just the memory usage you're worried about, maybe you could just use the ordered mapping when you need it, and then throw it away without caching it.
- The upper 32 bits of the original ids are random, but the lower 32 bits are totally predictable. Maybe it could be somehow utilized to save more memory.
- Sorting in Go is slow. I remember experimenting with it while working on the ranking attribute operation, and a C++ version was about twice as fast as the Go version. Maybe you could just sort lazily, only those vertex sets that really need it.
- You could sort considerably faster if you implemented your own non-interface-based custom sort. The interface-based solution calls a function for every swap and every comparison, each of which does its own bounds checking for every access.
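As an illustration of the last remark, here is a minimal sketch of a sort specialized for `[]int64`, so that comparisons and swaps are plain inline operations instead of calls through `sort.Interface`. `sortInt64s` and `partition` are hypothetical names. This is a textbook quicksort with an insertion-sort cutoff, not the code that ended up in the PR.

```go
package main

import (
	"fmt"
	"sort"
)

// sortInt64s sorts a in ascending order without going through sort.Interface.
func sortInt64s(a []int64) {
	for len(a) > 12 {
		p := partition(a)
		// Recurse into the smaller part and loop on the larger one,
		// which keeps the recursion depth logarithmic.
		if p < len(a)-p {
			sortInt64s(a[:p])
			a = a[p:]
		} else {
			sortInt64s(a[p:])
			a = a[:p]
		}
	}
	// Insertion sort for short slices.
	for i := 1; i < len(a); i++ {
		v := a[i]
		j := i
		for j > 0 && a[j-1] > v {
			a[j] = a[j-1]
			j--
		}
		a[j] = v
	}
}

// partition is a Hoare partition around the middle element. It returns p
// with 0 < p < len(a) such that every element of a[:p] is <= every
// element of a[p:].
func partition(a []int64) int {
	pivot := a[(len(a)-1)/2]
	i, j := -1, len(a)
	for {
		for i++; a[i] < pivot; i++ {
		}
		for j--; a[j] > pivot; j-- {
		}
		if i >= j {
			return j + 1
		}
		a[i], a[j] = a[j], a[i]
	}
}

func main() {
	// Deterministic pseudo-random input from a small linear congruential generator.
	a := make([]int64, 100000)
	x := int64(1)
	for i := range a {
		x = (x*1103515245 + 12345) % 2147483648
		a[i] = x
	}
	sortInt64s(a)
	fmt.Println(sort.SliceIsSorted(a, func(i, j int) bool { return a[i] < a[j] }))
}
```

On Go 1.21 and later, the generic `slices.Sort` from the standard library avoids the interface calls in a similar way, so a hand-written sort is mainly of interest for older toolchains.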

darabos · 2021-03-02T14:20:34Z

Thanks for the comments, Gabor!

I checked with pprof and indeed sorting is a significant part of it:

But a large part of sorting is the reflection! (In the central area.)

I switched to a fast concurrent sort library (https://github.com/jfcg/sorty) and avoided reflection during sorting. It's much better:

I can barely see sorting on this chart now. (There's a bit on the very left.)

This is reflected in the overall speed. The slight slowdown at least is gone.
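For readers unfamiliar with concurrent sorting, the general idea can be sketched with the standard library alone: sort chunks in separate goroutines, then merge the sorted runs. This is not sorty's algorithm or API. `parallelSort` and `merge` are hypothetical names, and the merge phase here is sequential for brevity.

```go
package main

import (
	"fmt"
	"runtime"
	"sort"
	"sync"
)

// parallelSort sorts a in ascending order, sorting up to `workers` chunks
// concurrently and then merging the sorted runs pairwise.
func parallelSort(a []int64, workers int) {
	if len(a) < 2 {
		return
	}
	if workers < 1 {
		workers = 1
	}
	if workers > len(a) {
		workers = len(a)
	}
	// bounds[w] and bounds[w+1] delimit chunk w. Chunks are never empty
	// because workers <= len(a).
	bounds := make([]int, workers+1)
	for w := range bounds {
		bounds[w] = w * len(a) / workers
	}
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		chunk := a[bounds[w]:bounds[w+1]]
		wg.Add(1)
		go func() {
			defer wg.Done()
			sort.Slice(chunk, func(i, j int) bool { return chunk[i] < chunk[j] })
		}()
	}
	wg.Wait()
	// Merge adjacent runs until a single run remains.
	buf := make([]int64, len(a))
	for len(bounds) > 2 {
		next := []int{0}
		for k := 0; k+2 < len(bounds); k += 2 {
			merge(a, buf, bounds[k], bounds[k+1], bounds[k+2])
			next = append(next, bounds[k+2])
		}
		// With an odd number of runs, the last one is carried over as is.
		if last := bounds[len(bounds)-1]; next[len(next)-1] != last {
			next = append(next, last)
		}
		bounds = next
	}
}

// merge combines the sorted runs a[lo:mid] and a[mid:hi] in place,
// using buf as scratch space.
func merge(a, buf []int64, lo, mid, hi int) {
	i, j, k := lo, mid, lo
	for i < mid && j < hi {
		if a[j] < a[i] {
			buf[k] = a[j]
			j++
		} else {
			buf[k] = a[i]
			i++
		}
		k++
	}
	k += copy(buf[k:hi], a[i:mid])
	copy(buf[k:hi], a[j:hi])
	copy(a[lo:hi], buf[lo:hi])
}

func main() {
	a := []int64{9, 4, 7, 1, 8, 2, 6, 3, 5, 0}
	// Using runtime.NumCPU() as the worker count is an assumption of this sketch.
	parallelSort(a, runtime.NumCPU())
	fmt.Println(a)
}
```

Passing the worker count as a parameter, rather than hard-coding it, is in the spirit of the "More flexible parallelism instead of fixed 4" commit, although how that commit picks the number is not shown in this conversation.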

Is this good to merge? Am I missing any compatibility issues?

olahg

LGTM

xandrew-lynx

Thanks, looks great!

darabos added 5 commits February 11, 2021 16:26

No MappingToOrdered in VertexSetIntersection.

68cf602

Use Printf from log instead of fmt.

9f6a83f

Handle reading unordered files by sorting and merging instead of Mapp…

7fc14cb

…ingToOrdered.

Fix/delete NetworKit tests.

f85f115

Sort when loading, assert when saving ordered data.

950e954

darabos added 4 commits February 12, 2021 10:52

Bit clearer logging.

6d3c231

Sort edge bundle in StripDuplicateEdgesFromBundle.

3acca47

Upgrade testcontainers to fix the Neo4j test.

eb7a666

Improve "guid missing" error message.

5640955

xandrew-lynx reviewed Feb 15, 2021

olahg reviewed Feb 15, 2021

Expose pprof on a debug port.

db6e7d6

darabos added 3 commits March 2, 2021 13:56

More flexible parallelism instead of fixed 4.

2b0aa50

Use a fast concurrent sort library. (sorty)

3861bb3

Fix error.

38018b3

darabos changed the title ~~[WIP] Ditch ordered mapping~~ Ditch ordered mapping Mar 2, 2021

olahg approved these changes Mar 6, 2021

darabos added this to the LynxKite 4.3.0 milestone Mar 24, 2021

xandrew-lynx approved these changes Mar 24, 2021

darabos merged commit 6083147 into master Mar 24, 2021

darabos deleted the darabos-no-ordered-mapping branch March 24, 2021 16:08

darabos mentioned this pull request Feb 22, 2022

Data corruption in Sphynx when reading legacy data #223

Closed
