In LynxKite 4.2.1 I've discovered an attribute corruption bug. I've seen all values or everything except the first value disappear from an attribute. It's happening somewhat randomly, but not on small data. In my repro I'm importing a CSV with 600,000 lines, using it as a graph, and out of the 5 attributes there is a good chance that one or more will be affected.
I've tracked it to `PulledOverVertexAttribute`. It gets a good Parquet file as the input and writes out a bad Arrow file (undefined almost everywhere). I've added debug stuff in `PulledOverVertexAttribute` and saw that it sees the input attribute already as mostly undefined.

I've added debug stuff in `unordered_disk_io.go` and here I saw this in `permutation[:10]`: `[0 0 0 419538 1 419539 209908 209909 209910 2]`. That is not a permutation! Why does it have 0 three times?

We create this array by sorting an array of [0, 1, 2, 3, ...] in `sortedPermutation`. But the sort call did deviate slightly from Sorty's docs. (https://pkg.go.dev/github.com/jfcg/sorty#Sort) I didn't check `if r != s`. Seemed unnecessary I guess? What's wrong with swapping an element with itself?

But adding that `if` fixes the issue! Sorty sorts with a parallel algorithm. So I guess it's some horrible fallout from multi-threading? 🤢
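
For reference, here is a minimal sketch of the guarded callback shape plus the sanity check I would have liked to have from the start. It is not the actual LynxKite code. `sortedPermutationSketch`, `bubbleSort`, `isPermutation` and the `int64` keys are all made up for illustration, and `bubbleSort` is only a single-threaded stand-in for the sort driver so the snippet runs without Sorty. Because of that, it cannot reproduce the corruption. It only shows where the `if r != s` guard sits and how to detect a broken permutation.

```go
package main

import "fmt"

// lesswap mirrors the callback shape described in Sorty's docs:
// report whether element i sorts before element k and, if so and
// r != s, swap elements r and s. (Local type, assumed for this sketch.)
type lesswap func(i, k, r, s int) bool

// bubbleSort is a hypothetical single-threaded stand-in for the real
// sort driver, used only so this sketch runs without the library.
func bubbleSort(n int, lsw lesswap) {
	for end := n - 1; end > 0; end-- {
		for j := 0; j < end; j++ {
			// Swap neighbours j and j+1 when element j+1 sorts before element j.
			lsw(j+1, j, j, j+1)
		}
	}
}

// sortedPermutationSketch returns the indices of keys in ascending key order.
// The key type is an assumption; the real sortedPermutation may differ.
func sortedPermutationSketch(keys []int64) []int {
	perm := make([]int, len(keys))
	for i := range perm {
		perm[i] = i
	}
	lsw := func(i, k, r, s int) bool {
		if keys[perm[i]] < keys[perm[k]] {
			if r != s { // the guard that was missing
				perm[r], perm[s] = perm[s], perm[r]
			}
			return true
		}
		return false
	}
	bubbleSort(len(perm), lsw)
	return perm
}

// isPermutation reports whether p contains each of 0..len(p)-1 exactly once.
func isPermutation(p []int) bool {
	seen := make([]bool, len(p))
	for _, v := range p {
		if v < 0 || v >= len(p) || seen[v] {
			return false
		}
		seen[v] = true
	}
	return true
}

func main() {
	perm := sortedPermutationSketch([]int64{30, 10, 20})
	fmt.Println(perm, isPermutation(perm)) // [1 2 0] true

	// A slice with a repeated 0, like the corrupted prefix above, is rejected.
	fmt.Println(isPermutation([]int{0, 0, 0, 3, 1})) // false
}
```

A check like `isPermutation` is linear in the array length, so it could plausibly run as a debug assertion right after the sort.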