Fix attribute corruption #176

Merged
merged 2 commits into from May 6, 2021

@darabos darabos commented Apr 30, 2021

In LynxKite 4.2.1 I've discovered an attribute corruption bug. I've seen either all values, or all values except the first, disappear from an attribute. It happens somewhat randomly, but not on small data. In my repro I'm importing a CSV with 600,000 lines and using it as a graph. Out of the 5 attributes there is a good chance that one or more will be affected.

I've tracked it to PulledOverVertexAttribute. It gets a good Parquet file as the input and writes out a bad Arrow file (undefined almost everywhere). I've added debug stuff in PulledOverVertexAttribute and saw that it already sees the input attribute as mostly undefined.

I've added debug stuff in unordered_disk_io.go and here I saw this in permutation[:10]: [0 0 0 419538 1 419539 209908 209909 209910 2]. That is not a permutation! Why does it have 0 three times?
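A check like the following could catch this kind of corruption early. It is only a sketch: isPermutation is a hypothetical helper, not something that exists in LynxKite, and it assumes the permutation is a plain []int covering the indices 0 to len-1.

```go
package main

import "fmt"

// isPermutation (hypothetical helper) reports whether p contains
// every index in [0, len(p)) exactly once.
func isPermutation(p []int) bool {
	seen := make([]bool, len(p))
	for _, v := range p {
		// Out-of-range or repeated values mean p is not a permutation.
		if v < 0 || v >= len(p) || seen[v] {
			return false
		}
		seen[v] = true
	}
	return true
}

func main() {
	fmt.Println(isPermutation([]int{2, 0, 1}))    // prints true
	fmt.Println(isPermutation([]int{0, 0, 0, 1})) // prints false: 0 is repeated
}
```

The check is linear in the length of the array, so it would be cheap enough to run in a debug build even on the 600,000-line repro.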

We create this array by sorting an array of [0, 1, 2, 3, ...] in sortedPermutation. But the sort call deviated slightly from Sorty's docs (https://pkg.go.dev/github.com/jfcg/sorty#Sort): I didn't check r != s before swapping. It seemed unnecessary, I guess? What's wrong with swapping an element with itself?

But adding that if fixes the issue! Sorty sorts with a parallel algorithm. So I guess it's some horrible fallout from multi-threading? 🤢

@xandrew-lynx xandrew-lynx left a comment

Sorry I missed this earlier. Looks good of course, thanks for tracking it down!

All this - but even the API itself... - really makes me wonder how on earth sorty does its magic!

@darabos darabos merged commit 6acea40 into master May 6, 2021
@darabos darabos deleted the darabos-attribute-corruption-fix branch May 6, 2021 16:28