Skip to content

[query] fix bug in new dict decoder#13939

Merged
danking merged 1 commit intohail-is:mainfrom
patrick-schultz:dict-decoder-bugfix
Oct 30, 2023
Merged

[query] fix bug in new dict decoder#13939
danking merged 1 commit intohail-is:mainfrom
patrick-schultz:dict-decoder-bugfix

Conversation

@patrick-schultz
Copy link
Member

The decoder uses a StagedArrayBuilder to hold elements while being sorted. The array builder is stored in a class field. When the same decoder function is called more than once, that array builder is reused.

Before this fix, the array builder was never cleared, so if the decoder function was called more than once, the array builder would still contain the elements from previously decoded dicts.

Since it's highly non-obvious that you need to call clear immediately after new StagedCodeBuilder, this PR makes the constructor take a CodeBuilder, and always inserts a clear at the call site. I also took the opportunity to CodeBuilderify the rest of the interface.

@danking
Copy link
Contributor

danking commented Oct 28, 2023

It still seems like a foot gun that I can’t use two staged code builders at once. Like, if I was implementing a “partition” function that produced two arrays from one, I might naively try to use two staged array builders. Are you saying they’ll actually be the same array builder?

@patrick-schultz
Copy link
Member Author

patrick-schultz commented Oct 29, 2023

Using multiple staged array builders is fine. It's using a staged array builder in a function (or loop) which is called multiple times that's the issue.

Calling new StagedArrayBuilder(...) before only took a method builder, not a code builder. It created a new field on the containing class to hold the runtime array builder, and added code to the class constructor to initialize it. So if the generated code using that array builder is executed multiple times, it will be reusing the same array builder object in the same class field. But calling new StagedArrayBuilder multiple times will create multiple array builder objects in multiple fields.

The fix here is for the StagedArrayBuilder constructor to take a code builder, and to insert code to initialize the array builder to an empty state in the actual function, not just the containing class constructor.

@danking
Copy link
Contributor

danking commented Oct 30, 2023

@patrick-schultz OK, I understand now. Can you ping Ed or find someone else to review? Let's get this in post-haste.

@danking danking merged commit 28e56f7 into hail-is:main Oct 30, 2023
@ehigham
Copy link
Member

ehigham commented Oct 30, 2023

I'm a little puzzled by this solution. I feel that we shouldn't be modifying class composition in method calls; the structure of our code seems somewhat upside down. If anything the StagedArrayBuilder should only take a class builder.

@patrick-schultz patrick-schultz deleted the dict-decoder-bugfix branch January 2, 2025 13:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants