Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial DataChunk API support #233

Merged
merged 5 commits into from
Jun 15, 2024

Conversation

taniabogatsch
Copy link
Collaborator

This PR introduces a new DataChunk API. See the discussion here. Ideally, all interaction between go-duckdb and duckdb's data chunks eventually goes through this API without compromising performance. As an outlook, in #219, I further used this API for go-duckdb's scanner.

As a first step, this PR moves all interaction between the Appender and data chunks to the DataChunk API and its setter functions, further decreasing the complexity of the Appender code.

This PR adds the following functions:

  • func (chunk *DataChunk) InitFromTypes(ptr unsafe.Pointer, types []C.duckdb_logical_type) error
  • func (chunk *DataChunk) Destroy()
  • func (chunk *DataChunk) SetSize() error
  • func (chunk *DataChunk) SetValue(colIdx int, rowIdx int, val any) error

Additionally, I reverted the column index in the data chunk when returning an unsupported type error message to 0-based indexing.

Questions

  • I introduced a new helper.go file. However, I recommend renaming this file or moving its functions to a different place. What do you think?
  • The DataChunk, and by implication, the vector structs are becoming universal components without ties to the Appender. Thus, I renamed appender_vector.go to vector.go. Also, I added a new file, vector_setters.go, to create some functionality separation and decrease file sizes. Is this a good approach?

@taniabogatsch taniabogatsch added enhancement New feature or request feature [feature] request or PR labels Jun 7, 2024
@ajzo90
Copy link
Contributor

ajzo90 commented Jun 8, 2024

Nice! A few suggestions on the public interface.

Perhaps InitFromTypes and SetSize should be private methods (for now)? InitFromTypes because it's using internal arguments and SetSize since it's likely not necessary to expose publicly?

Destroy can also be private until we have cases where it needs to be public. If/when it goes public I think the signature should be Close() error to follow go naming conventions.

@taniabogatsch
Copy link
Collaborator Author

@ajzo90, good call, I've made them private. Do you have any other suggestions?

@marcboeker marcboeker merged commit 8eaf120 into marcboeker:main Jun 15, 2024
4 checks passed
@JAicewizard
Copy link
Contributor

Small question, is there a reason projection support wasn't added? If not could this still be added via another PR? and if yet should we make a new type to support projection?

@JAicewizard
Copy link
Contributor

JAicewizard commented Jun 16, 2024

Also for table functions SetSize is must be exposed, as a tablefunction might not want to fill the entire chunk, I will expose this in my PR. (Same for GetSize)

Copy link
Contributor

@JAicewizard JAicewizard left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I am just reviewing this now, I wasn't aware of the PR and didn't have any time to review this week. In the case these need to be fixed, maybe a fix PR would be appropriate?

vector.go Show resolved Hide resolved
data_chunk.go Show resolved Hide resolved
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request feature [feature] request or PR ready to merge
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants