Support extensions when using a connection pool #62

Closed
begelundmuller opened this issue Dec 21, 2022 · 1 comment · Fixed by #65

Comments

@begelundmuller
Contributor

begelundmuller commented Dec 21, 2022

We want to use the connection pool built into database/sql to run concurrent DuckDB queries. So far, we have disabled pooling by setting db.SetMaxOpenConns(1) because of two consistency problems:

  1. Changes made in one connection were not immediately available in other connections – that should be fixed by conn pool consistency fix #61
  2. Extensions need to be loaded on each new connection (they're not shared), but database/sql automatically creates new connections (and to my knowledge, there's no hook to do custom init)

We would like to submit a PR that addresses point 2. I can think of two different solutions:

  1. Support a custom ?extensions=json,parquet,... config syntax, which the driver intercepts. It will then call INSTALL for each specified extension when the DB is opened, and subsequently call LOAD for each extension when a new connection is created.
  2. Add an init callback option to the connector, which you can pass using sql.OpenDB – e.g. something like sql.OpenDB(duckdb.NewConnector(dsn, connInitFn)) where connInitFn will be invoked each time a new connection is created for the DB handle.

What do you think of these options – or do you have a better idea for how we can solve this problem?

As an aside, we have done some benchmarking of using multiple DuckDB connections, and achieved a 2x performance boost for processing 1k queries when using a connection pool of 10 (matching the number of cores on the system) instead of 1.

begelundmuller changed the title from "Support extensions when using a connection pool on each connection" to "Support extensions when using a connection pool" on Dec 21, 2022
@marcboeker
Owner

Interesting find that extensions are not shared with new connections. For simplicity, I would go with option 1. It is a special case, though, and could be confusing, since all DSN parameters are currently passed directly to DuckDB.
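
To illustrate the special-casing that option 1 implies, here is a rough sketch of how a driver might strip a custom `extensions` parameter from the DSN before passing the rest on to DuckDB. `splitExtensions` and `statements` are hypothetical helpers, not existing driver functions, and the sketch assumes the DSN has the form `path?key=value&...`. Extension names are not validated here, which a real implementation would need to do before building SQL from them.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// splitExtensions is a hypothetical helper: it removes the `extensions`
// parameter from a DSN and returns the remaining DSN plus the extension names.
func splitExtensions(dsn string) (string, []string, error) {
	path, rawQuery, hasQuery := strings.Cut(dsn, "?")
	if !hasQuery {
		return dsn, nil, nil
	}
	params, err := url.ParseQuery(rawQuery)
	if err != nil {
		return "", nil, fmt.Errorf("parse DSN query: %w", err)
	}
	var exts []string
	for _, name := range strings.Split(params.Get("extensions"), ",") {
		if name = strings.TrimSpace(name); name != "" {
			exts = append(exts, name)
		}
	}
	params.Del("extensions")
	if len(params) == 0 {
		return path, exts, nil
	}
	// Encode sorts the remaining keys and re-escapes their values.
	return path + "?" + params.Encode(), exts, nil
}

// statements builds one SQL statement per extension, e.g. with verb "INSTALL"
// when the DB is opened and verb "LOAD" for every new connection.
func statements(verb string, exts []string) []string {
	out := make([]string, 0, len(exts))
	for _, ext := range exts {
		out = append(out, verb+" "+ext)
	}
	return out
}
```

The remaining DSN would still go to DuckDB untouched, so `extensions` would be the only parameter the driver interprets itself.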

Option 2 seems very elegant, flexible, and extensible to me, but it looks like more work.

tl;dr: I'm fine with both 🙂

Thanks in advance. It's great to see how much you are contributing to this project.
