database/sql: should support a way to perform bulk actions #5171

Open
the42 opened this issue Mar 31, 2013 · 16 comments

@the42 commented Mar 31, 2013

The current implementation of database/sql doesn't provide a way to insert multiple rows at once without going to the wire on every call to

db.Exec

There are APIs elsewhere which either provide a general mechanism for bulk actions,

cf. SqlBulkCopy in ADO.NET 2.0 [1]

or have a specifier for up to how many statements should be grouped together,

cf. the batch update feature of the ADO.NET 2.0 DataAdapter object [1]

or simply support binding an array to Exec, over which Exec iterates internally, preventing excessive wire communication. [2]

[1] CodeProject, "Multiple Ways to do Multiple Inserts",
http://www.codeproject.com/Articles/25457/Multiple-Ways-to-do-Multiple-Inserts
[2] Python PEP 249, Python Database API Specification v2.0,
http://www.python.org/dev/peps/pep-0249/#executemany
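
For context, a minimal sketch of the status quo being described, using only the current database/sql API; db is an open *sql.DB, and the film struct and films table are illustrative assumptions:

type film struct {
    Code, Title, DateProd, Kind string
    Did                         int
}

// insertAll issues one db.Exec per row: every iteration is a
// separate statement on the wire, nothing is batched.
func insertAll(db *sql.DB, films []film) error {
    for _, f := range films {
        _, err := db.Exec(
            "INSERT INTO films (code, title, did, date_prod, kind) VALUES (?, ?, ?, ?, ?)",
            f.Code, f.Title, f.Did, f.DateProd, f.Kind,
        )
        if err != nil {
            return err
        }
    }
    return nil
}
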
@bradfitz (Contributor) commented Mar 31, 2013

Comment 1:

Labels changed: added priority-later, suggested, removed priority-triage.

Status changed to Accepted.

@rsc (Contributor) commented Nov 27, 2013

Comment 2:

Labels changed: added go1.3maybe.

@rsc (Contributor) commented Dec 4, 2013

Comment 3:

Labels changed: added release-none, removed go1.3maybe.

@rsc (Contributor) commented Dec 4, 2013

Comment 4:

Labels changed: added repo-main.

@kardianos (Contributor) commented Aug 19, 2014

Comment 5:

Of note, doing this well would require new database/sql API on both the driver side and the front-end side.

Having implemented several protocols that include a bulk-copy method, this needs to be called out as something different: much of the control that is available in an INSERT statement is exposed differently than in SQL.

I'm wary of any suggestion to bind to an array, as arrays are legitimate data types in several RDBMSes. I'll be implementing a general interface shortly in the rdb front end. Drop me a line if you'd like to discuss.
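
For discussion, a hypothetical sketch of what such a driver-level contract might look like; none of these names exist in database/sql/driver, they are assumptions only:

import "database/sql/driver"

// BulkCopier is a hypothetical optional interface a driver connection
// could implement to expose its protocol's bulk-load path
// (COPY, BULK INSERT, LOAD DATA INFILE, ...).
type BulkCopier interface {
    // BeginBulk starts a bulk load into table for the named columns.
    BeginBulk(table string, columns []string) (BulkWriter, error)
}

// BulkWriter streams rows to the server without one round trip per row.
type BulkWriter interface {
    // WriteRow buffers or streams a single row.
    WriteRow(row []driver.Value) error
    // Close flushes any buffered rows and completes the operation.
    Close() error
}
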
@bmharper commented Jan 27, 2015

For what it's worth - I've just tried using the lib/pq driver's Copy functionality to do bulk loading, and although I can't comment on whether this would work for other DB drivers, it seems like a reasonable API.
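
For reference, lib/pq drives COPY through a prepared statement; a sketch, with the users table and its columns as illustrative assumptions:

import (
    "database/sql"

    "github.com/lib/pq"
)

func bulkLoad(db *sql.DB, names []string, ages []int) error {
    txn, err := db.Begin()
    if err != nil {
        return err
    }
    // pq.CopyIn returns a sentinel statement that lib/pq executes
    // as COPY "users" ("name", "age") FROM STDIN.
    stmt, err := txn.Prepare(pq.CopyIn("users", "name", "age"))
    if err != nil {
        txn.Rollback()
        return err
    }
    for i := range names {
        // Each Exec streams one row; the batch is not held in memory.
        if _, err := stmt.Exec(names[i], ages[i]); err != nil {
            txn.Rollback()
            return err
        }
    }
    // A final Exec with no arguments flushes the COPY data.
    if _, err := stmt.Exec(); err != nil {
        txn.Rollback()
        return err
    }
    if err := stmt.Close(); err != nil {
        txn.Rollback()
        return err
    }
    return txn.Commit()
}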

@perillo commented Mar 7, 2016

The standard syntax for multi-value INSERT is (from PostgreSQL documentation):

INSERT INTO films (code, title, did, date_prod, kind) VALUES
    ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
    ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

What about adding a Value type that can be passed as the args argument to Exec or Query?

As an example:

type Value []interface{} // defined in the sql package

var batch []Value
for i := 0; i < N; i++ {
    batch = append(batch, Value{1, 1.3, "x"})
}
db.Exec("INSERT INTO films (code, title, did, date_prod, kind) VALUES ?", batch)

This would not require any changes to the existing interface.
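
Until something like that exists, the multi-row VALUES form can already be built by hand against the current API; a sketch, assuming ?-style placeholders and five columns per row (DEFAULT cannot be bound as a parameter, so a bindable value such as nil must stand in for it):

import (
    "database/sql"
    "strings"
)

func insertFilms(db *sql.DB, rows [][]interface{}) error {
    // Build "INSERT ... VALUES (?, ?, ?, ?, ?), (?, ?, ?, ?, ?), ..."
    // and flatten every row's values into one argument slice.
    placeholders := make([]string, 0, len(rows))
    args := make([]interface{}, 0, len(rows)*5)
    for _, r := range rows {
        placeholders = append(placeholders, "(?, ?, ?, ?, ?)")
        args = append(args, r...)
    }
    query := "INSERT INTO films (code, title, did, date_prod, kind) VALUES " +
        strings.Join(placeholders, ", ")
    _, err := db.Exec(query, args...)
    return err
}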

@kostya-sh (Contributor) commented Mar 7, 2016

This will require holding the data for the whole batch in memory. Also, this proposal doesn't allow batching updates where the database supports them.

FYI, there is an implementation of batch insert in github.com/lib/pq based on COPY.

@perillo commented Mar 7, 2016

On Mon, Mar 7, 2016 at 4:17 PM, kostya-sh notifications@github.com wrote:

> This will require holding data for the whole batch in memory.

Of course; that is required by the standard INSERT statement.

> Also this proposal doesn't allow to batch updates if supported by database.

By batch, do you mean multiple SQL statements in the same query, separated by semicolons?

I would be happy with just support for multi-value INSERT, since it is directly supported by the SQL standard (using the VALUES construct):

type Tuple []interface{}

type Values []Tuple

> FYI there is implementation of batch insert in github.com/lib/pq based on COPY.

COPY is not standard, and github.com/lib/pq seems (just looking at the API) to store the whole batch in memory.

@kostya-sh (Contributor) commented Mar 7, 2016

> By batch, do you mean multiple SQL statements in the same query, separated by semicolons?

Yes, though this would require support for multiple return values.

> COPY is not standard, and github.com/lib/pq seems (just looking at the API) to store the whole batch in memory.

COPY is indeed not standard, but it is fast, and the driver doesn't hold the whole batch in memory. Have a look at the implementation at https://github.com/lib/pq/blob/master/copy.go

I agree it would be nice to have a generic batch API, but it is quite difficult to design a single API that lets drivers choose the optimal method to implement batched operations. I think using the driver library directly is a reasonable compromise.

BTW, in PostgreSQL it is also possible to use the following SQL for bulk insert:

INSERT INTO mytable (col1, col2, col3) VALUES (unnest(?), unnest(?), unnest(?))

I haven't used it, though, and I don't know if the Go driver supports arrays.
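
If the driver is lib/pq, its pq.Array helper can bind Go slices as PostgreSQL arrays; a sketch using the SELECT ... FROM unnest(...) form and PostgreSQL's $n placeholders, with the table and column types as illustrative assumptions:

// Each pq.Array argument binds one Go slice as one array parameter;
// the multi-argument unnest zips them back into rows server-side.
_, err := db.Exec(
    `INSERT INTO mytable (col1, col2, col3)
     SELECT * FROM unnest($1::int[], $2::text[], $3::float8[])`,
    pq.Array([]int64{1, 2, 3}),
    pq.Array([]string{"a", "b", "c"}),
    pq.Array([]float64{1.1, 2.2, 3.3}),
)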

@perillo commented Mar 7, 2016

On Mon, Mar 7, 2016 at 5:47 PM, kostya-sh notifications@github.com wrote:

> [...]
>
> BTW, in PostgreSQL it is also possible to use the following SQL for bulk insert:
>
> INSERT INTO mytable (col1, col2, col3) VALUES (unnest(?), unnest(?), unnest(?))
>
> I haven't used it, though, and I don't know if the Go driver supports arrays.

This is what I was speaking about, and it is not PostgreSQL-specific but standard SQL. It does not use an array but the VALUES statement:
http://www.postgresql.org/docs/9.5/static/sql-values.html

In Go it could be defined, e.g., as:

type Tuple []interface{} // since Row and Value are already defined

type Values []Tuple

This has the advantage that a Values value can be passed as a parameter to Query or Exec without changing the sql package API.

@natemurthy commented Apr 27, 2018

@perillo I tried your method:

type Value []interface{} // defined in the sql package

var batch []Value
for i := 0; i < N; i++ {
    batch = append(batch, Value{1, 1.3, "x"})
}
db.Exec("INSERT INTO films (code, title, did, date_prod, kind) VALUES ?", batch)

but it results in the error:

     |  Error:          Expected nil, but got: &errors.errorString{s:"sql: converting argument $1 type: unsupported type []Value, a slice of slice"}

@pimbrouwers commented Sep 5, 2018

Totally beyond the scope of Go's standard library. You could always manually open a transaction, process all your inserts individually (don't worry, there's connection pooling), and commit the transaction, as sketched below. Avoiding the overhead of the implicit transaction on each iteration will be a huge win.
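
A sketch of that approach with the existing API, reusing one prepared statement inside a single explicit transaction (the films table and the film struct from the earlier sketch are illustrative):

func insertInTx(db *sql.DB, films []film) error {
    tx, err := db.Begin()
    if err != nil {
        return err
    }
    // Prepare once; every Exec below reuses the parsed statement,
    // and the work commits only once at the end.
    stmt, err := tx.Prepare(
        "INSERT INTO films (code, title, did, date_prod, kind) VALUES (?, ?, ?, ?, ?)")
    if err != nil {
        tx.Rollback()
        return err
    }
    defer stmt.Close()
    for _, f := range films {
        if _, err := stmt.Exec(f.Code, f.Title, f.Did, f.DateProd, f.Kind); err != nil {
            tx.Rollback()
            return err
        }
    }
    return tx.Commit()
}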

For accessing the bulk features of the various RDBMSes, like bcp in SQL Server for example, you can always save a CSV to disk and use exec to run the batch.

@kardianos (Contributor) commented Sep 5, 2018

@pimbrouwers I disagree. I think it would be great to create a standard bulk data interface. Yes, opening a transaction will increase the speed of many inserts. But bulk interfaces are also useful. I personally hate relying on native commands like bcp to be present on the host system.

@pimbrouwers commented Sep 6, 2018

Unless the standard interface has platform-specific adapters it won't work, because there is no adhered-to standard for bulk actions: for example, SQL Server uses BULK INSERT... and MySQL uses LOAD DATA INFILE....

As far as relying on the presence of bulk tools, I see nothing wrong with that. You physically cannot install SQL Server without bcp, which SQL Server relies upon internally for so many things. So the presence of SQL Server means bcp is also present.

@kardianos (Contributor) commented Sep 6, 2018

Yes, like query protocols, each would need a driver.

I run my app on Linux and MS SQL Server on Windows, so the app host won't have bcp installed. The app and database are often on different boxes, the app may run in a bare container, or the database may be hosted as a service.

@ALTree added the help wanted label and removed the Suggested label on Jul 1, 2019