database/sql: should support a way to perform bulk actions #5171

Open
the42 opened this Issue Mar 31, 2013 · 11 comments

@the42
the42 commented Mar 31, 2013
The current implementation of database/sql doesn't provide a way to insert multiple rows
at once without going to the wire on every call to

db.Exec

Other APIs either provide a general mechanism for bulk actions

cf. SQLBulkCopy of ADO.NET 2.0 [1]

or let the caller specify how many statements should be grouped together

cf. Batch Update of ADO.NET 2.0 DataAdapter Object [1]

or simply allow an array to be bound to Exec, over which Exec iterates internally,
avoiding excessive wire communication. [2]



[1] Codeproject, "Multiple Ways to do Multiple Inserts"
http://www.codeproject.com/Articles/25457/Multiple-Ways-to-do-Multiple-Inserts
[2] Python PEP 249 -- Python Database API Specification v2.0
http://www.python.org/dev/peps/pep-0249/#executemany
@bradfitz
Member

Comment 1:

Labels changed: added priority-later, suggested, removed priority-triage.

Status changed to Accepted.

@rsc
Contributor
rsc commented Nov 27, 2013

Comment 2:

Labels changed: added go1.3maybe.

@rsc
Contributor
rsc commented Dec 4, 2013

Comment 3:

Labels changed: added release-none, removed go1.3maybe.

@rsc
Contributor
rsc commented Dec 4, 2013

Comment 4:

Labels changed: added repo-main.

@kardianos
Contributor

Comment 5:

Of note, doing this well would require new database/sql API for both the driver and the
front end code.
I have implemented several protocols that include a bulk copy method, and this needs to be
called out as different, since much of the control that is available in an Insert statement
is exposed differently than in SQL.
I'm wary of any suggestions to bind to an array, as arrays are legitimate data types in
several RDBMSs. I'll be implementing a general interface shortly in the rdb front end.
Drop me a line if you'd like to discuss.
@bmharper

For what it's worth - I've just tried using the lib/pq driver's Copy functionality to do bulk loading, and although I can't comment on whether this would work for other DB drivers, it seems like a reasonable API.

@rsc rsc added this to the Unplanned milestone Apr 10, 2015
@perillo
perillo commented Mar 7, 2016

The standard syntax for multi-value INSERT is (from PostgreSQL documentation):

INSERT INTO films (code, title, did, date_prod, kind) VALUES
    ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
    ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

What about adding a Value type that can be passed as the args argument to Exec or Query?

As an example:

type Value []interface{} // defined in the sql package

var batch []Value
for i := 0; i < N; i++ {
    batch = append(batch, Value{1, 1.3, "x"})
}
db.Exec("INSERT INTO films (code, title, did, date_prod, kind) VALUES ?", batch)

This will not require any changes to the existing interface.
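
For comparison, here is a minimal sketch of what can already be done on the caller's side today: expanding a batch into a multi-row VALUES list and a flat argument slice. The Tuple type is borrowed from the proposal and expandValues is a hypothetical helper, not part of database/sql. It assumes "?" placeholders; drivers that use numbered placeholders such as $1 would need a different format.

```go
package main

import (
	"fmt"
	"strings"
)

// Tuple is one row of values (hypothetical type, named after the proposal).
type Tuple []interface{}

// expandValues builds a "(?, ?), (?, ?)" placeholder list for rows and
// flattens the row values into a single argument slice. It returns an
// error if rows is empty, a row is empty, or the rows differ in length.
func expandValues(rows []Tuple) (string, []interface{}, error) {
	if len(rows) == 0 || len(rows[0]) == 0 {
		return "", nil, fmt.Errorf("no values to insert")
	}
	width := len(rows[0])
	group := "(" + strings.TrimSuffix(strings.Repeat("?, ", width), ", ") + ")"

	groups := make([]string, 0, len(rows))
	args := make([]interface{}, 0, len(rows)*width)
	for i, r := range rows {
		if len(r) != width {
			return "", nil, fmt.Errorf("row %d has %d values, want %d", i, len(r), width)
		}
		groups = append(groups, group)
		args = append(args, r...)
	}
	return strings.Join(groups, ", "), args, nil
}

func main() {
	rows := []Tuple{
		{"B6717", "Tampopo", 110},
		{"HG120", "The Dinner Game", 140},
	}
	placeholders, args, err := expandValues(rows)
	if err != nil {
		panic(err)
	}
	query := "INSERT INTO films (code, title, did) VALUES " + placeholders
	fmt.Println(query)
	fmt.Println(len(args), "arguments")
	// The pair would then be passed on as db.Exec(query, args...).
}
```

This sends one statement per batch, but the whole batch still has to be built in memory first, and databases typically cap the number of bound parameters per statement.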

@kostya-sh
Contributor

This will require holding data for the whole batch in memory. Also, this proposal doesn't allow batching updates if the database supports them.

FYI, there is an implementation of batch insert in github.com/lib/pq based on COPY.
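
On the memory concern, one possible mitigation is to flush in fixed-size chunks, so that only one chunk is buffered at a time. The sketch below is an illustration under assumptions, not an existing API: batcher and execFunc are hypothetical names, exec stands in for a call such as db.Exec, and "?" placeholders are assumed.

```go
package main

import (
	"fmt"
	"strings"
)

// execFunc has the shape of a call such as db.Exec, minus the sql.Result.
type execFunc func(query string, args ...interface{}) error

// batcher buffers rows and sends them through exec as one multi-row
// INSERT whenever size rows have accumulated.
type batcher struct {
	prefix string // e.g. "INSERT INTO t (a, b) VALUES "
	size   int    // number of rows per flush
	exec   execFunc
	rows   [][]interface{}
}

// Add buffers one row and flushes if the buffer is full.
func (b *batcher) Add(row ...interface{}) error {
	b.rows = append(b.rows, row)
	if len(b.rows) >= b.size {
		return b.Flush()
	}
	return nil
}

// Flush sends any buffered rows in a single statement and empties the buffer.
func (b *batcher) Flush() error {
	if len(b.rows) == 0 {
		return nil
	}
	var sb strings.Builder
	sb.WriteString(b.prefix)
	var args []interface{}
	for i, r := range b.rows {
		if i > 0 {
			sb.WriteString(", ")
		}
		sb.WriteString("(" + strings.TrimSuffix(strings.Repeat("?, ", len(r)), ", ") + ")")
		args = append(args, r...)
	}
	b.rows = b.rows[:0]
	return b.exec(sb.String(), args...)
}

func main() {
	// A stand-in for db.Exec that only reports what it would send.
	fake := func(query string, args ...interface{}) error {
		fmt.Printf("%d args: %s\n", len(args), query)
		return nil
	}
	b := &batcher{prefix: "INSERT INTO t (a, b) VALUES ", size: 2, exec: fake}
	for i := 0; i < 5; i++ {
		if err := b.Add(i, "x"); err != nil {
			panic(err)
		}
	}
	// Send the final, partially filled chunk.
	if err := b.Flush(); err != nil {
		panic(err)
	}
}
```

This trades one round trip per row for one per chunk; it does not cover batched updates.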

@perillo
perillo commented Mar 7, 2016

On Mon, Mar 7, 2016 at 4:17 PM, kostya-sh notifications@github.com wrote:

This will require holding data for the whole batch in memory.

Of course; this is required by the standard INSERT statement.

Also, this proposal doesn't allow batching updates if the database supports them.

By batch, do you mean multiple SQL statements in the same query, separated
by semicolons?

I would be happy with just support for multi-value INSERT, since it is
directly supported by the SQL standard (using the VALUES construct):

type Tuple []interface{}

type Values []Tuple

FYI, there is an implementation of batch insert in github.com/lib/pq based on COPY.

COPY is not standard, and github.com/lib/pq seems (just looking at the
API) to store the whole batch in memory.

@kostya-sh
Contributor

By batch, do you mean multiple SQL statements in the same query, separated by semicolons?

Yes, though this would require support for multiple return values.

COPY is not standard, and github.com/lib/pq seems (just looking at the API) to store the whole batch in memory.

COPY is indeed not standard, but it is fast, and the driver doesn't hold the whole batch in memory. Have a look at the implementation at https://github.com/lib/pq/blob/master/copy.go

I agree it would be nice to have a generic batch API, but it is quite difficult to design a single API that allows drivers to choose the optimal method for implementing batched operations. I think using the driver library directly is quite a good compromise.

BTW, in postgresql it is also possible to use the following SQL for bulk insert:

INSERT INTO mytable (col1, col2, col3) VALUES (unnest(?), unnest(?), unnest(?))

I haven't used it, though, and I don't know whether the Go driver supports arrays.

@perillo
perillo commented Mar 7, 2016

On Mon, Mar 7, 2016 at 5:47 PM, kostya-sh notifications@github.com wrote:

[...]

BTW, in postgresql it is also possible to use the following SQL for bulk
insert:

INSERT INTO mytable (col1, col2, col3) VALUES (unnest(?), unnest(?), unnest(?))

I haven't used it, though, and I don't know whether the Go driver supports arrays.

This is what I was talking about. It is not PostgreSQL-specific; it is
standard SQL.
It does not use an array, but the VALUES statement:
http://www.postgresql.org/docs/9.5/static/sql-values.html

In Go, it can be defined, e.g.:

type Tuple []interface{} // Since Row and Value are already defined

type Values []Tuple

This has the advantage that a Values value can be passed as a parameter
to the Query or Exec function without changing the sql package API.
