database/sql: should support a way to perform bulk actions #5171

Open
the42 opened this Issue Mar 31, 2013 · 12 comments

the42 commented Mar 31, 2013

The current implementation of database/sql doesn't provide a way to insert multiple rows
at once without going to the wire on every call to

db.Exec

There are APIs elsewhere that either provide a general mechanism for bulk actions

cf. SqlBulkCopy of ADO.NET 2.0 [1]

or have a setting for how many statements should be grouped together

cf. Batch Update of the ADO.NET 2.0 DataAdapter object [1]

or simply support binding an array to Exec, over which Exec iterates internally,
preventing excessive wire communication. [2]



[1] Codeproject, "Multiple Ways to do Multiple Inserts"
http://www.codeproject.com/Articles/25457/Multiple-Ways-to-do-Multiple-Inserts
[2] Python PEP 249 -- Python Database API Specification v2.0
http://www.python.org/dev/peps/pep-0249/#executemany

bradfitz Mar 31, 2013

Member

Comment 1:

Labels changed: added priority-later, suggested, removed priority-triage.

Status changed to Accepted.


rsc Nov 27, 2013

Contributor

Comment 2:

Labels changed: added go1.3maybe.


rsc Dec 4, 2013

Contributor

Comment 3:

Labels changed: added release-none, removed go1.3maybe.


rsc Dec 4, 2013

Contributor

Comment 4:

Labels changed: added repo-main.


kardianos Aug 19, 2014

Contributor

Comment 5:

Of note, doing this well would require new database/sql API for both the driver and the
front-end code.
Having implemented several protocols that include a bulk copy method, I think this needs to be
called out as different, as much of the control that is available in an INSERT statement
is exposed differently than in SQL.
I'm wary of any suggestion to bind to an array, as arrays are legitimate data types in
several RDBMSs. I'll be implementing a general interface shortly in the rdb front end.
Drop me a line if you'd like to discuss.

bmharper Jan 27, 2015

For what it's worth - I've just tried using the lib/pq driver's Copy functionality to do bulk loading, and although I can't comment on whether this would work for other DB drivers, it seems like a reasonable API.


perillo Mar 7, 2016

The standard syntax for multi-value INSERT is (from PostgreSQL documentation):

INSERT INTO films (code, title, did, date_prod, kind) VALUES
    ('B6717', 'Tampopo', 110, '1985-02-10', 'Comedy'),
    ('HG120', 'The Dinner Game', 140, DEFAULT, 'Comedy');

What about adding a Value type that can be passed as the args argument to Exec or Query?

As an example:

type Value []interface{} // would be defined in the sql package

var batch []Value
for i := 0; i < N; i++ {
    batch = append(batch, Value{1, 1.3, "x"}) // placeholder row values
}
db.Exec("INSERT INTO films (code, title, did, date_prod, kind) VALUES ?", batch)

This will not require any changes to the existing interface.


kostya-sh Mar 7, 2016

Contributor

This will require holding the data for the whole batch in memory. Also, this proposal doesn't allow batching updates, where the database supports that.

FYI, there is an implementation of batch insert in github.com/lib/pq based on COPY.
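
One way to bound the memory held at any moment is to buffer a fixed number of rows and flush them as they fill up. The following is a minimal sketch of that idea, not an API from database/sql or lib/pq: the `Batcher` type, `NewBatcher`, and the flush callback are hypothetical names introduced only for illustration.

```go
package main

import "fmt"

// Batcher is a hypothetical helper (not part of database/sql). It buffers
// rows and hands them to flush once size rows have accumulated, so at most
// size rows are held in memory at a time.
type Batcher struct {
	size  int
	buf   [][]interface{}
	flush func(rows [][]interface{}) error
}

// NewBatcher returns a Batcher that flushes every size rows (minimum 1).
func NewBatcher(size int, flush func(rows [][]interface{}) error) *Batcher {
	if size < 1 {
		size = 1
	}
	return &Batcher{size: size, buf: make([][]interface{}, 0, size), flush: flush}
}

// Add buffers one row and flushes when the buffer is full.
func (b *Batcher) Add(row ...interface{}) error {
	b.buf = append(b.buf, row)
	if len(b.buf) < b.size {
		return nil
	}
	return b.Flush()
}

// Flush passes any buffered rows to the callback and resets the buffer.
// The buffer is reused, so the callback must not retain the slice.
func (b *Batcher) Flush() error {
	if len(b.buf) == 0 {
		return nil
	}
	err := b.flush(b.buf)
	b.buf = b.buf[:0]
	return err
}

func main() {
	var sizes []int
	b := NewBatcher(2, func(rows [][]interface{}) error {
		// A real callback might execute one multi-row INSERT here.
		sizes = append(sizes, len(rows))
		return nil
	})
	for i := 0; i < 5; i++ {
		if err := b.Add(i, "x"); err != nil {
			panic(err)
		}
	}
	if err := b.Flush(); err != nil { // send the final partial batch
		panic(err)
	}
	fmt.Println(sizes) // prints [2 2 1]
}
```

The trade-off is that a failure partway through leaves earlier batches already sent, so callers that need all-or-nothing behavior would presumably wrap the whole run in a transaction.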


perillo Mar 7, 2016

On Mon, Mar 7, 2016 at 4:17 PM, kostya-sh notifications@github.com wrote:

> This will require holding data for the whole batch in memory.

Of course; this is required by the standard INSERT statement.

> Also this proposal doesn't allow to batch updates if supported by database.

By batch, do you mean multiple SQL statements in the same query, separated
by semicolons?

I would be happy with just support for multi-value INSERT, since it is
directly supported by the SQL standard (using the VALUES construct):

type Tuple []interface{}

type Values []Tuple

> FYI there is implementation of batch insert in github.com/lib/pq based on
> COPY.

COPY is not standard, and github.com/lib/pq seems (just looking at the
API) to store the whole batch in memory.


kostya-sh Mar 7, 2016

Contributor

> By batch, do you mean multiple SQL statements in the same query, separated by semicolon?

Yes, though this would require supporting multiple return values.

> COPY is not standard, and the github.com/lib/pq seems (just looking at the API) to store the whole batch in memory.

COPY is indeed not standard, but it is fast and the driver doesn't hold the whole batch in memory. Have a look at the implementation at https://github.com/lib/pq/blob/master/copy.go

I agree it would be nice to have a generic batch API, but it is quite difficult to design a single API that lets each driver choose the optimal method for implementing batched operations. I think using the driver library directly is quite a good compromise.

BTW, in PostgreSQL it is also possible to use the following SQL for bulk insert:

INSERT INTO mytable (col1, col2, col3) VALUES (unnest(?), unnest(?), unnest(?))

I haven't used it, though, and I don't know whether the Go driver supports arrays.
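
An unnest-style statement takes one array per column rather than one tuple per row, so row-oriented data has to be transposed first. The sketch below shows only that reshaping step. `toColumns` is a hypothetical helper, and whether a given driver accepts the resulting slices as array parameters is driver-specific and not established in this thread.

```go
package main

import "fmt"

// toColumns is a hypothetical helper (not part of database/sql). It
// transposes row-oriented data into one slice per column, the shape an
// unnest-style INSERT expects. Every row must have exactly ncols values.
func toColumns(rows [][]interface{}, ncols int) ([][]interface{}, error) {
	cols := make([][]interface{}, ncols)
	for c := range cols {
		cols[c] = make([]interface{}, 0, len(rows))
	}
	for i, row := range rows {
		if len(row) != ncols {
			return nil, fmt.Errorf("toColumns: row %d has %d values, want %d", i, len(row), ncols)
		}
		for c, v := range row {
			cols[c] = append(cols[c], v)
		}
	}
	return cols, nil
}

func main() {
	rows := [][]interface{}{
		{"B6717", "Tampopo", 110},
		{"HG120", "The Dinner Game", 140},
	}
	cols, err := toColumns(rows, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(cols[0]) // prints [B6717 HG120]
	fmt.Println(cols[2]) // prints [110 140]
	// Each cols[i] would then be bound to one unnest(?) placeholder,
	// assuming the driver can encode a slice as an SQL array.
}
```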


perillo Mar 7, 2016

On Mon, Mar 7, 2016 at 5:47 PM, kostya-sh notifications@github.com wrote:

> [...]
>
> BTW, in postgresql it is also possible to use the following SQL for bulk
> insert:
>
> INSERT INTO mytable (col1, col2, col3) VALUES (unnest(?), unnest(?), unnest(?))
>
> I haven't use it though and I don't know if Go driver supports arrays.

This is what I was talking about. And it is not PostgreSQL-specific, but
SQL standard.
It does not use an array, but the VALUES statement:
http://www.postgresql.org/docs/9.5/static/sql-values.html

In Go, it can be defined as, e.g.:

type Tuple []interface{} // Since Row and Value are already defined

type Values []Tuple

This has the advantage that a Values value can be passed as a parameter
to the Query or Exec function without changing the sql package API.


natemurthy Apr 27, 2018

@perillo I tried your method:

type Value []interface{} // defined in the sql package

var batch []Value
for i := 0; i < N; i++ {
    batch = append(batch, Value{1, 1.3, "x"})
}
db.Exec("INSERT INTO films (code, title, did, date_prod, kind) VALUES ?", batch)

but it results in the error:

     |  Error:          Expected nil, but got: &errors.errorString{s:"sql: converting argument $1 type: unsupported type []Value, a slice of slice"}
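
The error message indicates that database/sql rejects a slice of slices as a single argument. One possible workaround, which is not proposed anywhere in this thread, is to expand the rows into explicit placeholders and pass the values as a flat argument list. The sketch below assumes `?` placeholders (drivers such as lib/pq use `$1`, `$2`, ... instead) and trusted table and column names, since they are interpolated into the SQL text. `buildMultiInsert` is a hypothetical helper name.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// buildMultiInsert is a hypothetical helper (not part of database/sql).
// It builds a multi-row INSERT with one "?" per value and returns the
// flattened arguments to pass to Exec. Table and column names are inserted
// verbatim, so they must come from trusted input.
func buildMultiInsert(table string, cols []string, rows [][]interface{}) (string, []interface{}, error) {
	if len(cols) == 0 || len(rows) == 0 {
		return "", nil, errors.New("buildMultiInsert: need at least one column and one row")
	}
	// One "(?, ?, ...)" group, reused for every row.
	group := "(" + strings.TrimSuffix(strings.Repeat("?, ", len(cols)), ", ") + ")"
	groups := make([]string, 0, len(rows))
	args := make([]interface{}, 0, len(rows)*len(cols))
	for i, row := range rows {
		if len(row) != len(cols) {
			return "", nil, fmt.Errorf("buildMultiInsert: row %d has %d values, want %d", i, len(row), len(cols))
		}
		groups = append(groups, group)
		args = append(args, row...)
	}
	query := fmt.Sprintf("INSERT INTO %s (%s) VALUES %s",
		table, strings.Join(cols, ", "), strings.Join(groups, ", "))
	return query, args, nil
}

func main() {
	rows := [][]interface{}{
		{"B6717", "Tampopo", 110},
		{"HG120", "The Dinner Game", 140},
	}
	query, args, err := buildMultiInsert("films", []string{"code", "title", "did"}, rows)
	if err != nil {
		panic(err)
	}
	fmt.Println(query) // prints INSERT INTO films (code, title, did) VALUES (?, ?, ?), (?, ?, ?)
	fmt.Println(len(args)) // prints 6
	// With a real *sql.DB: _, err = db.Exec(query, args...)
}
```

Databases typically cap the number of bound parameters per statement, so large inputs would still need to be split into several statements.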
