Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

database/sql: add support for returning multiple result sets #12382

Closed
kardianos opened this issue Aug 28, 2015 · 30 comments
Closed

database/sql: add support for returning multiple result sets #12382

kardianos opened this issue Aug 28, 2015 · 30 comments
Milestone

Comments

@kardianos
Copy link
Contributor

@kardianos kardianos commented Aug 28, 2015

Many databases support returning multiple result sets. RDBMS that support this include MySQL, Postgresql, MS SQL Server, Oracle.

An example query that demonstrates this is:

select * from app.Users;
select * from app.Dancers;

The above would be expected to return two data sets, each with their own arity.

The above example is simple to see the effects, but is not motivating. A more motivating example usually includes some degree of procedural logic where multiple arities should be preserved and returned separately.
Example (1) Select data to form a matrix on the client:

    // Get Sample data rows.
    // Get SampleResultValue rows.
    // Get distinct list of Locus used data rows.
    sql.WriteString(`
    declare @Sample table (ID bigint);
    declare @Locus table (ID bigint);

    insert into @Sample
    select s.ID
    from
        data.Sample s
        join data.OrganizationLOB ol on ol.ID = s.OrganizationLOB
    where
        s.ID in (select ID from @SampleSelect)
        and s.Deleted = 0
    ;

    select
        s.ID,
        ls.SampleIdentifier,
        ls.ExternalIdentifier,
        s.Outcome,
        s.DateCollected
    from
        data.Sample s
        join datav.LabelSample ls on ls.Sample = s.ID
    where
        s.ID in (select ID from @Sample)

    select
        srv.Sample, srv.Value, srv.Interpretation, ResultValueType = rvt.ID
    from
        data.SampleResultValue srv
        join data.ResultValueType rvt on rvt.ID = srv.ResultValueType
    where
        srv.Sample in (select ID from @Sample)
    ;

    select
        rvt.ID, Name = l.Name + ' ' + rft.Name
    from
        data.ResultValueType rvt
        join enum.ResultFieldType rft on rft.ID = rvt.FieldType
        join data.Locus l on l.ID = rvt.Locus
    where
        rvt.ID in (
            select distinct
                srv.ResultValueType
            from
                data.SampleResultValue srv
            where
                srv.Sample in (select ID from @Sample)
        )
    order by
        rvt.SortOrder
    ;
    `)

Example (2) make an update, but return the updated data to the client so it can update its UI.

    res, err := Query(db.Conn, &rdb.Command{
        Converter: conv,
        Sql: `
insert into data.SamplePanel (
    Sample, Panel,
    DateDue,
    OrderGroup,
    Deleted, TimeCreated, AccountCreated, TimeUpdated, AccountUpdated
    )
output ` + listColumns(db, "inserted.", "data.SamplePanel") + `
select
    s.ID, sgp.Panel,
    data.WorkDayAdd(s.DateReceived, data.OrganizationPanelTAT(@Today, ol.Organization, sgp.Panel)),
    1,
    0, @Now, @Account, @Now, @Account
from
    data.SampleGroupPanel sgp
    join data.Sample s on s.SampleGroup = sgp.SampleGroup
    join data.OrganizationLOB ol on ol.ID = s.OrganizationLOB
where
    s.ID = @Sample
;

--See also PostUpdate
insert into data.SampleLocus (
    Sample, Locus,
    TimeCreated, AccountCreated, TimeUpdated, AccountUpdated
    )
output ` + listColumns(db, "inserted.", "data.SampleLocus") + `
select distinct
    @Sample, pl.Locus,
    @Now, @Account, @Now, @Account
from
    data.PanelLocus pl
    join data.SamplePanel sp on sp.Panel = pl.Panel
where
    sp.Sample = @Sample
;

select
    s.ID, s.Price
from
    datav.Sample s
where
    s.ID = @Sample
;
    `,
    },
        rdb.Param{Name: "Sample", Type: rdb.Integer, Value: sample},
        rdb.Param{Name: "Account", Type: rdb.Integer, Value: userInfo.ID},
        rdb.Param{Name: "Now", Type: rdb.Time, Value: time.Now().UTC()},
        rdb.Param{Name: "Today", Type: rdb.TypeDate, Value: time.Now()},
    )
@ianlancetaylor ianlancetaylor added this to the Unplanned milestone Aug 29, 2015
@kostya-sh
Copy link
Contributor

@kostya-sh kostya-sh commented Aug 29, 2015

I might be missing something but why not to call Query twice?

@kardianos
Copy link
Contributor Author

@kardianos kardianos commented Aug 29, 2015

@kostya-sh In the simple example I gave up top, yes, that would be fine.

In every case you could return results and and re-query the next segment. In larger queries that have many steps and many queries in them that rely on the data passed to them this quickly becomes sub-optimal, possibly a deal breaker for large datasets where you don't want to send intermediate result sets back and forth. Again you can work around that with global or session temp tables, but again that isn't always optimal.

@kostya-sh
Copy link
Contributor

@kostya-sh kostya-sh commented Aug 31, 2015

@kardianos, thank you for the clarification. I have never used multiple return results myself but I agree that there are some rare cases when they could be useful.

Just for the reference the JDBC API supports multiple result sets via execute() and getMoreResults().

@nightlyone
Copy link
Contributor

@nightlyone nightlyone commented Aug 31, 2015

Another motivating examples are stored procedures. Those are often provided to keep the API for complex operations stable, while allowing changes to the actual operation that is performed.

See go-sql-driver/mysql#66 for how cumbersome using stored procedures actually is in real world cases with Go and database/sql

@petermattis
Copy link

@petermattis petermattis commented Feb 3, 2016

The lack of an API in database/sql for retrieving multiple results can almost be hacked around at the driver level. Take a look at lib/pq#425 for an example of achieving this in the lib/pq postgres driver. This hack is sort of ugly, though, as it relies on the driver recognizing a special SQL statement (NEXT) in order to return subsequent results. Besides the ugliness, it is incomplete as it relies on the NEXT query being sent to the same sql/database/driver.Conn as the original query. This works fine on Tx structures, but not for plain calls to DB.Query (unless you do some other hack like DB.SetMaxConns(1)).

@kardianos
Copy link
Contributor Author

@kardianos kardianos commented Feb 3, 2016

I have an alternate API that I'm using that works when tables are larger and queries more complex. The implementation is not the best but the API is roughly what I need: https://bitbucket.org/kardianos/rdb/src .

@realpg
Copy link

@realpg realpg commented May 9, 2016

Agreed. I wanna say a lot of procedures return multiple result sets, and we can't modify them because they are part of existing systems.

@joegrasse
Copy link

@joegrasse joegrasse commented Aug 10, 2016

Just curious, since the milestone is set to unplanned, does this mean it probably won't get implemented for a long time?

/cc @bradfitz

@ianlancetaylor
Copy link
Contributor

@ianlancetaylor ianlancetaylor commented Aug 10, 2016

The unplanned milestone means that nobody plans to work on it. That could mean that it will never get implemented. Or, if somebody decides to work on it, it could get implemented in the next release. Go is an open source project, anybody could fix this, even you.

@joegrasse
Copy link

@joegrasse joegrasse commented Aug 11, 2016

So just thinking about how this would work. Does something like the following make sense, minus the implementation details?

func (db *DB) QueryResults(query string, args ...interface{}) (*Results, error) {

}

type Results struct {

} 

func (rslt *Results) Next() bool {

}

func (rslt *Results) Rows() (*Rows, error) {

}

func (rslt *Results) Err() error {

}

func (rslt *Results) Close() error {

}

usage example:

results, err := db.QueryResults("SomeQuery")
if err != nil {
  log.Fatal(err)
}
defer results.Close()
for results.Next() {
  rows, err := results.Rows()
  if err != nil {
    log.Fatal(err)
  }
  defer rows.Close()
  for rows.Next() {
    // process rows
  }
  if err := rows.Err(); err != nil {
    log.Fatal(err)
  }
}
if err := results.Err(); err != nil {
  log.Fatal(err)
}

/cc @bradfitz @ianlancetaylor @julienschmidt

@freeformz
Copy link
Contributor

@freeformz freeformz commented Aug 11, 2016

An alternative, more minimal option maybe?

// NextResultSet returns the next result set from the query
// NextResultSet always returns a non-nil value.
// Errors are deferred until *Rows.Scan method is called.
func (r *Rows) NextResultSet() *Rows

Example:

func DoQuery() error {
  rows, err := db.Query("query with 2 result sets")
  if err != nil {
    return err
  }
  defer rows.Close()
  // Handle result set 1
  for rows.Next() {
    err := rows.Scan(...)
  }
  if err := rows.Err(); err != nil {
    return err
  }
  rows = rows.NextResultSet()
  defer rows.Close()
  for rows.Next() {
    err := rows.Scan(...)
    if err == sql.ErrNoRows {
      //Application error
    }
  }
  return rows.Err()

This is not perfect, and I'm not even sure it's possible, but it looks like it could be as Rows owns the connection.

@kardianos
Copy link
Contributor Author

@kardianos kardianos commented Aug 12, 2016

@freeformz Yes, I think Rows.NextResultSet() would be more ideal. With the exception of certain drivers that give you the option of multiplexing multiple result at the same time (uncommonly used, only a few RDBMS support), result sets are streamed sequentially.

However, I think the signature would be func (Rows) NextResultSet() bool (maybe also return an error?) and reuse the existing rows statement. After all, the previous result set can't be accessed anymore anyway and a new column list should be fetched as well.

@bradfitz Opinion on this signature?

@freeformz
Copy link
Contributor

@freeformz freeformz commented Aug 12, 2016

@kardianos FWIW: Rows.NextResultSet() bool is where I started, then I wrote out an example and didn't like it. I went through 2 more iterations before I got to the one I pasted. Maybe I had some implied misconceptions though. Can you provide an example of how you think NextResultSet() bool would be used?

@kardianos
Copy link
Contributor Author

@kardianos kardianos commented Aug 12, 2016

I was thinking something like this. But I also tend to push alot to frameworks too.

func DoQuery() error {
  rows, err := db.Query("query with 2 result sets")
  if err != nil {
    return err
  }
  defer rows.Close()

  for {
    cols, err := rows.Columns()
    if if err != nil {
        return err
    }
    for rows.Next() {
      err := rows.Scan(...)
    }
    if err := rows.Err(); err != nil {
        return err
    }
    if !rows.NextResultSet() {
        return nil
    }
  }
  panic("unreachable")
}

@freeformz
Copy link
Contributor

@freeformz freeformz commented Aug 12, 2016

So...

// NextResultSet prepares *Rows for the next result set and returns true.
// If there are no more result sets it returns false.
func (r *Rows) NextResultSet() bool

But you probably don't want to use an enclosing for { ... } loop though as you will likely have to keep track of which result set your are processing as you'll likely be (using your example at the top) Scaning into different structs depending on which result set you are processing. So probably more like

func DoQuery() error {
  rows, err := db.Query("query with 2 result sets")
  if err != nil {
    return err
  }
  defer rows.Close()
  // Handle result set 1
  for rows.Next() {
    if err := rows.Scan(&structTypeA); err != nil {
      return err
    }
    ...
  }
  if err := rows.Err(); err != nil {
    return err
  }
  if !rows.NextResultSet() {
    panic("Expected more rows!")
  }
  for rows.Next() {
    if err := rows.Scan(&structTypeB); err != nil {
      return err
    }
    ...
  }
  if err := rows.Err(); err != nil {
    return err
  }
  if rows.NextResultSet() {
    panic("Didn't export more rows!")
  }
}
@kardianos
Copy link
Contributor Author

@kardianos kardianos commented Aug 12, 2016

Yeah, my example could be how you generically buffer all the result sets up before you deal with them.

Your example would be how you might use it interactively.

The downside is that rows.NextResultSet() would involve reading the wire, so reading rows.Err() would be needed to call afterwords. So rather then panicing you'd probably want to `return errors.Wrap(rows.Err(), "expected more rows") or something like that.

It would work. I think it would be better then Rows.NextResultSet() (bool, error) as wouldn't need to deal with odd combinations, and behavior would depend on what order they were checked.

@freeformz
Copy link
Contributor

@freeformz freeformz commented Aug 12, 2016

I would expect any error to cause NextResultSet() to return false, so yeah.

if !rows.NextResultSet() {
  return errors.wrap(rows.Err(), "expected another result set")
}
@asifjalil
Copy link

@asifjalil asifjalil commented Aug 31, 2016

If NextResultSet returns bool and error only, then how will we get the next result set?

@kardianos
Copy link
Contributor Author

@kardianos kardianos commented Aug 31, 2016

By calling Result.Next(). Internally, multiple result sets come linearly off the wire, one after another. 1 query results in N sequential result sets.

@andlabs
Copy link
Contributor

@andlabs andlabs commented Sep 2, 2016

My coworker is proposing a stored procedure API that works akin to

CREATE PROCEDURE GetThing(thingID INTEGER, getRelatedThing1 BOOLEAN, getRelatedThing2 BOOLEAN)

where the result set can have one to three sets of rows. How about something like this?

func GetThing(t *sql.Tx, thingID int, getRelatedThing1 bool, getRelatedThing2 bool) (t *Thing, r1 *Related, r2 *Related2, err error) {
    r, err := t.Query("CALL GetThing(?, ?, ?);", thingID, getRelatedThing1, getRelatedThing2)
    if err != nil { return nil, nil, nil err }
    defer r.Close()

    for r.Next() {
        // r.Scan() into t
    }
    if err := r.Err(); err != nil { return nil, nil, nil, err }

    if getRelatedThing1 {
        for r.Next() {
            // r.Scan() into r1
        }
        if err := r.Err(); err != nil { return nil, nil, nil, err }
    }

    if getRelatedThing2 {
        for r.Next() {
            // r.Scan() into r2
        }
        if err := r.Err(); err != nil { return nil, nil, nil, err }
    }

    return
}

Or should the NextResultSet() be explicit? In that case

    if getRelatedThing1 {
        if err := r.NextSet(); err != nil { return nil, nil, nil, err }
        for r.Next() {

?

@kardianos
Copy link
Contributor Author

@kardianos kardianos commented Sep 2, 2016

@andlabs I don't understand what you are trying to do. Please ensure you have read and understood the example in #12382 (comment) .

@andlabs
Copy link
Contributor

@andlabs andlabs commented Sep 2, 2016

Hm, yeah; reading it again it seems mostly equivalent to what I have anyway.

My question is is the NextResultSet() function really necessary? Or should just another call to Next() suffice? So repeated calls to Next() will return a set like

true  // result set one - three rows
true
true
false // end of result set one
true  // result set two - two rows
true
false // end of result set two
false // result set three - no rows
true  // result set four - one row
false // end of result set four
false // no more result sets; return false forever
false
.
.
.

If one of the result sets has no rows, then it'd just return false twice in a row. (This should not have any problems because of how Next() loops are already written; your own data structures would just be empty.)

If NextResultSet() is necessary, perhaps it should be

func (r *Rows) NextResultSet() (err error)

If the result set is empty, nil is returned, but the next call to Next() will immediately return false.

My reasoning for this is that since you know how many result sets the SQL query should return (and if you don't, why are you making a random query?), and you can assume there is no bug in the driver package, the only scenario for which there would be no more result sets early is if some error occurred; for example, a network connection problem. The final check in the linked comment is also not necessary for the same reason; once you have read the last row of what you already know is the last result set, you can safely just Close().

(Of course, with that thought process, we can conclude that NextResultSet() is still unnecessary; that can just be rolled into Next(), and the error handling remains simple.)

One additional question: how will stored procedure that issue statements that return a Result instead of rows be handled, such as an INSERT statement?

@noblehng
Copy link

@noblehng noblehng commented Sep 12, 2016

I agree with @andlabs , implicit calling *.Rows.Next() or explicit calling a new *Rows.NextResultSet() could both works. And the underlying database/sql/driver.Rows interface doesn't have to change too.

When a result set had been read to io.EOF, the next call to Next() method of the underlying driver should return a error (io.EOF or a ErrNoNewResultSet) if all result sets had been consumed, otherwise it should return nil and populate column nams from the next result set for the following Columns() calls.

And methods of database/sql.*Rows should modify accordingly. It will need to call the underlying driver's Next() method again (explicitly or impliedly in its own Next() method) and reset the closed flag and lasterr, if needed.

And we could use extra arguments in Exec() and Query() for output variable binding, which put after all the input parameter binding args. This way drivers can bind them and kown the number of output result sets or variables. The output binding could pass a pointer to a Value type if the output is a Value, or pass a integer for positional and a string for named ref cursor binding.

As for interleaved multiple result sets fetching support like MSSQL's MARS, as @kardianos pointed out, it is uncommon, so I think it could be left out. And I think true session mutiplexing like Oracle supports is better, which could also fits in here.

@kardianos
Copy link
Contributor Author

@kardianos kardianos commented Sep 12, 2016

@andlabs @noblehng I don't think that would be (1) backwards compatible and (2) you are creating an ad-hoc API that does two different things.

If a current code base calls Next() even after it returns false, it could still work, even if non-optimal. However, with this suggestion it would break. I will grant you it could be considered a bug in the usage. I really think we are doing two logical things.

I agree just using Next() is possible, I think it is just less optimal.

I think output parameters support should be included in the named parameter discussion.

Right, Ii don't want to support MARS or Oracle session multiplexing. I have yet to encounter any code in any organization I've encountered use these features. I just want to support sequential multiple result sets.

@andlabs
Copy link
Contributor

@andlabs andlabs commented Sep 12, 2016

In that case what do you think of what I said about NextResultSet() returning an error?

@noblehng
Copy link

@noblehng noblehng commented Sep 13, 2016

@kardianos It wouldn't break current code bases, because current code bases are all dealing with single result set, so calling Next() after consuming the result set will still keep returning false. But I do also think it is better to be more explicit at the database/sql surface with a new *Rows.NextResultSet() api.

I just think the underlying database/sql/driver interface for driver implementations doesn't has to be changed. It will be backward compatible because old drivers dealing only with single result set will always return a io.EOF after io.EOF. The other way to keep backward compatible with old drivers will be adding a separated interface in database/sql/driver because we can't just add that to the Rows interface, and testing against that interface in database/sql, which could still break if the new interface method name had already being used by old driver.

*Rows.NextResultSet() could be:

func (rs *Rows) NextResultSet() error {
    if !rs.closed {
         return ErrProcessing
    }
    if err := rs.rowsi.Next(nil); err != nil {
        return err
    }
    rs.lasterr, rs.closed = nil, false
    return nil
}
@noblehng
Copy link

@noblehng noblehng commented Sep 13, 2016

The meaning of database/sql.*Rows.Close() also needs to be changed too. Whether we are closing a result set or aborting all remained result sets. This could be a bigger problem for backward compability.

Currently, Close() will close the statement and release the connection to the pool, and it is impliedly called in Next() if the underlying driver's Next() return a error. For user code backward compability, we should keep the semantic and behavior of Close(). Then we will need a *Rows.CloseResultSet() and change Next() to calling that instead of Close(). This can be done by adding a resultsetClosed field to Rows, and NextResultSet() should use rs.resultsetClosed instead of rs.closed.

Now to implement of *Rows.CloseResultSet(), it could still use the underlying driver's Close() method like NextResultSet() use Next() of underlying driver, but it would be weird for driver to handle that Close() means CloseResultSet if the result set is open and means CloseCommand otherwise.

So I think it would be better adding a new interface to database/sql/driver:

type ResultSetsIter interface {
    CloseResultSet() error
    NextResultSet() error
}

And in database/sql:

func (rs *Rows) NextResultSet() error {
    if err := rs.lasterr; err != nil && err != io.EOF {
         return err
    }
    if !rs.resultsetClosed {
         return ErrProcessing
    }
    next, ok := rs.rowsi.(driver.ResultSetsIter)
    if !ok {
        return ErrNotSupport
    }
    if err := next.NextResultSet(); err != nil {
        return err
    }
    rs.resultsetClosed = false
    return nil
}

func  (rs *Rows) CloseResultSet() error {
    if rs.resultsetClosed {
        return nil
    }
    rs.resultsetClosed = true
    next, ok := rs.rowsi.(driver.ResultSetsIter)
    if !ok {
        return nil
    }
    return next.CloseResultSet()
}

Then modify *Rows.Close() and *Rows.Next() in database/sql to call *Rows.CloseResultSet().

@kardianos
Copy link
Contributor Author

@kardianos kardianos commented Sep 13, 2016

@noblehng *Rows has an Err() method that is often used to get the error from and should be checked before returning anyway. *Rows.NextResultSet() also means "close" (read past) current result set. It is critical that *Rows.Close() means "close current open statement and return connection to pool".

I'm unsure at this time what the driver interface would look like, you could be correct there. Unsure.

@noblehng
Copy link

@noblehng noblehng commented Sep 13, 2016

@kardianos the "close" (read past) current result set part can be implicitly done inside NextResultSet(), but you still need a way to do it in the underlying driver. May as well be explicit and export the CloseResultset() method.

@gopherbot
Copy link

@gopherbot gopherbot commented Oct 6, 2016

CL https://golang.org/cl/30592 mentions this issue.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
You can’t perform that action at this time.