Skip to content
This repository has been archived by the owner on Jan 28, 2021. It is now read-only.

sql: implement EXPLODE and generators #720

Merged
merged 1 commit into from
Jun 3, 2019

Conversation

erizocosmico
Copy link
Contributor

This pull request implements the EXPLODE expressions and generators,
which are used together to provide generation of rows based on a
column or expression that can yield multiple values.

For this, there have been some quite core changes:

  • SQL method of sql.Type now returns the sqltypes.Value and an error.
    This avoids panics in this method. Since generators can yield errors
    it's better to just return an error and not just kill the server.
  • Array type can now internally handle either arrays or generators,
    making it transparent for the user. If it's used in an EXPLODE
    expression, the generator will be used. Otherwise, it will be
    converted automatically to an array and used without the user even
    knowing what happened behind the scenes. That way, we don't need
    yet another type to represent generators.

Aside from that, there are some new additions:

  • New EXPLODE function, which is just a placeholder. EXPLODE type is the
    underlying type of its argument for resolution purposes. During analysis
    Explode nodes will be replaced by Generate functions and a Generate node.
  • New Generate function, which is the non-placeholder version of EXPLODE
    and the one that goes into the final execution tree. This one returns
    the same type as its argument and let's the Generate plan node be the
    one that returns the underlying type once the values are generated.
  • Generate node, which wraps a Project in which there is one and only one
    explode expression.
  • resolve_generators analysis rule, which will turn projects with an
    Explode expression into a Generate node with the project as a children,
    replacing the Explode expressions with Generate expressions.
  • validate_explode_usage validation rule, which will ensure explode is
    not used outside a Project node.

@erizocosmico erizocosmico added the enhancement New feature or request label May 22, 2019
@erizocosmico erizocosmico requested a review from a team May 22, 2019 13:01
@erizocosmico erizocosmico self-assigned this May 22, 2019
sql/plan/generate.go Outdated Show resolved Hide resolved
sql/plan/generate.go Outdated Show resolved Hide resolved
@erizocosmico erizocosmico requested a review from a team May 29, 2019 09:16
var ErrNotGenerator = errors.NewKind("cannot convert value of type %T to a generator")

// ToGenerator converts a value to a generator if possible.
func ToGenerator(v interface{}) (Generator, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since it only really converts arrays, could be called ArrayToGenerator.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ArrayToGenerator suggests you need to pass an array and get a generator. This, however, receives any kind of value and returns a Generator or an error if the value could not be converted. The fact that arrays are converted is an implementation detail.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is expected to convert other types in the future? (like maps for example).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope, because there are not any other types that can be a generator besides arrays and Generator. But still calling it ArrayToGenerator implies the input is always an Array, which is not the case. It's meant to return the generator as is if the input is a generator, convert array to a generator and return it or fail in any other case.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mmm, then since the interface{} doesn't provide documentation about it to the user this could be specified in the function doc.

Copy link
Contributor Author

@erizocosmico erizocosmico May 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The user does not need to know which types are handled here, that's an implementation detail. Only needs to know that if the value can be converted to a generator it will, if not, it will error.

erizocosmico added a commit to erizocosmico/go-mysql-server that referenced this pull request May 30, 2019
Depends on src-d#720

This PR implements struct types, which will be serialized to JSON
before being sent to the mysql client using the mysql wire proto.

Internally, structs are just `map[string]interface{}`, but they can
actually be anything that's convertible to that, because of the way
the Convert method of the struct type is implemented. That means,
the result of an UDF or a table that returns a struct may be an
actual Go struct and it will then be transformed into the internal
map. It does have a penalty, though, because structs require
encoding to JSON and then decoding into `map[string]interface{}`.
Structs have a schema, identical to a table schema, except their
`Source` will always be empty.

Resolution of columns has also been slightly change in order to
resolve getting fields from structs using the `.` operator, which
required some trade-offs in some rules, such as not erroring
anymore in `qualify_columns` when the table is not found. That
error was delegated to `resolve_columns` in order to make resolution
possible, as the syntax is a bit ambiguous.
The advantage of using dot is the fact that no changes have to be made
to the parser in order for it to work.

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
erizocosmico added a commit to erizocosmico/go-mysql-server that referenced this pull request May 30, 2019
Depends on src-d#720

This PR implements struct types, which will be serialized to JSON
before being sent to the mysql client using the mysql wire proto.

Internally, structs are just `map[string]interface{}`, but they can
actually be anything that's convertible to that, because of the way
the Convert method of the struct type is implemented. That means,
the result of an UDF or a table that returns a struct may be an
actual Go struct and it will then be transformed into the internal
map. It does have a penalty, though, because structs require
encoding to JSON and then decoding into `map[string]interface{}`.
Structs have a schema, identical to a table schema, except their
`Source` will always be empty.

Resolution of columns has also been slightly change in order to
resolve getting fields from structs using the `.` operator, which
required some trade-offs in some rules, such as not erroring
anymore in `qualify_columns` when the table is not found. That
error was delegated to `resolve_columns` in order to make resolution
possible, as the syntax is a bit ambiguous.
The advantage of using dot is the fact that no changes have to be made
to the parser in order for it to work.

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
@juanjux
Copy link
Contributor

juanjux commented Jun 3, 2019

Needs rebase, can then be merged? (I can use this on the BLAME PR for the integration tests).

@ajnavarro
Copy link
Contributor

LGTM after rebase.

@erizocosmico
Copy link
Contributor Author

Rebased

@ajnavarro
Copy link
Contributor

@erizocosmico could you check tests?

This pull request implements the EXPLODE expressions and generators,
which are used together to provide generation of rows based on a
column or expression that can yield multiple values.

For this, there have been some quite core changes:
- SQL method of sql.Type now returns the sqltypes.Value and an error.
  This avoids panics in this method. Since generators can yield errors
  it's better to just return an error and not just kill the server.
- Array type can now internally handle either arrays or generators,
  making it transparent for the user. If it's used in an EXPLODE
  expression, the generator will be used. Otherwise, it will be
  converted automatically to an array and used without the user even
  knowing what happened behind the scenes. That way, we don't need
  yet another type to represent generators.

Aside from that, there are some new additions:
- New EXPLODE function, which is just a placeholder. EXPLODE type is the
  underlying type of its argument for resolution purposes. During analysis
  Explode nodes will be replaced by Generate functions and a Generate node.
- New Generate function, which is the non-placeholder version of EXPLODE
  and the one that goes into the final execution tree. This one returns
  the same type as its argument and let's the Generate plan node be the
  one that returns the underlying type once the values are generated.
- Generate node, which wraps a Project in which there is one and only one
  explode expression.
- resolve_generators analysis rule, which will turn projects with an
  Explode expression into a Generate node with the project as a children,
  replacing the Explode expressions with Generate expressions.
- validate_explode_usage validation rule, which will ensure explode is
  not used outside a Project node.

Signed-off-by: Miguel Molina <miguel@erizocosmi.co>
@erizocosmico
Copy link
Contributor Author

I rebased, but didn't remember to change the imports to the new paths 😅

Fixed

@ajnavarro ajnavarro merged commit ab5656f into src-d:master Jun 3, 2019
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants