sql: implement EXPLODE and generators #720

erizocosmico · 2019-05-22T13:01:46Z

This pull request implements the EXPLODE expressions and generators,
which are used together to provide generation of rows based on a
column or expression that can yield multiple values.

For this, there have been some quite core changes:

SQL method of sql.Type now returns the sqltypes.Value and an error.
This avoids panics in this method. Since generators can yield errors
it's better to just return an error and not just kill the server.
Array type can now internally handle either arrays or generators,
making it transparent for the user. If it's used in an EXPLODE
expression, the generator will be used. Otherwise, it will be
converted automatically to an array and used without the user even
knowing what happened behind the scenes. That way, we don't need
yet another type to represent generators.

Aside from that, there are some new additions:

New EXPLODE function, which is just a placeholder. EXPLODE type is the
underlying type of its argument for resolution purposes. During analysis
Explode nodes will be replaced by Generate functions and a Generate node.
New Generate function, which is the non-placeholder version of EXPLODE
and the one that goes into the final execution tree. This one returns
the same type as its argument and let's the Generate plan node be the
one that returns the underlying type once the values are generated.
Generate node, which wraps a Project in which there is one and only one
explode expression.
resolve_generators analysis rule, which will turn projects with an
Explode expression into a Generate node with the project as a children,
replacing the Explode expressions with Generate expressions.
validate_explode_usage validation rule, which will ensure explode is
not used outside a Project node.

sql/plan/generate.go

sql/analyzer/resolve_generators.go

juanjux · 2019-05-29T11:05:47Z

sql/generator.go

+var ErrNotGenerator = errors.NewKind("cannot convert value of type %T to a generator")
+
+// ToGenerator converts a value to a generator if possible.
+func ToGenerator(v interface{}) (Generator, error) {


Since it only really converts arrays, could be called ArrayToGenerator.

ArrayToGenerator suggests you need to pass an array and get a generator. This, however, receives any kind of value and returns a Generator or an error if the value could not be converted. The fact that arrays are converted is an implementation detail.

It is expected to convert other types in the future? (like maps for example).

Nope, because there are not any other types that can be a generator besides arrays and Generator. But still calling it ArrayToGenerator implies the input is always an Array, which is not the case. It's meant to return the generator as is if the input is a generator, convert array to a generator and return it or fail in any other case.

Mmm, then since the interface{} doesn't provide documentation about it to the user this could be specified in the function doc.

The user does not need to know which types are handled here, that's an implementation detail. Only needs to know that if the value can be converted to a generator it will, if not, it will error.

Depends on src-d#720 This PR implements struct types, which will be serialized to JSON before being sent to the mysql client using the mysql wire proto. Internally, structs are just `map[string]interface{}`, but they can actually be anything that's convertible to that, because of the way the Convert method of the struct type is implemented. That means, the result of an UDF or a table that returns a struct may be an actual Go struct and it will then be transformed into the internal map. It does have a penalty, though, because structs require encoding to JSON and then decoding into `map[string]interface{}`. Structs have a schema, identical to a table schema, except their `Source` will always be empty. Resolution of columns has also been slightly change in order to resolve getting fields from structs using the `.` operator, which required some trade-offs in some rules, such as not erroring anymore in `qualify_columns` when the table is not found. That error was delegated to `resolve_columns` in order to make resolution possible, as the syntax is a bit ambiguous. The advantage of using dot is the fact that no changes have to be made to the parser in order for it to work. Signed-off-by: Miguel Molina <miguel@erizocosmi.co>

juanjux · 2019-06-03T07:52:20Z

Needs rebase, can then be merged? (I can use this on the BLAME PR for the integration tests).

ajnavarro · 2019-06-03T08:20:50Z

LGTM after rebase.

erizocosmico · 2019-06-03T08:43:11Z

Rebased

ajnavarro · 2019-06-03T10:09:35Z

@erizocosmico could you check tests?

This pull request implements the EXPLODE expressions and generators, which are used together to provide generation of rows based on a column or expression that can yield multiple values. For this, there have been some quite core changes: - SQL method of sql.Type now returns the sqltypes.Value and an error. This avoids panics in this method. Since generators can yield errors it's better to just return an error and not just kill the server. - Array type can now internally handle either arrays or generators, making it transparent for the user. If it's used in an EXPLODE expression, the generator will be used. Otherwise, it will be converted automatically to an array and used without the user even knowing what happened behind the scenes. That way, we don't need yet another type to represent generators. Aside from that, there are some new additions: - New EXPLODE function, which is just a placeholder. EXPLODE type is the underlying type of its argument for resolution purposes. During analysis Explode nodes will be replaced by Generate functions and a Generate node. - New Generate function, which is the non-placeholder version of EXPLODE and the one that goes into the final execution tree. This one returns the same type as its argument and let's the Generate plan node be the one that returns the underlying type once the values are generated. - Generate node, which wraps a Project in which there is one and only one explode expression. - resolve_generators analysis rule, which will turn projects with an Explode expression into a Generate node with the project as a children, replacing the Explode expressions with Generate expressions. - validate_explode_usage validation rule, which will ensure explode is not used outside a Project node. Signed-off-by: Miguel Molina <miguel@erizocosmi.co>

erizocosmico · 2019-06-03T10:12:40Z

I rebased, but didn't remember to change the imports to the new paths 😅

Fixed

erizocosmico added the enhancement New feature or request label May 22, 2019

erizocosmico requested a review from a team May 22, 2019 13:01

erizocosmico self-assigned this May 22, 2019

golangcibot reviewed May 22, 2019

View reviewed changes

sql/plan/generate.go Outdated Show resolved Hide resolved

sql/plan/generate.go Outdated Show resolved Hide resolved

erizocosmico force-pushed the feature/explode branch from 7f72909 to 814b219 Compare May 22, 2019 13:03

kuba-- reviewed May 22, 2019

View reviewed changes

sql/analyzer/resolve_generators.go Show resolved Hide resolved

kuba-- approved these changes May 23, 2019

View reviewed changes

erizocosmico requested a review from a team May 29, 2019 09:16

juanjux reviewed May 29, 2019

View reviewed changes

juanjux approved these changes May 30, 2019

View reviewed changes

erizocosmico mentioned this pull request May 30, 2019

sql: implement struct types #735

Closed

ajnavarro approved these changes Jun 3, 2019

View reviewed changes

erizocosmico force-pushed the feature/explode branch from 814b219 to 8ab4261 Compare June 3, 2019 08:43

erizocosmico force-pushed the feature/explode branch from 8ab4261 to 728f747 Compare June 3, 2019 10:12

ajnavarro merged commit ab5656f into src-d:master Jun 3, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

sql: implement EXPLODE and generators #720

sql: implement EXPLODE and generators #720

erizocosmico commented May 22, 2019

juanjux May 29, 2019

erizocosmico May 29, 2019

juanjux May 29, 2019

erizocosmico May 29, 2019

juanjux May 29, 2019

erizocosmico May 30, 2019 •

edited

Loading

juanjux commented Jun 3, 2019

ajnavarro commented Jun 3, 2019

erizocosmico commented Jun 3, 2019

ajnavarro commented Jun 3, 2019

erizocosmico commented Jun 3, 2019

sql: implement EXPLODE and generators #720

sql: implement EXPLODE and generators #720

Conversation

erizocosmico commented May 22, 2019

juanjux May 29, 2019

Choose a reason for hiding this comment

erizocosmico May 29, 2019

Choose a reason for hiding this comment

juanjux May 29, 2019

Choose a reason for hiding this comment

erizocosmico May 29, 2019

Choose a reason for hiding this comment

juanjux May 29, 2019

Choose a reason for hiding this comment

erizocosmico May 30, 2019 • edited Loading

Choose a reason for hiding this comment

juanjux commented Jun 3, 2019

ajnavarro commented Jun 3, 2019

erizocosmico commented Jun 3, 2019

ajnavarro commented Jun 3, 2019

erizocosmico commented Jun 3, 2019

erizocosmico May 30, 2019 •

edited

Loading