FromDDL rework #158

svaningelgem · 2021-02-27T20:21:11Z

I took the tests from Spark itself and wrote a lexer/parser around them (I couldn't find an easy way to extract the one from the Scala sources).
It supports every case that was tested in the Scala suite, including:

* basic datatypes
* maps
* structs
* arrays
* weird names (like a+b)
* comments
* reserved words used as names (like `int: int`, where int is both the name of the column and the datatype; `int int` doesn't work the same way and errors out)
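To make the list above concrete, here is a minimal sketch of a lexer plus recursive-descent parser for nested type strings. It is not the parser from this PR: `tokenize`, `Parser`, `parse_datatype` and the `PRIMITIVES` set are hypothetical names, results are plain tuples rather than pysparkling types, struct fields must use the `name: type` form, and weird names and comments are not handled.

```python
import re

# Illustrative subset of primitive type names (an assumption, not Spark's full list).
PRIMITIVES = {"boolean", "int", "integer", "long", "bigint", "float",
              "double", "string", "date", "timestamp", "binary"}
PUNCTUATION = {"<", ">", ",", ":"}
TOKEN = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[<>,:])")


def tokenize(text):
    """Split a type string into identifiers and punctuation tokens."""
    text = text.strip()
    tokens, pos = [], 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"unexpected character near position {pos}")
        tokens.append(match.group(1))
        pos = match.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        return self.tokens[self.pos] if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        token = self.peek()
        if token is None or (expected is not None and token != expected):
            raise ValueError(f"expected {expected or 'a token'}, got {token!r}")
        self.pos += 1
        return token

    def datatype(self):
        name = self.take().lower()
        if name == "array":
            self.take("<")
            element = self.datatype()
            self.take(">")
            return ("array", element)
        if name == "map":
            self.take("<")
            key = self.datatype()
            self.take(",")
            value = self.datatype()
            self.take(">")
            return ("map", key, value)
        if name == "struct":
            self.take("<")
            fields = [self.field()]
            while self.peek() == ",":
                self.take(",")
                fields.append(self.field())
            self.take(">")
            return ("struct", fields)
        if name in PRIMITIVES:
            return name
        raise ValueError(f"unknown type {name!r}")

    def field(self):
        # The field name is read before any type keyword is considered,
        # so a reserved word such as "int" is accepted as a name here.
        name = self.take()
        if name in PUNCTUATION:
            raise ValueError(f"expected a field name, got {name!r}")
        self.take(":")
        return (name, self.datatype())


def parse_datatype(text):
    parser = Parser(tokenize(text))
    result = parser.datatype()
    if parser.peek() is not None:
        raise ValueError(f"unexpected trailing token {parser.peek()!r}")
    return result


print(parse_datatype("map<string, array<int>>"))
print(parse_datatype("struct<int: int, tags: array<string>>"))
```

In this sketch `int int` is rejected only because a single datatype cannot be followed by another token; it does not reproduce Spark's actual reserved-word rules.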

@tools4origins: could you check whether this can be integrated into your PR? I believe my implementation is more feature-complete.


tools4origins · 2021-02-28T13:45:30Z

pysparkling/tests/test_fromDDL.py

+    StructField, StructType
+)
+# DataType
+from pysparkling.sql.utils import ParseException


StructType does not expose fromDDL in Python (maybe we should not either), but the one in Scala does not show the behavior expected in these tests:

scala> StructType.fromDDL("some_str string, some_int integer, some_date date")
res2: org.apache.spark.sql.types.StructType = StructType(StructField(some_str,StringType,true), StructField(some_int,IntegerType,true), StructField(some_date,DateType,true))

scala> StructType.fromDDL("int")
org.apache.spark.sql.catalyst.parser.ParseException: mismatched input '<EOF>'

(Spark 3)

Several syntaxes are supported, but in Spark fromDDL only handles one of them:
https://github.com/apache/spark/blob/master/python/pyspark/sql/types.py#L821-L841

Maybe we should not rely on StructType.fromDDL to do this parsing?

I think this method is misnamed: we shouldn't call it fromDDL. It is actually _parse_datatype_string, and it's supported by Spark ;-). It is mainly used when you create a dataframe with a string as the field description.

~~Just check the implementation in pyspark, it supports multiple ways and I do agree that we should drop fromDDL (as it's not in pyspark at all).~~
I saw you posted the link to the sources... Yep, that's the one. That's the one I implemented with the lexer.
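For reference, as I read the linked pyspark source, `_parse_datatype_string` tries three syntaxes in order: a column list ("name type, name type"), then a bare datatype ("int", "struct<...>"), then the string wrapped in `struct<...>` for the "name: type, name: type" form, re-raising the first error if all three fail. A minimal sketch of that fallback order follows; `toy_parse_schema` and `toy_parse_datatype` are hypothetical stand-ins for the real parsers and only understand flat fields of a few types.

```python
SIMPLE_TYPES = {"int", "string", "date"}  # illustrative subset only


def toy_parse_schema(text):
    """Stand-in for the DDL schema parser: accepts only "name type, name type"."""
    fields = []
    for part in text.split(","):
        pieces = part.split()
        if (len(pieces) != 2 or not pieces[0].isidentifier()
                or pieces[1] not in SIMPLE_TYPES):
            raise ValueError(f"not a column list: {text!r}")
        fields.append((pieces[0], pieces[1]))
    return ("struct", fields)


def toy_parse_datatype(text):
    """Stand-in for the datatype parser: a simple type or a flat struct<name: type>."""
    text = text.strip()
    if text in SIMPLE_TYPES:
        return text
    if text.startswith("struct<") and text.endswith(">"):
        fields = []
        for part in text[len("struct<"):-1].split(","):
            name, separator, type_name = part.partition(":")
            if not separator or type_name.strip() not in SIMPLE_TYPES:
                raise ValueError(f"bad struct field: {part!r}")
            fields.append((name.strip(), type_name.strip()))
        return ("struct", fields)
    raise ValueError(f"not a datatype: {text!r}")


def parse_datatype_string(text, parse_schema, parse_datatype):
    """Try the three syntaxes in order; report the first failure if none match."""
    try:
        return parse_schema(text)
    except Exception as first_error:
        try:
            return parse_datatype(text)
        except Exception:
            try:
                return parse_datatype("struct<%s>" % text.strip())
            except Exception:
                raise first_error


for candidate in ("a int, b string", "int", "a: int, b: string"):
    print(candidate, "->",
          parse_datatype_string(candidate, toy_parse_schema, toy_parse_datatype))
```

Passing the two parsers in as arguments keeps the fallback logic separate from whichever lexer-based parser ends up doing the real work.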

How should I proceed here?

tools4origins · 2021-02-28T15:49:54Z

pysparkling/sql/tests/test_types_parser.py

+
+
+def test_void():
+    assert parser.parse("void") == NullType()


Where is this from? It fails in pyspark:

>>> spark.createDataFrame([], "id void")
[...]
pyspark.sql.utils.ParseException:
DataType void is not supported.(line 1, pos 3)

== SQL ==
id void
---^^^

https://github.com/apache/spark/blob/54c053afb0c9d3fcc7ac311100c8db9deeb163c0/sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/parser/DataTypeParserSuite.scala#L64

@since

* Added @since and @until decorators. These are very useful when implementing methods in pysparkling. Example: "weekday" exists since 2.4.0, so you can decorate the method/class with @since('2.4.0'). In your pysparkling program you can then set `pysparkling.config.spark_version = '2.3.2'`, and if "weekday" is used anywhere in the code, it raises the `NotSupportedByThisSparkVersion` exception, correctly failing your pysparkling program.
* Improved tests.
* Fix import order.
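The version-gating decorators described in this commit message could look roughly like the sketch below. It is not the code from the commit: the `config` class stands in for `pysparkling.config`, `_as_tuple` is a hypothetical helper, only plain functions (not classes) are handled, and the meaning of `until` (available strictly before the given version) is an assumption.

```python
import datetime
import functools


class NotSupportedByThisSparkVersion(Exception):
    """Raised when a feature is used outside its supported Spark versions."""


class config:
    # Stand-in for pysparkling.config; the default value is an assumption.
    spark_version = "3.0.0"


def _as_tuple(version):
    # "2.4.0" -> (2, 4, 0), so versions compare numerically rather than as text.
    return tuple(int(part) for part in version.split("."))


def since(version):
    """Allow the call only if the configured Spark version is >= `version`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Checked at call time, so changing config.spark_version takes effect.
            if _as_tuple(config.spark_version) < _as_tuple(version):
                raise NotSupportedByThisSparkVersion(
                    f"{func.__name__} requires Spark {version} or later"
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator


def until(version):
    """Allow the call only if the configured Spark version is < `version`."""
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            if _as_tuple(config.spark_version) >= _as_tuple(version):
                raise NotSupportedByThisSparkVersion(
                    f"{func.__name__} is not available from Spark {version} on"
                )
            return func(*args, **kwargs)
        return wrapper
    return decorator


@since("2.4.0")
def weekday(date):
    return date.weekday()


config.spark_version = "2.3.2"
try:
    weekday(datetime.date(2021, 2, 27))
except NotSupportedByThisSparkVersion as error:
    print("rejected:", error)

config.spark_version = "2.4.0"
print(weekday(datetime.date(2021, 2, 27)))
```

Comparing versions as integer tuples avoids the string-comparison trap where "2.10.0" would sort before "2.4.0".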

svaningelgem added 2 commits February 27, 2021 21:06

fromDDL reworked, immediately implemented all possible "fromDDL" possibilities :-)

da1d047

Forgot dependency.

20b7e03

tools4origins reviewed Feb 28, 2021


svaningelgem added 2 commits March 5, 2021 15:34

Merge branch 'master' into fromDDL_rework

f24f014

svaningelgem closed this Mar 5, 2021

svaningelgem deleted the fromDDL_rework branch March 5, 2021 14:39
