quill-spark: Group by with multiple columns in group by fails #1023

Open
grantnicholas opened this Issue Feb 8, 2018 · 0 comments


grantnicholas commented Feb 8, 2018

Version: "io.getquill" %% "quill-spark" % "2.3.2" (with Scala 2.11.8)
Module: quill-spark

Expected behavior

Grouping by multiple columns should generate a valid Spark SQL statement.

Actual behavior

Grouping by multiple columns generates invalid Spark SQL.

Steps to reproduce the behavior

run{
  liftQuery(records).groupBy(rec => (rec.col_one, rec.col_two)).map(tup => {
    val ((col_one, col_two), recs) = tup
    (col_one, col_two, recs.size)
  })
}

generated SQL:

SELECT rec.col_one _1, rec.col_two _2, COUNT(*) _3 FROM (ds1) rec GROUP BY (rec.col_one, rec.col_two)

The Spark SQL grammar does not support parentheses around the GROUP BY arguments. When the parser tries to parse this query, you get an error like the following:

org.apache.spark.sql.AnalysisException: expression 'rec.`col_one`' is neither present in the group by, nor is it an aggregate function.

If you copy-paste this query and remove the parentheses around the GROUP BY columns, the query is valid and works correctly.
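For reference, this is the same query with the parentheses removed from the GROUP BY clause, i.e. the form that Spark's parser accepts (a manual rewrite of the generated SQL above, not output produced by quill-spark):

```sql
SELECT rec.col_one _1, rec.col_two _2, COUNT(*) _3 FROM (ds1) rec GROUP BY rec.col_one, rec.col_two
```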

@grantnicholas grantnicholas changed the title from Group by with multiple columns in group by fails to quill-spark: Group by with multiple columns in group by fails Feb 8, 2018

@fwbrasil fwbrasil added the bug label Feb 10, 2018

@mentegy mentegy added the easy label Feb 27, 2018
