Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Start a SQL style guide #9

Closed
harterrt opened this issue Feb 14, 2018 · 15 comments
Closed

Start a SQL style guide #9

harterrt opened this issue Feb 14, 2018 · 15 comments

Comments

@harterrt
Copy link
Contributor

No description provided.

@mreid-moz
Copy link
Contributor

  • SQL keywords (UPDATE, SELECT, AS, ...) should be uppercase.
  • SQL identifiers, including table names and field names, should be lowercase unless quoted*.

*different SQL variants use different quoting mechanisms :(

@chutten
Copy link

chutten commented Apr 5, 2018

  • Presto functions (COUNT, CARDINALITY, SUBSTR) should be uppercase

I'm happy to hear input on chutten style:

SELECT
    DATE_TRUNC('day', DATE_PARSE(field1, '%Y%m%d')) AS pretty_field1,
    field2 AS pretty_field2,
FROM table_name
WHERE
    field3 IS NOT NULL
    AND (
        field4 IN ('20180301', '20180302')
        OR SUBSTR(field5, 1, 1) = 'a'
    )
GROUP BY 1

Notable points I expect contention, and am willing to be swayed, on:

  • having table_name on the same line as FROM (which I defend because there's only one. If there are multiple table names, I would indent like SELECT and WHERE)
  • having GROUP BY 1 (this is because I don't want to write GROUP BY DATE_TRUNC('day', DATE_PARSE(field1, '%Y%m%d')) and have to change it every time I want to see what 'minute' or 'hour' might look.)
  • Putting AND and OR at the beginning of lines (I think they'd get lost at the end)
  • Where I put my parentheses for nested conditionals (It's essentially the one-true-brace style, but with rounder braces)

@harterrt
Copy link
Contributor Author

harterrt commented Apr 5, 2018

+1 to chutten style. Comments:

  1. Strongly agree: table_name on the same line as FROM. I argue that we should prefer explicit JOIN statements and shouldn't use multiple table names in a FROM statement
  2. I prefer GROUP BY 1, but I can see the argument for explicit column names in reviewed queries. I think we should allow GROUP BY 1
  3. Strongly agree with AND and OR at the beginning of the line. I can't see an argument for end of line. I'm open to allowing rivers in the WHERE statement, but my gut says we should outlaw rivers everywhere E.g.:
WHERE field1 IS NOT NULL
  AND field2 IS > 10
  1. Strongly agree with parens formatting.

@mreid-moz
Copy link
Contributor

  • Function names: I somewhat prefer lowercase
  • Table names: I prefer "always indented on next line", even if just for one table
  • GROUP BY 1: strongly agree (though nit in this case it should be GROUP BY 1, 2 to be valid)
  • AND and OR at the beginning of lines: I agree, but I kind of like the "rivers" alignment approach despised by @harterrt for this case.
  • parentheses for nested conditionals: 👍

@harterrt
Copy link
Contributor Author

harterrt commented Apr 5, 2018

@mreid-moz - any argument for indenting table names on the next line? I haven't seen this preference before. Does it add to readability? Make queries easier to write?

@mreid-moz
Copy link
Contributor

Definitely readability. You end up with the most important SQL keywords on their own lines which makes it very easy to scan for the logical part you're interested in:

SELECT
  ... some stuff
FROM
  ... some tables
WHERE
  ... some clauses
GROUP BY
  ... some groupings
HAVING
  ... more clauses

@fbertsch
Copy link

fbertsch commented Apr 5, 2018

  • Function names: We should match the documentation, which is lower case. This is not a strong preference.
  • Table names: always always indent on the next line
  • Group by: values always go on the next line, e.g.
GROUP BY
  field1,
  field2
  • Require JOIN rather than multiple table names

Open question: how to specify the join columns, e.g.

FROM
  table1 a
INNER JOIN
  table2 b
  ON a.id = b.a_id

@fbertsch
Copy link

fbertsch commented Apr 5, 2018

  • Disallow nested queries, and require CTEs
  • CTEs should be indented one level
  • Published queries should not have GROUP BY $numeric, opting instead for CTEs:
WITH sample AS (
  SELECT
      DATE_TRUNC('day', DATE_PARSE(field1, '%Y%m%d')) AS pretty_field1,
      field2 AS pretty_field2
  FROM
    table_name
  WHERE
      field3 IS NOT NULL
      AND (
          field4 IN ('20180301', '20180302')
          OR SUBSTR(field5, 1, 1) = 'a'
      )
)

SELECT
  pretty_field1,
  aggregate(pretty_field2)
FROM
  sample
GROUP BY
  pretty_field1

@fbertsch
Copy link

fbertsch commented Apr 6, 2018

Here is a list of reserved keywords which should always be capitalized: https://prestodb.io/docs/current/language/reserved.html

@chutten
Copy link

chutten commented Apr 6, 2018

Having to use a CTE just for sensible GROUP BY offends my sense of justice. GROUP BY should "just" freakin' take pretty_field1 as an acceptable identifier.

Also, I contend that CTEs make queries harder to understand, but that might just be me.

@mreid-moz
Copy link
Contributor

I don't think CTEs should be mandatory either, and IMO GROUP BY 1 makes maintenance easier in general.

@chutten
Copy link

chutten commented Apr 23, 2018

Seems as though conversation has died out. Shall we omit the controversial subjects (GROUP BY and CTEs) and publish so we don't lose momentum?

I'd like to reference a guide when people contribute SQL examples for docs.tmo

@harterrt
Copy link
Contributor Author

Sounds good to me. @chutten do you have time to take a first pass?

@harterrt
Copy link
Contributor Author

First pass, open for comments:

mozilla/data-docs#133

@harterrt
Copy link
Contributor Author

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants