Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SQL style guide #133

Merged
merged 2 commits into from
May 17, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 2 additions & 0 deletions .spelling
Original file line number Diff line number Diff line change
Expand Up @@ -24,6 +24,7 @@ blocklist
boolean
booleans
bugzilla
CamelCase
CEP
changesets
chromehangs
Expand Down Expand Up @@ -93,6 +94,7 @@ NumPy
ORM-like
parquet2hive
performant
Pep8
Pingsender
Plotly
Postgres
Expand Down
1 change: 1 addition & 0 deletions SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
* [Event Pipeline Detail](/concepts/pipeline/event_pipeline.md)
* [Analysis Interfaces](tools/interfaces.md)
* [Custom analysis with Spark](tools/spark.md)
* [SQL Style Guide](concepts/sql_style.md)
* [Cookbooks](cookbooks/README.md)
* [Alerts](tools/alerts.md)
* [Working with Parquet Data on ATMO Clusters](cookbooks/parquet.md)
Expand Down
261 changes: 261 additions & 0 deletions concepts/sql_style.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,261 @@
# SQL Style Guide

## Table of Contents

<!-- toc -->

## Consistency

From [Pep8](https://www.python.org/dev/peps/pep-0008/#a-foolish-consistency-is-the-hobgoblin-of-little-minds):

> A style guide is about consistency.
> Consistency with this style guide is important.
> Consistency within a project is more important.
> Consistency within one module or function is the most important.
>
> However, know when to be inconsistent --
> sometimes style guide recommendations just aren't applicable.
> When in doubt, use your best judgment.
> Look at other examples and decide what looks best.
> And don't hesitate to ask!

## Reserved Words

Always use uppercase for reserved keywords like `SELECT`, `WHERE`, or `AS`.

## Variable Names

1. Use consistent and descriptive identifiers and names.
1. Use lower case names with underscores, such as `first_name`.
Do not use CamelCase.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: surround CamelCase with backticks

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added this to .spelling instead.

1. Presto functions, such as `cardinality`, `approx_distinct`, or `substr`,
[are identifiers](https://www.postgresql.org/docs/10/static/sql-syntax-lexical.html#SQL-SYNTAX-IDENTIFIERS)
and should be treated like variable names.
1. Names must begin with a letter and may not end in an underscore.
1. Only use letters, numbers, and underscores in variable names.

## Aliasing

Always include the `AS` keyword when aliasing a variable,
it's easier to read when explicit.

**Good**
```sql
SELECT
SUBSTR(submission_date, 1, 6) AS month
FROM
main_summary
LIMIT
10
```

**Bad**
```sql
SELECT
SUBSTR(submission_date, 1, 6) month
FROM
main_summary
LIMIT
10
```

## Left Align Root Keywords

Root keywords should all start on the same character boundary.
This is counter to the common "rivers" pattern
[described here](https://www.sqlstyle.guide/#spaces).

**Good**:

```sql
SELECT
client_id,
submission_date
FROM
main_summary
WHERE
sample_id = '42'
AND submission_date > '20180101'
LIMIT
10
```

**Bad**:

```sql
SELECT client_id,
submission_date
FROM main_summary
WHERE sample_id = '42'
AND submission_date > '20180101'
```

## Code Blocks

Root keywords should be on their own line.
For example:

**Good**:
```sql
SELECT
client_id,
submission_date
FROM
main_summary
WHERE
submission_date > '20180101'
AND sample_id = '42'
LIMIT
10
```

It's acceptable to include an argument on the same line as the root keyword,
if there is exactly one argument.

**Acceptable**:
```sql
SELECT
client_id,
submission_date
FROM main_summary
WHERE
submission_date > '20180101'
AND sample_id = '42'
LIMIT 10
```

Do not include multiple arguments on one line.

**Bad**:
```sql
SELECT client_id, submission_date
FROM main_summary
WHERE
submission_date > '20180101'
AND sample_id = '42'
LIMIT 10
```

**Bad**
```sql
SELECT
client_id,
submission_date
FROM main_summary
WHERE submission_date > '20180101'
AND sample_id = '42'
LIMIT 10
```

## Parentheses

If parentheses span multiple lines:

1. The opening parenthesis should terminate the line.
1. The closing parenthesis should be lined up under
the first character of the line that starts the multi-line construct.
1. The contents of the parentheses should be indented one level.

For example:

**Good**
```sql
WITH sample AS (
SELECT
client_id,
FROM
main_summary
WHERE
sample_id = '42'
)
```

**Bad** (Terminating parenthesis on shared line)
```sql
WITH sample AS (
SELECT
client_id,
FROM
main_summary
WHERE
sample_id = '42')
```

**Bad** (No indent)
```sql
WITH sample AS (
SELECT
client_id,
FROM
main_summary
WHERE
sample_id = '42'
)
```

## Boolean at the Beginning of Line

`AND` and `OR` should always be at the beginning of the line.
For example:

**Good**
```sql
...
WHERE
submission_date > 20180101
AND sample_id = '42'
```

**Bad**
```sql
...
WHERE
submission_date > 20180101 AND
sample_id = '42'
```

## Nested Queries

Do not use nested queries.
Instead, use common table expressions to improve readability.

**Good**:
```sql
WITH sample AS (
SELECT
client_id,
submission_date
FROM
main_summary
WHERE
sample_id = '42'
)

SELECT *
FROM sample
LIMIT 10
```

**Bad**:
```sql
SELECT *
FROM (
SELECT
client_id,
submission_date
FROM
main_summary
WHERE
sample_id = '42'
)
LIMIT 10
```

## About this Document

This document was heavily influenced by https://www.sqlstyle.guide/

Changes to the style guide should be reviewed by at least one member of
both the Data Engineering team and the Data Science team.