Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
231 changes: 231 additions & 0 deletions docs/window-function-cte-implementation.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,231 @@
# Window Functions and CTE Implementation for StackQL Parser

## Overview

This document describes the experimental implementation of SQL window functions and CTEs (Common Table Expressions) in the stackql-parser fork. This is a proof-of-concept implementation to validate the approach before implementing it properly in the main stackql-parser repository.

## Summary of Changes

### Files Modified

The following files in `internal/stackql-parser-fork/go/vt/sqlparser/` were modified:

1. **ast.go** - Added AST types for window functions and CTEs
2. **sql.y** - Added grammar rules for parsing
3. **token.go** - Added keyword mappings
4. **constants.go** - Added constants for frame types
5. **external_visitor.go** - Added Accept methods for new types

### New Test Files Created

- `window_test.go` - Unit tests for window function parsing
- `cte_test.go` - Unit tests for CTE parsing

## Implementation Details

### Window Functions

#### AST Types Added (ast.go)

```go
// OverClause represents an OVER clause for window functions
OverClause struct {
WindowName ColIdent
WindowSpec *WindowSpec
}

// WindowSpec represents a window specification
WindowSpec struct {
PartitionBy Exprs
OrderBy OrderBy
Frame *FrameClause
}

// FrameClause represents a frame clause (ROWS/RANGE)
FrameClause struct {
Unit string // ROWS or RANGE
Start *FramePoint
End *FramePoint
}

// FramePoint represents a frame boundary
FramePoint struct {
Type string // UNBOUNDED PRECEDING, CURRENT ROW, etc.
Expr Expr // for N PRECEDING or N FOLLOWING
}
```

The `FuncExpr` struct was extended with an `Over *OverClause` field.

#### Grammar Rules Added (sql.y)

- `over_clause_opt` - Optional OVER clause after function calls
- `window_spec` - Window specification (PARTITION BY, ORDER BY, frame)
- `partition_by_opt` - Optional PARTITION BY clause
- `frame_clause_opt` - Optional frame specification (ROWS/RANGE)
- `frame_point` - Frame boundary points

#### Tokens Added

- `OVER`, `ROWS`, `RANGE`, `UNBOUNDED`, `PRECEDING`, `FOLLOWING`, `CURRENT`, `ROW`

#### Supported Syntax

```sql
-- Simple window function
SELECT SUM(count) OVER () FROM t

-- With ORDER BY
SELECT RANK() OVER (ORDER BY count DESC) FROM t

-- With PARTITION BY
SELECT SUM(count) OVER (PARTITION BY category) FROM t

-- With PARTITION BY and ORDER BY
SELECT SUM(count) OVER (PARTITION BY category ORDER BY name) FROM t

-- With frame clause
SELECT SUM(count) OVER (ORDER BY id ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) FROM t

-- Multiple window functions
SELECT SUM(x) OVER (), COUNT(*) OVER (ORDER BY y) FROM t

-- Window functions with aggregates
SELECT serviceName, COUNT(*) as count, SUM(COUNT(*)) OVER () as total FROM t GROUP BY serviceName
```

### CTEs (Common Table Expressions)

#### AST Types Added (ast.go)

```go
// With represents a WITH clause (CTE)
With struct {
Recursive bool
CTEs []*CommonTableExpr
}

// CommonTableExpr represents a single CTE definition
CommonTableExpr struct {
Name TableIdent
Columns Columns
Subquery *Subquery
}
```

The `Select` struct was extended with a `With *With` field.

#### Grammar Rules Added (sql.y)

- `cte_list` - List of CTE definitions
- `cte` - Single CTE definition
- Extended `base_select` with WITH clause alternatives

#### Tokens Added

- `RECURSIVE`

#### Supported Syntax

```sql
-- Simple CTE
WITH cte AS (SELECT id FROM t) SELECT * FROM cte

-- CTE with column list
WITH cte (col1, col2) AS (SELECT id, name FROM t) SELECT * FROM cte

-- Multiple CTEs
WITH cte1 AS (SELECT id FROM t1), cte2 AS (SELECT id FROM t2) SELECT * FROM cte1 JOIN cte2

-- Recursive CTE
WITH RECURSIVE cte AS (SELECT 1 AS n UNION ALL SELECT n + 1 FROM cte WHERE n < 10) SELECT * FROM cte

-- CTE with window function
WITH sales AS (SELECT product, amount FROM orders)
SELECT product, SUM(amount) OVER (ORDER BY product) FROM sales
```

## Key Design Decisions

### Window Functions

1. **OVER clause placement**: Added `over_clause_opt` to the `function_call_generic` rule to allow OVER on any generic function call.

2. **Frame specification**: Supports both ROWS and RANGE frame types with:
- UNBOUNDED PRECEDING
- UNBOUNDED FOLLOWING
- CURRENT ROW
- N PRECEDING
- N FOLLOWING

3. **Named windows**: The grammar supports `OVER window_name` syntax for referencing named windows (though WINDOW clause definition is not yet implemented).

### CTEs

1. **Grammar approach**: Instead of using an optional `with_clause_opt` rule that includes an empty alternative (which caused grammar conflicts), we directly added WITH alternatives to the `base_select` rule.

2. **Recursive CTEs**: Supported via the `WITH RECURSIVE` syntax.

3. **Column lists**: Optional column list specification for CTEs is supported.

## Parser Conflicts

The implementation increases reduce/reduce conflicts from 461 to 464. This is acceptable for an experimental implementation.

## Testing

### Unit Tests

All parser unit tests pass:
- 8 window function tests
- 5 CTE tests

### Running Tests

```bash
cd /home/user/stackql-devel/internal/stackql-parser-fork/go/vt/sqlparser
go test -run "TestWindowFunctions|TestCTEs" -v
```

## Next Steps for Production Implementation

1. **Upstream the changes**: Apply these changes to the main `stackql-parser` repository.

2. **Execution layer**: Implement window function and CTE execution in the SQLite backend:
- SQLite already supports window functions and CTEs natively
- Need to ensure the parsed AST is correctly converted to SQL for execution

3. **Named Windows**: Add support for the `WINDOW` clause to define named windows:
```sql
SELECT SUM(x) OVER w FROM t WINDOW w AS (PARTITION BY y ORDER BY z)
```

4. **Additional window functions**: The parser supports any function name with OVER. Consider adding specific handling for:
- ROW_NUMBER()
- RANK()
- DENSE_RANK()
- LEAD()
- LAG()
- FIRST_VALUE()
- LAST_VALUE()
- NTH_VALUE()
- NTILE()

5. **Robot tests**: Add integration tests that verify window functions and CTEs work end-to-end with actual cloud provider data.

## Known Limitations

1. **No execution support**: This implementation only adds parsing support. The execution layer still needs to be updated to handle window functions and CTEs.

2. **Pre-existing test failures**: The parser has some pre-existing test failures unrelated to window functions/CTEs (table name quoting, OR operator rendering). These should be addressed separately.

3. **Build complexity**: The local fork approach with replace directive in go.mod can cause directory conflicts when building the main stackql binary. For production, the changes should be upstreamed to stackql-parser.

## Files to Review

- `internal/stackql-parser-fork/go/vt/sqlparser/ast.go` - AST type definitions
- `internal/stackql-parser-fork/go/vt/sqlparser/sql.y` - Grammar rules (lines ~3074-3410)
- `internal/stackql-parser-fork/go/vt/sqlparser/token.go` - Keyword mappings
- `internal/stackql-parser-fork/go/vt/sqlparser/constants.go` - Frame type constants
- `internal/stackql-parser-fork/go/vt/sqlparser/window_test.go` - Window function tests
- `internal/stackql-parser-fork/go/vt/sqlparser/cte_test.go` - CTE tests
2 changes: 2 additions & 0 deletions go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -142,3 +142,5 @@ require (
replace github.com/chzyer/readline => github.com/stackql/readline v0.0.2-alpha05

replace github.com/mattn/go-sqlite3 => github.com/stackql/stackql-go-sqlite3 v1.0.4-stackql

replace github.com/stackql/stackql-parser => ./internal/stackql-parser-fork
44 changes: 44 additions & 0 deletions internal/stackql-parser-fork/.codeclimate.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,44 @@
engines:
gofmt:
enabled: true
golint:
enabled: true
govet:
enabled: true
shellcheck:
enabled: true
duplication:
enabled: false

ratings:
paths:
- "**.go"
- "**.sh"

checks:
argument-count:
enabled: false
complex-logic:
enabled: false
file-lines:
enabled: false
method-complexity:
enabled: false
method-count:
enabled: false
method-lines:
enabled: false
nested-control-flow:
enabled: false
return-statements:
enabled: false
similar-code:
enabled: false
identical-code:
enabled: false

# Ignore generated code.
exclude_paths:
- "go/vt/proto/"
- "go/vt/sqlparser/sql.go"
- "py/util/grpc_with_metadata.py"
11 changes: 11 additions & 0 deletions internal/stackql-parser-fork/.dockerignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,11 @@
Godeps/_workspace/pkg
Godeps/_workspace/bin
_test
java/*/target
java/*/bin
php/vendor
releases
/dist/
/vthook/
/bin/
/vtdataroot/
6 changes: 6 additions & 0 deletions internal/stackql-parser-fork/.github/CODEOWNERS
Original file line number Diff line number Diff line change
@@ -0,0 +1,6 @@
* @sougou

/docker/ @derekperkins @dkhenry
/helm/ @derekperkins @dkhenry
/config/mycnf/ @morgo
/go/vt/mysqlctl/mysqld.go @morgo
68 changes: 68 additions & 0 deletions internal/stackql-parser-fork/.github/ISSUE_TEMPLATE/bug_report.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
---
name: Bug Report
about: You're experiencing an issue with Vitess that is different than the documented behavior.
---

When filing a bug, please include the following headings if
possible. Any example text in this template can be deleted.

#### Overview of the Issue

A paragraph or two about the issue you're experiencing.

#### Reproduction Steps

Steps to reproduce this issue, example:

1. Deploy the following `vschema`:

```javascript
{
"sharded": true,
"vindexes": {
"hash": {
"type": "hash"
},
"tables": {
"user": {
"column_vindexes": [
{
"column": "user_id",
"name": "hash"
}
]
}
}
}
```

1. Deploy the following `schema`:

```sql
create table user(user_id bigint, name varchar(128), primary key(user_id));
```

1. Run `SELECT...`
1. View error

#### Binary version
Example:

```sh
giaquinti@workspace:~$ vtgate --version
Version: a95cf5d (Git branch 'HEAD') built on Fri May 18 16:54:26 PDT 2018 by giaquinti@workspace using go1.10 linux/amd64
```

#### Operating system and Environment details

OS, Architecture, and any other information you can provide
about the environment.

- Operating system (output of `cat /etc/os-release`):
- Kernel version (output of `uname -sr`):
- Architecture (output of `uname -m`):

#### Log Fragments

Include appropriate log fragments. If the log is longer than a few dozen lines, please
include the URL to the [gist](https://gist.github.com/) of the log instead of posting it in the issue.
Original file line number Diff line number Diff line change
@@ -0,0 +1,14 @@
---
name: Feature Request
about: If you have something you think Vitess could improve or add support for.
---

Please search the existing issues for relevant feature requests, and use the [reaction feature](https://blog.github.com/2016-03-10-add-reactions-to-pull-requests-issues-and-comments/) to add upvotes to pre-existing requests.

#### Feature Description

A written overview of the feature.

#### Use Case(s)

Any relevant use-cases that you see.
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
---
name: Question
about: If you have a question, please check out our other community resources instead of opening an issue.
---

Issues on GitHub are intended to be related to bugs or feature requests, so we recommend using our other community resources instead of asking here.

- [Vitess User Guide](https://vitess.io/user-guide/introduction/)
- Any other questions can be asked in the community [Slack workspace](https://vitess.io/slack)
Loading
Loading