> 🚧 This is in active development and not ready for use yet.
# postgres_lsp
A Language Server for Postgres. Not SQL with flavours, just Postgres.
## Rationale
Despite the ever-rising popularity of Postgres, support for the PostgreSQL language in the IDE or editor of your choice is still very sparse. There are a few proprietary tools (e.g. [DataGrip](https://www.jetbrains.com/datagrip/)) that work well, but they are only available within the respective IDE. Open-source attempts (e.g. [sql-language-server](https://github.com/joe-re/sql-language-server), [pgFormatter](https://github.com/darold/pgFormatter/tree/master), [sql-parser-cst](https://github.com/nene/sql-parser-cst)) mostly try to provide a generic SQL language server and implement the Postgres syntax only as a flavor of their parser. This always falls short due to the ever-evolving and complex syntax of PostgreSQL. This project only ever wants to support PostgreSQL, and leverages parts of the PostgreSQL server source (see [libpg_query](https://github.com/pganalyze/libpg_query)) to parse the source code reliably. This is slightly crazy, but it is the only reliable way of parsing all valid PostgreSQL queries. You can find a longer rationale on why This is the way™ [here](https://pganalyze.com/blog/parse-postgresql-queries-in-ruby). Of course, libpg_query was built to execute SQL, not to build a language server, but all of the resulting shortcomings were successfully mitigated in the [`parser`](./crates/parser/src/lib.rs) crate.
Once the parser is stable, and a robust and scalable data model is implemented, the language server will not only provide basic features such as semantic highlighting, code completion and syntax error diagnostics, but also serve as the user interface for all the great tooling of the Postgres ecosystem.
## Roadmap
At this point, however, this is merely a proof of concept for building both a concrete syntax tree and an abstract syntax tree from potentially malformed PostgreSQL source code. The `postgres_lsp` crate was created only to prove that it works end-to-end, and is just a very basic language server with semantic highlighting and error diagnostics. Before actual feature development can start, we have to do a bit of groundwork.
1. _Finish the parser_
   - The parser works, but the enum values for all the different syntax elements and internal conversions are manually written or copied and, in some places, only cover the few elements required for a simple select statement. To get full coverage without the possibility of copy-and-paste errors, they should be generated from the pg_query.rs source code.
   - There are a few cases, such as nested and named dollar-quoted strings, that cause the parser to fail due to limitations of the regex-based lexer. Nothing that is impossible to fix or requires a change in approach, though.
2. _Implement a robust and scalable data model_
   - TODO
3. _Set up the language server properly_
   - TODO
4. _Implement basic language server features_
   - Semantic Highlighting
   - Syntax Error Diagnostics
   - Show SQL comments on hover
   - Auto-Completion
   - Code Actions, such as `Execute the statement under the cursor` or `Execute the current file`
   - ... anything you can think of really
5. _Integrate all the existing open-source tooling_
   - Show migration file lint errors from [squawk](https://github.com/sbdchd/squawk)
   - Show plpgsql lint errors from [plpgsql_check](https://github.com/okbob/plpgsql_check)
6. _Build missing pieces_
   - An opinionated code formatter (think Prettier, for PostgreSQL)
7. _(Maybe) Support advanced features with declarative schema management_
   - Jump to definition
   - ... anything you can think of really
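An aside on the lexer limitation mentioned in step 1: a named dollar-quoted string such as `$tag$ ... $tag$` cannot be matched by a single regular expression without backreferences, which the Rust `regex` crate intentionally does not support, yet a small hand-rolled scanner handles it easily. A minimal sketch (the helper is illustrative only and not part of this repository):

```rust
/// Given `input` starting at a `$`, return the byte length of the full
/// dollar-quoted string (opening delimiter, body, closing delimiter), or
/// `None` if the string is unterminated. Hypothetical helper; the tag rules
/// are simplified compared to the actual PostgreSQL lexer.
fn dollar_quoted_len(input: &str) -> Option<usize> {
    // The opening delimiter is `$tag$`, where `tag` may be empty.
    let rest = input.strip_prefix('$')?;
    let tag_len = rest
        .find('$')
        .filter(|&i| rest[..i].chars().all(|c| c.is_alphanumeric() || c == '_'))?;
    let delim = &input[..tag_len + 2]; // e.g. "$outer$" or "$$"
    // The string ends only at the next occurrence of the *same* delimiter,
    // which is exactly what a backreference-free regex cannot express.
    let body_start = delim.len();
    let end = input[body_start..].find(delim)?;
    Some(body_start + end + delim.len())
}

fn main() {
    // Nested dollar quotes with different tags, as common in plpgsql bodies.
    let sql = "$outer$ select $inner$ text $inner$ $outer$ trailing";
    let len = dollar_quoted_len(sql).unwrap();
    assert_eq!(&sql[len..], " trailing");
    // An unterminated string is reported rather than mis-matched.
    assert_eq!(dollar_quoted_len("$tag$ no end"), None);
}
```

Because the closing delimiter must repeat the opening tag verbatim, nested quotes with different tags fall out of this scanner naturally.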
## Acknowledgments
- [rust-analyzer](https://github.com/rust-lang/rust-analyzer) for implementing such a robust, well documented, and feature-rich language server. Great place to learn from.
- [squawk](https://github.com/sbdchd/squawk) and [pganalyze](https://pganalyze.com) for inspiring the use of libpg_query.
`crates/parser/src/lib.rs` (14 additions, 54 deletions):

@@ -1,59 +1,19 @@
-//! The SQL parser.
+//! The Postgres parser.
 //!
+//! This crate provides a parser for the Postgres SQL dialect.
+//! It is based on the pg_query.rs crate, which is a wrapper around the PostgreSQL query parser.
+//! The main `Parser` struct parses a source file and individual statements.
+//! The `Parse` struct contains the resulting concrete syntax tree, syntax errors, and the abstract syntax tree, which is a list of pg_query statements and their positions.
 //!
-//
-// TODO: implement parser similarly to rust_analyzer
-// result is a stream of events (including errors) and a list of errors
-//
-//
-//
-// we can use Vec::new() in constructor and then set nodes in parse() if parsing was successful
-//
-//
-//
-//
-// differences to rust_analyzer
-// 1.
-// since we always have to parse just text, there is no need to have lexer and parser separated
-// input of the parser is a string and we always parse the full string
-// syntax crate does not know about lexers and their tokens
-// --> input is just a string
-// 2.
-// in rust_analyzer, the output is just a stream of 32-bit encoded events WITHOUT the text
-// again, this extra layer of abstraction is not necessary for us, since we always parse text
-// the output of the parser is pretty much the same as the input but with nodes
-// --> the parser takes a fn that is called for every node and token to build the tree
-// so we skip the intermediate list of events and just build the tree directly
-// we can define a trait that is implemented by the GreenNodeBuilder
-//
-//
-// SyntaxNode in the syntax crate is just the SyntaxKind from the parser
-// the cst is built with the SyntaxKind type
-// in the syntax crate, the SyntaxTreeBuilder is created and the events are fed into it to build
-// the tree
-//
-//
-// how does rust_analyzer know what parts of the text are an error?
-// errors are not added to the tree in SyntaxTreeBuilder, which means the tokens must include the
-// erroneous parts of the text
-// but the parser output does not include text, so how can the cst have the correct text?
-// easy: the tokenizer runs beforehand, so we always have the tokens, and the errors are just
-// added afterwards when parsing the tokens using the grammar.
-// so there is a never-failing tokenizer step which is followed by a parser that knows the
-// grammar and emits errors
-// --> we will do the same, but with a multi-step tokenizer and a parser that falls back to simpler
-// and simpler tokens
-//
-//
-// the api has to cover parsing a source file and parsing a statement
-//
-//
-// we will also have to add a cache for pg_query parsing results using fingerprinting
-//
-// all parsers can be just a function that iterates the base lexer
-// so we will have a `parse_statement` and a `parse_source_file` function
-// the tree always covers all text since we use the scan tokens and, if failing, the StatementTokens
-// errors are added to a list, and are not part of the tree
+//! The idea is to offload the heavy lifting to the same parser that the PostgreSQL server uses,
+//! and just fill in the gaps to be able to build both a cst and an ast from a source file that
+//! potentially contains erroneous statements.
+//!
+//! The main drawbacks of the PostgreSQL query parser mitigated by this parser are:
+//! - it only parses a full source text; if there is any syntax error in the file, it will not parse anything and return an error.
+//! - it does not parse whitespaces and newlines, so it is not possible to build a concrete syntax tree.
+//!
+//! To see how these drawbacks are mitigated, see the `statement.rs` and the `source_file.rs` modules.
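The mitigation for the first drawback can be sketched as follows: split the source into individual statements first, then hand each one to the backing parser on its own, so a syntax error in one statement does not prevent the rest of the file from being parsed. A deliberately simplified illustration (a real splitter must also respect string literals, dollar quotes, and comments; `parse_one` is a stand-in for the pg_query call, not the actual API):

```rust
/// Split source text on semicolons. Deliberately naive: a real implementation
/// must skip semicolons inside strings, dollar quotes, and comments.
/// Illustrative only, not the postgres_lsp implementation.
fn split_statements(source: &str) -> Vec<&str> {
    source
        .split(';')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .collect()
}

/// Stand-in for the backing parser: accepts a statement or reports an error.
fn parse_one(stmt: &str) -> Result<&str, String> {
    if stmt.to_lowercase().starts_with("select") {
        Ok(stmt)
    } else {
        Err(format!("syntax error in: {stmt}"))
    }
}

fn main() {
    let source = "select 1; this is not valid sql; select 2";
    // Parsing per statement keeps the good statements even when one fails;
    // parsing the whole string at once would return a single error instead.
    let results: Vec<Result<&str, String>> =
        split_statements(source).into_iter().map(parse_one).collect();
    assert_eq!(results[0], Ok("select 1"));
    assert!(results[1].is_err());
    assert_eq!(results[2], Ok("select 2"));
}
```

The per-statement errors can then be attached to their source positions, which is exactly what a language server needs for diagnostics.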