parsing

Alexey Borzov edited this page Nov 29, 2021 · 3 revisions

Parsing the SQL

Parsing is done by the \sad_spirit\pg_builder\Parser class, backed by \sad_spirit\pg_builder\Lexer. The latter splits the SQL string into tokens; the former goes over the tokens and builds the Abstract Syntax Tree. This section covers using these classes; a separate section covers their implementation details.

Lexer class

The class has only one public method, tokenize(string $sql): \sad_spirit\pg_builder\TokenStream, which tokenizes the input string. You usually don't need to call it yourself, as Parser calls it automatically when a string is passed to any of its parseSomething() methods.

You may need to set options via Lexer's constructor, however:

  • 'standard_conforming_strings' - has the same meaning as the postgresql.conf parameter of the same name: when true (the default), backslashes in '...' strings are treated literally; when false, they are treated as escape characters. Backslashes in e'...' strings are always treated as escape characters.

use sad_spirit\pg_builder\Lexer;

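// Inside a heredoc each \\ yields one backslash, so the SQL passed to
// the lexer is: 'foo\\bar' e'foo\\bar'
$strings = <<<TEST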
'foo\\\\bar' e'foo\\\\bar'
TEST;

$lexerStandard = new Lexer([
    'standard_conforming_strings' => true
]);

$lexerNonStandard = new Lexer([
    'standard_conforming_strings' => false
]);

echo $lexerStandard->tokenize($strings)
     . "\n\n"
     . $lexerNonStandard->tokenize($strings);

will output

string literal 'foo\\bar' at position 0
string literal 'foo\bar' at position 11
end of input

string literal 'foo\bar' at position 0
string literal 'foo\bar' at position 11
end of input
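
The rule behind this output can be sketched in a few lines of plain PHP. The function below is a hypothetical helper written for illustration only; it is not part of pg_builder and handles just a handful of escapes (doubled quotes, octal, hex and Unicode escapes are left out). It assumes the literal's body has already been extracted from between the quotes.

```php
<?php
// Hypothetical helper, NOT part of pg_builder: returns the value that the
// body of a string literal represents under the given settings.
function literalValue(string $body, bool $isEString, bool $standardConformingStrings): string
{
    // Backslashes act as escapes in e'...' strings always, and in plain
    // '...' strings only when standard_conforming_strings is off.
    $backslashEscapes = $isEString || !$standardConformingStrings;
    if (!$backslashEscapes) {
        return $body;
    }

    // Only a few escapes are mapped to keep the sketch short; any other
    // escaped character (including the backslash) stands for itself.
    $map = ['n' => "\n", 't' => "\t", 'r' => "\r"];

    return preg_replace_callback(
        '/\\\\(.)/s',
        static fn (array $m): string => $map[$m[1]] ?? $m[1],
        $body
    );
}

// In a single-quoted PHP string \\\\ is two backslashes: foo\\bar
$body = 'foo\\\\bar';

echo literalValue($body, false, true), PHP_EOL;  // foo\\bar ('...', setting on)
echo literalValue($body, true, true), PHP_EOL;   // foo\bar  (e'...', setting on)
echo literalValue($body, false, false), PHP_EOL; // foo\bar  ('...', setting off)
```

The three calls mirror the cases in the example above: only a plain '...' literal with standard_conforming_strings enabled keeps both backslashes.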

Parser class

The Parser constructor accepts an instance of Lexer and an instance of a class implementing CacheItemPoolInterface from PSR-6.

A cache implementation can also be set later with the setCache(CacheItemPoolInterface $cache): void method.

parseSomething() methods

All Parser methods that have the parse prefix and process (parts of) SQL statements are overloaded via the __call() magic method. That method contains the code for calling Lexer and using the cache, if available, and then calls a protected method that does the actual parsing work. A couple of such methods:

Other methods are used internally by Node subclasses that accept strings for their properties or array offsets.
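
The __call() dispatch described above can be pictured with a small self-contained sketch. Everything in it is hypothetical: SketchParser, its doParse... naming scheme, the whitespace "tokenizer" and the in-memory array standing in for a PSR-6 pool are assumptions made for illustration, not the actual pg_builder code.

```php
<?php
// Hypothetical stand-in for illustration only; the real Parser differs.
class SketchParser
{
    /** @var array<string, mixed> in-memory stand-in for a PSR-6 cache pool */
    private array $cache = [];

    /** Counts how often real parsing work was done */
    public int $parseCount = 0;

    public function __call(string $name, array $arguments)
    {
        // Public name parseFoo() maps to the protected method doParseFoo()
        $method = 'do' . ucfirst($name);
        if (0 !== strpos($name, 'parse') || !method_exists($this, $method)) {
            throw new BadMethodCallException("Undefined method '{$name}'");
        }

        $input = (string)($arguments[0] ?? '');
        $key   = $name . '-' . md5($input);
        if (!array_key_exists($key, $this->cache)) {
            // Tokenize first, then hand the tokens to the protected method
            $this->cache[$key] = $this->$method($this->tokenize($input));
        }

        return $this->cache[$key];
    }

    /** Toy tokenizer: splits on whitespace instead of doing real lexing */
    private function tokenize(string $sql): array
    {
        return preg_split('/\s+/', trim($sql), -1, PREG_SPLIT_NO_EMPTY);
    }

    /** The "actual parsing work": here it only lowercases the tokens */
    protected function doParseWordList(array $tokens): array
    {
        $this->parseCount++;
        return array_map('strtolower', $tokens);
    }
}

$parser = new SketchParser();
$first  = $parser->parseWordList('SELECT Foo FROM Bar');
$second = $parser->parseWordList('SELECT Foo FROM Bar'); // served from the cache

var_dump($first === $second, $parser->parseCount); // bool(true), int(1)
```

Routing every public parse call through one magic method keeps tokenizing and caching in a single place, so the protected methods only need to deal with tokens.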