New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Initial API for Scanner
.
#12
Conversation
Codecov Report
@@ Coverage Diff @@
## main #12 +/- ##
==========================================
- Coverage 91.37% 90.31% -1.07%
==========================================
Files 6 7 +1
Lines 58 258 +200
==========================================
+ Hits 53 233 +180
- Misses 5 25 +20
Continue to review full report at Codecov.
|
partiql-parser/src/peg.rs
Outdated
/// Extension methods for working with [`Pairs`]. | ||
pub(crate) trait PairsExt<'val, R: RuleType> { | ||
/// Consumes a [`Pairs`] as a singleton, returning an error if there are less or more than | ||
/// one [`Pair`]. | ||
fn exactly_one(self) -> ParserResult<Pair<'val, R>>; | ||
} | ||
|
||
impl<'val, R: RuleType> PairsExt<'val, R> for Pairs<'val, R> { | ||
fn exactly_one(mut self) -> ParserResult<Pair<'val, R>> { | ||
match self.next() { | ||
Some(pair) => { | ||
// make sure there isn't something more... | ||
if let Some(other_pair) = self.next() { | ||
syntax_error( | ||
format!("Expected one token pair, got: {:?}, {:?}", pair, other_pair), | ||
pair.start_loc().into(), | ||
)?; | ||
} | ||
Ok(pair) | ||
} | ||
None => syntax_error( | ||
"Expected at one token pair, got nothing!", | ||
Position::Unknown, | ||
), | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A successful parse from Pest
returns Pairs
which is an iterator of Pair
which itself is a terrible name for a rule that matched some span of text (I believe the term "pair" alluding to the start and end of a match).
A rule match returns zero or more Pair
, this API says, return an error unless it is exactly one Pair
.
partiql-parser/src/peg.rs
Outdated
pub(crate) trait PairExt<'val, R: RuleType> { | ||
/// Translates the start position of the [`Pair`] into a [`LineAndColumn`]. | ||
fn start_loc(&self) -> LineAndColumn; | ||
|
||
/// Translates the end position of the [`Pair`] into a [`LineAndColumn`]. | ||
fn end_loc(&self) -> LineAndColumn; | ||
} | ||
|
||
impl<'val, R: RuleType> PairExt<'val, R> for Pair<'val, R> { | ||
#[inline] | ||
fn start_loc(&self) -> LineAndColumn { | ||
self.as_span().start_pos().line_col().into() | ||
} | ||
|
||
#[inline] | ||
fn end_loc(&self) -> LineAndColumn { | ||
self.as_span().end_pos().line_col().into() | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Translates the baroque Pair
/Span
to start/end line positioning.
@@ -18,7 +69,7 @@ struct PartiQLParser; | |||
/// | |||
/// This API will be replaced with one that produces an AST in the future. | |||
pub fn recognize_partiql(input: &str) -> ParserResult<()> { | |||
PartiQLParser::parse(Rule::Keywords, input)?; | |||
PartiQLParser::parse(Rule::Query, input)?; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better entry point name.
partiql-parser/src/result.rs
Outdated
/// Returns a [`LineAndColumn`] that moves to the given position relative | ||
/// to this one as a sort of "origin." | ||
/// | ||
/// Note that this positioning is 1-based, so moving `(1, 1)` to `(1, 1)` is a no-op. | ||
/// | ||
/// ## Examples | ||
/// ``` | ||
/// # use partiql_parser::prelude::*; | ||
/// assert_eq!( | ||
/// LineAndColumn::at(1, 1), | ||
/// LineAndColumn::at(1, 1).move_relative(LineAndColumn::at(1, 1)) | ||
/// ); | ||
/// ``` | ||
/// | ||
/// ``` | ||
/// # use partiql_parser::prelude::*; | ||
/// assert_eq!( | ||
/// LineAndColumn::at(1, 2), | ||
/// LineAndColumn::at(1, 1).move_relative(LineAndColumn::at(1, 2)) | ||
/// ); | ||
/// ``` | ||
/// | ||
/// ``` | ||
/// # use partiql_parser::prelude::*; | ||
/// assert_eq!( | ||
/// LineAndColumn::at(5, 10), | ||
/// LineAndColumn::at(5, 7).move_relative(LineAndColumn::at(1, 4)) | ||
/// ); | ||
/// ``` | ||
/// | ||
/// ``` | ||
/// # use partiql_parser::prelude::*; | ||
/// assert_eq!( | ||
/// LineAndColumn::at(21, 2), | ||
/// LineAndColumn::at(2, 15).move_relative(LineAndColumn::at(20, 2)) | ||
/// ); | ||
/// ``` | ||
pub fn move_relative(self, location: LineAndColumn) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This API is needed so we can use the PEG "continuation" style. Call in, remember where we left off, and re-translate the parsers line/column positioning because every time we call into the PEG, it thinks we're starting at (1, 1)
.
partiql-parser/src/result.rs
Outdated
impl From<(usize, usize)> for LineAndColumn { | ||
fn from(line_and_column: (usize, usize)) -> Self { | ||
let (line, column) = line_and_column; | ||
Self::at(line, column) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes it so we can convert a raw tuple into our typed one by just saying something like (x, y).into()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this panic/fail if either value is 0
? Same question for LineAndColumn::at
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is a good question, I am really gun shy around panics, but using an error hear also feels like overkill, I think the answer is probably error, but this is pretty close to where I'd use an assert in C.
Unknown, | ||
At { line: usize, column: usize }, | ||
/// Variant indicating that there *is* a known location in source for some context. | ||
At(LineAndColumn), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is split out of the enum
, because we have contexts where there is always a LineAndColumn
and other contexts where there may not be.
partiql-parser/src/scanner.rs
Outdated
let start_location = self.remainder.offset.move_relative(pair.start_loc()); | ||
let end_location = self.remainder.offset.move_relative(pair.end_loc()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Notice here we translate the parser which has a logical line/column from its start, to where we left off.
partiql-parser/src/scanner.rs
Outdated
// make sure to translate line/column position from where we started | ||
Position::At(location) => Position::At(start_loc.move_relative(location)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here, we translate the line information from our continuation.
partiql-parser/src/scanner.rs
Outdated
start_location: LineAndColumn, | ||
/// End location for this token. | ||
end_location: LineAndColumn, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should these be in a struct by themselves?
struct Span {
/// Start location for this token.
start_location: LineAndColumn,
/// End location for this token.
end_location: LineAndColumn
}
??
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They could be--though the question is to what end in the API? Do you want to see Token::span() -> Span
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added #14 to track this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Token::span() -> Span
is the idea.
//! } | ||
//! // the entire text of a token can be fetched--which looks the roughly the | ||
//! // same for a keyword. | ||
//! assert_eq!("SELECT", first.text()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
was expecting as_str()
here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting, we have a similar API in SymbolToken
that is like this, do you think we should make that as_str()
? I only ask, because I do think there is an aesthetics question here. I want to emphasize that the &str
being returned is the text of the token, maybe that is easily just as_str()
, but I hadn't thought of it as a conversion though, which is what I think as_nnn
tends to mean.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SymbolToken can not have a str, right ($0
)? so maybe that's an important difference - that text()
might not be meaningful. Whereas here it seems obvious that the keyword can be "viewed as a str".
I see your pov. though, and I agree it makes sense.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW, not that this is canonical or anything, but I've been more or less inspired by:
https://rust-lang.github.io/api-guidelines/naming.html
I think what I have here is more like a getter versus a conversion:
https://rust-lang.github.io/api-guidelines/naming.html#getter-names-follow-rust-convention-c-getter
The difference between getters and conversions (C-CONV) can be subtle and is not always clear-cut. For example
TempDir::path
can be understood as a getter for the filesystem path of the temporary directory, whileTempDir::into_path
is a conversion that transfers responsibility for deleting the temporary directory to the caller. Sincepath
is a getter, it would not be correct to call itget_path
oras_path
.
} | ||
None => syntax_error( | ||
"Expected at one token pair, got nothing!", | ||
Position::Unknown, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIXME?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FIXME for what? This is the case where the Pairs
iterator type is empty, are you saying that it should have a Position
? If so, I might need to check the APIs for this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sorry that was a lame comment!
is it always the case there is no position? I guess it could be none because you're at EOF or something..?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the only way this could happen is if I wrote a Pest grammar with a rule that matched something but it and all sub-rules were silent and did not produce a Pair
at all. This would mean that I as the rule writer wrote something pretty insane and I think it is almost certainly a bug.
Add this to the reason why I think Pest is a little too low-level for my tastes as far as its APIs are concerned (though I think logically consistent).
partiql-parser/src/result.rs
Outdated
/// Returns a [`LineAndColumn`] that moves to the given position relative | ||
/// to this one as a sort of "origin." | ||
/// | ||
/// Note that this positioning is 1-based, so moving `(1, 1)` to `(1, 1)` is a no-op. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is the move 1-based? "Move 1 line" not moving 1 line seems weird
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe the name is not right--this isn't taking a delta, change location
to use self
as the origin. I am not sure what I should call that, but move_relative
is not right--let me think about that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, ok
partiql-parser/src/result.rs
Outdated
impl From<(usize, usize)> for LineAndColumn { | ||
fn from(line_and_column: (usize, usize)) -> Self { | ||
let (line, column) = line_and_column; | ||
Self::at(line, column) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this panic/fail if either value is 0
? Same question for LineAndColumn::at
.
partiql-parser/src/result.rs
Outdated
if diff_line > 0 { | ||
// we're moving lines, adjust the line and take the target column as-is | ||
LineAndColumn::at(base_line + diff_line, dest_column) | ||
} else { | ||
// same line from base, adjust only the column | ||
let diff_column = dest_column - 1; | ||
LineAndColumn::at(base_line, base_column + diff_column) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we only adjust the column if we're staying on the same line? If we're treating the base position as the new origin, wouldn't the dest_column
always be relative to the base_column
?
EDIT: Having read ahead to Remainder
, I now see what this is driving at. I think the content of these code comments should be promoted to / reproduced in the doc comment to call out that this is the expected behavior. The last doc comment example shows this, but it's easy to miss and I'd worry that other people would read the word "origin" the way that I did.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Promoted these comments into the doc tests.
partiql-parser/src/scanner.rs
Outdated
} | ||
|
||
/// Returns the remainder of the input after this token. | ||
pub fn rem(&self) -> &str { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd prefer to spell this out if you don't object.
pub fn rem(&self) -> &str { | |
pub fn remainder(&self) -> &str { |
(CTRL+F searching shows one usage of it elsewhere in the diff.) If/when #14 is added, I think we can simplify start_loc
and end_loc
to start
and end
.
Should this function return a copy of self.remainder
instead of just a &str
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is an interesting question--I wasn't really planning on exposing Remainder
as a public API. How about
text_after(&self) -> &str
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also, I will rename the various start_loc
and end_loc
to just start
/end
.
None => syntax_error( | ||
"Expected at one token pair, got nothing!", | ||
Position::Unknown, | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is probably better modeled as illegal state.
#[derive(Parser)] | ||
#[grammar = "partiql.pest"] | ||
struct PartiQLParser; | ||
pub(crate) struct PartiQLParser; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Might want to consider excluding this from coverage metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already done as #22
Provides an interface for _lexing_ PartiQL. Traditionally we would factor this as the low-level interface for the parser, but with the PEG based implementation, we just surface the relevant parts of the parser as a rule to parse with. This commit helps explore how to effectively integrate with the somewhat low-level APIs for parsing that Pest provides. * Adds `scanner` module. * Adds `Scanner` trait to the prelude. * Makes `PartiQLParser` public to crate. * Refactors `LineAndColumn` as a tuple for `Position::At`. - Adds this to the `prelude`. * Changes entry point for PEG to `Query` and added `Scanner` as the entry point rules for implementing the `Scanner` API. * Adds `PairsExt`/`PairExt` trait/impl to add utility methods for working with Pest `Pairs`/`Pair`. * Adds `LineAndColumn::position_from` and cleans up some doc/doc tests. Resolves partiql#13.
Co-authored-by: Zack Slayton <zack.slayton@gmail.com>
* Renames `start_loc`/`end_loc` to just `start`/`end`. * Renames `start_location`\`end_location` similarly. * Adds doc comments about how `position_from` works.
The rationale is that even though there are some aspects of this that we could assert as internal logic problems, we rely on external APIs to construct this location and we should not panic as a result of those. Refactors relevant APIs/tests to use this new return result. Adds `TryFrom` and `TryInto` into the prelude. This also cleans up field access for the `LineAndColumn` struct.
Provides an interface for lexing PartiQL. Traditionally we would
factor this as the low-level interface for the parser, but with the PEG
based implementation, we just surface the relevant parts of the parser
as a rule to parse with.
This commit helps explore how to effectively integrate with the somewhat
low-level APIs for parsing that Pest provides.
Notes:
scanner
module.Scanner
trait to the prelude.PartiQLParser
public to crate.LineAndColumn
as a tuple forPosition::At
.prelude
.Query
and addedScanner
as theentry point rules for implementing the
Scanner
API.PairsExt
/PairExt
trait/impl to add utility methods for workingwith Pest
Pairs
/Pair
.LineAndColumn::move_relative
and cleans up some doc/doc tests.Pest links for reference:
This PR is dependent on #8 and as such is targeting a staging branch but I did not want to prevent review.
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.