Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Initial API for Scanner. #12

Merged
merged 5 commits into from May 26, 2021
Merged

Initial API for Scanner. #12

merged 5 commits into from May 26, 2021

Conversation

almann
Copy link
Contributor

@almann almann commented May 10, 2021

Provides an interface for lexing PartiQL. Traditionally we would
factor this as the low-level interface for the parser, but with the PEG
based implementation, we just surface the relevant parts of the parser
as a rule to parse with.

This commit helps explore how to effectively integrate with the somewhat
low-level APIs for parsing that Pest provides.

Notes:

  • Adds scanner module.
  • Adds Scanner trait to the prelude.
  • Makes PartiQLParser public to crate.
  • Refactors LineAndColumn as a tuple for Position::At.
    • Adds this to the prelude.
  • Changes entry point for PEG to Query and added Scanner as the
    entry point rules for implementing the Scanner API.
  • Adds PairsExt/PairExt trait/impl to add utility methods for working
    with Pest Pairs/Pair.
  • Adds LineAndColumn::move_relative and cleans up some doc/doc tests.

Pest links for reference:

This PR is dependent on #8 and as such is targeting a staging branch but I did not want to prevent review.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@codecov
Copy link

codecov bot commented May 10, 2021

Codecov Report

Merging #12 (ba4398c) into main (d748ebd) will decrease coverage by 1.06%.
The diff coverage is 91.28%.

Impacted file tree graph

@@            Coverage Diff             @@
##             main      #12      +/-   ##
==========================================
- Coverage   91.37%   90.31%   -1.07%     
==========================================
  Files           6        7       +1     
  Lines          58      258     +200     
==========================================
+ Hits           53      233     +180     
- Misses          5       25      +20     
Impacted Files Coverage Δ
partiql-parser/src/lib.rs 100.00% <ø> (ø)
partiql-parser/src/result.rs 82.94% <81.81%> (-6.42%) ⬇️
partiql-parser/src/scanner.rs 97.45% <97.45%> (ø)

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update d748ebd...ba4398c. Read the comment docs.

partiql-parser/src/partiql.pest Outdated Show resolved Hide resolved
Comment on lines 16 to 42
/// Extension methods for working with [`Pairs`].
pub(crate) trait PairsExt<'val, R: RuleType> {
/// Consumes a [`Pairs`] as a singleton, returning an error if there are less or more than
/// one [`Pair`].
fn exactly_one(self) -> ParserResult<Pair<'val, R>>;
}

impl<'val, R: RuleType> PairsExt<'val, R> for Pairs<'val, R> {
fn exactly_one(mut self) -> ParserResult<Pair<'val, R>> {
match self.next() {
Some(pair) => {
// make sure there isn't something more...
if let Some(other_pair) = self.next() {
syntax_error(
format!("Expected one token pair, got: {:?}, {:?}", pair, other_pair),
pair.start_loc().into(),
)?;
}
Ok(pair)
}
None => syntax_error(
"Expected at one token pair, got nothing!",
Position::Unknown,
),
}
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A successful parse from Pest returns Pairs which is an iterator of Pair which itself is a terrible name for a rule that matched some span of text (I believe the term "pair" alluding to the start and end of a match).

A rule match returns zero or more Pair, this API says, return an error unless it is exactly one Pair.

Comment on lines 45 to 63
pub(crate) trait PairExt<'val, R: RuleType> {
/// Translates the start position of the [`Pair`] into a [`LineAndColumn`].
fn start_loc(&self) -> LineAndColumn;

/// Translates the end position of the [`Pair`] into a [`LineAndColumn`].
fn end_loc(&self) -> LineAndColumn;
}

impl<'val, R: RuleType> PairExt<'val, R> for Pair<'val, R> {
#[inline]
fn start_loc(&self) -> LineAndColumn {
self.as_span().start_pos().line_col().into()
}

#[inline]
fn end_loc(&self) -> LineAndColumn {
self.as_span().end_pos().line_col().into()
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Translates the baroque Pair/Span to start/end line positioning.

@@ -18,7 +69,7 @@ struct PartiQLParser;
///
/// This API will be replaced with one that produces an AST in the future.
pub fn recognize_partiql(input: &str) -> ParserResult<()> {
PartiQLParser::parse(Rule::Keywords, input)?;
PartiQLParser::parse(Rule::Query, input)?;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better entry point name.

Comment on lines 29 to 66
/// Returns a [`LineAndColumn`] that moves to the given position relative
/// to this one as a sort of "origin."
///
/// Note that this positioning is 1-based, so moving `(1, 1)` to `(1, 1)` is a no-op.
///
/// ## Examples
/// ```
/// # use partiql_parser::prelude::*;
/// assert_eq!(
/// LineAndColumn::at(1, 1),
/// LineAndColumn::at(1, 1).move_relative(LineAndColumn::at(1, 1))
/// );
/// ```
///
/// ```
/// # use partiql_parser::prelude::*;
/// assert_eq!(
/// LineAndColumn::at(1, 2),
/// LineAndColumn::at(1, 1).move_relative(LineAndColumn::at(1, 2))
/// );
/// ```
///
/// ```
/// # use partiql_parser::prelude::*;
/// assert_eq!(
/// LineAndColumn::at(5, 10),
/// LineAndColumn::at(5, 7).move_relative(LineAndColumn::at(1, 4))
/// );
/// ```
///
/// ```
/// # use partiql_parser::prelude::*;
/// assert_eq!(
/// LineAndColumn::at(21, 2),
/// LineAndColumn::at(2, 15).move_relative(LineAndColumn::at(20, 2))
/// );
/// ```
pub fn move_relative(self, location: LineAndColumn) -> Self {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This API is needed so we can use the PEG "continuation" style. Call in, remember where we left off, and re-translate the parsers line/column positioning because every time we call into the PEG, it thinks we're starting at (1, 1).

Comment on lines 83 to 136
impl From<(usize, usize)> for LineAndColumn {
fn from(line_and_column: (usize, usize)) -> Self {
let (line, column) = line_and_column;
Self::at(line, column)
}
}
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes it so we can convert a raw tuple into our typed one by just saying something like (x, y).into().

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this panic/fail if either value is 0? Same question for LineAndColumn::at.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a good question, I am really gun shy around panics, but using an error hear also feels like overkill, I think the answer is probably error, but this is pretty close to where I'd use an assert in C.

Unknown,
At { line: usize, column: usize },
/// Variant indicating that there *is* a known location in source for some context.
At(LineAndColumn),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is split out of the enum, because we have contexts where there is always a LineAndColumn and other contexts where there may not be.

Comment on lines 113 to 114
let start_location = self.remainder.offset.move_relative(pair.start_loc());
let end_location = self.remainder.offset.move_relative(pair.end_loc());
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice here we translate the parser which has a logical line/column from its start, to where we left off.

Comment on lines 148 to 149
// make sure to translate line/column position from where we started
Position::At(location) => Position::At(start_loc.move_relative(location)),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, we translate the line information from our continuation.

@almann almann mentioned this pull request May 10, 2021
@almann almann linked an issue May 10, 2021 that may be closed by this pull request
@almann
Copy link
Contributor Author

almann commented May 10, 2021

I added a doc test to illustrate usage of the Scanner API, here it is rendered as an image

image

Comment on lines 60 to 62
start_location: LineAndColumn,
/// End location for this token.
end_location: LineAndColumn,
Copy link
Member

@dlurton dlurton May 10, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these be in a struct by themselves?

struct Span {
    /// Start location for this token.
    start_location: LineAndColumn,
    /// End location for this token.
    end_location: LineAndColumn
}

??

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

They could be--though the question is to what end in the API? Do you want to see Token::span() -> Span

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added #14 to track this.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Token::span() -> Span is the idea.

partiql-parser/src/lib.rs Show resolved Hide resolved
//! }
//! // the entire text of a token can be fetched--which looks the roughly the
//! // same for a keyword.
//! assert_eq!("SELECT", first.text());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was expecting as_str() here

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting, we have a similar API in SymbolToken that is like this, do you think we should make that as_str()? I only ask, because I do think there is an aesthetics question here. I want to emphasize that the &str being returned is the text of the token, maybe that is easily just as_str(), but I hadn't thought of it as a conversion though, which is what I think as_nnn tends to mean.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SymbolToken can not have a str, right ($0)? so maybe that's an important difference - that text() might not be meaningful. Whereas here it seems obvious that the keyword can be "viewed as a str".

I see your pov. though, and I agree it makes sense.

Copy link
Contributor Author

@almann almann May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FWIW, not that this is canonical or anything, but I've been more or less inspired by:

https://rust-lang.github.io/api-guidelines/naming.html

I think what I have here is more like a getter versus a conversion:

https://rust-lang.github.io/api-guidelines/naming.html#getter-names-follow-rust-convention-c-getter

The difference between getters and conversions (C-CONV) can be subtle and is not always clear-cut. For example TempDir::path can be understood as a getter for the filesystem path of the temporary directory, while TempDir::into_path is a conversion that transfers responsibility for deleting the temporary directory to the caller. Since path is a getter, it would not be correct to call it get_path or as_path.

partiql-parser/src/partiql.pest Outdated Show resolved Hide resolved
}
None => syntax_error(
"Expected at one token pair, got nothing!",
Position::Unknown,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FIXME for what? This is the case where the Pairs iterator type is empty, are you saying that it should have a Position? If so, I might need to check the APIs for this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry that was a lame comment!

is it always the case there is no position? I guess it could be none because you're at EOF or something..?

Copy link
Contributor Author

@almann almann May 14, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only way this could happen is if I wrote a Pest grammar with a rule that matched something but it and all sub-rules were silent and did not produce a Pair at all. This would mean that I as the rule writer wrote something pretty insane and I think it is almost certainly a bug.

Add this to the reason why I think Pest is a little too low-level for my tastes as far as its APIs are concerned (though I think logically consistent).

/// Returns a [`LineAndColumn`] that moves to the given position relative
/// to this one as a sort of "origin."
///
/// Note that this positioning is 1-based, so moving `(1, 1)` to `(1, 1)` is a no-op.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the move 1-based? "Move 1 line" not moving 1 line seems weird

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe the name is not right--this isn't taking a delta, change location to use self as the origin. I am not sure what I should call that, but move_relative is not right--let me think about that.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, ok

partiql-parser/src/result.rs Show resolved Hide resolved
Comment on lines 83 to 136
impl From<(usize, usize)> for LineAndColumn {
fn from(line_and_column: (usize, usize)) -> Self {
let (line, column) = line_and_column;
Self::at(line, column)
}
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this panic/fail if either value is 0? Same question for LineAndColumn::at.

Comment on lines 70 to 76
if diff_line > 0 {
// we're moving lines, adjust the line and take the target column as-is
LineAndColumn::at(base_line + diff_line, dest_column)
} else {
// same line from base, adjust only the column
let diff_column = dest_column - 1;
LineAndColumn::at(base_line, base_column + diff_column)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we only adjust the column if we're staying on the same line? If we're treating the base position as the new origin, wouldn't the dest_column always be relative to the base_column?

EDIT: Having read ahead to Remainder, I now see what this is driving at. I think the content of these code comments should be promoted to / reproduced in the doc comment to call out that this is the expected behavior. The last doc comment example shows this, but it's easy to miss and I'd worry that other people would read the word "origin" the way that I did.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Promoted these comments into the doc tests.

partiql-parser/src/scanner.rs Outdated Show resolved Hide resolved
partiql-parser/src/scanner.rs Outdated Show resolved Hide resolved
}

/// Returns the remainder of the input after this token.
pub fn rem(&self) -> &str {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer to spell this out if you don't object.

Suggested change
pub fn rem(&self) -> &str {
pub fn remainder(&self) -> &str {

(CTRL+F searching shows one usage of it elsewhere in the diff.) If/when #14 is added, I think we can simplify start_loc and end_loc to start and end.

Should this function return a copy of self.remainder instead of just a &str?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is an interesting question--I wasn't really planning on exposing Remainder as a public API. How about

text_after(&self) -> &str

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I will rename the various start_loc and end_loc to just start/end.

@almann almann changed the base branch from main to scanner-base May 14, 2021 20:04
Comment on lines +36 to +39
None => syntax_error(
"Expected at one token pair, got nothing!",
Position::Unknown,
),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably better modeled as illegal state.

Comment on lines 12 to +14
#[derive(Parser)]
#[grammar = "partiql.pest"]
struct PartiQLParser;
pub(crate) struct PartiQLParser;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to consider excluding this from coverage metrics.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Already done as #22

almann and others added 5 commits May 25, 2021 20:42
Provides an interface for _lexing_ PartiQL.  Traditionally we would
factor this as the low-level interface for the parser, but with the PEG
based implementation, we just surface the relevant parts of the parser
as a rule to parse with.

This commit helps explore how to effectively integrate with the somewhat
low-level APIs for parsing that Pest provides.

* Adds `scanner` module.
* Adds `Scanner` trait to the prelude.
* Makes `PartiQLParser` public to crate.
* Refactors `LineAndColumn` as a tuple for `Position::At`.
  - Adds this to the `prelude`.
* Changes entry point for PEG to `Query` and added `Scanner` as the
  entry point rules for implementing the `Scanner` API.
* Adds `PairsExt`/`PairExt` trait/impl to add utility methods for working
  with Pest `Pairs`/`Pair`.
* Adds `LineAndColumn::position_from` and cleans up some doc/doc tests.

Resolves partiql#13.
Co-authored-by: Zack Slayton <zack.slayton@gmail.com>
* Renames `start_loc`/`end_loc` to just `start`/`end`.
* Renames `start_location`\`end_location` similarly.
* Adds doc comments about how `position_from` works.
The rationale is that even though there are some aspects of this that
we could assert as internal logic problems, we rely on external APIs
to construct this location and we should not panic as a result of those.

Refactors relevant APIs/tests to use this new return result.

Adds `TryFrom` and `TryInto` into the prelude.

This also cleans up field access for the `LineAndColumn` struct.
@almann almann changed the base branch from scanner-base to main May 26, 2021 03:43
@almann almann mentioned this pull request May 26, 2021
@almann almann merged commit f227d0d into partiql:main May 26, 2021
@almann almann deleted the scanner branch May 26, 2021 04:23
@tbmreza tbmreza mentioned this pull request May 28, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Implement a Scanner Trait
4 participants