Skip to content
This repository has been archived by the owner on Sep 12, 2018. It is now read-only.

Add tx{-parser} crates; start parsing transactions. #164

Merged
merged 2 commits into from
Jan 13, 2017

Conversation

ncalexan
Copy link
Member

This depends on edn and uses the combine parser combinator library.

@rnewman, you can see my progress here. There's an abstraction around parsing the internals of vector to be worked out, but it's actually quite nice to work with (once you've absorbed some combine magic). I think some of the constructors mappings might be able to be shortened, but it's not worth waiting for.

@ncalexan
Copy link
Member Author

I don't see these errors with

⋊> ~/M/mentat on parse-tx  rustc --version                                                                                                                    16:53:55
rustc 1.16.0-nightly (468227129 2017-01-03)

Will try with stable.

@rnewman
Copy link
Collaborator

rnewman commented Jan 12, 2017

Builds for me locally with stable.

@rnewman
Copy link
Collaborator

rnewman commented Jan 12, 2017

Oh, no it doesn't. mentat_tx is fine; mentat_tx_parser fails.

@rnewman
Copy link
Collaborator

rnewman commented Jan 12, 2017

Copy link
Collaborator

@rnewman rnewman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This'll need rebasing on top of keywords (please, so I don't have to 'rebase' this when I land!), but otherwise it looks excellent.


use combine::{any, eof, many, optional, parser, satisfy_map, token, Parser, ParseResult, Stream};
use combine::combinator::{Expected, FnParser};
// TODO: understand why this is self::edn rather than just edn.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I fixed this in my branch.

}

fn integer_(input: I) -> ParseResult<i64, I> {
return satisfy_map(|x: Value| if let Value::Integer(y) = x {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've read combinator.rs, and do not claim to understand it. But this is a nice abstraction for allowing failure to arbitrary depth.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

combinator.rs is tricky; it took me three days of digging to really understand what is happening :)

fn entid_(input: I) -> ParseResult<EntId, I> {
let p = Tx::<I>::integer()
.map(|x| EntId::EntId(x))
.or(Tx::<I>::keyword().map(|x| EntId::Ident(x)))
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting — you're unwrapping the keyword, right?

This will need to change a little on top of #163, but in a good way — you'll still be unwrapping, but you'll get e.g., a NamespacedKeyword here.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I am unwrapping, although there's no real reason for this.

}

fn keyword_(input: I) -> ParseResult<String, I> {
return satisfy_map(|x: Value| if let Value::Keyword(y) = x {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Really makes you want ??, doesn't it?


fn lookup_ref_(input: I) -> ParseResult<LookupRef, I> {
return satisfy_map(|x: Value| if let Value::Vector(y) = x {
let mut p = (Tx::<&[Value]>::entid(), any(), eof())
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the parser library memoize the construction of these parsers? Should I be worried about multiple invocations of fn_parser?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, and there's a subtle thing happening with consuming Parser instances that I don't yet understand. You'll note all these parsers are mutable; that's because they're consumed by parse and friends. So you can't avoid the invocations and allocations. It's unclear how this impacts performance overall.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because FnParsers (and probably other parsers, too) are implicitly stateful — they wrap a function, and there's no concept in Rust of a pure function?

(Indeed, with 'try' for lookahead, one expects LL(n) parser implementations to be stateful.)

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does the parser library memoize the construction of these parsers? Should I be worried about multiple invocations of fn_parser?

Author of combine here. Constructing parsers should be free/very cheap since Parser follow the same model as Iterator so its all stack allocations and since most parsers are either zero-sized or just a few bytes for the function or parameters they take to construct them. As long as LLVM inlines properly there should be zero overhead.

(Indeed, with 'try' for lookahead, one expects LL(n) parser implementations to be stateful.)

That is almost true but the try parser has no state itself, that is all contained in Parser::Input (try, the only thing it contains is the parser it wraps).

use self::edn::types::Value;

#[derive(Clone, Debug, PartialEq)]
pub enum EntId {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you steal/clean up the EntId stuff in the top-level repo?


extern crate edn;

// TODO: understand why this is self::edn rather than just edn.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See earlier note.

Also try ::edn::types::Value, see how that differs…

#[derive(Clone, Debug, PartialEq)]
pub struct LookupRef {
pub a: EntId,
// TODO: consider boxing to allow recursive lookup refs.
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You crazy.

#[derive(Clone, Debug, PartialEq)]
pub enum EntId {
EntId(i64),
Ident(String),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This'll eventually be Ident(NamespacedOrPlainKeyword), I expect.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might just ban PlainKeyword, since it's bad form in general.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, fair. That's one reason I split, actually: the 'syntax' of the query parser is the main place I expect Keywords (:find, :in), with all of the 'data' being namespaced.

Add {
e: EntIdOrLookupRef,
a: EntId,
v: ValueOrLookupRef,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strongly typed cat approves of your strongly typed format.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need a strongly typed cat meme generator, stat!

@ncalexan
Copy link
Member Author

An identical function exists in the std lib.

Yes, I eventually found it. Sadly, it requires a type annotation in this case, which I couldn't quite make work. (There are TODO's later in the commit). If you can work this out, please do! Thanks!

@ncalexan
Copy link
Member Author

Worth calling out that the combine dependency should be safe: combine is MIT and its dependency ascii is MIT/Apache and it's dev dependency byteorder is Unlicense/MIT. (As of combine version 2.1.1, from https://crates.io/crates/combine and GitHub.)

@ncalexan
Copy link
Member Author

@rnewman final stamp before I fold? I'm going to punt on the entid simplification until we have a motivation to fix it. There are some decisions about crate organization to be made that don't block this landing.


// TODO: abstract the "match Vector, parse internal stream" pattern to remove this boilerplate.
fn add_(input: I) -> ParseResult<Entity, I> {
return satisfy_map(|x: Value| -> Option<Entity> {
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I may, I believe you can change this parser into this to preserve errors from parsing the inner value.

satisfy_map(|x: Value| -> Option<Entity> {
 if let Value::Vector(y) = x { Some(y) } else { None }
}).flat_map(|y| {
    let  mut p = ...;
    p.parse(&y[..]).map(|t| t.0)
})

https://docs.rs/combine/2.1.1/combine/trait.Parser.html#method.flat_map

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I try this, I get errors like:

error[E0271]: type mismatch resolving `<parse::combine::combinator::SatisfyMap<&[error::edn::Value], [closure@query-parser/src/parse.rs:95:20: 95:94]> as parse::combine::Parser>::Input == I`
   --> query-parser/src/parse.rs:101:9
    |
101 |        .parse_stream(input)
    |         ^^^^^^^^^^^^ expected reference, found type parameter
    |
    = note: expected type `&[error::edn::Value]`
    = note:    found type `I`
    = note: required because of the requirements on the impl of `parse::combine::Parser` for `parse::combine::combinator::FlatMap<parse::combine::combinator::SatisfyMap<&[error::edn::Value], [closure@query-parser/src/parse.rs:95:20: 95:94]>, [closure@query-parser/src/parse.rs:96:22: 100:9]>`

so this isn't a trivial change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. In order to do this, I needed to either use SliceStream or work with the ::Range type, so that the lifetime was preserved. It can be done, but it's awkward. See also the discussion in Marwes/combine#74 (comment).

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/me gives up, runs git checkout -- .

@ncalexan
Copy link
Member Author

Wonder if it's worth consting these defined keyword tokens at the top level.

I think it is, but it's not possible! You can only construct enums and structs, and the NamespacedKeyword members are String instances, which you can't construct at compile time. Frustrating!

@ncalexan ncalexan merged commit b11b9b9 into mozilla:rust Jan 13, 2017
@ncalexan
Copy link
Member Author

I think it is, but it's not possible! You can only construct enums and structs, and the NamespacedKeyword members are String instances, which you can't construct at compile time. Frustrating!

@rnewman static has the same restriction, so that's out too.

@rnewman
Copy link
Collaborator

rnewman commented Jan 13, 2017 via email

victorporof pushed a commit that referenced this pull request Feb 2, 2017
This depends on edn and uses the combine parser combinator library.
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants