Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inital port to nom 4.0 #10

Merged
merged 7 commits into from
Jun 12, 2018
Merged

Inital port to nom 4.0 #10

merged 7 commits into from
Jun 12, 2018

Conversation

CAD97
Copy link
Contributor

@CAD97 CAD97 commented Dec 13, 2017

Compiles against Geal/nom#a53febdfdf99d16c0177f5c605bd98481a4ec50f

I was playing around with a small parser and tried out the latest changes with upstream nom. These are the changes that I had to patch into this library to get it compiling.

Very, very minimally tested (as in, it compiles and I have a micro-sized nom parser using it), and nom might change again before the proper 4.0 release, but I thought I'd share the work I did to get this to compile today.

@CAD97
Copy link
Contributor Author

CAD97 commented Dec 13, 2017

Does not change any Cargo.toml settings, so CI is going to fail since it's grabbing a different version of nom.

@CAD97
Copy link
Contributor Author

CAD97 commented Dec 13, 2017

See rust-bakery/nom#637 for information about the new AtEof trait. I did not add impls for LocatedSpan wrapping CompleteByteSlice or CompleteStr yet, which should probably be done.

@fflorent
Copy link
Owner

Nice, thank you!

Do you experiment something special requiring nom_locate + nom 4? If so, I can do some efforts to make that build on some branch.

Florent

src/lib.rs Outdated
line: 1,
offset: 0,
fragment: program
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

src/lib.rs Outdated
impl<'a, 'b> FindToken<LocatedSpan<$fragment_type>> for $for_type {
fn find_token(&self, input: LocatedSpan<$fragment_type>) -> bool {
self.find_token(input.fragment)
impl<'a, 'b> FindToken<$for_type> for LocatedSpan<$fragment_type> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah great, I guess this is because of this commit rust-bakery/nom@45f5b00#diff-44e8606db87b39b048179b1f7af8d244 ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like it. I preferred swapping the macro's impl around rather than definitions here, but it might be more appropriate to restructure the macro here ($fragment:ty, $token:ty?). What do you think?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, yes, if you don't mind :).

@CAD97
Copy link
Contributor Author

CAD97 commented Dec 20, 2017

I'm experimenting with implementing a language that's been bouncing around in my head for a few years now, and just moved from a hand-rolled nom derivative to nom. Because I'm chasing master though, I'm mirroring nom_locate in-library for now; I was unable to, in experiments in my fork, provide a version I could depend on that would unify the versions of nom. Maybe I'm just weak at cargo's [patch] magic. That and nom 4.0.0-alpha1 is actually broken in verbose-errors mode.

Because I'm chasing nom master, for now I think I personally am better served mirroring nom_locate. I would re-target this to 4.0.0-alpha1, but it's already using some changes that got in after that alpha build as well.

Once nom 4 gets another crates.io release, though, I'm definitely happy to come and help update.

@CAD97
Copy link
Contributor Author

CAD97 commented Jan 31, 2018

Everything compiles against nom 4.0.0-alpha2 now.

failures:
failures:

---- it_locates_u8_fragments stdout ----
	thread 'it_locates_u8_fragments' panicked at 'the parser should run successfully', tests\integration_tests.rs:52:4
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys_common::backtrace::_print
             at C:\projects\rust\src\libstd\sys_common\backtrace.rs:91
   1: std::panicking::default_hook::{{closure}}
             at C:\projects\rust\src\libstd\panicking.rs:380
   2: std::panicking::default_hook
             at C:\projects\rust\src\libstd\panicking.rs:391
   3: std::panicking::rust_panic_with_hook
             at C:\projects\rust\src\libstd\panicking.rs:577
   4: std::panicking::begin_panic<str*>
             at C:\projects\rust\src\libstd\panicking.rs:538
   5: integration_tests::test_str_fragments<fn(nom_locate::LocatedSpan<slice<u8>*>) -> core::result::Result<(nom_locate::LocatedSpan<slice<u8>*>, alloc::vec::Vec<nom_locate::LocatedSpan<slice<u8>*>>), nom::internal::Err<nom_locate::LocatedSpan<slice<u8>*>, u32>>,slice<u8>*>
             at .\tests\integration_tests.rs:52
   6: integration_tests::it_locates_u8_fragments
             at .\tests\integration_tests.rs:87
   7: test::{{impl}}::call_box<(),closure>
             at C:\projects\rust\src\libtest\lib.rs:142
   8: panic_unwind::__rust_maybe_catch_panic
             at C:\projects\rust\src\libpanic_unwind\lib.rs:101

---- it_locates_str_fragments stdout ----
	thread 'it_locates_str_fragments' panicked at 'the parser should run successfully', tests\integration_tests.rs:52:4
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
stack backtrace:
   0: std::sys_common::backtrace::_print
             at C:\projects\rust\src\libstd\sys_common\backtrace.rs:91
   1: std::panicking::default_hook::{{closure}}
             at C:\projects\rust\src\libstd\panicking.rs:380
   2: std::panicking::default_hook
             at C:\projects\rust\src\libstd\panicking.rs:391
   3: std::panicking::rust_panic_with_hook
             at C:\projects\rust\src\libstd\panicking.rs:577
   4: std::panicking::begin_panic<str*>
             at C:\projects\rust\src\libstd\panicking.rs:538
   5: integration_tests::test_str_fragments<fn(nom_locate::LocatedSpan<str*>) -> core::result::Result<(nom_locate::LocatedSpan<str*>, alloc::vec::Vec<nom_locate::LocatedSpan<str*>>), nom::internal::Err<nom_locate::LocatedSpan<str*>, u32>>,str*>
             at .\tests\integration_tests.rs:52
   6: integration_tests::it_locates_str_fragments
             at .\tests\integration_tests.rs:69
   7: test::{{impl}}::call_box<(),closure>
             at C:\projects\rust\src\libtest\lib.rs:142
   8: panic_unwind::__rust_maybe_catch_panic
             at C:\projects\rust\src\libpanic_unwind\lib.rs:101


failures:
    it_locates_str_fragments
    it_locates_u8_fragments

Unfortunately the test output doesn't tell us what went wrong. My intuition is the stricter incomplete handling, though.

We should consider AtEof and consider how we want to handle it (this PR still does not implement any of this crate's traits for Complete* yet). Do we want to mark LocatedSpan as complete, or require the user to use e.g. <'a> LocatedSpan<CompleteStr<'a>> / give them the choice not to mark the data as complete? (Does it ever make sense for a LocatedSpan to contain incomplete data?)

@CAD97
Copy link
Contributor Author

CAD97 commented Jan 31, 2018

rust-bakery/nom@47deaa1 makes many0! return Incomplete when at EoF rather than completing. This is why the two tests started failing.

@CAD97
Copy link
Contributor Author

CAD97 commented Jan 31, 2018

@fflorent The tests pass now, and we're updated against alpha2. I chose to make LocatedSpan always report an AtEof of true; CompleteStr/CompleteByteSlice don't implement AsBytes (rust-bakery/nom#672).

@CAD97
Copy link
Contributor Author

CAD97 commented Feb 8, 2018

CompleteStr/CompleteByteSlice now always implement AsBytes, so the implementation that only implements AtEof when the wrapped type does and wraps Complete* is now viable for -alpha3 or later.

Offering to wrap those types is good for consistency, but I'm still not sure if LocatedSpan has a viable use case when streaming.

@fflorent
Copy link
Owner

fflorent commented Feb 8, 2018

Hi @CAD97! Thanks very much for your work on this.
I am currently a bit busy. I'll take a look as soon as possible.

Florent

Copy link
Owner

@fflorent fflorent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's great, thank you! I just posted some few comments :).

src/lib.rs Outdated
@@ -222,6 +222,12 @@ impl<T: InputLength> InputLength for LocatedSpan<T> {
}
}

impl<T> AtEof for LocatedSpan<T> {
fn at_eof(&self) -> bool {
true
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be wiser to call self.fragment.at_eof() instead? As it is inline, if I am not wrong, it shouldn't cost anything, right?

That being said, most of our users must prefer dealing with complete input.

I'd rather be transparent with them and tell them in the examples to rather use :

type Span<'a> = LocatedSpan<CompleteStr<'a>>;

What do you think?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, this seems like a good idea. However, it is blocked on rust-bakery/nom#672; LocatedSpan<T> requires T: AsBytes, which isn't true in alpha2.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you mind if we wait for the next alpha to be released before merging in some branch?

Florent

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems prudent to me. I'll add the CompleteStr impls and rebase to a cleaner story once the next alpha comes out.

And obviously you shouldn't publish anything that isn't a -alpha when building against -alpha nom.

src/lib.rs Outdated
impl<'a, 'b> FindToken<LocatedSpan<$fragment_type>> for $for_type {
fn find_token(&self, input: LocatedSpan<$fragment_type>) -> bool {
self.find_token(input.fragment)
impl<'a, 'b> FindToken<$for_type> for LocatedSpan<$fragment_type> {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hum, yes, if you don't mind :).

@CAD97
Copy link
Contributor Author

CAD97 commented Feb 20, 2018

Alright, -beta1 is out, let me update and rebase

Existing CI on my project broke because a commit got garbage collected somewhere, so I'm really motivated to get finished all my cleaning-up work to -beta1 😛

@CAD97
Copy link
Contributor Author

CAD97 commented Feb 20, 2018

Already there's another type added (rust-bakery/nom@e05c2c2) that'll need delegation added, but Complete* is enough to get everything working nice and happy again.

@CAD97
Copy link
Contributor Author

CAD97 commented Feb 20, 2018

Alright @fflorent we should be updated to nom@4.0.0-alpha1 now

@fflorent
Copy link
Owner

Sounds great! Do you have any clue when nom 4 will be released?

Florent

@sunjay
Copy link

sunjay commented Apr 11, 2018

Hi! I'm trying this fork out with nom 4.0.0-beta2. Thanks for creating it! I noticed that it doesn't currently work perfectly with CompleteStr because the Offset trait is not implemented for CompleteStr. Trying to use LocatedSpan<CompleteStr> results in a bunch of errors about that. That is probably something to be fixed before this is finished. :)

As for AtEof, I think the current behaviour (as implemented in this PR) makes sense. Once this type works for CompleteStr, it should all fit together quite well. Some info about AtEof

@CAD97
Copy link
Contributor Author

CAD97 commented Apr 11, 2018

@sunjay I'm using LocatedSpan<CompleteStr> in my project and nom 4.0.0-beta2, and haven't had any problems. Maybe I just got lucky? Anyway, I just pushed 6710706 which implements nom::Offset using the fragments' offset value for all LocatedSpan.

Having written a full lexer with a cursor of this type, I agree with passing through AtEof. It's very possible to still do partial pull parsing with LocatedSpan on incomplete buffers and keep track of your offset from the true start.

@sunjay
Copy link

sunjay commented Apr 11, 2018

It works for me now! Thank you for the quick update!

@CAD97
Copy link
Contributor Author

CAD97 commented Apr 11, 2018

Actually scratch that please nobody make a LocatedSpan with a starting offset

get_column is implemented such that it makes a pointer to the start of the buffer. That makes lying about the offset hella unsafe

P: Fn(Self::Item) -> bool
{
self.fragment.split_at_position(predicate)
.map(|(_i, o)| {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can just replace _i to _ in order to skip it .

@@ -9,15 +9,16 @@ license = "MIT"
name = "nom_locate"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you deserve to add yourself as an author of the crate ;). Thanks for your contributions !

Err::Incomplete(needed) => Err::Incomplete(needed),
_ => unimplemented!("nom_locate didn't think Err::List was used in InputTakeAtPosition"),
}
})
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There are lots of code in common between split_at_position1 and split_at_position. Do you think there is a room for a factorization thanks to a a macro or a private function?

fn compare_no_case(&self, t: LocatedSpan<B>) -> CompareResult {
self.fragment.compare_no_case(t.fragment)
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably worth to add unit tests so we're sure that it works as expected. What do you think?

(well, that's not really unit testing as we don't use mocks, but still :))

@sunjay
Copy link

sunjay commented May 16, 2018

nom 4.0 is released!

@CAD97
Copy link
Contributor Author

CAD97 commented May 17, 2018

Yep, I'll be hitting this on the weekend for the final check for 4.0 compatibility!

@fflorent fflorent mentioned this pull request May 17, 2018
@xpayn
Copy link

xpayn commented May 24, 2018

@CAD97 any news on the nom v4 support front?

@progval
Copy link
Collaborator

progval commented Jun 2, 2018

I'm getting two errors when using this PR on a big parser with nom 4.0.0:

error[E0599]: no method named new_builder found for type nom_locate::LocatedSpan<&str> in the current scope

and

error[E0599]: no method named extend_into found for type nom_locate::LocatedSpan<&str> in the current scope

It can be reproduced with this code:

#[test]
fn test_escaped_string() {
    use nom::Needed; // https://github.com/Geal/nom/issues/780 
    named!(string<StrSpan, String>, delimited!(
        char!('"'),
        escaped_transform!(call!(nom::alpha), '\\', nom::anychar),
        char!('"')
    ));
    
    string(LocatedSpan::new(CompleteStr("\"foo \\\" bar\"")));
}    

EDIT: here is a patch:

diff --git a/src/lib.rs b/src/lib.rs
index 9300fbe..cb17004 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -65,7 +65,7 @@ use std::str::{CharIndices, Chars, FromStr};
 
 use memchr::Memchr;
 use nom::{AsBytes, Compare, CompareResult, FindSubstring, FindToken, InputIter, InputLength,
-          Offset, ParseTo, Slice, InputTake, InputTakeAtPosition, AtEof,
+          Offset, ParseTo, Slice, InputTake, InputTakeAtPosition, AtEof, ExtendInto,
           IResult, ErrorKind, Err, Context};
 use nom::types::{CompleteByteSlice, CompleteStr};
 use bytecount::{naive_num_chars, num_chars};
@@ -560,6 +560,72 @@ impl<T: ToString> ToString for LocatedSpan<T> {
     }
 }
 
+/// Implement nom::ExtendInto for a specific fragment type.
+///
+/// # Parameters
+/// * `$fragment_type` - The LocatedSpan's `fragment` type
+/// * `$item` - The type of the item being iterated (a reference for fragments of type `&[T]`).
+/// * `$extender` - The type of the Extended.
+///
+/// # Example of use
+///
+/// NB: This example is an extract from the nom_locate source code.
+///
+/// ````ignore
+/// #[macro_use]
+/// extern crate nom_locate;
+///
+/// impl_extend_into!(&'a str, char, String);
+/// impl_extend_into!(CompleteStr<'a>, char, String);
+/// impl_extend_into!(&'a [u8], u8, Vec<u8>);
+/// impl_extend_into!(CompleteByteSlice<'a>, u8, Vec<u8>);
+/// ````
+#[macro_export]
+macro_rules! impl_extend_into {
+    ($fragment_type:ty, $item:ty, $extender:ty) => (
+        impl<'a> ExtendInto for LocatedSpan<$fragment_type> {
+            type Item     = $item;
+            type Extender = $extender;
+
+            #[inline]
+            fn new_builder(&self) -> Self::Extender {
+                self.fragment.new_builder()
+            }
+
+            #[inline]
+            fn extend_into(&self, acc: &mut Self::Extender) {
+                self.fragment.extend_into(acc)
+            }
+        }
+    )
+}
+
+impl_extend_into!(&'a str, char, String);
+impl_extend_into!(CompleteStr<'a>, char, String);
+impl_extend_into!(&'a [u8], u8, Vec<u8>);
+impl_extend_into!(CompleteByteSlice<'a>, u8, Vec<u8>);
+
+#[macro_export]
+macro_rules! impl_hex_display {
+    ($fragment_type:ty) => (
+		impl<'a> nom::HexDisplay for LocatedSpan<$fragment_type> {
+			fn to_hex(&self, chunk_size: usize) -> String {
+				self.fragment.to_hex(chunk_size)
+			}
+
+			fn to_hex_from(&self, chunk_size: usize, from: usize) -> String {
+				self.fragment.to_hex_from(chunk_size, from)
+			}
+		}
+	)
+}
+
+impl_hex_display!(&'a str);
+impl_hex_display!(CompleteStr<'a>);
+impl_hex_display!(&'a [u8]);
+impl_hex_display!(CompleteByteSlice<'a>);
+
+
 /// Capture the position of the current fragment
 
 #[macro_export]

--- a/tests/integration_tests.rs
+++ b/tests/integration_tests.rs
@@ -110,6 +110,21 @@ fn find_substring<'a>(input: StrSpan<'a>, substr: &str) -> IResult<StrSpan<'a>,
     }
 }
 
+#[test]
+fn test_escaped_string() {
+    use nom::Needed; // https://github.com/Geal/nom/issues/780
+    named!(string<StrSpan, String>, delimited!(
+        char!('"'),
+        escaped_transform!(call!(nom::alpha), '\\', nom::anychar),
+        char!('"')
+    ));
+
+    assert_eq!(string(LocatedSpan::new(CompleteStr("\"foo\\\"bar\""))), Ok((
+        LocatedSpan { offset: 10, line: 1, fragment: CompleteStr("") },
+        "foo\"bar".to_string()
+    )));
+}
+
 named!(plague<StrSpan, Vec<StrSpan> >, do_parse!(
     bacille: apply!(find_substring, "le bacille") >>
     bacille_pronouns: many0!(apply!(find_substring, "il ")) >>

@progval
Copy link
Collaborator

progval commented Jun 6, 2018

ping? I can send a PR with @CAD97's work and my patch if you'd like

authored-by: ProgVal
@CAD97
Copy link
Contributor Author

CAD97 commented Jun 6, 2018

Sorry everyone for the delay; I've been working on various things and the project that initially created this PR has migrated away from nom for various reasons.

I've merged @progval's changes into this branch, so it should be go for merge.

@fflorent
Copy link
Owner

fflorent commented Jun 6, 2018

Everything seems good. I'll do a final review, merge it and publish the new version this weekend.

Thanks and kudos @CAD97! Also thanks @progval! You both rock! Are you OK with having your name in the authors list?

Florent

@progval
Copy link
Collaborator

progval commented Jun 6, 2018

@CAD97 thanks!

@fflorent Ok, thanks!

@CAD97
Copy link
Contributor Author

CAD97 commented Jun 6, 2018

👍

@xpayn
Copy link

xpayn commented Jun 7, 2018

Thanks a lot, all of you, for your work!

Copy link
Owner

@fflorent fflorent left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @CAD97 I have few questions left. If you could just answer them, I can apply the changes by myself. Thanks in advance :).

Sorry everyone, I said I would merge this weekend and that's my only fault… I'll do my best to publish the new version this week.

fn take_split(&self, count: usize) -> (Self, Self) {
(self.slice(count..), self.slice(..count))
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any reason for not returning the call of self.fragment.take or self.fragment.take_split?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe i'm wrong, but using self.fragment.take or self.fragment.take_split would return a value of the fragment's type and not Self. Furthermore, LocatedSpan<T>.offset and LocatedSpan<T>.line are updated by the slice method.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah yeah, obviously, thanks!

}
})
}
}
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also here, why not just returning self.fragment.(split_at_position|split_at_position1)?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same here, return type use Self, not fragment's type, and offset and line need to be updated.

@fflorent fflorent merged commit c030e30 into fflorent:master Jun 12, 2018
@sunjay
Copy link

sunjay commented Jun 12, 2018

Thanks for all your work on this everyone! Happy this has been merged 😄

@progval
Copy link
Collaborator

progval commented Jun 12, 2018

❤️

@fflorent
Copy link
Owner

fflorent commented Jun 17, 2018

Dear all, nom_locate v0.3.0 is now released with nom4 support. Thanks again to @CAD97 and @progval! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants