-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Inital port to nom 4.0 #10
Conversation
Does not change any Cargo.toml settings, so CI is going to fail since it's grabbing a different version of nom. |
See rust-bakery/nom#637 for information about the new |
Nice, thank you! Do you experiment something special requiring nom_locate + nom 4? If so, I can do some efforts to make that build on some branch. Florent |
src/lib.rs
Outdated
line: 1, | ||
offset: 0, | ||
fragment: program | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
src/lib.rs
Outdated
impl<'a, 'b> FindToken<LocatedSpan<$fragment_type>> for $for_type { | ||
fn find_token(&self, input: LocatedSpan<$fragment_type>) -> bool { | ||
self.find_token(input.fragment) | ||
impl<'a, 'b> FindToken<$for_type> for LocatedSpan<$fragment_type> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah great, I guess this is because of this commit rust-bakery/nom@45f5b00#diff-44e8606db87b39b048179b1f7af8d244 ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like it. I preferred swapping the macro's impl around rather than definitions here, but it might be more appropriate to restructure the macro here ($fragment:ty, $token:ty
?). What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, yes, if you don't mind :).
I'm experimenting with implementing a language that's been bouncing around in my head for a few years now, and just moved from a hand-rolled nom derivative to nom. Because I'm chasing master though, I'm mirroring nom_locate in-library for now; I was unable to, in experiments in my fork, provide a version I could depend on that would unify the versions of nom. Maybe I'm just weak at cargo's Because I'm chasing nom master, for now I think I personally am better served mirroring nom_locate. I would re-target this to 4.0.0-alpha1, but it's already using some changes that got in after that alpha build as well. Once nom 4 gets another crates.io release, though, I'm definitely happy to come and help update. |
Everything compiles against nom 4.0.0-alpha2 now. failures:
Unfortunately the test output doesn't tell us what went wrong. My intuition is the stricter incomplete handling, though. We should consider AtEof and consider how we want to handle it (this PR still does not implement any of this crate's traits for |
rust-bakery/nom@47deaa1 makes |
@fflorent The tests pass now, and we're updated against alpha2. I chose to make |
Offering to wrap those types is good for consistency, but I'm still not sure if |
Hi @CAD97! Thanks very much for your work on this. Florent |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great, thank you! I just posted some few comments :).
src/lib.rs
Outdated
@@ -222,6 +222,12 @@ impl<T: InputLength> InputLength for LocatedSpan<T> { | |||
} | |||
} | |||
|
|||
impl<T> AtEof for LocatedSpan<T> { | |||
fn at_eof(&self) -> bool { | |||
true |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wouldn't it be wiser to call self.fragment.at_eof()
instead? As it is inline, if I am not wrong, it shouldn't cost anything, right?
That being said, most of our users must prefer dealing with complete input.
I'd rather be transparent with them and tell them in the examples to rather use :
type Span<'a> = LocatedSpan<CompleteStr<'a>>;
What do you think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, this seems like a good idea. However, it is blocked on rust-bakery/nom#672; LocatedSpan<T>
requires T: AsBytes
, which isn't true in alpha2.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mind if we wait for the next alpha to be released before merging in some branch?
Florent
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Seems prudent to me. I'll add the CompleteStr impls and rebase to a cleaner story once the next alpha comes out.
And obviously you shouldn't publish anything that isn't a -alpha when building against -alpha nom.
src/lib.rs
Outdated
impl<'a, 'b> FindToken<LocatedSpan<$fragment_type>> for $for_type { | ||
fn find_token(&self, input: LocatedSpan<$fragment_type>) -> bool { | ||
self.find_token(input.fragment) | ||
impl<'a, 'b> FindToken<$for_type> for LocatedSpan<$fragment_type> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hum, yes, if you don't mind :).
Alright, -beta1 is out, let me update and rebase Existing CI on my project broke because a commit got garbage collected somewhere, so I'm really motivated to get finished all my cleaning-up work to -beta1 😛 |
Already there's another type added (rust-bakery/nom@e05c2c2) that'll need delegation added, but |
Alright @fflorent we should be updated to |
Sounds great! Do you have any clue when nom 4 will be released? Florent |
Hi! I'm trying this fork out with nom 4.0.0-beta2. Thanks for creating it! I noticed that it doesn't currently work perfectly with As for |
@sunjay I'm using Having written a full lexer with a cursor of this type, I agree with passing through |
It works for me now! Thank you for the quick update! |
Actually scratch that please nobody make a
|
P: Fn(Self::Item) -> bool | ||
{ | ||
self.fragment.split_at_position(predicate) | ||
.map(|(_i, o)| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can just replace _i
to _
in order to skip it .
@@ -9,15 +9,16 @@ license = "MIT" | |||
name = "nom_locate" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you deserve to add yourself as an author of the crate ;). Thanks for your contributions !
Err::Incomplete(needed) => Err::Incomplete(needed), | ||
_ => unimplemented!("nom_locate didn't think Err::List was used in InputTakeAtPosition"), | ||
} | ||
}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are lots of code in common between split_at_position1
and split_at_position
. Do you think there is a room for a factorization thanks to a a macro or a private function?
fn compare_no_case(&self, t: LocatedSpan<B>) -> CompareResult { | ||
self.fragment.compare_no_case(t.fragment) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably worth to add unit tests so we're sure that it works as expected. What do you think?
(well, that's not really unit testing as we don't use mocks, but still :))
Yep, I'll be hitting this on the weekend for the final check for 4.0 compatibility! |
@CAD97 any news on the nom v4 support front? |
I'm getting two errors when using this PR on a big parser with nom 4.0.0:
and
It can be reproduced with this code: #[test]
fn test_escaped_string() {
use nom::Needed; // https://github.com/Geal/nom/issues/780
named!(string<StrSpan, String>, delimited!(
char!('"'),
escaped_transform!(call!(nom::alpha), '\\', nom::anychar),
char!('"')
));
string(LocatedSpan::new(CompleteStr("\"foo \\\" bar\"")));
} EDIT: here is a patch: diff --git a/src/lib.rs b/src/lib.rs
index 9300fbe..cb17004 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -65,7 +65,7 @@ use std::str::{CharIndices, Chars, FromStr};
use memchr::Memchr;
use nom::{AsBytes, Compare, CompareResult, FindSubstring, FindToken, InputIter, InputLength,
- Offset, ParseTo, Slice, InputTake, InputTakeAtPosition, AtEof,
+ Offset, ParseTo, Slice, InputTake, InputTakeAtPosition, AtEof, ExtendInto,
IResult, ErrorKind, Err, Context};
use nom::types::{CompleteByteSlice, CompleteStr};
use bytecount::{naive_num_chars, num_chars};
@@ -560,6 +560,72 @@ impl<T: ToString> ToString for LocatedSpan<T> {
}
}
+/// Implement nom::ExtendInto for a specific fragment type.
+///
+/// # Parameters
+/// * `$fragment_type` - The LocatedSpan's `fragment` type
+/// * `$item` - The type of the item being iterated (a reference for fragments of type `&[T]`).
+/// * `$extender` - The type of the Extended.
+///
+/// # Example of use
+///
+/// NB: This example is an extract from the nom_locate source code.
+///
+/// ````ignore
+/// #[macro_use]
+/// extern crate nom_locate;
+///
+/// impl_extend_into!(&'a str, char, String);
+/// impl_extend_into!(CompleteStr<'a>, char, String);
+/// impl_extend_into!(&'a [u8], u8, Vec<u8>);
+/// impl_extend_into!(CompleteByteSlice<'a>, u8, Vec<u8>);
+/// ````
+#[macro_export]
+macro_rules! impl_extend_into {
+ ($fragment_type:ty, $item:ty, $extender:ty) => (
+ impl<'a> ExtendInto for LocatedSpan<$fragment_type> {
+ type Item = $item;
+ type Extender = $extender;
+
+ #[inline]
+ fn new_builder(&self) -> Self::Extender {
+ self.fragment.new_builder()
+ }
+
+ #[inline]
+ fn extend_into(&self, acc: &mut Self::Extender) {
+ self.fragment.extend_into(acc)
+ }
+ }
+ )
+}
+
+impl_extend_into!(&'a str, char, String);
+impl_extend_into!(CompleteStr<'a>, char, String);
+impl_extend_into!(&'a [u8], u8, Vec<u8>);
+impl_extend_into!(CompleteByteSlice<'a>, u8, Vec<u8>);
+
+#[macro_export]
+macro_rules! impl_hex_display {
+ ($fragment_type:ty) => (
+ impl<'a> nom::HexDisplay for LocatedSpan<$fragment_type> {
+ fn to_hex(&self, chunk_size: usize) -> String {
+ self.fragment.to_hex(chunk_size)
+ }
+
+ fn to_hex_from(&self, chunk_size: usize, from: usize) -> String {
+ self.fragment.to_hex_from(chunk_size, from)
+ }
+ }
+ )
+}
+
+impl_hex_display!(&'a str);
+impl_hex_display!(CompleteStr<'a>);
+impl_hex_display!(&'a [u8]);
+impl_hex_display!(CompleteByteSlice<'a>);
+
+
/// Capture the position of the current fragment
#[macro_export]
--- a/tests/integration_tests.rs
+++ b/tests/integration_tests.rs
@@ -110,6 +110,21 @@ fn find_substring<'a>(input: StrSpan<'a>, substr: &str) -> IResult<StrSpan<'a>,
}
}
+#[test]
+fn test_escaped_string() {
+ use nom::Needed; // https://github.com/Geal/nom/issues/780
+ named!(string<StrSpan, String>, delimited!(
+ char!('"'),
+ escaped_transform!(call!(nom::alpha), '\\', nom::anychar),
+ char!('"')
+ ));
+
+ assert_eq!(string(LocatedSpan::new(CompleteStr("\"foo\\\"bar\""))), Ok((
+ LocatedSpan { offset: 10, line: 1, fragment: CompleteStr("") },
+ "foo\"bar".to_string()
+ )));
+}
+
named!(plague<StrSpan, Vec<StrSpan> >, do_parse!(
bacille: apply!(find_substring, "le bacille") >>
bacille_pronouns: many0!(apply!(find_substring, "il ")) >> |
ping? I can send a PR with @CAD97's work and my patch if you'd like |
authored-by: ProgVal
Sorry everyone for the delay; I've been working on various things and the project that initially created this PR has migrated away from nom for various reasons. I've merged @progval's changes into this branch, so it should be go for merge. |
👍 |
Thanks a lot, all of you, for your work! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @CAD97 I have few questions left. If you could just answer them, I can apply the changes by myself. Thanks in advance :).
Sorry everyone, I said I would merge this weekend and that's my only fault… I'll do my best to publish the new version this week.
fn take_split(&self, count: usize) -> (Self, Self) { | ||
(self.slice(count..), self.slice(..count)) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any reason for not returning the call of self.fragment.take
or self.fragment.take_split
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe i'm wrong, but using self.fragment.take
or self.fragment.take_split
would return a value of the fragment's type and not Self
. Furthermore, LocatedSpan<T>.offset
and LocatedSpan<T>.line
are updated by the slice
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah yeah, obviously, thanks!
} | ||
}) | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also here, why not just returning self.fragment.(split_at_position|split_at_position1)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, return type use Self
, not fragment's type, and offset and line need to be updated.
Thanks for all your work on this everyone! Happy this has been merged 😄 |
❤️ |
Compiles against Geal/nom#a53febdfdf99d16c0177f5c605bd98481a4ec50f
I was playing around with a small parser and tried out the latest changes with upstream nom. These are the changes that I had to patch into this library to get it compiling.
Very, very minimally tested (as in, it compiles and I have a micro-sized nom parser using it), and nom might change again before the proper 4.0 release, but I thought I'd share the work I did to get this to compile today.