Feature Request: re_take_until! #709

Open
johncf opened this issue Mar 4, 2018 · 11 comments
@johncf

johncf commented Mar 4, 2018

I am trying to parse a not-so-structured document, and this would be a nice feature to have, so that I don't have to rely on the regex package directly (and for better code readability).

If you are interested, I can do a PR since it seems fairly straightforward.

@Geal
Collaborator

Geal commented Mar 11, 2018

hello,
could you tell me more about what that combinator would do?

@johncf
Author

johncf commented Mar 11, 2018

Example: re_take_until!("hello|world") would take input until that regular expression matches.

Applying the above to It's not the end of the world! should return (remaining input: world!, output: It's not the end of the )
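
To make the requested behaviour concrete, here is a minimal sketch in plain Rust. It is not nom or regex code: `take_until_any` is a hypothetical helper, and plain substring search over a list of alternatives stands in for the regular expression `hello|world`. The result order follows the `(remaining input, output)` convention used in the example above.

```rust
/// Hypothetical helper (not part of nom): splits `input` at the earliest
/// occurrence of any of the given alternatives.
/// Returns `(remaining, taken)`, or `None` if no alternative occurs.
fn take_until_any<'a>(input: &'a str, needles: &[&str]) -> Option<(&'a str, &'a str)> {
    // Earliest byte offset at which any alternative starts.
    let start = needles.iter().filter_map(|n| input.find(*n)).min()?;
    let (taken, remaining) = input.split_at(start);
    Some((remaining, taken))
}

fn main() {
    let result = take_until_any("It's not the end of the world!", &["hello", "world"]);
    assert_eq!(result, Some(("world!", "It's not the end of the ")));
    // No alternative present: nothing to split on.
    assert_eq!(take_until_any("no match here", &["hello", "world"]), None);
}
```

A real `re_take_until!` would presumably report an incomplete or error result through nom's own result type instead of `None`; that part is left out of this sketch.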

@Geal Geal added this to the 5.0 milestone Aug 18, 2018
@ElectricCoffee

Honestly, a regex-based combinator would be absolutely amazing to have.

I don't doubt for one moment that nom can do everything regex can do, but there's something appealing about the succinctness of writing something like "[^@]+@[^@\.]+\.\w+" as a rudimentary email address parser.

If you're then able to throw that into the greater nom ecosystem, that would be splendid.

@cormacrelf

I don't doubt for one moment that nom can do everything regex can do

Without diving into the academic way of looking at this statement, I don't think there is a nom equivalent of this particular proposal. The take_* macros only accept a predicate (T -> bool) or a fixed needle (&[T]), not a whole parser.
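
The two argument shapes mentioned here can be sketched in plain Rust. These are hypothetical stand-ins, not nom's macros: the names `take_till` and `take_until` are borrowed for illustration only, and nom's real versions return its own result type and handle incomplete input. The point is that one takes a predicate and the other a fixed needle, and neither has a slot for an arbitrary parser.

```rust
/// Predicate shape (`T -> bool`): take characters until `stop` returns true.
/// Returns `(remaining, taken)`; takes everything if `stop` never holds.
fn take_till(input: &str, stop: impl Fn(char) -> bool) -> (&str, &str) {
    let index = input.find(|c: char| stop(c)).unwrap_or(input.len());
    let (taken, remaining) = input.split_at(index);
    (remaining, taken)
}

/// Needle shape (`&[T]`): take input until a fixed substring appears.
/// Returns `(remaining, taken)`, or `None` if the needle is absent.
fn take_until<'a>(input: &'a str, needle: &str) -> Option<(&'a str, &'a str)> {
    let index = input.find(needle)?;
    let (taken, remaining) = input.split_at(index);
    Some((remaining, taken))
}

fn main() {
    assert_eq!(take_till("user@example.com", |c| c == '@'), ("@example.com", "user"));
    assert_eq!(take_until("Hello, Amanda", "Amanda"), Some(("Amanda", "Hello, ")));
    // A fixed needle cannot express "John or Amanda"; that needs a parser argument.
    assert_eq!(take_until("Hello, Amanda", "John"), None);
}
```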

@Geal
Collaborator

Geal commented Mar 24, 2019

There are a lot of regex-based combinators; you can find them by looking for the prefix re_ on https://docs.rs/nom/4.2.3/nom/

@cormacrelf

Not sure if replying to me, but to clarify, this doesn't exist (yet), so I just wrote it myself:

// `take_till_match!(alt!(tag!("John") | tag!("Amanda")))`
// Running that on `"Hello, Amanda"` gives `Ok(("Amanda", "Hello, "))`
macro_rules! take_till_match(
  (__impl $i:expr, $submac2:ident!( $($args2:tt)* )) => (
    {
      use $crate::lib::std::result::Result::*;
      use $crate::lib::std::option::Option::*;

      // TODO: replace nom with $crate
      use nom::{Err, Needed,need_more_err, ErrorKind};
      use nom::InputLength;
      use nom::FindSubstring;
      use nom::InputTake;
      use nom::Slice;

      let ret;
      let input = $i;
      let mut index = 0;

      loop {
        let slice = input.slice(index..); // XXX: on &str this can panic when index falls inside a multi-byte UTF-8 character
        match $submac2!(slice, $($args2)*) {
          Ok((_i, _o)) => {
            ret = Ok(input.take_split(index));
            break;
          },
          Err(_e1)    => {
            if index >= input.len() {
                // XXX: this error is dramatically wrong
                ret = need_more_err(input, Needed::Size(0), ErrorKind::TakeUntil::<u32>);
                break;
            } else {
                index += 1;
            }
          },
        }
      }

      ret
    }
  );
  ($i:expr, $submac2:ident!( $($args2:tt)* )) => (
    take_till_match!(__impl $i, $submac2!($($args2)*));
  );
  ($i:expr, $g:expr) => (
    take_till_match!(__impl $i, call!($g));
  );
);

@lawliet89

I took @cormacrelf's macro and made some changes.

First, I added a trait to allow "safe-slicing" of strings.

Secondly, I modified the macro to make use of the trait.
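
The trait itself is not reproduced in this thread. As a rough idea of what "safe slicing" could look like, here is a minimal sketch. `SafeSlice` and `safe_slice_from` are hypothetical names, not the code referred to above. The sketch relies on the standard library's `get`, which returns `None` instead of panicking when the index is out of range or, for `str`, falls inside a multi-byte character.

```rust
/// Hypothetical trait (not the one from the comment above): slicing that
/// reports an invalid split point as `None` instead of panicking.
trait SafeSlice {
    /// Returns the tail starting at `index`, or `None` if `index` is not a
    /// valid split point for this input.
    fn safe_slice_from(&self, index: usize) -> Option<&Self>;
}

impl SafeSlice for str {
    fn safe_slice_from(&self, index: usize) -> Option<&str> {
        // `None` when out of range or not on a UTF-8 character boundary.
        self.get(index..)
    }
}

impl SafeSlice for [u8] {
    fn safe_slice_from(&self, index: usize) -> Option<&[u8]> {
        // Every in-range byte offset is a valid split point.
        self.get(index..)
    }
}

fn main() {
    let text = "h\u{e9}llo"; // 'é' occupies bytes 1 and 2
    assert_eq!(text.safe_slice_from(1), Some("\u{e9}llo"));
    assert_eq!(text.safe_slice_from(2), None); // inside 'é'
    assert_eq!(text.safe_slice_from(3), Some("llo"));

    let bytes: &[u8] = b"abc";
    assert_eq!(bytes.safe_slice_from(1), Some(&b"bc"[..]));
    assert_eq!(bytes.safe_slice_from(4), None); // out of range
}
```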

@cormacrelf

@lawliet89 that's closer, but you could reuse existing APIs by making the trait give you an Iterator instead. Just abstract &str::char_indices().map(|(i, _)| i) and create an index++ version for byte slices. Here's what I ended up using in my code:

{
      let input = $i;
      for index in input.char_indices().map(|(i, _)| i) {
        let slice = input.slice(index..);
        match $submac2!(slice, $($args2)*) {
          Ok((_i, _o)) => {
            return Ok(input.take_split(index));
          },
          Err(_e1) => { },
        }
      }
      need_more_err(input, Needed::Size(0), ErrorKind::TakeUntil::<u32>)
}
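
The iterator-returning trait suggested above could look roughly like the following sketch. `SplitOffsets` and `split_offsets` are hypothetical names chosen for illustration; the thread does not show the trait that was actually written. The `str` implementation wraps `char_indices()`, and the byte-slice implementation is the "index++" version.

```rust
/// Hypothetical trait: yields every offset at which the input may be split.
trait SplitOffsets {
    fn split_offsets(&self) -> Box<dyn Iterator<Item = usize> + '_>;
}

impl SplitOffsets for str {
    fn split_offsets(&self) -> Box<dyn Iterator<Item = usize> + '_> {
        // Only character boundaries are valid split points in UTF-8 text.
        Box::new(self.char_indices().map(|(i, _)| i))
    }
}

impl SplitOffsets for [u8] {
    fn split_offsets(&self) -> Box<dyn Iterator<Item = usize> + '_> {
        // Every byte offset is a valid split point in a byte slice.
        Box::new(0..self.len())
    }
}

fn main() {
    // 'é' is two bytes long, so offset 2 is skipped for the string.
    let text: Vec<usize> = "h\u{e9}llo".split_offsets().collect();
    assert_eq!(text, vec![0, 1, 3, 4, 5]);

    let bytes: &[u8] = b"abc";
    let offsets: Vec<usize> = bytes.split_offsets().collect();
    assert_eq!(offsets, vec![0, 1, 2]);
}
```

Boxing the iterator keeps the sketch short; an associated iterator type would avoid the allocation.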

@lawliet89

@cormacrelf Thanks for your suggestion! Made some changes and it looks much better.

@Geal Geal modified the milestones: 5.0, 6.0 Apr 6, 2020
@tomalexander

tomalexander commented Dec 19, 2020

Hey, I just stumbled upon this issue. I actually have a PR open for a take_until_parser_matches combinator. It seems we could then pass a regex parser as the parameter, just like any other nom parser, solving the deficiency @cormacrelf pointed out in #709 (comment). From my first-pass reading of @cormacrelf's code in #709 (comment), mine functions in a very similar way, except it's a function instead of a macro, and it looks like @cormacrelf's supports streaming whereas mine does not.

Unfortunately it seems Geal is very busy right now so I have no idea when it'll get eyes on it again.

PR: #469
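
For readers who want to see the function-style idea without nom, here is a minimal non-streaming sketch. It is not the code from the PR: the signature is an assumption, parsers are modelled as plain closures returning `Option<(remaining, output)>`, and `name` is a toy parser invented for the example.

```rust
/// Hypothetical function-style combinator (not the PR's implementation):
/// tries `parser` at each character boundary and splits where it first succeeds.
/// Returns `(remaining, taken)`; the parser's own output is discarded.
fn take_until_parser_matches<'a, O>(
    input: &'a str,
    parser: impl Fn(&'a str) -> Option<(&'a str, O)>,
) -> Option<(&'a str, &'a str)> {
    // Character boundaries, plus the end of input for parsers that accept "".
    let offsets = input
        .char_indices()
        .map(|(i, _)| i)
        .chain(std::iter::once(input.len()));
    for index in offsets {
        if parser(&input[index..]).is_some() {
            let (taken, remaining) = input.split_at(index);
            return Some((remaining, taken));
        }
    }
    None
}

/// Toy parser: succeeds when the input starts with "John" or "Amanda".
fn name(input: &str) -> Option<(&str, &str)> {
    ["John", "Amanda"]
        .iter()
        .find(|n| input.starts_with(**n))
        .map(|n| (&input[n.len()..], *n))
}

fn main() {
    assert_eq!(
        take_until_parser_matches("Hello, Amanda", name),
        Some(("Amanda", "Hello, "))
    );
    assert_eq!(take_until_parser_matches("Hello, world", name), None);
}
```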

@daboross
Contributor

daboross commented Mar 8, 2023

I'd propose closing this as regex functions are no longer present in this crate. I've opened up a new issue on nom-regex, rust-bakery/nom-regex#3, to continue the request.

I don't think take_until_parser_matches is a good solution here, as iterating a regex-containing parser multiple times essentially redoes the work of a Regex "find" function, and thus eliminates a big performance benefit of using regex for this.
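
The cost difference can be illustrated with a small plain-Rust sketch. It is only an analogy: a prefix check stands in for an anchored regex match, `str::find` stands in for a regex "find", and `find_by_retry` is a hypothetical helper. How large the gap is in practice depends on the regex engine and the pattern.

```rust
/// Retry strategy: runs a match attempt at every character boundary.
/// Returns the offset of the first success and the number of attempts made.
fn find_by_retry(input: &str, needle: &str) -> (Option<usize>, usize) {
    let mut attempts = 0;
    for (index, _) in input.char_indices() {
        attempts += 1;
        // Each attempt starts a fresh match from this offset.
        if input[index..].starts_with(needle) {
            return (Some(index), attempts);
        }
    }
    (None, attempts)
}

fn main() {
    let input = "It's not the end of the world!";

    // One separate match attempt per offset up to and including the hit.
    let (offset, attempts) = find_by_retry(input, "world");
    assert_eq!(offset, Some(24));
    assert_eq!(attempts, 25);

    // A single search call reaches the same offset in one pass.
    assert_eq!(input.find("world"), Some(24));
}
```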


7 participants