Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add one_of parser #13

Closed
spookylukey opened this issue Oct 2, 2017 · 7 comments
Closed

Add one_of parser #13

spookylukey opened this issue Oct 2, 2017 · 7 comments
Milestone

Comments

@spookylukey
Copy link
Member

Should accept either a string (which will be split into characters), or a list of strings.

def one_of(strings):
    """
    Returns a parser that matches any of the passed in strings
    """
    # Sort longest first, so that backtracking works correctly
    return alt(*map(parsy.string, sorted(strings, key=lambda s: -len(s))))
@spookylukey spookylukey added this to the 1.0 milestone Oct 2, 2017
@bugaevc
Copy link
Member

bugaevc commented Oct 2, 2017

sorted(strings, key=len, reverse=True)

And the name should indicate it works only for strings and that it is different from alt() in supporting backtracking. With that in mind, wouldn't it be better to just add an optional backtrack argument to alt()?

Off-topic, but I see that now @jneen's repo is marked as a fork of this one; any better way to do that than contacting GitHub support?

@spookylukey
Copy link
Member Author

@bugaevc - it would be difficult to add it into alt, because the sorting logic only works for strings, and because we want to wrap the options in parsy.string.

The motivations is that things like:

alt(string("a"), string("b"), string("c"))
alt(string("foo"), string("bar"))

are common enough to want a shortcut like:

one_of("abc")
one_of(["foo", "bar"])

and secondly, the backtracking issue of:

alt(string("thing", string("thing2"), string("thing3"))

which doesn't work as intended, can also benefit from something that gets the backtracking right.

For the first case, I think we'd want a more efficient implementation that doesn't use alt etc. - what I've posted above covers the second case.

I'm not convinced we need to make it more verbose by adding something into the name to indicate the type, especially if we have two different types that it can accept. Parsimmon has something very similar, only for strings: https://github.com/jneen/parsimmon/blob/master/API.md#parsimmononeofstring

We could have two different functions, but then that would require two names, and I can't think of better ones. I think they are similar enough in purpose to benefit from being a single function.

Regarding changing the root repo - it seems the only way is to contact GitHub support, they just need confirmation from the relevant people.

@bugaevc
Copy link
Member

bugaevc commented Oct 2, 2017

Okay, so I've come the full circle from trying to write down the changes to alt() I'd replace this with, to realizing it should be better done by the caller and to thinking of writing a simple wrapper around alt() to do that.

But I'd still like a more descriptive name, how about string_from()? And the docstring should mention backtracking (ex. "this function picks the longest of the passed strings that do match").

@jneen
Copy link
Contributor

jneen commented Oct 2, 2017

How about .keywords("thing", "thing2", ...)?

@spookylukey
Copy link
Member Author

@jneen that works well for multiple strings, not so good for one_of('0123456789').
@bugaevc similarly for string_from('0123456789') which could be confusing.
one_of seems pretty descriptive to me:

digit = one_of('0123456789')
boolean = one_of(["true", "false"])

Maybe we could have string_from and char_from, and just have two separate functions, if we want to have the data type in the name. I'm thinking of adding any_char so maybe that would complement it.

@bugaevc
Copy link
Member

bugaevc commented Oct 2, 2017

I'd rather have the function accept *strings so that you don't have to pack them in a list at the call site. If you want to pick a character from a string, call it like string_from(*'1234567890').

one_of would be fine if it was the only way to alternate things (and indeed, it would make sense to name the current alt one_of), but it is specific to strings/keywords and the name should indicate that.

@spookylukey
Copy link
Member Author

This is implemented now in PR #15

I implemented char_from separately because:

  1. string_from(*"abc") is really pretty obscure for most Python users, and this is a really common case
  2. char_from implemented using the in operator to test is many times faster. (5 times faster for 10 digits, 25 times faster for 62 chars (upper, lower, digits)).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants