Support for named groups in Regexp.scan and Regexp.run #434

Merged
merged 1 commit into from Aug 27, 2012

3 participants

@khia

No description provided.

@travisbot

This pull request passes (merged 5331a4f8 into 9259cfc).

@josevalim josevalim and 1 other commented on an outdated diff Aug 14, 2012
lib/elixir/lib/regex.ex
@@ -51,12 +51,13 @@ defmodule Regex do
`{ :error, reason }` otherwise.
"""
def compile(source, options // "") do
+ groups = parse_groups(source)
@josevalim
Elixir member
josevalim added a line comment Aug 14, 2012

I don't like that we need to parse the regex here. Can we simply ask the user to pass the users he wants to retrieve? As I said before, just something like:

Regex.scan(regex, binary, captures: ["Group1", "Group2"])
@khia
khia added a line comment Aug 14, 2012

In my use case user doesn't know groups.
We could either rename parse_groups into groups and make it public or pass :groups option to compile.

@josevalim
Elixir member
josevalim added a line comment Aug 14, 2012

A explicit compile group option seems great. If we are doing so, I don't think there is a need for captures: as I said above then.

@josevalim
Elixir member
josevalim added a line comment Aug 14, 2012

Actually, this is trickier. Because setting a flag in the regex should not affect the return type of the function. For instance, it is weird that Regex.scan may return a different structure (in this case a list of tuples) just because the regex was compiled with the group option. Let me think more about the topic.

@khia
khia added a line comment Aug 14, 2012

We could introduce ?g option and then remove it in translate_options.
I'm also implementing following options for 'run' and 'scan':

  • :global
  • :return
    • :list
    • :binary
@josevalim
Elixir member
josevalim added a line comment Aug 14, 2012

We could introduce ?g option and then remove it in translate_options.

This is what I had in mind. But if we are going to have a different result because of the flag ?g, it is a no-go imho. :S Maybe we should have a different API then just for group checks.

  • :global

Great.

  • :return
    • :list
    • :binary

Do we really need it? If you want binary as response, you can simply convert the input to binary before calling it.

@khia
khia added a line comment Aug 14, 2012

If the input is very big we have two options:

  • instruct Erlang.re (PCRE) to convert matches for us
  • do it ourself

Converting input if it's big is not an option (converting just matches way more efficient)

@khia
khia added a line comment Aug 14, 2012

This is what I had in mind. But if we are going to have a different result because of the flag ?g, it is a no-go imho. :S Maybe we should have a different API then just for group checks.
We could have separate function named as :

  • match
  • matches
  • search
  • match_to_keyword
  • to_keyword
@josevalim
Elixir member
josevalim added a line comment Aug 14, 2012

Right, it makes sense then.

Btw, run is exactly the same as scan, except that scan has global set to true. So I guess we don't need a global option?

@khia
khia added a line comment Aug 14, 2012

Except that scan does

          lc result inlist results do
            case result do
              [t] -> t
              [h|t] -> t
            end
          end

After that info about groups is not recoverable. However it could be fixed. I tend to agree that separate api would be better.

@josevalim
Elixir member
josevalim added a line comment Aug 14, 2012

OK. How this sounds to you then:

1) Change run to support a global option;

2) We can add return support to both scan and run;

After 1) and 2) is done and it is not enough for you to distinguish groups, we can think about an API for groups. Wdyt? Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
@josevalim
Elixir member

Thanks for the PR. Added on extra comment.

@travisbot

This pull request passes (merged eed09dbc into 9259cfc).

@josevalim
Elixir member

I have merged the previous one. Is this one ready to merge too? Thanks!

@khia

It is ready and it has rebased commits from previous one. However I would rather close this pull request rewrite a history and open a new one (I want to get rid of commit 5331a4f).

@travisbot

This pull request passes (merged 3178193 into 2297a06).

@khia

I have rewritten the history. Rewriting history didn't play well with pull requests on github, however it look like it is working now. Let me know if it doesn't merge I'll reopen the pull request then.

@khia

Did you review this request? Would you be able to merge it?

@josevalim
Elixir member
@josevalim
Elixir member

Hey @khia. I have just reviewed this pull request, thanks.

I am still concerned that Regex.match will run the regex many times, one per each group. Maybe we could call re:run() just once? For example:

re:run("ABCabcdABC",".*(?<FOO>abcd)(?<BAR>ABC).*",[{capture,['FOO', 'BAR'], binary}]).      
{match,[<<"abcd">>,<<"ABC">>]}

Also, what is the benefit over this approach compared to calling Regex.match passing :captures as an option? The code that calls Regex.match needs to be aware of the groups returned by this function, so I don't see the benefit over us parsing the regex in the first place. In all cases I can think of, the one responsible to choose which groups to return is always the calling code, not the code that defines the regex.

@josevalim josevalim merged commit 89b5e34 into elixir-lang:master Aug 27, 2012

1 check passed

Details default The Travis build passed
@josevalim
Elixir member

I have slightly changed your pull request in another commit and implemented it in terms of Regex.run, passing all captures at once, avoiding the multiple matches! Thanks for your contributions!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment