Add support for "regex" library #22496
Are patterns compiled by …
No, they are not.

```python
import re
import typing

import regex

re_pat = re.compile(r"\d")
regex_pat = regex.compile(r"\d")

re_pat.__class__.mro()                 # [_sre.SRE_Pattern, object]
isinstance(re_pat, typing.Pattern)     # True

regex_pat.__class__.mro()              # [_regex.Pattern, object]
isinstance(regex_pat, typing.Pattern)  # False
```
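The `isinstance` check fails because `regex` patterns are an unrelated class, not a subclass of `re.Pattern`. A duck-typed check sidesteps the concrete class entirely. This is a minimal sketch, assuming a structural `typing.Protocol` (Python 3.8+) is acceptable; `PatternLike` and `is_pattern_like` are hypothetical names, not part of pandas, `re`, or `regex`.

```python
import re
from typing import Protocol, runtime_checkable


@runtime_checkable
class PatternLike(Protocol):
    """Hypothetical structural type for a compiled regular expression.

    Any object exposing these methods matches, whichever engine
    (re, regex, ...) produced it.
    """

    def search(self, string, *args, **kwargs): ...

    def match(self, string, *args, **kwargs): ...

    def sub(self, repl, string, *args, **kwargs): ...


def is_pattern_like(obj) -> bool:
    # runtime_checkable only checks that the methods exist, not their
    # signatures or return types, so this is a loose, duck-typed test.
    return isinstance(obj, PatternLike)


print(is_pattern_like(re.compile(r"\d")))  # True
print(is_pattern_like(r"\d"))              # False
```

The trade-off is looseness. Any object that happens to define these three methods passes, so a real implementation might prefer an explicit list of known pattern types.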
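Hi @pmav99, any luck with this? Or did you happen to create a workaround for yourself?
@madimov, I think I used vanilla `re` for pandas, and `regex` for everything else. Not nice, but there was no feedback and I needed to move on.
We could probably optionally import `regex` and append its type to the `re` types we handle.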
@TomAugspurger that would be great! Might you have the time to give it a go?
No, I won't have time.
If the answer to this is "no", then that's an upstream bug IMO.
That said, here is my attempt at a fix: master...gwerbin:patch-2. I just made the edits here on GitHub, so I haven't actually run any tests yet.
Tom didn't have time, but PRs are welcome.
@jbrockmendel, did you take a look at my proposed patch? It will probably need a major rebase, obviously. I just want to make sure what I did is an acceptable approach before I put more time into it.
That looks roughly correct. You'll need to update some of the CI envs in …
@gwerbin thanks for pinging on this. Yah, that looks a lot less invasive than I expected; seems reasonable.
Hi guys, any update on this? Using the `regex` module in pandas would be really useful in a lot of scenarios.
@lucazav, you or anyone in the community can submit a PR. All folks working on pandas are volunteers.
@jreback I'm not an experienced Pythonista. But I see that someone else has already proposed an easy solution a few comments above, so I assumed it would be just as easy for you experts to submit a PR containing that code.
@lucazav, and someone needs to make an actual pull request with tests and documentation; core devs can then provide code review.
I totally forgot about this. I am willing to take the lead on this, going through the effort to update the docs, run the test suite, etc. However, I think my patch is a hack around the fact that …
The reason I believe a generic solution is better than a … I am willing to start work on (1), free time permitting, and possibly even (2). But I'd like some feedback on this idea from the pandas dev community before I commit a bunch of time to it.
@gwerbin The above is so true. I wish I could use … Basically, we need to be able to switch the internal regex engine used by pandas' string methods.
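One way to read "switch the internal regex engine" is to let string operations take the engine module used to compile string patterns, and to use precompiled pattern objects as-is instead of passing them back through `re.compile`. Here is a minimal pure-Python sketch under that assumption. `str_contains` and its `engine` parameter are hypothetical and are not part of the pandas API.

```python
import re
from types import ModuleType
from typing import Iterable, List, Optional


def str_contains(
    values: Iterable[Optional[str]],
    pat,
    engine: ModuleType = re,
    flags: int = 0,
) -> List[Optional[bool]]:
    """Hypothetical engine-agnostic version of a `contains` string method.

    String patterns are compiled with `engine.compile` (e.g. `re` or the
    third-party `regex` module); already-compiled patterns are used as-is,
    so a `regex` pattern is never handed to `re.compile`.
    """
    if isinstance(pat, str):
        # `flags` only applies when compiling a string pattern.
        pat = engine.compile(pat, flags)
    # Only `.search` is used, which compiled patterns from both engines provide.
    # Missing values (None) propagate as None, loosely mimicking NaN handling.
    return [None if v is None else pat.search(v) is not None for v in values]


print(str_contains(["a1", "bb", None], r"\d"))  # [True, False, None]
```

With `regex` installed, passing `engine=regex` would allow patterns such as `\p{L}` (Unicode property classes), which `re.compile` rejects.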
Like PCRE2. See a comparison of language features for regular expression engines.
Known differences between Google's RE2 and `re`: …
Code Sample, a copy-pastable example if possible
A simpler way to demonstrate the problem is:
Problem description
The `regex` library does not seem to be supported by pandas. Not sure if you want to add support for it, but I had a quick look and it seems relatively straightforward to add (+ it would make maintenance easier for projects that have already opted for `regex`).
How to fix
So, I think the required steps are:
- `pandas.core.dtypes.inference.is_re` should return True for `regex` compiled patterns too (assuming that `regex` is installed, of course).
- … `re.compile()` (as is being done e.g. here): …
Output of pd.show_versions()