Problem with Series.str.match #2074

Closed
jseabold opened this issue Oct 15, 2012 · 9 comments
Labels: Bug, Strings (String extension data type and string data)

Comments

@jseabold
Contributor

Not sure yet what's going on here. It doesn't appear to be a Unicode issue.

import re
import pandas

df = pandas.DataFrame([[u'A Confe\xdfion of the most auncient and true christen catholike olde belefe accordyng to the ordre of the .XII. Articles of our co[m]mon crede, set furthe in Englishe to the glory of almightye God, and to the confirmacion of Christes people in Christes catholike olde faith.']], columns=["title"])

df.title.str.match(".*[A|a]lmight")
# returns 
#0    ()
#Name: title

re.match(".*[A|a]lmight", df.title.ix[0])
# expected output
#<_sre.SRE_Match at 0x4fa8100>
@gerigk

gerigk commented Oct 15, 2012

match searches only at the beginning of the string. I guess you are expecting the behavior of re.search?

re.match(pattern, string, flags=0) (http://docs.python.org/library/re.html#re.match)

If zero or more characters at the beginning of string match the regular expression pattern, return a corresponding MatchObject (http://docs.python.org/library/re.html#re.MatchObject) instance. Return None if the string does not match the pattern; note that this is different from a zero-length match.
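
To illustrate the distinction drawn here, a minimal sketch using only the standard library. The shortened string and the variable names are illustrative, and `[Aa]` stands in for the original `[A|a]`, which would also accept a literal `|`. It suggests that a leading `.*` already lets `re.match` succeed, so anchoring alone would not explain the empty result reported above.

```python
import re

text = "to the glory of almightye God"

# re.match anchors at position 0, so a bare pattern fails here...
anchored = re.match("[Aa]lmight", text)
# ...while re.search scans the whole string.
scanned = re.search("[Aa]lmight", text)
# A leading ".*" lets re.match consume the prefix, so it succeeds too.
prefixed = re.match(".*[Aa]lmight", text)

print(anchored)           # None
print(scanned.group(0))   # almight
print(prefixed.group(0))  # to the glory of almight
```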


@wesm
Member

wesm commented Oct 15, 2012

The problem is that you have no groups in the regular expression:

In [16]: df.title.str.match("(.*[A|a]lmight)").ix[0]
Out[16]: (u'A Confe\xdfion of the most auncient and true christen catholike olde belefe accordyng to the ordre of the .XII. Articles of our co[m]mon crede, set furthe in Englishe to the glory of almight',)

A better behavior when match.groups() is empty would be to fall back to match.group(0), if it exists.
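
The fallback suggested here could look roughly like the sketch below. `groups_or_whole_match` is a hypothetical helper written for illustration with the standard `re` module. It is not part of pandas, and returning `None` for a non-match is an assumption.

```python
import re

def groups_or_whole_match(pattern, string):
    """Hypothetical helper: return captured groups, else the whole match."""
    m = re.match(pattern, string)
    if m is None:
        return None  # assumption: signal "no match" with None
    # groups() is an empty tuple when the pattern has no capture groups,
    # so fall back to a one-element tuple holding the entire match.
    return m.groups() or (m.group(0),)

text = "glory of almightye God"
no_groups = groups_or_whole_match(".*[Aa]lmight", text)
one_group = groups_or_whole_match(".*([Aa]lmight)", text)
no_match = groups_or_whole_match("xyz", text)

print(no_groups)  # ('glory of almight',)
print(one_group)  # ('almight',)
print(no_match)   # None
```

Returning a tuple in both cases keeps the result type uniform, which may matter if callers index into it.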

@jseabold
Contributor Author

I showed the expected behavior of re.match in the code example. That's why I stuck the .* at the beginning. What am I missing?

@wesm
Member

wesm commented Oct 15, 2012

Well, if you look at the implementation of str.match, you see it unpacks the matched groups from the SRE_Match object. That is the issue.
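
A rough stand-in for the behavior described here, written against plain lists rather than a Series. `match_groups` is a hypothetical name and this is not the pandas source. In particular, what the real method returns for a non-matching element is an assumption (`None` here).

```python
import re

def match_groups(values, pattern):
    """Rough stand-in for the behavior described above, not pandas source."""
    regex = re.compile(pattern)
    out = []
    for value in values:
        m = regex.match(value)
        # Only the captured groups are kept; the match itself is discarded.
        out.append(m.groups() if m else None)
    return out

titles = ["to the glory of almightye God", "no such word here"]
without_group = match_groups(titles, ".*[Aa]lmight")
with_group = match_groups(titles, "(.*[Aa]lmight)")

print(without_group)  # [(), None]
print(with_group)     # [('to the glory of almight',), None]
```

Under this reading, a successful match on a pattern with no groups yields `()`, which is consistent with the output in the original report.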

@jseabold
Contributor Author

Ah, I expected it to return something I can evaluate to True/False to make an index. Will adjust expectations.
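
One way to get the True/False result wanted here is to truth-test each match object. The sketch below uses plain Python lists and made-up sample strings. With a pandas Series, the same predicate could presumably be applied through `Series.map` and the result used as a boolean indexer, though that is not shown or tested here.

```python
import re

titles = [
    "to the glory of almightye God",
    "Articles of our common crede",
]

regex = re.compile(".*[Aa]lmight")
# Truth-test each match result to build a boolean mask.
mask = [regex.match(t) is not None for t in titles]
# Use the mask to keep only the matching entries.
selected = [t for t, keep in zip(titles, mask) if keep]

print(mask)      # [True, False]
print(selected)  # ['to the glory of almightye God']
```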

@wesm
Member

wesm commented Oct 15, 2012

Very open to API improvements here. I put all those functions together over the course of about a day or so.

@jseabold
Contributor Author

It's (sort of) clear in the documentation that it finds groups, but right now I just want to know whether something matches rather than pull out the groups. I can see use cases for both, though.

@cpcloud
Member

cpcloud commented Jul 25, 2013

You could have a matches method.

@jreback
Contributor

jreback commented Sep 22, 2013

@jseabold this new method might work for you

#4685

@hayd closed this as completed May 29, 2014