jython re expression doesn't work #42

Closed
tfmorris opened this issue Oct 15, 2012 · 2 comments
Labels
imported from old code repo (Issue imported from Google Code in 2010)
Priority: Medium (Represents important issues that need to be addressed but are not urgent)
Type: Bug (Issues related to software defects or unexpected behavior, which require resolution)

Comments

@tfmorris
Member

Original author: raymond....@gmail.com (May 16, 2010 21:16:27)

I'm trying to transform "BWV 1 — Wie schön leuchtet der Morgenstern, BWV 1" in a cell to "Wie schön
leuchtet der Morgenstern" using the following Jython function:

import re
v = cell["value"]
g = re.search(r"""— (.*),\s*BWV""",v)
return g.group(1)

which, alas, returns null.

However, in standalone Jython 2.5.1, the following code works:

# -*- coding: utf-8 -*-
import re

# v = cell["value"]
v = "BWV 1 — Wie schön leuchtet der Morgenstern, BWV 1"
g = re.search(r"""— (.*),\s*BWV""",v)
print g.group(1)

I'm using Gridworks version 1.0.1-r732.

Original issue: http://code.google.com/p/google-refine/issues/detail?id=42
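For reference, the extraction itself can be sketched in CPython 3, where source files are UTF-8 and str is Unicode by default, so a literal em dash in the pattern is unproblematic. This is a minimal sketch outside Gridworks' embedded Jython, not Refine's own code: extract_title and TITLE_PATTERN are hypothetical names, and the function returns None instead of raising AttributeError, which is what g.group(1) does when re.search finds no match and returns None.

```python
import re

# Match "— ", then capture everything up to the last ", BWV" (optional whitespace after the comma).
TITLE_PATTERN = re.compile(r"— (.*),\s*BWV")


def extract_title(value):
    """Return the title from a string like 'BWV 1 — Title, BWV 1', or None if it doesn't match."""
    match = TITLE_PATTERN.search(value)
    # Guard against a failed match instead of calling .group() on None.
    return match.group(1) if match else None


print(extract_title("BWV 1 — Wie schön leuchtet der Morgenstern, BWV 1"))
# prints: Wie schön leuchtet der Morgenstern
```

Because (.*) is greedy, it backtracks only as far as the last ", BWV" in the cell, so a title that itself contains commas is still captured whole.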

@tfmorris
Member Author

From rawl...@gmail.com on May 17, 2010 03:05:54:
I've managed to narrow it down to the em dash (—) character in the regex, but I'm
not yet sure why that causes the code to fail. The actual exception thrown is this:

Traceback (most recent call last):
  File "", line 5, in _temp_
  File "/home/vishal/Workspace/metaweb/freebase-gridworks-read-only/lib/jython/re.py", line 142, in search
    return _compile(pattern, flags).search(string)
  File "/home/vishal/Workspace/metaweb/freebase-gridworks-read-only/lib/jython/re.py", line 241, in _compile
    raise error, v # invalid expression
sre_constants.error: nothing to repeat

I'm still investigating.

@tfmorris
Member Author

From dfhu...@gmail.com on May 17, 2010 04:51:06:
Raymond, I found that this works:

import re
g = re.search(ur"\u2014 (.*),\s*BWV", value)
return g.group(1)

Could you check? Note the Unicode escape \u2014 in place of the literal em dash, as well as the ur prefix. Unfortunately, I don't know
a quick way to encode Unicode characters within Jython code without writing a parser for it, so for now you have to do
the encoding yourself. I did this by first evaluating the GEL expression "—".unicode(), which gives 8212, and
then converting that from decimal to hex (0x2014).
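The decimal-to-hex step can also be done in plain Python before pasting the pattern into Gridworks. This is a minimal sketch assuming CPython 3 rather than Gridworks' Jython; escape_non_ascii is a hypothetical helper, not part of Refine or Jython. It rewrites every non-ASCII character as a \uXXXX escape (or \UXXXXXXXX above U+FFFF). In Python 2, a ur"..." literal still processes \uXXXX escapes while leaving other backslashes such as \s alone, which is why the escaped pattern works there.

```python
def escape_non_ascii(text):
    """Replace each non-ASCII character with a \\uXXXX (or \\UXXXXXXXX) escape sequence."""
    parts = []
    for ch in text:
        code_point = ord(ch)  # e.g. ord("—") == 8212
        if code_point < 128:
            parts.append(ch)  # plain ASCII, including regex syntax, passes through unchanged
        elif code_point <= 0xFFFF:
            parts.append("\\u%04x" % code_point)  # 8212 -> "\u2014"
        else:
            parts.append("\\U%08x" % code_point)  # characters outside the BMP
    return "".join(parts)


print(ord("—"), hex(ord("—")))  # prints: 8212 0x2014
print(escape_non_ascii(r"— (.*),\s*BWV"))  # prints: \u2014 (.*),\s*BWV
```

The printed pattern can then be wrapped in ur"..." inside the Jython expression.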
