Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

locatedExpr naming doesn't work when it's alone #294

Closed
Hi-Angel opened this issue Aug 3, 2021 · 3 comments
Closed

locatedExpr naming doesn't work when it's alone #294

Hi-Angel opened this issue Aug 3, 2021 · 3 comments

Comments

@Hi-Angel
Copy link

Hi-Angel commented Aug 3, 2021

This was found while creating a minimal testcase for #293. The testcase is basically the same code except I removed the second empty locatedExpr.

The problem is that, when you create a named locatedExpr as in locatedExpr(Regex(r'some text'))('line1'), it can only be called by line1 name if you add another expression to it. Otherwise referring by line1 will cause a KeyError: 'line1'

Steps to reproduce

$ cat ./parse.py
#!/usr/bin/python

from pyparsing import Regex, locatedExpr # type: ignore

regex_to_find \
    = locatedExpr(Regex(r'some text'))('line1')

def parse(text):
    r = locatedExpr(regex_to_find)('regex_to_find')
    return [match for match in r.scanString(text)]

def str_insert(string: str, substring: str, at: int) -> str:
    return string[:at] + substring + string[at:]

def process_matches(matches, text: str) -> str:
    print('Replacing…\n')
    for m in reversed(matches): # overwrite it starting from the bottom
        # prepend new text
        i_start = m[0]['regex_to_find'][f'line1']['locn_start']
        text = str_insert(text, 'foobar ', i_start)
    return text

text_to_process = \
"""	some text
"""

if __name__ == "__main__":
    matches = parse(text_to_process)
    print(f'Got matches: {matches}')
    if not matches:
        raise Exception("No matches")
    new_text = process_matches(matches, text_to_process)
    print(new_text)
$ ./parse.py
Got matches: [(([([8, ([8, 'some text', 17], {'locn_start': [8], 'value': ['some text'], 'locn_end': [17]}), 17], {'locn_start': [8], 'value': [([8, 'some text', 17], {'locn_start': [8], 'value': ['some text'], 'locn_end': [17]})], 'locn_end': [17]})], {'regex_to_find': [([8, ([8, 'some text', 17], {'locn_start': [8], 'value': ['some text'], 'locn_end': [17]}), 17], {'locn_start': [8], 'value': [([8, 'some text', 17], {'locn_start': [8], 'value': ['some text'], 'locn_end': [17]})], 'locn_end': [17]})]}), 8, 17)]
Replacing…

Traceback (most recent call last):
  File "/home/constantine/Projects/baum/bds/./parse.py", line 32, in <module>
    new_text = process_matches(matches, text_to_process)
  File "/home/constantine/Projects/baum/bds/./parse.py", line 19, in process_matches
    i_start = m[0]['regex_to_find'][f'line1']['locn_start']
  File "/usr/lib/python3.9/site-packages/pyparsing.py", line 598, in __getitem__
    return self.__tokdict[i][-1][0]
KeyError: 'line1'

Expected

There is no KeyError

Actual

There is KeyError

Additional information

pyparsing version: 2.4.7

@ptmcg
Copy link
Member

ptmcg commented Aug 14, 2021

Ok, I think I've got this sorted out. There are some issues in your example code, but it also helped me find a bug in the to-be-released Located class.

First off, you have locatedExpr used in too many places. For simplicity, I changed regex_to_find to just the Regex expression:

regex_to_find = Regex(r'some text')

Then, to maintain the naming structure that is implied in your parse action, I defined r in the parse() method as:

r = Group(locatedExpr(regex_to_find)('line1'))('regex_to_find')
r.parseWithTabs()

If we don't Group the expressions when adding a second results name, then the names clash with each other. (And of course, added the parseWithTabs() call.)

I think with those 2 changes, locatedExpr works as you would like. When I went to replace it with the new Located class, I did find a bug, so this was very helpful there, thanks!

Finally, it looks like you are implementing your own flavor of transformString. You could implement most of this code with the following:

regex_to_find = Regex("some text")
regex_to_find.addParseAction(lambda tokens: "foobar " + tokens[0])
print(regex_to_find.transformString(text_to_process))

@ptmcg
Copy link
Member

ptmcg commented Sep 4, 2021

Closing due to inactivity, but go ahead and reopen if you have further comments or questions.

@ptmcg ptmcg closed this as completed Sep 4, 2021
@Hi-Angel
Copy link
Author

Hi-Angel commented Sep 4, 2021

Sorry for silence, and thank you for resolving this. I recall back when you posted the comment I tried to do some experimentation, but haven't finished it due to lack of time. Anyway, since I haven't posted any questions, I presume it is alright. Thank you again for looking into this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants