locatedExpr naming doesn't work when it's alone #294

Hi-Angel · 2021-08-03T11:47:51Z

This was found while creating a minimal testcase for #293. The testcase is basically the same code except I removed the second empty locatedExpr.

The problem is that, when you create a named locatedExpr as in locatedExpr(Regex(r'some text'))('line1'), it can only be called by line1 name if you add another expression to it. Otherwise referring by line1 will cause a KeyError: 'line1'

Steps to reproduce

$ cat ./parse.py
#!/usr/bin/python

from pyparsing import Regex, locatedExpr # type: ignore

regex_to_find \
    = locatedExpr(Regex(r'some text'))('line1')

def parse(text):
    r = locatedExpr(regex_to_find)('regex_to_find')
    return [match for match in r.scanString(text)]

def str_insert(string: str, substring: str, at: int) -> str:
    return string[:at] + substring + string[at:]

def process_matches(matches, text: str) -> str:
    print('Replacing…\n')
    for m in reversed(matches): # overwrite it starting from the bottom
        # prepend new text
        i_start = m[0]['regex_to_find'][f'line1']['locn_start']
        text = str_insert(text, 'foobar ', i_start)
    return text

text_to_process = \
"""	some text
"""

if __name__ == "__main__":
    matches = parse(text_to_process)
    print(f'Got matches: {matches}')
    if not matches:
        raise Exception("No matches")
    new_text = process_matches(matches, text_to_process)
    print(new_text)
$ ./parse.py
Got matches: [(([([8, ([8, 'some text', 17], {'locn_start': [8], 'value': ['some text'], 'locn_end': [17]}), 17], {'locn_start': [8], 'value': [([8, 'some text', 17], {'locn_start': [8], 'value': ['some text'], 'locn_end': [17]})], 'locn_end': [17]})], {'regex_to_find': [([8, ([8, 'some text', 17], {'locn_start': [8], 'value': ['some text'], 'locn_end': [17]}), 17], {'locn_start': [8], 'value': [([8, 'some text', 17], {'locn_start': [8], 'value': ['some text'], 'locn_end': [17]})], 'locn_end': [17]})]}), 8, 17)]
Replacing…

Traceback (most recent call last):
  File "/home/constantine/Projects/baum/bds/./parse.py", line 32, in <module>
    new_text = process_matches(matches, text_to_process)
  File "/home/constantine/Projects/baum/bds/./parse.py", line 19, in process_matches
    i_start = m[0]['regex_to_find'][f'line1']['locn_start']
  File "/usr/lib/python3.9/site-packages/pyparsing.py", line 598, in __getitem__
    return self.__tokdict[i][-1][0]
KeyError: 'line1'

Expected

There is no KeyError

Actual

There is KeyError

Additional information

pyparsing version: 2.4.7

The text was updated successfully, but these errors were encountered:

ptmcg · 2021-08-14T21:24:31Z

Ok, I think I've got this sorted out. There are some issues in your example code, but it also helped me find a bug in the to-be-released Located class.

First off, you have locatedExpr used in too many places. For simplicity, I changed regex_to_find to just the Regex expression:

regex_to_find = Regex(r'some text')

Then, to maintain the naming structure that is implied in your parse action, I defined r in the parse() method as:

r = Group(locatedExpr(regex_to_find)('line1'))('regex_to_find')
r.parseWithTabs()

If we don't Group the expressions when adding a second results name, then the names clash with each other. (And of course, added the parseWithTabs() call.)

I think with those 2 changes, locatedExpr works as you would like. When I went to replace it with the new Located class, I did find a bug, so this was very helpful there, thanks!

Finally, it looks like you are implementing your own flavor of transformString. You could implement most of this code with the following:

regex_to_find = Regex("some text")
regex_to_find.addParseAction(lambda tokens: "foobar " + tokens[0])
print(regex_to_find.transformString(text_to_process))

ptmcg · 2021-09-04T13:37:18Z

Closing due to inactivity, but go ahead and reopen if you have further comments or questions.

Hi-Angel · 2021-09-04T14:02:24Z

Sorry for silence, and thank you for resolving this. I recall back when you posted the comment I tried to do some experimentation, but haven't finished it due to lack of time. Anyway, since I haven't posted any questions, I presume it is alright. Thank you again for looking into this!

ptmcg mentioned this issue Aug 3, 2021

A tab breaks locn_start indexation #293

Closed

ptmcg added a commit that referenced this issue Aug 14, 2021

Fix bug in Located class when used with a results name. (Issue #294)

e5994cc

ptmcg closed this as completed Sep 4, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

locatedExpr naming doesn't work when it's alone #294

locatedExpr naming doesn't work when it's alone #294

Hi-Angel commented Aug 3, 2021 •

edited

ptmcg commented Aug 14, 2021

ptmcg commented Sep 4, 2021

Hi-Angel commented Sep 4, 2021

locatedExpr naming doesn't work when it's alone #294

locatedExpr naming doesn't work when it's alone #294

Comments

Hi-Angel commented Aug 3, 2021 • edited

Steps to reproduce

Expected

Actual

Additional information

ptmcg commented Aug 14, 2021

ptmcg commented Sep 4, 2021

Hi-Angel commented Sep 4, 2021

Hi-Angel commented Aug 3, 2021 •

edited