-
-
Notifications
You must be signed in to change notification settings - Fork 274
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Regression in 3.0.2, 3.0.3, and 3.0.4 #323
Comments
I've downloaded translate-toolkit and tried to create a minimal extract of your parser to test it in isolation, but am unable to reproduce any parsing error on 3.0.2. Without a repro example, I can't really tell which parts of pyparsing need to change, or know when I have fixed the problem. Here is the code I'm using to test your parser: # https://github.com/pyparsing/pyparsing/issues/323
# repo: https://github.com/translate/translate/
# parser: https://github.com/translate/translate/blob/master/translate/storage/rc.py
from pyparsing import *
RC_SOURCE = r"""
#include "other_file.h" // This must be ignored
LANGUAGE LANG_ENGLISH, SUBLANG_DEFAULT
/////////////////////////////////////////////////////////////////////////////
//
// Dialog
//
IDD_REGGHC_DIALOG DIALOGEX 0, 0, 211, 191
STYLE DS_SETFONT | DS_MODALFRAME | DS_FIXEDSYS | WS_POPUP | WS_VISIBLE | WS_CAPTION | WS_SYSMENU
EXSTYLE WS_EX_APPWINDOW
CAPTION "License dialog"
FONT 8, "MS Shell Dlg", 0, 0, 0x1
BEGIN
PUSHBUTTON "Help",ID_HELP,99,162,48,15
PUSHBUTTON "Close",IDCANCEL,151,162,48,15
PUSHBUTTON "Activate instalation",IDC_BUTTON1,74,76,76,18
CTEXT "My very good program",IDC_STATIC1,56,21,109,19,SS_SUNKEN
CTEXT "You can use it without registering it",IDC_STATIC,35,131,128,19,SS_SUNKEN
PUSHBUTTON "Offline",IDC_OFFLINE,149,108,42,13
PUSHBUTTON "See license",IDC_LICENCIA,10,162,85,15
RTEXT "If you don't have internet, please use magic.",IDC_STATIC,23,105,120,18
ICON IDR_MAINFRAME,IDC_STATIC,44,74,20,20
CTEXT "Use your finger to activate the program.",IDC_ACTIVADA,17,50,175,17
ICON IDR_MAINFRAME1,IDC_STATIC6,18,19,20,20
END
MainMenu MENU
{
POPUP "&Debug"
{
MENUITEM "&Memory usage", ID_MEMORY
POPUP
{
MENUITEM SEPARATOR
MENUITEM "&Walk data heap", ID_WALK_HEAP
}
}
}
STRINGTABLE
BEGIN
ID_T_1 "Hello"
END
"""
POFILE = """
#: DIALOGEX.IDD_REGGHC_DIALOG.CAPTION
msgid "License dialog"
msgstr "Licenční dialog"
"""
def rc_statement():
"""
Generate a RC statement parser that can be used to parse a RC file
:rtype: pyparsing.ParserElement
"""
one_line_comment = "//" + rest_of_line
comments = c_style_comment ^ one_line_comment
precompiler = Word("#", alphanums) + rest_of_line
language_definition = (
"LANGUAGE"
+ Word(alphas + "_").set_results_name("language")
+ Optional("," + Word(alphas + "_").set_results_name("sublanguage"))
)
block_start = (Keyword("{") | Keyword("BEGIN")).set_name("block_start")
block_end = (Keyword("}") | Keyword("END")).set_name("block_end")
reserved_words = block_start | block_end
name_id = ~reserved_words + Word(alphas, alphanums + "_").set_name("name_id")
numbers = Word(nums)
integerconstant = numbers ^ Combine("0x" + numbers)
constant = Combine(
Optional(Keyword("NOT")) + (name_id | integerconstant),
adjacent=False,
join_string=" ",
)
autoname_elements()
combined_constants = delimited_list(constant, "|")
concatenated_string = OneOrMore(quoted_string)
block_options = Optional(
SkipTo(Keyword("CAPTION"), fail_on=block_start)("pre_caption")
+ Keyword("CAPTION")
+ quoted_string("caption")
) + SkipTo(block_start)("post_caption")
undefined_control = (
Group(
name_id.set_results_name("id_control")
+ delimited_list(
concatenated_string ^ constant ^ numbers ^ Group(combined_constants)
).set_results_name("values_")
)
| comments
)
block = block_start + ZeroOrMore(undefined_control)("controls") + block_end
dialog = (
name_id("block_id")
+ (Keyword("DIALOGEX") | Keyword("DIALOG"))("block_type")
+ block_options
+ block
)
string_table = Keyword("STRINGTABLE")("block_type") + block_options + block
menu_item = Keyword("MENUITEM")("block_type") + (
pyparsing_common.comma_separated_list("values_") | Keyword("SEPARATOR")
)
popup_block = Forward()
popup_block <<= Group(
Keyword("POPUP")("block_type")
+ Optional(quoted_string("caption"))
+ block_start
+ ZeroOrMore(Group(menu_item | popup_block))("elements")
+ block_end
)("popups*")
menu = (
name_id("block_id")
+ Keyword("MENU")("block_type")
+ block_options
+ block_start
+ ZeroOrMore(popup_block)
+ block_end
)
autoname_elements()
return comments ^ precompiler ^ language_definition ^ dialog ^ string_table ^ menu
import pyparsing
print(pyparsing.__version_info__)
parser = rc_statement()
parser.create_diagram("translate_toolkit_rc_parser.html")
parser.run_tests([
"""\r\nSTRINGTABLE\r\nBEGIN\r\n// Comment\r\nIDS_1 "Copied"\r\nEND\r\n""",
"""\r\nSTRINGTABLE\r\nBEGIN\r\n // Comment\r\n IDS_1 "Zkopirovano"\r\nEND\r\n""",
])
print()
result = Group(parser)[...].parse_string(RC_SOURCE, parse_all=True)
result.pprint() |
I was also not able to extract smaller testcase so far, but here is my observation so far:
Line 3850 in e269b0a
Line 3859 in e269b0a
|
Without being able to reproduce the problem or test case, we are stuck with trial and error. If I remove those lines, other pyparsing tests fail. Can you restore those lines, but instead delete these lines in
https://github.com/pyparsing/pyparsing/blob/master/pyparsing/core.py#L3869-L3870 |
Okay, here is the reproducer: from pyparsing import Keyword, Word, alphanums, alphas
block_start = (Keyword("{") | Keyword("BEGIN")).set_name("block_start")
block_end = (Keyword("}") | Keyword("END")).set_name("block_end")
reserved_words = block_start | block_end
name_id = ~reserved_words + Word(alphas, alphanums + "_").set_name("name_id")
dialog = name_id("block_id") + (Keyword("DIALOGEX") | Keyword("DIALOG"))("block_type")
string_table = Keyword("STRINGTABLE")("block_type")
parser = dialog ^ string_table
parser.addParseAction(lambda x: x)
result = parser.transformString(
b"""\r\nSTRINGTABLE\r\nBEGIN\r\n// Comment\r\nIDS_1 "Copied"\r\nEND\r\n""".decode()
)
print(repr(result))
assert result[:2] == "\r\n" The change you have suggested in |
This is a huge help, and reproduces the problem nicely, thanks! |
It can be probably simplified even more, I just didn't have time to play with it further. Maybe following parser will be enough to trigger this as well: parser = ~Keyword("BEGIN") ^ Keyword("STRINGTABLE") |
Yep. Even this triggers it:
The '~' (generates a |
Ok, I've come up with a fix, BUT it is a change in However, I think I have found a workaround in your parser, that works with the current released version (and should work with all the versions, I think). Replace:
with
Please give this a try. Otherwise, I would ask you to pin your pyparsing version to an existing version that works for you, until I can introduce this other change in a 3.1.0 pre-release. |
The workaround with |
ok. I won't be able to look at this again for a few days now, so for now you should probably pin to 2.4.7. I would not recommend pinning to 3.0.1, it is broken in some other areas. I'll leave this open so I can revisit - hopefully this weekend. |
I've already ported the code to 3 API, so pinning to 2.4.7 would need reverting these changes. We will stick with 3.0.1 for now as it works fine in the test suite and I believe it covers most of the use cases. |
I have a modified version of |
Thanks, that makes sense. I've turned it into translate/translate#4468 |
This is needed to address bug in the translate-toolkit RC parser, see pyparsing/pyparsing#323
Can this issue be closed now? |
For translate-toolkit it is fixed. It's up to you if you consider this big in pyparsing or in our parser |
The negative lookahead works in pyparsing for normal calls to
But not so with
Upcase identifiers will upcase more than we want:
Making our transformer recognize keywords, but not transform them is one solution.
Adding WordStart before the negative lookahead is another.
With this simple parser, I couldn't come up with an easy example using the And lastly, why did this work in 2.4.7 but break in 3.0? I think it goes back to how whitespace skipping was changed for the So for |
Documenting this is certainly a good approach. |
Documented in the Common Pitfalls wiki page: |
In translate-toolkit we use pyparsing for parsing Windows RC files.
I did port the code to 3.0.1 and everything worked fine. On 3.0.2 and 3.0.3 we get testsuite failures (see https://github.com/translate/translate/runs/4028529279?check_suite_focus=true).
The guilty commit is 4ab17bb, reverting it makes it work again.
The text was updated successfully, but these errors were encountered: