expat parser throws Memory Error when parsing multiple files #50925
Comments
I'm using the Expat Python interface to parse multiple XML files in an Python Version: 2.6.2 |
This also occurs with Python 2.5.1 on OS X |
I am also seeing this with Python 2.5.2 on Ubuntu. |
Just in case it wasn't obvious - the workaround is to create a new |
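The workaround mentioned above, creating a fresh parser for every document, might look like the following minimal sketch. The helper name `count_start_tags` and the in-memory sample documents are hypothetical and are not taken from the original report; only `ParserCreate()`, `StartElementHandler` and `ParseFile()` come from the `xml.parsers.expat` module.

```python
import io
import xml.parsers.expat as expat


def count_start_tags(stream):
    """Parse one XML document from a binary stream using its own parser."""
    count = 0

    def on_start(name, attrs):
        nonlocal count
        count += 1

    # A new parser is created for each document instead of reusing one.
    parser = expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.ParseFile(stream)
    return count


# Hypothetical in-memory documents standing in for files on disk.
documents = [b"<a><b/></a>", b"<root><x/><y/></root>"]
counts = [count_start_tags(io.BytesIO(doc)) for doc in documents]
print(counts)  # [2, 3]
```

Creating the parser inside the helper keeps each document's parser state isolated, at the cost of re-registering the handlers for every file.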
I'm not familiar with expat, but we can see what is happening more
Traceback (most recent call last):
  File "expat-error.py", line 14, in <module>
    p.ParseFile(file)
xml.parsers.expat.ExpatError: parsing finished: line 2, column 482
It seems ParseFile() doesn't support a second call. I'm not sure this is |
The patch is good; a test would be appreciated. The difference now is that in case of true low-memory conditions,
|
Well, I tried to write a test like this.
But I noticed that XML_ERROR_FINISHED is not an integer but a string. (!) According to
Is this a documentation bug or an implementation bug? Personally, I think string |
Looks like an implementation bug to me; far too late to change it, though. In your test, you could use |
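As an illustration of the string-versus-integer point, here is a small sketch of how the message constant relates to the numeric code. It assumes a Python version in which `xml.parsers.expat.errors` provides the `codes` and `messages` dictionaries (added in Python 3.2, so possibly not available to the participants at the time); it is not necessarily the approach suggested in the truncated comment above.

```python
from xml.parsers.expat import ErrorString, errors

# The constant is the human-readable message, not the numeric error code.
finished_message = errors.XML_ERROR_FINISHED
print(type(finished_message).__name__)

# errors.codes maps a message string to its numeric code,
# and errors.messages maps the code back to the message.
finished_code = errors.codes[finished_message]
round_trip = errors.messages[finished_code]

# ErrorString() converts a numeric code into the same message string,
# which allows comparing an exception's .code against the constant.
same_message = ErrorString(finished_code) == finished_message
```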
Here is the patch. I'm not confident with my English comment though. |
Do you know the new "context manager" feature of assertRaises? it makes |
I knew of the existence of that new feature, but didn't know how to use it. |
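For readers unfamiliar with the context-manager form of `assertRaises`, a test along these lines is one way to use it. This is a sketch, not the committed patch: the class name `ParseAgainTest` and the sample document are illustrative, and it assumes a Python version that includes the fix discussed in this issue, so that the second call raises `ExpatError` with the "parsing finished" code.

```python
import io
import unittest
import xml.parsers.expat as expat


class ParseAgainTest(unittest.TestCase):
    def test_second_parse_raises(self):
        parser = expat.ParserCreate()
        parser.ParseFile(io.BytesIO(b"<a/>"))

        # The with-block captures the exception so that it can be
        # inspected afterwards through cm.exception.
        with self.assertRaises(expat.ExpatError) as cm:
            parser.ParseFile(io.BytesIO(b"<a/>"))

        # XML_ERROR_FINISHED is a message string, so the numeric code
        # is converted with ErrorString() before comparing.
        self.assertEqual(expat.ErrorString(cm.exception.code),
                         expat.errors.XML_ERROR_FINISHED)
```

Saved in a module, the test can be run with `python -m unittest`.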
Hmm, looks useful. I think your patch is good. The only problem is that |
I don't think this is a python specific problem. I have just seen |
The documentation should definitely be updated to clarify that a parser instance is not reusable with more than one file. I had a look at the equivalent documentation for Perl and TCL, and Perl's implementation explicitly does not allow attempts to reuse the parser instance (which is clearly noted in the documentation), and TCL's implementation (or one of them, anyway) offers a reset call that explicitly resets the parser in preparation for another file to be submitted. |
I agree that, at a minimum, the documentation should be updated to include a warning about not reusing a parser instance. Whether it's worth trying to plug all the holes in the expat library is another issue (see, for instance, bpo-12829). David, would you be willing to propose a wording for a documentation change? |
Also, note bpo-1208730 proposes a feature to expose a binding for XML_ParserReset and has the start of a patch. |
Ned: My proposed wording is: "Note that only one document can be parsed by a given instance; it is not possible to reuse an instance to parse multiple files." To provide more detail, one could also add something like: "The isfinal argument of the Parse() method is intended to allow the parsing of a single file in fragments, not the submission of multiple files." |
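To illustrate the intended use of the isfinal argument described in the proposed wording, the following sketch feeds a single document to one parser in several fragments. The helper name `parse_in_fragments` and the sample fragments are hypothetical.

```python
import xml.parsers.expat as expat


def parse_in_fragments(chunks):
    """Feed one XML document to a single parser piece by piece."""
    names = []
    parser = expat.ParserCreate()
    parser.StartElementHandler = lambda name, attrs: names.append(name)

    # isfinal is false while more data for the SAME document follows.
    for chunk in chunks:
        parser.Parse(chunk, False)

    # A final call with isfinal true tells the parser the document is complete.
    parser.Parse(b"", True)
    return names


# One document split at arbitrary byte boundaries, not several documents.
names = parse_in_fragments([b"<root><it", b"em/><item", b"/></root>"])
print(names)  # ['root', 'item', 'item']
```

After the final call the parser is finished; parsing another document requires a new parser from `ParserCreate()`.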
Updating to reflect the Python 3.4 documentation is now also relevant to this discussion. Perhaps someone could commit a change something like my suggestion in msg143295? |
Thanks for the reminder, David. Here are patches for 3.x and 2.7 that include updated versions of the proposed pyexpat.c and test_pyexpat.py patches along with a doc update along the lines suggested by David. |
New changeset 74faca1ac59c by Ned Deily in branch '2.7':
New changeset 9e3fc66ee0b8 by Ned Deily in branch '3.4':
New changeset ee0034434e65 by Ned Deily in branch 'default': |
Applied for release in 3.5.0, 3.4.1 and 2.7.7. Thanks, everyone! |