-
-
Notifications
You must be signed in to change notification settings - Fork 31.6k
sax.parser cannot get xml data from a subprocess pipe #67104
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
With the attached code, and an xml file, I got the following error with python 3.4.2: $ cat toto.xml
<?xml version='1.0' encoding='UTF-8'?>
<test></test> $ python3.4 toto.py
Traceback (most recent call last):
File "toto.py", line 10, in <module>
parse(proc.stdout, ContentHandler())
File "/usr/lib/python3.4/xml/sax/__init__.py", line 33, in parse
parser.parse(source)
File "/usr/lib/python3.4/xml/sax/expatreader.py", line 107, in parse
xmlreader.IncrementalParser.parse(self, source)
File "/usr/lib/python3.4/xml/sax/xmlreader.py", line 119, in parse
self.prepareParser(source)
File "/usr/lib/python3.4/xml/sax/expatreader.py", line 111, in prepareParser
self._parser.SetBase(source.getSystemId())
TypeError: must be str, not int The same test works with python2, and I would expect the code to work with python 3 too. Thanks, |
My guess is that this is similar to bpo-21044. |
I've looked at the sax code, and this does indeed have the same root cause: in python2 a dummy string was used for the 'name' attribute of io objects that are based on file descriptors, whereas in python3 the name is the integer value of the file descriptor. In test_sax we can see getSystemId being tested that it returns a filename (see test_expat_locator_withinfo). The fix should be analogous to the bpo-21044 fix: check that 'name' is a string and if not don't use it. I'm marking this as easy; hopefully someone will want to tackle figuring out exactly where in the code it needs to be fixed and adding tests for it. |
Here is a patch to add a test to test_sax.py. I'm not sure on the fix. Is the following one a correct one ? def prepareParser(self, source):
if source.getSystemId() is not None:
- self._parser.SetBase(source.getSystemId())
+ self._parser.SetBase(str(source.getSystemId())) Or should I remove the call to SetBase if getSystemId() is not a string ? |
Forgot to attach the testcase when opening the bug. |
Has anyone investigated what exactly sax uses SystemId/SetBase *for*? I think think we need that info in order to decide what to do, and I'm not familiar with sax. |
This bug should be fixed in other place. Here is a patch. |
Serhiy's patch looks correct to me. Given that if the source doesn't have a name attribute it is simply not set in the existing code, this change should be "safe" (backward compatible). Elsewhere the possibility was raised of converting the int to a string (<fdopen: N>), but that issue is a more global one and would apply at the io module level...and if implemented this fix would automatically take advantage of it. So I think this should be committed. |
The only explicit documentation I found on SystemId is from the java specification (it is my understanding that python sax implementation is adapted from Java one): The documentation says that "The system identifier is optional if there is a byte stream or a character stream". So, I agree that Serhiy's patch looks correct. Note that I'm not sure that my testcase with a subprocess is covered by Serhiy's tests, as these tests call parser() with a file object. |
New changeset 27ae1a476ef7 by Serhiy Storchaka in branch '3.4': New changeset ce9881eecfb4 by Serhiy Storchaka in branch 'default': |
Jocelyn, in both cases the argument of parse() is a file object with integer name. Tests use one of simplest way to create such object. |
Ok, I understand your tests now. Thanks for the fixes ! |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: