-
-
Notifications
You must be signed in to change notification settings - Fork 30.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Expose the C implementation of ElementTree by default when importing ElementTree #58196
Comments
Following the discussion on python-dev [1], this issue will track the re-organization of Lib/xml/etree to expose the C implementation (_elementtree) by default when importing ElementTree. The test suite will also have to be updated - it's required that it exercises both the C and the Python implementations. I would like to make the import "magic" simple. Thus, the idea I currently plan to pursue is:
The test suite will be modified accordingly. I'll be working on creating a patch for this. Any help, ideas, comments and discussions are more than welcome. [1] http://mail.python.org/pipermail/python-dev/2012-February/116278.html |
Oh, and not to forget: the documentation has to be updated to just not mention cElementTree any longer. For the user, the fact that a fast C library is invoked underneath should be invisible. |
Eli Bendersky, 10.02.2012 15:52:
IIRC, there is a well specified way how accelerator modules should be used
Careful with backwards compatibility here. It's the accelerator module I don't see a compelling enough reason to break imports in existing code by Stefan |
If there's a convention, I'll happily follow it. It would be great if someone could dig up the relevant details (I'll try to look for them myself).
Agreed. Perhaps it should just be deprecated? |
Hmm, that may be PEP-399: If an acceleration module is provided it is to be named the same as the module it is accelerating with an underscore attached as a prefix, e.g., _warnings for warnings. The common pattern to access the accelerated code from the pure Python implementation is to import it with an import *, e.g., from _warnings import *. This is typically done at the end of the module to allow it to overwrite specific Python objects with their accelerated equivalents. However, it's hardly a rule, just describing a "common pattern". I wonder why this approach is preferable to the one I proposed (explicit facade module)? |
Eli Bendersky, 10.02.2012 16:43:
>>> I don't see a compelling enough reason to break imports in existing code by
>>> removing the cElementTree module, so we should not do that.
>
> Agreed. Perhaps it should just be deprecated? Given that its mere existence is currently almost undocumented (except for I wouldn't want to see more than that done, though, specifically not a Stefan |
A note in the doc is easy to miss IMHO, and since DeprecationWarnings are silenced by default, I don't think they will affect the final users. A different "problem" is that developers will have to check for the Python version if they want to use ElementTree on Python >=3.3 and keep using cElementTree on <3.3 (unless another way is provided). If possible I would avoid pyElementTree, and keep ElementTree that imports from _elementtree and the deprecated cElementTree (until it can be removed). |
I suppose it's possible, but I'm genuinely interested in a technical reason for doing so. The approach suggested in PEP-399 is useful for module in which part of the functionality is implemented in C, and then augmented in Python. ElementTree is different - it's pretty much two separate implementations of the same API. So, I think there's little question in terms of simplicity and clarity. Having pyElementTree and cElementTree (keeping it for backwards compat), and a facade named ElementTree that chooses between them is simple, clean and intuitive. From a performance point of view, consider the (by far) common case - where _elementtree *is* successfully imported. Option 1: from _elementtree import *, at the end of the Python implementation in ElementTree.py - so for each invocation, the whole import of the Python code has to be done, just to reach the overriding import * at the end. Option 2: ElementTree is a facade that attempts to import _elementtree first. So the Python implementation in pyElementTree doesn't even have to be parsed and imported |
Me too:
Not fully separated... there's some python code hidden in the C module.
This point is wrong... the _elementtree.c accelerator imports Python ElementTree already. As you can see on lines 2938 to 2945, the change could lead to an import cycle: Trying to sort this out, it already gives me a headache. |
The Python code inside _elementtree could be moved to Python code, So this is a point *in favor* of pyElementTree being pure-Python :-) In other words: In xml/etree there is:
Would that work? |
Oops, in last message: s/there will be circular dependencies/there will not be circular dependencies/ |
What I had in mind is more like:
|
The problem with this is the bootstrap Python code executed by |
This might become unnecessary if ElementTree becomes the main module and _elementtree only contains a few faster functions/classes that are supposed to replace the ones written in Python.
We are assuming that _elementtree might be missing, but what are the cases where this might actually happen? Other implementations like PyPy? Exotic platforms that can't compile _elementtree?
Wouldn't that as simple as having in ElementTree.py: I'm not familiar with ElementTree (I just looked at the bootstrap bit quickly), so what I'm saying might not be applicable here, but I've seen other modules doing something similar to what I'm proposing (json, heapq, maybe even warning and others). |
The first step is to strip out the cElementTree bootstrap code from the C module. Then I think the right approach is to fold completely cElementTree behind ElementTree. |
Ezio,
I guess both. To make the stdlib work on PyPy without changes, it has to be able to load the pure Python modules in a fallback. As for platforms that can't compile _elementtree, keep in mind that there's also expat which _elementtree uses, so it's a lot of code to compile. Python works on some embedded systems, I'm not sure all of them can compile this code. |
Florent, thanks for the patch - at this point code is more useful than talk :-) Anyhow, I tried to apply it and a few tests in test_xml_etree_c fail, because it can't find fromstring and fromstringlist. This gets fixed when I import fromstringlist in cElementTree.py from ElementTree, and in the same file assign: fromstring = XML Which is similar to what ElementTree itself does. In general, I agree that a good first step would be to refactor the code to extract the boostrapping from _elementtree.c to cElementTree.py. As long as the tests pass, this can be committed regardless of this issue's original intent. However, why did you leave some bootstrapping code inside? Can't all of it go away? |
I strongly feel that existing code importing ElementTree or cElementTree should not be broken. Let’s add transparent import from _elementtree to ElementTree without breaking existing uses of cET. I think that 3.2 and 2.7 should get a doc note about cET, do we have a bug for this? |
AFAICS there's currently no disagreement on this point. The import
What doc note? Something in the spirit of: "Note that in 3.3, the I don't think there's an open bug for this. |
The more I think about it, the more the bootstrap code in _elementtree.c annoys me. It's the only instance of calling PyRun_String in Modules/ ! It's hackish and causes ugly import problems. If the C code needs stdlib functionality like copy.deepcopy, it should use PyImport_ImportModule like everyone else and not through a PyRun_String hack. Since we've already decided to do some refactoring, I suggest all trace of the bootstrap is removed from _elementtree.c |
Eli Bendersky, 11.02.2012 09:08:
I find it perfectly legitimate to run Python code from a C module. That being said, I think it's worth removing any clear *redundancy* with Stefan |
There can be arguments both way, but if we follow the lead of existing standard extension modules, the tendency is clearly not to use PyRun_String. Many C extensions use functionality from Python, but none does it the "bootstrap way". Why is that? Is there a good reason, or is it just convention? |
Ooops, I cut some redundancy after running the tests, and I forgot to re-add the import. You're right.
I just tried to cut the import cycle and import it the other way. |
Updated patch:
|
New changeset 31dfb4be934d by Florent Xicluna in branch 'default': |
I've pushed this first part, which is just a code refactoring. I tried to work out a patch for the second part. We have small differences between C and Python, about the warnings beeing raised. In general the C implementation do not raise the deprecation warnings. IMHO, this could be fixed later. Still missing is the patch for the documentation. |
Updated patch:
|
Florent, Your updated patch looks good. I think that the explicit import of _namespace_map into cElementTree is just to satisfy some weird magic in the tests and can probably be removed as well (along with the weird magic :-), but that's not really important and can be left for later cleanups. Regarding the documentation, alright let's not mention the implementation detail, and your "versionchanged" addition makes sense. I don't think adding directly to whatsnew/3.3.rst is necessary, updating Misc/NEWS is enough. I'll apply the documentation patch after you apply the code patch. Or if you want, you can apply it yourself, I don't mind. Thanks for the cooperation! |
By the way, I see that if the explicit import of _namespace_map is commented out, the test_xml_etree_c test fails because it's not in the __all__ list. So the test can just import it directly with: from xml.etree.ElementTree import _namespace_map And the import in cElementTree won't be necessary. After all, _namespace_map is definitely not a public API! This will keep cElementTree an nice-and-clean: from xml.etree.ElementTree import * |
Because of the interaction of the support.import_fresh_module with the CleanContext context manager, it's not so easy to remove black magic. if hasattr(ET, '_namespace_map'):
_namespace_map = ET._namespace_map
else:
from xml.etree.ElementTree import _namespace_map This is why I kept the import in the deprecated "cElementTree" at first. ( If you have doubts, try ./python -m test test_xml_etree{,_c} or variants. ) I will probably commit code and documentation at once. It makes things easier regarding traceability. |
Alright, it's not really important at this point and can be cleaned up
Sounds good |
FWIW the JSON doc doesn't even mention the acceleration module _json, but since people here are used to import cElementTree I think it should be mentioned that it's now deprecated and accelerations are used automatically, so something like this would work: .. versionchanged:: 3.3 I also agree with Éric that there's no need to mention _elementtree (people might try to import that instead, and other implementations might use a different name). Lib/test/test_xml_etree_c.py could also be removed, and the other tests could import cElementTree too (even though I'm not sure that works too well with doctests). Shouldn't cElementTree raise an error when _elementtree is missing? |
New changeset 65fc79fb4eb2 by Florent Xicluna in branch 'default': |
New changeset e9cf34d56ff1 by Florent Xicluna in branch 'default': |
Now the merge is done. Thank you Eli for the effort, and to the other contributors for the review. Following topics may need further work:
These topics are not high priority. A specific issue should be opened if any of them require some attention. |
I would add to the TODO - improve the documentation of the module. Opened bpo-14006 for this. |
I started going over the deprecated methods in ElementTree and ran into a more serious problem. XmlParser.doctype() is listed as deprecated, and indeed ElementTree (the Python version) issues a deprecation warning. However, the C implementation doesn't have doctype() at all so I get AttributeError. |
DeprecationWarnings aren't that annoying anymore now that they're silent by default. It should at least have a PendingDeprecationWarning |
Opened bpo-14007 to track the doctype() problem |
Emitting a deprecation warning on importing cElementTree has been rejected in the pydev list. The other remaining tasks have new issues on them, so this issue is done now. |
I'm still not sure that's the best option. Without deprecation people will keep using cElementTree and we will have to keep it around forever (or at least until Python4 and then have a 3to4 to fix the import). P.S. I'm fine with keeping it around for several more versions, but if we eventually have to remove it, we would still have to warn the users beforehand. The more we wait, the more users will be still using cElementTree by the time we will actually remove it. |
I don’t see benefits in removing cET. |
Hi, the C implementation of ElementTree do not support namespaces for find/all/... . To me this is a serious regression, as I rely on ElementTree namespace support, and 3.3 would break it with this change. Please reconsider this therefore. Code to reproduced attached - works fine with python 3.2. As the C implementation of ElementTree and Element lack the namespace keyword for findall (and *all* the other methods), Until the C implementation can do namespaces as well. |
The file was bad, sorry. |
Markus (cmn): Please file a separate issue, which will be a release blocker for 3.3 release. (It's not the only regression.) |
Temporary very ugly workaround (before importing xml.etree.ElementTree) is: import sys
sys.modules["_elementtree"] = None |
It seems to me that namespaces are actually supported, but they are accepted only as positional args and not keyword args, so this should be easy to fix. |
As advised I opened a new bug on this: |
New bug - C implementation of ElementTree: Inheriting from Element breaks text member |
Note: last traces of Python bootstrap code were removed from _elementtree in changeset 652d148bdc1d |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: