-
-
Notifications
You must be signed in to change notification settings - Fork 30.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
xml.etree.ElementInclude does not include nested xincludes #65127
Comments
After xml.etree.ElementInclude.include inserts an Xinclude'd href it does not walk the just-inserted subtree to see if it contains any Xincludes itself. I think the behaviour should be modified to walk the included subtree and perform any Xincludes contained. |
Agreed that this should be done. The XInclude spec suggests that the processing is transitive. Also, lxml does it that way, in both the libxml2 based implementation and the ElementInclude compatibility module. https://github.com/lxml/lxml/blob/master/src/lxml/ElementInclude.py (note that this was adapted for lxml, so it's not a drop-in replacement for ET anymore) Changing target version to Py3.5 as this is essentially a new (and somewhat backwards incompatible) feature. |
xml.etree.ElementInclude.include now checks the inserted subtree to see if contains any Xincludes itself. And adds them. Also added a unit test to check for the new functionality (which will fail without the change to ElementInclude.py). |
Thanks Caelyn. This patch also needs a doc patch and a whatsnew entry in order to be complete. It's not obvious to me where the relevant documentation is, though, so perhaps we instead have missing documentation that should be addressed in a separate issue. The whatsnew entry would still be needed, though. However, I think there is something important missing here in the patch itself. It is specified (section 4.2.7 Inclusion Loops) that it is a fatal error for an already included document to be specified in a recursively processed xinclude directive, and indeed failing to detect such a loop has DOS implications. So, the patch needs to address that before it can be committed. |
FWIW, we often do whatsnew entries later in the development cycle. I think this can go forward without whatsnew. The doc entry could be as simple as a ..versionchanged 2.5 include() now supports recursive Xincludes or somesuch. |
Since there is no guarantee anyone is going to tackle the job of reviewing all the changes for whatsnew entries, I prefer that we get in the habit of creating the entries as we go along. As for the versionchanged...if you can find where include is documented, that would work :) The recursion still needs to be addressed (it is addressed in the lxml version Stefan linked to). |
This patch incorporates nested includes and a testcase that shows how it handles infinite include by throwing an exception. |
Your code adds a lot of new code. Why is that necessary? Can't the new feature be integrated into the existing code? |
Yeah, include should now be a very simple wrapper around a call to _include, include shouldn't have the original include code (which is moved to _include) in it. |
That would make sense. Please see if the updated patch file works. |
I see it's been a few years, but I would still like to see xml.etree.ElementInclude support nested xincludes, as lxml does. Additionally, I'd like to have the default_loader attempt to open the href with respect to the original xml file's dirname, as again with lxml, as opposed to just assuming its relative to the working directory for the session. Let me know if there are any changes you'd like to see in the latest patch here, and I could incorporate those into a new PR? |
The latest "fixed2" patch looks good to me, but the author didn't sign a contributors agreement. However, I did, and I already wrote the same thing for lxml, so I put together an initial PR. |
Hello @scoder, Thanks for looking into this. In addition to what I mentioned with lxml's use of the parent xml file's respect dirname when attempting to open the include element's href, do you think it would be possible to (optionally or by default) have the include processor provide the necessary xml:base attributes for compliance with the specification [1][2][3]. From looking at the "fixed2", although the added lines 137-143 in ElementInclude.py appear to prevent infinite recursive includes, but I think it also prevents valid merging of include trajectories when expanding the tree of include paths. E.g. what happens if I state the inclusion of the same source multiple times elsewhere in the same file, or elsewhere in the include tree that is no way a sub-child of the current parent file. [1] https://stackoverflow.com/a/22791471/2577586 |
Agreed with ruffsl's concerns about the overly aggressive detection of infinite recursion. I also wonder if the hrefs should be normalized or canonized for the check? The check may miss infinite recursions if the hrefs happen to be written in non-matching but equivalent forms. Ex: relative versus absolute paths. |
I thought about that, too, but it's not a real problem. There are only a few different ways to spell the same file path, and once they are through, the recursion would still be detected and never become infinite. Admittedly, the current implementation might lower the overhead for attacks a little, but then, if an attacker can control the input anyway, then there is not really much to win by including the same file multiple times rather than including different files. Maybe we should add a "max_depth" parameter to limit the maximum recursion depth, defaulting to e.g. 5, that users would have to pass in order to say "I know what I'm doing". I agree with the comment about the overly restrictive global set, though. Included file paths should be collected only along an inclusion path and not across independent subtrees. |
Yes, well put.
Could that be set to false by the user, just in case we don't know beforehand how deep the rabbit hole goes, but we're feeling overly committed to see it through? Not to detract from the ticket, but I'd just like to share to a question related to this topic about the expected behavior of Xinclude [1]. You could also see it as a use case example for the recursive import feature we are currently deciding, of which would help avoid one more non system library to workaround [2]. [1] https://stackoverflow.com/q/48857647/2577586 |
I've added a 'max_depth' argument to limit the maximum recursive include depth to 6 by default (which is a small, arbitrarily chosen value). Users can set it to None to disable the limit. From my side, I don't see any more features that I would add. |
There seems to be no documentation currently for ElementInclude. Would be nice if someone who's interested in this feature could take the time to write something up. I created a documentation ticket as issue bpo-33187. |
PR is complete now, ready for merging. |
I think setting "xml:base" from ElementInclude is worth another ticket. |
Note: these values reflect the state of the issue at the time it was migrated and might not reflect the current state.
Show more details
GitHub fields:
bugs.python.org fields:
The text was updated successfully, but these errors were encountered: