Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve XML parsing with defusedxml #4367

Open
wants to merge 2 commits into
base: beta
Choose a base branch
from
Open

Conversation

JacquelineMorrissette
Copy link
Contributor

Checklist

  1. If you've added code that should be tested, add tests
  2. If you've changed APIs, update (or create!) the documentation
  3. Ensure the tests pass
  4. Make sure that your code lints and that you've followed our coding style
  5. Write a description of your work suitable for publishing on our forum
  6. Mention any related issues in this repository (as #ISSUE) and in other repositories (as kobotoolbox/other#ISSUE)
  7. Open an issue in the docs if there are UI/UX changes

Description

Switch to defusedxml for xml parsing

Other notes

The defusedxml library doesn't replace everything in the native XML library so it's only been changes where necessary

Related issues

kobotoolbox/kobocat#869

Comment on lines 21 to 22
from defusedxml.ElementTree import fromstring, tostring
from defusedxml.lxml import tostring
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @JacquelineMorrissette, could you double-check this PR with a few things in mind?

  1. fromstring and tostring are awfully generic-sounding things. I think the usual convention of … import ElementTree as ET, an then using stuff like ET.fromstring is good to avoid ambiguity.
    1. Watch out for shadowing one import with another, as in this case where the second tostring import completely replaces the first.
  2. Why are we still using xml.etree.ElementTree and lxml versus defusedxml.ElementTree and defusedxml.lxml? Are there certain things we need that the defusedxml alternatives don't provide?

Edit: whoops, I just saw this in defusedxml's readme:

defusedxml.lxml
DEPRECATED The module is deprecated and will be removed in a future release.

Given that, please don't use defusedxml.lxml anywhere. Try to see how our uses of lxml could be replaced with ElementTree, and don't hesitate to ask questions :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. Will do, I'll give it another pass
  2. The reason I kept some uses of xml.etree.ElementTree (and others) is because the defusedxml library isn't a complete replacement for most features of these and only replaces specific methods such as tostring and fromstring. For example: defusexml doesn't have an equivalent to from xml.etree.ElementTree import ElementTree
    The documentation here lists what it replaces, in case it helps: https://pypi.org/project/defusedxml/#defusedxml-elementtree

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants