Update the XML security documentation #139313

@hedsnz

Documentation

The XML security documentation could do with some clarifications.

Some classes of vulnerability are mentioned once with no further explanation

The documentation states:

An attacker can abuse XML features to carry out denial of service attacks, access local files, generate network connections to other machines, or circumvent firewalls.

It then goes on to list the various types of DoS (e.g., billion laughs) and how they are mitigated with Expat >= 2.6.0. However, the three other classes of vulnerability that are mentioned ("access local files, generate network connections to other machines, or circumvent firewalls") are not discussed further. It's not clear whether up-to-date versions of Python, which include libexpat >= 2.6.0, are vulnerable to those attacks. There's no discussion of how you might verify whether your code is vulnerable to those attacks, how to prevent them, etc.
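
As an illustration of the kind of check the docs could describe, here is a minimal sketch of probing whether `xml.etree.ElementTree` reads a local file through an external entity. The marker string, the temporary file, and the `probe_external_entity` helper are all made up for this example; it only exercises one parser with its default settings and says nothing about other modules or third-party libraries.

```python
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical sentinel written to a throwaway file; not part of any API.
MARKER = "xxe-probe-marker"


def probe_external_entity():
    """Report how ElementTree treats an external entity pointing at a local file."""
    with tempfile.TemporaryDirectory() as tmp:
        secret = Path(tmp) / "secret.txt"
        secret.write_text(MARKER, encoding="utf-8")

        # The DOCTYPE declares an external entity that refers to the file above.
        payload = (
            '<?xml version="1.0"?>\n'
            f'<!DOCTYPE root [<!ENTITY xxe SYSTEM "{secret.as_uri()}">]>\n'
            "<root>&xxe;</root>"
        )
        try:
            root = ET.fromstring(payload)
        except ET.ParseError:
            # The parser refused the document instead of resolving the entity.
            return "rejected"

        text = "".join(root.itertext())
        # "expanded" would mean the file contents ended up in the parsed tree.
        return "expanded" if MARKER in text else "ignored"


if __name__ == "__main__":
    print(probe_external_entity())
```

Any result other than `expanded` means the file contents did not reach the parsed tree in this particular setup.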

Of course, Python isn't responsible for ensuring that anyone writing Python is doing so safely, but I think it's unusual to mention a series of (quite serious) vulnerabilities without any further indication of whether they're legitimate concerns, and potentially what to do about them.

Clarify the role of libexpat

It is mentioned that:

Expat versions lower that 2.6.0 may be vulnerable to “billion laughs”, “quadratic blowup” and “large tokens”. Python may be vulnerable if it uses such older versions of Expat as a system-provided library. Check pyexpat.EXPAT_VERSION.
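
For reference, the check the documentation points to can be sketched as follows. `pyexpat.EXPAT_VERSION` is a string, so this sketch compares the `pyexpat.version_info` tuple instead; the `expat_at_least` helper name is only for illustration.

```python
import pyexpat


def expat_at_least(minimum=(2, 6, 0)):
    """Return True if the Expat version this interpreter uses is >= minimum."""
    # version_info is a (major, minor, micro) tuple of ints.
    return pyexpat.version_info >= tuple(minimum)


if __name__ == "__main__":
    print("Expat version string:", pyexpat.EXPAT_VERSION)
    print("At least 2.6.0:", expat_at_least())
```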

It would be useful to add (from my understanding, which may be wrong):

  • libexpat is the default XML parser. If you do not do anything unusual, then you will be using libexpat for parsing XML.
  • libexpat is bundled with Python.
  • Therefore (I assume), specific Python versions ensure that libexpat >= 2.6.0 is used. If that's true, it may be helpful to add that those versions of Python are not vulnerable to these attacks when they're downloaded directly from python.org, i.e., when they're not sourced from a third party who has compiled Python against a different version of libexpat.
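
One way to sanity-check the first assumption, at least for `xml.sax`, is to look at which module the default parser comes from. This is only a sketch: the `default_sax_parser_module` helper is made up for this example, and it does not cover every XML module in the standard library.

```python
import pyexpat
import xml.parsers.expat
import xml.sax


def default_sax_parser_module():
    """Return the module that implements the parser xml.sax.make_parser() picks."""
    return type(xml.sax.make_parser()).__module__


if __name__ == "__main__":
    # On a standard CPython build this names the Expat-backed reader.
    print(default_sax_parser_module())
    # xml.parsers.expat re-exports pyexpat, so both report the same Expat version.
    print(xml.parsers.expat.EXPAT_VERSION == pyexpat.EXPAT_VERSION)
```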

Clarify system-provided vs bundled libexpat

Related to the previous point.

A system-provided library is mentioned, but the documentation doesn't say in what circumstances it would be used instead of the Python-bundled libexpat. Presumably the default is to use the Python-bundled libexpat, and I haven't been able to find a way to use a system version at runtime; is that only possible if you're compiling Python from source? If that's the case, it would be useful to include a note about that.
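
As far as I can tell, the system library is selected at build time with the `--with-system-expat` configure option. A rough heuristic for checking an existing interpreter is to inspect the recorded configure arguments. This is a sketch under that assumption: `built_with_system_expat` is a made-up helper, `CONFIG_ARGS` is typically only available on POSIX builds, and a distributor could link against a system Expat in ways this would not detect.

```python
import shlex
import sysconfig


def built_with_system_expat():
    """Guess whether this interpreter was configured with --with-system-expat.

    Returns True or False when configure arguments are recorded, or None when
    they are unavailable (for example on Windows builds).
    """
    config_args = sysconfig.get_config_var("CONFIG_ARGS")
    if config_args is None:
        return None
    # CONFIG_ARGS is a shell-quoted string of the options passed to ./configure.
    tokens = shlex.split(config_args)
    return any(
        token in ("--with-system-expat", "--with-system-expat=yes")
        for token in tokens
    )


if __name__ == "__main__":
    print(built_with_system_expat())
```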

I'm happy to create a PR that includes some of these clarifications, but would appreciate getting some feedback on my assumptions first, since I obviously don't know much about this.


Labels: docs, topic-XML
