Django HTML and XML Validator

Django_html_xml_validator is a Django middleware to validate HTML and XML responses generated by your application. This includes but is not limited to Django views using render() and Django HTML templates.

Features:

  • Specific error locations in case validation finds issues.
  • Runs locally without the need to upload your page to an external validation service.
  • Uses only Python packages without the need to install external tools from other ecosystems.
  • Fast, because it is based on lxml and its native components.

This makes it feasible to perform validation while running your test suite.

Installation

To install, depending on your package manager, run:

pip install --upgrade django_html_xml_validator

or

poetry add django_html_xml_validator

Usage

To add validation to your project, add it to settings.MIDDLEWARE.

MIDDLEWARE = [
    ...,
    "django_html_xml_validator.middleware.HtmlXmlValidatorMiddleware",
]

In most cases you want it to validate only the HTML generated directly by your views, so it should be the last entry. This matters especially if other installed middleware modifies your HTML, for example by adding the Django Debug Toolbar or minifying it.

For example:

MIDDLEWARE = [
    # Possible middleware your project requires.
    ...,
    # Example middleware that modifies the HTML.
    "django_minify_html.middleware.MinifyHtmlMiddleware",
    "debug_toolbar.middleware.DebugToolbarMiddleware",
    ...,
    # Put validation middleware toward the end to ensure only your HTML/XML is validated.
    "django_html_xml_validator.middleware.HtmlXmlValidatorMiddleware",
]

After that, responses with a matching content type are validated:

  • HTML:
    • application/xhtml+xml
    • text/html
  • XML:
    • application/xml
    • text/xml
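
The mapping from content type to validation kind can be sketched as follows. This is not the package's code: the helper name `validation_kind` is hypothetical, and ignoring parameters such as `; charset=utf-8` is an assumption about how a Content-Type header would reasonably be matched.

```python
from typing import Optional

# Content types listed above, grouped by the kind of validation applied.
HTML_CONTENT_TYPES = {"application/xhtml+xml", "text/html"}
XML_CONTENT_TYPES = {"application/xml", "text/xml"}


def validation_kind(content_type_header: str) -> Optional[str]:
    """Return "html", "xml", or None for a Content-Type header value.

    Hypothetical helper: parameters after ";" are ignored (an assumption).
    """
    media_type = content_type_header.split(";", 1)[0].strip().lower()
    if media_type in HTML_CONTENT_TYPES:
        return "html"
    if media_type in XML_CONTENT_TYPES:
        return "xml"
    return None
```

With this sketch, a response with `text/html; charset=utf-8` would count as HTML, while `application/json` would be left alone.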

If the response is valid, the middleware returns the original response and HTTP status code verbatim.

If validation finds errors, the middleware instead returns an HTML page detailing them, with an HTTP status code of 500 (internal server error).

Configuration

By default, validation is active when the Django DEBUG mode is enabled in settings.py. In a reasonably configured project this means during local development and while running the test suite, but not once deployed to a server.

For more granular control, add the following to settings.py:

VALIDATE_HTML = True
VALIDATE_XML = True
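
One possible way to toggle these settings without editing settings.py is to read them from environment variables. The following is only a sketch: the helper `env_flag` and the environment variable names `DJANGO_VALIDATE_HTML` and `DJANGO_VALIDATE_XML` are hypothetical and not part of the package.

```python
import os


def env_flag(name: str, default: bool) -> bool:
    """Read a boolean from the environment (hypothetical helper).

    "1", "true", "yes" and "on" (case-insensitive) count as True.
    """
    value = os.environ.get(name)
    if value is None:
        return default
    return value.strip().lower() in {"1", "true", "yes", "on"}


# In settings.py; the environment variable names are assumptions.
VALIDATE_HTML = env_flag("DJANGO_VALIDATE_HTML", True)
VALIDATE_XML = env_flag("DJANGO_VALIDATE_XML", True)
```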

If you are sure all your HTML pages are actually XHTML (which sadly will not be the case as soon as your code contains forms based on standard Django forms), you can enforce HTML to be validated as XHTML:

VALIDATE_HTML_AS_XHTML = True  # WARNING: Will fail with standard form templates

Disabling validation for specific tests

If validation is not useful for selected tests (for example when processing deliberately huge documents), it can be disabled with the override_settings decorator. For example:

from django.test import override_settings

@override_settings(VALIDATE_XML=False)
def test_can_build_huge_xml():
    ...

Disabling validation for specific views

Sometimes a Django extension, or external data rendered as part of one of your views, produces invalid HTML. The proper solution, of course, is to fix the extension (for example by submitting a pull request with your fix) or to clean up the external HTML.

However, this might have a negative impact on your project's schedule, so a quick workaround is needed for the time being. You could disable validation for the entire project, but then you would lose confidence in the general quality of your project's HTML.

To disable validation for a specific view only, use the @no_html_xml_validation decorator, for example:

from django.http import HttpRequest, HttpResponse
from django_html_xml_validator.decorators import no_html_xml_validation

@no_html_xml_validation
def broken_html_view(_request: HttpRequest) -> HttpResponse:
    return HttpResponse("<head></body>")

Internally, this adds the following HTTP header to the response. You can use it, for example, to collect quality assurance statistics on how many HTTP responses contain it, and thus gauge the technical debt of your project.

X-Django-HTML-XML-Validation: 0
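
Collecting such statistics could look like the sketch below. It assumes you have already gathered the response headers, for example from test runs or logs, as plain mappings. The function name `count_unvalidated` is hypothetical and not part of the package.

```python
from typing import Iterable, Mapping

HEADER_NAME = "X-Django-HTML-XML-Validation"


def count_unvalidated(responses_headers: Iterable[Mapping[str, str]]) -> int:
    """Count responses whose headers mark validation as disabled ("0").

    Header names are compared case-insensitively, as HTTP requires.
    """
    count = 0
    for headers in responses_headers:
        normalized = {name.lower(): value.strip() for name, value in headers.items()}
        if normalized.get(HEADER_NAME.lower()) == "0":
            count += 1
    return count
```

A rising count over time would suggest that more views are being exempted from validation.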

Limitations

  • Validation does not apply to streaming responses.
  • Validation of HTML5 uses a hack to ignore errors about invalid tags on sectioning elements like <nav> or <article>.
  • Validation of XML only checks whether the document is well-formed; it does not validate against a schema or DTD. Technically, lxml could do this, but it would require more setup. If you need such a feature, feel free to submit a pull request.
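
To illustrate what a well-formedness check covers, here is a minimal sketch using Python's standard library `xml.etree.ElementTree` rather than lxml, which the package itself uses. The function name `well_formedness_error` is hypothetical. The check accepts any document whose tags nest and close correctly, regardless of whether the elements make sense for a particular schema.

```python
import xml.etree.ElementTree as ET
from typing import Optional


def well_formedness_error(xml_text: str) -> Optional[str]:
    """Return an error description with location, or None if well-formed."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as error:
        line, column = error.position  # position of the parse error
        return f"line {line}, column {column}: {error}"
    return None


# Well-formed, although no schema says <unknown> is allowed here.
print(well_formedness_error("<note><unknown/></note>"))
# Not well-formed: <b> is never closed.
print(well_formedness_error("<a><b></a>"))
```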

License

Copyright (c) 2022 ITELL.SOLUTIONS GmbH, Graz, Austria.

Distributed under the MIT license. For details refer to the file LICENSE.

The source code is available from https://github.com/itell-solutions/django_html_xml_validator.