Skip to content

Sweep str(e) and reflected user input out of HTTP error responses #5916

@rtibbles

Description

@rtibbles

This issue is not open for contribution. Visit Contributing guidelines to learn about the contributing process and how to find suitable issues.

Overview

Sweep contentcuration/'s view and viewset code for two related antipatterns that fall out of the same family: passing str(e) from a caught exception into an HTTP response body, and formatting user-controlled input into an HttpResponse* body whose default Content-Type is text/html. Both are flagged by CodeQL: py/stack-trace-exposure (28 alerts) and py/reflective-xss (9 alerts). The bulk live in contentcuration/contentcuration/views/internal.py, with smaller clusters in views/base.py, views/users.py, views/nodes.py, viewsets/base.py, viewsets/contentnode.py, and utils/pagination.py.

Complexity: Medium
Target branch: hotfixes

Context

Three response shapes dominate the affected sites. All flow into an HTTP response body.

Pattern 1 — str(e) in an HttpResponse* body. The exception is caught and stringified straight into a response whose default Content-Type is text/html. Whatever the exception carries (filesystem paths, internal IDs, OS errors, internal SQL fragments) goes back to the caller, and any user-controlled substring is reflected unescaped.

# contentcuration/views/internal.py:141 (and 13 more sites in the same file)
except Exception as e:
    handle_server_error(e, request)
    return HttpResponseServerError(content=str(e), reason=str(e))
# contentcuration/views/internal.py:183-188
except Exception as e:
    return HttpResponseForbidden(str(e))
...
except SuspiciousOperation as e:
    return HttpResponseBadRequest(str(e))

Pattern 2 — User input formatted into an HttpResponse* body. The exception isn't the source — a request field is. The HttpResponse* family defaults to text/html, so anything formatted into the body is a reflective-XSS sink. CodeQL flags these as py/reflective-xss.

# contentcuration/views/internal.py:297
except (Channel.DoesNotExist, PermissionDenied):
    return HttpResponseNotFound("No channel matching: {}".format(channel_id))
# contentcuration/views/users.py:111
except KeyError:
    return HttpResponseBadRequest(
        "Missing attribute from data: {}".format(request.data)
    )

Pattern 3 — str(e) inside a structured errors field on bulk-change responses. CodeQL doesn't flag these because the sink is a JSON field on a list element, not a top-level response body, but the leak shape is the same: backend exception text reaches the API consumer.

# contentcuration/viewsets/base.py:712 (and 757, 799, 839, 878; viewsets/contentnode.py:718, 1083, 1094, 1199)
except Exception as e:
    change["errors"] = [str(e)]

Two outlier HttpResponse(...) sites in views/base.py complete the picture: health() returns HttpResponse(c.name) and the set_language POST handler returns HttpResponse(next_url). The values come from trusted-ish sources (channel name in DB, post-validation next_url), but the same default-text/html problem applies — Content-Type: text/plain is the safe shape for these one-line bodies.

The Change

Apply a consistent fix per pattern across all matching sites. The shared principle: never let exception text or unescaped user input cross the HTTP boundary, and use text/plain (or JSON) for any short error-string body that the caller is meant to read.

Pattern 1 — str(e) in HttpResponse*

Replace str(e) with a static message; let handle_server_error (or a new explicit logger.exception(...)) capture the detail server-side. For HttpResponseServerError specifically, drop the body entirely — clients don't act on the text — or return a static "Internal server error".

except Exception:
    handle_server_error(e, request)
    return HttpResponseServerError("Internal server error", content_type="text/plain")

For 4xx variants that genuinely need to differentiate cases (e.g. SuspiciousOperation in api_file_upload), use a static category message — HttpResponseBadRequest("Invalid file", content_type="text/plain") — and log the exception server-side.

Pattern 2 — User input in HttpResponse*

Replace the formatted body with a static message and content_type="text/plain" (or a small JSON payload). The 404 path doesn't need to echo the missing ID back — the client already knows what it asked for.

except (Channel.DoesNotExist, PermissionDenied):
    return HttpResponseNotFound("Channel not found", content_type="text/plain")
except KeyError:
    return HttpResponseBadRequest("Missing attributes in request data", content_type="text/plain")

Pattern 3 — str(e) in errors lists

Replace with a static per-pattern message (e.g. "Internal server error", "Validation failed"), and logger.exception(...) inside the except block to keep the diagnostic detail in logs.

except Exception:
    logger.exception("apply_changes for %s failed", change.get("key"))
    change["errors"] = ["Internal server error"]

raise ValidationError(e) in viewsets/contentnode.py:480

Wrap explicitly with a static message and chain via from e so the original exception is preserved in logs but not in the response.

except DjangoValidationError as e:
    raise ValidationError("Invalid completion criteria") from e

utils/pagination.py:90

Drop str(exc) from the message — pagination error messages don't need the underlying exception text.

except InvalidPage:
    raise NotFound("Invalid page: {}".format(page_number))

Out of Scope

  • Adding new error codes / a frontend-mirrored error_constants module. Studio doesn't currently have one, and introducing it is broader than this sweep — for now, static plain-text messages are sufficient.
  • Restructuring handle_server_error itself, beyond what's needed at each call site.
  • The frontend's handling of API error response bodies. Going from "exception text" to "static message" is a strict reduction in surface; no frontend currently parses these bodies in a way the static message would break (verify per site).
  • Management-command CommandError raises that include exception text (CLI surface, no API client).
  • Logging idioms (logger.error("...{}".format(str(e)))) — none flagged in studio; out of scope unless encountered while touching a sweep site.

Acceptance Criteria

Pattern 1 — str(e) in HttpResponse* body

  • contentcuration/views/internal.py:141
  • contentcuration/views/internal.py:166
  • contentcuration/views/internal.py:183 (HttpResponseForbidden)
  • contentcuration/views/internal.py:188 (HttpResponseBadRequest, SuspiciousOperation)
  • contentcuration/views/internal.py:199
  • contentcuration/views/internal.py:227 ("Required attribute missing from data | {}" format)
  • contentcuration/views/internal.py:231
  • contentcuration/views/internal.py:300 ("Required attribute missing from data | {}" format)
  • contentcuration/views/internal.py:304
  • contentcuration/views/internal.py:354 (HttpResponseBadRequest(content=str(e)))
  • contentcuration/views/internal.py:357 ("Required attribute missing from data | {}" format)
  • contentcuration/views/internal.py:360 (HttpResponseBadRequest(str(e)), NodeValidationError)
  • contentcuration/views/internal.py:363
  • contentcuration/views/internal.py:401
  • contentcuration/views/internal.py:460
  • contentcuration/views/internal.py:505
  • contentcuration/views/internal.py:539
  • contentcuration/viewsets/base.py:727 (Response({"error": str(e)}, status=409) — JSON, but same antipattern; static message + logger.exception)

Pattern 2 — User input in HttpResponse* body

  • contentcuration/views/internal.py:297 ("No channel matching: {}".format(channel_id))
  • contentcuration/views/internal.py:352 ("No content matching: {}".format(parent_id))
  • contentcuration/views/internal.py:398 ("No channel matching: {}".format(data))
  • contentcuration/views/internal.py:421 ("Channel not found {}".format(channel_id))
  • contentcuration/views/internal.py:424 (raise HttpResponseBadRequest("Missing attribute from data: {}".format(data)) — note this is raise of an HttpResponse; fix the body and convert to return while there)
  • contentcuration/views/internal.py:455 ("No tree name matching: {}".format(tree_name))
  • contentcuration/views/internal.py:501 ("No channel matching: {}".format(channel_id))
  • contentcuration/views/internal.py:533 ("No complete set of channels matching: {}".format(",".join(channel_ids)))
  • contentcuration/views/nodes.py:52 ("No topic found for {}".format(node_id))
  • contentcuration/views/users.py:111 ("Missing attribute from data: {}".format(request.data))
  • contentcuration/views/base.py:112 (HttpResponse(c.name) — at minimum set content_type="text/plain")
  • contentcuration/views/base.py:407 (HttpResponse(next_url) — same; pre-validated, but text/plain is the safe shape)

Pattern 3 — str(e) in errors lists (not CodeQL-flagged; included for consistency)

  • contentcuration/viewsets/base.py:712
  • contentcuration/viewsets/base.py:757
  • contentcuration/viewsets/base.py:799
  • contentcuration/viewsets/base.py:839
  • contentcuration/viewsets/base.py:878
  • contentcuration/viewsets/contentnode.py:718
  • contentcuration/viewsets/contentnode.py:1071 (return str(error))
  • contentcuration/viewsets/contentnode.py:1083 (return str(e))
  • contentcuration/viewsets/contentnode.py:1094
  • contentcuration/viewsets/contentnode.py:1199

Other

  • contentcuration/viewsets/contentnode.py:480raise ValidationError(e) replaced with raise ValidationError("Invalid completion criteria") from e.
  • contentcuration/utils/pagination.py:90NotFound("Invalid page: {}".format(page_number)) (drop str(exc) from the message).

General

  • No remaining call to HttpResponse*(str(e), ...) / HttpResponse*(content=str(e), ...) in contentcuration/.
  • No HttpResponse*(...) body interpolates request fields without content_type="text/plain" (or equivalent escaping).
  • Wherever the exception text was previously surfaced, a logger.exception(...) or existing handle_server_error(...) call is in place to retain the diagnostic in logs.

Testing

  • Existing tests for views/internal.py (notably the ricecooker-facing endpoints) still pass.
  • At least one test per pattern asserts the response body is the static message and does not contain the exception class name, traceback fragments, or echoed request input.
  • CodeQL re-scan after the change shows zero py/stack-trace-exposure and zero py/reflective-xss alerts in contentcuration/.

References

AI usage

Drafted with Claude during a CodeQL triage session. Claude collected the affected sites from the CodeQL alert list, grouped them by antipattern, and proposed a per-pattern fix shape. The per-file checklist is direct alert/grep output and should be re-checked line-by-line before applying the changes.

Metadata

Metadata

Assignees

Type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions