Sweep str(e) and reflected user input out of HTTP error responses

<img height="20px" src="https://i.imgur.com/c7hUeb5.jpeg">

❌ **This issue is not open for contribution. Visit <a href="https://learningequality.org/contributing-to-our-open-code-base/" target="_blank">Contributing guidelines</a>** to learn about the contributing process and how to find suitable issues.

<img height="20px" src="https://i.imgur.com/c7hUeb5.jpeg">



## Overview

Sweep `contentcuration/`'s view and viewset code for two related antipatterns that fall out of the same family: passing `str(e)` from a caught exception into an HTTP response body, and formatting user-controlled input into an `HttpResponse*` body whose default `Content-Type` is `text/html`. Both are flagged by CodeQL: `py/stack-trace-exposure` (28 alerts) and `py/reflective-xss` (9 alerts). The bulk live in `contentcuration/contentcuration/views/internal.py`, with smaller clusters in `views/base.py`, `views/users.py`, `views/nodes.py`, `viewsets/base.py`, `viewsets/contentnode.py`, and `utils/pagination.py`.

**Complexity:** Medium
**Target branch:** hotfixes

### Context

Three response shapes dominate the affected sites. All flow into an HTTP response body.

**Pattern 1 — `str(e)` in an `HttpResponse*` body.** The exception is caught and stringified straight into a response whose default `Content-Type` is `text/html`. Whatever the exception carries (filesystem paths, internal IDs, OS errors, internal SQL fragments) goes back to the caller, and any user-controlled substring is reflected unescaped.

```python
# contentcuration/views/internal.py:141 (and 13 more sites in the same file)
except Exception as e:
    handle_server_error(e, request)
    return HttpResponseServerError(content=str(e), reason=str(e))
```

```python
# contentcuration/views/internal.py:183-188
except Exception as e:
    return HttpResponseForbidden(str(e))
...
except SuspiciousOperation as e:
    return HttpResponseBadRequest(str(e))
```

**Pattern 2 — User input formatted into an `HttpResponse*` body.** The exception isn't the source — a request field is. The `HttpResponse*` family defaults to `text/html`, so anything formatted into the body is a reflective-XSS sink. CodeQL flags these as `py/reflective-xss`.

```python
# contentcuration/views/internal.py:297
except (Channel.DoesNotExist, PermissionDenied):
    return HttpResponseNotFound("No channel matching: {}".format(channel_id))
```

```python
# contentcuration/views/users.py:111
except KeyError:
    return HttpResponseBadRequest(
        "Missing attribute from data: {}".format(request.data)
    )
```

**Pattern 3 — `str(e)` inside a structured `errors` field on bulk-change responses.** CodeQL doesn't flag these because the sink is a JSON field on a list element, not a top-level response body, but the leak shape is the same: backend exception text reaches the API consumer.

```python
# contentcuration/viewsets/base.py:712 (and 757, 799, 839, 878; viewsets/contentnode.py:718, 1083, 1094, 1199)
except Exception as e:
    change["errors"] = [str(e)]
```

Two outlier `HttpResponse(...)` sites in `views/base.py` complete the picture: `health()` returns `HttpResponse(c.name)` and the `set_language` POST handler returns `HttpResponse(next_url)`. The values come from trusted-ish sources (channel name in DB, post-validation `next_url`), but the same default-`text/html` problem applies — `Content-Type: text/plain` is the safe shape for these one-line bodies.

### The Change

Apply a consistent fix per pattern across all matching sites. The shared principle: never let exception text or unescaped user input cross the HTTP boundary, and use `text/plain` (or JSON) for any short error-string body that the caller is meant to read.

**Pattern 1 — `str(e)` in `HttpResponse*`**

Replace `str(e)` with a static message; let `handle_server_error` (or a new explicit `logger.exception(...)`) capture the detail server-side. For `HttpResponseServerError` specifically, drop the body entirely — clients don't act on the text — or return a static `"Internal server error"`.

```python
except Exception:
    handle_server_error(e, request)
    return HttpResponseServerError("Internal server error", content_type="text/plain")
```

For 4xx variants that genuinely need to differentiate cases (e.g. `SuspiciousOperation` in `api_file_upload`), use a static category message — `HttpResponseBadRequest("Invalid file", content_type="text/plain")` — and log the exception server-side.

**Pattern 2 — User input in `HttpResponse*`**

Replace the formatted body with a static message and `content_type="text/plain"` (or a small JSON payload). The 404 path doesn't need to echo the missing ID back — the client already knows what it asked for.

```python
except (Channel.DoesNotExist, PermissionDenied):
    return HttpResponseNotFound("Channel not found", content_type="text/plain")
```

```python
except KeyError:
    return HttpResponseBadRequest("Missing attributes in request data", content_type="text/plain")
```

**Pattern 3 — `str(e)` in `errors` lists**

Replace with a static per-pattern message (e.g. `"Internal server error"`, `"Validation failed"`), and `logger.exception(...)` inside the `except` block to keep the diagnostic detail in logs.

```python
except Exception:
    logger.exception("apply_changes for %s failed", change.get("key"))
    change["errors"] = ["Internal server error"]
```

**`raise ValidationError(e)` in `viewsets/contentnode.py:480`**

Wrap explicitly with a static message and chain via `from e` so the original exception is preserved in logs but not in the response.

```python
except DjangoValidationError as e:
    raise ValidationError("Invalid completion criteria") from e
```

**`utils/pagination.py:90`**

Drop `str(exc)` from the message — pagination error messages don't need the underlying exception text.

```python
except InvalidPage:
    raise NotFound("Invalid page: {}".format(page_number))
```

### Out of Scope

- Adding new error codes / a frontend-mirrored `error_constants` module. Studio doesn't currently have one, and introducing it is broader than this sweep — for now, static plain-text messages are sufficient.
- Restructuring `handle_server_error` itself, beyond what's needed at each call site.
- The frontend's handling of API error response bodies. Going from "exception text" to "static message" is a strict reduction in surface; no frontend currently parses these bodies in a way the static message would break (verify per site).
- Management-command `CommandError` raises that include exception text (CLI surface, no API client).
- Logging idioms (`logger.error("...{}".format(str(e)))`) — none flagged in studio; out of scope unless encountered while touching a sweep site.

### Acceptance Criteria

#### Pattern 1 — `str(e)` in `HttpResponse*` body

- [ ] `contentcuration/views/internal.py:141`
- [ ] `contentcuration/views/internal.py:166`
- [ ] `contentcuration/views/internal.py:183` (`HttpResponseForbidden`)
- [ ] `contentcuration/views/internal.py:188` (`HttpResponseBadRequest`, `SuspiciousOperation`)
- [ ] `contentcuration/views/internal.py:199`
- [ ] `contentcuration/views/internal.py:227` (`"Required attribute missing from data | {}"` format)
- [ ] `contentcuration/views/internal.py:231`
- [ ] `contentcuration/views/internal.py:300` (`"Required attribute missing from data | {}"` format)
- [ ] `contentcuration/views/internal.py:304`
- [ ] `contentcuration/views/internal.py:354` (`HttpResponseBadRequest(content=str(e))`)
- [ ] `contentcuration/views/internal.py:357` (`"Required attribute missing from data | {}"` format)
- [ ] `contentcuration/views/internal.py:360` (`HttpResponseBadRequest(str(e))`, `NodeValidationError`)
- [ ] `contentcuration/views/internal.py:363`
- [ ] `contentcuration/views/internal.py:401`
- [ ] `contentcuration/views/internal.py:460`
- [ ] `contentcuration/views/internal.py:505`
- [ ] `contentcuration/views/internal.py:539`
- [ ] `contentcuration/viewsets/base.py:727` (`Response({"error": str(e)}, status=409)` — JSON, but same antipattern; static message + logger.exception)

#### Pattern 2 — User input in `HttpResponse*` body

- [ ] `contentcuration/views/internal.py:297` (`"No channel matching: {}".format(channel_id)`)
- [ ] `contentcuration/views/internal.py:352` (`"No content matching: {}".format(parent_id)`)
- [ ] `contentcuration/views/internal.py:398` (`"No channel matching: {}".format(data)`)
- [ ] `contentcuration/views/internal.py:421` (`"Channel not found {}".format(channel_id)`)
- [ ] `contentcuration/views/internal.py:424` (`raise HttpResponseBadRequest("Missing attribute from data: {}".format(data))` — note this is `raise` of an `HttpResponse`; fix the body and convert to `return` while there)
- [ ] `contentcuration/views/internal.py:455` (`"No tree name matching: {}".format(tree_name)`)
- [ ] `contentcuration/views/internal.py:501` (`"No channel matching: {}".format(channel_id)`)
- [ ] `contentcuration/views/internal.py:533` (`"No complete set of channels matching: {}".format(",".join(channel_ids))`)
- [ ] `contentcuration/views/nodes.py:52` (`"No topic found for {}".format(node_id)`)
- [ ] `contentcuration/views/users.py:111` (`"Missing attribute from data: {}".format(request.data)`)
- [ ] `contentcuration/views/base.py:112` (`HttpResponse(c.name)` — at minimum set `content_type="text/plain"`)
- [ ] `contentcuration/views/base.py:407` (`HttpResponse(next_url)` — same; pre-validated, but `text/plain` is the safe shape)

#### Pattern 3 — `str(e)` in `errors` lists (not CodeQL-flagged; included for consistency)

- [ ] `contentcuration/viewsets/base.py:712`
- [ ] `contentcuration/viewsets/base.py:757`
- [ ] `contentcuration/viewsets/base.py:799`
- [ ] `contentcuration/viewsets/base.py:839`
- [ ] `contentcuration/viewsets/base.py:878`
- [ ] `contentcuration/viewsets/contentnode.py:718`
- [ ] `contentcuration/viewsets/contentnode.py:1071` (`return str(error)`)
- [ ] `contentcuration/viewsets/contentnode.py:1083` (`return str(e)`)
- [ ] `contentcuration/viewsets/contentnode.py:1094`
- [ ] `contentcuration/viewsets/contentnode.py:1199`

#### Other

- [ ] `contentcuration/viewsets/contentnode.py:480` — `raise ValidationError(e)` replaced with `raise ValidationError("Invalid completion criteria") from e`.
- [ ] `contentcuration/utils/pagination.py:90` — `NotFound("Invalid page: {}".format(page_number))` (drop `str(exc)` from the message).

#### General

- [ ] No remaining call to `HttpResponse*(str(e), ...)` / `HttpResponse*(content=str(e), ...)` in `contentcuration/`.
- [ ] No `HttpResponse*(...)` body interpolates request fields without `content_type="text/plain"` (or equivalent escaping).
- [ ] Wherever the exception text was previously surfaced, a `logger.exception(...)` or existing `handle_server_error(...)` call is in place to retain the diagnostic in logs.

#### Testing

- [ ] Existing tests for `views/internal.py` (notably the ricecooker-facing endpoints) still pass.
- [ ] At least one test per pattern asserts the response body is the static message and does not contain the exception class name, traceback fragments, or echoed request input.
- [ ] CodeQL re-scan after the change shows zero `py/stack-trace-exposure` and zero `py/reflective-xss` alerts in `contentcuration/`.

### References

- CodeQL rules: [`py/stack-trace-exposure`](https://codeql.github.com/codeql-query-help/python/py-stack-trace-exposure/), [`py/reflective-xss`](https://codeql.github.com/codeql-query-help/python/py-reflective-xss/)
- Django docs: [`HttpResponse` content type](https://docs.djangoproject.com/en/3.2/ref/request-response/#django.http.HttpResponse.content_type) — defaults to `text/html` when omitted.
- `contentcuration/contentcuration/views/internal.py:handle_server_error` (already does the server-side logging; we just need to stop also returning the exception text to the caller).

## AI usage

Drafted with Claude during a CodeQL triage session. Claude collected the affected sites from the CodeQL alert list, grouped them by antipattern, and proposed a per-pattern fix shape. The per-file checklist is direct alert/grep output and should be re-checked line-by-line before applying the changes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Sweep str(e) and reflected user input out of HTTP error responses #5916

Overview

Context

The Change

Out of Scope

Acceptance Criteria

Pattern 1 — `str(e)` in `HttpResponse*` body

Pattern 2 — User input in `HttpResponse*` body

Pattern 3 — `str(e)` in `errors` lists (not CodeQL-flagged; included for consistency)

Other

General

Testing

References

AI usage

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Uh oh!

Sweep str(e) and reflected user input out of HTTP error responses #5916

Description

Overview

Context

The Change

Out of Scope

Acceptance Criteria

Pattern 1 — str(e) in HttpResponse* body

Pattern 2 — User input in HttpResponse* body

Pattern 3 — str(e) in errors lists (not CodeQL-flagged; included for consistency)

Other

General

Testing

References

AI usage

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Pattern 1 — `str(e)` in `HttpResponse*` body

Pattern 2 — User input in `HttpResponse*` body

Pattern 3 — `str(e)` in `errors` lists (not CodeQL-flagged; included for consistency)