Skip to content

Pin response encoding to UTF-8 in API requests#36

Merged
marph91 merged 2 commits into
marph91:masterfrom
tillo:fix-utf8-response-encoding
May 24, 2026
Merged

Pin response encoding to UTF-8 in API requests#36
marph91 merged 2 commits into
marph91:masterfrom
tillo:fix-utf8-response-encoding

Conversation

@tillo
Copy link
Copy Markdown
Contributor

@tillo tillo commented May 22, 2026

joppy decodes API responses through response.text, which relies on requests knowing the response charset.

Joplin always serves UTF-8, but several endpoints omit the charset from the Content-Type header — notably the Joplin Server item /content endpoint that returns the raw note (title + body + metadata). When the charset is absent, requests does not assume UTF-8: it falls back to charset detection (or, depending on the requests / charset_normalizer version, to the latin-1 default). Non-ASCII note content is then mis-decoded into mojibake.

Because the mis-decoded string is read back, edited, and written again, the corruption compounds on every round-trip: an em-dash becomes â\x80\x94, then âÂ\x80Â\x94, and so on. Any tool built on joppy that round-trips notes (sync helpers, MCP servers, migration scripts) silently corrupts every note it touches.

This pins response.encoding = "utf-8" immediately after each request in both ServerApi._request and Api._request, before response.text is ever accessed. Joplin's API is UTF-8 in every case, so this is always correct and makes decoding deterministic regardless of the installed requests / charset_normalizer version.

No behaviour change for ASCII content; non-ASCII titles and bodies now round-trip losslessly.

tillo added 2 commits May 22, 2026 19:17
Joplin's REST API always returns UTF-8, but some responses — notably the
Joplin Server item /content endpoint — omit the charset from the
Content-Type header. requests then falls back to charset detection for
`response.text`, which mis-decodes non-ASCII note content as mojibake.

Set response.encoding = "utf-8" explicitly in both _request methods so
note titles and bodies decode correctly.
The Joplin Server upstream now returns a totp_secret field on the
user data endpoint (see laurent22/joplin database/types.ts). Without
the matching dataclass field, UserData(**item) raises:

  TypeError: UserData.__init__() got an unexpected keyword argument
  'totp_secret'

This currently breaks test_get_current_user on every CI run and is
independent of the UTF-8 encoding change in this PR -- it would
fail on master too once master CI is re-triggered.

Added as the last field, mirroring the chronological-extension pattern
of sso_auth_code* added in earlier upstream revisions.
@tillo
Copy link
Copy Markdown
Contributor Author

tillo commented May 24, 2026

Heads-up that I've pushed an extra commit (a60eea3) to fix the CI failure on test_get_current_user. The Joplin Server upstream now returns a totp_secret field on the user data response, which made UserData(**item) raise TypeError: UserData.__init__() got an unexpected keyword argument 'totp_secret' regardless of this PR's UTF-8 change — master CI hits the same error once re-triggered.

If you'd rather keep this PR scoped to the encoding fix only, happy to split the totp_secret field into its own PR. Two flake-class failures remain (Joplin AppImage stop-timeout in tearDownModule, and a SQLITE_BUSY on test_get_notebook on the 3.10 job only) — those look environmental and aren't touched here.

@marph91
Copy link
Copy Markdown
Owner

marph91 commented May 24, 2026

Thanks for the analysis and fix! I can reproduce the issue locally. I'm fine with the changes.

Two flake-class failures remain (Joplin AppImage stop-timeout in tearDownModule, and a SQLITE_BUSY on test_get_notebook on the 3.10 job only) — those look environmental and aren't touched here.

Yes, these failures are not caused by this PR. I need to find the root cause and fix them separately.

@marph91 marph91 merged commit 7743be1 into marph91:master May 24, 2026
2 of 4 checks passed
@tillo
Copy link
Copy Markdown
Contributor Author

tillo commented May 24, 2026

Update: a60eea32 resolved the real failure (test_get_current_user now passes on both 3.10 and 3.14). The two errors still showing are both pre-existing flakes:

  • tearDownModule — Joplin AppImage subprocess.communicate(timeout=10) times out on shutdown. Hits on every run; not test-or-PR-specific.
  • test_get_all_notes (3.14 only this time) — SQLITE_BUSY on the Joplin Server SQLite under suite concurrency. Same class as the test_get_notebook flake on the previous 3.10 run; just landed on a different test.

Neither touches the UTF-8 response-hook code or the totp_secret field. Happy to leave the flakes for separate work — if you'd rather see green CI here first, the cheapest fix would be @pytest.mark.flaky (or unittest's equivalent) on the SQLITE_BUSY-prone tests and a .kill() fallback after the 10s communicate timeout in setup_joplin.py::stop(). Let me know which you prefer.

@marph91
Copy link
Copy Markdown
Owner

marph91 commented May 24, 2026

Joppy 1.0.4 was just released on Pypi. Feel free to open a new issue (or PR) if there are problems.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants