Pin response encoding to UTF-8 in API requests #36
Joplin's REST API always returns UTF-8, but some responses — notably the Joplin Server item /content endpoint — omit the charset from the Content-Type header. requests then falls back to charset detection for `response.text`, which mis-decodes non-ASCII note content as mojibake. Set response.encoding = "utf-8" explicitly in both _request methods so note titles and bodies decode correctly.
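A minimal sketch of the pin, written against a stand-in object so it runs without a network connection or `requests` installed. `StubResponse` is hypothetical: it only imitates the part of `requests.Response` that matters here, where `text` decodes the raw bytes using `encoding`. The latin-1 fallback models the unlucky case when the header carries no charset, and `pin_utf8` is an illustrative name, not a joppy function.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class StubResponse:
    """Hypothetical stand-in for requests.Response (illustration only)."""

    content: bytes
    encoding: Optional[str] = None  # None models a Content-Type without charset

    @property
    def text(self) -> str:
        # Assumption: the fallback for a missing charset is modelled as latin-1.
        return self.content.decode(self.encoding or "latin-1")


def pin_utf8(response: StubResponse) -> StubResponse:
    """Force UTF-8 before anything reads response.text."""
    response.encoding = "utf-8"
    return response


raw = "Grüße, café".encode("utf-8")

unpinned = StubResponse(raw)
pinned = pin_utf8(StubResponse(raw))

print(repr(unpinned.text))  # mis-decoded: every non-ASCII character is split in two
print(repr(pinned.text))    # 'Grüße, café'
```

The point of setting the attribute right after the request is ordering: once `encoding` is fixed, every later read of `text` decodes the same bytes the same way.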
The Joplin Server upstream now returns a `totp_secret` field on the user data endpoint (see laurent22/joplin database/types.ts). Without the matching dataclass field, `UserData(**item)` raises:

TypeError: UserData.__init__() got an unexpected keyword argument 'totp_secret'

This currently breaks `test_get_current_user` on every CI run and is independent of the UTF-8 encoding change in this PR -- it would fail on master too once master CI is re-triggered. Added as the last field, mirroring the chronological-extension pattern of `sso_auth_code*` added in earlier upstream revisions.
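The failure mode can be reproduced with a plain dataclass. The sketch below uses two cut-down, hypothetical models (`UserDataOld`, `UserDataNew`) rather than joppy's real `UserData`, and the payload values are made up. The key filter at the end is one possible defensive alternative, shown only for comparison; it is not what this PR does.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class UserDataOld:
    """Hypothetical cut-down model without the new field."""

    id: Optional[str] = None
    email: Optional[str] = None


@dataclass
class UserDataNew:
    """Same model with the new field appended last."""

    id: Optional[str] = None
    email: Optional[str] = None
    totp_secret: Optional[str] = None


# Made-up payload shaped like a server response that gained a key.
item = {"id": "abc", "email": "user@example.com", "totp_secret": ""}

try:
    UserDataOld(**item)
    error = None
except TypeError as exc:
    error = str(exc)
print(error)

# Adding the field makes the same payload load cleanly.
user = UserDataNew(**item)
print(user)

# Alternative (assumption, not joppy's approach): drop keys the model lacks.
known = {f.name for f in fields(UserDataOld)}
tolerant = UserDataOld(**{k: v for k, v in item.items() if k in known})
print(tolerant)
```

Adding the field keeps the model in step with upstream and surfaces new server fields as test failures; filtering would hide them.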
Heads-up that I've pushed an extra commit ( If you'd rather keep this PR scoped to the encoding fix only, happy to split the totp_secret field into its own PR. Two flake-class failures remain (Joplin AppImage stop-timeout in |
Thanks for the analysis and fix! I can reproduce the issue locally. I'm fine with the changes.
Yes, these failures are not caused by this PR. I need to find the root cause and fix them separately.
Update:
Neither touches the UTF-8 response-hook code or the |
Joppy 1.0.4 was just released on PyPI. Feel free to open a new issue (or PR) if there are problems.
joppy decodes API responses through `response.text`, which relies on `requests` knowing the response charset.

Joplin always serves UTF-8, but several endpoints omit the charset from the `Content-Type` header, notably the Joplin Server item `/content` endpoint that returns the raw note (title + body + metadata). When the charset is absent, `requests` does not assume UTF-8: it falls back to charset detection (or, depending on the `requests`/`charset_normalizer` version, to the latin-1 default). Non-ASCII note content is then mis-decoded into mojibake.

Because the mis-decoded string is read back, edited, and written again, the corruption compounds on every round-trip: an em-dash `—` becomes `â\x80\x94`, then `Ã¢Â\x80Â\x94`, and so on. Any tool built on joppy that round-trips notes (sync helpers, MCP servers, migration scripts) silently corrupts every note it touches.

This pins `response.encoding = "utf-8"` immediately after each request in both `ServerApi._request` and `Api._request`, before `response.text` is ever accessed. Joplin's API is UTF-8 in every case, so this is always correct and makes decoding deterministic regardless of the installed `requests`/`charset_normalizer` version.

No behaviour change for ASCII content; non-ASCII titles and bodies now round-trip losslessly.
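The compounding can be reproduced with plain byte strings. The sketch below assumes the latin-1 fallback mentioned above and models one read-and-write-back cycle as "encode as UTF-8, decode with the chosen encoding". `round_trip` is an illustrative helper, not part of joppy.

```python
def round_trip(text: str, encoding: str) -> str:
    """Write text as UTF-8, then read it back using `encoding`."""
    return text.encode("utf-8").decode(encoding)


original = "\u2014"  # em-dash, three bytes in UTF-8: e2 80 94

once = round_trip(original, "latin-1")
twice = round_trip(once, "latin-1")

# Each mis-decoded cycle turns every non-ASCII character into several.
print(len(original), len(once), len(twice))  # 1 3 6
print(repr(once))  # 'â\x80\x94'

# With the encoding pinned to UTF-8, any number of cycles is lossless.
pinned = original
for _ in range(5):
    pinned = round_trip(pinned, "utf-8")
print(pinned == original)  # True
```

Under this model the string grows on each cycle, which is why the damage is hard to undo once a note has been saved more than once.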