@abrookins
Collaborator

Summary

Fixes #779 - bytes fields containing non-UTF8 binary data (e.g., PNG headers, binary files) caused UnicodeDecodeError when saving to Redis.

Root Cause

The jsonable_encoder relied on Pydantic's ENCODERS_BY_TYPE, whose bytes entry called bytes.decode() with no explicit encoding, so it defaulted to UTF-8. Arbitrary binary data is often not valid UTF-8, so the call raised UnicodeDecodeError.
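
The failure can be reproduced without Redis or Pydantic. The snippet below is a minimal sketch using only the standard library; the 8-byte PNG signature stands in for any binary payload, and the variable names are illustrative.

```python
# The first 8 bytes of every PNG file; 0x89 is not a valid UTF-8 start byte.
png_header = b"\x89PNG\r\n\x1a\n"

try:
    # No encoding argument, so Python falls back to UTF-8.
    png_header.decode()
    error = None
except UnicodeDecodeError as exc:
    error = exc

# Report which exception type was raised, if any.
print(type(error).__name__)
```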

Solution

Use base64 encoding for bytes fields before JSON serialization, and decode back to bytes on retrieval. This follows the same pattern as the existing datetime timestamp conversion.

Added two new functions:

  • convert_bytes_to_base64() - encodes bytes to base64 strings before storage
  • convert_base64_to_bytes() - decodes base64 strings back to bytes on retrieval
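
A rough sketch of what such a pair of helpers might look like follows. The names encode_bytes and decode_bytes, their signatures, and the bytes_fields parameter are assumptions made for illustration; they are not the signatures of the functions added in this PR. The decode side takes an explicit set of field names because a base64 string cannot be told apart from an ordinary string by inspection.

```python
import base64
from typing import Any, Dict, Set


def encode_bytes(value: Any) -> Any:
    """Recursively replace bytes values with base64 text (hypothetical helper)."""
    if isinstance(value, bytes):
        # b64encode returns bytes limited to ASCII, so decoding as ASCII is safe.
        return base64.b64encode(value).decode("ascii")
    if isinstance(value, dict):
        return {k: encode_bytes(v) for k, v in value.items()}
    if isinstance(value, list):
        return [encode_bytes(v) for v in value]
    return value


def decode_bytes(doc: Dict[str, Any], bytes_fields: Set[str]) -> Dict[str, Any]:
    """Decode only the named top-level fields back to bytes (hypothetical helper)."""
    out = dict(doc)
    for name in bytes_fields:
        # None is left alone so optional fields survive the round trip.
        if isinstance(out.get(name), str):
            out[name] = base64.b64decode(out[name])
    return out


record = {"name": "logo", "data": b"\x89PNG\r\n\x1a\n", "thumb": None}
stored = encode_bytes(record)
restored = decode_bytes(stored, {"data", "thumb"})
print(stored["data"])  # iVBORw0KGgo=
```

In this sketch, which fields to decode would come from the model's type annotations; how the PR itself identifies bytes fields is not shown here.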

Applied in:

  • HashModel.save() and HashModel.get()
  • JsonModel.save() and JsonModel.get()
  • from_redis() for search results
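
The placement matters: encoding has to happen just before serialization on the write path, and decoding just after deserialization on every read path. The toy class below illustrates that symmetry under heavy simplification. InMemoryBlobStore is a hypothetical stand-in, a plain dict replaces Redis, and none of this mirrors the actual HashModel or JsonModel code.

```python
import base64
import json
from typing import Dict, Optional


class InMemoryBlobStore:
    """Hypothetical stand-in for a model class; a dict replaces Redis."""

    def __init__(self) -> None:
        self._db: Dict[str, str] = {}

    def save(self, key: str, payload: Optional[bytes]) -> None:
        # Write path: bytes -> base64 text, then JSON serialization.
        encoded = None if payload is None else base64.b64encode(payload).decode("ascii")
        self._db[key] = json.dumps({"payload": encoded})

    def get(self, key: str) -> Optional[bytes]:
        # Read path: JSON deserialization, then base64 text -> bytes.
        encoded = json.loads(self._db[key])["payload"]
        return None if encoded is None else base64.b64decode(encoded)


store = InMemoryBlobStore()
store.save("img", b"\x89PNG\r\n\x1a\n")
store.save("empty", None)
print(store.get("img") == b"\x89PNG\r\n\x1a\n")
```

A read path that skipped the decode step would hand callers a base64 string instead of bytes, which is presumably why the search-result path is listed alongside save() and get().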

Testing

Added 5 new tests covering:

  • Binary data storage/retrieval (non-UTF8 bytes like PNG headers)
  • Optional[bytes] fields
  • Bytes fields in EmbeddedJsonModel

All 224 tests pass.

@abrookins merged commit bdaa5b1 into main on Jan 23, 2026
15 checks passed

Development

Successfully merging this pull request may close these issues.

bytes fields fail with UnicodeDecodeError for non-UTF8 binary data
