Bug in buffer protocol implementation #405

While inspecting the code, I noticed a small bug in the buffer protocol implementation: the buffer's `.shape` is set to `num_tokens * buffer.itemsize`, but it should be `num_tokens`. The [official spec](https://docs.python.org/3/c-api/buffer.html#buffer-protocol) requires `math.prod(buffer.shape) * buffer.itemsize` to equal the buffer's `len` (its size in bytes, i.e. `memoryview(buffer).nbytes`), and the current shape overstates the element count by a factor of `itemsize`. As a result, `memoryview(buffer).tolist()` returns an incorrect result. Luckily, NumPy ignores `.shape` (unlike CPython) and derives the element count as the byte length divided by `buffer.itemsize`.
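A minimal, stdlib-only sketch of the invariant from the spec follows. It does not touch tiktoken: `array.array("I")` stands in for a buffer of token ids (an assumption for illustration), the ids are made up, and `satisfies_buffer_invariant` is a hypothetical helper. It also shows that CPython's own `memoryview.cast` refuses the inflated `num_tokens * itemsize` shape, while `num_tokens` round-trips through `tolist()`.

```python
import array
import math


def satisfies_buffer_invariant(obj) -> bool:
    """Hypothetical helper: check product(shape) * itemsize == len.

    memoryview.nbytes reports the exporter's `len` field (size in bytes),
    so an exporter whose shape is inflated by itemsize yields False.
    """
    with memoryview(obj) as view:
        return math.prod(view.shape) * view.itemsize == view.nbytes


# Made-up token ids; array('I') exports a correctly shaped 1-D buffer.
tokens = array.array("I", [101, 202, 303])
n = len(tokens)

correct = satisfies_buffer_invariant(tokens)

# Reinterpret the same bytes with an explicit shape, as an exporter would.
raw = memoryview(tokens).cast("B")
good_view = raw.cast("I", shape=[n])  # shape == num_tokens
ids = good_view.tolist()

# The shape described in this issue: num_tokens * itemsize elements.
try:
    raw.cast("I", shape=[n * tokens.itemsize])
    inflated_rejected = False
except (TypeError, ValueError):
    # CPython validates the invariant here and refuses the shape
    # (it raises TypeError in current versions).
    inflated_rejected = True

print(correct, ids, inflated_rejected)
```

Applied to a `memoryview` of the object returned by `encode_to_tiktoken_buffer`, a check like this should pass once `.shape` is `num_tokens`.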

Even though the `encode_to_tiktoken_buffer` API is somewhat private, this bug is probably still worth fixing to follow the spec 😃.
