Right now accessing individual items of str or bytes objects is not very efficient. One way to make this faster would be to support primitive types that allow direct access to item data via view objects. Fast access to individual bytes/str items is very useful in various libraries and low-level use cases, such as when writing parsers, decoders, etc.
For example, BytesView could allow direct access to the data within a bytes object. It would act like a sequence of integers, and wouldn't support most bytes methods. We'd represent it as a stack-allocated, immutable value object with three attributes:
- Pointer to the beginning of the data view (char *)
- Length of data (size_t)
- Object (PyObject *)
We'd also support slicing, which would return a smaller view, and wouldn't copy any data. It would be a very fast operation in compiled code.
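To make the layout above concrete, here is a rough pure-Python stand-in. It is only a sketch: the class body, the constructor signature, and the private attribute names are assumptions made for illustration, an integer offset plays the role of the `char *` pointer, and none of the speed of a compiled, stack-allocated value is reproduced. It only shows the semantics: indexing yields integers, and slicing yields a smaller view over the same object without copying.

```python
from __future__ import annotations


class BytesView:
    """Hypothetical emulation of the proposed view (not a real API)."""

    __slots__ = ("_obj", "_start", "_length")

    def __init__(self, obj: bytes, start: int = 0, length: int | None = None) -> None:
        if length is None:
            length = len(obj) - start
        if start < 0 or length < 0 or start + length > len(obj):
            raise ValueError("view out of bounds")
        self._obj = obj        # underlying object, kept alive (PyObject *)
        self._start = start    # offset standing in for the data pointer (char *)
        self._length = length  # number of bytes visible through the view (size_t)

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int | slice) -> int | BytesView:
        if isinstance(index, slice):
            start, stop, step = index.indices(self._length)
            if step != 1:
                raise ValueError("this sketch only supports contiguous slices")
            # A slice is a new view over the same object; no bytes are copied.
            return BytesView(self._obj, self._start + start, max(0, stop - start))
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError("BytesView index out of range")
        return self._obj[self._start + index]


request = BytesView(b"GET /index.html")
path = request[4:]          # smaller view, shares the original bytes object
first = path[0]             # an int, here the code for "/"
```

In this emulation the slice costs one small object allocation; in the proposal it would be a value copy of three fields.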
This approach has some benefits over adding primitives to operate directly on bytes values:
- Slicing would be constant-time (and very fast), unlike bytes slicing, which allocates a new object.
- We can support subclasses of bytes and bytearray universally without slowing down the bytes case, which is the common case. We'd construct a temporary bytes object behind the scenes if the target is mutable.
- All operations on BytesView can be fast, so performance will be more predictable compared to dealing with bytes objects directly, as the latter have many unoptimized methods.
- We can provide a very similar interface for direct access to the contents of str objects.
Similarly, StrView would provide direct, read-only access to the code point array backing a str. Here the performance benefit is more obvious than with BytesView, since indexing a string produces a new string of length 1, which is clearly less efficient than a (native) integer.
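A minimal sketch of how such a view might be used, assuming a hypothetical `StrView` whose indexing returns integer code points. The class below is a pure-Python emulation built on `ord()`, so it shows only the programming model and has none of the proposed performance benefit; `parse_uint` is an illustrative helper, not part of the proposal.

```python
class StrView:
    """Hypothetical emulation: read-only access to code points as ints."""

    __slots__ = ("_s",)

    def __init__(self, s: str) -> None:
        self._s = s

    def __len__(self) -> int:
        return len(self._s)

    def __getitem__(self, index: int) -> int:
        # A compiled implementation would read the code point array directly
        # instead of creating a length-1 string first.
        return ord(self._s[index])


def parse_uint(view: StrView) -> int:
    """Parse ASCII digits using integer comparisons only."""
    if len(view) == 0:
        raise ValueError("empty input")
    result = 0
    for i in range(len(view)):
        c = view[i]
        if not 48 <= c <= 57:  # code points for "0" through "9"
            raise ValueError(f"unexpected code point {c} at index {i}")
        result = result * 10 + (c - 48)
    return result


value = parse_uint(StrView("2024"))
```

The inner loop never touches a str object, which is the kind of code a compiler could turn into plain integer arithmetic.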
All views could support some additional operations for convenience, beyond basic sequence operations:
- Equality with bytes / str objects
- startswith() and endswith()
- Others that turn out to be useful
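As a sketch of what these convenience operations could mean on a view, the pure-Python emulation below adds equality with bytes plus `startswith()` and `endswith()` to a minimal, hypothetical `BytesView`. The method signatures are assumptions (for instance, whether the arguments may themselves be views is left open), and the comparisons are written as explicit item loops only to show that no temporary bytes object is needed.

```python
from __future__ import annotations


class BytesView:
    """Hypothetical emulation; only what the comparisons need."""

    def __init__(self, obj: bytes, start: int = 0, length: int | None = None) -> None:
        self._obj = obj
        self._start = start
        self._length = len(obj) - start if length is None else length

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int) -> int:
        # Sketch only: assumes 0 <= index < len(self).
        return self._obj[self._start + index]

    def startswith(self, prefix: bytes) -> bool:
        if len(prefix) > self._length:
            return False
        return all(self[i] == prefix[i] for i in range(len(prefix)))

    def endswith(self, suffix: bytes) -> bool:
        offset = self._length - len(suffix)
        if offset < 0:
            return False
        return all(self[offset + i] == suffix[i] for i in range(len(suffix)))

    def __eq__(self, other: object) -> bool:
        if isinstance(other, bytes):
            return len(other) == self._length and self.startswith(other)
        return NotImplemented


# View over the middle 8 bytes of a larger buffer.
version = BytesView(b"xxHTTP/1.1yy", 2, 8)
is_http = version.startswith(b"HTTP/")
```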
Additionally, StrView could support some operations for querying the internal representation (whether 1, 2 or 4 bytes is used per code point; maximum code point value).
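For illustration, the two queries can be emulated in plain Python as below. The function names are hypothetical, and the thresholds follow CPython's compact string representation, where a string is stored with 1, 2, or 4 bytes per code point depending on its largest code point. This version scans the string; a real view would presumably read the width from the string object itself rather than recomputing it.

```python
def max_code_point(s: str) -> int:
    """Largest code point in s, or 0 for the empty string."""
    return max(map(ord, s), default=0)


def bytes_per_code_point(s: str) -> int:
    """Storage width CPython would use for s: 1, 2, or 4 bytes."""
    highest = max_code_point(s)
    if highest < 0x100:      # fits in Latin-1
        return 1
    if highest < 0x10000:    # fits in the Basic Multilingual Plane
        return 2
    return 4


widths = [bytes_per_code_point(s) for s in ("abc", "\u20ac", "\U0001F600")]
```

A decoder could branch once on this width and then run a specialized loop for each representation.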