(improvement) serializers: add Cython-optimized serialization for VectorType #748
Draft
mykaul wants to merge 1 commit into scylladb:master
…torType

Add cassandra/serializers.pyx and cassandra/serializers.pxd implementing Cython-optimized serialization that mirrors the deserializers.pyx architecture.

Implements type-specialized serializers for the three subtypes commonly used in vector columns:
- SerFloatType: 4-byte big-endian IEEE 754 float
- SerDoubleType: 8-byte big-endian double
- SerInt32Type: 4-byte big-endian signed int32

SerVectorType pre-allocates a contiguous buffer and uses C-level byte swapping for float/double/int32 vectors, with a generic fallback for other subtypes. GenericSerializer delegates to the Python-level cqltype.serialize() classmethod.

Factory functions find_serializer() and make_serializers() allow easy lookup and batch creation of serializers for column types.

Benchmarks show ~30x speedup over the current io.BytesIO baseline and ~3x speedup over Python struct.pack for Vector<float, 1536> serialization.

No setup.py changes needed - the existing cassandra/*.pyx glob already picks up new .pyx files.
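For reference, the byte layouts named in this commit message can be reproduced in plain Python with the standard struct module. This is only a sketch of the wire format, not the Cython code from the PR; the function names are made up for illustration, and the vector helper assumes fixed-size float elements packed back to back (consistent with the 4-byte stride used in the diff).

```python
import struct


def serialize_float(value):
    # 4-byte big-endian IEEE 754 single-precision float
    return struct.pack(">f", value)


def serialize_double(value):
    # 8-byte big-endian IEEE 754 double
    return struct.pack(">d", value)


def serialize_int32(value):
    # 4-byte big-endian signed 32-bit integer
    return struct.pack(">i", value)


def serialize_float_vector(values):
    # Assumption: fixed-size elements are concatenated with no
    # per-element length prefix, so a float vector is len(values) * 4 bytes.
    return struct.pack(">%df" % len(values), *values)
```

A Vector<float, 1536> value therefore occupies 1536 * 4 = 6144 bytes in this layout.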
mykaul added a commit to mykaul/python-driver that referenced this pull request Mar 14, 2026
…nt.bind()

When Cython serializers (from cassandra.serializers) are available and no column encryption policy is active, BoundStatement.bind() now uses pre-built Serializer objects cached on the PreparedStatement instead of calling cqltype classmethods. This avoids per-value Python method dispatch overhead and enables the ~30x vector serialization speedup from the Cython serializers module.

The bind loop is split into three paths:
1. Column encryption policy path (unchanged behavior)
2. Cython serializers path (new fast path)
3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check)

Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy bind split).
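The three-way split described in this commit message could look roughly like the sketch below. Everything here is hypothetical: bind_values, the stand-in classes, and the ce_policy callable are illustrative names, not the driver's actual BoundStatement.bind() code, which is not shown on this page.

```python
import struct


class Int32Type:
    """Stand-in for a cqltype exposing a Python-level serialize() classmethod."""

    @classmethod
    def serialize(cls, value, protocol_version):
        return struct.pack(">i", value)


class FastInt32Serializer:
    """Stand-in for a pre-built Serializer object cached per column."""

    def serialize(self, value, protocol_version):
        return struct.pack(">i", value)


def bind_values(values, col_types, protocol_version, ce_policy=None, serializers=None):
    if ce_policy is not None:
        # Path 1: a column encryption policy is active; the per-column
        # handling is delegated to it (hypothetical callable here).
        return [ce_policy(ct, v, protocol_version) for ct, v in zip(col_types, values)]
    if serializers is not None:
        # Path 2: pre-built serializer objects, looked up once per statement.
        return [s.serialize(v, protocol_version) for s, v in zip(serializers, values)]
    # Path 3: plain Python, calling the cqltype classmethod for each value.
    return [ct.serialize(v, protocol_version) for ct, v in zip(col_types, values)]
```

The point of the split is that the branch is chosen once per bind call rather than once per value.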
Pull request overview
This PR introduces a new Cython extension module to accelerate CQL value serialization—especially VectorType—using the same general “typed Serializer object + factory lookup” approach as the existing Cython deserialization stack.
Changes:
- Add cassandra/serializers.pyx implementing Cython serializers for FloatType, DoubleType, Int32Type, and an optimized VectorType serializer with generic fallback.
- Add find_serializer() / make_serializers() factory helpers for serializer creation.
- Add cassandra/serializers.pxd to expose the Serializer interface to other Cython modules.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
| cassandra/serializers.pyx | New Cython-optimized serialization implementations and factory lookup. |
| cassandra/serializers.pxd | Cython declarations for the Serializer interface. |
Comment on lines +103 to +106
    cpdef bytes serialize(self, object value, int protocol_version):
        cdef int32_t val = <int32_t>value
        cdef char out[4]
        cdef char *src = <char *>&val
Comment on lines +196 to +200
    for i in range(self.vector_size):
        val = <float>values[i]
        src = <char *>&val
        dst = buf + i * 4
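The loop above writes element i at offset i * 4 of a buffer allocated up front. A pure-Python analogue of that pattern, using a bytearray and struct.pack_into, might look like this. It is a sketch only; the function name and the explicit length check are assumptions and are not taken from the PR.

```python
import struct


def serialize_float_vector_prealloc(values, vector_size):
    # Assumed validation: the diff shown here does not include a length check.
    if len(values) != vector_size:
        raise ValueError("expected %d elements, got %d" % (vector_size, len(values)))
    # One allocation for the whole vector instead of one bytes object per element.
    buf = bytearray(vector_size * 4)
    for i in range(vector_size):
        # Write element i as a big-endian float at byte offset i * 4.
        struct.pack_into(">f", buf, i * 4, values[i])
    return bytes(buf)
```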
Comment on lines +258 to +261
    try:
        for i in range(self.vector_size):
            val = <int32_t>values[i]
    cqltype.serialize() classmethod.
    """

    from libc.stdint cimport int32_t, uint32_t
Comment on lines +332 to +334
    def make_serializers(cqltypes_list):
        """Create a list of Serializer objects for each given cqltype."""
        return [find_serializer(ct) for ct in cqltypes_list]
Comment on lines +209 to +212
        return PyBytes_FromStringAndSize(buf, buf_size)
    finally:
        free(buf)
Comment on lines +315 to +320
    cpdef Serializer find_serializer(cqltype):
        """Find a serializer for a cqltype."""

        # For VectorType, always use SerVectorType (it handles generic subtypes internally)
        if issubclass(cqltype, cqltypes.VectorType):
            return SerVectorType(cqltype)
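The factory pattern in find_serializer() and make_serializers() can be illustrated in plain Python as below. Only the VectorType check and the list comprehension come from the diff; the stand-in cqltype classes, the _SPECIALIZED lookup table, and the way SerVectorType handles its subtype are assumptions made to keep the sketch self-contained.

```python
import struct


class FloatType:
    @classmethod
    def serialize(cls, value, protocol_version):
        return struct.pack(">f", value)


class UTF8Type:
    @classmethod
    def serialize(cls, value, protocol_version):
        return value.encode("utf-8")


class VectorType:
    # Stand-in for a parameterized vector<float, 3> cqltype.
    subtype = FloatType
    vector_size = 3


class Serializer:
    def __init__(self, cqltype):
        self.cqltype = cqltype

    def serialize(self, value, protocol_version):
        raise NotImplementedError


class SerFloatType(Serializer):
    def serialize(self, value, protocol_version):
        return struct.pack(">f", value)


class GenericSerializer(Serializer):
    def serialize(self, value, protocol_version):
        # Fall back to the Python-level classmethod on the cqltype.
        return self.cqltype.serialize(value, protocol_version)


class SerVectorType(Serializer):
    def serialize(self, value, protocol_version):
        sub = find_serializer(self.cqltype.subtype)
        return b"".join(sub.serialize(v, protocol_version) for v in value)


# Assumed lookup table; the PR's actual lookup mechanism is not shown here.
_SPECIALIZED = {FloatType: SerFloatType}


def find_serializer(cqltype):
    if issubclass(cqltype, VectorType):
        return SerVectorType(cqltype)
    return _SPECIALIZED.get(cqltype, GenericSerializer)(cqltype)


def make_serializers(cqltypes_list):
    return [find_serializer(ct) for ct in cqltypes_list]
```

Building the list once per prepared statement means the type dispatch happens at lookup time, not on every serialized value.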
Comment on lines +61 to +66
    cpdef bytes serialize(self, object value, int protocol_version):
        cdef float val = <float>value
        cdef char out[4]
        cdef char *src = <char *>&val

        if is_little_endian:
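The is_little_endian branch exists because the wire format is big-endian while most hosts store floats little-endian, so the four bytes have to be reversed before output. The same idea in plain Python, as a sketch (float_to_big_endian is an illustrative name, and is_little_endian is derived here from sys.byteorder, which may differ from how the PR computes it):

```python
import struct
import sys

# Assumption: derive host endianness from sys.byteorder.
is_little_endian = sys.byteorder == "little"


def float_to_big_endian(value):
    # "=f" packs in the host's native byte order with standard size (4 bytes).
    native = struct.pack("=f", value)
    if is_little_endian:
        # Reverse the bytes so the most significant byte comes first.
        return native[::-1]
    return native
```

On either kind of host the result matches struct.pack(">f", value).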
mykaul added a commit to mykaul/python-driver that referenced this pull request Mar 16, 2026

…nt.bind()

When Cython serializers (from cassandra.serializers) are available and no column encryption policy is active, BoundStatement.bind() now uses pre-built Serializer objects cached on the PreparedStatement instead of calling cqltype classmethods. This avoids per-value Python method dispatch overhead and enables the ~30x vector serialization speedup from the Cython serializers module.

The bind loop is split into three paths:
1. Column encryption policy path (unchanged behavior)
2. Cython serializers path (new fast path)
3. Plain Python path (no CE, no Cython -- removes per-value ColDesc/CE check)

Depends on PR scylladb#748 (Cython serializers module) and PR scylladb#630 (CE-policy bind split).
Summary
Adds cassandra/serializers.pyx and cassandra/serializers.pxd implementing Cython-optimized serialization that mirrors the deserializers.pyx architecture.

What's included
- SerFloatType (4-byte IEEE 754), SerDoubleType (8-byte), SerInt32Type (4-byte signed): the three subtypes commonly used in vector columns
- SerVectorType: pre-allocates a contiguous char * buffer and uses C-level byte swapping for float/double/int32 vectors, with a generic fallback for other subtypes
- GenericSerializer: delegates to the Python-level cqltype.serialize() classmethod for all other types
- find_serializer(cqltype) and make_serializers(cqltypes_list) for easy lookup and batch creation

Architecture
Mirrors deserializers.pyx exactly:
| deserializers.pyx | serializers.pyx |
|---|---|
| Deserializer base class | Serializer base class |
| DesFloatType, DesDoubleType, DesInt32Type | SerFloatType, SerDoubleType, SerInt32Type |
| DesVectorType (type-specialized) | SerVectorType (type-specialized) |
| GenericDeserializer | GenericSerializer |
| find_deserializer() | find_serializer() |
| make_deserializers() | make_serializers() |

Performance
Benchmarked on Vector<float, 1536> (typical embedding dimension), comparing three implementations:
- VectorType.serialize() (io.BytesIO loop)
- struct.pack batch format string
- SerVectorType

SerVectorType is ~30x faster than the io.BytesIO baseline and ~3x faster than struct.pack.

No setup.py changes needed: the existing cassandra/*.pyx glob already picks up new .pyx files.

Related PRs
- … BoundStatement.bind() (depends on this PR + Optimize column_encryption_policy checks in recv_results_rows #630)

Pre-review checklist
- … ./docs/source/.
- … Fixes: annotations to PR description.
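To get a feel for the two pure-Python baselines named under Performance, a small timeit harness along these lines could be used. This is not the benchmark script from the PR; the function names are illustrative, the io.BytesIO loop is only an approximation of what VectorType.serialize() does, and the Cython SerVectorType is not included because it requires the compiled extension.

```python
import io
import struct
import timeit

DIM = 1536  # embedding dimension used in the PR description
vector = [float(i) for i in range(DIM)]


def serialize_bytesio_loop(values):
    # Approximation of a per-element loop writing into io.BytesIO.
    buf = io.BytesIO()
    for v in values:
        buf.write(struct.pack(">f", v))
    return buf.getvalue()


# Precompiled batch format string: one pack call for the whole vector.
_BATCH = struct.Struct(">%df" % DIM)


def serialize_struct_batch(values):
    return _BATCH.pack(*values)


def benchmark(number=200):
    # Returns total seconds per implementation for `number` calls each.
    candidates = {
        "bytesio_loop": serialize_bytesio_loop,
        "struct_batch": serialize_struct_batch,
    }
    return {
        name: timeit.timeit(lambda fn=fn: fn(vector), number=number)
        for name, fn in candidates.items()
    }


if __name__ == "__main__":
    for name, seconds in benchmark().items():
        print("%-14s %.4f s" % (name, seconds))
```

Absolute timings depend on the machine and Python version, so only the ratio between the two rows is meaningful.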