perf: pre-allocate values list in BoundStatement.bind()#12

Open
mykaul wants to merge 1 commit into master from perf/preallocate-bind-values

@mykaul mykaul commented Mar 15, 2026

Summary

  • Pre-allocate the values list in BoundStatement.bind() using [UNSET_VALUE] * col_meta_len (proto v4+) or [None] * value_len (proto v3) instead of starting with an empty list and calling .append() per value
  • Eliminates the separate trailing UNSET_VALUE padding loop (_append_unset_value() called in a for _ in range(diff) loop)
  • Uses index assignment (result[i] = col_bytes) instead of method lookup + call (self.values.append(col_bytes))

Motivation

Each .append() call involves an attribute lookup and a method call, and may also trigger a list resize when capacity is exceeded. For prepared statements with many columns (common in lightweight transaction, or LWT, queries), this overhead is measurable. Pre-allocating the list at its known final size and using index assignment avoids both the per-call method dispatch and all list resizing.
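The difference between the two fill strategies can be checked with a small micro-benchmark. The sketch below is not taken from the driver: the function names and the column count are hypothetical, and timings will vary with interpreter version and hardware, so it only shows how one might measure the effect.

```python
import timeit

N_COLUMNS = 50  # hypothetical column count, chosen only for illustration


def fill_with_append(values):
    # Baseline: start with an empty list and grow it one element at a time.
    result = []
    for v in values:
        result.append(v)
    return result


def fill_preallocated(values):
    # Allocate the final size once, then overwrite each slot by index.
    result = [None] * len(values)
    for i, v in enumerate(values):
        result[i] = v
    return result


if __name__ == "__main__":
    data = [b"x"] * N_COLUMNS
    for fn in (fill_with_append, fill_preallocated):
        elapsed = timeit.timeit(lambda: fn(data), number=20_000)
        print(f"{fn.__name__}: {elapsed:.3f}s")
```

Both functions return equal lists, so any timing gap comes from how the list is built rather than from what it contains.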

This is part of the LWT prepared statement performance improvement effort documented in scylladb#751 (optimization B5).

Changes

cassandra/query.py - BoundStatement.bind():

  • Initialize result list with known final size instead of self.values = []
  • Replace self.values.append(...) with result[i] = ... using enumerate(zip(...))
  • For proto v4+: result = [UNSET_VALUE] * col_meta_len — trailing unbound columns are already padded
  • For proto v3: result = [None] * value_len — only provided values
  • Inline the UNSET_VALUE routing key check (previously delegated to _append_unset_value())
  • Assign self.values = result at the end (single attribute write)
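The changes listed above can be sketched as a standalone function. This is a simplified illustration, not the actual code in cassandra/query.py: `bind_values`, the `serializers` parameter, and the local `UNSET_VALUE` sentinel are assumptions made for the example, and error handling is reduced to the minimum.

```python
UNSET_VALUE = object()  # stand-in sentinel; the driver defines its own UNSET_VALUE


def bind_values(values, serializers, proto_version):
    """Hypothetical helper: serialize `values` into a pre-sized list.

    `serializers` holds one callable per prepared-statement column.
    """
    value_len = len(values)
    col_meta_len = len(serializers)
    if value_len > col_meta_len:
        raise ValueError("too many values for the prepared statement")

    if proto_version >= 4:
        # Every slot starts as UNSET_VALUE, so trailing unbound columns
        # are already padded once the loop below finishes.
        result = [UNSET_VALUE] * col_meta_len
    else:
        # Protocol v3 has no UNSET_VALUE: keep only the provided values.
        result = [None] * value_len

    for i, (value, serialize) in enumerate(zip(values, serializers)):
        if value is None:
            result[i] = None  # must overwrite the UNSET_VALUE default on v4+
        elif value is UNSET_VALUE:
            if proto_version < 4:
                raise ValueError("UNSET_VALUE requires protocol v4 or later")
            # On v4+ the slot already holds UNSET_VALUE; nothing to write.
        else:
            result[i] = serialize(value)
    return result
```

A caller would then assign the returned list to the statement's values attribute once, which corresponds to the single attribute write in the last bullet.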

Testing

All existing tests pass:

  • tests/unit/test_parameter_binding.py — 37/37 passed (V3, V4, V5 protocol versions)
  • tests/unit/test_query.py — 6/6 passed
  • tests/unit/test_resultset.py — 14/14 passed

Replace empty list + repeated append() with pre-allocated list and index
assignment in BoundStatement.bind(). For protocol v4+, the list is
initialized to [UNSET_VALUE] * col_meta_len, eliminating the separate
trailing UNSET_VALUE padding loop entirely. For protocol v3, the list is
initialized to [None] * value_len.

This avoids repeated list resizing and reduces Python bytecode overhead
per bound value (index assignment vs method lookup + call for append).

The routing key validation for UNSET_VALUE is preserved: explicit
UNSET_VALUE binds are checked inline, and implicitly padded trailing
columns are validated in a separate loop after the main bind loop.
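
One way the two routing key checks described above could look is sketched below. The helper names, the `routing_key_indexes` set, and the `column_names` list are hypothetical; the real driver keeps this information on the prepared statement and may structure the checks differently.

```python
def reject_unset_routing_key(index, routing_key_indexes, column_names):
    # Shared check: a routing key column can never be left unset.
    if index in routing_key_indexes:
        raise ValueError(
            "Cannot bind UNSET_VALUE as a part of the routing key '%s'"
            % column_names[index]
        )


def validate_trailing_unset(value_len, col_meta_len, routing_key_indexes, column_names):
    # Slots value_len .. col_meta_len - 1 were never written by the bind
    # loop, so on protocol v4+ they still hold the pre-allocated UNSET_VALUE.
    for index in range(value_len, col_meta_len):
        reject_unset_routing_key(index, routing_key_indexes, column_names)
```

In this sketch, `reject_unset_routing_key` would be called inline when a caller passes UNSET_VALUE explicitly, and `validate_trailing_unset` would run once after the main bind loop to cover the implicitly padded columns.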

Part of: scylladb#751