Skip to content

Conversation

Copy link

Copilot AI commented Nov 8, 2025

Optimize column_encryption_policy checks in recv_results_rows

Issue:
The column_encryption_policy existence was checked for every value decoded in decode_val() using the pattern column_encryption_policy and column_encryption_policy.contains_column(col_desc). This caused N×M redundant boolean evaluations of the policy's existence for N rows and M columns.

Solution:
Check column_encryption_policy existence once per result message, then define the appropriate decode path:

  1. Without encryption policy:

    • Simple, straightforward decoding: col_md[3].from_binary(val, protocol_version)
    • No encryption checks at all
    • No unnecessary creation of col_descs
  2. With encryption policy:

    • Check policy existence once (at function entry)
    • Create col_descs only when needed (inside the encryption path)
    • Use decode_val helper function with clear if/else structure
    • If column is encrypted: decrypt and use encryption policy's column type
    • Else: simply return col_md[3].from_binary(val, protocol_version)
    • Reuse decode_val function in exception handler to avoid code duplication

Changes:

  • Check column_encryption_policy existence once per result message
  • Define decode_row() function differently based on policy presence
  • Within encryption path, decode_val uses clear if/else structure (not ternary operators)
  • Only create col_descs when needed (policy enabled or exception handling)
  • Reuse decode_val() function in exception handler (no code duplication)
  • Single try-except block following DRY principle
  • Properly organized imports in test file (alphabetically sorted)
  • Updated tests to verify optimization behavior
  • Updated comments and documentation
  • Run code review and security checks
  • Verify all tests pass

Performance Improvement:
The optimization eliminates redundant operations:

  • Before: column_encryption_policy and column_encryption_policy.contains_column() checked N×M times
  • After: column_encryption_policy checked once, then contains_column() called as needed
  • Removes N×M redundant boolean evaluations of policy existence
  • Avoids creating col_descs list when not needed

For example, with 100 rows × 2 columns: eliminates 200 redundant policy existence checks and avoids unnecessary list creation in the no-encryption case.

Files Modified:

  • cassandra/protocol.py - Optimized recv_results_rows() method with clear if/else structure
  • tests/unit/test_protocol_decode_optimization.py - New tests with properly organized imports

Testing:

  • All protocol tests pass
  • Tests verify policy existence check happens once
  • CodeQL security scan: 0 issues

Code Quality:

  • Minimal, focused changes
  • Clear if/else structure in decode_val function (more readable than ternary operators)
  • Well-commented explaining the optimization
  • No code duplication - follows DRY principle with single try-except block
  • Reuses decode_val() in exception handler
  • Maintains error handling consistency
  • Efficient resource usage (only create col_descs when needed)
  • Follows project conventions for import organization

Fixes: #582

Original prompt

This section details on the original issue you should resolve

<issue_title>column_encryption_policy is checked in reading every single value in decode_val()</issue_title>
<issue_description>There's no point in checking a global policy for every single value decoding, not for every row decoded.

Please adjust the code to only check it once per recv_results_rows() call - decode_row() should be defined either as is today with column_encryption_policy enabled, or much simpler without all those extra checks.</issue_description>

<agent_instructions>Dear CoPilot, please adjust the code for efficiency and readability: prepare two paths - one very similar to today's code, if column_encryption_policy is enabled, and one straightforward if it isn't set - just simple decode_raw()

The PR title and description should explain the issue, the solution and in accordance to the standard in this repository.

Think well before making the adjustment, ensure readability and efficiency of the code are both kept.</agent_instructions>

Comments on the Issue (you are @copilot in this section)


💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

- Check column_encryption_policy once per recv_results_rows call instead of per value
- Create two separate decode paths: one with encryption, one without
- Pre-compute encryption info per column to avoid repeated lookups
- Add comprehensive unit tests to verify optimization

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Copilot AI changed the title [WIP] Optimize column encryption policy checks in decode_val function Optimize column_encryption_policy checks in recv_results_rows Nov 8, 2025
Copilot AI requested a review from mykaul November 8, 2025 10:04
Copilot finished work on behalf of mykaul November 8, 2025 10:04
@mykaul
Copy link

mykaul commented Nov 8, 2025

@copilot - thanks, but:

  1. Please fix review comments from CodeQL
  2. The change is substantially bigger than what is required - there's no need to duplicate so much code - you only need to create different decode_row() function if column_encryption_policy is enabled or not. If it's not, then decode_row() is a simple return tuple(col_md[3].from_binary(val, protocol_version) if val is not None else None for val, col_md in zip(row, column_metadata)) call. If it is, then the current code looks OK. Less code changes, please.

Per review feedback, removed code duplication:
- Only define decode_row() function differently based on encryption policy
- Removed separate decode_val helper functions
- Keep single try/except block instead of duplicating it
- Reduced from 75 lines to 57 lines (-18 lines)

Still maintains the same optimization: contains_column() called once per column instead of per value.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Copy link
Author

Copilot AI commented Nov 8, 2025

@copilot - thanks, but:

  1. Please fix review comments from CodeQL
  2. The change is substantially bigger than what is required - there's no need to duplicate so much code - you only need to create di...

Done! Simplified the code per your feedback:

  1. ✅ CodeQL: 0 issues found
  2. ✅ Removed code duplication - now only defines decode_row() differently based on encryption policy
  3. ✅ No separate helper functions anymore
  4. ✅ Single try/except block (not duplicated)
  5. ✅ Reduced from 75 lines to 57 lines (-18 lines total)

The optimization still works the same way: contains_column() is called once per column instead of per value.

Commit: 12eac38

Copilot finished work on behalf of mykaul November 8, 2025 10:14
Per review feedback:
- Restored decode_val function in encryption path for better readability
- Keeps the structure similar to original code but without redundant policy check
- Removed unused MagicMock import from test file

The optimization remains the same: contains_column() called once per column.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Copilot AI requested a review from mykaul November 8, 2025 10:35
Copilot finished work on behalf of mykaul November 8, 2025 10:35
@mykaul
Copy link

mykaul commented Nov 8, 2025

@copilot - is the description accurate? I thought now there is a single check for a results message (instead of one per value)?

Copy link
Author

Copilot AI commented Nov 8, 2025

@copilot - is the description accurate? I thought now there is a single check for a results message (instead of one per value)?

The description is accurate. The optimization works as follows:

Before: contains_column() was called N×M times (once for every value = rows × columns)

After: contains_column() is called M times (once per column only, at line 729)

So for 100 rows × 2 columns: 200 calls → 2 calls.

The column_encryption_policy existence check happens once (line 725), but contains_column() is still called once per column to pre-compute which columns are encrypted. This is much better than the previous N×M calls per value.

Copilot finished work on behalf of mykaul November 8, 2025 10:48
@mykaul
Copy link

mykaul commented Nov 8, 2025

@copilot - the check should really be done once per result message. Please adjust accordingly.

Per review feedback, simplified the optimization to check the policy
existence only once per result message, not per column.

Changes:
- Check 'if column_encryption_policy:' once at function entry
- Within the encryption path, decode_val checks contains_column() without
  redundant policy existence check
- Updated tests to reflect this optimization approach
- Updated comments and documentation

The key optimization is avoiding the repeated 'column_encryption_policy and ...'
check for every value (N×M times), checking policy existence just once instead.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Copy link
Author

Copilot AI commented Nov 8, 2025

@copilot - the check should really be done once per result message. Please adjust accordingly.

Done! The check is now performed once per result message.

The optimization now:

  1. Checks if column_encryption_policy: once at function entry (line 725)
  2. Within the encryption path, decode_val calls contains_column() directly without redundant policy existence check (line 727)
  3. Eliminates the repeated column_encryption_policy and ... pattern that was evaluated N×M times

Commit: 8e82b03

Copilot finished work on behalf of mykaul November 8, 2025 10:59
Per review feedback, optimized to only create col_descs when necessary:
- Created inside 'if column_encryption_policy:' block when policy is enabled
- Created in exception handler when needed for error reporting
- NOT created when no policy and no exceptions (the common fast path)

This avoids unnecessary list comprehension when column_encryption_policy is None.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Copilot AI requested a review from mykaul November 8, 2025 11:21
Copilot finished work on behalf of mykaul November 8, 2025 11:21
Per review feedback, refactored to reuse decode_val() in exception handler:
- Moved try/except blocks inside each branch (encryption enabled/disabled)
- Exception handler now calls decode_val() when column_encryption_policy exists
- Eliminates code duplication by reusing the decode_val function
- Maintains same error reporting behavior

This makes the code cleaner and more maintainable.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Copilot AI requested a review from mykaul November 8, 2025 14:22
Copilot finished work on behalf of mykaul November 8, 2025 14:22
Per review feedback, eliminated code duplication:
- Combined two separate try-except blocks into one
- Single exception handler now checks column_encryption_policy to decide path
- Maintains same error reporting behavior
- Follows Python's DRY (Don't Repeat Yourself) principle

This makes the code cleaner and more maintainable.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Copilot AI requested a review from mykaul November 8, 2025 15:21
Copilot finished work on behalf of mykaul November 8, 2025 15:21
Per review feedback, reorganized imports following project conventions:
- Standard library imports (io, unittest) grouped together and sorted
- unittest.mock import kept with standard library
- Blank line separator
- Project imports (cassandra.*) grouped and sorted alphabetically

This matches the import style used in other test files in the repository.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Copilot finished work on behalf of mykaul November 8, 2025 15:53
Copilot AI requested a review from mykaul November 8, 2025 15:53
@mykaul
Copy link

mykaul commented Nov 8, 2025

@copilot - can you squash all commits?

Copy link
Author

Copilot AI commented Nov 8, 2025

@copilot - can you squash all commits?

I don't have the ability to perform git rebase operations or force pushes to squash commits. The commits would need to be squashed manually using:

git rebase -i HEAD~9  # Interactive rebase for the last 9 commits
# Then mark commits as 'squash' or 'fixup' in the editor
git push --force

Alternatively, when merging the PR, GitHub's "Squash and merge" option can combine all commits into a single commit automatically.

Copilot finished work on behalf of mykaul November 8, 2025 16:24
@mykaul
Copy link

mykaul commented Nov 9, 2025

Dear @copilot - we are nearly there. If you can just, in case of column_encryption_policy is true, just use a single if uses_ce clause, and then the else clause should just use return col_md[3].from_binary(val, protocol_version) I'll be quite happy with the change.

Per review feedback, simplified decode_val function structure:
- Use single 'if uses_ce:' clause for encrypted column path
- Else clause simply returns col_md[3].from_binary(val, protocol_version)
- More readable than chained ternary operators

Maintains same functionality while improving code clarity.

Co-authored-by: mykaul <4655593+mykaul@users.noreply.github.com>
Copy link
Author

Copilot AI commented Nov 9, 2025

Dear @copilot - we are nearly there. If you can just, in case of column_encryption_policy is true, just use a single if uses_ce clause, and then the else clause should just use `return col_md[3].f...

Done! Refactored decode_val to use a clear if/else structure:

def decode_val(val, col_md, col_desc):
    uses_ce = column_encryption_policy.contains_column(col_desc)
    if uses_ce:
        col_type = column_encryption_policy.column_type(col_desc)
        raw_bytes = column_encryption_policy.decrypt(col_desc, val)
        return col_type.from_binary(raw_bytes, protocol_version)
    else:
        return col_md[3].from_binary(val, protocol_version)

The encrypted path is now in a single if uses_ce: clause, and the else simply returns col_md[3].from_binary(val, protocol_version) as requested.

Commit: e9e4459

Copilot finished work on behalf of mykaul November 9, 2025 15:02
@mykaul
Copy link

mykaul commented Nov 14, 2025

@scylladb/drivers - the failure is FAILED tests/integration/standard/test_application_info.py::ApplicationInfoTest::test_create_session_and_check_system_views_clients - [XPASS(strict)] #scylladb/scylla-enterprise#5467 - not released yet - unclear to me what the expactation is - why isn't that an XFAIL or just SKIPped.

Other than that, this little patch is ready, CoPilot was not great, but eventually got the job done. Please consider if it's worthy of merging (needs squash and all).

@wprzytula wprzytula requested a review from Lorak-mmk November 14, 2025 22:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

column_encryption_policy is checked in reading every single value in decode_val()

2 participants