Skip to content
Merged
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
16 changes: 16 additions & 0 deletions key-value/key-value-aio/src/key_value/aio/stores/dynamodb/store.py
Original file line number Diff line number Diff line change
Expand Up @@ -164,6 +164,22 @@ async def _setup(self) -> None:
waiter = self._connected_client.get_waiter("table_exists") # pyright: ignore[reportUnknownMemberType, reportUnknownVariableType]
await waiter.wait(TableName=self._table_name) # pyright: ignore[reportUnknownMemberType]

# Enable TTL on the table if not already enabled
ttl_response = await self._connected_client.describe_time_to_live( # pyright: ignore[reportUnknownMemberType, reportUnknownVariableType]
TableName=self._table_name
)
ttl_status = ttl_response.get("TimeToLiveDescription", {}).get("TimeToLiveStatus") # pyright: ignore[reportUnknownMemberType, reportUnknownVariableType]

# Only enable TTL if it's currently disabled
if ttl_status == "DISABLED":
await self._connected_client.update_time_to_live( # pyright: ignore[reportUnknownMemberType]
TableName=self._table_name,
TimeToLiveSpecification={
"Enabled": True,
"AttributeName": "ttl",
},
)

Comment on lines +167 to +182
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Add error handling and handle transient TTL states.

The check-before-update approach addresses the previous concern about broad exception handling. However, the current implementation has several issues:

  1. Missing error handling: The update_time_to_live call has no error handling. If this fails (due to permissions, rate limiting, transient errors, or concurrent updates), the exception will propagate and break the setup process.

  2. Unhandled transient states: DynamoDB TTL can be in ENABLING or DISABLING states during transitions. The code only checks for DISABLED, so:

    • If TTL is ENABLING, the code skips the update (good) but doesn't verify it completes
    • If TTL is DISABLING, the code skips the update, leaving TTL in an incomplete state
    • If ttl_status is None (malformed response), the check silently fails
  3. Race condition: Between checking the status and calling update, another process could start enabling TTL, causing a ValidationException from the update call.

Apply this diff to add proper error handling and state management:

         # Enable TTL on the table if not already enabled
         ttl_response = await self._connected_client.describe_time_to_live(  # pyright: ignore[reportUnknownMemberType, reportUnknownVariableType]
             TableName=self._table_name
         )
         ttl_status = ttl_response.get("TimeToLiveDescription", {}).get("TimeToLiveStatus")  # pyright: ignore[reportUnknownMemberType, reportUnknownVariableType]
 
-        # Only enable TTL if it's currently disabled
-        if ttl_status == "DISABLED":
-            await self._connected_client.update_time_to_live(  # pyright: ignore[reportUnknownMemberType]
-                TableName=self._table_name,
-                TimeToLiveSpecification={
-                    "Enabled": True,
-                    "AttributeName": "ttl",
-                },
-            )
+        # Only enable TTL if it's currently disabled or unknown
+        if ttl_status in ("DISABLED", None):
+            try:
+                await self._connected_client.update_time_to_live(  # pyright: ignore[reportUnknownMemberType]
+                    TableName=self._table_name,
+                    TimeToLiveSpecification={
+                        "Enabled": True,
+                        "AttributeName": "ttl",
+                    },
+                )
+            except self._connected_client.exceptions.ClientError as e:  # pyright: ignore[reportUnknownMemberType]
+                error_code = e.response["Error"]["Code"]  # pyright: ignore[reportUnknownMemberType]
+                # Suppress only if TTL is already enabled/enabling; re-raise other errors
+                if error_code == "ValidationException":
+                    error_message = e.response["Error"]["Message"]  # pyright: ignore[reportUnknownMemberType]
+                    if "already enabled" not in error_message.lower() and "already being" not in error_message.lower():  # pyright: ignore[reportUnknownArgumentType]
+                        raise
+                else:
+                    raise
+        # If status is ENABLING, TTL will be enabled soon; if ENABLED, nothing to do

Note: This reintroduces exception handling but only suppresses the specific "already enabled" case, addressing the concern from the previous review while maintaining robustness.

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
# Enable TTL on the table if not already enabled
ttl_response = await self._connected_client.describe_time_to_live( # pyright: ignore[reportUnknownMemberType, reportUnknownVariableType]
TableName=self._table_name
)
ttl_status = ttl_response.get("TimeToLiveDescription", {}).get("TimeToLiveStatus") # pyright: ignore[reportUnknownMemberType, reportUnknownVariableType]
# Only enable TTL if it's currently disabled
if ttl_status == "DISABLED":
await self._connected_client.update_time_to_live( # pyright: ignore[reportUnknownMemberType]
TableName=self._table_name,
TimeToLiveSpecification={
"Enabled": True,
"AttributeName": "ttl",
},
)
# Enable TTL on the table if not already enabled
ttl_response = await self._connected_client.describe_time_to_live( # pyright: ignore[reportUnknownMemberType, reportUnknownVariableType]
TableName=self._table_name
)
ttl_status = ttl_response.get("TimeToLiveDescription", {}).get("TimeToLiveStatus") # pyright: ignore[reportUnknownMemberType, reportUnknownVariableType]
# Only enable TTL if it's currently disabled or unknown
if ttl_status in ("DISABLED", None):
try:
await self._connected_client.update_time_to_live( # pyright: ignore[reportUnknownMemberType]
TableName=self._table_name,
TimeToLiveSpecification={
"Enabled": True,
"AttributeName": "ttl",
},
)
except self._connected_client.exceptions.ClientError as e: # pyright: ignore[reportUnknownMemberType]
error_code = e.response["Error"]["Code"] # pyright: ignore[reportUnknownMemberType]
# Suppress only if TTL is already enabled/enabling; re-raise other errors
if error_code == "ValidationException":
error_message = e.response["Error"]["Message"] # pyright: ignore[reportUnknownMemberType]
if "already enabled" not in error_message.lower() and "already being" not in error_message.lower(): # pyright: ignore[reportUnknownArgumentType]
raise
else:
raise
# If status is ENABLING, TTL will be enabled soon; if ENABLED, nothing to do
🤖 Prompt for AI Agents
In key-value/key-value-aio/src/key_value/aio/stores/dynamodb/store.py around
lines 167 to 182, add robust error handling and transient-state management for
TTL: validate ttl_status for None and treat it as unknown, if ttl_status is
"ENABLING" wait/poll with a short backoff until it becomes "ENABLED" or a
timeout expires, if ttl_status is "DISABLING" wait/poll until it is "DISABLED"
or timeout before attempting change, wrap the update_time_to_live call in a
try/except to catch and suppress the specific ValidationException/error that
indicates TTL was enabled by a concurrent process while logging other exceptions
and re-raising or handling retries for transient errors (throttling/network)
with a small retry/backoff loop; ensure all errors are logged with details.

@override
async def _get_managed_entry(self, *, key: str, collection: str) -> ManagedEntry | None:
"""Retrieve a managed entry from DynamoDB."""
Expand Down