# Bedrock TPM Limit Testing: Absolute vs Relative Minutes

This notebook tests whether AWS Bedrock's TPM (Tokens Per Minute) limits are based on:
- **Absolute minutes**: Quota resets at the start of each clock minute (xx:00)
- **Relative minutes**: Quota resets 60 seconds after first token usage

## Test Plan
1. Consume the full 8,000-token TPM quota starting at xx:35
2. Make a 1,000-token request before the minute ends (should get 429)
3. Make a 1,000-token request at yy:05 (30 seconds after consumption starts)
4. Make a 1,000-token request at yy:50 (75 seconds after consumption starts)
5. Analyze results to determine the quota reset behavior
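
The two hypotheses make different predictions for the plan above, which can be sketched as a small simulation. This is a simplified model, not Bedrock's actual throttling logic: it assumes the quota is fully exhausted at xx:35, that an "absolute" quota resets on clock-minute boundaries, and that a "relative" quota resets 60 seconds after first usage. The function name `expected_status` and the probe at xx:45 are illustrative choices.

```python
def expected_status(policy: str, request_at: int, exhausted_at: int = 35) -> int:
    """Predict the HTTP status for a request made `request_at` seconds after xx:00.

    Assumption: the whole quota was used up at `exhausted_at` seconds after xx:00.
    """
    if policy == "absolute":
        # Quota resets at the next clock-minute boundary (a multiple of 60 s).
        reset_at = (exhausted_at // 60 + 1) * 60
    elif policy == "relative":
        # Quota resets 60 s after the first token usage.
        reset_at = exhausted_at + 60
    else:
        raise ValueError(f"unknown policy: {policy}")
    return 200 if request_at >= reset_at else 429


# Probe times in seconds after xx:00 (yy is the minute after xx).
probes = {"xx:45": 45, "yy:05": 65, "yy:50": 110}

for policy in ("absolute", "relative"):
    predicted = {label: expected_status(policy, t) for label, t in probes.items()}
    print(f"{policy:>8}: {predicted}")
```

Only the yy:05 probe differs between the two policies (200 under "absolute", 429 under "relative"), which is why the plan places a request there.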

In [1]:
from datetime import datetime
from utils import BedrockTPMTester

# Initialize the tester
tester = BedrockTPMTester(region_name='us-east-1')
print(f"Initialized Bedrock TPM Tester for model: {tester.model_id}")
print(f"Current time: {datetime.now().strftime('%H:%M:%S')}")

Initialized Bedrock TPM Tester for model: amazon.nova-pro-v1:0
Current time: 13:17:59


## Step 1: Wait for xx:35 and Consume All TPM Quota

In [2]:
# Wait until 35 seconds of the current minute
print("Waiting for xx:35 seconds...")
tester.wait_until_second(35)

start_time = datetime.now()
print(f"Starting quota consumption at: {start_time.strftime('%H:%M:%S')}")

# Consume 8000 tokens (our TPM limit) in chunks
consume_results = tester.consume_tpm_quota(total_tokens=8000, chunk_size=1000)

# Log the consumption results
total_consumed = 0
for i, result in enumerate(consume_results):
    if result['status_code'] == 200:
        total_consumed += result['tokens_requested']
    tester.log_result(
        f"Consume_batch_{i+1}",
        datetime.fromisoformat(result['timestamp']),
        result['tokens_requested'],
        result['status_code'],
        result['error']
    )

print(f"\nTotal tokens consumed: {total_consumed}")
print(f"Consumption completed at: {datetime.now().strftime('%H:%M:%S')}")

Waiting for xx:35 seconds...
Starting quota consumption at: 13:18:35
[13:18:35] Consume_batch_1: 1000 tokens -> 200
[13:18:35] Consume_batch_2: 1000 tokens -> 200
[13:18:36] Consume_batch_3: 1000 tokens -> 429
  Error: An error occurred (ThrottlingException) when calling the Converse operation (reached max retries: 1): Too many requests, please wait before trying again.

Total tokens consumed: 2000
Consumption completed at: 13:18:36
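
`wait_until_second` comes from `utils` and its source is not shown here. As a rough sketch of the arithmetic such a helper might perform, the function below computes how long to sleep until the clock's seconds field next reaches a target value. The name `seconds_until` is hypothetical and is not part of `BedrockTPMTester`.

```python
from datetime import datetime


def seconds_until(target_second: int, now: datetime) -> float:
    """Seconds from `now` until the clock's seconds field next equals `target_second`."""
    current = now.second + now.microsecond / 1_000_000
    # The modulo wraps into the next minute when the target second has already passed.
    return (target_second - current) % 60


# Starting at 13:17:59, the next xx:35 is 36 seconds away (13:18:35).
print(seconds_until(35, datetime(2024, 1, 1, 13, 17, 59)))
```

A caller could then pass the result to `time.sleep`, for example `time.sleep(seconds_until(35, datetime.now()))`.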


## Step 2: Test Request Before Minute Ends (Should Get 429)

In [3]:
# Make a test request before the minute ends
current_time = datetime.now()
print(f"Making test request at: {current_time.strftime('%H:%M:%S')}")

status_code, error, response_time = tester.make_bedrock_request(1000)
tester.log_result(
    "Test_before_minute_end",
    current_time,
    1000,
    status_code,
    error
)

expected_result = "429 (throttled)" if status_code == 429 else f"{status_code} (unexpected)"
print(f"Result: {expected_result}")

Making test request at: 13:18:37
[13:18:37] Test_before_minute_end: 1000 tokens -> 429
  Error: An error occurred (ThrottlingException) when calling the Converse operation (reached max retries: 1): Too many requests, please wait before trying again.
Result: 429 (throttled)
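
`make_bedrock_request` returns a `(status_code, error, response_time)` tuple, but its implementation is not shown. The sketch below illustrates one way such a tuple could be produced from any callable that sends a request. It relies on two botocore conventions: successful responses carry `ResponseMetadata.HTTPStatusCode`, and `ClientError` exposes the parsed error payload as `.response`. The wrapper name `timed_request` and the `send` parameter are assumptions for illustration, not the notebook's actual helper.

```python
import time


def timed_request(send):
    """Call `send()` and return (status_code, error_message, elapsed_seconds).

    `send` is any zero-argument callable that performs the request, for example
    a lambda wrapping a boto3 client call (hypothetical usage).
    """
    started = time.monotonic()
    try:
        response = send()
        status = response["ResponseMetadata"]["HTTPStatusCode"]
        error = None
    except Exception as exc:
        # botocore's ClientError stores the parsed error payload on `.response`.
        payload = getattr(exc, "response", None) or {}
        status = payload.get("ResponseMetadata", {}).get("HTTPStatusCode")
        code = payload.get("Error", {}).get("Code")
        if status is None and code == "ThrottlingException":
            status = 429  # throttling is reported as HTTP 429
        error = str(exc)
    return status, error, time.monotonic() - started
```

Taking the request as a callable keeps the timing and error mapping separate from the client setup, so the wrapper can be exercised with a stub instead of a live endpoint.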


## Step 3: Wait for yy:05 and Test Again

In [4]:
# Wait until second 5 of the next minute
print("Waiting for yy:05 seconds (30 seconds after initial consumption)...")
tester.wait_until_second(5)

current_time = datetime.now()
print(f"Making test request at: {current_time.strftime('%H:%M:%S')}")

status_code, error, response_time = tester.make_bedrock_request(1000)
tester.log_result(
    "Test_at_next_minute_05",
    current_time,
    1000,
    status_code,
    error
)

if status_code == 200:
    print("✓ Request succeeded - suggests ABSOLUTE minute limits")
elif status_code == 429:
    print("✗ Request throttled - suggests RELATIVE minute limits")
else:
    print(f"? Unexpected status code: {status_code}")

Waiting for yy:05 seconds (30 seconds after initial consumption)...
Making test request at: 13:19:05
[13:19:05] Test_at_next_minute_05: 1000 tokens -> 200
✓ Request succeeded - suggests ABSOLUTE minute limits


## Step 4: Test at yy:50 (Should Always Succeed)

In [5]:
# Wait until 50 seconds of the current minute
print("Waiting for yy:50 seconds...")
tester.wait_until_second(50)

current_time = datetime.now()
print(f"Making final test request at: {current_time.strftime('%H:%M:%S')}")

status_code, error, response_time = tester.make_bedrock_request(1000)
tester.log_result(
    "Test_at_minute_50",
    current_time,
    1000,
    status_code,
    error
)

if status_code == 200:
    print("✓ Request succeeded as expected (>60 seconds passed)")
else:
    print(f"✗ Unexpected result: {status_code}")

Waiting for yy:50 seconds...
Making final test request at: 13:19:50
[13:19:50] Test_at_minute_50: 1000 tokens -> 200
✓ Request succeeded as expected (>60 seconds passed)


## Step 5: Results Summary and Analysis

In [6]:
# Print comprehensive summary
tester.print_summary()


TEST SUMMARY
Test: Consume_batch_1
  Time: 13:18:35 (minute 18, second 35)
  Tokens: 1000
  Status: 200

Test: Consume_batch_2
  Time: 13:18:35 (minute 18, second 35)
  Tokens: 1000
  Status: 200

Test: Consume_batch_3
  Time: 13:18:36 (minute 18, second 36)
  Tokens: 1000
  Status: 429
  Error: An error occurred (ThrottlingException) when calling the Converse operation (reached max retries: 1): Too many requests, please wait before trying again.

Test: Test_before_minute_end
  Time: 13:18:37 (minute 18, second 37)
  Tokens: 1000
  Status: 429
  Error: An error occurred (ThrottlingException) when calling the Converse operation (reached max retries: 1): Too many requests, please wait before trying again.

Test: Test_at_next_minute_05
  Time: 13:19:05 (minute 19, second 5)
  Tokens: 1000
  Status: 200

Test: Test_at_minute_50
  Time: 13:19:50 (minute 19, second 50)
  Tokens: 1000
  Status: 200

ANALYSIS:
----------------------------------------
Total tokens consumed in quota exhaustion:
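
To make the timing explicit, the snippet below recomputes each test's offset from the first consumption request, using the timestamps and status codes copied from the summary above. It is a small illustrative calculation, not part of `BedrockTPMTester`, and it assumes all timestamps fall on the same day.

```python
from datetime import datetime

# (test name, wall-clock time, status) copied from the summary output.
logged = [
    ("Consume_batch_1", "13:18:35", 200),
    ("Consume_batch_2", "13:18:35", 200),
    ("Consume_batch_3", "13:18:36", 429),
    ("Test_before_minute_end", "13:18:37", 429),
    ("Test_at_next_minute_05", "13:19:05", 200),
    ("Test_at_minute_50", "13:19:50", 200),
]


def to_time(hms: str) -> datetime:
    """Parse an HH:MM:SS string (the date part is irrelevant for the difference)."""
    return datetime.strptime(hms, "%H:%M:%S")


first = to_time(logged[0][1])
offsets = {name: int((to_time(hms) - first).total_seconds()) for name, hms, _ in logged}

for name, hms, status in logged:
    print(f"{name:<24} {hms}  +{offsets[name]:>2}s  {status}")
```

The yy:05 request lands 30 seconds after the first consumption and the yy:50 request 75 seconds after it.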

## Additional Analysis

Based on the test results above:

### If TPM limits are ABSOLUTE (clock-based):
- Quota consumption at xx:35 exhausts the quota for that minute
- Request before minute end (any time from xx:36 to xx:59) should get 429
- Request at yy:05 should get 200 (new minute, fresh quota)
- Request at yy:50 should get 200

### If TPM limits are RELATIVE (rolling window):
- Quota consumption at xx:35 starts a 60-second window
- Request before minute end should get 429
- Request at yy:05 should get 429 (only 30 seconds passed)
- Request at yy:50 should get 200 (>60 seconds passed)

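The key differentiator is the result at yy:05. In this run the request at 13:19:05, 30 seconds after consumption began at 13:18:35, returned 200, which matches the ABSOLUTE (clock-based) prediction. Two caveats apply: this is a single run, and throttling began after only 2,000 of the planned 8,000 tokens had been accepted, so the limit that was hit may not correspond to the nominal TPM value.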
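
The decision rule described in the two lists can be written as a small function. This is a sketch of the notebook's reasoning only; the name `interpret` and the "inconclusive" outcomes are illustrative additions for runs that match neither list.

```python
def interpret(status_before_end: int, status_at_05: int, status_at_50: int) -> str:
    """Map the three probe results onto the two hypotheses."""
    if status_before_end != 429:
        # Without a 429 in the first minute there is no evidence the quota was exhausted.
        return "inconclusive: quota was not exhausted before the minute ended"
    if status_at_50 != 200:
        # Both hypotheses predict success more than 60 s after consumption.
        return "inconclusive: still failing more than 60 s after consumption"
    if status_at_05 == 200:
        return "absolute"
    if status_at_05 == 429:
        return "relative"
    return "inconclusive: unexpected status at yy:05"


# Status codes observed in this run: 429 at 13:18:37, 200 at 13:19:05, 200 at 13:19:50.
print(interpret(429, 200, 200))
```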