# Implementing Retries and Timeouts

- External services can be slow or unreliable, causing scripts to hang or fail unexpectedly.
- Timeouts and retries help ensure your automation scripts remain responsive and resilient.

## Timeouts

- `requests` has no default timeout: unless you pass one, a call can wait indefinitely for a response, which is risky in automation.
- Use the `timeout` parameter with a single value for both connect and read, or a tuple `(connect, read)` for fine-grained control.
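- A `ConnectTimeout` is raised if the connection can’t be established in time; a `ReadTimeout` is raised if the server sends no data for longer than the read timeout. Both inherit from `requests.exceptions.Timeout`, so catching that single class covers either case.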

In [12]:
HTTPBIN_ENDPOINT = "https://httpbin.org"

In [13]:
import requests
import time

delay_url = f"{HTTPBIN_ENDPOINT}/delay/5"  # httpbin waits 5s before responding

start = time.perf_counter()
try:
    res = requests.get(delay_url, timeout=2)  # 2s limit for connecting and for reading
    print(f"Completed in {time.perf_counter() - start:.2f}s, status {res.status_code}")
except (
    requests.exceptions.ConnectTimeout,
    requests.exceptions.ReadTimeout
) as timeout_err:
    print(f"Timeout after {time.perf_counter() - start:.2f}s: {timeout_err}")


Timeout after 2.37s: HTTPSConnectionPool(host='httpbin.org', port=443): Read timed out. (read timeout=2)
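The cell above passes a single number, which applies the same limit to connecting and to reading. The sketch below exercises the `(connect, read)` tuple form instead. To avoid depending on httpbin, it assumes a throwaway local server built on the standard library's `http.server`; the `SlowHandler` class is hypothetical and simply stalls before answering. It also catches the shared base class `requests.exceptions.Timeout` rather than listing both subclasses.

```python
import threading
import time
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

import requests


class SlowHandler(BaseHTTPRequestHandler):
    """Hypothetical local endpoint that stalls before sending any response data."""

    def do_GET(self):
        time.sleep(1.5)  # longer than the 0.5s read timeout used below
        try:
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"done")
        except (BrokenPipeError, ConnectionResetError):
            pass  # the client has already given up and closed the connection

    def log_message(self, format, *args):
        pass  # silence the default per-request logging


# Port 0 lets the OS choose a free port; the server runs in a background thread.
server = ThreadingHTTPServer(("127.0.0.1", 0), SlowHandler)
threading.Thread(target=server.serve_forever, daemon=True).start()
url = f"http://127.0.0.1:{server.server_port}/"

outcome = None
start = time.perf_counter()
try:
    with requests.Session() as session:
        session.trust_env = False  # ignore proxy environment variables for this local call
        # (connect, read): 3s to connect, but only 0.5s to wait for response data
        session.get(url, timeout=(3, 0.5))
    outcome = "completed"
except requests.exceptions.Timeout as err:
    # Timeout is the base class of both ConnectTimeout and ReadTimeout
    outcome = type(err).__name__
finally:
    elapsed = time.perf_counter() - start
    server.shutdown()
    server.server_close()

print(f"{outcome} after {elapsed:.2f}s")
```

Splitting the two limits is useful because establishing a connection is normally quick, while a slow endpoint may legitimately need longer to produce its response.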


## Retries

- Transient issues like network blips or server overloads may cause requests to fail temporarily.
- Implement a simple retry loop that catches errors, retries on server-side (5xx) errors or network exceptions, and breaks on success or client errors.
- Use a fixed delay between retries for simplicity, or an exponential backoff for a more robust approach. 
- Avoid retrying non-idempotent operations. An operation is **idempotent** if performing it several times has the same effect as performing it once: repeating a GET or PUT is harmless, whereas repeating a POST may, for example, create a duplicate record.
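The retry rules in these bullets (retry 5xx responses and network failures, never 4xx, and only for idempotent methods) can be gathered into one predicate. This is a minimal sketch that assumes the idempotent method set defined in RFC 9110; `should_retry` is a hypothetical helper name, not part of `requests`.

```python
from typing import Optional

# Methods that RFC 9110 (HTTP Semantics) defines as idempotent.
IDEMPOTENT_METHODS = frozenset({"GET", "HEAD", "OPTIONS", "TRACE", "PUT", "DELETE"})


def should_retry(method: str, status_code: Optional[int]) -> bool:
    """Hypothetical helper: is a failed request worth another attempt?

    `status_code` is None when no response arrived at all
    (for example, a connection error or a timeout).
    """
    if method.upper() not in IDEMPOTENT_METHODS:
        return False  # repeating e.g. a POST could duplicate its side effect
    if status_code is None:
        return True  # network-level failure: often transient
    return status_code >= 500  # retry server errors only; 4xx means the request is at fault


print(should_retry("GET", 503))   # True
print(should_retry("GET", 404))   # False
print(should_retry("POST", 503))  # False
print(should_retry("put", None))  # True
```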

In [16]:
import requests
import time

# httpbin responds with one of the listed status codes, chosen at random,
# so this URL simulates a flaky endpoint.
flaky_url = f"{HTTPBIN_ENDPOINT}/status/200,500,503"
max_retries = 3
delay = 2  # fixed delay between attempts, in seconds

for attempt in range(1, max_retries + 1):
    print(f"Attempt {attempt}/{max_retries}...")

    try:
        res = requests.get(flaky_url, timeout=10)
        res.raise_for_status()
        print(f"Succeeded with status {res.status_code}")
        break
    except requests.exceptions.HTTPError as err:
        if err.response.status_code < 500:
            # 4xx: the request itself is at fault, so retrying won't help
            print(f"Failed with client error code {err.response.status_code}. Skipping retry.")
            break
        print(f"Failed with server error code {err.response.status_code}.")
    except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as err:
        # Network-level failures are often transient, so retry them as well
        print(f"Failed with network error: {err}")

    if attempt < max_retries:
        print(f"Waiting {delay}s before retry...")
        time.sleep(delay)

# A for-loop's `else` runs only if the loop was not stopped by `break`,
# i.e. every attempt failed with a retryable error.
else:
    print(f"All {max_retries} attempts failed!")

Attempt 1/3...
Failed with server error code 503.
Waiting 2s before retry...
Attempt 2/3...
Succeeded with status 200
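The loop above is tied to one URL. Below is a minimal sketch of the same fixed-delay idea as a reusable helper; `retry_call` is a hypothetical name, and it decides what to retry purely by exception type. Its defaults are Python's built-in `ConnectionError` and `TimeoutError`, chosen here only so the demo runs offline against a fake operation.

```python
import time


def retry_call(func, max_attempts=3, delay=2.0, retryable=(ConnectionError, TimeoutError)):
    """Hypothetical helper: call func() until it succeeds or the attempts run out.

    Only exceptions listed in `retryable` trigger another attempt; any other
    exception propagates immediately. If every attempt fails, the last
    retryable exception is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retryable as err:
            print(f"Attempt {attempt}/{max_attempts} failed: {err!r}")
            if attempt == max_attempts:
                raise
            time.sleep(delay)  # fixed delay between attempts


# Offline demo: a fake operation that fails twice, then succeeds.
calls = {"count": 0}


def flaky_operation():
    calls["count"] += 1
    if calls["count"] < 3:
        raise ConnectionError("simulated network blip")
    return "ok"


result = retry_call(flaky_operation, max_attempts=3, delay=0)
print(result, calls["count"])  # ok 3
```

To use it with `requests`, pass `retryable=(requests.exceptions.ConnectionError, requests.exceptions.Timeout)`, because those classes do not inherit from the built-in `ConnectionError` or `TimeoutError`. HTTP error statuses only become exceptions once `raise_for_status()` is called, and `HTTPError` covers both 4xx and 5xx, so the 4xx-versus-5xx decision would still need separate handling.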


## Exponential Backoff with Jitter

- Fixed delays can overwhelm a recovering server if many clients retry simultaneously.
- Exponential backoff increases the wait time after each failure (e.g., 1s, 2s, 4s...).
- Adding jitter (a small random offset) prevents synchronized retry spikes.
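The schedule these bullets describe can be written as a small pure function. This is a sketch under stated assumptions: `backoff_delay` is a hypothetical name, the first retry waits `base` seconds, the delay doubles per retry up to `cap`, and the jitter is a uniform offset of up to ±10% of the delay (the same proportion as the `random.uniform(-0.1 * delay, 0.1 * delay)` call in the next cell).

```python
import random


def backoff_delay(attempt, base=1.0, cap=30.0, jitter_ratio=0.1, rng=random):
    """Seconds to wait before retry number `attempt` (1-based).

    The nominal delay is base * 2**(attempt - 1), capped at `cap`; a uniform
    random offset of up to ±jitter_ratio of that delay is then added.
    """
    delay = min(base * 2 ** (attempt - 1), cap)
    return delay + rng.uniform(-jitter_ratio * delay, jitter_ratio * delay)


rng = random.Random(42)  # seeded only so that reruns print the same schedule
schedule = [backoff_delay(n, rng=rng) for n in range(1, 7)]
print([round(d, 2) for d in schedule])  # nominal 1, 2, 4, 8, 16, then capped at 30
```

The cap keeps a long outage from producing very long waits, and because each client draws its own offset, clients that failed at the same moment drift apart on later retries.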

In [None]:
import requests
import time
import random

def get_with_backoff(url, max_retries=3):
    delay = 1  # base delay in seconds, doubled (capped at 30s) after each failure

    for attempt in range(1, max_retries + 1):
        print(f"Attempt {attempt}/{max_retries}...")

        try:
            res = requests.get(url, timeout=10)
            res.raise_for_status()
            print(f"Succeeded with status {res.status_code}")
            return res
        except requests.exceptions.HTTPError as err:
            status = err.response.status_code
            if status < 500:
                # 4xx: retrying the same request won't help
                print(f"Failed with client error code {status}. Skipping retry.")
                raise RuntimeError(f"Client error {status}! Please review the request.") from err
            reason = f"server error code {status}"
        except (requests.exceptions.ConnectionError, requests.exceptions.Timeout) as err:
            # Network-level failures are often transient, so retry them as well
            reason = f"network error ({type(err).__name__})"

        # Jitter is up to ±10% of the current base delay, added to the doubled
        # delay (capped at 30s), so the waits come out near 2s, 4s, 8s, 16s...:
        # delay = 1 -> wait between 1.9 and 2.1s
        # delay = 2 -> wait between 3.8 and 4.2s
        # delay = 4 -> wait between 7.6 and 8.4s
        jitter = random.uniform(-0.1 * delay, 0.1 * delay)
        wait = min(delay * 2, 30) + jitter
        print(f"  Failed with {reason}. Retrying in {wait:.2f}s")
        time.sleep(wait)
        delay = min(delay * 2, 30)  # update the base delay for the next attempt

    raise RuntimeError(f"All retries to query {url} failed!")

try:
    res = get_with_backoff(
        f"{HTTPBIN_ENDPOINT}/status/503",
        max_retries=4
    )
except RuntimeError as e:
    print(e)

Attempt 1/4...
  Failed with server error code 503. Retrying in 1.99s
Attempt 2/4...
  Failed with server error code 503. Retrying in 4.14s
Attempt 3/4...
  Failed with server error code 503. Retrying in 7.81s
Attempt 4/4...
  Failed with server error code 503. Retrying in 16.78s
All retries to query https://httpbin.org/status/503 failed!
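In the run above, the function still sleeps 16.78s after the fourth attempt even though no fifth attempt follows. Below is a minimal sketch of a variant that gives up immediately after the last attempt; `retry_with_backoff` and its parameters are hypothetical. It uses the base delay itself as the first wait, so the waits follow the 1s, 2s, 4s pattern from the bullets, and it takes the sleep function as a parameter (defaulting to `time.sleep`) so the demo can record the waits instead of actually waiting.

```python
import random
import time


def retry_with_backoff(func, max_attempts=4, base=1.0, cap=30.0,
                       retryable=(ConnectionError, TimeoutError), sleep=time.sleep):
    """Hypothetical helper: retry func() with exponential backoff and ±10% jitter.

    The wait before retry n is roughly base * 2**(n - 1) seconds (capped at `cap`).
    After the final attempt fails, the error is re-raised immediately rather than
    sleeping before a retry that will never happen.
    """
    delay = base
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retryable:
            if attempt == max_attempts:
                raise  # out of attempts: no point in waiting first
            wait = delay + random.uniform(-0.1 * delay, 0.1 * delay)
            sleep(wait)
            delay = min(delay * 2, cap)


# Offline demo: a fake "sleep" records the requested waits instead of waiting.
waits = []


def always_down():
    raise ConnectionError("simulated outage")


try:
    retry_with_backoff(always_down, max_attempts=4, sleep=waits.append)
except ConnectionError as err:
    print(f"Gave up after {len(waits) + 1} attempts: {err}")  # Gave up after 4 attempts: simulated outage

print([round(w, 2) for w in waits])  # three waits, each within 10% of 1s, 2s and 4s
```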


## Common Pitfalls & How to Avoid Them

- Forgetting to set timeouts can cause scripts to hang indefinitely; always use `timeout`.
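- Retrying client errors (4xx) usually won’t help, since the request itself is at fault; only retry transient server errors (5xx) or network issues. A common exception is 429 Too Many Requests, which signals rate limiting and is often worth retrying after a wait.
- Retrying non-idempotent operations (e.g., POST) can cause duplicate actions; limit automatic retries to idempotent methods such as GET, HEAD, PUT, and DELETE.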
- Fixed retry delays can lead to synchronized retry spikes; use exponential backoff with jitter for production scenarios.
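As an alternative to hand-written loops, `requests` can delegate retries to urllib3's `Retry` class through an `HTTPAdapter` mounted on a `Session`. The sketch below assumes urllib3 1.26 or newer (where the `allowed_methods` parameter exists); the retry count, backoff factor, and status list are illustrative choices, and the exact backoff sleep formula differs between urllib3 versions, so check its documentation before relying on particular wait times.

```python
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

retry = Retry(
    total=3,                                # at most 3 retries per request
    backoff_factor=1,                       # exponentially growing sleeps between retries
    status_forcelist=[500, 502, 503, 504],  # retry these transient server errors
    allowed_methods=frozenset({"GET", "HEAD", "OPTIONS", "PUT", "DELETE"}),  # idempotent only
)
adapter = HTTPAdapter(max_retries=retry)

session = requests.Session()
session.mount("https://", adapter)
session.mount("http://", adapter)

# The adapter sets no timeout, so pass one on every call, for example:
# res = session.get("https://httpbin.org/status/503", timeout=(3, 10))
```

This covers several pitfalls at once: 4xx codes are not in `status_forcelist`, POST is left out of `allowed_methods`, and waits grow between attempts. Once the retries for a listed status are used up, `requests` raises `requests.exceptions.RetryError`.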