```{contents}
```
## Redis Cache 


**Redis** is an in-memory key–value data store used as a **high-performance distributed cache**.

It keeps data in RAM, so reads and writes avoid disk I/O and are much faster than queries against disk-based databases or file systems.

**Why it is used**

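* Sub-millisecond latency for typical operations (server-side work is usually microseconds; the network round trip dominates)
* Shared cache across multiple servers
* Built-in per-key expiration (TTL)
* Rich data types: strings, hashes, lists, sets, sorted sets, plus JSON documents and vector search (via the Redis Stack modules, built into Redis 8)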

---

### Where Redis Fits in an AI System

```
Client Request
      ↓
Redis Cache ── hit → Return Data
      ↓ miss
Database / LLM / Embedding Model → Store in Redis → Return
```

Redis becomes the **first lookup layer** before expensive operations.
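
The flow above is the classic cache-aside pattern: check the cache first, and only on a miss run the expensive step, write its result back with a TTL, and return it. Below is a minimal sketch, assuming only a client with redis-py-style `get(key)` and `set(key, value, ex=...)` methods. `InMemoryClient` and `cache_aside` are hypothetical names introduced here so the example runs without a Redis server.

```python
import time
from typing import Callable, Optional


class InMemoryClient:
    """Hypothetical stand-in for redis.Redis: just get() and set(..., ex=...)."""

    def __init__(self) -> None:
        # key -> (value, absolute expiry on the monotonic clock, or None for no TTL)
        self._data: dict[str, tuple[str, Optional[float]]] = {}

    def get(self, key: str) -> Optional[str]:
        item = self._data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at is not None and time.monotonic() >= expires_at:
            del self._data[key]  # expired: drop it on access and report a miss
            return None
        return value

    def set(self, key: str, value: str, ex: Optional[int] = None) -> bool:
        expires_at = time.monotonic() + ex if ex is not None else None
        self._data[key] = (value, expires_at)
        return True


def cache_aside(client, key: str, compute: Callable[[], str], ttl: int = 3600) -> tuple[str, bool]:
    """Return (value, hit). On a miss, run the expensive `compute` and cache its result."""
    cached = client.get(key)
    if cached is not None:
        return cached, True
    value = compute()               # e.g. database query, LLM call, embedding model
    client.set(key, value, ex=ttl)  # write back with a TTL for later requests
    return value, False


# Usage: the second call is served from the cache, so `expensive` runs only once.
calls = []

def expensive() -> str:
    calls.append(1)
    return "answer"

client = InMemoryClient()
print(cache_aside(client, "resp:demo", expensive))  # ('answer', False)
print(cache_aside(client, "resp:demo", expensive))  # ('answer', True)
print(len(calls))                                   # 1
```

To use a real server, pass `redis.Redis(host="localhost", port=6379, decode_responses=True)` as `client`. Its `get` and `set(..., ex=...)` have the same shape as the stand-in.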

---

### Basic Redis Caching (Response Cache)

#### Demonstration

```python
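import hashlib

import redis

# decode_responses=True makes get() return str instead of bytes
r = redis.Redis(host="localhost", port=6379, db=0, decode_responses=True)

MODEL_VERSION = "v1"  # illustrative: bump when the model or prompt template changes

def resp_key(prompt: str) -> str:
    # Hash the prompt so keys stay short and fixed-length even for very long prompts
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return f"resp:{MODEL_VERSION}:{digest}"

def get_response(prompt: str) -> str:
    key = resp_key(prompt)

    cached = r.get(key)
    if cached is not None:  # an empty string is still a valid cached value
        print("Redis Cache Hit")
        return cached

    print("Redis Cache Miss: calling LLM")
    # `llm` is assumed to be a LangChain chat model defined elsewhere
    response = llm.invoke(prompt).content
    r.set(key, response, ex=3600)   # TTL = 1 hour
    return response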
```

---

### Embedding Cache Using Redis

#### Demonstration

```python
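import hashlib

import numpy as np
import redis

# Separate binary-safe client: embeddings are stored as raw float32 bytes,
# which a decode_responses=True client would try to decode as UTF-8.
r_bin = redis.Redis(host="localhost", port=6379, db=0, decode_responses=False)

def embed_key(text: str) -> str:
    # Content hash: identical text always maps to the same key
    return f"emb:{hashlib.sha256(text.encode('utf-8')).hexdigest()}"

def get_embedding(text: str) -> np.ndarray:
    key = embed_key(text)

    cached = r_bin.get(key)
    if cached is not None:
        print("Embedding Cache Hit")
        # Read-only view over the cached bytes; call .copy() if you need to mutate it
        return np.frombuffer(cached, dtype=np.float32)

    # `embedding_model` (e.g. a sentence-transformers model) is defined elsewhere.
    # Raw float32 bytes avoid pickle, which is unsafe to load from a shared cache.
    vector = np.asarray(embedding_model.encode(text), dtype=np.float32)
    r_bin.set(key, vector.tobytes(), ex=86400)   # TTL = 24 hours
    return vector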
```
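
When embedding many chunks at once (for example while indexing documents), one round trip per text adds up. Below is a sketch of a batched variant, assuming a binary-safe client exposing redis-py's `mget(keys)` and `set(key, value, ex=...)`. It fetches every key with a single `MGET`, encodes only the misses, and writes them back. Vectors are serialized as raw float32 bytes with the standard-library `array` module instead of pickle, since unpickling data from a shared cache can execute arbitrary code. `get_embeddings`, `encode_batch`, and `InMemoryBinaryClient` are hypothetical names used for illustration.

```python
import hashlib
from array import array
from typing import Callable, Optional, Sequence


class InMemoryBinaryClient:
    """Hypothetical stand-in for redis.Redis(decode_responses=False): mget/set only."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}

    def mget(self, keys: Sequence[str]) -> list[Optional[bytes]]:
        return [self._data.get(k) for k in keys]

    def set(self, key: str, value: bytes, ex: Optional[int] = None) -> bool:
        self._data[key] = value  # TTL is ignored by this stand-in
        return True


def embed_key(text: str) -> str:
    return f"emb:{hashlib.sha256(text.encode('utf-8')).hexdigest()}"


def get_embeddings(client, texts: list[str],
                   encode_batch: Callable[[list[str]], list[list[float]]],
                   ttl: int = 86400) -> list[list[float]]:
    """Batched embedding cache: one MGET for all texts, encode only the misses."""
    if not texts:
        return []  # MGET with no keys is an error on a real server
    keys = [embed_key(t) for t in texts]
    cached = client.mget(keys)  # one round trip for the whole batch
    # array("f", raw) reinterprets the bytes as float32 values (native byte order)
    result: list[Optional[list[float]]] = [
        array("f", raw).tolist() if raw is not None else None for raw in cached
    ]

    miss_idx = [i for i, vec in enumerate(result) if vec is None]
    if miss_idx:
        fresh = encode_batch([texts[i] for i in miss_idx])
        for i, vec in zip(miss_idx, fresh):
            packed = array("f", vec)
            client.set(keys[i], packed.tobytes(), ex=ttl)
            result[i] = packed.tolist()  # same float32 precision as a cache hit
    return result  # every slot is filled at this point
```

With a live server, `client` could be `redis.Redis(decode_responses=False)` and `encode_batch` could be `lambda batch: model.encode(batch).tolist()` for a sentence-transformers `model`.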

---

### Semantic Cache with Redis + Similarity

#### Demonstration (Simplified)

```python
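import hashlib
import numpy as np

SIM_THRESHOLD = 0.9  # tune per embedding model

def cosine_similarity(a, b) -> float:
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom else 0.0

def get_semantic_response(prompt: str) -> str:
    query_vec = np.asarray(get_embedding(prompt), dtype=np.float32)

    # Linear scan over stored (vector, response) pairs: O(N) round trips per query.
    # Fine for a demo; at scale, use Redis vector search (KNN) instead.
    for key in r.scan_iter(match="sem:*"):
        entry = r.hgetall(key)                 # {"vec": hex string, "resp": text}
        if not entry:                          # expired between SCAN and HGETALL
            continue
        cached_vec = np.frombuffer(bytes.fromhex(entry["vec"]), dtype=np.float32)
        if cosine_similarity(query_vec, cached_vec) > SIM_THRESHOLD:
            return entry["resp"]

    response = llm.invoke(prompt).content
    # Store vector and response together, so every match has an answer
    key = f"sem:{hashlib.sha256(prompt.encode('utf-8')).hexdigest()}"
    r.hset(key, mapping={"vec": query_vec.tobytes().hex(), "resp": response})
    r.expire(key, 3600)                        # TTL = 1 hour
    return response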
```
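
The similarity test in the loop above is cosine similarity between the query embedding q and a cached embedding c of dimension d. It depends only on the angle between the vectors, not their length, and ranges from -1 to 1. A cached answer is reused when the score exceeds the threshold (0.9 here, a value that usually needs tuning per embedding model).

```{math}
\cos(\mathbf{q}, \mathbf{c})
  = \frac{\mathbf{q} \cdot \mathbf{c}}{\lVert \mathbf{q} \rVert \, \lVert \mathbf{c} \rVert}
  = \frac{\sum_{i=1}^{d} q_i c_i}{\sqrt{\sum_{i=1}^{d} q_i^2}\,\sqrt{\sum_{i=1}^{d} c_i^2}}
```

For example, q = (3, 4) and c = (6, 8) give 50 / (5 × 10) = 1.0, a hit, because the vectors point the same way. q = (1, 1) and c = (1, 0) give 1 / √2 ≈ 0.71, a miss.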

---

### Cache Invalidation

#### Demonstration

```python
def invalidate(key):
    r.delete(key)
```

Invalidate when:

* The prompt template changes
* The LLM (or its version) changes
* The underlying knowledge base is updated
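
Deleting keys one at a time does not scale to the cases listed above, where a whole class of entries goes stale at once. If keys are namespaced by version (for example `resp:v1:<hash>`), a version bump can be handled by removing everything under the old prefix. Below is a sketch, assuming a redis-py-style client with `scan_iter(match=..., count=...)` and `unlink(*keys)`. `SCAN` walks the keyspace incrementally instead of blocking the server the way `KEYS` would, and `UNLINK` frees memory in a background thread. `invalidate_prefix` and `InMemoryKeyspace` are hypothetical names so the example runs without a server.

```python
import fnmatch
from typing import Iterator, Optional


class InMemoryKeyspace:
    """Hypothetical stand-in for redis.Redis: only scan_iter() and unlink()."""

    def __init__(self, data: dict[str, str]) -> None:
        self._data = dict(data)

    def scan_iter(self, match: Optional[str] = None, count: Optional[int] = None) -> Iterator[str]:
        # Iterate over a snapshot so keys can be deleted mid-scan (count is ignored here)
        for key in list(self._data):
            if match is None or fnmatch.fnmatchcase(key, match):
                yield key

    def unlink(self, *keys: str) -> int:
        return sum(1 for k in keys if self._data.pop(k, None) is not None)


def invalidate_prefix(client, prefix: str, batch_size: int = 500) -> int:
    """Remove every key starting with `prefix`; return how many were removed.

    `prefix` must not contain glob characters (*, ?, [) since it becomes a MATCH pattern.
    """
    removed, batch = 0, []
    for key in client.scan_iter(match=f"{prefix}*", count=batch_size):
        batch.append(key)
        if len(batch) >= batch_size:
            removed += client.unlink(*batch)  # UNLINK frees memory in the background
            batch.clear()
    if batch:
        removed += client.unlink(*batch)
    return removed
```

After moving to `v2`, `invalidate_prefix(r, "resp:v1:")` would clear the old entries. Alternatively, bumping the version inside the key already makes old entries unreachable, and their TTL eventually removes them without any scan.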

---

### TTL & Eviction

Redis supports:

* TTL expiration per key
* Memory limits
* LRU / LFU eviction once the memory limit is reached

#### Demonstration

```python
r.set("key", "value", ex=600)  # expires in 10 minutes
```
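
TTL controls freshness, while the memory limit and eviction policy decide what happens when the cache fills up. These are server settings (`maxmemory` and `maxmemory-policy`), normally placed in `redis.conf`. They can also be changed at runtime with `CONFIG SET`, which redis-py exposes as `config_set(name, value)`. Below is a sketch where `configure_eviction` and `RecordingClient` are hypothetical names and the 2 GB limit is only a placeholder. Some managed Redis services may restrict `CONFIG SET` and expect these values in their own configuration instead.

```python
def configure_eviction(client, maxmemory: str = "2gb", policy: str = "allkeys-lru") -> None:
    """Cap the memory Redis may use and choose what it evicts once the cap is hit.

    Common policies:
      allkeys-lru / allkeys-lfu  : evict any key (least recently / least frequently used)
      volatile-lru / volatile-lfu: evict only keys that have a TTL
      noeviction                 : return errors on writes that need memory (the default)
    """
    client.config_set("maxmemory", maxmemory)
    client.config_set("maxmemory-policy", policy)


class RecordingClient:
    """Hypothetical stand-in that records CONFIG SET calls instead of contacting a server."""

    def __init__(self) -> None:
        self.config: dict[str, str] = {}

    def config_set(self, name: str, value: str) -> bool:
        self.config[name] = value
        return True


client = RecordingClient()
configure_eviction(client, maxmemory="2gb", policy="allkeys-lfu")
print(client.config)  # {'maxmemory': '2gb', 'maxmemory-policy': 'allkeys-lfu'}
```

With a real server, `configure_eviction(redis.Redis(host="localhost", port=6379))` applies the same two settings. For a pure cache, an `allkeys-*` policy lets Redis evict any entry, while `volatile-*` policies protect keys that were stored without a TTL.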

---

### Production Best Practices

| Rule                         | Purpose               |
| ---------------------------- | --------------------- |
| Use namespaced keys          | Avoid collisions      |
| Include model version in key | Prevent stale reuse   |
| Apply TTL                    | Prevent outdated data |
| Monitor hit ratio            | Measure performance   |
| Use Redis Cluster            | Horizontal scaling    |

---

### Mental Model

```
Redis Cache = Ultra-fast shared memory for your entire system
```

---

### Key Takeaways

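* Redis is a common backbone for the cache layer of scalable AI systems
* It handles response caching, embedding caching, and semantic caching, and the same pattern extends to retrieval results
* Cache hits skip repeated LLM and embedding calls, cutting both latency and cost
* In production LLM systems running on more than one server, a shared cache like Redis is close to essential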