```{contents}
```
## Online Learning from Feedback


**Online learning from feedback** is the process by which an AI system **continuously improves** from **real user interactions and evaluations** while it is running in production.

Instead of training only offline:

```
Train → Deploy → Freeze
```

The system becomes:

```
Deploy → Collect Feedback → Learn → Improve → Repeat
```

---

### Why Online Learning Is Important

| Without Online Learning | With Online Learning   |
| ----------------------- | ---------------------- |
| Stagnant behavior       | Continuous improvement |
| User dissatisfaction    | Personalized responses |
| Delayed fixes           | Faster adaptation      |
| Manual tuning           | Automated optimization |

---

### Feedback Types

| Source         | Examples                  |
| -------------- | ------------------------- |
| Explicit       | 👍 / 👎 ratings, comments |
| Implicit       | Clicks, dwell time, edits |
| Human review   | Expert scoring            |
| System signals | Errors, retries           |
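
These sources arrive in different units, so a system usually has to map them onto one scale before scoring. The sketch below shows one way to do that, assuming a 0 to 1 scale. The `FeedbackEvent` type, the `kind` names, and every mapping are illustrative assumptions, not a standard.

```python
from dataclasses import dataclass


@dataclass
class FeedbackEvent:
    source: str   # "explicit", "implicit", "human_review", or "system"
    kind: str     # hypothetical kinds: "thumbs", "expert_score", "edit", "retry"
    value: float  # raw value; its meaning depends on `kind`


def normalize(event: FeedbackEvent) -> float:
    """Map a raw signal to a score in [0, 1]. The mappings are illustrative."""
    if event.kind == "thumbs":        # value: 1 for thumbs up, 0 for thumbs down
        return float(event.value)
    if event.kind == "expert_score":  # value: rubric score from 1 to 5
        return (event.value - 1) / 4
    if event.kind == "edit":          # value: fraction of the answer the user rewrote
        return 1.0 - min(max(event.value, 0.0), 1.0)
    if event.kind == "retry":         # value: number of retries before success
        return 1.0 / (1.0 + event.value)
    raise ValueError(f"unknown feedback kind: {event.kind}")
```

Implicit and system signals are noisier than explicit ratings, so a real system would likely weight them lower or calibrate the mappings against explicit feedback.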

---

### Online Learning Architecture

```
User Interaction
   ↓
Feedback Collector
   ↓
Evaluation Engine
   ↓
Model / Prompt Optimizer
   ↓
Updated System
```
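
The stages in the diagram can be wired together as one cycle. This is a minimal sketch: the class names mirror the diagram, but their methods, the 0.7 threshold, and the version counter that stands in for a real update are all assumptions made for illustration.

```python
from statistics import mean


class FeedbackCollector:
    def __init__(self):
        self.events = []

    def record(self, answer_id, rating):
        self.events.append({"answer_id": answer_id, "rating": rating})


class EvaluationEngine:
    def score(self, events):
        # Mean rating, or None when there is nothing to evaluate yet.
        return mean(e["rating"] for e in events) if events else None


class PromptOptimizer:
    def __init__(self, threshold=0.7):
        self.threshold = threshold
        self.version = 1

    def step(self, quality):
        # Bumping the version is a stand-in for a real prompt or model update.
        if quality is not None and quality < self.threshold:
            self.version += 1
        return self.version


def run_cycle(collector, engine, optimizer):
    """Evaluate the collected feedback, let the optimizer react, start a new batch."""
    quality = engine.score(collector.events)
    version = optimizer.step(quality)
    collector.events.clear()
    return quality, version
```

Keeping the stages as separate objects makes it possible to swap the evaluation or optimization strategy without touching feedback collection.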

---

### Feedback Collection

#### Demonstration

```python
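# In-memory store for the demo; a production system would use a durable log.
feedback_log = []

def collect_feedback(user_id, answer, rating):
    """Record one feedback event. `rating` is assumed to be a score in [0, 1]."""
    feedback_log.append({
        "user": user_id,
        "answer": answer,
        "rating": rating,
    })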
```

---

### Performance Scoring

```python
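def compute_quality():
    """Return the mean rating, or None if no feedback has been collected yet."""
    if not feedback_log:
        return None  # avoids ZeroDivisionError on an empty log
    scores = [f["rating"] for f in feedback_log]
    return sum(scores) / len(scores)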
```
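
A mean over the entire log reacts slowly once the log is large. One common alternative is to score only recent feedback. The sketch below uses a sliding window and is not part of the demonstration above; the `RollingQuality` name and the default `window` and `min_samples` values are assumptions.

```python
from collections import deque


class RollingQuality:
    """Mean rating over the most recent `window` feedback events."""

    def __init__(self, window=100, min_samples=20):
        self.ratings = deque(maxlen=window)  # the oldest rating drops out automatically
        self.min_samples = min_samples

    def add(self, rating):
        self.ratings.append(rating)

    def score(self):
        # Report nothing until there is enough data to trust the average.
        if len(self.ratings) < self.min_samples:
            return None
        return sum(self.ratings) / len(self.ratings)
```

The `min_samples` guard keeps a handful of early ratings from triggering an update.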

---

### Prompt Optimization Loop

```python
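quality = compute_quality()
# update_prompt() is a placeholder hook; 0.7 is an illustrative threshold.
if quality is not None and quality < 0.7:
    update_prompt()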
```
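
The snippet leaves `update_prompt()` undefined. One possible way to realize it is to treat candidate prompts as arms of a bandit and shift traffic toward the better-rated ones. The sketch below uses epsilon-greedy selection, a technique chosen here for illustration and not prescribed by this page; the `PromptBandit` class and its parameters are hypothetical.

```python
import random


class PromptBandit:
    """Epsilon-greedy selection over a fixed set of candidate prompts."""

    def __init__(self, prompts, epsilon=0.1, seed=None):
        self.prompts = list(prompts)
        self.epsilon = epsilon
        self.counts = [0] * len(self.prompts)
        self.means = [0.0] * len(self.prompts)
        self.rng = random.Random(seed)

    def choose(self):
        # Explore with probability epsilon; otherwise exploit the best mean so far.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.prompts))
        return max(range(len(self.prompts)), key=lambda i: self.means[i])

    def update(self, index, rating):
        # Incremental mean: m += (x - m) / n
        self.counts[index] += 1
        self.means[index] += (rating - self.means[index]) / self.counts[index]
```

Each request calls `choose()` to pick a prompt index, and each rating for that response is passed back through `update()`.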

---

### Example: Retrieval Optimization

```python
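def adjust_retriever(feedback, retriever):
    # Low ratings often signal missing context, so widen retrieval.
    # `increase_recall()` is a placeholder method, not a specific library API.
    if feedback["rating"] < 0.5:
        retriever.increase_recall()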
```
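
To make `increase_recall()` concrete, the sketch below shows a toy retriever where increasing recall means returning more documents per query, up to a cap. The `TopKRetriever` class, its word-overlap ranking, and the `step` and `max_k` parameters are assumptions for illustration only.

```python
class TopKRetriever:
    """Toy retriever that ranks documents by word overlap with the query."""

    def __init__(self, documents, top_k=3, max_k=10):
        self.documents = list(documents)
        self.top_k = top_k
        self.max_k = max_k

    def increase_recall(self, step=2):
        # Return more candidates per query, but never more than max_k.
        self.top_k = min(self.top_k + step, self.max_k)

    def retrieve(self, query):
        terms = set(query.lower().split())
        ranked = sorted(
            self.documents,
            key=lambda doc: len(terms & set(doc.lower().split())),
            reverse=True,
        )
        return ranked[: self.top_k]
```

The cap matters because each extra document adds cost and can dilute the context passed to the model.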

---

### Human-in-the-Loop Integration

```python
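def route_to_human(query):
    # risk_detected() and human_review() are placeholders for a risk
    # classifier and a review queue.
    if risk_detected(query):
        return human_review(query)
    return None  # not risky: the caller continues with the automated path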
```
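
The two helpers used above are not defined on this page. A minimal sketch of what they might look like follows, assuming a keyword-based risk check and an in-process review queue. The term list, the reply text, and the `answer` wrapper are all hypothetical; a real system would more likely use a trained classifier and a ticketing backend.

```python
from queue import Queue

RISKY_TERMS = {"diagnosis", "lawsuit", "refund"}  # illustrative only
review_queue = Queue()


def risk_detected(query):
    # Naive whitespace tokenization; punctuation attached to a word will not match.
    words = set(query.lower().split())
    return bool(words & RISKY_TERMS)


def human_review(query):
    # Park the query for a reviewer and return a holding reply to the user.
    review_queue.put(query)
    return "Your request has been sent to a human reviewer."


def answer(query, model):
    """Send risky queries to a human; let the model answer everything else."""
    if risk_detected(query):
        return human_review(query)
    return model(query)
```

Reviewer decisions on queued items can themselves be logged as high-quality feedback for the learning loop.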

---

### Safety & Stability Controls

| Control            | Purpose                                 |
| ------------------ | --------------------------------------- |
| Versioning         | Roll back to a known-good version       |
| Canary rollout     | Limit the blast radius of regressions   |
| Offline evaluation | Gate updates on held-out quality checks |
| Audit logs         | Trace which feedback drove each change  |
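
Three of these controls (versioning, an offline evaluation gate, and audit logging) can be combined in a small registry. This is a sketch under assumptions: the `PromptRegistry` class and its method names are hypothetical. Canary rollout is not covered, since it needs traffic splitting.

```python
class PromptRegistry:
    """Keeps every accepted prompt version so a bad update can be undone."""

    def __init__(self, initial_prompt):
        self.versions = [initial_prompt]
        self.active = 0        # index of the version currently served
        self.audit_log = []    # append-only record of proposals and rollbacks

    def propose(self, prompt, offline_score, baseline_score):
        # Offline evaluation gate: accept only if the candidate does not
        # score below the current baseline.
        accepted = offline_score >= baseline_score
        self.audit_log.append(("propose", prompt, offline_score, accepted))
        if accepted:
            self.versions.append(prompt)
            self.active = len(self.versions) - 1
        return accepted

    def rollback(self):
        # Step back one version; the first version is the floor.
        if self.active > 0:
            self.active -= 1
            self.audit_log.append(("rollback", self.versions[self.active]))
        return self.versions[self.active]
```

Rejected proposals are still written to the audit log, so the history shows what the system tried as well as what it shipped.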

---

### Mental Model

```
Online Learning = AI learning from experience in real time
```

---

### Key Takeaways

* Transforms static AI into adaptive systems
* Requires strong monitoring and rollback
* Can improve personalization and accuracy when the feedback is reliable
* Helps long-running AI systems stay useful as users and data change