
# APIs: Where Production Data Pipelines Begin  
**#25DaysOfDataTech | Python to Production**  
By Prerna Joshi

---

This notebook focuses on **APIs from a production data & ML perspective**.

You will learn:
- What an API is (clearly, without buzzwords)
- Why APIs are critical in data systems
- APIs vs Notebooks vs Scripts
- How FastAPI + Pydantic enforce contracts
- What happens when validation fails (important!)
- How Pandera validates data *inside* pipelines
- Where validation happens in the architecture

---



## 1. What is an API?

An **API (Application Programming Interface)** is a **contract** that defines:
- What data you can send
- In what format
- What response you will receive

APIs provide **controlled, programmatic access** to systems.



## 2. Why APIs Matter in Production Data Systems

Production systems:
- Receive data continuously
- Require automation
- Cannot depend on manual notebooks

APIs enable:
- Real-time ingestion
- Pipeline triggering
- Model serving
- Service-to-service communication

> Most production data pipelines don't start with CSVs; they start with APIs.



## 3. Where Validation Happens in Data Architecture

Client  
→ **API (Pydantic validation)**  
→ **Data pipeline (Pandera validation)**  
→ Storage / Models  
→ API (serve outputs)

Validation at multiple layers prevents silent data corruption.



## 4. APIs vs Notebooks vs Scripts

| Aspect | Notebooks | Scripts | APIs |
|------|----------|--------|-----|
| Interaction | Manual | Semi-automated | Fully automated |
| Validation | Weak | Manual | Strong (schemas) |
| Scalability | Low | Medium | High |
| Production Ready | ❌ | ⚠️ | ✅ |
| Best Use | EDA | Batch jobs | Live systems |

Notebooks explore. Scripts automate. **APIs operationalize.**



## 5. Why FastAPI?

FastAPI:
- Uses **Pydantic** for validation
- Enforces request & response schemas
- Auto-generates **interactive API docs**
- Is async-ready
- Fits data & ML workloads naturally

📌 FastAPI automatically creates interactive docs at:
- `/docs` (Swagger UI)
- `/redoc`


In [1]:

# Install dependencies (run once)
# !pip install fastapi uvicorn pydantic pandera pandas



## 6. Basic FastAPI App

📌 Save the code below as `main.py` and run it from the **terminal**, not inside the notebook:
```
uvicorn main:app --reload
```


In [2]:

from fastapi import FastAPI

app = FastAPI(title="Day 12 - APIs for Data Pipelines")

@app.get("/health")
def health():
    return {"status": "running"}



## 7. Data Ingestion API (Pydantic Validation)


In [3]:

from pydantic import BaseModel, Field
from typing import Literal

class Transaction(BaseModel):
    user_id: int
    amount: float = Field(gt=0)
    currency: Literal["USD", "INR", "EUR"]
    location: str


In [4]:

@app.post("/ingest")
def ingest(txn: Transaction):
    return {"message": "Transaction accepted", "data": txn}



### ❌ What happens when validation fails? (CRITICAL)

Example invalid request:
```json
{
  "user_id": 1,
  "amount": -50,
  "currency": "CRYPTO",
  "location": "NY"
}
```

FastAPI automatically returns:
- An HTTP 422 status
- A `detail` list with one entry per failed check
- The field location (`loc`) and error message (`msg`) for each entry

👉 This happens **before your business logic runs**.



## 8. Async API Example (Why FastAPI Scales)


In [5]:

@app.post("/ingest-async")
async def ingest_async(txn: Transaction):
    # Async handlers pay off once they await IO-heavy calls (DB, Kafka, other APIs)
    return {"message": "Async transaction accepted"}
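
The handler above does not await anything yet, so the benefit is not visible. As a rough illustration of why awaiting I/O helps, the sketch below uses only the standard library: `fake_io_call` is a hypothetical stand-in for a database write or Kafka publish, simulated with `asyncio.sleep`. Run it as a script rather than in a notebook cell, since `asyncio.run` cannot be called from an already running event loop.

```python
import asyncio
import time


async def fake_io_call(delay: float) -> str:
    # Hypothetical stand-in for a DB write or Kafka publish.
    # asyncio.sleep yields control so other requests can make progress.
    await asyncio.sleep(delay)
    return "done"


async def handle_many(n: int, delay: float) -> float:
    # Run n simulated requests concurrently and return the wall-clock time
    start = time.perf_counter()
    await asyncio.gather(*(fake_io_call(delay) for _ in range(n)))
    return time.perf_counter() - start


# Five waits of 0.1s each would take about 0.5s one after another;
# run concurrently they overlap and finish in roughly one wait.
elapsed = asyncio.run(handle_many(n=5, delay=0.1))
print(f"5 simulated requests took {elapsed:.2f}s")
```

This only helps when the work is awaitable I/O. A blocking or CPU-heavy call inside an `async def` handler holds up the event loop for every other request.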



## 9. Pydantic + Pandera Together (Concrete Example)


In [6]:

import pandas as pd
import pandera as pa  # newer pandera releases recommend: import pandera.pandas as pa
from pandera import Column, Check

# Pandera schema used INSIDE the pipeline
transaction_schema = pa.DataFrameSchema({
    "user_id": Column(int, nullable=False),
    "amount": Column(float, Check.gt(0)),
    "currency": Column(str, Check.isin(["USD", "INR", "EUR"])),
    "location": Column(str, nullable=False)
})





In [7]:

@app.post("/ingest-with-pandera")
def ingest_with_validation(txn: Transaction):
    # Pydantic validates at API boundary
    df = pd.DataFrame([txn.model_dump()])
    
    # Pandera validates inside the pipeline; raises SchemaError if a check fails
    transaction_schema.validate(df)
    
    return {"message": "Transaction validated at all layers"}



### Why this matters
- Pydantic stops bad requests at the API boundary
- Pandera stops bad tabular data inside pipelines
- Multiple validation layers = production reliability



## 10. Prediction API Example


In [8]:

class PredictionRequest(BaseModel):
    age: int = Field(gt=17, lt=66)
    salary: int = Field(gt=0)

class PredictionResponse(BaseModel):
    risk_score: float
    approved: bool


In [9]:

@app.post("/predict", response_model=PredictionResponse)
def predict(req: PredictionRequest):
    risk = 0.8 if req.salary < 50000 else 0.2
    return PredictionResponse(risk_score=risk, approved=risk < 0.5)
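
The bounds on `age` use `gt` and `lt`, which are exclusive, so the accepted range is 18 to 65 inclusive. The sketch below checks the edges using Pydantic alone; `is_accepted` is a hypothetical helper and the salary value is arbitrary.

```python
from pydantic import BaseModel, Field, ValidationError


class PredictionRequest(BaseModel):
    age: int = Field(gt=17, lt=66)
    salary: int = Field(gt=0)


def is_accepted(age: int) -> bool:
    # Hypothetical helper: True if the request model accepts this age
    try:
        PredictionRequest(age=age, salary=40000)
        return True
    except ValidationError:
        return False


# Probe the values on either side of each bound
results = {age: is_accepted(age) for age in (17, 18, 65, 66)}
print(results)  # {17: False, 18: True, 65: True, 66: False}
```

Writing the same constraint as `Field(ge=18, le=65)` would behave identically for integers and may read more directly.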



## 11. Summary

- APIs are the **entry point** to production data systems
- FastAPI validates requests and responses against the schemas you declare
- Pydantic validates inputs at system boundaries
- Pandera validates tabular data inside pipelines
- Validation failures should be **expected, explicit, and early**

