# Lab 3: Deploying the Model as a FastAPI Microservice

Welcome to Lab 3! We have trained and registered a model. Now it's time to deploy it. In this lab, we will wrap our model in a **FastAPI** microservice, making it accessible via an API. We'll also build in resilience using a **circuit breaker**.

## Learning Objectives

By the end of this lab, you will be able to:
- Understand the principles of serving models via a REST API.
- Build a high-performance API using FastAPI.
- Load a model directly from the MLflow Model Registry.
- Implement the circuit breaker pattern to make your service more resilient.
- Use automatic API documentation to test your endpoint.

### 1. Setup: Installing Dependencies

For this lab, we need `fastapi` to build the API, `uvicorn` to run it, `pydantic` for data validation, `pybreaker` for the circuit breaker, and `mlflow` and `scikit-learn` to load and use our model.

```python
%pip install fastapi uvicorn pydantic pybreaker mlflow "scikit-learn~=1.0.0"
```

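**Note on Scikit-Learn Version:** The Python environment running the API must use the same major/minor version of `scikit-learn` that was used to train the model. The model is typically stored as a pickled object, and unpickling it under a different version can fail outright or emit version-mismatch warnings. PyCaret often pins a specific `scikit-learn` version, so we pin one here to be safe. If your training environment used a different version, adjust the pin; MLflow records the training environment's dependencies in a `requirements.txt` file logged alongside the model.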
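A quick runtime check can catch a mismatch before the model is loaded. The sketch below uses hypothetical helpers (`major_minor` and `versions_compatible` are not part of `scikit-learn` or MLflow) that compare only the major and minor components of two version strings, ignoring the patch number.

```python
import re
from typing import Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Extract (major, minor) from a version string such as '1.0.2' or '1.3rc1'."""
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return int(match.group(1)), int(match.group(2))


def versions_compatible(installed: str, trained_with: str) -> bool:
    """Hypothetical helper: True when major and minor versions agree (patch may differ)."""
    return major_minor(installed) == major_minor(trained_with)
```

For example, `versions_compatible(sklearn.__version__, "1.0.2")`, where `"1.0.2"` stands in for whatever version your model was actually trained with.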

### 2. Key Concepts

#### What is FastAPI?
FastAPI is a modern, high-performance web framework for building APIs with Python. It's built on standards like OpenAPI and JSON Schema, which means you get automatic, interactive API documentation (like Swagger UI) for free.

#### What is a Circuit Breaker?
A circuit breaker is a design pattern that detects repeated failures and stops calling a failing dependency for a while. Imagine our API can't reach the MLflow server to load the model. Instead of waiting for a timeout on every request, the circuit breaker "trips" (opens) after a set number of consecutive failures. While it is open, calls fail immediately, giving the failing service (MLflow) time to recover. After a timeout, the breaker becomes "half-open" and lets a trial call through: a success closes the circuit again, while a failure re-opens it. This keeps the API responsive even when a dependency is down.
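To make the state machine concrete, here is a minimal, single-threaded sketch written from scratch. It is not the `pybreaker` implementation this lab uses, and `SimpleCircuitBreaker` and `CircuitOpenError` are hypothetical names. The injectable `clock` parameter exists only so the timeout can be tested without sleeping.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class SimpleCircuitBreaker:
    """Closed -> open after fail_max consecutive failures; half-open after reset_timeout.

    Not thread-safe: intended only to illustrate the pattern.
    """

    def __init__(self, fail_max: int = 3, reset_timeout: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.fail_max = fail_max
        self.reset_timeout = reset_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.reset_timeout:
            return "half-open"  # timeout elapsed: allow a trial call
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        state = self.state
        if state == "open":
            # Fail fast without touching the struggling dependency.
            raise CircuitOpenError("circuit is open; failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call re-opens immediately; in the closed state,
            # open once fail_max consecutive failures have accumulated.
            if state == "half-open" or self._failures >= self.fail_max:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

A real library adds thread safety, listeners, and shared state storage on top of this basic closed/open/half-open cycle.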

### 3. IMPORTANT: Prepare the Model in MLflow

Our API is configured to load the model from the **"Production"** stage in the MLflow Model Registry. Before you can run the API, you must promote your model to this stage.

1. **Start the MLflow UI** if it's not already running (`mlflow ui` from the project root).
2. **Go to the "Models" tab** and click on your `churn-classifier` model.
3. **Select the latest version** (e.g., Version 1).
4. In the top right, find the **"Stage"** dropdown.
5. **Select "Transition to Production"** and add an optional comment. 

Once this is done, your model is ready to be served by the API.
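If you prefer to script this step, MLflow's `MlflowClient` provides `transition_model_version_stage`. The sketch below wraps it in a hypothetical helper, `promote_to_production`, that takes the client as an argument so it can be exercised with a stand-in object. Note that recent MLflow releases (2.9 and later) deprecate registry stages in favour of model version aliases, so this call may emit a deprecation warning there.

```python
from typing import Any


def promote_to_production(client: Any, model_name: str, version: str,
                          archive_existing: bool = False) -> None:
    """Hypothetical helper: move one registered model version to the Production stage.

    `client` is expected to be an mlflow.tracking.MlflowClient (or any object
    exposing the same transition_model_version_stage method).
    """
    client.transition_model_version_stage(
        name=model_name,
        version=version,
        stage="Production",
        # When True, versions already in Production are moved to Archived.
        archive_existing_versions=archive_existing,
    )
```

Against a live registry, this would be called as `promote_to_production(MlflowClient(), "churn-classifier", "1")` after `from mlflow.tracking import MlflowClient`, with `"1"` replaced by the version shown in your registry.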

### 4. The FastAPI Application (`app/main.py`)

The complete code for our API is in the `app/main.py` file. We will not run it in this notebook, but we will break down its key components here. Please open the file and follow along.

#### 4.1 Pydantic Input Model
To ensure that the data sent to our API is valid, we define a Pydantic model. It specifies the data types and structure of the input JSON.
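```python
# From app/main.py
from pydantic import BaseModel

class CustomerFeatures(BaseModel):
    """Request body for /predict: one customer's raw features."""
    Age: int
    Tenure: int
    Balance: float
    NumOfProducts: int
    HasCrCard: int          # binary flag encoded as 0/1
    IsActiveMember: int     # binary flag encoded as 0/1
    EstimatedSalary: float
    Gender: str
    Geography: str
```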
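A quick way to see what this schema enforces is to validate payloads directly. The sketch below repeats the class so it runs on its own; the customer values are made-up illustrations, not rows from the lab's dataset. Inside the API, FastAPI runs this validation automatically and answers an invalid request with HTTP 422 instead of calling your endpoint.

```python
from pydantic import BaseModel, ValidationError


class CustomerFeatures(BaseModel):
    Age: int
    Tenure: int
    Balance: float
    NumOfProducts: int
    HasCrCard: int
    IsActiveMember: int
    EstimatedSalary: float
    Gender: str
    Geography: str


# Illustrative values only.
valid_payload = {
    "Age": 42, "Tenure": 5, "Balance": 75000.0, "NumOfProducts": 2,
    "HasCrCard": 1, "IsActiveMember": 0, "EstimatedSalary": 60000.0,
    "Gender": "Female", "Geography": "France",
}
customer = CustomerFeatures(**valid_payload)

# A non-numeric Age cannot be coerced to int, so validation fails.
bad_payload = dict(valid_payload, Age="forty-two")
try:
    CustomerFeatures(**bad_payload)
    rejected = False
except ValidationError as exc:
    rejected = True
    print(exc)
```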

#### 4.2 Loading the Model with a Circuit Breaker
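We define a function that loads the model from MLflow and decorate it with the `model_breaker` circuit breaker. After 3 consecutive failures, the breaker opens: for the next 60 seconds, calls fail immediately with `pybreaker.CircuitBreakerError` without contacting MLflow. After that, the breaker lets a single trial call through; if it succeeds the breaker closes, otherwise it opens again.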
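```python
# From app/main.py
import mlflow.pyfunc
from pybreaker import CircuitBreaker

MODEL_NAME = "churn-classifier"  # registered model name (see Section 3)
MODEL_STAGE = "Production"       # registry stage the API serves

# Open the circuit after 3 consecutive failures; stay open for 60 seconds.
model_breaker = CircuitBreaker(fail_max=3, reset_timeout=60)
model = None  # populated by load_model()

@model_breaker
def load_model():
    """Load the model from the MLflow Model Registry into the module-level `model`."""
    global model
    model_uri = f"models:/{MODEL_NAME}/{MODEL_STAGE}"
    model = mlflow.pyfunc.load_model(model_uri)
```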

#### 4.3 The Prediction Endpoint
This is the core of our API. It's a POST endpoint at `/predict` that takes the `CustomerFeatures` as input, performs the same feature engineering as in training, and returns the model's prediction.
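```python
# From app/main.py
import pandas as pd
from fastapi import FastAPI

app = FastAPI()  # the application object uvicorn serves as main:app

# `CustomerFeatures` and `model` are defined as shown in 4.1 and 4.2.
@app.post("/predict", tags=["Prediction"])
def predict_churn(features: CustomerFeatures):
    # ... (error handling for model loading) ...
    # Pydantic v1 API; on Pydantic v2, features.model_dump() is the non-deprecated equivalent.
    input_dict = features.dict()
    # Feature engineering must match training!
    # The + 0.01 term avoids division by zero when EstimatedSalary is 0.
    input_dict['BalanceSalaryRatio'] = input_dict['Balance'] / (input_dict['EstimatedSalary'] + 0.01)
    # The pyfunc model expects a DataFrame: one row per customer.
    input_df = pd.DataFrame([input_dict])
    prediction = model.predict(input_df)
    # Map the numeric class (1 = churn) to a human-readable label.
    churn_status = "Churn" if int(prediction[0]) == 1 else "Stay"
    return {
        "prediction": churn_status,
        "prediction_label": int(prediction[0])
    }
```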
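Because the `BalanceSalaryRatio` computation has to match training exactly, one option is to move it into a small pure function that both the training code and the API import, and to unit-test it. The sketch below shows that refactor; `engineer_features`, `to_label`, and `EPSILON` are hypothetical names, not definitions that exist in `app/main.py`.

```python
from typing import Any, Dict

# Same constant as in the endpoint: guards against division by zero.
EPSILON = 0.01


def engineer_features(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of `raw` with the derived BalanceSalaryRatio feature added."""
    features = dict(raw)  # copy so the caller's dict is not mutated
    features["BalanceSalaryRatio"] = features["Balance"] / (features["EstimatedSalary"] + EPSILON)
    return features


def to_label(prediction: Any) -> str:
    """Map the model's numeric class (1 = churn) to the API's label."""
    return "Churn" if int(prediction) == 1 else "Stay"
```

With the derivation in one place, a single test covers both training and serving, which is the point of the "must match training" comment in the endpoint.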

### 5. Running and Interacting with the API

Now, let's run our API and test it.

#### 5.1 Run the API Server
1. **Open a new terminal or command prompt.**
2. **Navigate to the `app` directory of this lab:** `cd advanced-mlops-tutorial/labs/lab3-api-fastapi/app`
3. **Run the command:** `uvicorn main:app --reload`

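Uvicorn should log a line such as `Uvicorn running on http://127.0.0.1:8000 (Press CTRL+C to quit)`. The `--reload` flag restarts the server whenever you edit the code, which is convenient during development but is not meant for production.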

#### 5.2 Use the Interactive Docs (Swagger UI)
1. **Open your browser** and go to `http://localhost:8000/docs`.
2. You will see the interactive Swagger UI for your API.
3. **Expand the `/predict` endpoint** and click **"Try it out"**.
4. **Fill in the example JSON** with some customer data.
5. Click **"Execute"**. You should see the prediction response from the API!

This is one of FastAPI's most powerful features for development and testing.
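The same request can also be sent from a script instead of the browser. This sketch uses only the standard library; `build_predict_request` and `call_predict` are hypothetical helpers, and the default base URL assumes the server from 5.1 is listening on `http://localhost:8000`.

```python
import json
import urllib.request
from typing import Any, Dict


def build_predict_request(payload: Dict[str, Any],
                          base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build a POST request that sends `payload` as JSON to the /predict endpoint."""
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/predict",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_predict(payload: Dict[str, Any],
                 base_url: str = "http://localhost:8000") -> Dict[str, Any]:
    """Send the request and decode the JSON response (requires the server to be running)."""
    request = build_predict_request(payload, base_url)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, calling `call_predict` with the same fields as `CustomerFeatures` should return a dictionary with `prediction` and `prediction_label` keys. An invalid payload makes `urlopen` raise `urllib.error.HTTPError` with status 422.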

### 6. Conclusion

Fantastic! You have deployed your machine learning model as a microservice with input validation and a circuit breaker around model loading. You've learned how to serve models via an API, validate incoming data with Pydantic, and build in resilience against an unavailable model registry.

In the final lab, we will address a crucial question: How do we know if our deployed model is still performing well over time? We will explore model monitoring and drift detection.