# Car Recognition Model Deployment

This notebook documents the process of saving and deploying our car recognition model as an API using Docker, Kubernetes, and Google Cloud. We'll walk through all the necessary steps from exporting the trained model to deploying it as a scalable API service.

## Step 1: Export the Trained Model

First, we need to export our trained model to a format suitable for production. We'll use the `export_model.py` script which handles loading the trained weights and saving the model in TensorFlow's SavedModel format.

In [None]:
# Run this to export the model (or use terminal: python export_model.py --model-type transfer)
!python export_model.py --model-type transfer

This script performs several key actions:

1. Rebuilds the model architecture based on the specified type (transfer learning with ResNet50V2)
2. Loads the trained weights from the checkpoint files
3. Saves the model in TensorFlow's SavedModel format
4. Exports class names to a JSON file for prediction mapping

The exported model is saved in the `saved_model/transfer/` directory.
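The class-name export in step 4 can be illustrated with a minimal sketch. It is not the code in `export_model.py`: the file name `class_names.json`, the helper names, and the example labels are all hypothetical, and the only assumption carried over from the text is that the list order matches the model's output indices.

```python
import json
import tempfile
from pathlib import Path


def save_class_names(class_names, path):
    """Write the ordered class list so that index i maps to class_names[i]."""
    Path(path).write_text(json.dumps(class_names, indent=2), encoding="utf-8")


def load_class_names(path):
    """Read the class list back in the same order it was written."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def index_to_label(class_names, index):
    """Map a predicted class index to its human-readable label."""
    if not 0 <= index < len(class_names):
        raise IndexError(f"class index {index} is outside 0..{len(class_names) - 1}")
    return class_names[index]


# Placeholder labels for illustration only (not the real training classes).
example_names = ["Audi A4 Sedan", "BMW X5 SUV", "Toyota Camry Sedan"]

with tempfile.TemporaryDirectory() as tmp:
    names_file = Path(tmp) / "class_names.json"  # hypothetical file name
    save_class_names(example_names, names_file)
    restored = load_class_names(names_file)

print(index_to_label(restored, 1))
```

Keeping the labels in a separate JSON file means the API server can translate the model's output index into a label without importing any training code.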

## Step 2: Build the Flask API Server

Now we need a REST API server to serve predictions from our model. We have created an `app.py` file that uses Flask to set up endpoints:

In [None]:
# Display key parts of the Flask API server
!head -n 30 app.py

The Flask API server provides:

- `/predict` endpoint that accepts image uploads and returns car make/model predictions
- `/health` endpoint for Kubernetes health checks
- Proper error handling and image preprocessing
- Loading and using the saved TensorFlow model
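As a rough illustration of what the `/predict` endpoint returns, the sketch below turns a vector of class probabilities into a response body. The keys `top_prediction`, `predictions`, `class`, and `confidence` are taken from the test client in Step 6. Everything else is an assumption: the helper name, the `top_k` parameter, and the choice to report confidence as a percentage (inferred from the `%` formatting in the client). The actual `app.py` may differ.

```python
def build_prediction_response(probabilities, class_names, top_k=5):
    """Build a /predict-style response from per-class probabilities.

    Hypothetical helper: assumes probabilities[i] belongs to class_names[i]
    and that confidence is reported as a percentage.
    """
    if not probabilities:
        raise ValueError("probabilities must not be empty")
    if len(probabilities) != len(class_names):
        raise ValueError("probabilities and class_names must have equal length")

    # Rank class indices from most to least probable and keep the top_k.
    ranked = sorted(
        range(len(probabilities)),
        key=lambda i: probabilities[i],
        reverse=True,
    )[:top_k]

    predictions = [
        {"class": class_names[i], "confidence": round(probabilities[i] * 100, 2)}
        for i in ranked
    ]
    return {"top_prediction": predictions[0], "predictions": predictions}


# Placeholder values for illustration only.
response = build_prediction_response(
    [0.1, 0.7, 0.2],
    ["Class A", "Class B", "Class C"],
    top_k=2,
)
print(response["top_prediction"]["class"])
```

In a Flask view, a dictionary like this would typically be passed to `jsonify` before being returned to the caller.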

You can test the API locally before containerization:

In [None]:
# This cell would run the Flask app locally (commented out as it blocks notebook execution)
# !python app.py

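# In a separate terminal, you could test with the command below.
# Adjust the port to match the one app.py listens on (Flask defaults to 5000; the Docker example in Step 3 maps 8080).
# curl -X POST -F "file=@cars_test/cars_test/00001.jpg" http://localhost:5000/predict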

## Step 3: Containerize the Application with Docker

Next, we will containerize our application using Docker. The Dockerfile defines the environment and dependencies required to run our application.

In [None]:
# Display the Dockerfile
!cat Dockerfile

The Dockerfile includes:

1. A TensorFlow base image
2. Installation of system and Python dependencies
3. Copying of application code and the saved model
4. Environment variable configuration
5. Command to run the Flask application

To build and test the Docker image locally:

In [None]:
# Build the Docker image
# !docker build -t car-recognition-api:latest .

# Run the container locally
# !docker run -p 8080:8080 car-recognition-api:latest

## Step 4: Kubernetes Configuration

For deploying to Kubernetes, we've prepared configuration files that define our deployment, service, and horizontal pod autoscaler.

In [None]:
# Display the Kubernetes deployment configuration
!cat kubernetes/deployment.yaml

The Kubernetes configuration includes:

1. A deployment with resource limits and requests
2. A service to expose the API to external traffic
3. A horizontal pod autoscaler to scale based on CPU utilization
4. Health check probes to ensure container stability

Note that `[PROJECT_ID]` in the deployment file needs to be replaced with your actual Google Cloud project ID before deployment.
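To make the structure of these three resources concrete, here is a sketch that builds equivalent manifests as Python dictionaries and prints them as JSON, which `kubectl apply -f` accepts alongside YAML. It is not a copy of `kubernetes/deployment.yaml`. The name `car-recognition-api`, the `gcr.io` image path, and container port 8080 come from elsewhere in this notebook; the replica counts, resource values, probe delays, CPU target, and service port 80 are placeholder assumptions.

```python
import json


def build_manifests(project_id, name="car-recognition-api", port=8080):
    """Return Deployment, Service, and HPA manifests as plain dictionaries."""
    image = f"gcr.io/{project_id}/{name}:latest"
    labels = {"app": name}
    probe = {"httpGet": {"path": "/health", "port": port}}

    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": 2,  # placeholder
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [{
                    "name": name,
                    "image": image,
                    "ports": [{"containerPort": port}],
                    # Placeholder requests and limits; tune for the real model.
                    "resources": {
                        "requests": {"cpu": "500m", "memory": "1Gi"},
                        "limits": {"cpu": "1", "memory": "2Gi"},
                    },
                    "livenessProbe": {**probe, "initialDelaySeconds": 30},
                    "readinessProbe": {**probe, "initialDelaySeconds": 10},
                }]},
            },
        },
    }
    service = {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "type": "LoadBalancer",  # exposes an external IP
            "selector": labels,
            "ports": [{"port": 80, "targetPort": port}],
        },
    }
    hpa = {
        "apiVersion": "autoscaling/v2",
        "kind": "HorizontalPodAutoscaler",
        "metadata": {"name": name},
        "spec": {
            "scaleTargetRef": {
                "apiVersion": "apps/v1", "kind": "Deployment", "name": name,
            },
            "minReplicas": 2,   # placeholder
            "maxReplicas": 10,  # placeholder
            "metrics": [{
                "type": "Resource",
                "resource": {
                    "name": "cpu",
                    "target": {"type": "Utilization", "averageUtilization": 70},
                },
            }],
        },
    }
    return [deployment, service, hpa]


manifests = build_manifests("my-project-id")  # hypothetical project ID
print(json.dumps(manifests[0], indent=2))
```

Generating the image path from `project_id` is one way to avoid editing the `[PROJECT_ID]` placeholder by hand.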

## Step 5: Google Cloud Platform Deployment

Finally, we deploy to Google Cloud Platform using GKE (Google Kubernetes Engine). We've created a deployment script to automate the process.

In [None]:
# Display the deployment script
!cat deploy.sh

The deployment script automates the following steps:

1. Exporting the model (if not already done)
2. Building and tagging the Docker image
3. Pushing the image to Google Container Registry
4. Creating a GKE cluster (if it doesn't exist)
5. Deploying the application to Kubernetes
6. Waiting for and displaying the external IP address
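The steps above can be sketched as an ordered list of commands. This is a dry-run illustration, not the contents of `deploy.sh`: the function name and default arguments are hypothetical, the cluster name and region are taken from the clean-up commands in Step 8, and the plan always includes the cluster-creation command, whereas the script only creates the cluster when it is missing.

```python
import shlex


def plan_deployment(project_id, region="us-central1",
                    cluster="car-recognition-cluster",
                    image_name="car-recognition-api", tag="latest"):
    """Return the deployment commands in order, without running them."""
    image = f"gcr.io/{project_id}/{image_name}:{tag}"
    return [
        # 1. Export the model
        ["python", "export_model.py", "--model-type", "transfer"],
        # 2. Build and tag the Docker image
        ["docker", "build", "-t", image, "."],
        # 3. Let Docker authenticate with gcloud credentials, then push
        ["gcloud", "auth", "configure-docker", "--quiet"],
        ["docker", "push", image],
        # 4. Create the cluster and fetch kubectl credentials for it
        ["gcloud", "container", "clusters", "create", cluster,
         "--region", region, "--project", project_id],
        ["gcloud", "container", "clusters", "get-credentials", cluster,
         "--region", region, "--project", project_id],
        # 5. Apply the Kubernetes configuration
        ["kubectl", "apply", "-f", "kubernetes/deployment.yaml"],
        # 6. Watch services until the external IP appears
        ["kubectl", "get", "services", "--watch"],
    ]


plan = plan_deployment("my-project-id")  # hypothetical project ID
for command in plan:
    print(shlex.join(command))
```

Printing the plan before executing anything makes it easy to review the project ID, region, and image tag. Each entry could then be passed to `subprocess.run(command, check=True)`.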

To run the deployment script (after editing it to set your GCP project ID):

In [None]:
# Set execute permission and run the deployment script
# !chmod +x deploy.sh
# !./deploy.sh

# Note: Before running, edit deploy.sh to set your GCP project ID and preferred region

## Step 6: Testing the Deployed API

Once deployed, you can test the API by sending HTTP requests to the provided external IP:

In [None]:
import os
import requests
import json
import matplotlib.pyplot as plt
import numpy as np
from PIL import Image
import io

# Replace with your actual deployed API URL
API_URL = "http://[YOUR_EXTERNAL_IP]/predict"

def test_car_recognition_api(image_path):
    """Test the deployed car recognition API with a local image"""
    # Load and display the image
    img = Image.open(image_path)
    plt.figure(figsize=(10, 6))
    plt.imshow(np.array(img))
    plt.axis('off')
    plt.title('Test Image')
    plt.show()
    
    # Prepare the image file for upload
    with open(image_path, 'rb') as f:
        files = {'file': (os.path.basename(image_path), f, 'image/jpeg')}
        
        try:
            # Send POST request to the API
            response = requests.post(API_URL, files=files, timeout=30)
            
            # Check if the request was successful
            if response.status_code == 200:
                # Parse the JSON response
                result = response.json()
                
                # Display the top prediction
                print(f"Top Prediction: {result['top_prediction']['class']}")
                print(f"Confidence: {result['top_prediction']['confidence']:.2f}%")
                
                # Display all predictions
                print("\nAll Predictions:")
                for i, pred in enumerate(result['predictions']):
                    print(f"{i+1}. {pred['class']} - {pred['confidence']:.2f}%")
                    
                return result
            else:
                print(f"Error: API request failed with status code {response.status_code}")
                print(response.text)
                
        except Exception as e:
            print(f"Error connecting to API: {e}")
            
# Uncomment and run this to test with one of your test images
# test_car_recognition_api('cars_test/cars_test/00001.jpg')

## Step 7: Monitoring and Scaling

After deployment, you can monitor your application and scale it as needed using Google Cloud Console or kubectl commands:

In [None]:
# Commands to monitor your deployment (run these in your terminal)
'''
# Get all pods
kubectl get pods

# Check the horizontal pod autoscaler
kubectl get hpa

# See detailed information about the deployment
kubectl describe deployment car-recognition-api

# View logs from a specific pod (replace pod-name with actual pod name)
kubectl logs pod-name
'''
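When reading the output of `kubectl get hpa`, it helps to know how the autoscaler chooses a replica count. The Kubernetes documentation gives the core rule as `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`, with no change when the ratio is within a tolerance (0.1 by default). The sketch below applies that rule to CPU utilization. It is simplified: the real controller also accounts for pod readiness, missing metrics, and stabilization windows, and the replica bounds and utilization figures here are placeholders.

```python
import math


def desired_replicas(current_replicas, current_cpu_percent, target_cpu_percent,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Simplified form of the Horizontal Pod Autoscaler scaling rule."""
    ratio = current_cpu_percent / target_cpu_percent
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to the target: keep the current replica count.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    # The result is always clamped to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))


# 3 pods averaging 90% CPU against a 60% target: scale up.
scale_up = desired_replicas(3, 90, 60)
# 4 pods averaging 30% CPU against a 60% target: scale down.
scale_down = desired_replicas(4, 30, 60)
# 3 pods at 62% against a 60% target: within tolerance, no change.
steady = desired_replicas(3, 62, 60)
print(scale_up, scale_down, steady)
```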

## Step 8: Clean Up Resources

When you're done with the deployment, clean up the resources to avoid unnecessary charges:

In [None]:
# Commands to clean up resources (run these in your terminal)
'''
# Delete the Kubernetes deployment, service, and HPA
kubectl delete -f kubernetes/deployment.yaml

# Delete the GKE cluster (use the same region you set in deploy.sh)
gcloud container clusters delete car-recognition-cluster --region=us-central1

# Delete the container images
gcloud container images delete gcr.io/[YOUR_PROJECT_ID]/car-recognition-api:latest --force-delete-tags
'''

## Conclusion

We have successfully deployed our car recognition model as a scalable API service using Docker, Kubernetes, and Google Cloud Platform. The deployment architecture provides:

- Scalability through Kubernetes and Horizontal Pod Autoscaler
- High availability with multiple replicas
- Resource efficiency through containerization
- Health monitoring for stability
- Easy updates and rollbacks

With these pieces in place, the service scales automatically with demand and provides a solid starting point for production use.