hamidmatiny/mlops_deployment

Shakespeare Model Deployment - README

Project Overview

This project provides a production-ready FastAPI application that serves a Hugging Face Shakespeare language model with:

  • ✅ Text generation endpoint (/generate)
  • ✅ Inference metrics tracking (/metrics)
  • ✅ Health check endpoint (/health)
  • ✅ Automatic API documentation (/docs)
  • ✅ Docker containerization
  • ✅ Google Cloud deployment configuration

Files Included

  • main.py - FastAPI application with model serving and metrics
  • requirements.txt - Python dependencies
  • Dockerfile - Container configuration
  • app.yaml - Google Cloud App Engine configuration
  • deploy.sh - Automated deployment script
  • DEPLOYMENT_GUIDE.md - Detailed deployment instructions

Quick Start

Local Testing

  1. Install dependencies:

    pip install -r requirements.txt

  2. Run the app:

    python main.py

    The API will be available at http://localhost:8080

  3. View documentation: Open your browser to http://localhost:8080/docs

  4. Test endpoints:

    # Health check
    curl http://localhost:8080/health
    
    # Generate text
    curl -X POST http://localhost:8080/generate \
      -H "Content-Type: application/json" \
      -d '{"prompt": "To be", "max_length": 50}'
    
    # Get metrics
    curl http://localhost:8080/metrics
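The same checks can be scripted from Python. The following is a minimal sketch using only the standard library. The helper names `build_generate_request` and `call_generate` are hypothetical and not part of this repository, and the base URL assumes the local server started in step 2.

```python
import json
import urllib.request


def build_generate_request(base_url: str, prompt: str, max_length: int = 50) -> urllib.request.Request:
    """Build a POST request for /generate (hypothetical helper, not from main.py)."""
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_generate(base_url: str, prompt: str, max_length: int = 50, timeout: float = 60.0) -> dict:
    """Send the request and decode the JSON body. Requires the server to be running."""
    request = build_generate_request(base_url, prompt, max_length)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the app running locally, `call_generate("http://localhost:8080", "To be")` should return the JSON body described under POST /generate.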

Deploy to Google Cloud

  1. Set your project ID:

    export PROJECT_ID="your-gcp-project-id"

  2. Run deployment script:

    chmod +x deploy.sh
    ./deploy.sh

  3. Choose deployment method:

    • Option 1: App Engine (simple, auto-scaling, no containers needed)
    • Option 2: Cloud Run (containerized, fine-grained control)
    • Option 3: Both

API Endpoints

POST /generate

Generate text based on a prompt.

Request:

{
  "prompt": "To be or not to be",
  "max_length": 100,
  "temperature": 0.7,
  "top_p": 0.9
}

Response:

{
  "prompt": "To be or not to be",
  "generated_text": "To be or not to be, that is the question...",
  "tokens_generated": 45
}
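A client may want to check that a response has the documented shape before using it. This is a sketch under the assumption that the three fields shown above are always present. `GenerateResponse` and `parse_generate_response` are illustrative names, not code from main.py.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class GenerateResponse:
    """Client-side view of the documented /generate response body."""
    prompt: str
    generated_text: str
    tokens_generated: int


def parse_generate_response(raw: str) -> GenerateResponse:
    """Decode a JSON response body and fail loudly if a documented field is absent."""
    data = json.loads(raw)
    missing = {"prompt", "generated_text", "tokens_generated"} - data.keys()
    if missing:
        raise ValueError(f"response is missing fields: {sorted(missing)}")
    return GenerateResponse(
        prompt=data["prompt"],
        generated_text=data["generated_text"],
        tokens_generated=int(data["tokens_generated"]),
    )
```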

GET /metrics

Get inference statistics.

Response:

{
  "total_inferences": 42,
  "service_status": "running"
}
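One way such a counter could be kept is an in-process, lock-protected integer. The sketch below is an assumption about the approach, not a copy of main.py, and `InferenceMetrics` is a hypothetical name.

```python
import threading


class InferenceMetrics:
    """In-memory inference counter that is safe to update from several threads."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._total_inferences = 0

    def record_inference(self) -> None:
        """Call once after each successful generation."""
        with self._lock:
            self._total_inferences += 1

    def snapshot(self) -> dict:
        """Return a body in the same shape as the /metrics response above."""
        with self._lock:
            return {"total_inferences": self._total_inferences, "service_status": "running"}
```

A counter like this lives in one process, so it resets on restart, and each worker process or instance would report its own total.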

GET /health

Health check endpoint.

Response:

{
  "status": "healthy",
  "model_loaded": true
}
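Because the model may still be loading right after startup, a script can poll this endpoint before sending traffic. This is a minimal sketch. `fetch_health` and `wait_until_ready` are hypothetical helpers, and the retry count and delay are arbitrary defaults.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_health(base_url: str, timeout: float = 5.0) -> dict:
    """GET /health and decode the JSON body."""
    with urllib.request.urlopen(base_url.rstrip("/") + "/health", timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


def wait_until_ready(fetch: Callable[[], dict], attempts: int = 30, delay_seconds: float = 2.0) -> bool:
    """Poll until the service reports a healthy status with the model loaded."""
    for attempt in range(attempts):
        try:
            body = fetch()
            if body.get("status") == "healthy" and body.get("model_loaded") is True:
                return True
        except OSError:
            pass  # connection refused or timed out: the server is not up yet
        if attempt < attempts - 1:
            time.sleep(delay_seconds)
    return False
```

For a local server this could be called as `wait_until_ready(lambda: fetch_health("http://localhost:8080"))`.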

GET /

Service information.

Customization

Using Your Own Model

Edit main.py line 33:

model_name = "/path/to/your/model"  # Update this path

Or use any Hugging Face model:

model_name = "gpt2"  # or any other HF model

Adjusting Resources

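In app.yaml (the resources block applies to the App Engine flexible environment, where the scaling bounds are named min_num_instances and max_num_instances; min_instances and max_instances are the standard-environment names):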

automatic_scaling:
  min_instances: 1
  max_instances: 10

resources:
  memory_gb: 4
  cpu: 2

Environment Variables

Add environment variables to app.yaml:

env_variables:
  MY_VAR: "value"
  PYTHONUNBUFFERED: "true"
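Values set under env_variables reach the application as ordinary process environment variables. A small sketch of reading one with a fallback follows. `read_setting` is a hypothetical helper, and `MY_VAR` is the placeholder name from the snippet above.

```python
import os


def read_setting(name: str, default: str) -> str:
    """Return the environment variable's value, or the default if unset or blank."""
    value = os.environ.get(name, "").strip()
    return value if value else default


# Example: falls back to "fallback" when MY_VAR is not defined.
my_var = read_setting("MY_VAR", "fallback")
```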

Monitoring & Logs

View Logs

gcloud app logs read --limit=100

Monitor Metrics

  • Visit Cloud Console > App Engine > Metrics
  • Check requests, errors, and latency

Stream Logs

gcloud app logs tail

Troubleshooting

Model Download Issues

The first startup after a deployment may be slow because the model has to be downloaded before the service can respond. Check the logs:

gcloud app logs read

Out of Memory

Increase memory in app.yaml:

resources:
  memory_gb: 8  # Increase from 4

Slow Performance

  • Increase the number of worker processes in the Dockerfile
  • Increase min_instances in app.yaml
  • Consider using a lighter model

Cost Estimation

App Engine (Standard)

  • Free tier: 28 instance-hours/day
  • Pay-as-you-go: ~$0.05/instance-hour
  • Example: 1 instance 24/7 ≈ $36/month (24 h × 30 days × $0.05, before the free tier is applied)

Cloud Run

  • Free tier: 180,000 vCPU-seconds/month
  • Pay-as-you-go: ~$0.00002400/vCPU-second
  • Example: 1M requests/month ≈ $20-50/month, depending on request duration and allocated memory
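The arithmetic behind these estimates can be reproduced with a few lines of Python. This sketch uses only the rates quoted above. The function names are hypothetical, the one-second request duration in the usage note is an assumption, and Cloud Run memory and per-request charges are not modeled.

```python
APP_ENGINE_RATE_PER_INSTANCE_HOUR = 0.05      # approximate rate quoted above
CLOUD_RUN_RATE_PER_VCPU_SECOND = 0.000024     # approximate rate quoted above
CLOUD_RUN_FREE_VCPU_SECONDS = 180_000         # monthly free tier quoted above


def app_engine_monthly_cost(instances: int, hours_per_day: float = 24.0, days: int = 30) -> float:
    """Instance-hour cost for a month, ignoring the free tier."""
    return instances * hours_per_day * days * APP_ENGINE_RATE_PER_INSTANCE_HOUR


def cloud_run_monthly_cpu_cost(requests: int, seconds_per_request: float, vcpus: float = 1.0) -> float:
    """CPU-only cost for a month after subtracting the free vCPU-seconds."""
    vcpu_seconds = requests * seconds_per_request * vcpus
    billable = max(0.0, vcpu_seconds - CLOUD_RUN_FREE_VCPU_SECONDS)
    return billable * CLOUD_RUN_RATE_PER_VCPU_SECOND
```

With these inputs, `app_engine_monthly_cost(1)` comes to about $36, and `cloud_run_monthly_cpu_cost(1_000_000, 1.0)` comes to about $19.68 in CPU charges. Longer generations per request raise the Cloud Run figure proportionally.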

Next Steps

  1. Update the model path to your specific Shakespeare model
  2. Add authentication if needed (API keys, OAuth2)
  3. Set up monitoring and alerts
  4. Configure auto-scaling based on your traffic
  5. Add request logging for analytics
  6. Set up CI/CD pipeline for automated deployments
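For step 2, an API-key check can be as small as a constant-time string comparison. The sketch below is illustrative only. `is_authorized` is a hypothetical helper, and how the key is delivered (for example in a request header) and stored is left as an assumption.

```python
import hmac


def is_authorized(presented_key: str, expected_key: str) -> bool:
    """Compare keys in constant time and reject everything if no key is configured."""
    if not expected_key:
        return False
    return hmac.compare_digest(presented_key.encode("utf-8"), expected_key.encode("utf-8"))
```

Using `hmac.compare_digest` instead of `==` avoids leaking how many leading characters matched through response timing.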

Support & Resources
