Shakespeare Model Deployment - README

Project Overview

This project provides a production-ready FastAPI application that serves a Hugging Face Shakespeare language model with:

✅ Text generation endpoint (/generate)
✅ Inference metrics tracking (/metrics)
✅ Health check endpoint (/health)
✅ Automatic API documentation (/docs)
✅ Docker containerization
✅ Google Cloud deployment configuration

Files Included

main.py - FastAPI application with model serving and metrics
requirements.txt - Python dependencies
Dockerfile - Container configuration
app.yaml - Google Cloud App Engine configuration
deploy.sh - Automated deployment script
DEPLOYMENT_GUIDE.md - Detailed deployment instructions

Quick Start

Local Testing

Install dependencies:
```
pip install -r requirements.txt
```
Run the app:
```
python main.py
```
The API will be available at http://localhost:8080
View documentation: Open your browser to http://localhost:8080/docs

Test endpoints:

```
# Health check
curl http://localhost:8080/health

# Generate text
curl -X POST http://localhost:8080/generate \
  -H "Content-Type: application/json" \
  -d '{"prompt": "To be", "max_length": 50}'

# Get metrics
curl http://localhost:8080/metrics
```
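
The same calls can be made from Python. The following is a minimal sketch using only the standard library; the helper names `build_generate_request` and `call_json` are hypothetical and not part of main.py, and the base URL assumes the local server described above.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8080"  # local address from the Quick Start


def build_generate_request(prompt, max_length=50, base_url=BASE_URL):
    """Build (but do not send) a POST request for the /generate endpoint."""
    payload = json.dumps({"prompt": prompt, "max_length": max_length}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/generate",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def call_json(request, timeout=30):
    """Send a request and decode the JSON body; requires a running server."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Example usage once the app is running locally:
#   call_json(urllib.request.Request(f"{BASE_URL}/health"))
#   call_json(build_generate_request("To be", max_length=50))
#   call_json(urllib.request.Request(f"{BASE_URL}/metrics"))
```

Building the request separately from sending it keeps the payload easy to inspect before anything goes over the network.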

Deploy to Google Cloud

Set your project ID:
```
export PROJECT_ID="your-gcp-project-id"
```
Run deployment script:
```
chmod +x deploy.sh
./deploy.sh
```
Choose deployment method:
- Option 1: App Engine (simple, auto-scaling, no containers needed)
- Option 2: Cloud Run (containerized, fine-grained control)
- Option 3: Both

API Endpoints

POST /generate

Generate text based on a prompt.

Request:

```
{
  "prompt": "To be or not to be",
  "max_length": 100,
  "temperature": 0.7,
  "top_p": 0.9
}
```

Response:

```
{
  "prompt": "To be or not to be",
  "generated_text": "To be or not to be, that is the question...",
  "tokens_generated": 45
}
```
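
A client may want to check a request body before sending it. The sketch below is one way to do that in plain Python. The function name is hypothetical, the fallback values are simply copied from the example request, and the accepted ranges are illustrative assumptions rather than rules confirmed by main.py.

```python
def validate_generate_payload(payload):
    """Return a list of problems found in a /generate request body."""
    errors = []

    prompt = payload.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt must be a non-empty string")

    # Fallbacks below mirror the example request; the real defaults may differ.
    max_length = payload.get("max_length", 100)
    if not isinstance(max_length, int) or max_length <= 0:
        errors.append("max_length must be a positive integer")

    temperature = payload.get("temperature", 0.7)
    if not isinstance(temperature, (int, float)) or temperature <= 0:
        errors.append("temperature must be a number greater than 0")

    top_p = payload.get("top_p", 0.9)
    if not isinstance(top_p, (int, float)) or not 0 < top_p <= 1:
        errors.append("top_p must be a number in (0, 1]")

    return errors
```

An empty list means the payload passed every check. The server remains the authority on what it accepts.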

GET /metrics

Get inference statistics.

Response:

```
{
  "total_inferences": 42,
  "service_status": "running"
}
```
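
For readers curious how a counter like this could be kept, here is a minimal sketch of a thread-safe inference counter that produces a response of the same shape. The class name `InferenceMetrics` is hypothetical, and this is not necessarily how main.py tracks its metrics.

```python
import threading


class InferenceMetrics:
    """In-memory counter guarded by a lock so concurrent requests stay consistent."""

    def __init__(self):
        self._lock = threading.Lock()
        self._total = 0

    def record(self):
        """Call once per completed generation."""
        with self._lock:
            self._total += 1

    def snapshot(self):
        """Return a dict shaped like the /metrics response above."""
        with self._lock:
            return {"total_inferences": self._total, "service_status": "running"}
```

A counter like this lives in process memory, so it resets on restart and is not shared between instances or worker processes.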

GET /health

Health check endpoint.

Response:

```
{
  "status": "healthy",
  "model_loaded": true
}
```
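
A deployment script or test can poll this endpoint until the model is ready. The sketch below assumes a caller-supplied `fetch_health` function (hypothetical) that returns the decoded JSON body. It treats the service as ready only when both documented fields have the values shown above. Whether the service ever reports a healthy status before the model has loaded is an assumption.

```python
import time


def wait_until_healthy(fetch_health, attempts=10, delay_seconds=1.0, sleep=time.sleep):
    """Poll fetch_health() until it reports a healthy, loaded model.

    Returns True on success and False once the attempts run out.
    """
    for attempt in range(attempts):
        try:
            body = fetch_health()
        except OSError:
            # Connection errors (including urllib's URLError) count as "not ready yet".
            body = {}
        if body.get("status") == "healthy" and body.get("model_loaded") is True:
            return True
        if attempt < attempts - 1:
            sleep(delay_seconds)
    return False
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.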

GET /

Service information.

Customization

Using Your Own Model

Edit main.py line 33:

```
model_name = "/path/to/your/model"  # Update this path
```

Or use any Hugging Face model:

```
model_name = "gpt2"  # or any other HF model
```

Adjusting Resources

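In app.yaml:

```
automatic_scaling:
  min_instances: 1
  max_instances: 10

resources:
  memory_gb: 4
  cpu: 2
```

Note: the resources block (cpu, memory_gb) is an App Engine flexible environment setting, and the flexible environment names its scaling keys min_num_instances and max_num_instances. The names min_instances and max_instances belong to the standard environment. Check which environment your app.yaml targets before editing these values.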

Environment Variables

Add environment variables to app.yaml:

```
env_variables:
  MY_VAR: "value"
  PYTHONUNBUFFERED: "true"
```
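
Variables set this way are visible to the application through the process environment. As a sketch of how they might be used, the helper below picks the model from a variable instead of a hard-coded value. The variable name `MODEL_NAME` and the function `resolve_model_name` are hypothetical and do not come from main.py.

```python
import os


def resolve_model_name(default="gpt2", env=os.environ):
    """Return the model name or path, preferring the MODEL_NAME variable if set."""
    value = env.get("MODEL_NAME", "").strip()
    return value or default


# Hypothetical use inside the app:
#   model_name = resolve_model_name()
```

With this pattern, switching models becomes a change to app.yaml rather than to the code.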

Monitoring & Logs

View Logs

```
gcloud app logs read --limit=100
```

Monitor Metrics

Visit Cloud Console > App Engine > Metrics
Check requests, errors, and latency

Stream Logs

```
gcloud app logs tail
```

Troubleshooting

Model Download Issues

The first deployment can be slow because the model has to be downloaded before the service can answer requests. Check the logs for download progress or errors:

```
gcloud app logs read
```

Out of Memory

Increase memory in app.yaml:

```
resources:
  memory_gb: 8  # Increase from 4
```

Slow Performance

- Increase the number of worker processes in the Dockerfile
- Increase min_instances in app.yaml
- Consider using a lighter model

Cost Estimation

App Engine (Standard)

Free tier: 28 instance-hours/day
Pay-as-you-go: ~$0.05/instance-hour
Example: 1 instance running 24/7 ≈ $36/month (24 h × 30 days × $0.05), before any free-tier hours are subtracted
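
The arithmetic behind the example can be reproduced with a few lines of Python. This is a rough sketch using only the approximate figures listed above. The function name is hypothetical, a 30-day month is assumed, and actual billing depends on instance class and region.

```python
HOURS_PER_DAY = 24
DAYS_PER_MONTH = 30  # simplifying assumption


def app_engine_monthly_cost(instances=1, rate_per_hour=0.05, free_hours_per_day=0.0):
    """Estimate monthly cost from instance-hours, after subtracting free hours per day."""
    billable_hours_per_day = max(instances * HOURS_PER_DAY - free_hours_per_day, 0)
    return billable_hours_per_day * DAYS_PER_MONTH * rate_per_hour


# One always-on instance, ignoring the free tier: about 36 dollars per month.
print(round(app_engine_monthly_cost(instances=1), 2))
```

Passing `free_hours_per_day=28` brings a single always-on instance down to zero, so the $36 figure applies only when the free tier does not cover the instance.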

Cloud Run

Free tier: 180,000 vCPU-seconds/month
Pay-as-you-go: ~$0.000024/vCPU-second
Example: 1M requests/month ≈ $20-50/month, depending on how long each request runs and how much CPU and memory the service is allocated
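
To see how request duration drives the Cloud Run estimate, the sketch below computes the CPU portion only, using the approximate rate and free tier listed above. The function name is hypothetical. It assumes each request is billed for its full duration on its own (no concurrency) and ignores memory and per-request charges, so treat the result as a rough lower bound.

```python
def cloud_run_cpu_cost(requests, seconds_per_request, vcpus=1,
                       rate_per_vcpu_second=0.000024, free_vcpu_seconds=180_000):
    """Estimate the monthly CPU charge after the free vCPU-seconds are used up."""
    used_vcpu_seconds = requests * seconds_per_request * vcpus
    billable = max(used_vcpu_seconds - free_vcpu_seconds, 0)
    return billable * rate_per_vcpu_second


# 1M requests at 1 second and at 2 seconds each, on 1 vCPU.
print(round(cloud_run_cpu_cost(1_000_000, 1.0), 2))
print(round(cloud_run_cpu_cost(1_000_000, 2.0), 2))
```

Under these assumptions, one to two seconds of CPU per request lands at roughly $20 to $44 per month, in line with the range quoted above.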

Next Steps

Update the model path to your specific Shakespeare model
Add authentication if needed (API keys, OAuth2)
Set up monitoring and alerts
Configure auto-scaling based on your traffic
Add request logging for analytics
Set up CI/CD pipeline for automated deployments
