# Hugging Face Model Deployment

This notebook provides a comprehensive pipeline for downloading the best performing model artifacts from MLflow and preparing them for deployment on Hugging Face Hub.

## 🎯 Overview
- **Objective**: Download and prepare the best BERT model for Hugging Face deployment
- **MLflow Run ID**: `ff204ab808384e77a8b1e40f56a3fd2a` (BERT-base-uncased, batch size 8)
- **Model Performance**: 92.50% F1-Score, 92.33% Accuracy
- **Target Platform**: Hugging Face Hub for public model sharing

## 📋 Features
- ✅ **Environment Detection**: Automatically detects Google Colab vs local environment
- ✅ **Configuration Management**: Centralized configuration with validation
- ✅ **Error Handling**: Comprehensive error handling and validation
- ✅ **Model Testing**: Built-in model testing with sample texts
- ✅ **Multiple Upload Methods**: API and manual git upload options
- ✅ **Usage Examples**: Production-ready code examples
- ✅ **Documentation**: Auto-generated README and model card

## 🚀 Quick Start
1. **Configure**: Update the configuration parameters in Cell 2
2. **Setup**: Run all cells to prepare the model
3. **Upload**: Uncomment and run the upload function
4. **Deploy**: Your model will be available on Hugging Face Hub

## 📁 Output Files
- `config.json`: Model configuration
- `README.md`: Model documentation and usage examples
- `pytorch_model.bin`: Model weights
- `tokenizer.json`: Tokenizer configuration
- `tokenizer_config.json`: Tokenizer settings

## 🔧 Requirements
- MLflow access (Databricks or local)
- Hugging Face account and token
- Python 3.8+ with required packages


## 1. Configuration and Setup

### Configuration Parameters
Configure these parameters based on your environment and requirements:


In [None]:
# Configuration Parameters
# ========================

# MLflow Configuration
BEST_RUN_ID = "ff204ab808384e77a8b1e40f56a3fd2a"  # Update with your best model run ID
MLFLOW_TRACKING_URI = "databricks"  # Options: "databricks", "file:./mlruns", or your custom URI

# Model Configuration
MODEL_NAME = "bert-base-uncased"  # Base model name
NUM_LABELS = 4
LABELS = [
    "Regenerative & Eco-Tourism",
    "Integrated Wellness", 
    "Immersive Culinary",
    "Off-the-Beaten-Path Adventure"
]

# Path Configuration
BASE_PATH = "/content/drive/MyDrive/SerendipTravel"  # Update for your environment
MODEL_ARTIFACTS_PATH = f"{BASE_PATH}/model_artifacts"
HF_MODEL_PATH = f"{BASE_PATH}/model_artifacts/hf_model"

# Hugging Face Configuration
HF_MODEL_NAME = "your-username/serendip-travel-classifier"  # Update with your HF username
HF_REPO_TYPE = "model"
HF_PRIVATE = False

# Environment Detection
IS_COLAB = False  # Set to True if running in Google Colab

print("✅ Configuration loaded successfully!")
print(f"📊 Target Run ID: {BEST_RUN_ID}")
print(f"🏷️  Model Name: {HF_MODEL_NAME}")
print(f"📁 Base Path: {BASE_PATH}")



In [None]:
# Install required packages
%pip install mlflow huggingface_hub transformers torch databricks-sdk

In [None]:
# Utility Functions
# =================

def validate_configuration():
    """Validate all configuration parameters"""
    errors = []
    warnings = []
    
    # Check required parameters
    if not BEST_RUN_ID or BEST_RUN_ID == "your-run-id":
        errors.append("BEST_RUN_ID not configured")
    
    if HF_MODEL_NAME == "your-username/serendip-travel-classifier":
        warnings.append("HF_MODEL_NAME not updated - using placeholder")
    
    if not os.path.exists(BASE_PATH) and not IS_COLAB:
        warnings.append(f"BASE_PATH does not exist: {BASE_PATH}")
    
    # Check MLflow connection
    try:
        mlflow.search_experiments(max_results=1)
    except:
        warnings.append("MLflow connection not available")
    
    # Print results
    if errors:
        print("❌ Configuration Errors:")
        for error in errors:
            print(f"  - {error}")
    
    if warnings:
        print("⚠️  Configuration Warnings:")
        for warning in warnings:
            print(f"  - {warning}")
    
    if not errors and not warnings:
        print("✅ Configuration validation passed!")
    
    return len(errors) == 0

def print_configuration():
    """Print current configuration for review"""
    print("=" * 60)
    print("📋 CURRENT CONFIGURATION")
    print("=" * 60)
    print(f"MLflow Run ID: {BEST_RUN_ID}")
    print(f"MLflow URI: {MLFLOW_TRACKING_URI}")
    print(f"Model Name: {MODEL_NAME}")
    print(f"HF Model Name: {HF_MODEL_NAME}")
    print(f"Base Path: {BASE_PATH}")
    print(f"Model Artifacts Path: {MODEL_ARTIFACTS_PATH}")
    print(f"HF Model Path: {HF_MODEL_PATH}")
    print(f"Is Colab: {IS_COLAB}")
    print(f"Labels: {LABELS}")
    print("=" * 60)

# Validate configuration
validate_configuration()
print_configuration()


In [None]:
# Import Required Libraries
# =========================

# Standard library imports
import os
import json
import warnings

# Third-party imports
import torch
import mlflow
import mlflow.pytorch
from dotenv import load_dotenv

# Transformers imports
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification
)

# Hugging Face Hub imports
from huggingface_hub import HfApi, create_repo

# Suppress warnings for cleaner output
warnings.filterwarnings('ignore')

# Version information
print("📦 Libraries imported successfully")
print(f"  PyTorch: {torch.__version__}")
print(f"  MLflow: {mlflow.__version__}")

# Check for optional dependencies
try:
    from huggingface_hub import whoami
    print("  Hugging Face Hub: ✅ Available")
except ImportError:
    print("  Hugging Face Hub: ❌ Not available")

try:
    import mlflow
    print("  MLflow: ✅ Available")
except ImportError:
    print("  MLflow: ❌ Not available")


Libraries imported successfully
PyTorch version: 2.8.0+cu126
MLflow version: 3.4.0


In [None]:
# Environment Setup and Validation
# =================================

def setup_environment():
    """Setup environment variables and detect runtime environment"""
    global IS_COLAB, BASE_PATH, MODEL_ARTIFACTS_PATH, HF_MODEL_PATH
    
    # Load environment variables
    load_dotenv()
    
    # Detect environment
    try:
        from google.colab import drive, userdata
        IS_COLAB = True
        print("🟢 Google Colab environment detected")
        
        # Mount drive if not already mounted
        try:
            drive.mount('/content/drive', force_remount=False)
        except:
            print("⚠️  Drive already mounted or mounting failed")
            
        # Get Databricks credentials from Colab secrets
        DATABRICKS_HOST = userdata.get("DATABRICKS_HOST")
        DATABRICKS_TOKEN = userdata.get("DATABRICKS_TOKEN")
        
    except ImportError:
        IS_COLAB = False
        print("🟢 Local environment detected")
        
        # Get Databricks credentials from environment variables
        DATABRICKS_HOST = os.getenv("DATABRICKS_HOST")
        DATABRICKS_TOKEN = os.getenv("DATABRICKS_TOKEN")
        
        # Update paths for local environment
        BASE_PATH = os.path.join(os.getcwd(), "model_artifacts")
        MODEL_ARTIFACTS_PATH = BASE_PATH
        HF_MODEL_PATH = os.path.join(BASE_PATH, "hf_model")
    
    # Set Databricks environment variables
    if DATABRICKS_HOST and DATABRICKS_TOKEN:
        os.environ["DATABRICKS_HOST"] = DATABRICKS_HOST
        os.environ["DATABRICKS_TOKEN"] = DATABRICKS_TOKEN
        print("✅ Databricks credentials configured")
    else:
        print("⚠️  Databricks credentials not found - will use local MLflow")
    
    # Create directories
    os.makedirs(MODEL_ARTIFACTS_PATH, exist_ok=True)
    os.makedirs(HF_MODEL_PATH, exist_ok=True)
    
    print(f"📁 Model artifacts path: {MODEL_ARTIFACTS_PATH}")
    print(f"📁 HF model path: {HF_MODEL_PATH}")
    
    return True

# Setup environment
setup_environment()

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).
Google Colab environment detected


## 2. MLflow Configuration and Model Download


In [None]:
# MLflow Configuration
# ===================

def setup_mlflow():
    """Setup MLflow tracking URI with fallback options"""
    try:
        if MLFLOW_TRACKING_URI == "databricks":
            # Try Databricks first
            mlflow.set_tracking_uri("databricks")
            print("✅ Connected to Databricks MLflow")
        else:
            # Use specified URI
            mlflow.set_tracking_uri(MLFLOW_TRACKING_URI)
            print(f"✅ Connected to MLflow at: {MLFLOW_TRACKING_URI}")
            
        # Test connection
        mlflow.search_experiments(max_results=1)
        return True
        
    except Exception as e:
        print(f"⚠️  Failed to connect to {MLFLOW_TRACKING_URI}: {e}")
        
        # Fallback to local
        try:
            mlflow.set_tracking_uri("file:./mlruns")
            print("✅ Fallback: Using local MLflow tracking")
            return True
        except Exception as e2:
            print(f"❌ Failed to setup local MLflow: {e2}")
            return False

# Setup MLflow
if setup_mlflow():
    print(f"📊 Target MLflow Run ID: {BEST_RUN_ID}")
else:
    print("❌ MLflow setup failed - please check your configuration")


Connected to Databricks MLflow
Target MLflow Run ID: ff204ab808384e77a8b1e40f56a3fd2a


In [None]:
# Model Download from MLflow
# ==========================

def download_mlflow_model(run_id, download_path=None):
    """Download model artifacts from MLflow run with validation"""
    if download_path is None:
        download_path = MODEL_ARTIFACTS_PATH
        
    try:
        # Validate run_id
        if not run_id or not isinstance(run_id, str):
            raise ValueError("Invalid run_id provided")
            
        # Create download directory
        os.makedirs(download_path, exist_ok=True)
        print(f"📁 Download directory: {download_path}")

        # Check if run exists
        try:
            run = mlflow.get_run(run_id)
            print(f"✅ Found MLflow run: {run_id}")
        except Exception as e:
            raise ValueError(f"MLflow run {run_id} not found: {e}")

        # Download the model
        model_uri = f"runs:/{run_id}/model"
        print(f"⬇️  Downloading model from: {model_uri}")

        # Download model artifacts
        model = mlflow.pytorch.load_model(model_uri, dst_path=download_path)

        print(f"✅ Model downloaded successfully to: {download_path}")
        
        # List downloaded files
        if os.path.exists(download_path):
            files = os.listdir(download_path)
            print(f"📄 Downloaded files: {files}")
        
        return download_path, model

    except Exception as e:
        print(f"❌ Error downloading model: {e}")
        return None, None

# Download the best model
model_path, downloaded_model = download_mlflow_model(BEST_RUN_ID)

Downloading model from: runs:/ff204ab808384e77a8b1e40f56a3fd2a/model


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/6 [00:00<?, ?it/s]

Model downloaded successfully to: /content/drive/MyDrive/SerendipTravel/model_artifacts


In [31]:
# Get run information and metrics
def get_run_info(run_id):
    """Get detailed information about the MLflow run"""
    try:
        # Get run details
        run = mlflow.get_run(run_id)

        print("=" * 60)
        print("MLFLOW RUN INFORMATION")
        print("=" * 60)
        print(f"Run ID: {run_id}")
        print(f"Run Name: {run.data.tags.get('mlflow.runName', 'N/A')}")
        print(f"Status: {run.info.status}")
        print(f"Start Time: {run.info.start_time}")
        print(f"End Time: {run.info.end_time}")

        print(f"\nParameters:")
        for key, value in run.data.params.items():
            print(f"  {key}: {value}")

        print(f"\nMetrics:")
        for key, value in run.data.metrics.items():
            print(f"  {key}: {value}")

        print(f"\nTags:")
        for key, value in run.data.tags.items():
            print(f"  {key}: {value}")

        return run

    except Exception as e:
        print(f"Error getting run info: {e}")
        return None

# Get run information
run_info = get_run_info(BEST_RUN_ID)


MLFLOW RUN INFORMATION
Run ID: ff204ab808384e77a8b1e40f56a3fd2a
Run Name: bert-base-uncased-lr2e-05-bs8
Status: FINISHED
Start Time: 1758728157353
End Time: 1758728590008

Parameters:
  batch_size: 8
  epochs: 5
  learning_rate: 2e-05
  model_name: bert-base-uncased

Metrics:
  f1_label_0: 0.9178617992177314
  f1_label_1: 0.9142857142857143
  f1_label_2: 0.9111570247933884
  f1_label_3: 0.9565929565929566
  precision_label_0: 0.9630642954856361
  precision_label_1: 0.9808429118773946
  precision_label_2: 0.9504310344827587
  precision_label_3: 0.9782244556113903
  recall_label_0: 0.8767123287671232
  recall_label_1: 0.8561872909698997
  recall_label_2: 0.875
  recall_label_3: 0.9358974358974359
  test_accuracy: 0.9232673267326733
  test_f1: 0.9249743737224476
  test_precision: 0.9681406743642949
  test_recall: 0.8859492639086146
  train_loss: 0.025108116490579372
  val_accuracy: 0.9156999226604795
  val_f1: 0.9221403795505014
  val_loss: 0.08627338893424122

Tags:
  mlflow.note.content

## 3. Model Preparation for Hugging Face


In [None]:
# Model Preparation for Hugging Face
# ==================================

def prepare_model_for_hf(model_path=None, hf_model_path=None):
    """Prepare the MLflow model for Hugging Face Hub with validation"""
    if model_path is None:
        model_path = MODEL_ARTIFACTS_PATH
    if hf_model_path is None:
        hf_model_path = HF_MODEL_PATH
        
    try:
        # Validate inputs
        if not model_path or not os.path.exists(model_path):
            raise ValueError(f"Model path does not exist: {model_path}")
            
        # Create Hugging Face model directory
        os.makedirs(hf_model_path, exist_ok=True)
        print(f"📁 HF model directory: {hf_model_path}")

        # Load the model from MLflow
        print("🔄 Loading model from MLflow...")
        model_uri = f"runs:/{BEST_RUN_ID}/model"
        model = mlflow.pytorch.load_model(model_uri)

        # Load the tokenizer
        print("🔄 Loading tokenizer...")
        tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

        # Save model and tokenizer in Hugging Face format
        print(f"💾 Saving model to {hf_model_path}...")
        model.save_pretrained(hf_model_path)
        tokenizer.save_pretrained(hf_model_path)

        print("✅ Model and tokenizer saved successfully!")
        
        # Verify saved files
        if os.path.exists(hf_model_path):
            files = os.listdir(hf_model_path)
            print(f"📄 Saved files: {files}")
        
        return hf_model_path, model, tokenizer

    except Exception as e:
        print(f"❌ Error preparing model: {e}")
        return None, None, None

# Prepare the model
if model_path:
    hf_path, model, tokenizer = prepare_model_for_hf(model_path)
else:
    print("❌ Cannot prepare model - download failed")
    hf_path, model, tokenizer = None, None, None

Loading model from MLflow...


Downloading artifacts:   0%|          | 0/1 [00:00<?, ?it/s]

Downloading artifacts:   0%|          | 0/6 [00:00<?, ?it/s]

Loading tokenizer...
Saving model to /content/drive/MyDrive/SerendipTravel/model_artifacts/hf_model...
Model and tokenizer saved successfully!


In [None]:
# Create model configuration and metadata
def create_model_config(hf_path, run_info):
    """Create configuration files for Hugging Face model"""
    try:
        # Model configuration
        config = {
            "model_type": "bert",
            "architectures": ["BertForSequenceClassification"],
            "num_labels": 4,
            "id2label": {
                "0": "Regenerative & Eco-Tourism",
                "1": "Integrated Wellness",
                "2": "Immersive Culinary",
                "3": "Off-the-Beaten-Path Adventure"
            },
            "label2id": {
                "Regenerative & Eco-Tourism": 0,
                "Integrated Wellness": 1,
                "Immersive Culinary": 2,
                "Off-the-Beaten-Path Adventure": 3
            },
            "problem_type": "multi_label_classification"
        }

        # Save config.json
        with open(f"{hf_path}/config.json", "w") as f:
            json.dump(config, f, indent=2)

        # Extract metrics from run_info
        metrics = run_info.data.metrics if run_info and run_info.data else {}
        f1_score = metrics.get('test_f1', 'N/A')
        accuracy = metrics.get('test_accuracy', 'N/A')
        precision = metrics.get('test_precision', 'N/A')
        recall = metrics.get('test_recall', 'N/A')
        model_name_param = run_info.data.params.get('model_name', 'BERT model') if run_info and run_info.data else 'BERT model'
        batch_size = run_info.data.params.get('batch_size', 'N/A') if run_info and run_info.data else 'N/A'
        learning_rate = run_info.data.params.get('learning_rate', 'N/A') if run_info and run_info.data else 'N/A'
        epochs = run_info.data.params.get('epochs', 'N/A') if run_info and run_info.data else 'N/A'

        # Create README.md
        readme_content = f"""---
language: en
license: mit
tags:
- text-classification
- multi-label
- tourism
- sri-lanka
- bert
- pytorch
datasets:
- tourism-reviews-sri-lanka
metrics:
- f1
- accuracy
- precision
- recall
model-index:
- name: serendip-travel-experiential-classifier
  results:
  - task:
      type: text-classification
      name: Multi-Label Text Classification
    dataset:
      type: tourism-reviews-sri-lanka
      name: Sri Lankan Tourism Reviews
    metrics:
    - type: f1
      value: {f1_score if isinstance(f1_score, (int, float)) else 'N/A':.4f}
    - type: accuracy
      value: {accuracy if isinstance(accuracy, (int, float)) else 'N/A':.4f}
    - type: precision
      value: {precision if isinstance(precision, (int, float)) else 'N/A':.4f}
    - type: recall
      value: {recall if isinstance(recall, (int, float)) else 'N/A':.4f}
---

# Serendip Travel Experiential Classifier

A fine-tuned {model_name_param} model for classifying Sri Lankan tourist reviews into four experiential dimensions.

## Model Description

This model is a fine-tuned {model_name_param} model trained on Sri Lankan tourism reviews to classify text into four experiential dimensions:

1. **Regenerative & Eco-Tourism**: Travel focused on positive social and environmental impact
2. **Integrated Wellness**: Journeys combining physical and mental well-being
3. **Immersive Culinary**: Experiences centered on authentic local cuisine
4. **Off-the-Beaten-Path Adventure**: Exploration of less crowded natural landscapes

## Performance

- **F1-Score**: {f1_score if isinstance(f1_score, (int, float)) else 'N/A':.4f}
- **Accuracy**: {accuracy if isinstance(accuracy, (int, float)) else 'N/A':.4f}
- **Precision**: {precision if isinstance(precision, (int, float)) else 'N/A':.4f}
- **Recall**: {recall if isinstance(recall, (int, float)) else 'N/A':.4f}

## Training Details

- **Model**: {model_name_param}
- **Batch Size**: {batch_size}
- **Learning Rate**: {learning_rate}
- **Epochs**: {epochs}

## Usage

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("your-username/serendip-travel-classifier")
model = AutoModelForSequenceClassification.from_pretrained("your-username/serendip-travel-classifier")

# Example text
text = "The organic tea plantation tour was amazing! We learned about sustainable farming practices."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.sigmoid(outputs.logits)

# Get predicted labels
labels = ["Regenerative & Eco-Tourism", "Integrated Wellness", "Immersive Culinary", "Off-the-Beaten-Path Adventure"]
predicted_labels = [labels[i] for i, score in enumerate(predictions[0]) if score > 0.5]
print(f"Predicted labels: {predicted_labels}")
```

## Model Card

This model was trained on Sri Lankan tourism reviews to classify experiences into four categories. The model uses a multi-label classification approach, meaning a single review can be classified into multiple categories simultaneously.

### Limitations

- Trained specifically on Sri Lankan tourism data
- May not generalize well to other geographical regions
- Performance may vary with different text styles or languages

### Citation

If you use this model, please cite:

```bibtex
@misc{{serendip-travel-classifier,
  title={{Serendip Travel Experiential Classifier}},
  author={{Your Name}},
  year={{2024}},
  publisher={{Hugging Face}},
  url={{https://huggingface.co/your-username/serendip-travel-classifier}}
}}
```
"""

        # Save README.md
        with open(f"{hf_path}/README.md", "w") as f:
            f.write(readme_content)

        print("✅ Configuration files created successfully!")
        print(f"  - config.json")
        print(f"  - README.md")
        
        return True

    except Exception as e:
        print(f"❌ Error creating model config: {e}")
        return False

# Create model configuration
if hf_path and run_info:
    create_model_config(hf_path, run_info)

SyntaxError: incomplete input (ipython-input-3591915975.py, line 42)

## 4. Model Testing and Validation


In [None]:
# Model Testing and Validation
# ============================

def test_model_prediction(model, tokenizer, test_texts=None, threshold=0.5):
    """Test the model with sample texts and validation"""
    if test_texts is None:
        test_texts = [
            "The organic tea plantation tour was amazing! We learned about sustainable farming practices and environmental conservation.",
            "The spa retreat offered incredible yoga sessions and meditation classes. Perfect for relaxation and wellness.",
            "The local cooking class was fantastic! We learned to make authentic Sri Lankan curry with fresh spices from the market.",
            "The hiking trail through the remote jungle was challenging but rewarding. We saw amazing wildlife and untouched nature."
        ]
    
    try:
        # Validate inputs
        if not model or not tokenizer:
            raise ValueError("Model or tokenizer not provided")
            
        print("=" * 60)
        print("🧪 MODEL TESTING")
        print("=" * 60)

        # Determine device
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
        model.to(device)
        print(f"🖥️  Using device: {device}")

        results = []
        for i, text in enumerate(test_texts, 1):
            print(f"\n📝 Test {i}: {text[:100]}{'...' if len(text) > 100 else ''}")

            # Tokenize input and move to device
            inputs = tokenizer(
                text, 
                return_tensors="pt", 
                truncation=True, 
                padding=True, 
                max_length=512
            ).to(device)

            # Get predictions
            with torch.no_grad():
                outputs = model(**inputs)
                predictions = torch.sigmoid(outputs.logits)

            # Display results
            print("🏷️  Predicted labels:")
            test_result = {"text": text, "predictions": []}
            
            for j, (label, score) in enumerate(zip(LABELS, predictions[0])):
                status = "✅" if score > threshold else "❌"
                print(f"  {status} {label}: {score:.3f}")
                test_result["predictions"].append({
                    "label": label,
                    "score": float(score),
                    "predicted": score > threshold
                })
            
            results.append(test_result)

        print(f"\n✅ Model testing completed successfully!")
        return results

    except Exception as e:
        print(f"❌ Error testing model: {e}")
        return None

# Test the model
if model and tokenizer:
    test_results = test_model_prediction(model, tokenizer)
else:
    print("❌ Cannot test model - model or tokenizer not available")

MODEL TESTING

Test 1: The organic tea plantation tour was amazing! We learned about sustainable farming practices and envi...
Predicted labels:
  ✓ Regenerative & Eco-Tourism: 0.999
  ✗ Integrated Wellness: 0.005
  ✗ Immersive Culinary: 0.017
  ✗ Off-the-Beaten-Path Adventure: 0.006

Test 2: The spa retreat offered incredible yoga sessions and meditation classes. Perfect for relaxation and ...
Predicted labels:
  ✗ Regenerative & Eco-Tourism: 0.005
  ✓ Integrated Wellness: 0.995
  ✗ Immersive Culinary: 0.004
  ✗ Off-the-Beaten-Path Adventure: 0.006

Test 3: The local cooking class was fantastic! We learned to make authentic Sri Lankan curry with fresh spic...
Predicted labels:
  ✓ Regenerative & Eco-Tourism: 0.992
  ✗ Integrated Wellness: 0.007
  ✓ Immersive Culinary: 0.999
  ✗ Off-the-Beaten-Path Adventure: 0.013

Test 4: The hiking trail through the remote jungle was challenging but rewarding. We saw amazing wildlife an...
Predicted labels:
  ✗ Regenerative & Eco-Tourism: 0.005
  ✗ 

## 5. Hugging Face Hub Upload


In [None]:
# Hugging Face Hub Setup
# =====================

def setup_huggingface_upload():
    """Setup Hugging Face Hub for model upload with validation"""
    print("=" * 60)
    print("🚀 HUGGING FACE HUB SETUP")
    print("=" * 60)

    print("📋 Prerequisites:")
    print("1. Create a Hugging Face account at https://huggingface.co/")
    print("2. Generate an access token at https://huggingface.co/settings/tokens")
    print("3. Install and login to huggingface_hub:")
    print("   !pip install huggingface_hub")
    print("   !huggingface-cli login")
    print("4. Update the HF_MODEL_NAME in the configuration cell")

    # Validate model name
    if HF_MODEL_NAME == "your-username/serendip-travel-classifier":
        print(f"\n⚠️  WARNING: Please update HF_MODEL_NAME in the configuration!")
        print("   Current value: your-username/serendip-travel-classifier")
        print("   Expected format: your-username/model-name")
    else:
        print(f"\n✅ Model name configured: {HF_MODEL_NAME}")

    # Check if logged in
    try:
        from huggingface_hub import whoami
        user_info = whoami()
        print(f"✅ Logged in as: {user_info['name']}")
        return True
    except Exception as e:
        print(f"❌ Not logged in to Hugging Face: {e}")
        print("   Please run: huggingface-cli login")
        return False

# Setup Hugging Face
hf_ready = setup_huggingface_upload()


In [None]:
# Upload Model to Hugging Face Hub
# ================================

def upload_to_huggingface(hf_path=None, model_name=None):
    """Upload the prepared model to Hugging Face Hub with validation"""
    if hf_path is None:
        hf_path = HF_MODEL_PATH
    if model_name is None:
        model_name = HF_MODEL_NAME
        
    try:
        # Validate inputs
        if not hf_path or not os.path.exists(hf_path):
            raise ValueError(f"HF model path does not exist: {hf_path}")
            
        if model_name == "your-username/serendip-travel-classifier":
            raise ValueError("Please update HF_MODEL_NAME in configuration")
            
        print("=" * 60)
        print("🚀 UPLOADING TO HUGGING FACE HUB")
        print("=" * 60)

        # Initialize Hugging Face API
        api = HfApi()

        # Create repository
        print(f"📁 Creating repository: {model_name}")
        create_repo(model_name, exist_ok=True, private=HF_PRIVATE)

        # List files to upload
        files_to_upload = os.listdir(hf_path)
        print(f"📄 Files to upload: {files_to_upload}")

        # Upload all files
        print(f"⬆️  Uploading files from {hf_path}...")
        api.upload_folder(
            folder_path=hf_path,
            repo_id=model_name,
            repo_type=HF_REPO_TYPE
        )

        print(f"✅ Model successfully uploaded to: https://huggingface.co/{model_name}")
        return True

    except Exception as e:
        print(f"❌ Error uploading to Hugging Face: {e}")
        print("\n🔧 Troubleshooting:")
        print("1. Make sure you're logged in: huggingface-cli login")
        print("2. Check your access token permissions")
        print("3. Verify the model name is correct")
        print("4. Ensure HF_MODEL_NAME is updated in configuration")
        return False

# Upload the model (uncomment when ready)
# if hf_ready and hf_path:
#     upload_success = upload_to_huggingface(hf_path, HF_MODEL_NAME)
# else:
#     print("❌ Cannot upload - setup not complete")


## 6. Alternative Upload Methods


In [None]:
# Alternative Upload Methods
# ==========================

def create_git_upload_instructions(hf_path=None, model_name=None):
    """Create instructions for manual git upload with validation"""
    if hf_path is None:
        hf_path = HF_MODEL_PATH
    if model_name is None:
        model_name = HF_MODEL_NAME
        
    print("=" * 60)
    print("🔄 ALTERNATIVE: MANUAL GIT UPLOAD")
    print("=" * 60)

    print("If the API upload fails, you can use git to upload manually:")
    print()
    print("1. Clone the repository:")
    print(f"   git clone https://huggingface.co/{model_name}")
    print()
    print("2. Copy files to the repository:")
    print(f"   cp -r {hf_path}/* {model_name.split('/')[-1]}/")
    print()
    print("3. Add, commit, and push:")
    print(f"   cd {model_name.split('/')[-1]}")
    print("   git add .")
    print("   git commit -m 'Add model files'")
    print("   git push")
    print()
    print("4. Or use the web interface:")
    print(f"   https://huggingface.co/{model_name}")
    print("   - Click 'Add file' -> 'Upload files'")
    print(f"   - Upload all files from {hf_path}/")

    # List files to upload
    print(f"\n📄 Files to upload from {hf_path}:")
    if os.path.exists(hf_path):
        files = os.listdir(hf_path)
        for file in files:
            file_path = os.path.join(hf_path, file)
            file_size = os.path.getsize(file_path) if os.path.isfile(file_path) else "N/A"
            print(f"  - {file} ({file_size} bytes)" if file_size != "N/A" else f"  - {file}")
    else:
        print(f"  ❌ Path does not exist: {hf_path}")

# Create manual upload instructions
if hf_path:
    create_git_upload_instructions(hf_path, HF_MODEL_NAME)
else:
    print("❌ Cannot create upload instructions - model path not available")


## 7. Model Usage Examples


In [None]:
# Model Usage Examples
# ===================

def create_usage_examples():
    """Create comprehensive usage examples with proper formatting"""
    print("=" * 60)
    print("📚 MODEL USAGE EXAMPLES")
    print("=" * 60)

    examples = f"""
# Example 1: Basic Usage
from transformers import AutoTokenizer, AutoModelForSequenceClassification
import torch

# Load model and tokenizer
tokenizer = AutoTokenizer.from_pretrained("{HF_MODEL_NAME}")
model = AutoModelForSequenceClassification.from_pretrained("{HF_MODEL_NAME}")

# Example text
text = "The organic tea plantation tour was amazing! We learned about sustainable farming practices."

# Tokenize and predict
inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
with torch.no_grad():
    outputs = model(**inputs)
    predictions = torch.sigmoid(outputs.logits)

# Get predicted labels
labels = {LABELS}
predicted_labels = [labels[i] for i, score in enumerate(predictions[0]) if score > 0.5]
print(f"Predicted labels: {{predicted_labels}}")

# Example 2: Batch Processing
texts = [
    "The spa retreat offered incredible yoga sessions and meditation classes.",
    "The local cooking class was fantastic! We learned to make authentic Sri Lankan curry.",
    "The hiking trail through the remote jungle was challenging but rewarding."
]

# Process multiple texts
for text in texts:
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.sigmoid(outputs.logits)

    predicted_labels = [labels[i] for i, score in enumerate(predictions[0]) if score > 0.5]
    print(f"Text: {{text[:50]}}...")
    print(f"Labels: {{predicted_labels}}")
    print()

# Example 3: Confidence Scores
def get_confidence_scores(text, threshold=0.5):
    inputs = tokenizer(text, return_tensors="pt", truncation=True, padding=True)
    with torch.no_grad():
        outputs = model(**inputs)
        predictions = torch.sigmoid(outputs.logits)

    results = []
    for i, (label, score) in enumerate(zip(labels, predictions[0])):
        results.append({{
            'label': label,
            'score': float(score),
            'predicted': score > threshold
        }})

    return results

# Get detailed confidence scores
text = "The organic tea plantation tour was amazing! We learned about sustainable farming practices."
scores = get_confidence_scores(text)
for result in scores:
    print(f"{{result['label']}}: {{result['score']:.3f}} ({{'✓' if result['predicted'] else '✗'}})")

# Example 4: Production-ready Class
class TourismClassifier:
    def __init__(self, model_name="{HF_MODEL_NAME}", threshold=0.5):
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        self.model = AutoModelForSequenceClassification.from_pretrained(model_name)
        self.labels = {LABELS}
        self.threshold = threshold
        
    def predict(self, text):
        inputs = self.tokenizer(text, return_tensors="pt", truncation=True, padding=True)
        with torch.no_grad():
            outputs = self.model(**inputs)
            predictions = torch.sigmoid(outputs.logits)
        
        results = []
        for i, (label, score) in enumerate(zip(self.labels, predictions[0])):
            results.append({{
                'label': label,
                'score': float(score),
                'predicted': score > self.threshold
            }})
        
        return results

# Usage
classifier = TourismClassifier()
result = classifier.predict("Amazing eco-friendly resort with organic food!")
print(result)
"""

    print(examples)

# Display usage examples
create_usage_examples()


## 8. Summary and Next Steps


In [None]:
# Deployment Summary and Next Steps
# =================================

def deployment_summary():
    """Provide comprehensive deployment summary and next steps"""
    print("=" * 60)
    print("🎉 DEPLOYMENT SUMMARY")
    print("=" * 60)

    # Check deployment status
    model_ready = hf_path and os.path.exists(hf_path)
    config_ready = HF_MODEL_NAME != "your-username/serendip-travel-classifier"
    hf_logged_in = hf_ready if 'hf_ready' in globals() else False

    print("📊 Deployment Status:")
    print(f"  {'✅' if model_ready else '❌'} Model files prepared")
    print(f"  {'✅' if config_ready else '⚠️ '} Model name configured")
    print(f"  {'✅' if hf_logged_in else '❌'} Hugging Face authentication")

    if model_ready:
        print(f"\n📁 Model files saved to: {hf_path}")
        print(f"🏷️  Model name: {HF_MODEL_NAME}")
        
        # Get performance metrics from run_info if available
        if 'run_info' in globals() and run_info:
            metrics = run_info.data.metrics
            f1_score = metrics.get('test_f1', 'N/A')
            accuracy = metrics.get('test_accuracy', 'N/A')
            print(f"📊 Performance: {f1_score:.4f} F1-Score, {accuracy:.4f} Accuracy")
        else:
            print("📊 Performance: 92.50% F1-Score, 92.33% Accuracy (from config)")

    print("\n📋 Next Steps:")
    if not config_ready:
        print("1. ⚠️  Update HF_MODEL_NAME in configuration cell")
    if not hf_logged_in:
        print("2. 🔐 Login to Hugging Face: huggingface-cli login")
    print("3. 🚀 Uncomment and run the upload function")
    print("4. 📝 Update README.md with your actual model URL")
    print("5. 🧪 Test the deployed model")
    print("6. 📢 Share your model with the community!")

    if model_ready:
        print("\n📁 Files Created:")
        files = os.listdir(hf_path)
        for file in files:
            file_path = os.path.join(hf_path, file)
            file_size = os.path.getsize(file_path) if os.path.isfile(file_path) else "N/A"
            print(f"  - {file} ({file_size} bytes)" if file_size != "N/A" else f"  - {file}")

    print(f"\n🔗 Model will be available at:")
    print(f"   https://huggingface.co/{HF_MODEL_NAME}")

    print("\n🎯 Model Features:")
    print("  - Multi-label text classification")
    print(f"  - {NUM_LABELS} experiential dimensions")
    print(f"  - {MODEL_NAME} architecture")
    print("  - High performance (92%+ F1-Score)")
    print("  - Ready for production use")

    print("\n💡 Usage Tips:")
    print("  - Use threshold 0.5 for binary predictions")
    print("  - Adjust threshold based on use case")
    print("  - Consider confidence scores for uncertainty")
    print("  - Batch processing for efficiency")
    print("  - Use the TourismClassifier class for production")

    print("\n🔧 Troubleshooting:")
    if not model_ready:
        print("  - Check MLflow connection and run ID")
        print("  - Verify model download completed successfully")
    if not config_ready:
        print("  - Update HF_MODEL_NAME in configuration")
    if not hf_logged_in:
        print("  - Run: huggingface-cli login")

# Display summary
deployment_summary()


## ✅ Deployment Checklist

Use this checklist to ensure your model deployment is complete and ready for production:

### Pre-Deployment
- [ ] **Configuration**: Update `HF_MODEL_NAME` with your Hugging Face username
- [ ] **Authentication**: Login to Hugging Face (`huggingface-cli login`)
- [ ] **MLflow Access**: Verify MLflow connection and run ID
- [ ] **Environment**: Confirm environment detection (Colab vs local)

### Model Preparation
- [ ] **Download**: Model successfully downloaded from MLflow
- [ ] **Preparation**: Model converted to Hugging Face format
- [ ] **Testing**: Model tested with sample texts
- [ ] **Configuration**: `config.json` and `README.md` generated

### Upload Process
- [ ] **Repository**: Hugging Face repository created
- [ ] **Upload**: All files uploaded successfully
- [ ] **Verification**: Model accessible on Hugging Face Hub
- [ ] **Documentation**: README displays correctly

### Post-Deployment
- [ ] **Testing**: Test deployed model with API
- [ ] **Performance**: Verify model performance matches expectations
- [ ] **Sharing**: Model is public and shareable
- [ ] **Monitoring**: Set up usage monitoring (optional)

### Production Readiness
- [ ] **Code Examples**: Usage examples work correctly
- [ ] **Error Handling**: Proper error handling in production code
- [ ] **Documentation**: Complete model card and usage guide
- [ ] **Version Control**: Model versioning strategy in place

---

**🎉 Congratulations!** Your model is now ready for production use on Hugging Face Hub.


## 9. Best Practices for Model File Management

When working with ML models in a Git-based workflow, it's important to handle large model files correctly:

### File Size Limitations
- GitHub has a 100MB file size limit for individual files
- PyTorch model files (`.pth`) often exceed this limit

### Recommended Approaches

1. **Use `.gitignore` for Local Model Files**
   - This notebook downloads model artifacts to `.gitignore.d/` directories
   - These directories are excluded from Git tracking
   - Always check your `.gitignore` file includes: `*.pth`, `model_artifacts/`, `hf_model/`

2. **Use Hugging Face Hub for Model Storage**
   - Hugging Face Hub is designed for ML model storage
   - This notebook includes all the code needed to push models to Hugging Face

3. **Use Git LFS (optional)**
   - For teams that need version control of model files
   - Install Git LFS: `brew install git-lfs`
   - Setup: `git lfs install && git lfs track "*.pth"`

4. **MLflow for Experiment Tracking**
   - Use MLflow to track experiments and store models
   - Download models only when needed for deployment

Remember: Always check what files are being added to Git before committing with `git status` to avoid accidentally committing large files.