# Large Language Model Serving Tutorial with DigitalHub

This notebook demonstrates how to deploy and serve a pre-trained language model from the HuggingFace Hub with the DigitalHub SDK. We'll use a DistilBERT model fine-tuned for sentiment classification and expose it as a REST API service.

## Overview
- **Model Selection**: Use a pre-trained DistilBERT model from HuggingFace Hub
- **Model Serving**: Deploy the model as a REST API endpoint with GPU acceleration
- **Inference**: Test the deployed model with text classification tasks
- **Integration**: Seamless integration with DigitalHub's serving infrastructure

## Setup and Model Configuration

We'll set up our DigitalHub project and configure the HuggingFace model for serving. The model we'll use is DistilBERT fine-tuned for sentiment classification (SST-2 dataset).

## Project Initialization

Now we'll initialize our DigitalHub project, using the same project name as the other tutorials.

In [None]:
import digitalhub as dh

p_name = "tutorial-project"
project = dh.get_or_create_project(p_name)

## Step 1: Model Configuration

We'll create a function to serve the DistilBERT model directly from HuggingFace Hub. This model is fine-tuned for sentiment classification and can classify text as positive or negative.

The model path uses the `huggingface://` protocol to directly reference models from the HuggingFace Hub without manual downloading.

In [None]:
llm_function = project.new_function(
    name="sentiment-classifier",
    kind="huggingfaceserve",
    model_name="sentiment-model",
    path="huggingface://distilbert/distilbert-base-uncased-finetuned-sst-2-english",
)
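To make the `huggingface://` path format concrete, the sketch below strips the scheme and recovers the Hub repository ID. The helper `hf_repo_id` is hypothetical and not part of the DigitalHub SDK; it only illustrates how the path is assumed to map to a repository on the HuggingFace Hub.

```python
HF_SCHEME = "huggingface://"


def hf_repo_id(uri: str) -> str:
    """Return the Hub repository ID encoded in a huggingface:// URI.

    Hypothetical helper for illustration only; DigitalHub resolves
    the path internally.
    """
    if not uri.startswith(HF_SCHEME):
        raise ValueError(f"expected a {HF_SCHEME} URI, got {uri!r}")
    repo_id = uri[len(HF_SCHEME):].strip("/")
    if not repo_id:
        raise ValueError("the URI does not name a repository")
    return repo_id


path = "huggingface://distilbert/distilbert-base-uncased-finetuned-sst-2-english"
repo_id = hf_repo_id(path)
# The repository page lives under https://huggingface.co/<repo_id>
print(f"https://huggingface.co/{repo_id}")
```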

## Step 2: Model Serving

Now we'll deploy the model as a REST API service. We use a GPU profile (`1xa100`) to speed up inference with the transformer model.

In [None]:
llm_run = llm_function.run("serve", profile="1xa100", wait=True)

Let's check that our service is running and ready to accept requests:

In [None]:
service = llm_run.refresh().status.service
print("Service status:", service)

### Test the LLM API

Now let's test our deployed sentiment classification model with some sample text. We'll send both positive and negative sentiment examples to see how the model performs.

In [None]:
# Prepare test data for sentiment classification
model_name = "sentiment-model"
json_payload = {
    "inputs": [
        {
            "name": "input-0",
            "shape": [2],
            "datatype": "BYTES",
            "data": ["Hello, my dog is cute", "I am feeling sad"],
        },
    ]
}
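The payload above has to keep `shape` in sync with the number of texts in `data`. As a minimal sketch, a small helper can build the same structure from any list of strings. The function name `build_payload` is an assumption made for this example, not a DigitalHub API.

```python
def build_payload(texts: list[str], input_name: str = "input-0") -> dict:
    """Build a request body with one BYTES input tensor holding `texts`.

    Hypothetical helper: it mirrors the payload structure used in this
    notebook and derives `shape` from the number of texts.
    """
    if not texts:
        raise ValueError("at least one text is required")
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(texts)],
                "datatype": "BYTES",
                "data": list(texts),
            },
        ]
    }


payload = build_payload(["Hello, my dog is cute", "I am feeling sad"])
print(payload["inputs"][0]["shape"])
```

Deriving `shape` from `len(texts)` avoids a mismatch between the declared shape and the actual data when the list of inputs changes.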

In [None]:
# Make prediction request to the deployed LLM
result = llm_run.invoke(model_name=model_name, json=json_payload).json()
print("Sentiment classification results:")
print(result)
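For callers outside the SDK, the request can also be prepared by hand. The sketch below assumes the service exposes the Open Inference Protocol (V2) REST API, which the payload format suggests, so that inference requests go to `/v2/models/<model_name>/infer`. The host name is a placeholder and `build_infer_request` is a hypothetical helper; the request is only constructed here, not sent.

```python
import json
import urllib.request


def build_infer_request(host: str, model_name: str, payload: dict) -> urllib.request.Request:
    """Prepare (but do not send) a V2-style inference request.

    Assumes the service follows the Open Inference Protocol path layout.
    """
    url = f"http://{host}/v2/models/{model_name}/infer"
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


example_payload = {
    "inputs": [
        {"name": "input-0", "shape": [1], "datatype": "BYTES", "data": ["I am feeling sad"]},
    ]
}
# "my-service-host:8080" is a placeholder, not a real endpoint
request = build_infer_request("my-service-host:8080", "sentiment-model", example_payload)
print(request.get_method(), request.full_url)
```

Passing the prepared request to `urllib.request.urlopen` would send it; replace the placeholder host with the address reported for the running service.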

### Understanding the Results

The model returns one sentiment prediction per input text, choosing between two classes:
- **Positive sentiment**: the `POSITIVE` class receives the higher score
- **Negative sentiment**: the `NEGATIVE` class receives the higher score

The DistilBERT model has been fine-tuned on the Stanford Sentiment Treebank (SST-2) dataset, making it effective for binary sentiment classification tasks.
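To turn a raw response into readable labels, a sketch like the following can help. It assumes the response carries an `outputs` list whose first entry holds one class index per input text; the exact response layout depends on the serving runtime, so inspect `result` before relying on it. The label mapping (0 is `NEGATIVE`, 1 is `POSITIVE`) matches this model's published configuration. The `sample_response` below is illustrative only, not actual output from the service.

```python
ID2LABEL = {0: "NEGATIVE", 1: "POSITIVE"}


def decode_predictions(response: dict, texts: list[str]) -> list[tuple[str, str]]:
    """Pair each input text with its predicted label.

    Hypothetical helper: assumes response["outputs"][0]["data"] holds
    one class index per input, in input order.
    """
    class_ids = response["outputs"][0]["data"]
    if len(class_ids) != len(texts):
        raise ValueError("number of predictions does not match number of texts")
    return [(text, ID2LABEL[int(cid)]) for text, cid in zip(texts, class_ids)]


# Illustrative response shape only (assumed, not captured from a real run)
sample_response = {
    "model_name": "sentiment-model",
    "outputs": [
        {"name": "output-0", "shape": [2], "datatype": "INT64", "data": [1, 0]},
    ],
}
texts = ["Hello, my dog is cute", "I am feeling sad"]
for text, label in decode_predictions(sample_response, texts):
    print(f"{label:8s} {text}")
```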

## Summary

We have successfully:

1. **Deployed a Pre-trained LLM**: Used DistilBERT from HuggingFace Hub for sentiment classification
2. **GPU-Accelerated Serving**: Deployed the model on a GPU profile for faster inference
3. **REST API Integration**: Created a REST endpoint for real-time sentiment analysis
4. **Tested the Service**: Verified the model works correctly with sample text inputs

The LLM is now ready to handle sentiment classification requests through the DigitalHub serving infrastructure. The service can be integrated into applications, workflows, or used for batch processing tasks.

This approach demonstrates how easy it is to deploy state-of-the-art language models using DigitalHub's HuggingFace integration, without needing to manage model downloads, dependencies, or serving infrastructure manually.
