## Overview

This notebook demonstrates how to use [llama-prompt-ops](https://github.com/meta-llama/llama-prompt-ops) to optimize prompts for an *HTS code classification* task. It draws on existing question/answer pairs stored in Databricks (product descriptions as questions, HTS codes as answers) and includes a custom metric that checks the output format.

It covers:

1. **Data Preparation:** Sampling data from a Databricks table (ID, description, HTS code), then storing it as a JSON file.

2. **Configuration:** Setting up a YAML file with Databricks model configurations and defining a custom metric to validate HTS code format.

3. **Prompt Optimization:** Running the `llama-prompt-ops migrate` command to optimize the prompt.

4. **Results:** Optimized prompts are saved in the results folder.


```python
!pip install llama-prompt-ops python-dotenv
# Restart the Python process so the newly installed packages can be imported
dbutils.library.restartPython()
```

### 1. Data prep
- Sample 100 random rows from the Databricks table and save them as a question/answer JSON dataset.

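```python
import json

# Sample 100 random rows (ID, description, HTS code) from the Databricks table
df = spark.sql("SELECT * FROM workspace.default.hts_description ORDER BY RAND() LIMIT 100")

# Convert the rows to a list of dicts, one per record
json_data = df.toPandas().to_json(orient='records')
json_list = json.loads(json_data)
```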


```python
import json

# Save the sampled records; the dataset path in hts_classification.yaml points at this file
file_path = "hts_classification_sample.json"
with open(file_path, 'w') as file:
    json.dump(json_list, file)
```
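Before optimizing, it can help to confirm that every sampled record actually has a description and an HTS code, since rows with empty fields would give the optimizer unusable examples. The following is a minimal sketch that reloads the saved file and reports incomplete records. It assumes the columns are named `description` and `hts_code` (taken from the overview's wording, not from the table schema); `find_bad_records` and `check_dataset` are hypothetical helpers, not part of llama-prompt-ops.

```python
import json

# Assumed column names; adjust to the actual schema of workspace.default.hts_description
REQUIRED_FIELDS = ("description", "hts_code")


def find_bad_records(records, required=REQUIRED_FIELDS):
    """Return the indices of records that are missing a required field or hold an empty value."""
    bad = []
    for i, rec in enumerate(records):
        for field in required:
            value = rec.get(field)
            if value is None or not str(value).strip():
                bad.append(i)
                break  # one bad field is enough to flag the record
    return bad


def check_dataset(path, required=REQUIRED_FIELDS):
    """Load a JSON list of records and return (record count, indices of incomplete records)."""
    with open(path) as f:
        records = json.load(f)
    return len(records), find_bad_records(records, required)
```

For example, `check_dataset("hts_classification_sample.json")` returns the record count together with the indices of incomplete rows, which could then be filtered out before rerunning the export.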

### 2. Configure hts_classification.yaml
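- Update the model name and base URL in `hts_classification.yaml`, and make sure `DATABRICKS_API_TOKEN` is available (here it is loaded from a `.env` file).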

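```python
import os
from dotenv import load_dotenv

# Load DATABRICKS_API_TOKEN (and any other variables) from .env into os.environ
load_dotenv()

# Fail early if the token is missing; the migrate command reads it via --api-key-env
if not os.getenv("DATABRICKS_API_TOKEN"):
    raise RuntimeError("DATABRICKS_API_TOKEN is not set; add it to your .env file")
```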

### 3. Create a custom metric (if needed)
- Define the metric in `hts_metric.py`; the run below loads it as `HTSCodeMetric`.
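The actual metric in `hts_metric.py` is not shown in this notebook. Purely as an illustration, here is a minimal, framework-free sketch of what a format-aware HTS scorer could look like. It assumes codes are written as 10 digits in 4-2-4 dotted groups (for example `8471.30.0100`); `HTS_PATTERN`, `is_valid_hts`, and `hts_score` are hypothetical names, and wiring the function into llama-prompt-ops' metric interface is not shown.

```python
import re

# Assumption: 10-digit US HTS codes written as 4-2-4 digit groups, e.g. "8471.30.0100".
# Adjust the pattern if the table stores codes in another format.
HTS_PATTERN = re.compile(r"\d{4}\.\d{2}\.\d{4}")


def is_valid_hts(code) -> bool:
    """True if the whole (whitespace-trimmed) string matches the assumed HTS format."""
    return HTS_PATTERN.fullmatch(str(code).strip()) is not None


def digits(code) -> str:
    """Strip everything except digits so differently punctuated codes can be compared."""
    return re.sub(r"\D", "", str(code))


def hts_score(gold, pred) -> float:
    """Score a predicted code against the gold code (hypothetical scoring scheme).

    Badly formatted predictions score 0.0. Otherwise an exact match scores 1.0,
    and shared prefixes earn partial credit by HTS level: 8-digit tariff line,
    6-digit subheading, 4-digit heading.
    """
    if not is_valid_hts(pred):
        return 0.0
    g, p = digits(gold), digits(pred)
    if g == p:
        return 1.0
    for prefix_len, score in ((8, 0.75), (6, 0.5), (4, 0.25)):
        if g[:prefix_len] == p[:prefix_len]:
            return score
    return 0.0
```

Giving partial credit for a shared heading or subheading provides the optimizer with a smoother signal than exact match alone, while the strict format check penalizes outputs that wrap the code in extra text.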

### 4. Execute llama-prompt-ops migrate

```python
!llama-prompt-ops migrate --config hts_classification.yaml --api-key-env DATABRICKS_API_TOKEN --log-level CRITICAL
```

Loaded environment variables from .env
Loaded configuration from hts_classification.yaml
 Using model with DSPy: databricks/maverick
Using the same model for task and proposer: databricks/maverick
Using metric: HTSCodeMetric
Resolved relative dataset path to: /Workspace/Users/anusha_7777@yahoo.com/databricks-prompt-ops-example/hts_classification_sample.json
Using dataset adapter: ConfigurableJSONAdapter
Auto-detected BasicOptimizationStrategy for model: maverick
Using 'system_prompt' from config
Using output prefix from config: hts_classification
Starting prompt optimization...
2025/05/26 21:33:25 INFO dspy.teleprompt.mipro_optimizer_v2: 
RUNNING WITH THE FOLLOWING LIGHT AUTO RUN SETTINGS:
num_trials: 7
minibatch: False
num_candidates: 5
valset size: 15

2025/05/26 21:33:25 INFO dspy.teleprompt.mipro_optimizer_v2: 
==> STEP 1: BOOTSTRAP FEWSHOT EXAMPLES <==
2025/05/26 21:33:25 INFO dspy.teleprompt.mipro_optimizer_v2: These will be used as few-shot example candidates