# Get Job Parameters

In [0]:
params = dbutils.widgets.getAll()
print(params)
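The rest of this notebook reads four keys from `params`: `catalog`, `schema`, `transcription_model_id` and `llm_model_id`. A minimal sketch of failing early when one of them is missing or blank is shown below. The helper name `require_params` is hypothetical and not part of `dbutils`.

```python
REQUIRED_PARAMS = ("catalog", "schema", "transcription_model_id", "llm_model_id")


def require_params(params, required=REQUIRED_PARAMS):
    """Return only the required parameters, raising if any is missing or blank.

    Hypothetical helper for illustration; `params` is any dict-like mapping,
    such as the result of dbutils.widgets.getAll().
    """
    missing = []
    for key in required:
        value = params.get(key)
        # Treat absent keys, None and whitespace-only strings as missing.
        if value is None or not str(value).strip():
            missing.append(key)
    if missing:
        raise ValueError(f"Missing job parameters: {', '.join(missing)}")
    return {key: params[key] for key in required}
```

In the notebook this could be called as `params = require_params(dbutils.widgets.getAll())` before any key is read.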

## Set catalog and schema

The catalog and schema come from the job parameters and determine where the data and models are stored. Set these parameters to suit your workspace. The same cell derives a save path for each model from its Hugging Face ID.

In [0]:
catalog = params["catalog"]
schema = params["schema"]

# Hugging Face model IDs contain "/" and "-"; replace both with "_" to get a safe directory name.
# Single quotes inside the f-strings keep them valid on Python versions before 3.12.
transcription_model_id = params["transcription_model_id"]
transcription_model_save_path = f"/Volumes/{catalog}/{schema}/data/models/llm_classifier/{transcription_model_id.replace('-', '_').replace('/', '_')}"
llm_model_id = params["llm_model_id"]
llm_model_save_path = f"/Volumes/{catalog}/{schema}/data/models/llm_classifier/{llm_model_id.replace('-', '_').replace('/', '_')}"
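Both save paths follow the same pattern, so the derivation could be factored into one function. The sketch below is one way to do that, not the notebook's own code: `model_save_path` is a hypothetical name, and the `main` and `default` values in the usage line are placeholders.

```python
def model_save_path(catalog, schema, model_id):
    """Build the volume path for a model, mirroring the f-strings above."""
    # "/" and "-" in the Hugging Face ID are replaced so the ID becomes a single directory name.
    safe_name = model_id.replace("-", "_").replace("/", "_")
    return f"/Volumes/{catalog}/{schema}/data/models/llm_classifier/{safe_name}"


print(model_save_path("main", "default", "openai/whisper-medium"))
# /Volumes/main/default/data/models/llm_classifier/openai_whisper_medium
```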

Create the catalog, schema and volume if they don't exist, then create directories for the compressed files, the raw audio files and the models.

In [0]:
spark.sql(f"CREATE CATALOG IF NOT EXISTS {catalog}")
spark.sql(f"CREATE SCHEMA IF NOT EXISTS {catalog}.{schema}")
spark.sql(f"CREATE VOLUME IF NOT EXISTS {catalog}.{schema}.data")
dbutils.fs.mkdirs(f"/Volumes/{catalog}/{schema}/data/compressed/LJSpeech")
dbutils.fs.mkdirs(f"/Volumes/{catalog}/{schema}/data/raw_audio/LJSpeech")
dbutils.fs.mkdirs(f"/Volumes/{catalog}/{schema}/data/models")

## Download models from Hugging Face

We download two models from Hugging Face and save them to the volume, because it is more efficient to download these large models once and load them from storage for every batch of inference. The models are:
- [Whisper-medium](https://huggingface.co/openai/whisper-medium)
- [Phi-4](https://huggingface.co/microsoft/phi-4)

In [0]:
from transformers import pipeline
import torch
import os

Whisper-medium is an automatic speech recognition (ASR) model developed by OpenAI that transcribes spoken language into written text.

In [0]:
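# Remove any previous copy so the directory holds only the fresh download.
if os.path.exists(transcription_model_save_path):
    dbutils.fs.rm(transcription_model_save_path, recurse=True)

dbutils.fs.mkdirs(transcription_model_save_path)

# Download the model from Hugging Face and load it in half precision on the first GPU.
transcription_pipeline = pipeline(
    "automatic-speech-recognition",
    model=transcription_model_id,
    torch_dtype=torch.float16,
    device="cuda:0"
)

# Write the model and its preprocessing files to the volume.
transcription_pipeline.save_pretrained(transcription_model_save_path)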
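Because later inference batches depend on this directory, it can be worth confirming that the save produced a usable model. `save_pretrained` writes a `config.json` next to the weights, so a minimal check might look like the sketch below. It assumes the volume is reachable through the local file system, as the `os.path.exists` call above already does, and the helper name `looks_like_saved_model` is hypothetical.

```python
import os


def looks_like_saved_model(path):
    """Return True if `path` is a directory that contains a config.json file."""
    return os.path.isdir(path) and os.path.isfile(os.path.join(path, "config.json"))
```

In the notebook this could be called as `looks_like_saved_model(transcription_model_save_path)` right after saving. The check only confirms the config file exists, not that the weights are complete.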

Phi-4 is a language model developed by Microsoft for text generation. It can be used for a range of natural language processing tasks; we will use it for simple classification.

In [0]:
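# Remove any previous copy so the directory holds only the fresh download.
if os.path.exists(llm_model_save_path):
    dbutils.fs.rm(llm_model_save_path, recurse=True)

dbutils.fs.mkdirs(llm_model_save_path)

# Download the model, letting transformers choose the dtype and device placement.
llm_pipeline = pipeline(
    "text-generation",
    model=llm_model_id,
    model_kwargs={"torch_dtype": "auto"},
    device_map="auto",
)

# Write the model and tokenizer files to the volume.
llm_pipeline.save_pretrained(llm_model_save_path)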
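For the inference batches mentioned earlier, the saved directories can be passed to `pipeline` in place of the Hub model ID. The following is a sketch under that assumption. `load_saved_pipeline` is a hypothetical helper, and the `transformers` import is deferred so the path check runs before the library is loaded.

```python
import os


def load_saved_pipeline(task, save_path, **pipeline_kwargs):
    """Load a pipeline from a directory written by save_pretrained.

    Hypothetical helper; extra keyword arguments (for example device_map or
    torch_dtype) are forwarded to transformers.pipeline unchanged.
    """
    if not os.path.isdir(save_path):
        raise FileNotFoundError(f"No saved model directory at {save_path}")
    # Imported lazily so a missing directory is reported before transformers loads.
    from transformers import pipeline
    return pipeline(task, model=save_path, **pipeline_kwargs)
```

An inference notebook could then call, for example, `load_saved_pipeline("text-generation", llm_model_save_path, device_map="auto")`, reusing the same keyword arguments as the download cell.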