# Fine-Tuning Models on Krutrim Cloud
This notebook shows how to fine-tune models on Krutrim Cloud with the `krutrim-cloud` Python SDK. It walks through discovering the supported models, engines, and modes; uploading, listing, reading, and deleting datasets; and creating, monitoring, and cancelling fine-tuning tasks.

## Install Krutrim Cloud SDK

In [None]:
%pip install krutrim-cloud

## Prerequisite
**Set the Required Environment Variables:** Create a `.env` file in the examples directory with the following entries (the four S3 variables are only needed for the "Create Datasets using S3" step):

- `KRUTRIM_CLOUD_API_KEY="Your Krutrim Cloud API Key"`

- `KRUTRIM_CLOUD_S3_REGION="Your Krutrim Cloud S3 Region"`

- `KRUTRIM_CLOUD_S3_PUBLIC_KEY="Your Krutrim Cloud S3 Public Key"`

- `KRUTRIM_CLOUD_S3_BUCKET_ENDPOINT="Your Krutrim Cloud Bucket Endpoint URL"`

- `KRUTRIM_CLOUD_S3_SECRET_KEY="Your Krutrim Cloud S3 Secret Key"`


## Import Libraries and Load Environment Variables
- **Purpose:** To prepare the environment for the notebook.
- **Key Actions:**
    - Import the required libraries (`krutrim_cloud` for API access, `dotenv` for loading environment variables, and `os` for reading them).
    - Load environment variables from the `.env` file.

In [None]:
# Import necessary libraries
from krutrim_cloud import KrutrimCloud
from dotenv import load_dotenv
import os
# Load environment variables from .env file
load_dotenv()
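A quick check right after `load_dotenv()` catches a missing or misspelled variable before any API call is made. The following is a minimal sketch that checks the variable names listed in the Prerequisite section; `missing_env_vars` is an illustrative helper, not part of the SDK. If you skip the S3 step, you can drop the four S3 names from the list.

```python
import os

# Variable names listed in the Prerequisite section above.
REQUIRED_ENV_VARS = [
    "KRUTRIM_CLOUD_API_KEY",
    "KRUTRIM_CLOUD_S3_REGION",
    "KRUTRIM_CLOUD_S3_PUBLIC_KEY",
    "KRUTRIM_CLOUD_S3_BUCKET_ENDPOINT",
    "KRUTRIM_CLOUD_S3_SECRET_KEY",
]


def missing_env_vars(names=REQUIRED_ENV_VARS):
    """Return the names that are unset or empty in the current environment."""
    return [name for name in names if not os.getenv(name)]


missing = missing_env_vars()
if missing:
    print(f"Missing environment variables: {', '.join(missing)}")
else:
    print("All required environment variables are set.")
```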

## Initialize KrutrimCloud Client
- **Purpose:** To set up the API client for making requests.
- **Key Actions:**
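    - Create an instance of `KrutrimCloud`. No arguments are passed, so the client is expected to read its configuration (such as `KRUTRIM_CLOUD_API_KEY`) from the environment variables loaded in the previous step.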

In [4]:
# Initialize KrutrimCloud client
client = KrutrimCloud()

## List Supported Models
- **Purpose:** To see which models are available for fine-tuning.
- **Key Actions:**
    - Retrieve and print a list of all supported models.

In [5]:
# List all the supported models
try:
    supported_model_list = client.fine_tuning.models.list()
    print(supported_model_list)
except Exception as e:
    print(f"An error occurred while listing supported models: {e}")

['Meta-Llama-3-8B', 'Meta-Llama-3-8B-instruct', 'Mistral-7B']


## List Fine-Tuning Engines
- **Purpose:** To identify the frameworks available for fine-tuning.
- **Key Actions:**
    - Fetch and display names of all fine-tuning engines.

In [6]:
# List the names of all fine-tuning engines (frameworks)
try:
    fine_tuning_engines = client.fine_tuning.engines.list()
    print(fine_tuning_engines)
except Exception as e:
    print(f"An error occurred while listing fine-tuning engines: {e}")

['torchtune', 'nemo']


## List Models Supported by a Specific Engine
- **Purpose:** To find models compatible with a specific fine-tuning engine.
- **Key Actions:**
    - Retrieve models supported by the engine and print them.

In [7]:
# List all models supported by the given engine
try:
    engine = fine_tuning_engines[0]
    supported_models = client.fine_tuning.models.retrieve(engine)
    print(supported_models)
except Exception as e:
    print(f"An error occurred while retrieving models for the given engine: {e}")

['Meta-Llama-3-8B', 'Meta-Llama-3-8B-instruct', 'Mistral-7B']


## List Supported Fine-Tuning Modes
- **Purpose:** To understand the available training configurations.
- **Key Actions:**
    - List all fine-tuning modes supported by the API.

In [8]:
# List all supported fine_tuning modes
try:
    supported_modes = client.fine_tuning.modes.list()
    print(supported_modes)
except Exception as e:
    print(f"An error occurred while listing supported fine-tuning modes: {e}")


['full', 'lora', 'lora_dpo']


## List Models by Engine and Mode
- **Purpose:** To narrow down models based on engine and mode.
- **Key Actions:**
    - Retrieve models that work with the specified engine and mode.

In [9]:
# List all models supported by the given engine and mode
try:
    engine = fine_tuning_engines[0]
    mode = supported_modes[0]
    supported_models_given_engine_mode = client.fine_tuning.models.mode.retrieve(engine=engine, mode=mode)
    print(supported_models_given_engine_mode)
except Exception as e:
    print(f"An error occurred while retrieving models for the given engine and mode: {e}")

['Meta-Llama-3-8B', 'Meta-Llama-3-8B-instruct', 'Mistral-7B']


## List Modes Supported by a Given Engine
- **Purpose:** To check modes available for a specific engine.
- **Key Actions:**
    - Display modes supported by the engine.

In [10]:
# List all modes supported by the given engine
try:
    engine = fine_tuning_engines[0]
    supported_modes_given_engine = client.fine_tuning.modes.retrieve(engine=engine)
    print(supported_modes_given_engine)
except Exception as e:
    print(f"An error occurred while retrieving modes for the given engine: {e}")

['full', 'lora', 'lora_dpo']
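The listing calls above can be combined to see every engine, mode, and model combination at once. This is a minimal sketch that uses only calls shown in this notebook (`engines.list()`, `modes.retrieve(engine=...)`, and `models.mode.retrieve(engine=..., mode=...)`). The helper `build_compatibility_matrix` is hypothetical, and it makes one API request per engine and mode pair.

```python
def build_compatibility_matrix(client):
    """Map each engine to {mode: [models]} using the listing calls shown above."""
    matrix = {}
    for engine in client.fine_tuning.engines.list():
        matrix[engine] = {}
        # Only query the modes that this engine reports as supported.
        for mode in client.fine_tuning.modes.retrieve(engine=engine):
            models = client.fine_tuning.models.mode.retrieve(engine=engine, mode=mode)
            matrix[engine][mode] = list(models)
    return matrix
```

Call it as `build_compatibility_matrix(client)` and print the nested dictionary it returns.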


## List Modes Supported by a Given Engine and Model
- **Purpose:** To determine modes available for a specific engine and model.
- **Key Actions:**
    - Retrieve modes for the engine and model.

In [11]:
# List all modes supported by the given engine and model
try:
    engine = fine_tuning_engines[0]
    model = supported_model_list[0]
    supported_modes_given_engine_model = client.fine_tuning.modes.model.retrieve(engine=engine, model=model)
    print(supported_modes_given_engine_model)
except Exception as e:
    print(f"An error occurred while retrieving modes for the given engine and model: {e}")

['full', 'lora', 'lora_dpo']


## List Fine-Tuning Engines by Model
- **Purpose:** To find the engines that support a specific model.
- **Key Actions:**
    - Display the fine-tuning engines for the model.

In [12]:
# List the fine-tuning engines (frameworks) that support the given model
try:
    model = supported_model_list[0]
    finetune_engine_given_model = client.fine_tuning.engines.model.retrieve(model=model)
    print(finetune_engine_given_model)
except Exception as e:
    print(f"An error occurred while retrieving fine-tuning engine for the given model: {e}")

['torchtune', 'nemo']


## List Fine-Tuning Engines by Model and Mode
- **Purpose:** To see which engines can be used with a model in a specific mode.
- **Key Actions:**
    - Retrieve the engines for the given model and mode.

In [13]:
# List the fine-tuning engines (frameworks) that support the given model and mode
try:
    model = supported_model_list[0]
    mode = supported_modes[0]
    finetune_engine_given_model = client.fine_tuning.engines.model.mode.retrieve(model=model, mode=mode)
    print(finetune_engine_given_model)
except Exception as e:
    print(f"An error occurred while retrieving fine-tuning engine for the given model and mode: {e}")

['torchtune']
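The two lookups above answer at different granularity: `Meta-Llama-3-8B` is listed for both `torchtune` and `nemo`, but only `torchtune` is returned for that model in `full` mode. Checking the exact engine, model, and mode triple before creating a task catches such mismatches early. A minimal sketch, assuming `engines.model.mode.retrieve` returns a list of engine names as in the output above; `check_combination` is a hypothetical helper.

```python
def check_combination(client, engine, model, mode):
    """Raise ValueError unless `engine` supports `model` in `mode`.

    Uses the engines-by-model-and-mode lookup shown above.
    """
    engines = client.fine_tuning.engines.model.mode.retrieve(model=model, mode=mode)
    if engine not in engines:
        raise ValueError(
            f"Engine {engine!r} does not support {model!r} in mode {mode!r}; "
            f"supported engines: {list(engines)}"
        )
    return True
```

For example, run `check_combination(client, "torchtune", "Meta-Llama-3-8B-instruct", "lora")` before the task-creation step.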


## Create Datasets using File Object
- **Purpose:** To upload datasets for fine-tuning tasks.
- **Key Actions:**
    - Open a local dataset file in binary mode and upload it. Set `dataset_path` to the location of your file.

In [14]:
# Create a dataset by uploading a local file
try:
    # Replace with the path to your local dataset file
    dataset_path = "ft-alpaca-tiny.json"
    with open(dataset_path, 'rb') as file:
        create_dataset = client.fine_tuning.datasets.create(file=file)
    print(create_dataset)
except FileNotFoundError:
    print(f"Dataset file not found at path: {dataset_path}")
except Exception as e:
    print(f"An error occurred while creating the dataset: {e}")
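Before uploading, it can be worth confirming that the local file parses and contains the fields you expect. The records shown later in this notebook carry fields such as `instruction` and `context`, but the exact schema the service requires is not documented here, so this sketch only checks well-formedness. It accepts either a JSON array or JSON Lines; `inspect_local_dataset` is a hypothetical helper.

```python
import json
from pathlib import Path


def inspect_local_dataset(path):
    """Parse a local dataset file and return (record_count, sorted field names).

    Accepts either a JSON array of objects or JSON Lines (one object per line).
    This checks well-formedness only; it does not validate the schema the
    fine-tuning service expects.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        parsed = json.loads(text)
        records = parsed if isinstance(parsed, list) else [parsed]
    except json.JSONDecodeError:
        # Fall back to JSON Lines: parse each non-blank line separately.
        records = [json.loads(line) for line in text.splitlines() if line.strip()]
    fields = set()
    for record in records:
        if not isinstance(record, dict):
            raise ValueError(f"Expected JSON objects, got {type(record).__name__}")
        fields.update(record)
    return len(records), sorted(fields)
```

For example, `print(inspect_local_dataset(dataset_path))` prints the record count and the sorted field names.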

## Create Datasets using S3
- **Purpose:**
    - Upload a local file from your system to your own S3 bucket using the SDK.
    - Copy that dataset from your S3 bucket into the fine-tuning service.
- **Key Actions:**
    - Upload the local file to the specified S3 bucket with `upload_files_to_s3`.
    - Copy the uploaded file into the fine-tuning service with `datasets.copy`, passing the S3 credentials from the environment variables. Errors from either step are caught and printed.

In [16]:
try:
    # Replace with the path to your local dataset file and the name of your S3 bucket
    local_path = "ft-alpaca-tiny_sample3.json"
    bucket_name = "your-bucket-name"

    # Upload the local file to your S3 bucket
    s3_data = client.fine_tuning.upload_files_to_s3(
        local_dir_path=local_path,
        bucket_name=bucket_name,
    )

    # Copy the uploaded file from S3 into the fine-tuning service
    dataset_info = client.fine_tuning.datasets.copy(
        filename=s3_data["filename"],
        path=s3_data["s3-path"],
        s3_region=os.getenv("KRUTRIM_CLOUD_S3_REGION"),
        s3_access_key=os.getenv("KRUTRIM_CLOUD_S3_PUBLIC_KEY"),
        s3_endpoint=os.getenv("KRUTRIM_CLOUD_S3_BUCKET_ENDPOINT"),
        s3_secret=os.getenv("KRUTRIM_CLOUD_S3_SECRET_KEY"),
    )
    print(f"Dataset successfully copied: {dataset_info.name}")
except Exception as exc:
    print(f"Exception while uploading data: {exc}")
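`os.getenv` returns `None` for unset variables, so a missing S3 credential would be passed silently to `datasets.copy` and fail later with a less specific error. This sketch gathers the four keyword arguments used in the copy call from the environment and fails fast if any are missing. `s3_copy_kwargs` is a hypothetical helper, not part of the SDK.

```python
import os

# Maps each keyword argument of datasets.copy (as used above) to its env var.
S3_COPY_ENV = {
    "s3_region": "KRUTRIM_CLOUD_S3_REGION",
    "s3_access_key": "KRUTRIM_CLOUD_S3_PUBLIC_KEY",
    "s3_endpoint": "KRUTRIM_CLOUD_S3_BUCKET_ENDPOINT",
    "s3_secret": "KRUTRIM_CLOUD_S3_SECRET_KEY",
}


def s3_copy_kwargs():
    """Collect the S3 credentials for datasets.copy, failing fast if any are unset."""
    values = {arg: os.getenv(env) for arg, env in S3_COPY_ENV.items()}
    missing = [S3_COPY_ENV[arg] for arg, value in values.items() if not value]
    if missing:
        raise RuntimeError(f"Set these environment variables first: {', '.join(missing)}")
    return values
```

It can replace the four `os.getenv` arguments: `client.fine_tuning.datasets.copy(filename=s3_data["filename"], path=s3_data["s3-path"], **s3_copy_kwargs())`.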
    

## List All Datasets
- **Purpose:** To view datasets that have been uploaded.
- **Key Actions:**
    - Retrieve and print all available datasets.

In [17]:
# List all datasets
try:
    dataset_list = client.fine_tuning.datasets.list()
    if dataset_list:
        for index, dataset in enumerate(dataset_list):
            print(f"Datasets: {index}")
            print(f"name={dataset.name}")
    else:
        print("No datasets found.")
except Exception as e:
    print(f"An error occurred while listing datasets: {e}")

Datasets: 0
name=databricks-dolly-15k.json
Datasets: 1
name=ft-alpaca-tiny.json
Datasets: 2
name=ft-alpaca-tiny_sample.json
Datasets: 3
name=ft-alpaca-tiny_sample1.json
Datasets: 4
name=law-qa-test.json
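The fine-tuning task created later refers to its dataset by filename, so it helps to confirm that the file is actually listed before creating the task. A minimal sketch, assuming (as in the output above) that each item from `datasets.list()` has a `name` attribute; `require_dataset` is a hypothetical helper.

```python
def require_dataset(client, name):
    """Return the dataset entry called `name`, or raise if it is not listed.

    Relies on each item from datasets.list() exposing a `.name` attribute,
    as in the listing above.
    """
    names = []
    for dataset in client.fine_tuning.datasets.list() or []:
        if dataset.name == name:
            return dataset
        names.append(dataset.name)
    raise LookupError(f"Dataset {name!r} not found; available: {names}")
```

`require_dataset(client, "ft-alpaca-tiny.json")` returns the matching entry or raises `LookupError` listing the available names.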


## Read a Specific Dataset
- **Purpose:** To verify the contents of a dataset.
- **Key Actions:**
    - Retrieve and display the contents of a specific dataset file.

In [18]:
# Read the first dataset in the list
try:
    filename = dataset_list[0].name
    data = client.fine_tuning.datasets.retrieve(filename=filename)
    print(data)
except Exception as e:
    print(f"An error occurred while retrieving the dataset: {e}")

['{"instruction": "When did Virgin Australia start operating?", "context": "Virgin Australia, the trading name of Virgin Australia Airlines Pty Ltd, is an Australian-based airline. It is the largest airline by fleet size to use the Virgin brand. It commenced services on 31 August 2000 as Virgin Blue, with two aircraft on a single route. It suddenly found itself as a major airline in Australia\'s domestic market after the collapse of Ansett Australia in September 2001. The airline has since grown to directly se...']
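The output above suggests that `datasets.retrieve` returns a list of strings, each holding one JSON-encoded record; the printed output is truncated, so treat this as an assumption. Under that assumption, this sketch decodes the records and counts how often each field appears, a quick way to spot inconsistent rows. `parse_records` is a hypothetical helper; it also passes through items that are already dictionaries.

```python
import json


def parse_records(raw_records):
    """Decode JSON-encoded record strings and count how often each field appears."""
    records = [json.loads(r) if isinstance(r, str) else r for r in raw_records]
    field_counts = {}
    for record in records:
        for key in record:
            field_counts[key] = field_counts.get(key, 0) + 1
    return records, field_counts
```

Usage: `records, field_counts = parse_records(data)`.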


## Delete a Specific Dataset
- **Purpose:** To manage datasets by removing unnecessary ones.
- **Key Actions:**
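    - Delete a dataset by its filename. The cell below deletes `dataset_list[0]`, the first entry from the listing above, so check which dataset that is before running it.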

In [31]:
# Delete a dataset
try:
    filename = dataset_list[0].name
    delete_response = client.fine_tuning.datasets.delete(filename=filename)
    print(f"Deleted dataset: {filename}")
except Exception as e:
    print(f"An error occurred while deleting the dataset: {e}")

## Create a Fine-Tuning Task
- **Purpose:** To define and initiate a new fine-tuning task.
- **Key Actions:**
    - Set parameters (engine, model, dataset, etc.) and create the fine-tuning task.

In [19]:
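# Create a fine-tuning task
# Hyperparameter values below are the ones used in this example run; adjust them for your own data.
try:
    create_finetuning_task = client.fine_tuning.tasks.create(
        engine="torchtune",                # one of the engines listed earlier
        task_name="ft_task_1",             # a name to identify the task
        namespace="gpu-scheduler",
        priority=0,
        model="Meta-Llama-3-8B-instruct",  # a model supported by the engine and mode
        mode="lora",                       # one of the supported fine-tuning modes
        dataset="ft-alpaca-tiny.json",     # filename of a dataset uploaded earlier
        test_dataset="",                   # empty string: no test dataset
        ngpu=1,                            # number of GPUs
        lora_rank=0,
        lora_alpha=0,
        batch=4,                           # batch size
        lr=1,                              # learning rate
        epoch=1,                           # number of training epochs
        seed=0,                            # random seed
        version="v1",
        total_checkpoint=1
    )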
    print(create_finetuning_task)
except Exception as e:
    print(f"An error occurred while creating the fine-tuning task: {e}")

TaskCreateResponse(task_id='49bd242d-5f3d-4e47-8640-2a5e7a5454fe')


## List All Fine-Tuning Tasks
- **Purpose:** To track ongoing and completed fine-tuning tasks.
- **Key Actions:**
    - Retrieve and display all fine-tuning tasks.

In [20]:
# List fine_tuning Tasks
try:
    finetuning_tasks_list = client.fine_tuning.tasks.list()
    if finetuning_tasks_list.task_list:
        for index, task in enumerate(finetuning_tasks_list.task_list):
            print(f"\nFine-tuning Task: {index}")
            for key, value in task.items():
                print(f"{key}={value}")
    else:
        print("No fine-tuning tasks found.")
except Exception as e:
    print(f"An error occurred while listing fine-tuning tasks: {e}")


Fine-tuning Task: 0
name=ft_task_1
model=Meta-Llama-3-8B-instruct
id=49bd242d-5f3d-4e47-8640-2a5e7a5454fe
status=starting
mtime=09/27/2024 04_32_38 UTC
total_checkpoint=1
checkpoints=[{'name': 'ft-9ea84101-4cee-4b80-8b31-5e67d8628b98-ft_task_1_1_final', 'status': 'succeed'}]
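Each entry in `task_list` is a dictionary (the cell above iterates over `task.items()`), so tasks can be grouped by their `status` field for a compact overview. A minimal sketch; `tasks_by_status` is a hypothetical helper, and tasks without a `status` key are grouped under `"unknown"`.

```python
from collections import defaultdict


def tasks_by_status(task_list):
    """Group task dicts (as returned in `task_list`) by their `status` field."""
    groups = defaultdict(list)
    for task in task_list:
        groups[task.get("status", "unknown")].append(task.get("name"))
    return dict(groups)
```

Usage: `tasks_by_status(finetuning_tasks_list.task_list)`.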


## Retrieve a Specific Fine-Tuning Task
- **Purpose:** To inspect details of a particular fine-tuning task.
- **Key Actions:**
    - Retrieve task information using its unique ID.

In [21]:
# Get a fine-tuning task by its ID
try:
    task_id = finetuning_tasks_list.task_list[0].get("id")
    finetuning_task = client.fine_tuning.tasks.retrieve(id=task_id)
    for key, value in finetuning_task.__dict__.items():
        print(f"{key}={value}")
except Exception as e:
    print(f"An error occurred while retrieving the fine-tuning task: {e}")

id=49bd242d-5f3d-4e47-8640-2a5e7a5454fe
batch=4
checkpoints=[{'name': 'ft-9ea84101-4cee-4b80-8b31-5e67d8628b98-ft_task_1_1_final', 'status': 'succeed'}]
ctime=09/27/2024 04_32_29 UTC
dataset=ft-alpaca-tiny.json
dataset_size=18478
epoch=1
lora_alpha=0
lora_rank=0
lr=1
mode=lora
model=Meta-Llama-3-8B-instruct
name=ft_task_1
namespace=gpu-scheduler
ngpu=1
priority=0
reason=out_of_gpu
seed=0
status=starting
test_dataset=
test_dataset_size=0
total_checkpoint=1
version=v1
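A task can stay in a non-final state for some time; the task above reports `status=starting` with `reason=out_of_gpu`. Rather than rerunning the retrieve cell by hand, you can poll it. This sketch uses only `tasks.retrieve(id=...)` and the `status` and `reason` attributes shown above. The set of final status names is not documented in this notebook, so the caller must pass `terminal_statuses`; `wait_for_task` is a hypothetical helper.

```python
import time


def wait_for_task(client, task_id, terminal_statuses, poll_seconds=60, timeout_seconds=3600):
    """Poll tasks.retrieve until the task's status is in `terminal_statuses`.

    `terminal_statuses` must be supplied by the caller: the final status names
    are not documented in this notebook (only "starting" appears above).
    Returns the last retrieved task; raises TimeoutError once the deadline passes.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        task = client.fine_tuning.tasks.retrieve(id=task_id)
        print(f"status={task.status} reason={getattr(task, 'reason', None)}")
        if task.status in terminal_statuses:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Task {task_id} still {task.status!r} after {timeout_seconds}s")
        time.sleep(poll_seconds)
```

For the task created earlier, pass `create_finetuning_task.task_id` as the ID (the create cell prints it as `TaskCreateResponse(task_id=...)`), together with the status names your runs actually end in.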


## Retrieve Logs for a Specific Fine-Tuning Task
- **Purpose:** To monitor training progress through logs.
- **Key Actions:**
    - Fetch logs for a specified fine-tuning task.

In [None]:
# Get all logs for the fine-tuning task
try:
    task_id = finetuning_tasks_list.task_list[0].get("id")
    finetuning_logs = client.fine_tuning.tasks.logs(id=task_id)
    for log_item in finetuning_logs:
        print("\nLog Entry:")
        for key, value in log_item.__dict__.items():
            print(f"{key}={value}")
except Exception as e:
    print(f"An error occurred while retrieving logs for the fine-tuning task: {e}")
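Printed logs are lost when the notebook restarts. This sketch writes each log entry to a JSON Lines file, reading its attributes with `vars()`, which returns the same `__dict__` that the cell above prints. The field names inside each entry are whatever the SDK returns; `save_logs` is a hypothetical helper, and the output filename is up to you.

```python
import json


def save_logs(log_items, path):
    """Write each log entry's attributes as one JSON object per line.

    Values that are not JSON-serializable are stored as strings.
    Returns the number of entries written.
    """
    count = 0
    with open(path, "w", encoding="utf-8") as f:
        for item in log_items:
            f.write(json.dumps(vars(item), default=str) + "\n")
            count += 1
    return count
```

Usage: `save_logs(finetuning_logs, "ft_task_1_logs.jsonl")`.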

## Cancel a Specific Fine-Tuning Task
- **Purpose:** To stop an ongoing fine-tuning task if needed.
- **Key Actions:**
    - Cancel a task using its unique ID.


In [29]:
# Cancel the fine-tuning task
try:
    task_id = finetuning_tasks_list.task_list[0].get("id")
    cancel_response = client.fine_tuning.tasks.cancel(id=task_id)
    print(cancel_response)
except Exception as e:
    print(f"An error occurred while cancelling the fine-tuning task: {e}")