### 1. Set up Feast

In [None]:
%pip install feast

In [18]:
!feast version

Feast SDK Version: "0.48.1"


### 2. Create a feature repository: Initialize a Feast project

In [2]:

!feast init myproject


Creating a new Feast repository in /opt/app-root/src/myproject.



### 3. Inspecting the feature repository

In [4]:
!pwd

/opt/app-root/src


In [5]:
%cd myproject/feature_repo

/opt/app-root/src/myproject/feature_repo


In [6]:
# Inspect the files in the Feast repo and display the folder structure as a tree.
!find . | sed -e 's/[^-][^\/]*\// |-- /g' -e 's/|-- \(.*\)/+-- \1/'

.
 +-- __init__.py
 +-- example_repo.py
 +-- test_workflow.py
 +-- data
 +--  |-- driver_stats.parquet
 +-- feature_store.yaml
 +-- __pycache__
 +--  |-- __init__.cpython-311.pyc
 +--  |-- test_workflow.cpython-311.pyc
 +--  |-- example_repo.cpython-311.pyc


### Key files
* `example_repo.py` contains the code that defines the Feast objects used in this example, such as FeatureViews, FeatureServices, and OnDemandFeatureViews. -- _myproject/feature_repo/example_repo.py_

* `feature_store.yaml` holds the Feast configuration for the project. -- _myproject/feature_repo/feature_store.yaml_

* `test_workflow.py` contains Python code that demonstrates running the key Feast commands, including defining, retrieving, and pushing features. -- _myproject/feature_repo/test_workflow.py_

In [7]:
!cat feature_store.yaml

project: myproject
# By default, the registry is a file (but can be turned into a more scalable SQL-backed registry)
registry: data/registry.db
# The provider primarily specifies default offline / online stores & storing the registry in a given cloud
provider: local
online_store:
    type: sqlite
    path: data/online_store.db
entity_key_serialization_version: 2
# By default, no_auth for authentication and authorization, other possible values kubernetes and oidc. Refer the documentation for more details.
auth:
    type: no_auth


> [!NOTE]
> 
> The file `data/driver_stats.parquet` is generated by the `feast init` command and serves as the historical data source for this example. The source is defined in `myproject/feature_repo/example_repo.py`.

In [9]:
# Inspect driver_stats data
import pandas as pd
pd.read_parquet("data/driver_stats.parquet")

Unnamed: 0,event_timestamp,driver_id,conv_rate,acc_rate,avg_daily_trips,created
0,2025-03-26 13:00:00+00:00,1005,0.004254,0.155951,370,2025-04-10 13:50:06.169
1,2025-03-26 14:00:00+00:00,1005,0.356659,0.145077,48,2025-04-10 13:50:06.169
2,2025-03-26 15:00:00+00:00,1005,0.429421,0.184800,183,2025-04-10 13:50:06.169
3,2025-03-26 16:00:00+00:00,1005,0.653685,0.019408,481,2025-04-10 13:50:06.169
4,2025-03-26 17:00:00+00:00,1005,0.800338,0.426761,105,2025-04-10 13:50:06.169
...,...,...,...,...,...,...
1802,2025-04-10 11:00:00+00:00,1001,0.217492,0.468898,496,2025-04-10 13:50:06.169
1803,2025-04-10 12:00:00+00:00,1001,0.914503,0.137143,855,2025-04-10 13:50:06.169
1804,2021-04-12 07:00:00+00:00,1001,0.668744,0.717003,134,2025-04-10 13:50:06.169
1805,2025-04-03 01:00:00+00:00,1003,0.402941,0.269871,430,2025-04-10 13:50:06.169


### 4. Creating Feast objects

In [10]:

# No Feast objects exist yet. To create them, run `feast apply` in the directory that contains feature_store.yaml.
# The command registers the Feast objects defined in `example_repo.py`.
!feast apply

Applying changes for project myproject
Created project myproject
Created entity driver
Created feature view driver_hourly_stats_fresh
Created feature view driver_hourly_stats
Created on demand feature view transformed_conv_rate
Created on demand feature view transformed_conv_rate_fresh
Created feature service driver_activity_v3
Created feature service driver_activity_v1
Created feature service driver_activity_v2

Created sqlite table myproject_driver_hourly_stats_fresh
Created sqlite table myproject_driver_hourly_stats



### 5. Retrieving historical features (for training data)

In [11]:
from datetime import datetime
import pandas as pd
import json
from feast import FeatureStore

# Initialize Feast feature store
store = FeatureStore(repo_path=".")

# Define entity to retrieve data for
timestamp_now = pd.to_datetime("now", utc=True)
entity_df = pd.DataFrame.from_dict(
    {
        # entity's join key -> entity values
        "driver_id": [1001, 1002, 1003, 1004, 1005],
        # "event_timestamp" (reserved key) -> timestamps
        "event_timestamp": [
            datetime(2021, 4, 12, 10, 59, 42),
            datetime(2021, 4, 12, 8, 12, 10),
            datetime(2021, 4, 12, 16, 40, 26),
            datetime(2021, 4, 12, 12, 30, 0),
            datetime(2021, 4, 12, 14, 15, 30)
        ],
        # (optional) label name -> label values. Feast does not process these
        "label_driver_reported_satisfaction": [1, 5, 3, 4, 2],
        # values we're using for an on-demand transformation
        "val_to_add": [1, 2, 3, 4, 5],
        "val_to_add_2": [10, 20, 30, 40, 50],
    }
)

# Retrieve historical features
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        "driver_hourly_stats:conv_rate",
        "driver_hourly_stats:acc_rate",
        "driver_hourly_stats:avg_daily_trips",
        "transformed_conv_rate:conv_rate_plus_val1",
        "transformed_conv_rate:conv_rate_plus_val2",
    ],
).to_df()

print("----- Feature schema -----\n")
print(training_df.info())

print()
print("----- Example features -----\n")
print(training_df.head())

----- Feature schema -----

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5 entries, 0 to 4
Data columns (total 10 columns):
 #   Column                              Non-Null Count  Dtype              
---  ------                              --------------  -----              
 0   driver_id                           5 non-null      int64              
 1   event_timestamp                     5 non-null      datetime64[ns, UTC]
 2   label_driver_reported_satisfaction  5 non-null      int64              
 3   val_to_add                          5 non-null      int64              
 4   val_to_add_2                        5 non-null      int64              
 5   conv_rate                           5 non-null      float32            
 6   acc_rate                            5 non-null      float32            
 7   avg_daily_trips                     5 non-null      int32              
 8   conv_rate_plus_val1                 5 non-null      float64            
 9   conv_rate_plus_val2
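
Conceptually, `get_historical_features` performs a point-in-time join: for each row of `entity_df` it selects the most recent feature row for that `driver_id` whose timestamp is not later than the row's `event_timestamp`, so no future data leaks into training. The sketch below illustrates only that idea in plain Python. The `point_in_time_lookup` helper and the sample rows are hypothetical, and Feast's real implementation runs inside the offline store and handles further details such as feature view TTL.

```python
from bisect import bisect_right
from datetime import datetime

# Hypothetical feature rows for one driver, sorted by event timestamp.
feature_rows = [
    (datetime(2021, 4, 12, 7, 0), {"conv_rate": 0.10}),
    (datetime(2021, 4, 12, 9, 0), {"conv_rate": 0.20}),
    (datetime(2021, 4, 12, 11, 0), {"conv_rate": 0.30}),
]


def point_in_time_lookup(rows, as_of):
    """Return the latest feature values with timestamp <= as_of, or None."""
    timestamps = [ts for ts, _ in rows]
    idx = bisect_right(timestamps, as_of)
    return rows[idx - 1][1] if idx > 0 else None


# An entity row stamped 10:59:42 sees the 09:00 values, never the 11:00 ones.
print(point_in_time_lookup(feature_rows, datetime(2021, 4, 12, 10, 59, 42)))
# An entity row older than every feature row gets no match.
print(point_in_time_lookup(feature_rows, datetime(2021, 4, 12, 6, 0)))
```

The first call prints `{'conv_rate': 0.2}` and the second prints `None`.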



### 6. Data Preprocessing: Converting Driver Statistics to JSONL Format

In [24]:
## Transforming Driver Stats DataFrame to JSONL Document Objects for LLM Training

import json

# Assuming training_df is already defined and populated, store all document objects in an array
documents = []

# Iterate over rows as dicts; this does not depend on the DataFrame having a 0..n-1 index.
for row in training_df.to_dict("records"):
    doc = {
        "driver_id": int(row["driver_id"]),
        "conv_rate": float(row["conv_rate"]),
        "acc_rate": float(row["acc_rate"]),
        "avg_daily_trips": int(row["avg_daily_trips"]),
    }
    documents.append(doc)

# Save the transformed data in a JSONL format for LLM training
stats_document_jsonl_file="driver_stats_documents.jsonl"
with open(stats_document_jsonl_file, "w") as f:
    for doc in documents:
        f.write(json.dumps(doc) + "\n")

print(f">> Driver stats document objects converted and saved as JSONL to '{stats_document_jsonl_file}'.")


>> Driver stats document objects converted and saved as JSONL to 'driver_stats_documents.jsonl'.


In [25]:
## Creating Training Examples: Generating Driver Performance Summaries from JSONL Data

import json

# Define file paths
input_file = "driver_stats_documents.jsonl"   # Input file with driver stats
output_file = "driver_stats_training.jsonl"  # Output file for training data

instruction_text = "Summarize the driver's performance metrics."

# Function to generate a summary for the driver's performance
def generate_summary(record):
    summary = (
        f"Driver {record['driver_id']} has a conversion rate of {record['conv_rate']:.2%}, "
        f"an acceleration rate of {record['acc_rate']:.2%}, and completes an average of {record['avg_daily_trips']} daily trips."
    )
    return summary

# Process the input file and create training examples
with open(input_file, "r") as fin, open(output_file, "w") as fout:
    for line in fin:
        record = json.loads(line.strip())
        input_text = (
            f"Driver ID: {record['driver_id']}, "
            f"Conversion Rate: {record['conv_rate']:.4f}, "
            f"Acceleration Rate: {record['acc_rate']:.4f}, "
            f"Average Daily Trips: {record['avg_daily_trips']}"
        )
        output_text = generate_summary(record)
        example = {
            "instruction": instruction_text,
            "input": input_text,
            "output": output_text
        }
        fout.write(json.dumps(example) + "\n")

print(f">> Training dataset saved to '{output_file}'")


>> Training dataset saved to 'driver_stats_training.jsonl'

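Each line of `driver_stats_training.jsonl` is a self-contained instruction record with `instruction`, `input`, and `output` keys. As a quick illustration of the format, the sketch below builds one record from a made-up driver row using the same f-string formatting as the cell above. The `build_example` helper and the sample values are hypothetical and are not taken from the generated dataset.

```python
import json

# Hypothetical record; real values come from driver_stats_documents.jsonl.
record = {"driver_id": 1001, "conv_rate": 0.25, "acc_rate": 0.5, "avg_daily_trips": 120}


def build_example(rec, instruction="Summarize the driver's performance metrics."):
    """Mirror the notebook's prompt/summary formatting for a single record."""
    input_text = (
        f"Driver ID: {rec['driver_id']}, "
        f"Conversion Rate: {rec['conv_rate']:.4f}, "
        f"Acceleration Rate: {rec['acc_rate']:.4f}, "
        f"Average Daily Trips: {rec['avg_daily_trips']}"
    )
    output_text = (
        f"Driver {rec['driver_id']} has a conversion rate of {rec['conv_rate']:.2%}, "
        f"an acceleration rate of {rec['acc_rate']:.2%}, and completes an average of "
        f"{rec['avg_daily_trips']} daily trips."
    )
    return {"instruction": instruction, "input": input_text, "output": output_text}


example = build_example(record)
line = json.dumps(example)          # one JSONL line
assert json.loads(line) == example  # the line round-trips losslessly
print(example["output"])
```

Note that `:.2%` multiplies by 100 and appends a percent sign, so a `conv_rate` of 0.25 is rendered as `25.00%` in the summary while the input keeps the raw value `0.2500`.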

In [None]:
%pip install datasets transformers

In [28]:
## sample dataset loading

from datasets import load_dataset

dataset = load_dataset("json", data_files="driver_stats_training.jsonl", split="train")
print(dataset)

Dataset({
    features: ['instruction', 'input', 'output'],
    num_rows: 5
})


### 7. Configuring Tokenization for Fine-Tuning: Mapping Instructions and Outputs with Granite Tokenizer

In [None]:
## Sample preprocessing and Tokenization of Instruction-Based Dataset Using Hugging Face Transformers

from transformers import AutoTokenizer

# Initialize tokenizer
model_name = "ibm-granite/granite-3.0-1b-a400m-base"
tokenizer = AutoTokenizer.from_pretrained(model_name)

def tokenize_function(example):
    input_text = f"### Instruction:\n{example['instruction']}\n\n### Input:\n{example['input']}"
    target_text = example['output']
    model_inputs = tokenizer(input_text, max_length=512, truncation=True, padding="max_length")
    labels = tokenizer(target_text, max_length=512, truncation=True, padding="max_length")
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs


tokenized_datasets = dataset.map(tokenize_function, batched=False)
print(tokenized_datasets)

Dataset({
    features: ['instruction', 'input', 'output', 'input_ids', 'attention_mask', 'labels'],
    num_rows: 5
})


In [None]:
%pip install -U kubeflow-training

In [None]:
# the kubeflow-training version must be >= `1.9.0`, as the `env_vars` support was first introduced in this version - https://github.com/kubeflow/trainer/releases/tag/v1.9.0
%pip show kubeflow-training

Name: kubeflow-training
Version: 1.9.1
Summary: Training Operator Python SDK
Home-page: https://github.com/kubeflow/training-operator/tree/master/sdk/python
Author: Kubeflow Authors
Author-email: hejinchi@cn.ibm.com
License: Apache License Version 2.0
Location: /opt/app-root/lib64/python3.11/site-packages
Requires: certifi, kubernetes, retrying, setuptools, six, urllib3
Required-by: 
Note: you may need to restart the kernel to use updated packages.


In [None]:
# parameters
num_gpus = "1" # Number of GPUs per worker node
openshift_api_url = "has to be specified"
namespace = "feast-kfto-finetuning" # Update this to match the name of your data science project
token = "has to be specified"
training_image= "quay.io/modh/training:py311-cuda121-torch241"

In [33]:
# create configmap to store driver_stats_training.jsonl files
import os
from kubernetes import client, config
from kubernetes.client.rest import ApiException

# Define ConfigMap name
CONFIGMAP_NAME = "training-config"

# Define headers for API authentication
configuration = client.Configuration()
configuration.host = openshift_api_url
configuration.verify_ssl = False  # Set to True if using a valid CA certificate
configuration.api_key = {"authorization": f"Bearer {token}"}

# Load Kubernetes client configuration
client.Configuration.set_default(configuration)
api_instance = client.CoreV1Api()

# Read training dataset file from the current directory
DATASET_FILE = "driver_stats_training.jsonl"

if os.path.exists(DATASET_FILE):
    with open(DATASET_FILE, "r", encoding="utf-8") as file:
        dataset_content = file.read()
else:
    raise FileNotFoundError(f"Dataset file '{DATASET_FILE}' not found in current directory.")

# Define ConfigMap data
if dataset_content:
    configmap_data = {
        f"{DATASET_FILE}": dataset_content
    }

    # Create the ConfigMap object
    configmap = client.V1ConfigMap(
        metadata=client.V1ObjectMeta(name=CONFIGMAP_NAME),
        data=configmap_data
    )
    
    # Apply ConfigMap to Kubernetes
    try:
        api_instance.create_namespaced_config_map(namespace=namespace, body=configmap)
        print(f"ConfigMap '{CONFIGMAP_NAME}' created successfully in namespace '{namespace}'.")
    except ApiException as e:
        if e.status == 409:
            print(f"ConfigMap '{CONFIGMAP_NAME}' already exists. Updating it...")
            api_instance.replace_namespaced_config_map(name=CONFIGMAP_NAME, namespace=namespace, body=configmap)
        else:
            print(f"Error creating ConfigMap: {e}")
            raise  # The training job depends on this ConfigMap, so do not continue silently.


ConfigMap 'training-config' created successfully in namespace 'feast-kfto-finetuning'.
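
Kubernetes limits the data stored in a ConfigMap to 1 MiB, so shipping the dataset this way only suits small files such as the five-row example used here. The sketch below shows a pre-flight size check that could run before the ConfigMap is created. The `fits_in_configmap` helper and its safety margin are illustrative assumptions and are not part of the Kubernetes client.

```python
CONFIGMAP_LIMIT_BYTES = 1024 * 1024  # 1 MiB limit on ConfigMap data


def fits_in_configmap(content: str, margin_bytes: int = 4096) -> bool:
    """Return True if the UTF-8 encoded content leaves `margin_bytes` of headroom."""
    return len(content.encode("utf-8")) + margin_bytes <= CONFIGMAP_LIMIT_BYTES


# Hypothetical payloads: a tiny JSONL file and an oversized 2 MiB string.
small = '{"instruction": "x", "input": "y", "output": "z"}\n' * 5
large = "a" * (2 * 1024 * 1024)
print(fits_in_configmap(small), fits_in_configmap(large))
```

For datasets that fail this check, a persistent volume or object storage is the more common way to hand data to the training pods.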


### 8. Define a training function

In [None]:
def train_and_upload():
    import os
    import json
    import torch
    import torch.distributed as dist
    from torch.utils.data import Dataset, DataLoader, DistributedSampler
    from transformers import AutoTokenizer, AutoModelForCausalLM, get_linear_schedule_with_warmup, logging
    from torch.nn.utils.rnn import pad_sequence
    from torch.optim import AdamW
    import boto3
    from botocore.exceptions import ClientError
    from torch.cuda.amp import autocast, GradScaler

    logging.set_verbosity_debug()

    # Helper function to log both global and local rank info
    def log_rank_info(stage: str):
        global_rank = dist.get_rank() if dist.is_initialized() else 0
        local_rank = int(os.environ.get("LOCAL_RANK", "0"))
        device = torch.device("cuda", local_rank)
        try:
            device_name = torch.cuda.get_device_name(local_rank)
        except Exception as e:
            device_name = f"Unknown (error: {e})"
        print(f"[{stage} | Global Rank {global_rank} | Local Rank {local_rank}]: Using device: {device} ({device_name})")

    # Read configuration flags from environment variables
    use_lora = os.getenv("USE_LORA", "false").lower() in ["true", "1"]
    use_qlora = os.getenv("USE_QLORA", "false").lower() in ["true", "1"]
    use_deepspeed = os.getenv("USE_DEEPSPEED", "false").lower() in ["true", "1"]

    # For QLoRA, we require DeepSpeed for optimal performance. Switch on DeepSpeed if needed.
    if use_qlora and not use_deepspeed:
        print("QLoRA typically requires DeepSpeed for optimal performance. Enabling DeepSpeed.")
        use_deepspeed = True

    if use_lora or use_qlora:
        from peft import get_peft_model, LoraConfig, TaskType

    if use_deepspeed:
        import deepspeed

    # Set CUDA memory configuration to help mitigate fragmentation
    os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"

    # Initialize distributed training (NCCL backend)
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", "0"))
    torch.cuda.set_device(local_rank)
    log_rank_info("After Device Setup")

    # Load tokenizer
    model_name = os.environ.get("MODEL_NAME", "ibm-granite/granite-3.0-1b-a400m-base")
    tokenizer = AutoTokenizer.from_pretrained(model_name)

    # Load model – if using QLoRA, load in 4-bit mode; otherwise, load normally.
    if use_qlora:
        from transformers import BitsAndBytesConfig
        quantization_config = BitsAndBytesConfig(
            load_in_4bit=True,
            bnb_4bit_use_double_quant=True,
            bnb_4bit_quant_type="nf4",
            bnb_4bit_compute_dtype=torch.float16
        )
        # When using device_map="auto", the model is automatically split across GPUs.
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            quantization_config=quantization_config,
            device_map="auto"
        )
        log_rank_info("After Loading Model in 4-bit Mode (QLoRA)")
    else:
        # Standard loading onto the designated GPU.
        model = AutoModelForCausalLM.from_pretrained(model_name).to(torch.device("cuda", local_rank))
        log_rank_info("After Standard Model Loading")

    # Apply LoRA or QLoRA if enabled.
    if use_lora or use_qlora:
        # Common LoRA configuration
        lora_r = int(os.environ.get("LORA_R", "8"))
        lora_alpha = int(os.environ.get("LORA_ALPHA", "32"))
        lora_dropout = float(os.environ.get("LORA_DROPOUT", "0.1"))
        lora_config = LoraConfig(
            r=lora_r,
            lora_alpha=lora_alpha,
            target_modules=["q_proj", "v_proj"],
            lora_dropout=lora_dropout,
            bias="none",
            task_type=TaskType.CAUSAL_LM,
        )
        # For QLoRA, prepare the model for k-bit training.
        if use_qlora:
            from peft import prepare_model_for_kbit_training
            model = prepare_model_for_kbit_training(model)
        model = get_peft_model(model, lora_config)
        log_rank_info("After Applying LoRA/QLoRA")

    # Custom JSONL dataset
    class JsonlDataset(Dataset):
        def __init__(self, file_path, tokenizer, max_length=512):
            self.samples = []
            with open(file_path, 'r') as f:
                for line in f:
                    record = json.loads(line)
                    self.samples.append(record)
            self.tokenizer = tokenizer
            self.max_length = max_length

        def __len__(self):
            return len(self.samples)

        def __getitem__(self, idx):
            record = self.samples[idx]
            input_text = record.get("input", "")
            target_text = record.get("output", "")
            # Combine input and target using the EOS token (or newline) as separator
            separator = self.tokenizer.eos_token if self.tokenizer.eos_token is not None else "\n"
            combined_text = input_text + separator + target_text
            encoded = self.tokenizer(
                combined_text, truncation=True, max_length=self.max_length, return_tensors="pt"
            )
            return {
                "input_ids": encoded.input_ids.squeeze(0),
                "attention_mask": encoded.attention_mask.squeeze(0),
                "labels": encoded.input_ids.squeeze(0)
            }

    # Collate function for DataLoader
    def collate_fn(batch):
        input_ids = pad_sequence([x["input_ids"] for x in batch], batch_first=True, padding_value=0)
        attention_mask = pad_sequence([x["attention_mask"] for x in batch], batch_first=True, padding_value=0)
        labels = pad_sequence([x["labels"] for x in batch], batch_first=True, padding_value=-100)
        return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}

    # Ensure S3 bucket exists; if not, attempt to create it.
    def ensure_bucket_exists(s3_client, bucket, region):
        try:
            s3_client.head_bucket(Bucket=bucket)
            print(f"Bucket {bucket} exists.")
        except ClientError as e:
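            # The error code may be numeric ("404") or symbolic ("NoSuchBucket"), so compare as strings.
            error_code = str(e.response['Error']['Code'])
            if error_code in ("404", "NoSuchBucket"):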
                print(f"Bucket {bucket} does not exist. Creating it...")
                try:
                    if region == "us-east-1":
                        s3_client.create_bucket(Bucket=bucket)
                    else:
                        s3_client.create_bucket(
                            Bucket=bucket,
                            CreateBucketConfiguration={'LocationConstraint': region}
                        )
                    print(f"Bucket {bucket} created.")
                except ClientError as create_err:
                    print(f"Failed to create bucket {bucket}: {create_err}")
                    raise create_err
            else:
                print(f"Error checking bucket: {e}")
                raise e

    # Upload a directory recursively to S3
    def upload_directory_to_s3(directory, bucket, s3_prefix, s3_client):
        for root, _, files in os.walk(directory):
            for file in files:
                local_path = os.path.join(root, file)
                relative_path = os.path.relpath(local_path, directory)
                s3_key = os.path.join(s3_prefix, relative_path) if s3_prefix else relative_path
                try:
                    s3_client.upload_file(local_path, bucket, s3_key)
                    print(f"Uploaded {local_path} to s3://{bucket}/{s3_key}")
                except ClientError as e:
                    print(f"Failed to upload {local_path} to S3: {e}")

    # Prepare dataset and DataLoader
    dataset_path = os.environ.get("DATA_PATH", "/mnt/config/driver_stats_training.jsonl")
    output_dir = os.environ.get("OUTPUT_DIR", "fine-tuned-model")
    dataset = JsonlDataset(file_path=dataset_path, tokenizer=tokenizer)
    sampler = DistributedSampler(dataset)
    batch_size = int(os.environ.get("BATCH_SIZE", "2"))
    dataloader = DataLoader(dataset, batch_size=batch_size, sampler=sampler, collate_fn=collate_fn)

    # Setup optimizer and scheduler
    learning_rate = float(os.environ.get("LEARNING_RATE", "5e-5"))
    num_epochs = int(os.environ.get("NUM_EPOCHS", "3"))
    optimizer = AdamW(model.parameters(), lr=learning_rate)
    num_training_steps = len(dataloader) * num_epochs
    scheduler = get_linear_schedule_with_warmup(optimizer, num_warmup_steps=100, num_training_steps=num_training_steps)

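    # If using DeepSpeed, initialize the DeepSpeed engine.
    # Note: no lr_scheduler is passed to deepspeed.initialize, so the scheduler it returns is None
    # and the warmup schedule created above is not applied in this branch.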
    if use_deepspeed:
        ds_config = {
            "train_micro_batch_size_per_gpu": batch_size,
            "gradient_accumulation_steps": 1,
            "fp16": {"enabled": True},
            "zero_optimization": {"stage": 2},
        }
        model, optimizer, _, scheduler = deepspeed.initialize(
            model=model,
            optimizer=optimizer,
            args=None,
            config=ds_config
        )
        log_rank_info("After DeepSpeed Initialization")
    else:
        # Without DeepSpeed (which also means QLoRA is off, since QLoRA forces DeepSpeed on above),
        # wrap the model with FSDP.
        from torch.distributed.fsdp import FullyShardedDataParallel as FSDP, CPUOffload

        def lora_auto_wrap_policy(module, recurse, nonwrapped_numel=0):
            return any(p.requires_grad for p in module.parameters(recurse=False))

        model = FSDP(
            model,
            auto_wrap_policy=lora_auto_wrap_policy,
            cpu_offload=CPUOffload(offload_params=True),
            use_orig_params=True
        )
        log_rank_info("After Wrapping with FSDP")

    # Initialize gradient scaler only if not using DeepSpeed
    if not use_deepspeed:
        scaler = GradScaler()

    # Training loop
    if use_deepspeed:
        model.train()
        for epoch in range(num_epochs):
            sampler.set_epoch(epoch)
            for batch in dataloader:
                # Ensure batch tensors are on the correct device.
                batch = {k: v.to(torch.device("cuda", local_rank)) for k, v in batch.items()}
                optimizer.zero_grad()
                outputs = model(**batch)
                loss = outputs.loss
                model.backward(loss)
                optimizer.step()
                if scheduler is not None:
                    scheduler.step()
                # Log progress with global and local rank.
                if dist.get_rank() == 0:
                    print(f"[Epoch {epoch} | Global Rank {dist.get_rank()} | Local Rank {local_rank}]: Loss: {loss.item()}")
    else:
        model.train()
        for epoch in range(num_epochs):
            sampler.set_epoch(epoch)
            for batch in dataloader:
                batch = {k: v.to(torch.device("cuda", local_rank)) for k, v in batch.items()}
                optimizer.zero_grad()
                with autocast():
                    outputs = model(**batch)
                    loss = outputs.loss
                scaler.scale(loss).backward()
                scaler.step(optimizer)
                scaler.update()
                scheduler.step()
                if dist.get_rank() == 0:
                    print(f"[Epoch {epoch} | Global Rank {dist.get_rank()} | Local Rank {local_rank}]: Loss: {loss.item()}")

    # Only global rank 0 should save and upload the model
    if dist.get_rank() == 0:
        os.makedirs(output_dir, exist_ok=True)
        model_to_save = model.module if hasattr(model, "module") else model
        model_to_save.save_pretrained(output_dir)
        tokenizer.save_pretrained(output_dir)
        print("Model saved locally.")

        s3_bucket = os.environ.get("AWS_S3_BUCKET") or "feast-kfto-demo"
        s3_prefix = os.environ.get("AWS_S3_PREFIX") or "models/fine-tuned-granite"
        aws_access_key = os.environ.get("AWS_ACCESS_KEY_ID")
        aws_secret_key = os.environ.get("AWS_SECRET_ACCESS_KEY")
        aws_region = os.environ.get("AWS_DEFAULT_REGION")
        creds = [s3_bucket, aws_access_key, aws_secret_key, aws_region]
        # Do not print the credentials themselves; they would end up in the pod logs.
        if not all(creds):
            print("S3 credentials (AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, AWS_DEFAULT_REGION) not provided. Skipping upload to S3.")
        else:
            s3_client = boto3.client(
                "s3",
                aws_access_key_id=aws_access_key,
                aws_secret_access_key=aws_secret_key,
                region_name=aws_region,
            )
            ensure_bucket_exists(s3_client, s3_bucket, aws_region)
            print(f"Uploading model to s3://{s3_bucket}/{s3_prefix} ...")
            upload_directory_to_s3(output_dir, s3_bucket, s3_prefix, s3_client)

    # Cleanup the distributed training process
    dist.destroy_process_group()
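
A note on the schedule arithmetic in `train_and_upload`: `num_training_steps` is `len(dataloader) * num_epochs`, and `DistributedSampler` first splits the dataset across ranks. With the five-example dataset, two ranks, and the default `BATCH_SIZE` of 2, this gives far fewer optimizer steps than the 100 warmup steps. The sketch below reproduces the arithmetic in plain Python. It assumes one process per worker, `drop_last=False` (the default for both `DistributedSampler` and `DataLoader`), and the linear warmup rule `lr * step / warmup_steps`; the helper names are hypothetical and the decay phase after warmup is omitted.

```python
import math


def steps_per_epoch(num_samples: int, world_size: int, batch_size: int) -> int:
    """Batches each rank sees per epoch with DistributedSampler and drop_last=False."""
    samples_per_rank = math.ceil(num_samples / world_size)
    return math.ceil(samples_per_rank / batch_size)


def warmup_lr(base_lr: float, step: int, warmup_steps: int) -> float:
    """Learning rate during linear warmup, after `step` scheduler steps."""
    return base_lr * min(1.0, step / max(1, warmup_steps))


# 5 samples, 2 ranks, batch size 2, NUM_EPOCHS=3
total_steps = steps_per_epoch(num_samples=5, world_size=2, batch_size=2) * 3
print(total_steps)  # 6
print(warmup_lr(5e-5, total_steps, warmup_steps=100))
```

Under these assumptions the run finishes at about 6% of the configured learning rate wherever the scheduler is actually stepped, so for datasets this small it may be worth lowering `num_warmup_steps`.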

### 9. API Client Initialization with kubeflow-training: Setting Up OpenShift Authentication with Token

In [35]:
import sys
from kubeflow.training import TrainingClient
from kubernetes import client
import time

In [38]:
# Configure the API client with the server and token
configuration = client.Configuration()
configuration.host = openshift_api_url
configuration.api_key = {"authorization": f"Bearer {token}"}
configuration.verify_ssl = False  # Disable SSL verification

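# Initialize API client and TrainingClient with the configuration.
# Note: the assignment below rebinds `client` from the kubernetes module to the TrainingClient,
# so re-run the import cell before re-running this one.
api_client = client.ApiClient(configuration)
client = TrainingClient(client_configuration=api_client.configuration)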

print("Initialized API client and authorized using token-authentication successfully!")

Initialized API client and authorized using token-authentication successfully!


### 10. Job Submission Managed by OpenShift AI's Training Operator: Launching a PyTorch Distributed Training Job with LoRA/QLoRA and FSDP/DeepSpeed

In [None]:
import os
client.create_job(
    name="pytorch-distributed-trainer",
    namespace=namespace,
    train_func=train_and_upload,  # The training function
    num_workers=2,
    resources_per_worker={"gpu": num_gpus},
    base_image=training_image,
    packages_to_install=[
        "transformers~=4.46.0",
        "boto3~=1.26.0",
        "datasets~=2.21.0",
        "deepspeed~=0.16.5",
        "peft~=0.4.0"
    ],
    env_vars={
       "NCCL_DEBUG": "INFO",
       "TORCH_DISTRIBUTED_DEBUG": "DETAIL",
       "MODEL_NAME": "ibm-granite/granite-3.0-1b-a400m-base",
       "OUTPUT_DIR": "fine-tuned-granite",
       "DATA_PATH": "/mnt/config/driver_stats_training.jsonl",
       "AWS_S3_PREFIX": "models/fine-tuned-granite",
       "AWS_S3_BUCKET": os.environ.get("AWS_S3_BUCKET"),
       "AWS_ACCESS_KEY_ID": os.environ.get("AWS_ACCESS_KEY_ID"),
       "AWS_SECRET_ACCESS_KEY": os.environ.get("AWS_SECRET_ACCESS_KEY"),
       "AWS_DEFAULT_REGION": os.environ.get("AWS_DEFAULT_REGION"),
       "HF_TOKEN": os.environ.get("HF_TOKEN"),
       "USE_DEEPSPEED": "true", # Whether to use DeepSpeed for distributed training. When 'false' uses FSDP by default.
       "USE_LORA": "true", # Whether to apply LoRA adapters in the standard (full‑precision) mode.
       "USE_QLORA":"false", # Whether to apply QLoRA, which loads the model in 4‑bit quantized mode and then applies LoRA adapters.
    }, 
    # labels={"kueue.x-k8s.io/queue-name": "<LOCAL_QUEUE_NAME>"}, # Optional: Add local queue name and uncomment these lines if using Kueue for resource management
    volume_mounts=[
        {
           "name": "config-volume",
           "mountPath": "/mnt/config"  # Directory where training dataset files will be available
        }
    ],
    volumes=[
       {
           "name": "config-volume",
           "configMap": {"name": "training-config"}  # Reference ConfigMap consisting training dataset file
       }
    ]
)
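
The three `USE_*` variables are parsed inside `train_and_upload` as case-insensitive booleans, and enabling QLoRA switches DeepSpeed on as well. The sketch below restates that logic as a small standalone function so the effective mode can be checked before a job is submitted. `env_flag`, `resolve_mode`, and the returned labels are hypothetical names introduced for illustration.

```python
def env_flag(env: dict, name: str) -> bool:
    """Parse a flag the way the training function does: 'true' or '1', case-insensitive."""
    return env.get(name, "false").lower() in ("true", "1")


def resolve_mode(env: dict) -> dict:
    """Return the effective adapter and parallelism strategy for a set of env vars."""
    use_lora = env_flag(env, "USE_LORA")
    use_qlora = env_flag(env, "USE_QLORA")
    # QLoRA forces DeepSpeed on, mirroring the check in train_and_upload.
    use_deepspeed = env_flag(env, "USE_DEEPSPEED") or use_qlora
    adapter = "qlora" if use_qlora else "lora" if use_lora else "none"
    return {"adapter": adapter, "parallelism": "deepspeed" if use_deepspeed else "fsdp"}


# The configuration submitted above: LoRA adapters trained with DeepSpeed.
print(resolve_mode({"USE_DEEPSPEED": "true", "USE_LORA": "true", "USE_QLORA": "false"}))
# QLoRA alone still ends up on DeepSpeed.
print(resolve_mode({"USE_QLORA": "True"}))
# No flags set: full fine-tuning wrapped with FSDP.
print(resolve_mode({}))
```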

### 11. Related methods

In [None]:
print("waiting...")
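# Poll until the job succeeds; stop early if it fails so the loop cannot spin forever.
while not client.is_job_succeeded(name="pytorch-distributed-trainer", namespace=namespace):
    if client.is_job_failed(name="pytorch-distributed-trainer", namespace=namespace):
        raise RuntimeError("PyTorchJob failed; check the pod logs for details.")
    time.sleep(1)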
print("PytorchJob Succeeded!")

waiting...
PytorchJob Succeeded!
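
The polling loop above has no upper bound on how long it waits. A bounded variant is sketched below as a generic helper. `wait_until`, its parameters, and the stand-in predicate are hypothetical and independent of the Kubeflow SDK, so the same pattern can wrap `client.is_job_succeeded(...)` or any other status check.

```python
import time


def wait_until(predicate, timeout_s: float, interval_s: float = 1.0) -> bool:
    """Call `predicate` every `interval_s` seconds until it returns True or time runs out."""
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        if predicate():
            return True
        time.sleep(interval_s)
    return False


# Stand-in status check that reports success on the third call.
calls = {"n": 0}


def fake_job_succeeded() -> bool:
    calls["n"] += 1
    return calls["n"] >= 3


print(wait_until(fake_job_succeeded, timeout_s=5, interval_s=0.01))  # True
print(wait_until(lambda: False, timeout_s=0.05, interval_s=0.01))    # False
```

In this notebook the predicate could be `lambda: client.is_job_succeeded(name="pytorch-distributed-trainer", namespace=namespace)` with a timeout sized to the expected training time.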


In [94]:
# Get pod names for the Training Job.
client.get_job_pod_names("pytorch-distributed-trainer", namespace)

['pytorch-distributed-trainer-master-0',
 'pytorch-distributed-trainer-worker-0']

In [None]:
# Get the logs from the PyTorchJob
client.get_job_logs("pytorch-distributed-trainer", namespace, "PyTorchJob")

In [None]:
# Delete the Training Job
client.delete_job("pytorch-distributed-trainer", namespace)