# 📓 The GenAI Revolution Cookbook

**Title:** Mastering Domain-Specific LLM Customization: Techniques and Tools Unveiled

**Description:** Discover how to tailor Large Language Models for specific domains using Retrieval-Augmented Generation, fine-tuning, and prompt engineering to boost relevance and accuracy.

---

*This Jupyter notebook contains executable code examples. Run the cells below to try the code yourself.*



# Introduction

Adapting a general-purpose language model to a specific domain takes more than a single training run: the data has to be cleaned, the model fine-tuned and evaluated, and its mistakes fed back into the next iteration. This tutorial walks through that workflow with Hugging Face Transformers, from setting up the environment and preparing domain-specific data to fine-tuning a pre-trained model, evaluating it, and implementing a human-in-the-loop feedback system for continuous improvement. The setup also installs LangChain and ChromaDB, which are commonly paired with Retrieval-Augmented Generation, although the walkthrough itself centers on fine-tuning.

# Setup & Installation

To begin, we need to install the necessary libraries. Ensure you are using a virtual environment to avoid dependency conflicts.

In [None]:
# Install core libraries for LLM customization
# transformers: Hugging Face library for pre-trained models and fine-tuning
# langchain: Framework for building LLM-powered applications with RAG capabilities
# chromadb: Vector database for efficient similarity search and retrieval

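# Quote each requirement so the shell does not treat ">=" as an output redirect
!pip install "transformers>=4.30.0"
!pip install "langchain>=0.1.0"
!pip install "chromadb>=0.4.0"

# The later cells also import these packages (Trainer relies on accelerate in recent transformers releases)
!pip install datasets scikit-learn pandas torch accelerate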

# Note: Consider using a virtual environment to avoid dependency conflicts
# Run: python -m venv llm_env && source llm_env/bin/activate (Linux/Mac)
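
After installing, it can help to confirm which versions actually ended up in the environment. The check below is a minimal sketch using only the standard library; the `REQUIRED` dictionary and the `installed_versions` helper are illustrative names, not part of any of the installed packages.

```python
from importlib import metadata

# Minimum versions taken from the install cell above
REQUIRED = {"transformers": "4.30.0", "langchain": "0.1.0", "chromadb": "0.4.0"}

def installed_versions(packages):
    """Return {package: version string, or None if the package is not installed}."""
    found = {}
    for name in packages:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found

for name, version in installed_versions(REQUIRED).items():
    status = version if version is not None else "NOT INSTALLED"
    print(f"{name}: {status} (tutorial expects >= {REQUIRED[name]})")
```

The sketch only reports versions; it does not compare them against the minimums, so read the output before moving on.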

# Step-by-Step Walkthrough

## Data Collection and Preparation

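First, we need to load and preprocess our domain-specific data. The function below drops rows with missing text, lowercases and trims each entry, and removes duplicates, so the model is trained on clean and relevant data. Lowercasing suits uncased models such as `bert-base-uncased`; skip it if you fine-tune a cased model.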

In [None]:
# Purpose: Load and preprocess domain-specific data for LLM training
import pandas as pd
import logging

# Configure logging for tracking data processing steps
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

def preprocess_domain_data(file_path, text_column='text'):
    """
    Load and preprocess domain-specific dataset for LLM training.
    
    Args:
        file_path (str): Path to the CSV file containing domain data
        text_column (str): Name of the column containing text data
    
    Returns:
        pd.DataFrame: Preprocessed dataframe with cleaned text
    
    Raises:
        FileNotFoundError: If the specified file doesn't exist
        KeyError: If the text column is not found in the dataset
    """
    try:
        # Load the dataset from CSV file
        logger.info(f"Loading dataset from {file_path}")
        data = pd.read_csv(file_path)
        
        # Validate that the text column exists
        if text_column not in data.columns:
            raise KeyError(f"Column '{text_column}' not found in dataset")
        
        # Remove rows with missing text values to ensure data quality
        initial_rows = len(data)
        data = data.dropna(subset=[text_column])
        logger.info(f"Removed {initial_rows - len(data)} rows with missing text")
        
        # Normalize text: convert to lowercase and remove leading/trailing whitespace
        # (cast to str first so non-string values such as numbers do not raise)
        data[text_column] = data[text_column].astype(str).str.lower().str.strip()
        
        # Remove duplicate entries to prevent model overfitting on repeated examples
        data = data.drop_duplicates(subset=[text_column])
        logger.info(f"Final dataset size: {len(data)} rows")
        
        return data
    
    except FileNotFoundError:
        logger.error(f"File not found: {file_path}")
        raise
    except Exception as e:
        logger.error(f"Error during preprocessing: {str(e)}")
        raise

# Example usage ('domain_specific_data.csv' is a placeholder; point it at your own CSV file)
data = preprocess_domain_data('domain_specific_data.csv')
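
The fine-tuning step in the next section expects separate training and evaluation sets, each with a `label` column. One way to produce them is a stratified split, which keeps each label's share roughly the same in both sets. The helper below is a minimal standard-library sketch under that assumption; `stratified_split_indices`, its `eval_fraction` default, and the toy labels are all hypothetical.

```python
import random
from collections import defaultdict

def stratified_split_indices(labels, eval_fraction=0.2, seed=42):
    """Split row positions into train/eval lists, keeping each label's share roughly equal."""
    by_label = defaultdict(list)
    for position, label in enumerate(labels):
        by_label[label].append(position)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train_idx, eval_idx = [], []
    for positions in by_label.values():
        rng.shuffle(positions)
        # At least one eval example per label, unless the label has a single row
        n_eval = max(1, round(len(positions) * eval_fraction)) if len(positions) > 1 else 0
        eval_idx.extend(positions[:n_eval])
        train_idx.extend(positions[n_eval:])
    return sorted(train_idx), sorted(eval_idx)

# Toy labels: 8 negatives and 2 positives (hypothetical data)
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
train_idx, eval_idx = stratified_split_indices(labels)
print(len(train_idx), len(eval_idx))
```

With a DataFrame, the positions can be applied as `data.iloc[train_idx]` and `data.iloc[eval_idx]` after building `labels` from `data['label'].tolist()`. scikit-learn's `train_test_split` with its `stratify` argument is a ready-made alternative.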

## Model Training and Fine-Tuning

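With our data prepared, we can now fine-tune a pre-trained model using Hugging Face Transformers. The example fine-tunes `bert-base-uncased` as a sequence classifier, so the data needs a `label` column alongside `text`. The `Trainer` below references `compute_metrics`, which is defined in the Evaluation and Optimization section; run that cell before calling `prepare_and_train_model`.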

In [None]:
# Purpose: Fine-tune a pre-trained model on domain-specific data
from transformers import (
    Trainer, 
    TrainingArguments, 
    AutoModelForSequenceClassification,
    AutoTokenizer,
    DataCollatorWithPadding
)
from datasets import Dataset
import torch

def prepare_and_train_model(train_data, eval_data, num_labels=2, model_name='bert-base-uncased'):
    """
    Prepare datasets, tokenize, and fine-tune a pre-trained model.
    
    Args:
        train_data (pd.DataFrame): Training dataset with 'text' and 'label' columns
        eval_data (pd.DataFrame): Evaluation dataset with 'text' and 'label' columns
        num_labels (int): Number of classification labels
        model_name (str): Hugging Face model identifier
    
    Returns:
        Trainer: Trained model trainer object
    
    Raises:
        ValueError: If datasets are empty or missing required columns
    """
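    # Validate input data
    if train_data.empty or eval_data.empty:
        raise ValueError("Training or evaluation data is empty")
    for name, frame in (("train_data", train_data), ("eval_data", eval_data)):
        missing = {'text', 'label'} - set(frame.columns)
        if missing:
            raise ValueError(f"{name} is missing required columns: {sorted(missing)}")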
    
    # Load tokenizer for the specified pre-trained model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    
    def tokenize_function(examples):
        """
        Tokenize text examples with truncation; padding is applied per batch by the data collator.
        
        Args:
            examples (dict): Batch of text examples
        
        Returns:
            dict: Tokenized examples with input_ids, attention_mask, etc.
        """
        return tokenizer(examples['text'], truncation=True, max_length=512)
    
    # Convert pandas DataFrames to Hugging Face Dataset format
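    # preserve_index=False keeps the pandas index from becoming an extra dataset column
    train_dataset = Dataset.from_pandas(train_data[['text', 'label']], preserve_index=False)
    eval_dataset = Dataset.from_pandas(eval_data[['text', 'label']], preserve_index=False)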
    
    # Apply tokenization to all examples in the datasets
    train_dataset = train_dataset.map(tokenize_function, batched=True)
    eval_dataset = eval_dataset.map(tokenize_function, batched=True)
    
    # Load pre-trained model with classification head for domain-specific task
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, 
        num_labels=num_labels
    )
    
    # Data collator handles dynamic padding to the longest sequence in each batch
    data_collator = DataCollatorWithPadding(tokenizer=tokenizer)
    
    # Define training hyperparameters
    training_args = TrainingArguments(
        output_dir='./results',              # Directory to save model checkpoints
        num_train_epochs=3,                  # Number of complete passes through training data
        per_device_train_batch_size=16,      # Batch size per GPU/CPU (adjust based on memory)
        per_device_eval_batch_size=32,       # Larger batch for evaluation (no gradients needed)
        learning_rate=2e-5,                  # Learning rate (2e-5 is standard for BERT fine-tuning)
        weight_decay=0.01,                   # L2 regularization to prevent overfitting
        evaluation_strategy="epoch",         # Evaluate after each epoch (newer transformers releases name this eval_strategy)
        save_strategy="epoch",               # Save checkpoint after each epoch
        load_best_model_at_end=True,         # Load best model based on evaluation metric
        metric_for_best_model="f1",          # Use F1 score to determine best model
        logging_dir='./logs',                # Directory for training logs
        logging_steps=100,                   # Log metrics every 100 steps
        warmup_steps=500,                    # Gradual learning rate warmup for stability
        fp16=torch.cuda.is_available(),      # Use mixed precision training if GPU available
    )
    
    # Initialize Trainer with model, datasets, and training configuration
    trainer = Trainer(
        model=model,
        args=training_args,
        train_dataset=train_dataset,
        eval_dataset=eval_dataset,
        tokenizer=tokenizer,                 # Newer transformers releases name this argument processing_class
        data_collator=data_collator,
        compute_metrics=compute_metrics,     # Metrics function defined in the evaluation cell; run that cell first
    )
    
    # Start training process
    trainer.train()
    
    return trainer

# Example usage (assuming train_data and eval_data are prepared DataFrames)
# trainer = prepare_and_train_model(train_data, eval_data)

For a comprehensive guide on fine-tuning these models, see our article on [mastering fine-tuning of large language models with Hugging Face](/blog/44830763/mastering-fine-tuning-of-large-language-models-with-hugging-face).
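
The hyperparameters above interact with dataset size. With `warmup_steps=500`, a small dataset may finish training before the learning rate ever reaches its peak. The arithmetic below is a rough sketch of that check, assuming a single device and no gradient accumulation; `training_step_budget` is a hypothetical helper, not a Transformers function.

```python
import math

def training_step_budget(num_examples, batch_size=16, epochs=3, warmup_steps=500, num_devices=1):
    """Estimate optimizer steps for a run and the share of them spent warming up."""
    steps_per_epoch = math.ceil(num_examples / (batch_size * num_devices))
    total_steps = steps_per_epoch * epochs
    return {
        "steps_per_epoch": steps_per_epoch,
        "total_steps": total_steps,
        # Capped at 1.0: warmup longer than the run means the peak rate is never reached
        "warmup_fraction": min(1.0, warmup_steps / total_steps),
    }

print(training_step_budget(2_000))    # small dataset: warmup covers the whole run
print(training_step_budget(50_000))   # larger dataset: warmup is a small share of the run
```

If the warmup fraction comes out close to 1.0 for your dataset, lowering `warmup_steps` is a reasonable first adjustment.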

## Evaluation and Optimization

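After training, evaluate the model on held-out data before deciding whether it is ready for deployment. `compute_metrics` below reports accuracy, precision, recall, and F1 at each evaluation during training, and `evaluate_model_performance` adds a confusion matrix for a test set.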

In [None]:
# Purpose: Compute evaluation metrics for model performance assessment
from sklearn.metrics import (
    accuracy_score, 
    precision_recall_fscore_support,
    confusion_matrix,
    classification_report
)
import numpy as np
import logging

logger = logging.getLogger(__name__)

def compute_metrics(pred):
    """
    Compute comprehensive evaluation metrics for model predictions.
    
    This function calculates accuracy, precision, recall, and F1 score
    to assess model performance on domain-specific tasks.
    
    Args:
        pred (EvalPrediction): Object containing predictions and labels
            - predictions: Model output logits (shape: [batch_size, num_labels])
            - label_ids: Ground truth labels (shape: [batch_size])
    
    Returns:
        dict: Dictionary containing computed metrics:
            - accuracy: Overall classification accuracy
            - f1: F1 score (harmonic mean of precision and recall)
            - precision: Ratio of true positives to predicted positives
            - recall: Ratio of true positives to actual positives
    
    Raises:
        ValueError: If predictions and labels have mismatched shapes
    """
    labels = pred.label_ids
    preds = pred.predictions.argmax(-1)
    
    if len(preds) != len(labels):
        raise ValueError(f"Predictions ({len(preds)}) and labels ({len(labels)}) length mismatch")
    
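    # Infer the number of classes from the logits rather than from the labels present,
    # so an evaluation split that happens to lack a class does not switch the averaging mode
    num_classes = pred.predictions.shape[-1]
    average_method = 'binary' if num_classes == 2 else 'weighted'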
    
    precision, recall, f1, _ = precision_recall_fscore_support(
        labels, 
        preds, 
        average=average_method,
        zero_division=0
    )
    
    acc = accuracy_score(labels, preds)
    
    logger.info("\nClassification Report:")
    logger.info(classification_report(labels, preds))
    
    return {
        'accuracy': float(acc),
        'f1': float(f1),
        'precision': float(precision),
        'recall': float(recall)
    }

def evaluate_model_performance(trainer, test_dataset):
    """
    Perform comprehensive model evaluation on test dataset.
    
    Args:
        trainer (Trainer): Trained model trainer object
        test_dataset (Dataset): Test dataset for evaluation
    
    Returns:
        dict: Evaluation metrics and confusion matrix
    """
    eval_results = trainer.evaluate(test_dataset)
    
    predictions = trainer.predict(test_dataset)
    preds = predictions.predictions.argmax(-1)
    labels = predictions.label_ids
    
    cm = confusion_matrix(labels, preds)
    
    logger.info(f"\nConfusion Matrix:\n{cm}")
    logger.info(f"Evaluation Results: {eval_results}")
    
    return {
        'metrics': eval_results,
        'confusion_matrix': cm
    }

# Example usage
# results = evaluate_model_performance(trainer, test_dataset)
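
To make the metric definitions in the `compute_metrics` docstring concrete, the sketch below computes them by hand for a binary task from raw counts. It is an illustration only, with a hypothetical helper name and toy labels; the notebook itself relies on scikit-learn for these numbers.

```python
def binary_metrics(labels, preds, positive=1):
    """Compute accuracy, precision, recall, and F1 for one positive class from raw counts."""
    tp = sum(1 for y, p in zip(labels, preds) if y == positive and p == positive)
    fp = sum(1 for y, p in zip(labels, preds) if y != positive and p == positive)
    fn = sum(1 for y, p in zip(labels, preds) if y == positive and p != positive)
    correct = sum(1 for y, p in zip(labels, preds) if y == p)

    precision = tp / (tp + fp) if tp + fp else 0.0   # 0.0 when nothing was predicted positive
    recall = tp / (tp + fn) if tp + fn else 0.0      # 0.0 when there are no actual positives
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": correct / len(labels), "precision": precision, "recall": recall, "f1": f1}

# Toy example: 4 actual positives, 6 actual negatives (hypothetical data)
labels = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
preds  = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
print(binary_metrics(labels, preds))
```

Here there are 3 true positives, 1 false positive, and 1 false negative, so precision, recall, and F1 all come to 0.75 while accuracy is 0.8.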

## Incorporating Human-in-the-Loop Feedback

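To ensure continuous improvement, we implement a human-in-the-loop feedback system. The `FeedbackLoop` class below records expert corrections, extracts them as retraining examples, and selects low-confidence predictions for expert review as a simple active learning strategy.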

In [None]:
# Purpose: Implement human-in-the-loop feedback system for continuous model improvement
import json
from datetime import datetime
from typing import List, Dict, Any
import logging

logger = logging.getLogger(__name__)

class FeedbackLoop:
    """
    Human-in-the-loop feedback system for iterative model refinement.
    
    This class manages the collection of expert feedback, stores corrections,
    and facilitates model retraining with improved data.
    """
    
    def __init__(self, feedback_file='feedback_log.json'):
        """
        Initialize feedback loop system.
        
        Args:
            feedback_file (str): Path to JSON file for storing feedback history
        """
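        self.feedback_file = feedback_file
        self.feedback_history = []
        # Reload earlier feedback so a new session appends to the log instead of overwriting it
        try:
            with open(feedback_file) as f:
                self.feedback_history = json.load(f)
        except FileNotFoundError:
            pass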
        
    def collect_feedback(self, model_input: str, model_output: str, 
                        expert_correction: str, confidence_score: float) -> Dict[str, Any]:
        """
        Collect and store expert feedback on model predictions.
        
        Args:
            model_input (str): Original input text to the model
            model_output (str): Model's predicted output
            expert_correction (str): Expert's corrected output
            confidence_score (float): Model's confidence in its prediction (0-1)
        
        Returns:
            dict: Feedback entry with metadata
        """
        feedback_entry = {
            'timestamp': datetime.now().isoformat(),
            'input': model_input,
            'model_output': model_output,
            'expert_correction': expert_correction,
            'confidence_score': confidence_score,
            'requires_retraining': model_output != expert_correction
        }
        
        self.feedback_history.append(feedback_entry)
        self._save_feedback()
        
        logger.info(f"Feedback collected: {'Correction needed' if feedback_entry['requires_retraining'] else 'Confirmed correct'}")
        
        return feedback_entry
    
    def _save_feedback(self):
        """Save feedback history to JSON file for persistence."""
        try:
            with open(self.feedback_file, 'w') as f:
                json.dump(self.feedback_history, f, indent=2)
        except IOError as e:
            logger.error(f"Failed to save feedback: {str(e)}")
    
    def get_correction_dataset(self, min_confidence_threshold: float = 0.7) -> List[Dict[str, str]]:
        """
        Extract corrections for model retraining, prioritizing high-confidence errors.
        
        Args:
            min_confidence_threshold (float): Only include corrections where model
                                             confidence was above this threshold
                                             (indicates systematic errors)
        
        Returns:
            list: Training examples with corrected labels
        """
        corrections = [
            {
                'text': entry['input'],
                'label': entry['expert_correction'],
                'original_prediction': entry['model_output'],
                'confidence': entry['confidence_score']
            }
            for entry in self.feedback_history
            if entry['requires_retraining'] and entry['confidence_score'] >= min_confidence_threshold
        ]
        
        logger.info(f"Extracted {len(corrections)} high-confidence corrections for retraining")
        
        return corrections
    
    def active_learning_selection(self, predictions: List[Dict[str, Any]], 
                                  uncertainty_threshold: float = 0.5,
                                  sample_size: int = 100) -> List[Dict[str, Any]]:
        """
        Select uncertain predictions for expert review using active learning.
        
        This method identifies predictions where the model is least confident,
        prioritizing them for human review to maximize learning efficiency.
        
        Args:
            predictions (list): List of model predictions with confidence scores
            uncertainty_threshold (float): Confidence threshold below which to flag
            sample_size (int): Maximum number of samples to select for review
        
        Returns:
            list: Selected samples for expert review
        """
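        # Keep predictions below the threshold, least confident first
        uncertain_samples = sorted(
            (pred for pred in predictions if pred['confidence'] < uncertainty_threshold),
            key=lambda pred: pred['confidence']
        )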
        
        logger.info(f"Selected {min(len(uncertain_samples), sample_size)} samples for active learning review")
        
        return uncertain_samples[:sample_size]

# Example usage
# feedback_loop = FeedbackLoop()
# feedback_entry = feedback_loop.collect_feedback("input text", "model output", "corrected output", 0.6)
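
`collect_feedback` takes a `confidence_score`, but the notebook does not show where that number comes from. A common, if rough, choice is the top softmax probability of the classifier's logits. The sketch below assumes that choice; `prediction_with_confidence` and the `id2label` mapping are hypothetical names used for illustration.

```python
import math

def softmax(logits):
    """Convert raw logits into probabilities that sum to 1."""
    peak = max(logits)                                # subtract the max for numerical stability
    exps = [math.exp(value - peak) for value in logits]
    total = sum(exps)
    return [value / total for value in exps]

def prediction_with_confidence(logits, id2label):
    """Return (predicted label, top-class probability) for one example's logits."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

id2label = {0: "negative", 1: "positive"}            # hypothetical label names
label, confidence = prediction_with_confidence([2.0, 0.0], id2label)
print(label, round(confidence, 4))
```

The resulting pair can be passed as `model_output` and `confidence_score` to `collect_feedback`. Softmax probabilities from fine-tuned classifiers tend to be overconfident, so treat the thresholds in `FeedbackLoop` as starting points to tune.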

# Conclusion

In this tutorial, we walked through customizing a pre-trained model for a specific domain: setting up the environment, preparing data, fine-tuning with Hugging Face Transformers, evaluating the result, and incorporating human feedback. Each step feeds the next, and the feedback loop turns expert corrections into new training data so the model can keep improving after its first release. Consider exploring further extensions such as integrating additional data sources or deploying models on cloud platforms for scalability.