# LLM Model Exploration

This notebook demonstrates how to use the LLM Playground for exploring and experimenting with different Large Language Models (LLMs).

## Setup

First, add the project root to the Python path and import the libraries we need.

In [None]:
import sys
import os

# Add the project root to the path.
# This assumes the notebook is run from a directory one level below the project root.
project_root = os.path.abspath(os.path.join(os.getcwd(), '..'))
if project_root not in sys.path:
    sys.path.append(project_root)

# Import the model class
from src.inference.model import LLMModel

# Import other useful libraries
import torch
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import time

## Loading a Model

Before loading a model, check whether a CUDA-capable GPU is available.

In [None]:
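# Check if CUDA is available
print(f"CUDA available: {torch.cuda.is_available()}")
if torch.cuda.is_available():
    print(f"GPU: {torch.cuda.get_device_name(0)}")
    # Total memory physically present on the device
    total_memory = torch.cuda.get_device_properties(0).total_memory
    print(f"Total memory: {total_memory / 1e9:.2f} GB")
    # These two values cover only memory held by PyTorch in this process
    print(f"Memory allocated: {torch.cuda.memory_allocated(0) / 1e9:.2f} GB")
    print(f"Memory reserved: {torch.cuda.memory_reserved(0) / 1e9:.2f} GB")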

### Note on Model Access

To use actual models from Hugging Face, you need:
1. A Hugging Face account
2. Authentication via `huggingface-cli login`
3. Access to the models (some require explicit approval)

For this notebook, we'll use mock mode to demonstrate functionality without actually loading models.

In [None]:
# Define the model we want to use
model_name = "mistralai/Mistral-7B-Instruct-v0.2"  # You can change this to another model

# Load the model in mock mode for demonstration
print(f"Loading model {model_name} in mock mode...")
model = LLMModel(
    model_name_or_path=model_name,
    use_half_precision=True,  # Use FP16 for faster inference and less memory usage
    mock_mode=True  # Use mock mode to avoid actual model loading
)
print("Model 'loaded' successfully (in mock mode)!")

## Basic Text Generation

Now let's demonstrate text generation with our model in mock mode.

In [None]:
# Test with a simple prompt
simple_prompt = "What is machine learning?"
print(f"Prompt: {simple_prompt}")

# perf_counter() is a monotonic clock intended for measuring elapsed time
start_time = time.perf_counter()
response = model.generate(simple_prompt, max_new_tokens=256)
end_time = time.perf_counter()

print(f"\nResponse (generated in {end_time - start_time:.2f} seconds):")
print(response)

## Demonstrating Different Parameters

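Mock mode does not run a real model, so the settings below only exercise the API. With a real model, generation parameters such as `temperature` and `max_new_tokens` change the output.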

In [None]:
# Create a function to generate text and measure time
def generate_and_measure(prompt, **kwargs):
    """Generate a response, print it with its settings, and return (response, seconds)."""
    # perf_counter() is a monotonic clock intended for measuring elapsed time
    start_time = time.perf_counter()
    response = model.generate(prompt, **kwargs)
    elapsed = time.perf_counter() - start_time

    print(f"\nGenerated in {elapsed:.2f} seconds with settings:")
    for k, v in kwargs.items():
        print(f"  {k}: {v}")
    print(f"\nResponse:\n{response}")

    return response, elapsed

In [None]:
creative_prompt = "Write a short poem about artificial intelligence and the future of humanity."
print(f"Prompt: {creative_prompt}")

# Demonstrate different temperature values
# Higher temperature (>1.0) = more random/creative
# Lower temperature (<1.0) = more deterministic/focused
responses = []

for temp in [0.3, 0.7, 1.2]:
    resp, time_taken = generate_and_measure(
        creative_prompt, 
        temperature=temp,
        max_new_tokens=200
    )
    responses.append({"temperature": temp, "response": resp, "time": time_taken})
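
To make the effect of temperature concrete, here is a minimal sketch of temperature-scaled softmax, the standard way sampling temperature reshapes next-token probabilities. It is independent of `LLMModel`: the function name `softmax_with_temperature` and the example scores are assumptions chosen for illustration, not part of the LLM Playground API.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical next-token scores for three candidate tokens.
example_logits = [2.0, 1.0, 0.1]

for temp in [0.3, 0.7, 1.2]:
    probs = softmax_with_temperature(example_logits, temp)
    formatted = ", ".join(f"{p:.3f}" for p in probs)
    print(f"temperature={temp}: {formatted}")
```

At a low temperature most of the probability mass sits on the highest-scoring token, which is why output becomes more deterministic. At a higher temperature the distribution flattens, so less likely tokens are sampled more often.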

## Using the Real Model

To use the real model instead of mock mode:

1. Authenticate with Hugging Face:
```python
!huggingface-cli login
```

2. Load the model without mock mode:
```python
model = LLMModel(
    model_name_or_path=model_name,
    use_half_precision=True  # Use FP16 for faster inference and less memory usage
)
```

Note that loading a real LLM requires substantial GPU memory and compute. In FP16, a 7-billion-parameter model such as the one above needs roughly 14 GB for its weights alone.
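
As a rough guide to whether a model will fit on your GPU, the weight memory can be estimated as the parameter count times the bytes per parameter. The sketch below assumes a round figure of 7 billion parameters for a "7B" model; the helper name `estimate_weight_memory_gb` is hypothetical, and the result excludes activations, the key-value cache, and framework overhead, so treat it as a lower bound.

```python
def estimate_weight_memory_gb(num_parameters, bytes_per_parameter):
    """Estimate the memory needed to hold the model weights, in gigabytes (1e9 bytes)."""
    return num_parameters * bytes_per_parameter / 1e9


# Assumption: a round 7 billion parameters; real "7B" models differ slightly.
approx_parameters = 7e9

for label, bytes_per_parameter in [("FP32", 4), ("FP16", 2)]:
    size_gb = estimate_weight_memory_gb(approx_parameters, bytes_per_parameter)
    print(f"{label}: about {size_gb:.0f} GB for weights alone")
```

Comparing this figure with the GPU's total memory gives a quick first check before attempting a real load.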

## Conclusion

This notebook demonstrated the structure and API of the LLM Playground. In a real setup with proper authentication and hardware, you would be able to:

1. Load and use actual LLMs from Hugging Face
2. Experiment with different prompting techniques
3. Compare performance across models
4. Fine-tune models on custom datasets

Check out the other components of this repository for more advanced usage examples.