# Phi-3 Mini Tool Calling Laboratory

## Introduction
Welcome to the Phi-3 Tool Calling Laboratory! In this session, we will explore how to install and use Microsoft's Phi-3-mini locally with function calling capabilities.

### Learning Objectives
By the end of this laboratory, you will be able to:
1. Install and run Phi-3-mini locally using Ollama or Hugging Face
2. Define custom tools/functions for the model to use
3. Implement tool calling workflows
4. Handle tool execution and response processing
5. Create complex multi-tool scenarios
6. Optimize performance for local deployment

### Why Phi-3-mini?
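- **Lightweight**: About 3GB of RAM when run as a 4-bit quantized build; unquantized weights need considerably more (see Prerequisites)
- **Tool Calling**: Follows structured instructions well enough to drive prompt-based function calling, as this lab demonstrates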
- **Educational**: Perfect for learning and experimentation
- **Open Source**: No API keys or gated access required

### Prerequisites
- Python 3.9+
- At least 8GB RAM (16GB+ recommended)
- Optional: CUDA-compatible GPU for faster inference

## Step 1: Setup and Import Libraries

First, let's import all necessary libraries and check our environment.

In [1]:
# Core libraries
import os
import sys
import json
import time
from typing import Dict, Any, List, Optional

# Add tools directory to path
sys.path.append('./tools')

# Import our custom tools
try:
    from custom_tools import (
        CalculatorTool, WeatherTool, FileOperationsTool, TextAnalysisTool,
        TOOL_SCHEMAS, execute_tool
    )
    from ollama_client import OllamaClient, format_tool_call_prompt, extract_tool_call
    print("✅ Custom tools imported successfully")
except ImportError as e:
    print(f"⚠️ Could not import custom tools: {e}")
    print("📝 Note: Some imports may fail if dependencies aren't installed yet")

# Check Python version
print(f"🐍 Python version: {sys.version}")
print(f"📁 Current working directory: {os.getcwd()}")

✅ Custom tools imported successfully
🐍 Python version: 3.10.14 (main, Jun 10 2024, 11:31:15) [Clang 15.0.0 (clang-1500.3.9.4)]
📁 Current working directory: /Volumes/Firestore/codeforge/rbs-nlp-rl-dm/llm-tool-calling


## Step 2: Choose Your Installation Method

We'll support two methods for running Phi-3-mini locally:
1. **Ollama** (Recommended - easier setup)
2. **Hugging Face Transformers** (More control)

Let's check what's available:

In [2]:
# Check if Ollama is available
def check_ollama():
    try:
        import requests
        response = requests.get("http://localhost:11434/api/tags", timeout=5)
        return response.status_code == 200
    except Exception:
        return False

# Check if transformers is available
def check_transformers():
    try:
        import transformers
        import torch
        return True
    except ImportError:
        return False

# Check available methods
ollama_available = check_ollama()
transformers_available = check_transformers()

print("🔍 Environment Check:")
print(f"  Ollama available: {'✅' if ollama_available else '❌'}")
print(f"  Transformers available: {'✅' if transformers_available else '❌'}")

if not ollama_available and not transformers_available:
    print("\n⚠️ Neither Ollama nor Transformers is available.")
    print("Please install one of them:")
    print("  Option 1: Install Ollama from https://ollama.ai/")
    print("  Option 2: pip install transformers torch")
elif ollama_available:
    print("\n🎉 Great! Ollama is running. We'll use that.")
    USE_OLLAMA = True
else:
    print("\n📦 Using Hugging Face Transformers.")
    USE_OLLAMA = False

🔍 Environment Check:
  Ollama available: ❌
  Transformers available: ✅

📦 Using Hugging Face Transformers.
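
The check above only confirms that the Ollama server answers; it does not confirm that a Phi-3 model has been pulled. As a rough sketch, the JSON returned by `/api/tags` can be inspected for a model whose name starts with `phi3`. The helper name `has_model` and the sample payload are illustrative assumptions. The payload shape (a `models` list of objects with a `name` field) is what the Ollama API is documented to return, so check it against your installed version.

```python
from typing import Any, Dict

def has_model(tags_payload: Dict[str, Any], prefix: str = "phi3") -> bool:
    """Return True if any model name in an /api/tags payload starts with prefix."""
    models = tags_payload.get("models", [])
    return any(str(m.get("name", "")).startswith(prefix) for m in models)

# Illustrative payload; a live one would come from
# requests.get("http://localhost:11434/api/tags").json()
sample_payload = {"models": [{"name": "llama3:latest"}, {"name": "phi3:mini"}]}
print(has_model(sample_payload))
```

If this returns `False` for a live payload, running `ollama pull phi3` in a terminal should download the model.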


## Step 3: Initialize the Model

Now let's initialize our chosen method for running Phi-3-mini:

In [4]:
if 'USE_OLLAMA' in locals() and USE_OLLAMA:
    # Using Ollama
    print("🚀 Initializing Ollama client...")
    
    try:
        client = OllamaClient()
        
        # Check available models
        models = client.list_models()
        print(f"📋 Available models: {models}")
        
        # Test connection
        test_response = client.generate("Hello! Please respond with 'Connection successful.'")
        print(f"🧪 Test response: {test_response[:100]}...")
        
        MODEL_INITIALIZED = True
        print("✅ Ollama client initialized successfully!")
        
    except Exception as e:
        print(f"❌ Error initializing Ollama: {e}")
        MODEL_INITIALIZED = False

else:
    # Using Hugging Face Transformers
    print("🚀 Initializing Hugging Face Transformers...")
    
    try:
        from transformers import AutoTokenizer, AutoModelForCausalLM
        import torch
        
        # Phi-3-mini is not a gated model, so downloading it needs no token.
        # Only report a token if one is already set in the environment.
        hf_token = os.getenv('HUGGINGFACE_HUB_TOKEN')
        
        if hf_token:
            print("✅ Using Hugging Face API key")
        
        # Using Phi-3-mini: lightweight model with excellent tool calling capabilities
        model_name = "microsoft/Phi-3-mini-4k-instruct"
        
        print(f"📥 Loading tokenizer from {model_name}...")
        print("🎯 Phi-3-mini is specifically designed for tool calling and instruction following")
        
        tokenizer = AutoTokenizer.from_pretrained(
            model_name, 
            trust_remote_code=True,
            use_fast=True
        )
        
        print(f"📥 Loading model from {model_name}...")
        print("⏰ This may take several minutes on first run...")
        
        # Use appropriate device
        device = "cuda" if torch.cuda.is_available() else "cpu"
        print(f"🖥️ Using device: {device}")
        
        # Load model with improved cache configuration
        model = AutoModelForCausalLM.from_pretrained(
            model_name,
            torch_dtype=torch.float16 if device == "cuda" else torch.float32,
            device_map="auto" if device == "cuda" else None,
            trust_remote_code=True,
            use_cache=True,
            attn_implementation="eager"  # Use eager attention to avoid cache issues
        )
        
        # Add padding token if it doesn't exist
        if tokenizer.pad_token is None:
            tokenizer.pad_token = tokenizer.eos_token
        
        # Test the model with simpler generation parameters
        test_input = tokenizer("Hello! Please respond with 'Model loaded successfully.'", return_tensors="pt")
        if device == "cuda":
            test_input = {k: v.to(device) for k, v in test_input.items()}
        
        with torch.no_grad():
            test_output = model.generate(
                **test_input, 
                max_new_tokens=20,  # Use max_new_tokens instead of max_length
                do_sample=True, 
                temperature=0.7,
                pad_token_id=tokenizer.pad_token_id,
                eos_token_id=tokenizer.eos_token_id,
                use_cache=False  # Disable cache for initial test
            )
        
        test_response = tokenizer.decode(test_output[0], skip_special_tokens=True)
        print(f"🧪 Test response: {test_response[:100]}...")
        
        MODEL_INITIALIZED = True
        print("✅ Hugging Face model initialized successfully!")
        
    except Exception as e:
        print(f"❌ Error initializing Phi-3-mini model: {e}")
        print("\n💡 Troubleshooting tips:")
        print("   1. Ensure you have sufficient memory (float32 weights on CPU need roughly 15GB RAM; float16 on GPU about 8GB)")
        print("   2. Check your internet connection for model downloads")
        print("   3. Try restarting the kernel if you see memory issues")
        print("   4. Make sure transformers>=4.36.0 is installed")
        print("   5. Try: pip install --upgrade transformers torch")
        MODEL_INITIALIZED = False

if not MODEL_INITIALIZED:
    print("\n⚠️ Model initialization failed. Please check the setup.")
    print("You can still run the tool definition examples below.")

🚀 Initializing Hugging Face Transformers...
✅ Using Hugging Face API key
📥 Loading tokenizer from microsoft/Phi-3-mini-4k-instruct...
🎯 Phi-3-mini is specifically designed for tool calling and instruction following
📥 Loading model from microsoft/Phi-3-mini-4k-instruct...
⏰ This may take several minutes on first run...
🖥️ Using device: cpu


Loading checkpoint shards:   0%|          | 0/2 [00:00<?, ?it/s]

You are not running the flash-attention implementation, expect numerical differences.


KeyboardInterrupt: 
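
The slow load above is consistent with the cell loading float32 weights on CPU. As a back-of-the-envelope sketch, weight memory can be estimated as parameter count times bytes per parameter. The figure of 3.8 billion parameters is the published size of Phi-3-mini; the helper name `weight_memory_gb` is hypothetical, and the estimate ignores activations and the KV cache, so treat the numbers as lower bounds.

```python
# Approximate weight memory: parameter count x bytes per parameter.
# Activations and the KV cache add overhead on top of these figures.
PHI3_MINI_PARAMS = 3.8e9

BYTES_PER_PARAM = {"float32": 4.0, "float16": 2.0, "int4": 0.5}

def weight_memory_gb(num_params: float, dtype: str) -> float:
    """Estimate the memory needed for model weights, in gigabytes (1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[dtype] / 1e9

for dtype in BYTES_PER_PARAM:
    print(f"{dtype}: ~{weight_memory_gb(PHI3_MINI_PARAMS, dtype):.1f} GB")
```

On a machine with 8GB of RAM, this suggests that a quantized build (for example through Ollama) is the more practical route than float32 on CPU.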

## Step 4: Define Tools and Functions

Let's explore the tools we've defined and understand how they work:

In [None]:
# Display available tools
print("🛠️ Available Tools:")
print("=" * 50)

for tool_name, tool_schema in TOOL_SCHEMAS.items():
    func_info = tool_schema['function']
    print(f"\n📋 {func_info['name']}")
    print(f"   Description: {func_info['description']}")
    print(f"   Parameters: {list(func_info['parameters']['properties'].keys())}")

print("\n" + "=" * 50)
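
The same schema fields that the loop above prints can also be turned into the tool list that goes into a prompt, so the prompt stays in sync with the registered tools. The following is a minimal sketch under stated assumptions: `render_tool_list` is a hypothetical helper, and `example_schema` is an illustrative entry shaped like the ones read above (the real `TOOL_SCHEMAS` lives in `custom_tools` and is not shown here).

```python
from typing import Any, Dict, List

def render_tool_list(schemas: List[Dict[str, Any]]) -> str:
    """Render OpenAI-style function schemas as one bullet line per tool."""
    lines = []
    for schema in schemas:
        func = schema["function"]
        params = func["parameters"]
        required = set(params.get("required", []))
        args = ", ".join(
            f"{name}: {spec.get('type', 'any')}" + ("" if name in required else " (optional)")
            for name, spec in params["properties"].items()
        )
        lines.append(f"- {func['name']}({args}): {func['description']}")
    return "\n".join(lines)

# Hypothetical schema shaped like the entries the loop above reads
example_schema = {
    "type": "function",
    "function": {
        "name": "calculator_add",
        "description": "Add two numbers",
        "parameters": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
    },
}
print(render_tool_list([example_schema]))
```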

In [None]:
# Test our tools manually
print("🧪 Testing Tools Manually:")
print("=" * 30)

# Test calculator
print("\n🔢 Calculator Test:")
result1 = execute_tool("calculator_add", {"a": 15, "b": 27})
print(f"  15 + 27 = {result1}")

result2 = execute_tool("calculator_multiply", {"a": 8, "b": 7})
print(f"  8 × 7 = {result2}")

# Test weather (mock)
print("\n🌤️ Weather Test:")
weather = execute_tool("get_weather", {"city": "Milan", "country": "IT"})
print(f"  Weather in Milan: {json.dumps(weather, indent=2)}")

# Test text analysis
print("\n📝 Text Analysis Test:")
sample_text = "Machine learning and artificial intelligence are transforming the way we work with data. Natural language processing enables computers to understand human language."
analysis = execute_tool("analyze_text", {"text": sample_text, "max_keywords": 5})
print(f"  Analysis results: {json.dumps(analysis, indent=2)}")

print("\n✅ All tools working correctly!")

## Step 5: Basic Tool Calling with Phi-3

Now let's implement tool calling with our Phi-3-mini model:

In [None]:
def process_user_request(user_message: str, available_tools: Optional[List[Dict]] = None) -> str:
    """Process a user request that might require tool calling"""
    
    if available_tools is None:
        available_tools = list(TOOL_SCHEMAS.values())
    
    print(f"👤 User: {user_message}")
    print("🤖 Processing...")
    
    if not MODEL_INITIALIZED:
        return "❌ Model not initialized. Cannot process request."
    
    try:
        if USE_OLLAMA:
            # Format prompt for tool calling
            prompt = format_tool_call_prompt(user_message, available_tools)
            
            # Get model response
            response = client.generate(prompt)
            print(f"🧠 Model response: {response}")
            
            # Check if the model wants to use a tool
            tool_call = extract_tool_call(response)
            
            if tool_call:
                print(f"🛠️ Tool call detected: {tool_call}")
                
                # Execute the tool
                tool_name = tool_call['name']
                parameters = tool_call['parameters']
                
                try:
                    tool_result = execute_tool(tool_name, parameters)
                    print(f"⚙️ Tool result: {tool_result}")
                    
                    # Generate final response incorporating tool result
                    final_prompt = f"""The user asked: {user_message}
                    
You used the tool '{tool_name}' with parameters {parameters} and got this result: {tool_result}
                    
Please provide a helpful response to the user incorporating this information:"""
                    
                    final_response = client.generate(final_prompt)
                    return final_response
                    
                except Exception as e:
                    return f"❌ Error executing tool: {e}"
            else:
                # No tool needed, return model response
                return response
                
        else:
            # Using Phi-3-mini with optimized prompt format
            prompt = f"""<|system|>
You are a helpful AI assistant with access to tools. When the user requests something that requires tools, respond with the appropriate tool call.

Available tools:
- calculator_add: Add two numbers
- calculator_multiply: Multiply two numbers  
- calculator_divide: Divide two numbers
- get_weather: Get weather information for a city
- analyze_text: Analyze text and extract keywords<|end|>
<|user|>
{user_message}<|end|>
<|assistant|>
"""
            
            inputs = tokenizer(prompt, return_tensors="pt")
            if torch.cuda.is_available():
                inputs = {k: v.to(device) for k, v in inputs.items()}
            
            with torch.no_grad():
                outputs = model.generate(
                    **inputs,
                    max_new_tokens=150,  # Use max_new_tokens instead of max_length
                    do_sample=True,
                    temperature=0.7,
                    top_p=0.9,
                    pad_token_id=tokenizer.eos_token_id,
                    eos_token_id=tokenizer.eos_token_id,
                    use_cache=False,  # Disable cache to avoid DynamicCache issues
                    repetition_penalty=1.1
                )
            
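            # Decode only the newly generated tokens. outputs[0] starts with the prompt,
            # and skip_special_tokens removes the chat markers, so splitting the text
            # on "<|assistant|>" is not a reliable way to drop the prompt.
            prompt_length = inputs["input_ids"].shape[1]
            response = tokenizer.decode(outputs[0][prompt_length:], skip_special_tokens=True).strip()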
            
            return response
            
    except Exception as e:
        return f"❌ Error processing request: {e}"

print("🎯 Tool calling system ready!")
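
The Ollama branch relies on `extract_tool_call`, whose implementation lives in `ollama_client` and is not shown in this notebook. As a hedged sketch of what such a parser could look like, the function below scans a model reply for the first JSON object that carries a `name` string and a `parameters` object. Both the function name `parse_tool_call` and the assumed JSON shape are illustrative and may differ from the lab's actual helper.

```python
import json
from typing import Any, Dict, Optional

def parse_tool_call(response: str) -> Optional[Dict[str, Any]]:
    """Return the first JSON object in response that has 'name' and 'parameters'."""
    decoder = json.JSONDecoder()
    start = response.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at index start
            candidate, _ = decoder.raw_decode(response, start)
        except json.JSONDecodeError:
            candidate = None
        if (
            isinstance(candidate, dict)
            and isinstance(candidate.get("name"), str)
            and isinstance(candidate.get("parameters"), dict)
        ):
            return candidate
        # Not a tool call here; try the next opening brace
        start = response.find("{", start + 1)
    return None

reply = 'Sure. {"name": "calculator_multiply", "parameters": {"a": 156, "b": 23}}'
print(parse_tool_call(reply))
print(parse_tool_call("No tool is needed for this question."))
```

Scanning for a JSON object instead of requiring the whole reply to be JSON tolerates the extra prose that small models often put around a tool call.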

## Step 6: Interactive Examples

Let's test our tool calling system with various requests:

In [None]:
# Example 1: Simple calculation
print("📊 Example 1: Calculation Request")
print("=" * 40)

response1 = process_user_request("What is 156 multiplied by 23?")
print(f"🤖 Response: {response1}")
print("\n")

In [None]:
# Example 2: Weather query
print("🌤️ Example 2: Weather Request")
print("=" * 40)

response2 = process_user_request("What's the weather like in Rome?")
print(f"🤖 Response: {response2}")
print("\n")

In [None]:
# Example 3: Text analysis
print("📝 Example 3: Text Analysis Request")
print("=" * 40)

text_to_analyze = """Artificial Intelligence is revolutionizing many industries. 
Machine learning algorithms can process vast amounts of data to identify patterns and make predictions. 
Deep learning, a subset of machine learning, uses neural networks to solve complex problems."""

response3 = process_user_request(f"Please analyze this text and extract the main keywords: {text_to_analyze}")
print(f"🤖 Response: {response3}")
print("\n")

In [None]:
# Example 4: No tool needed
print("💬 Example 4: General Conversation")
print("=" * 40)

response4 = process_user_request("What are the benefits of using local LLMs?")
print(f"🤖 Response: {response4}")
print("\n")

## Step 7: Advanced Tool Calling Scenarios

Let's create more complex scenarios involving multiple tools:

In [None]:
def multi_step_process(user_message: str) -> str:
    """Handle requests that might need multiple tool calls"""
    
    print(f"🎯 Multi-step processing: {user_message}")
    
    # This is a simplified example - in a real system, you'd implement
    # more sophisticated planning and execution
    
    if "calculate" in user_message.lower() and "weather" in user_message.lower():
        # Example: "Calculate the average temperature if Rome is 22°C and Milan is 18°C, then get weather for Naples"
        
        # Step 1: Get weather for Naples
        weather_result = execute_tool("get_weather", {"city": "Naples", "country": "IT"})
        
        # Step 2: Calculate average (manual for this example)
        avg_temp = execute_tool("calculator_add", {"a": 22, "b": 18})
        avg_temp = execute_tool("calculator_multiply", {"a": avg_temp, "b": 0.5})
        
        return f"Weather in Naples: {weather_result['condition']}, {weather_result['temperature']}°C. Average of Rome and Milan: {avg_temp}°C"
    
    # Fall back to single-step processing
    return process_user_request(user_message)

# Test multi-step scenario
print("🔗 Multi-Step Example:")
print("=" * 30)

multi_response = multi_step_process("Calculate the average temperature if Rome is 22°C and Milan is 18°C, then get the weather for Naples")
print(f"🤖 Multi-step response: {multi_response}")
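
The function above hard-codes its two steps. One way to generalize it, sketched here under stated assumptions, is to describe the work as a list of tool calls where a later step can refer to an earlier result. The `"$0"` placeholder convention, the `run_plan` helper, and the stand-in `TOOLS` table are all hypothetical; the lab's real tools live in `custom_tools`.

```python
from typing import Any, Callable, Dict, List

# Stand-in tools for illustration; the lab's real tools live in custom_tools
TOOLS: Dict[str, Callable[..., Any]] = {
    "calculator_add": lambda a, b: a + b,
    "calculator_multiply": lambda a, b: a * b,
}

def run_plan(plan: List[Dict[str, Any]]) -> List[Any]:
    """Run tool calls in order; a parameter value like '$0' means 'result of step 0'."""
    results: List[Any] = []
    for step in plan:
        params = {
            key: (
                results[int(value[1:])]
                if isinstance(value, str) and value.startswith("$")
                else value
            )
            for key, value in step["parameters"].items()
        }
        results.append(TOOLS[step["name"]](**params))
    return results

# Average of 22 and 18: add first, then multiply the sum by 0.5
plan = [
    {"name": "calculator_add", "parameters": {"a": 22, "b": 18}},
    {"name": "calculator_multiply", "parameters": {"a": "$0", "b": 0.5}},
]
print(run_plan(plan))  # [40, 20.0]
```

In a fuller system the plan itself would come from the model; keeping execution separate from planning makes each half easier to test.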

## Step 8: Performance Monitoring and Optimization

Let's add some performance monitoring to our tool calling system:

In [None]:
import time

def timed_tool_execution(tool_name: str, parameters: Dict[str, Any]) -> tuple:
    """Execute a tool and measure execution time in seconds"""
    # perf_counter is a monotonic, high-resolution clock suited to short intervals
    start_time = time.perf_counter()
    result = execute_tool(tool_name, parameters)
    execution_time = time.perf_counter() - start_time
    
    return result, execution_time

def benchmark_tools():
    """Benchmark all available tools"""
    print("⏱️ Benchmarking Tools:")
    print("=" * 30)
    
    benchmarks = [
        ("calculator_add", {"a": 100, "b": 200}),
        ("calculator_multiply", {"a": 15, "b": 25}),
        ("get_weather", {"city": "Florence", "country": "IT"}),
        ("analyze_text", {"text": "This is a sample text for analysis with various keywords and content.", "max_keywords": 5})
    ]
    
    for tool_name, params in benchmarks:
        try:
            result, exec_time = timed_tool_execution(tool_name, params)
            print(f"  {tool_name}: {exec_time*1000:.2f}ms")
        except Exception as e:
            print(f"  {tool_name}: ERROR - {e}")

benchmark_tools()
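
A single timing of a fast tool is mostly noise. A slightly more robust approach, sketched below, repeats each call and reports summary statistics. The `benchmark` helper and the stand-in `add` function are hypothetical; in the lab, the callable would wrap `execute_tool`.

```python
import statistics
import time
from typing import Any, Callable, Dict

def benchmark(func: Callable[..., Any], params: Dict[str, Any], repeats: int = 50) -> Dict[str, float]:
    """Call func(**params) repeatedly and summarize latency in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(**params)
        samples.append((time.perf_counter() - start) * 1000)
    return {
        "mean_ms": statistics.mean(samples),
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }

def add(a: float, b: float) -> float:
    """Stand-in for a real tool."""
    return a + b

print(benchmark(add, {"a": 100, "b": 200}))
```

The median is usually the more stable figure, since an occasional slow call can pull the mean upward.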

## Step 9: Error Handling and Robustness

Let's test how our system handles various error conditions:

In [None]:
def test_error_handling():
    """Test various error conditions"""
    print("🛡️ Testing Error Handling:")
    print("=" * 30)
    
    # Test 1: Invalid tool parameters
    print("\n1. Invalid calculator parameters:")
    try:
        result = execute_tool("calculator_divide", {"a": 10, "b": 0})
        print(f"   Result: {result}")
    except Exception as e:
        print(f"   ❌ Error (expected): {e}")
    
    # Test 2: Missing required parameters
    print("\n2. Missing required parameters:")
    try:
        result = execute_tool("calculator_add", {"a": 5})  # Missing 'b'
        print(f"   Result: {result}")
    except Exception as e:
        print(f"   ❌ Error (expected): {e}")
    
    # Test 3: Unknown tool
    print("\n3. Unknown tool:")
    try:
        result = execute_tool("unknown_tool", {})
        print(f"   Result: {result}")
    except Exception as e:
        print(f"   ❌ Error (expected): {e}")
    
    # Test 4: Invalid text analysis
    print("\n4. Empty text analysis:")
    try:
        result = execute_tool("analyze_text", {"text": ""})
        print(f"   Result: {result}")
    except Exception as e:
        print(f"   ❌ Error: {e}")
    
    print("\n✅ Error handling tests completed")

test_error_handling()
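
Several of the failures tested above (missing parameters, wrong types) can be caught before a tool runs by checking the arguments against the tool's schema. The sketch below covers only required parameters and a few basic JSON Schema types; `validate_arguments` and `add_schema` are hypothetical names, and a production system would more likely use a full JSON Schema validator.

```python
from typing import Any, Dict, List

# Map a few JSON Schema type names to Python types
JSON_TYPES = {"number": (int, float), "string": (str,), "array": (list,), "boolean": (bool,)}

def validate_arguments(schema: Dict[str, Any], arguments: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the call looks valid."""
    params = schema["function"]["parameters"]
    properties = params.get("properties", {})
    errors = [
        f"missing required parameter '{name}'"
        for name in params.get("required", [])
        if name not in arguments
    ]
    for name, value in arguments.items():
        if name not in properties:
            errors.append(f"unexpected parameter '{name}'")
            continue
        type_name = properties[name].get("type")
        expected = JSON_TYPES.get(type_name)
        # bool is a subclass of int in Python, so reject it explicitly for "number"
        wrong_type = expected is not None and (
            not isinstance(value, expected)
            or (type_name == "number" and isinstance(value, bool))
        )
        if wrong_type:
            errors.append(f"parameter '{name}' should be of type {type_name}")
    return errors

# Illustrative schema for an addition tool
add_schema = {
    "type": "function",
    "function": {
        "name": "calculator_add",
        "description": "Add two numbers",
        "parameters": {
            "type": "object",
            "properties": {"a": {"type": "number"}, "b": {"type": "number"}},
            "required": ["a", "b"],
        },
    },
}
print(validate_arguments(add_schema, {"a": 5}))
print(validate_arguments(add_schema, {"a": 5, "b": "seven"}))
```

Returning a list of messages, instead of raising on the first problem, lets the caller feed all issues back to the model in one retry prompt.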

## Step 10: Creating Custom Tools

Let's demonstrate how to create and add new custom tools:

In [None]:
# Define a new custom tool
class DataProcessingTool:
    """Advanced data processing operations"""
    
    @staticmethod
    def calculate_statistics(numbers: List[float]) -> Dict[str, Any]:
        """Calculate basic statistics for a list of numbers"""
        if not numbers:
            return {"error": "Empty list provided"}
        
        return {
            "count": len(numbers),
            "sum": sum(numbers),
            "mean": sum(numbers) / len(numbers),
            "min": min(numbers),
            "max": max(numbers),
            "range": max(numbers) - min(numbers)
        }
    
    @staticmethod
    def generate_sequence(start: int, end: int, step: int = 1) -> List[int]:
        """Generate a sequence of numbers"""
        return list(range(start, end + 1, step))

# Add schema for the new tool
NEW_TOOL_SCHEMA = {
    "type": "function",
    "function": {
        "name": "calculate_statistics",
        "description": "Calculate basic statistics (mean, min, max, etc.) for a list of numbers",
        "parameters": {
            "type": "object",
            "properties": {
                "numbers": {
                    "type": "array",
                    "items": {"type": "number"},
                    "description": "List of numbers to analyze"
                }
            },
            "required": ["numbers"]
        }
    }
}

# Test the new tool
print("🔧 Testing Custom Tool:")
test_numbers = [10, 20, 30, 40, 50, 15, 25, 35]
stats = DataProcessingTool.calculate_statistics(test_numbers)
print(f"Numbers: {test_numbers}")
print(f"Statistics: {json.dumps(stats, indent=2)}")

# Generate a sequence
sequence = DataProcessingTool.generate_sequence(1, 10, 2)
print(f"\nSequence (1 to 10, step 2): {sequence}")

print("\n✅ Custom tool works correctly!")
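
The cell above defines a schema and calls the tool directly, but it does not make the tool reachable by name, which is what a model-issued tool call needs. The sketch below shows one possible registration pattern. `TOOL_REGISTRY`, `register_tool`, `call_tool`, and `calculate_mean` are hypothetical names for illustration; the lab's own `TOOL_SCHEMAS` and `execute_tool` are defined in `custom_tools` and may be organized differently.

```python
from typing import Any, Callable, Dict, List

# Hypothetical registry mapping tool names to their schema and implementation
TOOL_REGISTRY: Dict[str, Dict[str, Any]] = {}

def register_tool(schema: Dict[str, Any], func: Callable[..., Any]) -> None:
    """Store a schema and its implementation under the schema's function name."""
    TOOL_REGISTRY[schema["function"]["name"]] = {"schema": schema, "func": func}

def call_tool(name: str, parameters: Dict[str, Any]) -> Any:
    """Look up a registered tool and call it with keyword arguments."""
    if name not in TOOL_REGISTRY:
        raise ValueError(f"Unknown tool: {name}")
    return TOOL_REGISTRY[name]["func"](**parameters)

def calculate_mean(numbers: List[float]) -> float:
    """Small stand-in tool: arithmetic mean of a non-empty list."""
    return sum(numbers) / len(numbers)

mean_schema = {
    "type": "function",
    "function": {
        "name": "calculate_mean",
        "description": "Calculate the mean of a list of numbers",
        "parameters": {
            "type": "object",
            "properties": {"numbers": {"type": "array", "items": {"type": "number"}}},
            "required": ["numbers"],
        },
    },
}

register_tool(mean_schema, calculate_mean)
print(call_tool("calculate_mean", {"numbers": [10, 20, 30]}))  # 20.0
```

Keeping the schema and the callable in one registry entry avoids the two drifting apart when a tool is renamed.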

## Step 11: Interactive Session

Now let's create an interactive session where you can test tool calling:

In [None]:
def interactive_session():
    """Run an interactive tool calling session"""
    print("🎮 Interactive Tool Calling Session")
    print("=" * 40)
    print("Available commands:")
    print("  - Ask for calculations: 'What is 25 * 34?'")
    print("  - Ask for weather: 'What's the weather in Paris?'")
    print("  - Ask for text analysis: 'Analyze this text: [your text]'")
    print("  - Type 'quit' to exit")
    print("\nNote: This is a demo - modify the cell below to test different queries")
    
    # Example queries you can modify and run
    example_queries = [
        "What is 144 divided by 12?",
        "Analyze this text: Machine learning is transforming healthcare with predictive analytics and personalized medicine.",
        "What's the weather like in Tokyo?",
        "Calculate 15 plus 28"
    ]
    
    print("\n🧪 Testing example queries:")
    for i, query in enumerate(example_queries, 1):
        print(f"\n--- Example {i} ---")
        response = process_user_request(query)
        print(f"Query: {query}")
        print(f"Response: {response}")

# Run the interactive session
interactive_session()
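
The session above runs a fixed list of queries. For a loop that actually reads input until the user types `quit`, something like the following sketch could be used. `run_repl` is a hypothetical helper; its `input_fn` and `output_fn` parameters exist so the loop can be scripted or tested, and in the notebook the handler would be `process_user_request`.

```python
from typing import Callable

def run_repl(
    handler: Callable[[str], str],
    input_fn: Callable[[str], str] = input,
    output_fn: Callable[[str], None] = print,
) -> int:
    """Read queries until 'quit' or end of input; return how many were handled."""
    handled = 0
    while True:
        try:
            query = input_fn("You: ").strip()
        except (EOFError, KeyboardInterrupt):
            break
        if query.lower() in {"quit", "exit"}:
            break
        if not query:
            continue  # ignore empty lines
        output_fn(f"Assistant: {handler(query)}")
        handled += 1
    return handled

# Scripted demo so the sketch runs without blocking on keyboard input
scripted = iter(["What is 25 * 34?", "", "quit"])
count = run_repl(lambda q: f"(echo) {q}", input_fn=lambda prompt: next(scripted))
print(f"Handled {count} queries")
```

Calling `run_repl(process_user_request)` with the default arguments would then read from the keyboard.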

## Step 12: Best Practices and Optimization Tips

Let's discuss some best practices for tool calling with local LLMs:

In [None]:
def display_best_practices():
    """Display best practices for tool calling"""
    
    practices = {
        "🎯 Tool Design": [
            "Keep tools simple and focused on one task",
            "Provide clear descriptions and parameter documentation",
            "Include proper error handling and validation",
            "Use meaningful parameter names and types"
        ],
        "⚡ Performance": [
            "Cache model responses when possible",
            "Use quantized models for better speed/memory trade-off",
            "Implement tool result caching for expensive operations",
            "Consider async execution for I/O bound tools"
        ],
        "🛡️ Security": [
            "Validate all tool inputs thoroughly",
            "Implement rate limiting for API calls",
            "Sanitize file operations and paths",
            "Use proper authentication for external services"
        ],
        "🐛 Debugging": [
            "Log all tool calls and responses",
            "Implement verbose mode for development",
            "Test edge cases and error conditions",
            "Monitor tool execution times"
        ]
    }
    
    print("📚 Best Practices for Tool Calling:")
    print("=" * 50)
    
    for category, tips in practices.items():
        print(f"\n{category}:")
        for tip in tips:
            print(f"  • {tip}")
    
    print("\n" + "=" * 50)
    
    # Performance metrics
    if MODEL_INITIALIZED:
        print("\n📊 Current System Status:")
        print(f"  Model: {'Ollama' if USE_OLLAMA else 'Hugging Face Transformers'}")
        print(f"  Available tools: {len(TOOL_SCHEMAS)}")
        print(f"  Device: {'GPU' if torch.cuda.is_available() else 'CPU'}" if 'torch' in globals() else "  Device: Unknown")

display_best_practices()
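
The tip about caching tool results can be illustrated with a small wrapper. This is a sketch under stated assumptions: `make_cached_executor` and `slow_lookup` are hypothetical names, the wrapped function is assumed to take `(tool_name, parameters)` like the lab's `execute_tool`, and parameters are assumed to be JSON-serializable.

```python
import json
from typing import Any, Callable, Dict, List

Executor = Callable[[str, Dict[str, Any]], Any]

def make_cached_executor(execute: Executor) -> Executor:
    """Wrap an executor so identical (tool, parameters) calls reuse the first result."""
    cache: Dict[str, Any] = {}

    def cached(tool_name: str, parameters: Dict[str, Any]) -> Any:
        # sort_keys makes the cache key independent of parameter ordering
        key = json.dumps([tool_name, parameters], sort_keys=True)
        if key not in cache:
            cache[key] = execute(tool_name, parameters)
        return cache[key]

    return cached

calls: List[str] = []

def slow_lookup(tool_name: str, parameters: Dict[str, Any]) -> str:
    """Stand-in for an expensive tool call; records each real execution."""
    calls.append(tool_name)
    return f"{tool_name} result for {parameters['city']}"

cached_lookup = make_cached_executor(slow_lookup)
cached_lookup("get_weather", {"city": "Rome", "country": "IT"})
cached_lookup("get_weather", {"country": "IT", "city": "Rome"})  # same call, different key order
print(len(calls))  # 1
```

A cache without expiry only suits deterministic tools such as the calculator; for data that changes over time, such as live weather, entries would need a time limit.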

## Conclusion

🎉 **Congratulations!** You have successfully completed the Phi-3 Tool Calling Laboratory!

### What You've Learned:

1. **Local LLM Setup**: How to install and run Phi-3-mini locally using both Ollama and Hugging Face Transformers
2. **Tool Definition**: How to create custom tools with proper schemas and parameter validation
3. **Function Calling**: How to implement tool calling workflows with prompt engineering
4. **Error Handling**: How to handle various error conditions gracefully
5. **Performance Optimization**: Best practices for efficient tool calling
6. **Advanced Scenarios**: Multi-step processing and complex tool interactions

### Next Steps:

- **Expand Tool Library**: Create more specialized tools for your use case
- **Integrate APIs**: Connect to real external services (weather, databases, etc.)
- **Production Deployment**: Scale up for production use with proper caching and monitoring
- **Fine-tuning**: Consider fine-tuning the model for better tool calling performance

### Resources:

- [Phi-3 Documentation](https://huggingface.co/microsoft/Phi-3-mini-4k-instruct)
- [Ollama Documentation](https://ollama.ai/docs)
- [Hugging Face Transformers](https://huggingface.co/docs/transformers/)

Happy coding with Phi-3 and tool calling! 🚀