# Memory Buffer, Cache, and Automatic Calls in LLMs

Welcome to the third practical session of Day 2! In this notebook, we'll explore advanced techniques for enhancing LLMs with memory systems, caching mechanisms, and automated function calling capabilities. These techniques are crucial for building sophisticated financial applications that maintain context, optimize performance, and interact with external systems.

## Learning Objectives

- Implement memory buffers to maintain conversation context
- Set up caching systems to improve performance and reduce API costs
- Create automated function calling for LLMs to interact with financial data sources
- Build a simple financial assistant that demonstrates these capabilities
- Compare implementation approaches using local models versus API-based LLMs

## 1. Setup

Let's start by importing the necessary libraries and setting up our environment.

In [None]:
import os
import json
import time
import numpy as np
import pandas as pd
import torch
import matplotlib.pyplot as plt
import seaborn as sns
from dotenv import load_dotenv
from datetime import datetime
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from transformers import GPT2Tokenizer, GPT2LMHeadModel
import re
import requests
import sqlite3
import hashlib
from IPython.display import display, Markdown, HTML
from tqdm.notebook import tqdm

# Load environment variables for API keys
load_dotenv()

# Set device
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
print(f"Using device: {device}")

# Set up paths
CACHE_DIR = './cache'
os.makedirs(CACHE_DIR, exist_ok=True)

# Set random seeds for reproducibility
torch.manual_seed(42)
np.random.seed(42)

# Check if we have API keys for OpenAI or DeepSeek
openai_api_key = os.getenv("OPENAI_API_KEY")
deepseek_api_key = os.getenv("DEEPSEEK_API_KEY")

api_available = openai_api_key is not None or deepseek_api_key is not None
if api_available:
    print("API key(s) found. API-based LLM examples will be available.")
    
    # Import API libraries if keys are available
    if openai_api_key:
        import openai
        openai.api_key = openai_api_key
        print("OpenAI API is configured.")
    
    if deepseek_api_key:
        print("DeepSeek API is configured.")
else:
    print("No API keys found. We'll use local models only.")

## 2. Memory Systems for LLMs

LLMs have a limited context window, meaning they can only "see" a certain number of tokens at once. To build applications that maintain conversation history and context, we need to implement memory systems. Let's explore different approaches to memory management:

1. **Basic Message History**: Storing all previous interactions
2. **Summary Memory**: Periodically summarizing conversation history
3. **Vector Memory**: Storing and retrieving relevant information based on embeddings
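
To make the third approach more concrete, here is a minimal sketch of similarity-based retrieval. It assumes a toy bag-of-words vector (`toy_embed`, a hypothetical stand-in rather than a real embedding model) so that it runs without downloading anything; a real system would swap in a learned sentence-embedding model.

```python
import math
from collections import Counter

def toy_embed(text):
    """Hypothetical stand-in for an embedding model: a sparse bag-of-words vector."""
    words = (w.strip(".,?!").lower() for w in text.split())
    return Counter(w for w in words if w)

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, messages, top_k=1):
    """Return the top_k stored messages most similar to the query."""
    q = toy_embed(query)
    ranked = sorted(messages, key=lambda m: cosine(q, toy_embed(m)), reverse=True)
    return ranked[:top_k]

history = [
    "Stocks represent ownership in a company.",
    "Bonds are debt instruments issued by governments and companies.",
    "Emerging markets carry currency risk.",
]
print(retrieve("What are bonds?", history))
# ['Bonds are debt instruments issued by governments and companies.']
```

Word overlap is only a rough proxy for meaning, so this sketch would miss paraphrases that a learned embedding model is designed to catch.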

In [None]:
class ConversationMemory:
    """Base class for conversation memory systems."""
    
    def __init__(self, max_tokens=1024):
        self.max_tokens = max_tokens
        self.messages = []
        self.metadata = {}
    
    def add_message(self, role, content):
        """Add a message to the conversation history."""
        self.messages.append({"role": role, "content": content})
    
    def get_conversation_history(self):
        """Get the current conversation history."""
        return self.messages
    
    def clear(self):
        """Clear the conversation history."""
        self.messages = []
        self.metadata = {}
    
    def get_token_count(self, tokenizer):
        """Estimate the number of tokens in the conversation history."""
        full_text = " ".join([msg["content"] for msg in self.messages])
        tokens = tokenizer.encode(full_text)
        return len(tokens)


class BasicMemory(ConversationMemory):
    """Simple message history with token limit enforcement."""
    
    def add_message(self, role, content, tokenizer=None):
        """Add a message and trim history if needed to stay within token limit."""
        super().add_message(role, content)
        
        # Without a tokenizer (e.g. API-based models) tokens cannot be counted,
        # so the history is left untrimmed
        if tokenizer is None:
            return
        
        # Drop the oldest message after index 0 (assumed to be the system prompt)
        # until the history fits, always keeping at least two messages
        while self.get_token_count(tokenizer) > self.max_tokens and len(self.messages) > 2:
            self.messages.pop(1)


class SummaryMemory(ConversationMemory):
    """Memory system that periodically summarizes conversation history."""
    
    def __init__(self, max_tokens=1024, summarize_every=10, model=None, tokenizer=None):
        super().__init__(max_tokens)
        self.summarize_every = summarize_every
        self.summary = ""
        self.model = model
        self.tokenizer = tokenizer
    
    def add_message(self, role, content):
        """Add a message and summarize if needed."""
        super().add_message(role, content)
        
        # Check if it's time to summarize
        if len(self.messages) % self.summarize_every == 0 and self.model is not None:
            self._summarize_history()
    
    def _summarize_history(self):
        """Summarize recent conversation history."""
        if len(self.messages) < 2:
            return
        
        # Get text to summarize (skip system message if present)
        start_idx = 1 if self.messages[0]["role"] == "system" else 0
        to_summarize = self.messages[start_idx:]
        
        # Prepare prompt for summarization
        conversation_text = "\n".join([
            f"{msg['role'].capitalize()}: {msg['content']}" 
            for msg in to_summarize
        ])
        
        prompt = f"""Summarize the following conversation concisely, 
        focusing on key financial information and decisions:
        
        {conversation_text}
        
        Summary:"""
        
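        # Generate summary using the model
        inputs = self.tokenizer(prompt, return_tensors="pt").to(device)
        with torch.no_grad():
            output = self.model.generate(
                **inputs,
                max_new_tokens=150,  # budget for the summary only; max_length would also count the prompt
                do_sample=True,      # temperature only takes effect when sampling
                temperature=0.3,
                num_return_sequences=1,
                pad_token_id=self.tokenizer.eos_token_id
            )
        
        # Decode only the generated continuation, i.e. the text after "Summary:"
        prompt_length = inputs["input_ids"].shape[1]
        summary = self.tokenizer.decode(output[0][prompt_length:], skip_special_tokens=True).strip()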
        
        # Update the summary
        self.summary = summary
        
        # Replace older messages with the summary
        system_message = None
        if self.messages[0]["role"] == "system":
            system_message = self.messages[0]
        
        # Keep only recent messages and the summary
        recent_messages = self.messages[-4:]
        self.messages = []
        
        if system_message:
            self.messages.append(system_message)
        
        self.messages.append({"role": "system", "content": f"Previous conversation summary: {summary}"})
        self.messages.extend(recent_messages)
    
    def get_summary(self):
        """Get the current conversation summary."""
        return self.summary


class VectorMemory(ConversationMemory):
    """Memory system that uses embeddings to store and retrieve relevant information."""
    
    def __init__(self, max_tokens=1024, embedding_model=None):
        super().__init__(max_tokens)
        self.embedding_model = embedding_model
        self.embeddings = []
        self.message_embeddings = []
    
    def add_message(self, role, content):
        """Add a message and its embedding to memory."""
        super().add_message(role, content)
        
        if self.embedding_model is not None:
            # Get embedding for the message
            embedding = self._get_embedding(content)
            self.message_embeddings.append(embedding)
    
    def _get_embedding(self, text):
        """Get embedding for a text using the embedding model."""
        # This is a simplified version; in practice, you'd use
        # a proper embedding model like OpenAI's embedding API
        # or a local model via transformers
        
        # For demonstration, we return a random vector that ignores `text`,
        # so search_similar() rankings are arbitrary until this is replaced
        # with actual embeddings
        return np.random.randn(768)  # Typical embedding dimension
    
    def search_similar(self, query, top_k=3):
        """Find messages most similar to the query based on embeddings."""
        if not self.message_embeddings:
            return []
        
        # Get query embedding
        query_embedding = self._get_embedding(query)
        
        # Calculate similarity scores
        similarities = []
        for i, emb in enumerate(self.message_embeddings):
            # Cosine similarity
            similarity = np.dot(query_embedding, emb) / (
                np.linalg.norm(query_embedding) * np.linalg.norm(emb)
            )
            similarities.append((i, similarity))
        
        # Sort by similarity (descending)
        similarities.sort(key=lambda x: x[1], reverse=True)
        
        # Return top-k most similar messages
        top_messages = []
        for i, _ in similarities[:top_k]:
            top_messages.append(self.messages[i])
        
        return top_messages

# Load a small model for demonstration
model_name = "gpt2"
tokenizer = GPT2Tokenizer.from_pretrained(model_name)
model = GPT2LMHeadModel.from_pretrained(model_name).to(device)
tokenizer.pad_token = tokenizer.eos_token

# Initialize memory systems
basic_memory = BasicMemory(max_tokens=512)
summary_memory = SummaryMemory(max_tokens=512, summarize_every=5, model=model, tokenizer=tokenizer)

# Demo conversation
print("Demonstrating memory systems with a simulated financial conversation...")

conversation = [
    ("system", "You are a financial advisor helping with investment decisions."),
    ("user", "What's the difference between stocks and bonds?"),
    ("assistant", "Stocks represent ownership in a company, while bonds are debt instruments where you lend money to an entity. Stocks generally offer higher potential returns but with more risk, while bonds typically provide more stable but lower returns."),
    ("user", "How should I allocate my portfolio if I'm 35 years old?"),
    ("assistant", "At 35, you have a long investment horizon before retirement. A common approach is to allocate 70-80% to stocks and 20-30% to bonds. This provides growth potential while maintaining some stability. Consider your risk tolerance and financial goals when making your decision."),
    ("user", "What about international investments?"),
    ("assistant", "International investments can provide diversification benefits. Consider allocating 20-30% of your stock portion to international markets. This helps reduce country-specific risk and gives you exposure to global growth opportunities."),
    ("user", "Should I invest in emerging markets?"),
    ("assistant", "Emerging markets can offer higher growth potential but come with additional risks like political instability and currency fluctuations. For a 35-year-old investor, allocating 5-10% of your portfolio to emerging markets can be reasonable, but ensure it aligns with your risk tolerance.")
]

# Add messages to both memory systems
for role, content in conversation:
    basic_memory.add_message(role, content, tokenizer)
    summary_memory.add_message(role, content)

# Display the memory contents
print("\nBasic Memory:")
for i, msg in enumerate(basic_memory.get_conversation_history()):
    print(f"{i+1}. {msg['role'].capitalize()}: {msg['content'][:50]}...")

print("\nSummary Memory:")
print(f"Summary: {summary_memory.get_summary()}")
for i, msg in enumerate(summary_memory.get_conversation_history()):
    print(f"{i+1}. {msg['role'].capitalize()}: {msg['content'][:50]}...")

# Token counts
print(f"\nBasic Memory Token Count: {basic_memory.get_token_count(tokenizer)}")
print(f"Summary Memory Token Count: {summary_memory.get_token_count(tokenizer)}")

## 3. Caching Systems for LLMs

Caching is essential for LLM applications to improve performance, reduce costs, and ensure consistency in responses. Let's implement a simple caching system for LLM queries.
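
Before the SQLite-backed version, the core idea can be sketched in a few lines: hash the query parameters into a key, store the response with a timestamp, and treat entries older than the time to live as misses. `TinyTTLCache` and its injectable `clock` argument are hypothetical names introduced for this illustration only.

```python
import hashlib
import json
import time

class TinyTTLCache:
    """Hypothetical in-memory cache illustrating key hashing and expiry."""

    def __init__(self, ttl_seconds=60, clock=time.time):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be shown without sleeping
        self.store = {}

    @staticmethod
    def make_key(prompt, model, temperature, max_tokens):
        # Serialize the parameters as a JSON list before hashing, so a
        # delimiter character inside the prompt cannot blur field boundaries
        payload = json.dumps([prompt, model, temperature, max_tokens])
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def get(self, *params):
        key = self.make_key(*params)
        entry = self.store.get(key)
        if entry is None:
            return None
        value, stored_at = entry
        if self.clock() - stored_at > self.ttl:
            del self.store[key]  # expired: drop it and report a miss
            return None
        return value

    def set(self, value, *params):
        self.store[self.make_key(*params)] = (value, self.clock())

fake_now = [0.0]  # simulated clock, in seconds
cache = TinyTTLCache(ttl_seconds=60, clock=lambda: fake_now[0])
params = ("How does inflation affect bond prices?", "gpt2", 0.7, 100)

print(cache.get(*params))   # None: nothing stored yet
cache.set("cached answer", *params)
print(cache.get(*params))   # cached answer
fake_now[0] = 61.0
print(cache.get(*params))   # None: the entry has expired
```

Because the sampling parameters are part of the key, the same prompt at a different temperature is treated as a separate entry.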

In [None]:
class LLMCache:
    """A simple caching system for LLM queries."""
    
    def __init__(self, cache_dir=CACHE_DIR, ttl=86400, db_name="llm_cache.db"):
        """
        Initialize the cache.
        
        Args:
            cache_dir: Directory to store the cache
            ttl: Time to live in seconds (default: 24 hours)
            db_name: Name of the SQLite database file
        """
        self.cache_dir = cache_dir
        self.ttl = ttl
        os.makedirs(cache_dir, exist_ok=True)
        
        # Initialize SQLite database
        self.db_path = os.path.join(cache_dir, db_name)
        self.conn = sqlite3.connect(self.db_path)
        self.cursor = self.conn.cursor()
        
        # Create cache table if it doesn't exist
        self.cursor.execute('''
        CREATE TABLE IF NOT EXISTS llm_cache (
            key TEXT PRIMARY KEY,
            value TEXT,
            timestamp INTEGER
        )
        ''')
        self.conn.commit()
    
    def _get_cache_key(self, prompt, model, temperature, max_tokens):
        """Generate a unique cache key based on the query parameters."""
        # Combine all parameters into a string
        params_str = f"{prompt}|{model}|{temperature}|{max_tokens}"
        
        # Create a hash of the parameters
        cache_key = hashlib.md5(params_str.encode()).hexdigest()
        return cache_key
    
    def get(self, prompt, model, temperature=0.7, max_tokens=100):
        """
        Get a cached response if available and not expired.
        
        Returns:
            Cached response or None if not found or expired
        """
        cache_key = self._get_cache_key(prompt, model, temperature, max_tokens)
        
        # Query the database
        self.cursor.execute(
            "SELECT value, timestamp FROM llm_cache WHERE key = ?", 
            (cache_key,)
        )
        result = self.cursor.fetchone()
        
        if result:
            value, timestamp = result
            # Check if cache is expired
            if time.time() - timestamp <= self.ttl:
                print(f"Cache hit for prompt: {prompt[:30]}...")
                return json.loads(value)
            else:
                # Remove expired entry
                self.cursor.execute("DELETE FROM llm_cache WHERE key = ?", (cache_key,))
                self.conn.commit()
        
        return None
    
    def set(self, prompt, model, temperature, max_tokens, response):
        """Store a response in the cache."""
        cache_key = self._get_cache_key(prompt, model, temperature, max_tokens)
        
        # Store in database
        self.cursor.execute(
            "INSERT OR REPLACE INTO llm_cache (key, value, timestamp) VALUES (?, ?, ?)",
            (cache_key, json.dumps(response), int(time.time()))
        )
        self.conn.commit()
        print(f"Cached response for prompt: {prompt[:30]}...")
    
    def clear(self):
        """Clear all cache entries."""
        self.cursor.execute("DELETE FROM llm_cache")
        self.conn.commit()
    
    def clear_expired(self):
        """Clear expired cache entries."""
        self.cursor.execute(
            "DELETE FROM llm_cache WHERE timestamp < ?", 
            (int(time.time() - self.ttl),)
        )
        self.conn.commit()
    
    def close(self):
        """Close the database connection."""
        self.conn.close()

# Initialize the cache
llm_cache = LLMCache()

# Function to generate text with caching
def generate_text_with_cache(prompt, model_name="gpt2", temperature=0.7, max_tokens=100):
    """Generate text using a model with caching."""
    # Check cache first
    cached_response = llm_cache.get(prompt, model_name, temperature, max_tokens)
    if cached_response is not None:
        return cached_response, True  # Return cached response
    
    # If not in cache, generate new response
    if model_name == "gpt2":
        # Use local GPT-2 model
        inputs = tokenizer(prompt, return_tensors="pt").to(device)
        
        with torch.no_grad():
            outputs = model.generate(
                **inputs,  # passes input_ids together with the attention mask
                max_new_tokens=max_tokens,
                temperature=temperature,
                do_sample=True,
                num_return_sequences=1,
                pad_token_id=tokenizer.eos_token_id
            )
        
        response = tokenizer.decode(outputs[0], skip_special_tokens=True)
        
        # Store in cache
        llm_cache.set(prompt, model_name, temperature, max_tokens, response)
        
        return response, False  # Return new response
    
    # For API-based models, we would implement API calls here
    # but we'll use the local model for this demonstration
    else:
        # Simulate API call
        print(f"Would call API for model: {model_name}")
        return "API response would be here", False

# Test the cache with a few queries
test_prompts = [
    "What are the key factors to consider when evaluating a stock?",
    "How does inflation affect bond prices?",
    "What is the efficient market hypothesis?",
    "What are the key factors to consider when evaluating a stock?",  # Repeated to test cache
]

for prompt in test_prompts:
    start_time = time.time()
    response, cache_hit = generate_text_with_cache(prompt)
    end_time = time.time()
    
    print(f"Prompt: {prompt}")
    print(f"Cache hit: {cache_hit}")
    print(f"Time taken: {end_time - start_time:.4f} seconds")
    print(f"Response: {response[:100]}...\n")

# Clear expired cache entries
llm_cache.clear_expired()

## 4. Automatic Function Calling

One of the most powerful features of modern LLMs is their ability to call external functions or tools. This enables them to retrieve real-time data, perform calculations, and interact with external systems. Let's implement a simple function calling system for our financial assistant.
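
For a model to call a function correctly, it needs to see the function's name and parameters. One possible way to produce that text, sketched here with Python's standard `inspect` module, is to derive it from the signature and docstring. `describe_function` is a hypothetical helper for illustration and is not used elsewhere in this notebook.

```python
import inspect

def describe_function(func):
    """Build a one-line description from a function's signature and docstring."""
    parts = []
    for name, param in inspect.signature(func).parameters.items():
        if param.default is inspect.Parameter.empty:
            parts.append(name)                         # required parameter
        else:
            parts.append(f"{name}={param.default!r}")  # optional, with default
    doc = inspect.getdoc(func) or ""
    summary = doc.splitlines()[0] if doc else "No description."
    return f"{func.__name__}({', '.join(parts)}): {summary}"

def calculate_compound_interest(principal, rate, time, compounding_periods=1):
    """Calculate compound interest."""
    n = compounding_periods
    return principal * (1 + rate / n) ** (n * time)

print(describe_function(calculate_compound_interest))
# calculate_compound_interest(principal, rate, time, compounding_periods=1): Calculate compound interest.
```

Generating descriptions this way keeps the prompt in sync with the code, at the cost of depending on docstrings being present and accurate.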

In [None]:
class FunctionRegistry:
    """Registry for functions that can be called by the LLM."""
    
    def __init__(self):
        self.functions = {}
        self.function_descriptions = {}
    
    def register(self, func, description):
        """Register a function with its description."""
        self.functions[func.__name__] = func
        self.function_descriptions[func.__name__] = description
    
    def get_function(self, func_name):
        """Get a function by name."""
        return self.functions.get(func_name)
    
    def get_description(self, func_name):
        """Get a function's description."""
        return self.function_descriptions.get(func_name)
    
    def list_functions(self):
        """List all registered functions."""
        return list(self.functions.keys())
    
    def get_all_descriptions(self):
        """Get descriptions of all registered functions."""
        return self.function_descriptions.copy()


# Create some example financial functions

def get_stock_price(symbol):
    """
    Get the current price of a stock.
    
    Args:
        symbol: The stock symbol (e.g., AAPL, MSFT)
        
    Returns:
        Current price of the stock
    """
    # In a real application, this would call an API
    # For demonstration, we'll return simulated prices
    stock_prices = {
        "AAPL": 182.52,
        "MSFT": 425.63,
        "GOOGL": 175.98,
        "AMZN": 178.75,
        "META": 487.55,
        "TSLA": 175.22,
        "NVDA": 950.02,
    }
    
    return stock_prices.get(symbol.upper(), None)


def calculate_compound_interest(principal, rate, time, compounding_periods=1):
    """
    Calculate compound interest.
    
    Args:
        principal: Initial investment amount
        rate: Annual interest rate (as a decimal, e.g., 0.05 for 5%)
        time: Time period in years
        compounding_periods: Number of compounding periods per year (1 for annual)
        
    Returns:
        Final amount after compound interest
    """
    # A = P(1 + r/n)^(nt)
    n = compounding_periods
    final_amount = principal * (1 + rate / n) ** (n * time)
    return final_amount


def calculate_portfolio_value(holdings):
    """
    Calculate the total value of a portfolio.
    
    Args:
        holdings: Dictionary mapping stock symbols to number of shares
        
    Returns:
        Total value of the portfolio
    """
    total_value = 0
    for symbol, shares in holdings.items():
        price = get_stock_price(symbol)
        if price is not None:
            total_value += price * shares
    
    return total_value


def get_historical_returns(index_name, years):
    """
    Get historical annual returns for a market index.
    
    Args:
        index_name: Name of the index (e.g., S&P500, NASDAQ)
        years: Number of years of data to return (max 10)
        
    Returns:
        List of annual returns for the specified index
    """
    # Simulated historical returns
    historical_returns = {
        "S&P500": [9.3, 28.7, 18.4, 31.5, -4.4, 21.8, -6.2, 32.4, 13.6, 2.1],
        "NASDAQ": [12.5, 35.2, 22.2, 44.9, -2.8, 28.2, -3.9, 43.6, 23.1, 8.4],
        "DOW": [7.2, 25.1, 16.5, 26.5, -3.9, 19.4, -5.6, 25.3, 9.7, 0.8]
    }
    
    # Get data for the requested index
    returns = historical_returns.get(index_name.upper(), [])
    
    # Limit to the requested number of years
    years = min(years, len(returns))
    
    return returns[:years]


# Register functions
function_registry = FunctionRegistry()
function_registry.register(get_stock_price, "Get the current price of a stock by symbol")
function_registry.register(calculate_compound_interest, "Calculate compound interest on an investment")
function_registry.register(calculate_portfolio_value, "Calculate the total value of a portfolio of stocks")
function_registry.register(get_historical_returns, "Get historical annual returns for a market index")

# List registered functions
print("Registered functions:")
for func_name, description in function_registry.get_all_descriptions().items():
    print(f"  - {func_name}: {description}")

import ast  # safe parsing of literal arguments (no eval)

# Function to parse and execute function calls from LLM output
def parse_and_execute_function_call(text, function_registry):
    """
    Parse function calls from text and execute them.
    
    Args:
        text: Text containing function calls
        function_registry: Registry of available functions
        
    Returns:
        Text with function calls replaced by their results
    """
    # Define pattern for function calls
    # Format: {{function_name(arg1, arg2, key=value, ...)}}
    pattern = r'\{\{(\w+)\((.*?)\)\}\}'
    
    def to_value(node):
        """Convert an argument node to a Python value without executing code."""
        # Bare identifiers (e.g. AAPL) are passed through as strings
        if isinstance(node, ast.Name):
            return node.id
        # Numbers, strings, lists, dicts, and other literals
        return ast.literal_eval(node)
    
    def execute(match):
        func_name, args_str = match.group(1), match.group(2)
        
        # Leave calls to unregistered functions untouched
        func = function_registry.get_function(func_name)
        if func is None:
            return match.group(0)
        
        try:
            # Let Python's own parser split positional and keyword arguments,
            # which handles nested brackets, quotes, and commas correctly
            call = ast.parse(f"f({args_str})", mode="eval").body
            args = [to_value(arg) for arg in call.args]
            kwargs = {kw.arg: to_value(kw.value) for kw in call.keywords}
            
            # Execute the function and substitute its result
            return str(func(*args, **kwargs))
        except Exception as e:
            # Replace with error message
            return f"Error in {func_name}: {str(e)}"
    
    return re.sub(pattern, execute, text)

# Test function calling
test_texts = [
    "The current price of Apple stock is {{get_stock_price('AAPL')}}.",
    "If you invest $10,000 at 5% interest for 10 years with annual compounding, you'll have ${{calculate_compound_interest(10000, 0.05, 10)}}.",
    "A portfolio with 10 shares of Apple, 5 shares of Microsoft, and 8 shares of Google is worth ${{calculate_portfolio_value({'AAPL': 10, 'MSFT': 5, 'GOOGL': 8})}}.",
    "The S&P 500 has had the following annual returns over the past 5 years: {{get_historical_returns('S&P500', 5)}}%."
]

for text in test_texts:
    result = parse_and_execute_function_call(text, function_registry)
    print(f"Original: {text}")
    print(f"Result: {result}\n")

## 5. Building a Financial Assistant

Now, let's combine our memory, caching, and function calling systems to build a simple financial assistant that can maintain context, respond efficiently, and access external data.
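
The order in which the three pieces interact can be sketched with stand-ins before looking at the full class. Everything here is hypothetical and simplified: `fake_llm` replaces the model, `TOOLS` holds a single simulated price lookup, and the call marker accepts only one bare-word argument.

```python
import re

def fake_llm(prompt):
    """Hypothetical stand-in for a model call; always requests one tool call."""
    return "Apple trades at {{get_stock_price(AAPL)}} dollars."

# Simulated tool table with a single hard-coded price
TOOLS = {"get_stock_price": lambda symbol: {"AAPL": 182.52}.get(symbol)}

def run_tools(text):
    """Replace each {{name(arg)}} marker with the tool's result."""
    def call(match):
        name, arg = match.group(1), match.group(2)
        return str(TOOLS[name](arg)) if name in TOOLS else match.group(0)
    return re.sub(r"\{\{(\w+)\((\w*)\)\}\}", call, text)

class TinyAssistant:
    def __init__(self):
        self.history = []     # memory: list of (role, content) pairs
        self.cache = {}       # cache: prompt text -> raw model output
        self.model_calls = 0  # counts cache misses

    def ask(self, user_text):
        self.history.append(("user", user_text))
        prompt = "\n".join(f"{r}: {c}" for r, c in self.history) + "\nassistant:"
        if prompt not in self.cache:           # cache miss: call the model
            self.cache[prompt] = fake_llm(prompt)
            self.model_calls += 1
        reply = run_tools(self.cache[prompt])  # tools run after the cache lookup
        self.history.append(("assistant", reply))
        return reply

    def reset(self):
        self.history = []

bot = TinyAssistant()
print(bot.ask("What is Apple's price?"))  # Apple trades at 182.52 dollars.
bot.reset()
print(bot.ask("What is Apple's price?"))  # Apple trades at 182.52 dollars.
print(bot.model_calls)                    # 1
```

Two design points are worth noting. The cache key is the whole formatted history, so a hit requires an identical conversation so far. And because the raw model output is cached before tool calls are executed, tool results such as prices are recomputed on every request rather than served stale.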

In [None]:
class FinancialAssistant:
    """A financial assistant with memory, caching, and function calling capabilities."""
    
    def __init__(self, model_name="gpt2", use_api=False):
        """
        Initialize the financial assistant.
        
        Args:
            model_name: Name of the model to use
            use_api: Whether to use an API-based model
        """
        self.model_name = model_name
        self.use_api = use_api
        
        # Initialize components
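        # GPT-2's context window is 1024 tokens, so keep the history well below
        # that to leave room for role labels and the generated reply
        self.memory = BasicMemory(max_tokens=800)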
        self.cache = LLMCache()
        self.function_registry = function_registry  # Use the one we created earlier
        
        # Set up tokenizer and model
        if not use_api:
            self.tokenizer = GPT2Tokenizer.from_pretrained(model_name)
            self.model = GPT2LMHeadModel.from_pretrained(model_name).to(device)
            self.tokenizer.pad_token = self.tokenizer.eos_token
        else:
            # For API-based models, we'd initialize API clients here
            if openai_api_key:
                self.tokenizer = None
                self.model = None
                print("Using OpenAI API")
            else:
                print("API key not available, falling back to local model")
                self.use_api = False
                self.tokenizer = GPT2Tokenizer.from_pretrained(model_name)
                self.model = GPT2LMHeadModel.from_pretrained(model_name).to(device)
                self.tokenizer.pad_token = self.tokenizer.eos_token
        
        # Add system prompt
        self.system_prompt = """
        You are a helpful financial assistant. You can provide information and advice on:
        - Investment strategies
        - Stock analysis
        - Financial planning
        - Market trends
        
        You can call functions to get real-time data or perform calculations.
        To call a function, use the format {{function_name(arg1, arg2, ...)}}.
        
        Available functions:
        """
        
        # Add function descriptions to system prompt
        for func_name, description in self.function_registry.get_all_descriptions().items():
            self.system_prompt += f"- {func_name}: {description}\n"
        
        self.memory.add_message("system", self.system_prompt, self.tokenizer)
    
    def generate_response(self, prompt, temperature=0.7, max_tokens=150):
        """
        Generate a response to the user's prompt.
        
        Args:
            prompt: User's input
            temperature: Temperature for text generation
            max_tokens: Maximum tokens to generate
            
        Returns:
            Assistant's response
        """
        # Add user message to memory
        self.memory.add_message("user", prompt, self.tokenizer)
        
        # Prepare conversation history for the model
        conversation_history = self.memory.get_conversation_history()
        
        # Format conversation for the model
        if self.use_api and openai_api_key:
            # Format for OpenAI API
            formatted_messages = []
            for msg in conversation_history:
                formatted_messages.append({"role": msg["role"], "content": msg["content"]})
            
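            # Call the API using the openai>=1.0 client interface
            # (the legacy openai.ChatCompletion.create call was removed in 1.0)
            try:
                client = openai.OpenAI(api_key=openai_api_key)
                response = client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=formatted_messages,
                    temperature=temperature,
                    max_tokens=max_tokens
                )
                model_response = response.choices[0].message.content
            except Exception as e:
                model_response = f"Error: {str(e)}"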
        else:
            # Format for local model and cue it to answer as the assistant
            formatted_prompt = "\n".join([
                f"{msg['role'].capitalize()}: {msg['content']}" 
                for msg in conversation_history
            ]) + "\nAssistant:"
            
            # Check cache
            cached_response = self.cache.get(formatted_prompt, self.model_name, temperature, max_tokens)
            if cached_response is not None:
                model_response = cached_response
            else:
                # Generate response with local model
                inputs = self.tokenizer(formatted_prompt, return_tensors="pt").to(device)
                
                with torch.no_grad():
                    outputs = self.model.generate(
                        **inputs,  # passes input_ids together with the attention mask
                        max_new_tokens=max_tokens,
                        temperature=temperature,
                        do_sample=True,
                        num_return_sequences=1,
                        pad_token_id=self.tokenizer.eos_token_id
                    )
                
                # Decode only the newly generated tokens, not the echoed prompt;
                # splitting on "Assistant:" could pick up an earlier turn instead
                prompt_length = inputs["input_ids"].shape[1]
                model_response = self.tokenizer.decode(
                    outputs[0][prompt_length:], skip_special_tokens=True
                ).strip()
                
                # Stop at the next speaker label if the model keeps writing the dialogue
                model_response = re.split(r"\n(?:User|Assistant|System):", model_response)[0].strip()
                
                # Cache the response
                self.cache.set(formatted_prompt, self.model_name, temperature, max_tokens, model_response)
        
        # Process function calls
        processed_response = parse_and_execute_function_call(model_response, self.function_registry)
        
        # Add assistant's response to memory
        self.memory.add_message("assistant", processed_response, self.tokenizer)
        
        return processed_response
    
    def reset(self):
        """Reset the conversation."""
        self.memory.clear()
        self.memory.add_message("system", self.system_prompt, self.tokenizer)


# Create a financial assistant
financial_assistant = FinancialAssistant(use_api=False)  # Set to True to use API if available

# Test the assistant with some financial queries
test_queries = [
    "What's the current price of Apple stock?",
    "If I invest $5000 at 7% annual interest for 20 years, how much will I have?",
    "What's the value of a portfolio with 15 shares of NVDA, 10 shares of MSFT, and 20 shares of AAPL?",
    "What have been the returns of the S&P 500 over the past 3 years?",
    "Based on our previous conversation, which stock has the highest price?"
]

# Interactive conversation
for query in test_queries:
    print(f"\nUser: {query}")
    response = financial_assistant.generate_response(query)
    print(f"Assistant: {response}")

# Display conversation history
print("\nConversation History:")
for i, msg in enumerate(financial_assistant.memory.get_conversation_history()):
    if msg["role"] != "system":  # Skip system message for clarity
        print(f"{i}. {msg['role'].capitalize()}: {msg['content']}")

## 6. Comparing Local Models vs. API-based LLMs

Let's compare the advantages and limitations of using local models versus API-based LLMs for financial applications.

### Local Models vs. API-based LLMs Comparison

| Feature | Local Models | API-based LLMs |
|---------|-------------|----------------|
| **Cost** | Upfront hardware cost plus ongoing power and maintenance | Pay-per-token pricing |
| **Performance** | Limited by local hardware | High performance on provider's infrastructure |
| **Latency** | Lower for small models | Depends on network and API provider |
| **Privacy** | Data stays on local machine | Data sent to third-party servers |
| **Customization** | Full control over fine-tuning | Limited to provider's options |
| **Scaling** | Limited by local resources | Easily scales with demand |
| **Regulatory Compliance** | Data residency is easier to demonstrate; other obligations still apply | Requires careful data handling |
| **Consistency** | Reproducible while model version and decoding settings are pinned | May change with provider updates |
| **Knowledge Cutoff** | Fixed at training time | More recent for some providers |
| **Function Calling** | Requires custom implementation | Often built into the API |

### When to Use Each Approach

**Local Models are better when:**
- Privacy and data security are paramount
- Reproducible responses are required (pinned model weights, plus a fixed seed or greedy decoding)
- Working with sensitive financial information
- Operating in highly regulated environments
- Deployment in air-gapped systems is needed
- Cost predictability is important

**API-based LLMs are better when:**
- Advanced capabilities beyond local hardware are needed
- Development speed is prioritized over customization
- Scaling to many users or high volume is required
- Up-to-date knowledge is important
- Complex reasoning for financial analysis is needed
- Integration with provider's ecosystem is valuable
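
The two lists above can also be combined per request rather than per project. The following is a minimal sketch of such a router, assuming a simple pattern check is an acceptable first filter for sensitive content. The function name `choose_backend` and the patterns in `SENSITIVE_PATTERNS` are hypothetical and not part of this notebook; a production system would need a far more careful classifier.

```python
import re

# Hypothetical patterns for content that should stay on the local machine.
SENSITIVE_PATTERNS = [
    re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),                        # SSN-like numbers
    re.compile(r"\baccount\s*(number|no\.?|#)", re.IGNORECASE),  # account identifiers
    re.compile(r"\b\d{12,19}\b"),                                # long digit runs (card or account numbers)
]


def choose_backend(prompt: str, api_available: bool) -> str:
    """Return "local" or "api" for a single prompt."""
    if not api_available:
        return "local"
    if any(pattern.search(prompt) for pattern in SENSITIVE_PATTERNS):
        return "local"
    return "api"


print(choose_backend("What is dollar-cost averaging?", api_available=True))
print(choose_backend("My account number is 123456789012, what is my balance?", api_available=True))
```

The first prompt is routed to the API and the second stays local. The result could, for example, decide the `use_api` argument when constructing a `FinancialAssistant`.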

## 7. Practical Exercise: Creating a Portfolio Analyzer

Let's put everything together by creating a portfolio analyzer that can:
1. Track portfolio holdings in memory
2. Retrieve current prices using function calls
3. Calculate portfolio statistics
4. Provide investment recommendations

This exercise will demonstrate how memory, caching, and function calling work together in a practical financial application.

In [None]:
# Add portfolio-specific functions to our registry

def calculate_portfolio_statistics(holdings):
    """
    Calculate statistics for a portfolio.
    
    Args:
        holdings: Dictionary mapping stock symbols to number of shares
        
    Returns:
        Dictionary with portfolio statistics
    """
    statistics = {}
    total_value = 0
    total_investment = 0
    positions = []
    
    # Sample cost basis data (in a real app, this would be stored or provided)
    cost_basis = {
        "AAPL": 150.25,
        "MSFT": 380.50,
        "GOOGL": 140.75,
        "AMZN": 140.30,
        "META": 320.40,
        "TSLA": 190.60,
        "NVDA": 700.25,
    }
    
    # Value of the positions whose cost basis is known; used for the overall gain/loss
    value_with_cost_basis = 0
    
    for symbol, shares in holdings.items():
        price = get_stock_price(symbol)
        if price is not None:
            # Calculate position value
            position_value = price * shares
            total_value += position_value
            
            # Calculate cost basis and gain/loss
            if symbol in cost_basis:
                position_cost = cost_basis[symbol] * shares
                total_investment += position_cost
                value_with_cost_basis += position_value
                gain_loss = position_value - position_cost
                # Guard against division by zero for empty positions
                gain_loss_pct = (gain_loss / position_cost) * 100 if position_cost else "Unknown"
            else:
                gain_loss = "Unknown"
                gain_loss_pct = "Unknown"
            
            # Add position details
            positions.append({
                "symbol": symbol,
                "shares": shares,
                "price": price,
                "value": position_value,
                "cost_basis": cost_basis.get(symbol, "Unknown"),
                "gain_loss": gain_loss,
                "gain_loss_pct": gain_loss_pct
            })
    
    # Calculate overall gain/loss from positions with a known cost basis only,
    # so that positions without one do not inflate the result
    if total_investment > 0:
        total_gain_loss = value_with_cost_basis - total_investment
        total_gain_loss_pct = (total_gain_loss / total_investment) * 100
    else:
        total_gain_loss = "Unknown"
        total_gain_loss_pct = "Unknown"
    
    # Store statistics
    statistics["total_value"] = total_value
    statistics["total_investment"] = total_investment
    statistics["total_gain_loss"] = total_gain_loss
    statistics["total_gain_loss_pct"] = total_gain_loss_pct
    statistics["positions"] = positions
    
    return statistics


def get_portfolio_allocation(holdings):
    """
    Get the allocation breakdown of a portfolio.
    
    Args:
        holdings: Dictionary mapping stock symbols to number of shares
        
    Returns:
        Dictionary with allocation percentages by stock
    """
    allocation = {}
    total_value = 0
    
    # Calculate total portfolio value
    for symbol, shares in holdings.items():
        price = get_stock_price(symbol)
        if price is not None:
            position_value = price * shares
            total_value += position_value
            allocation[symbol] = position_value
    
    # Calculate percentages
    if total_value > 0:
        for symbol in allocation:
            allocation[symbol] = (allocation[symbol] / total_value) * 100
    
    return allocation


def recommend_portfolio_adjustments(holdings, risk_profile="moderate"):
    """
    Recommend portfolio adjustments based on holdings and risk profile.
    
    Args:
        holdings: Dictionary mapping stock symbols to number of shares
        risk_profile: Risk profile (conservative, moderate, aggressive)
        
    Returns:
        List of recommended adjustments
    """
    # Get current allocation
    allocation = get_portfolio_allocation(holdings)
    
    # Define target allocations based on risk profile
    target_allocations = {
        "conservative": {
            "AAPL": 10,
            "MSFT": 10,
            "GOOGL": 5,
            "AMZN": 5,
            "META": 5,
            "TSLA": 5,
            "NVDA": 10,
            "other": 50  # Bonds, etc.
        },
        "moderate": {
            "AAPL": 15,
            "MSFT": 15,
            "GOOGL": 10,
            "AMZN": 10,
            "META": 10,
            "TSLA": 10,
            "NVDA": 15,
            "other": 15  # Bonds, etc.
        },
        "aggressive": {
            "AAPL": 20,
            "MSFT": 20,
            "GOOGL": 15,
            "AMZN": 15,
            "META": 10,
            "TSLA": 10,
            "NVDA": 10,
            "other": 0  # Bonds, etc.
        }
    }
    
    # Get target allocation for selected risk profile
    targets = target_allocations.get(risk_profile, target_allocations["moderate"])
    
    # Compare current allocation with targets
    recommendations = []
    for symbol, target_pct in targets.items():
        if symbol == "other":
            continue
            
        current_pct = allocation.get(symbol, 0)
        difference = target_pct - current_pct
        
        # Differences are in percentage points of total portfolio value
        if difference > 5:
            recommendations.append(f"Increase {symbol} allocation by approximately {difference:.1f} percentage points")
        elif difference < -5:
            recommendations.append(f"Decrease {symbol} allocation by approximately {abs(difference):.1f} percentage points")
    
    # Check for diversification
    if len(allocation) < 3:
        recommendations.append("Consider adding more stocks to diversify your portfolio")
    
    # Check for missing key stocks
    for symbol in ["AAPL", "MSFT"]:
        if symbol not in allocation and targets.get(symbol, 0) > 5:
            recommendations.append(f"Consider adding {symbol} to your portfolio")
    
    return recommendations

# Register new functions
function_registry.register(calculate_portfolio_statistics, "Calculate statistics for a portfolio of stocks")
function_registry.register(get_portfolio_allocation, "Get the allocation percentages of a portfolio")
function_registry.register(recommend_portfolio_adjustments, "Recommend adjustments to a portfolio based on risk profile")

# Create a portfolio analyzer assistant
portfolio_analyzer = FinancialAssistant(use_api=False)  # Set to True to use API if available

# Sample portfolio for testing
sample_portfolio = {
    "AAPL": 15,
    "MSFT": 10,
    "GOOGL": 5,
    "NVDA": 8
}

# Test queries for portfolio analysis
portfolio_queries = [
    f"Analyze this portfolio: {sample_portfolio}",
    "What's the allocation of this portfolio?",
    "Can you recommend any adjustments to this portfolio for a moderate risk profile?",
    "What would be the total value if I added 10 more shares of NVDA?",
    "Which stock in my portfolio has performed the best?"
]

# Run interactive portfolio analysis
print("\n--- Portfolio Analyzer Demo ---\n")
for query in portfolio_queries:
    print(f"User: {query}")
    response = portfolio_analyzer.generate_response(query)
    print(f"Assistant: {response}\n")
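
The allocation arithmetic in `get_portfolio_allocation` is easy to check by hand once prices are fixed. The sketch below assumes prices are passed in as a dictionary instead of being fetched with `get_stock_price`. The helper name `allocation_from_prices` and the prices themselves are hypothetical, chosen for round numbers rather than taken from market data.

```python
def allocation_from_prices(holdings, prices):
    """Percentage of total portfolio value per symbol; symbols without a price are skipped."""
    values = {symbol: prices[symbol] * shares
              for symbol, shares in holdings.items() if symbol in prices}
    total = sum(values.values())
    if total <= 0:
        return {}
    return {symbol: value / total * 100 for symbol, value in values.items()}


# Hypothetical prices, not market data.
prices = {"AAPL": 200.0, "MSFT": 400.0, "GOOGL": 200.0, "NVDA": 250.0}
holdings = {"AAPL": 15, "MSFT": 10, "GOOGL": 5, "NVDA": 8}

# Position values: 3000 + 4000 + 1000 + 2000 = 10000
allocation = allocation_from_prices(holdings, prices)
print({symbol: round(pct, 1) for symbol, pct in allocation.items()})
# {'AAPL': 30.0, 'MSFT': 40.0, 'GOOGL': 10.0, 'NVDA': 20.0}
```

Injecting prices this way also makes the function testable without network access, which is useful when the model's answers need to be verified against a known result.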

## 8. Conclusion and Best Practices

Let's summarize what we've learned about implementing memory, caching, and function calling in LLM applications for finance.

### Key Takeaways

1. **Memory Systems**
   - Enable contextual conversations about financial topics
   - Can be implemented with different strategies (basic, summary, vector)
   - Critical for maintaining user context in financial advisory scenarios
   - Summary and windowed buffers reduce token usage and cost while preserving relevant information

2. **Caching Systems**
   - Improve performance and reduce costs for repeated queries
   - Return the same answer for identical queries, which keeps repeated analyses consistent
   - Can be implemented with different storage backends (file, database)
   - Important to consider cache invalidation strategies for financial data

3. **Function Calling**
   - Connects LLMs to real-time financial data and calculations
   - Enables more accurate and up-to-date financial analysis
   - Can be implemented with different levels of sophistication
   - Essential for building practical financial applications

4. **Local vs. API Models**
   - Choose based on your specific requirements for privacy, performance, and cost
   - Consider regulatory compliance needs in financial applications
   - Hybrid approaches may offer the best balance for financial use cases
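
To make the cache invalidation point concrete, here is a minimal sketch of time-based expiry, assuming an in-memory dictionary is sufficient. The class `TTLCache` is hypothetical and separate from the cache class used earlier in this notebook. The clock is injectable so that expiry can be demonstrated without waiting, and the cached price is a placeholder.

```python
import time


class TTLCache:
    """In-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self.ttl_seconds = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (expiry time, value)

    def set(self, key, value):
        self._store[key] = (self._clock() + self.ttl_seconds, value)

    def get(self, key):
        entry = self._store.get(key)
        if entry is None:
            return None
        expires_at, value = entry
        if self._clock() >= expires_at:
            del self._store[key]  # stale entry: drop it and report a miss
            return None
        return value


# Fake clock so the example runs instantly.
now = [0.0]
cache = TTLCache(ttl_seconds=60, clock=lambda: now[0])
cache.set("price:AAPL", 189.5)   # placeholder value
print(cache.get("price:AAPL"))   # 189.5 (still fresh)
now[0] = 61.0
print(cache.get("price:AAPL"))   # None (expired)
```

A short TTL suits fast-moving data such as quotes, while slowly changing facts (for example, an explanation of compound interest) can be cached for much longer.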

### Best Practices for Financial LLM Applications

1. **Memory Management**
   - Use summarization for long conversations about financial topics
   - Store critical financial information (portfolio holdings, goals) separately
   - Implement proper security measures for sensitive financial data
   - Consider user-specific memory for personalized financial advice

2. **Caching Strategy**
   - Cache factual financial information with an appropriate time-to-live (TTL)
   - Don't cache personalized financial advice without proper controls
   - Implement versioning for cached responses
   - Monitor cache hit rates to optimize performance

3. **Function Implementation**
   - Validate all inputs to financial functions
   - Implement error handling for all external data sources
   - Document the limitations of financial calculations
   - Consider rate limits for external financial APIs

4. **Security Considerations**
   - Never expose API keys in client-side code
   - Implement proper authentication for financial functions
   - Log access to sensitive financial data
   - Consider regulatory requirements (GDPR, CCPA, financial regulations)

5. **Performance Optimization**
   - Use the smallest model that meets your accuracy needs
   - Implement batching for multiple financial calculations
   - Consider asynchronous processing for long-running analyses
   - Monitor and optimize token usage for cost control
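
As one illustration of the input validation practice under Function Implementation, the sketch below validates the arguments of a compound-interest calculation before computing anything. The function name `future_value` and its parameters are hypothetical and are not the investment function registered earlier in this notebook; annual compounding is assumed.

```python
import math


def future_value(principal, annual_rate_pct, years):
    """Future value with annual compounding, after validating every input."""
    for name, value in (("principal", principal),
                        ("annual_rate_pct", annual_rate_pct),
                        ("years", years)):
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError(f"{name} must be a number, got {type(value).__name__}")
        if not math.isfinite(value):
            raise ValueError(f"{name} must be finite")
    if principal < 0:
        raise ValueError("principal must be non-negative")
    if annual_rate_pct <= -100:
        raise ValueError("annual_rate_pct must be greater than -100")
    if years < 0:
        raise ValueError("years must be non-negative")
    return principal * (1 + annual_rate_pct / 100) ** years


print(round(future_value(5000, 7, 20), 2))  # 19348.42

try:
    future_value(-5000, 7, 20)
except ValueError as err:
    print(f"Rejected: {err}")
```

Arguments extracted from model output often arrive as strings, so a real registry would likely convert them first and let checks like these reject anything that fails to convert cleanly.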

### Further Learning Resources

- [HuggingFace Transformers Documentation](https://huggingface.co/docs/transformers/index)
- [OpenAI Function Calling Guide](https://platform.openai.com/docs/guides/function-calling)
- [LangChain Memory Documentation](https://js.langchain.com/docs/modules/memory/)
- [Financial Data APIs](https://polygon.io/docs)
- [Regulatory Considerations for AI in Finance](https://www.finra.org/rules-guidance/key-topics/fintech/report/artificial-intelligence-in-the-securities-industry)

## 9. Assignment: Build Your Own Financial Chatbot

As a practical assignment, try building your own financial chatbot with the following features:

1. **Memory System**
   - Implement at least one type of memory system
   - Test with a multi-turn conversation about financial planning

2. **Caching**
   - Set up a caching system for LLM responses
   - Measure and report performance improvements

3. **Financial Functions**
   - Implement at least three financial functions (e.g., investment calculator, stock data retrieval)
   - Create a function calling mechanism to use these functions

4. **User Interface**
   - Build a simple interface for your chatbot (CLI, web, or notebook)
   - Support multi-turn conversations about financial topics

5. **Documentation**
   - Document your architecture and design decisions
   - Explain how you addressed memory, caching, and function calling
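
For the caching measurement, one possible starting point is sketched below. It assumes a stand-in model function (`slow_model`, which only sleeps) and a hypothetical `CountingCache` class; both are illustrations and would be replaced by your own model call and cache. Exact timings vary by machine, so only the hit rate is predictable.

```python
import time


def slow_model(prompt):
    """Stand-in for an LLM call; the sleep simulates generation latency."""
    time.sleep(0.05)
    return f"response to: {prompt}"


class CountingCache:
    """Dictionary cache that records hits and misses."""

    def __init__(self):
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, key, compute):
        if key in self.store:
            self.hits += 1
            return self.store[key]
        self.misses += 1
        self.store[key] = compute(key)
        return self.store[key]

    def hit_rate(self):
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def timed(fn):
    """Return the wall-clock seconds taken by fn()."""
    start = time.perf_counter()
    fn()
    return time.perf_counter() - start


queries = ["What is an ETF?", "What is a bond?", "What is an ETF?", "What is a bond?"]
cache = CountingCache()

uncached_seconds = timed(lambda: [slow_model(q) for q in queries])
cached_seconds = timed(lambda: [cache.get_or_compute(q, slow_model) for q in queries])

print(f"uncached: {uncached_seconds:.3f}s, cached: {cached_seconds:.3f}s")
print(f"hit rate: {cache.hit_rate():.0%}")
```

Reporting both the hit rate and the elapsed time is more informative than either alone, since a high hit rate on cheap queries may save little time.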

Submit your code, documentation, and a demo of your chatbot in action.

Good luck!