# Laboratory Session: LangChain with OllamaLLM - Document Analyzer

**Session Breakdown:**

1. Setup and Environment Preparation (15 minutes)
2. LangChain and OllamaLLM Fundamentals (30 minutes)
3. Practical Implementation - Document Analyzer (45 minutes)
4. Lab Wrap-Up and Discussion (15 minutes)

**Learning Objectives**

...

**Resources**

- [Ollama Official Documentation](https://ollama.ai/docs)
- [LangChain Documentation](https://python.langchain.com/)

# LangChain Workshop: Environment Setup

### 1. Prerequisites and System Check

#### Python and Environment Preparation

In [None]:
# Verify Python Installation
!python --version

In [None]:
# Create a virtual environment
!python -m venv langchain_workshop

In [None]:
# Check the operating system
import platform
platform.platform()

In [None]:
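# Activate the virtual environment
# Note: each `!` command runs in its own temporary subshell, so activation done
# here does not carry over to the notebook kernel. Run the activation command
# in a terminal before launching Jupyter.
# On Unix or MacOS
!source langchain_workshop/bin/activate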

In [None]:
# On Windows
!langchain_workshop\Scripts\activate
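
Shell commands launched with `!` run in a separate subshell, so it can help to confirm which interpreter the notebook kernel is actually using. The following is a minimal sketch, assuming a standard `venv`-style environment; the helper name `in_virtualenv` is illustrative and not part of the workshop code.

```python
import sys

def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix

print(f"Interpreter: {sys.executable}")
print(f"Inside a virtual environment: {in_virtualenv()}")
```

If this reports `False`, packages installed with `!pip install` go into whichever Python runs the kernel, not into `langchain_workshop`.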

### 2. Library Installation

#### Install Required Libraries

In [None]:
!pip install --upgrade pip

In [None]:
# Install core libraries
!pip install \
    langchain \
    ollama \
    pypdf \
    transformers \
    sentence-transformers \
    chromadb \
    unstructured \
    langchain_community

### 3. Ollama and Model Setup

#### Verify Ollama Installation

In [None]:
# Verify Ollama is running
import ollama

# List available models
print(ollama.list())
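
If `ollama.list()` raises a connection error, the local server is probably not running. As a rough sketch of the same check without the `ollama` package, the server's REST endpoint can be probed directly. This assumes Ollama's default address `http://localhost:11434`; the function name `ollama_is_running` is hypothetical.

```python
import http.client
import json
import urllib.error
import urllib.request

def ollama_is_running(base_url: str = "http://localhost:11434", timeout: float = 2.0) -> bool:
    """Return True if an Ollama server answers at base_url."""
    try:
        # /api/tags is the REST endpoint that lists locally available models.
        with urllib.request.urlopen(f"{base_url}/api/tags", timeout=timeout) as resp:
            json.load(resp)
            return True
    except (urllib.error.URLError, OSError, ValueError, http.client.HTTPException):
        # Connection refused, timeout, or a reply that is not valid JSON.
        return False

print("Ollama reachable:", ollama_is_running())
```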

#### Download Ollama Model

In [None]:
# Pull a suitable open-source model
!ollama pull llama3.2

### 4. Environment Verification Script

In [3]:
import sys
import ollama
import langchain

def check_environment():
    """Perform comprehensive environment check."""
    print("--- Environment Check ---\n")
    
    # Python version
    print(f"Python Version: {sys.version}")
    
    # Library versions
    print(f"LangChain Version: {langchain.__version__}\n")
    
    # Verify model availability
    try:
        models = ollama.list()
        print("Ollama Models Available:\n")
        for model in models['models']:
            print(f"- {model['model']}")
    except Exception as e:
        print(f"Error checking Ollama models: {e}")

In [4]:
# Run the check
check_environment()

--- Environment Check ---

Python Version: 3.11.7 (main, Sep  3 2024, 17:13:36) [Clang 15.0.0 (clang-1500.3.9.4)]
LangChain Version: 0.3.11

Ollama Models Available:

- llama3.2:latest
- mistral:latest


### 5. Troubleshooting Guide

In [None]:
def troubleshoot_ollama():
    """Display Ollama troubleshooting guidelines."""
    print("\n--- Ollama Troubleshooting Guide ---")
    print("Common Issues and Solutions:")
    
    issues = {
        "Installation": [
            "Ensure Ollama is installed",
            "Check system PATH",
            "Verify administrative permissions"
        ],
        "Connection": [
            "Check internet connectivity",
            "Verify firewall settings",
            "Restart Ollama service"
        ],
        "Model Download": [
            "Sufficient disk space",
            "Stable internet connection",
            "Use `ollama pull` command"
        ]
    }
    
    for category, solutions in issues.items():
        print(f"\n{category} Troubleshooting:")
        for solution in solutions:
            print(f"- {solution}")

# Display troubleshooting guide
troubleshoot_ollama()
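
The "Check system PATH" item in the guide can also be verified programmatically. This is a small sketch using only the standard library; the helper name `find_ollama_binary` is illustrative.

```python
import shutil

def find_ollama_binary():
    """Return the full path of the `ollama` executable, or None if it is not on PATH."""
    return shutil.which("ollama")

path = find_ollama_binary()
print(f"ollama found at: {path}" if path else "ollama is not on PATH")
```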

# LangChain and Large Language Models - Theoretical Foundation

## Table of Contents
1. [Introduction to LangChain](#introduction-to-langchain)
2. [Understanding Large Language Models](#understanding-large-language-models)
3. [Core Components of LangChain](#core-components-of-langchain)
4. [Working with Local LLMs](#working-with-local-llms)
5. [Design Patterns and Best Practices](#design-patterns-and-best-practices)

## Introduction to LangChain

### What is LangChain?
LangChain is a framework designed to simplify the development of applications using large language models (LLMs). It provides a standardized interface for chaining together different components needed in LLM-powered applications.

### Key Features
- **Component Architecture**: Modular design for easy integration
- **Chain Construction**: Sequential processing of language tasks
- **Memory Management**: Context handling across conversations
- **Prompt Management**: Templates and dynamic prompt generation
- **Model Integration**: Support for various LLM providers

### Use Cases
1. Document Analysis and QA
2. Chatbots and Conversational Agents
3. Data Analysis and Summarization
4. Text Generation and Processing
5. Knowledge Base Construction

## Understanding Large Language Models

### Fundamentals of LLMs
Large Language Models are neural networks trained on vast amounts of text data. They excel at:
- Text Generation
- Pattern Recognition
- Context Understanding
- Language Translation
- Code Generation

### Types of LLMs
1. **Cloud-based Models**
   - OpenAI GPT Series
   - Anthropic Claude
   - Google PaLM

2. **Local Models**
   - Ollama Models
   - Llama 2
   - GPT4All
   - LocalAI

### Comparing Local vs. Cloud LLMs

| Aspect | Local LLMs | Cloud LLMs |
|--------|------------|------------|
| Privacy | High | Depends on Provider |
| Cost | One-time/Free | Pay-per-use |
| Latency | Hardware Dependent | Network Dependent |
| Setup Complexity | Higher | Lower |
| Customization | More Flexible | Limited |

## Core Components of LangChain

### 1. Chains
Chains are sequences of operations that:
- Process inputs systematically
- Combine multiple components
- Manage state and memory
- Handle errors and edge cases

Example Chain Structure:
```text
input → Prompt Template → LLM → Output Parser → final output
```

**Basic Chain Syntax**
```python
chain = prompt | self.llm
response = chain.invoke({
    'context': context,
    'question': question
})
```

#### Step-by-Step Breakdown

##### 1. Chain Creation (`chain = prompt | self.llm`)
- The `|` (pipe) operator creates a sequential chain
- Purpose: "Take the output of the prompt template and feed it into the LLM"
- Similar to Unix pipes: output of one command becomes input to another
- Under the hood process:
  1. Format prompt template with variables
  2. Send formatted prompt to LLM for processing

##### 2. Prompt Template Definition
```python
prompt = PromptTemplate.from_template(
    "Invoice Data:\n{context}\n\n"
    "Question: {question}\n\n"
    "Provide a detailed, data-driven answer."
)
```
- Template contains two variables: `{context}` and `{question}`
- Variables are filled with actual values during invocation
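
The substitution step can be previewed without LangChain, since the template above uses Python format-string placeholders. The sketch below uses plain `str.format` as a stand-in for what the prompt template does with the input dictionary; the sample values are illustrative.

```python
template = (
    "Invoice Data:\n{context}\n\n"
    "Question: {question}\n\n"
    "Provide a detailed, data-driven answer."
)

# Values supplied at invocation time replace the named placeholders.
filled = template.format(
    context="Invoice #1: $500, Invoice #2: $300",
    question="What's the total amount?",
)
print(filled)
```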

##### 3. LLM Configuration
```python
self.llm = Ollama(model='mistral', temperature=0.1)
```

##### 4. Chain Invocation
```python
response = chain.invoke({
    'context': context,
    'question': question
})
```
Process flow:
1. Takes input dictionary
2. Formats prompt template
3. Sends to Ollama
4. Returns response

#### Example Usage

```python
# Context and question
context = "Invoice #1: $500, Invoice #2: $300"
question = "What's the total amount?"

# Prompt text the template produces for these inputs (shown for illustration)
formatted_prompt = (
    "Invoice Data:\n"
    "Invoice #1: $500, Invoice #2: $300\n\n"
    "Question: What's the total amount?\n\n"
    "Provide a detailed, data-driven answer."
)

chain = prompt | self.llm
response = chain.invoke({'context': context, 'question': question})

# Example response (actual model output will vary):
# "The total amount across the two invoices is $800, calculated by adding
#  Invoice #1 ($500) and Invoice #2 ($300)."
```

### 2. Prompts
Prompts are structured templates that:
- Guide LLM behavior
- Include context and instructions
- Handle variable substitution
- Enforce output formats

### 3. Memory
Memory systems in LangChain:
- Store conversation history
- Maintain context
- Handle token limitations
- Enable stateful interactions
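
As an illustration of the idea only, and not of LangChain's own memory classes, the sketch below keeps a bounded window of recent turns so the context passed to the model cannot grow without limit. `WindowMemory` is a hypothetical helper.

```python
from collections import deque

class WindowMemory:
    """Keep only the most recent conversation turns (a hypothetical helper)."""

    def __init__(self, max_turns: int = 3):
        # A deque with maxlen silently drops the oldest turn when full.
        self.turns = deque(maxlen=max_turns)

    def add(self, user: str, assistant: str) -> None:
        self.turns.append((user, assistant))

    def as_context(self) -> str:
        return "\n".join(f"User: {u}\nAssistant: {a}" for u, a in self.turns)


memory = WindowMemory(max_turns=2)
memory.add("Hi", "Hello!")
memory.add("Total revenue?", "It is in the summary.")
memory.add("And tax?", "See the tax analysis.")
print(memory.as_context())  # only the two most recent turns remain
```

Counting turns is a crude proxy for counting tokens, but it shows the trade-off: older context is dropped to stay within the model's limit.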

### 4. Output Parsers
Output parsers:
- Structure LLM responses
- Validate outputs
- Transform data formats
- Handle errors
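
A minimal sketch of what an output parser does, written in plain Python rather than with LangChain's parser classes. It assumes the model was asked to reply with a JSON object containing a numeric `total`; the function name and field are hypothetical.

```python
import json

def parse_invoice_answer(raw: str) -> dict:
    """Parse a model reply expected to be JSON with a numeric 'total' field."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"Reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict) or not isinstance(data.get("total"), (int, float)):
        raise ValueError("Reply must be a JSON object with a numeric 'total'")
    return data


print(parse_invoice_answer('{"total": 800, "currency": "USD"}'))
try:
    parse_invoice_answer("The total is $800.")
except ValueError as err:
    print(f"Rejected: {err}")
```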

## Working with Local LLMs

### Ollama Integration
Ollama provides:
- Easy model management
- Local execution
- Custom model support
- API compatibility

### Model Selection Criteria
Consider:
1. Hardware Requirements
2. Performance Needs
3. Licensing Requirements
4. Use Case Compatibility

### Performance Optimization
Strategies include:
- Prompt Engineering
- Context Window Management
- Batch Processing
- Caching Mechanisms

## Design Patterns and Best Practices

### 1. Prompt Engineering Patterns
- **Zero-shot Learning**: No examples needed
- **Few-shot Learning**: Include examples in prompt
- **Chain-of-Thought**: Break down complex reasoning
- **Self-Consistency**: Multiple passes for verification
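
For instance, a few-shot prompt can be assembled by placing worked examples ahead of the real question. The helper below is a hypothetical sketch with made-up example pairs; it only builds the prompt text and does not call a model.

```python
def build_few_shot_prompt(examples, question):
    """Assemble a few-shot prompt from (question, answer) pairs."""
    shots = "\n\n".join(f"Q: {q}\nA: {a}" for q, a in examples)
    return f"{shots}\n\nQ: {question}\nA:"

examples = [
    ("Invoice A: $100, Invoice B: $50. Total?", "$150"),
    ("Invoice A: $20, Invoice B: $5. Total?", "$25"),
]
prompt_text = build_few_shot_prompt(examples, "Invoice A: $500, Invoice B: $300. Total?")
print(prompt_text)
```

Passing an empty `examples` list would give the zero-shot variant of the same prompt, apart from the leading blank lines.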

### 2. Error Handling
```python
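from langchain_core.exceptions import OutputParserException

try:
    response = chain.invoke(input_data)
except OutputParserException as e:
    # Handle output parsing errors
    print(f"Output parsing failed: {e}")
except Exception as e:
    # Handle model and chain execution errors
    # (for example, the Ollama server is not reachable)
    print(f"Chain execution failed: {e}")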
```

### 3. Performance Optimization
- Cache frequently used results
- Batch similar requests
- Implement retry mechanisms
- Monitor token usage
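
A retry mechanism can be sketched as a small wrapper with exponential backoff. The names `invoke_with_retry` and `flaky` are hypothetical; in practice the wrapped callable would be something like `lambda: chain.invoke(input_data)`.

```python
import time

def invoke_with_retry(call, retries: int = 3, base_delay: float = 0.5):
    """Call `call()` and retry with exponential backoff if it raises."""
    for attempt in range(retries):
        try:
            return call()
        except Exception:
            if attempt == retries - 1:
                raise  # out of attempts: surface the last error
            time.sleep(base_delay * 2 ** attempt)


attempts = {"n": 0}

def flaky():
    # Hypothetical call that fails twice before succeeding.
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise RuntimeError("temporary failure")
    return "ok"

print(invoke_with_retry(flaky, base_delay=0.01))  # ok
```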

### 4. Security Considerations
1. Input Validation
2. Output Sanitization
3. Rate Limiting
4. Access Control
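
Input validation is the simplest of these to illustrate. The sketch below cleans and bounds a user question before it is placed in a prompt. The length limit and the function name are illustrative choices, not LangChain settings.

```python
MAX_QUESTION_LENGTH = 500  # illustrative limit, not a LangChain setting

def validate_question(question: str) -> str:
    """Return a cleaned question or raise ValueError if it is unusable."""
    if not isinstance(question, str):
        raise ValueError("Question must be a string")
    # Drop non-printable control characters, then trim surrounding whitespace.
    cleaned = "".join(ch for ch in question if ch.isprintable() or ch in "\n\t").strip()
    if not cleaned:
        raise ValueError("Question is empty")
    if len(cleaned) > MAX_QUESTION_LENGTH:
        raise ValueError(f"Question exceeds {MAX_QUESTION_LENGTH} characters")
    return cleaned

print(validate_question("  What is the total revenue?\x00 "))
```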

## Further Reading

1. LangChain Documentation
   - [Python Documentation](https://python.langchain.com/)
   - [JavaScript Documentation](https://js.langchain.com/)

2. Ollama Resources
   - [Official Documentation](https://ollama.ai/docs)
   - [Model Library](https://ollama.ai/library)

3. Related Topics
   - Prompt Engineering
   - Vector Databases
   - Embeddings
   - RAG (Retrieval Augmented Generation)

# LangChain Invoice Analyzer Exercise

## Objective
Create an invoice analysis tool that can:
- Load JSON invoice data
- Perform financial analytics
- Generate insights
- Answer specific questions about invoices


## Sample Invoice JSON Structure
```json
[
    {
        "invoice_id": "INV-2024-001",
        "customer_name": "TechCorp Solutions",
        "date": "2024-01-15",
        "total_amount": 5750.25,
        "items": [
            {"name": "Software License", "quantity": 10, "unit_price": 450.00},
            {"name": "Cloud Services", "quantity": 5, "unit_price": 250.50}
        ],
        "payment_status": "Paid",
        "tax_rate": 0.18
    },
    {
        ...
    }
]
```
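
To see how this structure flattens into one row per item, here is a plain-Python sketch using the first sample invoice, trimmed to the fields needed. The `item_total` field is computed as quantity times unit price, mirroring what the analyzer class does with pandas.

```python
sample_invoice = {
    "invoice_id": "INV-2024-001",
    "total_amount": 5750.25,
    "items": [
        {"name": "Software License", "quantity": 10, "unit_price": 450.00},
        {"name": "Cloud Services", "quantity": 5, "unit_price": 250.50},
    ],
}

# One row per item, carrying the invoice id alongside a computed line total.
rows = [
    {"invoice_id": sample_invoice["invoice_id"], **item,
     "item_total": item["quantity"] * item["unit_price"]}
    for item in sample_invoice["items"]
]
line_total_sum = sum(row["item_total"] for row in rows)
print(rows)
print(f"Sum of line totals: {line_total_sum:.2f}")  # Sum of line totals: 5752.50
```

For these sample values the line totals sum to 5752.50 while `total_amount` is 5750.25, so the two figures should be treated as separate fields, not as derived from each other.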

## Task 1: Implement AI-Powered Financial Insights Generator

Given the class `AdvancedInvoiceAnalyzer`, create a method that generates AI-powered insights from invoice data using LangChain and Ollama. The method should compile financial data and use an LLM to provide business analysis.

### Method Structure
```python
def ai_powered_insights(self):
    """Your implementation here"""
```

### The method should:
1. Combine data from three existing methods:
   - `financial_summary()`
   - `generate_tax_analysis()`
   - `item_level_analysis()`

2. Create a structured context string containing:
   - Financial Overview
   - Tax Analysis
   - Top Items Analysis

3. Use LangChain's PromptTemplate to create an analysis prompt

4. Return AI-generated insights using the LLM

### Expected Format of Context String
```text
Financial Overview:
- Total Invoices: [number]
- Total Revenue: $[amount]
- Payment Status: [breakdown]

Tax Analysis:
- Total Tax Collected: $[amount]
- Average Tax Rate: [percentage]

Top Items:
[item analysis data]
```
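
The formatting of this context string can be tried out in isolation. The sketch below uses hypothetical values in place of the results of the three analysis methods; the format specifiers `:,.2f` and `:.2%` produce the dollar amounts and the percentage.

```python
# Hypothetical values standing in for the results of the three analysis methods.
summary = {"total_invoices": 2, "total_revenue": 1234.5,
           "payment_status_breakdown": {"Paid": 1, "Pending": 1}}
tax = {"total_tax_collected": 222.21, "average_tax_rate": 0.18}
top_items = "Software License: 10 units"

context = (
    "Financial Overview:\n"
    f"- Total Invoices: {summary['total_invoices']}\n"
    f"- Total Revenue: ${summary['total_revenue']:,.2f}\n"
    f"- Payment Status: {summary['payment_status_breakdown']}\n\n"
    "Tax Analysis:\n"
    f"- Total Tax Collected: ${tax['total_tax_collected']:,.2f}\n"
    f"- Average Tax Rate: {tax['average_tax_rate']:.2%}\n\n"
    "Top Items:\n"
    f"{top_items}"
)
print(context)
```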

### Steps to Complete
1. Collect data from existing analysis methods
2. Format the context string with f-strings
3. Create a PromptTemplate for analysis
4. Set up LangChain chain
5. Return the insights

## Task 2: Implement Interactive Query Handler

### Your Task
Create a method that allows users to ask questions about invoice data and receive AI-generated answers. The method should take a user question, combine it with invoice data context, and use an LLM to generate a response.

### Method Structure
```python
def interactive_query(self, question):
    """Your implementation here"""
```

### Method Parameters
- `question`: String containing the user's query about the invoices

### The method should:
1. Convert DataFrame to string format for context
2. Create a prompt template with context and question
3. Generate a response using the LLM

### Expected Prompt Format
```text
Invoice Data:
[dataframe contents]

Question: [user's question]

Provide a detailed, data-driven answer. 
If the question cannot be directly answered, explain why.
```

### Steps to Complete
1. Access the class DataFrame (`self.df`)
2. Convert DataFrame to string representation
3. Create PromptTemplate with two variables:
   - `context`
   - `question`
4. Set up LangChain chain
5. Return the answer

### Sample Usage
```python
# Example usage
analyzer = AdvancedInvoiceAnalyzer(data, model='llama3.2')
response = analyzer.interactive_query("What is the total revenue?")
```

## Your implementation

In [5]:
# Import Required Libraries
import json
import pandas as pd
from langchain_community.llms import Ollama
from langchain_core.prompts import PromptTemplate

In [6]:
class AdvancedInvoiceAnalyzer:
    def __init__(self, invoices_data, model):
        """
        Initialize the invoice analyzer with comprehensive data processing.
        
        :param invoices_data: List of invoice dictionaries
        :param model: Name of the model to use
        """
        # Convert to DataFrame with expanded item details
        self.df = self._prepare_dataframe(invoices_data)
        
        # Setup Ollama Language Model
        self.llm = Ollama(model=model, temperature=0.1)
    
    def _prepare_dataframe(self, invoices_data):
        """
        Prepare a comprehensive DataFrame with expanded invoice details.
        
        :param invoices_data: List of invoice dictionaries
        :return: Processed pandas DataFrame
        """
        # Flatten the invoices to include item-level details
        flattened_invoices = []
        for invoice in invoices_data:
            base_invoice = invoice.copy()
            for item in invoice['items']:
                invoice_item = base_invoice.copy()
                invoice_item.update(item)
                invoice_item['item_total'] = item['quantity'] * item['unit_price']
                flattened_invoices.append(invoice_item)
        
        return pd.DataFrame(flattened_invoices)
    
    def financial_summary(self):
        """
        Generate comprehensive financial summary.
        
        :return: Dictionary of financial insights
        """
        summary = {
            'total_invoices': len(self.df['invoice_id'].unique()),
            'total_revenue': self.df['total_amount'].sum(),
            'average_invoice_value': self.df['total_amount'].mean(),
            'payment_status_breakdown': self.df['payment_status'].value_counts().to_dict(),
            'top_customers': (self.df.groupby('customer_name')['total_amount']
                               .sum()
                               .nlargest(3)
                               .to_dict())
        }
        return summary
    
    def generate_tax_analysis(self):
        """
        Analyze tax implications across invoices.
        
        :return: Detailed tax analysis
        """
        tax_summary = {
            'total_tax_collected': (self.df['total_amount'] * self.df['tax_rate']).sum(),
            'average_tax_rate': self.df['tax_rate'].mean(),
            'tax_rate_distribution': self.df['tax_rate'].value_counts().to_dict()
        }
        return tax_summary
    
    def item_level_analysis(self):
        """
        Perform detailed analysis of invoice items.
        
        :return: Comprehensive item-level insights
        """
        item_summary = (self.df.groupby('name').agg({
            'quantity': 'sum',
            'item_total': 'sum',
            'unit_price': 'mean'
        }).sort_values('item_total', ascending=False))
        
        return item_summary
    
    def ai_powered_insights(self):
        """
        Generate AI-powered natural language insights.
        
        :return: Conversational financial analysis
        """
        # Compute each analysis once and reuse the results
        summary = self.financial_summary()
        tax = self.generate_tax_analysis()
        items = self.item_level_analysis()
        
        # Prepare comprehensive context
        context = f"""
        Financial Overview:
        - Total Invoices: {summary['total_invoices']}
        - Total Revenue: ${summary['total_revenue']:,.2f}
        - Payment Status: {summary['payment_status_breakdown']}
        
        Tax Analysis:
        - Total Tax Collected: ${tax['total_tax_collected']:,.2f}
        - Average Tax Rate: {tax['average_tax_rate']:.2%}
        
        Top Items:
        {items}
        """
        
        # Create a prompt template for generating insights
        prompt = PromptTemplate.from_template(
            "Analyze the following financial data and provide insightful, "
            "professional business recommendations. Focus on financial health, "
            "potential risks, and strategic opportunities:\n\n{context}"
        )
        
        # Create a chain with the LLM
        chain = prompt | self.llm
        return chain.invoke({"context": context})
    
    def interactive_query(self, question):
        """
        Answer specific questions about the invoices.
        
        :param question: User's specific query
        :return: Analytical response
        """
        # Prepare context with all invoice details
        context = self.df.to_string()
        
        # Create a flexible query prompt
        prompt = PromptTemplate.from_template(
            "Invoice Data:\n{context}\n\n"
            "Question: {question}\n\n"
            "Provide a detailed, data-driven answer. "
            "If the question cannot be directly answered, explain why."
        )
        
        # Create a chain with the LLM
        chain = prompt | self.llm
        return chain.invoke({
            'context': context,
            'question': question
        })

## Test Your implementation

In [7]:
# Load invoices from JSON file
with open('data/invoice-data.json', 'r') as file:
    invoices = json.load(file)

# Initialize the analyzer
analyzer = AdvancedInvoiceAnalyzer(invoices, model='llama3.2')


In [8]:
# Financial Summary
financial_summary = analyzer.financial_summary()

print("Financial Summary:")
print(json.dumps(financial_summary, indent=2))

Financial Summary:
{
  "total_invoices": 13,
  "total_revenue": 200853.0,
  "average_invoice_value": 7725.115384615385,
  "payment_status_breakdown": {
    "Paid": 12,
    "Pending": 8,
    "Overdue": 6
  },
  "top_customers": {
    "Smart Solutions Group": 31561.0,
    "Tech Solutions Pro": 25500.0,
    "Software Experts Inc": 22900.0
  }
}
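
The payment-status counts above add up to 26 while `total_invoices` is 13, because the flattened DataFrame holds one row per item and invoice-level columns repeat on every row. If `total_amount` is an invoice-level figure, as in the sample JSON, sums and means taken over all rows count multi-item invoices more than once. Here is a plain-Python sketch of the effect, using made-up values:

```python
# Two item rows from the same invoice plus one row from another (made-up values).
rows = [
    {"invoice_id": "INV-1", "total_amount": 1000.0, "name": "Item A"},
    {"invoice_id": "INV-1", "total_amount": 1000.0, "name": "Item B"},
    {"invoice_id": "INV-2", "total_amount": 500.0, "name": "Item C"},
]

# Summing every row counts INV-1 twice.
row_level_total = sum(r["total_amount"] for r in rows)

# Keep one total per invoice_id, then sum.
per_invoice = {r["invoice_id"]: r["total_amount"] for r in rows}
invoice_level_total = sum(per_invoice.values())

print(row_level_total)      # 2500.0
print(invoice_level_total)  # 1500.0
```

With pandas, calling `drop_duplicates(subset='invoice_id')` on the DataFrame before aggregating invoice-level columns gives the same one-row-per-invoice view.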


In [9]:
# Tax Analysis
tax_analysis = analyzer.generate_tax_analysis()
print("Tax Analysis:")
print(json.dumps(tax_analysis, indent=2))

Tax Analysis:
{
  "total_tax_collected": 37378.84,
  "average_tax_rate": 0.18,
  "tax_rate_distribution": {
    "0.18": 6,
    "0.15": 6,
    "0.2": 4,
    "0.21": 2,
    "0.16": 2,
    "0.19": 2,
    "0.17": 2,
    "0.22": 2
  }
}


In [10]:
# Item-Level Analysis
item_level_analysis = analyzer.item_level_analysis()
print("Item-Level Analysis:")
print(item_level_analysis)

Item-Level Analysis:
                        quantity  item_total  unit_price
name                                                    
Software License              30    14500.00      475.00
AI Implementation              1    12000.00    12000.00
Analytics Dashboard            1     8000.00     8000.00
Custom Development            50     7500.00      150.00
Enterprise Package             3     7500.00     2500.00
Security Audit                 1     6500.00     6500.00
Project Management            25     5250.00      210.00
App Development                1     5000.00     5000.00
Server Maintenance            12     4800.00      400.00
Data Migration                 1     3780.50     3780.50
Cloud Storage Plus             5     3750.00      750.00
Implementation Support        15     3450.00      230.00
Consulting Hours              16     3200.00      200.00
Network Setup                  1     3000.00     3000.00
Website Hosting               10     2500.00      250.00
Firewall S

In [12]:
# AI-Powered Insights
ai_powered_insights = analyzer.ai_powered_insights()
print("\nAI-Powered Insights:")
print(ai_powered_insights)


AI-Powered Insights:
Based on the provided financial data, here are some insightful and professional business recommendations:

**Financial Health:**

1. **Payment Status Analysis**: The payment status shows that 12 out of 13 invoices have been paid, leaving only one invoice pending. This indicates a high level of customer satisfaction and timely payments.
2. **Revenue Growth**: The total revenue has increased by $200,853.00, which is a significant growth rate. However, it's essential to analyze the revenue streams to identify areas for improvement.
3. **Tax Analysis**: The average tax rate is 18.00%, which is relatively high. It may be beneficial to review and optimize tax strategies to minimize tax liabilities.

**Potential Risks:**

1. **Overdue Invoices**: Six invoices are overdue, indicating a potential risk of delayed payments or non-payment. It's crucial to follow up with customers promptly to resolve these issues.
2. **High Unit Prices**: Some items have high unit prices, such

In [14]:
# Interactive query
print("Interactive Query:")
question = input()
print("i'm thinking...")
answer = analyzer.interactive_query(question)
print(answer)

Interactive Query:


 how the company can improve its financial health?


i'm thinking...
To provide a data-driven answer on how the company can improve its financial health, I'll analyze the provided data and identify trends, areas of concern, and potential opportunities for improvement.

**Overall Financial Health**

The data suggests that the company has a diverse range of services and projects across various industries. However, the overall financial performance is not explicitly stated in the data. To assess the company's financial health, I'll focus on the following metrics:

1. **Revenue**: The total revenue is not provided, but we can estimate it by summing up the values of individual invoices.
2. **Payment Terms**: The payment terms are not specified, which makes it difficult to determine the average collection period or days sales outstanding (DSO).
3. **Days Sales Outstanding (DSO)**: DSO is a key metric that indicates how long it takes for customers to pay their bills. A higher DSO can indicate cash flow problems.
4. **Payment Status**: The payme