<a href="https://colab.research.google.com/github/ngoax/agent/blob/main/mathagent.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Initialization

Mount Google Drive so that Colab can access your files

In [None]:
from google.colab import drive
drive.mount('/content/drive')

Update the package index and install the CUDA drivers

In [None]:
!sudo apt-get update && sudo apt-get install -y cuda-drivers

Check that the CUDA drivers are installed properly and that the GPU is available

In [None]:
!nvcc --version

In [None]:
!nvidia-smi

In [None]:
import torch
torch.cuda.is_available()

Colab sometimes raises a UTF-8 encoding error. If that happens, run the following snippet and try again

In [None]:
import locale
def getpreferredencoding():
    return "UTF-8"
locale.getpreferredencoding = getpreferredencoding

Install dependencies

In [None]:
!pip install langchain
!pip install --upgrade --quiet numexpr
!pip install -qU langchain-docling
!pip install docling
!pip install langgraph
!pip install --upgrade --quiet text-generation transformers langchainhub sentencepiece jinja2 bitsandbytes accelerate

Dependencies for using Ollama

In [None]:
!pip install -U ollama
!pip install -U langchain-ollama

Dependencies for API usage

In [None]:
!pip install -U langchain-anthropic

In [None]:
!pip install langchain-google-genai

PDF generation (for evaluation)

In [None]:
!pip install reportlab

# Tracing using LangSmith

In [None]:
import getpass
import os

os.environ['PYTORCH_CUDA_ALLOC_CONF'] = 'expandable_segments:True'
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_PROJECT"] = "evaluation"
if "LANGSMITH_API_KEY" not in os.environ:
  os.environ["LANGSMITH_API_KEY"] = getpass.getpass("Enter your LangSmith API key:")

# Model Set up

## Claude

In [None]:
import getpass
import os

if "ANTHROPIC_API_KEY" not in os.environ:
  os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Enter your Anthropic API key: ")

In [None]:
from langchain_anthropic import ChatAnthropic

claude35 = ChatAnthropic(
    model="claude-3-5-sonnet-20240620",
    temperature=0,
    max_tokens=4096,
    timeout=None,
    max_retries=2,
)

In [None]:
from langchain_anthropic import ChatAnthropic

claude37 = ChatAnthropic(
    model="claude-3-7-sonnet-20250219",
    temperature=0,
    max_tokens=4096,
    timeout=None,
    max_retries=2,
)

## Gemini

In [None]:
import getpass
import os

if "GOOGLE_API_KEY" not in os.environ:
  os.environ["GOOGLE_API_KEY"] = getpass.getpass("Enter your Google AI API key: ")

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

gemini2lite = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash-lite",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2
)

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

gemini2exp = ChatGoogleGenerativeAI(
    model="gemini-2.0-flash-exp",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2
)

In [None]:
from langchain_google_genai import ChatGoogleGenerativeAI

gemini15pro = ChatGoogleGenerativeAI(
    model="gemini-1.5-pro",
    temperature=0,
    max_tokens=None,
    timeout=None,
    max_retries=2
)

## Models using Ollama

In [None]:
!curl -fsSL https://ollama.com/install.sh | sh

Turn on Flash Attention ([reference](https://docs.nvidia.com/nemo-framework/user-guide/latest/nemotoolkit/features/optimizations/attention_optimizations.html)) and set context length

In [None]:
import os

os.environ['OLLAMA_FLASH_ATTENTION'] = "1"
print(os.environ.get('OLLAMA_FLASH_ATTENTION'))

os.environ['OLLAMA_CONTEXT_LENGTH'] = "4096"
print(os.environ.get('OLLAMA_CONTEXT_LENGTH'))

Start the Ollama server as a subprocess so that it runs in the background

In [None]:
import subprocess
process = subprocess.Popen("OLLAMA_LLAMA_EXTRA_ARGS='--flash-attn' ollama serve", shell=True)
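
`subprocess.Popen` returns immediately, so a following `ollama pull` cell can run before the server is listening. A minimal sketch of a readiness check, assuming the server uses Ollama's default port 11434 on localhost; the helper name `wait_for_port` is made up for this notebook:

```python
import socket
import time


def wait_for_port(host: str = "127.0.0.1", port: int = 11434,
                  timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True  # something is listening on the port
        except OSError:
            time.sleep(interval)  # not listening yet; retry shortly
    return False
```

Calling `wait_for_port()` right after the `Popen` cell and checking its return value is one way to avoid pulling models before the server is up.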

**Parameters**


*   **num_predict**: maximum number of tokens to generate; -1 removes the limit.
*   **temperature**: lower values give more precise, deterministic output; higher values give more creative output.
*   **num_ctx**: context length in tokens (default: 2048 in Ollama).
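
To illustrate the effect of the temperature parameter, the following sketch applies a temperature-scaled softmax to a few made-up token scores. It is a generic illustration of the idea, not Ollama's sampling code; the function name and the scores are assumptions.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities; a lower temperature sharpens the distribution."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
for t in (0.2, 0.8, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, [round(p, 3) for p in probs])
```

At low temperature almost all probability mass sits on the top-scoring token, which is why the output becomes more deterministic. A temperature of exactly 0 cannot be plugged into this formula; it is usually treated as always picking the top token.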




### Llama 3.2 3b

In [None]:
!ollama pull llama3.2:3b

In [None]:
from langchain_ollama import ChatOllama
llama32 = ChatOllama(
    model = "llama3.2:3b",
    temperature = 0.5,
    num_predict = -1,
)

### Llama 3.3 70b

In [None]:
!ollama pull llama3.3

In [None]:
from langchain_ollama import ChatOllama
llama33 = ChatOllama(
    model = "llama3.3",
    temperature = 0.8,
    num_predict = 256,
)

### Watt-tool-70B

In [None]:
!ollama pull fireshihtzu/watt-tool-70B-formatted

In [None]:
from langchain_ollama import ChatOllama
wattTool70 = ChatOllama(
    model = "fireshihtzu/watt-tool-70B-formatted",
    temperature = 0.8,
    num_predict = 256,
)

### Watt-tool-8B

In [None]:
!ollama pull kitsonk/watt-tool-8B

In [None]:
from langchain_ollama import ChatOllama
wattTool8 = ChatOllama(
    model = "kitsonk/watt-tool-8B",
    temperature = 0.8,
    num_predict = -1,
)

### Hermes-2-Pro-Llama-3-8B

In [None]:
!ollama pull interstellarninja/hermes-2-pro-llama-3-8b-tools

In [None]:
from langchain_ollama import ChatOllama
hermes2pro = ChatOllama(
    model = "interstellarninja/hermes-2-pro-llama-3-8b-tools",
    temperature = 0.8,
    num_predict = -1,
)

### Hermes-2-Theta-Llama-3-8B

In [None]:
!ollama pull interstellarninja/hermes-2-pro-llama-3-8b-tools

In [None]:
from langchain_ollama import ChatOllama
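hermes2theta = ChatOllama(
    # NOTE: this tag is identical to the one in the Hermes-2-Pro cell above;
    # replace it (here and in the pull command) with the Hermes-2-Theta tag to test that model
    model = "interstellarninja/hermes-2-pro-llama-3-8b-tools",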
    temperature = 0.7,
    num_predict = -1,
)

### Qwen 2.5 14b

In [None]:
!ollama pull qwen2.5:14b

In [None]:
from langchain_ollama import ChatOllama
qwen14b = ChatOllama(
    model = "qwen2.5:14b",
    temperature = 0,
    num_predict = -1,
    num_ctx = 4096,
)

### Qwen 2.5 7b

In [None]:
!ollama pull qwen2.5:7b

In [None]:
from langchain_ollama import ChatOllama
qwen7b = ChatOllama(
    model = "qwen2.5:7b",
    temperature = 0,
    num_predict = -1,
    num_ctx = 4096
)

# Tool Set up

Math tool

In [None]:
import math
import numexpr
from langchain_core.tools import tool

@tool
def calculator(expression: str) -> str:
    """Calculate expression using Python's numexpr library.

    Expression should be a single line mathematical expression
    that solves the problem.

    Examples:
        "37593 * 67" for "37593 times 67"
        "37593**(1/5)" for "37593^(1/5)"
    """
    local_dict = {"pi": math.pi, "e": math.e}
    return str(
        numexpr.evaluate(
            expression.strip(),
            global_dict={},  # restrict access to globals
            local_dict=local_dict,  # expose the constants pi and e
        )
    )
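
The calculator restricts what an expression may reference. The same idea can be sketched with the standard library alone by walking the parsed expression and allowing only arithmetic nodes. This is a stand-in for illustration, not how numexpr works internally; `safe_eval` and the allow-lists are hypothetical names.

```python
import ast
import math
import operator

# Only these operators and names are allowed; anything else is rejected.
_BINARY = {
    ast.Add: operator.add, ast.Sub: operator.sub, ast.Mult: operator.mul,
    ast.Div: operator.truediv, ast.Pow: operator.pow, ast.Mod: operator.mod,
}
_UNARY = {ast.UAdd: operator.pos, ast.USub: operator.neg}
_NAMES = {"pi": math.pi, "e": math.e}


def safe_eval(expression: str) -> float:
    """Evaluate a single-line arithmetic expression without calling eval()."""
    def walk(node):
        if isinstance(node, ast.Expression):
            return walk(node.body)
        if (isinstance(node, ast.Constant)
                and isinstance(node.value, (int, float))
                and not isinstance(node.value, bool)):
            return node.value
        if isinstance(node, ast.Name) and node.id in _NAMES:
            return _NAMES[node.id]
        if isinstance(node, ast.BinOp) and type(node.op) in _BINARY:
            return _BINARY[type(node.op)](walk(node.left), walk(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY:
            return _UNARY[type(node.op)](walk(node.operand))
        raise ValueError(f"unsupported expression element: {type(node).__name__}")

    return walk(ast.parse(expression.strip(), mode="eval"))


print(safe_eval("37593 * 67"))
print(safe_eval("37593**(1/5)"))
```

Function calls, attribute access and unknown names all raise `ValueError`, so an expression such as `__import__('os')` is rejected before anything is executed.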

Use the Docling framework to extract file content from PDFs, Word documents and images ([reference](https://docling-project.github.io/docling/examples/custom_convert/))

In [None]:
from docling.document_converter import DocumentConverter, PdfFormatOption
from docling.datamodel.pipeline_options import PdfPipelineOptions, AcceleratorOptions, AcceleratorDevice
from docling.datamodel.base_models import InputFormat
from langchain_core.tools import tool

from IPython.display import display, Markdown # for Markdown rendering in Colab


@tool
def pdfFormula(filepath: str) -> str:
  """Get the content of the file at the given filepath using the Docling library.

  The file can be a PDF, a docx file or an image. OCR is used to extract the text if necessary.

  Args:
      filepath (str): path to file

  Returns:
      str: file content in Markdown format represented as string
  """

  pipeline_options = PdfPipelineOptions()

  # refine performance settings as necessary
  pipeline_options.ocr_options.use_gpu = True
  pipeline_options.accelerator_options = AcceleratorOptions(
        num_threads=2,
        device = AcceleratorDevice.CUDA, # options: AUTO, CPU or CUDA
        cuda_use_flash_attention2 = True
    )

  pipeline_options.images_scale = 1.0  # scale factor for rendered page images

  converter = DocumentConverter(format_options={
      InputFormat.PDF: PdfFormatOption(pipeline_options=pipeline_options)
  })

  result = converter.convert(filepath)
  doc = result.document.export_to_markdown()

  return doc

# Math solving agent

## Agent with create_react_agent ([credit](https://langchain-ai.github.io/langgraph/reference/prebuilt/#langgraph.prebuilt.chat_agent_executor.create_react_agent))

Set your query here

In [None]:
query = "what is the solution for /content/question.pdf"

Start agent and answer query

In [None]:
from langgraph.prebuilt import create_react_agent
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages([
    ("system", """You are a helpful agent tasked to solve mathematical problems.
                  If a file is provided, use the `pdfFormula` tool first to extract the math problems.
                  For each math problem: call the calculator tool to compute the result, remember the result and proceed with the next problem
                  In the end: return all solutions in the following format: Task 1: <solution>
                  DONT repeat the instruction. DONT answer in complete sentences. ONLY return the solution"""),
    ("placeholder", "{messages}"),
    ("user", "Remember, always be polite!"),
])

tools = [calculator, pdfFormula]
graph = create_react_agent(qwen7b, tools=tools, prompt=prompt)
inputs = {"messages": [("user", query)]}

for s in graph.stream(inputs, stream_mode="values"):
  message = s["messages"][-1]
  if isinstance(message, tuple):
    print(message)
  else:
    message.pretty_print()
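
Conceptually, the agent alternates between model calls and tool calls until the model stops requesting tools. The toy loop below sketches that cycle with a scripted stand-in for the model. It is not LangGraph's implementation; `run_agent`, `scripted_model` and `toy_calculator` are hypothetical helpers.

```python
def run_agent(model, tools, question, max_steps=5):
    """Alternate between the model and the tools until the model gives a final answer."""
    messages = [("user", question)]
    for _ in range(max_steps):
        reply = model(messages)
        messages.append(("assistant", reply))
        if reply["tool"] is None:  # no tool requested: this is the final answer
            return reply["content"], messages
        result = tools[reply["tool"]](reply["args"])  # run the requested tool
        messages.append(("tool", result))
    raise RuntimeError("no final answer within max_steps")


def scripted_model(messages):
    """Hypothetical stand-in for an LLM: asks for the calculator once, then answers."""
    if messages[-1][0] == "tool":
        return {"tool": None, "args": None, "content": messages[-1][1]}
    return {"tool": "calculator", "args": "6 * 7", "content": ""}


def toy_calculator(expression):
    """Hypothetical tool that only understands 'a * b' with integers."""
    left, right = expression.split("*")
    return str(int(left) * int(right))


answer, history = run_agent(scripted_model, {"calculator": toy_calculator}, "What is 6 times 7?")
print(answer)
print([role for role, _ in history])
```

The message history grows by one assistant message and one tool message per tool call, which mirrors what the streaming loop above prints step by step.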

Adapted agent for evaluation

In [None]:
model = claude35

In [None]:
from langgraph.prebuilt import create_react_agent
from langchain_core.prompts import ChatPromptTemplate

def mathAgent(question: str):
  """
  This function takes a math problem as input and returns the solution.

  Args:
    question: The math problem to solve.

  Returns:
    Solution to the math problem.
  """
  prompt = ChatPromptTemplate.from_messages([
      ("system",  """You are a helpful agent tasked to solve mathematical problems.
      If a file is provided: use the `pdfFormula` tool first to extract the math problems.
      Then: use the calculator tool to compute the answer to the problem.
      If there are multiple problems: return the answer for each problem
      You are not allowed to calculate things yourself but have to use the calculator tool.
      Return the solution from the calculator tool, DONT change the answer.
      ONLY return the solution(s) (should be only a number, no metrics for each task). DONT repeat the instruction. DONT answer in complete sentences."""),
      ("placeholder", "{messages}"),
  ])

  tools = [calculator, pdfFormula]
  graph = create_react_agent(model, tools=tools, prompt=prompt)
  inputs = {"messages": [("user", question)]}

  solution = None

  for s in graph.stream(inputs, stream_mode="values"):
    message = s["messages"][-1]
    if isinstance(message, tuple):
      pass
    else:
      solution = message.content

  return solution

## Agent without tools

In [None]:
default_instructions = "You are a helpful agent tasked to solve mathematical problems. ONLY return the solution (should be only a number, no metrics). DONT repeat the instruction. DONT answer in complete sentences."

def my_app(question: str, instructions: str = default_instructions) -> str:
    return model.invoke(
        [
            {"role": "system", "content": instructions},
            {"role": "user", "content": question},
        ]
    ).content

# Evaluation

## Set Up

Metric for checking if the solution provided by the MathAgent is correct

In [None]:
def correctness(outputs: dict, reference_outputs: dict) -> bool:
  return float(outputs["response"]) == float(reference_outputs["output"])

In [None]:
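import re

def extractNumbers(text):
  # Return the first (optionally signed) decimal number in the text, or None if there is none
  match = re.search(r'-?\d+\.?\d*', str(text))
  return match.group(0) if match else None

def correctness2(outputs: dict, reference_outputs: dict) -> bool:
  number = extractNumbers(outputs["response"])
  if number is None:
    return False  # a response without any number counts as incorrect
  return float(number) == float(reference_outputs["output"])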
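
Exact float equality can reject answers that differ from the reference only by rounding. A tolerance-based variant could look like the sketch below; the name `correctness_close` and the tolerance `REL_TOL` are assumptions, not part of the evaluation used in this notebook.

```python
import math
import re

REL_TOL = 1e-6  # assumed relative tolerance; adjust to the precision of the dataset


def first_number(text):
    """Return the first (optionally signed) decimal number found in text, or None."""
    match = re.search(r"-?\d+(?:\.\d+)?", str(text))
    return float(match.group(0)) if match else None


def correctness_close(outputs: dict, reference_outputs: dict) -> bool:
    """Compare the first number in the response with the reference, allowing rounding differences."""
    predicted = first_number(outputs["response"])
    if predicted is None:
        return False  # a response without any number is counted as wrong
    return math.isclose(predicted, float(reference_outputs["output"]), rel_tol=REL_TOL)


print(correctness_close({"response": "0.3333333"}, {"output": "0.33333333"}))
print(correctness_close({"response": "42"}, {"output": "43"}))
```

Note that only the first number in the response is compared, so a response such as `Task 1: 5` would be scored against `1` rather than `5`.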

Wrapper

In [None]:
def wrapper(inputs: dict) -> dict:
  return {"response": mathAgent(inputs["input"])}

## PDF Generation

PDF generator that converts content into a PDF file ([credit](https://wiki.ubuntuusers.de/ReportLab/))

In [None]:
from reportlab.pdfgen import canvas
from reportlab.lib.units import cm
from reportlab.lib.pagesizes import A4, landscape
from reportlab.lib.styles import getSampleStyleSheet
from reportlab.platypus import Frame, Paragraph

def generatePDF(content, num):
  """Generates a PDF with the given text.

  Args:
    content: The text to include in the PDF.
    num: Number used in the file name (question-<num>.pdf).

  Saves a pdf file in the current directory.
  """
  name = "question-" + str(num) + ".pdf"

  pdf = canvas.Canvas(name ,pagesize=A4)
  breite, hoehe = A4
  style = getSampleStyleSheet()
  story = []
  f = Frame(1*cm,1*cm,breite-(2*cm),hoehe-(2*cm))
  story.append(Paragraph(content, style['BodyText']))
  f.addFromList(story,pdf)
  pdf.save()
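
ReportLab's `Paragraph` interprets its text as XML-like markup, so characters such as `<` or `&` in a math problem may be misread as tags or entities. A minimal sketch of escaping the text first, using only the standard library; `escape_for_paragraph` is a hypothetical helper name:

```python
from xml.sax.saxutils import escape


def escape_for_paragraph(text: str) -> str:
    """Escape &, < and > so they are rendered literally instead of parsed as markup."""
    return escape(text)


print(escape_for_paragraph("Solve for x: 3 < x & x > 1"))
```

Passing `escape_for_paragraph(content)` instead of `content` to `Paragraph` would keep inequalities intact in the generated PDF.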

In [None]:
import json

def convertJsonlToPdfs(file_path):
  errors = 0
  try:
      with open(file_path, 'r') as file:
          for line_number, line in enumerate(file, start=1):
              try:
                  json_obj = json.loads(line)
                  input_text = json_obj['inputs']['input']
                  generatePDF(input_text, line_number)
              except json.JSONDecodeError as e:
                  print(f"Skipping invalid JSON line: {line.strip()} - Error: {e}")
                  errors += 1
              except KeyError as e:
                  print(f"Skipping line missing 'inputs' or 'input': {line.strip()} - Error: {e}")
                  errors += 1
  except FileNotFoundError:
      print(f"File not found: {file_path}")
  print("Errors: ", errors)

In [None]:
convertJsonlToPdfs('evaluation.jsonl')
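
`convertJsonlToPdfs` expects one JSON object per line with the question stored under `inputs.input`. The sketch below writes and reads a small file in that shape. The sample questions are made up, and the `outputs.output` key is an assumption based on the key that the correctness metric reads from the reference outputs.

```python
import json
import os
import tempfile

# Made-up sample records; only the inputs.input key is required by the reader above.
records = [
    {"inputs": {"input": "What is 12 * 12?"}, "outputs": {"output": "144"}},
    {"inputs": {"input": "What is 2**10?"}, "outputs": {"output": "1024"}},
]

path = os.path.join(tempfile.mkdtemp(), "sample.jsonl")
with open(path, "w", encoding="utf-8") as file:
    for record in records:
        file.write(json.dumps(record) + "\n")  # one JSON object per line

with open(path, encoding="utf-8") as file:
    questions = [json.loads(line)["inputs"]["input"] for line in file]

print(questions)
```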

## Evaluation

Evaluation on PDF

In [None]:
from langsmith import Client

client = Client()

experiment_results = client.evaluate(
    wrapper, # Your AI system
    data="", # The data to predict and grade over
    evaluators=[correctness2], # The evaluators to score the results
    experiment_prefix="claude 35 sonnet with tool calling", # A prefix for your experiment names to easily identify them
)

Evaluation on string

In [None]:
from langsmith import Client

client = Client()

experiment_results = client.evaluate(
    wrapper, # Your AI system
    data="", # The data to predict and grade over
    evaluators=[correctness2], # The evaluators to score the results
    experiment_prefix="claude35 with tool calling", # A prefix for your experiment names to easily identify them
)