## Let's Explore How to Apply Chat Templates

The purpose of this notebook is to check what works with which open models, since `tokenizer.apply_chat_template` does not always seem to work. Do we need it at all? Can we pass messages of type `AIMessage` and `HumanMessage` directly, or do we have to convert them first?
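Hugging Face chat templates operate on a list of `{"role": ..., "content": ...}` dicts, while LangChain message objects carry a `.type` attribute (`"system"`, `"human"`, `"ai"`, `"tool"`) and a `.content` attribute. Below is a minimal sketch of a converter between the two, assuming only those two attributes. The helper `to_chat_dicts` and the `ROLE_BY_TYPE` table are hypothetical names, and `SimpleNamespace` stands in for real LangChain messages so the sketch runs without LangChain installed.

```python
# Minimal sketch: map LangChain-style message objects to the role/content
# dicts that Hugging Face chat templates expect.
# Assumption: each message exposes `.type` and `.content` like langchain_core messages.
from types import SimpleNamespace

# LangChain message type -> chat-template role (hypothetical mapping table)
ROLE_BY_TYPE = {
    "system": "system",
    "human": "user",
    "ai": "assistant",
    "tool": "tool",
}


def to_chat_dicts(messages):
    """Return a list of role/content dicts; plain dicts are passed through."""
    converted = []
    for msg in messages:
        if isinstance(msg, dict):
            converted.append(msg)  # already in the tokenizer's format
            continue
        role = ROLE_BY_TYPE.get(msg.type)
        if role is None:
            raise ValueError(f"unsupported message type: {msg.type!r}")
        converted.append({"role": role, "content": msg.content})
    return converted


# Stand-ins for SystemMessage/HumanMessage so the sketch runs without LangChain
demo = [
    SimpleNamespace(type="system", content="You are a helpful assistant."),
    SimpleNamespace(type="human", content="What are 3 good autumn destinations?"),
]
print(to_chat_dicts(demo))
```

With real LangChain objects the call would look like `tokenizer.apply_chat_template(to_chat_dicts(messages), tokenize=False, add_generation_prompt=True)`. The sketch deliberately ignores `tool_calls` on an `AIMessage` and list-valued (multimodal) content; both would need extra handling.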

In [1]:
!pip install -U transformers
!pip install -U langchain_groq
!pip install -U langchain-huggingface



In [4]:

!pip install -U langchain-community
!pip install -U langchain-core

!pip install python-dotenv
!pip install -Uq ollama
!pip install -qU langchain-ollama



In [8]:
import torch
from transformers import (
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
    AutoModelForCausalLM,
    DataCollatorForLanguageModeling,
    GenerationConfig,
    pipeline
)

In [9]:
model_name = "../ext_models/Meta-Llama-3.1-8B-Instruct"
model_id = "meta-llama/Llama-3.2-3B-Instruct"
DEVICE = "cuda:0" if torch.cuda.is_available() else "cpu"

In [5]:
from langchain_huggingface import HuggingFacePipeline, ChatHuggingFace, HuggingFaceEndpoint
from langchain.prompts import PromptTemplate

# open_llm = HuggingFacePipeline.from_model_id(
#     model_id=model_name,
#     task="text-generation",
#     device=0,
#     pipeline_kwargs=dict(
#         max_new_tokens=1024,
#         do_sample=False,
#         repetition_penalty=1.1,
#     ),
#     model_kwargs=dict(
#         torch_dtype=torch.bfloat16
#     )
# )
tokenizer = AutoTokenizer.from_pretrained(model_name)
generation_config = GenerationConfig.from_pretrained(model_name)
config = generation_config.to_dict()
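stop_token = "<|eot_id|>"
# encode() prepends <|begin_of_text|> by default, so encode(...)[0] would return
# the BOS id; look the special tokens up directly instead
stop_token_id = tokenizer.convert_tokens_to_ids(stop_token)
begin_token = "<|begin_of_text|>"
begin_token_id = tokenizer.convert_tokens_to_ids(begin_token)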
# generation_config.eos_token_id = stop_token_id
# generation_config.begin_token_id = begin_token_id


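generation_config = {
    **config,
    "max_new_tokens": 1024,
    "temperature": 0.1,  # ignored while do_sample=False (greedy decoding)
    "top_p": 0.9,        # ignored while do_sample=False
    "do_sample": False,
    "repetition_penalty": 1.1,
    "eos_token_id": stop_token_id,
    "bos_token_id": begin_token_id,  # GenerationConfig names this field bos_token_id
}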
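trimmed_gen = {
    "top_p": 0.9,  # changed from 0.15; ignored while do_sample=False
    "temperature": 0.3,  # ignored while do_sample=False
    "do_sample": False,  # changed from true
    "repetition_penalty": 1.1,
}
# torch_dtype is already passed to pipeline() below and use_fast is a tokenizer
# option; neither is a generation argument, so they are not listed here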
# generation_config



For example, replace imports like: `from langchain_core.pydantic_v1 import BaseModel`
with: `from pydantic import BaseModel`
or the v1 compatibility namespace if you are working in a code base that has not been fully upgraded to pydantic 2 yet. 	from pydantic.v1 import BaseModel

  from langchain_huggingface.chat_models.huggingface import (


SchemaError: Invalid Schema:
model.config.extra_fields_behavior
  Input should be 'allow', 'forbid' or 'ignore' [type=literal_error, input_value=<Extra.forbid: 'forbid'>, input_type=Extra]
    For further information visit https://errors.pydantic.dev/2.8/v/literal_error
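The special tokens looked up in the cell above (`<|begin_of_text|>`, `<|eot_id|>`) are part of the Llama 3 prompt format. When `apply_chat_template` misbehaves, one fallback is to build the prompt string by hand. This is a sketch of the basic Llama 3 layout only, assuming role/content dicts as input; the official Llama 3.1 template additionally injects a knowledge-cutoff/date preamble into the system turn and tool-calling sections, so the output will not match `apply_chat_template` byte for byte. `build_llama3_prompt` is a hypothetical helper.

```python
# Minimal sketch of the basic Llama 3 chat layout, assuming messages are
# {"role": ..., "content": ...} dicts. The official Llama 3.1 template adds
# date lines and tool-calling sections that are NOT reproduced here.
BEGIN = "<|begin_of_text|>"
END_OF_TURN = "<|eot_id|>"


def build_llama3_prompt(messages, add_generation_prompt=True):
    parts = [BEGIN]
    for msg in messages:
        # each turn: role header, blank line, trimmed content, end-of-turn token
        parts.append(
            f"<|start_header_id|>{msg['role']}<|end_header_id|>\n\n"
            f"{msg['content'].strip()}{END_OF_TURN}"
        )
    if add_generation_prompt:
        # open an assistant turn so the model continues from here
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


prompt = build_llama3_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Why is the sky blue?"},
])
print(prompt)
```

If you tokenize such a string yourself, pass `add_special_tokens=False`; otherwise the Llama 3 tokenizer prepends a second `<|begin_of_text|>`.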

In [4]:
llm_pipeline = pipeline("text-generation", 
                        model=model_name,  
                        device_map=DEVICE,
                        torch_dtype=torch.bfloat16,
                        max_new_tokens=1000,
                        return_full_text=False) 
# tokenizer.eos_token_id is a single int; model.config.eos_token_id is a list for Llama 3.1
llm_pipeline.tokenizer.pad_token_id = llm_pipeline.tokenizer.eos_token_id
open_llm = HuggingFacePipeline(pipeline=llm_pipeline, pipeline_kwargs=trimmed_gen)

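The `trimmed_gen` dict passed as `pipeline_kwargs` mixes generation arguments with loading options (`torch_dtype`, `use_fast`), and it sets `temperature`/`top_p` next to `do_sample=False`, where transformers ignores them during greedy decoding and only warns. Below is a sketch of a hypothetical `sanitize_generation_kwargs` helper that filters such a dict before it reaches the pipeline. The allow-list `GENERATION_KEYS` is an assumption and far from the full set transformers accepts.

```python
# Hypothetical helper: keep only generation arguments and drop sampling
# parameters that greedy decoding (do_sample=False) would ignore anyway.
# The allow-list below is an assumption and deliberately incomplete.
GENERATION_KEYS = {
    "max_new_tokens", "do_sample", "temperature", "top_p", "top_k",
    "repetition_penalty", "eos_token_id", "pad_token_id", "bos_token_id",
}
SAMPLING_ONLY = {"temperature", "top_p", "top_k"}


def sanitize_generation_kwargs(kwargs):
    """Return a filtered copy of kwargs intended for a generate() call."""
    clean = {k: v for k, v in kwargs.items() if k in GENERATION_KEYS}
    # Only strip sampling knobs when greedy decoding is requested explicitly;
    # if do_sample is absent, the model's own generation config decides.
    if clean.get("do_sample") is False:
        for key in SAMPLING_ONLY:
            clean.pop(key, None)
    return clean


original = {
    "top_p": 0.9,
    "temperature": 0.3,
    "do_sample": False,
    "torch_dtype": "bfloat16",  # placeholder for torch.bfloat16; a loading option
    "use_fast": True,           # tokenizer option, not a generation option
    "repetition_penalty": 1.1,
}
print(sanitize_generation_kwargs(original))
```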

In [5]:
open_chat = ChatHuggingFace(llm=open_llm, verbose=True, tokenizer=tokenizer)

### Using an endpoint

### Try the Tools (optional)

In [3]:
from langchain_core.tools import tool
from typing_extensions import TypedDict, Literal
from pydantic import BaseModel, Field

# from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

# mocked tool
@tool(parse_docstring=True)
def get_current_temperature(location: str, unit: str) -> float:
    """
    Get the current temperature at a location.
    
    Args:
        location: The location to get the temperature for, in the format "City, Country"
        unit: The unit to return the temperature in. (choices: ["celsius", "fahrenheit"])
    
    """
    return 22.0  # A real function should probably actually get the temperature!

@tool(parse_docstring=True)
def get_current_wind_speed(location: str) -> float:
    """
    Get the current wind speed in km/h at a given location.
    
    Args:
        location: The location to get the wind speed for, in the format "City, Country"
    """
    return 6.0  # A real function should probably actually get the wind speed!

class Operands(TypedDict):
    a: float
    b: float

class CalculatorModel(BaseModel):
    operands: Operands
    op: Literal["add", "sub", "mult", "div"]

@tool(args_schema=CalculatorModel)
def calculator(operands: Operands, op: Literal["add", "sub", "mult", "div"]) -> float:
    '''Perform basic arithmetic operations on calculator model. Return a float'''
    
    if (op == "add"):
        return operands["a"] + operands["b"]
    elif (op == "mult"):
        return operands["a"] * operands["b"]
    elif (op == 'sub'):
        return operands["a"] - operands["b"]
    elif (op == 'div'):
        return operands["a"] / operands["b"]
    else:
        return -1.
        


@tool()
def multiply(a: int, b: int):
    '''Perform multiplication of two operands supplied as arguments a and b'''

    return a * b

class GetWeather(BaseModel):
    '''Get the current weather in a given location'''

    location: str = Field(..., description="The city to get temperature for, "
                          "e.g. San Francisco")

class GetPopulation(BaseModel):
    '''Get the current population in a given location'''

    location: str = Field(..., description="The city to report population for, "
                          "e.g. San Francisco")

# tools = [GetWeather, GetPopulation]
# chat_with_tools = open_chat.bind_tools(tools)



In [4]:
from langchain_groq import ChatGroq

open_llm = ChatGroq(model="llama-3.1-70b-versatile")

simple_tool = [multiply]
chat_with_tools = open_llm.bind_tools(tools=[calculator])

In [5]:
from langchain_core.messages import (
    HumanMessage,
    SystemMessage,
)

messages = [
    SystemMessage(content="You're a helpful assistant with the expertise of a physics professor. Please answer the questions"),
    HumanMessage(
        content="What happens when an unstoppable force meets an immovable object?"
    ),
]
messages_for_tools = [
    SystemMessage(content="You are a bot that responds to weather queries. You should reply with the unit used in the queried location."),
    HumanMessage(
        content="Hey, what's the temperature in Paris in Celsius right now?"
    ),
]

messages_for_tools_alt = [
    SystemMessage(content="You are a bot that uses bound tools and responds to weather and population queries."),
    HumanMessage(
        content="Hey, what's the temperature in Paris in Celsius right now?"
    ),
]
messages_for_tools_arithm = [
    SystemMessage(content="You are a smart and knowledgeable assistant. If you need to do calculations, please use the available tool at your disposal. "
                  "Otherwise just answer the question utilising your vast knowledge of the universe."),
    HumanMessage(
        content="Hey, can you tell me approximately how many centimeters if the inches are equal to the product of 5 and 6 added to the difference of 414 and 298?"
    ),
]
messages_alt = [
    {"role": "system", "content": "You are a helpful assistant, helping your human friend on his way to happiness."},
    {"role": "user", "content": "what are 3 good travel destinations to travel in autumn?"}
]
# templated = tokenizer.apply_chat_template(messages_for_tools_alt, tokenize=False)

ai_msg = chat_with_tools.invoke(messages_for_tools_arithm)



In [6]:
print(ai_msg.tool_calls)
# chat_with_tools

[{'name': 'calculator', 'args': {'op': 'mult', 'operands': {'a': '5', 'b': '6'}}, 'id': 'call_t0fr', 'type': 'tool_call'}, {'name': 'calculator', 'args': {'op': 'sub', 'operands': {'a': '414', 'b': '298'}}, 'id': 'call_hgdg', 'type': 'tool_call'}, {'name': 'calculator', 'args': {'op': 'add', 'operands': {'a': '30', 'b': '116'}}, 'id': 'call_d0xe', 'type': 'tool_call'}, {'name': 'calculator', 'args': {'op': 'mult', 'operands': {'a': '146', 'b': '2.54'}}, 'id': 'call_dtnf', 'type': 'tool_call'}]
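The model emitted all four calls in a single turn, filling in the intermediate values (30, 116, 146) itself instead of waiting for the tool results. A quick way to check those guesses offline is to replay the printed calls through a plain-Python version of the calculator. The operands arrive as strings, so this sketch coerces them to `float` first; `OPS` and the copied `tool_calls` list are illustrative, not part of LangChain.

```python
# Replay the four tool calls printed above through a plain-Python calculator.
# The model sent the operands as strings, so they are coerced to float first.
import operator

OPS = {"add": operator.add, "sub": operator.sub,
       "mult": operator.mul, "div": operator.truediv}

# args copied from the ai_msg.tool_calls output above
tool_calls = [
    {"op": "mult", "operands": {"a": "5", "b": "6"}},
    {"op": "sub", "operands": {"a": "414", "b": "298"}},
    {"op": "add", "operands": {"a": "30", "b": "116"}},
    {"op": "mult", "operands": {"a": "146", "b": "2.54"}},
]

results = []
for call in tool_calls:
    a = float(call["operands"]["a"])
    b = float(call["operands"]["b"])
    results.append(OPS[call["op"]](a, b))

print(results)
```

The last value is 146 inches expressed in centimeters (146 × 2.54 ≈ 370.84), which matches the final answer the model gives once the tool messages are appended.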


In [7]:
messages_for_tools_arithm.append(ai_msg)

for tool_call in ai_msg.tool_calls:
    selected_tool = {"calculator": calculator}[tool_call["name"].lower()]
    print(selected_tool)
    tool_msg = selected_tool.invoke(tool_call)
    
    messages_for_tools_arithm.append(tool_msg)

messages_for_tools_arithm

name='calculator' description='Perform basic arithmetic operations on calculator model. Return a float' args_schema=<class '__main__.CalculatorModel'> func=<function calculator at 0x716917866fc0>
name='calculator' description='Perform basic arithmetic operations on calculator model. Return a float' args_schema=<class '__main__.CalculatorModel'> func=<function calculator at 0x716917866fc0>
name='calculator' description='Perform basic arithmetic operations on calculator model. Return a float' args_schema=<class '__main__.CalculatorModel'> func=<function calculator at 0x716917866fc0>
name='calculator' description='Perform basic arithmetic operations on calculator model. Return a float' args_schema=<class '__main__.CalculatorModel'> func=<function calculator at 0x716917866fc0>


[SystemMessage(content='You are a smart and knowledgeable assistant. If you need to do calculations, please use the available tool at your disposal.     Otherwise just answer the question utilising your vast knowledge of the universe.', additional_kwargs={}, response_metadata={}),
 HumanMessage(content='Hey, can you tell me approximately how many centimeters if the inches are equal to the product of 5 and 6 added to the difference of 414 and 298?', additional_kwargs={}, response_metadata={}),
 AIMessage(content='', additional_kwargs={'tool_calls': [{'id': 'call_t0fr', 'function': {'arguments': '{"op": "mult", "operands": {"a": "5", "b": "6"}}', 'name': 'calculator'}, 'type': 'function'}, {'id': 'call_hgdg', 'function': {'arguments': '{"op": "sub", "operands": {"a": "414", "b": "298"}}', 'name': 'calculator'}, 'type': 'function'}, {'id': 'call_d0xe', 'function': {'arguments': '{"op": "add", "operands": {"a": "30", "b": "116"}}', 'name': 'calculator'}, 'type': 'function'}, {'id': 'call_d

In [8]:
with_tools = chat_with_tools.invoke(messages_for_tools_arithm)

In [9]:
with_tools.content

'The product of 5 and 6 is 30. 414 - 298 is 116. Adding 116 to 30 is 146. Since 1 inch is equal to 2.54 centimeters, 146 inches is approximately equal to 370.84 centimeters.'

In [91]:
c = chat_with_tools.invoke("how big is Australia in terms of square kilometers?") # how can we make tool use optional here?

In [93]:
c.tool_calls

[{'name': 'brave_search',
  'args': {'query': 'Australia square kilometers'},
  'id': 'call_dq80',
  'type': 'tool_call'}]
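On the question in the cell above: with tool calling, the model decides per turn whether to request a tool, and the calling code should execute tools only when `tool_calls` is non-empty, otherwise using `content` as the answer. (The `brave_search` call shown appears to come from an earlier tool binding, since `chat_with_tools` in this notebook is bound to `calculator`.) LangChain's `bind_tools` also commonly accepts a `tool_choice` argument; check the `langchain_groq` documentation for the values your version supports. Below is a dependency-free sketch of the loop. `call_model`, `stub_model`, and the dict shapes are hypothetical stand-ins that mirror the `tool_calls` printed above, not LangChain's API.

```python
# Minimal agent-loop sketch: tools run only when the model asks for them.
# Assumptions (hypothetical, not LangChain's API): call_model(messages) returns a
# dict with "content" and "tool_calls"; each tool call has "name", "args", "id".


def run_with_optional_tools(call_model, tools, messages, max_rounds=5):
    messages = list(messages)
    for _ in range(max_rounds):
        reply = call_model(messages)
        messages.append({"role": "assistant", **reply})
        if not reply["tool_calls"]:
            return reply["content"]  # model answered directly: no tool needed
        for call in reply["tool_calls"]:
            result = tools[call["name"]](**call["args"])
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "content": str(result)})
    raise RuntimeError("model kept requesting tools; giving up")


def stub_model(messages):
    """Fake model: requests a tool for 'times' questions, answers others directly."""
    last = messages[-1]
    if last["role"] == "tool":
        return {"content": f"The result is {last['content']}.", "tool_calls": []}
    if "times" in last["content"]:
        return {"content": "", "tool_calls": [
            {"name": "multiply", "args": {"a": 5, "b": 6}, "id": "call_0"}]}
    return {"content": "answered without tools", "tool_calls": []}


tools = {"multiply": lambda a, b: a * b}
print(run_with_optional_tools(stub_model, tools,
                              [{"role": "user", "content": "what is 5 times 6?"}]))
print(run_with_optional_tools(stub_model, tools,
                              [{"role": "user", "content": "how big is Australia?"}]))
```

With a real LangChain model the same shape applies: check `ai_msg.tool_calls`, invoke the matching tool for each call, append the resulting `ToolMessage`s, and re-invoke until the reply has no tool calls.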

### Directly from the ChatHuggingFace page

In [4]:
from langchain_huggingface import HuggingFaceEndpoint, ChatHuggingFace

from dotenv import load_dotenv, find_dotenv
_ = load_dotenv(find_dotenv()) # read local .env file

llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
    max_new_tokens=1024,
    do_sample=False,
    repetition_penalty=1.03,
)

chat = ChatHuggingFace(llm=llm, verbose=True)


In [5]:
from pydantic import BaseModel, Field

class GetWeather(BaseModel):
    '''Get the current weather in a given location'''

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

class GetPopulation(BaseModel):
    '''Get the current population in a given location'''

    location: str = Field(..., description="The city and state, e.g. San Francisco, CA")

chat_with_tools = chat.bind_tools([GetWeather, GetPopulation])
ai_msg = chat_with_tools.invoke("Which city is hotter today and which is bigger: LA or NY?")
ai_msg.tool_calls



[{'name': 'GetWeather',
  'args': {'location': 'LA'},
  'id': '0',
  'type': 'tool_call'}]

### Trying Ollama here

What can we do with Ollama from Python? We will combine it with LangChain. Let's check it out.

In [9]:
!pip install ollama



In [10]:
from langchain_ollama import ChatOllama
from ollama import Client
from dotenv import load_dotenv, find_dotenv

_ = load_dotenv(find_dotenv()) # read local .env file

In [15]:
cl = Client(host="ollama")
cl.pull(model="llama3.2")
llm = ChatOllama(
    model="llama3.2",
    temperature=0.33,
    base_url="ollama"
)

In [16]:
llm.invoke("can you tell me which is the biggest country in the world?")

AIMessage(content="The largest country in the world, both in terms of land area and population, is Russia. It covers approximately 17.1 million square kilometers (6.6 million square miles) and has a population of over 145 million people.\n\nHowever, if you're considering other factors such as total area including water, Canada is actually larger than Russia.", additional_kwargs={}, response_metadata={'model': 'llama3.2', 'created_at': '2024-11-15T15:18:21.79818642Z', 'message': {'role': 'assistant', 'content': ''}, 'done_reason': 'stop', 'done': True, 'total_duration': 1400012092, 'load_duration': 1027160920, 'prompt_eval_count': 38, 'prompt_eval_duration': 49000000, 'eval_count': 71, 'eval_duration': 321000000}, id='run-8e8caf56-2214-4ef1-b089-20fc17155a72-0', usage_metadata={'input_tokens': 38, 'output_tokens': 71, 'total_tokens': 109})

In [8]:
import ollama
from ollama import Client

In [9]:
client = Client(host='http://10.101.1.54:11434')
response = client.chat(model='llama3.2', messages=[
  {
    'role': 'user',
    'content': 'Why is the sky blue?',
  },
])

ConnectError: [Errno 111] Connection refused
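`Connection refused` (errno 111) means the request reached the host but nothing accepted the connection on that port: the Ollama server is not running there, listens on a different port, or is bound to `127.0.0.1` only (Ollama's default; the `OLLAMA_HOST` environment variable changes the bind address). Below is a minimal sketch of a reachability check to run before `client.chat(...)`, using only the standard library. It assumes the default port 11434 when none is given and that a running server answers its root URL with HTTP 200. `normalize_host` and `ollama_is_up` are hypothetical helpers.

```python
# Minimal sketch: check whether an Ollama server answers before calling it.
# Assumptions: default port 11434 when none is given, and a running server
# answers GET / with HTTP 200.
import urllib.error
import urllib.request
from urllib.parse import urlsplit


def normalize_host(host, default_port=11434):
    """Turn 'ollama' or '10.0.0.5' into 'http://ollama:11434' style base URLs."""
    if "://" not in host:
        host = "http://" + host
    parts = urlsplit(host)
    port = parts.port or default_port
    return f"{parts.scheme}://{parts.hostname}:{port}"


def ollama_is_up(host, timeout=2.0):
    url = normalize_host(host) + "/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False  # refused, unreachable, DNS failure, HTTP error or timeout


print(normalize_host("ollama"))
print(ollama_is_up("http://127.0.0.1:1", timeout=1))
```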