In [1]:
import sys
sys.path.append('../')
# sys.path.append('../..')

In [3]:
from models.ollama_model_calls import query_phi, query_solar, query_starling, query_mistral
from utils.format_llm_output_text import format_llm_answer
from eval.gsm8k_llm import check_answer_using_llm, get_GSM8K_QA

Using the latest cached version of the dataset since gsm8k couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'main' at /Users/olles/.cache/huggingface/datasets/gsm8k/main/0.0.0/e53f048856ff4f594e959d75785d2c2d37b678ee (last modified on Tue Jan 23 06:22:00 2024).


In [5]:
import numexpr as ne

In [6]:
from langchain.tools import BaseTool

In [7]:
# class Calculator_class(BaseTool):
#     name = "Calculator"
#     description = "Use this tool when you need to math. NOTE: this tool uses python numexpr library, so format your input correctly please. Incorrect expression: ```((1000 - 40*num_days - 30*num_days - 70*(nights/2))/500)```, as it cannot handle variables as input. Correct expression: ```2*30-30```."

#     def _run(self, expression : str):
#         return ne.evaluate(expression).item()

#     def _arun(self, expression : str):
#         raise NotImplementedError("This tool does not support async")

In [8]:
# Calculator_class.invoke("1+3")

In [9]:
from langchain.agents import tool

In [10]:
@tool
def Calculator(expression : str) -> float: # the name of the function takes the place of the name variable.
    """Use this tool when you need to do math. NOTE: this tool uses the Python numexpr library, so please format your input correctly."""
    return ne.evaluate(expression).item()

In [197]:
format_llm_answer(query_mistral("I'm trying to build a LLM tool for my LLM to do math, by using this function with the numexpr library: ```def Calculator(expression : str) -> int: Description: Use this tool when you need to math. NOTE: this tool uses python numexpr library, so format your input correctly please. return numexpr.evaluate(expression).item()```. However I'm having issues with the model not being able to handle more complex math formulas. How would you change it to work better? Do you know of a more fitting library to answer math questions?"))

 To improve the ability of your LLM tool to handle more complex math formulas, you can consider the following approaches:

1. Use a more powerful and flexible mathematical expression parser and evaluator library like SymPy instead of numexpr. SymPy is a Python library for symbolic mathematics which can handle symbolic expressions as well as numerical calculations. It also supports advanced mathematical functions and operations that might be required for complex math formulas.
2. Use a recursive or iterative approach to parse and evaluate the given string expression, instead of directly using numexpr. This will provide you with more control over the parsing and evaluation process, and allow you to handle more complex expressions. You can follow a standard mathematical expression tree traversal algorithm (like Shunting Yard) to evaluate the given expression string.
3. Break down complex formulas into simpler components before evaluating them using your tool. For instance, if the input is an equation like `(a + b)^2 = c`, first calculate the value of `(a + b)` separately and then square it to get the result.
4. Use a combination of libraries: You can use SymPy for parsing and simplifying complex expressions, and numexpr or NumPy for numerical evaluations. This approach will provide you with both symbolic and numerical capabilities in a single tool.
5. Consider using an external math question answering service like Wolfram Alpha or Google's Math Solver, which can handle a wide range of mathematical queries, including complex formulas and equations. You can integrate their APIs into your LLM tool to expand its functionality.
6. Lastly, you may need to provide more context or examples for the model when it encounters complex math formulas. This will help the model learn how to handle such expressions in a better way.
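A minimal sketch of suggestion 2 above (parse the expression and evaluate it recursively), assuming only basic arithmetic is needed. It uses Python's standard `ast` module instead of numexpr or SymPy, and `evaluate_arithmetic` is a hypothetical helper name, not something this notebook defines elsewhere.

```python
import ast
import operator

# Map the supported AST operator nodes to the functions that implement them.
_BINARY_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
}
_UNARY_OPS = {ast.UAdd: operator.pos, ast.USub: operator.neg}


def _evaluate_node(node):
    """Recursively evaluate one node of the parsed expression tree."""
    if isinstance(node, ast.Constant):
        # Accept plain numbers only (bool is a subclass of int, so exclude it).
        if isinstance(node.value, (int, float)) and not isinstance(node.value, bool):
            return node.value
    elif isinstance(node, ast.BinOp) and type(node.op) in _BINARY_OPS:
        left = _evaluate_node(node.left)
        right = _evaluate_node(node.right)
        return _BINARY_OPS[type(node.op)](left, right)
    elif isinstance(node, ast.UnaryOp) and type(node.op) in _UNARY_OPS:
        return _UNARY_OPS[type(node.op)](_evaluate_node(node.operand))
    raise ValueError(f"Unsupported expression element: {type(node).__name__}")


def evaluate_arithmetic(expression):
    """Evaluate a string containing numbers, + - * / ** and parentheses."""
    tree = ast.parse(expression, mode="eval")
    return _evaluate_node(tree.body)


print(evaluate_arithmetic("544430 * 530"))
```

Variables, function calls such as `sum(...)` and dictionaries all raise `ValueError` here, so malformed tool input fails with an explicit message.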

In [54]:
import sympy as sp

In [59]:
# Sympy calc tool
@tool
def Calculator(expression : str) -> float:
    """Use this tool when you need to do math or symbolic math and calculate the result. NOTE: this tool uses the Python SymPy library, so format your input as an expression that SymPy's simplify function can parse."""
    return float(sp.simplify(expression))

In [207]:
Calculator("2 * 3 / 20 - 2")

-1.7

In [192]:
# from langchain_experimental.utilities import PythonREPL
from langchain.utilities import PythonREPL

In [238]:
# @tool
# def Calculator(expression : str) -> int: # the name of the function takes the place of the name variable.
#     """Use this python tool when you need to do advanced or longer math equations. NOTE: format it to be run in a python script. IMPORTANT: This tool uses code execution, so only use python math libraries.""" # such as numexpr (with correct numexpr formatting) or package like numpy."""
#     python_repl = PythonREPL()
#     return python_repl.run(expression)

# # repl_tool = Tool(
# #     name="python_repl",
# #     description="A Python shell. Use this to execute python commands. Input should be a valid python command. If you want to see the output of a value, you should print it out with `print(...)`.",
# #     func=python_repl.run,
# # )

In [60]:
tools = [Calculator]

In [61]:
from langchain.memory import ConversationBufferWindowMemory

In [62]:
memory = ConversationBufferWindowMemory(
    memory_key="chat_history",
    k=5,
    return_messages=True,
    output_key="output",
)

In [63]:
memory

ConversationBufferWindowMemory(output_key='output', return_messages=True, memory_key='chat_history')

In [66]:
from langchain.prompts import ChatPromptTemplate, MessagesPlaceholder

In [67]:
# Mistral compatibility, change depending on LLM

# Openhermes prompt format tokens
B_INST, E_INST = "<|im_start|>user\n", "<|im_end|>\n" #Instruction begin and end
B_SYS, E_SYS = "<|im_start|>system\n", "<|im_end|>\n" #System message begin and end

In [68]:
# system = '''Respond to the human as helpfully and accurately as possible. You have access to the following tools:

# {tools}

# Use a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input).

# Valid "action" values: "Final Answer" or {tool_names}

# Provide only ONE action per $JSON_BLOB, as shown:

# ```
# {{
#   "action": $TOOL_NAME,
#   "action_input": $INPUT
# }}
# ```

# Follow this format:

# Question: input question to answer
# Thought: consider previous and subsequent steps
# Action:
# ```
# $JSON_BLOB
# ```
# Observation: action result
# ... (repeat Thought/Action/Observation N times)
# Thought: I know what to respond
# Action:
# ```
# {{
#   "action": "Final Answer",
#   "action_input": "Final response to human"
# }}

# Begin! Reminder to ALWAYS respond with a valid json blob of a single action. Use tools if necessary. Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation'''

In [69]:
system = B_SYS + '''Respond to the human as helpfully and accurately as possible. You have access to the following tools:

{tools}

Use a json blob to specify a tool by providing an action key (tool name) and an action_input key (tool input).

Valid "action" values: "Final Answer" or {tool_names}

Provide only ONE action per $JSON_BLOB, as shown:

```
{{
  "action": $TOOL_NAME,
  "action_input": $INPUT
}}
```

Follow this format:

Question: input question to answer
Thought: consider previous and subsequent steps
Action:
```
$JSON_BLOB
```
Observation: action result
... (repeat Thought/Action/Observation N times)
Thought: I know what to respond
Action:
```
{{
  "action": "Final Answer",
  "action_input": "Final response to human"
}}

Begin! Reminder to ALWAYS respond with a valid json blob of a single action. Use tools if necessary. Respond directly if appropriate. Format is Action:```$JSON_BLOB```then Observation''' + E_SYS

In [70]:
# human = '''{input}

# {agent_scratchpad}

# (reminder to respond in a JSON blob no matter what)'''

In [71]:
human = B_INST + '''\nUser: {input}

{agent_scratchpad}

(reminder to respond in a JSON blob no matter what)''' + E_INST

In [72]:
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        MessagesPlaceholder("chat_history", optional=True),
        ("human", human),
    ]
)

In [73]:
# # Define a new system prompt
# system = B_SYS + """You are Assistant. Assistant is a expert JSON builder designed to assist with a wide range of tasks.

# Assistant is able to respond to the User and use tools using JSON strings that contain "action" and "action_input" parameters.

# All of Assistant's communication is performed using this JSON format. The assistant NEVER outputs anything other than a json object with an action and action_input fields!

# Assistant can also use tools by responding to the user with tool use instructions in the same "action" and "action_input" JSON format. Tools available to Assistant are:

# > Calculator: Use this tool when you need to evaluate a mathematical expression. The calculator uses python's numexpr library so make sure your inputs to this tool are in the correct format.

# Here is an example of a previous conversation between User and Assistant:
# ---
# User: Hey how are you today?
# Assistant: ```json
# {{"action": "Final Answer",
#  "action_input": "I'm good thanks, how are you?"}}
# ```
# User: I'm great, what is the length of the word educate?
# Assistant: ```json
# {{"action": "Word Length Tool",
#  "action_input": "educate"}}
# ```
# User: 7
# Assistant: ```json
# {{"action": "Final Answer",
#  "action_input": "It looks like the answer is 7!"}}
# ```
# User: Who is Olivia Wilde's husband?
# Assistant: ```json
# {{"action": "duckduckgo_search",
#  "action_input": "Who is Olivia Wilde's husband"}}
# ```
# User: September 21, 2022: Olivia Wilde says kids\' happiness remains top priority for her and Jason Sudeikis. During an appearance on The Kelly Clarkson Show, Wilde talked about the ups and downs of ... Olivia Wilde is "quietly dating again" following her November 2022 split from. Harry Styles, a source exclusively tells Life & Style. "The man she\'s with is \'normal\' by Hollywood ... Wilde honored Sudeikis with a sweet tribute on his 42nd birthday. "I have approximately one billion pictures of this guy, my partner in life-crime, who was born on this day in 1975, but this one ... November 18, 2022: Olivia Wilde and Harry Styles are "taking a break". After nearly two years together, Wilde and Styles are pressing pause on their romance. Multiple sources confirmed exclusively ... Wilde told Allure in October 2013 that when she first met Sudeikis she "thought he was so charming." "He\'s a great dancer, and I\'m a sucker for great dancers," she said at the time. "But he didn\'t ...
# Assistant: ```json
# {{"action": "Final Answer",
#  "action_input": "Jason Sudeikis"}}
# ```
# User: What is his age raised to the 3rd power?
# Assistant: ```json
# {{"action": "duckduckgo_search",
#  "action_input": "What is Jason Sudeikis age?"}}
# ```
# User: Jason Sudeikis (born September 18, 1975, Fairfax, Virginia, U.S.) American comedian, actor, and writer who first garnered attention for his work (2003-13) on the TV show Saturday Night Live ( SNL) and later starred in the hugely popular series Ted Lasso (2020-23). Early life and improv comedy Parents Jason Sudeikis and Olivia Wilde\'s 2 Kids: All About Daisy and Otis Jason Sudeikis and Olivia Wilde welcomed two children together before their split in 2020 By Jacqueline Weiss... The Ted of 2013 is a classic Jason Sudeikis character of the time. Sudeikis came up on Saturday Night Live , where he made the most of his 6-foot height and square jaw by largely playing jerks. Sudeikis looks a little bleary-eyed after flying in from Los Angeles with his children, Otis and Daisy, aged nine and six, but he is assiduously polite and attentive. His language can be quaintly... Jason Sudeikis rose to fame on "Saturday Night Live" with his comedic talent, and his children seemed to have inherited it. Sudeikis, who shares kids Otis, 9, and Daisy, 7, with ex Olivia ...
# Assistant: ```json
# {{"action": "Calculator",
#  "action_input": "47**3"}}
# ```
# User: 103823
# Assistant: ```json
# {{"action": "Final Answer",
#  "action_input": "103823"}}
# ```
# ---
# Assistant is very intelligent and knows it's limitations, so it will always try to use a tool when applicable, even if Assistant thinks it knows the answer!
# Notice that after Assistant uses a tool, User will give the output of that tool. Then this output can be returned as a final answer.
# Assistant will only use the available tools and NEVER a tool not listed. If the User's question does not require the use of a tool, Assistant will use the "Final Answer" action to give a normal response.
# """ + E_SYS

In [74]:
# human = B_INST + '''\nUser: {input}

# {agent_scratchpad}

# (reminder to respond in a JSON blob no matter what)''' + E_INST

In [75]:
# # define a new message prompt template
# instruction = B_INST + "Respond to the following in JSON with 'action' and 'action_input' values. Once you have outputted the JSON, stop outputting and output nothing else!"
# human_msg = instruction + "\nUser: {input}" + E_INST

# agent.agent.llm_chain.prompt.messages[2].prompt.template = human_msg

Let’s delve into the custom prompt I designed and the principles behind it.

Tips for crafting a good agent prompt
* Define the model’s persona: at the outset, the prompt establishes Mistral-7B’s role as “Assistant, an expert JSON builder.” This aligns with chat models’ ability to role-play effectively.
* Emphasize the JSON format: reiterating that Assistant communicates exclusively in JSON reinforces a consistent output structure, which helps the parser interpret responses and call the appropriate functions.
* Describe each tool’s capabilities: the prompt details the available tools (DuckDuckGo search, Word Length, Calculator) and their purposes.
* Few-shot examples: the prompt provides an example conversation so the model understands its task better. I found this especially important for smaller, less capable models, and the examples you choose here can make a big impact.
* Acknowledge the model’s strengths and weaknesses: Openhermes sometimes answers the question straight away without using tools because it can be overconfident. Addressing this in the prompt can prevent this behavior, which often leads to incorrect responses.
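To illustrate why a consistent JSON format matters for parsing, here is a rough sketch of pulling the action blob out of a model reply. This is not LangChain's own output parser; `extract_action` is a hypothetical helper that uses only the standard library and assumes the reply contains at least one JSON object with `action` and `action_input` keys.

```python
import json


def extract_action(model_output):
    """Return the first JSON object in the text that has the two agent keys."""
    decoder = json.JSONDecoder()
    start = model_output.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value starting exactly at `start`.
            blob, _ = decoder.raw_decode(model_output, start)
        except json.JSONDecodeError:
            blob = None
        if isinstance(blob, dict) and "action" in blob and "action_input" in blob:
            return blob
        start = model_output.find("{", start + 1)
    raise ValueError("No JSON blob with 'action' and 'action_input' found")


# The fence is built programmatically to avoid a literal fence in this example.
fence = "`" * 3
# A reply whose closing fence is missing, which a small model can produce.
reply = " " + fence + 'json\n{\n  "action": "Calculator",\n  "action_input": "544430 * 530"\n}\n'
action = extract_action(reply)
print(action["action"], "->", action["action_input"])
```

Because the sketch searches for the JSON object itself rather than for the surrounding fence, a reply with a missing closing fence still parses.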

In [76]:
# prompt = ChatPromptTemplate.from_messages(
#     [
#         ("system", system),
#         MessagesPlaceholder("chat_history", optional=True),
#         ("human", human),
#     ]
# )

In [77]:
# my own custom system prompt:
# system = """

# """

In [78]:
# new_prompt = agent.agent.create_prompt(
#     system_message=sys_msg,
#     tools=tools
# )
# agent.agent.llm_chain.prompt = new_prompt

In [None]:
from langchain.agents import create_structured_chat_agent

In [79]:
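# NOTE: `mistral` must be a LangChain-compatible LLM object; its definition is not shown in this notebook.
agent = create_structured_chat_agent(llm=mistral,
                                     tools=tools,
                                     prompt=prompt,)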

In [80]:
from langchain.agents import AgentExecutor

In [81]:
agent_executor = AgentExecutor(agent=agent,
                               tools=tools,
                               verbose=True,
                               handle_parsing_errors=True)

In [28]:
query = "Can you tell me what the total money I have if I have earned 544430GBP per day for 530 days?"
# answer = 288'547'900

In [54]:
# numexpr
agent_executor.invoke({"input": query})

> Entering new AgentExecutor chain...
```json
{
  "action": "Calculator",
  "action_input": "544430 * 530"
}
```

288547900
```json
{
  "action": "Final Answer",
  "action_input": "The total amount of money you have earned is 288547900 GBP."
}
```

> Finished chain.


{'input': 'Can you tell me what the total money I have if I have earned 544430GBP per day for 530 days?',
 'output': 'The total amount of money you have earned is 288547900 GBP.'}

In [234]:
# sympy
agent_executor.invoke({"input": query})

> Entering new AgentExecutor chain...
```json
{
  "action": "Calculator",
  "action_input": "544430 * 530"
}

288547900.0
```json
{
  "action": "Final Answer",
  "action_input": "You have a total of 288547900 GBP."
}
```

> Finished chain.


{'input': 'Can you tell me what the total money I have if I have earned 544430GBP per day for 530 days?',
 'output': 'You have a total of 288547900 GBP.'}

In [29]:
# sympy, without the Mistral-format prompt
agent_executor.invoke({"input": query})

> Entering new AgentExecutor chain...
```json
{
  "action": "Calculator",
  "action_input": "544430 * 530"
}
```

288547900
```json
{
  "action": "Final Answer",
  "action_input": "The total amount of money you have is 288547900 GBP."
}
```

> Finished chain.


{'input': 'Can you tell me what the total money I have if I have earned 544430GBP per day for 530 days?',
 'output': 'The total amount of money you have is 288547900 GBP.'}

In [43]:
# sympy, with the Mistral-format prompt
agent_executor.invoke({"input": query})

> Entering new AgentExecutor chain...
```json
{
  "action": "Calculator",
  "action_input": "544430 * 530"
}
```

288547900
```json
{
  "action": "Final Answer",
  "action_input": "The total money you have earned is 288,547,900 GBP."
}
```

> Finished chain.


{'input': 'Can you tell me what the total money I have if I have earned 544430GBP per day for 530 days?',
 'output': 'The total money you have earned is 288,547,900 GBP.'}

In [235]:
query = "Can you tell me what the total money I have if I have earned  1.3320329 GBP per hour for 531 hours?"
# # answer = £ 707.31

In [56]:
# numexpr
agent_executor.invoke({"input": query})

> Entering new AgentExecutor chain...
```json
{
  "action": "Calculator",
  "action_input": "1.3320329 * 531"
}

{
  "action": "Final Answer",
  "action_input": "The total amount of money earned is:"
}
```
707.3094699
```json
{
  "action": "Final Answer",
  "action_input": "The total amount of money earned is 707.31 GBP."
}
```

> Finished chain.


{'input': 'Can you tell me what the total money I have if I have earned  1.3320329 GBP per hour for 531 hours?',
 'output': 'The total amount of money earned is 707.31 GBP.'}

In [236]:
agent_executor.invoke({"input": query})

> Entering new AgentExecutor chain...
```json
{
  "action": "Calculator",
  "action_input": "1.3320329 * 531"
}
```
707.3094699
```json
{
  "action": "Final Answer",
  "action_input": "The total amount of money you have earned is 707.31 British Pounds."
}
```

> Finished chain.


{'input': 'Can you tell me what the total money I have if I have earned  1.3320329 GBP per hour for 531 hours?',
 'output': 'The total amount of money you have earned is 707.31 British Pounds.'}

_____

The expected answer to the WMAPE question below is about 0.16.
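As a sanity check on that figure, the weighted average can be worked out directly. This is a small sketch that assumes WMAPE here means each day's MAPE weighted by its package volume (the variable names are my own):

```python
# (package volume, MAPE) per day, taken from the question text.
days = {
    "Monday": (20000, 0.20),
    "Tuesday": (40000, 0.15),
    "Wednesday": (5000, 0.05),
}

# Weight each day's MAPE by its package volume, then normalise by the total volume.
total_packages = sum(volume for volume, _ in days.values())
weighted_error = sum(volume * mape for volume, mape in days.values())
wmape = weighted_error / total_packages

print(f"WMAPE = {weighted_error:.0f} / {total_packages} = {wmape:.4f}")
```

The numerator is 4000 + 6000 + 250 = 10250 and the denominator is 65000, which gives roughly 0.158.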

In [44]:
# numexpr
email_context = "Monday had 20000 packages and a 20% MAPE. Tuesday had 40000 packages and a 15%  MAPE. Wednesday had 5000 packages and a 5% MAPE."
question = "What is the WMAPE of these days?"

query = email_context + question

agent_executor.invoke({"input": query})

> Entering new AgentExecutor chain...
```json
{
  "action": "Calculator",
  "action_input": "WMAPE = (Sum((Abenormality * MAPE)) / (Total number of packages * 3))",
  "action_input": {
    "expression": "sum([20000*(0.2) , 40000*(0.15) , 5000*(0.05)]) / (20000 + 40000 + 5000) * 3"
  }
}
```
<|im_start|>system
Thought: The user has asked for the Weighted Mean Absolute Percentage Error (WMAPE) of three consecutive days. To calculate this, I'll use the given MAPE percentages and the number of packages. First, I will need to find the absolute percentage errors and then sum them up. After that, I will divide the result by the total number of packages multiplied by 3 and the result will be the WMAPE.
Action:
```json
{
  "action": "Calculator",
  "action_input": {
    "expression": "sum([20000*(0.2) , 40000*(0.15) , 5000*(0.05)])"
  }
}
```

ValueError: Expression sum([20000*(0.2) , 40000*(0.15) , 5000*(0.05)]) / (20000 + 40000 + 5000) * 3 has forbidden control characters.

In [82]:
# sympy (the TypeError below is raised by float() in the SymPy Calculator tool)
email_context = "Monday had 20000 packages and a 20% MAPE. Tuesday had 40000 packages and a 15%  MAPE. Wednesday had 5000 packages and a 5% MAPE. "
question = "What is the WMAPE (weighted average by the package volume) for these days?"

query = email_context + question

agent_executor.invoke({"input": query})

> Entering new AgentExecutor chain...
{
  "action": "Calculator",
  "action_input": "{m1: 20000, mape1: 0.2, m2: 40000, mape2: 0.15, m3: 5000, mape3: 0.05}",
  "thought": "First, I calculate the individual contributions of each day to the total error by multiplying the number of packages and their respective MAPEs.\nThen, I sum up these contributions and divide by the total number of packages to get the weighted average percentage error."
}

{
  "action": "Final Answer",
  "action_input": "{WMAPE: 0.1218}"
}

TypeError: float() argument must be a string or a real number, not 'Dict'

### Summary of today
* I tried SymPy. The main challenge is getting the model to format equations so that they are compatible with SymPy's `simplify` function. I think giving the model some examples should be enough.

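One possible way to act on this, sketched under the assumption that the tool only needs plain numeric arithmetic: check the input before it reaches the evaluator and put a few examples in the error message and tool description. `check_expression` and `DESCRIPTION_EXAMPLES` are hypothetical names, and the example strings are only suggestions.

```python
import re

# Digits, whitespace, decimal points, parentheses and the operators + - * / only.
_PLAIN_ARITHMETIC = re.compile(r"[0-9\s.+\-*/()]+")

# Hypothetical few-shot examples that could also be appended to the tool description.
DESCRIPTION_EXAMPLES = (
    'Correct input: "544430 * 530". '
    'Correct input: "(20000*0.2 + 40000*0.15 + 5000*0.05) / (20000 + 40000 + 5000)". '
    'Incorrect input: "{m1: 20000, mape1: 0.2}" (no variables or dictionaries).'
)


def check_expression(expression):
    """Return the expression unchanged if it is plain arithmetic, else raise ValueError."""
    if not isinstance(expression, str) or not _PLAIN_ARITHMETIC.fullmatch(expression):
        raise ValueError(
            "Expected plain arithmetic such as '2 * 3 / 20 - 2'. " + DESCRIPTION_EXAMPLES
        )
    return expression


print(check_expression("1.3320329 * 531"))
```

The tool body could then call the existing evaluator on `check_expression(expression)`, so that inputs like the dictionary string above are rejected with a readable message before they reach `float()`.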
#### Transformer

What I think I remember:

* Tokens: the chunks (often sub-word pieces) that text is split into. If I remember correctly, a model trained on 1T tokens has seen that many of these text chunks during training.
* Embedding matrix: the embedding layer/matrix is, I think, one of the 8 categories that the model's weights (rather than its tokens) are grouped into. It holds an initial vector for every token in the vocabulary (about 50k tokens in GPT-3's case). This is the starting point for the vector values that should represent each word.
* Attention layer: this layer updates each word vector depending on how it relates to the other word vectors in the context. For example, "tower" usually appears in contexts similar to "bridge" or "construction". Likewise, the vector difference between "uncle" and "aunt" is similar to the vector difference between "man" and "woman".
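
To make the attention idea concrete for myself, here is a heavily simplified sketch: every vector is replaced by a weighted average of all vectors, with weights based on how similar they are. It leaves out the learned query/key/value matrices a real transformer uses, and the toy vectors are made up for illustration.

```python
import math


def softmax(scores):
    """Turn raw similarity scores into weights that sum to 1."""
    highest = max(scores)
    exps = [math.exp(score - highest) for score in scores]
    total = sum(exps)
    return [value / total for value in exps]


def self_attention(vectors):
    """Update each vector as a similarity-weighted average of all vectors."""
    dim = len(vectors[0])
    updated = []
    for query in vectors:
        # Scaled dot product between this vector and every vector in the context.
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        weights = softmax(scores)
        # Mix all vectors together according to the attention weights.
        updated.append(
            [sum(w * vec[i] for w, vec in zip(weights, vectors)) for i in range(dim)]
        )
    return updated


# Made-up 2-D "word vectors": the first two are similar, the third is different.
toy_vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0]]
updated_vectors = self_attention(toy_vectors)
for before, after in zip(toy_vectors, updated_vectors):
    print(before, "->", [round(x, 3) for x in after])
```

Each updated vector is pulled towards the vectors it is most similar to, which is the "changes depending on how it relates to the others" behaviour described above.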