![Img](https://app.theheadstarter.com/static/hs-logo-opengraph.png)

# Headstarter AI Agent Workshop

#### **Skills: OpenAI, Groq, Llama, OpenRouter**

## **To Get Started:**
1. [Get your Groq API Key](https://console.groq.com/keys)
2. [Get your OpenRouter API Key](https://openrouter.ai/settings/keys)
3. [Get your OpenAI API Key](https://platform.openai.com/api-keys)


### **Interesting Reads**
- [Sam Altman's Blog Post: The Intelligence Age](https://ia.samaltman.com/)
- [What LLMs cannot do](https://ehudreiter.com/2023/12/11/what-llms-cannot-do/)
- [Chain of Thought Prompting](https://www.promptingguide.ai/techniques/cot)
- [Why ChatGPT can't count the number of r's in the word strawberry](https://prompt.16x.engineer/blog/why-chatgpt-cant-count-rs-in-strawberry)


## During the Workshop
- [Any code shared during the workshop will be posted here](https://docs.google.com/document/d/1hPBJt_4Ihkj6v667fWxVjzwCMS4uBPdYlBLd2IqkxJ0/edit?usp=sharing)

# Install necessary libraries

In [69]:
! pip install openai groq

# Set up Groq, OpenRouter, & OpenAI clients

In [70]:
from openai import OpenAI
from google.colab import userdata
import os
import json
from groq import Groq
from typing import List, Dict, Any, Callable
import ast
import io
import sys

groq_api_key = userdata.get("GROQ_API_KEY")
os.environ['GROQ_API_KEY'] = groq_api_key

openrouter_api_key = userdata.get("OPENROUTER_API_KEY")
os.environ['OPENROUTER_API_KEY'] = openrouter_api_key

openai_api_key = userdata.get("OPENAI_API_KEY")
os.environ['OPENAI_API_KEY'] = openai_api_key

groq_client = Groq(api_key=os.getenv('GROQ_API_KEY'))

openrouter_client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.getenv("OPENROUTER_API_KEY")
)

openai_client = OpenAI(
    base_url="https://api.openai.com/v1",
    api_key=os.getenv("OPENAI_API_KEY")
)

### Define functions to easily query and compare responses from OpenAI, Groq, and OpenRouter

In [81]:
def get_llm_response(client, prompt, openai_model="gpt-4o-mini", json_mode=False):

    if client == "openai":

        kwargs = {
            "model": openai_model,
            "messages": [{"role": "user", "content": prompt}]
        }

        if json_mode:
            kwargs["response_format"] = {"type": "json_object"}

        response = openai_client.chat.completions.create(**kwargs)

    elif client == "groq":

        models = ["llama-3.1-8b-instant", "llama-3.1-70b-versatile", "llama3-70b-8192", "llama3-8b-8192", "gemma2-9b-it"]

        # Try each Groq model in order and stop at the first one that responds
        for model in models:

            try:
                kwargs = {
                    "model": model,
                    "messages": [{"role": "user", "content": prompt}]
                }
                if json_mode:
                    kwargs["response_format"] = {"type": "json_object"}

                response = groq_client.chat.completions.create(**kwargs)

                break

            except Exception as e:
                print(f"Error: {e}")
                continue

        else:
            # Every Groq model failed, so fall back to OpenRouter
            kwargs = {
                "model": "meta-llama/llama-3.1-8b-instruct:free",
                "messages": [{"role": "user", "content": prompt}]
            }

            if json_mode:
                kwargs["response_format"] = {"type": "json_object"}

            response = openrouter_client.chat.completions.create(**kwargs)

    else:
        raise ValueError(f"Invalid client: {client}")

    return response.choices[0].message.content


def evaluate_responses(prompt, reasoning_prompt=False, openai_model="gpt-4o-mini"):

    if reasoning_prompt:
        prompt = f"{prompt}\n\n{reasoning_prompt}."

    openai_response = get_llm_response("openai", prompt, openai_model)
    groq_response = get_llm_response("groq", prompt)

    print(f"OpenAI Response: {openai_response}")
    print(f"\n\nGroq Response: {groq_response}")

In [82]:
prompt = "What is the easiest way to find the square root of a problem"
evaluate_responses(prompt, reasoning_prompt=False)


In [83]:
prompt2 = "which is bigger, 9.1 or 9.11"
evaluate_responses(prompt2, reasoning_prompt=False)

# Agent Architecture

[![](https://mermaid.ink/img/pako:eNqFUslugzAQ_ZWRz8kPILUVCZClVdQmqdTK5ODCFFDAjry0iSD_XmOTJodK5WS_Zd7M4JZkIkcSkEKyQwnbKOVgvzf6qlDCi0F52sF4fA-hJ0IaGi24aIRREBbItacn9LlmnKPced2kR7s1aiO5AmU-NFN71cG0fRLiALqUwhQlbAbi7F1TVyuia2RKXItFDo5pmOnqBo5dhgcDmKFlmEaY2oE6SOgvwHgO8REzM5B_2n1kxYsOZtSrxSUocfkzf5m5y5zGX6w27Cqau3IDOpTU8gR3kLBa2Y4W7QqP-jLywzDywtnesd_NLbISHSxpUnFWQ8jVt_0b8VFLll0Tl66TR-q3DLfaf3vaSoMukYxIg7JhVW4fQdvbUqJLbDAlgT3mTO5TkvKz1TG7ks2JZyTQ1j0i5pDb9UYVs2-nIcFnP-YFjfPKNjqA5x9CM8YW?type=png)](https://mermaid.live/edit#pako:eNqFUslugzAQ_ZWRz8kPILUVCZClVdQmqdTK5ODCFFDAjry0iSD_XmOTJodK5WS_Zd7M4JZkIkcSkEKyQwnbKOVgvzf6qlDCi0F52sF4fA-hJ0IaGi24aIRREBbItacn9LlmnKPced2kR7s1aiO5AmU-NFN71cG0fRLiALqUwhQlbAbi7F1TVyuia2RKXItFDo5pmOnqBo5dhgcDmKFlmEaY2oE6SOgvwHgO8REzM5B_2n1kxYsOZtSrxSUocfkzf5m5y5zGX6w27Cqau3IDOpTU8gR3kLBa2Y4W7QqP-jLywzDywtnesd_NLbISHSxpUnFWQ8jVt_0b8VFLll0Tl66TR-q3DLfaf3vaSoMukYxIg7JhVW4fQdvbUqJLbDAlgT3mTO5TkvKz1TG7ks2JZyTQ1j0i5pDb9UYVs2-nIcFnP-YFjfPKNjqA5x9CM8YW)


![agent_architecture_v2](https://github.com/user-attachments/assets/a65b6db9-bef1-4579-aed3-01444ce40544)

### To create our AI Agent, we will define the following functions:

1. **Planner:** This function takes a user's query and breaks it down into smaller, manageable subtasks. It returns these subtasks as a list, where each one is either a reasoning task or a code generation task.

2. **Reasoner:** This function provides reasoning on how to complete a specific subtask, considering both the overall query and the results of any previous subtasks. It returns a short explanation on how to proceed with the current subtask.

3. **Actioner:** Based on the reasoning provided for a subtask, this function decides whether the next step requires generating code or more reasoning. It then returns the chosen action and any necessary details to perform it.

4. **Evaluator:** This function checks if the result of the current subtask is reasonable and aligns with the overall goal. It returns an evaluation of the result and indicates whether the subtask needs to be retried.

5. **generate_and_execute_code:** This function generates and executes Python code based on a given prompt and memory of previous steps. It returns both the generated code and its execution result.

6. **executor:** Depending on the action decided by the "actioner," this function either generates and executes code or returns reasoning. It handles the execution of tasks based on the action type.

7. **final_answer_extractor:** After all subtasks are completed, this function gathers the results from previous steps to extract and provide the final answer to the user's query.

8. **autonomous_agent:** This is the main function that coordinates the process of answering the user's query. It manages the entire sequence of planning, reasoning, action, evaluation, and final answer extraction to produce a complete response.
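
The control flow described in the list above can be sketched as a single loop. This is a minimal sketch under stated assumptions, not the workshop's implementation: the step functions are passed in as parameters so the loop can be read and tested without any API calls, and the name `run_agent`, the `max_retries` cap, the evaluator returning a dict with a boolean `retry` key, and the shape of the `memory` entries are all hypothetical choices made for illustration.

```python
from typing import Any, Callable, Dict, List


def run_agent(
    user_query: str,
    planner: Callable[[str], List[str]],
    reasoner: Callable[[str, List[str], str], str],
    actioner: Callable[[str, List[str], str, str], Dict[str, Any]],
    executor: Callable[[str, Dict[str, Any], str], Any],
    evaluator: Callable[[str, str, Any], Dict[str, Any]],  # assumed signature
    final_answer_extractor: Callable[[str, List[Dict[str, Any]]], str],  # assumed signature
    max_retries: int = 2,  # hypothetical cap so a failing subtask cannot loop forever
) -> str:
    """Plan, then reason/act/execute/evaluate each subtask, then extract an answer."""
    subtasks = planner(user_query)
    memory: List[Dict[str, Any]] = []  # results of completed subtasks, in order

    for subtask in subtasks:
        result: Any = None
        for _attempt in range(max_retries + 1):
            reasoning = reasoner(user_query, subtasks, subtask)
            action = actioner(user_query, subtasks, subtask, reasoning)
            result = executor(action["action"], action["parameters"], user_query)
            evaluation = evaluator(user_query, subtask, result)
            if not evaluation.get("retry", False):
                break  # result accepted; move on to the next subtask
        memory.append({"subtask": subtask, "result": result})

    return final_answer_extractor(user_query, memory)
```

Passing the steps in as arguments keeps the loop independent of any one LLM provider, so each step can be swapped for a stub while debugging the others.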

In [95]:
def planner(user_query) -> List[str]:
  prompt = f"""Given the user's query:  '{user_query}', break down the query into as few subtasks as possible in order to answer the question.

  Each subtask should be either a reasoning task or a code generation task. Never duplicate a task.

  Here are the only 2 actions that can be taken for each subtask:
    -generate_code: This action involves generating Python code and executing it in order to make a calculation or verification that helps answer the question
    -reasoning: This action involves providing reasoning for what to do to complete the subtask

  Each subtask should begin with either "reasoning" or "generate_code", followed by a colon

  Keep in mind the overall goal of answering the user's query throughout the planning process

  Return the result as a JSON list of strings, where each string is a subtask.

  Here is an example JSON response:

  {{
    "subtasks": ["Subtask 1", "Subtask 2", "Subtask 3"]
  }}

  """

  response = json.loads(get_llm_response("groq", prompt, json_mode=True))

  return response["subtasks"]

In [96]:
query = "If a student can't answer 347-238 accurately, what common core state standard should they focus on?"
subtasks = planner(query)
subtasks

['reasoning: Identify the mathematical operation represented by the subtraction problem 347-238 and determine its accuracy requirement in the context of common core state standards.',
 'generate_code: Use a Python library like sympy to evaluate the mathematical expression 347-238 and retrieve the expected result, so that we can compare it to known common core state standards and find one that matches an accurate calculation of this',
 "reasoning: After identifying the accurate common core state standard, reason on how it relates to the given subtraction problem in the context of the user's query.",
 "generate_code: Filter the identified common core state standard based on its alignment with the mathematics curriculum and syllabus typically taught in the student's grade level or academic year.",
 'reasoning: After filtering and aligning the common core state standard, reason on how the student can improve their understanding of the mathematics concept underlying the given subtraction pr
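
Because the planner prompt asks for each subtask to begin with `reasoning` or `generate_code` followed by a colon, the prefix can be checked before the subtask is passed along. The helper below is a small sketch of that check; `split_subtask` and `VALID_ACTIONS` are hypothetical names that do not appear elsewhere in this notebook.

```python
from typing import Tuple

# The two actions the planner prompt allows (hypothetical constant name)
VALID_ACTIONS = ("reasoning", "generate_code")


def split_subtask(subtask: str) -> Tuple[str, str]:
    """Split 'action: description' into (action, description).

    Raises ValueError when the prefix is missing or is not one of the
    two actions the planner prompt allows.
    """
    action, sep, description = subtask.partition(":")
    action = action.strip()
    if not sep or action not in VALID_ACTIONS:
        raise ValueError(f"Subtask has no valid action prefix: {subtask!r}")
    return action, description.strip()


print(split_subtask("generate_code: Evaluate the expression 347-238."))
# ('generate_code', 'Evaluate the expression 347-238.')
```

Rejecting a malformed subtask here makes a planner formatting mistake visible immediately, rather than later as an unexpected action.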

In [98]:
from typing import List
import json

def recursive_reasoner(user_query: str, subtasks: List[str], current_subtask_index: int = 0) -> List[str]:
    if current_subtask_index >= len(subtasks):
        return []

    current_subtask = subtasks[current_subtask_index]

    prompt = f"""Given the user's query (long-term goal): '{user_query}'

    Here are all the subtasks to complete in order to answer the user's query:
    <subtasks>
        {json.dumps(subtasks)}
    </subtasks>

    The current subtask to complete is:
    <current_subtask>
        {current_subtask}
    </current_subtask>

    - Provide concise reasoning on how to execute the current subtask, considering previous results and subtasks.
    - Prioritize explicit details over assumed patterns
    - Avoid unnecessary complications in problem-solving

    Return the result as a JSON object with 'reasoning' as a key.

    Example JSON response:
    {{
        "reasoning": "2 sentences max on how to complete the current subtask."
    }}
    """

    response = json.loads(get_llm_response("groq", prompt, json_mode=True))
    current_reasoning = response["reasoning"]

    # Recursively call the function for the next subtask
    next_reasonings = recursive_reasoner(user_query, subtasks, current_subtask_index + 1)

    # Prepend the current reasoning to the list of future reasonings
    return [current_reasoning] + next_reasonings

# Usage
user_query = "Your long-term goal here"
subtasks = ["Subtask 1", "Subtask 2", "Subtask 3"]
all_reasonings = recursive_reasoner(user_query, subtasks)

# Print all reasonings
for i, reasoning in enumerate(all_reasonings, 1):
    print(f"Subtask {i} reasoning: {reasoning}")




In [99]:
def reasoner(user_query: str, subtasks: List[str], current_subtask: str) -> str:
   prompt = f"""Given the user's query (long-term goal): '{user_query}'

   Here are all the subtasks to complete in order to answer the user's query:
   <subtasks>
       {json.dumps(subtasks)}
   </subtasks>

   The current subtask to complete is:
   <current_subtask>
       {current_subtask}
   </current_subtask>

   - Provide concise reasoning on how to execute the current subtask, considering previous results.
   - Prioritize explicit details over assumed patterns
   - Avoid unnecessary complications in problem-solving

   Return the result as a JSON object with 'reasoning' as a key.

   Example JSON response:
   {{
       "reasoning": "2 sentences max on how to complete the current subtask."
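   }}
   """

   response = json.loads(get_llm_response("groq", prompt, json_mode=True))
   return response["reasoning"]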

In [100]:
reasoner_output = reasoner(query, subtasks, subtasks[1])

In [101]:
def actioner(user_query: str, subtasks: List[str], current_subtask: str, reasoning: str) -> Dict[str, Any]:
   prompt = f"""Given the user's query (long-term goal): '{user_query}'

   The subtasks are:
   <subtasks>
       {json.dumps(subtasks)}
   </subtasks>

   The current subtask is:
   <current_subtask>
       {current_subtask}
   </current_subtask>

   The reasoning for this subtask is:
   <reasoning>
       {reasoning}
   </reasoning>

   Determine the most appropriate action to take:
       - If the task requires a calculation or verification through code, use the 'generate_code' action.
       - If the task requires reasoning without code or calculations, use the 'reasoning' action.

   Consider the overall goal and previous results when determining the action.

   Return the result as a JSON object with 'action' and 'parameters' keys.  The 'parameters' key should always be a dictionary with 'prompt' as a key.

   Example JSON responses:

   {{
       "action": "generate_code",
       "parameters": {{"prompt": "Write a function to calculate the area of a circle."}}
   }}

   {{
       "action": "reasoning",
       "parameters": {{"prompt": "Explain how to complete the subtask."}}
   }}
   """
   response = json.loads(get_llm_response("groq", prompt, json_mode=True))
   return response
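
JSON mode asks the model for valid JSON, but it does not by itself check that the keys requested in the prompt are present. The sketch below checks the shape that the actioner prompt asks for ('action' plus a 'parameters' dictionary containing 'prompt'); `validate_action` is a hypothetical helper, not part of the workshop code.

```python
from typing import Any, Dict


def validate_action(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the response unchanged if it has the shape the actioner prompt requests."""
    action = response.get("action")
    if action not in ("generate_code", "reasoning"):
        raise ValueError(f"Unexpected action: {action!r}")

    parameters = response.get("parameters")
    if not isinstance(parameters, dict) or not isinstance(parameters.get("prompt"), str):
        raise ValueError("'parameters' must be a dict with a string 'prompt'")

    return response


# Example with a hand-written response in the format shown in the prompt
checked = validate_action({
    "action": "generate_code",
    "parameters": {"prompt": "Write code to calculate the area of a circle."},
})
print(checked["action"])
```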


In [102]:
actioner_output = actioner(query, subtasks, subtasks[1], reasoner_output)

In [103]:
def generate_and_execute_code(prompt: str, user_query: str) -> Dict[str, Any]:
   code_generation_prompt = f"""

   Generate Python code to implement the following task: '{prompt}'

   Here is the overall goal of answering the user's query: '{user_query}'





   Here are the guidelines for generating the code:
       - Return only the Python code, without any explanations or markdown formatting.
       - The code should always print or return a value
       - Don't include any backticks or code blocks in your response. Do not include ```python or ``` in your response, just give me the code.
       - Do not ever use the input() function in your code, use defined values instead.
       - Do not ever use NLP techniques in your code, such as importing nltk, spacy, or any other NLP library.
       - Don't ever define a function in your code, just generate the code to execute the subtask.
       - Don't ever provide the execution result in your response, just give me the code.
       - If your code needs to import any libraries, do it within the code itself.
       - The code should be self-contained and ready to execute on its own.
       - Prioritize explicit details over assumed patterns
       - Avoid unnecessary complications in problem-solving
   """

   generated_code = get_llm_response("groq", code_generation_prompt)


   print(f"\n\nGenerated Code: start|{generated_code}|END\n\n")

   old_stdout = sys.stdout
   sys.stdout = buffer = io.StringIO()

   try:
       exec(generated_code)
   finally:
       # Restore stdout even if the generated code raises
       sys.stdout = old_stdout

   output = buffer.getvalue()

   print(f"\n\n***** Execution Result: |start|{output.strip()}|end| *****\n\n")

   return {
       "generated_code": generated_code,
       "execution_result": output.strip()
   }


def executor(action: str, parameters: Dict[str, Any], user_query: str) -> Any:
   if action == "generate_code":
       print(f"Generating code for: {parameters['prompt']}")
       return generate_and_execute_code(parameters["prompt"], user_query)
   elif action == "reasoning":
       return parameters["prompt"]
   else:
       return f"Action '{action}' not implemented"
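
Generated code may still arrive wrapped in Markdown fences despite the prompt's instructions, and it may raise when executed. The sketch below shows one way to handle both cases: it strips a surrounding fence, captures stdout with `contextlib.redirect_stdout`, and returns the traceback instead of crashing. `strip_code_fences` and `run_generated_code` are hypothetical helpers, and the `error` key is an assumption that is not part of the return value used above.

```python
import contextlib
import io
import traceback
from typing import Dict

FENCE = "`" * 3  # three backticks, built this way to keep the example readable


def strip_code_fences(text: str) -> str:
    """Remove a leading fence line (with or without a language tag) and a trailing fence line."""
    lines = text.strip().splitlines()
    if lines and lines[0].startswith(FENCE):
        lines = lines[1:]
    if lines and lines[-1].strip() == FENCE:
        lines = lines[:-1]
    return "\n".join(lines)


def run_generated_code(code: str) -> Dict[str, str]:
    """Execute code in a fresh namespace, capturing stdout and any exception."""
    buffer = io.StringIO()
    error = ""
    try:
        with contextlib.redirect_stdout(buffer):
            # exec is not a sandbox: only run code you are willing to trust
            exec(strip_code_fences(code), {})
    except Exception:
        error = traceback.format_exc()
    return {"execution_result": buffer.getvalue().strip(), "error": error}


print(run_generated_code(FENCE + "python\nprint(347 - 238)\n" + FENCE))
# {'execution_result': '109', 'error': ''}
```

Returning the traceback as data, rather than raising, would let a later evaluation step see the failure and decide whether to retry the subtask.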




In [104]:
executor("generate_code", parameters = actioner_output['parameters'], user_query=query)

{'generated_code': 'import math\n\nresult_subtract = 347 - 238\nif result_subtract != 109:\n    if math.floor(347 / 100) >= 3 and math.floor(347 / 100) <= 9 or \\\n       math.floor(238 / 100) >= 2 and math.floor(238 / 100) <= 9:\n        print("Focus on 3.NBT.2 - Fluently add and subtract within 1000.")\n    else:\n        print("Focus on 3.NBT.3 - Read, write, and compare decimals to hundredths.")\n        print("Also consider 4.NBT.4 - Add and subtract multi-digit whole numbers up to four digits in base-ten.")\nelse:\n    print("Student has already mastered the subtraction of multi-digit whole numbers in base-ten.")',
 'execution_result': 'Student has already mastered the subtraction of multi-digit whole numbers in base-ten.'}

# OpenAI o1-preview model getting trivial questions wrong

- [Link to Reddit post](https://www.reddit.com/r/ChatGPT/comments/1ff9w7y/new_o1_still_fails_miserably_at_trivial_questions/)
- Links to ChatGPT threads: [(1)](https://chatgpt.com/share/66f21757-db2c-8012-8b0a-11224aed0c29), [(2)](https://chatgpt.com/share/66e3c1e5-ae00-8007-8820-fee9eb61eae5)
- [Improving reasoning in LLMs through thoughtful prompting](https://www.reddit.com/r/singularity/comments/1fdhs2m/did_i_just_fix_the_data_overfitting_problem_in/?share_id=6DsDLJUu1qEx_bsqFDC8a&utm_content=2&utm_medium=ios_app&utm_name=ioscss&utm_source=share&utm_term=1)


In [66]:
query3 = "The surgeon is the boy's father. He says, 'I can't operate on him!' He's my son"
result = get_llm_response("openai", query3)
print(result)

In [67]:
query = "The surgeon is the boy's father. He says, 'I can't operate on him!' He's my son"
result = autonomous_agent(query)
print(result)