# Welcome to the start of your adventure in Agentic AI

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Are you ready for action??</h2>
            <span style="color:#ff7800;">Have you completed all the setup steps in the <a href="../setup/">setup</a> folder?<br/>
            Have you read the <a href="../README.md">README</a>? Many common questions are answered here!<br/>
            Have you checked out the guides in the <a href="../guides/01_intro.ipynb">guides</a> folder?<br/>
            Well in that case, you're ready!!
            </span>
        </td>
    </tr>
</table>

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/tools.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#00bfff;">This code is a live resource - keep an eye out for my updates</h2>
            <span style="color:#00bfff;">I push updates regularly. As people ask questions or have problems, I add more examples and improve explanations. As a result, the code below might not be identical to the videos, as I've added more steps and better comments. Consider this like an interactive book that accompanies the lectures.<br/><br/>
            I try to send emails regularly with important updates related to the course. You can find this in the 'Announcements' section of Udemy in the left sidebar. You can also choose to receive my emails via your Notification Settings in Udemy. I'm respectful of your inbox and always try to add value with my emails!
            </span>
        </td>
    </tr>
</table>

### And please do remember to contact me if I can help

And I love to connect: https://www.linkedin.com/in/eddonner/


### New to Notebooks like this one? Head over to the guides folder!

First, check that you've added the Python and Jupyter extensions to Cursor:
- Open extensions (View >> extensions)
- Search for python, and when the results show, click on the ms-python one, and Install it if not already installed
- Search for jupyter, and when the results show, click on the Microsoft one, and Install it if not already installed  
Then View >> Explorer to bring back the File Explorer.

And then:
1. Click where it says "Select Kernel" near the top right, and select the option called `.venv (Python 3.12.9)` or similar, which should be the first choice or the most prominent choice. You may need to choose "Python Environments" first.
2. Click in each "cell" below, starting with the cell immediately below this text, and press Shift+Enter to run
3. Enjoy!

After you click "Select Kernel", if there is no option like `.venv (Python 3.12.9)` then please do the following:  
1. On Mac: From the Cursor menu, choose Settings >> VS Code Settings (NOTE: be sure to select `VS Code Settings`, not `Cursor Settings`);  
On Windows PC: From the File menu, choose Preferences >> VS Code Settings (NOTE: be sure to select `VS Code Settings`, not `Cursor Settings`)  
2. In the Settings search bar, type "venv"  
3. In the field "Path to folder with a list of Virtual Environments" put the path to the project root, like C:\Users\username\projects\agents (on a Windows PC) or /Users/username/projects/agents (on Mac or Linux).  
And then try again.
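
If you're unsure whether the right kernel is selected, a quick check like the one below can help once you can run cells. It's a minimal sketch using only the standard library; the exact path depends on your machine, but it should point inside the project's `.venv` folder.

```python
import sys

# Path of the Python interpreter this kernel is running
print(sys.executable)

# True when running inside a virtual environment
# (the environment's prefix differs from the base installation's prefix)
in_venv = sys.prefix != sys.base_prefix
print("Running in a virtual environment:", in_venv)

# The version should match the kernel you selected (for example 3.12.x)
print("Python version:", sys.version.split()[0])
```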

Having problems with missing Python versions in that list? Have you ever used Anaconda before? It might be interfering. Quit Cursor, bring up a new command line, and make sure that your Anaconda environment is deactivated:    
`conda deactivate`  
And if you still have any problems with conda and python versions, it's possible that you will need to run this too:  
`conda config --set auto_activate_base false`  
and then from within the Agents directory, you should be able to run `uv python list` and see the Python 3.12 version.

In [1]:
# First let's do an import. If you get an ImportError, double check that your Kernel is correct.

from dotenv import load_dotenv


In [2]:
# Next it's time to load the API keys into environment variables
# If this returns False, see the next cell!

load_dotenv(override=True)

True

### Wait, did that just output `False`??

If so, the most common reason is that you didn't save your `.env` file after adding the key! Make sure you've saved it.

Also, make sure the file is named precisely `.env` and is in the project root directory (`agents`).

By the way, your `.env` file should have a stop symbol next to it in Cursor on the left, and that's actually a good thing: that's Cursor saying to you, "hey, I realize this is a file filled with secret information, and I'm not going to send it to an external AI to suggest changes, because your keys should not be shown to anyone else."
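
If you'd like to check the file's name and location from code, here is a minimal sketch using only the standard library. The helper `find_env_file` is a hypothetical name introduced for illustration (it is not part of `python-dotenv`); it looks in the current folder and then each parent folder, and it never prints the file's contents.

```python
from pathlib import Path

def find_env_file(start=None):
    """Return the first `.env` found in `start` or any parent folder, else None."""
    folder = Path(start or Path.cwd()).resolve()
    for candidate in [folder, *folder.parents]:
        env_path = candidate / ".env"
        if env_path.is_file():
            return env_path
    return None

found = find_env_file()
if found is None:
    print("No .env file found - check the file name and location")
elif found.stat().st_size == 0:
    print(f"{found} is empty - did you save it after adding your key?")
else:
    print(f"Found {found}")
```

If the path printed is not in your project root, or nothing is found, that is a likely reason for `load_dotenv` returning `False`.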

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/stop.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Final reminders</h2>
            <span style="color:#ff7800;">1. If you're not confident about Environment Variables or Web Endpoints / APIs, please read Topics 3 and 5 in this <a href="../guides/04_technical_foundations.ipynb">technical foundations guide</a>.<br/>
            2. If you want to use AIs other than OpenAI, like Gemini, DeepSeek or Ollama (free), please see the first section in this <a href="../guides/09_ai_apis_and_ollama.ipynb">AI APIs guide</a>.<br/>
            3. If you ever get a Name Error in Python, you can always fix it immediately; see the last section of this <a href="../guides/06_python_foundations.ipynb">Python Foundations guide</a> and follow both tutorials and exercises.<br/>
            </span>
        </td>
    </tr>
</table>

In [3]:
# Check the key - if you're not using OpenAI, check whichever key you're using! Ollama doesn't need a key.

import os
openai_api_key = os.getenv('OPENAI_API_KEY')

if openai_api_key:
    print(f"OpenAI API Key exists and begins {openai_api_key[:8]}")
else:
    print("OpenAI API Key not set - please head to the troubleshooting guide in the setup folder")
    


OpenAI API Key exists and begins sk-proj-
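
If you're using several providers, the same check can be wrapped in a small helper. This is a sketch under assumptions: `describe_key` is a hypothetical helper, and the variable names other than `OPENAI_API_KEY` are only examples, so substitute whichever names you put in your `.env` file.

```python
import os

def describe_key(name, visible=8):
    """Return a short status message for an environment variable without revealing it in full."""
    value = os.getenv(name)
    if not value:
        return f"{name} is not set"
    # Stray spaces or quotes are an easy mistake when pasting keys by hand
    if value != value.strip() or value[0] in "\"'":
        return f"{name} is set but has stray whitespace or quotes"
    return f"{name} exists and begins {value[:visible]}"

# Names other than OPENAI_API_KEY are assumptions; use the ones in your own .env
for name in ["OPENAI_API_KEY", "GOOGLE_API_KEY", "DEEPSEEK_API_KEY"]:
    print(describe_key(name))
```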


In [4]:
# And now - the all important import statement
# If you get an import error - head over to troubleshooting in the Setup folder
# Even for other LLM providers like Gemini, you still use this OpenAI import - see Guide 9 for why

from openai import OpenAI

In [5]:
# And now we'll create an instance of the OpenAI class
# If you're not sure what it means to create an instance of a class - head over to the guides folder (guide 6)!
# If you get a NameError - head over to the guides folder (guide 6) to learn about NameErrors - always instantly fixable
# If you're not using OpenAI, you just need to slightly modify this - precise instructions are in the AI APIs guide (guide 9)

openai = OpenAI()

In [6]:
# Create a list of messages in the familiar OpenAI format

messages = [{"role": "user", "content": "What is 2+2?"}]
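
For reference, here is a slightly longer illustration of the same format: a list of dictionaries, each with a `role` (`system`, `user` or `assistant`) and a `content` string. The conversation itself is made up for illustration, and it's stored in a separate variable, `conversation`, so it doesn't replace the `messages` list used in the next cell.

```python
# An illustrative multi-turn conversation in the OpenAI messages format
conversation = [
    {"role": "system", "content": "You are a concise maths tutor."},  # sets overall behaviour
    {"role": "user", "content": "What is 2+2?"},                      # what the user asked
    {"role": "assistant", "content": "4"},                            # an earlier model reply
    {"role": "user", "content": "And if you double that?"},           # the follow-up question
]

# Print the roles in order to see the shape of the conversation
for message in conversation:
    print(f"{message['role']}: {message['content']}")
```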

In [7]:
# And now call it! Any problems, head to the troubleshooting guide
# This uses GPT 4.1 nano, the incredibly cheap model
# The APIs guide (guide 9) has exact instructions for using even cheaper or free alternatives to OpenAI
# If you get a NameError, head to the guides folder (guide 6) to learn about NameErrors - always instantly fixable

response = openai.chat.completions.create(
    model="gpt-4.1-nano",
    messages=messages
)

print(response.choices[0].message.content)


2 + 2 equals 4.
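
Since the same call pattern repeats throughout this notebook, you may find it handy to wrap it in a function. The sketch below is one possible way to do that; `ask` is a hypothetical helper, not part of the OpenAI library, and it takes the client as a parameter so that it works with whichever client you created.

```python
def ask(client, prompt, model="gpt-4.1-nano"):
    """Send a single user message and return the text of the first choice."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```

In this notebook, `ask(openai, "What is 2+2?")` would make the same request as the cell above.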


In [8]:
# And now - let's ask for a question:

question = "Please propose a hard, challenging question to assess someone's IQ. Respond only with the question."
messages = [{"role": "user", "content": question}]


In [9]:
# ask it - this uses GPT 4.1 mini, still cheap but more powerful than nano

response = openai.chat.completions.create(
    model="gpt-4.1-mini",
    messages=messages
)

question = response.choices[0].message.content

print(question)


If all Bloops are Razzies and all Razzies are Lazzies, which of the following must be true?

A) All Bloops are definitely Lazzies.  
B) Some Lazzies are Bloops.  
C) No Bloop is a Lazzie.  
D) Some Razzies are not Bloops.  

Explain your reasoning.


In [10]:
# form a new messages list
messages = [{"role": "user", "content": question}]


In [11]:
# Ask it again

response = openai.chat.completions.create(
    model="gpt-4.1-mini",
    messages=messages
)

answer = response.choices[0].message.content
print(answer)


Let's analyze the given statements step-by-step:

**Given:**  
1. All Bloops are Razzies.  
2. All Razzies are Lazzies.

---

### Step 1: Understanding the relationships

- **All Bloops are Razzies** means every Bloop is included inside the set of Razzies.
  
  Symbolically:  
  \( \text{Bloop} \subseteq \text{Razzie} \)

- **All Razzies are Lazzies** means every Razzie is included inside the set of Lazzies.
  
  Symbolically:  
  \( \text{Razzie} \subseteq \text{Lazzie} \)

---

### Step 2: Logical conclusion by transitivity

Since all Bloops are Razzies, and all Razzies are Lazzies, it follows:

\[
\text{Bloop} \subseteq \text{Razzie} \subseteq \text{Lazzie}
\]

Therefore:

\[
\text{Bloop} \subseteq \text{Lazzie}
\]

Meaning:

- **All Bloops are Lazzies.**

---

### Step 3: Evaluate each option

- **A) All Bloops are definitely Lazzies.**  
  This is **true**, as shown.

- **B) Some Lazzies are Bloops.**  
  If all Bloops are Lazzies, then necessarily *some* Lazzies (at least those B

In [12]:
from IPython.display import Markdown, display

display(Markdown(answer))



Let's analyze the given statements step-by-step:

**Given:**  
1. All Bloops are Razzies.  
2. All Razzies are Lazzies.

---

### Step 1: Understanding the relationships

- **All Bloops are Razzies** means every Bloop is included inside the set of Razzies.
  
  Symbolically:  
  \( \text{Bloop} \subseteq \text{Razzie} \)

- **All Razzies are Lazzies** means every Razzie is included inside the set of Lazzies.
  
  Symbolically:  
  \( \text{Razzie} \subseteq \text{Lazzie} \)

---

### Step 2: Logical conclusion by transitivity

Since all Bloops are Razzies, and all Razzies are Lazzies, it follows:

\[
\text{Bloop} \subseteq \text{Razzie} \subseteq \text{Lazzie}
\]

Therefore:

\[
\text{Bloop} \subseteq \text{Lazzie}
\]

Meaning:

- **All Bloops are Lazzies.**

---

### Step 3: Evaluate each option

- **A) All Bloops are definitely Lazzies.**  
  This is **true**, as shown.

- **B) Some Lazzies are Bloops.**  
  If all Bloops are Lazzies, then necessarily *some* Lazzies (at least those Bloops) exist.  
  However, this requires there to be at least one Bloop (non-empty set). Since the problem does not specify existence, "some" statements can be uncertain.

- **C) No Bloop is a Lazzie.**  
  This is **false** because we've shown all Bloops are Lazzies.

- **D) Some Razzies are not Bloops.**  
  Not necessarily true. It is possible that all Razzies are Bloops (though not specified), or some are not. The statement does not **must** be true.

---

### Final conclusion:

**The statement that must be true based on given information is:**

**A) All Bloops are definitely Lazzies.**

---

# Summary:

- Because all Bloops are contained within Razzies, and all Razzies within Lazzies, all Bloops must be Lazzies.
- Without additional information about the existence or number of Bloops, statements about "some" or "no" cannot be guaranteed.
- Hence, **only A is necessarily true**.

# Congratulations!

That was a small, simple step in the direction of Agentic AI, with your new environment!

Next time things get more interesting...

<table style="margin: 0; text-align: left; width:100%">
    <tr>
        <td style="width: 150px; height: 150px; vertical-align: middle;">
            <img src="../assets/exercise.png" width="150" height="150" style="display: block;" />
        </td>
        <td>
            <h2 style="color:#ff7800;">Exercise</h2>
            <span style="color:#ff7800;">Now try this commercial application:<br/>
            First ask the LLM to pick a business area that might be worth exploring for an Agentic AI opportunity.<br/>
            Then ask the LLM to present a pain-point in that industry - something challenging that might be ripe for an Agentic solution.<br/>
            Finally have a third LLM call propose the Agentic AI solution. <br/>
            We will cover this in upcoming labs, so don't worry if you're unsure. Just give it a try!
            </span>
        </td>
    </tr>
</table>
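
One way to structure the three calls is sketched below. It is only an outline under assumptions: `propose_agentic_opportunity` and `ask_fn` are hypothetical names, and the prompt wording is illustrative, so adapt both freely. `ask_fn` stands for any function that takes a prompt string and returns the model's reply as text.

```python
def propose_agentic_opportunity(ask_fn):
    """Chain three LLM calls: business area -> pain point -> agentic solution."""
    # Call 1: pick a business area
    area = ask_fn(
        "Pick one business area that might be worth exploring for an "
        "Agentic AI opportunity. Respond only with the business area."
    )
    # Call 2: feed the area into the next prompt to get a pain point
    pain_point = ask_fn(
        f"Present one significant pain point in this business area: {area}. "
        "Choose something challenging that might be ripe for an Agentic solution."
    )
    # Call 3: feed both earlier answers into the final prompt
    solution = ask_fn(
        f"Propose an Agentic AI solution for this pain point in {area}:\n\n{pain_point}"
    )
    return area, pain_point, solution
```

In this notebook, `ask_fn` could be `lambda p: openai.chat.completions.create(model="gpt-4.1-mini", messages=[{"role": "user", "content": p}]).choices[0].message.content`. The key idea is that each call's output becomes part of the next call's prompt.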

In [13]:
# First create the messages:

messages = [{"role": "user", "content": "How can we explore the potential of LLMs in robotics?"}]

# Then make the first call:

response = openai.chat.completions.create(
    model="gpt-4.1-mini",
    messages=messages
)

# Then read the business idea:

business_idea = response.choices[0].message.content

# And repeat! In the next message, include the business idea within the message

messages = [{"role": "user", "content": business_idea}]

# Then make the second call:

response = openai.chat.completions.create(
    model="gpt-4.1-mini",
    messages=messages
)

# Then read the pain point:

pain_point = response.choices[0].message.content

messages = [{"role": "user", "content": f"Propose a solution to the following pain point: {pain_point}"}]

# Then make the third call:

response = openai.chat.completions.create(
    model="gpt-4.1-mini",
    messages=messages
)

# Then read the solution:

solution = response.choices[0].message.content
print(solution)

This is an excellent and practical approach to bridging LLMs with robotics! To build on your overview and proposed example, here’s a concrete solution to streamline and enhance the workflow of leveraging an LLM for command parsing and robotic action execution:

---

## Proposed Solution: Modular LLM-Powered Robotics Command Pipeline

### Key Idea

Create a modular software pipeline integrating natural language command parsing (using an LLM), grounding in perception modules, and robot control, with clear interfaces and fallback/feedback loops.

---

### Architecture Overview

```
User Command (NL)  -->  LLM Parser  -->  Structured Action Plan (JSON)  -->  
     Perception Module (Vision, Localization) --> Plan Refinement & Validation -->  
     Robot Control Interface  -->  Robot Execution  
         ^                                                         |  
         |------------------------------------ Feedback / Clarification Loop  
```

---

### Detailed Pipeline Steps & Enhancem

In [14]:
from IPython.display import Markdown, display

display(Markdown(business_idea))

Exploring the potential of large language models (LLMs) in robotics is a promising area that combines advances in natural language understanding with physical interaction capabilities. Here are some key approaches and ideas to explore their potential:

### 1. **Natural Language Interfaces for Robots**
- **Goal:** Enable robots to understand and execute complex instructions given in natural language.
- **Approach:** Use LLMs to parse commands, generate actionable plans, and translate high-level goals into low-level robotic actions.
- **Example:** A user says, “Pick up the red cup from the table and put it in the sink,” and the robot leverages an LLM to interpret the command contextually.

### 2. **Task Planning and Reasoning**
- **Goal:** Use LLMs as planners or reasoning agents to structure multi-step tasks.
- **Approach:** Combine LLM-generated sequences with robotic motion planning to handle abstract tasks by decomposing them into executable steps.
- **Example:** An LLM breaks down “Clean the kitchen” into sub-tasks like “wash dishes,” “wipe counters,” and “take out trash,” which the robot then executes.

### 3. **Learning from Language and Interaction**
- **Goal:** Improve robots’ learning by leveraging rich textual knowledge and interactive feedback.
- **Approach:** Use LLMs to provide prior knowledge about objects, environments, or social norms that can augment sensor-based learning.
- **Example:** A robot asking clarifying questions or receiving corrections in natural language and updating its model accordingly.

### 4. **Multi-modal Fusion**
- **Goal:** Combine visual, tactile, and proprioceptive data with language understanding.
- **Approach:** Integrate LLMs with visual models (like vision transformers) to ground language in perception and action.
- **Example:** A robot understands “grab the cube next to the blue ball” by associating linguistic terms with visual input.

### 5. **Sim-to-Real Transfer with Language Supervision**
- **Goal:** Use LLMs to generate diverse language instructions and scenarios to train robotic policies in simulation.
- **Approach:** Generate varied, naturalistic instructions to improve robustness and generalizability before deployment in the real world.

### 6. **Safety and Explainability**
- **Goal:** Use LLMs to explain robotic decision-making or to monitor for unsafe instructions.
- **Approach:** Generate human-readable explanations for robot actions or flag ambiguous commands.

### Practical Steps to Explore:
- **Prototype integrations:** Start by integrating LLM APIs with robotic control frameworks (e.g., ROS).
- **Benchmark tasks:** Design tasks requiring natural language understanding to measure performance gains.
- **Data collection:** Collect datasets of paired language commands and robot actions.
- **Iterative testing:** Evaluate how well LLM-guided instructions perform in real or simulated environments.
- **Collaborations:** Work with NLP researchers to adapt LLMs for grounded language understanding.

### Challenges to Consider:
- Grounding LLMs in the real-world environment.
- Handling ambiguous or incomplete instructions.
- Real-time processing constraints.
- Safety and robustness in physical interactions.

---

If you want, I can provide example code snippets or recommend specific frameworks and architectures to start experimenting!

In [15]:
from IPython.display import Markdown, display

display(Markdown(pain_point))

This is a great overview of leveraging LLMs in robotics! If you want, I can help you with concrete next steps including example code snippets, recommended frameworks, or pointers to relevant models and datasets to jumpstart your experiments.

### Example: Integrating an LLM for Command Parsing in a Robot

Here's a high-level Python example illustrating how you might use an LLM (e.g., OpenAI GPT) to parse natural language commands and convert them into low-level robotic actions:

```python
import openai

openai.api_key = "YOUR_API_KEY"

def parse_command(command_text):
    prompt = f"""
    You are an assistant that converts natural language robot commands into structured steps.
    Input command: "{command_text}"
    Output steps as JSON list of simple actions with parameters, e.g.:
    [
        {{"action": "move", "object": "red cup", "from": "table"}},
        {{"action": "place", "object": "red cup", "location": "sink"}}
    ]
    """
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[{"role": "system", "content": "You convert commands to robot action plans."},
                  {"role": "user", "content": prompt}],
        temperature=0,
        max_tokens=200,
    )
    steps = response['choices'][0]['message']['content']
    return steps

command = "Pick up the red cup from the table and put it in the sink."
plan = parse_command(command)
print("Planned steps for robot:", plan)
```

### Why this helps:
- The LLM acts as a semantic parser from language → structured plan.
- Your robot controller can consume the structured output to trigger motions and manipulations.

---

### Frameworks & Tools to Consider

- **Robot Operating System (ROS)**: Standard middleware for robot control; provides nodes & message passing.
- **LangChain / LLM Chains**: Useful for orchestrating multi-step LLM calls and integrating external tools.
- **CLIP / BLIP / Visual Transformers (ViT)**: To connect language with vision perception for grounding.
- **Action Execution Simulator (e.g. Isaac Gym, PyBullet, or Habitat)**: To first test LLM-based command planning without hardware risks.
- **RoboGPT / SayCan (from Google Research)**: Open-source projects combining LLMs + robot manipulation.

---

### Next Steps

1. **Start small**: Build a simple language interface that turns commands into sequences of primitive robot actions.
2. **Test in simulation**: Connect the textual plan output with a simulator to verify action feasibility.
3. **Add grounding**: Integrate with perception modules to understand references to real-world objects.
4. **Incorporate feedback**: Have the robot confirm commands or ask clarifying questions if uncertain.
5. **Explore datasets**: Look at ALFRED, TEACh, or RoboNLP datasets which pair language with robotics tasks.

---

If you'd like sample code on fusing vision + language, multi-step planning with LLMs, or safety monitoring, just let me know!

In [16]:
from IPython.display import Markdown, display

display(Markdown(solution))

This is an excellent and practical approach to bridging LLMs with robotics! To build on your overview and proposed example, here’s a concrete solution to streamline and enhance the workflow of leveraging an LLM for command parsing and robotic action execution:

---

## Proposed Solution: Modular LLM-Powered Robotics Command Pipeline

### Key Idea

Create a modular software pipeline integrating natural language command parsing (using an LLM), grounding in perception modules, and robot control, with clear interfaces and fallback/feedback loops.

---

### Architecture Overview

```
User Command (NL)  -->  LLM Parser  -->  Structured Action Plan (JSON)  -->  
     Perception Module (Vision, Localization) --> Plan Refinement & Validation -->  
     Robot Control Interface  -->  Robot Execution  
         ^                                                         |  
         |------------------------------------ Feedback / Clarification Loop  
```

---

### Detailed Pipeline Steps & Enhancements

| Step                                      | Enhancement                                                 | Tools / Tips                              |
|-------------------------------------------|-------------------------------------------------------------|------------------------------------------|
| **1. Language Parsing (Your example)**    | - Use **few-shot prompting** or fine-tune an LLM on robotics task commands;<br>- Probabilistic parsing to output confidence scores | OpenAI GPT API / Local LLM (Llama, etc.) |
| **2. Perception & Grounding**              | - Use vision models (e.g., CLIP or BLIP) to identify referenced objects in the scene <br>- Map language references (“red cup”) to detected objects with IDs | OpenCV, PyTorch + pre-trained vision models, ROS perception stack        |
| **3. Plan Refinement & Validation**        | - Check whether the planned actions are feasible in the current environment<br>- Validate object locations, accessibility, and motion constraints | Use physics simulator (PyBullet, Isaac Gym), implement collision checks  |
| **4. Execution Interface (ROS integration)**| - Translate JSON action steps into appropriate ROS messages or robot-specific API calls                        | ROS action servers, MoveIt! for motion planning                          |
| **5. Feedback Loop**                        | - Robot confirms plan or requests clarification<br>- Integrate multi-turn dialogues with the LLM                     | LangChain for multi-step interactions, Dialogflow, or custom logic       |

---

### Sample Extended Python Snippet Sketch

Here’s a sketch that extends your example with feedback and grounding placeholders:

```python
import openai

openai.api_key = "YOUR_API_KEY"

def parse_command(command_text):
    prompt = f"""
    You are an assistant that converts natural language robot commands into structured steps.
    Input command: "{command_text}"
    Output steps as JSON list of simple actions with parameters, e.g.:
    [
        {{"action": "move", "object": "red cup", "from": "table"}},
        {{"action": "place", "object": "red cup", "location": "sink"}}
    ]
    """
    response = openai.ChatCompletion.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": "You convert commands to robot action plans."},
            {"role": "user", "content": prompt}
        ],
        temperature=0,
        max_tokens=300,
    )
    steps_json = response['choices'][0]['message']['content']
    return steps_json

def ground_objects(steps, perception_data):
    # Example: Map "red cup" -> detected object ID from perception
    grounded_steps = []
    for step in steps:
        obj_name = step.get("object", None)
        if obj_name and obj_name in perception_data:
            step["object_id"] = perception_data[obj_name]["id"]
        grounded_steps.append(step)
    return grounded_steps

def validate_plan(steps):
    # Placeholder for plan plausibility checks (collision, reachability)
    # Return True if valid, else False
    return True

def execute_plan(steps, robot_interface):
    for step in steps:
        # Convert step to robot commands
        robot_interface.send_command(step)

def main(command_text):
    raw_plan = parse_command(command_text)
    print("Raw plan:", raw_plan)

    # Convert raw_plan string to JSON (handle errors!)
    import json
    try:
        steps = json.loads(raw_plan)
    except json.JSONDecodeError:
        print("Failed to parse plan JSON.")
        return

    # Perception data mockup
    perception_data = {
        "red cup": {"id": 101, "location": "table"},
        "sink": {"id": 202, "location": "sink_area"}
    }

    grounded_steps = ground_objects(steps, perception_data)
    if not validate_plan(grounded_steps):
        print("Plan validation failed: plan not executable.")
        return

    # Mock robot interface
    class RobotInterface:
        def send_command(self, cmd):
            print(f"Executing: {cmd}")

    robot_interface = RobotInterface()
    execute_plan(grounded_steps, robot_interface)
    print("Plan executed successfully.")

if __name__ == "__main__":
    user_command = "Pick up the red cup from the table and put it in the sink."
    main(user_command)
```

---

### Summary

- **LLM parses commands → structured plans**
- **Perception grounds language references → real objects**
- **Validation layer ensures feasibility**
- **Robot interface executes validated plan**
- **Feedback enables clarification and safety**

This modular approach lets you gradually improve each stage while testing in simulation or on real hardware.

---

If you want, I can also share:

- Sample ROS node templates bridging LLM output with robot motion planning
- Vision-language grounding demo codes
- Multi-turn clarification dialogs with LangChain
- Dataset preprocessing scripts for ALFRED or TEACh

Just say the word!