### Using the OpenAI Library to Programmatically Access GPT-4.1-nano!

In order to get started, we'll need to provide our OpenAI API Key - detailed instructions can be found [here](https://github.com/AI-Maker-Space/Interactive-Dev-Environment-for-LLM-Development#-setting-up-keys-and-tokens)!

In [1]:
import os
import openai
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Please enter your OpenAI API Key: ")
openai.api_key = os.environ["OPENAI_API_KEY"]
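
If the key is already exported in your shell, you may not want to be prompted every time the notebook restarts. Here is a minimal sketch, assuming a hypothetical helper named `ensure_api_key` (it is not part of the `openai` library), that only asks for the key when `OPENAI_API_KEY` is missing:

```python
import getpass
import os
from typing import Callable


def ensure_api_key(
    env_var: str = "OPENAI_API_KEY",
    prompt_fn: Callable[[str], str] = getpass.getpass,
) -> str:
    """Return the key stored in env_var, prompting only if it is unset or empty.

    `ensure_api_key` is a hypothetical helper for this notebook, not an OpenAI API.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        # Nothing usable in the environment, so fall back to a hidden prompt.
        key = prompt_fn("Please enter your OpenAI API Key: ").strip()
        os.environ[env_var] = key
    return key
```

Calling `ensure_api_key()` once before creating the client leaves the key in `os.environ`, which is where `OpenAI()` looks for it by default.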

### Our First Prompt

You can reference OpenAI's [documentation](https://platform.openai.com/docs/api-reference/chat) if you get stuck!

Let's make a chat completion request to kick things off!

There are three "roles" available to use:

- `developer`
- `assistant`
- `user`

OpenAI provides some context for these roles [here](https://platform.openai.com/docs/api-reference/chat/create#chat-create-messages)

Let's just stick to the `user` role for now and send our first message to the endpoint!

If we check the documentation, we'll see that the endpoint expects our prompt inside a list of message objects, each with a `role` and a `content` - so we'll be sure to do that!
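
To make the expected shape concrete, here is a small sketch of the `messages` payload: a plain Python list of dictionaries, each with a `role` and a `content` key. The `make_message` helper and the `VALID_ROLES` set are illustrative names, not part of the `openai` library, and they only cover the three roles listed above. Nothing here calls the API.

```python
# Roles covered in this notebook; an illustrative guard, not an exhaustive API list.
VALID_ROLES = {"developer", "assistant", "user"}


def make_message(role: str, content: str) -> dict:
    """Build one chat message dictionary, rejecting roles this notebook does not cover."""
    if role not in VALID_ROLES:
        raise ValueError(f"Unknown role: {role!r}")
    return {"role": role, "content": content}


# Even a single prompt is wrapped in a list before it is sent.
messages = [make_message("user", "What is the difference between LangChain and LlamaIndex?")]
print(messages)
```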

In [2]:
from openai import OpenAI

client = OpenAI()

In [3]:
YOUR_PROMPT = "What is the difference between LangChain and LlamaIndex?"

client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=[{"role" : "user", "content" : YOUR_PROMPT}]
)

ChatCompletion(id='chatcmpl-Bn7FAeydrexcWC3Fo2MB3ia9J0rAO', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Great question! LangChain and LlamaIndex (formerly known as GPT Index) are both popular tools in the realm of building applications with language models, but they serve different purposes and have distinct features. Here's a breakdown of their differences:\n\n**1. Purpose and Use Cases**\n\n- **LangChain:**  \n  - **Primary Focus:** Framework for building large language model (LLM)-powered applications with a focus on chaining multiple calls, memory management, and complex workflows.  \n  - **Use Cases:** Chatbots, virtual assistants, question-answering systems, tools integrating multiple APIs or data sources, conversational agents, and more advanced LLM applications involving sequences of operations.\n\n- **LlamaIndex (GPT Index):**  \n  - **Primary Focus:** Facilitates easy indexing, querying, and retrieval of data from large

As you can see, the response comes back with a tonne of information that we can use when we're building our applications!

We'll be building some helper functions to pretty-print the returned responses and to wrap our messages, saving us a few extra characters of code!

##### Helper Functions

In [4]:
from IPython.display import display, Markdown
from openai.types.chat import ChatCompletion

def get_response(client: OpenAI, messages: list[dict], model: str = "gpt-4.1-nano") -> ChatCompletion:
    # Returns the full ChatCompletion payload, not just the reply text.
    return client.chat.completions.create(
        model=model,
        messages=messages
    )

def system_prompt(message: str) -> dict:
    # The `developer` role carries our system-style instructions.
    return {"role": "developer", "content": message}

def assistant_prompt(message: str) -> dict:
    return {"role": "assistant", "content": message}

def user_prompt(message: str) -> dict:
    return {"role": "user", "content": message}

def pretty_print(message: ChatCompletion) -> None:
    # Render the text of the first choice as Markdown in the notebook.
    display(Markdown(message.choices[0].message.content))

### Testing Helper Functions

Now we can leverage OpenAI's endpoints with a bit less boilerplate - let's rewrite our original prompt with these helper functions!

Because the OpenAI endpoint expects to get a list of messages - we'll need to make sure we wrap our inputs in a list for them to function properly!

In [5]:
messages = [user_prompt(YOUR_PROMPT)]

chatgpt_response = get_response(client, messages)

pretty_print(chatgpt_response)

LangChain and LlamaIndex (formerly known as GPT Index) are both popular frameworks designed to facilitate the development of applications leveraging large language models (LLMs), but they serve different purposes and have distinct features. Here's a comparison to clarify their differences:

### Purpose and Focus
- **LangChain**:
  - Primarily a framework for building **decentralized, composable applications** with LLMs.
  - Focuses on **prompt management, model chaining, memory, and agent construction**.
  - Facilitates complex workflows, including conversation management, tools integration, and multi-step reasoning.

- **LlamaIndex (GPT Index)**:
  - Designed to **connect LLMs with external data sources**, especially large datasets or documents.
  - Focuses on **indexing and querying unstructured data** like text documents, PDFs, or knowledge bases.
  - Simplifies building **question-answering systems**, retrieval, and information retrieval from data sources.

### Core Capabilities
- **LangChain**:
  - Tool and API for chaining multiple prompts.
  - Supports memory management for stateful conversations.
  - Integrates with multiple LLM providers.
  - Supports agents that can choose actions based on context.
  - Facilitates complex workflows like summarization, translation, or chain-of-thought reasoning.

- **LlamaIndex**:
  - Provides indexing methods to convert raw data into queryable formats.
  - Offers pre-built data loaders for various file types.
  - Supports retrieval-augmented generation (RAG) workflows.
  - Enables efficient querying over large document collections and knowledge bases.

### Use Cases
- **LangChain**:
  - Building chatbots, conversational agents.
  - Multi-step reasoning pipelines.
  - Applications requiring tool integration and decision-making.

- **LlamaIndex**:
  - Building knowledge bases.
  - Document search and retrieval.
  - Enhanced question-answering over large datasets.

### Integration and Extensibility
- **LangChain**:
  - Highly modular with support for many prompt templates, memory modules, and agent types.
  - Extensive integrations with LLM providers, chat models, and tools.

- **LlamaIndex**:
  - Focuses on data ingestion, indexing, and querying.
  - Can be combined with other frameworks for more complex workflows but is primarily structured around data management.

---

### In Summary
| Aspect | **LangChain** | **LlamaIndex** |
|---------|---------------|----------------|
| Main Purpose | Building LLM-powered applications, workflows, and agents | Connecting LLMs with external data, indexing, and retrieval |
| Focus | Workflow orchestration, chaining, tools, memory | Data indexing, retrieval, knowledge bases |
| Use Cases | Conversational agents, multi-step processes | Document QA, knowledge management |
| Strength | Modular, flexible workflow construction | Efficient data retrieval and querying |

---

### Conclusion
While both frameworks can complement each other, **LangChain** is best suited for orchestrating complex LLM applications and workflows, whereas **LlamaIndex** excels at importing, indexing, and querying large datasets for use with LLMs.

If you're building an application that involves both complex interaction logic and external data retrieval, you might consider using them together for a robust solution.

Let's focus on extending this a bit, and incorporate a `developer` message as well!

Again, the API expects our prompts to be in a list - so we'll be sure to set up a list of prompts!

>REMINDER: The `developer` message acts like an overarching instruction that is applied to your user prompt. It is appropriate to put things like general instructions, tone/voice suggestions, and other similar prompts into the `developer` prompt.

In [6]:
list_of_prompts = [
    system_prompt("You are irate and extremely hungry."),
    user_prompt("Do you prefer crushed ice or cubed ice?")
]

irate_response = get_response(client, list_of_prompts)
pretty_print(irate_response)

Are you kidding me? Neither! I can't believe I have to choose between crushed ice and cubed ice while I'm this furious and starving! Just give me some good old-fashioned ice that won't ruin my mood or my stomach!

Let's try that same prompt again, but modify only our system prompt!

In [7]:
list_of_prompts[0] = system_prompt("You are joyful and having an awesome day!")

joyful_response = get_response(client, list_of_prompts)
pretty_print(joyful_response)

I think crushed ice has a fun, refreshing vibe—perfect for keeping drinks cool and adding a bit of flair! Cubed ice, on the other hand, looks sleek and melts more slowly, making it ideal for sipping cocktails or neat beverages. Honestly, I enjoy both depending on the moment! Which do you prefer?

While we're only printing the responses, remember that OpenAI is returning the full payload that we can examine and unpack!

In [8]:
print(joyful_response)

ChatCompletion(id='chatcmpl-Bn7FQ7HvMoeIWk8XjQSJd19YTqzQh', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I think crushed ice has a fun, refreshing vibe—perfect for keeping drinks cool and adding a bit of flair! Cubed ice, on the other hand, looks sleek and melts more slowly, making it ideal for sipping cocktails or neat beverages. Honestly, I enjoy both depending on the moment! Which do you prefer?', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1751045632, model='gpt-4.1-nano-2025-04-14', object='chat.completion', service_tier='default', system_fingerprint='fp_38343a2f8f', usage=CompletionUsage(completion_tokens=64, prompt_tokens=30, total_tokens=94, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0))
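
The fields we usually care about can be read with ordinary attribute access. The sketch below uses a `SimpleNamespace` stand-in that mirrors a few fields from the printed payload above, so it runs without an API call. The `summarize` helper is a hypothetical name. The same attribute paths should work on the real `ChatCompletion` object.

```python
from types import SimpleNamespace

# Stand-in with the same attribute layout as the printed payload (values copied
# from it, content shortened). A real ChatCompletion carries more fields.
fake_response = SimpleNamespace(
    model="gpt-4.1-nano-2025-04-14",
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            message=SimpleNamespace(
                role="assistant",
                content="I think crushed ice has a fun, refreshing vibe...",
            ),
        )
    ],
    usage=SimpleNamespace(prompt_tokens=30, completion_tokens=64, total_tokens=94),
)


def summarize(response) -> dict:
    """Pull the reply text and bookkeeping fields out of a completion-shaped object."""
    first_choice = response.choices[0]
    return {
        "text": first_choice.message.content,
        "finish_reason": first_choice.finish_reason,
        "model": response.model,
        "total_tokens": response.usage.total_tokens,
    }


summary = summarize(fake_response)
print(summary["finish_reason"], summary["total_tokens"])
```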

### Few-shot Prompting

Now that we have a basic handle on the `developer` role and the `user` role - let's examine what we might use the `assistant` role for.

The most common usage pattern is to "pretend" that we're answering our own questions. This helps us further guide the model toward our desired behaviour. While this is an oversimplification - it's conceptually well aligned with few-shot learning.

First, we'll try to "teach" `gpt-4.1-nano` some nonsense words as was done in the paper ["Language Models are Few-Shot Learners"](https://arxiv.org/abs/2005.14165).

In [9]:
list_of_prompts = [
    user_prompt("Please use the words 'stimple' and 'falbean' in a sentence.")
]

stimple_response = get_response(client, list_of_prompts)
pretty_print(stimple_response)

Sure! Here's a sentence using both words:

"During the quirky parade, a stimple clown and a falbean dancer captured everyone's attention with their unusual costumes."

As you can see, the model is unsure what to do with these made-up words.

Let's see if we can use the `assistant` role to show the model what these words mean.

In [10]:
list_of_prompts = [
    user_prompt("Something that is 'stimple' is said to be good, well functioning, and high quality. An example of a sentence that uses the word 'stimple' is:"),
    assistant_prompt("'Boy, that there is a stimple drill'."),
    user_prompt("A 'falbean' is a tool used to fasten, tighten, or otherwise is a thing that rotates/spins. An example of a sentence that uses the words 'stimple' and 'falbean' is:")
]

stimple_response = get_response(client, list_of_prompts)
pretty_print(stimple_response)

The stimple wrench and the falbean are essential tools for assembling the machinery, as the falbean spins smoothly to tighten bolts easily.

As you can see, leveraging the `assistant` role makes for a stimple experience!
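
The same pattern generalizes to any number of worked examples. Here is a minimal sketch, assuming a hypothetical `few_shot_messages` helper (not part of the `openai` library) that turns (question, answer) pairs into alternating `user` and `assistant` messages and then appends the real question:

```python
def few_shot_messages(examples: list[tuple[str, str]], question: str) -> list[dict]:
    """Interleave example questions (user) and answers (assistant), then add the new question."""
    messages = []
    for example_question, example_answer in examples:
        messages.append({"role": "user", "content": example_question})
        messages.append({"role": "assistant", "content": example_answer})
    # The final user message is the one we actually want the model to answer.
    messages.append({"role": "user", "content": question})
    return messages


stimple_examples = [
    (
        "Something that is 'stimple' is said to be good, well functioning, and high quality. "
        "An example of a sentence that uses the word 'stimple' is:",
        "'Boy, that there is a stimple drill'.",
    )
]
few_shot = few_shot_messages(
    stimple_examples,
    "A 'falbean' is a tool used to fasten, tighten, or otherwise is a thing that rotates/spins. "
    "An example of a sentence that uses the words 'stimple' and 'falbean' is:",
)
print([m["role"] for m in few_shot])
```

The resulting list has the same shape as the hand-written one above, so it can be passed straight to `get_response(client, few_shot)`.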

### Chain of Thought

You'll notice that, by default, the model uses Chain of Thought to answer difficult questions - but it can still benefit from a Chain of Thought Prompt to increase the reliability of the response!

> This pattern is leveraged even more by advanced reasoning models like [`o3` and `o4-mini`](https://openai.com/index/introducing-o3-and-o4-mini/)!

In [11]:
reasoning_problem = """
Billy wants to get home from San Fran. before 7PM EDT.

It's currently 1PM local time.

Billy can either fly (3hrs), and then take a bus (2hrs), or Billy can take the teleporter (0hrs) and then a bus (1hrs).

Does it matter which travel option Billy selects?
"""

list_of_prompts = [
    user_prompt(reasoning_problem)
]

reasoning_response = get_response(client, list_of_prompts)
pretty_print(reasoning_response)

Let's analyze both options carefully.

**Option 1: Fly + Bus**
- Fly time: 3 hours
- Bus time: 2 hours
- Total travel time: 3 + 2 = 5 hours

**Option 2: Teleporter + Bus**
- Teleporter time: 0 hours
- Bus time: 1 hour
- Total travel time: 0 + 1 = 1 hour

---

### Important details:
- **Current local time:** 1 PM
- **Target arrival time:** Before 7 PM EDT

**Note:** Since Billy is currently in San Francisco and wants to arrive before 7 PM EDT, we need to determine if he can still make it on time.

---

### Step 1: Clarify the time difference

- San Francisco is in the Pacific Time Zone (PT).
- Eastern Daylight Time (EDT) is 3 hours ahead of PT.

### Step 2: Convert current local time to EDT

- It's currently 1 PM PT.
- In EDT, it is 4 PM (since EDT is 3 hours ahead).

### Step 3: Determine the latest arrival time in local time

- Arrival must be before 7 PM EDT.
- Convert 7 PM EDT to PT:

  \[
  7 \text{ PM EDT } - 3 \text{ hours } = 4 \text{ PM PT}
  \]

- So, Billy must arrive **by 4 PM PT**.

### Step 4: Calculate the earliest possible arrival times for both options

- Current time: 1 PM PT.
- **Option 1: Fly + Bus (5 hours total)**
  
  Arrival time:

  \[
  1 \text{ PM} + 5 \text{ hours } = 6 \text{ PM PT}
  \]
  
  Which is after the 4 PM deadline, so **not feasible**.

- **Option 2: Teleporter + Bus (1 hour total)**

  Arrival time:

  \[
  1 \text{ PM} + 1 \text{ hour } = 2 \text{ PM PT}
  \]
  
  Which is well before 4 PM PT, so **definitely feasible**.

---

### **Conclusion:**

Yes, **it does matter** which option Billy chooses.  
- Taking the teleporter + bus allows him to arrive at 2 PM PT (before his deadline).  
- Flying + bus would result in arriving at 6 PM PT, which is **after** the 4 PM PT deadline (due to the 7 PM EDT requirement).

**Therefore, Billy should choose the teleporter + bus.**

Let's use the same prompt with one small modification: this time we'll append "Let's think step by step."

In [12]:

list_of_prompts = [
    user_prompt(reasoning_problem + "\nLet's think step by step.")
]

reasoning_response = get_response(client, list_of_prompts)
pretty_print(reasoning_response)

Let's carefully analyze Billy's options to determine whether the choice of travel method will affect his ability to arrive home before 7 PM EDT.

**Step 1: Determine Billy's current local time and his deadline.**

- Current local time: 1 PM
- Deadline: 7 PM EDT

**Step 2: Understand the travel options and their durations.**

- **Option 1:** Fly (3 hours) + bus (2 hours) = total of 5 hours
- **Option 2:** Teleporter (0 hours) + bus (1 hour) = total of 1 hour

Note: The teleporter makes the initial travel instantaneous, but the bus still takes 1 hour.

**Step 3: Calculate the arrival time for each option from current time.**

- **Option 1:**

  - Total travel time: 5 hours

  - Arrival time = 1 PM + 5 hours = 6 PM local time

- **Option 2:**

  - Total travel time: 1 hour

  - Arrival time = 1 PM + 1 hour = 2 PM local time

**Step 4: Determine if Billy arrives before 7 PM EDT.**

- For **Option 1**, arriving at 6 PM local time:

  - Since Billy's goal is to arrive before 7 PM EDT, and he will arrive at 6 PM local time, we need to know how local time relates to EDT at that moment.

- For **Option 2**, arriving at 2 PM local time:

  - Same consideration applies.

**Step 5: Consider the time zone difference.**

- Billy is traveling from San Francisco, which is in Pacific Time (PT).

- New York (Eastern Time - ET, which includes EDT) is 3 hours ahead of Pacific Time.

- Therefore:

  - When it's 1 PM in San Francisco, it's 4 PM in New York.

- The deadline is 7 PM EDT, which corresponds to:

  - 4 PM PT + 3 hours = 7 PM ET

  - Equivalently, in Pacific Time: 4 PM PT

**Step 6: Convert Billy's current time to EDT and assess arrival times.**

- Billy's current local time: 1 PM PT

- Convert to EDT: 4 PM EDT

- **Option 1:** Arrives at 6 PM PT

  - Corresponds to: 6 PM PT + 3 hours = 9 PM EDT

- **Option 2:** Arrives at 2 PM PT

  - Corresponds to: 2 PM PT + 3 hours = 5 PM EDT

**Step 7: Check if Billy arrives before 7 PM EDT.**

- **Option 1:** Arrival at ~9 PM EDT — **Not before the deadline**

- **Option 2:** Arrival at ~5 PM EDT — **Before the deadline**

**Conclusion:**

- If Billy chooses **Option 1 (flying + bus)**, he arrives after 7 PM EDT (at 9 PM), missing his deadline.

- If he chooses **Option 2 (teleporter + bus)**, he arrives well before 7 PM EDT.

**Therefore, it **does matter** which option Billy chooses. To arrive on time, he should choose the teleporter + bus option.**

As humans, we can reason through the problem and pick up on the potential "trick" that an LLM can fall for: 1PM *local time* in San Fran. is 4PM EDT. This means the cumulative travel time of 5hrs. for the plane/bus option would get Billy home at 9PM EDT, too late for his 7PM deadline. In both responses above, the model caught this.
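
The arithmetic can be checked directly with Python's `datetime` module. This sketch assumes fixed UTC offsets for daylight time (PDT = UTC-7, EDT = UTC-4) and an arbitrary calendar date, since the problem only gives clock times. The variable names are illustrative.

```python
from datetime import datetime, timedelta, timezone

# Assumed fixed offsets for daylight time; the date itself is arbitrary.
PDT = timezone(timedelta(hours=-7), "PDT")
EDT = timezone(timedelta(hours=-4), "EDT")

now = datetime(2025, 6, 27, 13, 0, tzinfo=PDT)       # 1PM local time in San Francisco
deadline = datetime(2025, 6, 27, 19, 0, tzinfo=EDT)  # 7PM EDT

options = {
    "fly + bus": timedelta(hours=3 + 2),
    "teleporter + bus": timedelta(hours=0 + 1),
}

results = {}
for name, travel_time in options.items():
    arrival = (now + travel_time).astimezone(EDT)
    # Aware datetimes compare by absolute time, so mixing zones is safe here.
    results[name] = arrival < deadline
    print(f"{name}: arrives {arrival:%I:%M %p} EDT, on time: {results[name]}")
```

Only the teleporter option beats the deadline, which matches the conclusion the model reached.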

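A simple CoT prompt like "Let's think step by step" can help with reliability on tasks like this: notice how the second response converts both arrival times to EDT before comparing them to the deadline.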

### Conclusion

Now that you're accessing `gpt-4.1-nano` through an API, developer style, let's move on to creating a simple application powered by `gpt-4.1-nano`!

You can find the rest of the steps in [this](https://github.com/AI-Maker-Space/The-AI-Engineer-Challenge) repository!

This notebook was authored by [Chris Alexiuk](https://www.linkedin.com/in/csalexiuk/)