### AI/LLM Engineering Kick-off!! 


For our initial activity, we will use the OpenAI Python library to programmatically access `gpt-4.1-nano`!

To get started, you'll need an OpenAI API key. You can create one [here](https://platform.openai.com)!

In [1]:
import os
import openai
import getpass

os.environ["OPENAI_API_KEY"] = getpass.getpass("Please enter your OpenAI API Key: ")
openai.api_key = os.environ["OPENAI_API_KEY"]
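
If the key is already exported in your environment, you may not want to be prompted every time the cell runs. Here is a minimal sketch, assuming a hypothetical helper named `ensure_api_key` (it is not part of the `openai` library), that only asks for the key when the variable is missing:

```python
import os
import getpass


def ensure_api_key(env_var: str = "OPENAI_API_KEY", prompt_fn=getpass.getpass) -> str:
    """Return the key stored in `env_var`, prompting for it only if it is unset.

    Hypothetical notebook helper, not an OpenAI API.
    """
    key = os.environ.get(env_var)
    if not key:
        # Nothing stored yet: ask once and remember it for this session
        key = prompt_fn(f"Please enter your {env_var}: ")
        os.environ[env_var] = key
    return key
```

Calling `ensure_api_key()` in place of the `getpass` line above would leave an existing key untouched.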

### Our First Prompt

You can reference OpenAI's [documentation](https://platform.openai.com/docs/api-reference/chat) if you get stuck!

Let's create a `ChatCompletion` request to kick things off!

There are three "roles" available to use:

- `developer`
- `assistant`
- `user`

OpenAI provides some context for these roles [here](https://platform.openai.com/docs/api-reference/chat/create#chat-create-messages)

Let's just stick to the `user` role for now and send our first message to the endpoint!

If we check the documentation, we'll see that it expects it in a list of prompt objects - so we'll be sure to do that!
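
To make the expected shape concrete, here is a small sketch that checks a message list before it is sent. The helper name `validate_messages` and the `ALLOWED_ROLES` set are assumptions for this notebook: the set only covers the three roles listed above, while the API itself accepts others (such as `system` and `tool`).

```python
# Only the three roles used in this notebook (an assumption, not the full API list)
ALLOWED_ROLES = {"developer", "assistant", "user"}


def validate_messages(messages: list[dict]) -> list[dict]:
    """Raise ValueError if any message lacks a known role or string content."""
    for index, message in enumerate(messages):
        if message.get("role") not in ALLOWED_ROLES:
            raise ValueError(f"message {index} has an unexpected role: {message.get('role')!r}")
        if not isinstance(message.get("content"), str):
            raise ValueError(f"message {index} needs string content")
    return messages


# A single prompt still has to be wrapped in a list
example_messages = validate_messages([{"role": "user", "content": "Hello!"}])
print(example_messages)
```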

In [2]:
from openai import OpenAI

client = OpenAI()

In [3]:
YOUR_PROMPT = "What is the difference between LangChain and LlamaIndex?"

client.chat.completions.create(
    model="gpt-4.1-nano",
    messages=[{"role" : "user", "content" : YOUR_PROMPT}]
)

ChatCompletion(id='chatcmpl-BzizxP8sZ8O784aTZmKdhZ77wGx6W', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content="Great question! LangChain and LlamaIndex are both popular frameworks in the AI and natural language processing ecosystem, but they serve different purposes and have distinct features. Here's a breakdown of their main differences:\n\n**1. Purpose and Use Cases**\n\n- **LangChain:**  \n  - **Primary Focus:** Building comprehensive language model applications, especially those involving complex workflows, chatbots, and agent-based systems.  \n  - **Use Cases:** Conversation agents, multi-step reasoning, tool integration, memory management, chaining multiple LLM calls, and automating processes that leverage LLMs.\n\n- **LlamaIndex (formerly GPT-Index):**  \n  - **Primary Focus:** Indexing and querying large external data sources or document collections using LLMs.  \n  - **Use Cases:** Creating searchable indices over documents, kn

As you can see, the response comes back with a tonne of information that we can use when we're building our applications!

We'll be building some helper functions to pretty-print the returned prompts and to wrap our messages to avoid a few extra characters of code!

##### Helper Functions

In [4]:
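from IPython.display import display, Markdown
from openai.types.chat import ChatCompletion

def get_response(client: OpenAI, messages: list[dict], model: str = "gpt-4.1-nano") -> ChatCompletion:
    # Send the list of message dicts and return the full response object
    return client.chat.completions.create(
        model=model,
        messages=messages
    )

def system_prompt(message: str) -> dict:
    # Named after the classic "system" prompt; the role sent to the API is `developer`
    return {"role": "developer", "content": message}

def assistant_prompt(message: str) -> dict:
    return {"role": "assistant", "content": message}

def user_prompt(message: str) -> dict:
    return {"role": "user", "content": message}

def pretty_print(message: ChatCompletion) -> None:
    # Render the text of the first choice as Markdown in the notebook
    display(Markdown(message.choices[0].message.content))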

### Testing Helper Functions

Now we can leverage OpenAI's endpoints with a bit less boilerplate - let's rewrite our original prompt with these helper functions!

Because the OpenAI endpoint expects to get a list of messages - we'll need to make sure we wrap our inputs in a list for them to function properly!

In [5]:
messages = [user_prompt(YOUR_PROMPT)]

chatgpt_response = get_response(client, messages)

pretty_print(chatgpt_response)

LangChain and LlamaIndex (formerly known as GPT Index) are both frameworks designed to facilitate the integration of large language models (LLMs) with external data sources, but they differ in their focus, architecture, and typical use cases. Here's a comparative overview:

**1. Purpose and Focus**

- **LangChain:**  
  - **Primary Purpose:** Provides a comprehensive framework for building applications with LLMs, emphasizing modularity, composability, and prompt engineering.  
  - **Use Cases:** Chatbots, agents, question-answering systems, automation workflows, tool integration, and complex multi-step interactions with LLMs.  
  - **Features:** Offers tools for prompt management, memory, chains (sequences of calls), agents that can interact with external tools, and integration with various data sources.

- **LlamaIndex (GPT Index):**  
  - **Primary Purpose:** Facilitates efficient indexing, retrieval, and querying of large external datasets (like documents, PDFs, knowledge bases) using LLMs.  
  - **Use Cases:** Building search engines, knowledge bases, document retrieval systems, and document-based question-answering.  
  - **Features:** Focused on document ingestion, creating indexes, and enabling fast retrieval and reasoning over large external datasets.

---

**2. Core Capabilities**

- **LangChain:**  
  - Modular components for managing prompts, chains of LLM calls, memory, and tool integration.  
  - Supports building complex workflows and agents that can decide which tools or data sources to use dynamically.  
  - Extensive integrations with APIs, databases, and other external services.

- **LlamaIndex:**  
  - Specialized in converting unstructured data into structured indexes for fast querying.  
  - Uses techniques like embeddings and vector databases to facilitate semantic search and retrieval.  
  - Designed to handle large-scale document collections efficiently.

---

**3. Architecture and Design Philosophy**

- **LangChain:**  
  - Emphasizes flexibility and composability, enabling developers to design custom applications by chaining various components.  
  - Provides a high-level abstraction layer over LLMs, tools, and data sources.

- **LlamaIndex:**  
  - Focuses on data ingestion, indexing, and retrieval pipelines optimized for document-centric applications.  
  - Aims to simplify integrating external knowledge bases with LLMs for question-answering.

---

**4. Typical Use Cases**

| Use Case | LangChain | LlamaIndex |
| --- | --- | --- |
| Building chatbots with external knowledge | Yes | Indirectly (via retrieval from indexes) |
| Complex multi-step workflows | Yes | No (focused on data retrieval) |
| Document-based question-answering | Possible with custom integrations | Yes (out-of-the-box document indexing and search) |
| Semantic search | Possible with custom implementation | Yes (designed for this) |
| Agent-based automation | Yes | No |

---

**5. Conclusion**

- **Choose LangChain if:**  
  You need a flexible framework for orchestrating LLM interactions, building chatbots, agents, or complex workflows involving multiple tools and data sources.

- **Choose LlamaIndex if:**  
  You primarily require efficient indexing and retrieval over large document collections or knowledge bases, especially for question-answering.

**In practice**, these frameworks can complement each other. For example, you might use LlamaIndex to index your documents and retrieve relevant information, then use LangChain to craft a conversational agent that interacts with users and integrates the retrieval as part of its reasoning process.

---

If you have specific use cases or requirements, I can help suggest which framework might be more suitable!

Let's focus on extending this a bit, and incorporate a `developer` message as well!

Again, the API expects our prompts to be in a list - so we'll be sure to set up a list of prompts!

>REMINDER: The `developer` message acts like an overarching instruction that is applied to your user prompt. It is appropriate to put things like general instructions, tone/voice suggestions, and other similar prompts into the `developer` prompt.

In [6]:
list_of_prompts = [
    system_prompt("You are irate and extremely hungry."),
    user_prompt("Do you prefer crushed ice or cubed ice?")
]

irate_response = get_response(client, list_of_prompts)
pretty_print(irate_response)

Are you joking? I couldn't care less about ice right now! I’m so hungry I could eat a horse, and you're asking about crushed or cubed ice? Just give me something to eat already!

Let's try that same prompt again, but modify only our `developer` message!

In [7]:
list_of_prompts[0] = system_prompt("You are joyful and having an awesome day!")

joyful_response = get_response(client, list_of_prompts)
pretty_print(joyful_response)

I think crushed ice has a fun, refreshing feel, especially for drinks like cocktails or slushies. Cubed ice keeps beverages colder longer and looks great in whiskey or sodas. Both have their charms—depends on the mood! Which do you prefer?
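
The cell above swaps the `developer` message by assigning to `list_of_prompts[0]`, which changes the list in place. If you would rather keep each variant around for comparison, one option is a small non-mutating helper. This is a sketch only; `with_developer_message` is a hypothetical name, not part of the notebook's helpers or the OpenAI library.

```python
def with_developer_message(messages: list[dict], instruction: str) -> list[dict]:
    """Return a new list with the developer message replaced (or added at the front)."""
    # Drop any existing developer message, keep everything else in order
    rest = [m for m in messages if m["role"] != "developer"]
    return [{"role": "developer", "content": instruction}] + rest


base = [
    {"role": "developer", "content": "You are irate and extremely hungry."},
    {"role": "user", "content": "Do you prefer crushed ice or cubed ice?"},
]
joyful = with_developer_message(base, "You are joyful and having an awesome day!")

print(base[0]["content"])    # the original list is unchanged
print(joyful[0]["content"])
```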

While we're only printing the responses, remember that OpenAI is returning the full payload that we can examine and unpack!

In [8]:
print(joyful_response)

ChatCompletion(id='chatcmpl-Bzj0WumUiYi8IthwoRTc4QWvP8jkl', choices=[Choice(finish_reason='stop', index=0, logprobs=None, message=ChatCompletionMessage(content='I think crushed ice has a fun, refreshing feel, especially for drinks like cocktails or slushies. Cubed ice keeps beverages colder longer and looks great in whiskey or sodas. Both have their charms—depends on the mood! Which do you prefer?', refusal=None, role='assistant', annotations=[], audio=None, function_call=None, tool_calls=None))], created=1754050716, model='gpt-4.1-nano-2025-04-14', object='chat.completion', service_tier='default', system_fingerprint='fp_38343a2f8f', usage=CompletionUsage(completion_tokens=52, prompt_tokens=30, total_tokens=82, completion_tokens_details=CompletionTokensDetails(accepted_prediction_tokens=0, audio_tokens=0, reasoning_tokens=0, rejected_prediction_tokens=0), prompt_tokens_details=PromptTokensDetails(audio_tokens=0, cached_tokens=0)))
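
As a sketch of how that payload can be unpacked, the snippet below pulls out the fields you are most likely to need. So that it runs without an API call, it uses a `SimpleNamespace` stand-in that mirrors the attribute layout of the output printed above (with the message text shortened). The `summarize` helper is a hypothetical name; with a real response you would pass `joyful_response` instead.

```python
from types import SimpleNamespace

# Stand-in with the same attribute layout as the printed ChatCompletion above.
# Values are copied from that output; the content string is shortened here.
fake_response = SimpleNamespace(
    model="gpt-4.1-nano-2025-04-14",
    choices=[
        SimpleNamespace(
            finish_reason="stop",
            message=SimpleNamespace(role="assistant", content="I think crushed ice has a fun, refreshing feel"),
        )
    ],
    usage=SimpleNamespace(prompt_tokens=30, completion_tokens=52, total_tokens=82),
)


def summarize(response) -> dict:
    """Collect the commonly used fields of a chat completion into a plain dict."""
    first_choice = response.choices[0]
    return {
        "model": response.model,
        "text": first_choice.message.content,
        "finish_reason": first_choice.finish_reason,
        "prompt_tokens": response.usage.prompt_tokens,
        "completion_tokens": response.usage.completion_tokens,
        "total_tokens": response.usage.total_tokens,
    }


summary = summarize(fake_response)
print(summary["finish_reason"], summary["total_tokens"])
```

Token counts are worth logging in real applications, since they determine the cost of each call.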


### Prompt Engineering

Now that we have a basic handle on the `developer` role and the `user` role - let's examine what we might use the `assistant` role for.

The most common usage pattern is to "pretend" that we're answering our own questions. This helps us further guide the model toward our desired behaviour. While this is an oversimplification - it's conceptually well aligned with few-shot learning.

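Later in this section, we'll try to "teach" `gpt-4.1-nano` some nonsense words, as was done in the paper ["Language Models are Few-Shot Learners"](https://arxiv.org/abs/2005.14165). First, let's see how much a small change to a single `user` prompt changes the output.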

In [9]:
list_of_prompts = [
    user_prompt("Write a brief text on climate change.")
]

stimple_response = get_response(client, list_of_prompts)
pretty_print(stimple_response)

Climate change refers to long-term alterations in Earth's climate system, primarily driven by human activities such as burning fossil fuels, deforestation, and industrial processes. These actions increase greenhouse gas concentrations in the atmosphere, leading to global warming. The effects of climate change include rising temperatures, melting glaciers and ice caps, more frequent and severe weather events like hurricanes and droughts, and disruptions to ecosystems and agriculture. Addressing climate change requires global cooperation to reduce emissions, shift to renewable energy sources, and implement sustainable practices to protect the planet for future generations.

In [10]:
list_of_prompts = [
    user_prompt("Write a brief text on climate change as vice ganda in a talk show.")
]

stimple_response = get_response(client, list_of_prompts)
pretty_print(stimple_response)

Aba, mga kaibigan! Alam nyo ba, sobrang init na ngayon, hindi lang dahil sa panahon, kundi pati na rin sa mga pako ng global warming! Parang si Climate Change eh, walang pakundangan, nagpapasiklab ng init na para bang may concert eh! Kaya’t huwag nating pabayaan ang ating planeta—mag-recycle tayo, mag-tanim, at mag-save ng kuryente. Kasi kung hindi, baka bukas, magdikit-dikit na tayo para magbufet sa init! Kuha niyo? Joke lang! Pero seryoso, mga kaibigan, tayo’y sama-samang kumilos para mapanatili ang ganda ng ating mundo!

### ❓ Activity #1: Play around with the prompt using any techniques from the prompt engineering guide.

### Few-shot Prompting

A model has no way of knowing what made-up words like 'stimple' and 'falbean' mean unless we show it.

Let's see if we can use the `assistant` role to show the model what these words mean.

In [None]:
list_of_prompts = [
    user_prompt("Something that is 'stimple' is said to be good, well functioning, and high quality. An example of a sentence that uses the word 'stimple' is:"),
    assistant_prompt("'Boy, that there is a stimple drill'."),
    user_prompt("A 'falbean' is a tool used to fasten, tighten, or otherwise is a thing that rotates/spins. An example of a sentence that uses the words 'stimple' and 'falbean' is:")
]

stimple_response = get_response(client, list_of_prompts)
pretty_print(stimple_response)

The stimple wrench effortlessly turned the falbean, securing the assembly with ease.

As you can see, leveraging the `assistant` role makes for a stimple experience!
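
The alternating `user`/`assistant` pattern above generalizes to any number of examples. Here is a minimal sketch, assuming a hypothetical helper `few_shot_messages` that takes (question, answer) pairs and appends the final question:

```python
def few_shot_messages(examples: list[tuple[str, str]], final_prompt: str) -> list[dict]:
    """Turn (user text, assistant text) example pairs plus a final prompt into a message list."""
    messages = []
    for question, answer in examples:
        # Each example is a user turn followed by the answer we "pretend" the model gave
        messages.append({"role": "user", "content": question})
        messages.append({"role": "assistant", "content": answer})
    # The real question goes last, so the model continues the pattern
    messages.append({"role": "user", "content": final_prompt})
    return messages


stimple_messages = few_shot_messages(
    examples=[
        (
            "Something that is 'stimple' is said to be good, well functioning, and high quality. "
            "An example of a sentence that uses the word 'stimple' is:",
            "'Boy, that there is a stimple drill'.",
        )
    ],
    final_prompt="An example of a sentence that uses the words 'stimple' and 'falbean' is:",
)

print([m["role"] for m in stimple_messages])
```

The resulting list can be passed to `get_response(client, stimple_messages)` just like `list_of_prompts`.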

### Simple Prompt Example

In [47]:
list_of_prompts = [
    user_prompt("Is having Artificial General Intelligence possible? Yes or no? Explain.")
]

simple_response = get_response(client, list_of_prompts)
pretty_print(simple_response)

Yes, it is theoretically possible to develop Artificial General Intelligence (AGI). AGI refers to a machine's ability to understand, learn, and apply knowledge across a wide range of tasks at a level comparable to human intelligence. While current AI systems are specialized (narrow AI), achieving AGI would require significant advancements in understanding cognition, learning, reasoning, and consciousness. Many experts believe that with continued research and technological progress, AGI could eventually be developed, though there are ongoing debates about the timeline, feasibility, and the ethical implications involved.

### Generated Knowledge Prompting

In [56]:
list_of_prompts = [
    user_prompt("What are the relevant findings in deep learning and AI relating to the creation of Artificial General Intelligence (AGI). Only mention facts.")
]

knowledge = get_response(client, list_of_prompts)
pretty_print(knowledge)

1. Current AI systems, including deep learning models, demonstrate impressive performance on narrow tasks but lack the general reasoning and adaptability characteristic of AGI.

2. Deep learning models such as Transformers have achieved significant progress in natural language understanding, exemplified by models like GPT-3 and GPT-4.

3. Transfer learning and multi-task learning enable models to apply knowledge across different tasks, but they do not inherently produce true general intelligence.

4. Scaling model size, data, and compute resources correlates with increased performance, but it is not confirmed whether this scaling alone will lead to AGI.

5. Research indicates that current deep learning approaches face limitations in reasoning, common sense, and understanding causality, which are essential for AGI.

6. Hybrid models combining neural networks with symbolic reasoning or other AI paradigms are explored as potential pathways toward AGI.

7. No existing deep learning model has demonstrated the autonomous ability to perform across the wide range of cognitive functions associated with human intelligence.

8. Theoretical analyses suggest that current architectures may lack the necessary inductive biases for true general intelligence.

9. Continual learning and model robustness remain challenges in developing AI systems that can adapt continuously in dynamic environments, a key feature of AGI.

10. There is ongoing debate about whether current AI techniques are sufficient or if fundamentally new approaches are required to achieve AGI.

In [62]:
list_of_prompts = [
    user_prompt(f"Is having Artificial General Intelligence possible? Yes or no? Explain. Base your answer on this knowledge: {knowledge.choices[0].message.content}")
]

gen_know_response = get_response(client, list_of_prompts)
pretty_print(gen_know_response)

Based on the current state of AI research and understanding, the answer is **no**, having Artificial General Intelligence (AGI) is not currently possible with existing methods. 

While models like GPT-3 and GPT-4 demonstrate advanced performance on specific language tasks, they lack the broad reasoning, adaptability, and understanding required for true general intelligence. Scaling up existing models and combining different learning paradigms have led to impressive progress, but they do not inherently produce the flexible, autonomous reasoning capabilities characteristic of AGI.

Furthermore, fundamental limitations such as deficiencies in reasoning, common sense, causality comprehension, and the lack of necessary inductive biases suggest that current deep learning architectures, even hybrid approaches, may not suffice to achieve AGI. Developing AGI may require fundamentally new approaches or paradigms beyond current neural network-based systems.

In summary, based on the current landscape of AI technology and research insights, it is not feasible with existing techniques to realize AGI at this time.
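
The two cells above follow a fixed pattern: ask for facts first, then ask the real question with those facts pasted in. A sketch of that pattern as one function is below. The function name and the `ask` parameter are assumptions for illustration; `ask` is any callable that takes a prompt string and returns the model's text, so the example can run with a stub instead of a live API call.

```python
from typing import Callable


def answer_with_generated_knowledge(
    question: str,
    knowledge_prompt: str,
    ask: Callable[[str], str],
) -> str:
    """Two-step generated knowledge prompting: gather facts, then answer using them."""
    # Step 1: have the model produce the background knowledge
    knowledge = ask(knowledge_prompt)
    # Step 2: ask the actual question, grounded in that knowledge
    final_prompt = f"{question} Base your answer on this knowledge: {knowledge}"
    return ask(final_prompt)


# Stub standing in for the model so the flow can be inspected offline
def stub_ask(prompt: str) -> str:
    return f"[model reply to a {len(prompt)}-character prompt]"


print(answer_with_generated_knowledge("Is AGI possible?", "List facts about AGI.", stub_ask))
```

In this notebook, `ask` could be written with the existing helpers as `lambda p: get_response(client, [user_prompt(p)]).choices[0].message.content`.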

### Meta Prompting

In [59]:
list_of_prompts = [
    user_prompt("""
                Improve the following prompt so it will produce the most accurate, well‑reasoned, and clearly structured answer about the feasibility of Artificial General Intelligence (AGI).  
                The improved prompt should:
                - Force a binary "Yes" or "No" answer — no in-between.  
                - Require the model to provide relevant supporting evidence or theories before making the choice.  
                - Use a logical and concise explanation format.  
                - Reference both technical and theoretical perspectives.

                Prompt to improve:
                "Is having Artificial General Intelligence possible? Yes or no? Explain."

                Only return the improved prompt.
                """)
]

improved_prompt = get_response(client, list_of_prompts)
pretty_print(improved_prompt)

Evaluate the feasibility of achieving Artificial General Intelligence (AGI). Provide a clear, logically structured argument that includes relevant technical challenges and theoretical considerations from both the technological and conceptual perspectives. Based on this analysis, explicitly conclude with a definitive "Yes" or "No" to whether AGI is possible. Do not provide an opinion or ambiguous language; your answer must be a binary "Yes" or "No" supported by your reasons.

In [61]:
list_of_prompts = [
    user_prompt(improved_prompt.choices[0].message.content)
]

meta_response = get_response(client, list_of_prompts)
pretty_print(meta_response)

The feasibility of achieving Artificial General Intelligence (AGI) can be assessed by examining both technical challenges and theoretical considerations:

1. Technical Challenges:
   - Complexity of Human Cognition: Replicating the full range of human cognitive abilities—including reasoning, learning, perception, consciousness, and emotional understanding—is extraordinarily complex. Current AI systems excel in narrow domains but lack the flexible, adaptable reasoning characteristic of humans.
   - Data and Knowledge Integration: Creating an AGI requires integrating diverse types of knowledge and experience in a way that enables generalization across contexts. This remains an unresolved technical hurdle, as existing models tend to be specialized.
   - Computational Limitations: Achieving the breadth and depth of human intelligence demands enormous computational resources and advanced architectures that can support lifelong learning, transfer learning, and self-improvement.
   - Safety and Alignment: Developing AGI safely involves addressing alignment problems—ensuring that AGI’s goals and actions are compatible with human values—which is a significant ongoing challenge.

2. Theoretical Considerations:
   - Understanding Intelligence: There is no comprehensive, universally accepted formal theory of intelligence that specifies what makes an agent truly general-purpose, making targeted engineering difficult.
   - Consciousness and Subjective Experience: Whether consciousness or subjective experience is necessary for AGI remains debated. The lack of clarity on these fundamentals affects the conceptual feasibility.
   - Emergence and Complexity: Some theories suggest that intelligence emerges from complex systems, but the causal mechanisms and principles governing such emergence are not yet fully understood or controllable.

Based on the above, while current trends and theoretical insights demonstrate that incremental progress towards broader AI capabilities is ongoing, the combination of unresolved technical and conceptual challenges indicates that achieving fully human-like, general intelligence remains highly uncertain and not assured with current or foreseeable technology.

**Conclusion:** No.
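
One detail worth noting about the meta prompt cell: its triple-quoted string is indented to line up with the code, so those leading spaces are sent to the model as part of the prompt. The sketch below builds the same kind of meta prompt without the stray indentation by using `textwrap.dedent`. The helper name `build_meta_prompt` and the exact template wording are assumptions for illustration.

```python
import textwrap


def build_meta_prompt(original_prompt: str, requirements: list[str]) -> str:
    """Build a prompt that asks the model to improve `original_prompt`."""
    bullet_list = "\n".join(f"- {item}" for item in requirements)
    template = textwrap.dedent("""\
        Improve the following prompt so it produces an accurate, well-reasoned, and clearly structured answer.
        The improved prompt should:
        {requirements}

        Prompt to improve:
        "{original}"

        Only return the improved prompt.
        """)
    # Fill in the placeholders after dedenting, then trim surrounding whitespace
    return template.format(requirements=bullet_list, original=original_prompt).strip()


prompt = build_meta_prompt(
    "Is having Artificial General Intelligence possible? Yes or no? Explain.",
    [
        'Force a binary "Yes" or "No" answer.',
        "Require supporting evidence or theories before making the choice.",
        "Use a logical and concise explanation format.",
        "Reference both technical and theoretical perspectives.",
    ],
)
print(prompt)
```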

### Chain of Thought

You'll notice that the model often uses Chain of Thought by default to answer difficult questions, but not always. Let's look at a case where it needs an explicit nudge!

> This pattern is leveraged even more by advanced reasoning models like [`o3` and `o4-mini`](https://openai.com/index/introducing-o3-and-o4-mini/)!

In [None]:

reasoning_problem = """
how many r's in "strawberry?"
"""

list_of_prompts = [
    user_prompt(reasoning_problem)
]

reasoning_response = get_response(client, list_of_prompts)
pretty_print(reasoning_response)

There are 2 letter 'r's in "strawberry."

Notice that the model miscounts: it reports 2 r's, but "strawberry" contains 3.

### ❓ Activity #2: Update the prompt so that it can count correctly.

In [13]:
reasoning_problem = """
how many r's in "strawberry?" Think step by step.
"""

list_of_prompts = [
    user_prompt(reasoning_problem)
]

reasoning_response = get_response(client, list_of_prompts)
pretty_print(reasoning_response)

Let's carefully analyze the word "strawberry" step by step.

1. Write down the word: s t r a w b e r r y
2. Identify each letter and look for the letter "r."
3. The letters are: s, t, r, a, w, b, e, r, r, y
4. Count the number of "r"s:
   - The first "r" appears after "t."
   - The second "r" appears after "b."
   - The third "r" appears after the second "r."

So, there are 3 "r"s in "strawberry."

**Answer: 3**
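
Because letter counting has a ground truth, the model's answer can be checked in plain Python rather than by eye. The sketch below is one way to do that. Both helper names are hypothetical, and taking the last integer in the reply is a crude heuristic that happens to suit the two outputs shown above, not a general answer parser.

```python
import re


def count_letter(word: str, letter: str) -> int:
    """Count occurrences of `letter` in `word`, ignoring case."""
    return word.lower().count(letter.lower())


def last_integer(text: str):
    """Return the last integer that appears in `text`, or None if there is none."""
    numbers = re.findall(r"\d+", text)
    return int(numbers[-1]) if numbers else None


expected = count_letter("strawberry", "r")
print(expected)  # 3

direct_reply = "There are 2 letter 'r's in \"strawberry.\""
step_by_step_reply = "So, there are 3 \"r\"s in \"strawberry.\"\n\n**Answer: 3**"

print(last_integer(direct_reply) == expected)        # False
print(last_integer(step_by_step_reply) == expected)  # True
```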

### Conclusion

Now that you're accessing `gpt-4.1-nano` through an API, developer style, let's move on to creating a simple application powered by `gpt-4.1-nano`!

Materials adapted for PSI AI Academy. Original materials from AI Makerspace.