
# Week 1: Introduction to AI Engineering

<a href="https://colab.research.google.com/github/tulane-intro-ai-engineering/main/blob/main/lectures/intro_lecture.ipynb" target="_blank">
  <img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open in Colab"/>
</a>

📘 **Theme:** From Algorithms → Systems → Reliability  


---

### **Learning Objectives**
By the end of this week, you will be able to:
1. Explain what *AI Engineering* means and how it differs from algorithmic AI.
2. Describe real-world successes and limitations of large language models (LLMs).
3. Interpret key failure types: hallucinations, bias, brittleness.
4. Build a simple mental model of how LLMs generate text.
5. Understand the *Unifying System Diagram* of LLM systems.
6. Run your first API call and reason about it scientifically.
7. Reflect on reliability, trust, and the iterative design mindset.


```python
# @title Setup (Run this first)
!git clone --depth 1 -q https://github.com/tulane-intro-ai-engineering/main.git
import sys
sys.path.append("/content/main")
from course_utils import lab1_setup, show_mermaid

lab1_setup()
```

```
🔧 Setting up your environment...
  → Installing core packages...
installing mermaid-python
  → Setting random seed for reproducible results...
  → Checking API key...
🔑 Enter your OpenAI API key.
   (It will only be stored in this Colab runtime - it's safe!)
   Get your key from: https://platform.openai.com/api-keys
OpenAI API key: ··········
✅ API key set.
  → Adding course files to path...
✅ Setup complete!
✅ lab1_setup: environment ready.
```



## 🧩 **Day 1: What Does It Mean to Engineer AI?**
---
**Guiding question:**  
> How is AI Engineering different from building models, and why does that matter?




### Welcome & Motivation


> “Who here has used ChatGPT or another AI tool this week?”

<br><br><br><br><br><br>

> “When it worked well, why? When did it fail?”

<br><br><br><br><br><br>

> “That gap, between impressive and unreliable, is what AI engineers work to close.”

**AI Engineering = Designing systems that are repeatable, safe, and measurable.**



### What Is AI Engineering (and How Is It Different)?

| Course | Focus | Core Question |
|:--------|:--------|:-------------|
| *Intro to AI* | Symbolic reasoning, search | “How do we find the best move?” |
| *Intro to Deep Learning* | Model architectures | “How does a CNN learn features?” |
| *NLP* | Linguistic representation | “How can we classify text?” |
| **AI Engineering** | System reliability, safety | “How can we make AI systems reliable and auditable?” |

> AI Engineering bridges models, systems, and people. Engineers design pipelines, test behaviors, and make trade-offs among accuracy, latency, and safety.



### LLM Successes and Failures

We'll contrast **impressive** and **unreliable** examples to motivate why *engineering* matters.


```python
# @title Example 1: Helpful Assistant
from openai import OpenAI
from IPython.display import display

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.responses.create(
    model="gpt-4o-mini",
    input="Explain how a neural network recognizes handwriting, in one paragraph."
)
display(response.output_text)
```

```
RateLimitError: Error code: 429 - {'error': {'message': 'You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.', 'type': 'insufficient_quota', 'param': None, 'code': 'insufficient_quota'}}
```

```python
# @title Example 2: Confidently Wrong Model (Hallucination)
response = client.responses.create(
    model="gpt-3.5-turbo-0125",
    input="Explain in 50 words or less how the Great Wall of China blocks satellite signals"
)
display(response.output_text)
display("💭 Note: This is a hallucination. The prompt contains a false premise, and the model confidently explains it instead of correcting it.")
display("   The Great Wall does NOT block satellite signals. This is why we need to test and verify AI outputs!")
```

```
'The Great Wall of China is made of materials like stone and brick that contain minerals which interfere with satellite signals. As these minerals reflect and refract the signals, it creates noise and distortion, making it difficult for satellites to accurately receive and transmit data across the wall.'

'💭 Note: This is a hallucination. The prompt contains a false premise, and the model confidently explains it instead of correcting it.'

'   The Great Wall does NOT block satellite signals. This is why we need to test and verify AI outputs!'
```


### 🔤 Building a Mental Model of LLMs

Think of an LLM as an *autocomplete engine on steroids*: it predicts what comes next, one token (a word or piece of a word) at a time.

**Key insight:** The model doesn't "know" facts; it predicts what text is likely to come next based on patterns it learned from training data. This is why it can be both impressive and unreliable.
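
The following sketch makes the autocomplete analogy concrete. It assumes a tiny made-up corpus and uses whole words instead of tokens. It counts which word follows which (a bigram model) and then generates greedily by always picking the most likely next word. Real LLMs replace the count table with a neural network trained on enormous datasets. The generation loop has the same shape: predict, append, repeat.

```python
from collections import Counter, defaultdict

# Hypothetical toy corpus. Real LLMs train on vastly more text and predict
# subword tokens with a neural network, not whole words from a count table.
corpus = (
    "the cat sat on the mat . "
    "the cat sat on the rug . "
    "the dog ate the bone ."
).split()

# Count how often each word follows each other word (a bigram model).
next_counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    next_counts[prev][nxt] += 1

def next_word_probs(word):
    """Estimate P(next word | current word) from the counts."""
    counts = next_counts[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

def autocomplete(start, steps=3):
    """Greedy decoding: repeatedly append the single most likely next word."""
    words = [start]
    for _ in range(steps):
        probs = next_word_probs(words[-1])
        if not probs:  # this word was never followed by anything in the corpus
            break
        words.append(max(probs, key=probs.get))
    return " ".join(words)

print(next_word_probs("the"))  # "cat" is most likely (2 of 6 occurrences)
print(autocomplete("the"))     # the cat sat on
```

The toy model has no notion of truth. It continues with "cat sat on" only because that sequence was the most frequent one in its data. An LLM's hallucinations come from the same mechanism at a much larger scale.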


```python
from IPython.display import Markdown, display

prompt = "Artificial intelligence is..."
response = client.responses.create(
    model="gpt-4o-mini",
    input=prompt
)
display(Markdown(response.output_text))
```

Artificial intelligence (AI) refers to the simulation of human intelligence in machines that are programmed to think and learn. AI systems can perform tasks that typically require human-like understanding, such as problem-solving, learning from experience, adapting to new information, and responding to complex requests.

AI encompasses various subfields, including:

1. **Machine Learning**: A method where algorithms learn from data to make decisions or predictions without being explicitly programmed.
  
2. **Natural Language Processing (NLP)**: The ability of machines to understand, interpret, and respond to human language.
  
3. **Computer Vision**: Enabling machines to interpret and make decisions based on visual input from the world (like images or videos).
  
4. **Robotics**: Combining AI with physical machines that can perform tasks autonomously or semi-autonomously.

AI has numerous applications, including virtual assistants, recommendation systems, autonomous vehicles, healthcare diagnostics, and more. It continues to evolve rapidly, influencing many aspects of everyday life and various industries.


### 🧠 The Unifying System Diagram

This 5-part diagram will guide us through the course:

**User Interaction → Prompt & Control → Tools & Augmentation → Core LLM → Output & Monitoring**
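
The diagram maps naturally onto a chain of functions. Each stage does one job, and each one is a place where you can add checks. The sketch below is hypothetical: every function name is invented for illustration, retrieval is a naive keyword lookup instead of embeddings, and `fake_llm` stands in for a real model call so the whole flow runs offline.

```python
def handle_input(user_text: str) -> str:
    """User Interaction: normalize and validate the raw query."""
    text = user_text.strip()
    if not text:
        raise ValueError("empty query")
    return text

def build_prompt(query: str) -> dict:
    """Prompt & Control: wrap the query with instructions and parameters."""
    return {
        "instructions": "Answer in one sentence. Say 'I don't know' if unsure.",
        "input": query,
        "temperature": 0.2,  # carried along for illustration; fake_llm ignores it
    }

def augment(request: dict, documents: dict) -> dict:
    """Tools & Augmentation: attach context found by a naive keyword match."""
    query = request["input"].lower()
    hits = [text for keyword, text in documents.items() if keyword in query]
    return {**request, "context": hits}

def fake_llm(request: dict) -> str:
    """Core LLM: placeholder that answers from retrieved context only."""
    if request["context"]:
        return request["context"][0]
    return "I don't know."

def finalize(request: dict, raw_output: str, log: list) -> str:
    """Output & Monitoring: clean up the answer and log the exchange for auditing."""
    answer = raw_output.strip()
    log.append({
        "input": request["input"],
        "output": answer,
        "grounded": bool(request["context"]),  # crude trust signal: was context used?
    })
    return answer

def run_pipeline(user_text: str, documents: dict, log: list) -> str:
    query = handle_input(user_text)          # 1. User Interaction
    request = build_prompt(query)            # 2. Prompt & Control
    request = augment(request, documents)    # 3. Tools & Augmentation
    raw = fake_llm(request)                  # 4. Core LLM
    return finalize(request, raw, log)       # 5. Output & Monitoring

log = []
docs = {"japan": "Tokyo is the capital of Japan."}  # hypothetical knowledge base
print(run_pipeline("What is the capital of Japan?", docs, log))  # Tokyo is the capital of Japan.
print(log)
```

Swapping `fake_llm` for a real API call leaves the other four stages unchanged. That separation makes each stage testable on its own.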



<!-- ![LLM System Diagram](https://github.com/tulane-intro-ai-engineering/main/blob/main/lectures/llm_workflow.png?raw=true) -->




```python
# @title Detailed System Diagram

show_mermaid("""
graph TD
    subgraph User Interaction
    U["👤 Users<br/>Queries / Inputs"]:::user --> IH["Input Handling<br/>• Formatting<br/>• Validation<br/>• Safety Filters"]:::process
    end

    subgraph Prompt & Control
    IH --> PC("Prompt / Control<br/>• Instructions<br/>• Examples<br/>• Constraints<br/>• Parameters"):::control
    end

    subgraph Tools & Augmentation
    PC --> TF{"Tools / Functions<br/>• External APIs"}:::tool
    PC --> RAG{"Retrieval (RAG)<br/>• Embeddings<br/>• Vector Store<br/>• Top-k Search"}:::tool
    end

    subgraph Core LLM
    TF --> LLM["Core LLM<br/>• Next-token generation<br/>• Sampling<br/>• Fine-tuned model"]:::model
    RAG --> LLM
    PC --> LLM
    end

    subgraph Output & Monitoring
    LLM --> OP["Output Processing<br/>• Formatting<br/>• Citations<br/>• Refusals<br/>• Trust Signals"]:::output --> O("Final Output"):::output
    O --> LM["Logging & Monitoring<br/>• Prompts & Responses<br/>• Metrics<br/>• Drift Detection"]:::monitor
    end

    classDef user fill:#d1e7dd,stroke:#333,stroke-width:1px;
    classDef process fill:#e2e3e5,stroke:#333,stroke-width:1px;
    classDef control fill:#cfe2ff,stroke:#333,stroke-width:1px;
    classDef tool fill:#fff3cd,stroke:#333,stroke-width:1px;
    classDef model fill:#f8d7da,stroke:#333,stroke-width:1px;
    classDef output fill:#e9ecef,stroke:#333,stroke-width:1px;
    classDef monitor fill:#fefefe,stroke:#333,stroke-width:1px;
""")
```

**Where do you think reliability issues arise most often?**

<br><br><br>



### 🤝 Trust Activity

Scenario brainstorming (small groups):  
- Would you trust AI for medical advice? For grading essays? For writing policy?  
Mark which *stages* of the pipeline you’d trust and which you’d audit.

**Discussion:** What common patterns emerge?



### Course Overview & Lab 1 Preview

- Weekly rhythm: Tues = concept, Thurs = lab.  
- Labs = *mini scientific investigations.*  
- **Lab 1:** make your first API call, measure model behavior.

> “Next time, we’ll talk directly to this system through an API.”



## 💻 **Day 2: From Concept to Code (APIs and the Scientific Method)**

**Guiding question:**  
> How do we interact with an AI system, and how do we test it like scientists?



### 🌐 What Is an API?

> An **API** (Application Programming Interface) is a set of rules and tools that lets one piece of software talk to another.

Analogy: ordering from a restaurant menu. You don’t enter the kitchen; you place an order, and the kitchen sends back your dish.

Diagram:
```
User → Request (JSON) → Server → Model → Response (JSON)
```
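
The following toy sketch shows what "Request (JSON) → Response (JSON)" means in practice. It runs entirely in one Python process. `toy_server` and its lookup table are hypothetical stand-ins for the real server and model. The point is that the client only ever sends JSON text out and receives JSON text back, just as you only see the menu and the dish, never the kitchen.

```python
import json

def toy_server(request_json: str) -> str:
    """Hypothetical server: parse JSON in, run a stand-in 'model', send JSON out."""
    request = json.loads(request_json)
    # Stand-in for the model: a lookup table instead of a real LLM.
    known_answers = {"What is the capital of Japan?": "Tokyo"}
    reply = {
        "model": request.get("model"),
        "answer": known_answers.get(request.get("input"), "unknown"),
    }
    return json.dumps(reply)

# Client side: build a Python dict, serialize it, "send" it, parse the reply.
request_json = json.dumps({"model": "toy-model", "input": "What is the capital of Japan?"})
print(request_json)        # the text that would travel over the network
reply = json.loads(toy_server(request_json))
print(reply["answer"])     # Tokyo
```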


```python
# Example API request body
example_request = {
    "model": "gpt-4o-mini",
    "input": "What is the capital of Japan?",
    # "instructions" acts as a system (developer) message placed ahead of the input
    "instructions": "You are a helpful assistant.",
}
example_request
```

```
{'model': 'gpt-4o-mini',
 'input': 'What is the capital of Japan?',
 'instructions': 'You are a helpful assistant.'}
```


### ⚡ Live Demo: Hello API


```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
response = client.responses.create(**example_request)
print(response.output_text)
print("Tokens used:", response.usage.total_tokens)
```

```
The capital of Japan is Tokyo.
Tokens used: 32
```


```python
print(response.model_dump_json(indent=2))
```

```
{
  "id": "resp_0e9892f328886b8e00696525a7144c8195be83be98c54c03fc",
  "created_at": 1768236455.0,
  "error": null,
  "incomplete_details": null,
  "instructions": "You are a helpful assistant.",
  "metadata": {},
  "model": "gpt-4o-mini-2024-07-18",
  "object": "response",
  "output": [
    {
      "id": "msg_0e9892f328886b8e00696525a8336c81958612b2188facdf01",
      "content": [
        {
          "annotations": [],
          "text": "The capital of Japan is Tokyo.",
          "type": "output_text",
          "logprobs": []
        }
      ],
      "role": "assistant",
      "status": "completed",
      "type": "message"
    }
  ],
  "parallel_tool_calls": true,
  "temperature": 1.0,
  "tool_choice": "auto",
  "tools": [],
  "top_p": 1.0,
  "background": false,
  "conversation": null,
  "max_output_tokens": null,
  "max_tool_calls": null,
  "previous_response_id": null,
  "prompt": null,
  "prompt_cache_key": null,
  "prompt_cache_retention": null,
  "reasoning": {
    "effort": nul
```

`client.responses.create(...)` is a wrapper around a single **HTTP POST** request:

* It sends a request to **`POST https://api.openai.com/v1/responses`**
* The arguments (`model`, `input`, optional `instructions`, etc.) become the **JSON request body**
* Your OpenAI API key is sent in the **Authorization** header as a Bearer token
* If the request fails (for example, the `429` quota error in Example 1), the SDK raises an exception such as `RateLimitError`
* OpenAI returns an **HTTP response** (e.g., `200 OK`) whose body is the JSON “response” object printed above; the SDK parses it into a Python object, and `response.output_text` is a convenience property that collects the generated text from the `output` list
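
To make the wrapper less magical, here is a sketch of the same call that uses only the Python standard library. It assumes the key is stored in the `OPENAI_API_KEY` environment variable, which is the variable the `openai` SDK reads by default. It extracts text only from `message` items. The SDK also handles retries, streaming, and typed result objects, and this sketch ignores all of those, so treat it as an illustration rather than a replacement.

```python
import json
import os
import urllib.request

API_URL = "https://api.openai.com/v1/responses"

def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST that client.responses.create(**payload) sends, roughly."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),   # JSON request body
        headers={
            "Authorization": f"Bearer {api_key}",    # API key as a Bearer token
            "Content-Type": "application/json",
        },
        method="POST",
    )

def extract_output_text(data: dict) -> str:
    """Join the text parts of all message items in the response's `output` list."""
    parts = []
    for item in data.get("output", []):
        if item.get("type") == "message":
            for content in item.get("content", []):
                if content.get("type") == "output_text":
                    parts.append(content["text"])
    return "".join(parts)

def call_responses_api(payload: dict) -> str:
    """Send the request (needs network access and OPENAI_API_KEY)."""
    req = build_request(payload, os.environ["OPENAI_API_KEY"])
    with urllib.request.urlopen(req) as resp:   # raises HTTPError on 4xx/5xx, e.g. 429
        return extract_output_text(json.loads(resp.read()))
```

With a valid key and network access, `call_responses_api(example_request)` should return the same kind of answer as the SDK demo above.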



### 🧪 Mini Experiment: Prompt Wording and Response Length

We'll test whether prompt phrasing changes the output length.


```python
from IPython.display import Markdown, display

prompts = [
    "Explain AI engineering in one sentence.",
    "Explain AI engineering in one sentence using technical language."
]

for p in prompts:
    resp = client.responses.create(
        model="gpt-4o-mini",
        input=p,
    )
    text = resp.output_text
    print(f"Prompt: {p}")
    display(Markdown(text))
    print(f"Length: {len(text)} characters\n")
```

```
Prompt: Explain AI engineering in one sentence.

AI engineering is the discipline of designing, developing, and deploying artificial intelligence solutions by integrating principles from computer science, data science, and software engineering to solve real-world problems.

Length: 224 characters

Prompt: Explain AI engineering in one sentence using technical language.

AI engineering is the discipline that encompasses the systematic design, development, and deployment of artificial intelligence systems, integrating algorithms, data architecture, and software engineering principles to create robust and scalable solutions.

Length: 256 characters
```
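
The experiment above has a limitation: each prompt was run only once. The default `temperature` is 1.0 (visible in the response JSON above), so repeated runs of the same prompt can produce different lengths. A 32-character gap from a single sample may therefore be noise. A fairer comparison samples each prompt several times and reports the spread. In the minimal sketch below, `generate` is any prompt-to-text function you pass in, which also lets you test the helper offline.

```python
import statistics

def compare_prompt_lengths(generate, prompts, n_trials=5):
    """Sample each prompt n_trials times and summarize response length in characters."""
    summary = {}
    for p in prompts:
        lengths = [len(generate(p)) for _ in range(n_trials)]
        summary[p] = {
            "mean": statistics.mean(lengths),
            # the sample standard deviation needs at least two observations
            "stdev": statistics.stdev(lengths) if n_trials > 1 else 0.0,
            "min": min(lengths),
            "max": max(lengths),
        }
    return summary

# Usage with the real model (assumes `client` and `prompts` from the cell above):
# stats = compare_prompt_lengths(
#     lambda p: client.responses.create(model="gpt-4o-mini", input=p).output_text,
#     prompts,
# )
# for p, s in stats.items():
#     print(f"{p}\n  mean={s['mean']:.0f} ± {s['stdev']:.0f} chars (range {s['min']} to {s['max']})")
```

If the two ranges overlap heavily, the single-run difference is weak evidence that the wording actually matters.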




### 🧭 Responsible Iteration & Measurement

**AI engineers think in loops:**
1. Observe model behavior.
2. Adjust prompt or parameters.
3. Measure results.
4. Reflect and repeat.

This is not “prompt hacking”; it’s controlled experimentation.
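
The loop can be made concrete with a small, hypothetical harness. It has three parts: a `generate` function you supply (for example, a wrapper around `client.responses.create`), a pass/fail metric, and a log that records every attempt. The one-sentence check below is an illustrative assumption, not a robust sentence counter.

```python
import json

def is_one_sentence(text: str) -> bool:
    """Crude, hypothetical metric: at most one sentence-ending mark."""
    return sum(text.count(mark) for mark in ".!?") <= 1

def run_round(generate, variants: dict, metric, log: list, round_id: int) -> list:
    """Try each prompt variant once, score it, and record everything."""
    for name, prompt in variants.items():
        output = generate(prompt)                  # 1. observe
        log.append({
            "round": round_id,
            "variant": name,
            "prompt": prompt,
            "output": output,
            "passed": metric(output),              # 3. measure
        })
    # 4. reflect: which variants passed this round?
    return [r["variant"] for r in log if r["round"] == round_id and r["passed"]]

def save_log(log: list, path: str) -> None:
    """Persist the log as JSON Lines so results can be compared across sessions."""
    with open(path, "w", encoding="utf-8") as f:
        for record in log:
            f.write(json.dumps(record) + "\n")
```

Each call to `run_round` is one pass through the loop. Between rounds you adjust a prompt or parameter (step 2) and run again. `save_log` keeps an auditable record, so you do not have to rely on memory.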



### 🧫 Introducing Lab 1

**What you'll do in Lab 1:**
- Make your first API call to an LLM
- **Experiment** with different system prompts (using the scientific method!)
- Build a simple web app with Gradio
- Observe how system prompts affect model behavior

**Lab structure:**
- **Setup** (clone repo + bootstrap)
- **Pre-Lab** (conceptual warmup)
- **Scientific Process** (Question ‚Üí Hypothesis ‚Üí Experiment ‚Üí Measurement)
- **Experiment** (test different system prompts)
- **Results & Reflection** (connect to reliability)

**Connection to today's lecture:** In the lab, you'll apply the scientific method we just discussed to test how system prompts change model outputs.



### üë©‚Äçüíª In-Class Lab Work

Students launch Lab 1 in Colab, test API, record first measurements.

**Exit Ticket:**  
> ‚ÄúWhat surprised you about the model‚Äôs response today?‚Äù



<details>
<summary>🧑‍🏫 Instructor Notes</summary>

**Pacing:**  
- Demos should be quick; skip reruns if latency >15s.  
- Prioritize discussion over full code explanations.  
- If students struggle with setup, pause and debug as a group.

**Engagement Tips:**  
- Use polls for trust activities.  
- Encourage sharing examples of “good/bad” AI behavior.

**Extensions:**  
- Optional demo: temperature or top_p for creativity.  
- Ask students to predict which prompt will be longer *before* running it.
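
For the optional temperature/top_p demo, a toy sketch can show what these parameters do to a next-token distribution. The token scores below are made up, and real models apply this step over vocabularies of tens of thousands of tokens. The OpenAI API accepts `temperature` and `top_p` as request parameters, and both appear as 1.0 in the response JSON shown earlier. This code only illustrates the idea: temperature rescales scores before softmax, and top_p (nucleus sampling) keeps only the most likely tokens.

```python
import math

def softmax_with_temperature(logits: dict, temperature: float) -> dict:
    """Turn raw scores into probabilities; lower temperature sharpens them (temperature > 0)."""
    scaled = {tok: score / temperature for tok, score in logits.items()}
    m = max(scaled.values())                   # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

def top_p_filter(probs: dict, p: float) -> dict:
    """Keep the most likely tokens until their total probability reaches p, then renormalize."""
    kept, cumulative = {}, 0.0
    for tok, prob in sorted(probs.items(), key=lambda kv: kv[1], reverse=True):
        kept[tok] = prob
        cumulative += prob
        if cumulative >= p:
            break
    total = sum(kept.values())
    return {tok: prob / total for tok, prob in kept.items()}

# Hypothetical next-token scores after the prompt "The sky is"
logits = {"blue": 4.0, "clear": 2.5, "falling": 1.0, "purple": 0.5}
for t in (0.5, 1.0, 2.0):
    probs = softmax_with_temperature(logits, t)
    print(t, {tok: round(pr, 2) for tok, pr in probs.items()})
print(top_p_filter(softmax_with_temperature(logits, 1.0), 0.9))
```

Students can predict, before running the code, how the probability of "blue" changes as the temperature rises from 0.5 to 2.0.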
</details>
