[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/kgweber-cwru/coding-with-ai-wn26/blob/main/week-1-llm-basics-and-api/concepts.ipynb)

# Week 1: Understanding LLMs and API Basics

## Learning Objectives
By the end of this session, you will:
- Understand how large language models work at a conceptual level
- Successfully make API calls to Vertex AI
- Understand key API parameters and their effects
- Build simple text generation scripts

## Part 1: How LLMs Work (Conceptual Overview)

### Tokens: The Building Blocks
- LLMs don't see words; they see **tokens** (subword pieces of text)
- For English text, a token is roughly 3-4 characters, or about 0.75 words
- "Hello world" ≈ 2-3 tokens
- This matters for cost and context limits!
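Here is a minimal sketch that turns the rule of thumb above into a ballpark estimate from character length. It assumes about 4 characters per token for English text, so treat the result as approximate. Exact counts depend on the model's tokenizer, and the `google-genai` SDK can return them through `client.models.count_tokens`. The function `estimate_tokens` is a hypothetical helper, not part of any library.

```python
import math

def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Ballpark token count using the ~4 characters/token rule of thumb (English text)."""
    if not text:
        return 0
    # Round up so any non-empty text counts as at least one token
    return math.ceil(len(text) / chars_per_token)

for sample in ["Hello world", "Tokenization matters for cost and context limits!"]:
    print(f"{sample!r}: ~{estimate_tokens(sample)} tokens")
```

With 11 characters, "Hello world" comes out at about 3 tokens, in line with the 2-3 token estimate above.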

### Prediction and Probability
- LLMs predict the next token based on all previous tokens
- They assign probabilities to many possible next tokens
- They don't "know" things; they predict statistically likely continuations
- Temperature controls randomness in selection
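You can see the "predict the next token" idea in a deliberately tiny model. This sketch is a word-level bigram counter, a hypothetical teaching toy and not how Gemini works. It looks only at the previous word, whereas real LLMs run a neural network over subword tokens and the entire preceding context. The principle is the same: turn observations into a probability distribution over what comes next, then choose from that distribution.

```python
from __future__ import annotations

from collections import Counter, defaultdict

def build_bigram_model(corpus: str) -> dict[str, dict[str, float]]:
    """For each word, estimate P(next word | word) from counts in the corpus."""
    words = corpus.lower().split()
    counts: dict[str, Counter] = defaultdict(Counter)
    for current, nxt in zip(words, words[1:]):
        counts[current][nxt] += 1
    return {
        word: {nxt: n / sum(followers.values()) for nxt, n in followers.items()}
        for word, followers in counts.items()
    }

def predict_next(model: dict[str, dict[str, float]], word: str) -> str | None:
    """Greedy choice: the most probable next word, or None if the word was never seen."""
    dist = model.get(word.lower())
    if not dist:
        return None
    return max(dist, key=dist.get)

corpus = "the cat sat on the mat the cat ate the fish"
model = build_bigram_model(corpus)
print(model["the"])                # {'cat': 0.5, 'mat': 0.25, 'fish': 0.25}
print(predict_next(model, "the"))  # cat
```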

### Key Limitations
- No real-time information (knowledge cutoff dates)
- Can "hallucinate" plausible-sounding but false information
- Often miscount characters, letters, or words, because they process tokens rather than individual characters
- Context window limits (how much text they can "remember")

## Part 2: Setting Up Your Environment


In [None]:
import os
import sys
from pathlib import Path

IN_COLAB = "google.colab" in sys.modules

if IN_COLAB:
    !pip install -q google-genai google-auth python-dotenv
    from google.colab import auth
    auth.authenticate_user()
    try:
        PROJECT_ID = input("Enter your Google Cloud Project ID (press Enter to use default ADC): ").strip()
    except Exception:
        PROJECT_ID = ""
    if PROJECT_ID:
        os.environ["GOOGLE_CLOUD_PROJECT"] = PROJECT_ID
else:
    def find_service_account_json(max_up=6):
        p = Path.cwd()
        for _ in range(max_up):
            candidate = p / "series-2-coding-llms" / "creds"
            if candidate.exists():
                for f in candidate.glob("*.json"):
                    return str(f.resolve())
            candidate2 = p / "creds"
            if candidate2.exists():
                for f in candidate2.glob("*.json"):
                    return str(f.resolve())
            p = p.parent
        return None

    sa_path = find_service_account_json()
    if sa_path:
        os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = sa_path
    else:
        try:
            from dotenv import load_dotenv
            load_dotenv()
        except Exception:
            pass

In [2]:

import google.auth
from google import genai
from google.genai import types

creds, project = google.auth.default()
project = os.environ.get("GOOGLE_CLOUD_PROJECT", project)
client = genai.Client(vertexai=True, project=project, location="us-central1")
print(f"Using project: {project}")

print("✓ Environment loaded successfully!")

Using project: coding-with-ai-wn-26
✓ Environment loaded successfully!


## Part 3: Your First API Call

The basic structure of a Vertex AI API call:
```python
response = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="Your prompt here"
)
```

In [3]:
# Simple completion
response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Say hello!"
)

print(response.text)

Hello there! It's nice to meet you. How can I help you today?


### Understanding the Response Object

Let's examine what the API returns:

In [4]:
# Make another call and explore the response
response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="What is 2+2?"
)

print("Full response object:")
print(response)
print("\n" + "="*50 + "\n")

print("Just the content:")
print(response.text)
print("\n" + "="*50 + "\n")

print("Token usage:")
# Check if usage metadata is available
if response.usage_metadata:
    print(f"Prompt tokens: {response.usage_metadata.prompt_token_count}")
    print(f"Completion tokens: {response.usage_metadata.candidates_token_count}")
    print(f"Total tokens: {response.usage_metadata.total_token_count}")

Full response object:
sdk_http_response=HttpResponse(
  headers=<dict len=10>
) candidates=[Candidate(
  avg_logprobs=-0.05227931908198765,
  content=Content(
    parts=[
      Part(
        text='2 + 2 = 4'
      ),
    ],
    role='model'
  ),
  finish_reason=<FinishReason.STOP: 'STOP'>
)] create_time=datetime.datetime(2026, 1, 25, 19, 15, 10, 405261, tzinfo=TzInfo(0)) model_version='gemini-2.5-flash-lite' prompt_feedback=None response_id='vmt2aY3eGOmQi-8PzJqLyAw' usage_metadata=GenerateContentResponseUsageMetadata(
  candidates_token_count=7,
  candidates_tokens_details=[
    ModalityTokenCount(
      modality=<MediaModality.TEXT: 'TEXT'>,
      token_count=7
    ),
  ],
  prompt_token_count=7,
  prompt_tokens_details=[
    ModalityTokenCount(
      modality=<MediaModality.TEXT: 'TEXT'>,
      token_count=7
    ),
  ],
  total_token_count=14,
  traffic_type=<TrafficType.ON_DEMAND: 'ON_DEMAND'>
) automatic_function_calling_history=[] parsed=None


Just the content:
2 + 2 = 4


Token usage:
Prompt tokens: 7
Completion tokens: 7
Total tokens: 14
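If you check usage often, you can fold the `if response.usage_metadata:` guard from the cell above into a small helper. This is a sketch, not an SDK function: `usage_summary` is a hypothetical name, and it treats any missing count as 0. A `SimpleNamespace` stands in for the real response object so the example runs without an API call.

```python
from types import SimpleNamespace

def usage_summary(response) -> dict[str, int]:
    """Pull token counts from a generate_content response, treating missing values as 0."""
    usage = getattr(response, "usage_metadata", None)
    if usage is None:
        return {"prompt": 0, "completion": 0, "total": 0}
    prompt = usage.prompt_token_count or 0
    completion = usage.candidates_token_count or 0
    # Prefer the API's own total; fall back to the sum if it is missing
    total = usage.total_token_count or (prompt + completion)
    return {"prompt": prompt, "completion": completion, "total": total}

# Offline stand-in mirroring the counts printed above; a real call returns a GenerateContentResponse
fake = SimpleNamespace(usage_metadata=SimpleNamespace(
    prompt_token_count=7, candidates_token_count=7, total_token_count=14))
print(usage_summary(fake))  # {'prompt': 7, 'completion': 7, 'total': 14}
```

In the notebook you would call `usage_summary(response)` on a real response.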

## Part 4: Key API Parameters

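### Temperature (0.0 to 2.0)
Controls how random the choice of each next token is:
- **0.0**: Nearly deterministic; the model (almost) always picks the most likely token
- **0.7**: Balanced; a common setting for general use (Gemini 2.5 models default to 1.0)
- **1.5+**: Very creative/random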
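Temperature is usually explained like this: the model's raw scores (logits) are divided by the temperature before a softmax turns them into probabilities. The sketch below follows that textbook formulation and uses made-up scores for three candidate tokens. It is an illustration under those assumptions, not Gemini's exact sampling pipeline, which also involves settings such as `top_p` and `top_k`.

```python
import math

def softmax_with_temperature(logits: dict, temperature: float) -> dict:
    """Turn raw next-token scores into probabilities at a given temperature."""
    if temperature <= 0:
        # Treat 0 as greedy decoding: all probability on the highest-scoring token
        best = max(logits, key=logits.get)
        return {tok: (1.0 if tok == best else 0.0) for tok in logits}
    scaled = {tok: score / temperature for tok, score in logits.items()}
    top = max(scaled.values())  # subtract the max before exp() for numerical stability
    exps = {tok: math.exp(s - top) for tok, s in scaled.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Made-up scores for three candidate next tokens (illustrative only)
logits = {" the": 2.0, " being": 1.0, " solving": 0.5}
for t in [0.0, 0.7, 1.5]:
    probs = softmax_with_temperature(logits, t)
    print(f"T={t}:", {tok: round(p, 3) for tok, p in probs.items()})
```

Lower temperatures put more of the probability on the top token. Higher temperatures spread it across the alternatives, so less likely tokens get picked more often.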

In [5]:
# Let's compare different temperatures
prompt = "Complete this sentence: The best thing about learning to code is"

for temp in [0.0, 0.7, 1.5]:
    response = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents=prompt,
        config=types.GenerateContentConfig(
            temperature=temp,
            max_output_tokens=50
        )
    )
    print(f"Temperature {temp}:")
    print(response.text)
    print("\n" + "-"*50 + "\n")

Temperature 0.0:
Here are a few ways to complete that sentence, depending on the emphasis you want to make:

**Focusing on the creative aspect:**

* The best thing about learning to code is **the ability to bring your ideas to life and build something

--------------------------------------------------

Temperature 0.7:
Here are a few ways to complete that sentence, depending on the emphasis you want to place:

**Focusing on Empowerment & Creation:**

* The best thing about learning to code is **the power it gives you to build and create anything you

--------------------------------------------------

Temperature 1.5:
Here are a few ways to complete the sentence, each highlighting a different aspect of the "best thing" about learning to code:

**Focusing on creation and empowerment:**

* The best thing about learning to code is **the power to bring

--------------------------------------------------



### Max Tokens
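`max_output_tokens` caps how many tokens the model may generate. When the cap is reached, the reply stops, often mid-sentence. The limit helps control cost, but it can also cut answers short.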
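Because a capped reply simply stops, it helps to check *why* generation ended. Each candidate in the response has a `finish_reason`. The printed response object above shows `FinishReason.STOP`, and the SDK's `FinishReason` enum also includes `MAX_TOKENS` for replies cut off by the limit. The helper `was_truncated` below is a hypothetical sketch of that check. It is tested against a stand-in enum so it runs offline; with a real response you would call `was_truncated(response)`.

```python
from enum import Enum
from types import SimpleNamespace

def was_truncated(response) -> bool:
    """True if the first candidate stopped because it reached max_output_tokens."""
    candidates = getattr(response, "candidates", None) or []
    if not candidates:
        return False
    reason = candidates[0].finish_reason
    # Compare by name so this works for an Enum member or a plain string
    return getattr(reason, "name", reason) == "MAX_TOKENS"

# Offline stand-ins; a real call returns google.genai.types.FinishReason values
class FakeFinishReason(Enum):
    STOP = "STOP"
    MAX_TOKENS = "MAX_TOKENS"

cut_off = SimpleNamespace(candidates=[SimpleNamespace(finish_reason=FakeFinishReason.MAX_TOKENS)])
finished = SimpleNamespace(candidates=[SimpleNamespace(finish_reason=FakeFinishReason.STOP)])
print(was_truncated(cut_off), was_truncated(finished))  # True False
```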

In [6]:
# Compare different max_output_tokens
prompt = "Explain what a large language model is."

for max_tok in [20, 50, 150]:
    response = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents=prompt,
        config=types.GenerateContentConfig(
            max_output_tokens=max_tok
        )
    )
    print(f"Max tokens: {max_tok}")
    print(response.text)
    if response.usage_metadata:
        print(f"Actual tokens used: {response.usage_metadata.candidates_token_count}")
    print("\n" + "-"*50 + "\n")

Max tokens: 20
Let's break down what a **Large Language Model (LLM)** is in an understandable way
Actual tokens used: 20

--------------------------------------------------

Max tokens: 50
Let's break down what a **Large Language Model (LLM)** is in an understandable way.

Imagine a super-powered digital brain that's been trained on an absolutely massive amount of text and code from the internet. That's
Actual tokens used: 50

--------------------------------------------------

Max tokens: 150
A **Large Language Model (LLM)** is a type of artificial intelligence (AI) program designed to understand, generate, and process human language. Think of it as a highly sophisticated and massive computer program that has learned the intricacies of language through exposure to vast amounts of text and code.

Here's a breakdown of what makes them "large" and what they do:

**1. "Large" - The Scale of Data and Parameters:**

*   **Massive Datasets:** LLMs are trained on truly enormous datasets. This

### System Instructions
A system instruction sets the model's role, tone, and rules for the whole request, separately from the user's prompt (`contents`):

In [7]:
# Without system message
response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="What is DNA?"
)
print("Without system message:")
print(response.text)
print("\n" + "="*50 + "\n")

# With system instruction
response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="What is DNA?",
    config=types.GenerateContentConfig(
        system_instruction="You are a biology teacher explaining concepts to 10-year-olds. Use simple language and fun analogies."
    )
)
print("With system message (10-year-old level):")
print(response.text)

Without system message:
**DNA, which stands for Deoxyribonucleic Acid, is the fundamental molecule of life.** It's like a blueprint or a set of instructions that tells every living organism how to develop, survive, and reproduce.

Here's a breakdown of what makes up DNA and its key functions:

**1. The Structure: A Double Helix**

*   **Nucleotides:** DNA is a polymer, meaning it's a long chain made of repeating units called nucleotides. Each nucleotide has three main components:
    *   **A Phosphate Group:** A phosphorus atom bonded to oxygen atoms.
    *   **A Deoxyribose Sugar:** A five-carbon sugar molecule.
    *   **A Nitrogenous Base:** These are the "letters" of the genetic code. There are four types of nitrogenous bases in DNA:
        *   **Adenine (A)**
        *   **Guanine (G)**
        *   **Cytosine (C)**
        *   **Thymine (T)**

*   **The "Ladder" Shape:** These nucleotides link together to form long strands. The sugar and phosphate groups alternate, forming the "b

## Part 5: Practical Examples

### Example 1: Text Summarization

In [8]:
long_text = """
Large language models are artificial intelligence systems trained on vast amounts of text data. 
They learn patterns in language by predicting the next word in a sequence. These models have billions 
of parameters and can generate human-like text, answer questions, write code, and perform various 
language tasks. They work by converting text into numerical representations called tokens, processing 
these tokens through neural network layers, and generating probability distributions for likely next tokens.
"""

response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents=long_text,
    config=types.GenerateContentConfig(
        system_instruction="Summarize the following text in one sentence.",
        temperature=0.3
    )
)

print("Summary:")
print(response.text)

Summary:
Large language models are AI systems trained on massive text datasets that use neural networks to process numerical representations of words and generate human-like text for various tasks.
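The model does not always follow instructions exactly, so spot-check its output. This hypothetical `summary_stats` helper counts words and sentences, which lets you confirm the reply really is one sentence and shorter than the source. The sentence split is a naive regex and will miscount abbreviations such as "e.g.". In the notebook you would call `summary_stats(long_text, response.text)`. Here the source and summary strings are copied in so the sketch runs on its own.

```python
import re

def summary_stats(source: str, summary: str) -> dict:
    """Word/sentence counts and compression ratio for a model-written summary."""
    source_words = len(source.split())
    summary_words = len(summary.split())
    # Naive split on ., ! or ?; abbreviations like "e.g." or numbers like "3.5" will miscount
    sentences = [s for s in re.split(r"[.!?]+", summary) if s.strip()]
    return {
        "source_words": source_words,
        "summary_words": summary_words,
        "sentences": len(sentences),
        "compression": summary_words / source_words if source_words else 0.0,
    }

source = (
    "Large language models are artificial intelligence systems trained on vast amounts of text data. "
    "They learn patterns in language by predicting the next word in a sequence. These models have billions "
    "of parameters and can generate human-like text, answer questions, write code, and perform various "
    "language tasks. They work by converting text into numerical representations called tokens, processing "
    "these tokens through neural network layers, and generating probability distributions for likely next tokens."
)
summary = (
    "Large language models are AI systems trained on massive text datasets that use neural networks "
    "to process numerical representations of words and generate human-like text for various tasks."
)
stats = summary_stats(source, summary)
print(stats)
print("One sentence?", stats["sentences"] == 1)
```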


### Example 2: Text Classification

In [9]:
def classify_sentiment(text):
    response = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents=text,
        config=types.GenerateContentConfig(
            system_instruction="Classify the sentiment of the text as: positive, negative, or neutral. Respond with only one word.",
            temperature=0,
            max_output_tokens=10
        )
    )
    return response.text.strip()

# Test it
test_texts = [
    "I love this new feature!",
    "This is the worst experience ever.",
    "The product arrived on time."
]

for text in test_texts:
    sentiment = classify_sentiment(text)
    print(f"Text: {text}")
    print(f"Sentiment: {sentiment}")
    print()

Text: I love this new feature!
Sentiment: Positive

Text: This is the worst experience ever.
Sentiment: Negative

Text: The product arrived on time.
Sentiment: Positive
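Notice that the model answered "Positive" with a capital letter even though the instruction listed lowercase labels. When downstream code depends on an exact label, normalize and validate the reply before using it. Here is a minimal sketch, assuming the three labels from the system instruction. `normalize_label` is a hypothetical helper; you would wrap the existing function as `normalize_label(classify_sentiment(text))`.

```python
import string
from typing import Optional

ALLOWED_LABELS = ("positive", "negative", "neutral")

def normalize_label(raw: Optional[str], allowed: tuple = ALLOWED_LABELS) -> str:
    """Map a model reply such as 'Positive' or ' negative. ' to an allowed label, else 'unknown'."""
    if not raw:
        return "unknown"  # empty or missing reply
    # Drop surrounding whitespace and punctuation, then compare case-insensitively
    cleaned = raw.strip(string.whitespace + string.punctuation).lower()
    return cleaned if cleaned in allowed else "unknown"

for reply in ["Positive", " negative. ", "Neutral!", "Mostly positive", None]:
    print(repr(reply), "->", normalize_label(reply))
```

Anything that isn't exactly one allowed label maps to `"unknown"`. This makes unexpected replies easy to count and review instead of silently passing through.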



### Example 3: Information Extraction

In [10]:
def extract_info(text):
    response = client.models.generate_content(
        model="gemini-2.5-flash-lite",
        contents=text,
        config=types.GenerateContentConfig(
            system_instruction="Extract the person's name, email, and phone number from the text. Format as: Name: X | Email: Y | Phone: Z",
            temperature=0
        )
    )
    return response.text

contact_text = "Hi, I'm John Smith. You can reach me at john.smith@email.com or call me at 555-123-4567."

extracted = extract_info(contact_text)
print("Extracted information:")
print(extracted)

Extracted information:
Name: John Smith | Email: john.smith@email.com | Phone: 555-123-4567
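The pipe-delimited format requested in the system instruction is easy to turn into a dictionary. This sketch uses two hypothetical helpers, `parse_contact` and `is_grounded`. The second checks that every extracted value appears verbatim in the source text, a cheap guard against hallucinated fields. The email regex is only a loose shape check, not a full validator. In the notebook you would pass `extracted` and `contact_text`; here they are copied in as literals.

```python
import re

def parse_contact(line: str) -> dict:
    """Turn 'Name: X | Email: Y | Phone: Z' into {'name': X, 'email': Y, 'phone': Z}."""
    fields = {}
    for part in line.split("|"):
        key, sep, value = part.partition(":")  # split on the first colon only
        if sep:
            fields[key.strip().lower()] = value.strip()
    return fields

def is_grounded(fields: dict, source: str) -> bool:
    """True if every extracted value is non-empty and appears verbatim in the source text."""
    return all(value and value in source for value in fields.values())

# Loose shape check only; real email validation is much more involved
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

source = "Hi, I'm John Smith. You can reach me at john.smith@email.com or call me at 555-123-4567."
fields = parse_contact("Name: John Smith | Email: john.smith@email.com | Phone: 555-123-4567")
print(fields)
print("grounded:", is_grounded(fields, source))
print("email shape ok:", bool(EMAIL_RE.match(fields.get("email", ""))))
```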


## Part 6: Cost Awareness

Understanding and tracking your API costs:

In [11]:
# Pricing (example rates in USD per 1M tokens)
# Gemini 2.5 Flash-Lite: ~$0.10 per 1M input tokens, ~$0.40 per 1M output tokens

def estimate_cost(response, model="gemini-2.5-flash-lite"):
    """Estimate the cost of an API call"""
    # Example pricing (verify at cloud.google.com/vertex-ai/pricing)
    pricing = {
        "gemini-2.5-flash-lite": {"input": 0.10, "output": 0.40}, 
        "gemini-2.5-pro": {"input": 1.25, "output": 10}
    }
    
    # Handle unknown models or default
    if model not in pricing:
        model = "gemini-2.5-flash-lite"
        
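    usage = response.usage_metadata
    if not usage:
        # Return every key so callers (like the print below) never hit a KeyError
        return {"input_tokens": 0, "output_tokens": 0, "total_tokens": 0,
                "cost_usd": 0.0, "cost_cents": 0.0}

    # Counts can be None (e.g. an empty reply), so treat missing values as 0
    input_tokens = usage.prompt_token_count or 0
    # Thinking tokens, when a model uses them, are billed as output tokens
    output_tokens = (usage.candidates_token_count or 0) + (getattr(usage, "thoughts_token_count", None) or 0)
    total_tokens = usage.total_token_count or 0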

    input_cost = (input_tokens / 1_000_000) * pricing[model]["input"]
    output_cost = (output_tokens / 1_000_000) * pricing[model]["output"]
    total_cost = input_cost + output_cost
    
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": total_tokens,
        "cost_usd": total_cost,
        "cost_cents": total_cost * 100
    }

# Test it
response = client.models.generate_content(
    model="gemini-2.5-flash-lite",
    contents="Write a haiku about programming."
)

print(response.text)
print("\n" + "="*50)
cost_info = estimate_cost(response, model="gemini-2.5-flash-lite")
print(f"\nTokens used: {cost_info['total_tokens']}")
print(f"Estimated cost: ${cost_info['cost_usd']:.8f} (or {cost_info['cost_cents']:.6f} cents)")

Code flows like a stream,
Logic weaves a tapestry,
World is built with lines.


Tokens used: 26
Estimated cost: $0.00000830 (or 0.000830 cents)
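A per-call cost this small looks negligible, which is exactly why it's worth projecting to scale. The printed totals are 26 tokens and $0.0000083 at $0.10/$0.40 per 1M tokens. Solving for the two token counts gives 7 input and 19 output tokens, since 7 × 0.10 + 19 × 0.40 = 8.3 per million. The hypothetical `project_cost` helper below multiplies that out using the example prices from the cell above. Check the Vertex AI pricing page for current rates.

```python
def project_cost(input_tokens: int, output_tokens: int, calls: int = 1,
                 input_price_per_m: float = 0.10,
                 output_price_per_m: float = 0.40) -> float:
    """Estimated USD cost of `calls` identical requests; prices are per 1M tokens."""
    per_call = (input_tokens * input_price_per_m
                + output_tokens * output_price_per_m) / 1_000_000
    return per_call * calls

# Haiku call from the cell above: 7 input + 19 output tokens (derived from its printed totals)
for calls in [1, 1_000, 1_000_000]:
    lite = project_cost(7, 19, calls)              # Flash-Lite example prices
    pro = project_cost(7, 19, calls, 1.25, 10.0)   # 2.5 Pro example prices
    print(f"{calls:>9,} calls  Flash-Lite ${lite:,.6f}   Pro ${pro:,.6f}")
```

At these example prices, a million such calls cost about $8.30 on Flash-Lite and about $198.75 on 2.5 Pro.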


## Key Takeaways

1. **LLMs predict tokens** based on probability; they don't "know" facts
2. **API structure** is simple: model + contents + parameters
3. **Temperature** controls randomness (0 = nearly deterministic, higher = more creative)
4. **max_output_tokens** limits response length and controls costs
5. **System instructions** shape the model's behavior
6. **Always monitor costs**: even small calls add up!

## Next Steps

In Week 2, we'll learn how to:
- Maintain conversation history
- Build multi-turn conversations
- Manage context effectively

Complete the assignment to practice these concepts!