# DS2002 — What an API Is (and How to Use One)
## 2026-02-04 — APIs in Notebooks — Lecture

In this course, we care about **data science systems**: how data moves from where it is created to where it is used.

So far, you have worked with:
- data that was already on your machine
- data you loaded into SQLite yourself

In professional work, you often need data that lives somewhere else:
- a company's servers
- a government dataset service
- a cloud platform
- a public data provider

That is where APIs come in.

This lecture is deliberately detailed. The goal is not just to “get something working,” but to build a mental model you can reuse.



## 1. The Problem APIs Solve

Imagine you want the same piece of information repeatedly, for many different inputs.

Examples:
- “Give me a list of universities in a country.”
- “Give me today’s weather for a city.”
- “Give me the last 30 days of stock prices.”
- “Give me all orders placed yesterday.”

You *could* download a file every time and re-import it, but that does not scale and does not stay current.

Instead, modern systems expose data through a controlled, structured interface:
- you send a request
- you receive a response
- the data is formatted consistently
- the provider can apply rules (limits, authentication, logging)

That interface is an API.



## 2. What an API Is (in Plain English)

API stands for **Application Programming Interface**.

That sounds abstract, so use this definition for this course:

An API is a **published way to ask a system for data or actions**, using a structured request format that the system agrees to support.

When we say “call an API,” we mean:
- your program sends a request in a standard way
- the service answers with data (often JSON)

Most APIs we use in data work today are **web APIs**, meaning the request travels over the internet using HTTP.



## 3. The Restaurant Model 

A restaurant is a good analogy because it includes all the important ideas:

- The **menu** is the documentation: it tells you what you can ask for.
- You don’t walk into the kitchen; you place an **order**.
- The waiter takes your request to the kitchen and brings back a **response**.
- The restaurant can say “no” if you ask for something unsupported.
- If you come too often, they can slow you down or refuse service.

APIs work the same way:
- the provider decides what is available
- you must request it in the format they accept
- you receive a structured response



## 4. HTTP: The Transport Layer for Web APIs

Most web APIs use **HTTP**, the same protocol your browser uses.

When you visit a webpage, your browser sends an HTTP request.

When you call an API, your Python code sends an HTTP request.

The difference is mainly in *what comes back*:
- webpages return HTML meant for humans
- APIs return machine-readable data (usually JSON)

In this course, we will focus on HTTP `GET` requests because they are the most common way to retrieve data.



## 5. Anatomy of a Request (URL + Method + Optional Extras)

A typical API request includes:

1. **Method** (GET, POST, etc.)
2. **URL** (where to send the request)
3. **Query parameters** (extra inputs in the URL)
4. **Headers** (metadata, often authentication)
5. **Body** (for POST/PUT, usually not needed for simple GET)

We will learn each part by actually making requests.
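To see these parts as separate pieces before sending anything, the sketch below builds a request with `requests` but does not send it. `Request(...).prepare()` assembles the final method, URL, headers, and body without touching the network, which makes it a convenient way to inspect what would be sent. The `Accept` header is an illustrative choice on our part; the Universities API does not require it.

```python
import requests

# Build the request piece by piece; nothing is sent over the network.
req = requests.Request(
    method="GET",                                   # 1. method
    url="http://universities.hipolabs.com/search",  # 2. URL
    params={"name": "Virginia"},                    # 3. query parameters
    headers={"Accept": "application/json"},         # 4. headers (illustrative)
)                                                   # 5. no body for a simple GET

prepared = req.prepare()
print(prepared.method)             # the HTTP method
print(prepared.url)                # URL with the query string attached
print(prepared.headers["Accept"])  # the header we set
print(prepared.body)               # None: a plain GET carries no body
```

Sending it would be `requests.Session().send(prepared)`; in everyday code, `requests.get(url, params=..., headers=...)` does the building and sending in one step.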



## 6. Kaggle Note: Internet Access

Kaggle notebooks can only reach external APIs when internet access is enabled for the notebook.

If a request fails with a connection error, check:
- Notebook settings → Internet → On

Every API in this lecture serves public data; nothing here requires special access.



## 7. Setup: Python Tools We Use

We will use:
- `requests` to make HTTP calls
- `json` for pretty-printing
- `pandas` to convert JSON into DataFrames
- `urlencode` (from `urllib.parse`) to build a query string by hand, for comparison with `params=`

Run the next cell.


In [1]:

import requests
import json
import pandas as pd
from urllib.parse import urlencode

print("Ready.")


Ready.



## 8. Your First API Call (No Key)

We start with an API that requires **no key**:

**Hipolabs Universities API**

This API returns universities around the world.

The endpoint we will use is:
- `http://universities.hipolabs.com/search`

This endpoint accepts query parameters such as:
- `name` (university name)
- `country` (country name)

We will ask for universities with “Virginia” in the name.


In [2]:

url = "http://universities.hipolabs.com/search"
params = {"name": "Virginia"}

resp = requests.get(url, params=params)
print("Status code:", resp.status_code)
resp.headers.get("Content-Type")


Status code: 200


'application/json'


### Interpreting the Response Code

A status code tells you what happened.

Common ones you will see:
- 200: OK (success)
- 400: Bad request (your input is wrong)
- 401: Unauthorized (missing/invalid key)
- 403: Forbidden (you are blocked, or key lacks permission)
- 404: Not found (wrong endpoint, or the resource does not exist)
- 429: Too many requests (rate limited)
- 5xx (500-599): server problem (their side, not your request)

When debugging an API call, always check the status code first.
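The list above can be turned into a small debugging aid. This sketch uses Python's standard `http.HTTPStatus` enum for the official reason phrases and classifies a code by its range; the function name `explain_status` and the hint wording are our own.

```python
from http import HTTPStatus


def explain_status(code: int) -> str:
    """Return a short debugging hint for an HTTP status code."""
    # Official reason phrase from the standard library, e.g. 404 -> "Not Found".
    try:
        phrase = HTTPStatus(code).phrase
    except ValueError:
        phrase = "Unknown status"

    # Classify by range: 2xx success, 4xx your request, 5xx the server.
    if 200 <= code < 300:
        family = "success"
    elif 400 <= code < 500:
        family = "client error: check your request"
    elif 500 <= code < 600:
        family = "server error: retry later"
    else:
        family = "other"
    return f"{code} {phrase} ({family})"


for code in (200, 401, 404, 429, 503):
    print(explain_status(code))
```

In a notebook you would call it as `explain_status(resp.status_code)` right after a request.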


In [3]:

data = resp.json()
type(data), len(data)


(list, 27)


### What Did We Get Back?

Most APIs return JSON.

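In Python, parsed JSON is made of:
- dictionaries (JSON objects: key/value pairs)
- lists (JSON arrays), very often lists of dictionaries
- nested combinations of the two, plus plain values such as strings, numbers, booleans, and `None` (JSON `null`)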

Here, the response is a **list** of universities, where each item is a dictionary.

Let’s look at the first item.


In [4]:

data[0]


{'country': 'United States',
 'domains': ['evms.edu'],
 'web_pages': ['https://www.evms.edu/'],
 'name': 'Eastern Virginia Medical School',
 'state-province': None,
 'alpha_two_code': 'US'}


## 9. Pretty Printing JSON 

Raw JSON can be hard to read because of nesting.

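`json.dumps(..., indent=2)` turns a Python object back into JSON text with line breaks and two-space indentation, which is a quick way to read it in a notebook.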


In [5]:

print(json.dumps(data[0], indent=2))


{
  "country": "United States",
  "domains": [
    "evms.edu"
  ],
  "web_pages": [
    "https://www.evms.edu/"
  ],
  "name": "Eastern Virginia Medical School",
  "state-province": null,
  "alpha_two_code": "US"
}
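Notice that `state-province` appears as `None` in the Python dictionary but as `null` in the printed JSON. `resp.json()` parses JSON text into Python objects and `json.dumps` converts them back. The sketch below shows that mapping on a hand-written string modeled on the record above (it is not a live API response).

```python
import json

# A JSON string shaped like one record from the Universities API.
raw = '{"name": "Eastern Virginia Medical School", "domains": ["evms.edu"], "state-province": null}'

record = json.loads(raw)                 # JSON text -> Python objects
print(type(record).__name__)             # dict  (JSON object)
print(type(record["domains"]).__name__)  # list  (JSON array)
print(record["state-province"] is None)  # True  (JSON null -> None)

# Converting back turns None into null again.
print(json.dumps(record["state-province"]))  # null
```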



## 10. Turning JSON Into a DataFrame

A major reason we like APIs in data science systems:
- APIs give us structured JSON
- Pandas can convert structured JSON into a DataFrame

When the JSON is a list of “similar” dictionaries, this is straightforward.
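"Similar" matters: if some dictionaries lack a key, pandas still builds the table but fills the gaps with missing values. A small sketch with two hand-written records that use the API's field names (they are illustrative, not real responses):

```python
import pandas as pd

records = [
    {"name": "Southern Virginia University", "state-province": "Virginia"},
    {"name": "West Virginia State University"},  # no "state-province" key at all
]

df_demo = pd.DataFrame(records)
print(df_demo.columns.tolist())                   # columns are the union of all keys
print(df_demo["state-province"].isna().tolist())  # the missing key became a missing value
```

This is why it is worth checking `isna()` counts after converting an API response.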


In [6]:

df_unis = pd.DataFrame(data)
df_unis.head()


|   | country | domains | web_pages | name | state-province | alpha_two_code |
|---|---|---|---|---|---|---|
| 0 | United States | [evms.edu] | [https://www.evms.edu/] | Eastern Virginia Medical School | | US |
| 1 | United States | [svu.edu] | [https://svu.edu/] | Southern Virginia University | Virginia | US |
| 2 | United States | [wvstate.edu] | [https://www.wvstateu.edu/] | West Virginia State University | | US |
| 3 | United States | [cvcc.vccs.edu] | [http://www.cvcc.vccs.edu] | Central Virginia Community College | | US |
| 4 | United States | [pvcc.edu] | [http://www.pvcc.edu] | Piedmont Virginia Community College | | US |



### DataFrames Make the Next Step Easy

Once it is a DataFrame, you can:
- select columns
- filter rows
- sort values
- count occurrences
- build summaries

That is exactly why the “API → DataFrame” pattern is common.
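The next cell shows selecting columns. Filtering, sorting, and counting look like this on a five-row DataFrame typed in from the output above (blank `state-province` cells entered as `None`), so no API call is needed.

```python
import pandas as pd

# Five rows copied from the output above.
df_small = pd.DataFrame({
    "name": [
        "Eastern Virginia Medical School",
        "Southern Virginia University",
        "West Virginia State University",
        "Central Virginia Community College",
        "Piedmont Virginia Community College",
    ],
    "state-province": [None, "Virginia", None, None, None],
})

# Filter rows: community colleges only.
cc = df_small[df_small["name"].str.contains("Community College")]

# Sort values alphabetically by name.
sorted_names = df_small.sort_values("name")["name"].tolist()

# Count: how many rows have a state-province value?
has_state = df_small["state-province"].notna().sum()

print(len(cc), has_state)
print(sorted_names[0])
```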


In [7]:

df_unis[['name', 'country', 'state-province']].head(10)


|   | name | country | state-province |
|---|---|---|---|
| 0 | Eastern Virginia Medical School | United States | |
| 1 | Southern Virginia University | United States | Virginia |
| 2 | West Virginia State University | United States | |
| 3 | Central Virginia Community College | United States | |
| 4 | Piedmont Virginia Community College | United States | |
| 5 | Southside Virginia Community College | United States | |
| 6 | Southwest Virginia Community College | United States | |
| 7 | Eastern West Virginia Community and Technical ... | United States | |
| 8 | Southern West Virginia Community and Technical... | United States | |
| 9 | West Virginia Northern Community College | United States | |



## 11. Passing Inputs: URL vs `params=`

You will see both of these in the wild.

They are equivalent:

**Method A: in the URL**
- `...?name=Virginia&country=United+States`

**Method B: use the `params` dictionary**
- `requests.get(url, params={...})`

Professionally, we prefer `params=` because:
- it is harder to accidentally break the URL
- it handles URL encoding automatically
- it is easier to add/remove parameters
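The encoding point matters as soon as a value contains special characters. In this sketch the name `"Texas A&M"` is only an illustrative input: pasting it into the URL by hand yields a query string where `&M` looks like the start of a second parameter, while `urlencode` (the same kind of encoding `params=` applies) escapes it.

```python
from urllib.parse import urlencode

base_url = "http://universities.hipolabs.com/search"

# A value with a space and an ampersand; "&" normally separates parameters.
name = "Texas A&M"

naive_url = base_url + "?name=" + name                # "&M" now looks like a new parameter
safe_url = base_url + "?" + urlencode({"name": name})  # space -> "+", "&" -> "%26"

print(naive_url)
print(safe_url)  # http://universities.hipolabs.com/search?name=Texas+A%26M
```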


In [8]:

base_url = "http://universities.hipolabs.com/search"
params = {"name": "Virginia", "country": "United States"}

# Build a URL the manual way (for learning)
manual_url = base_url + "?" + urlencode(params)
manual_url


'http://universities.hipolabs.com/search?name=Virginia&country=United+States'

In [9]:

# Manual URL request
resp_manual = requests.get(manual_url)
print("Manual status:", resp_manual.status_code)

# params= request
resp_params = requests.get(base_url, params=params)
print("params= status:", resp_params.status_code)

print("Same response size?", len(resp_manual.json()) == len(resp_params.json()))


Manual status: 200
params= status: 200
Same response size? True



## 12. A Reusable “Safe GET” Helper

When you start calling APIs frequently, you want a consistent pattern:
- call the endpoint
- check the status
- provide a helpful error message
- return JSON if successful

This function is a good baseline.


In [None]:

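def get_json(url, params=None, headers=None, timeout=30):
    """GET a URL and return the parsed JSON, or raise with debugging details."""
    # timeout stops the call from hanging forever if the server never answers
    resp = requests.get(url, params=params, headers=headers, timeout=timeout)
    if resp.status_code != 200:
        # Provide debugging details: status, final URL (with params), start of the body
        msg = f"Request failed: {resp.status_code}\nURL: {resp.url}\n"
        try:
            msg += f"Response text (first 300 chars): {resp.text[:300]}"
        except Exception:
            pass
        raise RuntimeError(msg)
    return resp.json()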

# quick test
test = get_json("http://universities.hipolabs.com/search", params={"name": "Virginia"})
len(test)



## 13. APIs Often Return More Than You Need

A normal workflow is:
1. call the API
2. convert to DataFrame
3. keep only the columns you care about
4. clean/standardize values
5. analyze

Let’s do a small example: count how many results appear per country.


In [None]:

df_unis['country'].value_counts().head(10)



## 14. Another No-Key API (Open-Meteo)

Now we will use a different style of API: weather data with latitude/longitude.

**Open-Meteo** is a public weather API that supports requests without a key.

The key learning point:
- not all APIs “look” the same
- the request pattern is consistent: URL + parameters → JSON response

We will request current weather for Charlottesville, VA (approximate coordinates).


In [None]:

url = "https://api.open-meteo.com/v1/forecast"
params = {
    "latitude": 38.03,
    "longitude": -78.48,
    "current_weather": "true"  # lowercase, as in Open-Meteo's docs; requests would send True as "True"
}

weather = get_json(url, params=params)
print(weather.keys())


In [None]:

print(json.dumps(weather["current_weather"], indent=2))



### Why Some APIs Use Coordinates

Many providers standardize location using latitude/longitude because:
- city names can be ambiguous
- spelling varies
- coordinate systems are consistent globally

When you see coordinates in an API, that is often the reason.



## 15. Nested JSON and `pd.json_normalize`

Some APIs return JSON with nesting.
In those cases, `pd.DataFrame()` may not flatten the structure the way you want.

Pandas provides `pd.json_normalize` to flatten nested structures.
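To see the difference before calling the API, here is a hand-written dictionary shaped like a shortened Open-Meteo response. The field names follow the `current_weather` block this lecture requests; the numbers are placeholders, not real readings.

```python
import pandas as pd

# Shortened, hand-written stand-in for an Open-Meteo response.
sample = {
    "latitude": 38.03,
    "longitude": -78.48,
    "current_weather": {"temperature": 4.2, "windspeed": 9.7},
}

# pd.DataFrame treats the inner dict's keys as row labels, not columns.
nested = pd.DataFrame(sample)
print(nested)

# json_normalize flattens nested keys into "parent.child" column names: one row.
flat = pd.json_normalize(sample)
print(flat.columns.tolist())
```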

We will normalize the Open-Meteo response.


In [None]:

df_weather = pd.json_normalize(weather)
df_weather



## 16. APIs With Keys (Authentication)

Many APIs require a key because providers need to:
- prevent abuse
- track usage per user
- enforce rate limits and pricing tiers
- protect data that is not fully public

For this lecture, we will show the pattern using a key-based API.
We will not hardcode a key into the notebook.

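Instead, we read it at run time: from an environment variable on your own machine, or from Kaggle's secrets store on Kaggle.

In Kaggle you can store secrets safely:
- Add-ons → “Secrets”
- Put your key there (example name: `WEATHERAPI_KEY`) and attach it to this notebook

Kaggle does not expose secrets as environment variables automatically; a notebook reads them through `UserSecretsClient` from the `kaggle_secrets` module. Either way, the key itself never appears in the notebook.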



## 17. Key-Based Example: WeatherAPI.com (Pattern Demo)

WeatherAPI.com is a common example of a key-based weather service.

Their request pattern includes:
- the endpoint URL
- a query string
- a key parameter (or header depending on provider)

We will write code that:
- works if a key is present
- explains what to do if it is not


In [None]:

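import os

# Local machines: read the key from an environment variable.
WEATHERAPI_KEY = os.getenv("WEATHERAPI_KEY")

# Kaggle: secrets are not environment variables, so ask Kaggle's secrets client.
if not WEATHERAPI_KEY:
    try:
        from kaggle_secrets import UserSecretsClient
        WEATHERAPI_KEY = UserSecretsClient().get_secret("WEATHERAPI_KEY")
    except Exception:  # not on Kaggle, or the secret is not attached
        WEATHERAPI_KEY = None

if not WEATHERAPI_KEY:
    print("No WEATHERAPI_KEY found.")
    print("In Kaggle: add a secret named WEATHERAPI_KEY, attach it to this notebook, then re-run this cell.")
else:
    print("Key loaded (not printed).")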


In [None]:

if WEATHERAPI_KEY:
    url = "https://api.weatherapi.com/v1/current.json"
    params = {
        "key": WEATHERAPI_KEY,
        "q": "Charlottesville,VA",
        "aqi": "no"
    }
    w = get_json(url, params=params)
    print(json.dumps(w, indent=2)[:800])



### Two Ways APIs Accept Keys

APIs usually accept authentication in one of two ways:

1. **Query parameter**
   - `?key=YOUR_KEY&...`
2. **Header**
   - `Authorization: Bearer YOUR_KEY`

We used a query parameter above because that is how WeatherAPI.com is designed.

Do not assume all APIs use the same method. Always check documentation.
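The two styles can be compared without sending anything by preparing the requests and inspecting them. The endpoint `https://api.example.com/v1/data` and the key value are placeholders, and the `Bearer` scheme is one common header convention; a real API's documentation decides the exact parameter or header name.

```python
import requests

# Placeholder endpoint and key, used only to inspect the prepared requests.
url = "https://api.example.com/v1/data"
api_key = "YOUR_KEY"

# 1) Key as a query parameter (the style WeatherAPI.com uses above).
as_query = requests.Request(
    "GET", url, params={"q": "Charlottesville", "key": api_key}
).prepare()

# 2) Key in an Authorization header.
as_header = requests.Request(
    "GET",
    url,
    params={"q": "Charlottesville"},
    headers={"Authorization": f"Bearer {api_key}"},
).prepare()

print(as_query.url)                        # the key is visible in the URL
print(as_header.url)                       # the key is not in the URL...
print(as_header.headers["Authorization"])  # ...it travels in a header instead
```

One practical difference: URLs tend to end up in logs and browser history, which is one reason many providers prefer header-based keys.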



## 18. Passing Parameters in a URL Path vs Query String

Some APIs encode inputs in the URL path itself.

Examples of what that looks like:
- `/users/123`
- `/weather/Charlottesville`
- `/artists/42/albums`

Other APIs put inputs in query parameters:
- `?user_id=123`
- `?city=Charlottesville`

Both are valid design choices.

For us, the important part is:
- you must learn to read API documentation
- you must learn to translate documentation into `requests.get(...)`
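As a sketch of that translation, here is the same input placed in a path and in a query string for a hypothetical API at `https://api.example.com` (the host and the `/weather` route are invented for illustration). Path segments and query values are escaped slightly differently, so each style uses the matching `urllib.parse` helper.

```python
from urllib.parse import quote, urlencode

# Hypothetical API, used only to compare the two styles.
base = "https://api.example.com"
city = "Charlottesville, VA"

# Path style: the input becomes part of the path, so escape it as a path segment.
# safe="" also escapes "/", so the value cannot create an extra path segment.
path_url = f"{base}/weather/{quote(city, safe='')}"

# Query style: the input goes after "?" as a named parameter.
query_url = f"{base}/weather?{urlencode({'city': city})}"

print(path_url)   # https://api.example.com/weather/Charlottesville%2C%20VA
print(query_url)  # https://api.example.com/weather?city=Charlottesville%2C+VA
```

With `requests`, the query style is simply `requests.get(f"{base}/weather", params={"city": city})`, while the path style still requires building the URL string yourself.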



## 19. Rate Limits and Responsible API Use

Most APIs enforce limits such as:
- N requests per minute
- N requests per day
- max response size

If you send too many requests, you may see:
- `429 Too Many Requests`

In real work, you solve this by:
- caching results
- using pagination correctly
- sleeping between calls
- reducing duplicate requests

For class notebooks, the main rule is simple:
**Do not put API calls inside tight loops unless you understand the limit.**
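"Sleeping between calls" is often done with exponential backoff: after each `429`, wait twice as long as last time before retrying. This is a generic sketch; `call_with_backoff`, `fetch`, and the default delays are our own choices, not part of `requests`. The HTTP call is passed in as a function so the retry logic can be checked without a network connection.

```python
import time


def call_with_backoff(fetch, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it stops returning HTTP 429, waiting longer each time.

    fetch is any zero-argument function returning (status_code, payload).
    sleep is injectable so the waits can be faked in tests.
    """
    status, payload = fetch()
    for attempt in range(1, max_attempts):
        if status != 429:
            break
        # Exponential backoff: base_delay, 2*base_delay, 4*base_delay, ...
        sleep(base_delay * 2 ** (attempt - 1))
        status, payload = fetch()
    return status, payload


# Example wrapper around a real call (not run here):
# def fetch():
#     r = requests.get(url, params=params, timeout=30)
#     return r.status_code, (r.json() if r.ok else None)
# status, data = call_with_backoff(fetch)
```

Many APIs also send a `Retry-After` header with a `429`; when it is present, waiting that long is usually better than guessing.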



## 20. A Practical Pattern: API → DataFrame → Analysis

We will close with a full miniature pipeline using the Universities API:

- call API
- build DataFrame
- answer a few simple questions

This is the pattern you will reuse later with more complex sources.


In [None]:

unis = get_json("http://universities.hipolabs.com/search", params={"country": "United States"})
df = pd.DataFrame(unis)

df[['name', 'state-province', 'country']].head()


In [None]:

# How many universities are returned?
len(df)


In [None]:

# Which states appear most often? (value_counts skips missing values by default)
df['state-province'].value_counts().head(10)


In [None]:

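# Filter: universities in Virginia. state-province may be spelled out, abbreviated, or missing,
# and a substring match on "Virginia" would also catch "West Virginia", so compare whole values.
state = df['state-province'].fillna("").str.strip().str.lower()
df[state.isin(["virginia", "va"])]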



## 21. What You Should Walk Away With

At the end of this lecture, you should be able to do the following without guessing:

- explain what an API is
- explain why APIs matter in data systems
- make a GET request in Python
- pass parameters using `params=`
- interpret status codes
- parse JSON responses
- convert JSON into a DataFrame
- know what an API key is and why it exists

If you can do those things, you are ready for the next stage:
working with APIs that return larger and messier data.



## 22. Optional Practice 

Try changing the inputs and re-running:

Universities API:
- `country="Canada"`
- `name="Tech"`
- `name="University"`

Open-Meteo:
- change coordinates to another city
- request hourly data (read docs and try it)

The fastest way to learn APIs is to make small changes and see what breaks.
