# APIs and Endpoints — Bluesky Edition

**Course theme:** Data Loop → *Question → Acquire → Analyze → Report*  
**Today’s focus:** Calling a real data API, understanding endpoints & parameters, and wrangling the JSON response.

We’ll use the **Bluesky public API** for demonstrations. You’ll learn:
 
- How to make HTTP requests with Python’s `requests`  
- How to read endpoint **parameters** and pass them correctly  
- How to parse the **JSON response** and load it into **pandas**  
- How to handle **pagination**, **rate limits**, and **errors**  
- How to save your results and think about **ethics** & **limitations** of API data


## 0) Prerequisites

```bash
# in your environment (Anaconda prompt or terminal)
pip install requests pandas
```


```python
import time
from typing import Dict, Any, List, Optional
import requests
import pandas as pd
import json as js

BASE_URL = "https://api.bsky.app/xrpc"
```


## 1) Quickstart: `requests` basics

- **Make a GET request:** `requests.get(url, params={...}, headers={...})`  
- **Inspect status:** `resp.status_code` (200 = OK, 429 = rate-limited, other 4xx = client errors, 5xx = server errors)  
- **Parse JSON:** `resp.json()`  
- **Headers & auth:** often required; some public endpoints don’t need auth

We’ll start with a **public search endpoint** to avoid credentials.
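
To check how `params` and `headers` turn into an actual HTTP request, you can build the request with `requests.Request(...).prepare()` instead of sending it. This is a minimal offline sketch: nothing goes over the network, and the endpoint, query, and `User-Agent` simply mirror the search call used in section 2.

```python
import requests

BASE_URL = "https://api.bsky.app/xrpc"

# Build (but do not send) the same GET request that requests.get(...) would send.
req = requests.Request(
    "GET",
    f"{BASE_URL}/app.bsky.feed.searchPosts",
    params={"q": "data science", "limit": 10},
    headers={"User-Agent": "EMAT-Teaching/1.0 (+contact@example.com)"},
)
prepared = req.prepare()

# `params` are URL-encoded into the query string; headers travel separately.
print(prepared.method)  # GET
print(prepared.url)     # https://api.bsky.app/xrpc/app.bsky.feed.searchPosts?q=data+science&limit=10
print(prepared.headers["User-Agent"])
```

Note that `requests` encodes the space in `data science` as `+`, so you never need to hand-encode query parameters.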


## 2) Endpoint: `app.bsky.feed.searchPosts` (public)

This endpoint lets us search recent Bluesky posts.

- **Method:** `GET`
- **URL:** `https://api.bsky.app/xrpc/app.bsky.feed.searchPosts`
- **Key parameters:**
  - `q`: search query string (required)
  - `limit`: number of results to return (suggest 10–50 for demos)
  - `cursor`: for pagination (use the `cursor` returned from a prior call)

> Note: Public APIs can change. If a call fails, check the official documentation and retry with fewer or simpler parameters.
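
A small helper can keep these parameter rules in one place. This is a sketch under assumptions: `build_search_params` is a hypothetical name, and the 1 to 100 range for `limit` is assumed from the endpoint's documentation at the time of writing, so confirm it in the official docs before relying on it.

```python
from typing import Any, Dict, Optional

# Assumed bounds for `limit`; verify against the current searchPosts documentation.
MIN_LIMIT, MAX_LIMIT = 1, 100


def build_search_params(q: str, limit: int = 25, cursor: Optional[str] = None) -> Dict[str, Any]:
    """Hypothetical helper: assemble query parameters for app.bsky.feed.searchPosts."""
    if not q or not q.strip():
        raise ValueError("`q` is required and must be a non-empty string")
    # Clamp limit into the assumed range instead of letting the server reject it.
    limit = max(MIN_LIMIT, min(MAX_LIMIT, int(limit)))
    params: Dict[str, Any] = {"q": q, "limit": limit}
    # Only send `cursor` when continuing from a previous page.
    if cursor:
        params["cursor"] = cursor
    return params


print(build_search_params("data science", limit=10))
# {'q': 'data science', 'limit': 10}
print(build_search_params("data science", limit=500, cursor="abc"))
# {'q': 'data science', 'limit': 100, 'cursor': 'abc'}
```

`requests` already drops parameters whose value is `None`, so leaving `cursor` out is mostly for readability; clamping `limit` avoids a server-side validation error from an out-of-range value.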


```python
endpoint = f"{BASE_URL}/app.bsky.feed.searchPosts"
# Identify your client; replace the placeholder contact address with your own.
headers = {"User-Agent": "EMAT-Teaching/1.0 (+contact@example.com)"}
params = {
    "q": "data science",
    "limit": 10,  # start small for demos
}

resp = requests.get(endpoint, params=params, headers=headers, timeout=30)
print("Status:", resp.status_code)
# Parse the body only if the server says it is JSON; otherwise fall back to an empty dict.
data = resp.json() if resp.headers.get("content-type", "").startswith("application/json") else {}
print("Top-level keys:", list(data.keys()))
print(js.dumps(data, indent=2))  # pretty-print the full response
```



```
Status: 200
Top-level keys: ['posts', 'cursor']
{
  "posts": [
    {
      "uri": "at://did:plc:4llrhdclvdlmmynkwsmg5tdc/app.bsky.feed.post/3lz23ekrhnx23",
      "cid": "bafyreifhwc7aj2f7vzbgapseyusksannbviz74f7k5dtqwvi5vareyhqmm",
      "author": {
        "did": "did:plc:4llrhdclvdlmmynkwsmg5tdc",
        "handle": "atrupar.com",
        "displayName": "Aaron Rupar",
        "avatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:4llrhdclvdlmmynkwsmg5tdc/bafkreibmhm3h6ar52pogvolisrzjdhwa2myras5vkxzj67twxn2l6pogwu@jpeg",
        "associated": {
          "chat": {
            "allowIncoming": "following"
          },
          "activitySubscription": {
            "allowSubscriptions": "followers"
          }
        },
        "labels": [],
        "createdAt": "2023-04-28T00:47:57.437Z",
        "verification": {
          "verifications": [
            {
              "issuer": "did:plc:z72i7hdynmk6r22z27h6tvur",
              "uri": "at://did:plc:z72i7hdynmk6r22z27h6tvur/app.bsky
```
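
The cell above only prints the status code. Because 429 signals rate limiting (section 1), a slightly more defensive version retries with a growing delay. This is a sketch, not Bluesky guidance: `get_with_backoff`, the retry count, and the doubling delay are assumptions, and the `Retry-After` header is used only when the server sends it as whole seconds.

```python
import time
from typing import Callable

import requests


def get_with_backoff(
    send: Callable[[], requests.Response],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> requests.Response:
    """Call send(); retry on 429 or 5xx with exponential backoff, then raise if still failing."""
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    for attempt in range(max_retries + 1):
        resp = send()
        retryable = resp.status_code == 429 or 500 <= resp.status_code < 600
        if not retryable or attempt == max_retries:
            resp.raise_for_status()  # raises HTTPError for any remaining 4xx/5xx
            return resp
        # Honor Retry-After when given in whole seconds; otherwise wait 1x, 2x, 4x, ... base_delay.
        retry_after = resp.headers.get("Retry-After", "")
        delay = float(retry_after) if retry_after.isdigit() else base_delay * 2 ** attempt
        sleep(delay)
    raise AssertionError("unreachable: the loop always returns or raises")
```

In the notebook you would wrap the earlier call, for example `resp = get_with_backoff(lambda: requests.get(endpoint, params=params, headers=headers, timeout=30))`. Passing `send` and `sleep` in as functions keeps the retry logic testable without touching the network.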

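```python
# Flatten each post into one row with the fields we care about.
rows = []
for post in data.get("posts", []):
    author = post.get("author", {})
    record = post.get("record", {})  # the post's own content (text, createdAt, ...)
    rows.append({
        "Author": author.get("displayName"),
        "Text": record.get("text"),
        "Likes": post.get("likeCount", 0),
        "Timestamp": record.get("createdAt"),
    })
```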

### Interpreting the response

Typical top-level fields you may see:

- `posts`: a list of post objects (each with author and record info)
- `cursor`: a string to request the **next page**
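
Because each response carries a `cursor` for the next page, collecting more than one page is a loop that feeds the previous cursor back in until none is returned. The sketch below separates the loop (`iter_posts`) from the HTTP call (`bluesky_search_fetcher`); both names, the one-second pause, and the default page cap are illustrative assumptions.

```python
import time
from typing import Any, Callable, Dict, Iterator, Optional

import requests

BASE_URL = "https://api.bsky.app/xrpc"
Page = Dict[str, Any]


def iter_posts(
    fetch_page: Callable[[Optional[str]], Page],
    max_pages: int = 5,
) -> Iterator[Dict[str, Any]]:
    """Yield posts page by page, following `cursor` until it runs out or max_pages is reached."""
    cursor: Optional[str] = None
    for _ in range(max_pages):
        page = fetch_page(cursor)  # cursor is None on the first request
        yield from page.get("posts", [])
        cursor = page.get("cursor")
        if not cursor:  # no cursor: this was the last page
            break


def bluesky_search_fetcher(
    q: str, limit: int = 25, pause: float = 1.0
) -> Callable[[Optional[str]], Page]:
    """Hypothetical helper: return a fetch_page function bound to one search query."""
    endpoint = f"{BASE_URL}/app.bsky.feed.searchPosts"
    headers = {"User-Agent": "EMAT-Teaching/1.0 (+contact@example.com)"}

    def fetch_page(cursor: Optional[str]) -> Page:
        params: Dict[str, Any] = {"q": q, "limit": limit}
        if cursor:
            params["cursor"] = cursor
        resp = requests.get(endpoint, params=params, headers=headers, timeout=30)
        resp.raise_for_status()  # surface 4xx/5xx errors instead of parsing them
        time.sleep(pause)  # pause between pages to be gentle on the API
        return resp.json()

    return fetch_page
```

For example, `posts = list(iter_posts(bluesky_search_fetcher("data science"), max_pages=3))` gathers up to three pages. The `max_pages` cap matters: a popular query can keep returning cursors for a long time.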

Each post commonly includes:
- `uri`, `cid`: identifiers
- `author`: nested object (e.g., `displayName`, `handle`, `did`)
- `record`: nested object with `text` and timestamps (`createdAt`)
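- `indexedAt`: when Bluesky indexed the post
- `likeCount`, `repostCount`, `replyCount`: engagement counts (the `Likes` column uses `likeCount`)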

Let’s normalize the JSON into a tidy table.


```python
# Create DataFrame
df = pd.DataFrame(rows, columns=["Author", "Text", "Likes", "Timestamp"])

# Show as table in Jupyter
df.head()
```

| | Author | Text | Likes | Timestamp |
|---|---|---|---|---|
| 0 | Aaron Rupar | former CDC official Dr. Houry: "I first learne... | 415 | 2025-09-17T14:43:40.084Z |
| 1 | Data Science Nigeria | As an important chapter within the Data Scien... | 0 | 2025-09-17T14:43:20.605Z |
| 2 | Stand Up for Science! | Cassidy: RFK Jr. had said that he didn't ask h... | 4 | 2025-09-17T14:42:51.543Z |
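
An alternative to the hand-written `rows` loop is `pandas.json_normalize`, which flattens nested objects into dotted column names such as `author.displayName`. The two posts below are made-up sample data shaped like the `posts` list shown earlier; the renaming reproduces the same four columns, and `pd.to_datetime` turns the `createdAt` strings into UTC timestamps for later analysis.

```python
import pandas as pd

# Made-up sample posts shaped like the `posts` list returned by searchPosts.
sample_posts = [
    {
        "uri": "at://alice.example/app.bsky.feed.post/1",
        "author": {"handle": "alice.example", "displayName": "Alice"},
        "record": {"text": "Hello data", "createdAt": "2025-09-17T14:43:40.084Z"},
        "likeCount": 3,
    },
    {
        "uri": "at://bob.example/app.bsky.feed.post/2",
        "author": {"handle": "bob.example"},  # displayName may be absent
        "record": {"text": "Second post", "createdAt": "2025-09-17T14:42:51.543Z"},
    },
]

# Flatten nested dicts into columns such as "author.displayName" and "record.text".
flat = pd.json_normalize(sample_posts)

# Rename to the same four columns used above and keep only those.
tidy = flat.rename(columns={
    "author.displayName": "Author",
    "record.text": "Text",
    "likeCount": "Likes",
    "record.createdAt": "Timestamp",
})[["Author", "Text", "Likes", "Timestamp"]].copy()

# Posts without likeCount come out as NaN; treat them as 0, like the loop's default.
tidy["Likes"] = tidy["Likes"].fillna(0).astype(int)
# Parse ISO 8601 strings ending in "Z" into timezone-aware UTC timestamps.
tidy["Timestamp"] = pd.to_datetime(tidy["Timestamp"], utc=True)
print(tidy)
```

To save the result, `tidy.to_csv("bluesky_posts.csv", index=False)` writes the table without the index column; the file name is only an example.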


## 3) Authentication

