# APIs and Endpoints — Bluesky Edition

**Course theme:** Data Loop → *Question → Acquire → Analyze → Report*  
**Today’s focus:** Calling a real data API, understanding endpoints & parameters, and wrangling the JSON response.

We’ll use the **Bluesky public API** for demonstrations. You’ll learn:

- How to **authenticate** (when needed) and when you can use **public endpoints**  
- How to make HTTP requests with Python’s `requests`  
- How to read endpoint **parameters** and pass them correctly  
- How to parse the **JSON response** and load it into **pandas**  
- How to handle **pagination**, **rate limits**, and **errors**  
- How to save your results and think about **ethics** & **limitations** of API data


## 0) Prerequisites

```bash
# in your environment (Anaconda prompt or terminal)
pip install requests pandas
```


In [11]:
import time
from typing import Dict, Any, List, Optional
import requests
import pandas as pd
import json as js

BASE_URL = "https://api.bsky.app/xrpc"


## 1) Quickstart: `requests` basics

- **Make a GET request:** `requests.get(url, params={...}, headers={...})`  
- **Inspect status:** `resp.status_code` (200 = OK, 429 = rate-limited, other 4xx = client errors, 5xx = server errors)  
- **Parse JSON:** `resp.json()`  
- **Headers & auth:** many APIs require an auth header; some public endpoints do not

We’ll start with a **public search endpoint** to avoid credentials.


## 2) Endpoint: `app.bsky.feed.searchPosts` (public)

This endpoint lets us search recent Bluesky posts.

- **Method:** `GET`
- **URL:** `https://api.bsky.app/xrpc/app.bsky.feed.searchPosts`
- **Key parameters:**
  - `q`: search query string (required)
  - `limit`: number of results to return (suggest 10–50 for demos)
  - `cursor`: for pagination (use the `cursor` returned from a prior call)
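
To see how these parameters end up in the request, it can help to build the URL by hand. The sketch below uses only the standard library; `build_url` is a hypothetical helper (not part of `requests` or Bluesky), and passing `params=` to `requests.get` performs a broadly equivalent encoding for you.

```python
from typing import Optional
from urllib.parse import urlencode

BASE_URL = "https://api.bsky.app/xrpc"  # same base URL as the notebook


def build_url(method: str, q: str, limit: int = 5, cursor: Optional[str] = None) -> str:
    """Hypothetical helper: assemble the full GET URL for an XRPC method."""
    params = {"q": q, "limit": limit}
    if cursor:  # only send a cursor when a prior call returned one
        params["cursor"] = cursor
    # urlencode escapes each value and joins the pairs as key=value&key=value
    return f"{BASE_URL}/{method}?{urlencode(params)}"


print(build_url("app.bsky.feed.searchPosts", "data science"))
```

Printing the URL before sending a request is a quick way to check that a parameter name or value is what you intended.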

> Note: Public APIs can change. If something fails today, check the official docs and reduce the scope of parameters.


In [12]:
endpoint = f"{BASE_URL}/app.bsky.feed.searchPosts"
headers = {"User-Agent": "EMAT-Teaching/1.0 (+contact@example.com)"}
params = {
    "q": "data science",
    "limit": 5,  # start small for demos
}

resp = requests.get(endpoint, params=params, headers=headers, timeout=30)
print("Status:", resp.status_code)
data = resp.json() if resp.headers.get("content-type", "").startswith("application/json") else {}
print("Top-level keys:", list(data.keys()))
print(js.dumps(data, indent=2))



Status: 200
Top-level keys: ['posts', 'cursor']
{
  "posts": [
    {
      "uri": "at://did:plc:uk2zxthkb4rurgew33wkfhwo/app.bsky.feed.post/3lytjauwpl532",
      "cid": "bafyreiarj3lxbryvgjgmldd7gzvouqb2l5qicwcwlxg4q7m7v7ohlnfjgq",
      "author": {
        "did": "did:plc:uk2zxthkb4rurgew33wkfhwo",
        "handle": "rhetoric.streetwi.se.ap.brid.gy",
        "displayName": "rhetoric",
        "avatar": "https://cdn.bsky.app/img/avatar/plain/did:plc:uk2zxthkb4rurgew33wkfhwo/bafkreiaxjtrtf2vmpeh7n2j3g44noida5iy3zm47qhzyppx54wbzxeqauu@jpeg",
        "associated": {
          "chat": {
            "allowIncoming": "none"
          },
          "activitySubscription": {
            "allowSubscriptions": "followers"
          }
        },
        "labels": [
          {
            "src": "did:plc:uk2zxthkb4rurgew33wkfhwo",
            "uri": "at://did:plc:uk2zxthkb4rurgew33wkfhwo/app.bsky.actor.profile/self",
            "cid": "bafyreib2myqvgbufhzjcuh2posvitnafwyvlpf4w75myqhxm4izbxdgxlm",

### Interpreting the response

Typical top-level fields you may see:

- `posts`: a list of post objects (each with author and record info)
- `cursor`: a string to request the **next page**

Each post commonly includes:
- `uri`, `cid`: identifiers
- `author`: nested object (e.g., `displayName`, `handle`, `did`)
- `record`: nested object with `text` and timestamps (`createdAt`)
- `indexedAt`: when Bluesky indexed the post

Let’s normalize the JSON into a tidy table.


In [13]:
def flatten_posts(posts: List[Dict[str, Any]]) -> pd.DataFrame:
    rows = []
    for p in posts:
        author = p.get("author", {}) or {}
        record = p.get("record", {}) or {}
        rows.append({
            "uri": p.get("uri"),
            "cid": p.get("cid"),
            "author_handle": author.get("handle"),
            "author_name": author.get("displayName"),
            "indexed_at": p.get("indexedAt"),
            "created_at": record.get("createdAt"),
            "text": record.get("text"),
        })
    return pd.DataFrame(rows)

posts = data.get("posts", []) or []
df = flatten_posts(posts)
df.head(10)


| | uri | cid | author_handle | author_name | indexed_at | created_at | text |
|---|---|---|---|---|---|---|---|
| 0 | at://did:plc:uk2zxthkb4rurgew33wkfhwo/app.bsky... | bafyreiarj3lxbryvgjgmldd7gzvouqb2l5qicwcwlxg4q... | rhetoric.streetwi.se.ap.brid.gy | rhetoric | 2025-09-15T00:08:34.010Z | 2025-09-15T00:07:57.000Z | Instead of chasing more data, focus on what re... |
| 1 | at://did:plc:janrdq6lwlexzcae5x52yyff/app.bsky... | bafyreiahsnamogzvpz3l76cvsvsngigeahadaq6sa6yq3... | feiticeir0s.bsky.social | Feiticeir0 | 2025-09-14T23:45:08.509Z | 2025-09-14T23:45:05.328Z | Source: Towards Data Science\n search.app/3SqX9 |
| 2 | at://did:plc:rqzqrpmgvpdidh5557pbmyrw/app.bsky... | bafyreiaikbhcaozrtjr4bpo3x4rgz7jeap3sqiqeir7lj... | scholarai.bsky.social | ScholarAI Research Agent on Bluesky | 2025-09-14T23:37:46.702Z | 2025-09-14T23:37:46.285Z | A recent fascinating development in behavioral... |
| 3 | at://did:plc:4xo5m5dle6pl57oobivraajb/app.bsky... | bafyreib7hgkbrx7wyrrwj7b7cgr3xllp6var7vtluxpaw... | soosocean.bsky.social | Southern Ocean Observing System (SOOS) | 2025-09-14T23:00:45.912Z | 2025-09-14T23:00:45.405Z | #SOOS_DATAMonday: 2x🤔 Think twice before publi... |
| 4 | at://did:plc:kxu57hzs7hkqinozalbd3tma/app.bsky... | bafyreigqjvhjsnrvtvynuyfutbt5axketovdsmx5o4sw5... | u24usa.bsky.social | | 2025-09-14T21:58:04.805Z | 2025-09-14T21:58:02Z | The precarious future of consumer genetic priv... |


## 3) Basic wrangling & description

Let’s transform timestamps, compute a few simple features, and do light EDA.


In [14]:
if not df.empty:
    # Convert to datetime
    for col in ["indexed_at", "created_at"]:
        df[col] = pd.to_datetime(df[col], errors="coerce")

    # Simple feature: text length
    df["text_len"] = df["text"].fillna("").str.len()

    display(df.head(5))
    print("\nSummary:")
    display(df.describe(include="all"))
else:
    print("No rows returned. Try a different query or reduce 'limit'.")


| | uri | cid | author_handle | author_name | indexed_at | created_at | text | text_len |
|---|---|---|---|---|---|---|---|---|
| 0 | at://did:plc:uk2zxthkb4rurgew33wkfhwo/app.bsky... | bafyreiarj3lxbryvgjgmldd7gzvouqb2l5qicwcwlxg4q... | rhetoric.streetwi.se.ap.brid.gy | rhetoric | 2025-09-15 00:08:34.010000+00:00 | 2025-09-15 00:07:57+00:00 | Instead of chasing more data, focus on what re... | 294 |
| 1 | at://did:plc:janrdq6lwlexzcae5x52yyff/app.bsky... | bafyreiahsnamogzvpz3l76cvsvsngigeahadaq6sa6yq3... | feiticeir0s.bsky.social | Feiticeir0 | 2025-09-14 23:45:08.509000+00:00 | 2025-09-14 23:45:05.328000+00:00 | Source: Towards Data Science\n search.app/3SqX9 | 46 |
| 2 | at://did:plc:rqzqrpmgvpdidh5557pbmyrw/app.bsky... | bafyreiaikbhcaozrtjr4bpo3x4rgz7jeap3sqiqeir7lj... | scholarai.bsky.social | ScholarAI Research Agent on Bluesky | 2025-09-14 23:37:46.702000+00:00 | 2025-09-14 23:37:46.285000+00:00 | A recent fascinating development in behavioral... | 300 |
| 3 | at://did:plc:4xo5m5dle6pl57oobivraajb/app.bsky... | bafyreib7hgkbrx7wyrrwj7b7cgr3xllp6var7vtluxpaw... | soosocean.bsky.social | Southern Ocean Observing System (SOOS) | 2025-09-14 23:00:45.912000+00:00 | 2025-09-14 23:00:45.405000+00:00 | #SOOS_DATAMonday: 2x🤔 Think twice before publi... | 258 |
| 4 | at://did:plc:kxu57hzs7hkqinozalbd3tma/app.bsky... | bafyreigqjvhjsnrvtvynuyfutbt5axketovdsmx5o4sw5... | u24usa.bsky.social | | 2025-09-14 21:58:04.805000+00:00 | NaT | The precarious future of consumer genetic priv... | 195 |



Summary:


| | uri | cid | author_handle | author_name | indexed_at | created_at | text | text_len |
|---|---|---|---|---|---|---|---|---|
| count | 5 | 5 | 5 | 5 | 5 | 4 | 5 | 5.0 |
| unique | 5 | 5 | 5 | 5 | | | 5 | |
| top | at://did:plc:uk2zxthkb4rurgew33wkfhwo/app.bsky... | bafyreiarj3lxbryvgjgmldd7gzvouqb2l5qicwcwlxg4q... | rhetoric.streetwi.se.ap.brid.gy | rhetoric | | | Instead of chasing more data, focus on what re... | |
| freq | 1 | 1 | 1 | 1 | | | 1 | |
| mean | | | | | 2025-09-14 23:18:03.987599872+00:00 | 2025-09-14 23:37:53.504499968+00:00 | | 218.6 |
| min | | | | | 2025-09-14 21:58:04.805000+00:00 | 2025-09-14 23:00:45.405000+00:00 | | 46.0 |
| 25% | | | | | 2025-09-14 23:00:45.912000+00:00 | 2025-09-14 23:28:31.064999936+00:00 | | 195.0 |
| 50% | | | | | 2025-09-14 23:37:46.702000128+00:00 | 2025-09-14 23:41:25.806500096+00:00 | | 258.0 |
| 75% | | | | | 2025-09-14 23:45:08.508999936+00:00 | 2025-09-14 23:50:48.246000128+00:00 | | 294.0 |
| max | | | | | 2025-09-15 00:08:34.010000+00:00 | 2025-09-15 00:07:57+00:00 | | 300.0 |
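
Row 4's `created_at` became `NaT` even though its raw value (`2025-09-14T21:58:02Z`) is a valid timestamp. A likely cause is that it lacks the fractional seconds present in the other rows, so it does not match the format pandas inferred from the first value, and `errors="coerce"` silently turns it into a missing value. The sketch below parses both shapes with the standard library. `parse_iso_utc` is a hypothetical helper that could be applied to the raw strings with `df["created_at"].map(parse_iso_utc)`. Recent pandas versions (2.0 and later, which is an assumption about your install) also accept `pd.to_datetime(..., format="ISO8601", utc=True)` for this case.

```python
from datetime import datetime, timezone
from typing import Any, Optional


def parse_iso_utc(value: Any) -> Optional[datetime]:
    """Parse an ISO 8601 timestamp with or without fractional seconds."""
    if not isinstance(value, str):
        return None  # missing values (None, NaN) stay missing
    try:
        # Older Pythons reject a trailing "Z", so swap it for an explicit UTC offset
        parsed = datetime.fromisoformat(value.replace("Z", "+00:00"))
    except ValueError:
        return None  # mirror errors="coerce": unparseable becomes missing
    if parsed.tzinfo is None:
        # Assumption: treat timestamps without an offset as UTC
        return parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)


samples = ["2025-09-15T00:07:57.000Z", "2025-09-14T21:58:02Z", "not a date"]
for s in samples:
    print(s, "->", parse_iso_utc(s))
```

Whichever approach you use, compare the count of non-missing timestamps before and after parsing; a silent drop from 5 to 4, as in the summary, is easy to overlook.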


## 4) Pagination with `cursor`

Many APIs return a `cursor` to fetch the **next page**. We’ll write a helper to
collect multiple pages while being gentle with rate limits.


In [15]:
def search_posts(query: str, pages: int = 3, per_page: int = 10, pause: float = 1.0) -> pd.DataFrame:
    endpoint = f"{BASE_URL}/app.bsky.feed.searchPosts"
    cursor: Optional[str] = None
    all_rows: List[pd.DataFrame] = []  # one DataFrame per page

    for i in range(pages):
        params = {"q": query, "limit": per_page}
        if cursor:
            params["cursor"] = cursor

        r = requests.get(endpoint, params=params, timeout=30)
        if r.status_code == 429:
            # Rate-limited: wait and retry once
            time.sleep(5)
            r = requests.get(endpoint, params=params, timeout=30)

        if r.status_code != 200:
            print(f"Request {i+1} failed with status {r.status_code}. Stopping.")
            break

        j = r.json()
        posts = j.get("posts", []) or []
        df_page = flatten_posts(posts)
        all_rows.append(df_page)
        cursor = j.get("cursor")
        print(f"Page {i+1}: {len(df_page)} rows; next cursor present? {'yes' if cursor else 'no'}")

        if not cursor:
            break

        time.sleep(pause)  # be polite to the API

    return pd.concat(all_rows, ignore_index=True) if all_rows else pd.DataFrame()

df_multi = search_posts("machine learning", pages=2, per_page=15, pause=1.0)
df_multi.head(10)


Page 1: 15 rows; next cursor present? yes
Request 2 failed with status 403. Stopping.


| | uri | cid | author_handle | author_name | indexed_at | created_at | text |
|---|---|---|---|---|---|---|---|
| 0 | at://did:plc:s33lmvk7rwmlypc2qm6wn2lr/app.bsky... | bafyreiamcxrq4j23qjowfv34emgwxakbh6crosgpjy4i4... | llms.activitypub.awakari.com.ap.brid.gy | LLMs | 2025-09-15T00:12:19.505Z | 2025-09-15T00:11:21.000Z | How can an AI train itself if no one is tellin... |
| 1 | at://did:plc:5u3nj22pwjgaqtod32gjsvdu/app.bsky... | bafyreicsvp4bgafcaffneizb4zq2brraa2vbndj4rcgzb... | stonedhshi.bsky.social | Stone D.-H. Shi | 2025-09-15T00:10:13.604Z | 2025-09-15T00:10:12.167Z | High Impact Research (2023 Paper): "AB-Amy: ma... |
| 2 | at://did:plc:wvd2adsms7gniik5ggqowjah/app.bsky... | bafyreihv4ughttfz7ytnggbb5jcd3vq4jb42ylc73tzsw... | freecodecamp.bsky.social | freeCodeCamp.org | 2025-09-15T00:01:15.003Z | 2025-09-15T00:01:14.834Z | If you're curious about how AI systems work, t... |
| 3 | at://did:plc:7fzs3uks3tjlsvuvdn5dcjx5/app.bsky... | bafyreib3v2vpafqkiyopttpm523rt6d4ufi6rmrppphll... | uneek35.bsky.social | Diego Diaz | 2025-09-14T23:50:24.003Z | 2025-09-14T23:50:22.523Z | Machine learning can and has worked that first... |
| 4 | at://did:plc:j3u5epxp7eery335fgzxcvzc/app.bsky... | bafyreiaeg2e6vrcxar6p2o7izx7wvjtswdrbveyhy676h... | american-news.bsky.social | American Sports | 2025-09-14T23:47:08.813Z | 2025-09-14T23:47:07Z | Vikings vs. Falcons props, bets, SportsLine Ma... |
| 5 | at://did:plc:kxe4n6ohyilz7ld4fn46tiyh/app.bsky... | bafyreidlxxole4dsjewkfuod5cn5btedriwkf75fewmct... | shibedrill.site | 0xD0EFA6 | 2025-09-14T23:42:17.103Z | 2025-09-14T23:42:16.748Z | that said: I think I am improving a lot at the... |
| 6 | at://did:plc:64r4gb2rh6gzja3xu63kk6pn/app.bsky... | bafyreid7qthercllakopeo5ev6bcou5hlubtzdzsujlyo... | mathskath.bsky.social | Katherine Seaton | 2025-09-14T23:33:14.106Z | 2025-09-14T23:33:12.930Z | I am learning to use a sewing machine. Better ... |
| 7 | at://did:plc:s33lmvk7rwmlypc2qm6wn2lr/app.bsky... | bafyreifdujfmr7awobs6n4zliasiwolqhpnwb75ttqe5r... | llms.activitypub.awakari.com.ap.brid.gy | LLMs | 2025-09-14T23:11:58.407Z | 2025-09-14T23:05:05.000Z | A Wild Hack to Top Google in 10 Hours Using Pe... |
| 8 | at://did:plc:friejrvw4jcaxy34tvg4oajl/app.bsky... | bafyreicy65f27esbeykxugmq54t3voza3kp2jdlxqqxn3... | 786news.bsky.social | 786 News | 2025-09-14T23:00:43.209Z | 2025-09-14T23:00:41Z | The Machine Ethics podcast: Autonomy AI with A... |
| 9 | at://did:plc:fmjo3shi2enqi6aji2hnbzdw/app.bsky... | bafyreifeojdpui42buwxyckm4ur7fpzfp2nvn75b5bvzp... | weel.bsky.social | Coba Weel | 2025-09-14T22:49:14.005Z | 2025-09-14T22:49:13.649Z | i mean, AFAICT these tweaks are all just chang... |
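
The second request in the run above failed with 403, and `search_posts` handles it like any other non-200 status: it stops. That is a reasonable default, because waiting will not fix a permission error, whereas a 429 or a 5xx may clear up after a pause. The sketch below separates the two cases and retries with exponential backoff. `get_with_retry`, `RETRYABLE`, and the `fetch`/`sleep` parameters are hypothetical names for illustration. `fetch` is any zero-argument callable returning an object with a `status_code` attribute, for example `lambda: requests.get(endpoint, params=params, timeout=30)`.

```python
import time
from typing import Any, Callable

# Statuses that may succeed on a later attempt (an assumption; tune per API)
RETRYABLE = {429, 500, 502, 503, 504}


def get_with_retry(
    fetch: Callable[[], Any],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Call fetch() until it returns a non-retryable status or attempts run out."""
    resp = None
    for attempt in range(max_attempts):
        resp = fetch()
        if resp.status_code not in RETRYABLE:
            return resp  # success, or an error (e.g. 403) that retrying won't fix
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)  # delay doubles after each failed attempt
    return resp  # still failing after the final attempt; the caller decides what to do


class FakeResponse:
    """Stand-in for requests.Response so the sketch runs offline."""

    def __init__(self, status_code: int) -> None:
        self.status_code = status_code


statuses = iter([429, 503, 200])
waits = []  # record requested delays instead of actually sleeping
result = get_with_retry(lambda: FakeResponse(next(statuses)), sleep=waits.append)
print(result.status_code, waits)
```

Injecting `sleep` keeps the demo instant; in real use the default `time.sleep` applies the pauses.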


## 5) Save your results

You can keep both the raw JSON and a clean CSV for later analysis/reproducibility.


In [16]:
# Save the last raw response (if present) and DataFrame
import os, json

os.makedirs("outputs", exist_ok=True)

# Save a minimal raw example
raw_path = "outputs/bluesky_search_raw.json"
with open(raw_path, "w", encoding="utf-8") as f:
    json.dump(data, f, ensure_ascii=False, indent=2)

# Save the wrangled table
csv_path = "outputs/bluesky_search_results.csv"
if not df.empty:
    df.to_csv(csv_path, index=False)

raw_path, os.path.exists(csv_path)


('outputs/bluesky_search_raw.json', True)
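
When a collection spans several pages, one option is to append each raw page to a JSON Lines file together with the query and fetch time, so the CSV can be rebuilt later. A minimal sketch, assuming a hypothetical `append_raw_page` / `read_raw_pages` pair and record layout (neither is part of the notebook above):

```python
import json
import os
from datetime import datetime, timezone
from typing import Any, Dict, List


def append_raw_page(path: str, query: str, page: Dict[str, Any]) -> None:
    """Append one raw API response as a single JSON line with fetch metadata."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    record = {
        "fetched_at": datetime.now(timezone.utc).isoformat(),
        "query": query,
        "response": page,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record, ensure_ascii=False) + "\n")


def read_raw_pages(path: str) -> List[Dict[str, Any]]:
    """Load every stored page back, one JSON document per line."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `append_raw_page("outputs/bluesky_search_raw.jsonl", "data science", data)` would store the response from the first search cell.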

## 6) Another endpoint: profile lookup

Try `app.bsky.actor.getProfile` to fetch profile data for a handle.

- **Method:** `GET`  
- **URL:** `https://api.bsky.app/xrpc/app.bsky.actor.getProfile` (built from the same `BASE_URL` as before)  
- **Parameter:** `actor` (handle or DID)


In [17]:
profile_endpoint = f"{BASE_URL}/app.bsky.actor.getProfile"
params = {"actor": "bsky.app"}  # try a known handle, or change this
r = requests.get(profile_endpoint, params=params, timeout=30)
print("Status:", r.status_code)
profile = r.json() if r.ok else {}
print(js.dumps(profile, indent=2))  # show the profile, not the earlier search results in `data`


Status: 200

## 7) Errors, limits, and ethics

- **Rate limits (429):** pause and retry modestly; don’t hammer endpoints.  
- **HTTP errors:** 400/404 (bad params or not found), 401/403 (auth issues), 5xx (server issues).  
- **Schema drift:** APIs change—defensive coding and **.get()** access help.  
- **Ethics:** Posts can be personal; be thoughtful about storing, sharing, and analyzing user-generated content. Respect terms of service and privacy norms.
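
One way to act on the storage concern is to replace handles with keyed hashes before sharing a dataset. A minimal sketch, assuming a hypothetical `pseudonymize` helper and a secret key that you keep out of the notebook. This is pseudonymization rather than anonymization, since the post text itself can still identify an author.

```python
import hashlib
import hmac


def pseudonymize(handle: str, secret_key: bytes) -> str:
    """Map a handle to a stable keyed hash so rows can be grouped without storing the handle."""
    digest = hmac.new(secret_key, handle.lower().encode("utf-8"), hashlib.sha256)
    # Shortened for readability; keep more characters for large datasets
    return digest.hexdigest()[:16]


key = b"replace-with-a-secret-from-your-environment"  # placeholder, not a real secret
print(pseudonymize("someone.example.com", key))
```

On the table built earlier this could be applied as `df["author_handle"].map(lambda h: pseudonymize(h, key))` before writing the CSV.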


## 8) About authentication (optional for today)

Some Bluesky endpoints require authentication: you create a session using an **App Password**. The demos above used public endpoints only. For private or write operations, consult the official docs, create an App Password (not your login password), and store credentials **securely**, for example in environment variables or a `.env` file. Never hard-code secrets in notebooks.
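
As a sketch of the credential-handling advice, the snippet below reads a handle and App Password from environment variables and builds a login payload without sending it. The variable names `BLUESKY_HANDLE` and `BLUESKY_APP_PASSWORD` and the `load_credentials` helper are hypothetical. The AT Protocol docs describe a `com.atproto.server.createSession` method that takes `identifier` and `password` fields; treat that payload shape as an assumption and confirm it against the current docs before use.

```python
import os
from typing import Dict, Mapping


def load_credentials(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    """Read credentials from the environment and fail loudly if any are missing."""
    handle = env.get("BLUESKY_HANDLE")              # hypothetical variable name
    app_password = env.get("BLUESKY_APP_PASSWORD")  # hypothetical variable name
    if not handle or not app_password:
        raise RuntimeError("Set BLUESKY_HANDLE and BLUESKY_APP_PASSWORD before running.")
    # Payload shape assumed from the createSession docs; verify before use
    return {"identifier": handle, "password": app_password}


# Demonstrate with a fake environment so nothing secret is read or printed
fake_env = {"BLUESKY_HANDLE": "you.example.com", "BLUESKY_APP_PASSWORD": "xxxx-xxxx-xxxx-xxxx"}
payload = load_credentials(fake_env)
print(sorted(payload))
```

Failing early with a clear message is usually kinder than letting a request go out with empty credentials and come back as a 401.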


---

### ✅ Wrap-up

- You practiced the **data loop** with real HTTP calls.  
- You learned how to read **endpoint docs**, pass parameters, and parse **JSON**.  
- You wrangled results into **pandas**, handled **pagination**, and saved outputs.  

*Last updated:* 2025-09-14 11:46:31
