# JSON Practice: Mastodon Public Timeline

Work in driver/navigator pairs with a single laptop. Talk through each idea before you code so both partners understand the plan.

## 1. Setup

Import the core libraries we will need for HTTP requests, JSON inspection, and quick analyses.

In [None]:
import requests
from pprint import pprint
import re

## 2. Fetch the timeline data

Use the provided Mastodon public timeline endpoint. Confirm we received an HTTP 200 response and note the content type.

In [None]:
SOURCE_URL = "https://hci.social/api/v1/timelines/public?limit=20"

response = requests.get(SOURCE_URL, timeout=10)
response.raise_for_status()

print(f"Status code: {response.status_code}")
print(f"Content type: {response.headers.get('content-type')}")

## 3. Load the JSON payload

Convert the HTTP response into Python objects. Verify we received a list of status dictionaries.

In [None]:
posts = response.json()

print(f"Number of posts retrieved: {len(posts)}")
print(f"Type of top-level object: {type(posts).__name__}")
if posts:
    print(f"Type of an individual post: {type(posts[0]).__name__}")
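
The type checks above can also be wrapped in a small helper that fails loudly when the payload has an unexpected shape (for example, an error object instead of a list). This is a minimal sketch: `validate_timeline` and `sample_text` are hypothetical names, and the two-post string only stands in for a real response body.

```python
import json

def validate_timeline(payload):
    """Return the payload if it is a list of dicts; raise ValueError otherwise."""
    if not isinstance(payload, list):
        raise ValueError(f"Expected a list, got {type(payload).__name__}")
    for i, item in enumerate(payload):
        if not isinstance(item, dict):
            raise ValueError(f"Item {i} is {type(item).__name__}, not a dict")
    return payload

# Hypothetical two-post payload standing in for response.text
sample_text = '[{"id": "1", "content": "<p>hi</p>"}, {"id": "2", "content": "<p>yo</p>"}]'
sample_posts = validate_timeline(json.loads(sample_text))
print(f"Validated {len(sample_posts)} posts")
```

With the live data you would call `validate_timeline(response.json())` instead.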

## 4. Explore the raw structure

Start by looking at the keys provided for each post and pretty-printing one representative object.

In [None]:
if posts:
    pprint(posts[0])

In [None]:
if posts:
    sample_post = posts[0]
    print(sorted(sample_post.keys()))
else:
    print('No posts returned from the API.')

### Identify nested collections

Find the fields that contain lists or dictionaries so we know where to drill deeper (media attachments, tags, mentions, etc.).

In [None]:
if posts:
    list_like_fields = sorted({key for key, value in posts[0].items() if isinstance(value, list)})
    dict_like_fields = sorted({key for key, value in posts[0].items() if isinstance(value, dict)})
    print("Fields containing lists:", list_like_fields)
    print("Fields containing nested dictionaries:", dict_like_fields)
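
Checking only `posts[0]` can miss a nested field that happens to be empty or `None` in that one post. The sketch below scans every post instead. `nested_field_types` and the sample data are hypothetical, and the `card` key is used only as an example of a field that may be `None` in some posts.

```python
def nested_field_types(posts):
    """Map each field name to the container types (list/dict) seen in any post."""
    found = {}
    for post in posts:
        for key, value in post.items():
            if isinstance(value, (list, dict)):
                found.setdefault(key, set()).add(type(value).__name__)
    return {key: sorted(kinds) for key, kinds in sorted(found.items())}

# Hypothetical posts: 'card' is None in the first post and a dict in the second
sample_posts = [
    {"id": "1", "tags": [], "account": {"acct": "ada"}, "card": None},
    {"id": "2", "tags": [{"name": "hci"}], "account": {"acct": "bob"},
     "card": {"url": "https://example.org"}},
]
nested = nested_field_types(sample_posts)
print(nested)
```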

## 5. Extract plain-text post bodies

Pull the body of each post out of its `content` field so we can inspect language patterns.

First, look at the post object printed above and identify the field that contains the post content. (You can confirm by looking at the Mastodon [documentation](https://docs.joinmastodon.org/entities/Status/).) Typically for social platforms, it is called "content" or "text" or "status", and sometimes is embedded in a sub-object. Then for a sample post (`posts[0]`), extract and display that field.

In [None]:
posts[0]['content'] # or posts[0].get('content')

The post content is displayed as HTML. We won't worry about that for now, but note it. Eventually, we will want plain text for our analysis.

Now make a loop that extracts the post content from every post in the list. Store these in a new list called `post_texts`. Print the number of posts collected and display the first post as a sample.

In [None]:
post_texts = []

for post in posts:
    post_texts.append(post.get('content'))

print(f'Collected {len(post_texts)} posts.')
print('Sample post content:', post_texts[0])
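
When you do need plain text, one possible approach is the standard library's `html.parser`. This is a minimal sketch, not a complete HTML-to-text converter: `TextExtractor` and `html_to_text` are hypothetical names, and the choice to turn `<br>` and `</p>` into newlines is an assumption about how you want the text laid out.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collect text nodes, treating <br> and </p> as line breaks."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'br':
            self.parts.append('\n')

    def handle_endtag(self, tag):
        if tag == 'p':
            self.parts.append('\n')

    def handle_data(self, data):
        self.parts.append(data)

def html_to_text(html):
    """Return the text content of an HTML string (empty string for None)."""
    parser = TextExtractor()
    parser.feed(html or '')
    parser.close()
    return ''.join(parser.parts).strip()

sample_html = '<p>Hello <a href="https://example.org">world</a> &amp; more</p>'
print(html_to_text(sample_html))
```

Applied to the feed, `[html_to_text(t) for t in post_texts]` would give a plain-text version of the list.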

## 6. Collect media URLs

Investigate how media attachments are stored so you can gather the direct URLs for later analysis.

Revisit the sample post printed above and locate the field that contains attachments. The Mastodon [status schema](https://docs.joinmastodon.org/entities/Status/#media_attachments) names this field `media_attachments`. Pull that field from the first post so you can inspect its structure.

In [None]:
post = posts[0] if posts else {}
attachments = post.get('media_attachments', [])  # .get avoids a KeyError when posts is empty
attachments

If the first post has no attachments, iterate through the feed until you find a post whose `media_attachments` list is not empty. Keep track of the index you used so you can explain your process.

In [None]:
first_with_media = None
for i, post in enumerate(posts):
    if post.get('media_attachments'):
        first_with_media = (i, post)
        break

if first_with_media:
    index, post = first_with_media
    print(f'Found media attachments in post index {index}')
    print(post.get('media_attachments'))
else:
    print('No media attachments found in this dataset.')

Notice that `media_attachments` is a list of dictionaries. Identify the key that points to the direct media URL (typically `url` or `preview_url`).

Now loop through every post, collect the media URLs into a list called `media_urls`, and report how many you found. Also print an example URL so partners can double-check the result.

In [None]:
media_urls = []
for post in posts:
    for media in post.get('media_attachments', []):
        url = media.get('url')
        if url:
            media_urls.append(url)

print(f'Collected {len(media_urls)} media URLs')
if media_urls:
    print('Sample media URL:', media_urls[0])
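
For later analysis it can help to know what kind of media each URL points to. The sketch below groups URLs by each attachment's `type` key, assuming attachments carry one as described in the Mastodon media attachment documentation. `media_urls_by_type` and the sample posts are hypothetical.

```python
from collections import defaultdict

def media_urls_by_type(posts):
    """Group media URLs by the attachment's 'type' value."""
    grouped = defaultdict(list)
    for post in posts:
        for media in post.get('media_attachments') or []:
            url = media.get('url')
            if url:
                # Fall back to 'unknown' if an attachment has no type (assumption)
                grouped[media.get('type', 'unknown')].append(url)
    return dict(grouped)

# Hypothetical posts with placeholder URLs
sample_posts = [
    {'media_attachments': [
        {'type': 'image', 'url': 'https://example.org/a.png'},
        {'type': 'video', 'url': 'https://example.org/b.mp4'},
    ]},
    {'media_attachments': []},
    {'media_attachments': [{'type': 'image', 'url': 'https://example.org/c.png'}]},
]
by_type = media_urls_by_type(sample_posts)
for kind, urls in by_type.items():
    print(kind, len(urls))
```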

## 7. Gather hashtags

Examine how hashtags are stored so you can assemble a list for frequency counts and topic exploration. (This will be similar to how you extracted media URLs.)

Check the sample post above to locate the field that stores hashtags. In the Mastodon [status schema](https://docs.joinmastodon.org/entities/Status/#tags), this array is called `tags`. Retrieve it from the first post to confirm the structure.

In [None]:
post = posts[0] if posts else {}
tags = post.get('tags', [])
tags

If the first post has no hashtags, scan forward until you find one that does. Document which index you used so your partner can reproduce the step.

In [None]:
first_with_tags = None
for i, post in enumerate(posts):
    if post.get('tags'):
        first_with_tags = (i, post)
        break

if first_with_tags:
    index, post = first_with_tags
    print(f'Found hashtags in post index {index}')
    pprint(post.get('tags'))  # a bare expression inside an if block is not displayed
else:
    print('No hashtags found in this dataset.')

Each tag entry is a dictionary. Identify the key that carries the hashtag text (usually `name`).

Loop through all posts, extract the hashtag text into a list called `hashtags`, and report summary counts. Lowercase each hashtag so variations in capitalization are merged.

In [None]:
hashtags = []
for post in posts:
    for tag in post.get('tags', []):
        name = tag.get('name')
        if name:
            hashtags.append(name.lower())

print(f'Collected {len(hashtags)} hashtags')
if hashtags:
    print('Unique hashtags observed:', len(set(hashtags)))
    print('Sample hashtags:', sorted(set(hashtags))[:5])
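
Once the hashtags are in a flat list, frequency counts are a short step away with `collections.Counter`. The list below is hypothetical sample data; with the real feed you would pass `hashtags` instead.

```python
from collections import Counter

# Hypothetical lowercase hashtags standing in for the `hashtags` list
sample_hashtags = ['hci', 'ux', 'hci', 'accessibility', 'hci', 'ux']

tag_counts = Counter(sample_hashtags)
top_two = tag_counts.most_common(2)  # list of (hashtag, count), highest first

for name, count in top_two:
    print(f'#{name}: {count}')
```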

## 8. Extract links from post content

Locate outbound URLs embedded in the HTML `content` field so you can analyze where posts send readers. NOTE: There are more sophisticated and reliable ways to extract links from HTML, but for this exercise, we will simply look for the telltale `href` attribute in the raw HTML string.

Inspect the `content` field on a sample post. Mastodon stores status bodies as HTML strings, so links appear inside `<a>` tags with `href` attributes.

In [None]:
post = posts[0] if posts else {}
content_html = post.get('content') or ''
content_html

If this sample does not contain a link, scan forward until you find a post whose HTML includes an `<a>` tag. Share the index you picked so others can double-check.

In [None]:
first_with_link = None
for i, post in enumerate(posts):
    html = post.get('content') or ''
    if 'href="' in html:
        first_with_link = (i, post)
        break

if first_with_link:
    index, post = first_with_link
    print(f'Found links in post index {index}')
    print(post.get('content') or '')  # print explicitly; a bare expression here is not displayed
else:
    print('No links found in this dataset.')

Notice how each anchor tag uses `href="..."`. Use a regular expression to capture those URL values for any post.

In [None]:
href_pattern = re.compile(r'href="(.*?)"')

Loop through every post, apply the pattern to the HTML content, store the results in a list called `link_urls`, and print a quick summary (count plus an example link).

In [None]:
link_urls = []
for post in posts:
    html = post.get('content') or ''
    link_urls.extend(href_pattern.findall(html))

print(f'Collected {len(link_urls)} links from content')
if link_urls:
    print('Sample link:', link_urls[0])
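
As one example of a more reliable approach than the regular expression, the standard library's `html.parser` can read `href` values from `<a>` tags only. It also handles single-quoted attributes and decodes entities such as `&amp;`. This is a sketch for comparison; `LinkExtractor` and `extract_links` are hypothetical names.

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.append(value)

def extract_links(html):
    """Return every <a href> value found in an HTML string."""
    parser = LinkExtractor()
    parser.feed(html or '')
    parser.close()
    return parser.links

sample_html = (
    '<p>See <a href="https://example.org/a?x=1&amp;y=2" class="x">this</a> '
    "and <a href='https://example.com'>that</a></p>"
)
print(extract_links(sample_html))
```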

## 9. Record mentioned accounts

Identify which accounts are referenced inside each post so you can explore interaction patterns.

Inspect a sample post to find the field that lists mentions. In the Mastodon schema, this lives in the `mentions` array.

In [None]:
post = posts[0] if posts else {}
mentions_field = post.get('mentions', [])
mentions_field

If the first post has no mentions, scan forward until you find one that does. Write down the index you chose so your partner can follow along.

In [None]:
first_with_mentions = None
for i, post in enumerate(posts):
    if post.get('mentions'):
        first_with_mentions = (i, post)
        break

if first_with_mentions:
    index, post = first_with_mentions
    print(f'Found mentions in post index {index}')
    pprint(post.get('mentions'))  # a bare expression inside an if block is not displayed
else:
    print('No mentions found in this dataset.')

Each mention entry is a dictionary. Identify the property that stores the account identifier you care about (e.g., `acct`).

Loop through the feed, capture the mentioned account names (or IDs) into a list called `mentions`, and summarize what you found.

In [None]:
mentions = []
for post in posts:
    for mention in post.get('mentions', []):
        acct = mention.get('acct')
        if acct:
            mentions.append(acct)

print(f'Collected {len(mentions)} mentions')
if mentions:
    print('Unique accounts mentioned:', len(set(mentions)))
    print('Sample mentions:', sorted(set(mentions))[:5])
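
To explore interaction patterns, it helps to keep track of who mentioned whom rather than only who was mentioned. The sketch below counts (author, mentioned account) pairs. `mention_pairs` and the sample posts are hypothetical.

```python
from collections import Counter

def mention_pairs(posts):
    """Count (author acct, mentioned acct) pairs across posts."""
    pairs = Counter()
    for post in posts:
        author = (post.get('account') or {}).get('acct')
        for mention in post.get('mentions') or []:
            target = mention.get('acct')
            if author and target:
                pairs[(author, target)] += 1
    return pairs

# Hypothetical posts with made-up handles
sample_posts = [
    {'account': {'acct': 'ada'},
     'mentions': [{'acct': 'bob'}, {'acct': 'cy@example.social'}]},
    {'account': {'acct': 'ada'}, 'mentions': [{'acct': 'bob'}]},
    {'account': {'acct': 'bob'}, 'mentions': []},
]
pairs = mention_pairs(sample_posts)
for (author, target), count in pairs.most_common():
    print(f'{author} -> {target}: {count}')
```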

## 10. Gather post authors

Identify how author information is stored so you can summarize who appears in the feed.

Inspect the first post for its `account` field. Mastodon stores author metadata there, including the handle (`acct`). Confirm the structure using the Mastodon [status schema](https://docs.joinmastodon.org/entities/Status/#account).

In [None]:
post = posts[0] if posts else {}
author_info = post.get('account', {})
author_info

If that post lacks author details, scan forward until you find a post whose `account` data is populated. Record the index you used so your partner can reproduce the step.

In [None]:
first_with_account = None
for i, post in enumerate(posts):
    account = post.get('account')
    if account:
        first_with_account = (i, post)
        break

if first_with_account:
    index, post = first_with_account
    print(f'Found account data in post index {index}')
    pprint(post.get('account'))  # a bare expression inside an if block is not displayed
else:
    print('No account data found in this dataset.')

Notice the keys available inside each account dictionary (e.g., `acct`, `display_name`). Decide which identifier you want to reuse for filtering later.

Loop through the feed, capture each post author handle into a list called `authors`, and also build a sorted list of unique authors for quick reference.

In [None]:
authors = []
for post in posts:
    account = post.get('account') or {}
    acct = account.get('acct')
    if acct:
        authors.append(acct)

unique_authors = sorted(set(authors))

print(f'Collected {len(authors)} author references')
if authors:
    print('Unique authors observed:', len(unique_authors))
    print('Sample authors:', unique_authors[:5])

## 11. Search posts by account

Use the author list you just built to focus on posts from a specific account.

Review the unique author handles below and choose one to investigate.

In [None]:
print('Sample authors:', unique_authors[:5])

Set `target_account` to the handle you want to analyze. Update the default if you have a different choice.

In [None]:
target_account = unique_authors[0] if unique_authors else ''  # <-- edit this after reviewing the list
print(f"Searching for account: {target_account or '[edit this]'}")

Filter the posts to only those authored by `target_account`. Report how many matches you found and preview a representative example.

In [None]:
posts_from_account = []

for post in posts:
    if post['account']['acct'] == target_account:
        posts_from_account.append(post)

print(f'Matches found: {len(posts_from_account)}')
if posts_from_account:
    print('First match ID:', posts_from_account[0]['id'])
    print('Sample match content:', posts_from_account[0]['content'])

## 12. Search posts by keyword

Scan the post bodies for a case-insensitive keyword to understand how often a topic appears.

Review a few entries in `post_texts` to pick a word or phrase you care about. Remember that `post_texts` still holds the raw HTML from the `content` field, so markup such as tag names and URLs can also match your keyword.

In [None]:
print('Sample post texts:')
for text in post_texts[:3]:
    print('-', text)

Choose a keyword to search for. Make the comparison case-insensitive so capitalization differences do not matter.

In [None]:
target_keyword = 'the'  # <-- edit this to your chosen keyword
print(f"Searching for keyword: {target_keyword}")

Loop through the posts and collect those whose `content` contains the keyword. Compare using lowercase copies of the text.

In [None]:
matching_posts = []

for post in posts:
    html = post.get('content') or ''  # guard against a missing or None content field
    if target_keyword and target_keyword.lower() in html.lower():
        matching_posts.append(html)

print(f'Posts containing "{target_keyword}": {len(matching_posts)}')
if matching_posts:
    preview = matching_posts[0]
    print('Sample match:', preview)
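
A substring test also matches the keyword inside longer words (`'the'` is found in `'another'`). If that matters for your question, one option is a whole-word match with a regular expression. This is a sketch on hypothetical plain-text strings; `contains_word` is a hypothetical helper.

```python
import re

def contains_word(text, keyword):
    """Case-insensitive whole-word match for keyword in text."""
    pattern = re.compile(r'\b' + re.escape(keyword) + r'\b', re.IGNORECASE)
    return bool(pattern.search(text or ''))

# Hypothetical plain-text post bodies
sample_texts = ['The weather is nice', 'Another day', 'Bathe the cat']
hits = [text for text in sample_texts if contains_word(text, 'the')]
print(hits)
```

`re.escape` keeps characters such as `.` or `+` in the keyword from being read as regex syntax.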

## 13. Search posts by hashtag

Use the hashtag list you built to focus on posts that mention a particular topic.

Review the unique hashtag values to decide which topic to explore.

In [None]:
unique_hashtags = sorted(set(hashtags))
print("Unique hashtags collected:", len(unique_hashtags))
print("Sample hashtags:", unique_hashtags[:10])

Set `target_hashtag` to the tag you want to analyze. Update the default after you inspect the list.

In [None]:
target_hashtag = unique_hashtags[0] if unique_hashtags else ''  # <-- edit this after reviewing the list
print(f"Searching for hashtag: #{target_hashtag or '[edit this]'}")

Loop through the posts and capture those whose `tags` array contains the chosen hashtag. Report how many matches you found and preview a snippet of one.

In [None]:
matching_posts = []

for post in posts:
    tag_names = [tag['name'].lower() for tag in post['tags']]
    if target_hashtag in tag_names:
        matching_posts.append(post['content'])

print(f'Posts containing #{target_hashtag}: {len(matching_posts)}')
if matching_posts:
    preview = matching_posts[0]
    print('Sample match:', preview)


---

Discuss with your partner: What additional questions could you ask of this feed? What transformations would make it easier to pivot to MongoDB or another document store?
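
As one starting point for that discussion, the sketch below flattens a post into a compact dictionary built from the fields used in this notebook. `flatten_post` and its output keys are hypothetical choices; `_id` is used because MongoDB treats that field as a document's primary key. Nothing here talks to a database.

```python
def flatten_post(post):
    """Reduce a status dictionary to a compact, document-store-friendly dict."""
    account = post.get('account') or {}
    return {
        '_id': post.get('id'),
        'author': account.get('acct'),
        'content_html': post.get('content') or '',
        'hashtags': [t['name'].lower() for t in post.get('tags') or [] if t.get('name')],
        'mentions': [m['acct'] for m in post.get('mentions') or [] if m.get('acct')],
        'media_urls': [a['url'] for a in post.get('media_attachments') or [] if a.get('url')],
    }

# Hypothetical post with placeholder values
sample_post = {
    'id': '101',
    'content': '<p>Hello</p>',
    'account': {'acct': 'ada'},
    'tags': [{'name': 'HCI'}],
    'mentions': [{'acct': 'bob'}],
    'media_attachments': [{'url': 'https://example.org/a.png'}],
}
document = flatten_post(sample_post)
print(document)
```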