# Section 1 - Report Header & Hypothesis

**Report Title:**  
The Effect of Alt Text vs. No Alt Text on Bluesky Image Post Engagement Rates  

**Your Name:**  
Kalayah Bradley  

**Date:**  
October 12, 2025

---

### Hypothesis  
On Bluesky, image posts that include alt text receive more likes and reposts on average than image posts without alt text. 

---

### Theoretical Rationale  
Features like alt text are designed to make online content easier to use and more accessible for users with disabilities. Media accessibility research suggests that inclusive environments frequently see higher levels of engagement, attention, and trust (Ellis & Kent, *Disability and New Media*, 2011).

My reasoning is that users who take the time to write quality alt text are likely to put more care into their posts overall, which improves the quality of their content and in turn increases engagement. Additionally, creators who prioritize accessibility typically attract and retain followers who value these features, which may raise engagement metrics such as likes and reposts.  

---

### Statistical Application  
I will compare the activity levels of posts with and without alt text in order to test this hypothesis.  


- **Independent Variable:** Presence of alt text (`record.embed.images.alt`)  
- **Dependent Variables:** `likeCount`, `repostCount`   
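
One way to carry out this comparison is Welch's t-test, which compares two group means without assuming equal variances. The sketch below uses only the Python standard library; the function name `welch_t` and the sample numbers are illustrative assumptions, not collected data. Like counts tend to be heavily skewed, so the statistic should be read with caution.

```python
from statistics import mean, variance


def welch_t(group_a, group_b):
    """Return (t statistic, approximate degrees of freedom) for two samples."""
    n_a, n_b = len(group_a), len(group_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    # Variance of each group mean: sample variance divided by group size
    se_a = variance(group_a) / n_a
    se_b = variance(group_b) / n_b
    if se_a + se_b == 0:
        raise ValueError("both groups have zero variance")
    t = (mean(group_a) - mean(group_b)) / (se_a + se_b) ** 0.5
    # Welch-Satterthwaite approximation for the degrees of freedom
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t, df


# Made-up like counts, for illustration only
likes_with_alt = [2, 4, 6, 8, 10]
likes_without_alt = [1, 2, 3, 4, 5]
t_stat, dof = welch_t(likes_with_alt, likes_without_alt)
print(f"t = {t_stat:.3f}, df = {dof:.2f}")
```

A positive `t` here means the first group has the higher mean; converting `t` and `df` to a p-value would need a t-distribution table or a statistics library.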




# Section 2 - Endpoint Plan (Design Your Data Collection)

### Planned Endpoints

1. **`app.bsky.feed.searchPosts`**: to collect posts containing images.  
   - **Request parameters:**  
     - `q`: search for image-related keywords such as “photo,” “art,” or “illustration.”  
     - `limit`: number of posts to retrieve per request (up to 100; larger samples require paging with `cursor`).  
   - **Response fields:**  
     - `post.uri`, `post.cid`, `record.embed.images.alt`, `likeCount`, `repostCount`, and `author.did`.  
   - **Why these fields map to the variables in the hypothesis:**  
     - `record.embed.images.alt` shows whether alt text is present (the independent variable); `likeCount` and `repostCount` are the engagement measures (the dependent variables).  

2. **`app.bsky.actor.getProfile`**: to add author data.  
   - **Request parameters:**  
     - `actor`: DID (identifier from post data).  
   - **Response fields:**  
     - `followersCount`, `followsCount`, `postsCount`, `createdAt`, `displayName`.  
   - **Why these fields map to the variables in the hypothesis:**  
     - Provides context for normalizing engagement (for example, likes relative to follower count).
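
Collecting more posts than a single request returns means following the `cursor` value that the search response includes. The sketch below shows one possible paging loop. The helper names `fetch_search_page` and `collect_posts`, the pause length, and the 100-per-request cap are assumptions to check against the current API documentation.

```python
import json
import time
import urllib.parse
import urllib.request

BASE_URL = "https://api.bsky.app/xrpc"


def fetch_search_page(params):
    """Request one page of search results and return the parsed JSON."""
    url = f"{BASE_URL}/app.bsky.feed.searchPosts?{urllib.parse.urlencode(params)}"
    req = urllib.request.Request(
        url, headers={"User-Agent": "EMAT-Teaching/1.0 (+contact@example.com)"}
    )
    with urllib.request.urlopen(req, timeout=30) as resp:
        return json.load(resp)


def collect_posts(query, max_posts=300, page_size=100,
                  fetch_page=fetch_search_page, pause=0.5):
    """Follow the cursor until max_posts are collected or results run out."""
    posts, cursor = [], None
    while len(posts) < max_posts:
        params = {"q": query, "limit": min(page_size, max_posts - len(posts))}
        if cursor:
            params["cursor"] = cursor
        data = fetch_page(params)
        batch = data.get("posts", [])
        posts.extend(batch)
        cursor = data.get("cursor")
        # Stop when the API returns nothing new or no further cursor
        if not batch or not cursor:
            break
        time.sleep(pause)  # small delay to stay well under rate limits
    return posts[:max_posts]
```

Calling `collect_posts("photo", max_posts=300)` would issue up to three requests. Passing a different `fetch_page` function makes the loop testable without network access.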

---

### Reliability and Bias  
- **Reliability:** Bluesky's API returns structured JSON responses with consistent fields.  
- **Unreliability:** Rate limits restrict the sample size; some posts lack `record.embed.images`.  
- **Bot Activity:** Bot or spam accounts can skew engagement averages.  
- **Ethics:** Bluesky's API terms allow only the collection of public data.  

---

### Limitations  
- Engagement counts may update slowly during periods of high activity.  
- Sample size is reduced when image objects are missing.  
- Although alt text can indicate greater effort, its presence does not always indicate higher-quality content.
- Some profiles restrict access to certain data fields.  


## Section 3 - Data Collection

In [None]:
# Imports
import requests      
import time           
import json as js    
import pandas as pd   

# Define URL for Bluesky API requests
BASE_URL = "https://api.bsky.app/xrpc"

## Data Collection (Endpoint 1):

In [None]:
endpoint = f"{BASE_URL}/app.bsky.feed.searchPosts"
headers = {"User-Agent": "EMAT-Teaching/1.0 (+contact@example.com)"}
params = {"q": "photo", "limit": 50}

resp = requests.get(endpoint, params=params, headers=headers, timeout=30)

print("Status:", resp.status_code)
data = resp.json()
print("top-level keys:", list(data.keys()))
posts = data.get("posts", [])
rows = []

# Flatten each post into one row containing the fields needed for the analysis
for p in posts:
    record = p.get("record", {})
    embed = record.get("embed", {})

    # Check whether the embed contains any image that includes alt text.
    # Note: posts with no image list are also recorded as has_alt_text=False.
    images = embed.get("images", []) if isinstance(embed.get("images"), list) else []
    has_alt = any(img.get("alt") for img in images) if images else False

    stats = {
        "post_uri": p.get("uri"),
        "post_cid": p.get("cid"),
        "author_did": p.get("author", {}).get("did"),
        "likeCount": p.get("likeCount"),
        "repostCount": p.get("repostCount"),
        "has_alt_text": has_alt
    }
    rows.append(stats)

# Change the list into pandas DataFrame
posts_df = pd.DataFrame(rows)

# Display the first few rows to verify that the data loaded correctly
posts_df.head()


Status: 200
top-level keys: ['posts', 'cursor']


Unnamed: 0,post_uri,post_cid,author_did,likeCount,repostCount,has_alt_text
0,at://did:plc:thwspoidoksdkjdjclykbkzn/app.bsky...,bafyreifsmqoim5dymfbl6kdcimwk7etp57atbo7a247xb...,did:plc:thwspoidoksdkjdjclykbkzn,0,0,False
1,at://did:plc:5pzticukc5rgbmc7o3bso7il/app.bsky...,bafyreigcbegn5bciukerck5cjeeu6qr4fexcevkdrbuzc...,did:plc:5pzticukc5rgbmc7o3bso7il,0,0,True
2,at://did:plc:qixf4a5ossg2fkd3ztb3asyk/app.bsky...,bafyreie3u4smd47g5fsvyxryv66uxdj4utxqparferhkq...,did:plc:qixf4a5ossg2fkd3ztb3asyk,0,0,False
3,at://did:plc:ijk2uijbwllxr2z7q3yyzily/app.bsky...,bafyreic2fetoiwmtyi3q4qzcyqbuapkt5h2yau2m6v7tw...,did:plc:ijk2uijbwllxr2z7q3yyzily,0,0,False
4,at://did:plc:kjsl6b5t5klf5cbtnkyxq3ar/app.bsky...,bafyreibs4dtrb7slqvgghrtja6f7wwpostwoekafg6eo4...,did:plc:kjsl6b5t5klf5cbtnkyxq3ar,2,0,False
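
The loop above records a post with no image at all as `has_alt_text = False`, which mixes "no image" with "image without alt text." A hedged sketch of one way to separate the two cases follows. The helper names are hypothetical, and the `embed["media"]["images"]` shape for quote posts with media is an assumption about the record format that should be checked against real responses.

```python
def extract_images(record):
    """Return the list of image dicts attached to a post record, or []."""
    embed = record.get("embed") or {}
    # Plain image posts keep the list directly under embed["images"]
    images = embed.get("images")
    if not isinstance(images, list):
        # Assumed shape for quote posts with media: embed["media"]["images"]
        media = embed.get("media") or {}
        images = media.get("images")
    return images if isinstance(images, list) else []


def alt_text_status(record):
    """Classify a record as 'no_image', 'alt', or 'no_alt'."""
    images = extract_images(record)
    if not images:
        return "no_image"
    # Whitespace-only alt text is treated as missing
    has_alt = any((img.get("alt") or "").strip() for img in images)
    return "alt" if has_alt else "no_alt"
```

Storing `alt_text_status(record)` in each row would allow `no_image` posts to be filtered out before the two groups are compared.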


## Data Collection (Endpoint 2):

In [None]:

# Get profile data for every author in the posts collected above
# Collect the unique author IDs (DIDs)
unique_dids = posts_df["author_did"].dropna().unique().tolist()
print("Number of unique authors:", len(unique_dids))
#print(unique_dids)

# Get author profiles for these dids
all_profiles = []

# Loop through each DID and collect profile info
for d in unique_dids:
    #print(js.dumps(d, indent=2))
    params = [("actor", d)]
    #print(d)
    resp = requests.get(f"{BASE_URL}/app.bsky.actor.getProfile", params=params, timeout=30)

    if resp.status_code != 200:
        print("Skipping", d, "status:", resp.status_code)
        continue

    data = resp.json()
    #print(js.dumps(data, indent=2))
    
    # Flatten the profile data and append it to the list
    all_profiles.append({
        "did": data.get("did"),
        "handle": data.get("handle"),
        "displayName": data.get("displayName"),
        "followersCount": data.get("followersCount"),
        "followsCount": data.get("followsCount"),
        "postsCount": data.get("postsCount"),
        "createdAt": data.get("createdAt"),
        "description": data.get("description"),
    })

# Convert list of dictionaries into a DataFrame
all_profiles_df = pd.DataFrame(all_profiles)

# Display first few rows to verify
all_profiles_df.head(5)




Number of unique authors: 50


Unnamed: 0,did,handle,displayName,followersCount,followsCount,postsCount,createdAt,description
0,did:plc:thwspoidoksdkjdjclykbkzn,disderp.bsky.social,DisDerp,272,65,1517,2024-11-16T19:17:32.786Z,“LUCK BE IN THE AIR TONIGHT” 😂😂😂\n\nAll jokes ...
1,did:plc:5pzticukc5rgbmc7o3bso7il,mattsego.bsky.social,MSG,1364,944,11160,2023-06-22T13:36:45.516Z,"Who can never know, like, what will be told, a..."
2,did:plc:qixf4a5ossg2fkd3ztb3asyk,celinevale2.bsky.social,Celine Vale ❤️,233,378,133,2025-08-18T16:56:51.497Z,"Fantasy Illustrator 🎨 | DnD Art, Party Composi..."
3,did:plc:ijk2uijbwllxr2z7q3yyzily,ankemarsh.bsky.social,Dr Anke Marsh,4385,2861,2211,2023-11-07T07:39:19.516Z,"Palaeoecology, natural history, science, flora..."
4,did:plc:kjsl6b5t5klf5cbtnkyxq3ar,mel.bzky.team,meliss-AAAH!,15074,299,50077,2023-06-15T08:33:05.324Z,opinions ARE my employer's. retweets ARE endor...
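
Requesting one profile per call uses many requests as the sample grows. A possible alternative is the batched `app.bsky.actor.getProfiles` endpoint. To my understanding it accepts a repeated `actors` parameter (up to 25 per call) and returns a `profiles` list, but both details are assumptions to confirm in the API documentation. The helper names below are hypothetical.

```python
def chunked(items, size=25):
    """Yield consecutive slices of `items` with at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def build_profile_requests(dids, batch_size=25):
    """Return one list of ("actors", did) query pairs per batched request."""
    return [[("actors", d) for d in batch] for batch in chunked(dids, batch_size)]


# Intended use inside the notebook (not run here; assumes the batched endpoint
# behaves as described above):
#   for pairs in build_profile_requests(unique_dids):
#       resp = requests.get(f"{BASE_URL}/app.bsky.actor.getProfiles",
#                           params=pairs, timeout=30)
#       profiles = resp.json().get("profiles", [])
```

With the 50 authors in this sample, this would reduce 50 requests to 2.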


## Section 4 - Build DataFrames

In [None]:
# Classic pandas stitch:
# merge joins rows from the two dataframes based on matching key values.
posts_enriched = posts_df.merge(
    # Adds "author_" to every column name in all_profiles_df
    # Why? To avoid name collisions (e.g., both dataframes could have handle, displayName) 
    # and to make the origin obvious: anything about the author now clearly starts with author_.
    all_profiles_df.add_prefix("author_"),
    # left_on="author_did": use posts_df["author_did"] as the join key on the left.
    left_on="author_did",
    # right_on="author_did": use the prefixed key from the right dataframe (formerly did).
    right_on="author_did",
    # how="left": a left join. Keep every row from posts_df (every post), 
    # even if there is no matching profile. If a profile is missing, 
    # the author columns become NaN. 
    # This is what you want for enrichment: don't drop posts just because the profile lookup failed.
    how="left"
)

posts_enriched = posts_enriched.dropna(subset=["likeCount"])
posts_enriched = posts_enriched.sort_values(by="likeCount", ascending=False)
posts_enriched.head(5)


Unnamed: 0,post_uri,post_cid,author_did,likeCount,repostCount,has_alt_text,author_handle,author_displayName,author_followersCount,author_followsCount,author_postsCount,author_createdAt,author_description
49,at://did:plc:hdtepbxpmzsgrwz4gyiwjlar/app.bsky...,bafyreiew6txuvkupoaldb47pqzss2okrshxdijkwjqzwd...,did:plc:hdtepbxpmzsgrwz4gyiwjlar,13,2,True,jampupper.bsky.social,~ Jammy ~,2462,344,3718,2023-08-26T12:25:48.043Z,He/They | 33 | 🔞\nDancing coyutie pup excited ...
41,at://did:plc:uzlhbzc7w2s4np4lz3f7hazu/app.bsky...,bafyreid7f435rheet5grybqyt32v7ml6mu32ctqua4ub2...,did:plc:uzlhbzc7w2s4np4lz3f7hazu,12,0,True,amychu.bsky.social,☕️ AMY CHU #donutkiller,6067,987,739,2023-07-14T13:03:08.612Z,#Legohoarder #whiskeysipper \nI write comics g...
43,at://did:plc:end3y4t6sqefadytxixlgghe/app.bsky...,bafyreiet2p6wep56rtafagckpc2yango5hnm6fupuhcgj...,did:plc:end3y4t6sqefadytxixlgghe,7,0,True,angemmorton.bsky.social,Angela Marie Morton,3841,2093,2560,2023-09-21T04:15:49.325Z,○ art & illustration\n○ calm art for a busy wo...
47,at://did:plc:irec2mkigwl3zekio35covi2/app.bsky...,bafyreida32xj3hxnnhh3nbbz3sdm6p4mfyzjrf4poollx...,did:plc:irec2mkigwl3zekio35covi2,6,0,True,unclebeard1978.bsky.social,Dr. Uncle Feared 🦊 🌈,3961,3172,23178,2023-09-03T13:17:52.134Z,"Doctor, author, bear in North Yorkshire. Docto..."
29,at://did:plc:xiubhwhnsfqz2rfbmhhdnvns/app.bsky...,bafyreibdjsywd4kb6j5ljgt7sopfnjg7tgzhzej65mazq...,did:plc:xiubhwhnsfqz2rfbmhhdnvns,4,1,False,minisanctuary.bsky.social,Mini Motley Sanctuary,15721,57994,735,2024-11-09T11:48:55.679Z,A mother daughter team running our special san...
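
Raw like counts favor accounts with large followings, so a per-follower rate may give a fairer comparison between the two groups. The sketch below summarizes rows such as those from `posts_enriched.to_dict("records")`. The function names and the "likes per 1,000 followers" metric are illustrative assumptions. The sample rows are the five shown above, which were selected by like count and therefore only demonstrate the shape of the output, not a result.

```python
from statistics import mean, median


def _valid_followers(value):
    # Rejects None, NaN (NaN != NaN), and zero so the division below is safe
    return value is not None and value == value and value > 0


def summarize_by_alt_text(rows):
    """Group rows by has_alt_text and report n, mean likes, and a median rate."""
    groups = {True: [], False: []}
    for row in rows:
        groups[bool(row["has_alt_text"])].append(row)
    summary = {}
    for flag, members in groups.items():
        likes = [r["likeCount"] for r in members]
        rates = [1000 * r["likeCount"] / r["author_followersCount"]
                 for r in members
                 if _valid_followers(r.get("author_followersCount"))]
        summary[flag] = {
            "n": len(members),
            "mean_likes": mean(likes) if likes else None,
            # The median is less sensitive to a few viral or bot-inflated posts
            "median_likes_per_1k_followers": median(rates) if rates else None,
        }
    return summary


# The five rows displayed above (likes, alt text flag, follower count)
top_rows = [
    {"likeCount": 13, "has_alt_text": True, "author_followersCount": 2462},
    {"likeCount": 12, "has_alt_text": True, "author_followersCount": 6067},
    {"likeCount": 7, "has_alt_text": True, "author_followersCount": 3841},
    {"likeCount": 6, "has_alt_text": True, "author_followersCount": 3961},
    {"likeCount": 4, "has_alt_text": False, "author_followersCount": 15721},
]
for flag, stats in summarize_by_alt_text(top_rows).items():
    print(flag, stats)
```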


# Section 5 - Conclusion

A first look at the dataset shows two categories of posts: those with alt text and those without. In this initial sample of 50 posts, four of the five most-liked posts included alt text. This is consistent with the idea that users who value accessibility and community are more likely to view, support, and interact with posts that have alt text.  

Because the sample is small and no formal statistical test has been run yet, these results are preliminary. They are consistent with the hypothesis that posts demonstrating accessibility awareness (including alt text) receive higher engagement, but they do not confirm it. 

**Challenges faced:**  
- The sample size is restricted by Bluesky API rate limits.  
- A few posts lacked `embed` information.
- Real-time comparison is difficult because engagement counts do not update instantly. 

**Next steps:**  
 Expanding the dataset and applying a t-test could show whether the observed pattern holds. Future research could also examine alt text quality to see whether more thorough, in-depth descriptions are associated with greater engagement.

**AI Use: ChatGPT (free version)**
I pasted the outline from GitHub into ChatGPT for comparison and used it to assist with formatting, which helped me correct a few small formatting errors. I also used it for a few brainstorming exercises on the hypothesis. Additionally, I wrote a few prompts to check that my work matched the template and followed the grading guidelines, and to fix a few minor code errors that I was unsure how to resolve on my own. 
