
# **Report Title:** Bluesky API Data Report: Relationship Between Hashtag Usage and Follower Count
## **Your Name:** Gabriella Dugan
## **Date:** 2025-10-12
##### *AI was used for certain parts to help rewrite and clean up incorrect code*


# Hypothesis

Bluesky users who frequently include hashtags in their posts will have a higher average number of followers than users who do not.

# Theoretical Rationale

Hashtags make posts easier to find on social media. Posts with hashtags are more likely to appear in searches and in everyday feeds, which gives their authors more visibility.
According to a Search Logistics article, “posts that include at least one hashtag get an average of 29 % more interactions” than those without, so users who regularly include hashtags may attract more followers over time.

Source: https://www.searchlogistics.com/learn/statistics/hashtags-statistics/

# Statistical Application

To test this hypothesis, I will compare the mean `followersCount` of users who post **with** hashtags against that of users who post **without** hashtags.
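
A comparison of two group means is usually paired with a test statistic. The sketch below computes Welch's t statistic, which does not assume equal variances, using only the standard library. The choice of Welch's test is my assumption (the report does not name a test), and the follower counts are hypothetical placeholders, not collected data.

```python
import math
import statistics


def welch_t(sample_a, sample_b):
    """Return (t statistic, approximate degrees of freedom) for two independent samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    if n_a < 2 or n_b < 2:
        raise ValueError("each group needs at least two observations")
    mean_a = statistics.mean(sample_a)
    mean_b = statistics.mean(sample_b)
    # Sample variance (n - 1 denominator) divided by group size
    se_a = statistics.variance(sample_a) / n_a
    se_b = statistics.variance(sample_b) / n_b
    t_stat = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom
    dof = (se_a + se_b) ** 2 / (se_a ** 2 / (n_a - 1) + se_b ** 2 / (n_b - 1))
    return t_stat, dof


# Hypothetical follower counts, for illustration only
with_tags = [120, 340, 560, 90, 1500]
without_tags = [80, 200, 310, 45, 700]
t_stat, dof = welch_t(with_tags, without_tags)
print(f"t = {t_stat:.3f}, df = {dof:.1f}")
```

The statistic still has to be compared against a t distribution to obtain a p-value, and follower counts are typically very skewed, so the result should be read with caution.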


# Planned Endpoints:

*app.bsky.feed.searchPosts* - collects posts containing hashtags 

- key parameters:
    - q: search query (hashtag keyword)
    - limit: number of posts to return per request
- response fields:
    - uri, cid, record.text, author.did, author.handle, author.displayName
- mapping:
    - Used to identify posts and determine whether they contain hashtags

*app.bsky.actor.getProfile* - collects each author's follower and post counts

- key parameters:
    - actor: the author’s decentralized ID (DID)
- response fields:
    - did, handle, displayName, followersCount, followsCount, postsCount, createdAt
- mapping:
    - Used to attach follower counts to each post's author so the two groups can be compared
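
The mapping for `searchPosts` depends on deciding whether a post contains a hashtag. Here is a minimal sketch of one way to do that. It first reads the rich-text facets on the post record (features typed `app.bsky.richtext.facet#tag`) and falls back to a regular expression on the text. The function name `extract_hashtags` and the regex are my own assumptions, and the facet layout should be checked against the current Bluesky lexicon.

```python
import re

# Assumed pattern: "#" at the start of the text or after whitespace,
# followed by a tag that contains at least one non-digit word character
HASHTAG_RE = re.compile(r"(?:^|\s)#(\w*[^\W\d]\w*)")
TAG_FEATURE_TYPE = "app.bsky.richtext.facet#tag"


def extract_hashtags(record):
    """Return the lowercase hashtags found in a post record dict."""
    tags = []
    # Prefer the structured facets when the record carries them
    for facet in record.get("facets") or []:
        for feature in facet.get("features", []):
            if feature.get("$type") == TAG_FEATURE_TYPE and feature.get("tag"):
                tags.append(feature["tag"].lower())
    if not tags:
        # Fall back to scanning the plain text
        text = record.get("text") or ""
        tags = [tag.lower() for tag in HASHTAG_RE.findall(text)]
    return tags


print(extract_hashtags({"text": "New piece #Art #2025goals"}))
```

Compared with checking for any `#` character, this avoids counting text such as "Issue #1" or "C#" as a hashtag.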


# Reliability and Bias

Reliability:
- Data is sourced directly from the Bluesky API and reflects users' public posts.

Potential Unreliability:
- Posts could be deleted or hidden after data collection.
- Some users could have restricted follow data or access.

Ethical Considerations:
- Only public, non-sensitive data is collected, following Bluesky’s guidelines.

# Limitations
- Some posts may be from bots or automated accounts.
- Follower counts may lag behind real-time numbers.

In [9]:
import requests
import pandas as pd
import time

# Public Bluesky AppView endpoint; no authentication is used in this notebook
BASE_URL = "https://api.bsky.app/xrpc"
headers = {"User-Agent": "EMAT-Teaching/1.0 (+contact@example.com)"}

In [10]:
# Hashtags to search for, and the overall target sample size
HASHTAGS = ["#art", "#music", "#tech"]
MAX_POSTS = 30  # 3 hashtags x 10 posts per request (see "limit" below)

posts_data = []

for tag in HASHTAGS:
    endpoint = f"{BASE_URL}/app.bsky.feed.searchPosts"
    params = {"q": tag, "limit": 10}
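    resp = requests.get(endpoint, params=params, headers=headers, timeout=30)
    print(f"Status for {tag}:", resp.status_code)
    # Parse the body only on success; an error response yields no posts
    data = resp.json() if resp.status_code == 200 else {}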

    posts = data.get("posts", [])
    for p in posts:
        text = p.get("record", {}).get("text", "")
        author = p.get("author", {})
        posts_data.append({
            "post_uri": p.get("uri"),
            "post_cid": p.get("cid"),
            "text": text,
            "has_hashtag": "#" in text,  # crude check: any "#" in the text counts
            "author_did": author.get("did"),
            "author_handle": author.get("handle"),
            "author_displayName": author.get("displayName"),
        })
    time.sleep(1)

posts_df = pd.DataFrame(posts_data)
posts_df.head(5)

Status for #art: 200
Status for #music: 200
Status for #tech: 200


| | post_uri | post_cid | text | has_hashtag | author_did | author_handle | author_displayName |
|---|---|---|---|---|---|---|---|
| 0 | at://did:plc:4wwburbbf566yygcje7464s2/app.bsky... | bafyreiga3yq37m7sgwmwfeqzfya6dpvekykiibl2ddhdi... | Commission for: @samury242.bsky.social - Weapo... | True | did:plc:4wwburbbf566yygcje7464s2 | cuppajo.bsky.social | CuppaJo \| Commissions are: OPEN! |
| 1 | at://did:plc:owhsxhr7bnmeilyxhqbwlxbf/app.bsky... | bafyreicikilvosbzrvpzjvpys7trh5mfms2kvcjwzbuiu... | Catober Day 14: "Cat Zoot Suit" #catober #cato... | True | did:plc:owhsxhr7bnmeilyxhqbwlxbf | wynflaeth.bsky.social | Ash |
| 2 | at://did:plc:xwcrgdvma5k3xkapofrep7qy/app.bsky... | bafyreib3ndumj4ib7oibgthmnn26cklpaij2r65y7razf... | Abandonned electric coal powerplant, #strasbou... | True | did:plc:xwcrgdvma5k3xkapofrep7qy | millerebonds.framapiaf.org.ap.brid.gy | millerebonds |
| 3 | at://did:plc:m3fyiwf3jqooiyvpc7oniemi/app.bsky... | bafyreih5bmhsulw4zj33kxv4yz5t4tnyfb5pdlljhtbep... | When gibbons are not brachiating through the c... | True | did:plc:m3fyiwf3jqooiyvpc7oniemi | d2therock.bsky.social | Derek S. Pumpkins👾🦒⚔️👽 |
| 4 | at://did:plc:7taualtjd5ivllzdhrtavasj/app.bsky... | bafyreia6leh6coayq7xfezuhh5wod5rbxo4h4iotkra6p... | More comic pages. Excruciating! I draw slow… b... | True | did:plc:7taualtjd5ivllzdhrtavasj | rokumtg.bsky.social | Rokula |
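
The cell above requests a single page of 10 posts per hashtag. A larger sample would need pagination, which `searchPosts` supports through a `cursor` value returned with each page. The following is a sketch only: `collect_posts` and the `fetch_page` callable are hypothetical names, and `fetch_page` is assumed to take a params dict and return the parsed JSON response.

```python
def collect_posts(fetch_page, query, max_posts, page_size=25):
    """Page through search results until max_posts are collected or results run out.

    fetch_page is a hypothetical callable: it receives a params dict and
    returns the parsed JSON response as a dict.
    """
    collected = []
    cursor = None
    while len(collected) < max_posts:
        # Never ask for more posts than are still needed
        params = {"q": query, "limit": min(page_size, max_posts - len(collected))}
        if cursor:
            params["cursor"] = cursor
        page = fetch_page(params)
        posts = page.get("posts", [])
        collected.extend(posts)
        cursor = page.get("cursor")
        # Stop when the API returns no posts or no cursor for a next page
        if not posts or not cursor:
            break
    return collected[:max_posts]
```

In this notebook, `fetch_page` could wrap the existing request, for example `lambda params: requests.get(endpoint, params=params, headers=headers, timeout=30).json()`, with a `time.sleep(1)` between pages to stay within rate limits.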


In [3]:
# get DIDs from DataFrame
unique_dids = posts_df["author_did"].dropna().unique().tolist()
print("Number of unique authors:", len(unique_dids))

profiles = []
for d in unique_dids:
    r = requests.get(f"{BASE_URL}/app.bsky.actor.getProfile", params={"actor": d}, headers=headers, timeout=30)
    if r.status_code == 200:
        data = r.json()
        profiles.append({
            "did": data.get("did"),
            "handle": data.get("handle"),
            "displayName": data.get("displayName"),
            "followersCount": data.get("followersCount"),
            "followsCount": data.get("followsCount"),
            "postsCount": data.get("postsCount"),
            "createdAt": data.get("createdAt"),
        })
    time.sleep(1)
# profile data into DataFrame
profiles_df = pd.DataFrame(profiles)
profiles_df.head(5)

Number of unique authors: 26


| | did | handle | displayName | followersCount | followsCount | postsCount | createdAt |
|---|---|---|---|---|---|---|---|
| 0 | did:plc:zuxjazkzbshriqsdtgsirypp | oyakodonkk.bsky.social | Oyakodon | 2 | 20 | 7 | 2025-03-21T08:21:50.942Z |
| 1 | did:plc:262bbuobbd5jdzprl24u5r7d | skatuya.bsky.social | skeletuya 💀🎃 COMMISSIONS OPEN | 3035 | 420 | 1140 | 2023-07-19T10:56:47.895Z |
| 2 | did:plc:sawtpylgqb3wwgst5zcswc23 | alcamoth.bsky.social | pierre | 524 | 153 | 2028 | 2023-12-10T23:51:48.329Z |
| 3 | did:plc:ulsdjrbuee46yrgb37vfrz23 | sculptedreef.com | SAHASA 🐙 | 5080 | 1814 | 3302 | 2024-11-18T23:13:34.105Z |
| 4 | did:plc:gbul6p3uerxjstktgke4s6tr | wkdesignstudios.bsky.social | Kevin | 24 | 3 | 25 | 2025-05-09T03:31:38.840Z |
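
The loop above makes one `getProfile` request per author. Bluesky also offers a batch endpoint, `app.bsky.actor.getProfiles`, which accepts a list of `actors` and returns a `profiles` list. The sketch below groups DIDs into batches; it assumes a cap of 25 actors per call, which should be checked against the current API reference. `fetch_profiles_in_batches` and the `get_json` helper are hypothetical names.

```python
def chunked(items, size):
    """Yield consecutive slices of items with at most size elements each."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def fetch_profiles_in_batches(dids, get_json, batch_size=25):
    """Collect profile dicts for all DIDs using one request per batch.

    get_json is a hypothetical helper: it takes (endpoint_name, params)
    and returns the parsed JSON response as a dict.
    """
    profiles = []
    for batch in chunked(dids, batch_size):
        data = get_json("app.bsky.actor.getProfiles", {"actors": batch})
        # Each response is assumed to carry its results under "profiles"
        profiles.extend(data.get("profiles", []))
    return profiles
```

With `requests`, `get_json` could be `lambda name, params: requests.get(f"{BASE_URL}/{name}", params=params, headers=headers, timeout=30).json()`, since `requests` encodes a list value as a repeated query parameter. For the 26 authors here, that would mean two requests instead of 26.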


In [11]:
# Left-join each post to its author's profile (profile columns get a "profile_" prefix)
posts_enriched = posts_df.merge(
    profiles_df.add_prefix("profile_"),
    left_on="author_did",
    right_on="profile_did",
    how="left"
)


posts_enriched.head(5)

| | post_uri | post_cid | text | has_hashtag | author_did | author_handle | author_displayName | profile_did | profile_handle | profile_displayName | profile_followersCount | profile_followsCount | profile_postsCount | profile_createdAt |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | at://did:plc:4wwburbbf566yygcje7464s2/app.bsky... | bafyreiga3yq37m7sgwmwfeqzfya6dpvekykiibl2ddhdi... | Commission for: @samury242.bsky.social - Weapo... | True | did:plc:4wwburbbf566yygcje7464s2 | cuppajo.bsky.social | CuppaJo \| Commissions are: OPEN! | | | | | | | |
| 1 | at://did:plc:owhsxhr7bnmeilyxhqbwlxbf/app.bsky... | bafyreicikilvosbzrvpzjvpys7trh5mfms2kvcjwzbuiu... | Catober Day 14: "Cat Zoot Suit" #catober #cato... | True | did:plc:owhsxhr7bnmeilyxhqbwlxbf | wynflaeth.bsky.social | Ash | | | | | | | |
| 2 | at://did:plc:xwcrgdvma5k3xkapofrep7qy/app.bsky... | bafyreib3ndumj4ib7oibgthmnn26cklpaij2r65y7razf... | Abandonned electric coal powerplant, #strasbou... | True | did:plc:xwcrgdvma5k3xkapofrep7qy | millerebonds.framapiaf.org.ap.brid.gy | millerebonds | | | | | | | |
| 3 | at://did:plc:m3fyiwf3jqooiyvpc7oniemi/app.bsky... | bafyreih5bmhsulw4zj33kxv4yz5t4tnyfb5pdlljhtbep... | When gibbons are not brachiating through the c... | True | did:plc:m3fyiwf3jqooiyvpc7oniemi | d2therock.bsky.social | Derek S. Pumpkins👾🦒⚔️👽 | | | | | | | |
| 4 | at://did:plc:7taualtjd5ivllzdhrtavasj/app.bsky... | bafyreia6leh6coayq7xfezuhh5wod5rbxo4h4iotkra6p... | More comic pages. Excruciating! I draw slow… b... | True | did:plc:7taualtjd5ivllzdhrtavasj | rokumtg.bsky.social | Rokula | | | | | | | |
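
The first five merged rows above have empty `profile_*` columns, meaning no profile was matched for those authors. One possible cause is that the profile cell (`In [3]`) ran against an earlier set of posts than the search cell (`In [10]`). A quick way to measure the gap is to list the author DIDs that have no fetched profile. This is a sketch with a hypothetical helper name, `unmatched_authors`, operating on plain lists of dicts.

```python
def unmatched_authors(post_rows, profile_rows):
    """Return the author DIDs that appear in posts but have no fetched profile."""
    profile_dids = {row["did"] for row in profile_rows if row.get("did")}
    post_dids = {row["author_did"] for row in post_rows if row.get("author_did")}
    return sorted(post_dids - profile_dids)


# Hypothetical rows, for illustration only
example_posts = [{"author_did": "did:plc:aaa"}, {"author_did": "did:plc:bbb"}]
example_profiles = [{"did": "did:plc:aaa"}]
print(unmatched_authors(example_posts, example_profiles))
```

With the DataFrames in this notebook, the inputs could be `posts_df.to_dict("records")` and `profiles_df.to_dict("records")`. In pandas itself, passing `indicator=True` to `merge` adds a `_merge` column that flags rows found only on the left side.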


In [6]:
# Remove duplicate posts
posts_enriched = posts_enriched.drop_duplicates(subset=["post_uri"]).copy()
# Authors with no matching profile get 0 followers here, which pulls the group averages down
posts_enriched["followersCount"] = posts_enriched["profile_followersCount"].fillna(0)

# Average follower count per group (computed per post, not per unique author)
summary = posts_enriched.groupby("has_hashtag")["followersCount"].mean().reset_index()
summary.columns = ["Has Hashtag", "Avg Followers"]
summary

| | Has Hashtag | Avg Followers |
|---|---|---|
| 0 | False | 1710.0 |
| 1 | True | 1199.482759 |
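
The averages above are computed per post and treat a missing follower count as 0. Because the hypothesis is about users, an alternative is to count each author once and to skip authors whose follower count is unknown. The sketch below shows that idea on plain dicts; `mean_followers_by_group` is a hypothetical helper, and treating an author as a hashtag user when any of their sampled posts has a hashtag is my assumption.

```python
from statistics import mean


def mean_followers_by_group(rows):
    """Average followers per unique author, split by whether they used a hashtag.

    rows: dicts with author_did, has_hashtag and followersCount
    (None or NaN when the profile was not fetched).
    """
    per_author = {}
    for row in rows:
        followers = row.get("followersCount")
        # NaN is the only value that is not equal to itself
        if followers is None or followers != followers:
            continue
        entry = per_author.setdefault(
            row["author_did"], {"followers": followers, "uses_hashtags": False}
        )
        # Assumption: one post with a hashtag marks the author as a hashtag user
        entry["uses_hashtags"] = entry["uses_hashtags"] or bool(row["has_hashtag"])
    groups = {True: [], False: []}
    for entry in per_author.values():
        groups[entry["uses_hashtags"]].append(entry["followers"])
    return {flag: (mean(values) if values else None) for flag, values in groups.items()}
```

With the notebook's data, the rows could come from `posts_enriched.to_dict("records")` after renaming `profile_followersCount` to `followersCount` and before any `fillna(0)` step.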


# Conclusion

### Observed Patterns:
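- In this sample, posts with hashtags came from authors with a lower average follower count (about 1,199) than posts without hashtags (1,710), so these early results do not support the hypothesis. The comparison is tentative: the posts were collected by searching for hashtags, so the no-hashtag group is likely very small, and authors without a matched profile were counted as having 0 followers.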

### Challenges:
- API rate limits and missing profile information for some authors
- Small sample size

### Next Steps:
- Collect a larger sample over a longer period of time
- Try different statistical tests
- Look at more engagement data, not just followers
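
As one example of a different test, follower counts tend to be heavily skewed, so a comparison of medians may be more robust than a comparison of means. The sketch below is a simple two-sided permutation test on the difference in medians, using only the standard library. The choice of test, the function name `permutation_p_value`, and the sample values are all my assumptions, not results from this report.

```python
import random
from statistics import median


def permutation_p_value(group_a, group_b, n_resamples=10_000, seed=0):
    """Estimate a two-sided p-value for the difference in medians by shuffling labels."""
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = abs(median(group_a) - median(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        # Split the shuffled pool into two groups of the original sizes
        diff = abs(median(pooled[:n_a]) - median(pooled[n_a:]))
        if diff >= observed:
            hits += 1
    # The add-one correction keeps the estimate away from exactly zero
    return (hits + 1) / (n_resamples + 1)


# Hypothetical follower counts, for illustration only
with_tags = [120, 340, 560, 90, 1500, 75, 410]
without_tags = [80, 200, 310, 45, 700, 60, 150]
print(permutation_p_value(with_tags, without_tags, n_resamples=2000))
```

A small p-value would suggest that the observed gap is unlikely under random group labels, but with samples this small the estimate is rough.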