# Section 1 — Report Header & Hypothesis

**Report Title:** _Replace with your title_  
**Your Name:** _Replace with your name_  
**Date:** _2025-10-07_

### Hypothesis
Write one testable hypothesis that can be evaluated using data available via the Bluesky API.  
_Example:_ “Accounts that post more frequently receive a higher average number of likes per post.”

### Theoretical Rationale
Explain the theory or reasoning behind your hypothesis. Cite any relevant concepts or readings.

### Statistical Application
Explain how your hypothesis could be tested statistically (e.g., group comparison, correlation). <br>
Which variables (columns) will you use?
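
As one way to articulate a correlation test for the example hypothesis, the sketch below computes a Pearson correlation coefficient by hand. The variable names and the numbers are invented for illustration only; your own columns and test may differ (for instance, a group comparison instead of a correlation).

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient for two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)

# Hypothetical per-account values, invented purely for illustration.
posts_per_week = [1, 3, 5, 8, 12]
avg_likes_per_post = [2.0, 2.5, 4.0, 3.5, 6.0]

r = pearson_r(posts_per_week, avg_likes_per_post)
print(f"Pearson r = {r:.3f}")
```

A coefficient near +1 or -1 suggests a strong linear relationship; on its own it does not establish statistical significance or causation.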


> Tip: You do not need to fully execute the analysis now, but you should articulate how you would test it.


# Section 2 — Endpoint Plan (Design Your Data Collection)

Identify the **Bluesky API endpoints** you will use and why they are suitable for testing your hypothesis.  
Link: https://docs.bsky.app/docs/category/http-reference

**Planned endpoints (examples; replace with your own):**
- `app.bsky.feed.searchPosts` — to collect posts matching a topic, hashtag, or keyword set.
- `app.bsky.actor.getProfiles` — to enrich authors with profile metadata (e.g., displayName, followersCount).
- `app.bsky.feed.getAuthorFeed` — to get posts authored by a specific actor (for longitudinal behavior).

For **each endpoint**, specify:
1. The key **request parameters** you will use, e.g. the search query `q` for `app.bsky.feed.searchPosts`, or the `actors` parameter (a list of DIDs or handles) for `app.bsky.actor.getProfiles`.
2. The **response objects/fields** you will extract, e.g. the `posts` array in the `app.bsky.feed.searchPosts` response.
3. Why these fields map to the variables in your hypothesis.
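
One possible way to write the plan down is as a small dictionary that pairs each endpoint with its request parameters and the fields you intend to extract. The sketch below only builds request URLs and makes no network call. The host `https://public.api.bsky.app/xrpc`, the query values, the DIDs, and the field lists are all assumptions for illustration; some endpoints may require an authenticated session, so check the HTTP reference before relying on them.

```python
from urllib.parse import urlencode

# Assumption: the public AppView host. Some endpoints may need an
# authenticated session instead, so confirm this in the HTTP reference.
BASE_URL = "https://public.api.bsky.app/xrpc"

def build_url(endpoint, params):
    """Return the GET URL for an XRPC endpoint; list values repeat the key."""
    return f"{BASE_URL}/{endpoint}?{urlencode(params, doseq=True)}"

# Hypothetical plan: endpoint -> request parameters and fields to extract.
ENDPOINT_PLAN = {
    "app.bsky.feed.searchPosts": {
        "params": {"q": "#python", "limit": 100},
        "fields": ["uri", "author.did", "record.createdAt", "likeCount"],
    },
    "app.bsky.actor.getProfiles": {
        "params": {"actors": ["did:plc:example1", "did:plc:example2"]},
        "fields": ["did", "followersCount", "postsCount"],
    },
}

for endpoint, plan in ENDPOINT_PLAN.items():
    print(build_url(endpoint, plan["params"]))
```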

## Reliability and Bias
Discuss how the data might be **reliable** and **unreliable**. Consider:
- Missingness or unavailable fields; rate limits; unauthenticated vs authenticated access.
- Bot/spam accounts, deleted posts, or moderation effects.
- Ethical considerations and terms of service (collect only what you need; avoid sensitive data).

## Limitations
List any **caveats** in the response objects (e.g., fields not guaranteed, delayed counts, missing information) that could affect your analysis.
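
To make a caveat such as "fields not guaranteed" concrete, you could measure how often each field is missing in the rows you collected. The sketch below is a minimal example; the row layout and column names (`like_count`, `display_name`) are hypothetical, and the sample rows are invented.

```python
def missing_share(records, fields):
    """Fraction of records in which each field is absent or None."""
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for r in records if r.get(field) is None) / total
        for field in fields
    }

# Hypothetical flattened rows, invented for illustration.
rows = [
    {"uri": "at://a/1", "like_count": 3, "display_name": "Ann"},
    {"uri": "at://b/2", "like_count": None, "display_name": None},
    {"uri": "at://c/3", "display_name": "Cy"},
    {"uri": "at://d/4", "like_count": 0, "display_name": "Di"},
]

print(missing_share(rows, ["uri", "like_count", "display_name"]))
```

Note that a count of `0` is treated as present, not missing; only absent keys and `None` values are counted.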

# Section 3 — Data Collection
Collect posts that match a query. Adjust `QUERY`, `MAX_POSTS`, and any filters your hypothesis requires.
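
Search results arrive in pages, so collecting up to `MAX_POSTS` usually means following the `cursor` returned with each page. The sketch below shows that loop under stated assumptions: `fetch_page` is a hypothetical callable standing in for your real HTTP request, and `fake_fetch_page` exists only so the example runs without network access.

```python
QUERY = "#python"   # placeholder search term
MAX_POSTS = 250     # placeholder cap on collected posts
PAGE_SIZE = 100     # searchPosts documents a limit of up to 100 per request

def collect_posts(fetch_page, query, max_posts, page_size=PAGE_SIZE):
    """Follow the response cursor until max_posts is reached or pages run out.

    fetch_page is a hypothetical callable (query, limit, cursor) -> dict that
    returns the parsed JSON body, e.g. {"posts": [...], "cursor": "..."}.
    """
    posts, cursor = [], None
    while len(posts) < max_posts:
        limit = min(page_size, max_posts - len(posts))
        page = fetch_page(query, limit, cursor)
        batch = page.get("posts", [])
        posts.extend(batch)
        cursor = page.get("cursor")
        if not batch or not cursor:
            break  # no further results available
    return posts[:max_posts]

def fake_fetch_page(query, limit, cursor):
    """Stand-in for a real HTTP call: serves 230 numbered fake posts."""
    start = int(cursor or 0)
    end = min(start + limit, 230)
    page = {"posts": [{"uri": f"at://fake/{i}"} for i in range(start, end)]}
    if end < 230:
        page["cursor"] = str(end)
    return page

posts = collect_posts(fake_fetch_page, QUERY, MAX_POSTS)
print(len(posts))
```

Passing the page fetcher in as an argument keeps the loop easy to test; in your own notebook you would replace `fake_fetch_page` with a function that calls the endpoint and returns the decoded JSON.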


## Data Collection (Endpoint 1): 
e.g. `app.bsky.feed.searchPosts`
- Flatten key fields from Bluesky `PostView` objects.
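
Flattening could look like the sketch below, which turns one nested post object into a single-level row. The output column names (`author_did`, `like_count`, and so on) are choices made for this example, and the sample post is invented; `.get` is used throughout because some fields may be absent from a response.

```python
def flatten_post(post):
    """Flatten one PostView-style dict into a single-level row."""
    author = post.get("author") or {}
    record = post.get("record") or {}
    return {
        "uri": post.get("uri"),
        "author_did": author.get("did"),
        "author_handle": author.get("handle"),
        "text": record.get("text"),
        "created_at": record.get("createdAt"),
        "like_count": post.get("likeCount"),
        "repost_count": post.get("repostCount"),
        "reply_count": post.get("replyCount"),
    }

# Invented sample shaped like a PostView; replyCount is deliberately missing.
sample_post = {
    "uri": "at://did:plc:example1/app.bsky.feed.post/abc",
    "author": {"did": "did:plc:example1", "handle": "ann.example.com"},
    "record": {"text": "hello", "createdAt": "2025-10-07T12:00:00Z"},
    "likeCount": 4,
    "repostCount": 1,
}

row = flatten_post(sample_post)
print(row)
```

A list of such rows can be passed directly to `pandas.DataFrame` when you build your DataFrames.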

## Data Collection (Endpoint 2): 

e.g. `app.bsky.actor.getProfiles`
- Enrich the post data with profile attributes (followers count, display name, etc.).  
- We gather unique author identifiers (`did`) from the posts and request them in batches.
- NOTE: Will this be a for loop?
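
A loop over batches is one reasonable answer to the note above. The sketch below gathers unique DIDs in first-seen order and splits them into groups; it assumes rows with an `author_did` column (a name chosen for this example) and a batch size of 25, which is the documented maximum number of `actors` per `getProfiles` request at the time of writing. The actual request inside the loop is left out.

```python
BATCH_SIZE = 25  # assumption: getProfiles accepts at most 25 actors per call

def unique_dids(rows):
    """Return the distinct author DIDs in first-seen order, skipping blanks."""
    seen = {}
    for row in rows:
        did = row.get("author_did")
        if did:
            seen[did] = None
    return list(seen)

def batched(items, size):
    """Yield consecutive slices of items, each with at most size elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]

# Invented rows: 60 distinct authors, each appearing twice.
rows = [{"author_did": f"did:plc:example{i % 60}"} for i in range(120)]

dids = unique_dids(rows)
for batch in batched(dids, BATCH_SIZE):
    # A real notebook would request the profiles for this batch here.
    print(len(batch))
```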


# Section 4 — Build DataFrames

Use a pandas method to combine your DataFrames, working with your own endpoints and data. Choose the approach that fits your plan:
- **merge** on a key (`author_did`), or
- **concat** to stack rows from multiple endpoints, or
- **join** to add columns using an index.

Then apply any **wrangling** you need (select, clean, sort).
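
As an illustration of the merge option followed by light wrangling, the sketch below joins a posts table to a profiles table on `author_did`. Both DataFrames, their column names, and their values are invented; a left join is used here so that posts without a matching profile are kept and can be inspected before being dropped.

```python
import pandas as pd

# Invented post rows (one row per post).
posts_df = pd.DataFrame({
    "uri": ["at://a/1", "at://a/2", "at://b/1", "at://c/1"],
    "author_did": ["did:a", "did:a", "did:b", "did:c"],
    "like_count": [5, 2, 9, 1],
})

# Invented profile rows (one row per author); did:c has no profile.
profiles_df = pd.DataFrame({
    "author_did": ["did:a", "did:b"],
    "followers_count": [100, 250],
})

# Merge: add profile columns to each post; validate guards against
# duplicate author rows in profiles_df.
merged = posts_df.merge(
    profiles_df, on="author_did", how="left", validate="many_to_one"
)

# Wrangling: clean (drop unmatched posts), select columns, sort.
tidy = (
    merged.dropna(subset=["followers_count"])
    .loc[:, ["uri", "author_did", "like_count", "followers_count"]]
    .sort_values("like_count", ascending=False)
    .reset_index(drop=True)
)

print(tidy)
```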

  


# Section 5 — Conclusion

Describe any patterns you observe in the collected data and how they relate to your hypothesis. <br>
Describe challenges you faced.
