## Assignment: Engagement changes on Twitter (X)

Most online platforms do not provide access to their underlying algorithms and rarely share clear details about algorithmic or policy updates. Yet, such changes can significantly shape the broader online ecosystem. To study their effects, researchers often rely on observable metrics as proxies to infer what has changed. One common approach is to examine variations in engagement across multiple accounts to detect patterns or signals that may reveal underlying algorithmic or systemic shifts.

In this assignment, you will analyse the tweets of 37 political accounts from the United States, posted between 2023 and 2024. The goal is to detect whether any algorithmic changes occurred during this period by looking for shifts in the engagement metrics of these accounts. We will also analyse whether the changes in engagement differ between Democrats and Republicans, i.e., whether either group benefitted from the changes in the algorithm.


Data:

* You are provided with the following JSON files:
    * politcians_twitter_data: This file contains a nested dictionary for each account ID with tweet-level metrics such as tweet id, tweet creation time, and engagement metrics.
    * politcians_parties_map: This file maps each political account to the party (Democrat or Republican) it is affiliated with.
    * politcians_name_twitterid_map: This file maps each Twitter account ID to the name of the political figure.
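
As a starting point, the three files can be read with the standard `json` module. This is a minimal sketch that assumes the files carry a `.json` extension and sit in a single directory; `load_assignment_data` and its `data_dir` parameter are hypothetical names, and the inner structure of each file is left untouched.

```python
import json
from pathlib import Path

# File stems as given in the assignment; the ".json" extension is an assumption.
FILE_STEMS = [
    "politcians_twitter_data",
    "politcians_parties_map",
    "politcians_name_twitterid_map",
]


def load_assignment_data(data_dir="."):
    """Return a dict mapping each file stem to its parsed JSON content."""
    base = Path(data_dir)
    data = {}
    for stem in FILE_STEMS:
        with open(base / f"{stem}.json", encoding="utf-8") as handle:
            data[stem] = json.load(handle)
    return data
```

Inspecting the keys of one account in `politcians_twitter_data` before writing any analysis code is a quick way to confirm the field names for creation time and favorite count.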

Tasks:

* For the purpose of this assignment, we will only analyse 'favorite count' (the number of likes each tweet gets).
* To identify significant shifts in engagement over time, we will apply a Cumulative Sum (CUSUM) analysis to detect deviations from the average value. If there are common timestamps where we observe shifts across many accounts, this could indicate changes in engagement due to algorithmic modifications.
    * To do this, use the ruptures library and apply the Binary Segmentation (Binseg) algorithm on all accounts to identify two change points (timestamps where deviation is detected).
        * Note: You should truncate the timestamps to month–year format before comparing or grouping. For example, “11 May 2024, 10:35:00” simply becomes May 2024.
    * Now pick the top two change points. Rank each month–year by the number of accounts that exhibit a change in that month–year, and select the two month–year points where the largest number of accounts show a detected shift.
* Now for each change point, using accounts which saw deviation at this point:
    * Get the before and after values of the favorite counts using the change point.
        * Use the first day of the change-point month as the cutoff date. For example, if the change point is May 2024, then all tweets before 1 May 2024 are “before,” and all tweets from 1 May 2024 onward are “after.”
    * Analyse these before and after values to see if the before values are significantly less than the after values.
    * Hint: Use a non-parametric statistical test such as the Mann-Whitney U test.
* Once you obtain the p-values, split the users into Republicans and Democrats to investigate whether either group benefitted from the changes.

Based on your analysis above, can you reach the following conclusions? Why or why not? What other data would you need to reach each conclusion? Mention any assumptions you make. (Answer these in a markdown cell in your notebook.)

* There were some changes to the algorithm which resulted in the differences in engagement metrics (at least for one change point).
* The observed changes are solely due to algorithmic modifications.
* The differences (if any) between Republicans and Democrats show that a particular group benefitted (or was negatively impacted) due to the algorithmic changes.

Additionally,

* Do you think the methodology used was correct? Why or why not?
* How would you change the methodology?

### Submission

Complete all tasks and written answers directly in this notebook. Make sure that:
* Your code cells are runnable and your markdown cells are clear.
* All outputs are included in the notebook.
* When finished, send the .ipynb file back.