# BristolUni Followers
Let's now analyse some recently collected Twitter data relating to the followers of The University of Bristol's Twitter account (@BristolUni).

The data is in JSON format so we will need to do some pre-processing. Run the following code to load the data and print the first 3 followers in the list.

In [None]:
import json

with open('./BristolUni_followers.json', 'r') as f:
    followers = json.load(f)
    
print(json.dumps(followers[:3], indent=4))

In the list of followers we can see that the dictionary representing each user includes the user's name, username, user ID, when the account was created, whether they are a "verified" Twitter user, their description / bio, and their public metrics (including how many users follow them, how many users they follow, how many times they have tweeted, and how many [Twitter lists](https://help.twitter.com/en/using-twitter/twitter-lists) they are included in).

## Process into a DataFrame
To make it easier to manipulate the data, let's format the JSON data into a DataFrame.

*Note*: We use the `json_normalize` function to "flatten" the dictionary representing each user so that the `public_metrics` variables become individual columns in the DataFrame.

In [None]:
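import pandas as pd

# Flatten nested dictionaries into dotted column names,
# e.g. "public_metrics.followers_count"
df = pd.json_normalize(followers)
# Drop the "public_metrics." prefix (str.removeprefix needs pandas 1.4 or later)
df.columns = df.columns.str.removeprefix("public_metrics.")
df.head()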
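
To see what the flattening step does in isolation, here is a minimal sketch on two made-up records. The usernames and numbers are invented for illustration, and the field names are assumed to mirror those in the real file.

```python
import pandas as pd

# Two made-up follower records; all values are invented for illustration
toy_followers = [
    {"username": "alice", "verified": False,
     "public_metrics": {"followers_count": 120, "following_count": 80}},
    {"username": "bob", "verified": True,
     "public_metrics": {"followers_count": 5000, "following_count": 10}},
]

# Nested keys become dotted column names such as
# "public_metrics.followers_count"
toy_df = pd.json_normalize(toy_followers)
print(list(toy_df.columns))

# Strip the prefix so each metric is a plain column name
# (str.removeprefix needs pandas 1.4 or later)
toy_df.columns = toy_df.columns.str.removeprefix("public_metrics.")
print(list(toy_df.columns))
```

After the second step each metric can be selected directly, for example `toy_df["followers_count"]`.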

## Exercise 01: Analyse follower locations
Count the number of followers for each value in the `location` column.

**Question:** What is the most common location of followers of `@BristolUni`?

**Question:** By performing a similar analysis, are you able to say what % of the users are verified?

In [None]:
# (SOLUTION)
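
One possible way to approach this exercise is sketched below on a small made-up DataFrame rather than the real data. The column name `verified` is an assumption based on the fields described earlier, and all values are invented.

```python
import pandas as pd

# Made-up stand-in for the follower DataFrame; values are invented
toy_df = pd.DataFrame({
    "location": ["Bristol", "London", "Bristol", None, "Bristol, UK"],
    "verified": [False, True, False, False, False],
})

# value_counts() tallies each distinct value, most frequent first,
# and skips missing locations by default
location_counts = toy_df["location"].value_counts()
print(location_counts.head())

# normalize=True returns proportions instead of raw counts
print(toy_df["verified"].value_counts(normalize=True))

# The mean of a boolean column is the fraction of True values
pct_verified = 100 * toy_df["verified"].mean()
print(f"{pct_verified:.1f}% verified")
```

Note that `location` is free text, so "Bristol" and "Bristol, UK" are counted as different values unless the strings are cleaned first.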

## Exercise 02: Analyse follower popularity
Sort the values in the DataFrame by `followers_count` (with `ascending=False`) and print the `head(10)`.

**Question:** Do you notice any commonalities among `@BristolUni`'s 10 most popular followers?

In [None]:
# (SOLUTION)
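
The sorting step can be sketched on a toy DataFrame as follows. The usernames and counts are invented, and `head(2)` is used instead of `head(10)` only to keep the example short.

```python
import pandas as pd

# Made-up stand-in for the follower DataFrame; values are invented
toy_df = pd.DataFrame({
    "username": ["a", "b", "c", "d"],
    "followers_count": [150, 98000, 12, 4300],
})

# Largest follower counts first, then keep the top rows
top = toy_df.sort_values("followers_count", ascending=False).head(2)
print(top)
```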

## Exercise 03: Investigate relationships between public metrics
Run the code below to create a (log-scale) scatter plot of `followers_count` versus `following_count` for each of the users in the dataset.

**Question:** Can you explain anything unusual about the shape of the point cloud? (*Hint:* Solution is hidden [here](https://help.twitter.com/en/using-twitter/twitter-follow-limit#:~:text=Every%20Twitter%20account%20can%20follow,ratio%20of%20followers%20to%20following.)).

Create scatter plots for `followers_count` versus the other variables in `public_metrics`.

**Question:** What does the analysis tell you about popular users of Twitter?

In [None]:
# (SOLUTION)
import matplotlib.pyplot as plt

xlabel = 'followers_count'
ylabel = 'following_count'

fig, ax = plt.subplots()
ax.scatter(df[xlabel], df[ylabel], c='blue', alpha=0.1, edgecolors='none')
# Log axes spread out the heavy-tailed counts;
# users with a count of 0 cannot be shown on a log axis
ax.set_xscale('log')
ax.set_yscale('log')
ax.set_xlabel(xlabel)
ax.set_ylabel(ylabel);

## Exercise 04: Calculate follower / following ratio
For each follower of `@BristolUni`, calculate the ratio of `followers_count` to `following_count` and identify the 5 users with the highest ratio.

**Question:** Are they different from the followers with the most followers?

In [None]:
# (SOLUTION)
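
A minimal sketch of the ratio calculation on made-up data is shown below. Treating a `following_count` of zero as missing is one possible design choice, not something the exercise prescribes, and `ratio` is a hypothetical column name.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the follower DataFrame; values are invented
toy_df = pd.DataFrame({
    "username": ["a", "b", "c", "d"],
    "followers_count": [150, 98000, 12, 4300],
    "following_count": [300, 49, 0, 4300],
})

# Dividing by zero would give inf, so treat a zero following_count
# as missing (an assumption made for this sketch)
following = toy_df["following_count"].replace(0, np.nan)
toy_df["ratio"] = toy_df["followers_count"] / following

# sort_values puts missing ratios last by default
top_ratio = toy_df.sort_values("ratio", ascending=False).head(2)
print(top_ratio[["username", "ratio"]])
```

With the real data, `head(5)` would give the five users the exercise asks for.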

## Exercise 06: (Advanced) Choose a location for a promotion event
The University of Bristol is planning to host a promotion event in a particular location and wants to invite 10 local "influencers" from its list of followers to come to the event, in the hope of maximising Twitter activity around the event.

Use the follower list to identify the best location to host the event. Justify your proposed location.

In [None]:
# (SOLUTION)
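
There is no single correct answer here. As one possible starting point, the sketch below ranks locations by the combined follower count of their most-followed users. The scoring rule, the name `TOP_N`, and the data are all assumptions made for illustration.

```python
import pandas as pd

# Made-up stand-in for the follower DataFrame; values are invented
toy_df = pd.DataFrame({
    "location": ["Bristol", "Bristol", "London", "London", "London", None],
    "followers_count": [9000, 4000, 7000, 500, 300, 100000],
})

TOP_N = 2  # the exercise invites 10 influencers; 2 keeps the toy example small

# For each location, add up the follower counts of its TOP_N
# most-followed users, then rank locations by that total
reach = (
    toy_df.dropna(subset=["location"])
    .sort_values("followers_count", ascending=False)
    .groupby("location")["followers_count"]
    .apply(lambda s: s.head(TOP_N).sum())
    .sort_values(ascending=False)
)
print(reach)
```

Summed follower counts ignore overlap between audiences and depend on how consistently users write their location, so any ranking produced this way still needs a written justification.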

## Exercise 07: (Advanced) Categorise followers
Using text-classification or topic modelling, analyse the descriptions of users in the follower list and characterise `@BristolUni`'s followers so the University can decide who they should target for particular types of events.

*Note:* You may need to install packages beyond the ones provided in this environment (using `!pip install`).

In [None]:
# (SOLUTION)