# Project on Mastodon
Sebastian Gottschalk, Kerstin Kirchgässner, Rusen Yasar

# Trends

API methods for trends are summarised here: https://docs.joinmastodon.org/methods/trends/

We can begin with the trending hashtags and then get the statuses mentioning those tags. Alternatively, we can get the trending statuses and check the tags mentioned in them.

In [1]:
import requests
import json

Focus on mastodon.social

In [2]:
base_url = "https://mastodon.social"

Import tokens

In [13]:
with open("../credentials/mastodon_social/app_token.txt") as text_file:
    # strip() guards against a trailing newline in the token file
    app_token = text_file.read().strip()

In [18]:
with open("../credentials/mastodon_social/user_token.txt") as text_file:
    # strip() guards against a trailing newline in the token file
    user_token = text_file.read().strip()

## Trending tags

Most frequently used tags in the past week. Max 20.

In [4]:
tags_dir = "/api/v1/trends/tags"
max_tags = "?limit=20"

In [5]:
resp_tags = requests.get(
    base_url + tags_dir + max_tags
)

In [183]:
js_tags = resp_tags.json()

### Getting info on these tags

API methods on tags summarised here: https://docs.joinmastodon.org/methods/tags/

In [14]:
tag_info_dir = "/api/v1/tags/"

Check one tag to see how much history it returns

In [184]:
name_tag1 = js_tags[0]["name"]

In [23]:
resp_tag1 = requests.get(
    base_url + tag_info_dir + name_tag1, 
    headers = {
        "Authorization" : f"Bearer {app_token}"
    }
)

In [24]:
resp_tag1

<Response [200]>

In [185]:
js_tag1 = resp_tag1.json()
len(js_tag1["history"])

7

This returns only 7 days of history, the same as in the list of trending tags.
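As a small illustration of working with the `history` array, the sketch below totals the per-day counts. The `day` value and the counts arrive as strings, so they are converted first. The function name `summarise_history` is made up for this example, and the sample entries reuse values from the search output for one trending tag further down in this notebook.

```python
from datetime import datetime, timezone

def summarise_history(history):
    """Total the daily counts in a tag's `history` list.

    Each entry is a dict with string values: `day` (a Unix timestamp),
    `accounts` and `uses`.
    """
    days = [
        datetime.fromtimestamp(int(entry["day"]), tz=timezone.utc).date().isoformat()
        for entry in history
    ]
    return {
        "days": days,
        "uses": sum(int(entry["uses"]) for entry in history),
        # Daily account counts are summed as-is, so an account active on
        # several days is counted once per day.
        "accounts": sum(int(entry["accounts"]) for entry in history),
    }

sample_history = [
    {"day": "1717027200", "accounts": "58", "uses": "154"},
    {"day": "1716940800", "accounts": "0", "uses": "0"},
]
summary = summarise_history(sample_history)
print(summary)
```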

### Search statuses mentioning this tag

API methods for search are summarised here: https://docs.joinmastodon.org/methods/search/

I cannot find a search option that restricts results to statuses with the hashtag in their `tags` attribute. Maybe this is the default behaviour, or maybe statuses with matching tags are returned first (I hope).

Attributes of statuses (including `tags`) are summarised here: https://docs.joinmastodon.org/entities/Status/



In [40]:
search_dir = "/api/v2/search"
search_params_tag1 = f"?q={name_tag1}&limit=40"

In [41]:
resp_search_status_tag1 = requests.get(
    base_url + search_dir + search_params_tag1, 
    headers = {
        "Authorization" : f"Bearer {app_token}"
    }
)

In [43]:
resp_search_status_tag1

<Response [200]>

In [44]:
js_search_status_tag1 = resp_search_status_tag1.json()
js_search_status_tag1

{'accounts': [],
 'statuses': [],
 'hashtags': [{'name': 'ifaidisclaimersweresongs',
   'url': 'https://mastodon.social/tags/ifaidisclaimersweresongs',
   'history': [{'day': '1717027200', 'accounts': '58', 'uses': '154'},
    {'day': '1716940800', 'accounts': '0', 'uses': '0'},
    {'day': '1716854400', 'accounts': '0', 'uses': '0'},
    {'day': '1716768000', 'accounts': '0', 'uses': '0'},
    {'day': '1716681600', 'accounts': '0', 'uses': '0'},
    {'day': '1716595200', 'accounts': '0', 'uses': '0'},
    {'day': '1716508800', 'accounts': '0', 'uses': '0'}]}]}

Apparently Mastodon does not allow searching the entire database: the search returns the hashtag itself but no statuses. So this approach doesn't work.

Starting with trending tags, the only information we get is their use history in the past seven days.
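One endpoint not tried in this notebook is the hashtag timeline, `GET /api/v1/timelines/tag/:hashtag`, which the Mastodon documentation lists under the timelines methods as returning public statuses containing a given hashtag. The sketch below only builds the request URL. Whether mastodon.social serves it without a token, and how far back it reaches, was not tested here. The helper name `hashtag_timeline_url` is made up for illustration.

```python
from urllib.parse import quote, urlencode

def hashtag_timeline_url(base_url, tag_name, limit=40, max_id=None):
    """Build the URL for one page of the hashtag timeline."""
    params = {"limit": limit}
    if max_id is not None:
        # max_id asks for statuses older than the given status id
        params["max_id"] = max_id
    path = "/api/v1/timelines/tag/" + quote(tag_name, safe="")
    return base_url + path + "?" + urlencode(params)

url_tag1 = hashtag_timeline_url("https://mastodon.social", "ifaidisclaimersweresongs")
print(url_tag1)
```

Passing the `id` of the last status of one page as `max_id` would request the next, older page.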

## Trending statuses

Statuses that have been interacted with more than others (the timeframe is not clear).

### One batch of statuses

In [45]:
trending_status_dir = "/api/v1/trends/statuses"
trending_status_max = "?limit=40"

In [46]:
resp_trending_status = requests.get(
    base_url + trending_status_dir + trending_status_max
)

In [47]:
resp_trending_status

<Response [200]>

In [63]:
js_trending_status = resp_trending_status.json()

In [70]:
print(js_trending_status[0]["id"])
print(js_trending_status[0]["created_at"])

112529933367515061
2024-05-30T12:01:53.000Z


In [103]:
print(js_trending_status[-1]["id"])
print(js_trending_status[-1]["created_at"])

112529352665489689
2024-05-30T09:34:15.663Z


It looks like some statuses are categorised as trending, and the request returns the most recent ones.

We can try to get all trending statuses back to a certain point in the past, say Friday of last week at the earliest.
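The cut-off comparison relies on parsing `created_at`. A minimal sketch, assuming the timestamps always end in `Z` (UTC) and carry fractional seconds, as in the two examples printed above: attaching the UTC timezone makes the comparison independent of the local clock. `parse_created_at` is a hypothetical helper name.

```python
from datetime import datetime, timezone

def parse_created_at(value):
    """Parse a timestamp such as '2024-05-30T12:01:53.000Z' as UTC."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)

cutoff = datetime(2024, 5, 24, tzinfo=timezone.utc)
newest = parse_created_at("2024-05-30T12:01:53.000Z")
oldest = parse_created_at("2024-05-30T09:34:15.663Z")
print(newest > oldest, oldest > cutoff)
```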

### Getting multiple batches of statuses

In [82]:
from datetime import datetime

statuses_after = datetime(2024, 5, 24, 0, 0, 0, 1)

In [94]:
batches = {}
batch = 1
offset = 0
time_reached = datetime.now()

while time_reached > statuses_after:
     
    param_offset = f"&offset={offset}"

    resp = requests.get(
        base_url + trending_status_dir + trending_status_max + param_offset
    )

    if resp.status_code == 200:

        js = resp.json()

        if len(js) == 40:

            batches[batch] = js
            
            batch += 1
            offset += 40
            time_reached = datetime.strptime(js[-1]["created_at"], "%Y-%m-%dT%H:%M:%S.%fZ")
        
        else: 
            print("Response returned fewer than 40 statuses")
            break

    else:
        print("Response status not 200")
        break



Response returned fewer than 40 statuses


In [100]:
print(len(batches))
print(batches[1][0]["created_at"])
print(batches[30][-1]["created_at"])

30
2024-05-30T12:01:53.000Z
2024-05-30T08:45:29.000Z


It looks like there is a limit on how many statuses we can get. The limit could be a time window (only the current day, or the last 8 hours or so), or the number of statuses categorised as trending could be fixed (over a few attempts, I get about 1200 statuses each time).

It would be a good idea to save this set of statuses, run the loop again several times towards the end of the day, and combine the results to get a broader view of the day.

In [102]:
with open("../data/raw/trending_statuses_1530.txt", "w") as text_file:
    json.dump(batches, text_file)
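Combining several runs could look like the sketch below, which assumes each run is a dict of batches as built above and that a status `id` identifies the same status across runs. Note that `json.dump` writes the integer batch numbers as strings, so the function ignores the keys and only walks the values. `merge_runs` is a name chosen for this example, and the sample data is invented.

```python
def merge_runs(*runs):
    """Merge batch dicts from several runs, keeping one copy per status id."""
    merged = {}
    for run in runs:
        for statuses in run.values():
            for status in statuses:
                # Later runs overwrite earlier ones, so counters such as
                # favourites_count reflect the most recent snapshot.
                merged[status["id"]] = status
    return merged

run_a = {1: [{"id": "1", "favourites_count": 3}, {"id": "2", "favourites_count": 0}]}
run_b = {"1": [{"id": "2", "favourites_count": 5}, {"id": "3", "favourites_count": 1}]}
combined = merge_runs(run_a, run_b)
print(len(combined))
```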

## Accounts posting trending statuses

(I realised later that there is no need to get account data separately. All the account info is already embedded in the statuses.)

API methods for accounts: https://docs.joinmastodon.org/methods/accounts/

There is a method for fetching multiple accounts at once, so I try it first:

In [104]:
accounts_dir = "/api/v1/accounts"

Get the account ids from the saved statuses.

In [122]:
accounts_from_statuses = []

for i in range(1, len(batches)+1):
    for j in range(len(batches[i])):
        accounts_from_statuses.append(batches[i][j]["account"]["id"])

accounts_from_statuses_unique = list(set(accounts_from_statuses))

In [126]:
len(accounts_from_statuses_unique)

1200

Interesting: there are as many unique accounts as statuses.

Try to get these accounts.

In [127]:
# Note: the f-string inserts the Python list literal (brackets and quotes) into the URL
accounts_param = f"?id={accounts_from_statuses_unique}"

In [129]:
resp_accounts_trending = requests.get(
    base_url + accounts_dir + accounts_param,
    headers = {
        "Authorization" : f"Bearer {app_token}"
    }
)

In [131]:
resp_accounts_trending.text

'Error: URI Too Long'

We can't get all accounts at once. The limit should be 80 accounts per request, so try the first 80:

In [136]:
accounts_param = f"?id={accounts_from_statuses_unique[:80]}"

In [137]:
resp_accounts_trending = requests.get(
    base_url + accounts_dir + accounts_param,
    headers = {
        "Authorization" : f"Bearer {app_token}"
    }
)

In [141]:
resp_accounts_trending.content

b'[]'
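One possible explanation for the empty response, not verified in this notebook: the f-string puts the Python list literal, with brackets and quotes, into the URL. The Mastodon documentation for fetching multiple accounts describes the parameter as an array, `id[]`, repeated once per account. The sketch below builds such a query string with the standard library. The ids are placeholders, and whether a given server supports this method depends on its version.

```python
from urllib.parse import urlencode

def accounts_query(account_ids):
    """Encode ids as repeated `id[]` parameters: id[]=1&id[]=2 ..."""
    return "?" + urlencode([("id[]", account_id) for account_id in account_ids])

example_ids = ["1", "2", "3"]  # placeholder ids
example_query = accounts_query(example_ids)
print(example_query)
```

The brackets come out percent-encoded (`%5B%5D`). With `requests`, passing `params={"id[]": ids}` produces the same repeated-key encoding.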

The request with 80 ids still returns nothing. Maybe we should simply run a loop to get the accounts one by one. Requests are rate-limited at 300 per 5 minutes, so for 1200 accounts we have to run the loop in four rounds, 5 minutes apart.

In [142]:
accounts_dict = {}

for account_id in accounts_from_statuses_unique:
    resp = requests.get(
        base_url + accounts_dir + f"/{account_id}", 
        headers = {
        "Authorization" : f"Bearer {app_token}"
        }
    )
    if resp.status_code == 200: 
        accounts_dict[account_id] = resp.json()
    else:
        print("Status code not 200")
        break

Status code not 200


In [151]:
rem_accounts1 = accounts_from_statuses_unique[300:600]
rem_accounts2 = accounts_from_statuses_unique[600:900]
rem_accounts3 = accounts_from_statuses_unique[900:]

... 5 minutes later

In [152]:
for account_id in rem_accounts1:
    resp = requests.get(
        base_url + accounts_dir + f"/{account_id}", 
        headers = {
        "Authorization" : f"Bearer {app_token}"
        }
    )
    if resp.status_code == 200: 
        accounts_dict[account_id] = resp.json()
    else:
        print("Status code not 200")
        break

... 5 minutes later

In [154]:
for account_id in rem_accounts2:
    resp = requests.get(
        base_url + accounts_dir + f"/{account_id}", 
        headers = {
        "Authorization" : f"Bearer {app_token}"
        }
    )
    if resp.status_code == 200: 
        accounts_dict[account_id] = resp.json()
    else:
        print("Status code not 200")
        break

... 5 minutes later

In [156]:
for account_id in rem_accounts3:
    resp = requests.get(
        base_url + accounts_dir + f"/{account_id}", 
        headers = {
        "Authorization" : f"Bearer {app_token}"
        }
    )
    if resp.status_code == 200: 
        accounts_dict[account_id] = resp.json()
    else:
        print("Status code not 200")
        break

In [157]:
len(accounts_dict)

1200

We retrieved 1200 accounts. Save them for later use.

In [158]:
with open("../data/raw/trending_accounts_1530.txt", "w") as text_file:
    json.dump(accounts_dict, text_file)
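The manual five-minute waits could be replaced by reading the rate-limit headers that Mastodon attaches to responses (`X-RateLimit-Remaining` and `X-RateLimit-Reset`). A rough sketch, assuming the reset time is an ISO 8601 timestamp ending in `Z`. The helper name `seconds_until_reset` is made up here, and the header values in the example are invented.

```python
from datetime import datetime, timezone

def seconds_until_reset(headers, now=None):
    """Return how long to wait before the next request (0 if quota remains)."""
    remaining = int(headers.get("X-RateLimit-Remaining", 1))
    if remaining > 0:
        return 0.0
    reset_raw = headers["X-RateLimit-Reset"]
    # fromisoformat() on older Python versions does not accept a trailing 'Z'
    reset_at = datetime.fromisoformat(reset_raw.replace("Z", "+00:00"))
    now = now or datetime.now(timezone.utc)
    return max(0.0, (reset_at - now).total_seconds())

example_headers = {
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "2024-05-30T12:05:00.000000Z",
}
example_now = datetime(2024, 5, 30, 12, 0, 0, tzinfo=timezone.utc)
wait_seconds = seconds_until_reset(example_headers, example_now)
print(wait_seconds)
```

Inside the account loop, calling `time.sleep(seconds_until_reset(resp.headers))` after each response would pause only once the quota is used up.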

## Second round of data collection

Apply the best methods from above.

### Fetch statuses in batches

In [164]:
batches_2 = {}
batch = 1
offset = 0
time_reached = datetime.now()

while time_reached > statuses_after:
     
    param_offset = f"&offset={offset}"

    resp = requests.get(
        base_url + trending_status_dir + trending_status_max + param_offset
    )

    if resp.status_code == 200:

        js = resp.json()

        if len(js) == 40:

            batches_2[batch] = js
            
            batch += 1
            offset += 40
            time_reached = datetime.strptime(js[-1]["created_at"], "%Y-%m-%dT%H:%M:%S.%fZ")
        
        else: 
            batches_2[batch] = js
            print(f"Response returned fewer than 40 statuses at batch {batch}")
            break

    else:
        print("Response status not 200")
        break

Response returned fewer than 40 statuses at batch 36


In [165]:
with open("../data/raw/trending_statuses_1930.txt", "w") as text_file:
    json.dump(batches_2, text_file)
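Since the same loop is repeated for every round, it could be wrapped in a function. The sketch below keeps the logic of the loop above (offset pagination, stopping at the cut-off date or at a short page) but takes the page-fetching step as an argument, so it can be tried without network access. `collect_batches` and `fetch_page` are hypothetical names, and `fake_fetch` stands in for the real request.

```python
from datetime import datetime

def collect_batches(fetch_page, statuses_after, page_size=40):
    """Collect pages until the cut-off date or a short page is reached.

    `fetch_page(offset)` must return a list of statuses, or None on error.
    """
    batches = {}
    batch, offset = 1, 0
    while True:
        page = fetch_page(offset)
        if page is None:
            print("Request failed")
            break
        if page:
            batches[batch] = page
        if len(page) < page_size:
            break  # last, partial page
        oldest = datetime.strptime(page[-1]["created_at"], "%Y-%m-%dT%H:%M:%S.%fZ")
        if oldest <= statuses_after:
            break  # reached the cut-off date
        batch += 1
        offset += page_size
    return batches

def fake_fetch(offset, total=5):
    """Stand-in for the real request: serves 5 fake statuses, 2 per page."""
    items = [{"id": str(i), "created_at": "2024-05-30T12:00:00.000Z"} for i in range(total)]
    return items[offset:offset + 2]

demo = collect_batches(fake_fetch, datetime(2024, 5, 24), page_size=2)
demo_sizes = {number: len(page) for number, page in demo.items()}
print(demo_sizes)
```

For the real request, `fetch_page` could wrap `requests.get(...)` and return `resp.json()` when the status code is 200, and `None` otherwise.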

### Fetch accounts in batches

Account ids from the second set of statuses

In [167]:
accounts_grp2 = []

for i in range(1, len(batches_2)+1):
    for j in range(len(batches_2[i])):
        accounts_grp2.append(batches_2[i][j]["account"]["id"])

accounts_grp2_unique = list(set(accounts_grp2))

In [168]:
len(accounts_grp2_unique)

1426

Divide the account ids into five lists

In [169]:
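# Slice the de-duplicated list, so that no account is requested twice
accounts_grp21 = accounts_grp2_unique[:300]
accounts_grp22 = accounts_grp2_unique[300:600]
accounts_grp23 = accounts_grp2_unique[600:900]
accounts_grp24 = accounts_grp2_unique[900:1200]
accounts_grp25 = accounts_grp2_unique[1200:]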
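The slicing can also be done for any number of ids with a small helper. This is a sketch: `chunk` is a name chosen here, the group size of 300 mirrors the rate limit mentioned earlier, and the ids are placeholders.

```python
def chunk(items, size=300):
    """Split a list into consecutive groups of at most `size` items."""
    return [items[start:start + size] for start in range(0, len(items), size)]

example_ids = [str(n) for n in range(1426)]  # placeholder ids
groups = chunk(example_ids)
group_sizes = [len(group) for group in groups]
print(group_sizes)
```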

Get the first group of data

In [170]:
accounts_dict_2 = {}

for account_id in accounts_grp21:
    resp = requests.get(
        base_url + accounts_dir + f"/{account_id}", 
        headers = {
        "Authorization" : f"Bearer {app_token}"
        }
    )
    if resp.status_code == 200: 
        accounts_dict_2[account_id] = resp.json()
    else:
        print("Status code not 200")
        break

... 5 minutes later

In [172]:
for account_id in accounts_grp22:
    resp = requests.get(
        base_url + accounts_dir + f"/{account_id}", 
        headers = {
        "Authorization" : f"Bearer {app_token}"
        }
    )
    if resp.status_code == 200: 
        accounts_dict_2[account_id] = resp.json()
    else:
        print("Status code not 200")
        break

... 5 minutes later

In [176]:
for account_id in accounts_grp23:
    resp = requests.get(
        base_url + accounts_dir + f"/{account_id}", 
        headers = {
        "Authorization" : f"Bearer {app_token}"
        }
    )
    if resp.status_code == 200: 
        accounts_dict_2[account_id] = resp.json()
    else:
        print("Status code not 200")
        break

... 5 minutes later

In [178]:
for account_id in accounts_grp24:
    resp = requests.get(
        base_url + accounts_dir + f"/{account_id}", 
        headers = {
        "Authorization" : f"Bearer {app_token}"
        }
    )
    if resp.status_code == 200: 
        accounts_dict_2[account_id] = resp.json()
    else:
        print("Status code not 200")
        break

... 5 minutes later

In [180]:
for account_id in accounts_grp25:
    resp = requests.get(
        base_url + accounts_dir + f"/{account_id}", 
        headers = {
        "Authorization" : f"Bearer {app_token}"
        }
    )
    if resp.status_code == 200: 
        accounts_dict_2[account_id] = resp.json()
    else:
        print("Status code not 200")
        break

In [181]:
len(accounts_dict_2)

1426

Save the results

In [182]:
with open("../data/raw/trending_accounts_1930.txt", "w") as text_file:
    json.dump(accounts_dict_2, text_file)

## Third round of data collection

### Fetch statuses

In [186]:
batches_3 = {}
batch = 1
offset = 0
time_reached = datetime.now()

while time_reached > statuses_after:
     
    param_offset = f"&offset={offset}"

    resp = requests.get(
        base_url + trending_status_dir + trending_status_max + param_offset
    )

    if resp.status_code == 200:

        js = resp.json()

        if len(js) == 40:

            batches_3[batch] = js
            
            batch += 1
            offset += 40
            time_reached = datetime.strptime(js[-1]["created_at"], "%Y-%m-%dT%H:%M:%S.%fZ")
        
        else: 
            batches_3[batch] = js
            print(f"Response returned fewer than 40 statuses at batch {batch}")
            break

    else:
        print("Response status not 200")
        break

Response returned fewer than 40 statuses at batch 45


In [187]:
with open("../data/raw/trending_statuses_2355.txt", "w") as text_file:
    json.dump(batches_3, text_file)

### Fetch accounts

In [None]:
accounts_grp3 = []

for i in range(1, len(batches_3)+1):
    for j in range(len(batches_3[i])):
        accounts_grp3.append(batches_3[i][j]["account"]["id"])

accounts_grp3_unique = list(set(accounts_grp3))

In [None]:
len(accounts_grp3_unique)

In [None]:
accounts_grp31 = accounts_grp3_unique[:300]
accounts_grp32 = accounts_grp3_unique[300:600]
accounts_grp33 = accounts_grp3_unique[600:900]
accounts_grp34 = accounts_grp3_unique[900:1200]
accounts_grp35 = accounts_grp3_unique[1200:]

In [None]:
accounts_dict_3 = {}

for account_id in accounts_grp31:
    resp = requests.get(
        base_url + accounts_dir + f"/{account_id}", 
        headers = {
        "Authorization" : f"Bearer {app_token}"
        }
    )
    if resp.status_code == 200: 
        accounts_dict_3[account_id] = resp.json()
    else:
        print("Status code not 200")
        break

In [None]:
# Keep adding to accounts_dict_3; re-creating it here would discard the previous group
for account_id in accounts_grp32:
    resp = requests.get(
        base_url + accounts_dir + f"/{account_id}", 
        headers = {
        "Authorization" : f"Bearer {app_token}"
        }
    )
    if resp.status_code == 200: 
        accounts_dict_3[account_id] = resp.json()
    else:
        print("Status code not 200")
        break

In [None]:
# Keep adding to accounts_dict_3; re-creating it here would discard the previous groups
for account_id in accounts_grp33:
    resp = requests.get(
        base_url + accounts_dir + f"/{account_id}", 
        headers = {
        "Authorization" : f"Bearer {app_token}"
        }
    )
    if resp.status_code == 200: 
        accounts_dict_3[account_id] = resp.json()
    else:
        print("Status code not 200")
        break

In [None]:
# Keep adding to accounts_dict_3; re-creating it here would discard the previous groups
for account_id in accounts_grp34:
    resp = requests.get(
        base_url + accounts_dir + f"/{account_id}", 
        headers = {
        "Authorization" : f"Bearer {app_token}"
        }
    )
    if resp.status_code == 200: 
        accounts_dict_3[account_id] = resp.json()
    else:
        print("Status code not 200")
        break

In [None]:
# Keep adding to accounts_dict_3; re-creating it here would discard the previous groups
for account_id in accounts_grp35:
    resp = requests.get(
        base_url + accounts_dir + f"/{account_id}", 
        headers = {
        "Authorization" : f"Bearer {app_token}"
        }
    )
    if resp.status_code == 200: 
        accounts_dict_3[account_id] = resp.json()
    else:
        print("Status code not 200")
        break

## Tags again
Get a more up-to-date version of tags, and save them for later use.

In [188]:
resp_tags_new = requests.get(
    base_url + tags_dir + max_tags
)

In [189]:
resp_tags_new

<Response [200]>

In [190]:
js_tags_new = resp_tags_new.json()

In [192]:
with open("../data/raw/trending_tags_2355.txt", "w") as text_file:
    json.dump(js_tags_new, text_file)
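The saved statuses also carry their own `tags` lists, which gives the second view mentioned at the start: which tags appear in trending statuses. A minimal sketch, assuming the batch structure used above and that each tag entry has a `name` key, as described for the Status entity. `count_tags` is a hypothetical name and the sample data is invented.

```python
from collections import Counter

def count_tags(batches):
    """Count how many statuses mention each tag (case-insensitive)."""
    counts = Counter()
    for statuses in batches.values():
        for status in statuses:
            # A set avoids counting a tag twice within one status
            names = {tag["name"].lower() for tag in status.get("tags", [])}
            counts.update(names)
    return counts

sample_batches = {
    1: [
        {"id": "1", "tags": [{"name": "Python"}, {"name": "mastodon"}]},
        {"id": "2", "tags": [{"name": "python"}]},
        {"id": "3", "tags": []},
    ]
}
tag_counts = count_tags(sample_batches)
print(tag_counts.most_common(1))
```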

## Afterthoughts
After inspecting the status data later, I realised that the account info embedded in each status is the same as the account info I fetched separately. So it would have been enough to get the statuses.
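Extracting the embedded accounts could look like this. It is a sketch assuming the batch structure used throughout this notebook. `accounts_from_batches` is a name chosen for the example and the sample data is invented.

```python
def accounts_from_batches(batches):
    """Build {account_id: account} from the account embedded in each status."""
    accounts = {}
    for statuses in batches.values():
        for status in statuses:
            account = status["account"]
            accounts[account["id"]] = account
    return accounts

sample_batches = {
    1: [
        {"id": "10", "account": {"id": "a", "acct": "alice"}},
        {"id": "11", "account": {"id": "b", "acct": "bob"}},
    ],
    2: [{"id": "12", "account": {"id": "a", "acct": "alice"}}],
}
embedded_accounts = accounts_from_batches(sample_batches)
print(sorted(embedded_accounts))
```

This needs no extra requests, so the rate limit does not come into play.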