# Downloading Mastodon toots using the Mastodon API

## GGR473 Demonstration: L. Smith, September 2023
This notebook uses the Mastodon API to download public posts ("toots") from Mastodon, a decentralized social media platform.

In [1]:
# Import Python libraries
import json
import requests
import pandas as pd

In [2]:
# Set up the hashtag and API URL
hashtag = 'GIS'
URL = f'https://mastodon.social/api/v1/timelines/tag/{hashtag}'

# Set the number of toots to request per page
# (the Mastodon API documents a maximum of 40 per request for timelines; larger values are capped)
params = {
    'limit': 40
}

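The above example searches the mastodon.social server for toots tagged '#GIS'.  
Alternative actions include:  
- Fetching the toots of a specific account: 'https://mastodon.instance/api/v1/accounts/{id}/statuses' (this endpoint expects the numeric account ID rather than the username; the ID can be looked up with 'https://mastodon.instance/api/v1/accounts/lookup?acct=username')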
- Searching a public timeline: 'https://mastodon.instance/api/v1/timelines/public'
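
The endpoints listed above differ only in the path that follows the instance name, so they can be assembled with a few small helpers. The sketch below is illustrative only: the function names are hypothetical (not part of the original notebook), and `mastodon.instance` in the list stands for whichever server you choose. It assumes that account statuses are addressed by numeric account ID, with a separate lookup endpoint to resolve a username to that ID.

```python
from urllib.parse import quote


def tag_timeline_url(instance, hashtag):
    """URL for the timeline of a single hashtag on one instance."""
    return f'https://{instance}/api/v1/timelines/tag/{quote(hashtag)}'


def public_timeline_url(instance):
    """URL for the public timeline of one instance."""
    return f'https://{instance}/api/v1/timelines/public'


def account_lookup_url(instance):
    """URL used to resolve a username to an account (pass params={'acct': username})."""
    return f'https://{instance}/api/v1/accounts/lookup'


def account_statuses_url(instance, account_id):
    """URL for the toots of one account, addressed by its account ID."""
    return f'https://{instance}/api/v1/accounts/{account_id}/statuses'


# Reproduces the URL built in the cell above
print(tag_timeline_url('mastodon.social', 'GIS'))
```

Any of these URLs can then be passed to `requests.get(...)` in place of `URL`. For an account search, the lookup request comes first, and the `id` field of its JSON response is used to build the statuses URL.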

Try searching for different hashtags on different Mastodon instances (servers). You can find instances dedicated to particular topics on the Mastodon servers page: https://joinmastodon.org/servers

In [3]:
# Depending on the search, you may like to set a time limit
# Here, we calculate the timestamp for 7 days ago (in UTC); older toots will be ignored
since = pd.Timestamp('now', tz='utc') - pd.DateOffset(days=7)

Next, we create a while loop that repeatedly fetches data from the API until there are no more toots or the time limit is reached.

In [4]:
# Initialize flag to check if the end of timeline is reached
is_end = False

# Create empty list to store results
results = []

while True:
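    # Send GET request to Mastodon API
    r = requests.get(URL, params=params)
    
    # Stop with an error if the request failed (e.g. bad URL or rate limit reached)
    r.raise_for_status()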
    
    # Take the response (JSON string format) and convert into a Python data structure 
    toots = json.loads(r.text)

    # If there are no toots, exit loop
    if len(toots) == 0:
        break
    
    # Iterate through each toot in response
    for t in toots:
        
        # If outside time limit, exit loop
        timestamp = pd.Timestamp(t['created_at'], tz='utc')
        if timestamp <= since:
            is_end = True
            break
        
        # Check if the toot has location data
        # (Mastodon toots do not normally carry a 'geo' field, so expect None here)
        if 'geo' in t and t['geo'] is not None:
            location = t['geo']['coordinates']
        else:
            location = None
        
        # Extract desired fields
        toot_data = {
            'id': t['id'],
            'created_at': t['created_at'],
            'content': t['content'],
            'user': t['account']['username'],
            'location': location
        }
        
        # Append toot data to results list
        results.append(toot_data)
    
    # If end of timeline, exit loop
    if is_end:
        break
    
    # Update max ID for next iteration
    max_id = toots[-1]['id']
    params['max_id'] = max_id

# Save the list of results as a pandas DataFrame
df = pd.DataFrame(results)
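
The loop above keeps requesting pages until the API returns an empty page or a toot falls outside the time limit. For a busy hashtag, that can mean many requests in quick succession. The following is a minimal sketch of the same `max_id` pagination with two safeguards added: a cap on the number of pages and a pause between requests. The names `collect_pages`, `fetch_page`, `max_pages` and `pause_seconds` are hypothetical, and the API is replaced by an offline stand-in so the logic can be checked without a network connection.

```python
import time


def collect_pages(fetch_page, max_pages=50, pause_seconds=1.0):
    """Collect items page by page using max_id pagination.

    fetch_page(max_id) must return a list of dicts that each have an 'id',
    newest first, or an empty list when nothing is left.
    """
    items = []
    max_id = None
    for _ in range(max_pages):
        page = fetch_page(max_id)
        if not page:
            break
        items.extend(page)
        # The oldest item on this page becomes the upper bound for the next page
        max_id = page[-1]['id']
        if pause_seconds:
            time.sleep(pause_seconds)
    return items


# Offline stand-in for the API: ten "toots" with IDs 10 down to 1
fake_ids = list(range(10, 0, -1))


def fake_fetch(max_id, page_size=4):
    older = [i for i in fake_ids if max_id is None or i < max_id]
    return [{'id': i} for i in older[:page_size]]


demo = collect_pages(fake_fetch, pause_seconds=0)
print([item['id'] for item in demo])
```

With the real API, `fetch_page` could wrap the `requests.get(URL, params=...)` call from the cell above, adding `max_id` to the parameters whenever it is not `None`.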

In [5]:
# View the dataframe
df

Unnamed: 0,id,created_at,in_reply_to_id,in_reply_to_account_id,sensitive,spoiler_text,visibility,language,uri,url,...,content,reblog,account,media_attachments,mentions,tags,emojis,card,poll,application
0,111064277090174226,2023-09-14T15:46:16.000Z,,,False,,public,en,https://mapstodon.space/users/jarrettinho/stat...,https://mapstodon.space/@jarrettinho/111064277...,...,<p>I want to run an unsupervised classificatio...,,"{'id': '110710914465295246', 'username': 'jarr...",[],[],"[{'name': 'raster', 'url': 'https://mastodon.s...",[],,,
1,111064070461940702,2023-09-14T14:53:42.000Z,,,False,,public,en,https://social.tchncs.de/users/yngmar/statuses...,https://social.tchncs.de/@yngmar/1110640703609...,...,<p>Mapping infrastructure by postal service. B...,,"{'id': '219845', 'username': 'yngmar', 'acct':...","[{'id': '111064070384740556', 'type': 'image',...",[],"[{'name': 'lithuania', 'url': 'https://mastodo...",[],,,
2,111063811206446959,2023-09-14T13:47:42.000Z,,,False,,public,en,https://m.ai6yr.org/users/mappingsupport/statu...,https://m.ai6yr.org/@mappingsupport/1110638108...,...,"<p>Interactive <a href=""https://m.ai6yr.org/ta...",,"{'id': '110433002329275149', 'username': 'mapp...","[{'id': '111063811155133403', 'type': 'image',...",[],"[{'name': 'gis', 'url': 'https://mastodon.soci...",[],{'url': 'https://mappingsupport.com/p2/gissurf...,,
3,111063183568875803,2023-09-14T11:08:08.000Z,,,False,,public,en,https://mapstodon.space/users/MattMalone/statu...,https://mapstodon.space/@MattMalone/1110631834...,...,"<p>😐 Yeah, I'm gonna need the other one. <a hr...",,"{'id': '109367090079792364', 'username': 'Matt...","[{'id': '111063183529605421', 'type': 'image',...",[],"[{'name': 'mappymeme', 'url': 'https://mastodo...",[],,,
4,111062445693951796,2023-09-14T08:00:30.000Z,,,False,,public,en,https://mapstodon.space/users/WorldPopProject/...,https://mapstodon.space/@WorldPopProject/11106...,...,<p>💥 WorldPop job alert 💥</p><p>Senior Researc...,,"{'id': '109320708209377099', 'username': 'Worl...",[],[],"[{'name': 'ai', 'url': 'https://mastodon.socia...",[],{'url': 'https://jobs.soton.ac.uk/Vacancy.aspx...,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
60,111029040869524223,2023-09-08T10:25:13.000Z,,,False,,public,sv,https://fosstodon.org/users/gishubio/statuses/...,https://fosstodon.org/@gishubio/11102904078736...,...,<p>Counting points in polygons with a QGIS pro...,,"{'id': '110966170058067848', 'username': 'gish...","[{'id': '111029040806689816', 'type': 'image',...",[],"[{'name': 'gis', 'url': 'https://mastodon.soci...",[],{'url': 'https://www.gishub.io/2023/09/08/coun...,,
61,111028498498121711,2023-09-08T08:07:18.528Z,,,False,,public,de,https://mastodon.social/users/geoObserver/stat...,https://mastodon.social/@geoObserver/111028498...,...,<p>Kurios: Die „Bad Map Projection: ABS(Longit...,,"{'id': '108198924016272145', 'username': 'geoO...","[{'id': '111028498335252315', 'type': 'image',...","[{'id': '172987', 'username': 'xkcd', 'url': '...","[{'name': 'qgis', 'url': 'https://mastodon.soc...",[],{'url': 'https://geoobserver.wordpress.com/202...,,"{'name': 'Web', 'website': None}"
62,111028457565067085,2023-09-08T07:56:53.000Z,,,False,,public,en,https://mapstodon.space/users/Dragons8mycat/st...,https://mapstodon.space/@Dragons8mycat/1110284...,...,"<p>It's follow Friday <a href=""https://mapstod...",,"{'id': '109320293254078532', 'username': 'Drag...",[],"[{'id': '34743', 'username': 'ThomasG77', 'url...","[{'name': 'ff', 'url': 'https://mastodon.socia...",[],,,
63,111028034553918772,2023-09-08T06:09:15.000Z,,,False,,public,de,https://freiburg.social/users/panda/statuses/1...,https://freiburg.social/@panda/111028034326631845,...,"<p>ProjektleiterIn <a href=""https://freiburg.s...",,"{'id': '108199190360858427', 'username': 'pand...",[],[],"[{'name': 'gis', 'url': 'https://mastodon.soci...",[],{'url': 'https://badenova-gruppe.talentry.com/...,,
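
The `content` column holds HTML rather than plain text (note the `<p>` and `<a href=...>` markup in the output above), which gets in the way of text analysis. A minimal sketch of one way to strip the markup using only the Python standard library is shown below. The names `_TextExtractor` and `html_to_text` are hypothetical, and the sample string is made up for illustration rather than taken from a real toot.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collect the text of an HTML fragment, turning <br> and </p> into newlines."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == 'br':
            self.parts.append('\n')

    def handle_endtag(self, tag):
        if tag == 'p':
            self.parts.append('\n')

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(html):
    """Return the plain text of an HTML fragment."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return ''.join(parser.parts).strip()


# Made-up sample in the style of a toot's 'content' field
sample = ('<p>Interactive <a href="https://example.org/tags/GIS">#<span>GIS</span></a> map</p>'
          '<p>Second paragraph</p>')
print(html_to_text(sample))
```

Applied to the dataframe, this could look like `df['text'] = df['content'].apply(html_to_text)`, which keeps the original HTML alongside the cleaned text.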


In [None]:
# Export to CSV (a relative path saves the file in the notebook's current working directory in JupyterHub)

df.to_csv(r'gismastodon.csv')

We are not storing data in a publicly accessible location, nor are we publishing user names or IDs. However, you should always consider the implications of working with data that were not created explicitly for analysis or for answering research questions. We will explore the ethical implications of accessing openly accessible data in week 4 of the course.