# Twitter Text Report

## October 25, 2022

### By Serena Gestring

[Credit for Images](https://www.vangoghgallery.com/painting/)

![](sunf_resize.jpg)

On Friday, October 14, 2022, two climate activists [threw tomato soup on Vincent van Gogh's "Sunflowers" painting](https://www.npr.org/2022/10/14/1129098184/van-gogh-sunflowers-soup-climate-protest-london-gallery) at the National Gallery in London to protest fossil fuel extraction. Both belong to the activist organization Just Stop Oil, which has orchestrated a few other art-related protests in the past year. Though the painting itself was unharmed and was put back on display, the event has sparked debate on social media about the act itself, what it represents, and how effective it was. The driving question of this report is therefore: "What is the general sentiment regarding climate activism and valid activism methods after the Van Gogh tomato soup protest at the National Gallery?" In essence, I want to know how people are reacting to this protest and to the concept of destroying (or attempting to destroy) art as a form of activism.

I think this conversation is important to analyze because the awareness and support garnered through activism are essential for actually creating change. If climate activists want to initiate change, then looking at how people react to different methods of protest will show which methods are working and which are not: which get people engaged and which make people turn away. While this report looks at one specific event, it also opens the broader conversation about what will lead to actual change in climate and fossil fuel policies that benefits us and our planet.

To answer this question, I used the Search/Recent endpoint of the Twitter API to collect tweets relevant to this topic. To request data from the API, I developed a query, which is given below. The main portion of the query has two parts: one group of parentheses contains keywords relevant to the National Gallery, and the other contains keywords relating to climate activism and Just Stop Oil. The keywords are split into two groups because placing the groups side by side creates AND logic between them. If all of the keywords sat in one group separated by OR, I would be more likely to collect tweets about the National Gallery, Van Gogh's paintings, climate activism, or Just Stop Oil individually that have nothing to do with my specific event. Because I want to know about this one event, I need the AND logic so that only tweets matching both the National Gallery/Van Gogh group and the climate activism/Just Stop Oil group are collected. In addition, the query collects only tweets in English and excludes retweets, because seeing the same tweet repeated over and over would not be useful or insightful.
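The grouping described above can be sketched in a few lines of Python. This is only an illustration, assuming a hypothetical helper named `build_query` (it is not part of the notebook or of any Twitter library): keywords inside a group are joined with OR, and the two groups are placed side by side so that the API reads them as AND.

```python
def build_query(group_a, group_b, lang="en", exclude_retweets=True):
    """Join each keyword group with OR, then place the groups side by side (AND)."""
    left = " OR ".join(group_a)
    right = " OR ".join(group_b)
    query = "(({}) ({}))".format(left, right)
    query += " lang:{}".format(lang)   # keep only tweets in one language
    if exclude_retweets:
        query += " -is:retweet"        # the leading minus negates the operator
    return query

gallery_terms = ["@NationalGallery", "national gallery", "van gogh", "tomato soup"]
activism_terms = ["climate activism", "climate activists", "just stop oil"]

example_query = build_query(gallery_terms, activism_terms)
print(example_query)
```

With these two keyword lists, the helper produces the same string that is stored in "query_text" later in this report.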

![](poppies_resize.jpg)

First, I loaded the libraries needed for collecting and viewing the data.

In [1]:
import pandas as pd
import json
import requests
import urllib.parse

Then I loaded my Twitter Developer bearer token into the Jupyter Notebook. This token is necessary for accessing the Twitter API because it proves I have permission to retrieve data from Twitter. I saved this authorization in the variable "header," which I use throughout the rest of the code to authenticate my requests. I ran the header variable to make sure my bearer token had loaded correctly; it had, so I cleared the cell output so that my access token would not be visible online.

In [2]:
bearer_token = pd.read_csv("b_token_1.txt", header = 0, sep = '\t')

In [3]:
header = {'Authorization':'Bearer {}'.format(bearer_token['Bearer_Token'].iloc[0])}

In [None]:
header
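As a hedged alternative to reading the token file with pandas, the header can be built from a plain string. The sketch below assumes a hypothetical helper `build_auth_header` and a hypothetical environment variable named `TWITTER_BEARER_TOKEN`; neither appears in the notebook.

```python
import os

def build_auth_header(token):
    """Return the Authorization header dict for a bearer token string."""
    token = token.strip()  # drop stray whitespace or a trailing newline
    if not token:
        raise ValueError("The bearer token is empty")
    return {"Authorization": "Bearer {}".format(token)}

# TWITTER_BEARER_TOKEN is a hypothetical variable name; the placeholder
# value only keeps this sketch runnable when the variable is not set.
token = os.environ.get("TWITTER_BEARER_TOKEN", "PLACEHOLDER_TOKEN")
example_header = build_auth_header(token)

# Show only the header's key so the token itself never appears in the output
print(list(example_header.keys()))
```

Printing only the key is one way to confirm the header exists without having to clear the cell afterward.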

The endpoint variable is the specific endpoint of the Twitter API I am trying to access, in this case the Search/Recent endpoint. This is where I will collect my data from. 

In [4]:
endpoint = 'https://api.twitter.com/2/tweets/search/recent'

The following code cell contains my query, saved as the variable "query_text," which is then URL-encoded with the urllib library so that the API can read it. The variables "tweet_fields," "user_fields," and "expansions" are also defined to gather additional data on the tweets that will be collected. For the tweet fields I included author_id, public_metrics, created_at, and in_reply_to_user_id. I included that last one because I thought it would be interesting to look at any actual conversations that occurred between Twitter users regarding this topic. The only user field is the username.

The "query_url" variable combines the endpoint and the query to create the URL that is sent to the API to retrieve data. Notice that max_results is set to 100, so up to 100 tweets are retrieved per request. I ran query_url to make sure all parts of my query were included in the URL, and they are.

In [5]:
query_text = '((@NationalGallery OR national gallery OR van gogh OR tomato soup) (climate activism OR climate activists OR just stop oil)) lang:en -is:retweet'
query_encoded = urllib.parse.quote(query_text)
tweet_fields = 'author_id,in_reply_to_user_id,public_metrics,created_at'
user_fields = 'username'
expansions = 'entities.mentions.username'

In [6]:
query_url = endpoint + '?query={}&tweet.fields={}&user.fields={}&expansions={}&max_results=100'.format(query_encoded, tweet_fields, user_fields, expansions)

In [7]:
query_url

'https://api.twitter.com/2/tweets/search/recent?query=%28%28%40NationalGallery%20OR%20national%20gallery%20OR%20van%20gogh%20OR%20tomato%20soup%29%20%28climate%20activism%20OR%20climate%20activists%20OR%20just%20stop%20oil%29%29%20lang%3Aen%20-is%3Aretweet&tweet.fields=author_id,in_reply_to_user_id,public_metrics,created_at&user.fields=username&expansions=entities.mentions.username&max_results=100'
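The same URL can also be assembled from a dictionary of parameters, which avoids building the string by hand. This is a sketch rather than the method used in this report; it relies on `urllib.parse.urlencode` from the standard library, and the name `alt_query_url` is only for illustration.

```python
import urllib.parse

endpoint = 'https://api.twitter.com/2/tweets/search/recent'
query_text = ('((@NationalGallery OR national gallery OR van gogh OR tomato soup) '
              '(climate activism OR climate activists OR just stop oil)) '
              'lang:en -is:retweet')

params = {
    'query': query_text,
    'tweet.fields': 'author_id,in_reply_to_user_id,public_metrics,created_at',
    'user.fields': 'username',
    'expansions': 'entities.mentions.username',
    'max_results': 100,
}

# quote_via=quote encodes spaces as %20 (the default, quote_plus, would use "+");
# safe=',' leaves the commas in the field lists unescaped.
alt_query_url = endpoint + '?' + urllib.parse.urlencode(
    params, quote_via=urllib.parse.quote, safe=',')
print(alt_query_url)
```

Keeping the parameters in a dictionary makes it easier to add or remove a field later without touching the format string.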

Using the Requests library, I passed my query URL and the header variable (which contains my access token) to the get() method. This allows me to retrieve the data I requested through the Search/Recent endpoint of the Twitter API. I saved the result as the variable "response" because it is the response to my query.

In the next code cell I checked the status code of my request. The 200 means my request was valid and that I have data to parse. I then viewed the response with .text to make sure what I received was relevant to my driving question. I had to adjust my query a few times until I was seeing only tweets that related to my question. I have since cleared the cell because the output was very long and the data was not in an ideal structure for viewing.

In [8]:
response = requests.get(query_url, headers = header)

In [9]:
response

<Response [200]>

In [10]:
response_dict = json.loads(response.text)

In [None]:
response.text
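The status check can also be made explicit in code. A minimal sketch, assuming a hypothetical helper `describe_status`; the meanings listed are the standard HTTP ones for codes 200, 401, and 429, and any other code is simply reported as unexpected.

```python
def describe_status(status_code):
    """Translate a few common HTTP status codes into plain language."""
    meanings = {
        200: "OK: the request succeeded and there is data to parse",
        401: "Unauthorized: the bearer token is missing or invalid",
        429: "Too Many Requests: the rate limit was reached, so wait before retrying",
    }
    return meanings.get(status_code, "Unexpected status code: {}".format(status_code))

print(describe_status(200))
```

In the notebook, this could be called as `describe_status(response.status_code)` before loading the JSON.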

![](starry_resize.jpg)

While max_results limits each request to 100 tweets, there are actually multiple pages of results that can be retrieved. In the code cell below, I created a function that retrieves as many pages of results as I want. First, "response_list" is an empty list that collects the pages of results as they arrive, and "next_token" is an empty string that will hold the next token found in the metadata of each page. Next comes a for loop containing a conditional: if the iteration is greater than 0 (0 being the first page of results), the request is the query plus the current next token; otherwise it is just the query (this only happens for the first page, because there is no next token to include yet). Still inside the for loop, the "this_response" variable uses the get() method to retrieve that page, the response is loaded as a JSON dictionary, the dictionary is appended to "response_list," and the page's next token is stored for the following iteration. Finally, "response_list" is returned at the end of the function.

For this report, I wanted 3 pages of results, for a total of 300 tweets. Therefore, I passed the query URL ("query_url") as the query argument, the number 3 as the num_pages argument (to get three pages of results), and the header variable (which contains my bearer token and therefore my access to the API) as the headers argument. I saved the result in the "my_responses" variable, which contains all 300 tweets and their accompanying data. I ran "my_responses" to verify one more time that the data was still relevant to my driving question, and then cleared the cell.

In [11]:
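def twt_recent_search(query, num_pages, headers):
    # Collect up to num_pages pages of results, following next_token between pages
    response_list = []
    next_token = ''

    for i in range(0, num_pages):
        if i > 0:
            this_query = query + "&next_token={}".format(next_token)
        else:
            this_query = query

        # Use the headers argument rather than the global "header" variable
        this_response = requests.get(this_query, headers = headers)
        this_response_dict = json.loads(this_response.text)
        response_list.append(this_response_dict)

        # The last page of results has no next_token, so stop there
        next_token = this_response_dict.get('meta', {}).get('next_token')
        if next_token is None:
            break

    return response_list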

In [12]:
my_responses = twt_recent_search(query_url, 3, header)

In [None]:
my_responses
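The paging logic can be tried out without calling the API at all. The sketch below uses made-up pages and hypothetical names (`collect_pages`, `fake_fetch`, `FAKE_PAGES`) to show the idea of following next_token from page to page and stopping once a page no longer carries one.

```python
def collect_pages(fetch_page, num_pages):
    """Collect up to num_pages pages, following next_token until it runs out."""
    pages = []
    next_token = None
    for _ in range(num_pages):
        page = fetch_page(next_token)
        pages.append(page)
        # A page without a next_token in its metadata is the last one
        next_token = page.get("meta", {}).get("next_token")
        if next_token is None:
            break
    return pages

# Made-up stand-in for the API: two pages, and the second has no next_token
FAKE_PAGES = {
    None: {"data": [{"id": "1"}], "meta": {"next_token": "abc"}},
    "abc": {"data": [{"id": "2"}], "meta": {}},
}

def fake_fetch(next_token):
    return FAKE_PAGES[next_token]

pages = collect_pages(fake_fetch, 3)
print(len(pages))  # 2, because only two pages exist even though 3 were requested
```

Passing the fetching step in as a function keeps the loop easy to test, since a real request can later be swapped in for `fake_fetch`.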

Then I turned the data received from the API into a dataframe using the pandas library.

In [13]:
results_1 = pd.DataFrame.from_records(my_responses)

In [14]:
data_list = list(results_1['data'])

In [15]:
data_list_of_dfs = [pd.DataFrame(x) for x in data_list]

The concat() method allows me to create one dataframe out of the three pages of results. 

In [16]:
data_df = pd.concat(data_list_of_dfs)

In [None]:
data_df

Because the public_metrics column holds several values per tweet, I expanded it into its own dataframe and merged that with the larger dataframe using the join() method. This makes the public metrics easier for viewers to read.

In [17]:
public_metrics_df = pd.DataFrame(list(data_df['public_metrics']))

In [18]:
final_df = data_df.join(public_metrics_df)
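One detail is worth checking at this step. concat() keeps each page's own row labels (0 to 99) unless told otherwise, while join() matches rows by label, so rows from later pages can pick up the metrics of first-page rows that share their label. The sketch below illustrates this with made-up data (the names `page1`, `page2`, `naive`, and `fixed` are for illustration only) and shows that passing ignore_index=True to concat() is one way to keep the rows aligned.

```python
import pandas as pd

# Two made-up "pages" of two tweets each
page1 = pd.DataFrame([{"id": "a", "public_metrics": {"like_count": 1}},
                      {"id": "b", "public_metrics": {"like_count": 2}}])
page2 = pd.DataFrame([{"id": "c", "public_metrics": {"like_count": 3}},
                      {"id": "d", "public_metrics": {"like_count": 4}}])

# Without ignore_index, the row labels are 0, 1, 0, 1
naive = pd.concat([page1, page2])
naive_metrics = pd.DataFrame(list(naive["public_metrics"]))  # labels 0, 1, 2, 3
naive_joined = naive.join(naive_metrics)

# With ignore_index, the row labels are 0, 1, 2, 3 and match the metrics frame
fixed = pd.concat([page1, page2], ignore_index=True)
fixed_metrics = pd.DataFrame(list(fixed["public_metrics"]))
fixed_joined = fixed.join(fixed_metrics)

print(dict(zip(naive_joined["id"], naive_joined["like_count"])))
print(dict(zip(fixed_joined["id"], fixed_joined["like_count"])))
```

In the first printout, tweets "c" and "d" show the like counts of "a" and "b"; in the second, every tweet keeps its own count.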

![](painting_resize.jpg)

Now the two dataframes are joined together. However, the original public_metrics column is still included, so I used the drop() method to remove it from the displayed dataframe. The final dataframe also included an 'entities' column, which holds the mentions (and their usernames) for each tweet. I was able to create a separate dataframe for just the entities information and join it to the larger one, but I could not work out how to pull the 'mentions' information out of it. However, the mentioned users are also stored under the 'includes' key of the response (separate from the 'data' key), so I created a second dataframe, "users_df," to view that information. Because of this second dataframe, I decided to drop the 'entities' column as well (it still appears in the tail() output below, where only public_metrics is dropped). The first five and last five rows of both dataframes are shown below.

In [19]:
final_df.drop('public_metrics', axis = 1).drop('entities', axis = 1).head()

| | created_at | text | id | author_id | edit_history_tweet_ids | in_reply_to_user_id | retweet_count | reply_count | like_count | quote_count |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2022-10-25T19:09:58.000Z | That’s all we need, a jumped up idiot encourag... | 1584985637555503116 | 91120555 | [1584985637555503116] | | 0 | 0 | 0 | 0 |
| 0 | 2022-10-25T08:18:48.000Z | @PhreekHyperbole just stop oil splashed tomato... | 1584821766580756482 | 123600006 | [1584821766580756482] | 1.4076213711951299e+18 | 0 | 0 | 0 | 0 |
| 0 | 2022-10-24T21:17:02.000Z | The Just Stop Oil protest shocked the world, b... | 1584655226858086402 | 800414768024518656 | [1584655226858086402] | | 0 | 0 | 0 | 0 |
| 1 | 2022-10-25T19:06:17.000Z | not to agree with some of Just Stop Oil's tact... | 1584984710018539520 | 1415934677978861569 | [1584984710018539520] | | 0 | 0 | 4 | 0 |
| 1 | 2022-10-25T08:17:39.000Z | morning all, my thoughts regarding the climate... | 1584821476578177024 | 1643523890 | [1584821476578177024] | | 0 | 0 | 4 | 0 |


In [20]:
final_df.drop('public_metrics', axis = 1).tail()

| | created_at | text | id | author_id | edit_history_tweet_ids | in_reply_to_user_id | entities | retweet_count | reply_count | like_count | quote_count |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 98 | 2022-10-25T08:31:25.000Z | Just Stop Oil protests: Bob Geldof reveals why... | 1584824940280569856 | 1430815514704633859 | [1584824940280569856] | | | 0 | 0 | 0 | 0 |
| 98 | 2022-10-24T21:24:04.000Z | @Sluchey1 Climate activists vandalised a portr... | 1584656999362736128 | 1552365153571807233 | [1584656999362736128] | 1.4439554317523108e+18 | {'mentions': [{'start': 0, 'end': 9, 'username... | 0 | 0 | 0 | 0 |
| 98 | 2022-10-24T15:22:49.000Z | First the idiots vandalizing van Gogh’s Sunflo... | 1584566087709966336 | 918302785011093504 | [1584566087709966336] | | | 0 | 0 | 0 | 0 |
| 99 | 2022-10-25T08:20:31.000Z | Just Stop Oil protests: Bob Geldof reveals why... | 1584822200997392387 | 1514210880602443776 | [1584822200997392387] | | | 0 | 0 | 0 | 0 |
| 99 | 2022-10-24T21:23:00.000Z | The protest at Potsdam’s Barberini museum foll... | 1584656728855117829 | 33860739 | [1584656728855117829] | | | 0 | 0 | 0 | 0 |
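The 'mentions' information can be pulled out of the 'entities' column with a small function applied to each row. This is a sketch on made-up rows, assuming the column holds either a dictionary shaped like the one in the output above or a missing value; the helper name `mentioned_usernames` is hypothetical.

```python
import pandas as pd

def mentioned_usernames(entities):
    """Return the usernames mentioned in one tweet's 'entities' value."""
    if not isinstance(entities, dict):
        return []  # tweets without mentions have a missing value here
    return [mention["username"] for mention in entities.get("mentions", [])]

# Made-up rows: one tweet with a mention, one without
toy_df = pd.DataFrame({
    "id": ["1", "2"],
    "entities": [
        {"mentions": [{"start": 0, "end": 9, "username": "Sluchey1"}]},
        float("nan"),
    ],
})

toy_df["mentions"] = toy_df["entities"].apply(mentioned_usernames)
print(toy_df[["id", "mentions"]])
```

Each row ends up with a plain list of usernames, which is easier to read than the nested dictionary.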


In [21]:
response_dict['includes'].keys()

dict_keys(['users'])

In [22]:
users_df = pd.DataFrame(response_dict['includes']['users'])

In [23]:
users_df.head()

| | id | name | username |
|---|---|---|---|
| 0 | 1148702707378769921 | Climate Emergency Fund | ClimateEFund |
| 1 | 2347049341 | Vox | voxdotcom |
| 2 | 10228272 | YouTube | YouTube |
| 3 | 1472840303564537858 | FightingAnne1 | FightingAnne1 |
| 4 | 14420097 | Aja Romano | ajaromano |


In [24]:
users_df.tail()

| | id | name | username |
|---|---|---|---|
| 32 | 442832665 | Tanya Gold | TanyaGold1 |
| 33 | 95064543 | ABC The Drum | ABCthedrum |
| 34 | 3250619743 | Humra Laeeq | humralq |
| 35 | 612473 | BBC News (UK) | BBCNews |
| 36 | 1168968080690749441 | Rishi Sunak | RishiSunak |
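Note that "response_dict" holds only the first page of results, so "users_df" covers the users mentioned on that page. The sketch below shows one way to gather the users from every page, assuming a list of page dictionaries shaped like "my_responses"; the pages here are made up, and a page is allowed to lack the 'includes' key.

```python
import pandas as pd

# Made-up pages shaped like the API responses
toy_pages = [
    {"data": [], "includes": {"users": [{"id": "1", "name": "A", "username": "a"}]}},
    {"data": []},  # a page may come back without an 'includes' key
    {"data": [], "includes": {"users": [{"id": "1", "name": "A", "username": "a"},
                                        {"id": "2", "name": "B", "username": "b"}]}},
]

# Flatten the users from every page into one list
all_users = [user
             for page in toy_pages
             for user in page.get("includes", {}).get("users", [])]

# The same user can appear on several pages, so keep one row per id
all_users_df = (pd.DataFrame(all_users)
                .drop_duplicates(subset="id")
                .reset_index(drop=True))
print(all_users_df)
```

In the notebook, `toy_pages` would be replaced by "my_responses".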


Finally, I saved the final dataframe as a CSV file so I can review the data at a later time.

In [25]:
final_df.to_csv(r'C:\Users\Serena\EMAT22110_FA22\twt_data.csv')
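When the CSV is read back later, ID columns need some care. A column that mixes long numeric IDs with missing values is read as floating point by default, which is why in_reply_to_user_id appears as values like 1.4076213711951299e+18 in the tables above, and the last digits of the ID are lost. The sketch below uses a made-up ID and an in-memory file to show that asking read_csv() for strings keeps the IDs intact.

```python
import io
import pandas as pd

# Made-up tweet rows; the IDs are stored as strings, as in the API response
toy_df = pd.DataFrame({
    "id": ["1584985637555503116", "1584821766580756482"],
    "in_reply_to_user_id": ["1234567890123456789", None],
})

# Write to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
toy_df.to_csv(buffer, index=False)

# Default read: the column with a missing value becomes float64
buffer.seek(0)
naive = pd.read_csv(buffer)
print(naive["in_reply_to_user_id"].dtype)

# Reading the ID columns as strings keeps every digit
buffer.seek(0)
safe = pd.read_csv(buffer, dtype={"id": str, "in_reply_to_user_id": str})
print(safe.loc[0, "in_reply_to_user_id"])
```

Passing index=False to to_csv() also avoids the extra unnamed index column when the file is read back.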

![](irises_resize.jpg)

The query given above was used to gather 300 tweets relating to the climate protest in which tomato soup was thrown on Van Gogh's "Sunflowers" painting on October 14th, 2022. From scanning all 300 tweets, the vast majority are indeed about this topic. The data offers many different viewpoints: whether the two climate activists were in the right or the wrong, whether Just Stop Oil's protests are effective or valid, whether a painting is more important than the health of the planet, and so on. However, a few tweets throughout the data do not appear to be about this specific event, but rather about the broader idea of climate activism or other related topics. These could provide context for the larger picture of activism, fossil fuels, and the climate crisis, but they are not necessarily useful for answering the driving question of this report. The data (and the query that gathered it) is therefore not as strong as it could be, and the keywords or structure of the query could be improved. That said, I believe this data is an adequate starting point for examining online conversation about this event.

There are other limitations to consider with this data. To reiterate, the query used the Search/Recent endpoint of the Twitter API to collect data. A limitation of this endpoint is that it only returns tweets posted within the past week (in other words, tweets from the last seven days and nothing earlier). It is therefore very possible to miss a large portion of the tweets relevant to the conversation, and so to miss part of the story and some of the perspectives of individuals around the world. An alternative approach would be to use the Search/All endpoint, which searches *all* tweets matching the query as far back as 2006. This would allow one to collect every tweet relating to this event since it occurred on October 14th, 2022, providing the complete data set and therefore the complete story behind this conversation (on Twitter specifically). Note that the Search/All endpoint is only available to Twitter Developers with Academic Research access, which is why it was not used for this report.
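The effect of the seven-day window can be worked out with simple date arithmetic. The sketch below assumes the data was collected on the report date, October 25, 2022 (the newest tweets in the tables above are from that day); the variable names are for illustration only.

```python
from datetime import datetime, timedelta, timezone

event_date = datetime(2022, 10, 14, tzinfo=timezone.utc)       # day of the protest
collection_date = datetime(2022, 10, 25, tzinfo=timezone.utc)  # assumed collection day

# Search/Recent reaches back seven days from the moment of the request
window_start = collection_date - timedelta(days=7)
missed = window_start - event_date

print(window_start.date())  # 2022-10-18
print(missed.days)          # 4
```

Under that assumption, the first four days of conversation after the protest fall outside the window and could not be collected.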

A potential next step from a Twitter standpoint would be to include more data about the users themselves, such as location and bio information (if provided). This additional information could shed some light on what types of people are expressing certain kinds of opinions, or in other words, whether opinions on this topic show any trends in relation to age, ethnicity, nationality, etc. A broader next step would be to collect data on this topic from other social media sites in order to gain a wider view of online conversation in general, since individuals use a wide variety of platforms other than Twitter to communicate. This combined data from multiple sources (assuming the data is reliable) would provide the best understanding of global sentiment toward climate activism, using the tomato soup thrown at Van Gogh's "Sunflowers" as a starting point.
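As a rough sketch of that first next step, the request could ask for more user fields. The snippet assumes the `location` and `description` user fields and the `author_id` expansion of the Twitter API v2; it only builds the extra part of the URL and was not run against the API for this report.

```python
# 'location' and 'description' (the bio) are only filled in when the user provides them
user_fields = 'username,location,description'

# The author_id expansion asks for each tweet author's user object under 'includes'
expansions = 'author_id,entities.mentions.username'

extra_params = '&user.fields={}&expansions={}'.format(user_fields, expansions)
print(extra_params)
```

The resulting user objects would arrive under the 'includes' key, alongside the mentioned users already collected there.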

![](Vincent_van_Gogh_sunflowers.jpg)