<a href="https://colab.research.google.com/github/james-hughes1/wdss-nlp-project/blob/main/WDSS_NLP_Blog_Post.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Introduction
Since their inception, social media platforms have been analysed to produce
insights into public opinion using increasingly complex **natural language
processing (NLP)** approaches. Earlier models such as VADER [4] relied on
rule-based linguistic techniques to capture how grammatical features such
as adverbs and conjunctions affect the intensity and negation of sentiment
within a text. This approach matched the performance of individual
humans on consensus-based sentiment labelling in a variety of online media
contexts, and has been used to analyse the sentiment of tweets expressing
opinions about climate change [9]. In the early 2010s, word-embedding
vectorisation became a widespread technique for semantic modelling, with
the release of Word2Vec [5] and GloVe [7]. This enabled the study
of lexical semantic relationships on a large scale; in 2017 researchers used
this approach to systematically explore sub-topics within climate change
discourse on Twitter [6].


---


In this way, machine learning has become an emerging paradigm
in climate change discourse analysis. It has brought new understanding to
the shifting landscape of public discussion around climate change by
assisting with topic classification [1], sentiment analysis and user-community
clustering [2], especially through analysing social media platforms such as
Twitter. The insights produced by these tasks can complement social science
research. Indeed, Stede and Patz [10] highlight the increasing overlap of
the two fields in the analysis of climate change discourse and emphasise the
need to consider the representativeness and validity of modelling
when NLP is used in this way.

We initially discussed many different avenues for research and project focus.
One of our ideas revolved around analysing pledges made by political
leaders and conducting a data-backed investigation into how far they
keep to their word. There is plenty of data available, which can be
investigated at many different scopes (e.g. national vs global, a few select politicians vs
advocates for a particular party, etc). However, the language used by such
politicians is constrained, which could make it difficult to train a model.
It would also be difficult to associate a metric with keeping promises, as
this could be subjective and it is unlikely that a database encapsulating
this with objective labels exists. The set of individuals to investigate is
also somewhat constrained, whichever scope is chosen. Another
idea was to investigate how public attitudes towards climate
change shift following different climate summits. This was a genuinely
interesting question for us, and the findings could be used to forecast how
attention on climate change might evolve through the years. However, this is a very
ambitious scope for a student-led project, and the question would need to be refined for
the problem to be well-defined.


After considering the benefits and drawbacks of each of the project ideas
we eventually settled on the key questions outlined below.
Twitter is a widely-used social media platform which can be used to
sample the public’s reaction to international events. Our study used historical
data of tweets posted over the course of three COP conferences,
from COP24 to COP26. We used a human-labelled dataset [8] to construct
a model to classify the sentiment of a tweet as negative, neutral,
or positive, which we then used to label our original dataset. Additionally,
we manually categorised the Twitter accounts in our original data, limited
to the top 100 users ordered by the maximum likes of any of their
tweets in the dataset. Our analysis was then centred around the following
key questions:


---



> 1. Who are the most influential stakeholders in the Twitter discourse that surrounds the COP events?
> 2. Is there a relationship between the sentiment of a tweet and how much
exposure it receives?
> 3. How does this relationship vary across types of users and over time?






# Word Embeddings


---


In order for a computer to work with the words in each tweet, it is
necessary to first convert them into vectors in a high-dimensional space.
The idea is to give words with similar meaning or context similar
vectors. For example, we may want to give the words ‘lion’ and ‘leopard’
similar vector representations in a bank of words containing the names of
all the animals in the world. There are a few choices as to how this can be
achieved, such as **Word2Vec** or **GloVe’s** pre-trained word embeddings; we used the latter to help build an exploratory analysis of our data. The two methods of word embedding
would most likely yield similar results, though they work in
different ways. Word2Vec trains a shallow feedforward neural network. GloVe uses
a matrix-based approach: it starts by constructing a matrix whose rows are
words and whose columns are contexts it has identified in the input word bank.
This matrix can then be decomposed into smaller matrices that are easier
to work with (e.g. from word × context to word × feature and feature ×
context).
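To make the idea of "similar vectors" concrete, the sketch below compares words by the cosine of the angle between their vectors. The three-dimensional vectors are invented purely for illustration (they are not real GloVe values), and `cosine_similarity` and `toy_embeddings` are hypothetical names rather than part of our pipeline.

```python
import math

def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical 3-dimensional embeddings, invented for illustration only.
toy_embeddings = {
    "lion": [0.9, 0.8, 0.1],
    "leopard": [0.8, 0.9, 0.2],
    "treaty": [0.1, 0.2, 0.9],
}

sim_cats = cosine_similarity(toy_embeddings["lion"], toy_embeddings["leopard"])
sim_mixed = cosine_similarity(toy_embeddings["lion"], toy_embeddings["treaty"])
print(f"lion vs leopard: {sim_cats:.3f}")
print(f"lion vs treaty:  {sim_mixed:.3f}")
```

With these toy values the two animal words score much closer to 1 than the unrelated pair, which is the behaviour a good embedding is meant to show.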

Once the words of a dataset have been embedded, we can apply
a clustering algorithm so that the computer tries to identify patterns
in the words for us. In this project, we investigated clustering
with different numbers of clusters in the EDA portion of the project.
We encountered another potential limitation when manually categorising
the data in our scraped tweets. A lot of tweets were indeed related to climate
change, but sometimes they discussed an individual rather than the topic of
climate change itself, for example praising Greta Thunberg for her efforts in
this space. This is not directly related to our goal of attributing sentiment
labels to these tweets with *regard to stance on climate change*.

We also used this technique in the main sentiment analysis task; the first layer of our neural network model was a word embedding layer that was trained on our training data, rather than pre-trained. This improved the accuracy of the model. Heuristically, this is because the layer converts the input text into vectors encoding the semantic content of each word, enabling the model to form a sentiment classification based on this encoded data.

# Clustering Techniques
We used centroid-based clustering. Algorithms of this type work by searching for the optimal location of the centre of each cluster (known as the
cluster centroid). Each vector is assigned to its nearest centroid, and the position of each centroid is then updated to the
average of the vectors that belong to its cluster. These two steps repeat, terminating once the centroids move by less than a
pre-specified tolerance. The K-Means algorithm is the most
popular centroid-based clustering technique.
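The assignment and update steps can be sketched in plain Python as follows. This is a minimal illustration under our own assumptions (random initial centroids drawn from the data, Euclidean distance, hypothetical function and variable names), not the implementation used in the project.

```python
import random

def k_means(points, k, tol=1e-6, max_iter=100, seed=0):
    """Cluster `points` (lists of floats) into k groups with basic K-Means."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append([sum(c) / len(members) for c in zip(*members)])
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster in place
        # Largest distance moved by any centroid in this iteration.
        shift = max(
            sum((a - b) ** 2 for a, b in zip(old, new)) ** 0.5
            for old, new in zip(centroids, new_centroids)
        )
        centroids = new_centroids
        if shift < tol:  # terminate once the tolerance has been reached
            break
    return centroids, labels

# Two well-separated toy groups in the plane.
toy_points = [[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]]
toy_centroids, toy_labels = k_means(toy_points, k=2)
print(sorted(toy_centroids), toy_labels)
```

On real word vectors the result depends on the initial centroids, which is one reason to try several seeds and several values of `k`.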

# Datasets
We scraped historical tweet data, with the queries detailed in Table 1. For
the dataset relating to each conference we limited the query to English
tweets containing the corresponding string (“COP24”, “COP25” or “COP26”). The match is case-insensitive
and includes tweets with the substring “#COP24”, for instance. Each
tweet was collected along with important metadata such as information
about the author account, and metrics such as number of likes and replies.
Next we compiled the users who authored the most liked tweets
in the dataset, and manually categorised them according to the nine stakeholder
categories shown in the second table below. We then filtered each of the original datasets
to contain only tweets from these categorised users. Lastly, we added the
sentiment predictions from our trained model.
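As a rough sketch of the user-selection step, the snippet below ranks accounts by the maximum likes of any of their tweets and then keeps only tweets from the top-ranked accounts. The record layout (`user` and `likes` keys) and the function name are assumptions made for this example, not the schema of our scraped data.

```python
def top_users_by_max_likes(tweets, n):
    """Return the n users whose single most-liked tweet has the most likes."""
    best = {}
    for tweet in tweets:
        user = tweet["user"]
        best[user] = max(best.get(user, 0), tweet["likes"])
    ranked = sorted(best, key=lambda u: best[u], reverse=True)
    return ranked[:n]

# Toy records with hypothetical field names.
toy_tweets = [
    {"user": "a", "likes": 5},
    {"user": "b", "likes": 50},
    {"user": "a", "likes": 70},
    {"user": "c", "likes": 10},
]

top_users = top_users_by_max_likes(toy_tweets, n=2)
# Keep only tweets written by the selected users.
selected = set(top_users)
kept_tweets = [t for t in toy_tweets if t["user"] in selected]
print(top_users, len(kept_tweets))
```

In the study itself the cut-off was the top 100 users, after which each account was categorised by hand.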



---



| Table 1                 | COP24                                   | COP25                                   | COP26                                   |
|-------------------------|-----------------------------------------|-----------------------------------------|-----------------------------------------|
| Year                    | 2018                                    | 2019                                    | 2021                                    |
| COP Date Range          | 2nd December 2018 to 14th December 2018 | 2nd December 2019 to 13th December 2019 | 31st October 2021 to 12th November 2021 |
| Query Date Range        | 15 Nov - 30 Dec                         | 15 Nov - 27 Dec                         | 14 Oct - 27 Nov                         |
| Total Tweets            | 108405                                  | 152662                                  | 981982                                  |
| User-Categorised Tweets | 3551                                    | 2560                                    | 3489                                    |








---


| Category                   | Description                                                                                                                             | Examples                      |
|----------------------------|-----------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|
| Activist                   | Accounts belonging to activists or charitable organisations working to help climate change mitigation or adaptation.                   | @GretaThunberg @SumakHelena   |
| Business                   | Accounts related to a business or personal accounts of business leaders.                                                                | @BNPParibas @BoschGlobal      |
| Celebrity                  | Celebrity and personal accounts for which no other category applies and with more than 10,000 followers.                               | @LeoDiCaprio                  |
| International Organisation | Official accounts for non-commercial international organisations.                                                                       | @GreenpeaceNZ @UNFCCC         |
| Journalist                 | Personal accounts of news reporters and commentators.                                                                                   | @katieworth @LeoHickman       |
| News                       | Official accounts for large news and media outlets, including digital news websites.                                                   | @Reuters @AJEnglish           |
| Politician                 | Official accounts for individuals in (or formerly in) government positions or international political organisations.                   | @NicolaSturgeon @JustinTrudeau |
| Scientist                  | Accounts related to research organisations or personal accounts for people whose primary occupation is research or public science discourse. | @Astro Alex @Peters Gle       |
| Misc                       | None of the categories above apply.                                                                                                     | @Damo Mullen @mamaloe66       |





---





# The Model
---
We implemented a neural network model in TensorFlow to learn the association between the text of a tweet and its sentiment label.

In order to train the model, we required a dataset with sentiment labels attributed to each tweet. The labels could be generated by a computer using clustering, but this would defeat the purpose of using the dataset for training, since we would have no feasible way of validating so many rows of labelled data.

We decided to use the publicly available data from the <a href='https://aclanthology.org/S17-2088/'>2017 International Workshop on Semantic Evaluation</a>, in particular from English Subtask A, which included roughly 60,000 human-labelled tweets, with 3 labels corresponding to _negative_, _neutral_, or _positive_. The labels were produced based on annotations from at least 5 judges, and strict quality control measures excluded annotations from judges who failed a certain threshold number of hidden tests. We retained 2,000 tweets for testing purposes after the hyper-parameter tuning.

Among the tweets used for model-training, there was a clear class imbalance, with 46.0% of tweets being labelled neutral, 34.9% as positive and 19.1% as negative. Therefore we proposed a suitable baseline accuracy as 46.0%, which is that of a dummy modal-class-predictor model.
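The baseline can be reproduced with a few lines of Python. The label list below is synthetic, built only to mirror the reported class proportions per 1,000 tweets, and the function name is hypothetical.

```python
from collections import Counter

def modal_baseline_accuracy(labels):
    """Accuracy of a dummy model that always predicts the most common label."""
    counts = Counter(labels)
    modal_label, modal_count = counts.most_common(1)[0]
    return modal_label, modal_count / len(labels)

# Synthetic labels mirroring the reported proportions (46.0%, 34.9%, 19.1%).
synthetic_labels = ["neutral"] * 460 + ["positive"] * 349 + ["negative"] * 191
baseline_label, baseline_accuracy = modal_baseline_accuracy(synthetic_labels)
print(baseline_label, baseline_accuracy)
```

Any trained model should be judged against this figure rather than against the 33.3% that three balanced classes would suggest.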

We implemented and trained five neural network models of various complexity using TensorFlow on the English tweets sentiment dataset. We experimented with the following Keras layers:

* **Embedding layer:** similar to the GloVe package, this Keras layer takes our set of words and embeds them as vectors in n-space. We have specified the input dimension size (the total number of words in our dataset) and the output dimension size (the dimension of space we want to embed our word vectors into). After some experimentation, we found the output space dimension of $16$ to be a good fit.
* **Pooling layer:** pooling methods aim to reduce the size of the input space by replacing groups of points with single points formed by aggregating the groups' values. The two most common types of pooling methods are: max pooling, in which the maximum value of a group of points is taken as the value of the new aggregated point; and average pooling, where the average value is taken instead. In general, max pooling is able to highlight stark contrasts in a dataset, whereas average pooling is better for 'smoothing out' the values across the entire dataset. We opted for average pooling in our chosen model.

![picture](https://drive.google.com/uc?id=1jLqs2LhaiEvh7G62BzacFxnziGtkyAeP)
* **Convolution layer:** a convolution is a filter applied to an input, with multiple filters constituting a feature map. Convolution layers are often used in tandem with pooling layers: convolutions create multiple feature channels in parallel; and the pooling layer reduces the dimension of each of these channels.
* **Dropout layer:** this layer randomly sets activation values to 0, temporarily 'dropping out' the corresponding nodes from the network. The main purpose of dropout layers is to help prevent overfitting. This kind of layer is especially useful when training a large neural network on a relatively small dataset.
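The difference between max and average pooling is easiest to see on a small example. The sketch below is plain Python with hypothetical helper names; `global_average_pool` mimics, in spirit, what a global average pooling layer does to a sequence of token vectors, and is not the Keras implementation.

```python
def pool_1d(values, window, mode="max"):
    """Pool non-overlapping windows of `values` using max or average."""
    pooled = []
    for start in range(0, len(values) - window + 1, window):
        group = values[start:start + window]
        pooled.append(max(group) if mode == "max" else sum(group) / window)
    return pooled

def global_average_pool(token_vectors):
    """Average a sequence of equal-length vectors into a single vector."""
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]

signal = [1.0, 9.0, 2.0, 2.0, 4.0, 0.0]
max_pooled = pool_1d(signal, window=2, mode="max")      # keeps the spike at 9.0
avg_pooled = pool_1d(signal, window=2, mode="average")  # smooths the spike to 5.0
tweet_vector = global_average_pool([[1.0, 2.0], [3.0, 6.0]])
print(max_pooled, avg_pooled, tweet_vector)
```

Max pooling preserves the sharp value in the first window, while average pooling spreads it out, matching the contrast described above.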


The testing and training accuracies are shown in the figure below:
![picture](https://drive.google.com/uc?id=1TEk0il1-CDtuCDBvyBjfTeiETxN1nfPn)

Based on these results, we opted for our first model, whose architecture is shown below:

```python
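import tensorflow as tf

# vocab_size (the number of distinct tokens) is assumed to have been set
# during tokenisation earlier in the notebook.
embedding_dim = 16

# Embed each token, average the embeddings over the tweet, then classify.
model_1 = tf.keras.Sequential([
    tf.keras.layers.Embedding(vocab_size, embedding_dim),
    tf.keras.layers.GlobalAveragePooling1D(),
    tf.keras.layers.Dense(3, activation="softmax")
])

# Compile model; categorical cross-entropy expects one-hot encoded labels.
model_1.compile(optimizer='rmsprop', loss='categorical_crossentropy', metrics=['accuracy'])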
```

We then used the test data to measure the performance of the trained model. The unfiltered confusion matrix for our final model is shown below:

| Unfiltered Confusion Matrix | Predicted Negative | Predicted Neutral | Predicted Positive | Accuracy         | False Signal Rate |
|-----------------------------|--------------------|-------------------|--------------------|------------------|-------------------|
| Actual Negative             | 122                | 147               | 43                 | 0.391026         | 0.137821          |
| Actual Neutral              | 60                 | 632               | 177                | 0.727273         | 0.272727          |
| Actual Positive             | 28                 | 242               | 549                | 0.670330 | 0.148962          |
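The per-class figures in the Accuracy column can be recomputed directly from the counts in the table, as the short sketch below shows. The function names are our own for this example; the counts are copied from the table above.

```python
# Rows are actual classes, columns are predicted classes
# (order: negative, neutral, positive), copied from the table above.
confusion = [
    [122, 147, 43],   # actual negative
    [60, 632, 177],   # actual neutral
    [28, 242, 549],   # actual positive
]

def per_class_accuracy(matrix):
    """Fraction of each actual class that was predicted correctly."""
    return [row[i] / sum(row) for i, row in enumerate(matrix)]

def overall_accuracy(matrix):
    """Fraction of all examples lying on the diagonal."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

class_accuracy = per_class_accuracy(confusion)
print([round(a, 6) for a in class_accuracy])
print(overall_accuracy(confusion))
```

Summing the diagonal over all 2,000 test tweets gives the overall accuracy implied by the table, which can be compared against the 46.0% baseline.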


The confusion matrix for a model with $100\%$ accuracy would be a diagonal matrix. Our model outputs a vector $(p_1,p_2,p_3)$ with $p_1+p_2+p_3=1$, which can be interpreted as the network's confidence that the input tweet belongs to each of the three classes (negative, neutral and positive, labelled $0$, $1$ and $2$ respectively). We then return the label of the largest $p_i$, so for example if the output for a particular tweet is $(0.1,0.4,0.5)$, then we take the sentiment label to be $2$ (i.e. positive). This gives us two different ways of evaluating our model: restricting our view to the final decision made by the model, i.e. the class with the largest $p_i$; or additionally filtering the results based on the network's confidence in its classification. For comparison, we also display the confusion matrix for the model's classifications with confidence greater than $80\%$ here:

| Confusion Matrix; confidence > 0.8 | Predicted Negative | Predicted Neutral | Predicted Positive | Accuracy | False Signal Rate |
|------------------------------------|--------------------|-------------------|--------------------|----------|-------------------|
| Actual Negative                    | 53                 | 22                | 7                  | 0.646341 | 0.085366          |
| Actual Neutral                     | 11                 | 189               | 32                 | 0.814655 | 0.185345          |
| Actual Positive                    | 3                  | 37                | 311                | 0.886040 | 0.150997          |


That is, $\max_{i}p_i>0.8$ for these $665$ tweets. This added filter excludes $66.75\%$ of the tweets from the original test set.
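The two evaluation modes can be sketched as a single function that takes the softmax output and an optional confidence threshold. The names below are hypothetical, and the class order (negative, neutral, positive as labels 0, 1, 2) is assumed to match the label encoding used in our tables.

```python
LABELS = ["negative", "neutral", "positive"]  # assumed label order 0, 1, 2

def classify(probabilities, threshold=None):
    """Return (label index, confidence); the index is None if filtered out."""
    confidence = max(probabilities)
    index = probabilities.index(confidence)
    if threshold is not None and confidence <= threshold:
        return None, confidence  # not confident enough to keep
    return index, confidence

unfiltered = classify([0.1, 0.4, 0.5])                 # index 2, i.e. positive
filtered_out = classify([0.1, 0.4, 0.5], threshold=0.8)
kept = classify([0.05, 0.10, 0.85], threshold=0.8)
print(LABELS[unfiltered[0]], filtered_out, kept)

# Share of the 2,000 test tweets that survive the 0.8 filter.
retained_fraction = 665 / 2000
print(retained_fraction)
```

Raising the threshold trades coverage for accuracy: fewer tweets receive a label, but the labels that remain are more often correct.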

# Analysis
---
## Time series

The first part of our analysis treats the datasets as time series. Since our final data collection spanned three separate COP conferences, we decided to study the profiles of tweet frequency and number of likes over time by averaging these trends over the different conferences.

|                         | COP24                               | COP25                               | COP26                                |
|-------------------------|-------------------------------------|-------------------------------------|--------------------------------------|
| Year                    | 2018                                | 2019                                | 2021                                 |
| COP Date Range          | Sunday 2nd Dec. to Friday 14th Dec. | Monday 2nd Dec. to Friday 13th Dec. | Sunday 31st Oct. to Friday 12th Nov. |
| Query Date Range        | 15th Nov. to 30th Dec.              | 15th Nov. to 27th Dec.              | 14th Oct. to 27th Nov.               |
| Total Tweets            | 108405                              | 152662                              | 981982                               |
| User-Categorised Tweets | 3551                                | 2560                                | 3489                                 |

In order to align the different conference dates fairly, we encoded the dates for each conference as integers counting days relative to the end date in the above table, so that the end date is encoded as zero in each case. In the charts that follow, we annotate the start date as code $-12$. Note that this is the official start date of two of the conferences (COP24 and COP26), and represents a Sunday.
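The encoding amounts to subtracting each conference's end date from the date of the tweet. A minimal sketch using the standard library follows; the dictionary and function names are our own for this example, while the dates are those in the table above.

```python
from datetime import date

# Official (start, end) dates taken from the table above.
CONFERENCES = {
    "COP24": (date(2018, 12, 2), date(2018, 12, 14)),
    "COP25": (date(2019, 12, 2), date(2019, 12, 13)),
    "COP26": (date(2021, 10, 31), date(2021, 11, 12)),
}

def encode_day(day, end_date):
    """Days relative to the end of the conference (end date -> 0)."""
    return (day - end_date).days

start_codes = {
    name: encode_day(start, end) for name, (start, end) in CONFERENCES.items()
}
print(start_codes)
```

COP24 and COP26 both start at code $-12$, while COP25 starts one day later at $-11$.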


![picture](https://drive.google.com/uc?id=1IRybCAIqK02o2QMwjWwJkPyQcZQohAkn)

To create the above figure, for each conference we aggregated the number of tweets on each date and then divided by the total number of tweets for that conference. We then took a simple average of these three frequency profiles to produce the plot shown. As expected, we can see that the greatest volume of tweets occurs during the conference, with this volume rapidly decaying in either time direction before and after the event. Additionally, we see two spikes in volume corresponding to the start and finish of the conference, the latter being the sharper fluctuation on average.
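The normalise-then-average procedure can be sketched as below. The day codes are toy values invented for the example (two small "conferences" rather than our three datasets), and the function names are hypothetical.

```python
from collections import Counter

def relative_frequency(day_codes):
    """Share of a conference's tweets falling on each encoded day."""
    counts = Counter(day_codes)
    total = len(day_codes)
    return {day: n / total for day, n in counts.items()}

def average_profiles(profiles):
    """Simple average of several profiles; missing days count as zero."""
    days = sorted(set().union(*profiles))
    return {
        day: sum(p.get(day, 0.0) for p in profiles) / len(profiles)
        for day in days
    }

# Toy encoded dates, one entry per tweet.
toy_day_codes = {
    "conference_a": [-1, -1, 0, 0, 0, 1],
    "conference_b": [-1, 0, 0, 1],
}
profiles = [relative_frequency(codes) for codes in toy_day_codes.values()]
mean_profile = average_profiles(profiles)
print(mean_profile)
```

Normalising before averaging stops the largest dataset (COP26 in our case) from dominating the combined profile.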

![picture](https://drive.google.com/uc?id=1gQJu_UZqvVbcFC7-Ko87Dbhn2LSd_PYS)

The above figure similarly shows that the mean likes per tweet are greatest during the conference itself. However, there is much more fluctuation in the mean likes, especially as the conference progresses and after it ends.

We now decompose the time series by sentiment label. We begin with average relative frequency of tweets in each sentiment class over time:

![picture](https://drive.google.com/uc?id=12YtteWkK3ypFTRFLFBwM0lypqwc2kVKi)

The three datasets are separately standardised to produce a proportion of tweets from each class on each date, and then combined via simple average to create the first plot. The graph appears to show that the proportion of positive and negative tweets remains fairly constant (around 12%) both before and during most of the conferences. The proportion of positive tweets does not seem to diverge from this significantly following the end of the conference. Negative tweets, however, seem to become more prevalent towards the end of the conferences, spiking rapidly around the final conference date, then remaining at around 16% in the two weeks following.

Now for the mean likes per tweet over time:

![picture](https://drive.google.com/uc?id=1MeuJ8bxkwGSNTg27-8LRWJGMg07NxWxQ)

We observe that the mean likes for tweets of all sentiment classes generally increase steadily leading up to the conferences. Remarkably, negative tweets seem to receive many more likes on average around the middle of the conference, while the same is not particularly true for positive tweets. In the two weeks following the conference, the mean likes received by positive and negative tweets appear to fluctuate heavily, increasing substantially on particular days.

## Analysis by category of Twitter user

![picture](https://drive.google.com/uc?id=1171Ba3bDUzUiS2HSv-p7FaaJDEn_0iOl)
![picture](https://drive.google.com/uc?id=1ANazGCuafqhmdZ-7C1Vkg8tRJINQGsO0)

Compared to the class balance of the training data, in which negative was the least common label, the above bar charts indicate a prevailing negative sentiment in our data. This suggests a clear overall signal of negative sentiment, even if some of the predictions are subject to noise. Categories such as scientists, activists, business and news show this trend more strongly. Some of the sentiment compositions are surprising; business accounts appeared to produce far fewer positive tweets than negative. We have included a sample of 'business' tweets that were labelled by our model as 'negative':

| Content | Category | Sentiment |
|---|---|---|
| At Cop25, a few nations - Brazil, the US, Saudi Arabia and Australia in particular - were emboldened as never before to stand against the world and nakedly try to weaken efforts to tackle climate change to benefit their short term interests. | Business | Negative |
| Day 5 Cop26. Lobbyists working hard to get taxes - delicately called 'true pricing' - imposed on meat &amp; dairy. Below seems to be a clever PR stunt. It would be entirely wrong to suggest that this benefits investors (such as Gates) who have put $6bn into meat substitute companies! https://t.co/qmr50rqBY3 | Business | Negative |
| "We are looking at a 3 to 4-degree temperature rise by the end of the century."\n\nUN Secretary-General @antonioguterres warns but also says, "we can choose another path." #COP25 https://t.co/sQ61M23Ir7 | Business | Negative |
| [Blog] Water scarcity could cost some regions up to 6️% of their GDP.\nSound water management policies hold the key to building resilience: https://t.co/cA76I5EYm3 #ClimateIsWater #COP24 https://t.co/vsp8v5uX8q | Business | Negative |
| At #cop26 , 'building a dialogue' is double-speak for 'finding a way to sell the idea'. A good example of PR posing as communication. How nice of the academics to help Bill Gates, Branson and the investment community with their marketing strategy. https://t.co/TivRdo395J | Business | Negative |

The sample shows that business-labelled users often publish tweets which are completely unrelated to their own business activity, and instead discuss world events and politics brought up by the COP conferences, as opposed to marketing campaigns or positive stories of their actions in the area of sustainability.

Conversely, politicians showed a stronger signal of positive sentiment. This aligns with our intuition, as politicians will want to present their agendas in a positive manner, and Twitter is one of the best platforms on which to do this. A small sample of these tweets is shown below:

| Content | Category | Sentiment |
|---|---|---|
| The @UNFCCC Secretariat is in the final stages of preparations for the @UN #ClimateChange Conference #COP24 in Katowice (here at all-staff meeting). We're all looking forward to a successful meeting, and finalizing the implementation guidelines of the #ParisAgreement! https://t.co/GGhhGdEbdP | Politician | Positive |
| Meeting with @BorisJohnson in the margins of #G20. \n\nWe talked about #COP26, as well as the negotiations on the Ireland/Northern Ireland Protocol and licensing for fishing boats. \n\n@EU_Commission is intensively engaging for finding solutions. | Politician | Positive |
| Well done #Poland for converging the world on much needed #ClimateAction as @COP24 concludes 👌 | Politician | Positive |
| Incredibly inspiring and very moving to be with youth climate strikers today. \n\nThey are out on the streets to seek to save their future and that of generations to come. \n\n#COP26 https://t.co/SjUHUJGqGM | Politician | Positive |
| Great ratio 👇🏻👇🏻#cop26 https://t.co/0Fy0Rom0T0    | Politician | Positive |


Finally, the box plot below depicts the (logarithm of) likes across user categories and sentiment labels:

![picture](https://drive.google.com/uc?id=1QNTQ4K87pE20lXu2TwIIVBdCRlAbSgWV)

The box plot yields insights into the likes across the various groups in the following ways:
* Politicians, news, and international organisation tweets don't appear to show much variance across the sentiment classes.
* Activists and scientists showed polarity, with negative and positive tweets both receiving more likes than neutral tweets.
* Journalist and business accounts receive more likes for negative tweets and fewer for positive tweets. This may indicate that the audiences for these accounts favour negative content, such as tweets invoking controversy or outrage, more than audiences in general do when it comes to the topic of climate change.
* Celebrities tweet more negatively than positively, yet their positive tweets receive more likes on average.

# Conclusion
---
## Results

We have identified some interesting patterns relating to the most influential stakeholders in the discussion of the COP climate conferences. In our study, we used the number of likes as a metric of the exposure attained by a tweet, and therefore in some sense an indicator for the influence the tweet had in virtual discussions of the COP conferences.

Using this methodology we managed to examine the most influential stakeholders in the discourse by looking at the accounts publishing the tweets which received the greatest number of likes within the whole dataset. We compiled the maximum number of likes garnered by each account's tweets and then put those accounts in the top 100 by this measure into nine distinct categories by manual labelling. We then examined the volume of tweets produced by each category and their distributions of number of likes. This part of the analysis -- and the later analysis involving the categories -- could be improved by further manual labelling so that the distribution of tweets and likes across the categories is more reliable.

Our two-way factor analysis of the effects of user-category and sentiment on the number of likes showed how the influence of a tweet may depend on these two factors. Of particular interest was the interaction between these two effects; the degree to which sentiment has a strong effect appears to depend on the category of the user, and the nature of this interaction depends on the category as well. We saw that multiple categories such as activists and scientists had increased likes for tweets which had positive or negative sentiment, as opposed to neutral, which may show an inclination towards emotionally-charged discussions in these circles.

The later part of the analysis showed evidence that the volume and influence of negative sentiment tweets may have an upward trend over the course of the conferences. Modelling beyond simple linear regression, together with more data, could be used to investigate these trends further. Collecting data from a larger number of COP conferences would enable us to reliably answer the third question of our investigation, namely studying the long-term changes across the annual conferences, rather than just within each conference.

## Is the data used truly representative?

Our data is limited to English-language tweets, which obscures a large part of the discourse given the international nature of the conferences. Moreover, there is a clear case to be made that the patterns in the sentiments of tweets related to the conferences would be affected by the nationality of the tweet author, for instance in the case of unprecedented natural disasters around the time of the conference.

## Validity of our model

The reliability of our analysis hinges on the accuracy and interpretation of our model predictions. Our model was trained on a large dataset of around 50,000 English tweets, with reliable human labels. The model achieved a reasonable accuracy, although it should be noted that there are limits to how accurate such a model can become, due to disagreements between human judges.

As the training data comes from the same social media platform as the data for our study, the model has a degree of transferability. However, it could be argued that sentiment classification in the discourse around climate change and politics is a highly topic-specific task. The natural language features constituting, say, a positive tweet on the platform in general may differ from what constitutes a positive tweet in the specific discourse around the COP conferences.

There is also a deeper ambiguity in the concept of sentiment itself which should be addressed. For instance, consider the following artificial tweet: 

> "I'm so happy that everyone is working so hard to solve the issue of climate change", said no-one ever.

The tweet indicates a negative sentiment with regard to climate change. However, the use of the word 'happy' could potentially cause our model to classify this as a tweet with positive sentiment. Not only would this tweet reduce the accuracy of our network, but the tweet itself could be argued to be inherently destructive to the model training process; it may suggest to the network that the word 'happy' ought to be associated with negative sentiment rather than positive (as it ought to be in isolation). Sarcasm of the form shown above, as well as other problematic language features such as irony, humour, satire, and exaggeration, make the task of defining sentiment itself complicated, before even building a model to predict it. We wrote $10$ artificial tweets to investigate the difficulties our model may have encountered in this regard during the training process, as illustrated in the table below:


|  | Content | Predicted Negative | Predicted Neutral | Predicted Positive | Predicted | Confidence | Actual |
|---|---|---|---|---|---|---|---|
| 0 | "I'm so happy that everyone is working so hard to solve the issue of climate change", said no-one ever. #sarcasm\n | 0.094003 | 0.094084 | 0.811912 | 2 | 0.811912 | 0 |
| 1 | It's been great to see so many politicians working so hard at achieving nothing with regards to climate change. Keep it up! #sarcasm\n | 0.172700 | 0.026435 | 0.800865 | 2 | 0.800865 | 0 |
| 2 | Coal and gas may harm our planet, but green power cannot hurt me. Go green! #climate\n | 0.841626 | 0.145500 | 0.012874 | 0 | 0.841626 | 2 |
| 3 | Most climate change activists will campaign strongly for change, yet will still use electronics and power-intensive devices. Sure, let's listen to them!\n | 0.555325 | 0.402044 | 0.042631 | 0 | 0.555325 | 0 |
| 4 | Why did the polar bear refuse to go the party? Because he heard that the ice caps were melting! Haha so funny... #sarcasm\n | 0.035278 | 0.099813 | 0.864909 | 2 | 0.864909 | 0 |
| 5 | I believe that we will never solve the issue of climate change. Actually, that is not what I believe.\n | 0.697762 | 0.228066 | 0.074173 | 0 | 0.697762 | 2 |
| 6 | Although the issue of climate change is a challenging one, it's not all doom and gloom. Progress is in the making!\n | 0.587708 | 0.319563 | 0.092730 | 0 | 0.587708 | 2 |
| 7 | The only thing sustainable in this world is my ability to sustain a fake smile, with the knowledge that nature is perishing. #climatechange #savetheearth\n | 0.684905 | 0.153612 | 0.161483 | 0 | 0.684905 | 0 |
| 8 | A lot is uncertain in this world. But one thing I am certain of is that we will be able to eradicate the horrific effects of climate change #wecandothis\n | 0.286373 | 0.502397 | 0.211230 | 1 | 0.502397 | 2 |
| 9 | 2,4,6,8, who do we appreciate? Climate change activists! | 0.290256 | 0.439584 | 0.270160 | 1 | 0.439584 | 2 |


The model managed to classify only $2$ of the $10$ tweets correctly. Unsurprisingly, the model seems very confident in its classification of most of the sarcastic tweets, despite its evaluations being completely incorrect. Additionally, the tweets with the lowest confidence values were assigned the neutral sentiment label. This is most likely due to the class imbalance in our training set: in other words, when the model is not confident, it defaults to labelling the tweet as neutral.



## Next Steps

In order to improve the reliability of our analysis, we could increase the scope of our collected data. For instance, we could manually categorise more users to increase the number of tweets in the corresponding part of the analysis. In addition, we could incorporate a range of different languages to capture more global trends in the discourse around the COP conferences. However, this would require adapting our sentiment model.

Refining the sentiment model could involve finding more training data, or training data which is more relevant to the topic of our study. We could also use pre-trained sentiment models and investigate the reproducibility of our analysis with those differing models.