# Machine Learning II - Exam-Project - Sascha Pfeiffer 

# Introduction
---

Welcome to my Python Notebook, where I'm undertaking a study of sentiment analysis in the realm of cryptocurrency, with a particular emphasis on the social media platform Twitter.

Cryptocurrencies are known for their price volatility, which can be influenced by a wide array of factors. Among these, I'm examining the role of Twitter, a platform where discussions about cryptocurrencies are constantly happening in real time.

In this project, I'll be gathering tweets related to different cryptocurrencies using the Twitter API, and then applying sentiment analysis to this data. The aim here is to determine the underlying sentiments expressed in tweets about specific cryptocurrencies.

I plan to train my own model and also leverage the OpenAI API to ascertain the sentiment of these tweets. Over a certain period, I'll amass data from Twitter and use both my model and the OpenAI API to predict the sentiment expressed in these tweets.

Furthermore, I'll create a function to track how sentiment changes over time. For this analysis, I'll only consider tweets where the sentiment predictions from my model and OpenAI coincide. This accumulated data could potentially be useful for future model training.

So, I invite you to join me on this scholarly journey that merges Python, cryptocurrencies, Twitter, and sentiment analysis. Let's explore what Twitter discussions reveal about sentiments within the crypto market!

# Import the credentials
---

Because a fellow student will present this project, the cell below uploads the credentials as a JSON file and stores them in variables for later use. She or he can request the JSON file by e-mail from pfeifsas@

This keeps the credentials out of the notebook itself.

In [None]:
from google.colab import files
import io
import json

# Use files.upload to produce the "Choose Files" button below, then select your file.
uploaded = files.upload()

# Wrap the uploaded bytes in a file-like object with io.BytesIO, then parse it with json.load.
file = io.BytesIO(uploaded['credentials.json'])
credentials = json.load(file)

# Store each credential in its own variable.
TWITTER_CONSUMER_KEY = credentials['TWITTER_CONSUMER_KEY']
TWITTER_CONSUMER_SECRET = credentials['TWITTER_CONSUMER_SECRET']
TWITTER_ACCESS_TOKEN = credentials['TWITTER_ACCESS_TOKEN']
TWITTER_ACCESS_TOKEN_SECRET = credentials['TWITTER_ACCESS_TOKEN_SECRET']
BEARER_TOKEN = credentials['BEARER_TOKEN']
GPT_SECRET_KEY = credentials['GPT_SECRET_KEY']
MONGO_CONNECTION_STRING = credentials['MONGO_CONNECTION_STRING']
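
A missing key in the uploaded file would only surface as a `KeyError` at the line that reads it, so it can help to validate the file up front. The following is a minimal sketch and not part of the original notebook: the helper name `load_credentials` and the `REQUIRED_KEYS` tuple are hypothetical, and the function takes raw bytes so that it works with the value returned by `files.upload()` as well as outside Colab.

```python
import json

# Keys this notebook reads from credentials.json (taken from the cell above).
REQUIRED_KEYS = (
    "TWITTER_CONSUMER_KEY",
    "TWITTER_CONSUMER_SECRET",
    "TWITTER_ACCESS_TOKEN",
    "TWITTER_ACCESS_TOKEN_SECRET",
    "BEARER_TOKEN",
    "GPT_SECRET_KEY",
    "MONGO_CONNECTION_STRING",
)

def load_credentials(raw_bytes):
    """Parse JSON bytes and fail early if a required key is missing (hypothetical helper)."""
    credentials = json.loads(raw_bytes.decode("utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in credentials]
    if missing:
        raise KeyError(f"credentials.json is missing: {', '.join(missing)}")
    return credentials

# Example with placeholder values (not real secrets).
example = json.dumps({key: "placeholder" for key in REQUIRED_KEYS}).encode("utf-8")
creds = load_credentials(example)
print(sorted(creds) == sorted(REQUIRED_KEYS))  # → True
```

In the notebook this could be called as `load_credentials(uploaded['credentials.json'])`.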


# Install and import all the needed libraries and dependencies
---

This code installs all the necessary libraries and dependencies.

Installing the Libraries

In [None]:
!pip install datasets
!pip install transformers
!pip install openai
!pip install 'pymongo[srv]'

Importing the dependencies

In [None]:
# nltk
from nltk.stem import WordNetLemmatizer
import nltk
nltk.download('wordnet')
nltk.download('stopwords')
from nltk.corpus import stopwords

# plotting
import seaborn as sns
from wordcloud import WordCloud
import matplotlib.pyplot as plt

# sklearn
from sklearn.svm import LinearSVC
from sklearn.naive_bayes import BernoulliNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import confusion_matrix, classification_report,roc_auc_score, roc_curve

# utilities
import re
import pickle
import numpy as np
import pandas as pd
import time
from datasets import load_dataset
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from torch.nn import functional as F
import openai
from pymongo.mongo_client import MongoClient
import requests
from datetime import datetime
from collections import defaultdict
from collections import Counter


# Importing and converting the dataset
---
To train my sentiment analysis model, I needed a dataset of labeled tweets. Because tweets have a distinctive structure due to their character limit, I looked for tweet-specific data and found the [sentiment140](https://huggingface.co/datasets/sentiment140) dataset on Huggingface, which consists of 1.6 million sentiment-labeled tweets.

Huggingface offers the option to download the dataset as a CSV file, or it can be imported directly using the datasets library.

Rather than downloading and importing a CSV file, I opted to utilize the datasets library from Huggingface to load the initial training dataset. Once loaded, I converted it into a pandas dataframe for further processing.

In [None]:
# Load the dataset
dataset = load_dataset("sentiment140")

# Access a split and convert to a pandas dataframe
df = dataset['train'].to_pandas()

# Removing the unnecessary columns.
df = df[['sentiment','text']]

# Replacing the values to ease understanding.
df['sentiment'] = df['sentiment'].replace(4,1)

# Plotting the distribution for dataset.
ax = df.groupby('sentiment').count().plot(kind='bar', title='Distribution of data',
                                               legend=False)
ax.set_xticklabels(['Negative','Positive'], rotation=0)

# Storing data in lists.
text, sentiment = list(df['text']), list(df['sentiment'])

In [None]:
df['sentiment'].value_counts()

# Define a preprocessing function
--- 

In my initial implementation, I manually defined a stopword list. I later replaced it with the English stopword list shipped with the nltk library, which is now used in my preprocessing function. The original list is kept, commented out, below:

In [None]:
'''## Defining set containing all stopwords in english. 

stopwordlist = ['a', 'about', 'above', 'after', 'again', 'ain', 'all', 'am', 'an',
             'and','any','are', 'as', 'at', 'be', 'because', 'been', 'before',
             'being', 'below', 'between','both', 'by', 'can', 'd', 'did', 'do',
             'does', 'doing', 'down', 'during', 'each','few', 'for', 'from', 
             'further', 'had', 'has', 'have', 'having', 'he', 'her', 'here',
             'hers', 'herself', 'him', 'himself', 'his', 'how', 'i', 'if', 'in',
             'into','is', 'it', 'its', 'itself', 'just', 'll', 'm', 'ma',
             'me', 'more', 'most','my', 'myself', 'now', 'o', 'of', 'on', 'once',
             'only', 'or', 'other', 'our', 'ours','ourselves', 'out', 'own', 're',
             's', 'same', 'she', "shes", 'should', "shouldve",'so', 'some', 'such',
             't', 'than', 'that', "thatll", 'the', 'their', 'theirs', 'them',
             'themselves', 'then', 'there', 'these', 'they', 'this', 'those', 
             'through', 'to', 'too','under', 'until', 'up', 've', 'very', 'was',
             'we', 'were', 'what', 'when', 'where','which','while', 'who', 'whom',
             'why', 'will', 'with', 'won', 'y', 'you', "youd","youll", "youre",
             "youve", 'your', 'yours', 'yourself', 'yourselves']'''

The function provided below is used to preprocess tweets for sentiment analysis, both for training and prediction purposes.

## First attempt:

There is an updated version of the preprocessing function below.

In [None]:
'''def preprocess(textdata):
    processedText = []

    # Create Lemmatizer and Stemmer.
    wordLemm = WordNetLemmatizer()

    # Defining regex patterns.
    urlPattern        = r"((http://)[^ ]*|(https://)[^ ]*|( www\.)[^ ]*)"
    userPattern       = '@[^\s]+'
    alphaPattern      = "[^a-zA-Z0-9]"
    sequencePattern   = r"(.)\1\1+"
    seqReplacePattern = r"\1\1"
    emoticonPattern   = r"[:;=8][\-o\*\']?[\)\]\(\[dDpP/\\OpP3]"

    for tweet in textdata:
        if tweet is not None:   # skip None values
            tweet = tweet.lower()

            # Replace all URls with 'URL'
            tweet = re.sub(urlPattern,' URL',tweet)       
            # Replace @USERNAME to 'USER'.
            tweet = re.sub(userPattern,' USER', tweet)        
            # Replace all non alphabets.
            tweet = re.sub(alphaPattern, " ", tweet)
            # Replace 3 or more consecutive letters by 2 letter.
            tweet = re.sub(sequencePattern, seqReplacePattern, tweet)
            # Replace emoticons with an empty string
            tweet = re.sub(emoticonPattern, '', tweet)

            tweetwords = ''
            for word in tweet.split():
                # Checking if the word is a stopword.
                if word not in stopwordlist:
                    if len(word)>1:
                        # Lemmatizing the word.
                        word = wordLemm.lemmatize(word)
                        tweetwords += (word+' ')
                
            processedText.append(tweetwords)
        
    return processedText'''


## Second attempt:

### Optimizations:
- Utilizing the NLTK stopword list instead of a hand-written one to improve preprocessing.
- Incorporating a function that collapses runs of consecutive "USER" tokens within a tweet into a single mention, with the aim of improving the accuracy of the sentiment analysis.

Please refer to the subsequent sections of the same notebook for a detailed explanation of why I made multiple changes to the "USER" mentions in the tweets.


In [None]:
stopwordlist = set(stopwords.words('english'))

def preprocess(textdata):
    processedText = []

    # Create Lemmatizer
    wordLemm = WordNetLemmatizer()

    # Defining regex patterns.
    urlPattern        = r"((http://)[^ ]*|(https://)[^ ]*|( www\.)[^ ]*)"
    userPattern       = r'@[^\s]+'
    alphaPattern      = "[^a-zA-Z0-9]"
    sequencePattern   = r"(.)\1\1+"
    seqReplacePattern = r"\1\1"
    emoticonPattern   = r"[:;=8][\-o\*\']?[\)\]\(\[dDpP/\\OpP3]"

    for tweet in textdata:
        if tweet is not None:   # Skip None values
            tweet = tweet.lower()

            # Replace all URLs with 'URL'
            tweet = re.sub(urlPattern,' URL',tweet)       
            # Replace @USERNAME with 'USER'.
            tweet = re.sub(userPattern,' USER', tweet)        
            # Replace all non-alphabets.
            tweet = re.sub(alphaPattern, " ", tweet)
            # Replace 3 or more consecutive letters by 2 letters.
            tweet = re.sub(sequencePattern, seqReplacePattern, tweet)
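            # Replace emoticons with an empty string.
            # Note: punctuation was already replaced by spaces above, so only
            # alphanumeric residue such as '8d' can still match at this point.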
            tweet = re.sub(emoticonPattern, '', tweet)

            tweetwords = ''
            for word in tweet.split():
                # Checking if the word is a stopword.
                if word not in stopwordlist:
                    if len(word) > 1:
                        # Lemmatize the word.
                        word = wordLemm.lemmatize(word)
                        tweetwords += (word+' ')
                
            tweetwords = reduce_user_mentions([tweetwords])  # Reduce multiple user mentions

            processedText.append(tweetwords[0])

    return processedText


def reduce_user_mentions(tweets):
    processed_tweets = []
    for tweet in tweets:
        words = tweet.split()
        # Keep a 'USER' token only if it is the first word or the previous word is not 'USER'.
        processed_words = [word for i, word in enumerate(words)
                           if word != 'USER' or i == 0 or words[i-1] != 'USER']
        processed_tweet = ' '.join(processed_words)
        processed_tweets.append(processed_tweet)
    return processed_tweets
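
To make the effect of the regex steps easier to follow, here is a small sketch that applies only those steps to a single made-up tweet. It is a simplification under stated assumptions: stopword removal and lemmatization are left out so that it runs without NLTK downloads, and the helper name `normalize_tweet` is hypothetical.

```python
import re

# Same patterns as in preprocess(), written as raw strings.
URL_PATTERN = r"((http://)[^ ]*|(https://)[^ ]*|( www\.)[^ ]*)"
USER_PATTERN = r"@[^\s]+"
NON_ALNUM_PATTERN = r"[^a-zA-Z0-9]"
REPEAT_PATTERN = r"(.)\1\1+"

def normalize_tweet(tweet):
    """Apply only the regex steps; stopword removal and lemmatization are omitted."""
    tweet = tweet.lower()
    tweet = re.sub(URL_PATTERN, " URL", tweet)        # URLs become the token URL
    tweet = re.sub(USER_PATTERN, " USER", tweet)      # @mentions become the token USER
    tweet = re.sub(NON_ALNUM_PATTERN, " ", tweet)     # punctuation becomes whitespace
    tweet = re.sub(REPEAT_PATTERN, r"\1\1", tweet)    # 'sooooo' becomes 'soo'
    return " ".join(tweet.split())                    # collapse leftover whitespace

print(normalize_tweet("@alice @bob Sooooo happy!!! https://example.com"))
# → USER USER soo happy URL
```

The output still contains two consecutive `USER` tokens, which is the situation `reduce_user_mentions` is meant to handle.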


# Preprocessing tweets in the dataset
---

In this step, I am performing preprocessing on all the tweets in my training set. The purpose of this preprocessing step is to clean and transform the tweet data before further analysis or model training.

In [None]:
t = time.time()
processedtext = preprocess(text)
print(f'Text Preprocessing complete.')
print(f'Time Taken: {round(time.time()-t)} seconds')

# Creating word clouds
---
After preprocessing the tweets, I am using a word cloud to visually represent the most frequent words in both positive and negative tweets. By creating word clouds for each sentiment category, I can gain insights into the key themes and sentiments expressed in the dataset. This visual representation allows me to observe the prominent words associated with positive and negative sentiments, helping me understand the overall sentiment distribution and potentially identify important patterns or trends in the data.

## Word-Cloud for positive tweets

In [None]:
# Positional slice: assumes the rows from index 800,000 onward hold the positive tweets.
data_pos = processedtext[800000:]
wc = WordCloud(max_words = 1000 , width = 1600 , height = 800,
              collocations=False).generate(" ".join(data_pos))
plt.figure(figsize = (20,20))
plt.imshow(wc)

## Word-Cloud for negative tweets

In [None]:
# Positional slice: assumes the first 800,000 rows hold the negative tweets.
data_neg = processedtext[:800000]
plt.figure(figsize = (20,20))
wc = WordCloud(max_words = 1000 , width = 1600 , height = 800,
               collocations=False).generate(" ".join(data_neg))
plt.imshow(wc)

# Utilizing N-grams to enhance contextual analysis
---

Now I extract N-grams from preprocessed positive and negative tweet datasets. The code utilizes the extract_ngrams function to extract N-grams from a list of preprocessed tweets. The value of n is defined as 3, indicating trigrams (sequences of three words).

Trigrams are then extracted from the positive and negative tweet datasets using the extract_ngrams function. The frequency of these trigrams is calculated using the Counter class from the collections module.

The output displays the most common trigrams in positive and negative tweets, showcasing the language patterns associated with each sentiment category.

In [None]:
# Function to extract N-grams from a list of preprocessed tweets
def extract_ngrams(tweets, n):
    ngrams = []
    for tweet in tweets:
        words = tweet.split()
        ngrams.extend([' '.join(words[i:i+n]) for i in range(len(words)-n+1)])
    return ngrams

# Define the value of N for N-grams
n = 3

# Extract N-grams from positive and negative tweet datasets
positive_ngrams = extract_ngrams(data_pos, n)
negative_ngrams = extract_ngrams(data_neg, n)

# Calculate the frequency of N-grams
positive_ngram_freq = Counter(positive_ngrams)
negative_ngram_freq = Counter(negative_ngrams)

# Print the most common N-grams for positive and negative tweets
print("Most common N-grams in positive tweets:")
print(positive_ngram_freq.most_common(20))

print("\nMost common N-grams in negative tweets:")
print(negative_ngram_freq.most_common(20))
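
The behaviour of `extract_ngrams` is easiest to check on a tiny input. The sketch below repeats the function so that it runs on its own and applies it to three made-up tweets (they are illustrative, not taken from the dataset).

```python
from collections import Counter

def extract_ngrams(tweets, n):
    """Return all word n-grams from a list of whitespace-tokenized tweets."""
    ngrams = []
    for tweet in tweets:
        words = tweet.split()
        # A tweet with fewer than n words contributes nothing.
        ngrams.extend(' '.join(words[i:i + n]) for i in range(len(words) - n + 1))
    return ngrams

# Made-up, already preprocessed tweets.
toy_tweets = ["happy mother day mom", "happy mother day everyone", "day"]
trigram_counts = Counter(extract_ngrams(toy_tweets, 3))
print(trigram_counts.most_common(1))  # → [('happy mother day', 2)]
```

Tweets shorter than `n` words simply drop out of the count, which is worth keeping in mind when comparing frequencies between the two sentiment classes.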


## First attempt:

The analysis of the most common N-grams in positive and negative tweets reveals distinct language patterns associated with each sentiment category. Positive N-grams include gratitude, positive expressions, laughter, and greetings, while negative N-grams consist of disappointment, regrets, empathy, and negative perceptions. This insight enhances our understanding of sentiment trends in the dataset and can contribute to improving sentiment analysis models.

The analysis of N-grams reveals the prevalence of the following word combinations in the dataset:
- For positive tweets: The most frequent N-gram is 'USER USER USER' with 8509 mentions.
- For negative tweets: The most frequent N-gram is also 'USER USER USER' with 2210 mentions.

I enhanced the preprocessing function with a step that collapses runs of consecutive "USER" tokens within a tweet into a single mention, with the aim of cleaner input and more accurate sentiment analysis.

The reason for this enhancement is to improve the quality and effectiveness of the sentiment analysis process. In many cases, tweets may contain multiple consecutive mentions of "USER" without providing any additional valuable information for sentiment analysis. By reducing multiple mentions of "USER" to a single mention, we simplify the tweet representation and eliminate any redundancy caused by repetitive user mentions.

This enhancement helps to streamline the sentiment analysis process by focusing on the essential content of the tweet while disregarding repeated user mentions. It ensures that the sentiment analysis algorithm can better capture the sentiment-related words and phrases in the tweet without being influenced by excessive repetitions of the "USER" word.

## New results after optimization of the preprocessing function

Instead of the previous occurrence of "USER USER USER," the most common 3-word combinations have been updated:

- In positive tweets, the most frequent N-gram is 'happy mother day' with 2033 mentions.
- In negative tweets, the most frequent N-gram is 'wish could go' with 921 mentions.

These updated N-grams provide more meaningful insights in the context of sentiment analysis, as they reflect specific phrases related to positive and negative sentiments.


## Train-Test-Split

In [None]:
X_train, X_test, y_train, y_test = train_test_split(processedtext, sentiment,
                                                    test_size = 0.05, random_state = 0)
print(f'Data Split done.')



## TF-IDF-Vectorizer

I'm using the TfidfVectorizer to convert the text data into numerical features. By setting `ngram_range=(1,2)`, I consider both single words and word pairs. After fitting the vectorizer to the training data, I print the confirmation message and display the number of feature words extracted.

In [None]:
# Initialize the vectorizer
vectorizer = TfidfVectorizer(ngram_range=(1,2), max_features=500000)

# Fit it to the training data
vectorizer.fit(X_train)

print(f'Vectoriser fitted.')
print('No. of feature_words: ', len(vectorizer.get_feature_names_out()))

My goal is to convert messy, unstructured text data into numerical vectors that machine learning models can process. To do this, I use the `TfidfVectorizer` from the `sklearn` library.

This vectorizer has key parameters such as `max_features` and `ngram_range`. `max_features` limits the vocabulary to the most frequent terms across the corpus. Varying this number can help manage noise and complexity in my models, but it's a balancing act. `ngram_range`, on the other hand, determines the length of the word groups used. A setting like (1,3) can help capture more context by considering unigrams, bigrams, and trigrams.

But here's the thing: when I experimented with these parameters, my models' performance didn't improve. I tried both increasing and decreasing `max_features`, and experimented with larger n-grams. It seems the initial settings were already well-tuned for my task and data, and adding complexity didn't help but possibly added more noise or overfitting. So, I decided to stick with my initial vectorization settings, reminding me that machine learning often involves a well-guided trial and error process.
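
To illustrate what the vectorizer computes, here is a pure-Python sketch of TF-IDF weighting over unigrams and bigrams. It follows the formula scikit-learn documents for its default settings (smoothed idf, `ln((1 + n) / (1 + df)) + 1`, followed by L2 normalization of each row), but it is a simplification: tokens are split on whitespace rather than with the library's token pattern, there is no `max_features` cut-off, and the function names are hypothetical.

```python
import math
from collections import Counter

def word_ngrams(text, ngram_range=(1, 2)):
    """Unigrams and bigrams by default, mirroring ngram_range=(1, 2)."""
    words = text.split()
    lo, hi = ngram_range
    return [' '.join(words[i:i + n])
            for n in range(lo, hi + 1)
            for i in range(len(words) - n + 1)]

def tfidf(docs, ngram_range=(1, 2)):
    """TF-IDF with smoothed idf and L2-normalized rows (sketch, see assumptions above)."""
    counts = [Counter(word_ngrams(doc, ngram_range)) for doc in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for c in counts for term in c)
    idf = {t: math.log((1 + n_docs) / (1 + df)) + 1 for t, df in doc_freq.items()}
    rows = []
    for c in counts:
        weights = {t: tf * idf[t] for t, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        rows.append({t: w / norm for t, w in weights.items()})
    return rows

rows = tfidf(["good day", "bad day"])
print(sorted(rows[0]))  # → ['day', 'good', 'good day']
```

In this toy corpus "day" occurs in both documents and therefore receives a lower weight than "good", which occurs in only one.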

In [None]:
'''# Initialize the vectorizer with desired parameters
vectorizer = TfidfVectorizer(ngram_range=(1,3), max_features=600000)

# Fit it to the training data
vectorizer.fit(X_train)

print(f'Vectorizer fitted.')
print('No. of feature_words: ', len(vectorizer.get_feature_names_out()))'''

Transforming the data set

In [None]:
X_train = vectorizer.transform(X_train)
X_test  = vectorizer.transform(X_test)
print(f'Data Transformed.')

# Function to evaluate different models
---
I first implemented a basic function that plotted a confusion-matrix for each model to be trained.

In [None]:
'''def model_Evaluate(model):
    
    # Predict values for Test dataset
    y_pred = model.predict(X_test)

    # Print the evaluation metrics for the dataset.
    print(classification_report(y_test, y_pred))
    
    # Compute and plot the Confusion matrix
    cf_matrix = confusion_matrix(y_test, y_pred)

    categories  = ['Negative','Positive']
    group_names = ['True Neg','False Pos', 'False Neg','True Pos']
    group_percentages = ['{0:.2%}'.format(value) for value in cf_matrix.flatten() / np.sum(cf_matrix)]

    labels = [f'{v1}\n{v2}' for v1, v2 in zip(group_names,group_percentages)]
    labels = np.asarray(labels).reshape(2,2)

    sns.heatmap(cf_matrix, annot = labels, cmap = 'Blues',fmt = '',
                xticklabels = categories, yticklabels = categories)

    plt.xlabel("Predicted values", fontdict = {'size':14}, labelpad = 10)
    plt.ylabel("Actual values"   , fontdict = {'size':14}, labelpad = 10)
    plt.title ("Confusion Matrix", fontdict = {'size':18}, pad = 20)'''

The improved function `model_Evaluate()` helps me assess the performance of a binary classification model. It predicts values for my test set and prints a classification report with precision, recall, and F1-score per class.

It then generates a confusion matrix, visualized as a heatmap. This lets me see the true positives, false positives, true negatives, and false negatives at a glance. Sensitivity (recall of the positive class), specificity (recall of the negative class), and overall accuracy can be read from the classification report, each giving me different insights into my model's performance.

Finally, I plot a Receiver Operating Characteristic (ROC) curve and calculate the area under the curve (AUC). This gives me a single, summarizing figure for the performance of my model, showing how well it distinguishes between classes.

Comparing all these metrics gives me a rounded view of my model's performance. Each one tells me something different and important about how well my model is doing and where it might be going wrong. Depending on my task, some metrics may be more important than others, and this function allows me to consider all these factors.

In [None]:
def model_Evaluate(model):
    # Predict values for the test dataset
    y_pred = model.predict(X_test)

    # Print the evaluation metrics for the dataset.
    print(classification_report(y_test, y_pred))

    # Compute and plot the Confusion matrix
    cf_matrix = confusion_matrix(y_test, y_pred)

    categories = ['Negative', 'Positive']
    group_names = ['True Neg', 'False Pos', 'False Neg', 'True Pos']
    group_percentages = ['{0:.2%}'.format(value) for value in cf_matrix.flatten() / np.sum(cf_matrix)]

    labels = [f'{v1}\n{v2}' for v1, v2 in zip(group_names, group_percentages)]
    labels = np.asarray(labels).reshape(2, 2)

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 6))

    # Plot Confusion Matrix
    sns.heatmap(cf_matrix, annot=labels, cmap='Blues', fmt='', xticklabels=categories, yticklabels=categories,
                ax=ax1)
    ax1.set_xlabel("Predicted values", fontdict={'size': 14}, labelpad=10)
    ax1.set_ylabel("Actual values", fontdict={'size': 14}, labelpad=10)
    ax1.set_title("Confusion Matrix", fontdict={'size': 18}, pad=20)

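    # Compute and plot ROC curve.
    # Note: y_pred contains hard 0/1 labels, so the curve has a single interior
    # point and the AUC equals the mean of sensitivity and specificity.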
    fpr, tpr, thresholds = roc_curve(y_test, y_pred)
    roc_auc = roc_auc_score(y_test, y_pred)

    # Round the roc_auc to two decimal places
    roc_auc = round(roc_auc, 2)

    # Plot ROC Curve
    ax2.plot(fpr, tpr, label='ROC Curve (area = %0.2f)' % roc_auc)
    ax2.plot([0, 1], [0, 1], 'k--')
    ax2.set_xlim([0.0, 1.0])
    ax2.set_ylim([0.0, 1.05])
    ax2.set_xlabel('False Positive Rate')
    ax2.set_ylabel('True Positive Rate')
    ax2.set_title('Receiver Operating Characteristic')
    ax2.legend(loc="lower right")

    plt.tight_layout()
    plt.show()
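
The metrics discussed above can be derived directly from the four confusion-matrix counts. The sketch below does this in plain Python on made-up labels; the helper name `binary_metrics` is hypothetical. It also shows a property that matters for reading the ROC plot: because `model_Evaluate` passes hard 0/1 predictions to `roc_curve`, the area under that curve equals the mean of sensitivity and specificity.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity, accuracy and balanced accuracy from 0/1 labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    sensitivity = tp / (tp + fn)   # recall of the positive class
    specificity = tn / (tn + fp)   # recall of the negative class
    accuracy = (tp + tn) / len(pairs)
    # With hard 0/1 predictions the ROC curve has one interior point, and the
    # area under it equals the mean of sensitivity and specificity.
    balanced_accuracy = (sensitivity + specificity) / 2
    return sensitivity, specificity, accuracy, balanced_accuracy

# Toy labels, made up for illustration.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 1, 1]
print(binary_metrics(y_true, y_pred))  # → (0.75, 0.5, 0.625, 0.625)
```

For a threshold-independent AUC, one option would be to pass continuous scores to `roc_curve` instead, for example `model.decision_function(X_test)` or `model.predict_proba(X_test)[:, 1]`, depending on which of the two the model provides.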


# Model Selection Strategy
---
I am adopting a dual approach for sentiment prediction. The first part of this strategy involves training a custom model, while the second part harnesses the capabilities of the GPT-API for sentiment analysis.

Considering the utilization of the GPT-API, I have decided to train a relatively simpler custom model. I'll be training and comparing the performance of three distinct models:

- Bernoulli Naive Bayes Model
- Linear Support Vector Machine Model
- Logistic Regression Model

Among these, the model that delivers the highest performance will be selected for further optimization. This optimized model will then be used to predict the sentiments of tweets, which I'll retrieve directly from the Twitter API.


## BernoulliNB Model

This is a probabilistic classifier that applies Bayes' theorem with strong independence assumptions. It is particularly suitable for binary features, such as the presence or absence of a word in a text; scikit-learn's `BernoulliNB` binarizes the TF-IDF values at its default threshold of 0.

In [None]:
BNBmodel = BernoulliNB(alpha = 2)
BNBmodel.fit(X_train, y_train)
model_Evaluate(BNBmodel)

## Linear SVM Model

This is a maximum-margin classifier: it fits a separating hyperplane in the high-dimensional, sparse TF-IDF feature space, which makes it a robust model for text classification tasks.

In [None]:
SVCmodel = LinearSVC()
SVCmodel.fit(X_train, y_train)
model_Evaluate(SVCmodel)

## Logistic Regression Model 

This is a statistical model that uses a logistic function to model a binary dependent variable. In the context of sentiment analysis, it predicts the probability of a particular sentiment (positive or negative) based on input features.

In [None]:
LRmodel = LogisticRegression(C = 2, max_iter = 1000, n_jobs=-1)
LRmodel.fit(X_train, y_train)
model_Evaluate(LRmodel)

## Comparing the performance of the different models

In terms of sensitivity (recall), the Logistic Regression model performs the best with a value of 81.16%. It indicates that the model correctly identifies 81.16% of positive (1) instances.

Regarding specificity, the Logistic Regression model also performs the best with a specificity of 77.94%. It indicates that the model correctly identifies 77.94% of negative (0) instances.

In terms of overall accuracy, the Logistic Regression model achieves an accuracy of 79.55%, indicating the percentage of correctly predicted instances out of the total dataset.

The ROC AUC score is the same for the first two models, at 78%. For the Logistic Regression model, it's slightly better at 80%.

Based on these metrics, the Logistic Regression model appears to be the most balanced and accurate among the three models, considering both sensitivity and specificity.

# Optimizing the chosen logistic regression model

## **This operation takes a long time; I would not recommend rerunning it!**

To optimize the logistic regression model, I use a grid search with K-Fold Cross-Validation (K = 5) to evaluate different hyperparameter combinations. The grid covers several values of `C` and `max_iter`, and each combination is scored by accuracy. With `verbose=2`, the progress of each fit is displayed during the search, allowing me to track it. After the grid search completes, the best hyperparameters and their corresponding cross-validated score are printed.

In [None]:
# Create the logistic regression model
LRmodel = LogisticRegression(n_jobs=-1)

# Define the hyperparameter grid
param_grid = {
    'C': [0.1, 1, 2, 5, 10],
    'max_iter': [100, 500, 1000],
}

# Create the grid search object
grid_search = GridSearchCV(LRmodel, param_grid, cv=5, scoring='accuracy', verbose=2)

# Perform grid search to find the best hyperparameters
grid_search.fit(X_train, y_train)

# Print the best hyperparameters and the corresponding score
print("Best Hyperparameters:", grid_search.best_params_)
print("Best Score:", grid_search.best_score_)
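
The long run time follows from simple counting. The sketch below enumerates the candidates in the grid above and counts the model fits, assuming the default `refit=True` of `GridSearchCV`, which trains the best candidate once more on the full training set.

```python
from itertools import product

param_grid = {
    'C': [0.1, 1, 2, 5, 10],
    'max_iter': [100, 500, 1000],
}
cv_folds = 5

# Every combination of hyperparameter values is one candidate.
candidates = [dict(zip(param_grid, values)) for values in product(*param_grid.values())]

# Each candidate is fitted once per fold; with refit=True the best
# candidate is fitted one more time on the full training set.
total_fits = len(candidates) * cv_folds + 1
print(len(candidates), total_fits)  # → 15 76
```

Each of these fits runs on a large share of the roughly 1.5 million training tweets, so trimming the grid is the most direct way to shorten the search.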


### Result from the Grid-Search

The best hyperparameter combination according to the Grid-Search is the following:

Best Hyperparameters: {'C': 1, 'max_iter': 100}
Best Score: 0.793633552631579

### Train the model with the new hyperparameters
Now I train and evaluate the model with those hyperparameters.

In [None]:
LRmodel = LogisticRegression(C = 1, max_iter = 100, n_jobs=-1)
LRmodel.fit(X_train, y_train)
model_Evaluate(LRmodel)

The revised accuracy for the Logistic Regression model is 79.61%, showing a slight improvement compared to the previous value of 79.55%. The ROC AUC is still at 80%.

# Load Data from the Twitter-API
---
In this section, I am utilizing the Twitter API to fetch a specific number of tweets. To ensure the quality of the retrieved tweets, I specify a particular currency to be mentioned in the tweets while excluding commonly used spam-bot words. Additionally, I exclude mentions of "NFT" and "NFTs" as they are not relevant to this project focused on crypto-currencies. This approach helps filter out spam and ensure the retrieved tweets are relevant to the desired topic.

In [None]:
search_url = "https://api.twitter.com/2/tweets/search/recent"

# Hardcoded query and excluded words to avoid too many spam-tweets from bots
query = "Chiliz"
excluded_words = ['airdrop', 'bot', 'retweet', 'retweeted', 'RT', 'wallet', 'mint', 'ticket', 'drop', 'opensea', 'blur', 'NFT', 'NFTs', 'giveaway', 'announce', 'announcement']

# Construct the excluded words portion of the query
excluded_query = ' '.join(f'-{word}' for word in excluded_words)

# Combine the query and excluded words
full_query = f'{query} {excluded_query}'

# Set query parameters
query_params = {'query': full_query, 'tweet.fields': 'author_id', 'max_results': 50}

def bearer_oauth(r):
    r.headers["Authorization"] = f"Bearer {BEARER_TOKEN}"
    r.headers["User-Agent"] = "v2FilteredStreamPython"
    return r

def connect_to_endpoint(url, params):
    response = requests.request("GET", url, auth=bearer_oauth, params=params)
    if response.status_code != 200:
        raise Exception(response.status_code, response.text)
    return response.json()

json_response = connect_to_endpoint(search_url, query_params)

# Convert the response to a DataFrame
data = [{'id': tweet['id'], 'text': tweet['text'], 'query': query} for tweet in json_response['data']]
loaded_tweets = pd.DataFrame(data)

# Print the DataFrame
print(loaded_tweets)
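
The query construction can be tested without calling the API. The following sketch wraps it in two small helpers; both names (`build_query`, `tweets_to_records`) are hypothetical, and the second one assumes that a response without matching tweets may lack the `data` key, in which case it returns no rows instead of raising a `KeyError`.

```python
def build_query(keyword, excluded_words):
    """Join a keyword with negated terms, e.g. 'Chiliz -airdrop -bot'."""
    excluded = ' '.join(f'-{word}' for word in excluded_words)
    return f'{keyword} {excluded}'.strip()

def tweets_to_records(json_response, keyword):
    """Turn a search response into row dicts; a missing 'data' key yields no rows."""
    return [{'id': tweet['id'], 'text': tweet['text'], 'query': keyword}
            for tweet in json_response.get('data', [])]

print(build_query('Chiliz', ['airdrop', 'bot']))  # → Chiliz -airdrop -bot

# Made-up responses for illustration; no request is sent.
print(tweets_to_records({'meta': {'result_count': 0}}, 'Chiliz'))  # → []
print(tweets_to_records({'data': [{'id': '1', 'text': 'Chiliz is up'}]}, 'Chiliz'))
```

Keeping the query builder separate from the request also makes it easy to reuse the same exclusion list for other currencies.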


# Language recognition
---

In previous versions of the Twitter API, developers had the capability to select the language of the tweets directly from the API. However, this functionality is no longer available. To overcome this limitation, I am utilizing a pre-trained language detection model to filter and retain only English tweets. This ensures that the tweets used for training and prediction align with the language-specific focus of my model, which is trained exclusively on English tweets.

In [None]:
#import the model
tokenizer = AutoTokenizer.from_pretrained("papluca/xlm-roberta-base-language-detection")

model = AutoModelForSequenceClassification.from_pretrained("papluca/xlm-roberta-base-language-detection")

In [None]:
# define a function to get a predicted language label for the text of a tweet
def predict_language(text):
    # tokenize the text
    inputs = tokenizer(text, return_tensors="pt")
    
    # run the text through the model and get the logits
    outputs = model(**inputs)
    logits = outputs.logits

    # compute the probabilities from the logits
    probabilities = F.softmax(logits, dim=1).detach().numpy()
    
    # get the label of the highest probability
    predicted_label = model.config.id2label[probabilities.argmax()]
    
    return predicted_label

# create a new column with the predicted language
loaded_tweets['language'] = loaded_tweets['text'].apply(predict_language)

# filter out rows that are not in English
english_tweets = loaded_tweets[loaded_tweets['language'] == 'en']


In [None]:
print(english_tweets)
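
The step from logits to a language label in `predict_language` is a softmax followed by an argmax. The sketch below reproduces that step in plain Python so it can be followed without loading the model; the three-language `id2label` mapping and the logits are made up for illustration and do not come from the real model.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def pick_label(logits, id2label):
    """Return the label with the highest probability and that probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return id2label[best], probs[best]

# Hypothetical mapping and logits, for illustration only.
id2label = {0: 'de', 1: 'en', 2: 'fr'}
label, prob = pick_label([0.5, 3.0, -1.0], id2label)
print(label, round(prob, 3))
```

Returning the probability alongside the label would make it possible to drop tweets whose language prediction is uncertain, for instance with a minimum-confidence threshold; whether that helps here is untested.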

# Get a Sentiment from the GPT-API for the English tweets
---

To obtain the sentiment for each English tweet, I retrieve them individually from the GPT-API. The dataset is then enhanced by adding the sentiment obtained from the API for each respective tweet. The sentiment prediction is encoded as 1 for positive sentiment and 0 for negative sentiment.

For the prediction task, I opt to use the "text-ada-001" engine. This decision was driven primarily by cost rather than quality. While the "Davinci" model may be more powerful, its cost is approximately 50 times higher than that of the "text-ada-001" model.

# **First attempt - please do not rerun this code!**

I documented the evaluation process to test the quality of predictions made by the GPT model. The details of this evaluation are provided at the end of this notebook.

In [None]:
'''# Set up  OpenAI API credentials
openai.api_key = GPT_SECRET_KEY

# Set up empty lists to store tweet IDs, tweet texts, and predictions
tweet_ids = []
tweet_texts = []
predictions = []

# Iterate over each tweet in the loaded dataset
for index, row in english_tweets.iterrows():
    tweet_id = row['id']
    tweet_text = row['text']
    
    # Append tweet ID and text to the respective lists
    tweet_ids.append(tweet_id)
    tweet_texts.append(tweet_text)

    # Set up your OpenAI API request
    prompt = f"Analyze the sentiment of the following tweet: '{tweet_text}'. Answer with 'positive' or 'negative' depending on the sentiment, just one word in the answer"
    
    response = openai.Completion.create(
        engine="text-ada-001",
        prompt=prompt,
        max_tokens=100,
        n=1,
        stop=None,
        temperature=0.0
    )
    
    # Extract the predicted sentiment from the OpenAI API response
    sentiment = response.choices[0].text.strip()
    
    # Append the predicted sentiment to the list of predictions
    predictions.append(sentiment)
    
# Create a new DataFrame with tweet IDs, texts, and predictions
updated_data = pd.DataFrame({'id': tweet_ids, 'text': tweet_texts, 'query': english_tweets['query'], 'gpt_pred': predictions})

# Encode 'gpt_pred': 1 if the response contains 'positive', 0 if it contains 'negative', otherwise 'N/A'
updated_data['gpt_pred'] = updated_data['gpt_pred'].apply(lambda x: 1 if 'positive' in str(x).lower() else (0 if 'negative' in str(x).lower() else 'N/A'))'''


# **Optimized Code**

In this version, I revised the GPT request to improve the sentiment prediction for each tweet. I made the following modifications:
- Formulated a new prompt that poses the task as a direct question, to elicit more accurate sentiment responses.
- Raised the temperature parameter slightly from 0.0 to 0.1, which adds a small amount of randomness to the generated text.
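
To give a feeling for what the temperature parameter does, here is a minimal sketch. It assumes that the API's temperature acts like the standard scaling of logits before a softmax; the toy logits are made up and do not come from the GPT model.

```python
import math

def softmax_with_temperature(logits, temperature):
    """Softmax over logits / temperature; requires temperature > 0."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [value / temperature for value in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(value - peak) for value in scaled]
    total = sum(exps)
    return [e / total for e in exps]

toy_logits = [2.0, 1.5]  # hypothetical scores for 'positive' and 'negative'
for temperature in (1.0, 0.1):
    probabilities = softmax_with_temperature(toy_logits, temperature)
    print(temperature, [round(p, 3) for p in probabilities])
```

Under this assumption, a temperature of 0.1 still concentrates almost all probability on the highest-scoring token, so the output stays close to deterministic.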

In [None]:
# Set up  OpenAI API credentials
openai.api_key = GPT_SECRET_KEY

# Set up empty lists to store tweet IDs, tweet texts, and predictions
tweet_ids = []
tweet_texts = []
predictions = []

# Iterate over each tweet in the loaded dataset
for index, row in english_tweets.iterrows():
    tweet_id = row['id']
    tweet_text = row['text']
    
    # Append tweet ID and text to the respective lists
    tweet_ids.append(tweet_id)
    tweet_texts.append(tweet_text)

    # Set up your OpenAI API request
    prompt = f"Given the following tweet, would you say the sentiment expressed is 'positive' or 'negative'? Tweet: '{tweet_text}'"
    
    response = openai.Completion.create(
        engine="text-ada-001",
        prompt=prompt,
        max_tokens=100,
        n=1,
        stop=None,
        temperature=0.1
    )
    
    # Extract the predicted sentiment from the OpenAI API response
    sentiment = response.choices[0].text.strip()
    
    # Append the predicted sentiment to the list of predictions
    predictions.append(sentiment)
    
# Create a new DataFrame with tweet IDs, texts, and predictions
updated_data = pd.DataFrame({'id': tweet_ids, 'text': tweet_texts, 'query': english_tweets['query'], 'gpt_pred': predictions})

# Encode 'gpt_pred': 1 if the response contains 'positive', 0 if it contains 'negative', otherwise 'N/A'
updated_data['gpt_pred'] = updated_data['gpt_pred'].apply(lambda x: 1 if 'positive' in str(x).lower() else (0 if 'negative' in str(x).lower() else 'N/A'))

In [None]:
print(updated_data)

# Get a prediction from the LR-Model 
---

I then predict the sentiment of each tweet with my trained logistic regression model and add the predictions to the dataset as a new column, `lr_pred`, so that they can be compared with the GPT predictions.

In [None]:
# Preprocess the tweets
preprocessed_tweets = preprocess(tweet_texts)

# Transform the tweets to a numerical representation
X = vectorizer.transform(preprocessed_tweets)

# Make predictions
predictions = LRmodel.predict(X)

# Add the predictions to the dataframe
updated_data['lr_pred'] = predictions


In [None]:
print(updated_data)
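
How a logistic regression model turns a tweet into a 0/1 label can be sketched in a few lines. This is a simplified illustration, not the trained `LRmodel`: the word weights and the bias are invented, and plain word presence stands in for the vectorized features.

```python
import math

# Hypothetical word weights; a trained model learns these from data
TOY_WEIGHTS = {"moon": 1.2, "bullish": 1.5, "scam": -2.0, "crash": -1.4}
TOY_BIAS = 0.1

def predict_sentiment(tokens, weights=TOY_WEIGHTS, bias=TOY_BIAS):
    """Return (label, probability of positive) for a tokenized tweet."""
    # Linear score: bias plus the weight of every known token
    score = bias + sum(weights.get(token, 0.0) for token in tokens)
    # The sigmoid maps the score to a probability between 0 and 1
    probability = 1.0 / (1.0 + math.exp(-score))
    return (1 if probability >= 0.5 else 0), probability

print(predict_sentiment(["bitcoin", "looks", "bullish"]))
print(predict_sentiment(["total", "scam"]))
```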

# Save the Data-Frame to a Mongo-DB for later use
---

All tweets fetched from the Twitter API, together with the sentiments predicted by the GPT model and the logistic regression (LR) model, are stored in a MongoDB database. This makes the tweets and both sets of predictions available for later analyses and comparisons.

In [None]:
# Create a new client and connect to the server
client = MongoClient(MONGO_CONNECTION_STRING)

db = client.ML2Project

# Store the DataFrame in the MongoDB collection "cryptotweetssentiment"
collection = db.cryptotweetssentiment

# Add a timestamp column to the DataFrame with the current date
updated_data['upload_date'] = datetime.now().date().strftime('%Y-%m-%d')

# Convert the DataFrame to a list of dictionaries
data_dict = updated_data.to_dict("records")

# Insert documents into the collection
collection.insert_many(data_dict)
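
The shape of the stored documents can be sketched without a database connection. `stamp_records` is a hypothetical helper and the record below is toy data; the sketch only shows how each record receives the same `upload_date` string before it would be inserted.

```python
from datetime import date

def stamp_records(records, upload_date=None):
    """Return copies of the records with an ISO 'upload_date' string added."""
    stamp = (upload_date or date.today()).strftime("%Y-%m-%d")
    # Build new dictionaries so the input records stay unchanged
    return [{**record, "upload_date": stamp} for record in records]

toy_records = [{"id": 1, "text": "BTC to the moon", "query": "bitcoin", "gpt_pred": 1, "lr_pred": 1}]
print(stamp_records(toy_records, date(2023, 6, 7)))
```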

# Load and compare all saved tweets
---
To create the bar plots, I retrieve the stored data from the MongoDB collection and keep only the tweets for which the GPT prediction and the LR prediction are identical. For each search query, I then plot the number of positive and negative tweets per day as a stacked bar plot. This shows how the sentiment develops over time, based only on the tweets on which both models agree.

In [None]:
client = MongoClient(MONGO_CONNECTION_STRING)
db = client.ML2Project
collection = db.cryptotweetssentiment
queries = collection.distinct("query")

bar_width = 0.35

for query in queries:
    if query in ["evaluation", "new_evaluation"]:
        continue

    data = list(collection.find({"query": query}))
    df = pd.DataFrame(data)
    df['upload_date'] = pd.to_datetime(df['upload_date']).dt.date

    df_filtered = df[df['lr_pred'] == df['gpt_pred']]

    groups = df_filtered.groupby(['upload_date', 'lr_pred'])
    dates = []
    sentiment_counts = defaultdict(lambda: {'Positive': 0, 'Negative': 0, 'N/A': 0})
    for (date, pred), group in groups:
        if date not in dates:  # Append date only if it's not already in the list
            dates.append(date)
        if pred == 1:
            sentiment_counts[date]['Positive'] = group.shape[0]
        elif pred == 0:
            sentiment_counts[date]['Negative'] = group.shape[0]
        else:
            sentiment_counts[date]['N/A'] = group.shape[0]

    fig, ax = plt.subplots()
    r1 = np.arange(len(dates))
    
    ax.bar(r1, [sentiment_counts[date]['Positive'] for date in dates], color='green', width=bar_width, label='Positive')
    ax.bar(r1, [sentiment_counts[date]['Negative'] for date in dates], color='red', width=bar_width, bottom=[sentiment_counts[date]['Positive'] for date in dates], label='Negative')
  

    plt.xticks(r1, [str(date) for date in dates], rotation=45)
    ax.set_ylabel('Count')
    ax.set_title(f'Sentiment Distribution for {query}')
    ax.legend(loc='upper center', bbox_to_anchor=(0.5, -0.2), fancybox=True, ncol=3)

    plt.show()
    print("\n")
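
The agreement filter and the daily counts behind the plot can be sketched without pandas. `count_agreed_sentiments` is a hypothetical helper and the documents are toy data shaped like the stored records.

```python
from collections import Counter

def count_agreed_sentiments(documents):
    """Count tweets per (upload_date, label) where both models agree."""
    counts = Counter()
    for doc in documents:
        if doc["lr_pred"] == doc["gpt_pred"]:
            counts[(doc["upload_date"], doc["lr_pred"])] += 1
    return counts

toy_documents = [
    {"upload_date": "2023-06-01", "lr_pred": 1, "gpt_pred": 1},
    {"upload_date": "2023-06-01", "lr_pred": 0, "gpt_pred": 1},      # disagreement, dropped
    {"upload_date": "2023-06-01", "lr_pred": 1, "gpt_pred": "N/A"},  # no GPT label, dropped
    {"upload_date": "2023-06-02", "lr_pred": 0, "gpt_pred": 0},
]
print(count_agreed_sentiments(toy_documents))
```

Because the LR model never returns 'N/A', tweets without a usable GPT label are removed by the same equality check.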


# Possible optimizations
---
To improve the results and usability of this project, I see the following options:

1. **Gather more data from Twitter**: A larger and more diverse set of tweets would give the model more contexts to learn from and should improve its accuracy across a wider range of scenarios.

2. **Use a more advanced GPT engine**: A more capable language model such as GPT-4 could improve the predictions thanks to advances in architecture and training techniques. However, such models come with higher costs.

3. **Optimize the tweet retrieval process**: Filters that exclude bot-generated or irrelevant tweets would remove noise from the data and make the predictions more accurate and reliable.

4. **Train a self-tailored model with more computational power**: With additional computational resources, I could train a custom model on a larger dataset and fine-tune it specifically for this task.

5. **Provide a user-friendly interface**: A front end would let users select and visualize the cryptocurrencies they are interested in, which makes the results easier to interpret and analyze.

These optimizations would improve the accuracy and usability of the analysis and visualization of cryptocurrency sentiment on Twitter.
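
As an illustration of the third point, a very simple noise filter could look like the sketch below. The two heuristics (dropping repeated texts and hashtag-heavy tweets) and the `max_hashtags` threshold are assumptions for illustration, not a tested bot detector.

```python
def filter_noisy_tweets(texts, max_hashtags=5):
    """Drop repeated texts and tweets with more than max_hashtags '#' characters."""
    seen = set()
    kept = []
    for text in texts:
        # Normalize case and whitespace so near-identical copies match
        normalized = " ".join(text.lower().split())
        if normalized in seen:
            continue
        seen.add(normalized)
        if text.count("#") > max_hashtags:
            continue
        kept.append(text)
    return kept

toy_texts = ["BTC up", "btc  up", "#a #b #c #d #e #f buy now", "ETH is stable"]
print(filter_noisy_tweets(toy_texts))
```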

# Comparing the quality of GPT-predictions with a test-data-set
---
 

# ***!!!! Please do not execute this code again - the result is saved in a mongo-db !!!!***
 
 

The data gathered until 7 June shows that the GPT request produces almost no negative predictions. I therefore evaluate both the GPT model and the logistic regression model on the labeled test set that is part of the dataset.

In [None]:
'''#converting the test set to a pandas data frame

evaluation_set = dataset['test'].to_pandas()

#randomly select 50 tweets

def select_random_tweets(dataset, random_seed=42):
    # Set the random seed for reproducibility
    np.random.seed(random_seed)
    
    # Separate tweets with sentiment 4 (positive) and 0 (negative)
    sentiment_4 = dataset[dataset['sentiment'] == 4]
    sentiment_0 = dataset[dataset['sentiment'] == 0]

    # Randomly select 25 tweets from each sentiment
    random_tweets_4 = sentiment_4.sample(25)
    random_tweets_0 = sentiment_0.sample(25)
    
    # Combine both DataFrames
    random_tweets = pd.concat([random_tweets_4, random_tweets_0])
    
    return random_tweets

random_tweets = select_random_tweets(evaluation_set)

random_tweets = random_tweets[['text','sentiment']]
random_tweets['sentiment'] = random_tweets['sentiment'].replace(4,1)


def make_predictions(random_tweets):
    # Preprocess the tweets
    preprocessed_tweets = preprocess(random_tweets['text'])

    # Transform the tweets to a numerical representation
    X = vectoriser.transform(preprocessed_tweets)

    # Make predictions
    predictions = LRmodel.predict(X)

    # Add the predictions to the dataframe
    random_tweets['lr_pred'] = predictions

    return random_tweets

# Now, call the function on your random_tweets
updated_random_tweets = make_predictions(random_tweets)

# Set up your OpenAI API credentials
openai.api_key = GPT_SECRET_KEY

# Set up empty lists to store tweet indexes, tweet texts, sentiments, LR predictions, and GPT predictions
tweet_indexes = []
tweet_texts = []
sentiments = []
lr_predictions = []
gpt_predictions = []

# Iterate over each tweet in the loaded dataset
for index, row in random_tweets.iterrows():
    tweet_index = index
    tweet_text = row['text']
    sentiment = row['sentiment']
    lr_pred = row['lr_pred']

    # Append tweet index, text, sentiment, and LR prediction to the respective lists
    tweet_indexes.append(tweet_index)
    tweet_texts.append(tweet_text)
    sentiments.append(sentiment)
    lr_predictions.append(lr_pred)

    # Set up your OpenAI API request
    prompt = f"Analyze the sentiment of the following tweet: '{tweet_text}'. Answer with 'positive' or 'negative' depending on the sentiment, just one word in the answer"
    
    response = openai.Completion.create(
        engine="text-ada-001",
        prompt=prompt,
        max_tokens=100,
        n=1,
        stop=None,
        temperature=0.0
    )
    
    # Extract the predicted sentiment from the OpenAI API response
    sentiment = response.choices[0].text.strip()
    
    # Append the predicted sentiment to the list of GPT predictions
    gpt_predictions.append(sentiment)
    
# Create a new DataFrame with tweet indexes, texts, sentiments, LR predictions, and GPT predictions
updated_data = pd.DataFrame({
    'index': tweet_indexes,
    'text': tweet_texts, 
    'sentiment': sentiments,
    'lr_pred': lr_predictions,
    'gpt_pred': gpt_predictions
})

# Encode 'gpt_pred': 1 if the response contains 'positive', 0 if it contains 'negative', otherwise 'N/A'
updated_data['gpt_pred'] = updated_data['gpt_pred'].apply(lambda x: 1 if 'positive' in str(x).lower() else (0 if 'negative' in str(x).lower() else 'N/A'))

# Create a new client and connect to the server
client = MongoClient(MONGO_CONNECTION_STRING)

# Connect to the desired database
db = client.ML2Project

# Connect to the desired collection
collection = db.cryptotweetssentiment

# Add a new column 'query' with the value 'evaluation' for all rows
updated_data['query'] = 'evaluation'

# Add a timestamp column to the DataFrame with the current date
updated_data['upload_date'] = datetime.now().date().strftime('%Y-%m-%d')

# Convert the DataFrame to a list of dictionaries
data_dict = updated_data.to_dict("records")

# Insert documents into the collection
collection.insert_many(data_dict)'''
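
The balanced draw of 25 tweets per class can be sketched with the standard library. `sample_balanced` is a hypothetical helper working on a list of dictionaries; it mirrors the idea of the sampling above but will not reproduce the same 50 tweets.

```python
import random

def sample_balanced(rows, per_class=25, seed=42, label_key="sentiment"):
    """Draw per_class rows from every label with a fixed seed."""
    rng = random.Random(seed)  # local generator, leaves global random state untouched
    by_label = {}
    for row in rows:
        by_label.setdefault(row[label_key], []).append(row)
    sample = []
    for label in sorted(by_label):
        # rng.sample raises ValueError if a class has fewer than per_class rows
        sample.extend(rng.sample(by_label[label], per_class))
    return sample

# Toy data: 100 rows, half labelled 0 (negative) and half labelled 4 (positive)
toy_rows = [{"text": f"tweet {i}", "sentiment": 4 if i % 2 else 0} for i in range(100)]
subset = sample_balanced(toy_rows)
print(len(subset), sum(1 for row in subset if row["sentiment"] == 4))
```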

# Comparing the results of the different models with the sentiment from the test set
---

I now compare both sets of predictions with the labeled sentiment, using one confusion matrix per model.

In [None]:
# Create a new client and connect to the server
client = MongoClient(MONGO_CONNECTION_STRING)

# Connect to the desired database
db = client.ML2Project

# Connect to the desired collection
collection = db.cryptotweetssentiment

# Fetch all documents with query "evaluation" and convert them into a DataFrame
data = pd.DataFrame(list(collection.find({"query": "evaluation"})))

# Define the target labels and prediction values
y_actual = data['sentiment']
y_lr_pred = data['lr_pred']
y_gpt_pred = data['gpt_pred']

# Create confusion matrices
cm_lr = confusion_matrix(y_actual, y_lr_pred)
cm_gpt = confusion_matrix(y_actual, y_gpt_pred)

# Calculate additional metrics
lr_sensitivity = cm_lr[1, 1] / (cm_lr[1, 1] + cm_lr[1, 0])
lr_specificity = cm_lr[0, 0] / (cm_lr[0, 0] + cm_lr[0, 1])
lr_accuracy = (cm_lr[0, 0] + cm_lr[1, 1]) / np.sum(cm_lr)

gpt_sensitivity = cm_gpt[1, 1] / (cm_gpt[1, 1] + cm_gpt[1, 0])
gpt_specificity = cm_gpt[0, 0] / (cm_gpt[0, 0] + cm_gpt[0, 1])
gpt_accuracy = (cm_gpt[0, 0] + cm_gpt[1, 1]) / np.sum(cm_gpt)

# Create subplots for the confusion matrices
fig, axs = plt.subplots(1, 2, figsize=(10, 4))

# Plot LR confusion matrix
sns.heatmap(cm_lr, annot=True, fmt='d', cmap='Blues', ax=axs[0])
axs[0].set_title('Confusion Matrix (LR Predictions)\n\nSensitivity (Recall): {:.2%}\nSpecificity: {:.2%}\nAccuracy: {:.2%}'.format(lr_sensitivity, lr_specificity, lr_accuracy))
axs[0].set_xlabel('Predicted')
axs[0].set_ylabel('Actual')

# Plot GPT confusion matrix
sns.heatmap(cm_gpt, annot=True, fmt='d', cmap='Blues', ax=axs[1])
axs[1].set_title('Confusion Matrix (GPT Predictions)\n\nSensitivity (Recall): {:.2%}\nSpecificity: {:.2%}\nAccuracy: {:.2%}'.format(gpt_sensitivity, gpt_specificity, gpt_accuracy))
axs[1].set_xlabel('Predicted')
axs[1].set_ylabel('Actual')

# Adjust spacing between subplots
plt.tight_layout()

# Show the plot
plt.show()


The confusion matrix shows that the GPT prediction reaches an accuracy of only 50%, because every tweet is predicted as positive and the test sample is balanced (25 positive and 25 negative tweets). This suggests that the prompt needs to be adjusted to improve the GPT model's predictions. The modified request is the one shown in the "Optimized Code" section above.
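
Why an all-positive predictor ends up at exactly 50% on a balanced sample can be checked with a small sketch. `binary_metrics` is a hypothetical helper; it uses the same definitions of sensitivity, specificity, and accuracy as the cell above and assumes that both classes occur in `y_true`.

```python
def binary_metrics(y_true, y_pred):
    """Sensitivity, specificity and accuracy for labels 1 (positive) and 0 (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }

# 25 positive and 25 negative tweets, every one predicted as positive
y_true = [1] * 25 + [0] * 25
y_pred = [1] * 50
print(binary_metrics(y_true, y_pred))
# {'sensitivity': 1.0, 'specificity': 0.0, 'accuracy': 0.5}
```

The specificity of 0% is the more telling number here: the accuracy alone hides that not a single negative tweet was recognized.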

# Evaluate tweets again with an optimized GPT-Query
---
# ***!!!! Please do not execute this code again - the result is saved in a mongo-db !!!!***



In [None]:
'''
# Setting API Key
openai.api_key = GPT_SECRET_KEY

# Connecting to MongoDB
client = MongoClient(MONGO_CONNECTION_STRING)
db = client.ML2Project
collection = db.cryptotweetssentiment

# Dataframe creation from your dataset
evaluation_set = dataset['test'].to_pandas()

# Function for selecting random tweets
def select_random_tweets(dataset, random_seed=42):
    np.random.seed(random_seed)
    sentiment_4 = dataset[dataset['sentiment'] == 4]
    sentiment_0 = dataset[dataset['sentiment'] == 0]
    random_tweets_4 = sentiment_4.sample(25)
    random_tweets_0 = sentiment_0.sample(25)
    random_tweets = pd.concat([random_tweets_4, random_tweets_0])
    return random_tweets

random_tweets = select_random_tweets(evaluation_set)
random_tweets = random_tweets[['text','sentiment']]
random_tweets['sentiment'] = random_tweets['sentiment'].replace(4,1)

# Preprocess and predict with LR model
preprocessed_tweets = preprocess(random_tweets['text'])
X = vectoriser.transform(preprocessed_tweets)
predictions = LRmodel.predict(X)
random_tweets['lr_pred'] = predictions

# Get predictions with GPT
tweet_indexes = []
tweet_texts = []
sentiments = []
lr_predictions = []
gpt_predictions = []

for index, row in random_tweets.iterrows():
    tweet_index = index
    tweet_text = row['text']
    sentiment = row['sentiment']
    lr_pred = row['lr_pred']

    tweet_indexes.append(tweet_index)
    tweet_texts.append(tweet_text)
    sentiments.append(sentiment)
    lr_predictions.append(lr_pred)

    prompt = f"Given the following tweet, would you say the sentiment expressed is 'positive' or 'negative'? Tweet: '{tweet_text}'"
    
    response = openai.Completion.create(
        engine="text-ada-001",
        prompt=prompt,
        max_tokens=100,
        n=1,
        stop=None,
        temperature=0.1
    )

    sentiment = response.choices[0].text.strip()
    gpt_predictions.append(sentiment)

updated_data = pd.DataFrame({
    'index': tweet_indexes,
    'text': tweet_texts, 
    'sentiment': sentiments,
    'lr_pred': lr_predictions,
    'gpt_pred': gpt_predictions
})

updated_data['gpt_pred'] = updated_data['gpt_pred'].apply(lambda x: 1 if 'positive' in str(x).lower() else (0 if 'negative' in str(x).lower() else 'N/A'))

# Adding evaluation to the 'query' column and today's date to 'upload_date'
updated_data['query'] = 'new_evaluation'
updated_data['upload_date'] = datetime.now().date().strftime('%Y-%m-%d')

# Storing the data in MongoDB
data_dict = updated_data.to_dict("records")
collection.insert_many(data_dict)'''

# Confusion matrix with optimized GPT-Prompt
---

The new request returns some N/A values; these are excluded from the confusion matrix.

In [None]:
# Create a new client and connect to the server
client = MongoClient(MONGO_CONNECTION_STRING)

# Connect to the desired database
db = client.ML2Project

# Connect to the desired collection
collection = db.cryptotweetssentiment

# Fetch all documents with query "new_evaluation" and convert them into a DataFrame
data = pd.DataFrame(list(collection.find({"query": "new_evaluation"})))

# Replace 'N/A' values in 'gpt_pred' column with -1
data['gpt_pred'] = data['gpt_pred'].apply(lambda x: -1 if x == 'N/A' else x)

# Exclude rows with 'N/A' or -1 values in 'gpt_pred' column
valid_indices = (data['gpt_pred'] != 'N/A') & (data['gpt_pred'] != -1)
data = data[valid_indices]

# Define the target labels and prediction values
y_actual = data['sentiment']
y_lr_pred = data['lr_pred']
y_gpt_pred = data['gpt_pred']

# Create confusion matrices
cm_lr = confusion_matrix(y_actual, y_lr_pred)
cm_gpt = confusion_matrix(y_actual, y_gpt_pred)

# Calculate additional metrics
lr_sensitivity = cm_lr[1, 1] / (cm_lr[1, 1] + cm_lr[1, 0])
lr_specificity = cm_lr[0, 0] / (cm_lr[0, 0] + cm_lr[0, 1])
lr_accuracy = (cm_lr[0, 0] + cm_lr[1, 1]) / np.sum(cm_lr)

gpt_sensitivity = cm_gpt[1, 1] / (cm_gpt[1, 1] + cm_gpt[1, 0])
gpt_specificity = cm_gpt[0, 0] / (cm_gpt[0, 0] + cm_gpt[0, 1])
gpt_accuracy = (cm_gpt[0, 0] + cm_gpt[1, 1]) / np.sum(cm_gpt)

# Create subplots for the confusion matrices
fig, axs = plt.subplots(1, 2, figsize=(10, 4))

# Plot LR confusion matrix
sns.heatmap(cm_lr, annot=True, fmt='d', cmap='Blues', ax=axs[0])
axs[0].set_title('Confusion Matrix (LR Predictions)\nAdditional Metrics:\nSensitivity (Recall): {:.2%}\nSpecificity: {:.2%}\nAccuracy: {:.2%}'.format(lr_sensitivity, lr_specificity, lr_accuracy))
axs[0].set_xlabel('Predicted')
axs[0].set_ylabel('Actual')

# Plot GPT confusion matrix
sns.heatmap(cm_gpt, annot=True, fmt='d', cmap='Blues', ax=axs[1])
axs[1].set_title('Confusion Matrix (GPT Predictions)\nAdditional Metrics:\nSensitivity (Recall): {:.2%}\nSpecificity: {:.2%}\nAccuracy: {:.2%}'.format(gpt_sensitivity, gpt_specificity, gpt_accuracy))
axs[1].set_xlabel('Predicted')
axs[1].set_ylabel('Actual')

# Adjust spacing between subplots
plt.tight_layout()

# Show the plot
plt.show()


The confusion matrix shows that the accuracy of the GPT prediction has increased to 91.30% (ignoring the N/A values).
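
Since the accuracy is computed only on tweets with a usable GPT label, it helps to report how many tweets that covers. The sketch below uses a hypothetical helper, `accuracy_ignoring_na`, and toy labels; it does not reproduce the stored evaluation results.

```python
def accuracy_ignoring_na(y_true, y_pred, na_value="N/A"):
    """Return (accuracy on valid predictions, share of valid predictions)."""
    valid = [(t, p) for t, p in zip(y_true, y_pred) if p != na_value]
    if not valid:
        raise ValueError("no valid predictions to score")
    correct = sum(1 for t, p in valid if t == p)
    return correct / len(valid), len(valid) / len(y_true)

# Toy labels: one of five predictions is 'N/A'
toy_true = [1, 1, 0, 0, 1]
toy_pred = [1, "N/A", 0, 1, 1]
accuracy, coverage = accuracy_ignoring_na(toy_true, toy_pred)
print(f"accuracy={accuracy:.2%}, coverage={coverage:.2%}")
```

A high accuracy combined with a low coverage would mean that the model is only reliable on the tweets it chooses to answer.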