<a href="https://www.kaggle.com/code/janrumuller/nltk-exercise-gold-bitcoin?scriptVersionId=213573487" target="_blank"><img align="left" alt="Kaggle" title="Open in Kaggle" src="https://kaggle.com/static/images/open-in-kaggle.svg"></a>

The goal of this notebook is to demonstrate an application of sentiment analysis.  
Over a short time interval, it examines whether:
- there is a relationship between the sentiment around Bitcoin and the price of Bitcoin
- there is a relationship between the sentiment around gold and the price of gold
- there is a relationship between the sentiment around gold and Bitcoin

Last saved version: version 10, dated 17-12-2024

# Research question

Does the price of gold (X) have an effect on the value of Bitcoin (BTC-USD) (Y)?

- H0: The price of gold has no significant influence on the value of Bitcoin.
- H1: The price of gold has a significant influence on the value of Bitcoin.

Description: Bitcoin is often compared to gold as a store of value in uncertain times. This study analyzes whether there is a correlation between the gold price (X) and the value of Bitcoin (Y).

Expectation: a slight positive correlation.
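
The strength of such a correlation is usually expressed with the Pearson coefficient. The sketch below computes it with the standard library only; the price lists are made-up numbers for illustration, not values from the dataset, and `pearson` is a helper name introduced here.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical closing prices, for illustration only
gold = [1300.0, 1310.0, 1305.0, 1320.0, 1330.0]
bitcoin = [13000.0, 14000.0, 13500.0, 15000.0, 15500.0]

r = pearson(gold, bitcoin)
print(f"r = {r:.3f}")
```

The coefficient alone does not decide between H0 and H1. That also needs a p-value, which for example `scipy.stats.pearsonr` returns together with the coefficient.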

In [38]:
# init
!pip install -q twython
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O
import nltk
import subprocess
from datetime import datetime as dt
from nltk import word_tokenize, PorterStemmer, WordNetLemmatizer, pos_tag
from nltk.sentiment import SentimentIntensityAnalyzer # pre-trained model 1
from textblob import TextBlob                         # pre-trained model 2
from transformers import pipeline                     # pre-trained model 3
from tabulate import tabulate
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

print("nltk version:", nltk.__version__)

huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)


/kaggle/input/bitcoin-and-gold-closing-prices-2018-2024/btcusd_and_goldcfd.csv
/kaggle/input/yfinance/news_GCF.csv
/kaggle/input/yfinance/news_BTC-USD.csv
nltk version: 3.2.4


Purpose: an introduction to the Python NLTK (Natural Language Toolkit) library by means of an actual research question formulated by a student of the Minor Big Data and Computer Aided (Financial) Analysis.

# Input files

There are two input datasets, created separately and stored as Kaggle datasets. A further file, 'Datadictionary', is a Word document describing both.

- The first dataset contains daily gold and Bitcoin closing prices.
- The second dataset contains gold and Bitcoin news items, one file per ticker (`news_GCF.csv` and `news_BTC-USD.csv`).

In [2]:
# show the first 4 rows of df_closing_prices
# files from yfinance
df_closing_prices = pd.read_csv("../input/bitcoin-and-gold-closing-prices-2018-2024/btcusd_and_goldcfd.csv")
# drop the first two rows of the file; the price data starts at index 2
df_closing_prices = df_closing_prices.iloc[2:]
df_closing_prices.head(4)

Unnamed: 0,Price,Close_Bitcoin,Close_GoldCFD
2,2018-01-01,13657.2001953125,1313.699951171875
3,2018-01-02,14982.099609375,1313.699951171875
4,2018-01-03,15201.0,1316.199951171875
5,2018-01-04,15599.2001953125,1319.4000244140625
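
Price levels of two trending series can correlate for reasons unrelated to each other, so a comparison is often made on daily percentage changes instead. The sketch below derives those changes from the four rows shown above, using only the standard library. The helper name `pct_changes` is introduced here for illustration. Note that in this export the column named `Price` holds the date.

```python
import csv
import io

# The four rows shown in the output above, as CSV text (index column left out)
sample = """Price,Close_Bitcoin,Close_GoldCFD
2018-01-01,13657.2001953125,1313.699951171875
2018-01-02,14982.099609375,1313.699951171875
2018-01-03,15201.0,1316.199951171875
2018-01-04,15599.2001953125,1319.4000244140625
"""

def pct_changes(values):
    """Percentage change between each value and the one before it."""
    return [(curr / prev - 1.0) * 100.0 for prev, curr in zip(values, values[1:])]

rows = list(csv.DictReader(io.StringIO(sample)))
btc = [float(row["Close_Bitcoin"]) for row in rows]
gold = [float(row["Close_GoldCFD"]) for row in rows]

btc_returns = pct_changes(btc)
gold_returns = pct_changes(gold)

for row, b, g in zip(rows[1:], btc_returns, gold_returns):
    print(f"{row['Price']}  BTC {b:+.2f}%  gold {g:+.2f}%")
```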


In [31]:
# load the gold news items (news_GCF.csv) and join all titles into one text
df_news_items = pd.read_csv("../input/yfinance/news_GCF.csv")
all_titles = '\n'.join(df_news_items['title'])
print(all_titles)

   Unnamed: 0                                  uuid  \
0           0  44bd44d5-0ea4-3e2e-93a5-6e72a7a48220   
1           1  9d2fffdf-3f5a-3554-ab39-8e61f184e398   
2           2  6f5046b2-cec0-32c7-a851-2e8d77b369ec   
3           3  1356686c-d76f-3b93-872f-66d86b11ca63   
4           4  7a4fe652-040e-3b9b-a4fd-0aac84af8365   
5           5  9d556148-c61f-3823-a029-a59b39716a2f   
6           6  af4bb8d3-9c23-4b1d-94c2-97df0b85a671   
7           7  45606d4f-4906-3f0d-ba6a-d80d92caab87   

                                               title            publisher  \
0  Gold Pares Gains as Inflation Affirms Fed’s Ca...            Bloomberg   
1  Nasdaq, S&P 500 fall ahead of this morning's O...  Yahoo Finance Video   
2  Nasdaq up, Dow dips as market reacts to Trump'...  Yahoo Finance Video   
3          Should the US Treasury be buying bitcoin?  Yahoo Finance Video   
4  Dow leads stock gains as market digests Nvidia...  Yahoo Finance Video   
5  There's fixed income opportunities in t

In [44]:
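import ast

title_len = 60  # width the title column is padded and truncated to
# ast.literal_eval parses the ticker list stored as text without executing code, unlike eval
news_lines = '\n'.join([
    f"{dt.fromtimestamp(row['providerPublishTime']).strftime('%d-%m-%Y')} | {row['title']:<{title_len}.{title_len}} | "
    f"{', '.join(ast.literal_eval(row['relatedTickers']))}"
    for _, row in df_news_items.iterrows()
])
print(news_lines)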

27-11-2024 | Gold Pares Gains as Inflation Affirms Fed’s Cautious Easing  | GC=F, ^FTSE
27-11-2024 | Nasdaq, S&P 500 fall ahead of this morning's October PCE dat | ^DJI, ^NDX, ^GSPC, ^IXIC, GC=F, KC=F, BTC-USD
26-11-2024 | Nasdaq up, Dow dips as market reacts to Trump's new tariffs  | GC=F, XRT, ^RUT, ^GSPC, BTC-USD, ^IXIC, ^DJI
26-11-2024 | Should the US Treasury be buying bitcoin?                    | BTC-USD, GC=F
22-11-2024 | Dow leads stock gains as market digests Nvidia, bitcoin move | NVDA, ^GSPC, BTC-USD, ^DJI, GC=F, ^IXIC, ^RUT, DX-Y.NYB, ^NDX
20-11-2024 | There's fixed income opportunities in the short term: Strate | GC=F, BZ=F, CL=F
19-11-2024 | Gold jumps to 1-week high as Russia-Ukraine war escalates    | ^GSPC, GC=F, DX-Y.NYB
19-11-2024 | Dow, stocks slide lower, shaken by Russia-Ukraine tensions   | ^GSPC, GC=F, ^DJI, ^NDX, DX-Y.NYB, ^IXIC, NVDA


# NLP (Natural Language Processing) terminology

The following four concepts are explained below: "tokenization", "stemming", "lemmatization" and "part-of-speech tagging". Each concept gets a short description and, as an example, a corresponding function from the NLTK library.

## Tokenization

Tokenization means breaking text down into smaller units: paragraphs are split into sentences, and sentences into words. It is one of the first steps of any NLP pipeline. (source: [geeksforgeeks.org](https://www.geeksforgeeks.org/introduction-to-nltk-tokenization-stemming-lemmatization-pos-tagging/))
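
To make the two levels of splitting concrete, here is a minimal sketch with regular expressions. It is a simplified stand-in, not the algorithm behind NLTK's `word_tokenize`; the function names `split_sentences` and `split_words` are introduced here for illustration.

```python
import re

def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def split_words(sentence):
    """Naive word tokenizer: words (optionally hyphenated) or single punctuation marks."""
    return re.findall(r"\w+(?:-\w+)*|[^\w\s]", sentence)

text = "Gold jumps to 1-week high. Should the US Treasury be buying bitcoin?"
sentences = split_sentences(text)
tokens = [split_words(s) for s in sentences]

for sentence_tokens in tokens:
    print(sentence_tokens)
```

A rule this simple handles contractions differently from NLTK: it splits "morning's" into three tokens, whereas `word_tokenize` keeps "'s" together as one token.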

In [10]:
# sentence > word
tokenized_text = word_tokenize(all_titles)
print(tokenized_text)

['Gold', 'Pares', 'Gains', 'as', 'Inflation', 'Affirms', 'Fed', '’', 's', 'Cautious', 'Easing', 'Path', 'Nasdaq', ',', 'S', '&', 'P', '500', 'fall', 'ahead', 'of', 'this', 'morning', "'s", 'October', 'PCE', 'data', 'Nasdaq', 'up', ',', 'Dow', 'dips', 'as', 'market', 'reacts', 'to', 'Trump', "'s", 'new', 'tariffs', 'Should', 'the', 'US', 'Treasury', 'be', 'buying', 'bitcoin', '?', 'Dow', 'leads', 'stock', 'gains', 'as', 'market', 'digests', 'Nvidia', ',', 'bitcoin', 'moves', 'There', "'s", 'fixed', 'income', 'opportunities', 'in', 'the', 'short', 'term', ':', 'Strategist', 'Gold', 'jumps', 'to', '1-week', 'high', 'as', 'Russia-Ukraine', 'war', 'escalates', 'Dow', ',', 'stocks', 'slide', 'lower', ',', 'shaken', 'by', 'Russia-Ukraine', 'tensions']


## Stemming

Stemming reduces the morphological variants of a word to a common root or stem. Programs that do this are called stemming algorithms or stemmers. For example, a stemming algorithm reduces the words “chocolates”, “chocolatey”, and “choco” to the root word “chocolate”. The resulting stem is not always a valid word: the Porter stemmer used below turns “Inflation” into “inflat” and “Cautious” into “cautiou”. (source: [geeksforgeeks.org](https://www.geeksforgeeks.org/python-stemming-words-with-nltk/))
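
The idea behind a stemmer can be sketched with a handful of suffix rules. This toy version is far simpler than the Porter algorithm and its rules are chosen for illustration only, but it shows why stems such as "thi" or "eas" appear: suffixes are stripped mechanically, without checking whether the result is a word.

```python
# Illustrative suffix rules (suffix, replacement), tried in order; NOT the Porter rules
SUFFIX_RULES = [("ies", "y"), ("ing", ""), ("ed", ""), ("es", ""), ("s", "")]
MIN_STEM_LENGTH = 3

def toy_stem(word):
    """Lower-case the word and strip the first matching suffix."""
    lowered = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        stem_length = len(lowered) - len(suffix)
        if lowered.endswith(suffix) and stem_length >= MIN_STEM_LENGTH:
            return lowered[:stem_length] + replacement
    return lowered

for word in ["Gains", "buying", "Easing", "escalates", "this", "as"]:
    print(word, "->", toy_stem(word))
```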

In [11]:
# Create a Porter Stemmer instance
porter_stemmer = PorterStemmer()
 
# Example words for stemming
#words = ["running", "jumps", "happily", "running", "happily"]
words = tokenized_text 
 
# Apply stemming to each word
stemmed_words = [porter_stemmer.stem(word) for word in words]
 
# Print the results
print("Original words:", words,"\n")
print("Stemmed words:", stemmed_words)

Original words: ['Gold', 'Pares', 'Gains', 'as', 'Inflation', 'Affirms', 'Fed', '’', 's', 'Cautious', 'Easing', 'Path', 'Nasdaq', ',', 'S', '&', 'P', '500', 'fall', 'ahead', 'of', 'this', 'morning', "'s", 'October', 'PCE', 'data', 'Nasdaq', 'up', ',', 'Dow', 'dips', 'as', 'market', 'reacts', 'to', 'Trump', "'s", 'new', 'tariffs', 'Should', 'the', 'US', 'Treasury', 'be', 'buying', 'bitcoin', '?', 'Dow', 'leads', 'stock', 'gains', 'as', 'market', 'digests', 'Nvidia', ',', 'bitcoin', 'moves', 'There', "'s", 'fixed', 'income', 'opportunities', 'in', 'the', 'short', 'term', ':', 'Strategist', 'Gold', 'jumps', 'to', '1-week', 'high', 'as', 'Russia-Ukraine', 'war', 'escalates', 'Dow', ',', 'stocks', 'slide', 'lower', ',', 'shaken', 'by', 'Russia-Ukraine', 'tensions'] 

Stemmed words: ['gold', 'pare', 'gain', 'as', 'inflat', 'affirm', 'fed', '’', 's', 'cautiou', 'eas', 'path', 'nasdaq', ',', 'S', '&', 'P', '500', 'fall', 'ahead', 'of', 'thi', 'morn', "'s", 'octob', 'pce', 'data', 'nasdaq', 'up

## Lemmatization

Lemmatization also reduces a word to its base form. Unlike stemming, it takes the context of the word into account (for example its part of speech) and produces a valid word, whereas stemming may produce a non-word as the root form. (Source: [geeksforgeeks.org](https://www.geeksforgeeks.org/introduction-to-stemming/))
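
The role of context can be illustrated with a dictionary lookup keyed on both the word and its part of speech. The mini-dictionary below is hypothetical and stands in for WordNet; `toy_lemmatize` is a name introduced here. Like NLTK's `WordNetLemmatizer`, the sketch defaults to the noun reading and returns unknown words unchanged.

```python
# Hypothetical mini-dictionary: (word, part of speech) -> lemma
LEMMAS = {
    ("leaves", "n"): "leaf",
    ("leaves", "v"): "leave",
    ("better", "a"): "good",
    ("gains", "n"): "gain",
    ("gains", "v"): "gain",
}

def toy_lemmatize(word, pos="n"):
    """Look up the lemma for (word, pos); return the word unchanged if it is unknown."""
    return LEMMAS.get((word.lower(), pos), word)

print(toy_lemmatize("leaves", "n"))   # noun reading
print(toy_lemmatize("leaves", "v"))   # verb reading
print(toy_lemmatize("bitcoin"))       # not in the dictionary
```

The same surface form "leaves" maps to two different lemmas, which a suffix-stripping stemmer cannot distinguish.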

In [4]:
# Download and unzip WordNet if it cannot be found
try:
    nltk.data.find('wordnet.zip')
except LookupError:  # raised by nltk.data.find when the resource is missing
    nltk.download('wordnet', download_dir='/kaggle/working/')
    command = "unzip /kaggle/working/corpora/wordnet.zip -d /kaggle/working/corpora"
    subprocess.run(command.split())
    nltk.data.path.append('/kaggle/working/')

from nltk.corpus import wordnet

[nltk_data] Downloading package wordnet to /kaggle/working/...
Archive:  /kaggle/working/corpora/wordnet.zip
   creating: /kaggle/working/corpora/wordnet/
  inflating: /kaggle/working/corpora/wordnet/lexnames  
  inflating: /kaggle/working/corpora/wordnet/data.verb  
  inflating: /kaggle/working/corpora/wordnet/index.adv  
  inflating: /kaggle/working/corpora/wordnet/adv.exc  
  inflating: /kaggle/working/corpora/wordnet/index.verb  
  inflating: /kaggle/working/corpora/wordnet/cntlist.rev  
  inflating: /kaggle/working/corpora/wordnet/data.adj  
  inflating: /kaggle/working/corpora/wordnet/index.adj  
  inflating: /kaggle/working/corpora/wordnet/LICENSE  
  inflating: /kaggle/working/corpora/wordnet/citation.bib  
  inflating: /kaggle/working/corpora/wordnet/noun.exc  
  inflating: /kaggle/working/corpora/wordnet/verb.exc  
  inflating: /kaggle/working/corpora/wordnet/README  
  inflating: /kaggle/working/corpora/wordnet/index.sense  
  inflating: /kaggle/working/corpora/wordnet/data.

In [6]:
syns = wordnet.synsets("government")
print(syns[0].definition())

the organization that is the governing authority of a political unit


In [12]:
from nltk.stem import WordNetLemmatizer
# or: from nltk.stem.wordnet import WordNetLemmatizer
# create an object of class WordNetLemmatizer
lemmatizer = WordNetLemmatizer()
print(lemmatizer.lemmatize("plays", 'v'))
print(lemmatizer.lemmatize("played", 'v'))
print(lemmatizer.lemmatize("play", 'v'))
print(lemmatizer.lemmatize("playing", 'v'))

play
play
play
play


## POS tagging (part-of-speech tagging)

Part-of-speech (POS) tagging assigns each word in a sentence its part of speech, such as noun, verb or determiner. This gives a better syntactic overview of the sentence. (source: [geeksforgeeks.org](https://www.geeksforgeeks.org/introduction-to-nltk-tokenization-stemming-lemmatization-pos-tagging/#part-of-speech-tagging))

* NNP=Proper noun singular (names of specific people, places, organizations, brands)
* NN =Common noun singular (book, car, tree)
* VBZ=Verb, 3rd person singular present (runs, eats, thinks)
* DT =Determiner (the, a, this, that)
* CC =Coordinating conjunction (and, but, or)
* CD =Cardinal number (one, two, 1, 2)
* IN =Preposition/subordinating conjunction (in,on,by,after)
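
POS tags are often fed into the lemmatizer, which expects WordNet's one-letter classes (`n`, `v`, `a`, `r`) rather than Penn Treebank tags such as those listed above. A minimal sketch of that mapping follows; the helper name `penn_to_wordnet` is introduced here, and falling back to noun for all other tags is an assumption that matches the lemmatizer's default.

```python
def penn_to_wordnet(tag):
    """Map a Penn Treebank tag to a WordNet part-of-speech letter."""
    if tag.startswith("J"):    # JJ, JJR, JJS: adjectives
        return "a"
    if tag.startswith("V"):    # VB, VBD, VBG, VBN, VBP, VBZ: verbs
        return "v"
    if tag.startswith("RB"):   # RB, RBR, RBS: adverbs
        return "r"
    return "n"                 # nouns and everything else (assumed fallback)

# A few (word, tag) pairs as produced by pos_tag
tagged = [("Dow", "NNP"), ("dips", "VBZ"), ("new", "JJ"), ("ahead", "RB")]
converted = [(word, penn_to_wordnet(tag)) for word, tag in tagged]
print(converted)
```

Each converted pair can then be passed on as `lemmatizer.lemmatize(word, pos)`.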

In [13]:
pos_tag(tokenized_text)

[('Gold', 'NNP'),
 ('Pares', 'NNP'),
 ('Gains', 'NNP'),
 ('as', 'IN'),
 ('Inflation', 'NNP'),
 ('Affirms', 'NNP'),
 ('Fed', 'NNP'),
 ('’', 'NNP'),
 ('s', 'VBP'),
 ('Cautious', 'NNP'),
 ('Easing', 'NNP'),
 ('Path', 'NNP'),
 ('Nasdaq', 'NNP'),
 (',', ','),
 ('S', 'NNP'),
 ('&', 'CC'),
 ('P', 'NNP'),
 ('500', 'CD'),
 ('fall', 'NN'),
 ('ahead', 'RB'),
 ('of', 'IN'),
 ('this', 'DT'),
 ('morning', 'NN'),
 ("'s", 'POS'),
 ('October', 'NNP'),
 ('PCE', 'NNP'),
 ('data', 'NNS'),
 ('Nasdaq', 'NNP'),
 ('up', 'RB'),
 (',', ','),
 ('Dow', 'NNP'),
 ('dips', 'VBZ'),
 ('as', 'IN'),
 ('market', 'NN'),
 ('reacts', 'VBZ'),
 ('to', 'TO'),
 ('Trump', 'NNP'),
 ("'s", 'POS'),
 ('new', 'JJ'),
 ('tariffs', 'NNS'),
 ('Should', 'MD'),
 ('the', 'DT'),
 ('US', 'NNP'),
 ('Treasury', 'NNP'),
 ('be', 'VB'),
 ('buying', 'VBG'),
 ('bitcoin', 'VB'),
 ('?', '.'),
 ('Dow', 'NNP'),
 ('leads', 'VBZ'),
 ('stock', 'NN'),
 ('gains', 'NNS'),
 ('as', 'IN'),
 ('market', 'NN'),
 ('digests', 'VBZ'),
 ('Nvidia', 'NNP'),
 (',', ','),


# Model Comparison

In [14]:
# List of headlines about gold

headlines = df_news_items['title'].tolist()

To do: add the date to the headers.

## VADER sentiment analysis tool

VADER (Valence Aware Dictionary and sEntiment Reasoner) is a lexicon- and rule-based sentiment analysis tool developed specifically for analyzing sentiment in social media text. The main points:

1. Origin:
- Developed by [C.J. Hutto](https://github.com/cjhutto) and Eric Gilbert at the Georgia Institute of Technology
- Published in 2014
- Designed specifically for social media and informal text

2. Characteristics:
- Optimized for social media language (emoticons, slang, abbreviations)
- Handles capitalization used for emphasis ("GOOD" vs "good")
- Understands intensifiers and diminishers ("very good" vs "kind of good")
- Recognizes negations ("not bad" ≠ "bad")

3. Unlike language models, VADER:
- Does not generate text
- Does not understand the context or meaning of sentences
- Does not predict next words
- Has no understanding of grammar or sentence structure

VADER is less complex than modern machine learning models, but still useful because of its:
- Transparency (you can see exactly how it arrives at its scores)
- Speed
- No need for training
- Good performance on social media text

Good to know: although VADER is part of NLTK, it started as a standalone project that was later integrated into the NLTK library.

(source: [claude.ai](https://claude.ai))
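
How a lexicon plus a few rules turns into a score can be sketched in a few lines. Everything in this toy version is an assumption made for illustration: the mini-lexicon, the weights and the function name are invented, and the final normalization `x / sqrt(x² + alpha)` with `alpha = 15` only mirrors my understanding of how VADER squashes its summed valences into the compound score. It is not the VADER implementation.

```python
import math

# Hypothetical mini-lexicon and weights, for illustration only
LEXICON = {"good": 1.9, "bad": -2.5, "gains": 1.4, "slide": -1.2}
BOOSTERS = {"very": 0.3, "slightly": -0.3}   # intensifiers and diminishers
NEGATIONS = {"not", "no", "never"}
CAPS_BONUS = 0.7          # extra emphasis for ALL-CAPS words
NEGATION_FACTOR = -0.74   # flips and dampens a negated word

def toy_compound(text, alpha=15.0):
    """Sum rule-adjusted word valences and squash the sum into (-1, 1)."""
    words = text.split()
    total = 0.0
    for i, word in enumerate(words):
        valence = LEXICON.get(word.lower().strip(".,!?"))
        if valence is None:
            continue  # words outside the lexicon carry no sentiment
        direction = 1.0 if valence > 0 else -1.0
        if word.isupper() and len(word) > 1:
            valence += direction * CAPS_BONUS
        previous = words[i - 1].lower() if i > 0 else ""
        if previous in BOOSTERS:
            valence += direction * BOOSTERS[previous]
        if previous in NEGATIONS:
            valence *= NEGATION_FACTOR
        total += valence
    return total / math.sqrt(total * total + alpha)

for example in ["good", "GOOD", "very good", "slightly good", "not bad", "Nasdaq up"]:
    print(f"{example!r}: {toy_compound(example):+.4f}")
```

A headline without any lexicon word scores exactly 0, which is why a lexicon-based tool can return 0.0000 for market headlines whose vocabulary is missing from its lexicon.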

In [17]:
# Model : VADER (VA) - nltk

sia = SentimentIntensityAnalyzer()

table_data = []
headline_len = 70

# Analyze sentiment and prepare table data in one loop
for headline in headlines:
    VA_sentiment = sia.polarity_scores(headline)  # Get sentiment scores
    VA_score = VA_sentiment['compound']  # Extract compound sentiment
    table_data.append([headline[:headline_len], VA_score])  # Add to table data

# Calculate the average VA_score
average_VA_score = sum([row[1] for row in table_data]) / len(table_data) if table_data else 0
table_data.append(["Average VA Score", average_VA_score])

# Create and print the table
headers = ["Headline", "VA Score"]
table = tabulate(table_data, headers=headers, tablefmt="grid", numalign="right", stralign="left", floatfmt=".4f")
print(table)

+------------------------------------------------------------------+------------+
| Headline                                                         |   VA Score |
| Gold Pares Gains as Inflation Affirms Fed’s Cautious Easing Path |     0.4588 |
+------------------------------------------------------------------+------------+
| Nasdaq, S&P 500 fall ahead of this morning's October PCE data    |     0.0000 |
+------------------------------------------------------------------+------------+
| Nasdaq up, Dow dips as market reacts to Trump's new tariffs      |     0.0000 |
+------------------------------------------------------------------+------------+
| Should the US Treasury be buying bitcoin?                        |     0.2023 |
+------------------------------------------------------------------+------------+
| Dow leads stock gains as market digests Nvidia, bitcoin moves    |     0.3400 |
+------------------------------------------------------------------+------------+
| There's fixed 
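
To relate sentiment to daily prices, the headline scores first have to be reduced to one value per day. The sketch below averages the VADER scores per publication date, using the first five headlines: the dates come from the news listing and the scores from the table above. Taking the mean is an assumed choice; a median or a count-weighted score would be equally possible.

```python
from collections import defaultdict
from statistics import mean

# (publication date, VADER compound score) for the first five headlines
scored = [
    ("27-11-2024", 0.4588),
    ("27-11-2024", 0.0000),
    ("26-11-2024", 0.0000),
    ("26-11-2024", 0.2023),
    ("22-11-2024", 0.3400),
]

# Group the scores by day, then average each group
by_day = defaultdict(list)
for day, score in scored:
    by_day[day].append(score)

daily_sentiment = {day: mean(scores) for day, scores in by_day.items()}

for day, value in daily_sentiment.items():
    print(f"{day}: {value:.4f}")
```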

## TextBlob library

TextBlob is a Python library for processing text data. The main points:

1. Origin:
- Created by [Steven Loria](https://github.com/sloria)
- First release in 2013
- Built on top of NLTK (Natural Language Toolkit)

2. Purpose:
- Simplify common NLP tasks
- Make NLTK's functionality more accessible
- Offer an intuitive interface for text processing

3. Main functionality includes:
- Part-of-speech tagging
- Word and sentence tokenization
- Word lemmatization
- Sentiment analysis

Unlike VADER, TextBlob is a general text-processing framework rather than a specialized tool for a single task. It offers an accessible way to perform various NLP tasks without having to dig into the complexity of NLTK.

(source: [claude.ai](https://claude.ai))
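
TextBlob reports sentiment as a polarity between -1 and 1. To compare it with label-based models, the number can be turned into a label. The sketch below does that with a symmetric threshold. The threshold of 0.05 is an assumption borrowed from the convention commonly used for VADER's compound score, not a TextBlob default, and `label_from_score` is a name introduced here.

```python
def label_from_score(score, threshold=0.05):
    """Turn a polarity in [-1, 1] into 'positive', 'neutral' or 'negative'."""
    if not -1.0 <= score <= 1.0:
        raise ValueError("score must lie in [-1, 1]")
    if score >= threshold:
        return "positive"
    if score <= -threshold:
        return "negative"
    return "neutral"

for score in [0.1364, 0.0, -0.5]:
    print(score, "->", label_from_score(score))
```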

In [18]:
# Model TextBlob (TB)

table_data = []
headline_len = 70

# Analyze individual sentiment and store polarity as the compound sentiment score
for headline in headlines:
    blob = TextBlob(headline)
    TB_score = blob.sentiment.polarity  # Compound-like score from TextBlob
    table_data.append([headline[:headline_len], TB_score])

# Calculate the average TB_score
TB_average_score = sum([float(row[1]) for row in table_data]) / len(table_data) if table_data else 0
table_data.append(["Average TB Score", TB_average_score])

headers = ["Headline", "TB Score"]

print(tabulate(table_data, headers=headers, tablefmt="grid", numalign="right", stralign="left", floatfmt=".4f"))

+------------------------------------------------------------------+------------+
| Headline                                                         |   TB Score |
| Gold Pares Gains as Inflation Affirms Fed’s Cautious Easing Path |     0.0000 |
+------------------------------------------------------------------+------------+
| Nasdaq, S&P 500 fall ahead of this morning's October PCE data    |     0.0000 |
+------------------------------------------------------------------+------------+
| Nasdaq up, Dow dips as market reacts to Trump's new tariffs      |     0.1364 |
+------------------------------------------------------------------+------------+
| Should the US Treasury be buying bitcoin?                        |     0.0000 |
+------------------------------------------------------------------+------------+
| Dow leads stock gains as market digests Nvidia, bitcoin moves    |     0.0000 |
+------------------------------------------------------------------+------------+
| There's fixed 

## Hugging Face

The Model Hub is part of Hugging Face.

1. Model Hub
- Open-source platform for sharing and using ML models
- Comparable to GitHub, but specifically for AI/ML models

2. Commonly used sentiment analysis models:
- distilbert-base-uncased-finetuned-sst-2-english: fine-tuned for English-language sentiment analysis
- nlptown/bert-base-multilingual-uncased-sentiment: returns ratings of 1 to 5 stars
- ProsusAI/finbert: specifically for financial texts

Unlike VADER and TextBlob, the Hugging Face models are not based on rules but on neural networks (transformers).

(source: [claude.ai](https://claude.ai))
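
A sentiment pipeline returns a label and a confidence per text rather than a signed number, so a conversion is needed before the result can be compared with VADER or TextBlob. The sketch below shows that conversion without loading a model. The `results` list is written out by hand in the shape the pipeline returns, with values corresponding to the first three rows of the HF score table in this notebook; `signed_score` is a name introduced here.

```python
def signed_score(result):
    """Convert {'label': ..., 'score': ...} into a score in [-1, 1]."""
    label, score = result["label"], result["score"]
    if label == "POSITIVE":
        return score
    if label == "NEGATIVE":
        return -score
    raise ValueError(f"unexpected label: {label!r}")

# Hand-written results in the shape returned by the sentiment pipeline
results = [
    {"label": "POSITIVE", "score": 0.9066},
    {"label": "NEGATIVE", "score": 0.9923},
    {"label": "NEGATIVE", "score": 0.9993},
]

scores = [signed_score(r) for r in results]
mean_score = sum(scores) / len(scores)   # average over all headlines
print(scores, f"mean = {mean_score:.4f}")
```

Because this model only knows the labels POSITIVE and NEGATIVE, a neutral headline is still pushed to one side, often with a confidence close to 1.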

In [28]:
# Model Hugging Face (HF)

table_data = []
headline_len = 70
signed_scores = []  # one signed score per headline

sentiment_analyzer = pipeline("sentiment-analysis",
                             model="distilbert-base-uncased-finetuned-sst-2-english"
)

for headline in headlines:
    result = sentiment_analyzer(headline)[0]
    label = result['label']
    score = result['score']
    
    # The model returns a label plus a confidence; make the score negative for "NEGATIVE" labels
    if label == "NEGATIVE":
        score = -score
    signed_scores.append(score)
        
    table_data.append([headline[:headline_len], score])

# Average over all headlines, so the result is comparable with the VADER and TextBlob averages
# (adding the positive and negative class averages would not give the overall mean)
average_score = sum(signed_scores) / len(signed_scores) if signed_scores else 0

table_data.append(["Average HF Scores", average_score])

headers = ["Headline", "HF Score"]
print(tabulate(table_data, headers=headers, tablefmt="grid", numalign="right", stralign="left", floatfmt=".4f"))


+------------------------------------------------------------------+------------+
| Headline                                                         |   HF Score |
| Gold Pares Gains as Inflation Affirms Fed’s Cautious Easing Path |     0.9066 |
+------------------------------------------------------------------+------------+
| Nasdaq, S&P 500 fall ahead of this morning's October PCE data    |    -0.9923 |
+------------------------------------------------------------------+------------+
| Nasdaq up, Dow dips as market reacts to Trump's new tariffs      |    -0.9993 |
+------------------------------------------------------------------+------------+
| Should the US Treasury be buying bitcoin?                        |    -0.9993 |
+------------------------------------------------------------------+------------+
| Dow leads stock gains as market digests Nvidia, bitcoin moves    |     0.9776 |
+------------------------------------------------------------------+------------+
| There's fixed 

In [29]:
# Model VADER (VA)
# Model TextBlob (TB)
# Model Hugging Face (HF)

table_data = []
headline_len = 70
VA_scores = []
TB_scores = []
HF_scores = []

# Use all three models and get sentiment scores from the headlines
for headline in headlines:

    VA_sentiment = sia.polarity_scores(headline)  # Get sentiment scores
    VA_score = VA_sentiment['compound']
    VA_scores.append(VA_score)
    
    TB_blob = TextBlob(headline)
    TB_score = TB_blob.sentiment.polarity  # Compound-like score from TextBlob
    TB_scores.append(TB_score)
    
    HF_result = sentiment_analyzer(headline)[0]
    HF_label = HF_result['label']
    HF_score = HF_result['score'] 
    
    if HF_label == "NEGATIVE":
        HF_score = -HF_score  # Make the score negative for "NEGATIVE" labels
    HF_scores.append(HF_score)
        
    table_data.append([headline[:headline_len], VA_score, TB_score, HF_score])

# Calculate the average score per model over all headlines
VA_average_score = sum(VA_scores) / len(VA_scores) if VA_scores else 0
TB_average_score = sum(TB_scores) / len(TB_scores) if TB_scores else 0
HF_average_score = sum(HF_scores) / len(HF_scores) if HF_scores else 0

table_data.append(["Average Scores", VA_average_score, TB_average_score, HF_average_score])

headers = ["Headline", "VA Score", "TB Score", "HF Score"]
print(tabulate(table_data, headers=headers, tablefmt="grid", numalign="right", stralign="left", floatfmt=".4f"))

+------------------------------------------------------------------+------------+------------+------------+
| Headline                                                         |   VA Score |   TB Score |   HF Score |
| Gold Pares Gains as Inflation Affirms Fed’s Cautious Easing Path |     0.4588 |     0.0000 |     0.9066 |
+------------------------------------------------------------------+------------+------------+------------+
| Nasdaq, S&P 500 fall ahead of this morning's October PCE data    |     0.0000 |     0.0000 |    -0.9923 |
+------------------------------------------------------------------+------------+------------+------------+
| Nasdaq up, Dow dips as market reacts to Trump's new tariffs      |     0.0000 |     0.1364 |    -0.9993 |
+------------------------------------------------------------------+------------+------------+------------+
| Should the US Treasury be buying bitcoin?                        |     0.2023 |     0.0000 |    -0.9993 |
+---------------------------
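
One simple way to compare the three models is to check how often they agree on the direction of the sentiment. The sketch below does this for the four complete rows of the table above; it only looks at the sign of each score (positive, zero or negative), which is an assumed and fairly coarse measure of agreement.

```python
from itertools import combinations

def sign(x):
    """Return 1, 0 or -1 depending on the sign of x."""
    return (x > 0) - (x < 0)

# (VA, TB, HF) scores of the first four headlines in the table above
rows = [
    (0.4588, 0.0000, 0.9066),
    (0.0000, 0.0000, -0.9923),
    (0.0000, 0.1364, -0.9993),
    (0.2023, 0.0000, -0.9993),
]
models = ["VA", "TB", "HF"]

# Share of headlines on which each pair of models gives the same sign
agreement = {}
for i, j in combinations(range(len(models)), 2):
    same = sum(sign(row[i]) == sign(row[j]) for row in rows)
    agreement[(models[i], models[j])] = same / len(rows)

for pair, share in agreement.items():
    print(pair, f"{share:.2f}")
```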

## Citing

If you publish work that uses NLTK, please cite the NLTK book, as follows:

    Bird, Steven, Edward Loper and Ewan Klein (2009).
    Natural Language Processing with Python.  O'Reilly Media Inc.