# News Droid

This is a notebook to experiment with different methods of getting news sentiment for a given coin.

It is associated with a [blog post on ProfitView](https://profitviews.net/blog/what-i-learned-when-building-an-ai-news-trading-bot). You can sign up there to run a bot that trades using news sentiment:
<div class="button" style="margin: 3rem auto; font-size: 1.1rem">
	<a href="https://profitview.net/register">
	  Sign up for ProfitView
	</a>
</div>

In [None]:
import json
import requests

In [37]:
coin = "Bitcoin"

# The [GDELT](https://www.gdeltproject.org/) Project 

Unfortunately, GDELT rate limited every request made from this notebook. No query went through, not even a small one.

In [None]:
# GDELT API endpoint for news
base_url = "https://api.gdeltproject.org/api/v2/doc/doc"

# Parameters for GDELT query
params = \
    {   'query': coin
    ,   'format': 'json'
    ,   'maxrecords': 5  # GDELT allows up to 250 records
    ,   'timespan': '15m'
    ,   'sort': 'DateDesc'  # Sort by date descending
    }

# The User-Agent is an HTTP header, so it is passed separately from the query parameters
headers = {'User-Agent': 'news bot 0.1'}

gdelt_data = None
# Make the request
response = requests.get(base_url, params=params, headers=headers)
if response.status_code == 200:
    try:
        gdelt_data = response.json()
        # Print formatted JSON
        print(json.dumps(gdelt_data, indent=2))
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        print("Error decoding JSON. Response might not be in JSON format.")
        print("Response text:", response.text)
else:
    print(f"Request failed with status code {response.status_code}")
    print("Response text:", response.text)
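
One way to cope with rate limiting is to retry with exponential backoff. The sketch below assumes GDELT signals rate limiting with HTTP status 429, which this notebook does not confirm, and `fetch_with_backoff` is a hypothetical helper, not part of `requests`. Since no query went through here at all, backoff may not be enough on its own.

```python
import random
import time


def fetch_with_backoff(fetch, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() until it returns a non-429 response or attempts run out.

    fetch is any zero-argument callable returning an object with a
    .status_code attribute, e.g. lambda: requests.get(url, params=params).
    """
    response = None
    for attempt in range(max_attempts):
        response = fetch()
        # Assumption: 429 means "rate limited"; anything else is returned as-is
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Wait base_delay * 2**attempt seconds plus up to 10% random jitter
            delay = base_delay * (2 ** attempt)
            sleep(delay + random.uniform(0, delay * 0.1))
    # Still rate limited after max_attempts tries; hand back the last response
    return response
```

With the cell above, this could be called as `fetch_with_backoff(lambda: requests.get(base_url, params=params))`.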


# [News API](https://newsapi.org/)

This works well, but is delayed 24 hours unless on a paid plan, which is $449/month.

In [None]:
from newsapi import NewsApiClient
from datetime import datetime, timedelta

from dotenv import load_dotenv
import os
load_dotenv()  # Store the News API key in your .env file

In [31]:
newsapi = NewsApiClient(api_key=os.getenv("NEWS_API_KEY"))
newsapi_articles = newsapi.get_everything(q='bitcoin',
                                      from_param=datetime.now() - timedelta(days=2),  # On the free plan, data is delayed 24 hours and only 1 day of data is available
                                      sort_by='popularity')
newsapi_headlines = [article['title'] for article in newsapi_articles["articles"]]


# [Google News RSS](https://news.google.com/)

This is free, but the headlines are likely to be of lower quality than those from the other sources.


In [34]:
import feedparser

In [None]:
feed_url = f"https://news.google.com/rss/search?q={coin}&tbs=qdr:h"  # News focus is on the last hour (though older news is promoted if it's popular)

feed = feedparser.parse(feed_url)
headlines = [{
        "title": entry.title,
        "published": entry.published
    } for entry in feed.entries]
    
print(headlines)
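
The feed URL above works for a single-word coin such as "Bitcoin", but a name containing spaces or `&` would need URL-encoding. A minimal sketch using the standard library follows; `build_feed_url` is a hypothetical helper name, and the `tbs=qdr:h` parameter is simply carried over from the cell above.

```python
from urllib.parse import urlencode


def build_feed_url(coin, window="qdr:h"):
    """Build a Google News RSS search URL with the query safely encoded."""
    # safe=":" keeps the colon in "qdr:h" readable; spaces become "+"
    query = urlencode({"q": coin, "tbs": window}, safe=":")
    return f"https://news.google.com/rss/search?{query}"


print(build_feed_url("Bitcoin Cash"))
```

The result can be passed straight to `feedparser.parse`.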

In [None]:
google_feed_query = f"""Assess this set of news headlines as it pertains to cryptocurrency {coin}. Provide a single floating point number between -1.0 and 1.0, 
			            with -1.0 signifying extreme negativity, 0.0 neutrality and 1.0 extreme positivity.
			            Provide only the number and no other text: 
                        
                        """

google_feed_headlines = "".join([f"Published: {headline['published']}. Headline: {headline['title']}\n" for headline in headlines])

google_feed_query += google_feed_headlines

print(google_feed_query)

In [None]:
from dotenv import load_dotenv
import os
load_dotenv()  # Store the API key in your .env file

GPT_API_KEY = os.getenv("GPT_API_KEY") 

In [None]:
import openai

client = openai.OpenAI(api_key=GPT_API_KEY)

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "system", "content": "You are a cryptocurrency trading expert."}, {"role": "user", "content": google_feed_query}]
)
response.choices[0].message.content
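
The model returns its answer as text, so it has to be converted before it can be used as a number. A minimal sketch, assuming the reply mostly follows the "only the number" instruction: `parse_sentiment` is a hypothetical helper that takes the first number it finds and clamps it to the range requested in the prompt.

```python
import re


def parse_sentiment(reply):
    """Extract the first number from a model reply and clamp it to [-1.0, 1.0].

    Returns None when the reply contains no number.
    """
    match = re.search(r"[-+]?\d*\.?\d+", reply)
    if match is None:
        return None
    # Clamp in case the model strays outside the requested range
    return max(-1.0, min(1.0, float(match.group())))
```

For example, `parse_sentiment(response.choices[0].message.content)` gives a float, or `None` if the model replied with prose only. A reply that contains other numbers before the score would be misread, so this is not a substitute for checking the output.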


# [VADER Sentiment Analysis](https://github.com/cjhutto/vaderSentiment)

Rather than using an LLM, we can use VADER to get a sentiment score. VADER (Valence Aware Dictionary and sEntiment Reasoner) is a rule-based sentiment analysis tool designed specifically for short pieces of text such as social media posts, headlines, and reviews. It combines a lexicon of sentiment-laden words with rules that account for punctuation, capitalization, modifiers, and negations.

In [None]:
import nltk

In [None]:
nltk.download('vader_lexicon')

from nltk.sentiment.vader import SentimentIntensityAnalyzer

sid = SentimentIntensityAnalyzer()

# Pair each headline's title with its VADER compound score (-1.0 to 1.0)
vader_title_scores = [(headline['title'], sid.polarity_scores(headline['title'])['compound']) for headline in headlines]
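
The compound score is easier to read as a label. The VADER README suggests treating scores of 0.05 or more as positive, -0.05 or less as negative, and anything in between as neutral. A minimal sketch of that convention follows; `label_compound` is a hypothetical helper, not part of `nltk`.

```python
def label_compound(compound, threshold=0.05):
    """Turn a VADER compound score into a positive/neutral/negative label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"
```

Applied to the list above: `[(title, label_compound(score)) for title, score in vader_title_scores]`.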

# GPT-4o-mini Per Headline

To compare it with the other methods, we need to apply GPT-4o-mini to each headline individually.

In [None]:
gpt_title_scores = [(article['title'], client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "system", "content": "You are a cryptocurrency trading expert."}, 
              {"role": "user", "content": 
               f"""Assess this news headline as it pertains to cryptocurrency {coin}: 
               {article['title']}. Provide a float between -1 and 1, where -1 is extremely negative, 0 is neutral, and 1 is extremely positive. Provide only the float, no other text."""}]
).choices[0].message.content) for article in headlines]


# [TextBlob](https://textblob.readthedocs.io/en/dev/)

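Like VADER, TextBlob's default analyzer is lexicon-based: it averages the polarity of the words it recognises, adjusting for negations and intensifiers, and returns a polarity between -1.0 and 1.0. TextBlob also ships an optional Naive Bayes classifier, which is not used here.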

In [None]:
from textblob import TextBlob

sentiment_scores = [TextBlob(headline['title']).sentiment.polarity for headline in headlines]

# Calculate overall sentiment
avg_sentiment = sum(sentiment_scores) / len(sentiment_scores)


# Compare the scores

It is instructive to compare the scores. VADER and TextBlob use different methods to get a sentiment score, but neither can take context into account. GPT-4o-mini should be the most accurate (because it can use context), but it is also the most resource-intensive and therefore the slowest.

In [None]:
import pandas as pd

In [None]:
# Create separate DataFrames for each sentiment score
vader_df = pd.DataFrame(vader_title_scores, columns=['title', 'vader_score'])
textblob_df = pd.DataFrame({
    'title': [headline['title'] for headline in headlines],
    'textblob_score': sentiment_scores
})
gpt_df = pd.DataFrame(gpt_title_scores, columns=['title', 'gpt_score'])

# Merge DataFrames on title
compare_title_scores = vader_df.merge(textblob_df, on='title').merge(gpt_df, on='title')

# Style and display
styled_df = compare_title_scores.style.format({
    'vader_score': '{:.3f}',
    'textblob_score': '{:.3f}',
    'gpt_score': '{}'
}).set_properties(**{
    'text-align': 'left',
    'white-space': 'pre-wrap',
    'max-width': '500px'
}).set_table_styles([
    {'selector': 'td', 'props': [('max-width', '500px'), ('white-space', 'pre-wrap')]},
    {'selector': 'th', 'props': [('max-width', '500px'), ('white-space', 'pre-wrap')]}
])

display(styled_df)

In [None]:
# Create separate DataFrames for each sentiment score
vader_df = pd.DataFrame(vader_title_scores, columns=['title', 'vader_score'])
gpt_df = pd.DataFrame(gpt_title_scores, columns=['title', 'gpt_score'])
textblob_df = pd.DataFrame({
    'title': [headline['title'] for headline in headlines],
    'textblob_score': sentiment_scores
})

# Merge DataFrames on title
compare_title_scores = vader_df.merge(gpt_df, on='title').merge(textblob_df, on='title')

# GPT returns text; convert it to numbers so it can be formatted like the other scores
compare_title_scores['gpt_score'] = pd.to_numeric(compare_title_scores['gpt_score'], errors='coerce')

# Style and display
styled_df = compare_title_scores.style.format({
    'vader_score': '{:.3f}',
    'textblob_score': '{:.3f}',
    'gpt_score': '{:.3f}',
}).set_properties(**{
    'text-align': 'left',
    'white-space': 'pre-wrap',
    'max-width': '500px'
}).set_table_styles([
    {'selector': 'td', 'props': [('max-width', '500px'), ('white-space', 'pre-wrap')]},
    {'selector': 'th', 'props': [('max-width', '500px'), ('white-space', 'pre-wrap')]}
])

display(styled_df)
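
Beyond eyeballing the table, one rough way to compare two scorers is to count how often they agree on the direction of the sentiment. This is a sketch only: `sign` and `agreement_rate` are hypothetical helpers, and the 0.05 dead zone around zero is an assumption borrowed from VADER's suggested thresholds.

```python
def sign(score, dead_zone=0.05):
    """Map a score to -1, 0 or 1, treating values within dead_zone of zero as neutral."""
    if score > dead_zone:
        return 1
    if score < -dead_zone:
        return -1
    return 0


def agreement_rate(scores_a, scores_b, dead_zone=0.05):
    """Fraction of headlines on which two scorers give the same sign."""
    if len(scores_a) != len(scores_b):
        raise ValueError("score lists must have the same length")
    if not scores_a:
        return None
    matches = sum(
        sign(a, dead_zone) == sign(b, dead_zone)
        for a, b in zip(scores_a, scores_b)
    )
    return matches / len(scores_a)
```

For example, `agreement_rate(compare_title_scores['vader_score'].tolist(), compare_title_scores['textblob_score'].tolist())`. The GPT scores need to be numeric before they can be compared this way.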