The objective of this project is to analyze whether **daily news sentiment**, derived from Nvidia-related headlines, has any noticeable effect on the company’s **stock price**. By combining unstructured text data (news) with structured financial data (stock prices), we aim to uncover correlations that might hint at a relationship between what’s in the headlines and what’s happening in the markets.

For this project, we will use two primary datasets:

* **Stock Prices from Yahoo Finance** — Using the `yfinance` library, we will obtain daily **closing prices** for Nvidia within our time range.
* **News Headlines from NewsAPI** — We will use the keyword “Nvidia” to extract English-language headlines related to Nvidia.

In [None]:
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta

In [None]:
start_date = '2025-01-01'
end_date = datetime.now().strftime('%Y-%m-%d')  # This will give us the present date

In [None]:
stock_data = yf.Ticker('NVDA').history(start=start_date, end=end_date)
stock_data.reset_index(inplace=True)
stock_data.head()

In [None]:
from dotenv import load_dotenv
import os

In [None]:
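load_dotenv()  # Load variables from a local .env file into the environment
api_key = os.getenv("API_KEY")
if not api_key:
    raise RuntimeError("API_KEY is not set; add it to your .env file")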

In [None]:
import requests

In [None]:
url = 'https://newsapi.org/v2/everything'
params = {
    'q': 'Nvidia', # We will use the keyword “Nvidia” to extract headlines related to Nvidia
    'from': (datetime.now() - timedelta(days=30)).strftime('%Y-%m-%d'), # We will fetch news headlines in the past month
    'sortBy': 'relevancy',
    'apiKey': api_key,
    'pageSize': 100,
    'language': 'en' # We will extract English-language headlines
}

response = requests.get(url, params=params, timeout=30)  # Send the request; give up after 30 seconds
data = response.json()

if data['status'] != 'ok':
    raise RuntimeError(f"NewsAPI Error: {data['message']}")

articles = data['articles']
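
The request cell above assumes that every article carries both a title and a publication timestamp. A minimal sketch of a more defensive parser is shown here. `extract_headlines` is a hypothetical helper (it is not part of NewsAPI or `requests`), and the sample payload is made up; it only mirrors the fields this notebook already reads (`status`, `message`, `articles`, `publishedAt`, `title`).

```python
def extract_headlines(payload):
    """Return (publishedAt, title) pairs from a NewsAPI-style payload dict.

    Hypothetical helper for illustration; it relies only on the fields
    this notebook already uses.
    """
    if payload.get('status') != 'ok':
        # Fall back to a generic message if the error payload has none.
        raise RuntimeError(f"NewsAPI Error: {payload.get('message', 'unknown error')}")

    rows = []
    for article in payload.get('articles', []):
        title = article.get('title')
        published = article.get('publishedAt')
        if title and published:  # Skip entries with missing fields
            rows.append((published, title))
    return rows


# Made-up payload, shaped like the response handled above.
sample = {
    'status': 'ok',
    'articles': [
        {'publishedAt': '2025-03-01T12:00:00Z', 'title': 'Nvidia unveils new chip'},
        {'publishedAt': '2025-03-01T15:30:00Z', 'title': None},
    ],
}
print(extract_headlines(sample))
```

Dropping incomplete entries at this stage keeps `None` titles from reaching the text-cleaning step later on.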

In [None]:
news_data = pd.DataFrame(articles)
news_data = news_data[['publishedAt', 'title']]
news_data.columns = ['date', 'headline']
news_data.head()

To make the headlines suitable for sentiment analysis, we will apply the following cleaning steps:

* **Keep Only Alphabetic Tokens** – Numbers, punctuation, and special characters contribute little to word-level sentiment, so we drop them.
* **Remove Stop Words** – Stop words are common words such as "the", "is", and "and" that add little meaning, so we remove them from the headlines.

In [None]:
import nltk
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

In [None]:
nltk.download('punkt_tab') # This downloads a pretrained tokenizer that helps split text into sentences or words
nltk.download('stopwords') # This downloads the list of stopwords in various languages.
stop_words = set(stopwords.words('english'))

In [None]:
def preprocess_text(text):
    words = word_tokenize(text)
    words = [word for word in words if word.isalpha()]  # Keep only purely alphabetic tokens
    words = [word for word in words if word.lower() not in stop_words]  # Drop the stop words
    return ' '.join(words)
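
To see what these two filters do to a single headline without downloading any NLTK data, the following sketch approximates them with a regular expression. `simple_clean` and `DEMO_STOP_WORDS` are hypothetical names, the stop-word set is a tiny stand-in for NLTK's list, and the regex splits contractions differently from `word_tokenize`, so treat it as an illustration rather than a drop-in replacement.

```python
import re

# Tiny stand-in stop-word set for illustration only; the notebook uses
# NLTK's much larger English list.
DEMO_STOP_WORDS = {'the', 'is', 'and', 'a', 'of', 'to', 'in'}


def simple_clean(text, stop_words=DEMO_STOP_WORDS):
    """Keep alphabetic runs of characters that are not stop words."""
    tokens = re.findall(r"[A-Za-z]+", text)
    return ' '.join(t for t in tokens if t.lower() not in stop_words)


# Made-up headline for illustration.
cleaned = simple_clean("Nvidia shares jump 5% after the earnings report!")
print(cleaned)  # Nvidia shares jump after earnings report
```

One point worth checking when using the full NLTK list: it includes negations such as "not" and "no", so removing stop words can change how a negated headline is scored.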

In [None]:
# Processing news headlines
news_data['cleaned_headline'] = news_data['headline'].apply(preprocess_text)
news_data.head()

Next, we assign every headline a sentiment score using VADER’s `compound` score, which ranges from $-1$ (most negative) to $+1$ (most positive). A score falls into one of three cases:

* $\text{sentiment score} \gt 0$ means the headline is positive in tone, which we treat as favourable to Nvidia.
* $\text{sentiment score} = 0$ means the headline is purely neutral.
* $\text{sentiment score} \lt 0$ means the headline is negative in tone, which we treat as unfavourable to Nvidia.
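
The three cases above can be sketched as a small labelling function. `label_sentiment` is a hypothetical helper and the example scores are made up; it simply applies the sign convention from the list.

```python
def label_sentiment(score):
    """Map a compound score to a label using its sign."""
    if score > 0:
        return 'positive'
    if score < 0:
        return 'negative'
    return 'neutral'


# Made-up scores for illustration.
for s in (0.64, 0.0, -0.42):
    print(s, label_sentiment(s))
```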

In [None]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

In [None]:
def get_sentiment_score(text):
    score = analyzer.polarity_scores(text)  # Returns the 'neg', 'neu', 'pos', and 'compound' scores
    return score['compound']

In [None]:
news_data['sentiment_score'] = news_data['cleaned_headline'].apply(get_sentiment_score)
news_data.head()

In [None]:
# This ensures both dates are in the same format (datetime.date) because we are going to merge these dataframes.
news_data['date'] = pd.to_datetime(news_data['date']).dt.date
stock_data['Date'] = pd.to_datetime(stock_data['Date']).dt.date

In [None]:
# There are usually several headlines per day, so we aggregate them into a single daily sentiment score by summing
aggregated_sentiment = news_data.groupby('date')['sentiment_score'].sum().reset_index()
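
Summation makes the daily score grow with the number of headlines published that day. The sketch below, which uses made-up scores and a hypothetical `toy_news` frame, puts the sum next to the mean and the headline count so the difference is visible. It is a comparison aid under those assumptions, not a change to the method used in this notebook.

```python
import pandas as pd

# Made-up scores purely for illustration.
toy_news = pd.DataFrame({
    'date': ['2025-03-01', '2025-03-01', '2025-03-01', '2025-03-02'],
    'sentiment_score': [0.2, 0.2, 0.2, 0.5],
})

# Named aggregation: one output column per statistic.
daily = (
    toy_news.groupby('date')['sentiment_score']
    .agg(total='sum', average='mean', n_headlines='count')
    .reset_index()
)
print(daily)
```

Here the first day has the larger total only because it has three mildly positive headlines, while the second day has the higher average.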

In [None]:
# This is where we combine our two dataframes into one final dataframe.
combined_data = pd.merge(stock_data, aggregated_sentiment, left_on='Date', right_on='date', how='inner')
combined_data.head()
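
Since the stated aim is to look for correlations, one way to put a number on the relationship is a Pearson correlation between the daily sentiment score and the daily return. The sketch below runs on a made-up `toy` frame with the same `Close` and `sentiment_score` column names as `combined_data`; the `return` column and the values are assumptions for illustration, not real NVDA data.

```python
import pandas as pd

# Made-up values purely for illustration; not real NVDA data.
toy = pd.DataFrame({
    'Close': [100.0, 102.0, 101.0, 105.0, 104.0],
    'sentiment_score': [0.5, -0.2, 0.8, -0.1, 0.3],
})

# Daily percentage change in the closing price (first row is NaN).
toy['return'] = toy['Close'].pct_change()

# Correlation between sentiment and the return on the same row.
same_day = toy['sentiment_score'].corr(toy['return'])

# Correlation between sentiment and the return on the following row.
next_day = toy['sentiment_score'].corr(toy['return'].shift(-1))

print(f"same-day: {same_day:.3f}, next-day: {next_day:.3f}")
```

Because the inner merge keeps only dates present in both dataframes, consecutive rows of `combined_data` are not guaranteed to be consecutive trading days, so a row-based shift is only an approximation of "next day". With about 30 days of headlines, any coefficient should be read as a rough indication.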

To analyze the relationship between sentiment and stock performance, we will use a dual-axis plot:

* **Nvidia Stock Price** – shown as a blue line chart.
* **Aggregated Sentiment Score** – shown as bars, green for scores of zero or above and red for negative scores.

In [None]:
import matplotlib.pyplot as plt

In [None]:
fig, ax1 = plt.subplots(figsize=(10, 5))

ax1.set_xlabel('Date')
ax1.set_ylabel('Nvidia Stock Price')
ax1.plot(combined_data['Date'], combined_data['Close'], label='Nvidia Stock Price')

ax2 = ax1.twinx()
ax2.set_ylabel('Sentiment Score')

colors = ['green' if val >= 0 else 'red' for val in combined_data['sentiment_score']]
ax2.bar(combined_data['Date'], combined_data['sentiment_score'], label='Aggregated Sentiment Score', color=colors, alpha=0.6)

ax1.set_title('Nvidia Stock Price vs Aggregated Sentiment Score')
fig.tight_layout()  # Called after the title is set so the layout leaves room for it
fig.legend(loc='upper left', bbox_to_anchor=(0.1, 0.9))
plt.show()

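This project shows how textual data can be combined with numerical financial data to look for signals in market behavior. The dual-axis plot is a visual comparison rather than a statistical test, so any apparent relationship should be treated as a hint, not as evidence of cause. While sentiment scores alone can’t predict prices, they can enhance our understanding of investor psychology and may provide early warnings of shifts in perception.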