## Generating sentiment scores with the FinBERT model
 - [FinBERT GitHub repository](https://github.com/ProsusAI/finBERT/tree/master)
 - [Hugging Face model card](https://huggingface.co/ProsusAI/finbert)

In [1]:
from transformers import AutoTokenizer, AutoModelForSequenceClassification
from torch import nn

import numpy as np
import pandas as pd



In [2]:
pd.set_option('display.max_colwidth', None)

In [3]:
tokenizer = AutoTokenizer.from_pretrained("ProsusAI/finbert")
model = AutoModelForSequenceClassification.from_pretrained("ProsusAI/finbert")



In [6]:
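def generate_sentiment_values(text: str) -> np.ndarray:
    # Truncate to the model's 512-token limit so long articles do not raise an error.
    inputs = tokenizer(text, return_tensors='pt', padding=True, truncation=True, max_length=512)
    logits = model(**inputs)[0]

    # Convert logits to softmax probabilities.
    probabilities = nn.functional.softmax(logits, dim=-1)

    # The output order follows model.config.id2label. The rest of this notebook assumes
    # (positive, negative, neutral); confirm that against the mapping before relying on it.
    return probabilities.detach().numpy().reshape((3,))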

In [7]:
test_string = 'The stocks are falling'
generate_sentiment_values(test_string)

array([0.08021459, 0.26293007, 0.65685534], dtype=float32)
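
The function above returns three bare probabilities, so it helps to attach class names before reading them. The sketch below is one way to do that. `label_probabilities` is a hypothetical helper, and the `example_id2label` dictionary is an assumed mapping used only for illustration; in the notebook the real mapping would come from `model.config.id2label`.

```python
import numpy as np


def label_probabilities(probabilities, id2label):
    """Pair each probability with its class name using an id -> label mapping."""
    probabilities = np.asarray(probabilities, dtype=float).reshape(-1)
    if len(probabilities) != len(id2label):
        raise ValueError("expected one probability per label")
    return {id2label[i]: float(p) for i, p in enumerate(probabilities)}


# Assumed mapping for illustration only; read the real one from model.config.id2label.
example_id2label = {0: "positive", 1: "negative", 2: "neutral"}

# Probabilities copied from the test string output above.
example_scores = label_probabilities(
    [0.08021459, 0.26293007, 0.65685534], example_id2label
)
print(example_scores)
```

If the mapping printed by `model.config.id2label` differs from the assumed one, the column names used later in this notebook would need to change accordingly.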

## FinBERT on NYTimes news


1. Get sentiment for each article

In [8]:
df = pd.read_csv('../data/tesla_gpt_summarised.csv')
#df = df[['timestamp','article_url','lead_paragraph','abstract','adjusted_date']]

In [10]:
df = df.loc[:, ['timestamp', 'gpt_summary']]

In [11]:
df

Unnamed: 0,timestamp,gpt_summary
0,2021-07-26 19:46:15+00:00,"The significant increase in Tesla's profit and revenue for the second quarter, driven by a substantial increase in car sales, is likely to have a positive impact on the company both in the short and long term. This impressive financial performance can enhance investor confidence, potentially leading to a rise in stock prices as shareholders respond positively to the company's growth trajectory. Furthermore, the news could bolster Tesla's position in the competitive electric vehicle market, attracting new customers and fortifying its brand reputation as a leading innovator in the automotive industry. As a result, this strong financial report not only underscores Tesla's operational success but may also set the stage for future investments and development initiatives, further solidifying its market dominance."
1,2021-08-06 07:00:10+00:00,"The Biden administration's commitment to increasing electric vehicle (EV) sales to 50 percent of new car purchases by 2030 is poised to significantly benefit Tesla, as it already dominates the all-electric vehicle market with a robust lineup. This regulatory push creates an environment favorable for established EV manufacturers and intensifies competition for traditional automakers still reliant on internal combustion engines. As consumer demand for EVs rises, Tesla is well-positioned to capture a larger market share and strengthen its brand as a leader in sustainable transportation. Conversely, automakers resisting the transition may face declining sales and increased pressure to pivot towards electric mobility, further solidifying Tesla’s competitive advantage."
2,2021-07-22 09:00:15+00:00,"The news highlights significant operational challenges Tesla faces in its German factory, indicating a potential delay in production and showcasing a mismatch between the company's culture and local practices. This situation could negatively impact Tesla’s reputation in Europe, especially as it strives to solidify its presence in a market increasingly focused on efficiency and quality. Additionally, the ongoing production hurdles may lead to missed financial targets and affect stock performance in the near term. Nevertheless, if Tesla can adapt and eventually streamline operations, it could capitalize on Germany’s engineering prowess long-term. Overall, this development poses immediate obstacles for Tesla while also reflecting the complexities of global expansion in the automotive industry."
3,2021-08-09 17:51:08+00:00,"This news piece may have mixed implications for Tesla. On one hand, it emphasizes the growing appeal of Tesla vehicles among consumers who prioritize features and brand prestige, even at a premium price. This could reinforce Tesla's position in the luxury electric vehicle market and drive further sales among affluent buyers. On the other hand, the mention of government incentives primarily benefiting wealthier individuals could alienate average families who may feel excluded from the electric vehicle market, potentially limiting Tesla's broader customer base. Overall, while the luxury segment may remain strong for Tesla, the company's long-term growth could be challenged if it fails to expand its appeal to a more diverse audience."
4,2021-07-05 20:30:28+00:00,"The lawsuit against Tesla regarding the tragic death of a teenager in a collision involving its Model 3 while on Autopilot could significantly impact the company, particularly in terms of public perception and regulatory scrutiny. Tesla is known for its innovative approach to autonomous driving technology, but high-profile incidents, especially those resulting in loss of life, raise concerns about the safety and reliability of its systems. This legal battle could lead to increased legal liabilities, a potential decline in consumer trust, and negative media coverage, which may affect sales and stock performance. Investors might become wary of Tesla's liability exposure and future profitability, while the company may be compelled to enhance its safety measures or alter its marketing of Autopilot capabilities, potentially hampering their ambitious plans for autonomous vehicles."
...,...,...
1360,2023-11-20 13:27:11+00:00,"The rapid reshaping of the artificial intelligence landscape due to Sam Altman's firing from OpenAI and his swift transition to Microsoft could have profound implications for Tesla, especially considering Elon Musk's past involvement with OpenAI and interest in AI technology for automotive applications. This upheaval may create opportunities for Tesla to evaluate partnerships or advancements in AI technology, potentially filling a void left by OpenAI's leadership changes. Additionally, if Microsoft's enhanced resources and focus on AI begin to outpace Tesla's own initiatives, it could pressure Tesla to accelerate its own AI developments in self-driving technology and other smart features to maintain competitiveness. Overall, the news induces a cautious atmosphere for Tesla as it may need to adapt quickly to shifts in AI leadership and innovation that directly impact the automotive sector."
1361,2023-10-17 12:11:23+00:00,"The news of President Biden's visit to Israel and the diplomatic efforts regarding the humanitarian crisis in Gaza coinciding with the Future Investment Initiative in Saudi Arabia could present both challenges and opportunities for Tesla. On one hand, the geopolitical tensions may lead to increased scrutiny of companies engaged with Saudi Arabia, and Tesla could face backlash or reputational risks if they choose to participate. On the other hand, if Tesla’s CEO, Elon Musk, attends and engages positively at the conference, it could strengthen Tesla's ties to investors and stakeholders in the region, potentially opening up new business opportunities and partnerships that could support the company's growth objectives. Overall, the uncertain political climate may prompt Tesla to weigh its involvement carefully to mitigate risks while also considering the potential benefits of enhanced regional engagement."
1362,2023-07-07 11:19:35+00:00,"The impending payrolls report could significantly impact Tesla by influencing investor sentiment and the overall economic outlook. If the report indicates stronger-than-expected job growth, it may lead to concerns about rising inflation, prompting the Federal Reserve to maintain or increase interest rates. Higher interest rates could dampen consumer spending and affect the affordability of electric vehicles, ultimately impacting Tesla's sales and stock performance. Conversely, a lower-than-expected job growth could ease inflation fears, potentially benefiting Tesla as it may support economic stability and consumer demand. Overall, the uncertainty surrounding the payrolls report adds a layer of volatility to Tesla's market environment, influencing investor expectations regarding the company's growth trajectory."
1363,2023-05-26 11:24:35+00:00,"The potential resolution of the debt ceiling crisis could positively affect Tesla by stabilizing investor sentiment and promoting overall market confidence. As the U.S. government moves closer to averting a default, there is likely to be a ripple effect on the stock market, leading to increased investment in growth sectors like electric vehicles, where Tesla operates. A favorable outcome may strengthen Tesla's stock price and encourage investor participation, particularly as the company continues to expand its production and sales. Moreover, a stable economic environment enhances consumer spending, which could bolster demand for Tesla’s vehicles, further benefiting the company’s growth prospects."


In [39]:
for i, row in df.iterrows():
    combined_str = f"""{str(row['lead_paragraph'])}\n{str(row['abstract'])}"""
    output = generate_sentiment_values(combined_str)
    df.at[i, 'pos_sentiment'] = output[0]
    df.at[i, 'neg_sentiment'] = output[1]
    df.at[i, 'neutral_sentiment'] = output[2]

    preamble = "Evaluate the following news on S&P price."
    combined_str = f"""<instructions>{preamble}</instructions> <news>{str(row['lead_paragraph'])}\n{str(row['abstract'])}</news>"""    
    output = generate_sentiment_values(combined_str)
    df.at[i, 'pos_sentiment_w_preamb'] = output[0]
    df.at[i, 'neg_sentiment_w_preamb'] = output[1]
    df.at[i, 'neutral_sentiment_w_preamb'] = output[2]
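
The loop above repeats the same three assignments for the plain text and the preamble variant. A minimal sketch of how that could be factored into one helper follows. `add_sentiment_columns` and `fake_scorer` are hypothetical names; `fake_scorer` stands in for `generate_sentiment_values` so the example runs without downloading the model, and the label order is the same assumed order as in the loop.

```python
import numpy as np
import pandas as pd

LABELS = ("pos", "neg", "neutral")  # assumed order, matching the loop above


def add_sentiment_columns(frame, text_columns, scorer, suffix="", preamble=None):
    """Return a copy of `frame` with one score column per label appended."""
    # Join the chosen text columns with a newline, as the loop above does.
    texts = frame[text_columns].astype(str).apply(lambda row: "\n".join(row), axis=1)
    if preamble is not None:
        texts = "<instructions>" + preamble + "</instructions> <news>" + texts + "</news>"

    # One row of scores per article, one column per label.
    scores = np.vstack([scorer(text) for text in texts])

    result = frame.copy()
    for column_index, label in enumerate(LABELS):
        result[f"{label}_sentiment{suffix}"] = scores[:, column_index]
    return result


def fake_scorer(text):
    # Stand-in for generate_sentiment_values; always returns the same probabilities.
    return np.array([0.2, 0.5, 0.3], dtype=np.float32)


# Toy data, not taken from the real CSV files.
demo = pd.DataFrame({"lead_paragraph": ["a", "b"], "abstract": ["c", "d"]})
scored = add_sentiment_columns(demo, ["lead_paragraph", "abstract"], fake_scorer)
print(scored)
```

Passing `suffix="_w_preamb"` together with a `preamble` string would produce the second set of columns from the same function.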

Save the scored articles to a temporary file first.

In [40]:
df

Unnamed: 0,timestamp,article_url,lead_paragraph,abstract,adjusted_date,pos_sentiment,neg_sentiment,neutral_sentiment,pos_sentiment_w_preamb,neg_sentiment_w_preamb,neutral_sentiment_w_preamb
0,2019-09-30 19:20:11+00:00,https://www.nytimes.com/2019/09/30/business/bu...,Cannes has been a hub for meetings and events ...,While the largest gatherings still go to the b...,2019-10-01,0.054560,0.027506,0.917934,0.049412,0.025419,0.925169
1,2019-09-30 18:44:40+00:00,https://www.nytimes.com/2019/09/30/business/ec...,Congress didn’t unconstitutionally penalize De...,A federal judge rejected a suit by four states...,2019-10-01,0.062097,0.808197,0.129706,0.059578,0.785785,0.154637
2,2019-09-30 15:01:19+00:00,https://www.nytimes.com/2019/09/30/business/we...,WeWork shelved its plans for an initial public...,"The company, which built an office-space behem...",2019-09-30,0.035901,0.860062,0.104038,0.016129,0.929829,0.054042
3,2019-09-30 15:00:08+00:00,https://www.nytimes.com/2019/09/30/business/ja...,"TOKYO — Yasuo Sugiuchi can’t avoid death, but ...","The increase, to 10 percent from 8 percent on ...",2019-09-30,0.225155,0.715339,0.059506,0.712138,0.140160,0.147703
4,2019-09-30 14:34:31+00:00,https://www.nytimes.com/2019/09/30/business/gr...,"Seamless, the food delivery service started tw...",Restaurant owners say Grubhub’s business model...,2019-09-30,0.009128,0.963881,0.026991,0.009450,0.956868,0.033682
...,...,...,...,...,...,...,...,...,...,...,...
12046,2024-09-27 14:10:49+00:00,https://www.nytimes.com/2024/09/27/business/de...,"For the past two decades, European banks have ...",Investors are cheering a possible tie-up betwe...,2024-09-27,0.054860,0.737702,0.207437,0.124981,0.225523,0.649496
12047,2024-09-27 12:43:10+00:00,https://www.nytimes.com/2024/09/27/business/ec...,"Inflation cooled in August, the latest sign of...",Inflation is slowing so much that some economi...,2024-09-27,0.555174,0.399458,0.045368,0.662625,0.272857,0.064518
12048,2024-09-27 12:15:14+00:00,https://www.nytimes.com/2024/09/27/business/de...,"Over the past 48 hours, the biggest spectacle ...",The criminal charges against the embattled may...,2024-09-27,0.027138,0.804501,0.168361,0.032231,0.662571,0.305199
12049,2024-09-27 09:02:56+00:00,https://www.nytimes.com/2024/09/27/technology/...,Hours before former President Donald J. Trump ...,Almost a third of 171 posts last week from the...,2024-09-27,0.018546,0.689842,0.291612,0.019478,0.705193,0.275329


In [41]:
df.to_csv('../data/nyt_snp_headlines_temp.csv', index=False)

2. Group by date

In [44]:
new_df = pd.read_csv('../data/nyt_snp_headlines_with_sentiment.csv')

In [45]:
agg_func = {
    'pos_sentiment': 'mean',
    'neg_sentiment': 'mean',
    'neutral_sentiment': 'mean',
    'pos_sentiment_w_preamb': 'mean',
    'neg_sentiment_w_preamb': 'mean',
    'neutral_sentiment_w_preamb': 'mean'
}
column_rename = {
    'pos_sentiment': 'mean_pos_sentiment',
    'neg_sentiment': 'mean_neg_sentiment',
    'neutral_sentiment': 'mean_neutral_sentiment',
    'pos_sentiment_w_preamb': 'mean_pos_preamble_sentiment',
    'neg_sentiment_w_preamb': 'mean_neg_preamble_sentiment',
    'neutral_sentiment_w_preamb': 'mean_neutral_preamble_sentiment'
}
grouped_by_date_df = new_df.groupby(by='adjusted_date').agg(agg_func).rename(columns=column_rename).reset_index()
grouped_by_date_df.tail()

Unnamed: 0,adjusted_date,mean_pos_sentiment,mean_neg_sentiment,mean_neutral_sentiment,mean_pos_preamble_sentiment,mean_neg_preamble_sentiment,mean_neutral_preamble_sentiment
1493,2024-09-25,0.159386,0.285289,0.555325,0.112141,0.281216,0.606643
1494,2024-09-26,0.296828,0.350619,0.352554,0.26093,0.321001,0.418069
1495,2024-09-27,0.192744,0.503622,0.303634,0.209407,0.408375,0.382218
1496,2024-09-28,0.225603,0.226435,0.547962,0.20913,0.203006,0.587863
1497,adjusted_date,0.019455,0.067139,0.913406,0.017278,0.06523,0.917491
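
The last group in the output above has the literal string `adjusted_date` as its date, which suggests a repeated header row ended up in the input CSV. One possible way to filter such rows before grouping is sketched below. `drop_unparseable_dates` is a hypothetical helper, the `%Y-%m-%d` format is assumed from the dates shown, and the demo values are toy data.

```python
import pandas as pd


def drop_unparseable_dates(frame, date_column="adjusted_date"):
    """Keep only rows whose date column parses as YYYY-MM-DD."""
    parsed = pd.to_datetime(frame[date_column], format="%Y-%m-%d", errors="coerce")
    # Rows that failed to parse become NaT and are dropped here.
    return frame.loc[parsed.notna()].reset_index(drop=True)


# Toy frame with a stray header row, similar in shape to the case above.
demo = pd.DataFrame(
    {
        "adjusted_date": ["2024-09-27", "2024-09-28", "adjusted_date"],
        "pos_sentiment": [0.2, 0.3, 0.1],
    }
)
cleaned = drop_unparseable_dates(demo)
print(cleaned)
```

Filtering on the parsed date removes the stray row wherever it sits in the file, rather than relying on it being the last row.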


In [46]:
grouped_by_date_df.to_csv('../data/nyt_sentiment.csv', index=False)

## Merge the sentiment DataFrames into one

In [48]:
nyt_sentiment = pd.read_csv('../data/nyt_sentiment.csv')
tesla_sentiment = pd.read_csv('../data/tesla_sentiment.csv')


In [50]:
nyt_sentiment['News'] = 'Market News'

In [61]:
nyt_sentiment

Unnamed: 0,adjusted_date,mean_pos_sentiment,mean_neg_sentiment,mean_neutral_sentiment,mean_pos_preamble_sentiment,mean_neg_preamble_sentiment,mean_neutral_preamble_sentiment,News
0,2019-09-30,0.056308,0.762604,0.181088,0.127192,0.641141,0.231667,Market News
1,2019-10-01,0.084752,0.525347,0.389900,0.069554,0.531837,0.398609,Market News
2,2019-10-02,0.064854,0.476130,0.459017,0.053761,0.438076,0.508164,Market News
3,2019-10-03,0.267430,0.271210,0.461360,0.250733,0.239351,0.509915,Market News
4,2019-10-04,0.132889,0.533843,0.333267,0.070609,0.499317,0.430075,Market News
...,...,...,...,...,...,...,...,...
1493,2024-09-25,0.159386,0.285289,0.555325,0.112141,0.281216,0.606643,Market News
1494,2024-09-26,0.296828,0.350619,0.352554,0.260930,0.321001,0.418069,Market News
1495,2024-09-27,0.192744,0.503622,0.303634,0.209407,0.408375,0.382218,Market News
1496,2024-09-28,0.225603,0.226435,0.547962,0.209130,0.203006,0.587863,Market News


In [62]:
nyt_sentiment = nyt_sentiment[:-1]

In [51]:
tesla_sentiment['News'] = 'Tesla'

In [63]:
tesla_sentiment

Unnamed: 0,adjusted_date,mean_pos_sentiment,mean_neg_sentiment,mean_neutral_sentiment,mean_pos_preamble_sentiment,mean_neg_preamble_sentiment,mean_neutral_preamble_sentiment,News
0,1/1/24,0.932750,0.018450,0.048801,0.883637,0.019422,0.096940,Tesla
1,1/10/21,0.007312,0.974635,0.018053,0.007583,0.970156,0.022261,Tesla
2,1/11/23,0.260249,0.570400,0.169351,0.224915,0.534778,0.240307,Tesla
3,1/12/20,0.068412,0.030966,0.900621,0.037842,0.027356,0.934802,Tesla
4,1/12/23,0.008780,0.971812,0.019408,0.012819,0.961352,0.025829,Tesla
...,...,...,...,...,...,...,...,...
857,9/8/22,0.061572,0.221923,0.716505,0.070154,0.102857,0.826989,Tesla
858,9/9/20,0.064603,0.097978,0.837419,0.056757,0.021830,0.921414,Tesla
859,9/9/21,0.075175,0.014510,0.910315,0.087085,0.011177,0.901738,Tesla
860,9/9/22,0.763597,0.009088,0.227315,0.504756,0.011038,0.484206,Tesla


In [65]:
tesla_sentiment['adjusted_date'] = pd.to_datetime(tesla_sentiment['adjusted_date'], format="%d/%m/%y").dt.strftime("%Y-%m-%d")
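
Dates such as `1/10/21` parse without error under both a day-first and a month-first format, so it is worth checking which one the file actually uses. The sketch below compares the two readings on a few values visible in the output above; it does not decide which is correct. `fits_format` is a hypothetical helper, and the `13/1/21` value in the usage note is made up for illustration.

```python
import pandas as pd

# Values copied from the tesla_sentiment output above.
sample = pd.Series(["1/10/21", "9/8/22", "1/12/20"])

day_first = pd.to_datetime(sample, format="%d/%m/%y").dt.strftime("%Y-%m-%d")
month_first = pd.to_datetime(sample, format="%m/%d/%y").dt.strftime("%Y-%m-%d")

comparison = pd.DataFrame(
    {"raw": sample, "day_first": day_first, "month_first": month_first}
)
print(comparison)


def fits_format(values, fmt):
    """Return True if every value parses under `fmt`."""
    parsed = pd.to_datetime(values, format=fmt, errors="coerce")
    return bool(parsed.notna().all())
```

Running `fits_format` over the full column with each format can rule one out: a value like `13/1/21` would only fit `%d/%m/%y`. If both formats fit every row, the source of the file is the only reliable guide.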


In [67]:
union_df = pd.concat([tesla_sentiment, nyt_sentiment], ignore_index=True)