# 1. NB

### Product
Given the news headlines published on a trading day, we want a model that predicts whether that day's closing price (CP) rises or falls relative to the previous trading day's close.

Assumption: each `Title` was published before the market close, so it affects that day's CP.
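
Under this assumption, the prediction target can be sketched as a binary label per trading day: 1 if the close is above the previous trading day's close, 0 otherwise. The helper name `direction_labels` and the prices below are made up for illustration; they are not taken from `data.csv`.

```python
def direction_labels(closes):
    """Label each trading day 1 (close rose) or 0 (close fell or was flat).

    closes: list of (date, closing_price) tuples, one per trading day,
    sorted by date. The first day has no previous close and is skipped.
    """
    labels = []
    for (_, prev_cp), (date, cp) in zip(closes, closes[1:]):
        labels.append((date, 1 if cp > prev_cp else 0))
    return labels


# Made-up prices, for illustration only
closes = [("2008-01-02", 100.0), ("2008-01-03", 98.5), ("2008-01-04", 101.0)]
print(direction_labels(closes))  # [('2008-01-03', 0), ('2008-01-04', 1)]
```

Treating a flat close as 0 is a design choice of this sketch; it could equally be dropped or given its own class.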

Step 1: Define the sentiment of each headline
- Use a pretrained model from Hugging Face
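
Step 1 can be sketched as a small helper that turns one classifier result into a signed score. The helper name `signed_sentiment` is hypothetical, and the example values are made up; the labels `positive`, `neutral` and `negative` are the ones the finbert cell in this notebook checks for.

```python
def signed_sentiment(label: str, score: float) -> float:
    """Map a (label, confidence) pair to a signed sentiment score.

    positive -> +score, negative -> -score, neutral -> 0.
    """
    label = label.lower()
    if label == "positive":
        return score
    if label == "negative":
        return -score
    if label == "neutral":
        return 0.0
    raise ValueError(f"Unexpected label: {label!r}")


# Classifier-style result with made-up values
example = {"label": "negative", "score": 0.5}
print(signed_sentiment(example["label"], example["score"]))  # -0.5
```

Raising on an unknown label keeps a stale score from a previous headline from being reused silently.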



In [2]:
#Library imports
import pandas as pd
from transformers import pipeline


In [80]:
#Importing and investigating the data
data = pd.read_csv('data.csv')
data.head(3)

Unnamed: 0,Title,Date,CP
0,"JPMorgan Predicts 2008 Will Be ""Nothing But Net""",2008-01-02,1447.16
1,Dow Tallies Biggest First-session-of-year Poin...,2008-01-02,1447.16
2,2008 predictions for the S&P 500,2008-01-02,1447.16


In [27]:
#Use finbert model, found on HuggingFace
classifier = pipeline("text-classification", model="ProsusAI/finbert")

#Extract the sentiment from each headline and give it a signed score:
#+score for positive, 0 for neutral, -score for negative.
#The magnitude is the model's confidence in its label. Days with more than
#one headline are averaged in the aggregation step below.
MAX_ROWS = 2000 #Only the first 2000 headlines are scored
subset = data.head(MAX_ROWS).reset_index(drop=True)

results = []

last_price = subset.loc[0, 'CP'] #Previous day's closing price; the first day is compared with itself, so its difference is 0

for index, row in subset.iterrows():
    output = classifier(row['Title'])

    label = output[0]['label']
    score = output[0]['score']

    #Assign sentiment score (- if negative, + if positive)
    if label == 'negative':
        sentiment_score = -score
    elif label == 'neutral':
        sentiment_score = 0
    elif label == 'positive':
        sentiment_score = score
    else:
        raise ValueError(f"Unexpected label: {label}")

    price_difference = row['CP'] - last_price
    price_difference_percentage = (price_difference / last_price) * 100

    results.append([row['Date'], sentiment_score, price_difference, price_difference_percentage])

    #If the next row has a different Date, update last_price (skipped on the last row to avoid an IndexError)
    if index + 1 < len(subset) and row['Date'] != subset.loc[index + 1, 'Date']:
        last_price = row['CP']

df = pd.DataFrame(results, columns=["Date", "Score", "Total price difference", "Percentage price difference"])

#Aggregate to a per-day basis
grouped_df = df.groupby(['Date'], as_index=False).mean() #Use the mean of the scores, so every headline of the day contributes to the daily sentiment
grouped_df.to_csv('aggregated_per_day.csv')

Device set to use mps:0


Why's the Dow Down While the S&P and Nasdaq Hit Record Highs?
negative
Stock Market Week in Review: Wall Street Was Not Bullish Enough on 2021
negative
Researchers use Wall Street Journal articles to predict stock returns
neutral
Stock Market News for Nov 19, 2021
neutral
The 30 Best Stocks of the Past 30 Years
neutral
Inflation drives investors to US stocks
neutral
Return to Normal Puts S&P 500 at 5000 in June, UBS's Lovell Says
negative
COVID fears weigh on Dow, S&P 500; Nasdaq hits record high
negative
Is S&P 500 going to burst?. Let’s see if the unstoppable rising of… | by Gianluca Malato
neutral
Is the UMAX ETF a solid dividend income idea?
neutral
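
The per-day aggregation in the finbert cell could also be done without tracking `last_price` by hand: aggregate to one row per day first, then take the day-over-day change. This is a sketch on made-up numbers, assuming a frame with `Date`, `Score` and `CP` columns and ISO-formatted dates (so that sorting by `Date` is chronological). Unlike the loop in the notebook, the first day gets `NaN` instead of 0.

```python
import pandas as pd

# Made-up scored headlines, for illustration only
scored = pd.DataFrame({
    "Date": ["2008-01-02", "2008-01-02", "2008-01-03", "2008-01-04"],
    "Score": [0.8, -0.4, 0.0, -0.6],
    "CP": [100.0, 100.0, 110.0, 99.0],
})

# One row per day: mean sentiment, and the day's closing price
# (CP is the same on every row of a given day, so "first" is enough)
daily = scored.groupby("Date", as_index=False).agg(
    Score=("Score", "mean"),
    CP=("CP", "first"),
)

# Change versus the previous trading day
daily["Total price difference"] = daily["CP"].diff()
daily["Percentage price difference"] = daily["CP"].pct_change() * 100

print(daily)
```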
