# Project Title: The Impact of News on the Market
### Team Members:
##### Rachel Torres, Christian Attard, Jess Alcalde, Nitin Khade
### Project Description/Outline:
##### - We will look at news data and stock data to determine how the news affects the way the market behaves.
### Research Questions to Answer:
##### - How do news headlines affect the stock market?
##### - Is there any correlation between certain types of headlines and effects on the market?
##### - Does negative news affect stocks more than positive or neutral news?
##### - Can we assign a factor (weighting) to that effect?

In [1]:
# import dependencies
import requests
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from pprint import pprint

from news_api import api_key
from x_api import x_api_key

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
analyser = SentimentIntensityAnalyzer()

# Rather than rewriting code multiple times, we've created a function to call the news API and output data into a dataframe
# preliminary work is in older notebook
from news_pull import sentiment_scores
from news_pull import news_pull

### We'll query the news api to gather news headlines from the web

In [None]:
# Create newsDF for the general market (S&P 500 & Nasdaq)
genNews = news_pull('News')

In [None]:
# Create newsDF for Wayfair
WayFairNews = news_pull('Wayfair News')

In [None]:
# Create newsDF for Wells Fargo
WellsFargoNews = news_pull('Wells Fargo News')

In [None]:
# Create newsDF for Tesla
TeslaNews = news_pull('Tesla News')

In [None]:
# Create newsDF for Political News
PoliNews = news_pull('Political News')

In [None]:
# Create newsDF for Financial News
FinNews = news_pull('Financial News')

In [None]:
# Load stock data from CSV files
nasdaq_data = "nasdaq.csv"
nasdaq_df = pd.read_csv(nasdaq_data)

sp500_data = "sp500.csv"
sp500_df = pd.read_csv(sp500_data)

tsla_data = "TSLA.csv"
tsla_df = pd.read_csv(tsla_data)

wayfair_data = "W.csv"
wayfair_df = pd.read_csv(wayfair_data)

wf_data = "WFC.csv"
wf_df = pd.read_csv(wf_data)

# Load stock data into notebook as dataframe

In [None]:
# Label each DataFrame with the index it represents so rows stay identifiable after combining
nasdaq_df.insert(0, 'Index', 'Nasdaq')
sp500_df.insert(0, 'Index', 'S&P 500')
sp500_df.head()

In [None]:
# Stack the S&P 500 and Nasdaq rows into a single DataFrame
stock_df = pd.concat([sp500_df, nasdaq_df], ignore_index=True)

# Sort by date
stock_df = stock_df.sort_values(by=['Date'])
stock_df

### About the Scoring (taken from vaderSentiment docs)
The compound score is computed by summing the valence scores of each word in the lexicon, adjusted according to the rules, and then normalized to be between -1 (most extreme negative) and +1 (most extreme positive). This is the most useful metric if you want a single unidimensional measure of sentiment for a given sentence. Calling it a 'normalized, weighted composite score' is accurate.

It is also useful for researchers who would like to set standardized thresholds for classifying sentences as either positive, neutral, or negative. Typical threshold values (used in the literature cited on this page) are:

    positive sentiment: compound score >= 0.05
    neutral sentiment: (compound score > -0.05) and (compound score < 0.05)
    negative sentiment: compound score <= -0.05
The pos, neu, and neg scores are ratios for proportions of text that fall in each category (so these should all add up to be 1... or close to it with float operation). These are the most useful metrics if you want multidimensional measures of sentiment for a given sentence.
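
As a rough sketch of how the thresholds above could be applied, the helper below maps a compound score to a label. The name `label_compound` is hypothetical and is not part of vaderSentiment; only the 0.05 cutoffs come from the quoted documentation.

```python
def label_compound(compound, threshold=0.05):
    """Map a compound score in [-1, 1] to a sentiment label.

    Hypothetical helper: only the default cutoffs come from the
    vaderSentiment documentation quoted above.
    """
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# Illustrative scores, not taken from the project data
labels = [label_compound(score) for score in (0.62, 0.0, -0.05)]
```

If `sentiment_score` in the news DataFrames holds the compound score (an assumption, since `news_pull` is defined elsewhere), the same helper could be applied with `genNews.sentiment_score.apply(label_compound)`.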

In [None]:
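# Percent move from open to close, relative to the open.
# Note the sign: Delta is positive when the price fell (Open > Close).
stock_df['Delta'] = (stock_df.Open - stock_df.Close)*100/stock_df.Open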
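
The sign convention of `Delta` matters when it is later compared with sentiment. A small worked example with made-up prices may help: the first function mirrors the notebook's formula, and the second shows the more common close-minus-open return for comparison. Both function names are hypothetical.

```python
def open_to_close_drop_pct(open_price, close_price):
    """Same formula as the notebook's Delta: positive when the price fell."""
    return (open_price - close_price) * 100 / open_price


def open_to_close_return_pct(open_price, close_price):
    """Conventional intraday return: positive when the price rose."""
    return (close_price - open_price) * 100 / open_price


# Made-up prices for illustration only
fell = open_to_close_drop_pct(100.0, 98.0)     # price dropped by 2%
rose = open_to_close_drop_pct(100.0, 101.5)    # price rose by 1.5%
conventional = open_to_close_return_pct(100.0, 98.0)
```

With the notebook's formula, a day that falls 2% gets `Delta = 2.0`. If positive news lifts prices, that would show up as a negative correlation between sentiment and `Delta`.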

In [None]:
stock_df.head()

In [None]:
genNews.head()

In [None]:
# change data type of sentiment score to numeric
genNews.sentiment_score = pd.to_numeric(genNews.sentiment_score)

# change name of date column to match stock DF
genNews = genNews.rename(columns = {'datePublished':'Date'})


In [None]:
genNewsSummary = genNews.groupby('Date')['sentiment_score'].mean()
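
A daily mean hides how many headlines it is based on. The sketch below uses made-up rows (real data comes from `news_pull`) to show the same numeric conversion and grouping. It adds `errors="coerce"` and a per-day headline count, neither of which the notebook uses.

```python
import pandas as pd

# Made-up example rows; the column names follow the notebook after its rename step
news = pd.DataFrame({
    "Date": ["03/01/2019", "03/01/2019", "03/04/2019"],
    "sentiment_score": ["0.5", "-0.1", "n/a"],
})

# errors="coerce" turns unparseable values into NaN instead of raising
news["sentiment_score"] = pd.to_numeric(news["sentiment_score"], errors="coerce")

# Mean score and number of usable headlines per day (count ignores NaN)
daily = news.groupby("Date")["sentiment_score"].agg(["mean", "count"])
```

Days with very few headlines could then be filtered out or down-weighted before comparing sentiment with market moves.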

In [None]:
genNewsSummary

In [None]:
# format date column of stock_df
stock_df.Date = pd.to_datetime(stock_df.Date)
stock_df.Date = stock_df.Date.dt.strftime('%m/%d/%Y') 

In [None]:
stock_df.head()

In [None]:
# Inner-join stock data with the daily mean sentiment on Date
# (keeps only dates present in both)
genNewsCombined = pd.merge(stock_df, genNewsSummary, on='Date', how='inner')
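
An inner join on `Date` only matches rows whose keys are identical, so both sides need the same date representation. The sketch below uses made-up data, and the two date formats are assumptions chosen for illustration. It parses both columns to datetimes before merging, then uses `indicator=True` to count trading days that found no matching headlines.

```python
import pandas as pd

# Made-up rows; the date formats here are assumptions for illustration
stocks = pd.DataFrame({
    "Date": ["2019-03-01", "2019-03-04"],
    "Index": ["S&P 500", "S&P 500"],
    "Delta": [0.4, -0.2],
})
daily_sentiment = pd.DataFrame({
    "Date": ["03/01/2019", "03/02/2019"],
    "sentiment_score": [0.2, -0.3],
})

# Parse both key columns so equal dates compare equal regardless of source format
stocks["Date"] = pd.to_datetime(stocks["Date"], format="%Y-%m-%d")
daily_sentiment["Date"] = pd.to_datetime(daily_sentiment["Date"], format="%m/%d/%Y")

# Inner join: only dates present on both sides survive
combined = pd.merge(stocks, daily_sentiment, on="Date", how="inner")

# Left join with indicator: count trading days that had no sentiment data
coverage = pd.merge(stocks, daily_sentiment, on="Date", how="left", indicator=True)
unmatched = int((coverage["_merge"] == "left_only").sum())
```

Checking `unmatched` before analysis is one way to notice a silent format mismatch, which would otherwise just produce a small or empty merged DataFrame.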

In [None]:
genNewsCombined

In [None]:
# Use Matplotlib and stats to generate graphs and look for relationships
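
One possible starting point for this step is a scatter plot with a least-squares line and a Pearson correlation coefficient. The arrays below are made-up placeholder values, not project results. In the notebook they would presumably be replaced by `genNewsCombined['sentiment_score']` and `genNewsCombined['Delta']`.

```python
import numpy as np
import matplotlib.pyplot as plt

# Placeholder values for illustration only, NOT results from the project data
sentiment = np.array([-0.4, -0.1, 0.0, 0.2, 0.5])
delta = np.array([1.2, 0.3, 0.1, -0.4, -0.9])

# Pearson correlation coefficient between the two series
r = np.corrcoef(sentiment, delta)[0, 1]

# Least-squares straight line: delta ~ slope * sentiment + intercept
slope, intercept = np.polyfit(sentiment, delta, 1)

fig, ax = plt.subplots()
ax.scatter(sentiment, delta, label="daily observations")
xs = np.linspace(sentiment.min(), sentiment.max(), 50)
ax.plot(xs, slope * xs + intercept, label="least-squares fit")
ax.set_xlabel("Mean daily sentiment score")
ax.set_ylabel("Delta (% drop from open to close)")
ax.set_title(f"Sentiment vs. Delta (Pearson r = {r:.2f})")
ax.legend()
```

A correlation on its own does not show that headlines move the market. Plotting each index separately and checking the number of days behind each point would be reasonable next steps.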