<a href="https://colab.research.google.com/github/kaljuvee/datascience/blob/master/notebooks/news/news_sentiment_price_corr_vxrt.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## News Sentiment vs Price Sensitivity Correlation - Vaxart Inc (NASDAQ:VXRT)

###### Install and import dependencies

In [2]:
# Install the RSS, sentiment and date libraries (bs4/lxml are preinstalled on Colab)
!pip install feedparser textblob arrow vaderSentiment beautifulsoup4 lxml
!pip install --upgrade --no-cache-dir yfinance
import yfinance as yf
import pandas as pd
import numpy as np
import feedparser
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer


**News Sentiment vs Price Sensitivity Correlation**

**Pipeline:**
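1. **Fetch news** - read the company's press releases from its GlobeNewswire RSS feed (Feedparser)
2. **Assign sentiment score** - score each release summary with VADER and TextBlob and average the two
3. **Historical EOD prices** - fetch historical end-of-day prices (yfinance)
4. **Correlation** - calculate the correlation between sentiment and price sensitivity, i.e. the relative move from the previous business day's close to the open on the publication date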
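Written out as formulas (read off the code in steps 2 to 4), for news item $i$ published on day $t_i$, with $C_{t_i-1}$ the close on the previous business day, $O_{t_i}$ the open on the publication day, $v_i$ the VADER compound score and $b_i$ the TextBlob polarity of its summary:

$$
p_i = \frac{O_{t_i} - C_{t_i-1}}{C_{t_i-1}}, \qquad s_i = \tfrac{1}{2}\left(v_i + b_i\right)
$$

and the reported number is Pearson's correlation between the $s_i$ and $p_i$ over all $n$ items:

$$
r = \frac{\sum_{i=1}^{n} (s_i - \bar{s})(p_i - \bar{p})}{\sqrt{\sum_{i=1}^{n} (s_i - \bar{s})^2}\,\sqrt{\sum_{i=1}^{n} (p_i - \bar{p})^2}}
$$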


See more at:
*   https://www.altsignals.ai







### **1. Fetch News from RSS Feed**

In [13]:
from bs4 import BeautifulSoup

# Strip HTML tags from an RSS summary, keeping only the text
def clean_text(raw_html):
    return BeautifulSoup(raw_html, "lxml").get_text()

symbol = 'VXRT'
company = 'Vaxart Inc'
rss_url = 'https://www.globenewswire.com/RssFeed/Organization/dqKTlO0WKWyA0lN-FU6zhA=='
cols = ['title', 'summary', 'published', 'link', 'sentiment', 'price_sensi']

feed = feedparser.parse(rss_url)

# Build one row per feed entry, then create the DataFrame in a single call
# (DataFrame.append was deprecated and removed in pandas 2.0)
rows = [{'title': item['title'],
         'summary': clean_text(item['summary']),
         'published': item['published'],
         'link': item['link']}
        for item in feed.entries]
# Columns missing from the rows ('sentiment', 'price_sensi') start as NaN
news_df = pd.DataFrame(rows, columns=cols)
news_df.head()

|   | title | summary | published | link | sentiment | price_sensi |
|---|---|---|---|---|---|---|
| 0 | Vaxart’s COVID-19 Vaccine Selected for the U.S... | OWS to Test First Oral COVID-19 Vaccine in Non... | Fri, 26 Jun 2020 12:00 GMT | http://www.globenewswire.com/news-release/2020... |  |  |
| 1 | Vaxart, Inc. Signs Memorandum of Understanding... | Enabling Production of A Billion or More COVID... | Thu, 25 Jun 2020 12:00 GMT | http://www.globenewswire.com/news-release/2020... |  |  |
| 2 | Vaxart, Inc. Set to Join Russell 3000® Index | SOUTH SAN FRANCISCO, Calif., June 24, 2020 (... | Wed, 24 Jun 2020 12:00 GMT | http://www.globenewswire.com/news-release/2020... |  |  |
| 3 | Vaxart, Inc. to Present at the H.C. Wainwright... | Live Webcast on Thursday June 25th at 10:25 am... | Tue, 23 Jun 2020 12:00 GMT | http://www.globenewswire.com/news-release/2020... |  |  |
| 4 | Vaxart, Inc. Appoints New CEO to Accelerate Ad... | Andrei Floroiu Appointed Chief Executive Offic... | Mon, 15 Jun 2020 11:30 GMT | http://www.globenewswire.com/news-release/2020... |  |  |
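Feedparser does the RSS parsing and BeautifulSoup the HTML cleanup above. To show what those two steps amount to, here is a minimal standard-library sketch, assuming a plain RSS 2.0 document in which each `<item>` carries `title`, an HTML-escaped `description`, `pubDate` and `link`. It swaps in `xml.etree.ElementTree` and `html.parser.HTMLParser` for Feedparser and BeautifulSoup, and the names `strip_html` and `parse_rss_items` are hypothetical. It is not a replacement for Feedparser, which also handles Atom, malformed feeds and many more date formats.

```python
import xml.etree.ElementTree as ET
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text nodes and drops every tag (stand-in for BeautifulSoup(...).text)."""

    def __init__(self):
        super().__init__()  # convert_charrefs=True, so &amp; etc. become characters
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(raw_html):
    parser = _TextExtractor()
    parser.feed(raw_html)
    parser.close()  # flush any buffered text
    return "".join(parser.parts).strip()


def parse_rss_items(xml_text):
    """Return one dict per <item> of an RSS 2.0 document."""
    root = ET.fromstring(xml_text)
    items = []
    for item in root.iter("item"):
        pub = item.findtext("pubDate")
        items.append({
            "title": item.findtext("title", default=""),
            # The XML parser unescapes &lt;p&gt; to <p>; strip_html removes the tags
            "summary": strip_html(item.findtext("description", default="")),
            # RFC 822 dates such as 'Wed, 24 Jun 2020 12:00 GMT'
            "published": parsedate_to_datetime(pub) if pub else None,
            "link": item.findtext("link", default=""),
        })
    return items
```

`parsedate_to_datetime` understands the RFC 822 style seen in the `published` column (e.g. `Wed, 24 Jun 2020 12:00 GMT`, seconds optional) and returns a timezone-aware UTC datetime for it, which avoids re-parsing the string later.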


### **2. Assign Sentiment Score**

In [14]:
from textblob import TextBlob

# Create the VADER analyzer once instead of on every call
vader = SentimentIntensityAnalyzer()

# TextBlob polarity, in the range [-1, 1]
def get_textblob_sentiment(text):
    return TextBlob(text).sentiment.polarity

# VADER compound score, normalized to [-1, 1]
def get_vader_sentiment(text):
    return vader.polarity_scores(text)['compound']

# Combined score: simple average of the two models
news_df['sentiment'] = news_df['summary'].apply(
    lambda x: (get_vader_sentiment(x) + get_textblob_sentiment(x)) / 2)
news_df.head()

|   | title | summary | published | link | sentiment | price_sensi |
|---|---|---|---|---|---|---|
| 0 | Vaxart’s COVID-19 Vaccine Selected for the U.S... | OWS to Test First Oral COVID-19 Vaccine in Non... | Fri, 26 Jun 2020 12:00 GMT | http://www.globenewswire.com/news-release/2020... | 0.125 |  |
| 1 | Vaxart, Inc. Signs Memorandum of Understanding... | Enabling Production of A Billion or More COVID... | Thu, 25 Jun 2020 12:00 GMT | http://www.globenewswire.com/news-release/2020... | 0.25 |  |
| 2 | Vaxart, Inc. Set to Join Russell 3000® Index | SOUTH SAN FRANCISCO, Calif., June 24, 2020 (... | Wed, 24 Jun 2020 12:00 GMT | http://www.globenewswire.com/news-release/2020... | 0.53835 |  |
| 3 | Vaxart, Inc. to Present at the H.C. Wainwright... | Live Webcast on Thursday June 25th at 10:25 am... | Tue, 23 Jun 2020 12:00 GMT | http://www.globenewswire.com/news-release/2020... | 0.068182 |  |
| 4 | Vaxart, Inc. Appoints New CEO to Accelerate Ad... | Andrei Floroiu Appointed Chief Executive Offic... | Mon, 15 Jun 2020 11:30 GMT | http://www.globenewswire.com/news-release/2020... | 0.0 |  |
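Averaging the two models is well defined because both scores live on the same scale: TextBlob polarity is in [-1, 1] by construction, and VADER squashes its summed word valences with $x/\sqrt{x^2 + \alpha}$ (with $\alpha = 15$ in the vaderSentiment source) before reporting `compound`. The standard-library sketch below mirrors that normalization and the averaging step; the function names are hypothetical and the range check is an addition for illustration, not something the notebook does.

```python
import math


def vader_normalize(score, alpha=15.0):
    """Squash an unbounded sum of word valences into [-1, 1].

    Mirrors the normalization vaderSentiment applies before reporting
    'compound': x / sqrt(x^2 + alpha), clipped to [-1, 1].
    """
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))


def combined_sentiment(vader_compound, textblob_polarity):
    """Average two scores that each lie in [-1, 1], as the notebook does."""
    for s in (vader_compound, textblob_polarity):
        if not -1.0 <= s <= 1.0:
            raise ValueError(f"score out of range [-1, 1]: {s}")
    return (vader_compound + textblob_polarity) / 2


print(vader_normalize(1.0))           # 0.25
print(combined_sentiment(0.5, -0.5))  # 0.0
```

One consequence of plain averaging: two models that disagree (0.5 and -0.5) produce the same 0.0 as two neutral readings, so a row such as the CEO appointment above, scored 0.0, does not by itself show that both models found the text neutral.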


### **3. Get Historical Price Data**

In [9]:
# yfinance: a very helpful historical prices library by Ran Aroussi
# https://github.com/ranaroussi/yfinance, https://aroussi.com/post/python-yahoo-finance
company_financials = yf.Ticker(symbol)
prices_df = company_financials.history(period='max')
# Most recent trading day first (Date is the index)
prices_df.sort_index(ascending=False, inplace=True)
prices_df.head()

| Date | Open | High | Low | Close | Volume | Dividends | Stock Splits |
|---|---|---|---|---|---|---|---|
| 2020-07-10 | 8.05 | 8.15 | 7.61 | 7.98 | 10304700 | 0.0 | 0.0 |
| 2020-07-09 | 7.98 | 8.5 | 7.89 | 8.19 | 12825500 | 0.0 | 0.0 |
| 2020-07-08 | 8.37 | 8.86 | 7.85 | 8.15 | 31387600 | 0.0 | 0.0 |
| 2020-07-07 | 7.5 | 8.98 | 7.08 | 8.87 | 62500100 | 0.0 | 0.0 |
| 2020-07-06 | 7.12 | 7.12 | 6.02 | 6.44 | 23134700 | 0.0 | 0.0 |
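The price sensitivity computed next compares a day's open with the previous session's close, i.e. an overnight gap return. As a hedged illustration, here is a standard-library sketch that computes that gap for every consecutive pair of trading days in a price table. The `(date, open, close)` tuple layout and the name `overnight_gaps` are assumptions for the example; the numbers are the Open and Close values from the five rows shown above.

```python
from datetime import date


def overnight_gaps(rows):
    """Map each date to (open - previous close) / previous close.

    `rows` is an iterable of (date, open, close) tuples in any order. They are
    sorted ascending first, so a most-recent-first table works too. The
    earliest date has no previous close and is left out.
    """
    ordered = sorted(rows, key=lambda r: r[0])
    gaps = {}
    for (_, _, prev_close), (day, open_, _) in zip(ordered, ordered[1:]):
        gaps[day] = (open_ - prev_close) / prev_close
    return gaps


# Open and Close from the five most recent rows of prices_df shown above
rows = [
    (date(2020, 7, 10), 8.05, 7.98),
    (date(2020, 7, 9), 7.98, 8.19),
    (date(2020, 7, 8), 8.37, 8.15),
    (date(2020, 7, 7), 7.50, 8.87),
    (date(2020, 7, 6), 7.12, 6.44),
]
for day, gap in sorted(overnight_gaps(rows).items()):
    print(day, round(gap, 4))
```

Sorting by date first matters here because `prices_df` is stored most-recent-first; pairing rows in that order would compare each open with the following day's close.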


In [11]:
import arrow
from dateutil.parser import parse

# Key the price table by 'YYYY-MM-DD' strings so every lookup is an exact match
# returning a scalar, independent of pandas' partial-string indexing rules
# (and of whether yfinance returns a timezone-aware index)
prices_by_day = prices_df.copy()
prices_by_day.index = prices_by_day.index.strftime('%Y-%m-%d')

def format_date(published):
    return arrow.get(published).format('YYYY-MM-DD')

def get_previous_bday(published):
    # One weekday earlier; BusinessDay skips weekends but not exchange holidays
    return pd.Timestamp(published) - pd.tseries.offsets.BusinessDay(n=1)

def get_sensi_begin_price(published):
    # Close on the previous business day
    return prices_by_day.loc[format_date(get_previous_bday(published)), 'Close']

def get_sensi_end_price(published):
    # Open on the publication date (releases here are stamped before the US open)
    return prices_by_day.loc[format_date(published), 'Open']

def get_price_sensi(published):
    begin = get_sensi_begin_price(published)
    return (get_sensi_end_price(published) - begin) / begin

news_df['price_sensi'] = [get_price_sensi(parse(p)) for p in news_df['published']]
news_df.head()

|   | title | summary | published | link | sentiment | price_sensi |
|---|---|---|---|---|---|---|
| 0 | Vaxart’s COVID-19 Vaccine Selected for the U.S... | OWS to Test First Oral COVID-19 Vaccine in Non... | Fri, 26 Jun 2020 12:00 GMT | http://www.globenewswire.com/news-release/2020... | 0.125 | 0.835463 |
| 1 | Vaxart, Inc. Signs Memorandum of Understanding... | Enabling Production of A Billion or More COVID... | Thu, 25 Jun 2020 12:00 GMT | http://www.globenewswire.com/news-release/2020... | 0.25 | 0.131661 |
| 2 | Vaxart, Inc. Set to Join Russell 3000® Index | SOUTH SAN FRANCISCO, Calif., June 24, 2020 (... | Wed, 24 Jun 2020 12:00 GMT | http://www.globenewswire.com/news-release/2020... | 0.53835 | -0.015038 |
| 3 | Vaxart, Inc. to Present at the H.C. Wainwright... | Live Webcast on Thursday June 25th at 10:25 am... | Tue, 23 Jun 2020 12:00 GMT | http://www.globenewswire.com/news-release/2020... | 0.068182 | 0.011236 |
| 4 | Vaxart, Inc. Appoints New CEO to Accelerate Ad... | Andrei Floroiu Appointed Chief Executive Offic... | Mon, 15 Jun 2020 11:30 GMT | http://www.globenewswire.com/news-release/2020... | 0.0 | -0.030043 |
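`get_previous_bday` steps back one weekday with pandas' `BusinessDay` offset, which knows nothing about exchange holidays, and `get_sensi_end_price` assumes the publication date itself is a trading day. A release dated on a weekend, or on the day after a market holiday, therefore raises a `KeyError`. A more robust alternative, sketched here under the assumption that you have a sorted list of the dates actually present in the price data (for example `sorted(prices_df.index.date)`), is to take the last trading day strictly before publication for the close and the first trading day on or after it for the open. The function name `sensi_window` is hypothetical.

```python
from bisect import bisect_left
from datetime import date


def sensi_window(trading_days, published):
    """Return (begin_day, end_day) for a news item published on `published`.

    begin_day: last trading day strictly before the publication date (use its Close).
    end_day:   first trading day on or after the publication date (use its Open).
    `trading_days` must be sorted ascending. Raises ValueError at the edges.
    """
    i = bisect_left(trading_days, published)
    if i == 0 or i == len(trading_days):
        raise ValueError(f"no trading days around {published}")
    return trading_days[i - 1], trading_days[i]


# US markets were closed on Friday 2020-07-03 (Independence Day observed)
days = [date(2020, 7, 1), date(2020, 7, 2), date(2020, 7, 6), date(2020, 7, 7)]
print(sensi_window(days, date(2020, 7, 4)))  # (datetime.date(2020, 7, 2), datetime.date(2020, 7, 6))
print(sensi_window(days, date(2020, 7, 6)))  # (datetime.date(2020, 7, 2), datetime.date(2020, 7, 6))
```

For Monday 2020-07-06, `BusinessDay` would point at the closed Friday, while the sketch falls back to Thursday's close. Like the original, it still assumes the release comes out before that day's open; the items above are stamped 11:30 to 12:00 GMT, which is before the 9:30 ET open.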


In [12]:
print('price_sensi vs sentiment corr:', news_df['price_sensi'].corr(news_df['sentiment']))

price_sensi vs sentiment corr: 0.186601171922371
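`Series.corr` computes Pearson's r by default. The standard-library sketch below computes the same coefficient, plus the usual t statistic $t = r\sqrt{(n-2)/(1-r^2)}$ for testing whether r differs from zero, which the notebook does not report. With only a handful of news items, a correlation of about 0.19 can easily arise by chance, so the statistic is worth a look. The function names are hypothetical, and the example uses only the five rows displayed above, so its value (sign included) need not match the 0.1866 printed for the full feed.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient (the default method of pandas Series.corr)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t = r * sqrt((n - 2) / (1 - r^2)); compare with Student's t on n - 2 dof."""
    return r * math.sqrt((n - 2) / (1 - r * r))


# (sentiment, price_sensi) from the five rows of news_df shown above
sentiment = [0.125, 0.25, 0.53835, 0.068182, 0.0]
price_sensi = [0.835463, 0.131661, -0.015038, 0.011236, -0.030043]
r = pearson_r(sentiment, price_sensi)
print(round(r, 3), round(t_statistic(r, len(sentiment)), 3))
```

If SciPy is available, `scipy.stats.pearsonr` returns both the coefficient and a two-sided p-value, which saves looking up the t distribution by hand.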


**Resources**

* [AltSignals.AI](https://www.altsignals.ai/)
* [Yfinance](https://pypi.org/project/yfinance/)
* [TextBlob](https://pypi.org/project/textblob/)
* [Vader Sentiment](https://pypi.org/project/vaderSentiment/)

