<a href="https://colab.research.google.com/github/kaljuvee/datascience/blob/master/notebooks/news/news_sentiment.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

## News Named Entity Extraction (NER) and Sentiment Analysis

In [0]:
from google.colab import drive
drive.mount('/content/drive')

Mounted at /content/drive


###### Load dependency libraries 

In [0]:
!pip install feedparser
!pip3 install yfinance --upgrade --no-cache-dir
!pip install vaderSentiment
import yfinance as yf
import pandas as pd
import numpy as np
import feedparser
import requests
import json
import yaml
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer


**News / NLP Signal Pipeline**

1. **Fetch news** - read a news source via an RSS feed (feedparser)
2. **Extract entities** - perform named entity recognition (NER) on the unstructured text (Thomson Reuters Intelligent Tagging (TRIT) / Refinitiv Open Calais)
3. **Filter on news and entities** - keep only the entities and events of interest
4. **Sentiment analysis** - score the sentiment of each news item (VADER)
5. **Find signal** - correlate sentiment with price movement
6. **Historical EOD prices** - fetch historical prices (yfinance)
7. **Backtesting** - backtest for PnL performance (Pyfinance)

*   https://www.altsignals.ai
*   [Julian Kaljuvee](https://www.linkedin.com/in/juliankaljuvee/)






In [0]:
# Dictionary of RSS feeds that we will fetch and combine
# GlobeNewswire feed list - http://www.globenewswire.com/Rss/List
# Potential entry keys: ['summary_detail', 'published_parsed', 'links', 'title', 'summary', 'guidislink', 'title_detail', 'link', 'published', 'id']
newsurls = {
    'globenewswire-us':           'http://www.globenewswire.com/RssFeed/country/United%20States/feedTitle/GlobeNewswire%20-%20News%20from%20United%20States',
}

**1. Fetch news from RSS feed**

In [0]:
# Fetch an RSS feed and return the parsed result
def parse_rss(rss_url):
    return feedparser.parse(rss_url)

# Return the headlines (titles) of all items in the feed as a list
def get_headlines(rss_url):
    feed = parse_rss(rss_url)
    return [newsitem['title'] for newsitem in feed['items']]

# Return the summaries of all items in the feed as a list
def get_summaries(rss_url):
    feed = parse_rss(rss_url)
    return [newsitem['summary'] for newsitem in feed['items']]

# Return the keys available on each item, to see which fields the feed provides
def get_entries(rss_url):
    feed = parse_rss(rss_url)
    return [newsitem.keys() for newsitem in feed['items']]

**1.1 Inspect entries available in news feed**

In [0]:
# Inspect the entries available in the RSS feed
entries = []

# Iterate over the feed urls
for key,url in newsurls.items():
    # Call get_entries() and add the returned key sets to entries
    entries.extend( get_entries( url ) )

print(entries[0])

dict_keys(['id', 'guidislink', 'link', 'links', 'tags', 'title', 'title_detail', 'summary', 'summary_detail', 'published', 'published_parsed', 'dc_identifier', 'language', 'publisher', 'publisher_detail', 'contributors', 'dc_modified'])


In [0]:
# A list to hold all headlines and summaries
allheadlines = []
summaries = []
 
# Iterate over the feed urls
for key,url in newsurls.items():
    # Call get_headlines() and get_summaries() and add the results to the running lists
    allheadlines.extend( get_headlines( url ) )
    summaries.extend( get_summaries( url ) )

**1.2 View headlines**

In [0]:
# Iterate over the allheadlines list and print each headline
for hl in allheadlines:
    print(hl)

엠마우스생명과학 운영성과를 업데이트하여 공개
Atlantic Union Bankshares Corporation Prices $150 Million Preferred Stock Depositary Share Offering
HealthEquity Reports First Quarter Ended April 30, 2020 Financial Results
Sientra to Present at the William Blair 40th Annual Growth Stock Conference
Worthington Industries to Webcast Discussion of Fourth Quarter 2020 and Fiscal Year-End Results on June 25
Marinus Pharmaceuticals Announces Closing of $46 Million Public Offering of Common Stock Including Full Exercise of Underwriters’ Option to Purchase Additional Shares
Pura Vida Bracelets Partners With Hectic Ltd. to Expand Distribution in Europe
Kforce Updates Second Quarter Revenue Trends Ahead of Its Presentation at the Robert W. Baird Conference on June 3rd
Iovance Biotherapeutics, Inc. Announces Closing of $603.7 Million Common Stock Public Offering
Capital Southwest Supports Osceola Capital’s Recapitalization of Central Medical Group
MarketAxess Announces Monthly Volume Statistics for May 2020
UPDATE - Cal

**1.3 View news summaries**


In [0]:
# Iterate over the summaries list and print each summary
# TODO: see if HTML chars can be removed
for s in summaries:
    print(s)

<p align="justify">캘리포니아주 토랜스, June  03, 2020  (GLOBE NEWSWIRE) -- <strong>겸상적혈구질환 치료의 선두주자 엠마우스생명과학 (“엠마우스”, OTCQB: EMMA)</strong>은 2019년 12월 31일 연말결산 10-K 보고서 및 2020년 1분기 10-Q 보고서를 미국 증권거래위원회에 공시하기 전, 오늘 운영성과를 업데이트하여 공개했습니다.<br></p>
<p align="left">RICHMOND, Va., June  02, 2020  (GLOBE NEWSWIRE) -- Atlantic Union Bankshares Corporation (the “Company”) today announced the pricing of an offering (the “Offering”) of 6,000,000 Depositary Shares, each representing a 1/400<sup>th</sup> ownership interest in a share of its 6.875% Perpetual Non-Cumulative Preferred Stock, Series A, par value $10.00 per share (“Series A preferred stock”), with a liquidation preference of $10,000 per share of Series A preferred stock (equivalent to $25 per Depositary Share), at an aggregate offering price of $150 million. In connection with the Offering, the Company has granted the underwriters an option for 30 days to purchase up to an additional 900,000 Depositary Shares. <br></p>
<p align="left"><strong>Hig
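
The TODO above asks whether the HTML markup in the summaries can be removed. One possible approach, sketched here with only the standard library, is to keep the text nodes and drop the tags. The names `_TextExtractor` and `strip_html` are illustrative helpers, not part of the original notebook, and the sample string is a shortened version of the second summary.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects only the text nodes of an HTML fragment (hypothetical helper)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into plain characters
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(fragment):
    """Return the plain text of an HTML fragment with runs of whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(fragment)
    parser.close()  # flushes any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


sample = (
    '<p align="left">RICHMOND, Va., June  02, 2020  (GLOBE NEWSWIRE) -- '
    "each representing a 1/400<sup>th</sup> ownership interest.<br></p>"
)
print(strip_html(sample))
```

Applied to the notebook's data this would be `[strip_html(s) for s in summaries]`. Cleaning the text before scoring it is probably worthwhile, since tag names and attributes are not part of the news content.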

**2. Named Entity Extraction (NER) - make an API call to Thomson Reuters Intelligent Tagging (TRIT) with news headline content**

In [0]:
# Define sample content to be queried
contentText = allheadlines[8]
print(contentText)

Iovance Biotherapeutics, Inc. Announces Closing of $603.7 Million Common Stock Public Offering


**2.1 Query TRIT / OpenCalais JSON API**

In [0]:

import os

headType = "text/raw"
# Read the access token from the environment rather than hard-coding it in the notebook
# (the variable name TRIT_ACCESS_TOKEN is an assumption; use whatever name you configure)
token = os.environ['TRIT_ACCESS_TOKEN']
url = "https://api-eit.refinitiv.com/permid/calais"
payload = contentText.encode('utf8')
headers = {
    'Content-Type': headType,
    'X-AG-Access-Token': token,
    'outputformat': "application/json"
}

# The daily limit is 5,000 requests, and the concurrent limit varies by API from 1-4 calls per second.
TRITResponse = requests.post(url, data=payload, headers=headers)
# Fail loudly on an HTTP error instead of trying to parse an error body as JSON
TRITResponse.raise_for_status()
# Load content into JSON object
JSONResponse = TRITResponse.json()
print(json.dumps(JSONResponse, indent=4, sort_keys=True))
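
The rate limits mentioned in the comment above matter once every headline is tagged in a loop rather than just one. A minimal sketch of client-side throttling follows, assuming the most conservative figure quoted (one call per second). The `Throttle` class is a hypothetical helper, not part of the TRIT API or of this notebook, and the loop only prints where the request would be sent.

```python
import time


class Throttle:
    """Spaces successive calls at least `min_interval` seconds apart (hypothetical helper)."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        """Block until at least `min_interval` has elapsed since the previous call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


throttle = Throttle(min_interval=1.0)  # assumption: one call per second
for text in ["headline one", "headline two"]:
    throttle.wait()
    # The POST request to the tagging endpoint would go here
    print("would tag:", text)
```

The clock and sleep functions are injectable so the spacing logic can be checked without actually waiting.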

**2.2 Get entities in news**

In [0]:
#Get Entities
print('====Entities====')
print('Type, Name')

for key in JSONResponse:
    if ('_typeGroup' in JSONResponse[key]):
        if JSONResponse[key]['_typeGroup'] == 'entities':
            print(JSONResponse[key]['_type'] + ", " + JSONResponse[key]['name'])

====Entities====
Type, Name
Company, Iovance Biotherapeutics Inc.


**2.3 Get RIC code for entity**

In [0]:
#Get RIC code

print('====RIC====')
print('RIC')

for entity in JSONResponse:
    for info in JSONResponse[entity]:
        if (info =='resolutions'):
            for companyinfo in (JSONResponse[entity][info]):
                if 'primaryric' in companyinfo:
                    symbol = companyinfo['primaryric']
                    print(symbol)

====RIC====
RIC
IOVA.OQ
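
The two loops in 2.2 and 2.3 could be folded into one helper that returns each entity together with its RIC, which is easier to reuse than printed output. This is a sketch under assumptions: `extract_entities` is an illustrative name, and `sample_response` is a hand-written stand-in that contains only the fields the loops above read (`_typeGroup`, `_type`, `name`, `resolutions`, `primaryric`). Its keys (`doc`, `entity-1`, `topic-1`) are placeholders, not real response keys.

```python
def extract_entities(response):
    """Return one dict per entity: its type, name, and primary RIC (None if absent)."""
    entities = []
    for item in response.values():
        # Skip entries that are not dicts or do not belong to the 'entities' group
        if not isinstance(item, dict) or item.get("_typeGroup") != "entities":
            continue
        ric = None
        for resolution in item.get("resolutions", []):
            if "primaryric" in resolution:
                ric = resolution["primaryric"]
                break
        entities.append(
            {"type": item.get("_type"), "name": item.get("name"), "ric": ric}
        )
    return entities


# Hand-written stand-in for JSONResponse; keys are placeholders
sample_response = {
    "doc": {"info": {}},
    "entity-1": {
        "_typeGroup": "entities",
        "_type": "Company",
        "name": "Iovance Biotherapeutics Inc.",
        "resolutions": [{"primaryric": "IOVA.OQ"}],
    },
    "topic-1": {"_typeGroup": "topics", "name": "Business_Finance", "score": 1},
}
print(extract_entities(sample_response))
```

Returning `None` for a missing RIC avoids the situation in 2.3 where `symbol` is never assigned when no entity resolves to a listed company.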


**2.4 Get topics for the news item**

In [0]:
#Print Header
print(symbol)
print('====Topics====')
print('Topics, Score')

for key in JSONResponse:
    if ('_typeGroup' in JSONResponse[key]):
        if JSONResponse[key]['_typeGroup'] == 'topics':
            print(JSONResponse[key]['name'] + ", " + str(JSONResponse[key]['score']))

IOVA.OQ
====Topics====
Topics, Score
Business_Finance, 1
Health_Medical_Pharma, 0.952
Religion_Belief, 0.922
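
Step 3 of the pipeline (filter on entities and events of interest) has no cell of its own in this notebook. A rough sketch of what such a filter might look like is given below, using the entity and topic values printed in 2.2 and 2.4. The function name `is_of_interest`, the wanted types and topics, and the 0.9 score cut-off are all assumptions chosen for illustration.

```python
def is_of_interest(entities, topics, wanted_types, wanted_topics, min_score):
    """True if the item names a wanted entity type and a wanted topic scores high enough."""
    has_entity = any(e["type"] in wanted_types for e in entities)
    has_topic = any(
        name in wanted_topics and score >= min_score for name, score in topics
    )
    return has_entity and has_topic


# Values copied from the outputs in sections 2.2 and 2.4
entities = [{"type": "Company", "name": "Iovance Biotherapeutics Inc."}]
topics = [
    ("Business_Finance", 1),
    ("Health_Medical_Pharma", 0.952),
    ("Religion_Belief", 0.922),
]

# Assumed filter: company news about business or finance, topic score of at least 0.9
print(is_of_interest(entities, topics, {"Company"}, {"Business_Finance"}, 0.9))
```

Filtering on an explicit list of wanted topics, rather than on score alone, also screens out high-scoring but irrelevant topics such as the `Religion_Belief` result above.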


**4. Sentiment Analysis**

In [0]:
# Define the function used for text sentiment analysis
def get_sentiment(txt):
    analyzer = SentimentIntensityAnalyzer()
    vs = analyzer.polarity_scores(txt)
    print(vs)
    return vs['compound']

In [0]:
print('headline: ', allheadlines[1])
print('headline sentiment: ', get_sentiment(allheadlines[1]))
print('summary: ', summaries[1])
print('summary sentiment: ', get_sentiment(summaries[1]))

headline:  Atlantic Union Bankshares Corporation Prices $150 Million Preferred Stock Depositary Share Offering
{'neg': 0.0, 'neu': 0.833, 'pos': 0.167, 'compound': 0.296}
headline sentiment:  0.296
summary:  <p align="left">RICHMOND, Va., June  02, 2020  (GLOBE NEWSWIRE) -- Atlantic Union Bankshares Corporation (the “Company”) today announced the pricing of an offering (the “Offering”) of 6,000,000 Depositary Shares, each representing a 1/400<sup>th</sup> ownership interest in a share of its 6.875% Perpetual Non-Cumulative Preferred Stock, Series A, par value $10.00 per share (“Series A preferred stock”), with a liquidation preference of $10,000 per share of Series A preferred stock (equivalent to $25 per Depositary Share), at an aggregate offering price of $150 million. In connection with the Offering, the Company has granted the underwriters an option for 30 days to purchase up to an additional 900,000 Depositary Shares. <br></p>
{'neg': 0.0, 'neu': 0.828, 'pos': 0.172, 'compound': 0
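
The compound score returned by `get_sentiment` ranges from -1 to 1. To turn it into a discrete label, the cut-offs suggested in the VADER documentation can be used: 0.05 and above is positive, -0.05 and below is negative, anything in between is neutral. The helper name `label_sentiment` is illustrative, and the threshold is left as a parameter because a different cut-off may suit press-release text better.

```python
def label_sentiment(compound, threshold=0.05):
    """Map a VADER compound score to 'positive', 'negative', or 'neutral'."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


# 0.296 is the headline score printed above
print(label_sentiment(0.296))
```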

**5. Get historical EOD price data**

In [0]:
symbol = 'MSFT'
# interval='1m' returns intraday one-minute bars; use interval='1d' for end-of-day (EOD) prices
data = yf.download(symbol, interval = '1m', start="2020-05-20", end="2020-05-21")
data.head(20)


[*********************100%***********************]  1 of 1 completed


| Datetime | Open | High | Low | Close | Adj Close | Volume |
|---|---|---|---|---|---|---|
| 2020-05-20 09:30:00-04:00 | 184.809998 | 184.990005 | 184.520004 | 184.910004 | 184.910004 | 771913 |
| 2020-05-20 09:31:00-04:00 | 184.899994 | 184.929993 | 184.729996 | 184.850006 | 184.850006 | 132685 |
| 2020-05-20 09:32:00-04:00 | 184.860107 | 185.119995 | 184.839996 | 184.869995 | 184.869995 | 197000 |
| 2020-05-20 09:33:00-04:00 | 184.910004 | 184.910004 | 184.5 | 184.690002 | 184.690002 | 157502 |
| 2020-05-20 09:34:00-04:00 | 184.710007 | 184.779999 | 184.520004 | 184.660004 | 184.660004 | 151242 |
| 2020-05-20 09:35:00-04:00 | 184.649994 | 185.020004 | 184.639999 | 184.929993 | 184.929993 | 193629 |
| 2020-05-20 09:36:00-04:00 | 184.949997 | 185.020004 | 184.830002 | 184.853104 | 184.853104 | 146192 |
| 2020-05-20 09:37:00-04:00 | 184.839996 | 185.0 | 184.820007 | 184.910004 | 184.910004 | 78569 |
| 2020-05-20 09:38:00-04:00 | 184.916595 | 185.0 | 184.830002 | 184.880005 | 184.880005 | 86758 |
| 2020-05-20 09:39:00-04:00 | 184.860001 | 185.029999 | 184.860001 | 184.990005 | 184.990005 | 110310 |
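
The "Find signal" step (correlate sentiment with price movement) is listed in the pipeline but not implemented in this notebook. One simple way to start, sketched below in plain Python, is the Pearson correlation between each item's sentiment score and the return over the following period. The numbers are made up purely to show the mechanics and are not market data. The helper names `pearson` and `simple_returns`, and the choice to pair each score with the next period's return, are assumptions.

```python
from math import sqrt


def simple_returns(prices):
    """Period-over-period simple returns: (p[t+1] - p[t]) / p[t]."""
    return [(b - a) / a for a, b in zip(prices, prices[1:])]


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length, at least 2 long")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / sqrt(var_x * var_y)


# Made-up values for illustration only: one sentiment score per period,
# paired with the return from that period's close to the next
sentiment = [0.30, -0.10, 0.55, 0.00]
closes = [100.0, 101.0, 100.5, 102.0, 102.0]

returns = simple_returns(closes)
print(round(pearson(sentiment, returns), 3))
```

With the notebook's own data, the scores would come from `get_sentiment` and the closes from the `Close` column of the downloaded prices. A correlation over so few points says very little, so a real test would need many news items per symbol.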


**Resources**

* https://developers.refinitiv.com/open-permid/intelligent-tagging-restful-apiquick-start
* https://developers.refinitiv.com/article/intelligent-tagging-extract-information-api-response
* https://permid.org/faq
