## News Named Entity Recognition (NER) and Sentiment Analysis

###### Load dependency libraries 

In [1]:
# psycopg2-binary is the Postgres driver SQLAlchemy uses; lxml is the parser clean_html() asks BeautifulSoup for
!pip install feedparser textblob sqlalchemy psycopg2-binary beautifulsoup4 lxml pyyaml python-dateutil
import re
import json
import pandas as pd
import numpy as np
import yaml
import feedparser
import requests
import sqlalchemy as db
from bs4 import BeautifulSoup
from textblob import TextBlob



**News / NLP Signal Pipeline**

1. **Fetch news** - read news items from RSS feeds (feedparser)
2. **Extract entities** - perform named entity recognition (NER) on the unstructured text (Thomson Reuters Intelligent Tagging (TRIT) / Refinitiv Open Calais)
3. **Filter on news and entities** - keep only the entities and events of interest
4. **Sentiment analysis** - score the sentiment of each news item (TextBlob)
5. **Find signal** - correlate sentiment with price movement
6. **Historical EOD prices** - fetch historical end-of-day prices (Eodhistoricaldata.com)
7. **Backtesting** - backtest for PnL performance (internal or Zipline)

https://www.altsignals.ai



In [10]:
RSS_PATH = 'config/altsignals-news-rss.yaml'

with open(RSS_PATH) as file:
    news_urls = yaml.full_load(file)
print(news_urls)

{'globenewswire-us': 'http://www.globenewswire.com/RssFeed/country/United%20States/feedTitle/GlobeNewswire%20-%20News%20from%20United%20States', 'globenewswire-ma': 'https://www.globenewswire.com/RssFeed/subjectcode/27-Mergers%20And%20Acquisitions/feedTitle/GlobeNewswire%20-%20Mergers%20And%20Acquisitions', 'globenewswire-earnings': 'http://www.globenewswire.com/RssFeed/subjectcode/13-Earnings%20Releases%20And%20Operating%20Results/feedTitle/GlobeNewswire%20-%20Earnings%20Releases%20And%20Operating%20Results', 'globenewswire-public': 'http://www.globenewswire.com/RssFeed/orgclass/1/feedTitle/GlobeNewswire%20-%20News%20about%20Public%20Companies', 'globenewswire-basic-materials': 'http://www.globenewswire.com/RssFeed/industry/1000-Basic%20Materials/feedTitle/GlobeNewswire%20-%20Industry%20News%20on%20Basic%20Materials', 'globenewswire-aluminum': 'http://www.globenewswire.com/RssFeed/industry/1753-Aluminum/feedTitle/GlobeNewswire%20-%20Industry%20News%20on%20Aluminum', 'globenewswire-coa

**1. Fetch news from RSS feed**

In [11]:
# Helper functions for cleaning text and scoring its sentiment
def get_clean_txt(txt):
    # Replace @handles, non-alphanumeric characters and URLs with spaces, then collapse whitespace
    clean = ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", txt).split())
    return clean

def clean_html(raw_html):
    # Strip HTML tags from an RSS summary (uses the lxml parser)
    cleantext = BeautifulSoup(raw_html, "lxml").get_text()
    return cleantext

def get_sentiment(txt):
    # Clean the text first
    clean_txt = get_clean_txt(txt)
    # Create a TextBlob object from the cleaned text
    analysis = TextBlob(clean_txt)
    # Return the polarity score, a float in [-1.0, 1.0]
    return analysis.sentiment.polarity
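To see exactly what `get_clean_txt()` keeps, here is the same pattern in isolation, compiled once and written as a raw string (in a plain string literal, `\w`, `\S` and `\/` are invalid escape sequences that recent Python versions warn about). The `clean_txt` and `CLEAN_PATTERN` names are hypothetical; the behavior mirrors the helper above.

```python
import re

# Same three alternatives as get_clean_txt(), tried left to right at each position:
#   1) @handles, 2) any single character that is not an ASCII letter, digit, space or tab,
#   3) URL-like tokens such as https://example.com
CLEAN_PATTERN = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")


def clean_txt(txt: str) -> str:
    # Replace every match with a space, then collapse runs of whitespace
    return ' '.join(CLEAN_PATTERN.sub(' ', txt).split())


print(clean_txt('Shares rose 5% on https://example.com news @user!'))
```

Note that the second alternative also removes accented and non-Latin letters: `'résultats'` becomes `'r sultats'`, and a Japanese headline is reduced to its ASCII fragments (such as `AQX`). The French, Japanese and Korean items in these feeds therefore lose most of their text before TextBlob scores them.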

In [12]:
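import os

# Read the connection string from the environment instead of hard-coding credentials.
# Expected form: postgresql://<user>:<password>@<host>:5432/altsignals-beta
# (SQLAlchemy 1.4+ no longer accepts the 'postgres://' scheme.)
DB_URL = os.environ['ALTSIGNALS_DB_URL']  # hypothetical variable name
engine = db.create_engine(DB_URL)
conn = engine.connect()
metadata = db.MetaData()
# Reflect the existing news_item table from the database
news = db.Table('news_item', metadata, autoload_with=engine)
print(news.columns.keys())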

['news_item_id', 'title', 'summary', 'published', 'link', 'provider', 'language']
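For local experiments without the Postgres server, the same table can be mimicked in SQLite. A minimal sketch: the column names come from the reflected table above, while the column types and the `UNIQUE` constraint on `link` are assumptions, added so that re-running the feed cell does not store the same item twice (`INSERT OR IGNORE` skips rows that would violate it). `open_news_db` and `insert_item` are hypothetical names.

```python
import sqlite3

# Hypothetical local stand-in for the news_item table; types and UNIQUE(link) are assumptions
SCHEMA = """
CREATE TABLE IF NOT EXISTS news_item (
    news_item_id INTEGER PRIMARY KEY,
    title        TEXT NOT NULL,
    summary      TEXT,
    published    TEXT,
    link         TEXT UNIQUE,
    provider     TEXT,
    language     TEXT
)
"""


def open_news_db(path: str = ':memory:') -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(SCHEMA)
    return conn


def insert_item(conn: sqlite3.Connection, item: dict) -> bool:
    """Insert one news item; return False if an item with the same link already exists."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO news_item "
            "(title, summary, published, link, provider, language) "
            "VALUES (:title, :summary, :published, :link, :provider, :language)",
            item,
        )
    # rowcount is 0 when the insert was ignored because of the UNIQUE constraint
    return cur.rowcount == 1
```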


In [13]:
from dateutil.parser import parse

# Parse one RSS feed and insert each of its items into the news_item table
def store_item(rss_url, key):
    feed = feedparser.parse(rss_url)
    # Feed-level language, used as a fallback when an item has none of its own
    feed_language = feed.feed.get('language')
    for newsitem in feed['items']:
        print('storing item: ', newsitem['title'])
        ins = news.insert().values(title=newsitem['title'],
                                   summary=clean_html(newsitem.get('summary', '')),
                                   published=parse(newsitem['published']),
                                   link=newsitem['link'],
                                   provider=key,
                                   language=newsitem.get('language', feed_language))
        # engine.begin() commits the insert when the block exits
        with engine.begin() as tx:
            tx.execute(ins)


# Iterate over the feed URLs and store every item
for key, url in news_urls.items():
    store_item(url, key)

storing item:  Lamar Advertising Company Prices Private Offering of Senior Notes
storing item:  住友化学株式会社のイノベーションを アナクア社の知財管理プラットフォームAQXがサポート
storing item:  스미토모 화학, Anaqua의 AQX IP 관리 플랫폼 채택
storing item:  Sumitomo Chemical Innovates with Anaqua’s AQX IP Management Platform
storing item:  VAALCO Energy, Inc. Announces First Quarter 2020 Results
storing item:  SeaSpine to Participate in a Virtual Fireside Chat with BTIG
storing item:  1Life Healthcare (One Medical) Executives to Participate in Upcoming Investor Conference
storing item:  vCom’s vManager Mobile App 7 Expands Enterprise Ability to Manage IT Spend on the Go
storing item:  Great Elm Capital Group, Inc. Schedules Fiscal Third Quarter 2020 Earnings Release and Conference Call
storing item:  Deep Down Reports First Quarter 2020 Results
storing item:  EVO Payments to Participate in Upcoming Virtual Investor Conferences
storing item:  Garrison Capital Inc. Declares Second Quarter 2020 Distribution of $0.15 Per Share and Announces 

storing item:  Former Award-Winning Hedge Fund Manager in Natural Resources To Host Webcast/Conference Call to Discuss the State of the Exploration Market
storing item:  Wesdome Announces Latest Drill Results From Kiena Deep A Zone and Resumption of Drilling and Development Activities
storing item:  Plateau Energy Metals Announces Concurrent Private Placement
storing item:  EURO Ressources reports earnings for the first quarter ended March 31, 2020
storing item:  EURO Ressources annonce ses résultats pour le premier trimestre clos le 31 mars 2020
storing item:  Field work to resume at Bramaderos Au Cu Project, Ecuador
storing item:  NOVA Chemicals and Enerkem Collaborate to Close the Loop on Plastics Recycling
storing item:  Norsk Hydro: Minutes from the Annual General Meeting 2020
storing item:  Norsk Hydro: Protokoll for generalforsamlingen 2020
storing item:  Gulf Resources Announces the discovery of Natural Gas Belt by Petro China in Tianbao Towenship
storing item:  Gold X to Merge

storing item:  Gem International Resources Inc. Announces $500,000 Non-Brokered Private Placement
storing item:  MoissaniteCo Reviews
storing item:  Defense Metals Drills 4.32% Total Rare Earth Oxide Over 64 Metres From Surface And Extends Mineralized Zone At Wicheeda Rare Earth Element Deposit
storing item:  Olivut Resources Ltd. Share Option Grant
storing item:  Gem International Resources Inc. Provides Corporate Update
storing item:  Acadian Timber Corp. Announces Election of Directors
storing item:  Glatfelter Increases Dividend by 3.8% to $0.135 per Common Share
storing item:  Western Announces First Quarter 2020 Results and Suspends Quarterly Dividend
storing item:  Acadian Timber Corp. Reports First Quarter Results
storing item:  Glatfelter Reports First Quarter 2020 Results
storing item:  Acadian Timber Corp. Announces Change of Location for Annual and Special Meeting of Shareholders
storing item:  SFI Community Grants Strengthen Connections Between Forests and Communities to A

storing item:  Information au 7 mai 2020
storing item:  Intrepid Announces First Quarter 2020 Results
storing item:  Noranda Income Fund Confirms First Quarter 2020 Results Release Date
storing item:  Le Fonds de revenu Noranda annonce la date des résultats du premier trimestre 2020
storing item:  IBC Advanced Alloys to Release Fiscal Third Quarter 2020 Financial Results After Market Close on Monday, May 11, 2020
storing item:  Eramet: Documents available for consultation for the Combined Shareholders’ General Meeting to be held behind closed doors on 26 May 2020
storing item:  Eramet : Mise à disposition des documents pour l’Assemblée Générale Mixte, tenue à huis clos le 26 mai 2020
storing item:  Ferroglobe PLC Announces Delay in Filing Form 20-F Due to COVID-19
storing item:  Financial information at March 31, 2020 and request for extension of the ongoing discussions with the financial partners
storing item:  Information financière au 31 mars 2020 et demande de prolongation des disc
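The feed cell above stores whatever the feeds contain at the moment it runs. To keep collecting, one option is to re-run it on a fixed interval in a background thread. A minimal sketch using only the standard library's `threading`; `start_polling` is a hypothetical name, and items already stored on an earlier run will be inserted again unless the table enforces uniqueness.

```python
import threading
from typing import Callable


def start_polling(job: Callable[[], None], interval_s: float) -> threading.Event:
    """Run `job` immediately and then every `interval_s` seconds in a daemon thread.

    Returns an Event; call .set() on it to stop polling.
    """
    stop = threading.Event()

    def loop() -> None:
        while not stop.is_set():
            try:
                job()
            except Exception as exc:  # keep polling even if one fetch fails
                print('polling job failed:', exc)
            # Event.wait returns as soon as stop is set, so shutdown is prompt
            stop.wait(interval_s)

    threading.Thread(target=loop, daemon=True).start()
    return stop
```

For example, wrap the loop over `news_urls` in a small function, pass it as `job` with `interval_s=300` (an arbitrary five minutes), and call `.set()` on the returned event to stop.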

In [0]:
# Collect every headline (title) and summary from all feeds into two parallel lists
allheadlines = []
summaries = []

# Iterate over the feed URLs
for key, url in news_urls.items():
    feed = feedparser.parse(url)
    allheadlines.extend(item['title'] for item in feed['items'])
    summaries.extend(item.get('summary', '') for item in feed['items'])

**1.3 View headlines**

In [0]:
# Iterate over the allheadlines list and print each headline
for hl in allheadlines:
    print(hl)

Velocity (VEL) Alert: Johnson Fistel Investigates Velocity Financial, Inc.; Investors Suffering Losses Encouraged to Contact Firm
ALIGN DEADLINE ALERT: Faruqi & Faruqi, LLP Encourages Investors Who Suffered Losses Exceeding $100,000 In Align Technology, Inc. To Contact The Firm
ALLAKOS DEADLINE ALERT: Faruqi & Faruqi, LLP Encourages Investors Who Suffered Losses Exceeding $50,000 In Allakos Inc. To Contact The Firm
FUNKO LEAD PLAINTIFF DEADLINE ALERT: Faruqi & Faruqi, LLP Encourages Investors Who Suffered Losses Exceeding $50,000 In Funko, Inc. To Contact The Firm
WWE DEADLINE ALERT: Faruqi & Faruqi, LLP Encourages Investors Who Suffered Losses Exceeding $50,000 in World Wrestling Entertainment, Inc. to Contact the Firm
ROSEN, A GLOBALLY RECOGNIZED LAW FIRM, Reminds Golden Star Resources Ltd. Investors of Important Deadline in Securities Class Action – GSS
ROSEN, A GLOBALLY RECOGNIZED LAW FIRM, Reminds LogicBio Therapeutics, Inc. Investors of the Important Deadline in Securities Class 

**1.4 View news summaries**


In [0]:
# Iterate over the summaries list and print each raw summary
# (the HTML markup is still present; clean_html() strips it)
for s in summaries:
    print(s)

<p>SAN DIEGO, April  26, 2020  (GLOBE NEWSWIRE) -- Shareholder rights law firm Johnson Fistel, LLP is investigating potential violations of the federal securities laws by Velocity Financial, Inc. ("Velocity" or "the Company") (NYSE: VEL).<br></p>
<p align="justify">NEW YORK, April  26, 2020  (GLOBE NEWSWIRE) -- Faruqi &amp; Faruqi, LLP, a leading national securities law firm, reminds investors in Align Technology, Inc. (“Align” or the “Company”) (NASDAQ:ALGN) of the May 1, 2020 deadline to seek the role of lead plaintiff in a federal securities class action that has been filed against the Company.</p>
<p>NEW YORK, April  26, 2020  (GLOBE NEWSWIRE) -- Faruqi &amp; Faruqi, LLP, a leading national securities law firm, reminds investors in Allakos Inc. (“Allakos” or the “Company”) (NASDAQ:ALLK) of the May 11, 2020 deadline to seek the role of lead plaintiff in a federal securities class action that has been filed against the Company.</p>
<p align="left">NEW YORK, April  26, 2020  (GLOBE NE
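The summaries above still carry HTML tags and entities such as `&amp;`. `clean_html()` handles this with BeautifulSoup and the lxml parser; where lxml is not installed, the standard library's `html.parser.HTMLParser` can do the same job, since it unescapes character references when `convert_charrefs=True`. A minimal sketch; `strip_html` and `_TextCollector` are hypothetical names.

```python
from html.parser import HTMLParser


class _TextCollector(HTMLParser):
    """Collects the text between tags, with entities like &amp; already unescaped."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.parts: list[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


def strip_html(raw_html: str) -> str:
    parser = _TextCollector()
    parser.feed(raw_html)
    parser.close()  # flush any text still buffered at the end of the input
    # Join chunks with spaces so adjacent paragraphs don't merge, then collapse
    # repeated whitespace such as the double spaces in "April  26, 2020"
    return ' '.join(' '.join(parser.parts).split())
```

Joining text chunks with a space keeps separate paragraphs apart, at the cost of splitting a word that is broken by an inline tag (for example `<b>Sum</b>it`), which is rare in these press-release summaries.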

**2. Named Entity Recognition (NER) - call the Thomson Reuters Intelligent Tagging (TRIT) API with the news headline content**

In [0]:
# Open PermID / Open Calais: register for an API access token (see the quick start)
# https://developers.refinitiv.com/open-permid/intelligent-tagging-restful-api/quick-start
# https://developers.refinitiv.com/article/intelligent-tagging-extract-information-api-response

In [0]:
# Define sample content to be queried
contentText = allheadlines[1]
print(contentText)

Velocity (VEL) Alert: Johnson Fistel Investigates Velocity Financial, Inc.; Investors Suffering Losses Encouraged to Contact Firm


**2.1 Query TRIT / OpenCalais JSON API**

In [0]:
import os

headType = "text/raw"
# API access token read from the environment (hypothetical variable name)
token = os.environ['CALAIS_API_TOKEN']
url = "https://api-eit.refinitiv.com/permid/calais"
payload = contentText.encode('utf8')
headers = {
    'Content-Type': headType,
    'X-AG-Access-Token': token,
    'outputformat': "application/json"
    }

TRITResponse = requests.post(url, data=payload, headers=headers, timeout=30)
# Fail loudly on HTTP errors (e.g. an invalid token or rate limiting)
TRITResponse.raise_for_status()
# Load content into JSON object
JSONResponse = TRITResponse.json()
# print(json.dumps(JSONResponse, indent=4, sort_keys=True))
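The same request can also be built with the standard library's `urllib.request`, which is handy where `requests` is not installed. A minimal sketch reusing the endpoint and headers from the cell above; `build_calais_request` and `tag_text` are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

CALAIS_URL = 'https://api-eit.refinitiv.com/permid/calais'  # endpoint used above


def build_calais_request(text: str, token: str) -> urllib.request.Request:
    # POST the raw text as the body, with the same headers as the requests-based cell
    return urllib.request.Request(
        CALAIS_URL,
        data=text.encode('utf-8'),
        headers={
            'Content-Type': 'text/raw',
            'X-AG-Access-Token': token,
            'outputformat': 'application/json',
        },
        method='POST',
    )


def tag_text(text: str, token: str, timeout_s: float = 30.0) -> dict:
    # urlopen raises urllib.error.HTTPError for non-2xx responses
    with urllib.request.urlopen(build_calais_request(text, token), timeout=timeout_s) as resp:
        return json.loads(resp.read().decode('utf-8'))
```

Because `urlopen` raises on HTTP errors, an invalid token surfaces as an exception instead of an error body being passed to `json.loads`.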

**2.2 Get entities in news**

In [0]:
#Get Entities
print('====Entities====')
print('Type, Name')

for key in JSONResponse:
    if ('_typeGroup' in JSONResponse[key]):
        if JSONResponse[key]['_typeGroup'] == 'entities':
            print(JSONResponse[key]['_type'] + ", " + JSONResponse[key]['name'])

====Entities====
Type, Name
Company, JOHNSON FISTEL
Company, velocity financial, inc.


**2.3 Get RIC code for entity**

In [0]:
#Get RIC code

print('====RIC====')
print('RIC')

for entity in JSONResponse:
    for info in JSONResponse[entity]:
        if (info =='resolutions'):
            for companyinfo in (JSONResponse[entity][info]):
                if 'primaryric' in companyinfo:
                    symbol = companyinfo['primaryric']
                    print(symbol)

====RIC====
RIC
VEL.N


**2.4 Get topics for the news item**

In [0]:
#Print Header
print(symbol)
print('====Topics====')
print('Topics, Score')

for key in JSONResponse:
    if ('_typeGroup' in JSONResponse[key]):
        if JSONResponse[key]['_typeGroup'] == 'topics':
            print(JSONResponse[key]['name'] + ", " + str(JSONResponse[key]['score']))

VEL.N
====Topics====
Topics, Score
Business_Finance, 1
Health_Medical_Pharma, 0.935
Disaster_Accident, 0.817


**4. Sentiment Analysis**

In [0]:
# Classify text as positive, neutral or negative (redefines the polarity-returning get_sentiment above)
def get_sentiment(txt):
    '''
    Utility function to clean text by removing links and special characters
    with a simple regex and to classify the sentiment of the passed text
    using TextBlob's sentiment method
    '''
    # clean text
    clean_txt = ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", txt).split())
    # create a TextBlob object from the cleaned text
    analysis = TextBlob(clean_txt)
    polarity = analysis.sentiment.polarity
    # map the polarity score to a label
    if polarity > 0:
        return 'positive'
    elif polarity == 0:
        return 'neutral'
    else:
        return 'negative'
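`get_sentiment()` labels any non-zero polarity as positive or negative, so a headline scoring 0.01 counts as positive. A sketch of the same mapping with a neutral band, where `neutral_band` is a hypothetical tuning parameter; with its default of 0.0 it reproduces the labels of the function above. TextBlob polarity lies in [-1.0, 1.0].

```python
def polarity_to_label(polarity: float, neutral_band: float = 0.0) -> str:
    """Map a polarity in [-1.0, 1.0] to 'positive', 'neutral' or 'negative'.

    Scores with abs(polarity) <= neutral_band are treated as neutral.
    """
    if not -1.0 <= polarity <= 1.0:
        raise ValueError(f'polarity out of range: {polarity}')
    if polarity > neutral_band:
        return 'positive'
    if polarity < -neutral_band:
        return 'negative'
    return 'neutral'
```

The band width is a judgment call; it would normally be chosen by looking at how scores distribute over a sample of headlines rather than fixed up front.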

In [0]:
print('headline: ', allheadlines[1])
print('headline sentiment: ', get_sentiment(allheadlines[1]))
print('summary: ', summaries[1])
print('summary sentiment: ', get_sentiment(summaries[1]))

headline:  Velocity (VEL) Alert: Johnson Fistel Investigates Velocity Financial, Inc.; Investors Suffering Losses Encouraged to Contact Firm
headline sentiment:  negative
summary:  <p>SAN DIEGO, April  26, 2020  (GLOBE NEWSWIRE) -- Shareholder rights law firm Johnson Fistel, LLP is investigating potential violations of the federal securities laws by Velocity Financial, Inc. ("Velocity" or "the Company") (NYSE: VEL).<br></p>
summary sentiment:  negative


**5. Get historical EOD price data**

In [0]:
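import os

# EOD Historical Data API token read from the environment (hypothetical variable name)
eod_api_token = os.environ['EOD_API_TOKEN']
# Swap the RIC exchange suffix for EOD's US suffix, e.g. VEL.N -> VEL.US
# (symbol.replace('N', 'US') would also rewrite any N inside the ticker itself)
eod_symbol = symbol.split('.')[0] + '.US'
eod_price_url = 'https://eodhistoricaldata.com/api/eod/' + eod_symbol + '?api_token=' + eod_api_token
price_df = pd.read_csv(eod_price_url)
price_df.sort_values(by=['Date'], inplace=True, ascending=False)
price_df.head()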


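|    | Date       | Open | High  | Low  | Close | Adjusted_close | Volume   |
|----|------------|------|-------|------|-------|----------------|----------|
| 68 | 3081       |      |       |      |       |                |          |
| 67 | 2020-04-24 | 3.22 | 3.22  | 3.03 | 3.09  | 3.09           | 31200.0  |
| 66 | 2020-04-23 | 3.13 | 3.32  | 3.06 | 3.14  | 3.14           | 68200.0  |
| 65 | 2020-04-22 | 3.29 | 3.361 | 3.1  | 3.16  | 3.16           | 89500.0  |
| 64 | 2020-04-21 | 3.42 | 3.42  | 3.05 | 3.15  | 3.15           | 112400.0 |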
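The first row of the output has `3081` in the `Date` column and no prices, so it is not a trading day; since `'3081'` sorts after any `'2020-...'` string, the descending sort puts it on top. A standard-library sketch that maps a US RIC to an EOD ticker and keeps only rows whose `Date` parses as an ISO date. `ric_to_eod`, `parse_eod_csv` and `US_RIC_SUFFIXES` are hypothetical; only the `.N` (NYSE) and `.O` (Nasdaq) suffixes are mapped, on the assumption that other exchanges need their own EOD suffix.

```python
import csv
import io
from datetime import date

# Assumed mapping: RIC suffixes of US venues that EOD lists under '.US'
US_RIC_SUFFIXES = {'N', 'O'}


def ric_to_eod(ric: str) -> str:
    ticker, _, suffix = ric.partition('.')
    if suffix not in US_RIC_SUFFIXES:
        raise ValueError(f'no EOD mapping assumed for RIC suffix: {suffix!r}')
    return ticker + '.US'


def parse_eod_csv(text: str) -> list[dict]:
    """Parse EOD CSV text, keeping only rows with an ISO date and a numeric close."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        try:
            day = date.fromisoformat(row['Date'])
            close = float(row['Close'])
        except (TypeError, ValueError):
            continue  # skip trailer lines such as the bare '3081' row
        rows.append({'Date': day, 'Close': close})
    # Newest first, matching the notebook's descending sort
    rows.sort(key=lambda r: r['Date'], reverse=True)
    return rows
```

With pandas, a comparable fix is to parse `Date` with `pd.to_datetime(..., errors='coerce')` and drop the rows that come back as `NaT` before sorting.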
