## News Named Entity Recognition (NER) and Sentiment Analysis

###### Load dependency libraries 

In [1]:
import pandas as pd
import numpy as np
import threading
from textblob import TextBlob
import psycopg2
import requests
import re
import json
import sqlalchemy as db
from bs4 import BeautifulSoup



**News / NLP Signal Pipeline**

1. **Fetch news** - read in news via an RSS feed (Feedparser) (**news_download**)
2. **Extract entities** - perform named entity recognition (NER) on the unstructured text (OpenCalais) (**entity_extraction**)
3. **Sentiment analysis** - extract the sentiment of each news item (TextBlob) (**sentiment_assignment**)
4. **Historical EOD prices** - fetch historical end-of-day prices (Eodhistoricaldata.com) (**price_download**)
5. **Find signal** - correlate sentiment with price movement (**signal_analysis**)
6. **Backtesting** - backtest for PnL performance (Pyfinance) (**sentiment_backtest**)
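The `allheadlines` and `summaries` lists used later in the sentiment section presumably come from step 1. As a minimal sketch, assuming a plain RSS 2.0 feed whose `<item>` elements carry `<title>` and `<description>`, the two lists can be built with the standard library's `xml.etree.ElementTree` in place of Feedparser. `parse_rss` and the sample feed are illustrative and not part of the pipeline code.

```python
import xml.etree.ElementTree as ET

def parse_rss(xml_text):
    """Return (headlines, summaries) from an RSS 2.0 document.

    Stdlib stand-in for Feedparser: assumes <item> elements with <title>
    and <description> children; a missing field becomes ''.
    """
    root = ET.fromstring(xml_text)
    headlines, summaries = [], []
    for item in root.iter('item'):
        headlines.append((item.findtext('title') or '').strip())
        summaries.append((item.findtext('description') or '').strip())
    return headlines, summaries


# Illustrative feed: the description holds escaped HTML, as news feeds often do
sample = """<rss version="2.0"><channel>
  <item><title>Company A reports results</title>
        <description>&lt;p&gt;Revenue rose.&lt;/p&gt;</description></item>
  <item><title>Company B announces offering</title></item>
</channel></rss>"""

allheadlines, summaries = parse_rss(sample)
print(allheadlines)
print(summaries)
```

Feedparser also copes with Atom feeds, namespaces and malformed XML, which this sketch does not attempt.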

https://www.altsignals.ai



**1. Get News from Database**

In [3]:
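import os

# Connection string read from the environment (ALTSIGNALS_DB_URL is an illustrative name),
# e.g. 'postgresql://user:password@host:5432/altsignals-beta'.
# SQLAlchemy 1.4+ only accepts the 'postgresql://' scheme, not 'postgres://'.
DB_URL = os.environ['ALTSIGNALS_DB_URL']
engine = db.create_engine(DB_URL)
conn = engine.connect()
metadata = db.MetaData()
# Reflect the existing table definition from the database
news_item = db.Table('news_item', metadata, autoload_with=engine)
print(news_item.columns.keys())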

['news_item_id', 'title', 'summary', 'published', 'link', 'provider', 'language']


In [13]:
# Get only English news for now
# (columns are passed positionally: the list form of select() was removed in SQLAlchemy 2.0)
get_news_query = (db.select(news_item.c.news_item_id, news_item.c.summary)
                  .where(news_item.c.language == 'en'))
news_result = conn.execute(get_news_query).fetchall()
for r in news_result:
    print('id:', r.news_item_id, 'summary:', r.summary)

id: 41 summary: BATON ROUGE, La., May  11, 2020  (GLOBE NEWSWIRE) -- Lamar Advertising Company (Nasdaq: LAMR) announced today that its wholly owned subsidiary, Lamar Media Corp., has agreed to sell $400.0 million in aggregate principal amount of 4 7/8% Senior Notes due 2029 (the “Notes”) through an institutional private placement.  The proceeds, after the payment of fees and expenses, to Lamar Media of this offering are expected to be approximately $394.5 million.  Subject to customary closing conditions, the closing of this offering is expected on or about May 13, 2020.
id: 44 summary: Diverse chemical company selects Anaqua software and services to unify and enhance global patent and trademark management processes Diverse chemical company selects Anaqua software and services to unify and enhance global patent and trademark management processes
id: 45 summary: HOUSTON, May  11, 2020  (GLOBE NEWSWIRE) -- VAALCO Energy, Inc. (NYSE: EGY, LSE: EGY) today reported operational and financial

**2. Named Entity Recognition (NER) - make API calls to Thomson Reuters Intelligent Tagging (TRIT) with the news summary text**

In [12]:
entity = db.Table('entity', metadata, autoload_with=engine)
print(entity.columns.keys())

['entity_id', 'type', 'name', 'ric', 'news_item_id']


**2.1 Query TRIT / OpenCalais JSON API**

In [17]:
import os

def get_trit(content):
    '''Send raw text to the TRIT / OpenCalais API and return the parsed JSON response.'''
    # Read the access token from the environment instead of hard-coding it
    # (TRIT_TOKEN is an illustrative variable name)
    token = os.environ['TRIT_TOKEN']
    url = "https://api-eit.refinitiv.com/permid/calais"
    headers = {
        'Content-Type': "text/raw",
        'X-AG-Access-Token': token,
        'outputformat': "application/json"
    }
    # POST the UTF-8 encoded text; fail loudly on HTTP errors instead of parsing an error page
    TRITResponse = requests.post(url, data=content.encode('utf8'), headers=headers, timeout=60)
    TRITResponse.raise_for_status()
    return TRITResponse.json()

**2.2 Get entities in news**

In [25]:
def add_entity(news_id, summary):
    '''Tag one news summary with TRIT and store each entity it returns.'''
    JSONResponse = get_trit(summary)
    # engine.begin() opens a transaction and commits it when the block exits
    with engine.begin() as tx:
        for item in JSONResponse.values():
            if isinstance(item, dict) and item.get('_typeGroup') == 'entities':
                print(item['_type'] + ", " + item['name'])
                tx.execute(entity.insert().values(type=item['_type'],
                                                  name=item['name'],
                                                  news_item_id=news_id))

In [26]:
for r in news_result:
    add_entity(r.news_item_id, r.summary)

Company, Lamar Media Corp.
Company, Lamar Advertising Company
Currency, USD
IndustryTerm, chemical
Company, VAALCO Energy Inc.
City, HOUSTON
Company, SeaSpine Holdings Corporation
ProvinceOrState, California
IndustryTerm, treatment of spinal disorders
City, CARLSBAD
IndustryTerm, surgical solutions
MedicalCondition, spinal disorders
IndustryTerm, medical technology
Company, 1Life Healthcare Inc.
Person, Amir Dan Rubin
Position, CFO
Position, Chair & CEO
City, SAN FRANCISCO
Person, Bjorn Thaler
ProvinceOrState, California
Company, vCom Solutions
Product, vManager Mobile
IndustryTerm, management software
City, RAMON
Product, 12 software platform, vManager Mobile App 7.0
Technology, SAN
Position, leader
OperatingSystem, Android
Technology, Android
Product, iPhone
Company, s IT
IndustryTerm, software platform
Product, vManager Mobile App 7.0
City, WALTHAM
Company, Great Elm Capital Group Inc.
ProvinceOrState, Massachusetts
City, HOUSTON
Company, Deep Down Inc.
IndustryTerm, oil and gas pro

City, TORONTO
OperatingSystem, VMS
Company, Noront Resources Ltd.
Company, CANADA GOOSE
ProvinceOrState, British Columbia
Company, OTC Markets Group
City, VANCOUVER
Company, Macarthur Minerals Limited
Company, OTC Market Group
Position, Annual General Meeting
Company, Norsk Hydro ASA
Company, Norsk Hydro ASA
Currency, NOK
Country, Norway
Company, Norsk Hydro
Company, Norsk Hydro ASA
Company, DNB Markets
Organization, DNB
Company, Danske Bank
Position, general corporate purposes
Company, S&P
Company, Ventilation
Company, AMERICAN STOCK EXCHANGE
Company, Air Conditioning
Company, Heating
Company, Construction Products
Organization, Securities and Exchange Commission
City, CHICAGO
Company, Continental Materials Corporation
IndustryTerm, industry groups
ProvinceOrState, California
Company, Kaiser Aluminum Corporation
Currency, USD
Position, leader
ProvinceOrState, California
Company, Kaiser Aluminum Corporation
Currency, USD
City, PARIS
Company, Constellium SE
Company, Extruded Solutions
C

KeyboardInterrupt: 
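Separating the JSON parsing from the database insert makes the tagging logic testable without an API call. The sketch below assumes only the response shape that `add_entity` already relies on (top-level objects carrying `_typeGroup`, `_type` and `name`) and drops repeated (type, name) pairs within one response, so each entity would be stored once per news item. The function name `extract_entities` and the hand-made sample response are illustrative.

```python
def extract_entities(response):
    """Return unique (type, name) pairs for every entity in a TRIT response.

    Relies only on the fields add_entity uses: objects whose '_typeGroup'
    is 'entities' carry '_type' and 'name'. First-seen order is kept.
    """
    seen = set()
    entities = []
    for item in response.values():
        if not isinstance(item, dict) or item.get('_typeGroup') != 'entities':
            continue
        pair = (item['_type'], item['name'])
        if pair not in seen:
            seen.add(pair)
            entities.append(pair)
    return entities


# Hand-made response in the shape add_entity expects (keys are arbitrary IDs)
sample_response = {
    'doc': {'info': {}},
    'e1': {'_typeGroup': 'entities', '_type': 'Company', 'name': 'Norsk Hydro ASA'},
    'e2': {'_typeGroup': 'entities', '_type': 'Country', 'name': 'Norway'},
    'e3': {'_typeGroup': 'entities', '_type': 'Company', 'name': 'Norsk Hydro ASA'},
    't1': {'_typeGroup': 'topics', 'name': 'Business_Finance', 'score': 1},
}
print(extract_entities(sample_response))
# [('Company', 'Norsk Hydro ASA'), ('Country', 'Norway')]
```

Inside `add_entity`, the insert loop could then iterate over `extract_entities(get_trit(summary))` and write one row per pair.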

**2.3 Get RIC code for entity**

In [0]:
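#Get RIC code
# JSONResponse must hold the TRIT output for one news item, e.g. JSONResponse = get_trit(text)

print('====RIC====')
print('RIC')

# Loop over `key`, not `entity`, so the entity Table defined above is not overwritten
for key in JSONResponse:
    # entities that TRIT resolved carry a 'resolutions' list, which may include a 'primaryric'
    for companyinfo in JSONResponse[key].get('resolutions', []):
        if 'primaryric' in companyinfo:
            symbol = companyinfo['primaryric']
            print(symbol)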

====RIC====
RIC
VEL.N
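The cell above keeps only the last `primaryric` it sees in `symbol`, while a news item that mentions several listed companies can yield several. A small sketch that collects every primary RIC in first-seen order, without duplicates, using the same `resolutions` / `primaryric` fields as the loop above. The function name is illustrative, and so is the sample data (`ABC.OQ` is a made-up RIC).

```python
def extract_rics(response):
    """Return every distinct 'primaryric' found in the 'resolutions' lists
    of a TRIT response, in first-seen order."""
    rics = []
    for item in response.values():
        if not isinstance(item, dict):
            continue
        for resolution in item.get('resolutions', []):
            ric = resolution.get('primaryric')
            if ric and ric not in rics:
                rics.append(ric)
    return rics


sample_response = {
    'c1': {'_type': 'Company', 'resolutions': [{'primaryric': 'VEL.N'}]},
    'c2': {'_type': 'Company', 'resolutions': [{'name': 'EXAMPLE CORP'}]},  # no RIC
    'c3': {'_type': 'Company', 'resolutions': [{'primaryric': 'ABC.OQ'}]},
    'c4': {'_type': 'Company', 'resolutions': [{'primaryric': 'VEL.N'}]},  # repeat
    'x1': {'_type': 'City', 'name': 'SAN DIEGO'},
}
print(extract_rics(sample_response))  # ['VEL.N', 'ABC.OQ']
```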


**2.4 Get topics for the news item**

In [0]:
#Print Header
print(symbol)
print('====Topics====')
print('Topics, Score')

for key in JSONResponse:
    if ('_typeGroup' in JSONResponse[key]):
        if JSONResponse[key]['_typeGroup'] == 'topics':
            print(JSONResponse[key]['name'] + ", " + str(JSONResponse[key]['score']))

VEL.N
====Topics====
Topics, Score
Business_Finance, 1
Health_Medical_Pharma, 0.935
Disaster_Accident, 0.817
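The topic scores shown above range from 0.817 to 1. A sketch that keeps only topics at or above a cut-off, highest score first, so that they could be stored alongside the entities. The function name and the 0.8 default threshold are arbitrary illustrative choices, not values taken from TRIT.

```python
def top_topics(response, min_score=0.8):
    """Return (name, score) pairs for TRIT topics scoring at least min_score,
    sorted from highest to lowest score."""
    topics = [
        (item['name'], float(item['score']))
        for item in response.values()
        if isinstance(item, dict) and item.get('_typeGroup') == 'topics'
    ]
    return sorted((t for t in topics if t[1] >= min_score),
                  key=lambda t: t[1], reverse=True)


# Illustrative response reusing the topic names and scores printed above
sample_response = {
    't1': {'_typeGroup': 'topics', 'name': 'Disaster_Accident', 'score': 0.817},
    't2': {'_typeGroup': 'topics', 'name': 'Business_Finance', 'score': 1},
    't3': {'_typeGroup': 'topics', 'name': 'Health_Medical_Pharma', 'score': 0.935},
    'e1': {'_typeGroup': 'entities', '_type': 'City', 'name': 'SAN DIEGO'},
}
print(top_topics(sample_response, min_score=0.9))
# [('Business_Finance', 1.0), ('Health_Medical_Pharma', 0.935)]
```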


**3. Sentiment Analysis**

In [0]:
# Define function used for text sentiment analysis
def get_sentiment(txt):
    '''
    Clean text by removing @handles, links and special characters with a
    simple regex, then classify its sentiment with TextBlob's polarity score.
    '''
    # clean text (raw string avoids invalid-escape warnings for \w, \S and \/)
    clean_txt = ' '.join(re.sub(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)", " ", txt).split())
    # polarity ranges from -1.0 (negative) to 1.0 (positive)
    polarity = TextBlob(clean_txt).sentiment.polarity
    if polarity > 0:
        return 'positive'
    elif polarity == 0:
        return 'neutral'
    else:
        return 'negative'
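The cleaning step in `get_sentiment` can be checked in isolation. The helper below applies the same regular expression: it removes @handles, URLs and every character that is not an ASCII letter, digit, space or tab, then collapses whitespace. One side effect worth knowing is that non-ASCII letters are removed too, so accented names lose characters before TextBlob sees them. The name `clean_text` is illustrative.

```python
import re

# Same pattern as in get_sentiment, compiled once
CLEAN_RE = re.compile(r"(@[A-Za-z0-9]+)|([^0-9A-Za-z \t])|(\w+:\/\/\S+)")

def clean_text(txt):
    """Replace handles, URLs and non-alphanumeric characters with spaces,
    then collapse runs of whitespace."""
    return ' '.join(CLEAN_RE.sub(' ', txt).split())


print(clean_text('Read more at https://example.com/news @newsdesk!'))  # Read more at
print(clean_text('Shares fell 5% in Zürich'))  # Shares fell 5 in Z rich
```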

In [0]:
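# allheadlines and summaries are lists of headline and summary strings read from the news feed (step 1)
print('headline: ', allheadlines[1])
print('headline sentiment: ', get_sentiment(allheadlines[1]))
print('summary: ', summaries[1])
print('summary sentiment: ', get_sentiment(summaries[1]))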

headline:  Velocity (VEL) Alert: Johnson Fistel Investigates Velocity Financial, Inc.; Investors Suffering Losses Encouraged to Contact Firm
headline sentiment:  negative
summary:  <p>SAN DIEGO, April  26, 2020  (GLOBE NEWSWIRE) -- Shareholder rights law firm Johnson Fistel, LLP is investigating potential violations of the federal securities laws by Velocity Financial, Inc. ("Velocity" or "the Company") (NYSE: VEL).<br></p>
summary sentiment:  negative
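The summary above still contains HTML markup (`<p>`, `<br>`), and the regex in `get_sentiment` only removes the angle brackets and slashes, so tag names such as `p` and `br` survive as words. The notebook already imports BeautifulSoup, whose `get_text()` would handle this; the sketch here uses the standard library's `html.parser.HTMLParser` instead so it runs without extra dependencies. The class and function names are illustrative.

```python
from html.parser import HTMLParser

class _TextCollector(HTMLParser):
    """Collects text nodes and ignores tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)  # turn &amp; etc. into characters
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_html(markup):
    """Return the visible text of an HTML fragment with whitespace collapsed."""
    parser = _TextCollector()
    parser.feed(markup)
    parser.close()  # flush any buffered trailing text
    return ' '.join(' '.join(parser.parts).split())


print(strip_html('<p>Shareholder rights law firm Johnson Fistel, LLP is investigating.<br></p>'))
# Shareholder rights law firm Johnson Fistel, LLP is investigating.
```

Calling `get_sentiment(strip_html(summary))` would then score the text without tag remnants.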


**4. Get historical EOD price data**

In [0]:
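import os

# Read the API token from the environment rather than hard-coding it
# (EOD_API_TOKEN is an illustrative variable name)
eod_api_token = os.environ['EOD_API_TOKEN']
# Swap the RIC exchange suffix (e.g. '.N') for '.US'; str.replace('N', 'US')
# would also rewrite any 'N' inside the ticker itself
eod_symbol = symbol.split('.')[0] + '.US'
eod_price_url = 'https://eodhistoricaldata.com/api/eod/' + eod_symbol + '?api_token=' + eod_api_token
price_df = pd.read_csv(eod_price_url)
price_df.sort_values(by=['Date'], inplace=True, ascending=False)
price_df.head()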


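| | Date | Open | High | Low | Close | Adjusted_close | Volume |
|---|---|---|---|---|---|---|---|
| 68 | 3081 | NaN | NaN | NaN | NaN | NaN | NaN |
| 67 | 2020-04-24 | 3.22 | 3.22 | 3.03 | 3.09 | 3.09 | 31200.0 |
| 66 | 2020-04-23 | 3.13 | 3.32 | 3.06 | 3.14 | 3.14 | 68200.0 |
| 65 | 2020-04-22 | 3.29 | 3.361 | 3.1 | 3.16 | 3.16 | 89500.0 |
| 64 | 2020-04-21 | 3.42 | 3.42 | 3.05 | 3.15 | 3.15 | 112400.0 |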
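Two details of the price step are worth isolating. First, RICs carry an exchange suffix (`.N` above; Nasdaq listings typically end in `.O` or `.OQ`), while the EOD URL here uses a `.US` suffix, so splitting on the dot is safer than `replace('N', 'US')`, which would turn `'NKE.N'` into `'USKE.US'`. Second, the first row shown above (Date `3081`, no prices) is not a trading day; it looks like a trailing non-data line in the downloaded CSV, and sorting the `Date` strings in descending order puts it on top. The sketch below, run on a few of the rows shown above, drops rows whose `Date` does not parse. The function names are illustrative, and the `.US` mapping assumes US-listed tickers.

```python
import io

import pandas as pd


def ric_to_eod_symbol(ric, exchange='US'):
    """Map a RIC such as 'VEL.N' to an EOD-style symbol such as 'VEL.US'
    by replacing whatever exchange suffix follows the first dot."""
    return ric.split('.')[0] + '.' + exchange


def clean_eod_prices(df):
    """Drop rows whose Date is not a real date and sort newest first."""
    out = df.copy()
    # Non-dates such as '3081' become NaT and are then dropped
    out['Date'] = pd.to_datetime(out['Date'], format='%Y-%m-%d', errors='coerce')
    return out.dropna(subset=['Date']).sort_values('Date', ascending=False)


raw_csv = """Date,Open,High,Low,Close,Adjusted_close,Volume
2020-04-23,3.13,3.32,3.06,3.14,3.14,68200
2020-04-24,3.22,3.22,3.03,3.09,3.09,31200
3081,,,,,,
"""
prices = clean_eod_prices(pd.read_csv(io.StringIO(raw_csv)))
print(ric_to_eod_symbol('VEL.N'))  # VEL.US
print(prices['Date'].dt.strftime('%Y-%m-%d').tolist())  # ['2020-04-24', '2020-04-23']
```

Parsing `Date` into real timestamps also makes later joins against news publication dates less error-prone than comparing strings.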
