# Natural Language Processing for Signal Generation on News Data

### Intraday Sentiment Strategy

To illustrate the power of sentiment analysis, we'll construct and backtest a simple strategy.
- Trade intraday over the year 2013
- Companies: 
    - Apple, Microsoft, Boeing, JPMorgan, Google, GM, Citigroup, Ford, Toyota, HSBC, ICAP
- Assume perfect market entry and exit and no transaction fees
- The sentiment score is the signed confidence that a text is positive (above 0) or negative (below 0).
- Basic strategy: 
    - BUY when `sentiment_score` > `sentiment_cutoff` and SELL `time_till_close_position` minutes later.
    - SHORT SELL when `sentiment_score` < -`sentiment_cutoff` and BUY back `time_till_close_position` minutes later.
    - If news is released while the market is closed, enter the position as soon as the market opens.
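
The entry rule above can be sketched as a small vectorized function. This is a minimal illustration rather than the notebook's implementation: the name `sentiment_to_signal` and the toy scores are assumptions, and strict inequalities are used so that a cutoff of 0 never marks the same news item as both a buy and a short sell.

```python
import numpy as np

def sentiment_to_signal(scores, cutoff):
    """Map sentiment scores in [-1, 1] to +1 (buy), -1 (short sell) or 0 (no trade)."""
    scores = np.asarray(scores, dtype=float)
    return np.where(scores > cutoff, 1, np.where(scores < -cutoff, -1, 0))

# Hypothetical scores for five news items
toy_scores = [0.9, -0.7, 0.2, -0.1, 0.55]
signals = sentiment_to_signal(toy_scores, cutoff=0.5)
print(signals)  # [ 1 -1  0  0  1]
```

Raising the cutoff trades fewer, higher-confidence items; lowering it trades more items with weaker signals.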

In [6]:
import numpy as np
import pandas as pd
from datetime import datetime, date, timedelta,time
from ipywidgets import widgets
import matplotlib.pyplot as plt

signal_df = pd.read_csv("../data/news_data/sentiment_backtest_data.csv")

In [7]:
def run_strat(cutoff, delay, txn_fee=0.0, output_daily_agg=False):
    """Backtest the sentiment strategy on the module-level signal_df.

    cutoff: sentiment threshold in [0, 1]
    delay: holding period in minutes (multiple of 5, at most 60)
    txn_fee: fee charged per trade
    output_daily_agg: if True, return only the last row of each day
    """
    assert 0.0 <= cutoff <= 1.0
    assert 0 < delay <= 60 and delay % 5 == 0

    scores = signal_df["sentiment_score"]
    # +1 = long, -1 = short, 0 = no trade
    buy_signals = (scores > cutoff).astype(int)
    sell_signals = (scores < -cutoff).astype(int)
    signals = buy_signals - sell_signals
    fees = txn_fee * (buy_signals + sell_signals)

    # Return realised `delay` minutes after each news item, signed by position
    ret = signal_df["ret_delay_{}".format(delay)] * signals
    adj_ret = ret - fees

    strategy_result = pd.DataFrame({
        "datetime": pd.to_datetime(signal_df["datetime"]).values,
        "adj_cum_return": adj_ret.cumsum().values,
        "cum_return": ret.cumsum().values,
        "adj_return": adj_ret.values,
        "return": ret.values})
    if not output_daily_agg:
        return strategy_result

    # Keep the last row of each calendar day (end-of-day cumulative values).
    # Comparing with the next row also keeps the final day, which a
    # previous-row date-change test would silently drop.
    days = strategy_result["datetime"].dt.normalize()
    is_day_end = days != days.shift(-1)
    return strategy_result[is_day_end.values]
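
To make the arithmetic in `run_strat` concrete, here is a small self-contained sketch on invented data. The column names mirror the ones used above, but the four rows, the cutoff and the fee are made up for illustration and do not come from `sentiment_backtest_data.csv`.

```python
import pandas as pd

# Invented toy data: four news items over two days, each with the
# 5-minute forward return already computed.
toy = pd.DataFrame({
    "datetime": pd.to_datetime(["2013-01-02 10:00", "2013-01-02 14:30",
                                "2013-01-03 09:45", "2013-01-03 15:00"]),
    "sentiment_score": [0.8, -0.9, 0.1, 0.7],
    "ret_delay_5": [0.010, 0.004, -0.002, -0.003],
})
cutoff, txn_fee = 0.5, 0.001  # assumed values

# +1 for long, -1 for short, 0 for no trade
score = toy["sentiment_score"]
signal = (score > cutoff).astype(int) - (score < -cutoff).astype(int)

toy["return"] = toy["ret_delay_5"] * signal
# The fee is charged only on rows where a trade is made
toy["adj_return"] = toy["return"] - txn_fee * signal.abs()
toy["adj_cum_return"] = toy["adj_return"].cumsum()

# One row per day: the last cumulative value reached on that day
daily = toy.groupby(toy["datetime"].dt.date)["adj_cum_return"].last()
print(daily)
```

In this toy run the short position loses money because the price rose after negative news, and the fee turns a flat two-day result into one that only just breaks even, which is why the zero-fee assumption matters when reading the backtest.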

In [8]:
def display_backtest(time_till_close_position, sentiment_cutoff):
    """Plot the end-of-day cumulative return for the chosen delay and cutoff."""
    strat_results = run_strat(sentiment_cutoff, time_till_close_position,
                              txn_fee=0.0, output_daily_agg=True)
    dates = strat_results.datetime.dt.date
    fig, ax = plt.subplots()
    # ax.plot accepts date values directly; plot_date is deprecated in recent Matplotlib
    ax.plot(dates.values, strat_results.cum_return.values, '-b')
    ax.set_ylabel("Cumulative Return")
    ax.set_xlim(dates.min(), dates.max())

    fig.autofmt_xdate()
    plt.show()

In [9]:
def interactive_backtest():
    widgets.interact(display_backtest,time_till_close_position=(5,60,5),sentiment_cutoff=(0,1.00,0.05))

In [10]:
interactive_backtest()


Another use of news in trading is monitoring portfolio holdings to mitigate risk. Identifying a possible drop in a stock's price, or observing that the market is reacting to a particular news release, can be a useful component of risk management.

### Case Study: Apple cuts iPhone X production due to weak demand

The Nikkei reported on Monday, January 29th that Apple would cut its production target for the iPhone X from 40 million to 20 million units. Apple's stock reacted poorly: already on a downtrend following earnings reports, it fell further in the wake of the news. In this case we can see that Apple's stock price is correlated with the sentiment of news articles related to the iPhone.



<img src="../imgs/apple_price.png">

<img src="../imgs/apple_sentiment.png">
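
One simple way to quantify a relationship like the one in the charts above is the correlation between daily average sentiment and daily returns. The sketch below uses invented numbers, not the Apple data behind the figures, and the column names `mean_sentiment` and `daily_return` are assumptions chosen for illustration.

```python
import pandas as pd

# Invented daily values for illustration only
daily = pd.DataFrame({
    "mean_sentiment": [0.30, 0.10, -0.20, -0.60, -0.40, 0.05],
    "daily_return":   [0.004, 0.001, -0.006, -0.021, -0.009, 0.002],
})

# Pearson correlation between same-day sentiment and return
same_day_corr = daily["mean_sentiment"].corr(daily["daily_return"])

# Correlation between today's sentiment and the next day's return,
# a rough check of whether sentiment leads price
next_day_corr = daily["mean_sentiment"].corr(daily["daily_return"].shift(-1))

print(round(same_day_corr, 2), round(next_day_corr, 2))
```

A high same-day correlation only shows that the two series move together; the shifted version is the more relevant check when sentiment is meant to be used as a leading risk signal.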

### Case Study: Aimia Inc Receives Notice of Contract Non-renewal from Air Canada

Aimia Inc is a data-driven marketing and loyalty analytics company. On May 11th, 2017 the company announced that Air Canada, its largest client, had given notice of non-renewal. The market responded with a sharp drop in price. The relative volume of articles on Aimia skyrocketed in the few days leading up to the announcement. A drastic change in article volume can therefore be a signal to redirect attention to a particular company.

<img src="../imgs/aimia_price.png">

<img src="../imgs/aimia_vol.png">
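
A volume signal of this kind can be sketched by comparing each day's article count with a trailing average. The counts, the 5-day window and the 3x threshold below are all assumptions for illustration; they are not the Aimia figures shown above.

```python
import pandas as pd

# Invented daily article counts for one company (illustration only)
counts = pd.Series([4, 5, 3, 6, 4, 5, 4, 18, 25, 30])

window = 5         # trailing window length in days (assumed)
spike_ratio = 3.0  # flag days with at least 3x the trailing average (assumed)

# Mean of the previous `window` days, excluding the current day
baseline = counts.shift(1).rolling(window).mean()
relative_volume = counts / baseline

spikes = relative_volume[relative_volume >= spike_ratio]
print(spikes.index.tolist())  # [7, 8]
```

Note that the last day is not flagged even though its count is the highest: the earlier spike days have already raised the baseline, so a sustained surge is only flagged at its onset.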

### References

- GloVe: https://nlp.stanford.edu/projects/glove/
- fastText: https://fasttext.cc/
- News articles per day: https://www.slideshare.net/chartbeat/mockup-infographicv4-27900399
- News data source: https://github.com/philipperemy/financial-news-dataset
- Word embeddings: https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/
- Natural Language Processing: https://en.wikipedia.org/wiki/Natural-language_processing
- Sentiment Analysis: https://en.wikipedia.org/wiki/Sentiment_analysis
