# Natural Language Processing for Signal Generation on News Data


### Goals of the use case

1. Build deep neural networks to process and interpret news data.
2. Understand the building blocks of an NLP system.
3. Backtest the models and apply them to news data for signal generation.


### Background/Concepts

**What is NLP**

Natural Language Processing (NLP) is a field of artificial intelligence that models the interaction between human (natural) language and computers. The tasks that NLP models address can be split into several categories:
1. Syntax: Parsing, part-of-speech tagging, morphological segmentation, stemming...
2. Semantics: Machine translation, sentiment analysis, natural language understanding...
3. Discourse: Automatic summarization (TL;DR)...
4. Speech: Speech recognition, text-to-speech...

In the early days of NLP (pre-1980s), most of these tasks were accomplished with hand-crafted rules. With the introduction of machine learning and the steady increase in computational power, more NLP models came to be built by statistical learning on natural language corpora.

Fast forward to today: deep learning models achieve state-of-the-art performance in a variety of vision and NLP tasks. Larger datasets, advances in GPU technology, increased research into deep learning architectures and applications, and the vast number of problems a modern company deals with all mean that today's data scientist must be capable of deploying deep learning solutions.

**How Does a Computer Understand Text**

Machine learning and deep learning algorithms cannot take characters or strings directly as input. Therefore, we first have to represent the features of the text (meaning, syntactic structure, ...) as numerical data structures before we apply any ML algorithm.
<img src="../imgs/understand_text.png">
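
As a minimal sketch of this step, the snippet below maps each word of a headline to an integer index and then to a one-hot vector. The whitespace tokenizer, the toy headlines, and the helper names (`tokenize`, `build_vocab`, `encode`, `one_hot`) are illustrative assumptions, not part of this use case's pipeline.

```python
def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def build_vocab(sentences):
    """Assign each distinct token an integer id; id 0 is reserved for unknown words."""
    vocab = {"<unk>": 0}
    for sentence in sentences:
        for token in tokenize(sentence):
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab


def encode(text, vocab):
    """Replace each token with its integer id, falling back to the <unk> id."""
    return [vocab.get(token, vocab["<unk>"]) for token in tokenize(text)]


def one_hot(index, size):
    """Return a vector of length `size` with a single 1 at position `index`."""
    return [1 if i == index else 0 for i in range(size)]


# Toy headlines (hypothetical) used only to build a small vocabulary.
headlines = ["Shares rally after strong earnings", "Shares fall after weak earnings"]
vocab = build_vocab(headlines)

# "guidance" never appeared in the toy corpus, so it maps to the <unk> id.
ids = encode("Shares rally after weak guidance", vocab)
vectors = [one_hot(i, len(vocab)) for i in ids]
```

One-hot vectors grow with the vocabulary and treat every pair of words as equally unrelated, which is the limitation that word embeddings address.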

**Word Embeddings**

A word embedding is a mathematical mapping from a very high-dimensional space, in which each word occupies its own dimension, to a lower-dimensional, continuous vector space. Typically a large corpus of text is used to train these embeddings.
The embedded word vectors capture the meaning of words: each dimension of a vector captures part of the meaning of the corresponding word. This gives the resulting vectors several neat properties:
* Nearest neighbors - the cosine similarity between two word vectors can be an effective measure of the linguistic or semantic similarity of the corresponding words.
* Linear substructures - cosine similarity reduces the relationship between two words to a single number, whereas the difference between two word vectors retains a great deal more information. GloVe tries to capture the relationships between words, and these can be showcased through the vector differences.
<img src="../imgs/Word-Vectors.png">
There are various methods for generating word embeddings, including neural networks, probabilistic models, and dimensionality reduction.
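
The two properties listed above can be sketched with a handful of hand-picked vectors. The 3-dimensional values below are hypothetical and chosen so the example works out cleanly; real embeddings are learned from a corpus and typically have tens to hundreds of dimensions. The helper names (`cosine`, `difference`, `analogy`) are likewise illustrative.

```python
import math

# Hypothetical toy vectors; not taken from any trained embedding.
vectors = {
    "king":  [0.8, 0.9, 0.1],
    "queen": [0.8, 0.1, 0.9],
    "man":   [0.2, 0.9, 0.1],
    "woman": [0.2, 0.1, 0.9],
    "apple": [0.1, 0.5, 0.5],
}


def cosine(u, v):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def difference(a, b):
    """Element-wise difference vector(a) - vector(b)."""
    return [x - y for x, y in zip(vectors[a], vectors[b])]


def analogy(a, b, c):
    """Return the word closest to vector(b) - vector(a) + vector(c), excluding the inputs."""
    target = [vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(target, vectors[w]))


# Nearest neighbors: a single similarity score per pair of words.
similarity = cosine(vectors["king"], vectors["man"])

# Linear substructure: the same offset separates both pairs in this toy set.
royal_offset_male = difference("king", "man")
royal_offset_female = difference("queen", "woman")

# "man is to king as woman is to ?"
answer = analogy("man", "king", "woman")
```

In this toy set the two offsets coincide by construction; in trained embeddings such as GloVe they are only approximately parallel.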





**Sentiment Analysis for News**

Following the release of a news story with a strong impact on an industry or company, intraday market prices tend to react accordingly. For the average person with an investment account, the news is a substantial signal in the decision to buy or sell a certain stock. However, trading systematically on news signals is not feasible for a human to do manually:

- More than 92,000 news articles are released per day.
- An average human reads at a speed of 200-250 words per minute.
- Reading at this rate for 8 hours continuously, a human may process up to 40-50 articles per day. This calculation disregards the time it takes to find the articles and analyze them.
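
The arithmetic behind these figures can be sketched as follows. The average article length is not stated in the text; the value of 2,400 words is an assumption chosen here because it reproduces the 40-50 articles quoted above.

```python
MINUTES_PER_DAY = 8 * 60        # 8 hours of continuous reading
ARTICLES_PUBLISHED = 92_000     # lower bound on daily volume quoted above
WORDS_PER_ARTICLE = 2_400       # assumption: average length is not given in the text


def articles_read(words_per_minute):
    """Whole articles a reader finishes in one 8-hour day."""
    return (words_per_minute * MINUTES_PER_DAY) // WORDS_PER_ARTICLE


low, high = articles_read(200), articles_read(250)
coverage = high / ARTICLES_PUBLISHED   # best-case share of the daily volume

print(f"{low}-{high} articles per day, at most {coverage:.3%} of what is published")
```

Under these assumptions the script prints `40-50 articles per day, at most 0.054% of what is published`, so even a full-time reader covers only a tiny fraction of the daily flow.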

In finance, the efficiency and speed with which you process information can be vital for making well-informed decisions. We can leverage deep learning to train models that provide sentiment scores for headlines, articles, tweets, and posts. These sentiment scores can produce valuable signals to support buy/sell/hold decisions as well as valuation models.

**Multi-channel LSTM network for Sentiment Classification**

A multi-channel network simply means that we use more than one type of embedding, which gives the network access to more features from separate word embeddings. The idea is that a single type of embedding may not contain enough information. By adding another embedding, trained either on a different corpus or with a different algorithm entirely, we can obtain stronger, more robust features.
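
A minimal sketch of the input stage only: each token is looked up in two embedding tables and the results are concatenated into one feature vector per token. The tables below are hypothetical toy values standing in for two pretrained sources (for example GloVe and FastText), and combining channels by concatenation is an assumption; the text above does not say how the channels are merged.

```python
# Hypothetical toy embedding tables; real tables hold many more words and dimensions.
CHANNEL_A = {"profits": [0.9, 0.1, 0.3], "surge": [0.7, 0.8, 0.2], "fall": [-0.6, 0.7, 0.1]}
CHANNEL_B = {"profits": [0.2, 0.5], "surge": [0.9, 0.4], "plunge": [-0.8, 0.3]}


def lookup(token, table):
    """Return the token's vector, or a zero vector if the table lacks it."""
    dim = len(next(iter(table.values())))
    return table.get(token, [0.0] * dim)


def multi_channel_features(tokens, channels):
    """Concatenate each token's vectors from every channel into one feature vector."""
    features = []
    for token in tokens:
        combined = []
        for table in channels:
            combined.extend(lookup(token, table))
        features.append(combined)
    return features


# "fall" is missing from CHANNEL_B, so that part of its feature vector is zero-filled.
sequence = multi_channel_features(["profits", "fall"], [CHANNEL_A, CHANNEL_B])
```

Each row of `sequence` would be one time step fed to the LSTM. An alternative design runs a separate LSTM over each channel and merges their outputs.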


### The Complete Model
<img src="../imgs/model_overview.png">


### Toolkit Packages

### GloVe

https://nlp.stanford.edu/projects/glove/

GloVe (Global Vectors for Word Representation) is an unsupervised learning algorithm. It trains on the word-to-word co-occurrence statistics of a corpus and attempts to learn word vectors such that their dot product equals the logarithm of the words' probability of co-occurrence.
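
The sketch below illustrates the two ingredients described above: counting co-occurrences in a context window, and the weighted squared error between a dot product (plus bias terms) and the log count. It is a simplified illustration, not the reference implementation: it uses plain counts, whereas GloVe down-weights distant pairs, and the toy corpus and function names are assumptions. The defaults `x_max=100` and `alpha=0.75` follow the values reported in the GloVe paper.

```python
import math
from collections import Counter


def cooccurrence_counts(sentences, window=2):
    """Count how often each ordered (word, context) pair appears within `window` tokens."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            start, stop = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(start, stop):
                if j != i:
                    counts[(word, tokens[j])] += 1
    return counts


def weight(x, x_max=100.0, alpha=0.75):
    """Weighting function: damps rare pairs and caps frequent ones at 1."""
    return (x / x_max) ** alpha if x < x_max else 1.0


def pair_loss(w, w_ctx, b, b_ctx, x):
    """Weighted squared error between the model score and the log co-occurrence count."""
    score = sum(a * c for a, c in zip(w, w_ctx)) + b + b_ctx
    return weight(x) * (score - math.log(x)) ** 2


# Toy corpus (hypothetical) of two headlines.
corpus = ["stocks rise on earnings", "stocks fall on earnings"]
counts = cooccurrence_counts(corpus, window=2)
```

Training would sum `pair_loss` over all pairs with a non-zero count and adjust the vectors and biases by gradient descent to minimize it.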

### FastText
FastText is a library for text classification and representation learning developed by Facebook AI Research. Its focus is on speed and scalability while maintaining performance comparable to other methods. FastText provides two methods for computing word representations from a corpus, Skipgram and Continuous Bag of Words (CBOW). Both define a supervised prediction task whose labels come from the corpus itself; learning this task well yields useful word vectors.

### Skipgram
The Skipgram model uses a given word to predict the word(s) surrounding it. Skipgram thus learns the likelihood of a word being present based on the occurrence of the word(s) that appear near it in the corpus. You can think of the task as predicting the context given a word.
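
To make the prediction task concrete, the sketch below generates the (input word, word to predict) pairs that a Skipgram model would train on. The function name, window size, and example sentence are illustrative assumptions; this shows only how training pairs are formed, not FastText's actual training code.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs: the center word is the input and each
    neighbour within `window` positions is a word to predict."""
    pairs = []
    for i, center in enumerate(tokens):
        for j in range(max(0, i - window), min(len(tokens), i + window + 1)):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


tokens = "central bank raises interest rates".split()
pairs = skipgram_pairs(tokens, window=1)
# With window=1, "bank" produces two pairs: ("bank", "central") and ("bank", "raises").
```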

### Continuous Bag of Words (CBOW)
The CBOW method is the inverse: it takes a bag of words surrounding the target word and attempts to predict that target word. You can think of this as predicting the word given the context.
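
A minimal sketch of how CBOW training examples can be formed: each example pairs the surrounding words (the input) with the word in the middle (the prediction target). The function name, window size, and example sentence are illustrative assumptions rather than FastText internals.

```python
def cbow_examples(tokens, window=2):
    """Return (context_words, target) examples: the surrounding words are the
    input and the word they surround is the prediction target."""
    examples = []
    for i, target in enumerate(tokens):
        context = tokens[max(0, i - window):i] + tokens[i + 1:i + window + 1]
        if context:
            examples.append((context, target))
    return examples


tokens = "central bank raises interest rates".split()
examples = cbow_examples(tokens, window=1)
# With window=1, the target "raises" is predicted from ["bank", "interest"].
```

The context is treated as an unordered bag; a typical model combines the context vectors (for example by averaging) before making the prediction.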

### References

- GloVe: https://nlp.stanford.edu/projects/glove/
- FastText: https://fasttext.cc/
- News articles per day: https://www.slideshare.net/chartbeat/mockup-infographicv4-27900399
- News data source: https://github.com/philipperemy/financial-news-dataset
- Word embeddings: https://www.analyticsvidhya.com/blog/2017/06/word-embeddings-count-word2veec/
- Natural Language Processing: https://en.wikipedia.org/wiki/Natural-language_processing
- Sentiment Analysis: https://en.wikipedia.org/wiki/Sentiment_analysis
