NLP Systems & Future Directions
===

---
By the end of this class you should be able to:
---

- Describe the workflow for Data Science products
- Be inspired to check out tensors and deep learning

---
Data Science Workflow
---

![](images/crispdm_process_diagram.png)
Cross-Industry Standard Process for Data Mining (CRISP-DM) 

---
My Data Science Workflow
---

1. Ask
2. Acquire
3. Process
4. Model
5. Deliver

[Specific example here](http://www.slideshare.net/foco24/a-data-science-workflow-nonprofit-edition)

People, especially data scientists, over-focus on the modeling step. For example, we have two (or more) classes on modeling but no dedicated class for processing or delivering.

You will have to repeat each step many times, so it makes sense to build a pipeline that automates each step.
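
A minimal sketch of such a pipeline, assuming each step is a plain function whose output feeds the next one. The four function names, the toy documents, and the word-count "model" are all made up for illustration (the Ask step stays with the humans):

```python
from collections import Counter

def acquire():
    """Acquire: return raw documents (hard-coded here; normally a file or an API)."""
    return ["The cat sat.", "The cat ran!", "The dog sat on the cat."]

def process(docs):
    """Process: lowercase, strip punctuation, and split each document into tokens."""
    token_lists = []
    for doc in docs:
        letters = "".join(ch for ch in doc.lower() if ch.isalpha() or ch.isspace())
        token_lists.append(letters.split())
    return token_lists

def model(token_lists):
    """Model: the simplest thing that could possibly work, a word-frequency table."""
    return Counter(tok for toks in token_lists for tok in toks)

def deliver(counts, top_n=2):
    """Deliver: format the most common words for a human reader."""
    return [f"{word}: {n}" for word, n in counts.most_common(top_n)]

def run_pipeline(steps):
    """Run the steps in order, feeding each step's output into the next."""
    result = steps[0]()
    for step in steps[1:]:
        result = step(result)
    return result

report = run_pipeline([acquire, process, model, deliver])
print(report)  # ['the: 4', 'cat: 3']
```

Because every step is a function, rerunning the whole workflow after a change to any one step is a single call.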

---
Data Science Project Management 101
---

<img src="http://img.pandawhale.com/KBP1hI-software-engineering-tree-swin-pDYw.jpeg" style="width: 400px;"/>

> Do the simplest thing that could possibly work

[Core tenet of Agile](http://c2.com/cgi-bin/wiki?DoTheSimplestThingThatCouldPossiblyWork)

---
> Satisfy the customer through early and continuous delivery of valuable software

[Delivering Data Science](https://github.com/ianozsvald/data_science_delivered)

---

<img src="http://ichef.bbci.co.uk/news/660/media/images/62558000/jpg/_62558092_15.thecountand8-richardtermine.jpg" style="width: 400px;"/>

> Data science is mostly counting and logistic regression

---
Check for understanding
---

<details><summary>
Why is Data Science mostly counting and logistic regression?
</summary>
If you don't have the counts correct, you can't do anything else. When you have counts (especially lots of counts), you can fit models.
<br>
Logistic regression is just a linear model for a binary outcome. It is simple to calculate and interpret.
</details>
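
To make "counting and logistic regression" concrete, here is a rough sketch in plain Python. The messages and labels are invented, the single feature (how often "free" appears) is an arbitrary choice, and the model is fit with basic gradient descent rather than a library solver:

```python
import math
from collections import Counter

# Toy labeled messages (made up for illustration): 1 = spam, 0 = not spam.
messages = [
    ("free free free prize", 1),
    ("free prize now", 1),
    ("meeting at noon", 0),
    ("lunch at noon free", 0),
    ("free free cash", 1),
    ("see you at lunch", 0),
    ("free parking free coffee", 0),
]

def count_feature(text, word="free"):
    """Counting: how many times does `word` appear in the text?"""
    return Counter(text.split())[word]

def sigmoid(z):
    """Squash any real number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, epochs=2000):
    """Fit P(y=1 | x) = sigmoid(w * x + b) by batch gradient descent on log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / n
        b -= lr * sum(errors) / n
    return w, b

xs = [count_feature(text) for text, _ in messages]
ys = [label for _, label in messages]
w, b = fit_logistic(xs, ys)

print(f"w = {w:.2f}, b = {b:.2f}")
print(f"P(spam | 'free' x 3) = {sigmoid(w * 3 + b):.2f}")
```

The interpretation is the selling point: a positive `w` means each extra "free" pushes the message toward spam, and `b` sets the baseline for a message with no "free" at all.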

----
Visualizations: The alpha and omega of data
----

![](images/pie.jpg)

Visualizations are a way to understand raw data and modeling results, and to communicate with people.

[A collection of text visualizations](http://textvis.lnu.se/)

---
NL is a great UI
---

![](http://static.boredpanda.com/blog/wp-content/uploads/2015/11/poor-design-decisions-481__605.jpg)
Design is important

We use NL every day to communicate with each other. In the future, we'll use it to communicate more and more with computers. It is artificial that we have to type and use screens. UI will evolve to become transparent. NLP will be a force multiplier for that transition.

As an intermediate step, expect chat interfaces. It is silly that there are so many apps. I predict that in the future there will be meta-apps: AI chatbots that give you information and do things for you.

__Reasons to be bullish on chat as UI__:

- Chat is a common interface for many APIs / IoT (the Internet of Things)
- Every user interaction is also a survey
- A chat UI is testable, while the surface area of a visual interface is almost untestable
- The UI is a log file
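
The "testable" and "log file" points can be sketched in a few lines. This is a toy, rule-based bot; `make_bot`, its canned replies, and the log format are hypothetical, not any real chatbot framework:

```python
import json

def make_bot():
    """Return a (reply, log) pair; `log` records every exchange as a dict."""
    log = []

    def reply(message):
        text = message.strip().lower()
        if "weather" in text:
            answer = "I can't check the weather yet."
        elif text in ("hi", "hello"):
            answer = "Hello! How can I help?"
        else:
            answer = "Sorry, I did not understand that."
        # The conversation itself is the log: store the exact input and output.
        log.append({"user": message, "bot": answer})
        return answer

    return reply, log

reply, log = make_bot()

# Testing the UI is just comparing strings; no screenshots or click paths needed.
assert reply("hello") == "Hello! How can I help?"
assert reply("What's the weather?") == "I can't check the weather yet."

print(json.dumps(log, indent=2))
```

Every message that lands in the "did not understand" branch is also a free survey response: it tells you what users wanted that the bot could not do.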

[Chat as UI 1](http://www.wired.com/2015/06/future-ui-design-old-school-text-messages/) 
[Chat as UI 2](https://medium.com/@acroll/on-chat-as-interface-92a68d2bf854#.i9rhhfql3)

---
NLP algorithms: Application beyond words
----

NLP has borrowed from other disciplines (Math, Statistics, and Machine Learning). Now it is starting to give back. 

For example, word2vec has been applied to clinical diagnosis. A researcher could mine clinical diagnosis notes to find the co-occurrence of diseases and symptoms. There is too much medical research for any doctor to keep up with it all.

Casey Stella treated discrete clinical events (e.g., diagnoses and drugs prescribed) in a medical dataset as non-textual "words". He found that heart disease or hardening of the arteries is associated with "Personal history of peptic ulcer disease", partially because smokers have a higher-than-average incidence of both peptic ulcers and atherosclerosis.
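
To get a feel for the "events as words" idea, the sketch below treats each patient record as a sentence and counts which events show up together. Note the swap: this is plain co-occurrence counting, not word2vec, and the patient records and event names are invented for illustration:

```python
from collections import Counter
from itertools import combinations

# Hypothetical patient records: each "sentence" is one patient's list of events.
patients = [
    ["smoker", "peptic_ulcer", "atherosclerosis"],
    ["smoker", "peptic_ulcer", "heart_disease"],
    ["smoker", "atherosclerosis", "statin"],
    ["peptic_ulcer", "antacid"],
    ["heart_disease", "statin", "atherosclerosis"],
]

def cooccurrence(records):
    """Count how often each unordered pair of events appears in the same record."""
    pairs = Counter()
    for events in records:
        for a, b in combinations(sorted(set(events)), 2):
            pairs[(a, b)] += 1
    return pairs

def neighbors(pairs, event, top_n=3):
    """Return the events that most often co-occur with `event`."""
    scores = Counter()
    for (a, b), n in pairs.items():
        if a == event:
            scores[b] += n
        elif b == event:
            scores[a] += n
    return scores.most_common(top_n)

pairs = cooccurrence(patients)
print(neighbors(pairs, "peptic_ulcer", top_n=1))  # [('smoker', 2)]
```

A word2vec-style model starts from the same kind of "who appears near whom" signal, but compresses it into dense vectors so that events with similar contexts end up close together.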

In special topics, we are going to talk about how probabilistic graphical models (PGMs) can be used to understand disease.

[Using Natural Language Processing on Non-Textual Data with MLLib](https://www.youtube.com/watch?v=rWphXAdcoe0)   
[Slides](http://www.slideshare.net/gpano/natural-language-processing-on-nontextual-data)   
[Code](https://github.com/cestella/presentations/blob/master/NLP_on_non_textual_data/src/main/ipython/clinical2vec.ipynb)

---
What the hell is a tensor?
---

A tensor is a multidimensional array.

![](images/tensor_intro.png)

![](images/tensor_dim.png)
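
In code, "multidimensional array" just means you need more than two indices to reach a number. A small sketch using nested Python lists (the `shape` helper is written here for illustration and assumes regular, non-empty lists; in practice you would likely use a NumPy array and read its `ndim` and `shape`):

```python
def shape(array):
    """Return the shape of a regular nested list, e.g. [[1, 2], [3, 4]] -> (2, 2)."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0]
    return tuple(dims)

scalar = 5                                                  # order 0: no index needed
vector = [1, 2, 3]                                          # order 1: one index
matrix = [[1, 2, 3], [4, 5, 6]]                             # order 2: (row, column)
tensor3 = [[[0] * 4 for _ in range(3)] for _ in range(2)]   # order 3: three indices

for name, value in [("scalar", scalar), ("vector", vector),
                    ("matrix", matrix), ("tensor3", tensor3)]:
    print(name, "order:", len(shape(value)), "shape:", shape(value))
```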

---
Why tensors?
---

![](images/higher_order.png)

Tensors can model higher order relationships. 

![](images/relationships.png)

If you only want to represent pairwise relationships, say the co-occurrence of every pair of words in a set of documents, then use a matrix. On the other hand, if you want to learn the probability of a range of word triplets, then you need a tensor to record that relationship.

Tensors can extend beyond a document-term matrix to a document-term-year-author tensor.
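
A rough sketch of the pair-versus-triplet point. It assumes "co-occurrence" means consecutive words and stores only the non-zero entries, keyed by their indices; the three documents are made up:

```python
from collections import Counter

docs = [
    "new york city is big",
    "new york times is a paper",
    "i love new york city",
]

def ngram_counts(documents, n):
    """Count every run of n consecutive words across all documents."""
    counts = Counter()
    for doc in documents:
        words = doc.split()
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

pair_counts = ngram_counts(docs, 2)     # two indices: entries of a word-by-word matrix
triple_counts = ngram_counts(docs, 3)   # three indices: entries of an order-3 tensor

print(pair_counts[("new", "york")])               # 3
print(triple_counts[("new", "york", "city")])     # 2
print(triple_counts[("new", "york", "times")])    # 1
```

The matrix can only say that "new" and "york" go together. The order-3 tensor can also say what tends to come next, which is the higher-order relationship a matrix cannot hold.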

---
Tensors: The way of the future
----

The Data Science community is just starting to develop the widespread knowledge base, language, and tools to handle tensors.

---
"Tensor Methods - A New Paradigm for Training Probabilistic Models, Neural Networks and Reinforcement Learning"

https://www.youtube.com/watch?v=YpnlAQTY1Mc

http://www.slideshare.net/SessionsEvents/animashree-anandkumar-electrical-engineering-and-cs-dept-uc-irvine-at-mlconf-sf-111315

---
Deep Learning is eating machine learning
----

![](images/current_ml.png)

![](images/future_ml.png)

We spent much of the course learning how to do feature engineering. Deep learning automates much of that process.

---
Types of neural networks
---

| Abbreviation|  Name | Description |  
|:-------:|:------ | :------| 
| [NN](https://en.wikipedia.org/wiki/Artificial_neural_network) | Neural Network | Layers of connected units with at least 1 hidden layer |
| [CNN](https://en.wikipedia.org/wiki/Convolutional_neural_network) | Convolutional Neural Network | Like vision, learns recurring features in a visual field |
| [RNN](https://en.wikipedia.org/wiki/Recurrent_neural_network)| Recurrent Neural Network| Learns sequences through a type of short-term memory |
| [LSTM](https://en.wikipedia.org/wiki/Long_short-term_memory) | Long short-term memory | Like an RNN, but can learn dependencies across long time lags |

---
NLP is just Time Series in disguise
---

Recurrent neural networks (RNN) are awesome!

<img src="images/charseq.jpeg" style="width: 400px;"/>

<img src="images/rnn_unit.png" style="width: 400px;"/>

An RNN learns sequences, in particular sequence-to-sequence mappings.

Many challenges can be described as sequence-to-sequence.

Examples:

- Translation
- Image captioning
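
The "short-term memory" of an RNN is just a hidden state that gets updated at every step. Below is a minimal sketch of the forward pass only, with no training. The weights are hand-picked, the three-letter vocabulary is an assumption, and `rnn_step` and `run_rnn` are illustrative names:

```python
import math

def rnn_step(x, h, w_xh, w_hh, b):
    """One recurrent step: new state = tanh(W_xh @ x + W_hh @ h + b)."""
    size = len(h)
    return [
        math.tanh(
            sum(w_xh[i][j] * x[j] for j in range(len(x)))
            + sum(w_hh[i][k] * h[k] for k in range(size))
            + b[i]
        )
        for i in range(size)
    ]

def run_rnn(inputs, w_xh, w_hh, b):
    """Feed a sequence through the RNN and return the hidden state at every step."""
    h = [0.0] * len(b)
    states = []
    for x in inputs:
        h = rnn_step(x, h, w_xh, w_hh, b)
        states.append(h)
    return states

# One-hot encoding of the characters in "hell" over the vocabulary h, e, l.
vocab = {"h": [1, 0, 0], "e": [0, 1, 0], "l": [0, 0, 1]}
inputs = [vocab[ch] for ch in "hell"]

# Hand-picked (untrained) weights for a hidden state of size 2.
w_xh = [[0.5, -0.3, 0.8], [0.1, 0.9, -0.4]]
w_hh = [[0.2, -0.1], [0.4, 0.3]]
b = [0.0, 0.1]

states = run_rnn(inputs, w_xh, w_hh, b)
for ch, h in zip("hell", states):
    print(ch, [round(v, 3) for v in h])
```

Notice that the two "l" inputs are identical but produce different hidden states, because the state also carries what came before. That is what lets the network treat text as a time series.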


---
RNNs for NLP
---

![](http://simaaron.github.io/images/RNN_arc_2.png)

Two RNNs running in opposite directions

The forward RNN reads the input sequence from start to end, while the backward RNN reads it from end to start. 

The two RNNs are stacked on top of each other, and their states are typically combined by concatenating the two vectors.

![](https://devblogs.nvidia.com/wp-content/uploads/2015/07/Figure2_biRNN.png)

Bidirectional RNNs are often used in Natural Language problems, where we want to take the context from both before and after a word into account before making a prediction.
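
Here is a rough sketch of that idea with the smallest possible RNN: a scalar hidden state and hand-picked, untrained weights. The function names and the toy input are assumptions for illustration; a real model would use vector states and learned weights:

```python
import math

def run_rnn(inputs, w_x, w_h):
    """Scalar RNN: h_t = tanh(w_x * x_t + w_h * h_(t-1)); returns every h_t."""
    h, states = 0.0, []
    for x in inputs:
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def bidirectional(inputs, fwd_weights, bwd_weights):
    """Pair each position's forward state with its backward state."""
    forward = run_rnn(inputs, *fwd_weights)
    # Run on the reversed sequence, then flip back so index t lines up with input t.
    backward = run_rnn(inputs[::-1], *bwd_weights)[::-1]
    # "Appending the two vectors": here each state is one number, so we get pairs.
    return [[f, b] for f, b in zip(forward, backward)]

inputs = [1.0, 0.0, -1.0, 0.5]   # a toy sequence of scalar "word" features
states = bidirectional(inputs, fwd_weights=(0.8, 0.5), bwd_weights=(0.6, -0.4))
for t, (f, b) in enumerate(states):
    print(t, round(f, 3), round(b, 3))
```

At position `t`, the forward half summarizes everything up to and including `t`, and the backward half summarizes everything from `t` to the end, so a prediction at `t` can use context from both sides.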


---
Sources
----
- [Bidirectional Recurrent Neural Networks as Generative Models](https://papers.nips.cc/paper/5651-bidirectional-recurrent-neural-networks-as-generative-models.pdf)
- [Implementation in TensorFlow](https://github.com/aymericdamien/TensorFlow-Examples/blob/master/examples/3%20-%20Neural%20Networks/bidirectional_rnn.py)

---
Rap Battle with RNNs
---

<img src="http://orig03.deviantart.net/2ae0/f/2013/326/1/c/unit_mmlp2__the_rap_bot_by_frederickwalter-d6v5g7c.jpg" style="width: 400px;"/>

> We describe an unconventional line of attack in our quest to teach machines how to rap battle by improvising hip hop lyrics on the fly, in which a novel recursive bilingual neural network, TRAAM, implicitly learns soft, context-dependent generalizations over the structural relationships between associated parts of challenge and response raps, while avoiding the exponential complexity costs that symbolic models would require.

> TRAAM learns feature vectors simultaneously using context from both the challenge and the response, such that challenge-response association patterns with similar structure tend to have similar vectors. Improvisation is modeled as a quasi-translation learning problem, where TRAAM is trained to improvise fluent and rhyming responses to challenge lyrics.

| Challenge | Model Response  |  
|:-------:|:------:|
| picture on the quota its time to roll | ya for a quarter i mind and soul |
| thug deep in my soul that got me bugged | love you on the control the drugs |  
| nights of 51 jipped be light on this cash | in the concrete mics you right in the ass |
| what would i do | just me and you |
| we get rid of the child | and the number and a wild |

[Source](http://ijcai.org/Proceedings/15/Papers/358.pdf)

---
Additional Reading on Neural Networks
---
- [A Primer on Neural Network Models for Natural Language Processing](http://u.cs.biu.ac.il/~yogo/nnlp.pdf)
- [Compositionality with Deep Neural Networks](https://staff.fnwi.uva.nl/e.kanoulas/wp-content/uploads/Lecture-6-2-Compositinality-with-Deep-Neural-Networks.pdf) 
- [Deep learning for NLP by Stanford](http://cs224d.stanford.edu/)
- [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

---
NLP skills are in high demand
----

![](images/desperate_recruiter.png)

---
Summary
---

- Always be adding value. Think about your work.
- Keep one eye on the future: chat, visualization, tensors, and deep learning.

<br>
<br> 
<br>

----