# RNN for text and sequences

### Reference
* Deep Learning with Python, F. Chollet
* All figures are borrowed from the book above

### Preprocessing
* Preprocessing text data into useful representations
* Working with RNN

### Applications of RNN

* Document classification and timeseries classification, such as identifying the topic of an article or the author of a book
* Timeseries comparisons, such as estimating how closely related two documents or two stock tickers are
* Sequence-to-sequence learning, such as decoding an English sentence into French
* Sentiment analysis, such as classifying the sentiment of tweets or movie reviews as positive or negative
* Timeseries forecasting, such as predicting the future weather at a certain location, given recent weather data

### *Vectorizing* text

* Examples
  * Segment text into words, and transform each word into a vector.
  * Segment text into characters, and transform each character into a vector.
* **Tokenization**: segment text into tokens (words or characters)
* **Embedding**: transform each token into a vector
  * *one-hot* encoding (sparse vector)
  * **embedding** (dense vector)
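
The two steps above can be sketched in plain Python. This is a minimal illustration, not the book's code: the sample sentences, the `tokenize` and `one_hot` helpers, and the choice to reserve index 0 for padding are all assumptions made for this example.

```python
# Toy corpus (hypothetical sample sentences, used only for illustration)
samples = ["The cat sat on the mat.", "The dog ate my homework."]

def tokenize(text):
    """Word-level tokenization: lowercase, drop punctuation, split on whitespace."""
    cleaned = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return cleaned.split()

# Build the vocabulary: each distinct word gets an integer index.
# Index 0 is left unused here (an assumption of this sketch, e.g. for padding).
token_index = {}
for sample in samples:
    for word in tokenize(sample):
        if word not in token_index:
            token_index[word] = len(token_index) + 1

def one_hot(word, vocab):
    """Return a sparse vector with a single 1 at the word's index."""
    vector = [0] * (len(vocab) + 1)
    vector[vocab[word]] = 1
    return vector

tokens = tokenize(samples[0])
indices = [token_index[w] for w in tokens]
vectors = [one_hot(w, token_index) for w in tokens]

print(tokens)
print(indices)
```

Character-level tokenization follows the same pattern with `list(text)` in place of `tokenize(text)`. Note that each one-hot vector is as long as the vocabulary, which is why dense embeddings are usually preferred for large vocabularies.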

Tokenization
<img width="30%" alt="tokenization" src="https://user-images.githubusercontent.com/11681225/46912311-f9f97580-cfac-11e8-9ce1-7aedc99f0982.png">

Word embedding
<img width="30%" alt="word_embedding" src="https://user-images.githubusercontent.com/11681225/46912315-0d0c4580-cfad-11e8-8cb4-cc10a6a1860c.png">


* Two ways to obtain word embeddings
  * Learn them from data, jointly with the main task
  * Import pre-trained word vectors (such as word2vec)
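
As a rough sketch of the second option, the snippet below parses vectors stored in the word2vec text format (a header line with vocabulary size and dimension, then one word per line followed by its values) and copies them into an embedding matrix. The vector values, the task vocabulary `token_index`, and the helper `parse_vectors` are hypothetical; words missing from the pre-trained set fall back to a small random initialization, which is one common choice rather than the only one.

```python
import random

# Hypothetical pre-trained vectors in the word2vec text format:
# header "<vocab_size> <dim>", then "<word> <v1> ... <vdim>" per line.
pretrained_text = """3 4
cat 0.1 0.2 0.3 0.4
dog 0.5 0.6 0.7 0.8
mat 0.9 1.0 1.1 1.2
"""

def parse_vectors(text):
    """Parse word2vec-style text into a {word: vector} dict and the dimension."""
    lines = text.strip().splitlines()
    vocab_size, dim = (int(x) for x in lines[0].split())
    vectors = {}
    for line in lines[1:]:
        parts = line.split()
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    assert len(vectors) == vocab_size
    return vectors, dim

pretrained, dim = parse_vectors(pretrained_text)

# Vocabulary of our own task (hypothetical); index 0 is reserved for padding.
token_index = {"the": 1, "cat": 2, "sat": 3, "mat": 4}

random.seed(0)
# Row i of the matrix holds the vector for the token with index i.
embedding_matrix = [[0.0] * dim for _ in range(len(token_index) + 1)]
for word, i in token_index.items():
    if word in pretrained:
        embedding_matrix[i] = pretrained[word]
    else:
        # Word not covered by the pre-trained set: small random values.
        embedding_matrix[i] = [random.uniform(-0.1, 0.1) for _ in range(dim)]
```

A matrix built this way can then serve as the initial value of the embedding variable, either frozen or fine-tuned during training.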

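```python
import tensorflow as tf  # written against the TensorFlow 1.x API (tf.get_variable)

# Assumed to be defined elsewhere:
#   num_words: vocabulary size, embedding_size: dimension of the dense vectors,
#   train_seq: integer tensor of token indices

# One-hot embedding: a fixed identity matrix, so row i is the one-hot vector of token i
one_hot_matrix = tf.get_variable(name='one_hot_embedding',
                                 initializer=tf.eye(num_words, dtype=tf.float32),
                                 trainable=False)
embeddings = tf.nn.embedding_lookup(params=one_hot_matrix, ids=train_seq)

# Dense vector embedding: a trainable matrix learned together with the model
embedding_matrix = tf.get_variable(name='embedding',
                                   shape=[num_words, embedding_size],
                                   initializer=tf.random_uniform_initializer(minval=-0.1,
                                                                             maxval=0.1))
embeddings = tf.nn.embedding_lookup(params=embedding_matrix, ids=train_seq)
```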
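
To make the lookup in the TensorFlow snippet above less opaque, here is a small framework-free sketch showing that selecting row `i` of an embedding matrix gives the same result as multiplying the one-hot vector of token `i` by that matrix. The sizes, matrix values, and helper names (`lookup`, `one_hot`, `vec_times_matrix`) are made up for illustration.

```python
# Hypothetical toy sizes
num_words, embedding_size = 5, 3

# Small dense embedding matrix with made-up values: row i is the vector of token i
embedding_matrix = [[float(i * embedding_size + j) for j in range(embedding_size)]
                    for i in range(num_words)]

def lookup(matrix, ids):
    """Pick one row per token index (what an embedding lookup does)."""
    return [matrix[i] for i in ids]

def one_hot(i, size):
    """One-hot row vector for token index i."""
    return [1.0 if k == i else 0.0 for k in range(size)]

def vec_times_matrix(vector, matrix):
    """Multiply a row vector by a matrix."""
    return [sum(vector[k] * matrix[k][j] for k in range(len(matrix)))
            for j in range(len(matrix[0]))]

train_seq = [3, 0, 3, 1]  # a short sequence of token indices
by_lookup = lookup(embedding_matrix, train_seq)
by_matmul = [vec_times_matrix(one_hot(i, num_words), embedding_matrix)
             for i in train_seq]

# Both routes give the same embedded sequence
assert by_lookup == by_matmul
```

The lookup avoids building the large, mostly-zero one-hot vectors, which is why frameworks implement embeddings as an indexing operation rather than a matrix product.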