# RNN for text and sequences

### Reference
* Deep Learning with Python, F. Chollet


### Preprocessing
* Preprocessing text data into useful representations
* Working with RNNs

### Applications of RNN

* Document classification and timeseries classification, such as identifying the topic of an article or the author of a book
* Timeseries comparisons, such as estimating how closely related two documents or two stock tickers are
* Sequence-to-sequence learning, such as translating an English sentence into French
* Sentiment analysis, such as classifying the sentiment of tweets or movie reviews as positive or negative
* Timeseries forecasting, such as predicting the future weather at a certain location, given recent weather data

### *Vectorizing* text

* Examples
  * Segment text into words, and transform each word into a vector.
  * Segment text into characters, and transform each character into a vector.
* **Tokenization**: segment text into words or characters (tokens)
* **Embedding**: transform each token into a vector
  * *one-hot* encoding (sparse vector)
  * **embedding** (dense vector)

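The tokenization and one-hot steps above can be sketched in plain Python and NumPy. This is a minimal illustration, not the method used in the reference: the sample sentences, the `tokenize_words` helper, and the choice to reserve index 0 for padding are all assumptions made for the example.

```python
import numpy as np

# Toy corpus (assumed for illustration only)
samples = ["The cat sat on the mat.", "The dog ate my homework."]

def tokenize_words(text):
    """Hypothetical helper: lowercase, drop periods, split on whitespace."""
    return text.lower().replace(".", "").split()

# Word-level tokenization: assign each distinct word an integer index.
# Index 0 is reserved for padding, so real words start at 1.
word_index = {}
for sample in samples:
    for word in tokenize_words(sample):
        if word not in word_index:
            word_index[word] = len(word_index) + 1

# Convert each sentence to a fixed-length sequence of word indices.
max_length = 6
sequences = np.zeros((len(samples), max_length), dtype=np.int64)
for i, sample in enumerate(samples):
    ids = [word_index[w] for w in tokenize_words(sample)][:max_length]
    sequences[i, :len(ids)] = ids

# One-hot encoding: each index selects one row of an identity matrix,
# giving a sparse vector with a single 1 per token.
num_words = len(word_index) + 1
one_hot = np.eye(num_words, dtype=np.float32)[sequences]

# Character-level tokenization of the first sample.
char_tokens = list(samples[0].lower())
```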
Tokenization
<img src="../figures/tokenization.png" width="30%">

Word embedding
<img src="../figures/word_embedding.png" width="30%">

* Two ways to obtain word embeddings
  * Learn them from data, jointly with the main task
  * Import pre-trained word vectors (such as word2vec)

```python
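# TensorFlow 1.x graph-mode API. num_words, max_features, embedding_size
# and train_data (integer word indices) are assumed to be defined elsewhere.
import tensorflow as tf

# One-hot embedding: a fixed identity matrix, so looking up index i
# returns the i-th one-hot row (sparse vector).
one_hot = tf.get_variable(name='one_hot_embedding',
                          initializer=tf.eye(num_words, dtype=tf.float32),
                          trainable=False)
inputs = tf.nn.embedding_lookup(params=one_hot, ids=train_data)

# Dense vector embedding (alternative to the above): a trainable matrix
# whose rows are learned together with the rest of the model.
embedding = tf.get_variable(name='embedding',
                            shape=[max_features, embedding_size],
                            initializer=tf.random_uniform_initializer())
inputs = tf.nn.embedding_lookup(params=embedding, ids=train_data)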
```
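The other option listed earlier, importing pre-trained word vectors, usually comes down to filling an embedding matrix row by row from a word-to-vector lookup. The sketch below uses NumPy only. The `pretrained` dictionary is a hypothetical stand-in for vectors parsed from a real file such as word2vec output, and `word_index` and the tiny `embedding_size` are assumed values for illustration.

```python
import numpy as np

# Hypothetical stand-in for vectors loaded from a pre-trained file.
pretrained = {
    "cat": np.array([0.1, 0.2, 0.3], dtype=np.float32),
    "dog": np.array([0.4, 0.5, 0.6], dtype=np.float32),
}

# Assumed vocabulary from tokenization; index 0 is reserved for padding.
word_index = {"cat": 1, "dog": 2, "homework": 3}
embedding_size = 3
max_features = len(word_index) + 1

# Row i holds the vector for the word with index i. Words missing from
# the pre-trained set (and the padding row) stay all zeros.
embedding_matrix = np.zeros((max_features, embedding_size), dtype=np.float32)
for word, idx in word_index.items():
    vector = pretrained.get(word)
    if vector is not None:
        embedding_matrix[idx] = vector

# Lookup by integer indexing, the NumPy analogue of an embedding lookup.
train_data = np.array([[1, 2, 3, 0]])
inputs = embedding_matrix[train_data]  # shape: (batch, sequence, embedding)
```

A matrix built this way could then serve as the initial value of the dense embedding variable, typically with training disabled so the imported vectors are not overwritten. How that is wired up depends on the TensorFlow version in use.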