# Sequences

### Pick **one** of these two topics.

The lecture notebooks are adapted from the [official repository](https://github.com/fchollet/deep-learning-with-python-notebooks) and can be used as starter code.

---

## 1. Time Series with RNNs


1. Download and prepare the timeseries data from listings 10.1 to 10.7.
    ```bash
    # !wget https://s3.amazonaws.com/keras-datasets/jena_climate_2009_2016.csv.zip
    # !unzip jena_climate_2009_2016.csv.zip
    ```
     
2. Build an LSTM network, importing code from the first recurrent baseline section (DLWP, 10.2.5). 

3. Experiment with different hyperparameters to improve on the results of the advanced RNN models in DLWP 10.4.

4. Suggestions for experiments:

    - Adjust the number of units in each recurrent layer;
    - Try different learning rates;
    - Substitute LSTM layers for GRU layers;
    - Try adding a recurrent regularizer;
    - Try bigger densely connected regressors on top of the recurrent base, or even a stack of dense layers;
    - Try stacking more layers, as in listing 10.23, or making the recurrent layers bidirectional (listing 10.24);
    - Try to reproduce the non-recurrent results from DLWP 10.2.2 to 10.2.4 and compare them with your recurrent results:
      - The commonsense baseline (listing 10.9);
      - The feedforward net (listing 10.10);
      - The 1D convnet from DLWP p.291.
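
The knobs listed above (cell type, number of units, learning rate, recurrent regularizer, dense regressor, stacking, bidirectionality) can be gathered into one configuration object, so that each experiment becomes a one-line change. What follows is a minimal sketch and not code from DLWP: `RNNConfig`, `config_grid` and `build_model` are hypothetical names, the default values are arbitrary, and the Keras calls assume TensorFlow 2.x, with `sequence_length` and `num_features` matching your prepared datasets.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class RNNConfig:
    """Hypothetical container for the hyperparameters of one experiment."""
    cell: str = "lstm"           # "lstm" or "gru"
    units: int = 16              # units in each recurrent layer
    num_layers: int = 1          # number of stacked recurrent layers
    bidirectional: bool = False
    recurrent_l2: float = 0.0    # 0.0 disables the recurrent regularizer
    dense_units: int = 0         # 0 means no extra dense layer before the output
    learning_rate: float = 1e-3


def config_grid(**options):
    """Yield one RNNConfig per combination of the given option lists."""
    names = list(options)
    for values in product(*(options[name] for name in names)):
        yield RNNConfig(**dict(zip(names, values)))


def build_model(cfg, sequence_length, num_features):
    """Build and compile a Keras regressor from a config (assumes TensorFlow 2.x)."""
    # Imported lazily so that the configuration helpers work without TensorFlow.
    from tensorflow import keras
    from tensorflow.keras import layers

    cell_cls = {"lstm": layers.LSTM, "gru": layers.GRU}[cfg.cell]
    regularizer = keras.regularizers.L2(cfg.recurrent_l2) if cfg.recurrent_l2 else None

    inputs = keras.Input(shape=(sequence_length, num_features))
    x = inputs
    for i in range(cfg.num_layers):
        is_last = i == cfg.num_layers - 1
        layer = cell_cls(
            cfg.units,
            recurrent_regularizer=regularizer,
            return_sequences=not is_last,  # stacked layers need full sequences
        )
        if cfg.bidirectional:
            layer = layers.Bidirectional(layer)
        x = layer(x)
    if cfg.dense_units:
        x = layers.Dense(cfg.dense_units, activation="relu")(x)
    outputs = layers.Dense(1)(x)

    model = keras.Model(inputs, outputs)
    model.compile(
        optimizer=keras.optimizers.RMSprop(learning_rate=cfg.learning_rate),
        loss="mse",
        metrics=["mae"],
    )
    return model
```

For example, `list(config_grid(cell=["lstm", "gru"], units=[16, 32]))` yields four configurations. Each can be passed to `build_model` and trained with the `fit` call from the listings.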


---

## 2. Text Classification with a Transformer

1. Download the data and load it into datasets as is done in section 11.3.1 (there is a modified version of the dataset-building code in the lecture notebooks): 
    ```bash
    !curl -O https://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz
    !tar -xf aclImdb_v1.tar.gz
    !rm -r aclImdb/train/unsup
    ```  
       
    Note that if you already downloaded this data for previous experiments, you can create a symlink:
    ```bash
    ln -s path/to/aclImdb [newName]
    ```
     
2. Import the code for text classification with the Transformer Encoder from listings 11.21, 11.24 and 11.25 and train it.

3. Experiment with different hyperparameters. How do the results compare with the first method we used in Topic 3 (you can find experiments in the lecture notebook `3-getting-started/04.1.classifying-movie-reviews-imdb.ipynb`)? 

4. Suggestions for experiments:

    - Reproduce the steps from DLWP listings 11.12 and 11.17, and compare the performance of your bidirectional LSTM with that of the Transformer you built above.
    - Test pretrained embeddings: download the *GloVe* file as described in listings 11.18 and 11.19. The comment after listing 11.20 mentions that, in this case, the *GloVe* embeddings do not make much of a difference. Can you reproduce this result? If you reduce the size of your dataset, do you notice a threshold at which the model with pretrained embeddings works better than the one trained from scratch?
    - Reproduce the steps from DLWP section 11.3 (using the model-building utility from listing 11.5), and compare your results with what you have:
        - Create a bag-of-words model (listing 11.3);
        - Create an n-gram (n > 1) model (listings 11.7 and 11.8).
    - Advanced: can you modify and encapsulate your Transformer model so that it is easier to change its architecture (number of layers, number of heads, etc.)?
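
For the bag-of-words and n-gram suggestion above, it can help to see what a multi-hot n-gram encoding looks like before handing the work to a vectorization layer. The sketch below is plain Python written for illustration only: it is not how Keras implements its text vectorization, and `ngrams`, `build_vocabulary` and `multi_hot` are hypothetical helper names. It assumes whitespace tokenization and lowercasing, and it reserves index 0 for unknown n-grams.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of 1 to n tokens, joined by spaces."""
    return [
        " ".join(tokens[i:i + size])
        for size in range(1, n + 1)
        for i in range(len(tokens) - size + 1)
    ]


def build_vocabulary(texts, n, max_tokens):
    """Map the most frequent n-grams to indices; index 0 is the unknown token."""
    counts = Counter(
        gram for text in texts for gram in ngrams(text.lower().split(), n)
    )
    vocab = ["[UNK]"] + [gram for gram, _ in counts.most_common(max_tokens - 1)]
    return {gram: index for index, gram in enumerate(vocab)}


def multi_hot(text, vocabulary, n):
    """Encode a text as a 0/1 vector marking which vocabulary n-grams occur."""
    vector = [0] * len(vocabulary)
    for gram in ngrams(text.lower().split(), n):
        vector[vocabulary.get(gram, 0)] = 1
    return vector


texts = ["the cat sat", "the cat ran"]
vocabulary = build_vocabulary(texts, n=2, max_tokens=20)
encoded = multi_hot("the dog sat", vocabulary, n=2)
```

With `n=1` this reduces to a plain bag-of-words encoding. With `n=2` the vocabulary also contains pairs such as `"the cat"`, which is how a set-based model recovers a little local word order.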
    


---

#### Remember to run your best-performing models on the test set.