Skip to content

An LSTM Neural Network for Time Series Forecasting, trained on Wikipedia's Web Traffic dataset from Kaggle.

License

Notifications You must be signed in to change notification settings

IvanBongiorni/RNN_TimeSeries-Forecast

Repository files navigation

Recurrent Neural Network for Time Series Forecasting


This is a time series forecasting project based on the Wikipedia Web Traffic Time Series Forecasting dataset from Kaggle. Two RNN architectures are implemented:

  • A "Vanilla" RNN regressor.
  • A Seq2seq regressor.

Both are implemented in TensorFlow 2, with custom training functions optimized with Autograph.

Structure of the repository

Main files:

  • config.yaml: config file for hyperparameters.
  • dataprep.py: data preprocessing pipeline.
  • train.py: training pipeline.
  • tools.py: contains useful processing functions to be iterated in main pipelines.
  • model.py: builds model.

I also added a visualize_performance.ipynb Jupyter Notebook to visually inspect models' performance on Test data.

Folders:

  • /data_raw/: requires unzipped train_2.csv file from Kaggle. Available is an imputed.csv dataset, containing imputed time series, coming from my other repository on a GAN for imputation of missing data in time series.
  • /data_processed/: divided in /Train/ and /Test/ directories.
  • /saved_models/: contains all saved TensorFlow models, both regressors.
  • /utils/: for pics and other secondary files.

How to run code

After you clone the repository locally, download the raw dataset from Kaggle, and place unzipped train_2.csv file in /data_raw/ folder. Then, time series forecast is executed in two steps. First, run data preprocessing pipeline:

python -m dataprep

This will generate Training+Validation and Test files, stored in /data_processed/ subdirectories. Second, launch training pipeline with:

python -m train

This will either create, train and save a new model, or load and train an already existing one, stored in /saved_models/ folder.

Finally, Test set performance will be evaluated from test.ipynb notebook.

Modules

numpy==1.18.3
pandas==1.0.3
scikit-learn==0.22.2.post1
scipy==1.4.1
tensorflow==2.1.0
tqdm==4.45.0

Hardware

I used a pretty powerful laptop, with 64GB or RAM and NVidia RTX 2070 GPU. I highly recommend GPU training to avoid excessive computational times.