This repo documents the GPS Tech Team's work on Esri's GIS event sequence data. The goal is to build a working predictor for the 1.46 GB dataset.
The repo includes the following components:
- Notebook to preprocess the dataset and extract n-item sequences
- Notebook to downsize categorical features, perform EDA, and train an LSTM model
- Notebook to submit a machine learning training job to a cluster
- Pipeline for training in Azure Machine Learning Studio, with custom metric logs
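
As a rough illustration of the preprocessing step, the sketch below shows one common way to turn an ordered event log into n-item sequences. It assumes that an "n-item sequence" means a sliding window of n consecutive events paired with the event that follows; the notebook's actual definition may differ. The function name `n_item_sequences` and the event names are hypothetical and do not come from the dataset.

```python
from typing import Hashable, List, Sequence, Tuple


def n_item_sequences(
    events: Sequence[Hashable], n: int
) -> List[Tuple[Tuple[Hashable, ...], Hashable]]:
    """Return (context, next_event) pairs from a sliding window of length n.

    Assumption: each training example is n consecutive events plus the
    event that immediately follows them.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    pairs = []
    # Stop early enough that every window still has a following event.
    for start in range(len(events) - n):
        context = tuple(events[start:start + n])
        target = events[start + n]
        pairs.append((context, target))
    return pairs


# Hypothetical event names, for illustration only.
session = ["open_map", "zoom", "pan", "select", "export"]
windows = n_item_sequences(session, n=3)
```

Pairs of this shape can be encoded as integer IDs and fed to a sequence model such as an LSTM. A session shorter than n + 1 events yields no pairs.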