Team Members:
- Rupert Chaplin
- Artemis Dampa
- Megane Martinez
This project explores several techniques for tackling a hierarchical load forecasting problem, a challenge released on Kaggle in 2012.
- Data: contains the source data files, as provided for the Kaggle competition.
- Data/Outputs: destination for any outputs generated by our code.
- Code: contains all scripts.
This project has been developed in Python 2.7. Some scripts require additional libraries, packages or hardware, as listed in the requirements for each script below.
This code runs a multiple regression to predict load values. It replicates Tao Hong's 'vanilla benchmark' model (http://repository.lib.ncsu.edu/ir/bitstream/1840.16/6457/1/etd.pdf).
Requirements: Pandas, Numpy, SKLearn, matplotlib
main() can be run directly.
This code runs a neural network to predict load values.
Requirements: Pandas, Numpy, SKLearn, Keras (http://keras.io), Theano, compatible GPU hardware.
main() can be run directly.
This code runs a gradient boosting regression to predict load values.
Requirements: Pandas, Numpy, SKLearn, matplotlib.
main() can be run directly.
This code runs ARIMA modelling for energy loads.
Requirements: Pandas, Numpy, SKLearn, matplotlib, Pyper (with R installed)
main() can be run directly for value predictions. For data exploration, uncomment dataExplorationAndPlotting(subts) in main().
This code runs ARIMA modelling for temperatures.
Requirements: Pandas, Numpy, SKLearn, matplotlib, Pyper (with R installed)
The script can be run directly.
This code contains the data preprocessing steps. It includes helper functions, which are invoked by the model scripts to provide data as required. There is no need to run this script directly, although the main() function will create a set of CSV files containing processed input data, which can be useful for debugging or for exploratory data analysis in other packages.
The function get_data(temp_estimate_source='historic') is the main function called by the model scripts. It returns pre-processed training and test datasets. The parameter temp_estimate_source can be set to 'historic' to use temperature estimates calculated from historic mean values, 'arima' to load ARIMA estimates [as generated by arimaTemp.py] or 'actuals' to use the actual temperatures [data as released after the conclusion of the Kaggle competition].
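As an illustration of the 'historic' option, the following is a minimal sketch of one way to estimate a temperature from historic mean values. It assumes the estimate is the average of all past readings that share the same month, day and hour; the function name historic_mean_temperatures and the tuple layout are hypothetical and are not taken from the project's code, which may average differently.

```python
from collections import defaultdict


def historic_mean_temperatures(readings):
    """Average past temperatures by calendar slot (hypothetical helper).

    readings: iterable of (month, day, hour, temperature) tuples.
    Returns a dict mapping (month, day, hour) to the mean temperature
    observed for that slot across all supplied readings.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for month, day, hour, temperature in readings:
        key = (month, day, hour)
        totals[key] += temperature
        counts[key] += 1
    # Divide each running total by its count to get the per-slot mean.
    return {key: totals[key] / counts[key] for key in totals}


# Example: two years of readings for 1 June at 00:00, one year for 01:00.
example_readings = [
    (6, 1, 0, 60.0),
    (6, 1, 0, 64.0),
    (6, 1, 1, 58.0),
]
example_means = historic_mean_temperatures(example_readings)
```

The resulting dictionary can then be looked up by (month, day, hour) to fill in temperatures for the periods being forecast.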
This code contains a helper function to calculate Weighted Root Mean Square Error (WRMSE), which is the evaluation metric used for the Kaggle competition. It is called from other modules and is not run directly.
Parameters can be set to save prediction result files simultaneously with generating the WRMSE score.
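For reference, a weighted RMSE can be sketched as below. This assumes the usual definition, the square root of the weighted sum of squared errors divided by the sum of the weights. The function name weighted_rmse and its arguments are hypothetical and do not reflect the signature of the project's helper, and the weights in the example are illustrative only, not the competition's weights.

```python
import math


def weighted_rmse(actual, predicted, weights):
    """Return sqrt(sum(w * (a - p)**2) / sum(w)) (hypothetical helper)."""
    if not (len(actual) == len(predicted) == len(weights)):
        raise ValueError("actual, predicted and weights must have equal length")
    total_weight = float(sum(weights))
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    # Accumulate each squared error scaled by its weight.
    weighted_sq_error = sum(
        w * (a - p) ** 2 for a, p, w in zip(actual, predicted, weights)
    )
    return math.sqrt(weighted_sq_error / total_weight)


# Illustrative values only: the first observation is off by 3 with weight 1,
# the second is exact with weight 3.
example_score = weighted_rmse([0.0, 0.0], [3.0, 0.0], [1.0, 3.0])
```

Giving an observation a larger weight makes its error count proportionally more towards the final score.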
This code contains a helper function to process temperature data for ARIMA modelling. It produces one .csv file per station, stored in data/outputs. It is called from arimaTemp.py.