IRDM2016 - Information Retrieval and Data Mining 2016

UCL group project - Time Series Forecasting

Team Members:

Rupert Chaplin
Artemis Dampa
Megane Martinez

Kaggle Global Energy Forecasting Competition 2012 - Load Forecasting

This project explores a number of different techniques to tackle a hierarchical load forecasting problem - a challenge which was released on Kaggle in 2012.

Manual

File Structure

Data - contains source datafiles, as provided for the Kaggle competition.

Data/Outputs - destination for any outputs generated by our code

Code - contains all scripts.

Implementation

This project has been developed in Python 2.7. Some elements require additional libraries/packages/hardware - as listed in requirements below.

Code outline

Models

benchmark.py

This code runs a multiple regression to predict load values. It replicates Tao Hong's 'vanilla benchmark' model. http://repository.lib.ncsu.edu/ir/bitstream/1840.16/6457/1/etd.pdf

Requirements: Pandas, Numpy, SKLearn, matplotlib

main() can be run directly.
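As a rough illustration of the multiple-regression approach, the sketch below fits an ordinary least-squares model on hour-of-day dummies and cubic temperature terms. The feature set, the synthetic data and the build_features helper are assumptions made for this example; they are not taken from benchmark.py and are simpler than the full 'vanilla benchmark' specification.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def build_features(hour, temp):
    """Hour-of-day dummies plus cubic temperature terms (simplified, hypothetical)."""
    hour_dummies = np.eye(24)[hour]  # one-hot encode hours 0..23
    temp_poly = np.column_stack([temp, temp ** 2, temp ** 3])
    return np.hstack([hour_dummies, temp_poly])


# Synthetic data for illustration only: load depends on hour and temperature.
rng = np.random.RandomState(0)
hour = rng.randint(0, 24, size=2000)
temp = rng.uniform(0.0, 35.0, size=2000)
load = 100.0 + 5.0 * np.sin(2 * np.pi * hour / 24.0) + 0.2 * (temp - 18.0) ** 2
load = load + rng.normal(0.0, 1.0, size=2000)

X = build_features(hour, temp)

# The hour dummies already act as per-hour intercepts, so no extra intercept is fitted.
model = LinearRegression(fit_intercept=False).fit(X[:1500], load[:1500])

# Score on the held-out rows.
r2 = model.score(X[1500:], load[1500:])
print("held-out R^2: %.3f" % r2)
```

On real data the rows would come from the pre-processed training and test sets rather than from a random generator.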

nn.py

This code runs a neural network to predict load values.

Requirements: Pandas, Numpy, SKLearn, Keras (http://keras.io), Theano, compatible GPU hardware.

main() can be run directly.

gradientboosting.py

This code runs a gradient boosting regression to predict load values.

Requirements: Pandas, Numpy, SKLearn, matplotlib.

main() can be run directly.
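A minimal sketch of gradient boosting regression with SKLearn is shown below. The features, the synthetic data and the hyperparameter values are assumptions chosen for the example, not the settings used in gradientboosting.py.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor

# Synthetic data for illustration only: load depends on hour and temperature.
rng = np.random.RandomState(0)
hour = rng.randint(0, 24, size=1000)
temp = rng.uniform(0.0, 35.0, size=1000)
load = 100.0 + 5.0 * np.sin(2 * np.pi * hour / 24.0) + 0.2 * (temp - 18.0) ** 2
load = load + rng.normal(0.0, 1.0, size=1000)

# Tree ensembles can use the raw hour and temperature columns directly.
X = np.column_stack([hour, temp])
X_train, X_test = X[:800], X[800:]
y_train, y_test = load[:800], load[800:]

# Hyperparameters here are placeholders, not tuned values.
gbr = GradientBoostingRegressor(
    n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
)
gbr.fit(X_train, y_train)

pred = gbr.predict(X_test)
rmse = float(np.sqrt(np.mean((pred - y_test) ** 2)))
print("held-out RMSE: %.2f" % rmse)
```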

arima.py

This code runs ARIMA modelling for energy loads.

Requirements: Pandas, Numpy, SKLearn, matplotlib, Pyper (with R installed)

main() can be run directly for value predictions. For data exploration, uncomment the call to dataExplorationAndPlotting(subts) in main().

arimaTemp.py

This code runs ARIMA modelling for temperatures.

Requirements: Pandas, Numpy, SKLearn, matplotlib, Pyper (with R installed)

The script can be run directly.

Helper code

processandmergedata.py

Contains data preprocessing steps. This code includes helper functions, which are invoked by the model scripts to provide data as required. There is no need to run this script directly, although the main() function will create a set of csv files containing processed input data, which can be useful for debugging or for exploratory data analysis in other packages.

The function get_data(temp_estimate_source='historic') is the main function called by model scripts. It returns pre-processed training and test datasets. The parameter temp_estimate_source can be set to 'historic' to use temperature estimates calculated from historic mean values, 'arima' to load ARIMA estimates (as generated by arimaTemp.py), or 'actuals' to use the actual temperatures released after the conclusion of the Kaggle competition.
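To make the 'historic' option more concrete, the sketch below shows one way a temperature estimate could be built from historic mean values: average past readings for the same month, day and hour, then look those averages up for the period to be forecast. The column names, the grouping keys and the sample values are assumptions for illustration; the actual calculation in processandmergedata.py may differ.

```python
import pandas as pd

# Hypothetical history for one station: the same calendar slot in several years.
history = pd.DataFrame({
    "year":  [2005, 2006, 2007, 2005, 2006, 2007],
    "month": [1, 1, 1, 7, 7, 7],
    "day":   [15, 15, 15, 15, 15, 15],
    "hour":  [12, 12, 12, 12, 12, 12],
    "temp":  [30.0, 34.0, 32.0, 80.0, 84.0, 82.0],
})

# Mean temperature per (month, day, hour) slot across all available years.
historic_mean = (
    history.groupby(["month", "day", "hour"])["temp"]
    .mean()
    .reset_index()
    .rename(columns={"temp": "temp_estimate"})
)

# Look up the estimate for each timestamp in the forecast period.
future = pd.DataFrame({"month": [1, 7], "day": [15, 15], "hour": [12, 12]})
estimates = future.merge(historic_mean, on=["month", "day", "hour"], how="left")
print(estimates)
```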

wrmse.py

This code contains a helper function to calculate Weighted Root Mean Square Error (WRMSE), which is the evaluation metric used for the Kaggle competition. It is called from other modules and not run directly.

Parameters can be set to save prediction result files simultaneously with generating the WRMSE score.
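The metric can be sketched as below, assuming the usual definition of a weighted RMSE: the square root of the weighted sum of squared errors divided by the sum of the weights. The function name weighted_rmse and the sample weights are hypothetical; the competition's own weighting scheme and the signature used in wrmse.py are not reproduced here.

```python
import numpy as np


def weighted_rmse(actual, predicted, weights):
    """Weighted RMSE: sqrt(sum(w * (a - p)^2) / sum(w))."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    weights = np.asarray(weights, dtype=float)
    squared_error = (actual - predicted) ** 2
    return float(np.sqrt(np.sum(weights * squared_error) / np.sum(weights)))


# Illustrative values only; the weights are placeholders.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
weights = [1.0, 1.0, 2.0]

score = weighted_rmse(actual, predicted, weights)
print("WRMSE: %.4f" % score)
```

With equal weights the function reduces to the ordinary RMSE, which gives a quick sanity check.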

processTemp.py

This code contains a helper function to process temperature data for ARIMA modelling. It writes one .csv file per station to data/outputs. It is called from arimaTemp.py.
