Introduction

This repository contains the code used to take part in the Kaggle Web Traffic Time Series Forecasting competition. For details about the competition and the data, please visit the competition page. Several approaches were tried to get good results:

  • Facebook Prophet: a free time-series forecasting tool from Facebook. model_prophet.py implements the training and the prediction for future dates; a separate Prophet model is learnt for each page (a minimal sketch appears after this list).
  • ElasticNet: an Elastic Net implementation that trains a single common model for all webpages.
  • Multi-Layer Perceptron: an MLP implementation in Keras. This approach was not explored much because training was very slow.
  • XGBoost Regression: an XGBoost regression model trained on all webpages. This method gave the best results of all the approaches tried.
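As a rough illustration of the per-page Prophet approach, here is a minimal sketch; the helper function, column handling and 60-day horizon are assumptions for illustration, not code taken from model_prophet.py:

    # Minimal sketch of fitting one Prophet model per page.
    # fbprophet was the package name at the time this repo was written.
    import pandas as pd
    from fbprophet import Prophet

    def forecast_page(dates, visits, horizon=60):
        """Fit a Prophet model for a single page and predict `horizon` days ahead."""
        df = pd.DataFrame({"ds": pd.to_datetime(dates), "y": visits})
        model = Prophet()  # one model is learnt per page
        model.fit(df)
        future = model.make_future_dataframe(periods=horizon)
        forecast = model.predict(future)
        return forecast[["ds", "yhat"]].tail(horizon)

Calling forecast_page once per page reproduces the one-model-per-page structure described above.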

The features used to forecast a day's traffic were: Wikipedia domain, access device, access type, day, month, weekday, days since Jan 1, the mean over the last 7 and 30 days, and the median over the last 15, 30, 45 and 60 days.
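A minimal sketch of how these rolling features could be computed with pandas for one page on one date; the series layout and the note on categorical encoding are assumptions, not the exact code in generate_csv.py:

    import pandas as pd

    def build_features(series, date):
        """Rolling-window features for one page on one date.
        `series` is a pandas Series of daily visits indexed by date."""
        history = series.loc[:date]
        return {
            "day": date.day,
            "month": date.month,
            "weekday": date.weekday(),
            "days_since_jan1": date.dayofyear - 1,
            "mean_7": history.tail(7).mean(),
            "mean_30": history.tail(30).mean(),
            "median_15": history.tail(15).median(),
            "median_30": history.tail(30).median(),
            "median_45": history.tail(45).median(),
            "median_60": history.tail(60).median(),
            # domain, access device and access type would be label-encoded
        }

Stacked over all pages and dates, these feature rows form the design matrix for the regression models above (e.g. xgboost.XGBRegressor).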

Requirements

To use the code, you need to install a few Python packages. Most of them are available via pip:

  • scikit-learn
  • keras (with the TensorFlow backend)
  • numpy
  • pandas
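For example, the pip-installable packages can be obtained with:

    pip install scikit-learn keras tensorflow numpy pandas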

The xgboost package was installed by following the official XGBoost installation instructions.

Running Code

  1. python generate_csv.py <raw training data> <processed file name> <no. of threads>
  2. python model_<model_name>.py <raw training data> <key file> <processed file> <output file>

The predictions are written to the output file in the submission format required by the competition.
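For example, an XGBoost run might look like this (the file names below are illustrative, not fixed by the scripts):

    python generate_csv.py train_1.csv processed.csv 8
    python model_xgboost.py train_1.csv key_1.csv processed.csv submission.csv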

Other Approaches

Some approaches that were not tried here but could give comparable or even better results:

  • Use RNNs: RNNs are well suited to sequential data, and LSTMs/GRUs can give good results; many strong submissions used them (a minimal sketch appears after this list).
  • Cluster pages by fetching their wiki article data and train one model per cluster. Different clustering criteria could be used, such as wiki page content or page domain, and an ensemble of these models could produce the final answer. A Python script to fetch the wiki article data is included in the repo.
  • Wikipedia page visits can correlate strongly with web searches and search trends; search topics that have been rising for a few days can signal coming growth in page views.
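As a rough sketch of the RNN idea from the first bullet; the window length, layer sizes and training settings are illustrative guesses, not a tested configuration:

    # Minimal LSTM forecaster in Keras: predict the next day's visits
    # from the previous `window` days of traffic.
    from keras.models import Sequential
    from keras.layers import LSTM, Dense

    def build_lstm(window=60):
        model = Sequential()
        model.add(LSTM(64, input_shape=(window, 1)))
        model.add(Dense(1))
        model.compile(loss="mae", optimizer="adam")
        return model

    # X has shape (samples, window, 1); y has shape (samples,)
    # model = build_lstm(); model.fit(X, y, epochs=10, batch_size=256)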