
so24def/top27percent_61th_Kaggle_datathon_energy_distribution_prediction


Predicting the Energy Distribution using past years of data

Problem type: Time Series Regression

This repository contains my solution to the Gdz Elektrik Datathon 2023. I competed solo and ranked 61st (top 27%) out of 342 competitors and 234 teams.

Solution

  • Exploratory analysis and visualization of the time series
  • Testing, checking and visualizing the components of the time series
  • Checking lag correlations with ACF/PACF and lag plots
  • Feature engineering: extracting calendar features, detecting and extracting the most correlated lag features (based on a threshold), and adding and cleaning some external data
  • Feature selection with Sequential Feature Selection, RFECV, LOFO (not included in this repo)
  • Model selection: CatBoostRegressor, LGBMRegressor, XGBRegressor. Continued with a CatBoost-LGBM ensemble
  • Hyperparameter tuning with Optuna
  • Modelling and predicting with both direct and recursive methods (I expected the recursive method to improve my scores, but it did not help)
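
The threshold-based lag selection mentioned in the feature engineering step can be illustrated roughly as follows. This is a minimal sketch, not the code from the notebooks: the helper names `pearson` and `select_lags`, the synthetic hourly series, and the threshold of 0.9 are all assumptions made for illustration. It uses only the standard library; with pandas, `Series.autocorr(lag)` gives the same per-lag correlation.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def select_lags(series, max_lag, threshold):
    """Return {lag: correlation} for lags whose |correlation| with the series meets the threshold."""
    selected = {}
    for lag in range(1, max_lag + 1):
        # Correlate the series with a copy of itself shifted back by `lag` steps.
        r = pearson(series[lag:], series[:-lag])
        if abs(r) >= threshold:
            selected[lag] = r
    return selected


# Synthetic hourly signal with a 24-hour cycle (illustrative data, not competition data).
series = [math.sin(2 * math.pi * t / 24) for t in range(24 * 14)]
lags = select_lags(series, max_lag=48, threshold=0.9)
print(sorted(lags))
```

On real data the threshold is a tuning choice: a high value keeps only the strongest seasonal lags, while a lower one admits more features and leaves the pruning to the feature selection step.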

Bonus

  • I also added the function I wrote to predict recursively with a chosen step size, instead of directly predicting the whole set at once. This approach is also known as the walk-forward method. More detail can be found in the related notebook.
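
The idea behind recursive prediction with a step size can be sketched as below. This is an assumed, simplified version and not the function from the notebook: `recursive_forecast`, `predict_fn`, and the seasonal-naive toy model are hypothetical names and choices, and only lag features are rebuilt between steps. Each chunk of `step` predictions is appended to the history so that the next chunk's lag features can draw on it, which is why `step` must not exceed the smallest lag.

```python
def recursive_forecast(history, horizon, step, lags, predict_fn):
    """Forecast `horizon` points in chunks of `step`, feeding predictions back as lag inputs.

    predict_fn takes a list of feature rows (one value per lag) and returns one
    prediction per row; it stands in for any fitted model's predict method.
    """
    if step < 1 or step > min(lags):
        raise ValueError("step must be between 1 and the smallest lag")
    if max(lags) > len(history):
        raise ValueError("history is shorter than the largest lag")

    extended = list(history)  # known values followed by predictions so far
    n_known = len(extended)
    while len(extended) - n_known < horizon:
        remaining = horizon - (len(extended) - n_known)
        chunk = min(step, remaining)
        start = len(extended)
        # Every lagged index is < start because step <= min(lags),
        # so each feature is either observed or already predicted.
        rows = [[extended[t - lag] for lag in lags] for t in range(start, start + chunk)]
        extended.extend(predict_fn(rows))
    return extended[n_known:]


# Toy model (assumption): seasonal naive, i.e. repeat the value from 24 steps earlier.
def seasonal_naive(rows):
    return [row[0] for row in rows]


history = [t % 24 for t in range(48)]
forecast = recursive_forecast(history, horizon=60, step=24, lags=[24], predict_fn=seasonal_naive)
print(len(forecast), forecast[:5])
```

A smaller step refreshes the lag features more often at the cost of more predict calls, while setting `step` to the full horizon reduces to a single direct-style pass, provided no lag is shorter than the horizon.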

External Data/Sources


https://www.kaggle.com/so24def
