This is the source code for the final project of the course "How to Win a Data Science Competition: Learn from Top Kagglers" from Coursera.
My main features include:
- Extracting type and subtype codes from item and shop names.
- Aggregations of the target variable for shops and for items separately.
- Lag features for the target variable, revenue, and sales for shops and items, among others.
- Mean encoding for item category and item ID.
- A train/test split that respects time ordering, plus a simple blended ensemble of LightGBM and linear regression.
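A minimal sketch of the type/subtype code extraction, assuming names of the form "Type - Subtype". The `" - "` separator, the function names `split_name` and `encode_codes`, and the English sample names are hypothetical illustrations, not taken from the project:

```python
# Hypothetical helper: assumes names look like "Type - Subtype".
def split_name(name, sep=" - "):
    """Split 'Type - Subtype' into (type, subtype); subtype falls back to type."""
    parts = [p.strip() for p in name.split(sep, 1)]
    if len(parts) > 1:
        return parts[0], parts[1]
    return parts[0], parts[0]

def encode_codes(names):
    """Map each distinct type and subtype string to a consecutive integer code."""
    type_codes, subtype_codes = {}, {}
    out = []
    for name in names:
        t, s = split_name(name)
        # setdefault assigns the next free integer the first time a string is seen
        out.append((type_codes.setdefault(t, len(type_codes)),
                    subtype_codes.setdefault(s, len(subtype_codes))))
    return out

codes = encode_codes(["Games - PS3", "Games - PS4", "Music - CD", "Gifts"])
print(codes)  # [(0, 0), (0, 1), (1, 2), (2, 3)]
```

Integer codes like these can be fed directly to a tree model such as LightGBM.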
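The lag and mean-encoding features above can be sketched in plain Python as follows. This assumes daily rows of `(date_block_num, shop_id, item_id, item_cnt_day)` summed into a monthly target. Filling missing history with 0.0 and computing the mean encoding only from strictly earlier months (to avoid target leakage) are assumptions here, not necessarily what the project does. The same lag pattern applies to revenue or any other monthly aggregate.

```python
from collections import defaultdict

# Assumed input: daily rows of (date_block_num, shop_id, item_id, item_cnt_day).
def monthly_totals(rows):
    """Sum daily counts into a monthly target keyed by (month, shop, item)."""
    totals = defaultdict(float)
    for month, shop, item, cnt in rows:
        totals[(month, shop, item)] += cnt
    return dict(totals)

def add_lags(totals, lags=(1, 2, 3)):
    """Target from `lag` months earlier for the same (shop, item); 0.0 if absent."""
    return {
        (month, shop, item): {
            f"target_lag_{lag}": totals.get((month - lag, shop, item), 0.0)
            for lag in lags
        }
        for (month, shop, item) in totals
    }

def past_mean_encoding(totals, key_index):
    """Mean monthly target per value of key[key_index] (1 = shop, 2 = item),
    computed only from strictly earlier months."""
    by_month = defaultdict(lambda: defaultdict(lambda: [0.0, 0]))
    for key, y in totals.items():
        acc = by_month[key[0]][key[key_index]]
        acc[0] += y
        acc[1] += 1
    running = defaultdict(lambda: [0.0, 0])
    encoding = {}
    for month in sorted(by_month):
        # Encode this month's values from past history before updating it.
        for value in by_month[month]:
            total, count = running[value]
            encoding[(month, value)] = total / count if count else None
        for value, (total, count) in by_month[month].items():
            running[value][0] += total
            running[value][1] += count
    return encoding

rows = [(0, 5, 100, 2), (0, 5, 100, 1), (1, 5, 100, 4),
        (2, 5, 100, 1), (2, 6, 100, 3)]
totals = monthly_totals(rows)
lags = add_lags(totals)
item_enc = past_mean_encoding(totals, key_index=2)
print(lags[(2, 5, 100)])  # {'target_lag_1': 4.0, 'target_lag_2': 3.0, 'target_lag_3': 0.0}
print(item_enc)           # {(0, 100): None, (1, 100): 3.0, (2, 100): 3.5}
```

With pandas, the same ideas are usually expressed with `groupby` plus a merge on a shifted month column.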
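A sketch of the time-based split and the blend, assuming rows of `(date_block_num, features, target)`, a validation month held out after all training months, and a "simple mix" implemented as a weighted average whose weight is grid-searched on validation RMSE. The weighting scheme and all function names are assumptions, and the prediction lists are stand-ins rather than real LightGBM or linear regression output:

```python
import math

# Assumed row layout: (date_block_num, features, target).
def time_split(rows, valid_month):
    """Train on months strictly before valid_month; validate on valid_month only."""
    train = [r for r in rows if r[0] < valid_month]
    valid = [r for r in rows if r[0] == valid_month]
    return train, valid

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def best_blend_weight(y_valid, pred_gbm, pred_lin, steps=10):
    """Grid-search w in [0, 1] for w * pred_gbm + (1 - w) * pred_lin by validation RMSE."""
    best_w, best_score = 0.0, float("inf")
    for i in range(steps + 1):
        w = i / steps
        blended = [w * a + (1 - w) * b for a, b in zip(pred_gbm, pred_lin)]
        score = rmse(y_valid, blended)
        if score < best_score:
            best_w, best_score = w, score
    return best_w, best_score

rows = [(31, None, 1.0), (32, None, 2.0), (33, None, 0.0)]
train, valid = time_split(rows, valid_month=33)

y_valid = [1.0, 2.0, 3.0]
pred_gbm = [2.0, 3.0, 4.0]  # stand-in predictions, not real model output
pred_lin = [0.0, 1.0, 2.0]
w, score = best_blend_weight(y_valid, pred_gbm, pred_lin)
print(w, score)  # 0.5 0.0
```

Splitting by month instead of at random keeps future information out of training, which mirrors how the held-out test month is scored.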
Submitted on April 4th, 2020. My public and private leaderboard scores are 0.975568 and 0.977608, respectively.