https://www.kaggle.com/manjeetsingh/retaildataset
- entries are recorded on different days of the week
- some weeks have multiple entires
- between start date and end date, about 20% of all weeks are missing
- normalize dates: since we are dealing with weekly sales, convert all dates to the Monday of that week
- consolidate all records from a week into one dated the Monday of that week by adding their weekly sales together.
- for each missing week, create a record dated the Monday of that week; impute its weekly sales value as follows:
- if it's a non-holiday week, use the weekly sales value from the closeset available non-holiday week
- else (it's a holiday week), use the mean weekly sales from the same holiday if exist,otherwise use the closest avaiable holiday weekly sales.
- save metadata correction models in postgres database
- create table model_coeffs that has metadata correction model coefficients
- tables col_means and col_stds store the means and standard deviations of predictors so predicted scaled sales can be transformed back to original scale.