# **Rohlik Sales Forecasting, Kaggle Competition**

Why This Matters
Accurate sales forecasts are crucial for planning, supply chain processes, delivery logistics, and inventory management. By improving forecasts, we can minimize waste and streamline operations, making our e-grocery services more sustainable and efficient.

Your Impact
Your participation in this challenge will directly contribute to Rohlik's mission of sustainable and efficient e-grocery delivery. Your insights will help us enhance customer service and achieve a greener future.

Dataset Description
You are provided with historical sales data for selected Rohlik inventory items and dates. The IDs, sales, total orders, and price columns are altered to keep the real values confidential. Some features are not available in the test set because they are not known at prediction time. The task is to forecast the sales column for each id in the test set, where an id is constructed from unique_id and date (e.g. id 1226_2024-06-03 from unique_id 1226 and date 2024-06-03).
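As a minimal sketch of the id format described above, the helper below joins `unique_id` and `date` with an underscore. The name `make_id` is hypothetical and not part of the competition files; it uses plain Python rather than the Polars frames loaded later in the notebook.

```python
from datetime import date
from typing import Union


def make_id(unique_id: int, day: Union[date, str]) -> str:
    """Build a submission id such as '1226_2024-06-03'."""
    # Accept either a date object or an ISO 'YYYY-MM-DD' string.
    day_str = day.isoformat() if isinstance(day, date) else str(day)
    return f"{unique_id}_{day_str}"


print(make_id(1226, "2024-06-03"))      # 1226_2024-06-03
print(make_id(1226, date(2024, 6, 3)))  # 1226_2024-06-03
```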

Files
sales_train.csv - training set containing the historical sales data for a given date and inventory item, with the selected features described below
sales_test.csv - full testing set
inventory.csv - additional information about the inventory, such as its product (the same product shares one product unique id and name across all warehouses, but has a different unique id in each warehouse)
solution.csv - full submission file in the correct format
calendar.csv - calendar containing data about holidays and warehouse-specific events; some columns are already in the train data, but this file has additional rows for dates on which some warehouses could be closed due to a public holiday or Sunday (and which are therefore not in the train set)

# *Data*

## Columns
### *sales_train.csv and sales_test.csv*

- <span style="background-color: grey">unique_id</span> - unique id for inventory
- <span style="background-color: grey">date</span> - date
- <span style="background-color: grey">warehouse</span> - warehouse name
- <span style="background-color: grey">total_orders</span> - historical orders for the selected Rohlik warehouse; as part of this challenge, this value is also known for the test set
- <span style="background-color: grey">sales</span> - target value: sales volume (in either pcs or kg) adjusted by availability. Sales on days with availability lower than 1 are already adjusted to the full potential sales by Rohlik's internal logic. Dates may be missing in both train and test for a given inventory for various reasons. This column is missing in sales_test.csv because it is the target variable.
- <span style="background-color: grey">sell_price_main</span> - sell price
- <span style="background-color: grey">availability</span> - proportion of the day that the inventory was available to customers. The inventory doesn't need to be available at all times. A value of 1 means it was available for the entire day. This column is missing in sales_test.csv because it is not known at prediction time.
- <span style="background-color: grey">type_0_discount</span>, <span style="background-color: grey">type_1_discount</span>, … - Rohlik runs different types of promotional sales; these columns show the discount relative to the original price during the promo ((original_price - current_price) / original_price). Multiple discounts of different types can run at the same time, but the highest discount among them is always the one applied to sales. A negative discount value should be interpreted as no discount.
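The discount rule above (the highest discount wins, negative values mean no discount) can be sketched as a small helper. This is an illustration in plain Python, not Rohlik's internal logic; `effective_discount` is a hypothetical name, and the negative value in the sample row is made up to show the rule.

```python
def effective_discount(discounts):
    """Return the discount applied to a sale, given all type_N_discount values."""
    # Negative values mean "no discount", so only positive values compete;
    # the leading 0.0 makes the result 0.0 when no promo is active.
    return max([0.0] + [d for d in discounts if d is not None and d > 0])


# Hypothetical row: two active promos and one negative (i.e. inactive) value.
row = {"type_0_discount": 0.20024, "type_4_discount": 0.15312, "type_6_discount": -0.01}
print(effective_discount(row.values()))  # 0.20024
```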

### *inventory.csv*

- <span style="background-color: grey">unique_id</span> - inventory id for a single keeping unit
- <span style="background-color: grey">product_unique_id</span> - product id; the same product has the same product unique id across all warehouses, but a different unique id in each warehouse
- <span style="background-color: grey">name</span> - name of the inventory item (shared by the same product across warehouses)
- <span style="background-color: grey">L1_category_name</span>, <span style="background-color: grey">L2_category_name</span>, … - name of the internal category; the higher the number, the more granular the category
- <span style="background-color: grey">warehouse</span> - warehouse name

### *calendar.csv*

- <span style="background-color: grey">warehouse</span> - warehouse name
- <span style="background-color: grey">date</span> - date
- <span style="background-color: grey">holiday_name</span> - name of public holiday if any
- <span style="background-color: grey">holiday</span> - 0/1 flag indicating whether the date is a holiday
- <span style="background-color: grey">shops_closed</span> - public holiday on which most or a large part of the shops are closed
- <span style="background-color: grey">winter_school_holidays</span> - winter school holidays
- <span style="background-color: grey">school_holidays</span> - school holidays
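Because calendar rows are keyed by warehouse and date, one way to attach these flags to a sales row is a lookup on that pair. The sketch below uses plain Python with a single made-up calendar row; `calendar_flags` is a hypothetical helper, and treating a missing row as "no flags set" is an assumption, not something the data description states.

```python
# Hypothetical calendar rows shaped like calendar.csv (values are illustrative only).
calendar_rows = [
    {"warehouse": "Prague_1", "date": "2024-01-01", "holiday_name": "Example holiday",
     "holiday": 1, "shops_closed": 1, "winter_school_holidays": 0, "school_holidays": 0},
]

FLAG_COLUMNS = ("holiday", "shops_closed", "winter_school_holidays", "school_holidays")

# Index the calendar by its natural key: (warehouse, date).
calendar_index = {(r["warehouse"], r["date"]): r for r in calendar_rows}


def calendar_flags(warehouse, day):
    """Return the 0/1 calendar flags for one warehouse and ISO date string."""
    row = calendar_index.get((warehouse, day))
    # Assumption: a (warehouse, date) pair without a calendar row carries no flags.
    return {col: (row[col] if row else 0) for col in FLAG_COLUMNS}


print(calendar_flags("Prague_1", "2024-01-01"))
print(calendar_flags("Budapest_1", "2024-01-01"))
```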

### *test_weights.csv*

- <span style="background-color: grey">unique_id</span> - inventory id for a single keeping unit
- <span style="background-color: grey">weight</span> - weight used for final metric computation
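The description above only says that `weight` enters the final metric. Assuming that metric is a weighted mean absolute error (an assumption; check the competition's evaluation page), a minimal sketch looks like this. The function name `weighted_mae` and the sample ids, weights, and sales values are hypothetical.

```python
def weighted_mae(y_true, y_pred, weights):
    """Weighted mean absolute error: sum(w * |y - y_hat|) / sum(w)."""
    if not (len(y_true) == len(y_pred) == len(weights)):
        raise ValueError("inputs must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted_errors = sum(w * abs(t - p) for t, p, w in zip(y_true, y_pred, weights))
    return weighted_errors / total_weight


# Weights are given per unique_id, so every row inherits its inventory's weight.
weight_by_id = {1226: 1.0, 4845: 3.0}            # hypothetical weights
rows = [(1226, 10.0, 12.0), (4845, 20.0, 17.0)]  # (unique_id, actual, predicted)
weights = [weight_by_id[uid] for uid, _, _ in rows]
score = weighted_mae([t for _, t, _ in rows], [p for _, _, p in rows], weights)
print(score)  # 2.75
```

Under this assumption, an error on a heavily weighted inventory item costs proportionally more than the same error on a lightly weighted one.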

In [1]:
#Data Download (the data belongs to a competition, so use `competitions download`; `datasets download` expects an owner/dataset slug)
!kaggle competitions download -c rohlik-sales-forecasting-challenge-v2 -p Data


In [None]:
#Unzip Data into the Data/ folder, where the import cell below expects the CSV files
!unzip Data/rohlik-sales-forecasting-challenge-v2.zip -d Data

## *Import Data*

In [2]:
!pip install polars --break-system-packages

Defaulting to user installation because normal site-packages is not writeable
Collecting polars
  Downloading polars-1.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (14 kB)
Downloading polars-1.21.0-cp39-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (31.6 MB)
Installing collected packages: polars
Successfully installed polars-1.21.0


In [3]:
#Import Data using Polars
import polars as pl
sales_train     = pl.read_csv("Data/sales_train.csv")
inventory       = pl.read_csv("Data/inventory.csv")
calendar        = pl.read_csv("Data/calendar.csv")
test_weights    = pl.read_csv("Data/test_weights.csv")
solution        = pl.read_csv("Data/solution.csv")

# Print first 5 rows for each DataFrame
print("Sales Train:\n", sales_train.head(5), "\n")
print("Inventory:\n", inventory.head(5), "\n")
print("Calendar:\n", calendar.head(5), "\n")
print("Test Weights:\n", test_weights.head(5), "\n")
print("Solution:\n", solution.head(5), "\n")

Sales Train:
 shape: (5, 14)
┌───────────┬───────────┬───────────┬───────────┬───┬───────────┬───────────┬───────────┬──────────┐
│ unique_id ┆ date      ┆ warehouse ┆ total_ord ┆ … ┆ type_3_di ┆ type_4_di ┆ type_5_di ┆ type_6_d │
│ ---       ┆ ---       ┆ ---       ┆ ers       ┆   ┆ scount    ┆ scount    ┆ scount    ┆ iscount  │
│ i64       ┆ str       ┆ str       ┆ ---       ┆   ┆ ---       ┆ ---       ┆ ---       ┆ ---      │
│           ┆           ┆           ┆ f64       ┆   ┆ f64       ┆ f64       ┆ f64       ┆ f64      │
╞═══════════╪═══════════╪═══════════╪═══════════╪═══╪═══════════╪═══════════╪═══════════╪══════════╡
│ 4845      ┆ 2024-03-1 ┆ Budapest_ ┆ 6436.0    ┆ … ┆ 0.0       ┆ 0.15312   ┆ 0.0       ┆ 0.0      │
│           ┆ 0         ┆ 1         ┆           ┆   ┆           ┆           ┆           ┆          │
│ 4845      ┆ 2021-05-2 ┆ Budapest_ ┆ 4663.0    ┆ … ┆ 0.0       ┆ 0.15025   ┆ 0.0       ┆ 0.0      │
│           ┆ 5         ┆ 1         ┆           ┆   ┆         

In [11]:
sales_train

| unique_id | date | warehouse | total_orders | sales | sell_price_main | availability | type_0_discount | type_1_discount | type_2_discount | type_3_discount | type_4_discount | type_5_discount | type_6_discount |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| i64 | str | str | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 | f64 |
| 4845 | 2024-03-10 | Budapest_1 | 6436.0 | 16.34 | 646.26 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.15312 | 0.0 | 0.0 |
| 4845 | 2021-05-25 | Budapest_1 | 4663.0 | 12.63 | 455.96 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.15025 | 0.0 | 0.0 |
| 4845 | 2021-12-20 | Budapest_1 | 6507.0 | 34.55 | 455.96 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.15025 | 0.0 | 0.0 |
| 4845 | 2023-04-29 | Budapest_1 | 5463.0 | 34.52 | 646.26 | 0.96 | 0.20024 | 0.0 | 0.0 | 0.0 | 0.15312 | 0.0 | 0.0 |
| 4845 | 2022-04-01 | Budapest_1 | 5997.0 | 35.92 | 486.41 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.15649 | 0.0 | 0.0 |
| … | … | … | … | … | … | … | … | … | … | … | … | … | … |
| 4941 | 2023-06-21 | Prague_1 | 9988.0 | 26.56 | 34.06 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4941 | 2023-06-24 | Prague_1 | 8518.0 | 27.42 | 34.06 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4941 | 2023-06-23 | Prague_1 | 10424.0 | 33.39 | 34.06 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
| 4941 | 2023-06-22 | Prague_1 | 10342.0 | 22.88 | 34.06 | 1.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 | 0.0 |
