## Mercado Libre Tech Challenge Part 1: Exploratory Data Analysis of Lightning Deals

### Objective

##### Analyze results from a lightning deals exercise.

**Guiding questions**
1. What is the distribution of deal durations?
2. What is the distribution of sales at different time granularities: monthly, day of the month, day of the week and hour of the day?
3. What is the distribution of sales among product categories?

### Imports & Utils

In [53]:
import pandas as pd
import numpy as np
from IPython.display import display
import matplotlib.pyplot as plt
import seaborn as sns
from datetime import datetime, timedelta
import plotly.express as px
import plotly.graph_objects as go
from plotly.subplots import make_subplots


def get_dataset_general_features(df: pd.DataFrame) -> None:
    """Prints general info about the given dataset

    Args:
        df (pd.DataFrame): dataset to get general info
    """
    nrows, ncols = df.shape
    print(f"Dataset number of rows: {nrows}")
    print(f"Dataset number of columns: {ncols}\n")
    print("Dataset info:")
    df.info()  # info() prints directly and returns None
    print()

    print("Dataset sample:")
    display(df.sample(10, random_state=152))
    

def sales_revenue_by_category(
    data: pd.DataFrame, 
    plotting_categories: list, 
    grouping_categories:list, 
    plot_data:list[dict], 
    plot_heigth:int, 
    plot_width:int, 
    category_selection:dict = None, 
    plot: bool = False
    ) -> None:
    """Generates summary and plots of sales/revenue per product category

    Args:
        data (pd.DataFrame): dataframe with sales data
        plotting_categories (list): categories to plot
        grouping_categories (list): categories for grouping
        plot_data (list[dict]): data to plot
        plot_heigth (int): plot height
        plot_width (int): plot width
        category_selection (dict, optional): categories to analyze. Defaults to None.
        plot (bool, optional): whether to display plots or not. Defaults to False.
    """
    
    if category_selection is not None:
        data = data.loc[data[category_selection["category_level"]] == category_selection["category_name"]]
    
    
    # Generate groups
    sales_category = data[
        plotting_categories + grouping_categories
        ].groupby(
            by=grouping_categories
            ).agg(
                {x: [
                    "sum", 
                    # "mean",
                    # "min", 
                    # "median", 
                    # "max"
                    ] for x in plotting_categories}
            ).reset_index()
    sales_category.columns = ["_".join(col) for col in sales_category.columns.values]
    sales_category.rename(columns={f"{x}_": x for x in grouping_categories}, inplace=True)    
    with pd.option_context("display.max_rows", 200):
        display(sales_category.sort_values(by=[f"{x}_sum" for x in plotting_categories], ascending=False))

    if plot:    
        # Plot
        fig = make_subplots(rows=1, cols=2, specs=[[{}, {}]], shared_xaxes=False, shared_yaxes=True, horizontal_spacing=0.01)
        for pdata in plot_data:
            x_col = pdata["x_plot_column"]
            y_col = pdata["y_plot_column"]
            fig.add_trace(go.Bar(
                x=sales_category[x_col],
                y=sales_category[y_col],
                marker=dict(line=dict(width=1)),
                name=pdata["title"],
                orientation='h',
            ), row=1, col=pdata["layout_column"])
        fig.update_layout(width=plot_width, height=plot_heigth)
        fig.show()
    

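The groupby, column-flattening and renaming steps in `sales_revenue_by_category` produce flat columns such as `SOLD_AMOUNT_sum`. As a design alternative, pandas named aggregation can yield the same column names directly, without joining MultiIndex levels. The helper `sum_by_group` below is hypothetical and only sketches the `sum` case on toy data:

```python
import pandas as pd


def sum_by_group(data: pd.DataFrame, value_cols: list, group_cols: list) -> pd.DataFrame:
    """Sum `value_cols` per group, naming outputs `<column>_sum` (hypothetical helper)."""
    # Named aggregation: output name -> (input column, aggregation function)
    agg_spec = {f"{col}_sum": (col, "sum") for col in value_cols}
    # as_index=False keeps the grouping columns as regular columns (no reset_index needed)
    return data.groupby(group_cols, as_index=False).agg(**agg_spec)


toy = pd.DataFrame(
    {
        "VERTICAL": ["CE", "CE", "CPG"],
        "SOLD_AMOUNT": [50.42, 167.63, 9.91],
        "SOLD_QUANTITY": [3.0, 4.0, 1.0],
    }
)
summary = sum_by_group(toy, ["SOLD_AMOUNT", "SOLD_QUANTITY"], ["VERTICAL"])
print(summary.sort_values("SOLD_AMOUNT_sum", ascending=False))
```

Adding more statistics (e.g. `"mean"`) would only require extra entries in `agg_spec`.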
### Load data

In [54]:
data_path = "../data"
data_file_name = "ofertas_relampago.csv"
data = pd.read_csv(f"{data_path}/{data_file_name}")

### Dataset formatting and clean up

##### Original dataset features

In [55]:
get_dataset_general_features(df=data)

Dataset number of rows: 48746
Dataset number of columns: 13

Dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48746 entries, 0 to 48745
Data columns (total 13 columns):
 #   Column                     Non-Null Count  Dtype  
---  ------                     --------------  -----  
 0   OFFER_START_DATE           48746 non-null  object 
 1   OFFER_START_DTTM           48746 non-null  object 
 2   OFFER_FINISH_DTTM          48746 non-null  object 
 3   OFFER_TYPE                 48746 non-null  object 
 4   INVOLVED_STOCK             48746 non-null  int64  
 5   REMAINING_STOCK_AFTER_END  48746 non-null  int64  
 6   SOLD_AMOUNT                24579 non-null  float64
 7   SOLD_QUANTITY              24579 non-null  float64
 8   ORIGIN                     11316 non-null  object 
 9   SHIPPING_PAYMENT_TYPE      48746 non-null  object 
 10  DOM_DOMAIN_AGG1            48746 non-null  object 
 11  VERTICAL                   48746 non-null  object 
 12  DOMAIN_ID                  

,OFFER_START_DATE,OFFER_START_DTTM,OFFER_FINISH_DTTM,OFFER_TYPE,INVOLVED_STOCK,REMAINING_STOCK_AFTER_END,SOLD_AMOUNT,SOLD_QUANTITY,ORIGIN,SHIPPING_PAYMENT_TYPE,DOM_DOMAIN_AGG1,VERTICAL,DOMAIN_ID
45248,2021-07-26,2021-07-26 13:00:00+00:00,2021-07-26 19:00:06+00:00,lightning_deal,15,15,,,,free_shipping,APPAREL ACCESORIES,APP & SPORTS,MLM-RINGS
40833,2021-06-15,2021-06-15 14:00:00+00:00,2021-06-15 16:08:22+00:00,lightning_deal,14,0,269.6,15.0,A,free_shipping,PHARMACEUTICS,BEAUTY & HEALTH,MLM-SURGICAL_AND_INDUSTRIAL_MASKS
17022,2021-07-24,2021-07-24 13:00:00+00:00,2021-07-24 19:00:04+00:00,lightning_deal,5,4,4.26,1.0,,none,PERSONAL CARE,BEAUTY & HEALTH,MLM-NAIL_POLISH
5980,2021-07-13,2021-07-13 19:00:00+00:00,2021-07-14 01:00:00+00:00,lightning_deal,15,15,,,,free_shipping,APPAREL,APP & SPORTS,MLM-SOCKS
6587,2021-06-16,2021-06-16 07:00:00+00:00,2021-06-16 13:00:03+00:00,lightning_deal,5,4,1.73,1.0,,none,PHARMACEUTICS,BEAUTY & HEALTH,MLM-ESSENTIAL_OILS
37310,2021-07-22,2021-07-22 19:00:00+00:00,2021-07-23 01:00:01+00:00,lightning_deal,5,5,,,,free_shipping,HOME&DECOR,HOME & INDUSTRY,MLM-SOAP_AND_DETERGENT_DISPENSERS
4618,2021-07-23,2021-07-23 07:00:00+00:00,2021-07-23 07:00:00+00:00,lightning_deal,6,6,,,,none,APPAREL ACCESORIES,APP & SPORTS,MLM-WALLETS
45186,2021-07-26,2021-07-26 19:00:00+00:00,2021-07-27 01:00:06+00:00,lightning_deal,5,4,9.91,1.0,,free_shipping,FOODS,CPG,MLM-MILK
31581,2021-07-25,2021-07-25 18:00:00+00:00,2021-07-26 02:00:00+00:00,lightning_deal,10,6,167.63,4.0,A,free_shipping,ELECTRONICS,CE,MLM-WATER_HEATERS
17837,2021-07-10,2021-07-10 13:00:00+00:00,2021-07-10 19:00:03+00:00,lightning_deal,5,5,7.47,1.0,,free_shipping,STATIONARY,HOME & INDUSTRY,MLM-ADHESIVE_TAPES


In [56]:
# Show product origin in dataset
display(data[["OFFER_TYPE", "ORIGIN"]].groupby(by="ORIGIN").count().reset_index().rename(columns={"OFFER_TYPE": "count"}))

# Show product shipping payment types in dataset
display(data[["OFFER_TYPE", "SHIPPING_PAYMENT_TYPE"]].groupby(by="SHIPPING_PAYMENT_TYPE").count().reset_index().rename(columns={"OFFER_TYPE": "count"}))

,ORIGIN,count
0,A,11316


,SHIPPING_PAYMENT_TYPE,count
0,free_shipping,26658
1,none,22088


##### General observations on column characteristics

The dataset contains 48746 rows and 13 columns.

Columns are:
1. `OFFER_START_DATE`: lightning deal start date
2. `OFFER_START_DTTM`: lightning deal start datetime
3. `OFFER_FINISH_DTTM`: lightning deal end datetime  
4. `OFFER_TYPE`: tag describing the deal (lightning_deal)
5. `INVOLVED_STOCK`: number of stock units available to the lightning deal
6. `REMAINING_STOCK_AFTER_END`: number of stock units remaining after the deal ended
7. `SOLD_AMOUNT`: revenue
8. `SOLD_QUANTITY`: number of units sold
9. `ORIGIN`: unclear, possibly the seller (to be confirmed)
10. `SHIPPING_PAYMENT_TYPE`: shipping payment type
11. `VERTICAL`: product categories level 1
12. `DOM_DOMAIN_AGG1`: product categories level 2
13. `DOMAIN_ID`: product categories level 3

Columns with NaNs:
- `SOLD_AMOUNT`: 24167 rows
- `SOLD_QUANTITY`: 24167 rows
- `ORIGIN`: 37430 rows

The `ORIGIN` column has only one non-null value: `A` (11316 rows).

The `SHIPPING_PAYMENT_TYPE` column has only one meaningful value, `free_shipping` (26658 rows); the remaining 22088 rows contain the string `none`. To normalize the treatment of nulls in the dataset, `none` strings will be set to `None` (see **Formatting** section below).

##### Questions for the client
Does a `SHIPPING_PAYMENT_TYPE` of `none` mean that shipping costs apply? If so, those products should tend to sell less than products with `free_shipping`.
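If the client confirms that `none` means the buyer pays for shipping, the hypothesis could be checked by comparing a sell-through rate (units sold divided by `INVOLVED_STOCK`; this metric is our choice, not one defined in the dataset) between the two shipping types. A minimal sketch on toy rows with a hypothetical helper, assuming that missing sales mean zero units sold (an assumption examined in the clean-up section):

```python
import numpy as np
import pandas as pd


def sell_through_by_shipping(df: pd.DataFrame) -> pd.Series:
    """Mean fraction of the deal stock sold, per shipping payment type."""
    # Treat both the raw "none" string and a real null as the same group
    shipping = df["SHIPPING_PAYMENT_TYPE"].fillna("none")
    # Assumption: NaN sales mean that no units were sold
    sold = df["SOLD_QUANTITY"].fillna(0)
    sell_through = sold / df["INVOLVED_STOCK"]
    return sell_through.groupby(shipping).mean()


toy = pd.DataFrame(
    {
        "SHIPPING_PAYMENT_TYPE": ["free_shipping", "free_shipping", "none", None],
        "INVOLVED_STOCK": [10, 10, 10, 10],
        "SOLD_QUANTITY": [5.0, 10.0, np.nan, 2.0],
    }
)
print(sell_through_by_shipping(toy))
```

A difference in means alone would not settle the question; on the real data a statistical test would also be needed.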

In [57]:
# # Set some columns lists to facilitate data handling
# inventory_cols = ["INVOLVED_STOCK", "REMAINING_STOCK_AFTER_END"]
# sales_cols = ["SOLD_AMOUNT", "SOLD_QUANTITY"]
# category_cols = ["VERTICAL", "DOM_DOMAIN_AGG1", "DOMAIN_ID"]

In [58]:
# # [Optional] Show product categories in dataset. Warning: if show_categories is True, over 1200 rows are shown.
# show_categories = False
# if show_categories:
#     with pd.option_context("display.max_rows", 1300):
#         display(data[["OFFER_TYPE"] + category_cols].groupby(by=category_cols).count().reset_index().rename(columns={"OFFER_TYPE": "count"}))

#### Formatting
Several columns have dtype `object` (in particular, the date and datetime columns are stored as strings), which is not optimal for efficient pandas DataFrame manipulations (e.g. searches, filtering, grouping). For this reason, their dtypes will be converted.

In addition, columns with null entries will be normalized to use the same representation.
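A minimal sketch of both steps on two toy rows copied from the sample above; the helper name `format_offers` is hypothetical. It parses the offset-aware timestamp strings as UTC, then drops the timezone so the values become naive UTC datetimes, and it turns the `none` placeholder into a real missing value:

```python
import pandas as pd

# Toy rows mimicking the raw CSV: timestamps are strings with a UTC offset,
# and missing shipping info is encoded as the literal string "none".
raw = pd.DataFrame(
    {
        "OFFER_START_DTTM": ["2021-07-26 13:00:00+00:00", "2021-06-15 14:00:00+00:00"],
        "OFFER_FINISH_DTTM": ["2021-07-26 19:00:06+00:00", "2021-06-15 16:08:22+00:00"],
        "SHIPPING_PAYMENT_TYPE": ["free_shipping", "none"],
    }
)


def format_offers(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with parsed datetimes and a single null representation."""
    out = df.copy()
    for col in ["OFFER_START_DTTM", "OFFER_FINISH_DTTM"]:
        # Parse as UTC, then remove the timezone (values stay in UTC)
        out[col] = pd.to_datetime(out[col], utc=True).dt.tz_convert(None)
    # Replace the "none" placeholder string with a proper missing value
    out["SHIPPING_PAYMENT_TYPE"] = out["SHIPPING_PAYMENT_TYPE"].mask(
        out["SHIPPING_PAYMENT_TYPE"] == "none"
    )
    return out


formatted = format_offers(raw)
print(formatted.dtypes)
print(formatted)
```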

In [59]:
# Changing dtypes: parse the datetime strings (which carry a "+00:00" UTC offset)
# and drop the timezone so all datetimes are naive UTC values
for col in ["OFFER_START_DATE", "OFFER_START_DTTM", "OFFER_FINISH_DTTM"]:
    data[col] = pd.to_datetime(data[col], utc=True).dt.tz_convert(None)

data = data.astype(
    {
        'INVOLVED_STOCK': np.int64, 
        'REMAINING_STOCK_AFTER_END': np.int64,
        'SOLD_AMOUNT': np.float64, 
        'SOLD_QUANTITY': np.float64, 
    }
)

# Normalizing nulls treatment
data.loc[data["ORIGIN"].isnull(), "ORIGIN"] = None
data.loc[data["SHIPPING_PAYMENT_TYPE"] == "none", "SHIPPING_PAYMENT_TYPE"] = None

# Check formatting output and example
print("Dataset info:")
data.info()  # info() prints directly and returns None
print()

print("Dataset sample:")
display(data.sample(10, random_state=152))

Dataset info:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 48746 entries, 0 to 48745
Data columns (total 13 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   OFFER_START_DATE           48746 non-null  datetime64[ns]
 1   OFFER_START_DTTM           48746 non-null  datetime64[ns]
 2   OFFER_FINISH_DTTM          48746 non-null  datetime64[ns]
 3   OFFER_TYPE                 48746 non-null  object        
 4   INVOLVED_STOCK             48746 non-null  int64         
 5   REMAINING_STOCK_AFTER_END  48746 non-null  int64         
 6   SOLD_AMOUNT                24579 non-null  float64       
 7   SOLD_QUANTITY              24579 non-null  float64       
 8   ORIGIN                     11316 non-null  object        
 9   SHIPPING_PAYMENT_TYPE      26658 non-null  object        
 10  DOM_DOMAIN_AGG1            48746 non-null  object        
 11  VERTICAL                   48746 non-null  object    

,OFFER_START_DATE,OFFER_START_DTTM,OFFER_FINISH_DTTM,OFFER_TYPE,INVOLVED_STOCK,REMAINING_STOCK_AFTER_END,SOLD_AMOUNT,SOLD_QUANTITY,ORIGIN,SHIPPING_PAYMENT_TYPE,DOM_DOMAIN_AGG1,VERTICAL,DOMAIN_ID
45248,2021-07-26,2021-07-26 13:00:00,2021-07-26 19:00:06,lightning_deal,15,15,,,,free_shipping,APPAREL ACCESORIES,APP & SPORTS,MLM-RINGS
40833,2021-06-15,2021-06-15 14:00:00,2021-06-15 16:08:22,lightning_deal,14,0,269.6,15.0,A,free_shipping,PHARMACEUTICS,BEAUTY & HEALTH,MLM-SURGICAL_AND_INDUSTRIAL_MASKS
17022,2021-07-24,2021-07-24 13:00:00,2021-07-24 19:00:04,lightning_deal,5,4,4.26,1.0,,,PERSONAL CARE,BEAUTY & HEALTH,MLM-NAIL_POLISH
5980,2021-07-13,2021-07-13 19:00:00,2021-07-14 01:00:00,lightning_deal,15,15,,,,free_shipping,APPAREL,APP & SPORTS,MLM-SOCKS
6587,2021-06-16,2021-06-16 07:00:00,2021-06-16 13:00:03,lightning_deal,5,4,1.73,1.0,,,PHARMACEUTICS,BEAUTY & HEALTH,MLM-ESSENTIAL_OILS
37310,2021-07-22,2021-07-22 19:00:00,2021-07-23 01:00:01,lightning_deal,5,5,,,,free_shipping,HOME&DECOR,HOME & INDUSTRY,MLM-SOAP_AND_DETERGENT_DISPENSERS
4618,2021-07-23,2021-07-23 07:00:00,2021-07-23 07:00:00,lightning_deal,6,6,,,,,APPAREL ACCESORIES,APP & SPORTS,MLM-WALLETS
45186,2021-07-26,2021-07-26 19:00:00,2021-07-27 01:00:06,lightning_deal,5,4,9.91,1.0,,free_shipping,FOODS,CPG,MLM-MILK
31581,2021-07-25,2021-07-25 18:00:00,2021-07-26 02:00:00,lightning_deal,10,6,167.63,4.0,A,free_shipping,ELECTRONICS,CE,MLM-WATER_HEATERS
17837,2021-07-10,2021-07-10 13:00:00,2021-07-10 19:00:03,lightning_deal,5,5,7.47,1.0,,free_shipping,STATIONARY,HOME & INDUSTRY,MLM-ADHESIVE_TAPES


#### Clean up

**Tasks**

*Clean sales inconsistencies*

- Columns `SOLD_AMOUNT` and `SOLD_QUANTITY` have NaNs, possibly indicating that no sales occurred. This can be confirmed by checking whether `INVOLVED_STOCK` == `REMAINING_STOCK_AFTER_END` in those rows; if so, `SOLD_AMOUNT` and `SOLD_QUANTITY` can be set to zero.
- `SOLD_QUANTITY` should equal the difference between `INVOLVED_STOCK` and `REMAINING_STOCK_AFTER_END`.
- A negative `REMAINING_STOCK_AFTER_END` means that more units were sold than were allocated to the lightning deal.
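The three checks above can be expressed as a row-level labelling, which makes it easy to count each kind of inconsistency before deciding what to drop. A sketch assuming the rules are applied in the order shown (oversold first); the function name and label strings are hypothetical, and the toy rows are modelled on rows of this dataset:

```python
import numpy as np
import pandas as pd


def classify_sales_rows(df: pd.DataFrame) -> pd.Series:
    """Label each deal row according to the sales-consistency rules (first match wins)."""
    stock_unchanged = df["INVOLVED_STOCK"] == df["REMAINING_STOCK_AFTER_END"]
    sales_missing = df["SOLD_AMOUNT"].isna() & df["SOLD_QUANTITY"].isna()
    oversold = df["REMAINING_STOCK_AFTER_END"] < 0
    # Units implied by the stock movement vs. units reported as sold (NaN compares False)
    implied_units = df["INVOLVED_STOCK"] - df["REMAINING_STOCK_AFTER_END"]
    quantity_matches = df["SOLD_QUANTITY"] == implied_units

    conditions = [
        oversold,
        stock_unchanged & sales_missing,
        stock_unchanged & ~sales_missing,
        ~stock_unchanged & sales_missing,
        quantity_matches,
    ]
    labels = [
        "oversold",
        "no_sales",
        "sales_without_stock_change",
        "stock_change_without_sales",
        "consistent",
    ]
    return pd.Series(np.select(conditions, labels, default="quantity_mismatch"), index=df.index)


toy = pd.DataFrame(
    {
        "INVOLVED_STOCK": [15, 5, 40, 10, 15, 14],
        "REMAINING_STOCK_AFTER_END": [15, 4, 40, -1, 7, 0],
        "SOLD_AMOUNT": [np.nan, 4.26, 28.46, np.nan, np.nan, 269.6],
        "SOLD_QUANTITY": [np.nan, 1.0, 3.0, np.nan, np.nan, 15.0],
    }
)
print(classify_sales_rows(toy).value_counts())
```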

*Clean duplicates*
    
**Questions for the client**

1. What's the business procedure when more units than those allowed for the lightning deal are sold? Are the extra units cancelled (i.e. `SOLD_QUANTITY` should be made equal to `INVOLVED_STOCK`)? Are the extra sales allowed (i.e. `INVOLVED_STOCK` should be made equal to `SOLD_QUANTITY`)? 
2. Column `ORIGIN` has NaNs and only contains the value `A`, so at this stage it is of limited use. What is the reason for the missing data? Is it expected to have only one value in that column? How important is this feature?

##### Cleaning sales inconsistencies

**Inconsistency 1. Non null sales reported while initial and final stock are equal**

In [60]:
# Check inconsistent cases
data.loc[data["INVOLVED_STOCK"] == data["REMAINING_STOCK_AFTER_END"]][["SOLD_AMOUNT", "SOLD_QUANTITY"]].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 23792 entries, 1 to 48745
Data columns (total 2 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   SOLD_AMOUNT    213 non-null    float64
 1   SOLD_QUANTITY  213 non-null    float64
dtypes: float64(2)
memory usage: 557.6 KB


In [61]:
# Display inconsistent cases
data.loc[
    (data["INVOLVED_STOCK"] == data["REMAINING_STOCK_AFTER_END"]) 
    & ~(data["SOLD_AMOUNT"].isnull())
    & ~(data["SOLD_QUANTITY"].isnull())
    ]

,OFFER_START_DATE,OFFER_START_DTTM,OFFER_FINISH_DTTM,OFFER_TYPE,INVOLVED_STOCK,REMAINING_STOCK_AFTER_END,SOLD_AMOUNT,SOLD_QUANTITY,ORIGIN,SHIPPING_PAYMENT_TYPE,DOM_DOMAIN_AGG1,VERTICAL,DOMAIN_ID
394,2021-06-22,2021-06-22 18:00:00,2021-06-23 01:00:01,lightning_deal,40,40,28.46,3.0,A,free_shipping,PERSONAL CARE,BEAUTY & HEALTH,MLM-FACIAL_SKIN_CARE_PRODUCTS
544,2021-06-22,2021-06-22 14:00:00,2021-06-22 22:00:00,lightning_deal,30,30,4.22,1.0,A,,TOOLS AND CONSTRUCTION,HOME & INDUSTRY,MLM-TOOL_AND_CONSTRUCTION_SUPPLIES
1033,2021-07-08,2021-07-08 07:00:00,2021-07-08 13:00:05,lightning_deal,15,15,5.40,1.0,,,ELECTRONICS,CE,MLM-GAME_CONSOLES
1733,2021-07-08,2021-07-08 07:00:00,2021-07-08 13:00:01,lightning_deal,5,5,5.23,1.0,,free_shipping,ELECTRONICS,CE,MLM-GAME_CONSOLES_VIDEO_GAMES_AND_ARCADE_MACHINES
1860,2021-07-08,2021-07-08 19:00:00,2021-07-09 01:00:01,lightning_deal,15,15,6.03,1.0,,,APPAREL,APP & SPORTS,MLM-PAJAMAS
...,...,...,...,...,...,...,...,...,...,...,...,...,...
47137,2021-07-06,2021-07-06 13:00:00,2021-07-06 19:00:04,lightning_deal,10,10,4.55,1.0,,free_shipping,SPORTS,APP & SPORTS,MLM-FOOTBALL_SHIRTS
47167,2021-07-06,2021-07-06 19:00:00,2021-07-07 01:00:05,lightning_deal,15,15,1.33,1.0,,,FOODS,CPG,MLM-CHOCOLATES
47189,2021-07-06,2021-07-06 07:00:00,2021-07-06 13:00:01,lightning_deal,15,15,2.53,1.0,,,APPAREL ACCESORIES,APP & SPORTS,MLM-NECKLACES
48570,2021-06-19,2021-06-19 19:00:00,2021-06-20 01:00:05,lightning_deal,5,5,5.21,4.0,,,SPORTS,APP & SPORTS,MLM-KINESIOLOGY_TAPES


**Observations**

213 of the 23792 rows where `INVOLVED_STOCK` == `REMAINING_STOCK_AFTER_END` **DO NOT** have NaNs in `SOLD_AMOUNT` and `SOLD_QUANTITY`. This points to one of two errors:
1. `REMAINING_STOCK_AFTER_END` was not updated to reflect the actual sales.
2. The values reported in `SOLD_AMOUNT` and `SOLD_QUANTITY` are wrong.

Since these rows represent only ~0.4% of the data, they will NOT be considered in the current analysis until the client confirms the source of this inconsistency.

**Questions for the client**

Which of the two error sources mentioned above would explain the sales inconsistency described above?
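The drop step in the next cell, and the similar ones that follow, all remove a flagged subset of rows and report its share of the data. That pattern could be factored into a small helper; the sketch below uses a hypothetical `drop_rows` function that returns the kept rows and the dropped share relative to a fixed reference row count (for the percentages quoted above, that reference is the original 48746 rows):

```python
import pandas as pd


def drop_rows(df: pd.DataFrame, mask: pd.Series, reference_rows: int) -> tuple[pd.DataFrame, float]:
    """Return (rows where mask is False, dropped share in % of reference_rows)."""
    n_dropped = int(mask.sum())
    share_pct = 100 * n_dropped / reference_rows
    # Boolean indexing keeps the original index labels of the surviving rows
    return df.loc[~mask].copy(), share_pct


toy = pd.DataFrame(
    {
        "INVOLVED_STOCK": [40, 15, 5],
        "REMAINING_STOCK_AFTER_END": [40, 15, 4],
        "SOLD_QUANTITY": [3.0, None, 1.0],
    }
)
# Inconsistency 1: stock unchanged but sales reported
mask = (toy["INVOLVED_STOCK"] == toy["REMAINING_STOCK_AFTER_END"]) & toy["SOLD_QUANTITY"].notna()
kept, share = drop_rows(toy, mask, reference_rows=len(toy))
print(len(kept), round(share, 1))

# The same arithmetic for the figures reported above: 213 of 48746 rows
print(round(100 * 213 / 48746, 2))
```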

In [62]:
# Drop inconsistent cases.
data.drop(
    index=data[
        (data["INVOLVED_STOCK"] == data["REMAINING_STOCK_AFTER_END"])
        & data["SOLD_AMOUNT"].notnull()
        & data["SOLD_QUANTITY"].notnull()
    ].index,
    inplace=True,
)

# Set sales to zero when no change in stock is observed
data.loc[data["INVOLVED_STOCK"] == data["REMAINING_STOCK_AFTER_END"], ["SOLD_AMOUNT", "SOLD_QUANTITY"]] = 0


**Inconsistency 2. More sales than allowed in the lightning deal**

In [63]:
# Check inconsistent cases: negative REMAINING_STOCK_AFTER_END
data.loc[data["REMAINING_STOCK_AFTER_END"] < 0].info()

<class 'pandas.core.frame.DataFrame'>
Int64Index: 1273 entries, 0 to 48719
Data columns (total 13 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   OFFER_START_DATE           1273 non-null   datetime64[ns]
 1   OFFER_START_DTTM           1273 non-null   datetime64[ns]
 2   OFFER_FINISH_DTTM          1273 non-null   datetime64[ns]
 3   OFFER_TYPE                 1273 non-null   object        
 4   INVOLVED_STOCK             1273 non-null   int64         
 5   REMAINING_STOCK_AFTER_END  1273 non-null   int64         
 6   SOLD_AMOUNT                1241 non-null   float64       
 7   SOLD_QUANTITY              1241 non-null   float64       
 8   ORIGIN                     167 non-null    object        
 9   SHIPPING_PAYMENT_TYPE      184 non-null    object        
 10  DOM_DOMAIN_AGG1            1273 non-null   object        
 11  VERTICAL                   1273 non-null   object        
 12  DOMAI

In [64]:
# Display inconsistent cases
data.loc[(data["REMAINING_STOCK_AFTER_END"] < 0) & (data["SOLD_AMOUNT"].isnull()) & (data["SOLD_QUANTITY"].isnull())]

,OFFER_START_DATE,OFFER_START_DTTM,OFFER_FINISH_DTTM,OFFER_TYPE,INVOLVED_STOCK,REMAINING_STOCK_AFTER_END,SOLD_AMOUNT,SOLD_QUANTITY,ORIGIN,SHIPPING_PAYMENT_TYPE,DOM_DOMAIN_AGG1,VERTICAL,DOMAIN_ID
8,2021-06-22,2021-06-22 19:00:00,2021-06-22 23:05:32,lightning_deal,10,-1,,,,free_shipping,COMPUTERS,CE,MLM-HEADPHONES
3399,2021-06-26,2021-06-26 13:00:00,2021-06-26 15:23:17,lightning_deal,5,-1,,,,,HOME&DECOR,HOME & INDUSTRY,MLM-DRINKING_GLASSES
4210,2021-07-23,2021-07-23 07:00:00,2021-07-23 10:24:24,lightning_deal,15,-2,,,,,COMPUTERS,CE,MLM-SPEAKERS
4219,2021-07-23,2021-07-23 07:00:00,2021-07-23 10:24:24,lightning_deal,15,-2,,,,,COMPUTERS,CE,MLM-SPEAKERS
5967,2021-07-13,2021-07-13 19:00:00,2021-07-13 23:57:00,lightning_deal,5,-3,,,,,ELECTRONICS,CE,MLM-GAMEPADS_AND_JOYSTICKS
6652,2021-06-16,2021-06-16 07:00:00,2021-06-16 10:19:57,lightning_deal,15,-1,,,,,COMPUTERS,CE,MLM-SPEAKERS
6653,2021-06-16,2021-06-16 07:00:00,2021-06-16 10:19:56,lightning_deal,15,-1,,,,,COMPUTERS,CE,MLM-SPEAKERS
6656,2021-06-16,2021-06-16 07:00:00,2021-06-16 10:19:56,lightning_deal,15,-1,,,,,COMPUTERS,CE,MLM-SPEAKERS
6657,2021-06-16,2021-06-16 07:00:00,2021-06-16 10:19:57,lightning_deal,15,-1,,,,,COMPUTERS,CE,MLM-SPEAKERS
6658,2021-06-16,2021-06-16 07:00:00,2021-06-16 10:19:57,lightning_deal,15,-1,,,,,COMPUTERS,CE,MLM-SPEAKERS


**Observations**

Two types of rows with `REMAINING_STOCK_AFTER_END` < 0 (~2.6% of total rows):
- Consistent: `SOLD_QUANTITY` +  `REMAINING_STOCK_AFTER_END` = `INVOLVED_STOCK`
- Inconsistent: `SOLD_QUANTITY` and `SOLD_AMOUNT` are NaN (~0.07% of total rows)
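The split between the two types can be computed by checking the stock balance on each oversold row. A sketch on toy rows (the helper name `split_oversold` is hypothetical); note that a NaN `SOLD_QUANTITY` makes the comparison evaluate to False, so rows with missing sales end up in the inconsistent group:

```python
import numpy as np
import pandas as pd


def split_oversold(df: pd.DataFrame) -> tuple[pd.DataFrame, pd.DataFrame]:
    """Split rows with negative remaining stock into (consistent, inconsistent)."""
    oversold = df[df["REMAINING_STOCK_AFTER_END"] < 0]
    # Stock balance: units sold + units left should equal units put on the deal
    balances = (
        oversold["SOLD_QUANTITY"] + oversold["REMAINING_STOCK_AFTER_END"]
        == oversold["INVOLVED_STOCK"]
    )
    return oversold[balances], oversold[~balances]


toy = pd.DataFrame(
    {
        "INVOLVED_STOCK": [10, 15, 5],
        "REMAINING_STOCK_AFTER_END": [-1, -2, 3],
        "SOLD_QUANTITY": [11.0, np.nan, 2.0],
    }
)
consistent, inconsistent = split_oversold(toy)
print(len(consistent), len(inconsistent))
```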

The inconsistent cases suggest either that:
1. `REMAINING_STOCK_AFTER_END` holds a wrong value (no sales actually occurred), or
2. `SOLD_QUANTITY` and `SOLD_AMOUNT` were not updated.

Since rows with negative `REMAINING_STOCK_AFTER_END` represent only ~2.6% of the total rows, all of them (consistent and inconsistent) will be dropped until the client confirms the reason for this data issue.

**Questions for the client**

Which of the two error sources mentioned above would explain the observed sales inconsistency?

In [65]:
# Drop inconsistent cases
data.drop(index=data.loc[(data["REMAINING_STOCK_AFTER_END"] < 0)].index, inplace=True)

**Inconsistency 3. Remaining null sales after clean up**

In [66]:
# Check inconsistency cases: remaining null sales
get_dataset_general_features(df=data)

Dataset number of rows: 47260
Dataset number of columns: 13

Dataset info:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 47260 entries, 1 to 48745
Data columns (total 13 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   OFFER_START_DATE           47260 non-null  datetime64[ns]
 1   OFFER_START_DTTM           47260 non-null  datetime64[ns]
 2   OFFER_FINISH_DTTM          47260 non-null  datetime64[ns]
 3   OFFER_TYPE                 47260 non-null  object        
 4   INVOLVED_STOCK             47260 non-null  int64         
 5   REMAINING_STOCK_AFTER_END  47260 non-null  int64         
 6   SOLD_AMOUNT                46704 non-null  float64       
 7   SOLD_QUANTITY              46704 non-null  float64       
 8   ORIGIN                     11121 non-null  object        
 9   SHIPPING_PAYMENT_TYPE      26369 non-null  object        
 10  DOM_DOMAIN_AGG1            47260 non-null  object      

,OFFER_START_DATE,OFFER_START_DTTM,OFFER_FINISH_DTTM,OFFER_TYPE,INVOLVED_STOCK,REMAINING_STOCK_AFTER_END,SOLD_AMOUNT,SOLD_QUANTITY,ORIGIN,SHIPPING_PAYMENT_TYPE,DOM_DOMAIN_AGG1,VERTICAL,DOMAIN_ID
18929,2021-07-04,2021-07-04 07:00:00,2021-07-04 13:00:03,lightning_deal,15,15,0.0,0.0,,,APPAREL ACCESORIES,APP & SPORTS,MLM-FANNY_PACKS
14893,2021-07-09,2021-07-09 13:00:00,2021-07-09 19:00:05,lightning_deal,15,2,41.03,13.0,,free_shipping,SPORTS,APP & SPORTS,MLM-RESISTANCE_BANDS
30898,2021-07-11,2021-07-11 13:00:00,2021-07-11 19:00:02,lightning_deal,15,15,0.0,0.0,,free_shipping,HOME&DECOR,HOME & INDUSTRY,MLM-OUTDOOR_TABLES
47261,2021-07-06,2021-07-06 13:00:00,2021-07-06 19:00:00,lightning_deal,15,12,5.35,3.0,,,STATIONARY,HOME & INDUSTRY,MLM-MARKERS_AND_HIGHLIGHTERS
15916,2021-06-23,2021-06-23 19:00:00,2021-06-24 01:00:00,lightning_deal,5,5,0.0,0.0,,,"BOOKS, MULTIMEDIA & OTHER E!",ENTERTAINMENT,MLM-MUSIC_ALBUMS
3663,2021-06-26,2021-06-26 07:00:00,2021-06-26 13:00:00,lightning_deal,15,15,0.0,0.0,,,COMPUTERS,CE,MLM-DATA_CABLES_AND_ADAPTERS
26318,2021-06-17,2021-06-17 13:00:00,2021-06-17 13:00:04,lightning_deal,15,15,0.0,0.0,,free_shipping,FOOTWEAR,APP & SPORTS,MLM-BOOTS_AND_BOOTIES
44099,2021-07-12,2021-07-12 07:00:00,2021-07-12 13:00:02,lightning_deal,10,10,0.0,0.0,,free_shipping,COMPUTERS,CE,MLM-MICROPHONES
13626,2021-07-19,2021-07-19 13:00:00,2021-07-19 13:00:00,lightning_deal,10,10,0.0,0.0,,free_shipping,AUTOPARTS,ACC,MLM-VEHICLE_ACCESSORIES
37828,2021-07-22,2021-07-22 13:00:00,2021-07-22 19:00:01,lightning_deal,15,9,6.38,6.0,,,TOYS AND GAMES,T & B,MLM-ACTION_FIGURES


In [67]:
# Display inconsistent cases
data.loc[data["SOLD_AMOUNT"].isnull()]

,OFFER_START_DATE,OFFER_START_DTTM,OFFER_FINISH_DTTM,OFFER_TYPE,INVOLVED_STOCK,REMAINING_STOCK_AFTER_END,SOLD_AMOUNT,SOLD_QUANTITY,ORIGIN,SHIPPING_PAYMENT_TYPE,DOM_DOMAIN_AGG1,VERTICAL,DOMAIN_ID
299,2021-06-22,2021-06-22 13:00:00,2021-06-22 19:00:01,lightning_deal,15,7,,,,free_shipping,ELECTRONICS,CE,MLM-FANS
522,2021-06-22,2021-06-22 13:00:00,2021-06-22 21:00:00,lightning_deal,100,96,,,A,free_shipping,MOBILE,CE,MLM-TABLETS
623,2021-06-22,2021-06-22 07:00:00,2021-06-22 13:00:04,lightning_deal,10,5,,,,,COMPUTERS,CE,MLM-MICROPHONES
641,2021-06-22,2021-06-22 19:00:00,2021-06-22 21:22:13,lightning_deal,15,0,,,,free_shipping,ELECTRONICS,CE,MLM-MEMORY_CARDS
668,2021-06-22,2021-06-22 13:00:00,2021-06-22 19:00:02,lightning_deal,5,3,,,,free_shipping,INDUSTRY,HOME & INDUSTRY,MLM-OFFICE_CHAIRS
...,...,...,...,...,...,...,...,...,...,...,...,...,...
48498,2021-06-19,2021-06-19 13:00:00,2021-06-19 21:00:00,lightning_deal,100,91,,,A,free_shipping,MOBILE,CE,MLM-TABLETS
48586,2021-06-19,2021-06-19 14:00:00,2021-06-19 22:00:00,lightning_deal,30,29,,,A,free_shipping,MOBILE,CE,MLM-CELLPHONES
48610,2021-06-19,2021-06-19 15:00:00,2021-06-19 23:00:00,lightning_deal,25,22,,,A,free_shipping,TOOLS AND CONSTRUCTION,HOME & INDUSTRY,MLM-POWER_GRINDERS
48660,2021-06-19,2021-06-19 15:00:00,2021-06-19 23:00:01,lightning_deal,10,9,,,A,free_shipping,ELECTRONICS,CE,MLM-HAIR_CLIPPERS


**Observations**

556 rows contain NaNs in `SOLD_AMOUNT` and `SOLD_QUANTITY` even though `INVOLVED_STOCK` differs from `REMAINING_STOCK_AFTER_END`. This suggests either that:
1. `REMAINING_STOCK_AFTER_END` holds a wrong value, or
2. `SOLD_AMOUNT` and `SOLD_QUANTITY` were not updated.

These rows (~1.1% of the original rows) will be dropped until the issue is discussed with the client.

**Questions for the client**

Which of the two error sources mentioned above would explain the observed sales inconsistency?

In [68]:
# Drop inconsistent cases
data.drop(
    index=data[
        (data["INVOLVED_STOCK"] > data["REMAINING_STOCK_AFTER_END"])
        & data["SOLD_AMOUNT"].isnull()
        & data["SOLD_QUANTITY"].isnull()
    ].index,
    inplace=True,
)

##### Cleaning duplicates


In [69]:
print(f"Number of duplicate rows in dataset: {data.shape[0] - data.drop_duplicates().shape[0]}")
print("Removing duplicates ...")
print(f"Dataset number of rows before cleaning duplicates: {data.shape[0]}")
data.drop_duplicates(inplace=True)
print(f"Dataset number of rows after cleaning duplicates: {data.shape[0]}")


Number of duplicate rows in dataset: 871
Removing duplicates ...
Dataset number of rows before cleaning duplicates: 46704
Dataset number of rows after cleaning duplicates: 45833
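The duplicate count printed above (`data.shape[0] - data.drop_duplicates().shape[0]`) can also be read off `DataFrame.duplicated`, which by default flags every repeat of a row after its first occurrence; `keep=False` flags all copies, which is handy for inspecting them before dropping. A small sketch on toy rows:

```python
import pandas as pd

toy = pd.DataFrame(
    {
        "OFFER_START_DTTM": ["2021-07-23 07:00", "2021-07-23 07:00", "2021-06-16 07:00"],
        "DOMAIN_ID": ["MLM-SPEAKERS", "MLM-SPEAKERS", "MLM-SPEAKERS"],
        "INVOLVED_STOCK": [15, 15, 15],
    }
)

# Repeats after the first occurrence (same count as the shape difference)
n_repeats = int(toy.duplicated().sum())
shape_diff = toy.shape[0] - toy.drop_duplicates().shape[0]

# All members of each duplicated group, for manual inspection
all_copies = toy[toy.duplicated(keep=False)]
print(n_repeats, shape_diff, len(all_copies))
```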


#### Dataset after formatting and clean up

In [70]:
get_dataset_general_features(df=data)

Dataset number of rows: 45833
Dataset number of columns: 13

Dataset info:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 45833 entries, 1 to 48745
Data columns (total 13 columns):
 #   Column                     Non-Null Count  Dtype         
---  ------                     --------------  -----         
 0   OFFER_START_DATE           45833 non-null  datetime64[ns]
 1   OFFER_START_DTTM           45833 non-null  datetime64[ns]
 2   OFFER_FINISH_DTTM          45833 non-null  datetime64[ns]
 3   OFFER_TYPE                 45833 non-null  object        
 4   INVOLVED_STOCK             45833 non-null  int64         
 5   REMAINING_STOCK_AFTER_END  45833 non-null  int64         
 6   SOLD_AMOUNT                45833 non-null  float64       
 7   SOLD_QUANTITY              45833 non-null  float64       
 8   ORIGIN                     10427 non-null  object        
 9   SHIPPING_PAYMENT_TYPE      25264 non-null  object        
 10  DOM_DOMAIN_AGG1            45833 non-null  object      

,OFFER_START_DATE,OFFER_START_DTTM,OFFER_FINISH_DTTM,OFFER_TYPE,INVOLVED_STOCK,REMAINING_STOCK_AFTER_END,SOLD_AMOUNT,SOLD_QUANTITY,ORIGIN,SHIPPING_PAYMENT_TYPE,DOM_DOMAIN_AGG1,VERTICAL,DOMAIN_ID
26041,2021-07-18,2021-07-18 13:00:00,2021-07-18 19:00:35,lightning_deal,15,15,0.0,0.0,,free_shipping,APPAREL,APP & SPORTS,MLM-PAJAMAS
24239,2021-07-31,2021-07-31 19:00:00,2021-08-01 01:00:00,lightning_deal,5,5,0.0,0.0,,,TOOLS AND CONSTRUCTION,HOME & INDUSTRY,MLM-DRILL_BITS
32476,2021-07-15,2021-07-15 13:00:00,2021-07-15 19:00:00,lightning_deal,5,5,0.0,0.0,,,APPAREL ACCESORIES,APP & SPORTS,MLM-RINGS
27860,2021-06-24,2021-06-24 19:00:00,2021-06-25 01:00:01,lightning_deal,15,15,0.0,0.0,,,MOBILE,CE,MLM-MOBILE_DEVICE_CHARGERS
487,2021-06-22,2021-06-22 12:00:00,2021-06-22 20:00:01,lightning_deal,450,411,173.53,39.0,A,,PHARMACEUTICS,BEAUTY & HEALTH,MLM-SURGICAL_AND_INDUSTRIAL_MASKS
7919,2021-07-07,2021-07-07 07:00:00,2021-07-07 13:00:06,lightning_deal,5,5,0.0,0.0,,free_shipping,APPAREL ACCESORIES,APP & SPORTS,MLM-WRISTWATCHES
40407,2021-06-01,2021-06-01 13:00:00,2021-06-01 19:00:07,lightning_deal,15,14,5.15,1.0,,free_shipping,TOOLS AND CONSTRUCTION,HOME & INDUSTRY,MLM-TOOL_NOZZLES
27340,2021-06-24,2021-06-24 17:00:00,2021-06-25 01:00:02,lightning_deal,20,17,50.42,3.0,A,free_shipping,ELECTRONICS,CE,MLM-SMART_SPEAKERS
16046,2021-06-23,2021-06-23 19:00:00,2021-06-23 23:11:06,lightning_deal,5,0,30.65,5.0,,free_shipping,APPAREL ACCESORIES,APP & SPORTS,MLM-HANDBAGS
18844,2021-07-04,2021-07-04 13:00:00,2021-07-04 16:52:23,lightning_deal,5,0,18.03,5.0,,free_shipping,PERSONAL CARE,BEAUTY & HEALTH,MLM-FALSE_EYELASHES


### Dataset Analysis

#### 1. Deal durations

In [71]:
# Add time representations of different granularity

offer_duration = data["OFFER_FINISH_DTTM"] - data["OFFER_START_DTTM"]
# Whole minutes and whole hours elapsed (floored), stored as floats
data["OFFER_DURATION_MINUTES"] = offer_duration.dt.total_seconds() // 60
data["OFFER_DURATION_HOURS"] = offer_duration.dt.total_seconds() // 3600
data["OFFER_START_WEEKDAY"] = data["OFFER_START_DTTM"].dt.weekday
data["OFFER_START_MONTH"] = data["OFFER_START_DTTM"].dt.month
data["OFFER_START_DAY"] = data["OFFER_START_DTTM"].dt.day
data["OFFER_START_DAYNAME"] = data["OFFER_START_DTTM"].dt.day_name()
data["OFFER_START_HOUR"] = data["OFFER_START_DTTM"].dt.hour
data["OFFER_START_YEAR"] = data["OFFER_START_DTTM"].dt.year

In [72]:
# Display sample
display(data.sample(10, random_state=152))

,OFFER_START_DATE,OFFER_START_DTTM,OFFER_FINISH_DTTM,OFFER_TYPE,INVOLVED_STOCK,REMAINING_STOCK_AFTER_END,SOLD_AMOUNT,SOLD_QUANTITY,ORIGIN,SHIPPING_PAYMENT_TYPE,...,VERTICAL,DOMAIN_ID,OFFER_DURATION_MINUTES,OFFER_DURATION_HOURS,OFFER_START_WEEKDAY,OFFER_START_MONTH,OFFER_START_DAY,OFFER_START_DAYNAME,OFFER_START_HOUR,OFFER_START_YEAR
26041,2021-07-18,2021-07-18 13:00:00,2021-07-18 19:00:35,lightning_deal,15,15,0.0,0.0,,free_shipping,...,APP & SPORTS,MLM-PAJAMAS,360.0,6.0,6,7,18,Sunday,13,2021
24239,2021-07-31,2021-07-31 19:00:00,2021-08-01 01:00:00,lightning_deal,5,5,0.0,0.0,,,...,HOME & INDUSTRY,MLM-DRILL_BITS,360.0,6.0,5,7,31,Saturday,19,2021
32476,2021-07-15,2021-07-15 13:00:00,2021-07-15 19:00:00,lightning_deal,5,5,0.0,0.0,,,...,APP & SPORTS,MLM-RINGS,360.0,6.0,3,7,15,Thursday,13,2021
27860,2021-06-24,2021-06-24 19:00:00,2021-06-25 01:00:01,lightning_deal,15,15,0.0,0.0,,,...,CE,MLM-MOBILE_DEVICE_CHARGERS,360.0,6.0,3,6,24,Thursday,19,2021
487,2021-06-22,2021-06-22 12:00:00,2021-06-22 20:00:01,lightning_deal,450,411,173.53,39.0,A,,...,BEAUTY & HEALTH,MLM-SURGICAL_AND_INDUSTRIAL_MASKS,480.0,8.0,1,6,22,Tuesday,12,2021
7919,2021-07-07,2021-07-07 07:00:00,2021-07-07 13:00:06,lightning_deal,5,5,0.0,0.0,,free_shipping,...,APP & SPORTS,MLM-WRISTWATCHES,360.0,6.0,2,7,7,Wednesday,7,2021
40407,2021-06-01,2021-06-01 13:00:00,2021-06-01 19:00:07,lightning_deal,15,14,5.15,1.0,,free_shipping,...,HOME & INDUSTRY,MLM-TOOL_NOZZLES,360.0,6.0,1,6,1,Tuesday,13,2021
27340,2021-06-24,2021-06-24 17:00:00,2021-06-25 01:00:02,lightning_deal,20,17,50.42,3.0,A,free_shipping,...,CE,MLM-SMART_SPEAKERS,480.0,8.0,3,6,24,Thursday,17,2021
16046,2021-06-23,2021-06-23 19:00:00,2021-06-23 23:11:06,lightning_deal,5,0,30.65,5.0,,free_shipping,...,APP & SPORTS,MLM-HANDBAGS,251.0,4.0,2,6,23,Wednesday,19,2021
18844,2021-07-04,2021-07-04 13:00:00,2021-07-04 16:52:23,lightning_deal,5,0,18.03,5.0,,free_shipping,...,BEAUTY & HEALTH,MLM-FALSE_EYELASHES,232.0,3.0,6,7,4,Sunday,13,2021


In [73]:
# Get years, months and weekdays of available lightning deals
print(f"Deal start years: {np.sort(data['OFFER_START_YEAR'].unique())}")
print(f"Deal start months: {np.sort(data['OFFER_START_MONTH'].unique())}")
print(f"Deal start weekdays: {np.sort(data['OFFER_START_WEEKDAY'].unique())}")
print(f"Deal start month days: {np.sort(data['OFFER_START_DAY'].unique())}")
print(f"Deal start days: {data['OFFER_START_DAYNAME'].unique()}")
print(f"Deal start hour: {np.sort(data['OFFER_START_HOUR'].unique())}")

Deal start years: [2021]
Deal start months: [6 7]
Deal start weekdays: [0 1 2 3 4 5 6]
Deal start month days: [ 1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24
 25 26 27 28 29 30 31]
Deal start days: ['Tuesday' 'Thursday' 'Wednesday' 'Saturday' 'Friday' 'Sunday' 'Monday']
Deal start hour: [ 0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23]


**Observation**

Deals took place in June and July of 2021, and starting datetimes span all days of the month and every hour of the day.

In [74]:
# Get distribution of deal durations
print("Duration distribution")
display(
    data[["OFFER_DURATION_HOURS", "OFFER_DURATION_MINUTES"]]
    .groupby(by="OFFER_DURATION_HOURS")
    .agg(
        {
            "OFFER_DURATION_MINUTES": ["count", "min", "mean", "max"]
        }
    )
    .reset_index()
)

Duration distribution


Unnamed: 0,OFFER_DURATION_HOURS,OFFER_DURATION_MINUTES_count,OFFER_DURATION_MINUTES_min,OFFER_DURATION_MINUTES_mean,OFFER_DURATION_MINUTES_max
0,0.0,3885,0.0,1.28417,59.0
1,1.0,299,60.0,90.354515,119.0
2,2.0,448,120.0,148.912946,179.0
3,3.0,507,180.0,210.609467,239.0
4,4.0,681,240.0,269.734214,299.0
5,5.0,1130,300.0,320.744248,359.0
6,6.0,29682,360.0,360.599252,419.0
7,7.0,1914,420.0,420.76907,477.0
8,8.0,7119,480.0,480.394718,536.0
9,9.0,3,541.0,543.0,546.0
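The sample rows shown earlier suggest that `OFFER_DURATION_MINUTES` is the start-to-finish gap truncated to whole minutes (13:00:00 to 16:52:23 gives 232.0) and that `OFFER_DURATION_HOURS` is that value truncated to whole hours (232 minutes gives 3.0). The cell that created these columns is not part of this section, so the sketch below is a hypothetical reconstruction under that assumption; `add_duration_columns` is an illustrative name, not the notebook's code.

```python
import numpy as np
import pandas as pd


def add_duration_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical reconstruction: deal duration truncated to whole minutes and whole hours."""
    out = df.copy()
    elapsed = out["OFFER_FINISH_DTTM"] - out["OFFER_START_DTTM"]
    # Elapsed seconds -> whole minutes (floor), kept as float like the original columns
    out["OFFER_DURATION_MINUTES"] = np.floor(elapsed.dt.total_seconds() / 60)
    # Whole minutes -> whole hours (floor)
    out["OFFER_DURATION_HOURS"] = np.floor(out["OFFER_DURATION_MINUTES"] / 60)
    return out


# Start/finish times taken from two of the sample rows above
sample = pd.DataFrame({
    "OFFER_START_DTTM": pd.to_datetime(["2021-07-04 13:00:00", "2021-06-23 19:00:00"]),
    "OFFER_FINISH_DTTM": pd.to_datetime(["2021-07-04 16:52:23", "2021-06-23 23:11:06"]),
})
print(add_duration_columns(sample)[["OFFER_DURATION_MINUTES", "OFFER_DURATION_HOURS"]])
```

With this rule, the 0-hour group in the table above contains every deal shorter than 60 minutes, which is consistent with its min of 0 and max of 59.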


In [75]:
fig = px.histogram(data, x="OFFER_DURATION_MINUTES", hover_data=data.columns, width=1200, height=700, barmode="overlay", range_x=(0,722), nbins=int(data["OFFER_DURATION_MINUTES"].max()/10))
fig.show()

**Observation**

Lightning deal durations range from 0 minutes to 104 hours (more than 4.3 days), with most deals lasting 6 to 8 hours. The 104-hour deal appears to be an outlier and is excluded from the rest of the analysis.

In [76]:
# Drop large duration outlier
data.drop(index=data[data["OFFER_DURATION_HOURS"] == 104].index, inplace=True)

In [77]:
# Display zero minutes duration deals
display(data[data["OFFER_DURATION_MINUTES"] == 0])

Unnamed: 0,OFFER_START_DATE,OFFER_START_DTTM,OFFER_FINISH_DTTM,OFFER_TYPE,INVOLVED_STOCK,REMAINING_STOCK_AFTER_END,SOLD_AMOUNT,SOLD_QUANTITY,ORIGIN,SHIPPING_PAYMENT_TYPE,...,VERTICAL,DOMAIN_ID,OFFER_DURATION_MINUTES,OFFER_DURATION_HOURS,OFFER_START_WEEKDAY,OFFER_START_MONTH,OFFER_START_DAY,OFFER_START_DAYNAME,OFFER_START_HOUR,OFFER_START_YEAR
18,2021-06-22,2021-06-22 19:00:00,2021-06-22 19:00:03,lightning_deal,10,10,0.0,0.0,,free_shipping,...,T & B,MLM-PARTY_SUPPLIES,0.0,0.0,1,6,22,Tuesday,19,2021
25,2021-06-22,2021-06-22 13:00:00,2021-06-22 13:00:00,lightning_deal,5,5,0.0,0.0,,free_shipping,...,HOME & INDUSTRY,MLM-SHOWER_HEADS,0.0,0.0,1,6,22,Tuesday,13,2021
72,2021-06-22,2021-06-22 07:00:00,2021-06-22 07:00:00,lightning_deal,5,5,0.0,0.0,,free_shipping,...,ENTERTAINMENT,MLM-BOOKS,0.0,0.0,1,6,22,Tuesday,7,2021
90,2021-06-22,2021-06-22 16:00:00,2021-06-22 16:00:00,lightning_deal,3,3,0.0,0.0,A,free_shipping,...,CPG,MLM-CATS_AND_DOGS_FOODS,0.0,0.0,1,6,22,Tuesday,16,2021
114,2021-06-22,2021-06-22 13:00:00,2021-06-22 13:00:01,lightning_deal,15,15,0.0,0.0,,,...,ACC,MLM-VEHICLE_LED_BULBS,0.0,0.0,1,6,22,Tuesday,13,2021
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
48646,2021-06-19,2021-06-19 13:00:00,2021-06-19 13:00:00,lightning_deal,8,8,0.0,0.0,,,...,APP & SPORTS,MLM-UNDERPANTS,0.0,0.0,5,6,19,Saturday,13,2021
48647,2021-06-19,2021-06-19 19:00:00,2021-06-19 19:00:00,lightning_deal,5,5,0.0,0.0,,free_shipping,...,APP & SPORTS,MLM-UNDERPANTS,0.0,0.0,5,6,19,Saturday,19,2021
48700,2021-06-19,2021-06-19 07:00:00,2021-06-19 07:00:00,lightning_deal,10,10,0.0,0.0,,,...,CE,MLM-PLAGUES_ULTRASONIC_REPELLENTS,0.0,0.0,5,6,19,Saturday,7,2021
48709,2021-06-19,2021-06-19 07:00:00,2021-06-19 07:00:00,lightning_deal,5,5,0.0,0.0,,,...,HOME & INDUSTRY,MLM-PILLOWS,0.0,0.0,5,6,19,Saturday,7,2021


**Observation**

Around 7% of rows have a deal duration of zero minutes. These cases will be removed for the present analysis. 

**Questions for the client**

Why do some deals have a duration of zero minutes? Is this a datetime recording error?

In [78]:
# Drop zero duration deals
data.drop(index=data[data["OFFER_DURATION_MINUTES"] == 0].index, inplace=True)

In [79]:
# Display cases with duration lower than 15 minutes
display(data.loc[
    (data["OFFER_DURATION_MINUTES"] > 0)
    & (data["OFFER_DURATION_MINUTES"] < 15)
    ])

Unnamed: 0,OFFER_START_DATE,OFFER_START_DTTM,OFFER_FINISH_DTTM,OFFER_TYPE,INVOLVED_STOCK,REMAINING_STOCK_AFTER_END,SOLD_AMOUNT,SOLD_QUANTITY,ORIGIN,SHIPPING_PAYMENT_TYPE,...,VERTICAL,DOMAIN_ID,OFFER_DURATION_MINUTES,OFFER_DURATION_HOURS,OFFER_START_WEEKDAY,OFFER_START_MONTH,OFFER_START_DAY,OFFER_START_DAYNAME,OFFER_START_HOUR,OFFER_START_YEAR
889,2021-07-08,2021-07-08 11:00:00,2021-07-08 11:01:46,lightning_deal,60,60,0.00,0.0,A,,...,APP & SPORTS,MLM-BLOUSES,1.0,0.0,3,7,8,Thursday,11,2021
979,2021-07-08,2021-07-08 11:00:00,2021-07-08 11:01:46,lightning_deal,21,21,0.00,0.0,A,,...,APP & SPORTS,MLM-SWEATERS_AND_CARDIGANS,1.0,0.0,3,7,8,Thursday,11,2021
2243,2021-06-02,2021-06-02 07:00:00,2021-06-02 07:01:09,lightning_deal,15,15,0.00,0.0,,free_shipping,...,APP & SPORTS,MLM-SHIRTS,1.0,0.0,2,6,2,Wednesday,7,2021
2417,2021-06-02,2021-06-02 13:00:00,2021-06-02 13:01:43,lightning_deal,5,5,0.00,0.0,,free_shipping,...,CE,MLM-COMPUTER_EQUIPMENT_AND_SPARE_PARTS,1.0,0.0,2,6,2,Wednesday,13,2021
2841,2021-06-12,2021-06-12 19:00:00,2021-06-12 19:07:25,lightning_deal,5,0,9.65,5.0,,,...,ACC,MLM-AUTOMOTIVE_LED_LIGHT_BARS,7.0,0.0,5,6,12,Saturday,19,2021
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
46490,2021-07-16,2021-07-16 16:00:00,2021-07-16 16:01:06,lightning_deal,7,7,0.00,0.0,A,,...,APP & SPORTS,MLM-SUNGLASSES,1.0,0.0,4,7,16,Friday,16,2021
46494,2021-07-16,2021-07-16 11:00:00,2021-07-16 11:01:50,lightning_deal,44,44,0.00,0.0,A,free_shipping,...,APP & SPORTS,MLM-SUNGLASSES,1.0,0.0,4,7,16,Friday,11,2021
46497,2021-07-16,2021-07-16 11:00:00,2021-07-16 11:01:50,lightning_deal,3,3,0.00,0.0,A,free_shipping,...,APP & SPORTS,MLM-SUNGLASSES,1.0,0.0,4,7,16,Friday,11,2021
46523,2021-07-16,2021-07-16 11:00:00,2021-07-16 11:01:50,lightning_deal,148,148,0.00,0.0,A,free_shipping,...,APP & SPORTS,MLM-PANTS,1.0,0.0,4,7,16,Friday,11,2021


**Observation**

Some deals last only a few minutes: for example, 69 cases have durations shorter than 15 minutes. Deals this short do not seem to make business sense, and such entries could be errors in the data. For the present analysis, these very short deals will be removed (~0.1% of total rows).

**Question for the client**

What is the minimum duration of a lightning deal? Do durations shorter than 15 minutes make sense?

In [80]:
# Drop very short deals (duration < 15 minutes)
data.drop(index=data[data["OFFER_DURATION_MINUTES"] < 15].index, inplace=True)
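The shares quoted in the observations above (~7% zero-minute deals, ~0.1% deals under 15 minutes) are not printed by the cells. A small helper such as the hypothetical `drop_rows` below could report how many rows each filter removes; unlike the cells above, it returns a filtered copy instead of dropping in place. The toy durations are illustrative only, and each share is computed relative to the frame at that step.

```python
import pandas as pd


def drop_rows(df: pd.DataFrame, mask: pd.Series, label: str) -> pd.DataFrame:
    """Return df without the rows flagged by mask and print the share of rows removed."""
    share = mask.mean() * 100  # mean of a boolean mask = fraction of True values
    print(f"{label}: removing {int(mask.sum())} of {len(df)} rows ({share:.2f}%)")
    return df.loc[~mask]


toy = pd.DataFrame({"OFFER_DURATION_MINUTES": [0.0, 0.0, 7.0, 360.0, 480.0]})
toy = drop_rows(toy, toy["OFFER_DURATION_MINUTES"] == 0, "zero-minute deals")
toy = drop_rows(toy, toy["OFFER_DURATION_MINUTES"] < 15, "deals shorter than 15 minutes")
print(toy)
```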

In [81]:
# Deals have different durations (see analysis above), so to compare the sales of different lightning deals
# it is useful to normalize by a common time unit. Here, sales per hour are used, with each duration rounded up to whole hours.

data["OFFER_DURATION_HOUR_BUCKETS"] = pd.cut(x=data["OFFER_DURATION_MINUTES"], bins = np.array(range(0,106)) * 60, labels=np.array(range(1,106)))
data = data.astype({"OFFER_DURATION_HOUR_BUCKETS": np.int64})
data["SOLD_QUANTITY_HOURLY"] = data["SOLD_QUANTITY"] / data["OFFER_DURATION_HOUR_BUCKETS"]
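With `pd.cut`'s default right-closed bins, edges 0, 60, 120, … and labels 1, 2, …, a duration in the interval (60(k-1), 60k] minutes receives label k, i.e. the duration rounded up to whole hours. The sketch below checks this equivalence against `np.ceil` on a few illustrative durations; replacing the 105-bin lookup with the one-line arithmetic is a suggested simplification, not the author's code.

```python
import numpy as np
import pandas as pd

minutes = pd.Series([15.0, 59.0, 60.0, 61.0, 360.0, 361.0, 546.0])

# Binning as in the cell above: right-closed bins (0, 60], (60, 120], ... labelled 1, 2, ...
buckets_cut = pd.cut(x=minutes, bins=np.arange(0, 106) * 60, labels=np.arange(1, 106)).astype(np.int64)

# Equivalent arithmetic: round the duration up to whole hours
buckets_ceil = np.ceil(minutes / 60).astype(np.int64)

print(pd.DataFrame({"minutes": minutes, "cut": buckets_cut, "ceil": buckets_ceil}))
```

Either way, a deal of exactly 360 minutes counts as 6 hours, while 361 minutes counts as 7.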

In [82]:
display(data.sample(10))

Unnamed: 0,OFFER_START_DATE,OFFER_START_DTTM,OFFER_FINISH_DTTM,OFFER_TYPE,INVOLVED_STOCK,REMAINING_STOCK_AFTER_END,SOLD_AMOUNT,SOLD_QUANTITY,ORIGIN,SHIPPING_PAYMENT_TYPE,...,OFFER_DURATION_MINUTES,OFFER_DURATION_HOURS,OFFER_START_WEEKDAY,OFFER_START_MONTH,OFFER_START_DAY,OFFER_START_DAYNAME,OFFER_START_HOUR,OFFER_START_YEAR,OFFER_DURATION_HOUR_BUCKETS,SOLD_QUANTITY_HOURLY
14988,2021-07-09,2021-07-09 19:00:00,2021-07-10 01:00:06,lightning_deal,5,1,10.51,5.0,,,...,360.0,6.0,4,7,9,Friday,19,2021,6,0.833333
38,2021-06-22,2021-06-22 13:00:00,2021-06-22 19:00:00,lightning_deal,15,14,2.58,1.0,,,...,360.0,6.0,1,6,22,Tuesday,13,2021,6,0.166667
19536,2021-07-14,2021-07-14 19:00:00,2021-07-15 01:00:03,lightning_deal,15,9,19.35,6.0,,,...,360.0,6.0,2,7,14,Wednesday,19,2021,6,1.0
47667,2021-07-06,2021-07-06 07:00:00,2021-07-06 13:00:03,lightning_deal,5,4,9.31,1.0,,free_shipping,...,360.0,6.0,1,7,6,Tuesday,7,2021,6,0.166667
28831,2021-07-21,2021-07-21 12:00:00,2021-07-21 19:00:04,lightning_deal,10,10,0.0,0.0,A,,...,420.0,7.0,2,7,21,Wednesday,12,2021,7,0.0
3373,2021-06-26,2021-06-26 13:00:00,2021-06-26 19:00:02,lightning_deal,15,15,0.0,0.0,,free_shipping,...,360.0,6.0,5,6,26,Saturday,13,2021,6,0.0
8196,2021-07-07,2021-07-07 13:00:00,2021-07-07 19:00:05,lightning_deal,5,5,0.0,0.0,,free_shipping,...,360.0,6.0,2,7,7,Wednesday,13,2021,6,0.0
48171,2021-06-19,2021-06-19 19:00:00,2021-06-20 01:00:00,lightning_deal,15,13,15.43,2.0,,free_shipping,...,360.0,6.0,5,6,19,Saturday,19,2021,6,0.333333
20417,2021-06-13,2021-06-13 13:00:00,2021-06-13 19:00:02,lightning_deal,5,1,2.94,4.0,,,...,360.0,6.0,6,6,13,Sunday,13,2021,6,0.666667
35967,2021-07-05,2021-07-05 13:00:00,2021-07-05 19:00:05,lightning_deal,5,4,3.27,1.0,,,...,360.0,6.0,0,7,5,Monday,13,2021,6,0.166667


#### 2. Distribution of sales at different time granularities (monthly, day of the month, day of the week and hour of the day)

**Starting and Ending times**

In [83]:
# Min/max dates of lightning deals
print(f"First date of lightning deals: {data['OFFER_START_DTTM'].min()}")
print(f"Last date of lightning deals: {data['OFFER_FINISH_DTTM'].max()}\n")
print(f"First date of lightning deals (with sales): {data.loc[data['SOLD_QUANTITY'] > 0]['OFFER_START_DTTM'].min()}")
print(f"Last date of lightning deals (with sales): {data.loc[data['SOLD_QUANTITY'] > 0]['OFFER_FINISH_DTTM'].max()}")


First date of lightning deals: 2021-06-01 07:00:00
Last date of lightning deals: 2021-08-01 03:00:00

First date of lightning deals (with sales): 2021-06-01 07:00:00
Last date of lightning deals (with sales): 2021-08-01 01:00:34


**Observations**

The lightning deals started on 2021-06-01 07:00:00 and finished on 2021-08-01 03:00:00.

Sales started right away in the first available period (start: 2021-06-01 07:00:00) and continued almost until the last available period (the last deal with sales ended at 2021-08-01 01:00:34).

**Normalization of sales by number of deals**

According to the dataset, the number of active deals varies from hour to hour. Comparing any two hours, days, or other time units therefore requires normalizing sales by the number of active deals. Otherwise, higher sales at time A than at time B could lead to the conclusion that A is a better time for activating deals, when A may simply have had more deals active.
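Written out, the quantities computed in the next cell for each hourly timestamp $h$ are the following, where $D_h$ is the set of deals active at $h$, $q_d$ is `SOLD_QUANTITY`, $b_d$ is `OFFER_DURATION_HOUR_BUCKETS`, $S_h$ is `hourly_sales` and $\bar{S}_h$ is `hourly_sales_norm` (so $|D_h|$ is `num_deals`):

$$
D_h = \{\, d \;:\; t^{\mathrm{start}}_d \le h < t^{\mathrm{finish}}_d \,\}, \qquad
S_h = \sum_{d \in D_h} \frac{q_d}{b_d}, \qquad
\bar{S}_h = \frac{S_h}{|D_h|}
$$

Each deal's quantity is spread evenly over its hour buckets. As a hypothetical illustration, an hour with $S_h = 100$ units over $|D_h| = 200$ active deals gives $\bar{S}_h = 0.5$, while an hour with $S_h = 60$ units over 50 deals gives $\bar{S}_h = 1.2$: the second hour sells less in total but more per active deal.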

In [84]:
current_date = data['OFFER_START_DTTM'].min()
end_date = data['OFFER_FINISH_DTTM'].max()

# Build an hourly grid and, for each hour, aggregate the deals active at that hour
data_hourly_list = []
data_dict = data.to_dict(orient="records")
while current_date <= end_date:
    hourly_deal = {
        "datetime": current_date, 
        "hourly_sales": 0,
        "hourly_sales_norm": 0, 
        "num_deals": 0, 
        # "categories": []
        }
    for d in data_dict:
        if d["OFFER_START_DTTM"] <= current_date < d["OFFER_FINISH_DTTM"]:
            hourly_deal["num_deals"] += 1
            hourly_deal["hourly_sales"] = hourly_deal["hourly_sales"] + d["SOLD_QUANTITY"] / d["OFFER_DURATION_HOUR_BUCKETS"]
            # hourly_deal["categories"].append(
            #     {
            #         "level_1": d["VERTICAL"],
            #         "level_2": d["DOM_DOMAIN_AGG1"],
            #         "level_3": d["DOMAIN_ID"]
            #     }
            # )
    if hourly_deal["num_deals"] > 0:
        hourly_deal["hourly_sales_norm"] = hourly_deal["hourly_sales"] / hourly_deal["num_deals"]
    data_hourly_list.append(hourly_deal)
    current_date = current_date + timedelta(hours=1)

data_hourly = pd.DataFrame(data_hourly_list)

# Add calendar features (month, day, day name, hour, date, weekday) to each hourly timestamp
data_hourly["datetime_month"] = data_hourly["datetime"].dt.month
data_hourly["datatime_day"] = data_hourly["datetime"].dt.day
data_hourly["datetime_dayname"] = data_hourly["datetime"].dt.day_name()
data_hourly["datetime_hour"] = data_hourly["datetime"].dt.hour
data_hourly["date"] = data_hourly["datetime"].dt.date
data_hourly["weekday"] = data_hourly["datetime"].dt.weekday

# Keep only hours with deals (actual data)
data_hourly.drop(index=data_hourly[data_hourly["num_deals"] == 0].index, inplace=True)
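The loop above compares every deal with every hourly timestamp (about 1,460 hours × 42,000 deals, roughly 61 million checks). A possible alternative, sketched below, visits only the hours during which each deal was active and then aggregates with `groupby`. It assumes the hourly grid is aligned to full hours (the first start time, 2021-06-01 07:00:00, is), and, like the last line of the cell, it keeps only hours with at least one active deal. `hourly_activity` is a hypothetical helper, not part of the original notebook; `pd.date_range(..., inclusive="left")` requires pandas 1.4 or later.

```python
import pandas as pd


def hourly_activity(deals: pd.DataFrame) -> pd.DataFrame:
    """Aggregate deals per on-the-hour timestamp h, counting a deal when start <= h < finish."""
    records = []
    for start, finish, sold, buckets in zip(
        deals["OFFER_START_DTTM"], deals["OFFER_FINISH_DTTM"],
        deals["SOLD_QUANTITY"], deals["OFFER_DURATION_HOUR_BUCKETS"],
    ):
        share = sold / buckets  # units attributed to each active hour of the deal
        # Full-hour ticks in [start, finish); ceil aligns them with the hourly grid
        for tick in pd.date_range(start.ceil("h"), finish, freq="h", inclusive="left"):
            records.append((tick, share))

    expanded = pd.DataFrame(records, columns=["datetime", "hourly_share"])
    hourly = (
        expanded.groupby("datetime")
        .agg(hourly_sales=("hourly_share", "sum"), num_deals=("hourly_share", "size"))
        .reset_index()
    )
    hourly["hourly_sales_norm"] = hourly["hourly_sales"] / hourly["num_deals"]
    return hourly


# Two toy deals: 6 units over 3 hours, and 1 unit over 1 hour
deals = pd.DataFrame({
    "OFFER_START_DTTM": pd.to_datetime(["2021-06-01 07:00:00", "2021-06-01 08:00:00"]),
    "OFFER_FINISH_DTTM": pd.to_datetime(["2021-06-01 10:00:00", "2021-06-01 09:00:00"]),
    "SOLD_QUANTITY": [6.0, 1.0],
    "OFFER_DURATION_HOUR_BUCKETS": [3, 1],
})
print(hourly_activity(deals))
```

At 08:00 both toy deals are active, so `hourly_sales` is 2 + 1 = 3 and `hourly_sales_norm` is 1.5; the other two hours each have a single deal contributing 2 units.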


In [85]:
data_hourly.head(10)

Unnamed: 0,datetime,hourly_sales,hourly_sales_norm,num_deals,datetime_month,datatime_day,datetime_dayname,datetime_hour,date,weekday
0,2021-06-01 07:00:00,39.654762,0.177824,223,6,1,Tuesday,7,2021-06-01,1
1,2021-06-01 08:00:00,39.654762,0.177824,223,6,1,Tuesday,8,2021-06-01,1
2,2021-06-01 09:00:00,39.654762,0.177824,223,6,1,Tuesday,9,2021-06-01,1
3,2021-06-01 10:00:00,96.688095,0.427823,226,6,1,Tuesday,10,2021-06-01,1
4,2021-06-01 11:00:00,116.604762,0.51595,226,6,1,Tuesday,11,2021-06-01,1
5,2021-06-01 12:00:00,142.654762,0.594395,240,6,1,Tuesday,12,2021-06-01,1
6,2021-06-01 13:00:00,204.791667,0.445199,460,6,1,Tuesday,13,2021-06-01,1
7,2021-06-01 14:00:00,247.303571,0.981363,252,6,1,Tuesday,14,2021-06-01,1
8,2021-06-01 15:00:00,277.828571,1.093813,254,6,1,Tuesday,15,2021-06-01,1
9,2021-06-01 16:00:00,218.778571,0.857955,255,6,1,Tuesday,16,2021-06-01,1


##### Sales by month

In [86]:
# Average hourly sales by month
sales_by_month = data_hourly[["hourly_sales_norm", "datetime_month"]].groupby(by="datetime_month").agg({"hourly_sales_norm": ["mean"]}).reset_index()
sales_by_month.columns = ["_".join(col) for col in sales_by_month.columns.values]
sales_by_month.rename(columns={"datetime_month_": "datetime_month"}, inplace=True)
display(sales_by_month)

Unnamed: 0,datetime_month,hourly_sales_norm_mean
0,6,0.674335
1,7,0.890868
2,8,0.187911
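The cells in this section aggregate with a dict of lists, which creates a two-level column index that is then flattened with `"_".join` and renamed. pandas named aggregation produces the same flat column names in one step. The sketch uses a toy stand-in for `data_hourly` with illustrative values; the same pattern would apply to the by-date, by-weekday and by-hour tables.

```python
import pandas as pd

# Toy stand-in for data_hourly (illustrative values only)
toy_hourly = pd.DataFrame({
    "datetime_month": [6, 6, 7, 7],
    "hourly_sales_norm": [0.5, 0.7, 0.8, 1.0],
})

# Named aggregation yields flat column names directly: no MultiIndex to join or rename
sales_by_month = (
    toy_hourly.groupby("datetime_month", as_index=False)
    .agg(hourly_sales_norm_mean=("hourly_sales_norm", "mean"))
)
print(sales_by_month)
```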


In [87]:
# Number of deals activated by month
data[["OFFER_START_MONTH", "OFFER_TYPE"]].groupby(by="OFFER_START_MONTH").agg("count").reset_index().rename(columns={"OFFER_TYPE": "count"})

Unnamed: 0,OFFER_START_MONTH,count
0,6,15844
1,7,26219


In [88]:
# Display the hours in August with active deals (only a few)
data_hourly.loc[data_hourly["datetime_month"] == 8]

Unnamed: 0,datetime,hourly_sales,hourly_sales_norm,num_deals,datetime_month,datatime_day,datetime_dayname,datetime_hour,date,weekday
1457,2021-08-01 00:00:00,85.416667,0.296586,288,8,1,Sunday,0,2021-08-01,6
1458,2021-08-01 01:00:00,64.916667,0.267147,243,8,1,Sunday,1,2021-08-01,6
1459,2021-08-01 02:00:00,0.0,0.0,1,8,1,Sunday,2,2021-08-01,6


In [89]:
# Plot distribution of sales by month
fig = px.histogram(data_hourly, x="hourly_sales_norm", color="datetime_month", hover_data=data_hourly.columns, width=1200, height=500, barmode="overlay", range_x=(0,4), nbins=20)
fig.show()

**Observation**

July shows more lightning deals than June (26219 and 15844, respectively).

Hourly sales normalized by the number of active deals are higher in July than in June, as indicated by the monthly averages (0.89 vs. 0.67) and by the right-shifted July distribution in the plot above. The August average (0.19) covers only the first hours of August 1st, when the last deals were ending, and is not comparable with the other months.

##### Sales by date

In [90]:
# Sales by date
sales_by_date = data_hourly[["hourly_sales_norm", "date"]].groupby(by="date").agg({"hourly_sales_norm": ["mean"]}).reset_index()
sales_by_date.columns = ["_".join(col) for col in sales_by_date.columns.values]
sales_by_date.rename(columns={"date_": "date"}, inplace=True)

sales_by_date =  sales_by_date.merge(right=data_hourly[["datetime_dayname", "date"]].drop_duplicates(), on="date", how="left")

with pd.option_context("display.max_rows", 62):
    display(sales_by_date)

Unnamed: 0,date,hourly_sales_norm_mean,datetime_dayname
0,2021-06-01,0.607127,Tuesday
1,2021-06-02,0.601684,Wednesday
2,2021-06-03,0.885273,Thursday
3,2021-06-04,0.56731,Friday
4,2021-06-05,0.625657,Saturday
5,2021-06-06,0.374988,Sunday
6,2021-06-07,0.675963,Monday
7,2021-06-08,0.980451,Tuesday
8,2021-06-09,0.662789,Wednesday
9,2021-06-10,0.734814,Thursday


In [91]:
# Plot sales by date
fig = px.line(sales_by_date, x="date", y="hourly_sales_norm_mean", markers=True, hover_data=sales_by_date, width=1200, height=400)
fig.show()

**Observations**

A weekly periodicity can be observed, with lower hourly sales during weekends and higher sales at the beginning of the week.

A sales peak is observed on June 30th. It might be related to a holiday or special date in the country, or it might be the first sign of the increase in sales that takes place in July (see next point).

An increase in sales is observed after July 12th, which dampens over the following two weeks. Peak sales concentrate at the beginning of the week, indicating that the usual weekly seasonality is maintained but with a higher overall volume of sales. July therefore appears to be a relevant month for activating lightning deals.

Interestingly, sales during the July weekends drop to the same level as in June. A comparison against weekends without deals would help evaluate whether activating deals during weekends has any effect on profit.

##### Sales by day of the week

In [92]:
# Sales by day of the week
sales_by_weekday = data_hourly[["hourly_sales_norm", "weekday", "datetime_dayname"]].groupby(by=["datetime_dayname", "weekday"]).agg({"hourly_sales_norm": ["mean"]}).reset_index()
sales_by_weekday.columns = ["_".join(col) for col in sales_by_weekday.columns.values]
sales_by_weekday.rename(columns={"datetime_dayname_": "datetime_dayname", "weekday_": "weekday"}, inplace=True)
sales_by_weekday = sales_by_weekday.sort_values("weekday")
display(sales_by_weekday)

Unnamed: 0,datetime_dayname,weekday,hourly_sales_norm_mean
1,Monday,0,0.956128
5,Tuesday,1,0.940903
6,Wednesday,2,0.915252
4,Thursday,3,0.881422
0,Friday,4,0.705128
2,Saturday,5,0.559284
3,Sunday,6,0.489077


In [93]:
# Plot sales by day of the week
fig = px.line(sales_by_weekday, x="datetime_dayname", y="hourly_sales_norm_mean", markers=True, hover_data=sales_by_weekday, width=1200, height=400)
fig.show()

In [94]:
# Per-deal view: mean hourly sales of each deal, grouped by the deal's start day
sales_by_day = data[
    ["SOLD_QUANTITY", "SOLD_QUANTITY_HOURLY", "OFFER_START_DAYNAME"]
    ].groupby(
        by="OFFER_START_DAYNAME"
        ).agg(
            {
                "SOLD_QUANTITY_HOURLY": ["mean"]
            }
        ).reset_index()
sales_by_day.columns = ["_".join(col) for col in sales_by_day.columns.values]
sales_by_day.rename(columns={"OFFER_START_DAYNAME_": "OFFER_START_DAYNAME"}, inplace=True)
display(sales_by_day)

Unnamed: 0,OFFER_START_DAYNAME,SOLD_QUANTITY_HOURLY_mean
0,Friday,0.85005
1,Monday,1.214637
2,Saturday,0.68232
3,Sunday,0.624049
4,Thursday,0.973633
5,Tuesday,1.119496
6,Wednesday,1.094429
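This per-deal table comes out in alphabetical order of day names, unlike the earlier weekday table, which was sorted with the numeric `weekday` column. One way to get calendar order without an auxiliary column is an ordered categorical; the sketch below uses a few rounded values from the table as toy input.

```python
import pandas as pd

WEEKDAY_ORDER = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]

# Toy table in the alphabetical order produced by grouping on day names
toy = pd.DataFrame({
    "OFFER_START_DAYNAME": ["Friday", "Monday", "Sunday", "Tuesday"],
    "SOLD_QUANTITY_HOURLY_mean": [0.85, 1.21, 0.62, 1.12],
})

# An ordered categorical makes sort_values follow calendar order instead of alphabetical order
toy["OFFER_START_DAYNAME"] = pd.Categorical(
    toy["OFFER_START_DAYNAME"], categories=WEEKDAY_ORDER, ordered=True
)
ordered = toy.sort_values("OFFER_START_DAYNAME")
print(ordered)
```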


**Observations**

As observed in the previous section, sales are higher at the beginning of the week and drop during the weekend. This probably reflects the normal weekly rhythm of people's activity and suggests focusing deals on the first days of the week to maximize sales.

##### Sales by hour of the day

In [95]:
# Sales by hour of the day
sales_by_dayhour = data_hourly[["hourly_sales_norm", "datetime_hour"]].groupby(by=["datetime_hour"]).agg({"hourly_sales_norm": ["mean"]}).reset_index()
sales_by_dayhour.columns = ["_".join(col) for col in sales_by_dayhour.columns.values]
sales_by_dayhour.rename(columns={"datetime_hour_": "datetime_hour"}, inplace=True)
sales_by_dayhour = sales_by_dayhour.sort_values("datetime_hour")
display(sales_by_dayhour)

Unnamed: 0,datetime_hour,hourly_sales_norm_mean
0,0,0.456311
1,1,0.438487
2,2,0.375959
3,3,0.253305
4,4,0.312033
5,5,0.269803
6,6,0.423838
7,7,0.281198
8,8,0.281113
9,9,0.286952


In [96]:
# Plot sales by hour of the day
fig = px.line(sales_by_dayhour, x="datetime_hour", y="hourly_sales_norm_mean", markers=True, hover_data=sales_by_dayhour, width=1200, height=400)
fig.show()

**Observations**

Within a day, sales start to increase around noon, peak in the mid-afternoon (~15:00) and start to drop after 20:00. Sales stay low during the night and the morning.

This intraday periodicity most likely reflects people's daily routines, although sales data without lightning deals would be needed to check whether deals have any effect on it.

These data suggest that the period between 11:00 and 21:00 is the most suitable for activating deals.
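The 11:00 to 21:00 window is read off the plot. One way to make such a choice reproducible (an assumption, not the criterion used above) is to flag the hours whose mean normalized sales exceed the average over all hours. `above_average_hours` is a hypothetical helper that expects the columns of `sales_by_dayhour`; the profile in the example is synthetic, not the notebook's values.

```python
import pandas as pd


def above_average_hours(sales_by_dayhour: pd.DataFrame) -> list[int]:
    """Hours of the day whose mean normalized sales exceed the mean over all hours."""
    threshold = sales_by_dayhour["hourly_sales_norm_mean"].mean()
    mask = sales_by_dayhour["hourly_sales_norm_mean"] > threshold
    return sorted(sales_by_dayhour.loc[mask, "datetime_hour"].tolist())


# Synthetic profile: low at night and in the morning, higher in the afternoon
toy = pd.DataFrame({
    "datetime_hour": [3, 9, 12, 15, 18, 22],
    "hourly_sales_norm_mean": [0.25, 0.30, 0.90, 1.20, 1.00, 0.40],
})
print(above_average_hours(toy))
```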


#### 3. Distribution of sales among product categories

In [97]:
# Display categories examples
data[["SOLD_AMOUNT", "SOLD_QUANTITY", "VERTICAL", "DOM_DOMAIN_AGG1", "DOMAIN_ID"]].sample(15)

Unnamed: 0,SOLD_AMOUNT,SOLD_QUANTITY,VERTICAL,DOM_DOMAIN_AGG1,DOMAIN_ID
5654,3.55,2.0,APP & SPORTS,APPAREL ACCESORIES,MLM-SUNGLASSES
8467,0.0,0.0,BEAUTY & HEALTH,PHARMACEUTICS,MLM-ESSENTIAL_OILS
14715,7.11,1.0,T & B,TOYS AND GAMES,MLM-DOLLS
42306,17.69,2.0,HOME & INDUSTRY,HOME&DECOR,MLM-BED_SHEETS
2124,0.0,0.0,CE,COMPUTERS,MLM-HEADPHONES
43791,28.8,3.0,BEAUTY & HEALTH,BEAUTY EQUIPMENT,MLM-OXIMETERS
16896,2.36,2.0,HOME & INDUSTRY,INDUSTRY,MLM-SAFETY_GOGGLES
16726,47.75,7.0,CE,ELECTRONICS,MLM-HAIR_CLIPPERS
9372,0.0,0.0,APP & SPORTS,SPORTS,MLM-FOOTBALL_SHOES
27302,58.73,20.0,BEAUTY & HEALTH,PHARMACEUTICS,MLM-SURGICAL_AND_INDUSTRIAL_MASKS


In [98]:
plot_data = [
    {"x_plot_column": "SOLD_QUANTITY_sum", "y_plot_column": "VERTICAL", "title": "Total Sales", "layout_column": 1},
    {"x_plot_column": "SOLD_AMOUNT_sum", "y_plot_column": "VERTICAL", "title": "Total Revenue", "layout_column": 2},
    ]
sales_revenue_by_category(data=data, plotting_categories=["SOLD_QUANTITY", "SOLD_AMOUNT"], grouping_categories=["VERTICAL"], plot_data=plot_data, plot_heigth=800, plot_width=1200, plot=True)

Unnamed: 0,VERTICAL,SOLD_QUANTITY_sum,SOLD_AMOUNT_sum
2,BEAUTY & HEALTH,166718.0,516168.31
6,HOME & INDUSTRY,22971.0,142696.82
1,APP & SPORTS,22713.0,125092.53
3,CE,19630.0,352415.06
4,CPG,4459.0,14897.63
0,ACC,4260.0,24743.88
8,T & B,1928.0,13423.63
5,ENTERTAINMENT,353.0,1933.53
7,OTHERS,339.0,1881.13


**Observations**

In terms of sales, the dominant category during lightning deals is BEAUTY & HEALTH, followed, with far fewer sales, by HOME & INDUSTRY, APP & SPORTS and CE. In terms of revenue, BEAUTY & HEALTH still ranks first, but it is followed by CE. APP & SPORTS and HOME & INDUSTRY show moderate revenue contributions.

The case of CE is interesting: because its products are more expensive, even small increases in sales could contribute substantially to revenue.

From this data alone, it is not possible to know whether these differences in sales and revenue between product categories are a consequence of the lightning deals or simply reflect the normal distribution of sales across categories, amplified by the activation of the deals.
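The claim that CE products are more expensive can be checked from the totals in the VERTICAL table above by dividing revenue by units sold. The sketch copies four verticals' totals by hand; in the notebook, the same ratio could be computed on the grouped totals produced inside `sales_revenue_by_category`. Revenue is in the same (unspecified) currency units as `SOLD_AMOUNT`.

```python
import pandas as pd

# Totals copied from the VERTICAL summary table above (SOLD_QUANTITY_sum, SOLD_AMOUNT_sum)
by_vertical = pd.DataFrame({
    "VERTICAL": ["BEAUTY & HEALTH", "HOME & INDUSTRY", "APP & SPORTS", "CE"],
    "SOLD_QUANTITY_sum": [166718.0, 22971.0, 22713.0, 19630.0],
    "SOLD_AMOUNT_sum": [516168.31, 142696.82, 125092.53, 352415.06],
})

# Average revenue per unit sold
by_vertical["AMOUNT_PER_UNIT"] = by_vertical["SOLD_AMOUNT_sum"] / by_vertical["SOLD_QUANTITY_sum"]
top = by_vertical.sort_values("AMOUNT_PER_UNIT", ascending=False)
print(top.round(2))
```

CE comes out at about 17.95 per unit versus about 3.10 for BEAUTY & HEALTH, almost six times higher, which supports the idea that modest CE volume gains translate into large revenue gains.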


In [99]:
plot_data = [
    {"x_plot_column": "SOLD_QUANTITY_sum", "y_plot_column": "DOM_DOMAIN_AGG1", "title": "Total Sales", "layout_column": 1},
    {"x_plot_column": "SOLD_AMOUNT_sum", "y_plot_column": "DOM_DOMAIN_AGG1", "title": "Total Revenue", "layout_column": 2},
    ]
sales_revenue_by_category(data=data, plotting_categories=["SOLD_QUANTITY", "SOLD_AMOUNT"], grouping_categories=["VERTICAL", "DOM_DOMAIN_AGG1"], plot_data=plot_data, plot_heigth=1200, plot_width=1200, plot=True)

Unnamed: 0,VERTICAL,DOM_DOMAIN_AGG1,SOLD_QUANTITY_sum,SOLD_AMOUNT_sum
13,BEAUTY & HEALTH,PHARMACEUTICS,139967.0,421059.09
11,BEAUTY & HEALTH,BEAUTY EQUIPMENT,21519.0,78345.01
25,HOME & INDUSTRY,HOME&DECOR,15053.0,78863.68
14,CE,COMPUTERS,9514.0,77927.78
6,APP & SPORTS,APPAREL,7388.0,35526.15
7,APP & SPORTS,APPAREL ACCESORIES,7156.0,35359.12
9,APP & SPORTS,SPORTS,5713.0,29696.31
12,BEAUTY & HEALTH,PERSONAL CARE,5232.0,16764.21
16,CE,MOBILE,5227.0,156890.1
15,CE,ELECTRONICS,4889.0,117597.18


**Observations**

The most relevant category, both in terms of sales and revenue, is PHARMACEUTICS (BEAUTY & HEALTH), followed by MOBILE, COMPUTERS and ELECTRONICS (CE), and then HOME & DECOR (HOME & INDUSTRY) and BEAUTY EQUIPMENT (BEAUTY & HEALTH).

The following products dominate sales/revenue within each of the mentioned categories:
- PHARMACEUTICS: SURGICAL AND INDUSTRIAL MASKS
- MOBILE: CELLPHONES, SMARTWATCHES
- COMPUTERS: HEADPHONES
- ELECTRONICS: TELEVISIONS
- BEAUTY EQUIPMENT: DISPOSABLE GLOVES
- HOME & DECOR: LED STRIPS, BED SHEETS, LIGHT BULBS, WALL AND CEILING LIGHTS

As mentioned in previous observations, without sales data from periods without deals it is not possible to know whether the trends observed here are caused, at least partially, by the lightning deals. However, it is worth noting that the products not highlighted above do not seem to be important drivers of sales, even when lightning deals are activated.


In [100]:
plot_data = [
    {"x_plot_column": "SOLD_QUANTITY_sum", "y_plot_column": "DOMAIN_ID", "title": "Total Sales", "layout_column": 1},
    {"x_plot_column": "SOLD_AMOUNT_sum", "y_plot_column": "DOMAIN_ID", "title": "Total Revenue", "layout_column": 2},
    ]

# Product-level (DOMAIN_ID) summary for each highlighted DOM_DOMAIN_AGG1 category
for category_name in ["PHARMACEUTICS", "ELECTRONICS", "MOBILE", "BEAUTY EQUIPMENT", "COMPUTERS", "HOME&DECOR"]:
    category_selection = {"category_level": "DOM_DOMAIN_AGG1", "category_name": category_name}
    sales_revenue_by_category(
        data=data,
        plotting_categories=["SOLD_QUANTITY", "SOLD_AMOUNT"],
        grouping_categories=["VERTICAL", "DOM_DOMAIN_AGG1", "DOMAIN_ID"],
        plot_data=plot_data,
        plot_heigth=1200,
        plot_width=1200,
        category_selection=category_selection,
    )

Unnamed: 0,VERTICAL,DOM_DOMAIN_AGG1,DOMAIN_ID,SOLD_QUANTITY_sum,SOLD_AMOUNT_sum
38,BEAUTY & HEALTH,PHARMACEUTICS,MLM-SURGICAL_AND_INDUSTRIAL_MASKS,134360.0,394889.76
39,BEAUTY & HEALTH,PHARMACEUTICS,MLM-THERMOMETERS,3608.0,8205.59
5,BEAUTY & HEALTH,PHARMACEUTICS,MLM-BLOOD_PRESSURE_MONITORS,643.0,3659.02
23,BEAUTY & HEALTH,PHARMACEUTICS,MLM-OIL_DIFFUSERS,529.0,2265.65
11,BEAUTY & HEALTH,PHARMACEUTICS,MLM-ESSENTIAL_OILS,354.0,1270.67
2,BEAUTY & HEALTH,PHARMACEUTICS,MLM-ANTIBACTERIAL_GELS,185.0,1491.09
33,BEAUTY & HEALTH,PHARMACEUTICS,MLM-REUSABLE_MASKS,119.0,237.81
30,BEAUTY & HEALTH,PHARMACEUTICS,MLM-OXYGEN_CONCENTRATORS,56.0,8094.57
32,BEAUTY & HEALTH,PHARMACEUTICS,MLM-PILL_BOXES,27.0,45.5
20,BEAUTY & HEALTH,PHARMACEUTICS,MLM-MEDICAL_GEL_PACKS,18.0,110.99


Unnamed: 0,VERTICAL,DOM_DOMAIN_AGG1,DOMAIN_ID,SOLD_QUANTITY_sum,SOLD_AMOUNT_sum
40,CE,ELECTRONICS,MLM-HAIR_CLIPPERS,690.0,4180.42
83,CE,ELECTRONICS,MLM-TELEVISIONS,448.0,65848.41
92,CE,ELECTRONICS,MLM-VACUUM_CLEANERS,324.0,2533.77
81,CE,ELECTRONICS,MLM-STREAMING_MEDIA_DEVICES,318.0,4213.71
35,CE,ELECTRONICS,MLM-GAMEPADS_AND_JOYSTICKS,284.0,4464.26
46,CE,ELECTRONICS,MLM-HAIR_STRAIGHTENERS,220.0,1655.65
41,CE,ELECTRONICS,MLM-HAIR_DRYERS,166.0,1143.82
94,CE,ELECTRONICS,MLM-VIDEO_GAMES,160.0,1631.3
22,CE,ELECTRONICS,MLM-DEEP_FRYERS,158.0,3954.95
33,CE,ELECTRONICS,MLM-FANS,144.0,1793.54


Unnamed: 0,VERTICAL,DOM_DOMAIN_AGG1,DOMAIN_ID,SOLD_QUANTITY_sum,SOLD_AMOUNT_sum
0,CE,MOBILE,MLM-CELLPHONES,1756.0,110484.3
14,CE,MOBILE,MLM-SMARTWATCHES,1218.0,20928.81
7,CE,MOBILE,MLM-CELLPHONE_COVERS,803.0,2382.48
15,CE,MOBILE,MLM-TABLETS,405.0,19123.19
2,CE,MOBILE,MLM-CELLPHONE_ACCESSORIES,405.0,1089.85
11,CE,MOBILE,MLM-MOBILE_DEVICE_CHARGERS,274.0,956.51
12,CE,MOBILE,MLM-PORTABLE_CELLPHONE_CHARGERS,188.0,1088.72
16,CE,MOBILE,MLM-TABLET_CASES,133.0,644.46
18,CE,MOBILE,MLM-TABLET_STANDS_AND_MOUNTS,20.0,52.18
10,CE,MOBILE,MLM-CELLPHONE_USB_AND_AUXILIARY_ADAPTERS,7.0,6.97


Unnamed: 0,VERTICAL,DOM_DOMAIN_AGG1,DOMAIN_ID,SOLD_QUANTITY_sum,SOLD_AMOUNT_sum
6,BEAUTY & HEALTH,BEAUTY EQUIPMENT,MLM-DISPOSABLE_GLOVES,10944.0,45638.17
27,BEAUTY & HEALTH,BEAUTY EQUIPMENT,MLM-OXIMETERS,8524.0,17341.91
4,BEAUTY & HEALTH,BEAUTY EQUIPMENT,MLM-BODYWEIGHT_SCALES,1033.0,4216.55
29,BEAUTY & HEALTH,BEAUTY EQUIPMENT,MLM-PORTABLE_ELECTRIC_MASSAGERS,493.0,6742.28
16,BEAUTY & HEALTH,BEAUTY EQUIPMENT,MLM-HEALTH_CARE_SUPPLIES,330.0,1831.27
11,BEAUTY & HEALTH,BEAUTY EQUIPMENT,MLM-EPILATORS_AND_TRIMMERS,50.0,433.09
21,BEAUTY & HEALTH,BEAUTY EQUIPMENT,MLM-MEDICAL_EQUIPMENT,25.0,731.87
32,BEAUTY & HEALTH,BEAUTY EQUIPMENT,MLM-SANITIZING_GUNS,17.0,286.17
5,BEAUTY & HEALTH,BEAUTY EQUIPMENT,MLM-CURLING_IRONS,14.0,58.59
1,BEAUTY & HEALTH,BEAUTY EQUIPMENT,MLM-AESTHETIC_TREATMENT_TABLES_AND_CHAIRS,13.0,502.37


Unnamed: 0,VERTICAL,DOM_DOMAIN_AGG1,DOMAIN_ID,SOLD_QUANTITY_sum,SOLD_AMOUNT_sum
24,CE,COMPUTERS,MLM-HEADPHONES,3682.0,26350.61
7,CE,COMPUTERS,MLM-COMPUTER_MICE,893.0,2512.94
34,CE,COMPUTERS,MLM-MOUSE_PADS,711.0,1637.41
12,CE,COMPUTERS,MLM-DATA_CABLES_AND_ADAPTERS,665.0,1188.89
47,CE,COMPUTERS,MLM-SPEAKERS,631.0,3564.42
4,CE,COMPUTERS,MLM-AUDIO_AND_VIDEO_CABLES_AND_ADAPTERS,564.0,1330.26
44,CE,COMPUTERS,MLM-ROUTERS_AND_WIRELESS_SYSTEMS,352.0,3989.41
26,CE,COMPUTERS,MLM-KEYBOARD_AND_MOUSE_KITS,342.0,1865.37
6,CE,COMPUTERS,MLM-COMPUTER_EQUIPMENT_AND_SPARE_PARTS,333.0,1639.84
32,CE,COMPUTERS,MLM-MICROPHONES,243.0,1960.33


Unnamed: 0,VERTICAL,DOM_DOMAIN_AGG1,DOMAIN_ID,SOLD_QUANTITY_sum,SOLD_AMOUNT_sum
114,HOME & INDUSTRY,HOME&DECOR,MLM-LED_STRIPS,1600.0,8670.56
11,HOME & INDUSTRY,HOME&DECOR,MLM-BED_SHEETS,1255.0,6262.06
115,HOME & INDUSTRY,HOME&DECOR,MLM-LIGHT_BULBS,1181.0,4890.42
163,HOME & INDUSTRY,HOME&DECOR,MLM-TABLE_AND_DESK_LAMPS,1085.0,3137.67
22,HOME & INDUSTRY,HOME&DECOR,MLM-CHRISTMAS_LIGHTS,1015.0,3270.44
174,HOME & INDUSTRY,HOME&DECOR,MLM-WALL_AND_CEILING_LIGHTS,909.0,4887.55
12,HOME & INDUSTRY,HOME&DECOR,MLM-BLANKETS,617.0,1544.69
134,HOME & INDUSTRY,HOME&DECOR,MLM-PILLOWS,572.0,3210.53
118,HOME & INDUSTRY,HOME&DECOR,MLM-MANUAL_DRINKING_WATER_PUMPS,517.0,868.74
79,HOME & INDUSTRY,HOME&DECOR,MLM-HOME_SHELVES,458.0,1956.06


### Final Remarks

- From the available data, the time periods and product categories that drive most sales during lightning deals could be identified.
- Without sales data from periods without deals, it is not possible to tell whether the observed trends are a consequence of the deals themselves or of the normal periodicity of sales and the tendency of some products to accumulate more sales/revenue than others.
- Although it is not possible to measure a positive effect of the deals on the sales of the dominant products, many products show no relevant sales during lightning deals, which is useful for deciding how to focus future deals.
- Higher sales volumes are observed from the end of June and throughout July. For the country under study, July seems to be an interesting month for lightning deals; data for the remaining months of the year would be required to confirm this.