## Project Scope

The data for this problem comes from a dataset published on Kaggle: https://www.kaggle.com/datasets/mohamedharris/supermart-grocery-sales-retail-analytics-dataset


The dataset contains the following columns:

- Order ID - Order identifier
- Customer Name - Name of the customer
- Category - Product category
- Sub Category - Product subcategory
- City - City of the customer
- Order Date - Date of the order
- Region - Region of the city
- Sales - Sale amount of the order
- Discount - Discount rate applied to the order, stored as a fraction (for example, 0.12 for 12%)
- Profit - Profit generated by the order
- State - State or province of the city

The overall idea of the project is to break this dataset down into several dimensions, such as:
- Category dimension
- Customer dimension
- Date dimension (randomly generated to complement the dataset)
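
The decomposition listed above can be sketched with pandas. This is a minimal illustration, not the project's actual implementation: the helper `build_dimension`, the surrogate key names `customer_key` and `category_key`, and the four-row stand-in DataFrame are all assumptions made for the example. The date dimension is left out because the text does not say how those dates are generated.

```python
import pandas as pd

# Tiny stand-in for the sales dataset (illustrative rows only).
sales = pd.DataFrame({
    "Order ID": ["OD1", "OD2", "OD3", "OD4"],
    "Customer Name": ["Harish", "Sudha", "Harish", "Jackson"],
    "Category": ["Oil & Masala", "Beverages", "Oil & Masala", "Beverages"],
    "Sub Category": ["Masalas", "Health Drinks", "Spices", "Health Drinks"],
    "Sales": [1254, 749, 1659, 896],
})


def build_dimension(df, columns, key_name):
    """Hypothetical helper: distinct combinations of `columns` plus a 1-based key."""
    dim = df[columns].drop_duplicates().reset_index(drop=True)
    dim.insert(0, key_name, list(range(1, len(dim) + 1)))
    return dim


dim_customer = build_dimension(sales, ["Customer Name"], "customer_key")
dim_category = build_dimension(sales, ["Category", "Sub Category"], "category_key")

# Fact table: keep the measures and swap descriptive columns for surrogate keys.
fact_sales = (
    sales
    .merge(dim_customer, on="Customer Name")
    .merge(dim_category, on=["Category", "Sub Category"])
    [["Order ID", "customer_key", "category_key", "Sales"]]
)
```

Each dimension holds one row per distinct value (or value combination), and the fact table refers to those rows by key instead of repeating the text.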

### Data Exploration

In [28]:
import pandas as pd
import matplotlib.pyplot as plt  # pyplot is the plotting interface conventionally aliased as plt

In [29]:
dataset = pd.read_csv('./sales.csv')
dataset

| | Order ID | Customer Name | Category | Sub Category | City | Order Date | Region | Sales | Discount | Profit | State |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | OD1 | Harish | Oil & Masala | Masalas | Vellore | 11-08-2017 | North | 1254 | 0.12 | 401.28 | Tamil Nadu |
| 1 | OD2 | Sudha | Beverages | Health Drinks | Krishnagiri | 11-08-2017 | South | 749 | 0.18 | 149.80 | Tamil Nadu |
| 2 | OD3 | Hussain | Food Grains | Atta & Flour | Perambalur | 06-12-2017 | West | 2360 | 0.21 | 165.20 | Tamil Nadu |
| 3 | OD4 | Jackson | Fruits & Veggies | Fresh Vegetables | Dharmapuri | 10-11-2016 | South | 896 | 0.25 | 89.60 | Tamil Nadu |
| 4 | OD5 | Ridhesh | Food Grains | Organic Staples | Ooty | 10-11-2016 | South | 2355 | 0.26 | 918.45 | Tamil Nadu |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 9989 | OD9990 | Sudeep | Eggs, Meat & Fish | Eggs | Madurai | 12/24/2015 | West | 945 | 0.16 | 359.10 | Tamil Nadu |
| 9990 | OD9991 | Alan | Bakery | Biscuits | Kanyakumari | 07-12-2015 | West | 1195 | 0.26 | 71.70 | Tamil Nadu |
| 9991 | OD9992 | Ravi | Food Grains | Rice | Bodi | 06-06-2017 | West | 1567 | 0.16 | 501.44 | Tamil Nadu |
| 9992 | OD9993 | Peer | Oil & Masala | Spices | Pudukottai | 10/16/2018 | West | 1659 | 0.15 | 597.24 | Tamil Nadu |


Column names, data types, and non-null counts

In [30]:
dataset.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 9994 entries, 0 to 9993
Data columns (total 11 columns):
 #   Column         Non-Null Count  Dtype  
---  ------         --------------  -----  
 0   Order ID       9994 non-null   object 
 1   Customer Name  9994 non-null   object 
 2   Category       9994 non-null   object 
 3   Sub Category   9994 non-null   object 
 4   City           9994 non-null   object 
 5   Order Date     9994 non-null   object 
 6   Region         9994 non-null   object 
 7   Sales          9994 non-null   int64  
 8   Discount       9994 non-null   float64
 9   Profit         9994 non-null   float64
 10  State          9994 non-null   object 
dtypes: float64(2), int64(1), object(8)
memory usage: 859.0+ KB
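
`info()` reports `Order Date` as `object`, and the preview shows two separators (`11-08-2017` and `12/24/2015`), so the column would need parsing before it can feed a date dimension. The sketch below is one possible approach, under the assumption that both formats are month-day-year. That assumption holds for the slash dates in the preview (24 and 16 cannot be months) but is not confirmed for the dash dates, which are ambiguous.

```python
import pandas as pd

# Sample values copied from the preview of the `Order Date` column.
raw = pd.Series(["11-08-2017", "06-12-2017", "12/24/2015", "10/16/2018"])

# ASSUMPTION: both separators mean month-day-year (unverified for dash dates).
dash = pd.to_datetime(raw, format="%m-%d-%Y", errors="coerce")
slash = pd.to_datetime(raw, format="%m/%d/%Y", errors="coerce")

# Use the dash-format result where it parsed, otherwise the slash-format one.
order_date = dash.fillna(slash)

# Values that matched neither pattern remain NaT and need a manual look.
unparsed = raw[order_date.isna()]
```

Parsing each format explicitly, rather than letting pandas infer it, keeps the day/month interpretation visible and makes unparsed values easy to isolate.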


Summary statistics for the numeric columns: count, mean, standard deviation, minimum, quartiles, and maximum

In [31]:
dataset.describe()

| | Sales | Discount | Profit |
|---|---|---|---|
| count | 9994.0 | 9994.0 | 9994.0 |
| mean | 1496.596158 | 0.226817 | 374.937082 |
| std | 577.559036 | 0.074636 | 239.932881 |
| min | 500.0 | 0.1 | 25.25 |
| 25% | 1000.0 | 0.16 | 180.0225 |
| 50% | 1498.0 | 0.23 | 320.78 |
| 75% | 1994.75 | 0.29 | 525.6275 |
| max | 2500.0 | 0.35 | 1120.95 |


We check for null values

In [32]:
dataset.isnull().sum()

Order ID         0
Customer Name    0
Category         0
Sub Category     0
City             0
Order Date       0
Region           0
Sales            0
Discount         0
Profit           0
State            0
dtype: int64

We analyze the Customer Name column

In [34]:
dataset.groupby(by='Customer Name').count()

| Customer Name | Order ID | Category | Sub Category | City | Order Date | Region | Sales | Discount | Profit | State |
|---|---|---|---|---|---|---|---|---|---|---|
| Adavan | 205 | 205 | 205 | 205 | 205 | 205 | 205 | 205 | 205 | 205 |
| Aditi | 187 | 187 | 187 | 187 | 187 | 187 | 187 | 187 | 187 | 187 |
| Akash | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 |
| Alan | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 | 198 |
| Amrish | 227 | 227 | 227 | 227 | 227 | 227 | 227 | 227 | 227 | 227 |
| Amy | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 | 196 |
| Anu | 186 | 186 | 186 | 186 | 186 | 186 | 186 | 186 | 186 | 186 |
| Arutra | 218 | 218 | 218 | 218 | 218 | 218 | 218 | 218 | 218 | 218 |
| Arvind | 203 | 203 | 203 | 203 | 203 | 203 | 203 | 203 | 203 | 203 |
| Esther | 189 | 189 | 189 | 189 | 189 | 189 | 189 | 189 | 189 | 189 |
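
Because the dataset has no null values, `groupby(...).count()` repeats the same number in every column. A more compact way to get the same information is `Series.value_counts()`, sketched here on stand-in data. The names are taken from the output above, but the counts are illustrative only.

```python
import pandas as pd

# Stand-in data; with the real DataFrame this would be dataset["Customer Name"].
customers = pd.Series(
    ["Amrish", "Alan", "Amrish", "Aditi", "Alan", "Amrish"],
    name="Customer Name",
)

# One count per customer, sorted from most to least frequent.
orders_per_customer = customers.value_counts()

# Number of distinct customers, i.e. the row count of a customer dimension.
distinct_customers = customers.nunique()
```

The same two calls apply unchanged to the `Category` and `Sub Category` columns.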


We analyze the Category column

In [35]:
dataset.groupby(by='Category').count()

| Category | Order ID | Customer Name | Sub Category | City | Order Date | Region | Sales | Discount | Profit | State |
|---|---|---|---|---|---|---|---|---|---|---|
| Bakery | 1413 | 1413 | 1413 | 1413 | 1413 | 1413 | 1413 | 1413 | 1413 | 1413 |
| Beverages | 1400 | 1400 | 1400 | 1400 | 1400 | 1400 | 1400 | 1400 | 1400 | 1400 |
| Eggs, Meat & Fish | 1490 | 1490 | 1490 | 1490 | 1490 | 1490 | 1490 | 1490 | 1490 | 1490 |
| Food Grains | 1398 | 1398 | 1398 | 1398 | 1398 | 1398 | 1398 | 1398 | 1398 | 1398 |
| Fruits & Veggies | 1418 | 1418 | 1418 | 1418 | 1418 | 1418 | 1418 | 1418 | 1418 | 1418 |
| Oil & Masala | 1361 | 1361 | 1361 | 1361 | 1361 | 1361 | 1361 | 1361 | 1361 | 1361 |
| Snacks | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 | 1514 |


We analyze the Sub Category column

In [36]:
dataset.groupby(by='Sub Category').count()

| Sub Category | Order ID | Customer Name | Category | City | Order Date | Region | Sales | Discount | Profit | State |
|---|---|---|---|---|---|---|---|---|---|---|
| Atta & Flour | 353 | 353 | 353 | 353 | 353 | 353 | 353 | 353 | 353 | 353 |
| Biscuits | 459 | 459 | 459 | 459 | 459 | 459 | 459 | 459 | 459 | 459 |
| Breads & Buns | 502 | 502 | 502 | 502 | 502 | 502 | 502 | 502 | 502 | 502 |
| Cakes | 452 | 452 | 452 | 452 | 452 | 452 | 452 | 452 | 452 | 452 |
| Chicken | 348 | 348 | 348 | 348 | 348 | 348 | 348 | 348 | 348 | 348 |
| Chocolates | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 | 499 |
| Cookies | 520 | 520 | 520 | 520 | 520 | 520 | 520 | 520 | 520 | 520 |
| Dals & Pulses | 343 | 343 | 343 | 343 | 343 | 343 | 343 | 343 | 343 | 343 |
| Edible Oil & Ghee | 451 | 451 | 451 | 451 | 451 | 451 | 451 | 451 | 451 | 451 |
| Eggs | 379 | 379 | 379 | 379 | 379 | 379 | 379 | 379 | 379 | 379 |
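
If the category dimension is meant to store `Category` and `Sub Category` as a hierarchy, it may be worth confirming that each subcategory belongs to exactly one category. The sketch below shows one way to check this. The function name `ambiguous_sub_categories` is hypothetical, and the sample pairs are copied from the dataset preview rather than from the full file.

```python
import pandas as pd


def ambiguous_sub_categories(df):
    """Return sub categories that appear under more than one category."""
    parents = df.groupby("Sub Category")["Category"].nunique()
    return parents[parents > 1]


# Category / Sub Category pairs seen in the preview rows.
pairs = pd.DataFrame({
    "Category": ["Oil & Masala", "Beverages", "Food Grains", "Food Grains",
                 "Eggs, Meat & Fish", "Bakery", "Food Grains", "Oil & Masala"],
    "Sub Category": ["Masalas", "Health Drinks", "Atta & Flour", "Organic Staples",
                     "Eggs", "Biscuits", "Rice", "Spices"],
})

# An empty result means every sub category in the sample has a single parent.
ambiguous = ambiguous_sub_categories(pairs)
```

An empty result on the full dataset would support modeling the two columns as a single category dimension with a strict parent-child relationship.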
