# Instructions
Carry out an exploratory and visual analysis with the Seaborn or Matplotlib library to find the insights you consider most important.
1. A tidy, documented, and reproducible notebook with your analysis
2. An infographic presenting the 3 main insights found (keep in mind that this infographic will be presented to the entire organization)

To create the infographic, we suggest using Canva with your Google account. The Canva website is: http://www.canva.com/

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

# Figure quality in the notebook
plt.rcParams["figure.dpi"] = 120

# Global style (Seaborn adjusts Matplotlib)
sns.set_theme(style="whitegrid")

# Show all columns when displaying DataFrames
pd.set_option("display.max_columns", None)

## Reading and initial wrangling

In [3]:
# read the file, parsing the date columns
df = pd.read_csv('Superstore.csv', encoding='windows-1252', parse_dates=['Order Date','Ship Date'])
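A quick way to confirm that `parse_dates` worked is to check the dtypes and subtract one date column from the other. The sketch below is only illustrative: it reads three rows copied from the output shown in this notebook from an in-memory string instead of `Superstore.csv`, and `Ship Days` is a hypothetical column name that is not part of the dataset.

```python
import io

import pandas as pd

# Three rows copied from the Superstore output shown in this notebook;
# the in-memory CSV stands in for 'Superstore.csv'.
csv_text = """Order ID,Order Date,Ship Date
CA-2016-152156,2016-11-08,2016-11-11
CA-2016-138688,2016-06-12,2016-06-16
US-2015-108966,2015-10-11,2015-10-18
"""

sample = pd.read_csv(io.StringIO(csv_text), parse_dates=["Order Date", "Ship Date"])

# Both date columns should be reported as datetime64, not as text.
print(sample.dtypes)

# 'Ship Days' (hypothetical name): whole days between order and shipment.
sample["Ship Days"] = (sample["Ship Date"] - sample["Order Date"]).dt.days
print(sample[["Order ID", "Ship Days"]])
```

If the columns had been read as plain text, the subtraction would raise a `TypeError`, so this check doubles as a sanity test for the loading step.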

In [4]:
df

Unnamed: 0,Row ID,Order ID,Order Date,Ship Date,Ship Mode,Customer ID,Customer Name,Segment,Country,City,...,Postal Code,Region,Product ID,Category,Sub-Category,Product Name,Sales,Quantity,Discount,Profit
0,1,CA-2016-152156,2016-11-08,2016-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-BO-10001798,Furniture,Bookcases,Bush Somerset Collection Bookcase,261.9600,2,0.00,41.9136
1,2,CA-2016-152156,2016-11-08,2016-11-11,Second Class,CG-12520,Claire Gute,Consumer,United States,Henderson,...,42420,South,FUR-CH-10000454,Furniture,Chairs,"Hon Deluxe Fabric Upholstered Stacking Chairs,...",731.9400,3,0.00,219.5820
2,3,CA-2016-138688,2016-06-12,2016-06-16,Second Class,DV-13045,Darrin Van Huff,Corporate,United States,Los Angeles,...,90036,West,OFF-LA-10000240,Office Supplies,Labels,Self-Adhesive Address Labels for Typewriters b...,14.6200,2,0.00,6.8714
3,4,US-2015-108966,2015-10-11,2015-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311,South,FUR-TA-10000577,Furniture,Tables,Bretford CR4500 Series Slim Rectangular Table,957.5775,5,0.45,-383.0310
4,5,US-2015-108966,2015-10-11,2015-10-18,Standard Class,SO-20335,Sean O'Donnell,Consumer,United States,Fort Lauderdale,...,33311,South,OFF-ST-10000760,Office Supplies,Storage,Eldon Fold 'N Roll Cart System,22.3680,2,0.20,2.5164
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
9989,9990,CA-2014-110422,2014-01-21,2014-01-23,Second Class,TB-21400,Tom Boeckenhauer,Consumer,United States,Miami,...,33180,South,FUR-FU-10001889,Furniture,Furnishings,Ultra Door Pull Handle,25.2480,3,0.20,4.1028
9990,9991,CA-2017-121258,2017-02-26,2017-03-03,Standard Class,DB-13060,Dave Brooks,Consumer,United States,Costa Mesa,...,92627,West,FUR-FU-10000747,Furniture,Furnishings,Tenex B1-RE Series Chair Mats for Low Pile Car...,91.9600,2,0.00,15.6332
9991,9992,CA-2017-121258,2017-02-26,2017-03-03,Standard Class,DB-13060,Dave Brooks,Consumer,United States,Costa Mesa,...,92627,West,TEC-PH-10003645,Technology,Phones,Aastra 57i VoIP phone,258.5760,2,0.20,19.3932
9992,9993,CA-2017-121258,2017-02-26,2017-03-03,Standard Class,DB-13060,Dave Brooks,Consumer,United States,Costa Mesa,...,92627,West,OFF-PA-10004041,Office Supplies,Paper,"It's Hot Message Books with Stickers, 2 3/4"" x 5""",29.6000,4,0.00,13.3200


In [5]:
# drop columns that will not be useful
df.drop(['Row ID', 'Postal Code'], axis=1, inplace=True)
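Because the drop above modifies `df` in place, running the cell a second time raises a `KeyError`, which works against the reproducibility requested in the instructions. One possible alternative, sketched here on a tiny stand-in frame (the variable names `sample` and `cols_to_drop` are illustrative), is to reassign the result and pass `errors="ignore"`:

```python
import pandas as pd

# Tiny stand-in frame; only the column names matter for this sketch.
sample = pd.DataFrame({
    "Row ID": [1, 3],
    "Postal Code": [42420, 90036],
    "Sales": [261.96, 14.62],
})

cols_to_drop = ["Row ID", "Postal Code"]

# errors="ignore" skips labels that are already gone, so executing this
# line twice (as when a notebook cell is re-run) does not raise KeyError.
sample = sample.drop(columns=cols_to_drop, errors="ignore")
sample = sample.drop(columns=cols_to_drop, errors="ignore")

print(sample.columns.tolist())
```

The trade-off is that a misspelled column name is silently ignored too, so it is worth printing the remaining columns after the drop.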

In [6]:
# structure of the DataFrame
df.info()

<class 'pandas.DataFrame'>
RangeIndex: 9994 entries, 0 to 9993
Data columns (total 19 columns):
 #   Column         Non-Null Count  Dtype         
---  ------         --------------  -----         
 0   Order ID       9994 non-null   str           
 1   Order Date     9994 non-null   datetime64[us]
 2   Ship Date      9994 non-null   datetime64[us]
 3   Ship Mode      9994 non-null   str           
 4   Customer ID    9994 non-null   str           
 5   Customer Name  9994 non-null   str           
 6   Segment        9994 non-null   str           
 7   Country        9994 non-null   str           
 8   City           9994 non-null   str           
 9   State          9994 non-null   str           
 10  Region         9994 non-null   str           
 11  Product ID     9994 non-null   str           
 12  Category       9994 non-null   str           
 13  Sub-Category   9994 non-null   str           
 14  Product Name   9994 non-null   str           
 15  Sales          9994 non-null   f

In [7]:
# a few sample records
df.sample(4).T

Unnamed: 0,3334,5140,9933,4680
Order ID,US-2017-109253,CA-2015-154886,CA-2014-166555,CA-2014-114510
Order Date,2017-08-21 00:00:00,2015-11-08 00:00:00,2014-07-11 00:00:00,2014-03-14 00:00:00
Ship Date,2017-08-22 00:00:00,2015-11-12 00:00:00,2014-07-14 00:00:00,2014-03-19 00:00:00
Ship Mode,First Class,Standard Class,First Class,Standard Class
Customer ID,PR-18880,SW-20455,JK-15205,JF-15295
Customer Name,Patrick Ryan,Shaun Weien,Jamie Kunitz,Jason Fortune-
Segment,Consumer,Consumer,Consumer,Consumer
Country,United States,United States,United States,United States
City,Oakland,San Francisco,Niagara Falls,Logan
State,California,California,New York,Utah
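Before plotting, it often helps to aggregate the measures by a categorical column. The following is a minimal sketch that uses only the first five rows printed earlier in this notebook, so its totals are not findings about the full dataset; `Margin` is a hypothetical derived column.

```python
import pandas as pd

# First five rows of the Superstore output shown earlier (subset of columns).
sample = pd.DataFrame({
    "Category": ["Furniture", "Furniture", "Office Supplies",
                 "Furniture", "Office Supplies"],
    "Sales": [261.96, 731.94, 14.62, 957.5775, 22.368],
    "Discount": [0.00, 0.00, 0.00, 0.45, 0.20],
    "Profit": [41.9136, 219.582, 6.8714, -383.031, 2.5164],
})

summary = (
    sample.groupby("Category")[["Sales", "Profit"]]
    .sum()
    # 'Margin' (hypothetical name): total profit divided by total sales.
    .assign(Margin=lambda t: t["Profit"] / t["Sales"])
    # Least profitable category first.
    .sort_values("Profit")
)
print(summary)
```

Applied to the full `df`, a table like this could feed a bar chart directly, for example with `sns.barplot` on `summary.reset_index()`.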


## Visual Analysis

In [9]:
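# -------------------------
# 1) Toy dataset
# -------------------------
# Note: this reassigns `df`, so the Superstore data loaded above
# is no longer available under that name.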
df = pd.DataFrame({
    "x": [1, 2, 3, 4, 5],
    "y": [3, 5, 4, 6, 8]
})
df

Unnamed: 0,x,y
0,1,3
1,2,5
2,3,4
3,4,6
4,5,8
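The toy dataset above is small enough to illustrate the basic Matplotlib workflow: create a figure and axes, draw on the axes, then label them. This is a minimal sketch and the frame is rebuilt under the illustrative name `toy` so the block runs on its own.

```python
import matplotlib.pyplot as plt
import pandas as pd

# Same values as the toy dataset above, rebuilt so the block is self-contained.
toy = pd.DataFrame({"x": [1, 2, 3, 4, 5], "y": [3, 5, 4, 6, 8]})

# Object-oriented interface: one figure, one axes.
fig, ax = plt.subplots(figsize=(5, 3))

# Line with point markers so individual observations stay visible.
ax.plot(toy["x"], toy["y"], marker="o")

ax.set_title("Toy dataset: y vs. x")
ax.set_xlabel("x")
ax.set_ylabel("y")
fig.tight_layout()
```

Working through `ax` rather than the implicit `plt` state makes it easier to combine several panels later, since Seaborn's axes-level functions also accept an `ax=` argument.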


## Data visualization

In [13]:
# Figure quality in the notebook
plt.rcParams["figure.dpi"] = 120

# Global style (Seaborn adjusts Matplotlib)
sns.set_theme(style="whitegrid")

# Show all columns when displaying DataFrames
pd.set_option("display.max_columns", None)

## Data loading

In [14]:
path_region = "datos-covid-por-region.csv"
path_etareo = "datos-covid-etareo.csv"

df_region = pd.read_csv(path_region)
df_etareo = pd.read_csv(path_etareo)

print("df_region:", df_region.shape)
display(df_region.head())

print("\ndf_etareo:", df_etareo.shape)
display(df_etareo.head())

FileNotFoundError: [Errno 2] No such file or directory: 'datos-covid-por-region.csv'
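
The `FileNotFoundError` above means the CSV is not in the notebook's working directory. One way to make the loading cell fail more gracefully is to check for the file first. The sketch below assumes a hypothetical helper named `read_csv_if_exists`, which is not part of pandas; it only wraps `pathlib.Path.is_file` and `pd.read_csv`.

```python
from pathlib import Path

import pandas as pd


def read_csv_if_exists(path):
    """Hypothetical helper: return a DataFrame, or None when the file is missing."""
    path = Path(path)
    if not path.is_file():
        # Print the absolute path so it is clear where the file was expected.
        print(f"File not found: {path.resolve()}")
        return None
    return pd.read_csv(path)


df_region = read_csv_if_exists("datos-covid-por-region.csv")
if df_region is not None:
    print("df_region:", df_region.shape)
```

Printing the resolved path usually shows at a glance whether the notebook is running from a different folder than the one holding the data.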