# 00 - Data Preparation and Grouping (Pandas)

**Notebook goal**: build a continuous workflow that connects (1) exploring and preparing the **Titanic** dataset with (2) **grouping and pivoting** techniques in Pandas.

Throughout the notebook we **always** work with `titanic.csv`.

---


## 1. Loading the dataset and first checks

Before transforming or aggregating variables, we check that the file is accessible and that the data types make sense.


---
# Series and DataFrames

- Reading data
- Basic exploration methods
- The Series structure
- The DataFrame structure
- Selecting subsets of data
- Statistical operations
- Filtering data
- Creating columns in a DataFrame


In [1]:
import pandas as pd

#### Reading the data from a file

In [2]:
df = pd.read_csv('titanic.csv')

In [3]:
# Display settings (optional)
pd.set_option('display.max_columns', 50)
pd.set_option('display.width', 120)


In [4]:
df

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.2500,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C
2,3,1,3,"Heikkinen, Miss. Laina",female,26.0,0,0,STON/O2. 3101282,7.9250,,S
3,4,1,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",female,35.0,1,0,113803,53.1000,C123,S
4,5,0,3,"Allen, Mr. William Henry",male,35.0,0,0,373450,8.0500,,S
...,...,...,...,...,...,...,...,...,...,...,...,...
886,887,0,2,"Montvila, Rev. Juozas",male,27.0,0,0,211536,13.0000,,S
887,888,1,1,"Graham, Miss. Margaret Edith",female,19.0,0,0,112053,30.0000,B42,S
888,889,0,3,"Johnston, Miss. Catherine Helen ""Carrie""",female,,1,2,W./C. 6607,23.4500,,S
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0000,C148,C


| Variable    | Readable name             | Description                                                            |
| ----------- | ------------------------- | ---------------------------------------------------------------------- |
| PassengerId | Passenger ID              | Unique identifier of the passenger                                     |
| Survived    | Survived                  | Whether the passenger survived (0 = No, 1 = Yes)                       |
| Pclass      | Ticket class              | Class of the ticket (1 = First, 2 = Second, 3 = Third)                 |
| Name        | Name                      | Full name of the passenger                                             |
| Sex         | Sex                       | Sex of the passenger                                                   |
| Age         | Age                       | Age of the passenger in years                                          |
| SibSp       | Siblings/spouses aboard   | Number of siblings and/or spouses aboard                               |
| Parch       | Parents/children aboard   | Number of parents and/or children aboard                               |
| Ticket      | Ticket number             | Ticket code                                                            |
| Fare        | Fare                      | Price paid for the ticket                                              |
| Cabin       | Cabin                     | Cabin identifier                                                       |
| Embarked    | Port of embarkation       | Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton)   |


## 2. Quick exploration to understand the data

In this section we use basic Pandas methods to answer some initial questions:
- How many rows and columns do we have?
- What types of variables are there?
- Are there missing values?

These checks matter because they determine **how** we group and summarize later on.


#### Basic methods for exploring a DataFrame
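
The three questions above map onto three short calls. The sketch below is only illustrative: `toy` is a hypothetical stand-in with made-up rows, not the real `titanic.csv`, so that the block runs on its own.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the Titanic data (hypothetical rows, not the real file)
toy = pd.DataFrame({
    "Sex": ["male", "female", "female", "male"],
    "Age": [22.0, 38.0, np.nan, 35.0],
    "Cabin": [None, "C85", None, None],
})

n_rows, n_cols = toy.shape        # how many rows and columns
dtypes = toy.dtypes               # type of each column
missing = toy.isna().sum()        # number of missing values per column
missing_pct = toy.isna().mean()   # share of missing values per column

print(n_rows, n_cols)
print(dtypes)
print(missing)
```

On the real dataset the same calls would be made on `df` instead of `toy`.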

In [5]:
df.head(2)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C


In [6]:
df.tail(2)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
889,890,1,1,"Behr, Mr. Karl Howell",male,26.0,0,0,111369,30.0,C148,C
890,891,0,3,"Dooley, Mr. Patrick",male,32.0,0,0,370376,7.75,,Q


In [7]:
df.describe(include='all')

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
count,891.0,891.0,891.0,891,891,714.0,891.0,891.0,891.0,891.0,204,889
unique,,,,891,2,,,,681.0,,147,3
top,,,,"Braund, Mr. Owen Harris",male,,,,347082.0,,G6,S
freq,,,,1,577,,,,7.0,,4,644
mean,446.0,0.383838,2.308642,,,29.699118,0.523008,0.381594,,32.204208,,
std,257.353842,0.486592,0.836071,,,14.526497,1.102743,0.806057,,49.693429,,
min,1.0,0.0,1.0,,,0.42,0.0,0.0,,0.0,,
25%,223.5,0.0,2.0,,,20.125,0.0,0.0,,7.9104,,
50%,446.0,0.0,3.0,,,28.0,0.0,0.0,,14.4542,,
75%,668.5,1.0,3.0,,,38.0,1.0,0.0,,31.0,,


In [8]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 891 entries, 0 to 890
Data columns (total 12 columns):
 #   Column       Non-Null Count  Dtype  
---  ------       --------------  -----  
 0   PassengerId  891 non-null    int64  
 1   Survived     891 non-null    int64  
 2   Pclass       891 non-null    int64  
 3   Name         891 non-null    object 
 4   Sex          891 non-null    object 
 5   Age          714 non-null    float64
 6   SibSp        891 non-null    int64  
 7   Parch        891 non-null    int64  
 8   Ticket       891 non-null    object 
 9   Fare         891 non-null    float64
 10  Cabin        204 non-null    object 
 11  Embarked     889 non-null    object 
dtypes: float64(2), int64(5), object(5)
memory usage: 83.7+ KB


#### The Series

Selecting a column of the DataFrame

In [9]:
df['Name']

0                                Braund, Mr. Owen Harris
1      Cumings, Mrs. John Bradley (Florence Briggs Th...
2                                 Heikkinen, Miss. Laina
3           Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                               Allen, Mr. William Henry
                             ...                        
886                                Montvila, Rev. Juozas
887                         Graham, Miss. Margaret Edith
888             Johnston, Miss. Catherine Helen "Carrie"
889                                Behr, Mr. Karl Howell
890                                  Dooley, Mr. Patrick
Name: Name, Length: 891, dtype: object

Selecting a row of the DataFrame

In [10]:
df.iloc[0]

PassengerId                          1
Survived                             0
Pclass                               3
Name           Braund, Mr. Owen Harris
Sex                               male
Age                               22.0
SibSp                                1
Parch                                0
Ticket                       A/5 21171
Fare                              7.25
Cabin                              NaN
Embarked                             S
Name: 0, dtype: object

#### Selecting columns of a DataFrame

In [11]:
df[ ['Name','Age'] ]

Unnamed: 0,Name,Age
0,"Braund, Mr. Owen Harris",22.0
1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",38.0
2,"Heikkinen, Miss. Laina",26.0
3,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",35.0
4,"Allen, Mr. William Henry",35.0
...,...,...
886,"Montvila, Rev. Juozas",27.0
887,"Graham, Miss. Margaret Edith",19.0
888,"Johnston, Miss. Catherine Helen ""Carrie""",
889,"Behr, Mr. Karl Howell",26.0


#### Selecting rows of a DataFrame

In [12]:
df.iloc[5:8]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
6,7,0,1,"McCarthy, Mr. Timothy J",male,54.0,0,0,17463,51.8625,E46,S
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.075,,S


In [13]:
df.iloc[ [5,7,17] ]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
5,6,0,3,"Moran, Mr. James",male,,0,0,330877,8.4583,,Q
7,8,0,3,"Palsson, Master. Gosta Leonard",male,2.0,3,1,349909,21.075,,S
17,18,1,2,"Williams, Mr. Charles Eugene",male,,0,0,244373,13.0,,S


#### Selecting cells

In [14]:
df.loc[2:5, 'Name']

2                          Heikkinen, Miss. Laina
3    Futrelle, Mrs. Jacques Heath (Lily May Peel)
4                        Allen, Mr. William Henry
5                                Moran, Mr. James
Name: Name, dtype: object

In [15]:
df.loc[2:5, ['Name','Age']]

Unnamed: 0,Name,Age
2,"Heikkinen, Miss. Laina",26.0
3,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",35.0
4,"Allen, Mr. William Henry",35.0
5,"Moran, Mr. James",
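
One detail worth noticing in the cells above: `loc` slices by label and includes the end label, while `iloc` slices by position and excludes the end position. A minimal sketch on a hypothetical eight-row frame (`toy` is not part of the notebook data):

```python
import pandas as pd

# Hypothetical frame with the default 0..7 integer index
toy = pd.DataFrame({"Name": list("abcdefgh")})

by_label = toy.loc[2:5, "Name"]       # label slice: the end label 5 is included
by_position = toy.iloc[2:5]["Name"]   # positional slice: position 5 is excluded

print(list(by_label))      # four values
print(list(by_position))   # three values
```

With the default integer index the two look alike, which is exactly why the difference in the end point is easy to miss.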


#### Computing statistics on a Series or DataFrame

In [17]:
df.min()

TypeError: '<=' not supported between instances of 'float' and 'str'

In [18]:
df.select_dtypes(include="number").min()

PassengerId    1.00
Survived       0.00
Pclass         1.00
Age            0.42
SibSp          0.00
Parch          0.00
Fare           0.00
dtype: float64
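
The `TypeError` raised by `df.min()` most likely comes from text columns that mix strings with `NaN` (a float), such as `Cabin`. Besides `select_dtypes`, the reductions accept `numeric_only=True`. The sketch below uses a hypothetical three-row frame rather than the real file:

```python
import numpy as np
import pandas as pd

# Hypothetical rows: Cabin mixes strings and NaN, like the real column
toy = pd.DataFrame({
    "Fare": [7.25, 71.2833, 8.05],
    "Age": [22.0, np.nan, 35.0],
    "Cabin": [np.nan, "C85", np.nan],
})

mins_a = toy.select_dtypes(include="number").min()  # filter columns first
mins_b = toy.min(numeric_only=True)                 # let the reduction skip non-numeric columns

print(mins_b)
```

Both forms give the same result here; `select_dtypes` is convenient when several statistics are computed on the same numeric subset.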

In [19]:
df['Fare'].min()

np.float64(0.0)

In [None]:
df.select_dtypes(include="number").max()

In [None]:
df['Fare'].max()

In [None]:
df.count()

In [None]:
df['Fare'].count()

In [None]:
df.select_dtypes(include="number").median()

In [None]:
df['Fare'].median()

In [None]:
df.select_dtypes(include="number").mean()

In [None]:
df['Fare'].mean()

In [None]:
df.select_dtypes(include="number").quantile(q=0.1)

In [None]:
df['Fare'].quantile(q=0.1)

In [None]:
df.select_dtypes(include="number").quantile(q=0.5)

In [None]:
df['Fare'].quantile(q=0.5)

In [None]:
df.select_dtypes(include="number").quantile(q=0.9)

In [None]:
df['Fare'].quantile(q=0.9)

#### Filtering rows of a DataFrame

In [21]:
df[ df['Fare'] > 500  ]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
258,259,1,1,"Ward, Miss. Anna",female,35.0,0,0,PC 17755,512.3292,,C
679,680,1,1,"Cardeza, Mr. Thomas Drake Martinez",male,36.0,0,1,PC 17755,512.3292,B51 B53 B55,C
737,738,1,1,"Lesurer, Mr. Gustave J",male,35.0,0,0,PC 17755,512.3292,B101,C


In [22]:
df[ (df['Fare'] > 500) & (df['Sex'] == 'female') ]

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
258,259,1,1,"Ward, Miss. Anna",female,35.0,0,0,PC 17755,512.3292,,C


#### Adding columns to the DataFrame

In [24]:
df['Taxes'] = 5

In [25]:
df.head(2)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Taxes
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,5
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,5


In [27]:
df['Taxes'] = df['Fare'] * 0.05 + 1

In [28]:
df.head(2)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Taxes
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,1.3625
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,4.564165


In [29]:
df['Total'] = df['Fare'] + df['Taxes']

In [30]:
df.head(2)

Unnamed: 0,PassengerId,Survived,Pclass,Name,Sex,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked,Taxes,Total
0,1,0,3,"Braund, Mr. Owen Harris",male,22.0,1,0,A/5 21171,7.25,,S,1.3625,8.6125
1,2,1,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",female,38.0,1,0,PC 17599,71.2833,C85,C,4.564165,75.847465


#### Dropping rows and columns

In [31]:
df.drop('Total', axis=1, inplace=True)

In [None]:
df.head()

In [None]:
df.drop(0, axis=0, inplace=True)

In [None]:
df.head()
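
The cells above modify `df` in place. A possible alternative, sketched here on a hypothetical three-row frame, is to let `drop()` return a new DataFrame and keep the original untouched:

```python
import pandas as pd

# Hypothetical values, only to illustrate the calls
toy = pd.DataFrame({"Fare": [7.25, 71.28, 7.92], "Taxes": [1.36, 4.56, 1.40]})

no_taxes = toy.drop(columns="Taxes")           # new DataFrame without the column
no_first = toy.drop(index=0)                   # new DataFrame without the row labelled 0
renumbered = no_first.reset_index(drop=True)   # row labels restart at 0

print(list(no_first.index), list(renumbered.index))
```

Note that dropping row `0` leaves the remaining labels as they were; `reset_index(drop=True)` is only needed if consecutive labels matter.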

---

## 3. From descriptive analysis to aggregation

So far we have mainly worked with:
- selecting columns, rows and cells,
- descriptive statistics,
- filtering,
- creating and dropping columns,
- a first look at missing values.

All of this is known as **data preparation / data wrangling**: getting the dataset into a state where it can answer questions.

The natural next step is to **summarize information by groups** (for example by `Sex`, `Pclass`, `Embarked`, age ranges, and so on). For that we will use:
- **MultiIndexes**
- **groupby()** and advanced aggregations
- **pivot_table()**, **melt()**

The main difference is the focus:
- before: operations *row by row* or *column by column*
- now: operations *group by group*
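
Grouping by age ranges first requires turning `Age` into a categorical column. One way to do that is `pd.cut`; the bin edges, labels and rows below are hypothetical choices made only for this sketch.

```python
import numpy as np
import pandas as pd

# Hypothetical rows, not the real file
toy = pd.DataFrame({
    "Age": [4.0, 22.0, 38.0, 61.0, np.nan, 15.0],
    "Survived": [1, 0, 1, 0, 0, 1],
})

# Hypothetical bin edges and labels chosen only for illustration
bins = [0, 12, 18, 60, 120]
labels = ["child", "teen", "adult", "senior"]
toy["AgeGroup"] = pd.cut(toy["Age"], bins=bins, labels=labels)

# Rows with a missing Age get a missing AgeGroup and are left out of the groups
rate_by_age = toy.groupby("AgeGroup", observed=True)["Survived"].mean()
print(rate_by_age)
```

By default each bin includes its right edge, so an age of exactly 12 would fall in the first bin.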

---


# 4. Grouping data with Titanic

Next, we apply the same tools as in the grouping notebook, this time on the **Titanic** dataset.


In [32]:
# Reload the dataset so the grouping block starts from a clean base
# (the previous section modified the DataFrame for teaching purposes)
df = pd.read_csv('titanic.csv')
(df.shape, df.columns)

((891, 12),
 Index(['PassengerId', 'Survived', 'Pclass', 'Name', 'Sex', 'Age', 'SibSp', 'Parch', 'Ticket', 'Fare', 'Cabin',
        'Embarked'],
       dtype='object'))

In [None]:
df

## 4.1 MultiIndexes

A **MultiIndex** lets you index a DataFrame by more than one key. It is useful when:
- you want to look up combinations quickly (e.g. `Sex` + `Pclass`),
- you are going to produce hierarchical reports,
- or you want to prepare the data for certain aggregation operations.


In [33]:
# Build a new DataFrame with a MultiIndex (set_index returns a new object; df itself is not modified)
df_mi = df.set_index(['Sex','Pclass']).sort_index()
df_mi.head(10)


Unnamed: 0_level_0,Unnamed: 1_level_0,PassengerId,Survived,Name,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
Sex,Pclass,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
female,1,2,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",38.0,1,0,PC 17599,71.2833,C85,C
female,1,4,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",35.0,1,0,113803,53.1,C123,S
female,1,12,1,"Bonnell, Miss. Elizabeth",58.0,0,0,113783,26.55,C103,S
female,1,32,1,"Spencer, Mrs. William Augustus (Marie Eugenie)",,1,0,PC 17569,146.5208,B78,C
female,1,53,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",49.0,1,0,PC 17572,76.7292,D33,C
female,1,62,1,"Icard, Miss. Amelie",38.0,0,0,113572,80.0,B28,
female,1,89,1,"Fortune, Miss. Mabel Helen",23.0,3,2,19950,263.0,C23 C25 C27,S
female,1,137,1,"Newsom, Miss. Helen Monypeny",19.0,0,2,11752,26.2833,D47,S
female,1,152,1,"Pears, Mrs. Thomas (Edith Wearne)",22.0,1,0,113776,66.6,C2,S
female,1,167,1,"Chibnall, Mrs. (Edith Martha Bowerman)",,0,1,113505,55.0,E33,S


In [34]:
# Access a specific combination of keys
# Example: women in 1st class
try:
    # display() is needed: inside a try block the expression is not the cell's last statement
    display(df_mi.loc[('female', 1)].head())
except KeyError:
    # If the keys are stored in a different format, show the available levels
    display(df_mi.index.levels)


In [35]:
df_mi.loc[('female', 1)]

Unnamed: 0_level_0,Unnamed: 1_level_0,PassengerId,Survived,Name,Age,SibSp,Parch,Ticket,Fare,Cabin,Embarked
Sex,Pclass,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1,Unnamed: 6_level_1,Unnamed: 7_level_1,Unnamed: 8_level_1,Unnamed: 9_level_1,Unnamed: 10_level_1,Unnamed: 11_level_1
female,1,2,1,"Cumings, Mrs. John Bradley (Florence Briggs Th...",38.0,1,0,PC 17599,71.2833,C85,C
female,1,4,1,"Futrelle, Mrs. Jacques Heath (Lily May Peel)",35.0,1,0,113803,53.1000,C123,S
female,1,12,1,"Bonnell, Miss. Elizabeth",58.0,0,0,113783,26.5500,C103,S
female,1,32,1,"Spencer, Mrs. William Augustus (Marie Eugenie)",,1,0,PC 17569,146.5208,B78,C
female,1,53,1,"Harper, Mrs. Henry Sleeper (Myna Haxtun)",49.0,1,0,PC 17572,76.7292,D33,C
female,...,...,...,...,...,...,...,...,...,...,...
female,1,857,1,"Wick, Mrs. George Dennick (Mary Hitchcock)",45.0,1,1,36928,164.8667,,S
female,1,863,1,"Swift, Mrs. Frederick Joel (Margaret Welles Ba...",48.0,0,0,17466,25.9292,D17,S
female,1,872,1,"Beckwith, Mrs. Richard Leonard (Sallie Monypeny)",47.0,1,1,11751,52.5542,D35,S
female,1,880,1,"Potter, Mrs. Thomas Jr (Lily Alexenia Wilson)",56.0,0,1,11767,83.1583,C50,C


In [36]:
df_mi.index.names

FrozenList(['Sex', 'Pclass'])

In [37]:
df_mi.index.levels

FrozenList([['female', 'male'], [1, 2, 3]])
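
Two further operations are often handy with a MultiIndex: selecting on an inner level only with `xs()`, and turning the index levels back into ordinary columns with `reset_index()`. The sketch below uses a hypothetical four-row frame (`toy`, `toy_mi`) so that it runs on its own.

```python
import pandas as pd

# Hypothetical rows, not the real file
toy = pd.DataFrame({
    "Sex": ["female", "male", "female", "male"],
    "Pclass": [1, 1, 3, 3],
    "Fare": [71.28, 51.86, 7.92, 8.05],
})
toy_mi = toy.set_index(["Sex", "Pclass"]).sort_index()

# Select on the inner level only: every first-class row, whatever the Sex
first_class = toy_mi.xs(1, level="Pclass")

# Move the index levels back to ordinary columns
flat = toy_mi.reset_index()

print(first_class)
print(flat.columns.tolist())
```

`xs()` drops the selected level from the result by default, so `first_class` is indexed by `Sex` alone.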

## 4.2 groupby(): the heart of summarizing by groups

`groupby()` splits the dataset into groups and then applies one or more aggregation functions to each group.

Typical examples with Titanic:
- survival rate by sex,
- average fare by class,
- age distribution by port of embarkation.


In [38]:
# Survival rate by sex (if Survived is coded 0/1, the mean is the rate)
df.groupby('Sex')['Survived'].mean().sort_values(ascending=False)


Sex
female    0.742038
male      0.188908
Name: Survived, dtype: float64

In [40]:
# Survival by (Sex, Pclass); the result carries a MultiIndex
surv_by_sex_class = df.groupby(['Sex','Pclass'])['Survived'].mean()
surv_by_sex_class


Sex     Pclass
female  1         0.968085
        2         0.921053
        3         0.500000
male    1         0.368852
        2         0.157407
        3         0.135447
Name: Survived, dtype: float64

In [41]:
# Several aggregations at once with .agg()
# - n_passengers: number of rows (size)
# - surv_rate: mean of Survived
# - fare_mean: mean Fare
# - age_median: median Age
summary = (
    df.groupby(['Sex','Pclass'])
      .agg(n_passengers=('PassengerId','size'),
           surv_rate=('Survived','mean'),
           fare_mean=('Fare','mean'),
           age_median=('Age','median'))
      .sort_values(['surv_rate','n_passengers'], ascending=[False, False])
)
summary


Unnamed: 0_level_0,Unnamed: 1_level_0,n_passengers,surv_rate,fare_mean,age_median
Sex,Pclass,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
female,1,94,0.968085,106.125798,35.0
female,2,76,0.921053,21.970121,28.0
female,3,144,0.5,16.11881,21.5
male,1,122,0.368852,67.226127,40.0
male,2,108,0.157407,19.741782,30.0
male,3,347,0.135447,12.661633,25.0


### groupby + transform(): going back from group level to row level

Sometimes the aggregated table is not enough: we want to **bring** a group-level metric back into the DataFrame in order to:
- build features,
- compare each row against its group,
- normalize by group.

`transform()` is the standard tool for this because it returns a Series with the **same length** as the original DataFrame, aligned on its index.


In [42]:
# Example: difference between each passenger's fare and the mean fare of their class
# (a classic feature-engineering step)
df = df.copy()  # defensive copy before adding a column
class_fare_mean = df.groupby('Pclass')['Fare'].transform('mean')
df['Fare_vs_ClassMean'] = df['Fare'] - class_fare_mean

(df[['Pclass','Fare','Fare_vs_ClassMean']].head(10))


Unnamed: 0,Pclass,Fare,Fare_vs_ClassMean
0,3,7.25,-6.42555
1,1,71.2833,-12.871387
2,3,7.925,-5.75055
3,1,53.1,-31.054687
4,3,8.05,-5.62555
5,3,8.4583,-5.21725
6,1,51.8625,-32.292187
7,3,21.075,7.39945
8,3,11.1333,-2.54225
9,2,30.0708,9.408617
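
To make the contrast between an aggregation and `transform()` explicit, and to illustrate the "normalize by group" use case, here is a small sketch on hypothetical fares. The column name `Fare_z` and the data are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical rows, not the real file
toy = pd.DataFrame({
    "Pclass": [1, 1, 3, 3, 3],
    "Fare": [80.0, 60.0, 7.0, 8.0, 9.0],
})

grouped = toy.groupby("Pclass")["Fare"]
class_mean = grouped.mean()             # aggregation: one value per group
row_mean = grouped.transform("mean")    # transform: one value per original row
row_std = grouped.transform("std")

# Per-class z-score: how far each fare is from its class mean, in class standard deviations
toy["Fare_z"] = (toy["Fare"] - row_mean) / row_std

print(len(class_mean), len(row_mean))
print(toy)
```

Because `row_mean` and `row_std` share the index of `toy`, the arithmetic lines up row by row without any merge.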


### groupby + apply(): custom logic per group

`apply()` is flexible (it accepts arbitrary logic) but is usually slower. Use it when:
- you cannot express the operation with `agg()` / `transform()` / vectorized operations,
- or you need to return complex structures.


In [43]:
# Example: get the top 3 fares per (Sex, Pclass)

def top_fares(group, n=3):
    # Keep only the relevant columns to avoid side effects
    cols = ['Name','Fare','Survived']
    return group[cols].sort_values('Fare', ascending=False).head(n)

Top3 = (
    df[['Sex','Pclass','Name','Fare','Survived']]
      .groupby(['Sex','Pclass'], group_keys=True)
      .apply(lambda g: top_fares(g, n=3))
)

Top3



Unnamed: 0_level_0,Unnamed: 1_level_0,Unnamed: 2_level_0,Name,Fare,Survived
Sex,Pclass,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1,Unnamed: 5_level_1
female,1,258,"Ward, Miss. Anna",512.3292,1
female,1,88,"Fortune, Miss. Mabel Helen",263.0,1
female,1,341,"Fortune, Miss. Alice Elizabeth",263.0,1
female,2,615,"Herman, Miss. Alice",65.0,1
female,2,754,"Herman, Mrs. Samuel (Jane Laver)",65.0,1
female,2,43,"Laroche, Miss. Simonne Marie Anne Andree",41.5792,1
female,3,792,"Sage, Miss. Stella Anna",69.55,0
female,3,863,"Sage, Miss. Dorothy Edith ""Dolly""",69.55,0
female,3,180,"Sage, Miss. Constance Gladys",69.55,0
male,1,679,"Cardeza, Mr. Thomas Drake Martinez",512.3292,1
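
A top-N per group can often be written without `apply()`, by sorting first and then taking the head of each group. The sketch below shows that alternative on hypothetical rows; it is not the notebook's method, and unlike the `apply()` version it keeps the original row index instead of building a MultiIndex.

```python
import pandas as pd

# Hypothetical rows, not the real file
toy = pd.DataFrame({
    "Sex": ["female", "female", "female", "male", "male"],
    "Pclass": [1, 1, 1, 1, 1],
    "Name": ["A", "B", "C", "D", "E"],
    "Fare": [50.0, 200.0, 120.0, 30.0, 90.0],
})

top2 = (
    toy.sort_values("Fare", ascending=False)   # highest fares first
       .groupby(["Sex", "Pclass"])
       .head(2)                                # first 2 rows of each group, in sorted order
       .sort_values(["Sex", "Pclass", "Fare"], ascending=[True, True, False])
)
print(top2)
```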


## 4.3 Pivoting tables

Pivot tables are a typical report format:
- rows = a category
- columns = a category
- values = an aggregated metric

In Pandas, `pivot_table()` is preferable to `pivot()` when the index/column combinations may contain duplicates, because it lets you choose the aggregation function.
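
The difference shows up as soon as one index/column combination appears more than once. In the hypothetical frame below, `("female", 1)` occurs twice, so `pivot()` cannot reshape it while `pivot_table()` aggregates the duplicates.

```python
import pandas as pd

# Hypothetical rows: the pair ("female", 1) appears twice
toy = pd.DataFrame({
    "Sex": ["female", "female", "male", "male"],
    "Pclass": [1, 1, 1, 3],
    "Survived": [1, 0, 0, 1],
})

try:
    toy.pivot(index="Sex", columns="Pclass", values="Survived")
    pivot_failed = False
except ValueError:
    # pivot() refuses to reshape when an index/column pair is duplicated
    pivot_failed = True

# pivot_table() resolves the duplicates with the chosen aggregation
table = toy.pivot_table(index="Sex", columns="Pclass", values="Survived", aggfunc="mean")

print(pivot_failed)
print(table)
```

Combinations that never occur, such as `("female", 3)` here, come out as `NaN` in the pivot table.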


In [44]:
# Survival rate by Sex (rows) and Pclass (columns)
pivot_surv = pd.pivot_table(
    df,
    values='Survived',
    index='Sex',
    columns='Pclass',
    aggfunc='mean'
)
pivot_surv


Pclass,1,2,3
Sex,Unnamed: 1_level_1,Unnamed: 2_level_1,Unnamed: 3_level_1
female,0.968085,0.921053,0.5
male,0.368852,0.157407,0.135447


In [45]:
# Example with several metrics in pivot_table
pivot_multi = pd.pivot_table(
    df,
    values=['Survived','Fare'],
    index='Embarked',
    columns='Pclass',
    aggfunc={'Survived':'mean','Fare':'mean'}
)
pivot_multi


Unnamed: 0_level_0,Fare,Fare,Fare,Survived,Survived,Survived
Pclass,1,2,3,1,2,3
Embarked,Unnamed: 1_level_2,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2,Unnamed: 6_level_2
C,104.718529,25.358335,11.214083,0.694118,0.529412,0.378788
Q,90.0,12.35,11.183393,0.5,0.666667,0.375
S,70.364862,20.327439,14.644083,0.582677,0.463415,0.189802


## 4.4 Unpivoting (melt)

`melt()` converts data from "wide" format (many columns) to "long" format (one variable column plus one value column).

This is especially useful for:
- feeding visualizations,
- standardizing the structure for modeling,
- or making merges simpler.

Here we take `pivot_surv` and bring it to long format.


In [46]:
surv_long = (
    pivot_surv
      .reset_index()
      .melt(id_vars='Sex', var_name='Pclass', value_name='SurvivalRate')
      .sort_values(['Sex','Pclass'])
)
surv_long


Unnamed: 0,Sex,Pclass,SurvivalRate
0,female,1,0.968085
2,female,2,0.921053
4,female,3,0.5
1,male,1,0.368852
3,male,2,0.157407
5,male,3,0.135447
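
The long format is not a dead end: since each `(Sex, Pclass)` pair now appears exactly once, `pivot()` can rebuild the wide table. A sketch of the round trip on a hypothetical 2x2 table (the rates are made up):

```python
import pandas as pd

# Hypothetical wide table shaped like pivot_surv
wide = pd.DataFrame(
    {1: [0.9, 0.4], 2: [0.8, 0.2]},
    index=pd.Index(["female", "male"], name="Sex"),
)
wide.columns.name = "Pclass"

# Wide -> long
long = wide.reset_index().melt(id_vars="Sex", var_name="Pclass", value_name="SurvivalRate")

# Long -> wide again; safe with pivot() because no (Sex, Pclass) pair repeats
back = long.pivot(index="Sex", columns="Pclass", values="SurvivalRate")

print(long)
print(back)
```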


---

## 5. Wrap-up

This workflow connected two ideas:

1) **Preparation / exploration**: understanding the dataset, cleaning, filtering and creating variables.
2) **Grouping / reporting**: summarizing the dataset by category to reach actionable conclusions.

In practice, preparation determines *how reliable* the aggregation is, and aggregation determines *how well* we communicate findings to the business.
