Reshaping tables
=======

* *60:00 min* | Last modified: August 13, 2021 | YouTube

Data preparation often requires transformations that change the structure of a table, so that the data ends up in a format suitable for analysis and for deriving insights.

By the end of this document, you will be able to reshape a table using the following operators:

* Melt & Pivot.

* Stack & Unstack.

* Pivot tables.

## Setup

In [1]:
import numpy as np
import pandas as pd

pd.set_option("display.notebook_repr_html", False)

In [2]:
iris = pd.read_csv(
    "https://raw.githubusercontent.com/jdvelasq/datalabs/master/datasets/iris.csv",
    sep=",",
    thousands=None,
    decimal=".",
)

iris.head()

   Sepal_Length  Sepal_Width  Petal_Length  Petal_Width Species
0           5.1          3.5           1.4          0.2  setosa
1           4.9          3.0           1.4          0.2  setosa
2           4.7          3.2           1.3          0.2  setosa
3           4.6          3.1           1.5          0.2  setosa
4           5.0          3.6           1.4          0.2  setosa

## Melt & Pivot

In [3]:
#
# Add a key that uniquely identifies each row (case)
#
iris["id"] = range(len(iris))
iris.head()

   Sepal_Length  Sepal_Width  Petal_Length  Petal_Width Species  id
0           5.1          3.5           1.4          0.2  setosa   0
1           4.9          3.0           1.4          0.2  setosa   1
2           4.7          3.2           1.3          0.2  setosa   2
3           4.6          3.1           1.5          0.2  setosa   3
4           5.0          3.6           1.4          0.2  setosa   4

In [4]:
iris_melt = pd.melt(
    iris,
    id_vars="id",
    var_name="Variables",
    value_name="Values",
)
iris_melt.head()

   id     Variables Values
0   0  Sepal_Length    5.1
1   1  Sepal_Length    4.9
2   2  Sepal_Length    4.7
3   3  Sepal_Length    4.6
4   4  Sepal_Length      5

In [5]:
iris_melt.tail()

      id Variables     Values
745  145   Species  virginica
746  146   Species  virginica
747  147   Species  virginica
748  148   Species  virginica
749  149   Species  virginica

In [6]:
iris_melt.pivot(
    index="id",
    columns="Variables",
    values="Values",
).head(10)

Variables Petal_Length Petal_Width Sepal_Length Sepal_Width Species
id                                                                 
0                  1.4         0.2          5.1         3.5  setosa
1                  1.4         0.2          4.9           3  setosa
2                  1.3         0.2          4.7         3.2  setosa
3                  1.5         0.2          4.6         3.1  setosa
4                  1.4         0.2            5         3.6  setosa
5                  1.7         0.4          5.4         3.9  setosa
6                  1.4         0.3          4.6         3.4  setosa
7                  1.5         0.2            5         3.4  setosa
8                  1.4         0.2          4.4         2.9  setosa
9                  1.5         0.1          4.9         3.1  setosa
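In the round trip above, every column of the result has dtype `object`, because `Species` (text) was melted into the same `Values` column as the numeric measurements. The following is a minimal sketch of one way to avoid this, assuming a small hypothetical stand-in table (`sample`) rather than the full iris dataset: keeping `Species` as an identifier column means only numeric columns are melted.

```python
import pandas as pd

# Hypothetical three-row stand-in for the iris table (not the real dataset).
sample = pd.DataFrame(
    {
        "id": [0, 1, 2],
        "Sepal_Length": [5.1, 4.9, 4.7],
        "Sepal_Width": [3.5, 3.0, 3.2],
        "Species": ["setosa", "setosa", "setosa"],
    }
)

# Keeping Species as an identifier leaves only numeric columns to melt,
# so the Values column stays float64 instead of object.
long = sample.melt(
    id_vars=["id", "Species"],
    var_name="Variables",
    value_name="Values",
)

# Pivot back to wide format; only the numeric measurements are spread out.
wide = long.pivot(index="id", columns="Variables", values="Values")

print(long.dtypes)
print(wide)
```

With this layout, `Species` is not part of the pivoted result; if it is needed, it can be joined back from the original table using `id`.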

## Stack & Unstack

In [7]:
iris.stack().head(24)

0  Sepal_Length       5.1
   Sepal_Width        3.5
   Petal_Length       1.4
   Petal_Width        0.2
   Species         setosa
   id                   0
1  Sepal_Length       4.9
   Sepal_Width          3
   Petal_Length       1.4
   Petal_Width        0.2
   Species         setosa
   id                   1
2  Sepal_Length       4.7
   Sepal_Width        3.2
   Petal_Length       1.3
   Petal_Width        0.2
   Species         setosa
   id                   2
3  Sepal_Length       4.6
   Sepal_Width        3.1
   Petal_Length       1.5
   Petal_Width        0.2
   Species         setosa
   id                   3
dtype: object

In [8]:
iris.stack().unstack().head(4)

  Sepal_Length Sepal_Width Petal_Length Petal_Width Species id
0          5.1         3.5          1.4         0.2  setosa  0
1          4.9           3          1.4         0.2  setosa  1
2          4.7         3.2          1.3         0.2  setosa  2
3          4.6         3.1          1.5         0.2  setosa  3
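`stack()` moves the column labels into an inner index level, producing a Series with a two-level index, and `unstack()` moves an index level back to the columns. As a small sketch of how the `level` argument changes the result, assuming a hypothetical two-row table (`small`) with only numeric columns:

```python
import pandas as pd

# Hypothetical two-row table with a named index.
small = pd.DataFrame(
    {"Sepal_Length": [5.1, 4.9], "Sepal_Width": [3.5, 3.0]},
    index=pd.Index([0, 1], name="id"),
)

# Series indexed by (id, column name).
stacked = small.stack()

# By default, unstack() moves the innermost level (the column names)
# back to the columns, recovering the original layout.
restored = stacked.unstack()

# Unstacking level 0 moves the id level to the columns instead,
# which yields the transposed layout.
by_id = stacked.unstack(level=0)

print(stacked)
print(restored)
print(by_id)
```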

## Pivot tables

In [9]:
df = pd.DataFrame(
    {
        "key1": ["a", "a", "b", "b", "c", "c"],
        "key2": ["A", "B", "A", "B", "A", "B"],
        "values1": [1, 2, 3, 4, 5, 6],
        "values2": [7, 8, 9, 10, 11, 12],
    }
)
df

  key1 key2  values1  values2
0    a    A        1        7
1    a    B        2        8
2    b    A        3        9
3    b    B        4       10
4    c    A        5       11
5    c    B        6       12

In [10]:
pd.pivot_table(
    df,
    index=["key1", "key2"],
    values=["values1", "values2"],
)

           values1  values2
key1 key2                  
a    A           1        7
     B           2        8
b    A           3        9
     B           4       10
c    A           5       11
     B           6       12

In [11]:
pd.pivot_table(
    df,
    index=["key2", "key1"],
    values=["values1", "values2"],
)

           values1  values2
key2 key1                  
A    a           1        7
     b           3        9
     c           5       11
B    a           2        8
     b           4       10
     c           6       12
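In the two calls above, each `(key1, key2)` pair occurs exactly once, so the default aggregation of `pd.pivot_table` (the mean) simply returns the original values. The sketch below, which rebuilds the same `df` so that it runs on its own, shows two variations that are not part of the original notebook: spreading `key2` across the columns, and aggregating several rows per group with `aggfunc="sum"`.

```python
import pandas as pd

df = pd.DataFrame(
    {
        "key1": ["a", "a", "b", "b", "c", "c"],
        "key2": ["A", "B", "A", "B", "A", "B"],
        "values1": [1, 2, 3, 4, 5, 6],
        "values2": [7, 8, 9, 10, 11, 12],
    }
)

# key1 becomes the row index and key2 the columns; each cell holds values1.
table = pd.pivot_table(
    df,
    index="key1",
    columns="key2",
    values="values1",
    aggfunc="sum",
)

# Grouping by key1 only: each group now has two rows, which are summed.
totals = pd.pivot_table(
    df,
    index="key1",
    values=["values1", "values2"],
    aggfunc="sum",
)

print(table)
print(totals)
```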