## Multi-indexing
So far we have seen pandas Series and DataFrames with simple indexes, but they can also have hierarchical indexes (multi-indexes), which label each row with two or more keys and make it possible to select and aggregate data level by level.
### Creating multi-indexes
We can create a multi-index in four different ways:
* pd.MultiIndex.from_arrays()
* pd.MultiIndex.from_tuples()
* pd.MultiIndex.from_product()
* pd.MultiIndex.from_frame()
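
As a quick check that the four constructors are interchangeable, the following minimal sketch builds the same small index with each of them and compares the results. The sample values and variable names are illustrative assumptions, not part of the examples that follow.

```python
import pandas as pd

# Illustrative data (assumption): two years and two countries.
years = [2018, 2018, 2019, 2019]
countries = ["Spain", "France", "Spain", "France"]
names = ["Year", "Country"]

# One array per level.
from_arrays = pd.MultiIndex.from_arrays([years, countries], names=names)
# One tuple per row.
from_tuples = pd.MultiIndex.from_tuples(list(zip(years, countries)), names=names)
# Cartesian product of the unique values of each level.
from_product = pd.MultiIndex.from_product([[2018, 2019], ["Spain", "France"]], names=names)
# One DataFrame column per level; level names come from the column names.
from_frame = pd.MultiIndex.from_frame(pd.DataFrame({"Year": years, "Country": countries}))

# MultiIndex.equals compares the index entries element by element.
all_equal = all(
    from_arrays.equals(other) for other in (from_tuples, from_product, from_frame)
)
print(all_equal)
```

Which constructor to use mostly depends on the shape the labels already have: separate columns, row tuples, a full grid of combinations, or an existing DataFrame.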


In [1]:
import pandas as pd
import numpy as np

#### Option 1: Multi-index from a list of arrays (pd.MultiIndex.from_arrays())

In [2]:
index = pd.MultiIndex.from_arrays([[2018, 2018, 2018, 2019, 2019, 2019],
        ["Spain", "Portugal", "France", "Spain", "Portugal", "France"]],
        names = ["Year", "Country"])
index

MultiIndex([(2018,    'Spain'),
            (2018, 'Portugal'),
            (2018,   'France'),
            (2019,    'Spain'),
            (2019, 'Portugal'),
            (2019,   'France')],
           names=['Year', 'Country'])

In [3]:
# If we use this index in a DataFrame
data = pd.DataFrame(data = [18, 20, 10, 15, 12, 18], index = index, columns = ["Sales"])
data

               Sales
Year Country
2018 Spain        18
     Portugal     20
     France       10
2019 Spain        15
     Portugal     12
     France       18


#### Option 2: Multi-index from a list of tuples (pd.MultiIndex.from_tuples())

In [4]:
index = pd.MultiIndex.from_tuples([
    (2018, "Spain"),
    (2018, "Portugal"),
    (2018, "France"),
    (2019, "Spain"),
    (2019, "Portugal"),
    (2019, "France")],
    names = ["Year", "Country"])
index

MultiIndex([(2018,    'Spain'),
            (2018, 'Portugal'),
            (2018,   'France'),
            (2019,    'Spain'),
            (2019, 'Portugal'),
            (2019,   'France')],
           names=['Year', 'Country'])

In [5]:
data = pd.DataFrame(data = [18, 20, 10, 15, 12, 18], index = index, columns = ["Sales"])
data

               Sales
Year Country
2018 Spain        18
     Portugal     20
     France       10
2019 Spain        15
     Portugal     12
     France       18


#### Option 3: Multi-index from a Cartesian product of arrays (pd.MultiIndex.from_product())

In [6]:
index = pd.MultiIndex.from_product([[2018, 2019],["Spain", "Portugal", "France"]],
    names = ["Year", "Country"]
)
index

MultiIndex([(2018,    'Spain'),
            (2018, 'Portugal'),
            (2018,   'France'),
            (2019,    'Spain'),
            (2019, 'Portugal'),
            (2019,   'France')],
           names=['Year', 'Country'])

In [7]:
data = pd.DataFrame(data = [18, 20, 10, 15, 12, 18], index = index, columns = ["Sales"])
data

               Sales
Year Country
2018 Spain        18
     Portugal     20
     France       10
2019 Spain        15
     Portugal     12
     France       18


#### Option 4: Multi-index from a DataFrame (pd.MultiIndex.from_frame())

In [8]:
df = pd.DataFrame({
    "Year":[2018, 2018, 2018, 2019, 2019, 2019],
    "Country": ["Spain", "Portugal", "France", "Spain", "Portugal", "France"]
})
df

   Year   Country
0  2018     Spain
1  2018  Portugal
2  2018    France
3  2019     Spain
4  2019  Portugal
5  2019    France


In [9]:
index = pd.MultiIndex.from_frame(df)
index

MultiIndex([(2018,    'Spain'),
            (2018, 'Portugal'),
            (2018,   'France'),
            (2019,    'Spain'),
            (2019, 'Portugal'),
            (2019,   'France')],
           names=['Year', 'Country'])

In [10]:
data = pd.DataFrame(data = [18, 20, 10, 15, 12, 18], index = index, columns = ["Sales"])
data

               Sales
Year Country
2018 Spain        18
     Portugal     20
     France       10
2019 Spain        15
     Portugal     12
     France       18


#### Extracting one level of the index


In [11]:
index = pd.MultiIndex.from_product(
    [[2018, 2019],["Spain", "Portugal", "France"]],
    names = ["Year", "Country"]
)
data = pd.DataFrame(data = [18, 20, 10, 15, 12, 18], index = index, columns = ["Sales"])
data

               Sales
Year Country
2018 Spain        18
     Portugal     20
     France       10
2019 Spain        15
     Portugal     12
     France       18


In [12]:
data.index.get_level_values(0)

Int64Index([2018, 2018, 2018, 2019, 2019, 2019], dtype='int64', name='Year')

In [13]:
data.index.get_level_values(1)

Index(['Spain', 'Portugal', 'France', 'Spain', 'Portugal', 'France'], dtype='object', name='Country')

In [14]:
data.index.get_level_values("Year")

Int64Index([2018, 2018, 2018, 2019, 2019, 2019], dtype='int64', name='Year')

In [15]:
data.index.get_level_values("Country")

Index(['Spain', 'Portugal', 'France', 'Spain', 'Portugal', 'France'], dtype='object', name='Country')
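
One common use of `get_level_values()` is to build a boolean mask over a single level and filter rows with it. The sketch below rebuilds the same `data` frame so it runs on its own; the list of countries to keep and the names `is_iberian` and `iberian` are illustrative assumptions.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [[2018, 2019], ["Spain", "Portugal", "France"]],
    names=["Year", "Country"],
)
data = pd.DataFrame({"Sales": [18, 20, 10, 15, 12, 18]}, index=index)

# Boolean mask with one entry per row, computed from the "Country" level only.
is_iberian = data.index.get_level_values("Country").isin(["Spain", "Portugal"])

# Keep the rows where the mask is True; the multi-index is preserved.
iberian = data[is_iberian]
print(iberian)
```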

#### Selecting with multi-indexes


In [16]:
data

               Sales
Year Country
2018 Spain        18
     Portugal     20
     France       10
2019 Spain        15
     Portugal     12
     France       18


In [17]:
# We can extract the rows for the year 2018 with this expression:
data.loc[2018]

          Sales
Country
Spain        18
Portugal     20
France       10


In [18]:
data.loc[(2018, "Spain")]

Sales    18
Name: (2018, Spain), dtype: int64
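
`data.loc[...]` selects starting from the outermost level (`Year`). To select by an inner level such as `Country`, one option is `DataFrame.xs()` with its `level` argument. This is a minimal sketch that goes slightly beyond the cells above; the variable name `spain` is an illustrative assumption.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [[2018, 2019], ["Spain", "Portugal", "France"]],
    names=["Year", "Country"],
)
data = pd.DataFrame({"Sales": [18, 20, 10, 15, 12, 18]}, index=index)

# Cross-section: all rows whose "Country" level equals "Spain".
# By default the selected level is dropped, leaving "Year" as the index.
spain = data.xs("Spain", level="Country")
print(spain)
```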

#### Applying statistical functions

In [19]:
data

               Sales
Year Country
2018 Spain        18
     Portugal     20
     France       10
2019 Spain        15
     Portugal     12
     France       18


In [20]:
# We can compute the mean of each column over all rows
data.mean()

Sales    15.5
dtype: float64

In [21]:
# To compute the mean at a given index level, group by that level.
# (The older data.mean(level=...) form was removed in pandas 2.0.)
data.groupby(level = "Year").mean() # Computes the mean for each year

      Sales
Year
2018   16.0
2019   15.0


In [23]:
# sort = False keeps the countries in their order of appearance
data.groupby(level = "Country", sort = False).mean()

          Sales
Country
Spain      16.5
Portugal   16.0
France     14.0
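
Grouping by an index level with `groupby(level=...)` also allows several statistics to be computed in one call. The following is a minimal, self-contained sketch; the choice of `mean`, `sum` and `max` and the name `yearly` are illustrative assumptions.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [[2018, 2019], ["Spain", "Portugal", "France"]],
    names=["Year", "Country"],
)
data = pd.DataFrame({"Sales": [18, 20, 10, 15, 12, 18]}, index=index)

# Group rows by the "Year" level and compute several statistics at once.
# The result has one row per year and one column per statistic.
yearly = data.groupby(level="Year")["Sales"].agg(["mean", "sum", "max"])
print(yearly)
```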
