# Introduction to the pandas module
## Brief description and examples.
## Part 2. Selection with slice indices and dimensionality reduction.

### Marco Arieli Herrera-Valdez$^1$
#### $^1$ Laboratorio de Fisiología de Sistemas, Departamento de Matemáticas, Facultad de Ciencias, Universidad Nacional Autónoma de México

(Based on the tutorial https://pandas.pydata.org/pandas-docs/stable/getting_started/10min.html)


In [1]:
import pandas as pd
import numpy as np

## Selecting rows, columns, and parts of data frames

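In the previous tutorial we defined some series and data frames to work with. As a quick recap, the next cell rebuilds a data frame of random values indexed by dates. The values are drawn without a fixed seed, so they change on every run.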

In [11]:
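n = 10
# Daily DatetimeIndex with n dates starting on 2020-02-21
fechas = pd.date_range('20200221', periods=n)
print(fechas)
# n x 4 frame of standard normal draws, indexed by date, with columns A to D
df = pd.DataFrame(np.random.randn(len(fechas), 4), index=fechas, columns=list('ABCD'))
print(df)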

DatetimeIndex(['2020-02-21', '2020-02-22', '2020-02-23', '2020-02-24',
               '2020-02-25', '2020-02-26', '2020-02-27', '2020-02-28',
               '2020-02-29', '2020-03-01'],
              dtype='datetime64[ns]', freq='D')
                   A         B         C         D
2020-02-21 -1.211513 -0.824221 -0.712894  1.167004
2020-02-22  1.188228  0.347568  0.238080 -0.028168
2020-02-23 -1.320037  0.685981  0.045797  0.562786
2020-02-24  0.066765 -1.624689 -0.187643  1.231179
2020-02-25  2.183944  0.849512 -0.846504  2.043354
2020-02-26 -0.446159 -0.118173  1.128412 -0.011153
2020-02-27  0.836515 -0.400387 -0.532562 -0.411861
2020-02-28  0.354877  1.690893  1.390138 -0.305671
2020-02-29 -0.452760  1.950024  1.051856 -1.953903
2020-03-01 -0.251373  0.440664  0.941089 -2.285693


Selecting a single column yields a Series; df['A'] is equivalent to df.A

In [12]:
df['A']

2020-02-21   -1.211513
2020-02-22    1.188228
2020-02-23   -1.320037
2020-02-24    0.066765
2020-02-25    2.183944
2020-02-26   -0.446159
2020-02-27    0.836515
2020-02-28    0.354877
2020-02-29   -0.452760
2020-03-01   -0.251373
Freq: D, Name: A, dtype: float64

In [13]:
df.A

2020-02-21   -1.211513
2020-02-22    1.188228
2020-02-23   -1.320037
2020-02-24    0.066765
2020-02-25    2.183944
2020-02-26   -0.446159
2020-02-27    0.836515
2020-02-28    0.354877
2020-02-29   -0.452760
2020-03-01   -0.251373
Freq: D, Name: A, dtype: float64
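
The equivalence between bracket and attribute access holds only while the column name does not collide with an existing DataFrame attribute. The following is a minimal sketch of that caveat, assuming a deterministic frame built with np.arange in place of the random values, and a hypothetical column named mean that is not part of the original tutorial.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic values (0.0 .. 39.0) instead of random draws
fechas = pd.date_range("20200221", periods=10)
df = pd.DataFrame(np.arange(40.0).reshape(10, 4), index=fechas, columns=list("ABCD"))

# Both spellings return the same Series
iguales = df["A"].equals(df.A)
print(iguales)

# Hypothetical column whose name matches an existing DataFrame method
df["mean"] = 0.0
print(type(df["mean"]).__name__)  # bracket access reaches the column
print(callable(df.mean))          # attribute access reaches the method instead
```

For that reason, bracket access is usually the safer choice in scripts, while attribute access remains convenient for interactive exploration.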

Square brackets with a slice select rows by position; as in Python, the end of the slice is excluded

In [14]:
df[1:3]

                   A         B         C         D
2020-02-22  1.188228  0.347568  0.238080 -0.028168
2020-02-23 -1.320037  0.685981  0.045797  0.562786
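
To make the row-slicing behaviour concrete, the sketch below checks that an integer slice inside square brackets picks rows by position and leaves out the stop position. It assumes a deterministic frame built with np.arange; the names filas and misma are introduced here only for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic values instead of random draws
fechas = pd.date_range("20200221", periods=10)
df = pd.DataFrame(np.arange(40.0).reshape(10, 4), index=fechas, columns=list("ABCD"))

# Rows at positions 1 and 2; position 3 is excluded
filas = df[1:3]
print(filas.index.strftime("%Y-%m-%d").tolist())

# The same rows come back from the explicit positional indexer
misma = filas.equals(df.iloc[1:3])
print(misma)
```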


Selection by label

In [23]:
print(fechas[2])
df.loc[fechas[2]]

2020-02-23 00:00:00


A   -1.320037
B    0.685981
C    0.045797
D    0.562786
Name: 2020-02-23 00:00:00, dtype: float64
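
Selecting one row label returns a Series whose index holds the column names and whose name is the row label. A minimal sketch of that structure follows, assuming a deterministic frame built with np.arange; fila is a name introduced here for illustration.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic values instead of random draws
fechas = pd.date_range("20200221", periods=10)
df = pd.DataFrame(np.arange(40.0).reshape(10, 4), index=fechas, columns=list("ABCD"))

fila = df.loc[fechas[2]]

print(type(fila).__name__)  # the row comes back as a Series
print(list(fila.index))     # the column labels become the index of the Series
print(fila.name)            # the row label becomes the name of the Series
```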

Selection on several axes by label

In [22]:
df.loc[:,['A','B']]

                   A         B
2020-02-21 -1.211513 -0.824221
2020-02-22  1.188228  0.347568
2020-02-23 -1.320037  0.685981
2020-02-24  0.066765 -1.624689
2020-02-25  2.183944  0.849512
2020-02-26 -0.446159 -0.118173
2020-02-27  0.836515 -0.400387
2020-02-28  0.354877  1.690893
2020-02-29 -0.452760  1.950024
2020-03-01 -0.251373  0.440664


Slices. With label-based slicing, both endpoints are included.

In [25]:
df.loc['20200222':'20200229',['C','B']]

                   C         B
2020-02-22  0.238080  0.347568
2020-02-23  0.045797  0.685981
2020-02-24 -0.187643 -1.624689
2020-02-25 -0.846504  0.849512
2020-02-26  1.128412 -0.118173
2020-02-27 -0.532562 -0.400387
2020-02-28  1.390138  1.690893
2020-02-29  1.051856  1.950024
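
The inclusive endpoints of label slices contrast with positional slices, which exclude the stop. The sketch below compares the two on the same range of dates, assuming a deterministic frame built with np.arange; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic values instead of random draws
fechas = pd.date_range("20200221", periods=10)
df = pd.DataFrame(np.arange(40.0).reshape(10, 4), index=fechas, columns=list("ABCD"))

# Label slice: both 2020-02-22 and 2020-02-29 are kept (8 rows)
por_etiqueta = df.loc["20200222":"20200229", ["C", "B"]]

# Positional slice with the positions of the same two dates (1 and 8):
# the stop is excluded, so only 7 rows come back
por_posicion = df.iloc[1:8]

print(por_etiqueta.shape, por_posicion.shape)

# Matching the label slice by position requires a stop of 9
mismas_filas = por_etiqueta.index.equals(df.iloc[1:9].index)
print(mismas_filas)
```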


Dimensionality reduction: selecting a single row label returns a Series instead of a data frame

In [26]:
df.loc['20200224',['A','D']]

A    0.066765
D    1.231179
Name: 2020-02-24 00:00:00, dtype: float64

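Obtaining scalars through further dimensionality reduction. Note that passing the column as a list, ['D'], still returns a one-element Series; passing the bare label 'D' instead returns the scalar itself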

In [27]:
df.loc['20200224',['D']]

D    1.231179
Name: 2020-02-24 00:00:00, dtype: float64
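
The output above is still a Series of length one, because the column was given as a list. The following sketch contrasts the list form with the bare label, assuming a deterministic frame built with np.arange (so the value at row 3, column D is 15.0); the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic values, where the entry at row i, column j is 4*i + j
fechas = pd.date_range("20200221", periods=10)
df = pd.DataFrame(np.arange(40.0).reshape(10, 4), index=fechas, columns=list("ABCD"))

# A list of column labels keeps one dimension: a Series of length 1
como_serie = df.loc[fechas[3], ["D"]]

# A bare column label drops that dimension too: a scalar
como_escalar = df.loc[fechas[3], "D"]

print(type(como_serie).__name__, len(como_serie))
print(como_escalar)
```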

Another, faster way to obtain a scalar

In [29]:
df.at[fechas[1],'C']

0.23808031478653507
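
The at accessor takes exactly one row label and one column label, and it can be used for reading and for writing a single cell. A minimal sketch follows, assuming a deterministic frame built with np.arange; the replacement value -1.0 is arbitrary.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic values, where the entry at row i, column j is 4*i + j
fechas = pd.date_range("20200221", periods=10)
df = pd.DataFrame(np.arange(40.0).reshape(10, 4), index=fechas, columns=list("ABCD"))

# Reading: at and loc agree when both receive single labels
valor_at = df.at[fechas[1], "C"]
valor_loc = df.loc[fechas[1], "C"]
print(valor_at, valor_loc)

# Writing: at also assigns a single cell in place
df.at[fechas[1], "C"] = -1.0
print(df.loc[fechas[1], "C"])
```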

### Selection by position

Selection uses integer indices, as in numpy arrays. The first index refers to rows, the second to columns.

In [34]:
df.iloc[2:4,1:4]

                   B         C         D
2020-02-23  0.685981  0.045797  0.562786
2020-02-24 -1.624689 -0.187643  1.231179
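
The analogy with numpy can be checked directly: slicing with iloc should give the same values as slicing the underlying array with the same indices. The sketch below assumes a deterministic frame built with np.arange; sub and arr are illustrative names.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic values instead of random draws
fechas = pd.date_range("20200221", periods=10)
df = pd.DataFrame(np.arange(40.0).reshape(10, 4), index=fechas, columns=list("ABCD"))

# Rows 2 and 3, columns 1 to 3 (stops excluded, as in numpy)
sub = df.iloc[2:4, 1:4]

# Same slice applied to the plain numpy array
arr = df.to_numpy()[2:4, 1:4]

print(sub.shape)
print(np.array_equal(sub.to_numpy(), arr))
```

The difference is that the data frame result keeps its row and column labels, while the numpy result holds only the values.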


In [35]:
df.iloc[[4,3,2],[2,3,1]]

                   C         D         B
2020-02-25 -0.846504  2.043354  0.849512
2020-02-24 -0.187643  1.231179 -1.624689
2020-02-23  0.045797  0.562786  0.685981
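
Lists of integers select and reorder rows and columns at once. Here the numpy analogy needs some care: with two lists, iloc returns the full grid of the requested rows and columns, whereas plain numpy indexing pairs the lists element by element. The sketch below illustrates this, assuming a deterministic frame built with np.arange and using np.ix_ to build the equivalent grid in numpy.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic values, where the entry at row i, column j is 4*i + j
fechas = pd.date_range("20200221", periods=10)
df = pd.DataFrame(np.arange(40.0).reshape(10, 4), index=fechas, columns=list("ABCD"))

filas, columnas = [4, 3, 2], [2, 3, 1]

# iloc: 3 x 3 block, in the order given by the lists
sel = df.iloc[filas, columnas]
print(list(sel.columns))

arr = df.to_numpy()
# numpy with two lists: pairs (4,2), (3,3), (2,1), so only 3 values
por_pares = arr[filas, columnas]
# np.ix_ builds the row-by-column grid that matches iloc
rejilla = arr[np.ix_(filas, columnas)]

print(por_pares.shape, rejilla.shape)
print(np.array_equal(sel.to_numpy(), rejilla))
```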


To select rows only, a single index is enough

In [31]:
df.iloc[0]

A   -1.211513
B   -0.824221
C   -0.712894
D    1.167004
Name: 2020-02-21 00:00:00, dtype: float64

In [44]:
df.iloc[2:4]

                   A         B         C         D
2020-02-23 -1.320037  0.685981  0.045797  0.562786
2020-02-24  0.066765 -1.624689 -0.187643  1.231179


Slicing rows explicitly, with a full slice for the columns

In [37]:
df.iloc[2:4,:]

                   A         B         C         D
2020-02-23 -1.320037  0.685981  0.045797  0.562786
2020-02-24  0.066765 -1.624689 -0.187643  1.231179


Selecting columns explicitly, with a full slice for the rows

In [41]:
df.iloc[:,[1,3]]

                   B         D
2020-02-21 -0.824221  1.167004
2020-02-22  0.347568 -0.028168
2020-02-23  0.685981  0.562786
2020-02-24 -1.624689  1.231179
2020-02-25  0.849512  2.043354
2020-02-26 -0.118173 -0.011153
2020-02-27 -0.400387 -0.411861
2020-02-28  1.690893 -0.305671
2020-02-29  1.950024 -1.953903
2020-03-01  0.440664 -2.285693
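
Positional and label-based column selection can be translated into each other through the column index. The sketch below shows one way to do that, assuming a deterministic frame built with np.arange; it relies on Index.get_indexer to map labels to positions.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic values instead of random draws
fechas = pd.date_range("20200221", periods=10)
df = pd.DataFrame(np.arange(40.0).reshape(10, 4), index=fechas, columns=list("ABCD"))

# Positions -> labels
etiquetas = df.columns[[1, 3]].tolist()
print(etiquetas)

# Labels -> positions
posiciones = df.columns.get_indexer(["B", "D"]).tolist()
print(posiciones)

# Both routes select the same sub-frame
mismo = df.iloc[:, [1, 3]].equals(df.loc[:, ["B", "D"]])
print(mismo)
```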


To obtain a single value

In [42]:
df.iloc[2,2]

0.04579683161058737

which is equivalent to

In [43]:
df.iat[2,2]

0.04579683161058737
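
The positional scalar accessors have label-based counterparts, and all of them address the same cell. A minimal sketch follows, assuming a deterministic frame built with np.arange so that the value at row 2, column 2 is 10.0.

```python
import numpy as np
import pandas as pd

# Assumption: deterministic values, where the entry at row i, column j is 4*i + j
fechas = pd.date_range("20200221", periods=10)
df = pd.DataFrame(np.arange(40.0).reshape(10, 4), index=fechas, columns=list("ABCD"))

por_iloc = df.iloc[2, 2]          # general positional indexer
por_iat = df.iat[2, 2]            # positional, scalars only
por_at = df.at[fechas[2], "C"]    # label-based, scalars only
por_numpy = df.to_numpy()[2, 2]   # plain numpy indexing on the values

print(por_iloc, por_iat, por_at, por_numpy)
```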