# pandas Tutorial

*pandas* is an open-source Python library for data analysis.

*pandas* provides two main data structures:
* *Series*: a one-dimensional labeled array
* *DataFrame*: a two-dimensional labeled array

The main capabilities of pandas are:
* Importing data
* Cleaning data
* Manipulating data
* Statistical calculations
* Combining data
* Working with time series

Further resources on *pandas*:
- Reference manual: https://pandas.pydata.org/pandas-docs/stable/
- Python for Data Analysis, Wes McKinney
- Python Data Science Handbook, Jake VanderPlas


In [None]:
import pandas as pd

## Series in pandas
* *Series*: a one-dimensional labeled array

In [None]:
ser = pd.Series([100, 'foo', 300, 'bar', 500], ['tom', 'bob', 'nancy', 'dan', 'eric'])

In [None]:
ser

In [None]:
ser.index

In [None]:
# Select by label
ser[['nancy','bob']]
# Equivalent explicit form: ser.loc[['nancy','bob']]

In [None]:
# Select by position
ser.iloc[[4, 3, 1]]
# ser[[4, 3, 1]] relies on a positional fallback for integer keys that recent pandas versions deprecate
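
Label-based and position-based selection are easiest to confuse when the index itself holds integers. The following is a minimal sketch using a hypothetical Series `s` (not part of the tutorial data) to show how `.loc` and `.iloc` differ, including how they treat slice endpoints.

```python
import pandas as pd

# Hypothetical Series whose labels are integers that do not match positions.
s = pd.Series(['a', 'b', 'c'], index=[10, 20, 30])

by_label = s.loc[10]       # look up the label 10
by_position = s.iloc[0]    # look up the first element
last = s.iloc[-1]          # negative positions count from the end

# .loc slices include the end label; .iloc slices exclude the end position.
label_slice = s.loc[10:20]   # labels 10 and 20
pos_slice = s.iloc[0:1]      # only position 0

print(by_label, by_position, last)
print(list(label_slice), list(pos_slice))
```

Using `.loc` and `.iloc` explicitly avoids depending on how plain `[]` resolves integer keys.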

In [None]:
'bob' in ser
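
The `in` operator on a Series tests the index labels, not the stored values. A short sketch of the difference, reusing the tutorial's `ser` data; converting to a list with `tolist()` is just one possible way to search the values.

```python
import pandas as pd

ser = pd.Series([100, 'foo', 300, 'bar', 500],
                ['tom', 'bob', 'nancy', 'dan', 'eric'])

label_found = 'bob' in ser            # 'bob' is an index label
value_as_label = 'foo' in ser         # 'foo' is a value, not a label
value_found = 'foo' in ser.tolist()   # search the values explicitly

print(label_found, value_as_label, value_found)
```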

In [None]:
ser

In [None]:
ser * 2

In [None]:
'foo'*2
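
Because `ser` mixes integers and strings, pandas stores it with the generic `object` dtype, and `*` is applied element by element using each value's own Python behavior. The sketch below rebuilds the tutorial's `ser` to make that visible: numbers are doubled while strings are repeated.

```python
import pandas as pd

ser = pd.Series([100, 'foo', 300, 'bar', 500],
                ['tom', 'bob', 'nancy', 'dan', 'eric'])

doubled = ser * 2

print(ser.dtype)        # mixed Python objects are stored as dtype object
print(doubled['tom'])   # integer multiplication
print(doubled['bob'])   # string repetition, same as 'foo' * 2
```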

In [None]:
ser

In [None]:
ser[['nancy', 'eric']] ** 2

## *DataFrame* in pandas
* *DataFrame*: a two-dimensional labeled array

### Creating a DataFrame from a Python dictionary

In [None]:
d = {'one' : pd.Series([100., 200., 300.], index=['apple', 'ball', 'clock']),
     'two' : pd.Series([111., 222., 333., 4444.], index=['apple', 'ball', 'cerill', 'dancy'])}

In [None]:
df = pd.DataFrame(d)
print(df)

In [None]:
df.index

In [None]:
df.columns

In [None]:
pd.DataFrame(d, index=['dancy', 'ball', 'apple'])

In [None]:
pd.DataFrame(d, index=['dancy', 'ball', 'apple'], columns=['two', 'five'])
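
When a DataFrame is built from a dictionary of Series, the rows are aligned by label: the resulting index is the union of the Series indexes, and any cell without a source value becomes `NaN`. The sketch below reuses the tutorial's `d` to count the missing cells; the names `df_all` and `subset` are introduced here only for illustration.

```python
import pandas as pd

d = {'one': pd.Series([100., 200., 300.], index=['apple', 'ball', 'clock']),
     'two': pd.Series([111., 222., 333., 4444.],
                      index=['apple', 'ball', 'cerill', 'dancy'])}

df_all = pd.DataFrame(d)

# 'one' has no value for 'cerill' or 'dancy'; 'two' has none for 'clock'.
missing_per_column = df_all.isna().sum()
print(missing_per_column)

# Requesting a column that is not a key of d ('five') yields an all-NaN column.
subset = pd.DataFrame(d, index=['dancy', 'ball', 'apple'], columns=['two', 'five'])
print(subset)
```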

### Creating a DataFrame from a Python list of dictionaries

In [None]:
data = [{'alex': 1, 'alice': 3, 'joe': 2}, {'ema': 5, 'dora': 10, 'alice': 20}]

In [None]:
pd.DataFrame(data)

In [None]:
pd.DataFrame(data, index=['orange', 'red'])

In [None]:
pd.DataFrame(data, columns = ['joe', 'dora','alice', 'ema'], index = [0,1])
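
With a list of dictionaries, each dictionary becomes one row and the keys become columns; a key missing from a row produces `NaN`. A minimal sketch with the tutorial's `data` follows (the names `frame` and `complete` are illustrative). It also shows that columns containing `NaN` are stored as floats.

```python
import pandas as pd

data = [{'alex': 1, 'alice': 3, 'joe': 2}, {'ema': 5, 'dora': 10, 'alice': 20}]
frame = pd.DataFrame(data, index=['orange', 'red'])

# 'alice' is the only key present in both dictionaries,
# so it is the only column without missing values.
complete = frame.columns[frame.notna().all()].tolist()

print(complete)
print(frame.dtypes)   # columns with NaN become float; 'alice' stays integer
```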

### DataFrame operations

In [None]:
df

In [None]:
df['one']

In [None]:
df['three'] = df['one'] * df['two']
df

In [None]:
df['flag'] = df['one'] > 250
df

In [None]:
# Remove a column and return it as a Series
three = df.pop('three')

In [None]:
three

In [None]:
df

In [None]:
# Delete a column in place
del df['two']

In [None]:
df

In [None]:
df['one']

In [None]:
df[['one', 'flag']]

In [None]:
# Insert a column at position 2
df.insert(2, 'copy_of_one', df['one'])
df

In [None]:
# Add a column from a partial Series; rows without a matching label get NaN
df['one_upper_half'] = df['one'][:2]
df
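
Assigning a Series to a column aligns it on the row labels rather than on position. The sketch below uses a small stand-in DataFrame (a simplified version of the tutorial's `df`) and a hypothetical `shuffled` Series to illustrate both the `NaN` fill and the label matching.

```python
import pandas as pd

df = pd.DataFrame({'one': [100., 200., 300.]}, index=['apple', 'ball', 'clock'])

# Only the first two rows are supplied, so 'clock' receives NaN.
df['one_upper_half'] = df['one'].iloc[:2]

# Hypothetical Series in a different row order: values still land on matching labels.
shuffled = pd.Series([3., 1., 2.], index=['clock', 'apple', 'ball'])
df['rank'] = shuffled

print(df)
```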

In [None]:
df['one'][:3]

In [None]:
df

In [None]:
(df['one'] == 100) | (df['copy_of_one'] == 300)

In [None]:
df

In [None]:
# Filter rows with a boolean mask
#df[df['one'] == 100]
df[(df['one'] == 100) | (df['copy_of_one'] == 300)]
#df[df['one'].isnull()]
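
Boolean masks can be combined with `|` (or), `&` (and) and `~` (not). Each comparison needs its own parentheses because these operators bind more tightly than `==`, `<` or `>`. A minimal sketch on a simplified stand-in for `df`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'one': [100., 200., np.nan, 300.]},
                  index=['apple', 'ball', 'cerill', 'clock'])

either = df[(df['one'] == 100) | (df['one'] == 300)]   # rows matching either value
between = df[(df['one'] > 100) & (df['one'] < 300)]    # rows strictly between
missing = df[df['one'].isna()]                          # rows where 'one' is NaN
not_100 = df[~(df['one'] == 100)]                       # NaN rows are kept: NaN == 100 is False

print(list(either.index), list(between.index))
print(list(missing.index), list(not_100.index))
```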

In [None]:
s1 = df['one']
print(type(s1))  # a single column is returned as a Series
s1.unique()

In [None]:
df['two'] = [200, 400, 500, 600, 500]

In [None]:
df

In [None]:
df.duplicated()

In [None]:
df.drop_duplicates()

In [None]:
df.dropna()
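
`duplicated`, `drop_duplicates` and `dropna` all return new objects and leave the original DataFrame unchanged. The sketch below uses a small hypothetical DataFrame (the column names `key` and `value` are illustrative) and also shows the `subset` and `keep` parameters of `drop_duplicates`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'a', 'b', 'b'],
                   'value': [1.0, 1.0, 2.0, np.nan]})

dup_mask = df.duplicated()         # True for rows that repeat an earlier row
deduped = df.drop_duplicates()     # keeps the first occurrence of each row
by_key = df.drop_duplicates(subset='key', keep='last')  # one row per key, the last one
no_missing = df.dropna()           # drops rows that contain any NaN

print(dup_mask.tolist())
print(list(deduped.index), list(by_key.index), list(no_missing.index))
```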

In [None]:
df

In [None]:
# Fill each column's NaN values with that column's sum
df.fillna(df.sum())

In [None]:
df.replace(100, 10000)
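
Passing a Series to `fillna` fills each column with the value whose label matches the column name. The tutorial fills with column sums; the sketch below swaps in the column mean as an assumed, more common choice, on a small hypothetical DataFrame. Like `replace`, it returns a new DataFrame.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [1.0, np.nan, 3.0], 'b': [np.nan, 10.0, 20.0]})

filled = df.fillna(df.mean())      # each NaN takes its own column's mean
replaced = df.replace(1.0, -1.0)   # swap one value for another everywhere

print(filled)
print(replaced)
print(df.isna().sum().sum())       # the original still has its NaN values
```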

In [None]:
df

## Data aggregation

In [None]:
df.groupby('one').count()

In [None]:
df.groupby('one').mean()
#df.groupby('one').sum()

In [None]:
# Built-in aggregations can be named as strings, so NumPy is not needed here
df.groupby('one').agg({'copy_of_one': 'mean', 'two': 'sum'})
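
An alternative to the dictionary form is named aggregation, where each output column is declared as `name=(source_column, function)`. A minimal sketch on a hypothetical DataFrame follows; the column and result names are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'group': ['x', 'x', 'y'],
                   'price': [10.0, 20.0, 5.0],
                   'qty': [1, 2, 3]})

summary = df.groupby('group').agg(
    mean_price=('price', 'mean'),   # average price per group
    total_qty=('qty', 'sum'),       # total quantity per group
)

print(summary)
```

This form lets the result columns carry descriptive names and allows several aggregations of the same source column.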

In [None]:
df

In [None]:
df2 = pd.concat([df,df])
df2
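
`pd.concat` stacks the rows and keeps the original labels, so concatenating a DataFrame with itself produces a non-unique index. The sketch below, on a simplified stand-in for `df`, shows this along with `ignore_index=True`, which renumbers the rows instead.

```python
import pandas as pd

df = pd.DataFrame({'one': [100., 200.]}, index=['apple', 'ball'])

stacked = pd.concat([df, df])                         # labels are repeated
renumbered = pd.concat([df, df], ignore_index=True)   # fresh 0..n-1 index

print(list(stacked.index), stacked.index.is_unique)
print(list(renumbered.index))
print(stacked.loc['apple'])   # a repeated label selects every matching row
```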

In [None]:
df2.groupby('one').agg({'copy_of_one': 'sum', 'two': 'sum'})