# Scientific computing in Python | NumPy | Pandas

### Installing packages/libraries

The packages used in this course are already installed with Anaconda in the `base` environment. To see which packages are installed in `base`, use either Anaconda's graphical interface or the command line with the following command:

```
conda list
```

To install packages that are not included by default there are again two options: the graphical interface, or the command line with one of the following commands:
```
conda install scikit-learn
pip install scikit-learn
```

## NumPy

* NumPy (Numerical Python) is one of the most important Python packages.

* Many other packages build on functionality from this base package, so it is important to know the basic concepts of NumPy.

* NumPy has a specific data structure called `ndarray`, which represents an N-dimensional vector/matrix.

To import the NumPy module into the current Python session:

In [1]:
import numpy as np

Important: module imports are usually placed at the top of Python scripts.

We can list all the names (functions, constants, submodules) that NumPy exposes with the `dir` function:

In [2]:
dir(np)

['ALLOW_THREADS',
 'AxisError',
 'BUFSIZE',
 'CLIP',
 'DataSource',
 'ERR_CALL',
 'ERR_DEFAULT',
 'ERR_IGNORE',
 'ERR_LOG',
 'ERR_PRINT',
 'ERR_RAISE',
 'ERR_WARN',
 'FLOATING_POINT_SUPPORT',
 'FPE_DIVIDEBYZERO',
 'FPE_INVALID',
 'FPE_OVERFLOW',
 'FPE_UNDERFLOW',
 'False_',
 'Inf',
 'Infinity',
 'MAXDIMS',
 'MAY_SHARE_BOUNDS',
 'MAY_SHARE_EXACT',
 'MachAr',
 'NAN',
 'NINF',
 'NZERO',
 'NaN',
 'PINF',
 'PZERO',
 'RAISE',
 'SHIFT_DIVIDEBYZERO',
 'SHIFT_INVALID',
 'SHIFT_OVERFLOW',
 'SHIFT_UNDERFLOW',
 'ScalarType',
 'Tester',
 'TooHardError',
 'True_',
 'UFUNC_BUFSIZE_DEFAULT',
 'UFUNC_PYVALS_NAME',
 'WRAP',
 '_NoValue',
 '_UFUNC_API',
 '__NUMPY_SETUP__',
 '__all__',
 '__builtins__',
 '__cached__',
 '__config__',
 '__dir__',
 '__doc__',
 '__file__',
 '__getattr__',
 '__git_revision__',
 '__loader__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_add_newdoc_ufunc',
 '_distributor_init',
 '_globals',
 '_mat',
 '_pytesttester',
 'abs',
 'absolute',
 'add',
 'add_

We define an `ndarray` with the `np.array` function:

In [3]:
a = np.array([1, 2, 3])
a

array([1, 2, 3])

In [4]:
type(a)

numpy.ndarray

In [5]:
dir(a)

['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_function__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__complex__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__ilshift__',
 '__imatmul__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 '__

Every `ndarray` is associated with a data type (float, float32, float64, int, ...), and all the elements it holds must be of that same type. We can check which data type an `ndarray` has with `dtype`:

In [6]:
a.dtype

dtype('int32')
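
As a small illustrative sketch of the single-dtype rule (the variable names below are made up for this example), NumPy converts mixed inputs to a common type, and a type can also be requested explicitly. Note that the default integer type, `int32` in the output above, depends on the platform and NumPy version.

```python
import numpy as np

# Mixing integers and floats: every element is upcast to a common dtype.
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)   # float64

# A dtype can be requested explicitly when the array is created.
small = np.array([1, 2, 3], dtype=np.float32)
print(small.dtype)   # float32

# astype returns a converted copy; float -> int drops the fractional part.
truncated = mixed.astype(np.int64)
print(truncated)     # [1 2 3]
```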

Now let's define a two-dimensional array:

In [7]:
b = np.array([[1.3, 2.4],[0.3, 4.1]])
b

array([[1.3, 2.4],
       [0.3, 4.1]])

In [8]:
b.dtype

dtype('float64')

In [9]:
b.shape

(2, 2)

In [10]:
b.ndim

2

In [11]:
b.size

4

We can define `ndarray`s with other types of elements:

In [12]:
c = np.array([['a', 'b'],['c', 'd']])
c

array([['a', 'b'],
       ['c', 'd']], dtype='<U1')

In [13]:
d = np.array([[1, 2, 3],[4, 5, 6]], dtype=complex)
d

array([[1.+0.j, 2.+0.j, 3.+0.j],
       [4.+0.j, 5.+0.j, 6.+0.j]])

### Functions for creating `ndarray`s:

In [14]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [15]:
np.ones((3, 3))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [16]:
np.arange(0, 10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
np.arange(4, 10)

array([4, 5, 6, 7, 8, 9])

In [18]:
np.arange(0, 12, 3)

array([0, 3, 6, 9])

In [19]:
np.arange(0, 6, 0.6)

array([0. , 0.6, 1.2, 1.8, 2.4, 3. , 3.6, 4.2, 4.8, 5.4])

In [20]:
np.linspace(0, 10, 5)

array([ 0. ,  2.5,  5. ,  7.5, 10. ])

In [21]:
np.random.random(3)

array([0.43122818, 0.29135981, 0.82425344])

In [22]:
np.random.random((3, 3))

array([[0.80545447, 0.39337949, 0.46895882],
       [0.37764337, 0.71051177, 0.45382529],
       [0.19417507, 0.16479578, 0.74048171]])

The `reshape` function:

In [23]:
np.arange(0,12).reshape(3, 4)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
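
A short sketch of two details of `reshape` that are easy to miss, assuming the same 12-element array as above: one dimension can be given as `-1` so that NumPy infers it, and a shape that does not match the number of elements raises an error.

```python
import numpy as np

v = np.arange(12)

# -1 asks NumPy to infer that dimension from the total size (12 / 3 = 4).
m = v.reshape(3, -1)
print(m.shape)      # (3, 4)

# reshape(-1) flattens back to one dimension.
flat = m.reshape(-1)
print(flat.shape)   # (12,)

# The new shape must hold exactly the same number of elements.
try:
    v.reshape(5, 3)
    reshape_failed = False
except ValueError as err:
    reshape_failed = True
    print("cannot reshape:", err)
```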

### Arithmetic operations

In [24]:
a = np.arange(4)
a

array([0, 1, 2, 3])

In [25]:
a+4

array([4, 5, 6, 7])

In [26]:
a*2

array([0, 2, 4, 6])

In [27]:
b = np.arange(4, 8)
b

array([4, 5, 6, 7])

In [28]:
a + b

array([ 4,  6,  8, 10])

In [29]:
a - b

array([-4, -4, -4, -4])

In [30]:
a * b

array([ 0,  5, 12, 21])

In [31]:
A = np.arange(0, 9).reshape(3, 3)
A

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [32]:
B = np.ones((3, 3))
B

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [33]:
A * B

array([[0., 1., 2.],
       [3., 4., 5.],
       [6., 7., 8.]])

### Matrix multiplication

So far we have only performed element-wise operations on `ndarray`s. Now let's look at another kind of operation, such as matrix multiplication.

In [34]:
np.dot(A, B)

array([[ 3.,  3.,  3.],
       [12., 12., 12.],
       [21., 21., 21.]])

In [35]:
A.dot(B)

array([[ 3.,  3.,  3.],
       [12., 12., 12.],
       [21., 21., 21.]])

In [36]:
np.dot(B, A)

array([[ 9., 12., 15.],
       [ 9., 12., 15.],
       [ 9., 12., 15.]])
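
As a complement to `np.dot`, the sketch below uses the `@` operator, which Python provides for matrix multiplication and which NumPy arrays support. It rebuilds the same `A` and `B` as above and contrasts the element-wise product with the matrix product.

```python
import numpy as np

A = np.arange(9).reshape(3, 3)
B = np.ones((3, 3))

elementwise = A * B      # multiplies matching entries
matrix_product = A @ B   # rows of A times columns of B

# For 2-D arrays, @ gives the same result as np.dot.
print(np.array_equal(matrix_product, np.dot(A, B)))   # True

# Matrix multiplication is not commutative in general.
print(np.array_equal(A @ B, B @ A))                   # False
```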

### More functions for `ndarray`s

In [37]:
a = np.arange(1, 10)
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [38]:
np.sqrt(a)

array([1.        , 1.41421356, 1.73205081, 2.        , 2.23606798,
       2.44948974, 2.64575131, 2.82842712, 3.        ])

In [39]:
np.log(a)

array([0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791,
       1.79175947, 1.94591015, 2.07944154, 2.19722458])

In [40]:
np.sin(a)

array([ 0.84147098,  0.90929743,  0.14112001, -0.7568025 , -0.95892427,
       -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])

In [41]:
a.sum()

45

In [42]:
a.min()

1

In [43]:
a.max()

9

In [44]:
a.mean()

5.0

In [45]:
a.std()

2.581988897471611

### Manipulating vectors and matrices

In [46]:
a = np.arange(1, 10)
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [47]:
a[4]

5

In [48]:
a[-4]

6

In [49]:
a[:5]

array([1, 2, 3, 4, 5])

In [50]:
a[[1,2,8]]

array([2, 3, 9])

In [51]:
A = np.arange(1, 10).reshape((3, 3))
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [52]:
A[1, 2]

6

In [53]:
A[:,0]

array([1, 4, 7])

In [54]:
A[0:2, 0:2]

array([[1, 2],
       [4, 5]])

In [55]:
A[[0,2], 0:2]

array([[1, 2],
       [7, 8]])

### Iterating over an array

In [56]:
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [57]:
for i in a:
    print(i)

1
2
3
4
5
6
7
8
9


In [58]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [59]:
for row in A:
    print(row)

[1 2 3]
[4 5 6]
[7 8 9]


In [60]:
for i in A.flat:
    print(i)

1
2
3
4
5
6
7
8
9


If we need to apply a function along the columns or rows of a matrix, there is a more elegant and concise way to do it than writing a `for` loop.

In [61]:
np.apply_along_axis(np.mean, axis=0, arr=A)

array([4., 5., 6.])

In [62]:
np.apply_along_axis(np.mean, axis=1, arr=A)

array([2., 5., 8.])
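
For reductions that NumPy already provides (mean, sum, min, max, ...), passing `axis` directly to the method is usually shorter and typically faster than `np.apply_along_axis`, which remains useful for arbitrary functions. A minimal sketch with the same matrix:

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)

# axis=0 collapses the rows: one result per column.
col_means = A.mean(axis=0)
# axis=1 collapses the columns: one result per row.
row_means = A.mean(axis=1)

print(col_means)   # [4. 5. 6.]
print(row_means)   # [2. 5. 8.]

# Same values as the apply_along_axis calls above.
same_cols = np.array_equal(col_means, np.apply_along_axis(np.mean, axis=0, arr=A))
same_rows = np.array_equal(row_means, np.apply_along_axis(np.mean, axis=1, arr=A))
print(same_cols, same_rows)   # True True
```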

### Conditions and Boolean arrays

In [63]:
A = np.random.random((4, 4))
A

array([[0.85602406, 0.17439647, 0.82691213, 0.48981487],
       [0.9042295 , 0.99104976, 0.50913057, 0.15598115],
       [0.34064124, 0.97923316, 0.74593774, 0.53230547],
       [0.80176434, 0.45300678, 0.57819088, 0.14618446]])

In [64]:
A < 0.5

array([[False,  True, False,  True],
       [False, False, False,  True],
       [ True, False, False, False],
       [False,  True, False,  True]])

In [65]:
A[A < 0.5]

array([0.17439647, 0.48981487, 0.15598115, 0.34064124, 0.45300678,
       0.14618446])
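
Boolean masks can also be combined and used for more than selection. The sketch below is illustrative only: it uses a seeded generator (`np.random.default_rng(0)`, an assumption made here for reproducibility) rather than the random matrix shown above.

```python
import numpy as np

rng = np.random.default_rng(0)   # seeded so the example is reproducible
A = rng.random((4, 4))

# Combine conditions with & (and), | (or), ~ (not); parentheses are required.
mask = (A > 0.25) & (A < 0.75)
selected = A[mask]

# Counting True values: a Boolean array sums as 0/1.
count = mask.sum()

# np.where picks element-wise between two alternatives.
clipped = np.where(A < 0.5, 0.0, A)

print(count, selected.size)
```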

### Joining arrays

In [66]:
A = np.ones((3, 3))
B = np.zeros((3, 3))

In [67]:
np.vstack((A, B))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [68]:
np.hstack((A, B))

array([[1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.]])

More specific functions for joining one-dimensional arrays into two-dimensional arrays:

In [69]:
a = np.array([0, 1, 2])
b = np.array([3, 4, 5])
c = np.array([6, 7, 8])

In [70]:
np.column_stack((a, b, c))

array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

In [71]:
np.row_stack((a, b, c))

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])
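
A note on versions, stated with some caution: `np.row_stack` is an alias of `np.vstack`, and to our knowledge NumPy 2.0 deprecates the alias in favour of `np.vstack`. The sketch below reproduces both results with functions that are available in old and new releases alike.

```python
import numpy as np

a = np.array([0, 1, 2])
b = np.array([3, 4, 5])
c = np.array([6, 7, 8])

# Each 1-D array becomes a row.
rows = np.vstack((a, b, c))
# Each 1-D array becomes a column.
cols = np.column_stack((a, b, c))

# Stacking as columns is the transpose of stacking as rows.
print(np.array_equal(cols, rows.T))   # True

# np.stack generalises both: axis=0 gives rows, axis=1 gives columns.
print(np.array_equal(np.stack((a, b, c), axis=1), cols))   # True
```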

### Important: copy or view of an `ndarray`

In [72]:
a = np.array([1, 2, 3, 4])
a

array([1, 2, 3, 4])

In [73]:
b = a
b

array([1, 2, 3, 4])

In [74]:
a[2] = 0
a

array([1, 2, 0, 4])

In [75]:
b

array([1, 2, 0, 4])

In [76]:
c = a[0:2]
c

array([1, 2])

In [77]:
a[0] = 0
c

array([0, 2])

In both cases above, `b` and `c` refer to the same memory as `a`, so modifying `a` changes them too. We avoid this with the `copy` method.

In [78]:
a = np.array([1, 2, 3, 4])
b = a.copy()
b

array([1, 2, 3, 4])

In [79]:
a[2] = 0
b

array([1, 2, 3, 4])

In [80]:
c = a[0:2].copy()
c

array([1, 2])

In [81]:
a[0] = 0
c

array([1, 2])
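
One way to check whether two arrays refer to the same data, instead of modifying one and inspecting the other, is `np.shares_memory`. The sketch below uses illustrative names (`alias`, `view`, `copied`, `fancy`) and also shows that indexing with a list of positions returns a copy rather than a view.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

alias = a                 # same object, no new array
view = a[0:2]             # slice: new array object, same memory
copied = a[0:2].copy()    # independent data
fancy = a[[0, 1]]         # indexing with a list returns a copy

print(alias is a)                     # True
print(np.shares_memory(a, view))      # True
print(np.shares_memory(a, copied))    # False
print(np.shares_memory(a, fancy))     # False

# A view remembers the array that owns its data.
print(view.base is a)                 # True
```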

## Pandas

* Pandas is the reference package for data analysis in Python.

* Pandas provides complex data structures and specific functions for working with them.

* The fundamental concept in Pandas is the `DataFrame`, a two-dimensional data structure. There are also `Series`, which are one-dimensional.

* Pandas is built on NumPy.

In [82]:
import numpy as np
import pandas as pd

### DataFrame

A DataFrame is essentially a table. It is made up of rows and columns holding individual values (which may or may not be numbers).

In [83]:
pd.DataFrame({'Yes': [50, 21], 'No': [131, 2]})

Unnamed: 0,Yes,No
0,50,131
1,21,2


In [84]:
pd.DataFrame({'Juan': ['Sopa', 'Pescado'], 'Ana': ['Pasta', 'Solomillo']})

Unnamed: 0,Juan,Ana
0,Sopa,Pasta
1,Pescado,Solomillo


We are using `pd.DataFrame()` to construct `DataFrame` objects. As its argument we pass a dictionary with the keys `['Juan', 'Ana']` and their respective values. Although this is the most common way to construct a `DataFrame`, it is not the only one.

The construction method we have used labels each row with an integer, counting up from 0 to the number of rows minus one. Sometimes this is fine, but at other times we may want to assign a specific label to each row, which we can do with the `index` argument.

In [85]:
pd.DataFrame({'Juan': ['Sopa', 'Pescado', 'Yogurt'], 'Ana': ['Pasta', 'Solomillo', 'Fruta']}, index=['1 Plato', '2 Plato', 'Postre'])

Unnamed: 0,Juan,Ana
1 Plato,Sopa,Pasta
2 Plato,Pescado,Solomillo
Postre,Yogurt,Fruta


### Series

A `Series` is a sequence of data. If `DataFrame`s are tables of data, `Series` are lists of data.

In [86]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [87]:
pd.Series([30, 35, 40], index=['2015 matriculas', '2016 matriculas', '2017 matriculas'], name='Matriculas en máster de modelización')

2015 matriculas    30
2016 matriculas    35
2017 matriculas    40
Name: Matriculas en máster de modelización, dtype: int64

`Series` and `DataFrame`s are closely related. In fact, we can think of a `DataFrame` as simply a bunch of `Series` glued together.
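
That relationship can be sketched in both directions: gluing named `Series` together yields a `DataFrame`, and selecting a column of a `DataFrame` gives back a `Series`. The example reuses the menu data from above; `pd.concat` is one of several ways to do this.

```python
import pandas as pd

juan = pd.Series(['Sopa', 'Pescado'], name='Juan')
ana = pd.Series(['Pasta', 'Solomillo'], name='Ana')

# axis=1 places each Series side by side as a column named after the Series.
menu = pd.concat([juan, ana], axis=1)
print(list(menu.columns))       # ['Juan', 'Ana']

# Selecting one column returns a Series again.
col = menu['Juan']
print(type(col).__name__)       # Series

# A single row is also a Series, indexed by the column names.
row = menu.iloc[0]
print(list(row.index))          # ['Juan', 'Ana']
```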

### Reading data files

Although we can create `DataFrame`s and `Series` by hand, most of the time we will work with data that already exists and is stored in some kind of file (.xls, .csv, .json, ...).

The most common format for storing data is CSV. CSV files contain comma-separated values.

In [88]:
reviews = pd.read_csv("data/winemag-data-130k-v2.csv")

In [89]:
reviews.shape

(129971, 14)

In [90]:
reviews.head()

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


In [91]:
reviews = pd.read_csv("data/winemag-data-130k-v2.csv", index_col=0)
reviews.head()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
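
The effect of `index_col=0` in the two calls above can be reproduced on a tiny in-memory CSV. The text below is made up for illustration and only mimics the first columns of the wine file; `io.StringIO` stands in for a file on disk.

```python
import io
import pandas as pd

# A CSV whose first column has an empty header, as in the wine dataset.
csv_text = ",country,points\n0,Italy,87\n1,Portugal,87\n"

# Without index_col, the unnamed column is read as ordinary data.
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))       # ['Unnamed: 0', 'country', 'points']

# With index_col=0, the first column becomes the row index instead.
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))   # ['country', 'points']
print(indexed.shape)           # (2, 2)
```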


### Selecting subsets of a `DataFrame` or `Series`

We can select the values of one or several columns in several ways.

In [92]:
reviews.country

0            Italy
1         Portugal
2               US
3               US
4               US
            ...   
129966     Germany
129967          US
129968      France
129969      France
129970      France
Name: country, Length: 129971, dtype: object

In [93]:
reviews['country']

0            Italy
1         Portugal
2               US
3               US
4               US
            ...   
129966     Germany
129967          US
129968      France
129969      France
129970      France
Name: country, Length: 129971, dtype: object

In [94]:
reviews[['country', 'province']]

Unnamed: 0,country,province
0,Italy,Sicily & Sardinia
1,Portugal,Douro
2,US,Oregon
3,US,Michigan
4,US,Oregon
...,...,...
129966,Germany,Mosel
129967,US,Oregon
129968,France,Alsace
129969,France,Alsace


In [95]:
reviews[['country', 'province']][:5]

Unnamed: 0,country,province
0,Italy,Sicily & Sardinia
1,Portugal,Douro
2,US,Oregon
3,US,Michigan
4,US,Oregon


We can also select subsets by integer position using `iloc`.

In [96]:
reviews.iloc[0]

country                                                              Italy
description              Aromas include tropical fruit, broom, brimston...
designation                                                   Vulkà Bianco
points                                                                  87
price                                                                  NaN
province                                                 Sicily & Sardinia
region_1                                                              Etna
region_2                                                               NaN
taster_name                                                  Kerin O’Keefe
taster_twitter_handle                                         @kerinokeefe
title                                    Nicosia 2013 Vulkà Bianco  (Etna)
variety                                                        White Blend
winery                                                             Nicosia
Name: 0, dtype: object

In [97]:
reviews.iloc[0,0]

'Italy'

In [98]:
reviews.iloc[:,-1]

0                                          Nicosia
1                              Quinta dos Avidagos
2                                        Rainstorm
3                                       St. Julian
4                                     Sweet Cheeks
                            ...                   
129966    Dr. H. Thanisch (Erben Müller-Burggraef)
129967                                    Citation
129968                             Domaine Gresser
129969                        Domaine Marcel Deiss
129970                            Domaine Schoffit
Name: winery, Length: 129971, dtype: object

In [99]:
reviews.iloc[-3:, :3]

Unnamed: 0,country,description,designation
129968,France,Well-drained gravel soil gives this wine its c...,Kritt
129969,France,"A dry style of Pinot Gris, this is crisp with ...",
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline


In [100]:
reviews.iloc[[0, 10, 100], 0]

0      Italy
10        US
100       US
Name: country, dtype: object

Finally, we can also use `loc` to select by row and column labels.

In [101]:
reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]

Unnamed: 0,taster_name,taster_twitter_handle,points
0,Kerin O’Keefe,@kerinokeefe,87
1,Roger Voss,@vossroger,87
2,Paul Gregutt,@paulgwine,87
3,Alexander Peartree,,87
4,Paul Gregutt,@paulgwine,87
...,...,...,...
129966,Anna Lee C. Iijima,,90
129967,Paul Gregutt,@paulgwine,90
129968,Roger Voss,@vossroger,90
129969,Roger Voss,@vossroger,90


WARNING! In this case the row labels are numbers, but `iloc` and `loc` do not behave the same way: `iloc` slices exclude the end position, whereas `loc` slices include the end label.

In [102]:
reviews.iloc[:5, 0]

0       Italy
1    Portugal
2          US
3          US
4          US
Name: country, dtype: object

In [103]:
reviews.loc[:5, 'country']

0       Italy
1    Portugal
2          US
3          US
4          US
5       Spain
Name: country, dtype: object
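
The difference is easier to see on a small `Series`. The sketch below uses made-up data; the second `Series` has integer labels in a shuffled order to show that `iloc` counts positions while `loc` looks up labels.

```python
import pandas as pd

s = pd.Series(['a', 'b', 'c', 'd', 'e', 'f'])   # default labels 0..5

by_position = s.iloc[:3]   # positions 0, 1, 2 (end excluded)
by_label = s.loc[:3]       # labels 0, 1, 2, 3 (end included)
print(by_position.tolist())   # ['a', 'b', 'c']
print(by_label.tolist())      # ['a', 'b', 'c', 'd']

# With shuffled integer labels the two are clearly different lookups.
t = pd.Series(['x', 'y', 'z'], index=[2, 0, 1])
print(t.iloc[0])   # x  (first position)
print(t.loc[0])    # y  (row labelled 0)
```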

### Manipulating the index

In [104]:
reviews.set_index("title")

title,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,variety,winery
Nicosia 2013 Vulkà Bianco (Etna),Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,White Blend,Nicosia
Quinta dos Avidagos 2011 Avidagos Red (Douro),Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Portuguese Red,Quinta dos Avidagos
Rainstorm 2013 Pinot Gris (Willamette Valley),US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Pinot Gris,Rainstorm
St. Julian 2013 Reserve Late Harvest Riesling (Lake Michigan Shore),US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,Riesling,St. Julian
Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley),US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...,...,...,...
Dr. H. Thanisch (Erben Müller-Burggraef) 2013 Brauneberger Juffer-Sonnenuhr Spätlese Riesling (Mosel),Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
Citation 2004 Pinot Noir (Oregon),US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Pinot Noir,Citation
Domaine Gresser 2013 Kritt Gewurztraminer (Alsace),France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Gewürztraminer,Domaine Gresser
Domaine Marcel Deiss 2012 Pinot Gris (Alsace),France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Pinot Gris,Domaine Marcel Deiss
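
Worth noting: `set_index` returns a new `DataFrame` and leaves the original unchanged unless the result is assigned. A minimal sketch with a made-up two-row table (the titles and scores are illustrative, not taken from the dataset):

```python
import pandas as pd

wines = pd.DataFrame({'title': ['Wine A 2013', 'Wine B 2011'],
                      'points': [87, 90]})

# The result must be kept in a variable; wines itself is not modified.
by_title = wines.set_index('title')
print(wines.index.tolist())                    # [0, 1]
print(by_title.index.tolist())                 # ['Wine A 2013', 'Wine B 2011']

# Rows can now be looked up by title with loc.
print(by_title.loc['Wine B 2011', 'points'])   # 90

# reset_index moves the index back into an ordinary column.
restored = by_title.reset_index()
print(list(restored.columns))                  # ['title', 'points']
```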


### Conditional selection

We can look for the wines from Italy.

In [105]:
reviews['country'] == 'Italy'

0          True
1         False
2         False
3         False
4         False
          ...  
129966    False
129967    False
129968    False
129969    False
129970    False
Name: country, Length: 129971, dtype: bool

The expression above returned a `Series` of Booleans telling us which wines are Italian. To retrieve the rows flagged by those Booleans:

In [106]:
reviews.loc[reviews['country'] == 'Italy']

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
13,Italy,This is dominated by oak and oak-driven aromas...,Rosso,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Masseria Setteporte 2012 Rosso (Etna),Nerello Mascalese,Masseria Setteporte
22,Italy,Delicate aromas recall white flower and citrus...,Ficiligno,87,19.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Baglio di Pianetto 2007 Ficiligno White (Sicilia),White Blend,Baglio di Pianetto
24,Italy,"Aromas of prune, blackcurrant, toast and oak c...",Aynat,87,35.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Canicattì 2009 Aynat Nero d'Avola (Sicilia),Nero d'Avola,Canicattì
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129929,Italy,"This luminous sparkler has a sweet, fruit-forw...",,91,38.0,Veneto,Prosecco Superiore di Cartizze,,,,Col Vetoraz Spumanti NV Prosecco Superiore di...,Prosecco,Col Vetoraz Spumanti
129943,Italy,"A blend of Nero d'Avola and Syrah, this convey...",Adènzia,90,29.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Baglio del Cristo di Campobello 2012 Adènzia R...,Red Blend,Baglio del Cristo di Campobello
129947,Italy,"A blend of 65% Cabernet Sauvignon, 30% Merlot ...",Symposio,90,20.0,Sicily & Sardinia,Terre Siciliane,,Kerin O’Keefe,@kerinokeefe,Feudo Principi di Butera 2012 Symposio Red (Te...,Red Blend,Feudo Principi di Butera
129961,Italy,"Intense aromas of wild cherry, baking spice, t...",,90,30.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,COS 2013 Frappato (Sicilia),Frappato,COS


If, besides being Italian, we also want our wine to have a score of 90 or higher:

In [107]:
reviews.loc[(reviews['country'] == 'Italy') & (reviews['points'] >= 90)]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
120,Italy,"Slightly backward, particularly given the vint...",Bricco Rocche Prapó,92,70.0,Piedmont,Barolo,,,,Ceretto 2003 Bricco Rocche Prapó (Barolo),Nebbiolo,Ceretto
130,Italy,"At the first it was quite muted and subdued, b...",Bricco Rocche Brunate,91,70.0,Piedmont,Barolo,,,,Ceretto 2003 Bricco Rocche Brunate (Barolo),Nebbiolo,Ceretto
133,Italy,"Einaudi's wines have been improving lately, an...",,91,68.0,Piedmont,Barolo,,,,Poderi Luigi Einaudi 2003 Barolo,Nebbiolo,Poderi Luigi Einaudi
135,Italy,The color is just beginning to show signs of b...,Sorano,91,60.0,Piedmont,Barolo,,,,Giacomo Ascheri 2001 Sorano (Barolo),Nebbiolo,Giacomo Ascheri
140,Italy,"A big, fat, luscious wine with plenty of toast...",Costa Bruna,90,26.0,Piedmont,Barbera d'Alba,,,,Poderi Colla 2005 Costa Bruna (Barbera d'Alba),Barbera,Poderi Colla
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129929,Italy,"This luminous sparkler has a sweet, fruit-forw...",,91,38.0,Veneto,Prosecco Superiore di Cartizze,,,,Col Vetoraz Spumanti NV Prosecco Superiore di...,Prosecco,Col Vetoraz Spumanti
129943,Italy,"A blend of Nero d'Avola and Syrah, this convey...",Adènzia,90,29.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Baglio del Cristo di Campobello 2012 Adènzia R...,Red Blend,Baglio del Cristo di Campobello
129947,Italy,"A blend of 65% Cabernet Sauvignon, 30% Merlot ...",Symposio,90,20.0,Sicily & Sardinia,Terre Siciliane,,Kerin O’Keefe,@kerinokeefe,Feudo Principi di Butera 2012 Symposio Red (Te...,Red Blend,Feudo Principi di Butera
129961,Italy,"Intense aromas of wild cherry, baking spice, t...",,90,30.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,COS 2013 Frappato (Sicilia),Frappato,COS


If we want a wine that is Italian or has a score of 90 or higher:

In [108]:
reviews.loc[(reviews['country'] == 'Italy') | (reviews['points'] >= 90)]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
13,Italy,This is dominated by oak and oak-driven aromas...,Rosso,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Masseria Setteporte 2012 Rosso (Etna),Nerello Mascalese,Masseria Setteporte
22,Italy,Delicate aromas recall white flower and citrus...,Ficiligno,87,19.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Baglio di Pianetto 2007 Ficiligno White (Sicilia),White Blend,Baglio di Pianetto
24,Italy,"Aromas of prune, blackcurrant, toast and oak c...",Aynat,87,35.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Canicattì 2009 Aynat Nero d'Avola (Sicilia),Nero d'Avola,Canicattì
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
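
The parentheses around each condition in the two selections above are required: `&` and `|` bind more tightly than `==` and `>=`, and Python's `and`/`or` do not work element-wise on a `Series`. A sketch on a made-up four-row table, naming each mask first to keep the expressions readable:

```python
import pandas as pd

wines = pd.DataFrame({'country': ['Italy', 'Italy', 'US', 'Spain'],
                      'points': [87, 92, 91, 85]})

is_italian = wines['country'] == 'Italy'
is_top = wines['points'] >= 90

both = wines.loc[is_italian & is_top]          # and
either = wines.loc[is_italian | is_top]        # or
neither = wines.loc[~(is_italian | is_top)]    # not

print(both.index.tolist())      # [1]
print(either.index.tolist())    # [0, 1, 2]
print(neither.index.tolist())   # [3]
```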


If we want an Italian or Spanish wine:

In [109]:
reviews.loc[reviews['country'].isin(['Italy', 'Spain'])]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
5,Spain,Blackberry and raspberry aromas show a typical...,Ars In Vitro,87,15.0,Northern Spain,Navarra,,Michael Schachner,@wineschach,Tandem 2011 Ars In Vitro Tempranillo-Merlot (N...,Tempranillo-Merlot,Tandem
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
13,Italy,This is dominated by oak and oak-driven aromas...,Rosso,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Masseria Setteporte 2012 Rosso (Etna),Nerello Mascalese,Masseria Setteporte
18,Spain,"Desiccated blackberry, leather, charred wood a...",Vendimia Seleccionada Finca Valdelayegua Singl...,87,28.0,Northern Spain,Ribera del Duero,,Michael Schachner,@wineschach,Pradorey 2010 Vendimia Seleccionada Finca Vald...,Tempranillo Blend,Pradorey
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129943,Italy,"A blend of Nero d'Avola and Syrah, this convey...",Adènzia,90,29.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Baglio del Cristo di Campobello 2012 Adènzia R...,Red Blend,Baglio del Cristo di Campobello
129947,Italy,"A blend of 65% Cabernet Sauvignon, 30% Merlot ...",Symposio,90,20.0,Sicily & Sardinia,Terre Siciliane,,Kerin O’Keefe,@kerinokeefe,Feudo Principi di Butera 2012 Symposio Red (Te...,Red Blend,Feudo Principi di Butera
129957,Spain,Lightly baked berry aromas vie for attention w...,Crianza,90,17.0,Northern Spain,Rioja,,Michael Schachner,@wineschach,Viñedos Real Rubio 2010 Crianza (Rioja),Tempranillo Blend,Viñedos Real Rubio
129961,Italy,"Intense aromas of wild cherry, baking spice, t...",,90,30.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,COS 2013 Frappato (Sicilia),Frappato,COS


If we want to discard the rows where the price is missing:

In [110]:
reviews.loc[reviews['price'].notnull()]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
5,Spain,Blackberry and raspberry aromas show a typical...,Ars In Vitro,87,15.0,Northern Spain,Navarra,,Michael Schachner,@wineschach,Tandem 2011 Ars In Vitro Tempranillo-Merlot (N...,Tempranillo-Merlot,Tandem
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
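
An equivalent way to drop those rows is `DataFrame.dropna` with the `subset` argument. The sketch below uses a small made-up frame (the name `toy` and its values are illustrative, not part of the wine dataset) to check that both approaches keep the same rows.

```python
import numpy as np
import pandas as pd

# Toy data standing in for the reviews table (values are made up).
toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "price": [15.0, np.nan, 14.0, np.nan],
})

# Boolean-mask filtering, as in the cell above.
with_mask = toy.loc[toy["price"].notnull()]

# Same selection using dropna restricted to the price column.
with_dropna = toy.dropna(subset=["price"])

print(with_dropna)
```

Neither call modifies `toy`; the filtered frame has to be assigned to a variable to be kept.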


### Adding data

Adding data to our `DataFrame` objects is easy. For example, we can assign the same value to every row with the following command:

In [111]:
reviews['critic'] = 'everyone'
reviews['critic']

0         everyone
1         everyone
2         everyone
3         everyone
4         everyone
            ...   
129966    everyone
129967    everyone
129968    everyone
129969    everyone
129970    everyone
Name: critic, Length: 129971, dtype: object

In [112]:
reviews['index_backwards'] = range(len(reviews), 0, -1)
reviews['index_backwards']

0         129971
1         129970
2         129969
3         129968
4         129967
           ...  
129966         5
129967         4
129968         3
129969         2
129970         1
Name: index_backwards, Length: 129971, dtype: int32
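
Both assignments above modify `reviews` in place. As a possible alternative, `DataFrame.assign` returns a new frame and leaves the original untouched. The frame `toy` below is made up for illustration.

```python
import pandas as pd

toy = pd.DataFrame({"points": [87, 90, 92]})

# assign() returns a copy with the extra columns; toy itself is unchanged.
extended = toy.assign(
    critic="everyone",                       # a scalar is broadcast to every row
    index_backwards=range(len(toy), 0, -1),  # one value per row
)

print(extended)
```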

### Describing our dataset

Pandas provides tools that give a quick overview of the dataset we are working with through summary statistics.

In [113]:
reviews.describe()

Unnamed: 0,points,price,index_backwards
count,129971.0,120975.0,129971.0
mean,88.447138,35.363389,64986.0
std,3.03973,41.022218,37519.540256
min,80.0,4.0,1.0
25%,86.0,17.0,32493.5
50%,88.0,25.0,64986.0
75%,91.0,42.0,97478.5
max,100.0,3300.0,129971.0
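
In the table above, `count` for `price` (120975) is lower than for `points` (129971) because `describe()` skips missing values. A minimal sketch on made-up data (the frame `toy` is an assumption for illustration):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "points": [87, 90, 92, 88],
    "price": [15.0, np.nan, 30.0, np.nan],
})

summary = toy.describe()

# count only considers non-missing entries, so the two columns differ.
print(summary.loc["count"])

# Number of missing prices: total rows minus the non-missing count.
missing_prices = len(toy) - int(summary.loc["count", "price"])
print(missing_prices)
```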


In [114]:
reviews.describe(include='all')

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,critic,index_backwards
count,129908,129971,92506,129971.0,120975.0,129908,108724,50511,103727,98758,129971,129970,129971,129971,129971.0
unique,43,119955,37979,,,425,1229,17,19,15,118840,707,16757,1,
top,US,"Ripe plum, game, truffle, leather and menthol ...",Reserve,,,California,Napa Valley,Central Coast,Roger Voss,@vossroger,Gloria Ferrer NV Sonoma Brut Sparkling (Sonoma...,Pinot Noir,Wines & Winemakers,everyone,
freq,54504,3,2009,,,36247,4480,11065,25514,25514,11,13272,222,129971,
mean,,,,88.447138,35.363389,,,,,,,,,,64986.0
std,,,,3.03973,41.022218,,,,,,,,,,37519.540256
min,,,,80.0,4.0,,,,,,,,,,1.0
25%,,,,86.0,17.0,,,,,,,,,,32493.5
50%,,,,88.0,25.0,,,,,,,,,,64986.0
75%,,,,91.0,42.0,,,,,,,,,,97478.5


In [115]:
reviews.dtypes

country                   object
description               object
designation               object
points                     int64
price                    float64
province                  object
region_1                  object
region_2                  object
taster_name               object
taster_twitter_handle     object
title                     object
variety                   object
winery                    object
critic                    object
index_backwards            int32
dtype: object

In [116]:
reviews['points'].mean()

88.44713820775404

In [117]:
reviews['points'].quantile(0.25)

86.0

In [118]:
reviews['country'].unique()

array(['Italy', 'Portugal', 'US', 'Spain', 'France', 'Germany',
       'Argentina', 'Chile', 'Australia', 'Austria', 'South Africa',
       'New Zealand', 'Israel', 'Hungary', 'Greece', 'Romania', 'Mexico',
       'Canada', nan, 'Turkey', 'Czech Republic', 'Slovenia',
       'Luxembourg', 'Croatia', 'Georgia', 'Uruguay', 'England',
       'Lebanon', 'Serbia', 'Brazil', 'Moldova', 'Morocco', 'Peru',
       'India', 'Bulgaria', 'Cyprus', 'Armenia', 'Switzerland',
       'Bosnia and Herzegovina', 'Ukraine', 'Slovakia', 'Macedonia',
       'China', 'Egypt'], dtype=object)

In [119]:
reviews['country'].value_counts()

US                        54504
France                    22093
Italy                     19540
Spain                      6645
Portugal                   5691
Chile                      4472
Argentina                  3800
Austria                    3345
Australia                  2329
Germany                    2165
New Zealand                1419
South Africa               1401
Israel                      505
Greece                      466
Canada                      257
Hungary                     146
Bulgaria                    141
Romania                     120
Uruguay                     109
Turkey                       90
Slovenia                     87
Georgia                      86
England                      74
Croatia                      73
Mexico                       70
Moldova                      59
Brazil                       52
Lebanon                      35
Morocco                      28
Peru                         16
Ukraine                      14
Macedoni

### Modifying the values of a column

For example, let's standardize the `points` column by subtracting its mean and dividing by its standard deviation.

In [120]:
(reviews['points'] - reviews['points'].mean()) / reviews['points'].std()

0        -0.476075
1        -0.476075
2        -0.476075
3        -0.476075
4        -0.476075
            ...   
129966    0.510855
129967    0.510855
129968    0.510855
129969    0.510855
129970    0.510855
Name: points, Length: 129971, dtype: float64
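
The expression above computes a standard score (z-score) but does not store it. A small sketch, using a made-up frame and a hypothetical column name `points_z`, shows how the result could be saved and checked:

```python
import pandas as pd

toy = pd.DataFrame({"points": [80, 85, 90, 95, 100]})

# Standardize: subtract the mean, divide by the sample standard deviation.
centered = toy["points"] - toy["points"].mean()
toy["points_z"] = centered / toy["points"].std()

# After standardizing, the mean is close to 0 and the standard deviation close to 1.
print(toy["points_z"].mean(), toy["points_z"].std())
```

Note that `Series.std()` uses the sample standard deviation (`ddof=1`) by default.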

In [121]:
reviews['province - region'] = reviews['province'] + ' - ' + reviews['region_1']
reviews

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery,critic,index_backwards,province - region
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia,everyone,129971,Sicily & Sardinia - Etna
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos,everyone,129970,
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm,everyone,129969,Oregon - Willamette Valley
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian,everyone,129968,Michigan - Lake Michigan Shore
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks,everyone,129967,Oregon - Willamette Valley
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef),everyone,5,
129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation,everyone,4,Oregon - Oregon
129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser,everyone,3,Alsace - Alsace
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss,everyone,2,Alsace - Alsace
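
Note in the output above that rows with a missing `region_1` end up with `NaN` in the combined column, because string concatenation propagates missing values. One possible workaround, sketched on made-up data, is to fill the gaps before concatenating (the placeholder text `'Unknown'` is an arbitrary choice):

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "province": ["Oregon", "Douro"],
    "region_1": ["Willamette Valley", np.nan],
})

# Plain concatenation: a missing operand yields a missing result.
plain = toy["province"] + " - " + toy["region_1"]

# Filling the missing region first keeps every row populated.
filled = toy["province"] + " - " + toy["region_1"].fillna("Unknown")

print(plain)
print(filled)
```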


### Removing columns

In [122]:
reviews.columns

Index(['country', 'description', 'designation', 'points', 'price', 'province',
       'region_1', 'region_2', 'taster_name', 'taster_twitter_handle', 'title',
       'variety', 'winery', 'critic', 'index_backwards', 'province - region'],
      dtype='object')

In [123]:
reviews.pop('province - region')

0               Sicily & Sardinia - Etna
1                                    NaN
2             Oregon - Willamette Valley
3         Michigan - Lake Michigan Shore
4             Oregon - Willamette Valley
                       ...              
129966                               NaN
129967                   Oregon - Oregon
129968                   Alsace - Alsace
129969                   Alsace - Alsace
129970                   Alsace - Alsace
Name: province - region, Length: 129971, dtype: object

In [124]:
reviews = reviews.drop(columns=['critic', 'index_backwards'])
reviews.columns

Index(['country', 'description', 'designation', 'points', 'price', 'province',
       'region_1', 'region_2', 'taster_name', 'taster_twitter_handle', 'title',
       'variety', 'winery'],
      dtype='object')
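
The two cells above behave differently: `pop` removes the column from the frame itself and returns it, while `drop` returns a new frame, which is why its result is assigned back to `reviews`. A minimal sketch on a made-up frame:

```python
import pandas as pd

toy = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# pop() removes the column from toy itself and returns it as a Series.
popped = toy.pop("c")

# drop() returns a new frame; toy keeps column "b" unless we reassign.
without_b = toy.drop(columns=["b"])

print(list(toy.columns))        # columns left in toy after pop
print(list(without_b.columns))  # columns in the frame returned by drop
```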

### Grouping data

In [125]:
reviews.groupby('points')['points'].count()

points
80       397
81       692
82      1836
83      3025
84      6480
85      9530
86     12600
87     16933
88     17207
89     12226
90     15410
91     11359
92      9613
93      6489
94      3758
95      1535
96       523
97       229
98        77
99        33
100       19
Name: points, dtype: int64

What happened here is that `groupby()` created one group per score and then counted how many wines fall into each group.

Now let's compute the average wine price for each score:

In [126]:
reviews.groupby('points')['price'].mean()

points
80      16.372152
81      17.182353
82      18.870767
83      18.237353
84      19.310215
85      19.949562
86      22.133759
87      24.901884
88      28.687523
89      32.169640
90      36.906622
91      43.224252
92      51.037763
93      63.112216
94      81.436938
95     109.235420
96     159.292531
97     207.173913
98     245.492754
99     284.214286
100    485.947368
Name: price, dtype: float64
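
To make the split-apply-combine idea concrete, the sketch below reproduces a grouped mean by hand on a made-up frame and compares it with `groupby()`. The manual loop is only for illustration; it is not how pandas computes the result internally.

```python
import pandas as pd

toy = pd.DataFrame({
    "points": [87, 87, 90, 90, 90],
    "price": [10.0, 20.0, 30.0, 40.0, 50.0],
})

# Built-in version: one mean price per distinct score.
grouped = toy.groupby("points")["price"].mean()

# Manual version: select the rows of each group and average their prices.
manual = {
    int(score): float(toy.loc[toy["points"] == score, "price"].mean())
    for score in sorted(toy["points"].unique())
}

print(grouped)
print(manual)
```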

We can group by more than one criterion and return more than one value per group with `agg()`.

In [127]:
reviews.groupby(['price', 'country'])[['points']].agg(['count', 'min', 'mean', 'max'])

Unnamed: 0_level_0,Unnamed: 1_level_0,points,points,points,points
Unnamed: 0_level_1,Unnamed: 1_level_1,count,min,mean,max
price,country,Unnamed: 2_level_2,Unnamed: 3_level_2,Unnamed: 4_level_2,Unnamed: 5_level_2
4.0,Argentina,1,84,84.000000,84
4.0,Romania,1,86,86.000000,86
4.0,Spain,4,82,83.750000,85
4.0,US,5,83,84.400000,86
5.0,Argentina,3,80,81.333333,84
...,...,...,...,...,...
1900.0,France,1,98,98.000000,98
2000.0,France,2,96,96.500000,97
2013.0,US,1,91,91.000000,91
2500.0,France,2,96,96.000000,96
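
When several statistics are needed, named aggregation can give flat, readable column names instead of a two-level header. A sketch on made-up data; the output names `n`, `lowest`, `average` and `highest` are arbitrary choices:

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["US", "US", "Spain", "Spain"],
    "price": [4.0, 4.0, 4.0, 5.0],
    "points": [83, 86, 82, 85],
})

# Each keyword is an output column: (input column, aggregation function).
stats = toy.groupby(["price", "country"]).agg(
    n=("points", "count"),
    lowest=("points", "min"),
    average=("points", "mean"),
    highest=("points", "max"),
)

print(stats)
```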


### Sorting rows

In [128]:
reviews.sort_values(by='points')

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
118056,US,This wine has very little going on aromaticall...,Reserve,80,26.0,California,Livermore Valley,Central Coast,Virginie Boone,@vboone,3 Steves Winery 2008 Reserve Cabernet Sauvigno...,Cabernet Sauvignon,3 Steves Winery
35516,US,"This Merlot has not fully ripened, with aromas...",,80,20.0,Washington,Horse Heaven Hills,Columbia Valley,Sean P. Sullivan,@wawinereport,James Wyatt 2013 Merlot (Horse Heaven Hills),Merlot,James Wyatt
11086,France,Picture grandma standing over a pot of stewed ...,,80,11.0,Languedoc-Roussillon,Fitou,,Joe Czerwinski,@JoeCz,Mont Tauch 1998 Red (Fitou),Red Blend,Mont Tauch
11085,France,A white this age should be fresh and crisp; th...,,80,8.0,Southwest France,Bergerac,,Joe Czerwinski,@JoeCz,Seigneurs de Bergerac 1999 White (Bergerac),White Blend,Seigneurs de Bergerac
102482,US,"This wine is a medium cherry-red color, with s...",Cabernet Franc,80,18.0,Washington,Columbia Valley (WA),Columbia Valley,Sean P. Sullivan,@wawinereport,Tucannon 2014 Cabernet Franc Rosé (Columbia Va...,Rosé,Tucannon
...,...,...,...,...,...,...,...,...,...,...,...,...,...
111756,France,"A hugely powerful wine, full of dark, brooding...",,100,359.0,Bordeaux,Saint-Julien,,Roger Voss,@vossroger,Château Léoville Las Cases 2010 Saint-Julien,Bordeaux-style Red Blend,Château Léoville Las Cases
89728,France,This latest incarnation of the famous brand is...,Cristal Vintage Brut,100,250.0,Champagne,Champagne,,Roger Voss,@vossroger,Louis Roederer 2008 Cristal Vintage Brut (Cha...,Champagne Blend,Louis Roederer
89729,France,This new release from a great vintage for Char...,Le Mesnil Blanc de Blancs Brut,100,617.0,Champagne,Champagne,,Roger Voss,@vossroger,Salon 2006 Le Mesnil Blanc de Blancs Brut Char...,Chardonnay,Salon
118058,US,This wine dazzles with perfection. Sourced fro...,La Muse,100,450.0,California,Sonoma County,Sonoma,,,Verité 2007 La Muse Red (Sonoma County),Bordeaux-style Red Blend,Verité


In [129]:
reviews.sort_values(by='points', ascending=False).reset_index()

Unnamed: 0,index,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,114972,Portugal,"A powerful and ripe wine, strongly influenced ...",Nacional Vintage,100,650.0,Port,,,Roger Voss,@vossroger,Quinta do Noval 2011 Nacional Vintage (Port),Port,Quinta do Noval
1,89729,France,This new release from a great vintage for Char...,Le Mesnil Blanc de Blancs Brut,100,617.0,Champagne,Champagne,,Roger Voss,@vossroger,Salon 2006 Le Mesnil Blanc de Blancs Brut Char...,Chardonnay,Salon
2,113929,US,In 2005 Charles Smith introduced three high-en...,Royal City,100,80.0,Washington,Columbia Valley (WA),Columbia Valley,Paul Gregutt,@paulgwine,Charles Smith 2006 Royal City Syrah (Columbia ...,Syrah,Charles Smith
3,45781,Italy,"This gorgeous, fragrant wine opens with classi...",Riserva,100,550.0,Tuscany,Brunello di Montalcino,,Kerin O’Keefe,@kerinokeefe,Biondi Santi 2010 Riserva (Brunello di Montal...,Sangiovese,Biondi Santi
4,123545,US,Initially a rather subdued Frog; as if it has ...,Bionic Frog,100,80.0,Washington,Walla Walla Valley (WA),Columbia Valley,Paul Gregutt,@paulgwine,Cayuse 2008 Bionic Frog Syrah (Walla Walla Val...,Syrah,Cayuse
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,128255,Argentina,"Severely compromised by green, minty, weedy ar...",,80,13.0,Mendoza Province,Mendoza,,Michael Schachner,@wineschach,Viniterra 2007 Malbec (Mendoza),Malbec,Viniterra
129967,128254,Argentina,Disappointing considering the source. The nose...,Gran Lurton,80,20.0,Mendoza Province,Mendoza,,Michael Schachner,@wineschach,François Lurton 2006 Gran Lurton Cabernet Sauv...,Cabernet Sauvignon,François Lurton
129968,93686,Peru,"Best on the nose, where apple and lemony aroma...",Brut,80,15.0,Ica,,,Michael Schachner,@wineschach,Tacama 2010 Brut Sparkling (Ica),Sparkling Blend,Tacama
129969,73865,Chile,There's not much point in making a reserve-sty...,Prima Reserva,80,13.0,Maipo Valley,,,Joe Czerwinski,@JoeCz,De Martino 1999 Prima Reserva Merlot (Maipo Va...,Merlot,De Martino
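
In the output above, `reset_index()` keeps the old row labels in a new `index` column. The sketch below, on made-up data, sorts by two columns and passes `drop=True` to discard the old labels instead:

```python
import pandas as pd

toy = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "points": [90, 100, 90, 100],
    "price": [30.0, 80.0, 20.0, 650.0],
})

# Highest score first; among equal scores, cheapest first.
ranked = toy.sort_values(by=["points", "price"], ascending=[False, True])

# drop=True discards the old index instead of keeping it as a column.
ranked = ranked.reset_index(drop=True)

print(ranked)
```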


### Missing data

Handling missing data is very important. Pandas offers functions such as `isnull`, `notnull` and `fillna` to locate and fill in missing values.

In [130]:
reviews['country'].isnull()

0         False
1         False
2         False
3         False
4         False
          ...  
129966    False
129967    False
129968    False
129969    False
129970    False
Name: country, Length: 129971, dtype: bool

In [131]:
reviews[reviews['country'].isnull()]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
913,,"Amber in color, this wine has aromas of peach ...",Asureti Valley,87,30.0,,,,Mike DeSimone,@worldwineguys,Gotsa Family Wines 2014 Asureti Valley Chinuri,Chinuri,Gotsa Family Wines
3131,,"Soft, fruity and juicy, this is a pleasant, si...",Partager,83,,,,,Roger Voss,@vossroger,Barton & Guestier NV Partager Red,Red Blend,Barton & Guestier
4243,,"Violet-red in color, this semisweet wine has a...",Red Naturally Semi-Sweet,88,18.0,,,,Mike DeSimone,@worldwineguys,Kakhetia Traditional Winemaking 2012 Red Natur...,Ojaleshi,Kakhetia Traditional Winemaking
9509,,This mouthwatering blend starts with a nose of...,Theopetra Malagouzia-Assyrtiko,92,28.0,,,,Susan Kostrzewa,@suskostrzewa,Tsililis 2015 Theopetra Malagouzia-Assyrtiko W...,White Blend,Tsililis
9750,,This orange-style wine has a cloudy yellow-gol...,Orange Nikolaevo Vineyard,89,28.0,,,,Jeff Jenssen,@worldwineguys,Ross-idi 2015 Orange Nikolaevo Vineyard Chardo...,Chardonnay,Ross-idi
...,...,...,...,...,...,...,...,...,...,...,...,...,...
124176,,This Swiss red blend is composed of four varie...,Les Romaines,90,30.0,,,,Jeff Jenssen,@worldwineguys,Les Frères Dutruy 2014 Les Romaines Red,Red Blend,Les Frères Dutruy
129407,,Dry spicy aromas of dusty plum and tomato add ...,Reserve,89,22.0,,,,Michael Schachner,@wineschach,El Capricho 2015 Reserve Cabernet Sauvignon,Cabernet Sauvignon,El Capricho
129408,,El Capricho is one of Uruguay's more consisten...,Reserve,89,22.0,,,,Michael Schachner,@wineschach,El Capricho 2015 Reserve Tempranillo,Tempranillo,El Capricho
129590,,"A blend of 60% Syrah, 30% Cabernet Sauvignon a...",Shah,90,30.0,,,,Mike DeSimone,@worldwineguys,Büyülübağ 2012 Shah Red,Red Blend,Büyülübağ


In [132]:
reviews['region_2'].fillna("Unknown")

0                   Unknown
1                   Unknown
2         Willamette Valley
3                   Unknown
4         Willamette Valley
                ...        
129966              Unknown
129967         Oregon Other
129968              Unknown
129969              Unknown
129970              Unknown
Name: region_2, Length: 129971, dtype: object

In [133]:
reviews['price'][reviews['price'].isnull()]

0        NaN
13       NaN
30       NaN
31       NaN
32       NaN
          ..
129844   NaN
129860   NaN
129863   NaN
129893   NaN
129964   NaN
Name: price, Length: 8996, dtype: float64

In [134]:
reviews['price'].bfill()

0         15.0
1         15.0
2         14.0
3         13.0
4         65.0
          ... 
129966    28.0
129967    75.0
129968    30.0
129969    32.0
129970    21.0
Name: price, Length: 129971, dtype: float64
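
Backward fill copies the next valid value into each gap, which is why row 0 above shows 15.0, the price of row 1. A small sketch on a made-up Series compares it with forward fill and with filling by the median (one of several possible strategies, not a recommendation):

```python
import numpy as np
import pandas as pd

prices = pd.Series([np.nan, 15.0, np.nan, 13.0, np.nan])

backward = prices.bfill()                   # take the next valid value
forward = prices.ffill()                    # take the previous valid value
by_median = prices.fillna(prices.median())  # constant replacement

print(backward.tolist())
print(forward.tolist())
print(by_median.tolist())
```

A trailing gap has no next value, so backward fill leaves it missing; likewise forward fill cannot fill a leading gap.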

### Replacing values

In [136]:
reviews['taster_twitter_handle'].replace("@kerinokeefe", "@kerino")

0             @kerino
1          @vossroger
2         @paulgwine 
3                 NaN
4         @paulgwine 
             ...     
129966            NaN
129967    @paulgwine 
129968     @vossroger
129969     @vossroger
129970     @vossroger
Name: taster_twitter_handle, Length: 129971, dtype: object
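
`replace` returns a new Series; the original column changes only if the result is assigned back. It also accepts a dictionary to map several values at once. A sketch with a made-up Series (the replacement `@paulg` is hypothetical):

```python
import pandas as pd

handles = pd.Series(["@kerinokeefe", "@vossroger", "@paulgwine", None])

# Map several old values to new ones in a single call.
renamed = handles.replace({"@kerinokeefe": "@kerino", "@paulgwine": "@paulg"})

print(renamed.tolist())
print(handles.tolist())  # the original Series is left as it was
```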

### Renaming

We can rename index labels or columns with the `rename` method.

In [137]:
reviews.rename(columns={'points': 'score'})

Unnamed: 0,country,description,designation,score,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss


In [138]:
reviews.rename(index={0: 'firstEntry', 1: 'secondEntry'})

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
firstEntry,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
secondEntry,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
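
Like most pandas methods, `rename` returns a new frame, so neither cell above changes `reviews`. A minimal sketch on a made-up frame that renames a column and an index label in one call:

```python
import pandas as pd

toy = pd.DataFrame({"points": [87, 90]}, index=[0, 1])

# Rename a column and an index label together; toy is not modified.
renamed = toy.rename(columns={"points": "score"}, index={0: "firstEntry"})

print(renamed)
print(list(toy.columns))
```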


### Combining datasets

In [139]:
df1 = reviews.iloc[:50000, :]
df2 = reviews.iloc[50000:, :]
df2.shape

(79971, 13)

In [140]:
pd.concat([df1, df2])

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
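
`pd.concat` stacks frames row-wise and keeps each row's original index label. The sketch below, on made-up data, checks that splitting and concatenating restores the frame, and shows `ignore_index=True` for renumbering the rows:

```python
import pandas as pd

toy = pd.DataFrame({
    "points": [87, 90, 92, 88],
    "price": [15.0, 14.0, 13.0, 65.0],
})

top = toy.iloc[:2, :]
bottom = toy.iloc[2:, :]

# Stacking the two halves back together restores the original frame.
rebuilt = pd.concat([top, bottom])

# Reversed order with ignore_index=True: rows are renumbered from 0.
renumbered = pd.concat([bottom, top], ignore_index=True)

print(rebuilt.equals(toy))
print(list(renumbered.index))
```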


In [141]:
dfl = reviews.iloc[:, :8]
dfr = reviews.iloc[:, 8:]
dfr.shape

(129971, 5)

In [142]:
dfl.join(dfr)

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss
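
`join` combines columns by matching index labels rather than row positions. A sketch on made-up data illustrates this: even when the right-hand frame is in reverse order, the rows still line up.

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["Italy", "US", "Spain"],
    "points": [87, 90, 88],
    "winery": ["Nicosia", "Rainstorm", "Tandem"],
})

left = toy.iloc[:, :2]   # first two columns
right = toy.iloc[:, 2:]  # remaining column

# join() aligns rows by index label, not by position.
joined = left.join(right)

# Reversing the right-hand rows does not change the result.
shuffled = right.iloc[::-1]
joined_shuffled = left.join(shuffled)

print(joined.equals(toy))
print(joined_shuffled.equals(toy))
```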


## References
 - Iker and Mikel (UPV/EHU) lasaiker@fastmail.com, mikelbarrene@gmail.com