# Scientific computing in Python | NumPy | Pandas

### Installing packages/libraries

To see which packages are installed in the `base` environment, we can use the graphical interface or run the following command on the command line:
```
pip list
```

To install packages that are not included by default, use the graphical interface or run the following command on the command line:
```
pip install scikit-learn
```

## Numpy

* NumPy (Numerical Python) is one of the most important packages in Python.

* Many other packages build on its functionality, so it is important to know the basic concepts of NumPy.

* NumPy has its own data structure, the `ndarray`, which represents an N-dimensional vector/matrix.

To import the NumPy module into the Python session we are working in:

In [1]:
import numpy as np

Important: in Python scripts, module imports are usually placed at the top of the file.

We can list everything that NumPy exposes (functions, classes, constants, ...) with the `dir` function:

In [2]:
dir(np)

['ALLOW_THREADS',
 'AxisError',
 'BUFSIZE',
 'CLIP',
 'DataSource',
 'ERR_CALL',
 'ERR_DEFAULT',
 'ERR_IGNORE',
 'ERR_LOG',
 'ERR_PRINT',
 'ERR_RAISE',
 'ERR_WARN',
 'FLOATING_POINT_SUPPORT',
 'FPE_DIVIDEBYZERO',
 'FPE_INVALID',
 'FPE_OVERFLOW',
 'FPE_UNDERFLOW',
 'False_',
 'Inf',
 'Infinity',
 'MAXDIMS',
 'MAY_SHARE_BOUNDS',
 'MAY_SHARE_EXACT',
 'MachAr',
 'NAN',
 'NINF',
 'NZERO',
 'NaN',
 'PINF',
 'PZERO',
 'RAISE',
 'SHIFT_DIVIDEBYZERO',
 'SHIFT_INVALID',
 'SHIFT_OVERFLOW',
 'SHIFT_UNDERFLOW',
 'ScalarType',
 'Tester',
 'TooHardError',
 'True_',
 'UFUNC_BUFSIZE_DEFAULT',
 'UFUNC_PYVALS_NAME',
 'WRAP',
 '_NoValue',
 '_UFUNC_API',
 '__NUMPY_SETUP__',
 '__all__',
 '__builtins__',
 '__cached__',
 '__config__',
 '__dir__',
 '__doc__',
 '__file__',
 '__getattr__',
 '__git_revision__',
 '__loader__',
 '__mkl_version__',
 '__name__',
 '__package__',
 '__path__',
 '__spec__',
 '__version__',
 '_add_newdoc_ufunc',
 '_distributor_init',
 '_globals',
 '_mat',
 '_pytesttester',
 'abs',
 ...]

We define an `ndarray` with the `np.array` function:

In [3]:
a = np.array( [1, 2, 3] )
a

array([1, 2, 3])

In [4]:
type(a)

numpy.ndarray

In [5]:
dir(a)

['T',
 '__abs__',
 '__add__',
 '__and__',
 '__array__',
 '__array_finalize__',
 '__array_function__',
 '__array_interface__',
 '__array_prepare__',
 '__array_priority__',
 '__array_struct__',
 '__array_ufunc__',
 '__array_wrap__',
 '__bool__',
 '__class__',
 '__complex__',
 '__contains__',
 '__copy__',
 '__deepcopy__',
 '__delattr__',
 '__delitem__',
 '__dir__',
 '__divmod__',
 '__doc__',
 '__eq__',
 '__float__',
 '__floordiv__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__iadd__',
 '__iand__',
 '__ifloordiv__',
 '__ilshift__',
 '__imatmul__',
 '__imod__',
 '__imul__',
 '__index__',
 '__init__',
 '__init_subclass__',
 '__int__',
 '__invert__',
 '__ior__',
 '__ipow__',
 '__irshift__',
 '__isub__',
 '__iter__',
 '__itruediv__',
 '__ixor__',
 '__le__',
 '__len__',
 '__lshift__',
 '__lt__',
 '__matmul__',
 '__mod__',
 '__mul__',
 '__ne__',
 '__neg__',
 '__new__',
 '__or__',
 '__pos__',
 '__pow__',
 '__radd__',
 '__rand__',
 '__rdivmod__',
 ...]

Every `ndarray` is associated with a data type (float, float32, float64, int, ...), and all the elements it holds must be of that same type. We can check which data type an `ndarray` has with `dtype`:

In [6]:
a.dtype

dtype('int32')
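
As a minimal sketch of the single-type rule (this cell is not part of the original notebook, and the variable names are illustrative), the following shows how NumPy picks one common type when the input mixes integers and floats, and how `astype` converts an existing array:

```python
import numpy as np

# Mixing integers and floats: NumPy upcasts everything to a float type
mixed = np.array([1, 2, 3.5])
print(mixed.dtype)          # float64

# Requesting a type explicitly when creating the array
as_float32 = np.array([1, 2, 3], dtype=np.float32)
print(as_float32.dtype)     # float32

# Converting an existing array to another type (returns a new array)
truncated = mixed.astype(int)
print(truncated.tolist())   # [1, 2, 3]
```

Note that the default integer type depends on the platform and NumPy version, which is why the output above may read `int32` on some systems and `int64` on others.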

Now let's define a two-dimensional array:

In [7]:
b = np.array( 
    [ [1.3, 2.4,4],[0.3, 4.1,3], [1,2,4] ]
    # [ [row 1], [row 2], [row 3] ]
)
b

array([[1.3, 2.4, 4. ],
       [0.3, 4.1, 3. ],
       [1. , 2. , 4. ]])

In [8]:
b.dtype

dtype('float64')

In [9]:
b.shape

(3, 3)

In [10]:
#b.ndim

In [11]:
b.size

9

We can define `ndarray`s with other types of elements:

In [12]:
c = np.array( [['a', 'b'],['c', 'd']] )
c

array([['a', 'b'],
       ['c', 'd']], dtype='<U1')

In [13]:
d = np.array([[1, 2, 3],[4, 5, 6]], dtype=complex)
d

array([[1.+0.j, 2.+0.j, 3.+0.j],
       [4.+0.j, 5.+0.j, 6.+0.j]])

### Different functions for creating `ndarray`s:

In [14]:
np.zeros((3, 3))

array([[0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [15]:
np.ones((3, 4))

array([[1., 1., 1., 1.],
       [1., 1., 1., 1.],
       [1., 1., 1., 1.]])

In [16]:
np.arange(0, 10)

array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])

In [17]:
np.arange(4, 10)

array([4, 5, 6, 7, 8, 9])

In [18]:
np.arange(0, 12, 3)

array([0, 3, 6, 9])

In [19]:
np.arange(0, 6, 0.6)

array([0. , 0.6, 1.2, 1.8, 2.4, 3. , 3.6, 4.2, 4.8, 5.4])

In [20]:
np.linspace(0, 100, 12)

array([  0.        ,   9.09090909,  18.18181818,  27.27272727,
        36.36363636,  45.45454545,  54.54545455,  63.63636364,
        72.72727273,  81.81818182,  90.90909091, 100.        ])
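
The two cells above differ in a way that is easy to miss, so here is a small sketch (added for illustration, not from the original notebook): `np.arange` takes a step and leaves out the stop value, whereas `np.linspace` takes a number of points and includes the stop value by default.

```python
import numpy as np

# arange: fixed step, the stop value is excluded
steps = np.arange(0, 1, 0.25)
print(steps.tolist())    # [0.0, 0.25, 0.5, 0.75]

# linspace: fixed number of points, the stop value is included by default
points = np.linspace(0, 1, 5)
print(points.tolist())   # [0.0, 0.25, 0.5, 0.75, 1.0]
```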

In [21]:
np.random.random(1)

array([0.90221785])

In [22]:
np.random.random((3, 3))

array([[0.8970989 , 0.23082681, 0.72662117],
       [0.85474048, 0.88911   , 0.00406898],
       [0.53447451, 0.87377066, 0.8797384 ]])

The `reshape` method:

In [23]:
np.arange(0,12)

array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11])

In [24]:
np.arange(0,12).reshape(3, 4)

array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
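
A short sketch of two details of `reshape` that the cells above do not show (added for illustration): one dimension can be given as `-1` so that NumPy infers it, and the total number of elements has to match the new shape.

```python
import numpy as np

v = np.arange(12)

# -1 asks NumPy to infer that dimension from the total number of elements
m = v.reshape(3, -1)
print(m.shape)   # (3, 4)

# The total number of elements must match, otherwise reshape raises ValueError
try:
    v.reshape(5, 3)
except ValueError as err:
    print("reshape failed:", err)
```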

### Arithmetic operations

In [25]:
a = np.arange(4)
a

array([0, 1, 2, 3])

In [26]:
a+4

array([4, 5, 6, 7])

In [27]:
a*2

array([0, 2, 4, 6])

In [28]:
b = np.arange(4, 8)
b

array([4, 5, 6, 7])

In [29]:
a + b

array([ 4,  6,  8, 10])

In [30]:
a - b

array([-4, -4, -4, -4])

In [31]:
a * b

array([ 0,  5, 12, 21])

In [32]:
A = np.arange(9, 18).reshape(3, 3)
A

array([[ 9, 10, 11],
       [12, 13, 14],
       [15, 16, 17]])

In [33]:
B = np.ones((3, 3))
B

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.]])

In [34]:
A * B

array([[ 9., 10., 11.],
       [12., 13., 14.],
       [15., 16., 17.]])

### Matrix multiplication

So far, we have only performed element-wise operations on `ndarray`s. Now let's look at another kind of operation: matrix multiplication.

In [35]:
np.dot(A, B)

array([[30., 30., 30.],
       [39., 39., 39.],
       [48., 48., 48.]])

In [36]:
A.dot(B)

array([[30., 30., 30.],
       [39., 39., 39.],
       [48., 48., 48.]])

In [37]:
np.dot(B, A)

array([[36., 39., 42.],
       [36., 39., 42.],
       [36., 39., 42.]])
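
Besides `np.dot`, Python also provides the `@` operator for the matrix product. The sketch below (added for illustration, reusing the same `A` and `B` as above) contrasts it with the element-wise `*` and checks that the order of the factors matters:

```python
import numpy as np

A = np.arange(9, 18).reshape(3, 3)
B = np.ones((3, 3))

elementwise = A * B      # multiplies entry by entry
matrix_prod = A @ B      # matrix product, same result as np.dot(A, B) for 2-D arrays

print(np.array_equal(matrix_prod, np.dot(A, B)))   # True
print(np.array_equal(matrix_prod, B @ A))          # False: the matrix product is not commutative
```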

### More functions for `ndarray`s

In [38]:
a = np.arange(1, 10)
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [39]:
np.sqrt(a)

array([1.        , 1.41421356, 1.73205081, 2.        , 2.23606798,
       2.44948974, 2.64575131, 2.82842712, 3.        ])

In [40]:
np.log(a)

array([0.        , 0.69314718, 1.09861229, 1.38629436, 1.60943791,
       1.79175947, 1.94591015, 2.07944154, 2.19722458])

In [41]:
np.sin(a)

array([ 0.84147098,  0.90929743,  0.14112001, -0.7568025 , -0.95892427,
       -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])

In [42]:
a.sum()

45

In [43]:
a.min()

1

In [44]:
a.max()

9

In [45]:
a.mean()

5.0

In [46]:
a.std()

2.581988897471611

### Manipulating vectors and matrices

In [47]:
a = np.arange(1, 10)
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [48]:
a[0] # indices start at 0 (0, 1, 2, 3, 4, ...), not at 1

1

In [49]:
a[-1]

9

In [50]:
a[4:]

array([5, 6, 7, 8, 9])

In [51]:
a[ [2,3,4] ]

array([3, 4, 5])

In [52]:
A = np.arange(1, 10).reshape((3, 3))
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [53]:
A[2, 2]

9

In [54]:
A[:,0]

array([1, 4, 7])

In [55]:
A[0:2, 0:2] 
    # ROWS -> 0:2 -> rows 0 and 1
    # COLUMNS -> 0:2 -> columns 0 and 1

array([[1, 2],
       [4, 5]])

In [56]:
A[1, 1:3]

array([5, 6])

In [57]:
A[[0,2], 0:3]

array([[1, 2, 3],
       [7, 8, 9]])

### Iterating over an array

In [58]:
a

array([1, 2, 3, 4, 5, 6, 7, 8, 9])

In [59]:
for i in a:
    print(i)

1
2
3
4
5
6
7
8
9


In [60]:
A

array([[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]])

In [61]:
for row in A:
    print(np.mean(row))

2.0
5.0
8.0


In [62]:
for i in A.flat:
    print(i)

1
2
3
4
5
6
7
8
9


If we need to apply a function to the columns or rows of a matrix, there is a more elegant way to do it than writing an explicit `for` loop.

In [63]:
np.apply_along_axis(np.max, axis=0, arr=A) # axis=0: one result per column

array([7, 8, 9])

In [64]:
np.apply_along_axis(np.mean, axis=1, arr=A) # axis=1: one result per row

array([2., 5., 8.])
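
For common reductions such as `max`, `mean` or `sum`, the array methods accept an `axis` argument directly, which is usually shorter than `np.apply_along_axis`. A minimal sketch (added for illustration, using the same 3x3 matrix as above):

```python
import numpy as np

A = np.arange(1, 10).reshape(3, 3)

col_max = A.max(axis=0)     # maximum of each column
row_mean = A.mean(axis=1)   # mean of each row

print(col_max.tolist())     # [7, 8, 9]
print(row_mean.tolist())    # [2.0, 5.0, 8.0]
```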

### Conditions and Boolean arrays

In [65]:
A = np.random.random((4, 4))
A

array([[0.80751573, 0.88643956, 0.13216374, 0.36770481],
       [0.52882423, 0.18945173, 0.40616755, 0.75420242],
       [0.88037973, 0.75039469, 0.56913913, 0.53795963],
       [0.87652192, 0.13609991, 0.17622018, 0.07650743]])

In [66]:
A < 0.5

array([[False, False,  True,  True],
       [False,  True,  True, False],
       [False, False, False, False],
       [False,  True,  True,  True]])

In [67]:
A[A < 0.5]

array([0.13216374, 0.36770481, 0.18945173, 0.40616755, 0.13609991,
       0.17622018, 0.07650743])
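
Boolean arrays can also be combined and used to modify values. The sketch below is an illustration added to the notebook; it uses a small fixed array instead of random numbers so that the results are reproducible.

```python
import numpy as np

A = np.array([[0.1, 0.7],
              [0.4, 0.9]])

# Element-wise "and"; each condition needs its own parentheses
mask = (A > 0.3) & (A < 0.8)
print(mask.sum())              # 2 -> number of elements that satisfy both conditions

# np.where builds a new array: 0.0 where the condition holds, the original value elsewhere
clipped = np.where(A < 0.5, 0.0, A)
print(clipped.tolist())        # [[0.0, 0.7], [0.0, 0.9]]

# A mask can also be used to assign values in place
B = A.copy()
B[B < 0.5] = 0.0
print(np.array_equal(B, clipped))   # True
```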

### Joining arrays

In [68]:
A = np.ones((3, 3))
B = np.zeros((3, 3))

In [69]:
np.vstack((A, B))

array([[1., 1., 1.],
       [1., 1., 1.],
       [1., 1., 1.],
       [0., 0., 0.],
       [0., 0., 0.],
       [0., 0., 0.]])

In [70]:
np.hstack((A, B))

array([[1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.],
       [1., 1., 1., 0., 0., 0.]])

More specific functions for joining one-dimensional arrays into two-dimensional arrays:

In [71]:
a = np.array([0, 1, 2])
b = np.array([3, 4, 5])
c = np.array([6, 7, 8])

In [72]:
np.column_stack((a, b, c))

array([[0, 3, 6],
       [1, 4, 7],
       [2, 5, 8]])

In [73]:
np.row_stack((a, b, c))

array([[0, 1, 2],
       [3, 4, 5],
       [6, 7, 8]])

In [74]:
a = np.array([0, 1, 2])

In [75]:
a

array([0, 1, 2])

### Important: copy or view of an `ndarray`

In [76]:
a = np.array([1, 2, 3, 4])
a

array([1, 2, 3, 4])

In [77]:
b = a
b

array([1, 2, 3, 4])

In [78]:
a[2] = 0
a

array([1, 2, 0, 4])

In [79]:
b

array([1, 2, 0, 4])

In [80]:
c = a[0:2]
c

array([1, 2])

In [81]:
a[0] = 0
c

array([0, 2])

In [82]:
c[0]=3
c

array([3, 2])

In [83]:
a

array([3, 2, 0, 4])

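In the cells above, `b` and `c` share their data with `a`, so changing one also changes the other. We avoid this by using the `copy` method.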

In [84]:
a = np.array([1, 2, 3, 4])
b = a.copy()
b

array([1, 2, 3, 4])

In [85]:
a[2] = 0
b

array([1, 2, 3, 4])

In [86]:
a

array([1, 2, 0, 4])

In [87]:
c = a[0:2].copy()
c

array([1, 2])

In [88]:
a[0] = 0
c

array([1, 2])
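
When it is not obvious whether two arrays share data, `np.shares_memory` can be used to check. This is a small sketch added for illustration; the variable names are not from the original notebook.

```python
import numpy as np

a = np.array([1, 2, 3, 4])

alias = a                 # same object, just another name
view = a[0:2]             # slice: a view on the same data
copied = a[0:2].copy()    # independent copy

print(alias is a)                    # True
print(np.shares_memory(a, view))     # True
print(np.shares_memory(a, copied))   # False

# Indexing with a list of positions returns a copy, not a view
picked = a[[0, 1]]
print(np.shares_memory(a, picked))   # False
```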

## Pandas

* Pandas is the reference package for data analysis in Python.

* Pandas provides complex data structures and specific functions for working with them.

* The fundamental concept in Pandas is the `DataFrame`, a two-dimensional data structure. There are also `Series`, which are one-dimensional.

* Pandas is built on top of NumPy.

In [98]:
import numpy as np
import pandas as pd

### DataFrame

A DataFrame is basically a table. It is made up of rows and columns, which are arrays of individual values (numeric or not).

In [99]:
pd.DataFrame({'Yes': [50, 21,31], 'No': [131, 2, 3]}) # built from a dictionary

   Yes   No
0   50  131
1   21    2
2   31    3


In [100]:
pd.DataFrame({'Juan': ['Sopa', 'Pescado'], 'Ana': ['Pasta', 'Solomillo']})

      Juan        Ana
0     Sopa      Pasta
1  Pescado  Solomillo


We are using `pd.DataFrame()` to build `DataFrame` objects. As its argument we pass a dictionary with the keys `['Juan', 'Ana']` and their respective values. Although this is the most common way to build a `DataFrame`, it is not the only one.

The way we have built these `DataFrame`s assigns each row a label, counting up from 0 to the number of rows minus one. Sometimes this is fine, but at other times we may want to give each row a specific label, which we can do with the `index` argument.

In [101]:
pd.DataFrame({'Juan': ['Sopa', 'Pescado', 'Yogurt'], 'Ana': ['Pasta', 'Solomillo', 'Fruta']})

      Juan        Ana
0     Sopa      Pasta
1  Pescado  Solomillo
2   Yogurt      Fruta


In [102]:
pd.DataFrame({'Juan': ['Sopa', 'Pescado', 'Yogurt'], 'Ana': ['Pasta', 'Solomillo', 'Fruta']}, index=['1 Plato', '2 Plato', 'Postre'])

            Juan        Ana
1 Plato     Sopa      Pasta
2 Plato  Pescado  Solomillo
Postre    Yogurt      Fruta


### Series

A `Series` is a sequence of data. If `DataFrame`s are tables of data, `Series` are lists of data.

In [103]:
pd.Series([1, 2, 3, 4, 5])

0    1
1    2
2    3
3    4
4    5
dtype: int64

In [104]:
pd.DataFrame([1, 2, 3, 4, 5])

   0
0  1
1  2
2  3
3  4
4  5


In [105]:
pd.Series([30, 35, 40], index=['2015 matriculas', '2016 matriculas', '2017 matriculas'], name='Matriculas en máster de modelización')

2015 matriculas    30
2016 matriculas    35
2017 matriculas    40
Name: Matriculas en máster de modelización, dtype: int64

`Series` and `DataFrame`s are closely related. In fact, we can think of a `DataFrame` as simply a bunch of `Series` put together.
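
The relationship goes both ways, as this small sketch shows (added for illustration; the `Series` reuse the menu data from the earlier cells): several named `Series` can be placed side by side to form a `DataFrame`, and selecting one column of a `DataFrame` gives back a `Series`.

```python
import pandas as pd

juan = pd.Series(['Sopa', 'Pescado'], name='Juan')
ana = pd.Series(['Pasta', 'Solomillo'], name='Ana')

# Putting two Series side by side gives a DataFrame with one column per Series
menu = pd.concat([juan, ana], axis=1)
print(menu)

# Selecting a single column gives back a Series
print(isinstance(menu['Ana'], pd.Series))   # True
```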

### Reading data files

Although we can create `DataFrame`s and `Series` by hand, most of the time we will work with data that already exists and is stored in some kind of file (.xls, .csv, .json, ...).

The most common format for storing data is CSV. CSV files contain comma-separated values.

In [106]:
reviews = pd.read_csv("data/winemag-data-130k-v2.csv")

In [107]:
reviews.shape

(129971, 14)

In [108]:
reviews.head()

Unnamed: 0.1,Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks


In [109]:
reviews = pd.read_csv("data/winemag-data-130k-v2.csv", index_col=0)
reviews.head()

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
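
To see what `index_col=0` does without needing the wine file, here is a minimal sketch that reads a tiny CSV from a string. The CSV text is hypothetical and only mimics the layout of the real file, where the first column is a saved row index with no header name.

```python
import io
import pandas as pd

# Hypothetical CSV text; the first column is a saved row index without a header name
csv_text = """,country,points
0,Italy,87
1,Portugal,87
2,US,90
"""

# Without index_col the first column is read as ordinary data, named "Unnamed: 0"
raw = pd.read_csv(io.StringIO(csv_text))
print(list(raw.columns))       # ['Unnamed: 0', 'country', 'points']

# With index_col=0 that column becomes the row index
indexed = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(list(indexed.columns))   # ['country', 'points']
```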


### Selecting subsets of a `DataFrame` or `Series`

We can select the values of one or several columns in several ways.

In [110]:
reviews.country

0            Italy
1         Portugal
2               US
3               US
4               US
            ...   
129966     Germany
129967          US
129968      France
129969      France
129970      France
Name: country, Length: 129971, dtype: object

In [111]:
reviews['country']

0            Italy
1         Portugal
2               US
3               US
4               US
            ...   
129966     Germany
129967          US
129968      France
129969      France
129970      France
Name: country, Length: 129971, dtype: object

In [112]:
# Schematic pattern (not meant to be run as is): what goes inside the brackets
# decides what is selected.
# reviews['column1']                -> a single column
# reviews[['column1', 'column2']]   -> several columns

In [113]:
reviews[['country', 'province']]

Unnamed: 0,country,province
0,Italy,Sicily & Sardinia
1,Portugal,Douro
2,US,Oregon
3,US,Michigan
4,US,Oregon
...,...,...
129966,Germany,Mosel
129967,US,Oregon
129968,France,Alsace
129969,France,Alsace


In [114]:
reviews['country'].value_counts()

US                        54504
France                    22093
Italy                     19540
Spain                      6645
Portugal                   5691
Chile                      4472
Argentina                  3800
Austria                    3345
Australia                  2329
Germany                    2165
New Zealand                1419
South Africa               1401
Israel                      505
Greece                      466
Canada                      257
Hungary                     146
Bulgaria                    141
Romania                     120
Uruguay                     109
Turkey                       90
Slovenia                     87
Georgia                      86
England                      74
Croatia                      73
Mexico                       70
Moldova                      59
Brazil                       52
Lebanon                      35
Morocco                      28
Peru                         16
Ukraine                      14
...

We can also select subsets by position, using the `iloc` indexer.

In [148]:
reviews[:1]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia


In [139]:
reviews.iloc[0]

country                                                              Italy
description              Aromas include tropical fruit, broom, brimston...
designation                                                   Vulkà Bianco
points                                                                  87
price                                                                  NaN
province                                                 Sicily & Sardinia
region_1                                                              Etna
region_2                                                               NaN
taster_name                                                  Kerin O’Keefe
taster_twitter_handle                                         @kerinokeefe
title                                    Nicosia 2013 Vulkà Bianco  (Etna)
variety                                                        White Blend
winery                                                             Nicosia
Name: 0, dtype: object

In [140]:
reviews.iloc[0,5]

'Sicily & Sardinia'

In [141]:
reviews.iloc[:,-1]

0                                          Nicosia
1                              Quinta dos Avidagos
2                                        Rainstorm
3                                       St. Julian
4                                     Sweet Cheeks
                            ...                   
129966    Dr. H. Thanisch (Erben Müller-Burggraef)
129967                                    Citation
129968                             Domaine Gresser
129969                        Domaine Marcel Deiss
129970                            Domaine Schoffit
Name: winery, Length: 129971, dtype: object

In [151]:
reviews.iloc[-3:, :3]

Unnamed: 0,country,description,designation
129968,France,Well-drained gravel soil gives this wine its c...,Kritt
129969,France,"A dry style of Pinot Gris, this is crisp with ...",
129970,France,"Big, rich and off-dry, this is powered by inte...",Lieu-dit Harth Cuvée Caroline


In [152]:
reviews.iloc[[0, 10, 100, 120], 0]

0      Italy
10        US
100       US
120    Italy
Name: country, dtype: object

Finally, we can also use the `loc` indexer to select by row and column labels.

In [158]:
reviews.loc[:, ['taster_name', 'taster_twitter_handle', 'points']]

Unnamed: 0,taster_name,taster_twitter_handle,points
0,Kerin O’Keefe,@kerinokeefe,87
1,Roger Voss,@vossroger,87
2,Paul Gregutt,@paulgwine,87
3,Alexander Peartree,,87
4,Paul Gregutt,@paulgwine,87
...,...,...,...
129966,Anna Lee C. Iijima,,90
129967,Paul Gregutt,@paulgwine,90
129968,Roger Voss,@vossroger,90
129969,Roger Voss,@vossroger,90


CAREFUL! In this case the row labels are numbers, but `iloc` and `loc` do not behave the same way: `iloc` slices by position and excludes the end, while `loc` slices by label and includes it.

In [154]:
reviews.iloc[:5, 0]

0       Italy
1    Portugal
2          US
3          US
4          US
Name: country, dtype: object

In [155]:
reviews.loc[:5, 'country']

0       Italy
1    Portugal
2          US
3          US
4          US
5       Spain
Name: country, dtype: object
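
The difference is easier to see on a tiny table. The sketch below is an illustration added to the notebook, with a made-up four-row `DataFrame`: it compares the two indexers first with the default numeric labels and then with text labels.

```python
import pandas as pd

df = pd.DataFrame({'country': ['Italy', 'Portugal', 'US', 'Spain']})

# Default labels 0..3: iloc excludes the end position, loc includes the end label
print(len(df.iloc[:2]))   # 2 -> positions 0 and 1
print(len(df.loc[:2]))    # 3 -> labels 0, 1 and 2

# With text labels the two indexers can no longer be confused
relabelled = df.copy()
relabelled.index = ['a', 'b', 'c', 'd']
print(list(relabelled.loc['b':'c', 'country']))   # ['Portugal', 'US']
print(list(relabelled.iloc[1:3, 0]))              # ['Portugal', 'US']
```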

### Manipulating the index

In [162]:
reviews.set_index("title")

title,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,variety,winery
Nicosia 2013 Vulkà Bianco (Etna),Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,White Blend,Nicosia
Quinta dos Avidagos 2011 Avidagos Red (Douro),Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Portuguese Red,Quinta dos Avidagos
Rainstorm 2013 Pinot Gris (Willamette Valley),US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Pinot Gris,Rainstorm
St. Julian 2013 Reserve Late Harvest Riesling (Lake Michigan Shore),US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,Riesling,St. Julian
Sweet Cheeks 2012 Vintner's Reserve Wild Child Block Pinot Noir (Willamette Valley),US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Pinot Noir,Sweet Cheeks
...,...,...,...,...,...,...,...,...,...,...,...,...
Dr. H. Thanisch (Erben Müller-Burggraef) 2013 Brauneberger Juffer-Sonnenuhr Spätlese Riesling (Mosel),Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
Citation 2004 Pinot Noir (Oregon),US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Pinot Noir,Citation
Domaine Gresser 2013 Kritt Gewurztraminer (Alsace),France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Gewürztraminer,Domaine Gresser
Domaine Marcel Deiss 2012 Pinot Gris (Alsace),France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Pinot Gris,Domaine Marcel Deiss
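
Note that `set_index` returns a new `DataFrame` and leaves the original untouched, so the result has to be assigned to a variable if we want to keep it. A minimal sketch with a made-up two-row table (the names `wines`, `by_title` and `restored` are illustrative):

```python
import pandas as pd

wines = pd.DataFrame({'title': ['Wine A', 'Wine B'],
                      'points': [87, 90]})

by_title = wines.set_index('title')   # returns a new DataFrame

print(list(wines.columns))      # ['title', 'points'] -> the original is unchanged
print(list(by_title.columns))   # ['points']
print(by_title.loc['Wine B', 'points'])   # 90

# reset_index moves the index back into an ordinary column
restored = by_title.reset_index()
print(list(restored.columns))   # ['title', 'points']
```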


### Conditional selection

We can search for the wines from Italy.

In [163]:
df=pd.DataFrame({'Yes': [50, 21,31], 'No': [131, 2, 3]}) # built from a dictionary
df

   Yes   No
0   50  131
1   21    2
2   31    3


In [181]:
df[ [False,True,False] ]

   Yes  No
1   21   2


In [183]:
reviews['country']

0            Italy
1         Portugal
2               US
3               US
4               US
            ...   
129966     Germany
129967          US
129968      France
129969      France
129970      France
Name: country, Length: 129971, dtype: object

In [166]:
reviews['country'] == 'Italy'

0          True
1         False
2         False
3         False
4         False
          ...  
129966    False
129967    False
129968    False
129969    False
129970    False
Name: country, Length: 129971, dtype: bool

The previous expression returned a `Series` of booleans that tell us which wines are Italian. To retrieve the rows where the boolean is `True`, we do:

In [187]:
reviews[reviews['country'] == 'Italy']

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
13,Italy,This is dominated by oak and oak-driven aromas...,Rosso,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Masseria Setteporte 2012 Rosso (Etna),Nerello Mascalese,Masseria Setteporte
22,Italy,Delicate aromas recall white flower and citrus...,Ficiligno,87,19.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Baglio di Pianetto 2007 Ficiligno White (Sicilia),White Blend,Baglio di Pianetto
24,Italy,"Aromas of prune, blackcurrant, toast and oak c...",Aynat,87,35.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Canicattì 2009 Aynat Nero d'Avola (Sicilia),Nero d'Avola,Canicattì
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129929,Italy,"This luminous sparkler has a sweet, fruit-forw...",,91,38.0,Veneto,Prosecco Superiore di Cartizze,,,,Col Vetoraz Spumanti NV Prosecco Superiore di...,Prosecco,Col Vetoraz Spumanti
129943,Italy,"A blend of Nero d'Avola and Syrah, this convey...",Adènzia,90,29.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Baglio del Cristo di Campobello 2012 Adènzia R...,Red Blend,Baglio del Cristo di Campobello
129947,Italy,"A blend of 65% Cabernet Sauvignon, 30% Merlot ...",Symposio,90,20.0,Sicily & Sardinia,Terre Siciliane,,Kerin O’Keefe,@kerinokeefe,Feudo Principi di Butera 2012 Symposio Red (Te...,Red Blend,Feudo Principi di Butera
129961,Italy,"Intense aromas of wild cherry, baking spice, t...",,90,30.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,COS 2013 Frappato (Sicilia),Frappato,COS


In [188]:
reviews.loc[reviews['country'] == 'Italy']

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
13,Italy,This is dominated by oak and oak-driven aromas...,Rosso,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Masseria Setteporte 2012 Rosso (Etna),Nerello Mascalese,Masseria Setteporte
22,Italy,Delicate aromas recall white flower and citrus...,Ficiligno,87,19.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Baglio di Pianetto 2007 Ficiligno White (Sicilia),White Blend,Baglio di Pianetto
24,Italy,"Aromas of prune, blackcurrant, toast and oak c...",Aynat,87,35.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Canicattì 2009 Aynat Nero d'Avola (Sicilia),Nero d'Avola,Canicattì
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129929,Italy,"This luminous sparkler has a sweet, fruit-forw...",,91,38.0,Veneto,Prosecco Superiore di Cartizze,,,,Col Vetoraz Spumanti NV Prosecco Superiore di...,Prosecco,Col Vetoraz Spumanti
129943,Italy,"A blend of Nero d'Avola and Syrah, this convey...",Adènzia,90,29.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Baglio del Cristo di Campobello 2012 Adènzia R...,Red Blend,Baglio del Cristo di Campobello
129947,Italy,"A blend of 65% Cabernet Sauvignon, 30% Merlot ...",Symposio,90,20.0,Sicily & Sardinia,Terre Siciliane,,Kerin O’Keefe,@kerinokeefe,Feudo Principi di Butera 2012 Symposio Red (Te...,Red Blend,Feudo Principi di Butera
129961,Italy,"Intense aromas of wild cherry, baking spice, t...",,90,30.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,COS 2013 Frappato (Sicilia),Frappato,COS


If, besides being Italian, we also want the wine to have a score below 90:

In [211]:
# Each condition needs its own parentheses, because & binds more tightly than == and <.
# Without them the expression fails with an error:
# reviews[reviews['country'] == 'Italy' & reviews['points'] < 90]

reviews[(reviews['country'] == 'Italy') & (reviews['points'] < 90)]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
13,Italy,This is dominated by oak and oak-driven aromas...,Rosso,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Masseria Setteporte 2012 Rosso (Etna),Nerello Mascalese,Masseria Setteporte
22,Italy,Delicate aromas recall white flower and citrus...,Ficiligno,87,19.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Baglio di Pianetto 2007 Ficiligno White (Sicilia),White Blend,Baglio di Pianetto
24,Italy,"Aromas of prune, blackcurrant, toast and oak c...",Aynat,87,35.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Canicattì 2009 Aynat Nero d'Avola (Sicilia),Nero d'Avola,Canicattì
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129844,Italy,"Doga delle Clavule is a neutral, mineral-drive...",Doga delle Clavule,86,,Tuscany,Morellino di Scansano,,,,Caparzo 2006 Doga delle Clavule (Morellino di...,Sangiovese,Caparzo
129849,Italy,This 100% expression of Corvina offers pointed...,Vignenuove,86,20.0,Veneto,Veneto,,,,Luciana Cordioli 2006 Vignenuove Corvina (Veneto),Corvina,Luciana Cordioli
129850,Italy,Here's an Aglianico from Campania with bright ...,,86,24.0,Southern Italy,Campania,,,,Macchialupa 2006 Aglianico (Campania),Aglianico,Macchialupa
129851,Italy,Almond paste and crushed pistachio nut charact...,,86,10.0,Sicily & Sardinia,Sicilia,,,,MandraRossa 2006 Nero d'Avola (Sicilia),Nero d'Avola,MandraRossa


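The `&` operator combines the two boolean `Series` element by element: it behaves like a multiplication of 0s and 1s, while `|` behaves like an addition.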
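
The following sketch (added for illustration, with a made-up four-row table) shows the "and" and "or" combinations side by side. It also uses `isin`, which is a compact alternative to chaining several `==` comparisons with `|`.

```python
import pandas as pd

wines = pd.DataFrame({'country': ['Italy', 'Spain', 'Italy', 'US'],
                      'points': [87, 92, 91, 85]})

is_italian = wines['country'] == 'Italy'
is_top = wines['points'] >= 90

both = wines[is_italian & is_top]     # rows where both conditions hold
either = wines[is_italian | is_top]   # rows where at least one holds

print(list(both.index))     # [2]
print(list(either.index))   # [0, 1, 2]

# isin checks membership in a list of values
italy_or_spain = wines[wines['country'].isin(['Italy', 'Spain'])]
print(list(italy_or_spain.index))   # [0, 1, 2]
```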

If we want a wine that is Italian or has a score greater than or equal to 90:

In [214]:
reviews[(reviews['country'] == 'Italy') | (reviews['points'] >= 90)]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
13,Italy,This is dominated by oak and oak-driven aromas...,Rosso,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Masseria Setteporte 2012 Rosso (Etna),Nerello Mascalese,Masseria Setteporte
22,Italy,Delicate aromas recall white flower and citrus...,Ficiligno,87,19.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Baglio di Pianetto 2007 Ficiligno White (Sicilia),White Blend,Baglio di Pianetto
24,Italy,"Aromas of prune, blackcurrant, toast and oak c...",Aynat,87,35.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Canicattì 2009 Aynat Nero d'Avola (Sicilia),Nero d'Avola,Canicattì
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss


To select wines that are Italian or Spanish:

In [215]:
reviews[(reviews['country']=='Italy') | (reviews['country']=='Spain')]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
5,Spain,Blackberry and raspberry aromas show a typical...,Ars In Vitro,87,15.0,Northern Spain,Navarra,,Michael Schachner,@wineschach,Tandem 2011 Ars In Vitro Tempranillo-Merlot (N...,Tempranillo-Merlot,Tandem
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
13,Italy,This is dominated by oak and oak-driven aromas...,Rosso,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Masseria Setteporte 2012 Rosso (Etna),Nerello Mascalese,Masseria Setteporte
18,Spain,"Desiccated blackberry, leather, charred wood a...",Vendimia Seleccionada Finca Valdelayegua Singl...,87,28.0,Northern Spain,Ribera del Duero,,Michael Schachner,@wineschach,Pradorey 2010 Vendimia Seleccionada Finca Vald...,Tempranillo Blend,Pradorey
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129943,Italy,"A blend of Nero d'Avola and Syrah, this convey...",Adènzia,90,29.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Baglio del Cristo di Campobello 2012 Adènzia R...,Red Blend,Baglio del Cristo di Campobello
129947,Italy,"A blend of 65% Cabernet Sauvignon, 30% Merlot ...",Symposio,90,20.0,Sicily & Sardinia,Terre Siciliane,,Kerin O’Keefe,@kerinokeefe,Feudo Principi di Butera 2012 Symposio Red (Te...,Red Blend,Feudo Principi di Butera
129957,Spain,Lightly baked berry aromas vie for attention w...,Crianza,90,17.0,Northern Spain,Rioja,,Michael Schachner,@wineschach,Viñedos Real Rubio 2010 Crianza (Rioja),Tempranillo Blend,Viñedos Real Rubio
129961,Italy,"Intense aromas of wild cherry, baking spice, t...",,90,30.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,COS 2013 Frappato (Sicilia),Frappato,COS


In [222]:
reviews[reviews['country'].isin(['Italy','Spain'])]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
0,Italy,"Aromas include tropical fruit, broom, brimston...",Vulkà Bianco,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Nicosia 2013 Vulkà Bianco (Etna),White Blend,Nicosia
5,Spain,Blackberry and raspberry aromas show a typical...,Ars In Vitro,87,15.0,Northern Spain,Navarra,,Michael Schachner,@wineschach,Tandem 2011 Ars In Vitro Tempranillo-Merlot (N...,Tempranillo-Merlot,Tandem
6,Italy,"Here's a bright, informal red that opens with ...",Belsito,87,16.0,Sicily & Sardinia,Vittoria,,Kerin O’Keefe,@kerinokeefe,Terre di Giurfo 2013 Belsito Frappato (Vittoria),Frappato,Terre di Giurfo
13,Italy,This is dominated by oak and oak-driven aromas...,Rosso,87,,Sicily & Sardinia,Etna,,Kerin O’Keefe,@kerinokeefe,Masseria Setteporte 2012 Rosso (Etna),Nerello Mascalese,Masseria Setteporte
18,Spain,"Desiccated blackberry, leather, charred wood a...",Vendimia Seleccionada Finca Valdelayegua Singl...,87,28.0,Northern Spain,Ribera del Duero,,Michael Schachner,@wineschach,Pradorey 2010 Vendimia Seleccionada Finca Vald...,Tempranillo Blend,Pradorey
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129943,Italy,"A blend of Nero d'Avola and Syrah, this convey...",Adènzia,90,29.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,Baglio del Cristo di Campobello 2012 Adènzia R...,Red Blend,Baglio del Cristo di Campobello
129947,Italy,"A blend of 65% Cabernet Sauvignon, 30% Merlot ...",Symposio,90,20.0,Sicily & Sardinia,Terre Siciliane,,Kerin O’Keefe,@kerinokeefe,Feudo Principi di Butera 2012 Symposio Red (Te...,Red Blend,Feudo Principi di Butera
129957,Spain,Lightly baked berry aromas vie for attention w...,Crianza,90,17.0,Northern Spain,Rioja,,Michael Schachner,@wineschach,Viñedos Real Rubio 2010 Crianza (Rioja),Tempranillo Blend,Viñedos Real Rubio
129961,Italy,"Intense aromas of wild cherry, baking spice, t...",,90,30.0,Sicily & Sardinia,Sicilia,,Kerin O’Keefe,@kerinokeefe,COS 2013 Frappato (Sicilia),Frappato,COS
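
The two previous cells return the same rows: `isin` is a shorter way to write several `==` comparisons joined with `|`. A minimal sketch on invented toy data, which also shows how `~` negates the mask:

```python
import pandas as pd

# Toy data with invented values, only for illustration
toy = pd.DataFrame({
    'country': ['Italy', 'Spain', 'US', None, 'Italy'],
    'points': [87, 90, 88, 86, 91],
})

wanted = ['Italy', 'Spain']

# Chained comparisons and isin select the same rows
with_or = toy[(toy['country'] == 'Italy') | (toy['country'] == 'Spain')]
with_isin = toy[toy['country'].isin(wanted)]

# ~ negates the mask: rows whose country is NOT in the list
# (rows with a missing country end up here as well)
others = toy[~toy['country'].isin(wanted)]

print(with_isin)
print(others)
```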


To drop the rows whose price is missing:

In [254]:
reviews['price'].isna()

0          True
1         False
2         False
3         False
4         False
          ...  
129966    False
129967    False
129968    False
129969    False
129970    False
Name: price, Length: 129971, dtype: bool

In [174]:
reviews['price'].notnull()

0         False
1          True
2          True
3          True
4          True
          ...  
129966     True
129967     True
129968     True
129969     True
129970     True
Name: price, Length: 129971, dtype: bool

In [255]:
reviews[reviews['price'].notnull()]

Unnamed: 0,country,description,designation,points,price,province,region_1,region_2,taster_name,taster_twitter_handle,title,variety,winery
1,Portugal,"This is ripe and fruity, a wine that is smooth...",Avidagos,87,15.0,Douro,,,Roger Voss,@vossroger,Quinta dos Avidagos 2011 Avidagos Red (Douro),Portuguese Red,Quinta dos Avidagos
2,US,"Tart and snappy, the flavors of lime flesh and...",,87,14.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Rainstorm 2013 Pinot Gris (Willamette Valley),Pinot Gris,Rainstorm
3,US,"Pineapple rind, lemon pith and orange blossom ...",Reserve Late Harvest,87,13.0,Michigan,Lake Michigan Shore,,Alexander Peartree,,St. Julian 2013 Reserve Late Harvest Riesling ...,Riesling,St. Julian
4,US,"Much like the regular bottling from 2012, this...",Vintner's Reserve Wild Child Block,87,65.0,Oregon,Willamette Valley,Willamette Valley,Paul Gregutt,@paulgwine,Sweet Cheeks 2012 Vintner's Reserve Wild Child...,Pinot Noir,Sweet Cheeks
5,Spain,Blackberry and raspberry aromas show a typical...,Ars In Vitro,87,15.0,Northern Spain,Navarra,,Michael Schachner,@wineschach,Tandem 2011 Ars In Vitro Tempranillo-Merlot (N...,Tempranillo-Merlot,Tandem
...,...,...,...,...,...,...,...,...,...,...,...,...,...
129966,Germany,Notes of honeysuckle and cantaloupe sweeten th...,Brauneberger Juffer-Sonnenuhr Spätlese,90,28.0,Mosel,,,Anna Lee C. Iijima,,Dr. H. Thanisch (Erben Müller-Burggraef) 2013 ...,Riesling,Dr. H. Thanisch (Erben Müller-Burggraef)
129967,US,Citation is given as much as a decade of bottl...,,90,75.0,Oregon,Oregon,Oregon Other,Paul Gregutt,@paulgwine,Citation 2004 Pinot Noir (Oregon),Pinot Noir,Citation
129968,France,Well-drained gravel soil gives this wine its c...,Kritt,90,30.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Gresser 2013 Kritt Gewurztraminer (Als...,Gewürztraminer,Domaine Gresser
129969,France,"A dry style of Pinot Gris, this is crisp with ...",,90,32.0,Alsace,Alsace,,Roger Voss,@vossroger,Domaine Marcel Deiss 2012 Pinot Gris (Alsace),Pinot Gris,Domaine Marcel Deiss


In [260]:
reviews.isnull().sum(axis=0)

country                     63
description                  0
designation              37465
points                       0
price                     8996
province                    63
region_1                 21247
region_2                 79460
taster_name              26244
taster_twitter_handle    31213
title                        0
variety                      1
winery                       0
dtype: int64
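
As a possible alternative to the boolean mask, `dropna` with the `subset` parameter removes the rows that have a missing value in the chosen columns. The sketch below uses invented toy data and checks that both approaches give the same result.

```python
import numpy as np
import pandas as pd

# Toy data with invented values, only for illustration
toy = pd.DataFrame({
    'title': ['A', 'B', 'C', 'D'],
    'price': [15.0, np.nan, 30.0, np.nan],
})

# Number of missing prices
n_missing = toy['price'].isnull().sum()

# Two equivalent ways of keeping only the rows with a price
by_mask = toy[toy['price'].notnull()]
by_dropna = toy.dropna(subset=['price'])

print(n_missing)
print(by_dropna)
```

Neither version modifies `toy`; the result has to be assigned to a variable to be kept.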

### Adding data

Adding data to our `DataFrame`s is easy. For example, we can assign the same value to every row with the following command:

In [None]:
reviews['critic'] = 'everyone'
reviews['critic']

In [None]:
reviews.head()

In [None]:
reviews['index_backwards'] = range(len(reviews), 0, -1)
reviews['index_backwards']

In [None]:
reviews.head()
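
The two assignments above follow different rules: a single value is repeated for every row, while a sequence must provide exactly one value per row. A minimal sketch on invented toy data (the `too_short` column name is hypothetical):

```python
import pandas as pd

# Toy data with invented values, only for illustration
toy = pd.DataFrame({'title': ['A', 'B', 'C'], 'points': [87, 90, 92]})

# A scalar is broadcast to every row
toy['critic'] = 'everyone'

# A sequence needs one value per row
toy['index_backwards'] = range(len(toy), 0, -1)

# A sequence of the wrong length raises a ValueError
try:
    toy['too_short'] = [1, 2]
except ValueError as err:
    print('ValueError:', err)

print(toy)
```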

### Describing our dataset

Pandas provides tools to get a quick overview of the dataset we are working with through summary statistics.

In [None]:
reviews.describe()

In [None]:
reviews.describe(include='all')

In [None]:
reviews.columns

In [None]:
reviews.dtypes

In [None]:
reviews['points'].mean()

In [None]:
reviews['points'].quantile(0.25)

In [None]:
reviews['country'].unique()

In [None]:
reviews['country'].value_counts()
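
To see what each of these calls reports, here is a small sketch on invented toy data. By default `describe()` summarizes only the numeric columns, while `include='all'` adds the non-numeric ones.

```python
import pandas as pd

# Toy data with invented values, only for illustration
toy = pd.DataFrame({
    'country': ['Italy', 'Spain', 'Italy', 'US'],
    'points': [80, 90, 100, 90],
})

summary = toy.describe()            # numeric columns only
full = toy.describe(include='all')  # adds count/unique/top/freq for text columns

first_quartile = toy['points'].quantile(0.25)
counts = toy['country'].value_counts()  # rows per country, most frequent first

print(summary)
print(full)
print(counts)
```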

### Modifying the values of a column

For example, let's standardize the `points` column by subtracting its mean and dividing by its standard deviation.

In [None]:
(reviews['points'] - reviews['points'].mean()) / reviews['points'].std()
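
A quick way to check this transformation is to confirm that the result has mean 0 and standard deviation 1. The sketch below does so on a small invented `Series`.

```python
import pandas as pd

# Toy scores with invented values, only for illustration
points = pd.Series([80, 85, 90, 95, 100], name='points')

# Subtract the mean and divide by the (sample) standard deviation
standardized = (points - points.mean()) / points.std()

print(standardized)
print(standardized.mean(), standardized.std())
```

The expression returns a new `Series` and leaves the original column untouched; to keep the result it must be assigned, for instance to a new column.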

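Dealing with outliers

We can also create a new column by combining existing ones, for example by concatenating two text columns: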

In [None]:
reviews['province - region'] = reviews['province'] + ' - ' + reviews['region_1']
reviews.head()
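
One detail worth knowing about this kind of concatenation is that a missing value in either column makes the combined value missing too. A minimal sketch on invented toy data, with one possible workaround using `fillna` (the `'Unknown'` placeholder is an arbitrary choice):

```python
import numpy as np
import pandas as pd

# Toy data with invented values, only for illustration
toy = pd.DataFrame({
    'province': ['Douro', 'Oregon'],
    'region_1': [np.nan, 'Willamette Valley'],
})

# A missing region makes the whole combined value missing
combined = toy['province'] + ' - ' + toy['region_1']

# Filling the gaps first keeps a value for every row
filled = toy['province'] + ' - ' + toy['region_1'].fillna('Unknown')

print(combined)
print(filled)
```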

### Removing columns

In [None]:
reviews.columns

In [None]:
reviews.pop('province - region');

In [None]:
reviews.drop(columns=['critic', 'index_backwards'], inplace=True)
# Equivalent without inplace:
# reviews = reviews.drop(columns=['critic', 'index_backwards'])
reviews.columns
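
The difference between `pop` and `drop` can be seen on a small example: `pop` removes one column in place and returns it, while `drop` (without `inplace=True`) returns a new `DataFrame` and leaves the original untouched. A sketch on invented toy data:

```python
import pandas as pd

# Toy data with invented values, only for illustration
toy = pd.DataFrame({'a': [1, 2], 'b': [3, 4], 'c': [5, 6]})

# pop: modifies toy and returns the removed column as a Series
popped = toy.pop('c')

# drop: returns a new DataFrame, toy keeps its columns
smaller = toy.drop(columns=['b'])

print(popped)
print(toy.columns)
print(smaller.columns)
```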

### Grouping data

In [None]:
reviews.groupby('points')[['points']].count()

What happened is that `groupby()` created one group per score and then counted how many wines fall in each group.

Now let's compute the mean price of the wines for each score:

In [None]:
reviews.groupby('points')['price'].mean()

We can group by more than one criterion and return more than one value with `agg()`.

In [None]:
reviews.groupby(['price', 'country'])['points'].agg(['count', 'min', 'mean', 'max']).head(40)
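
To make the grouping steps easier to follow, the sketch below repeats them on a toy `DataFrame` small enough to verify by hand (all values are invented). It also shows that missing values are skipped by `mean()` and that rows with a missing grouping key are left out by default.

```python
import numpy as np
import pandas as pd

# Toy data with invented values, only for illustration
toy = pd.DataFrame({
    'country': ['Italy', 'Italy', 'Spain', 'Spain', 'Spain'],
    'points': [87, 91, 90, 86, 88],
    'price': [20.0, 40.0, 15.0, np.nan, 25.0],
})

# One value per group: the missing price is ignored by mean()
mean_price = toy.groupby('country')['price'].mean()

# Several values per group with agg()
stats = toy.groupby('country')['points'].agg(['count', 'min', 'mean', 'max'])

# Two grouping keys: the row with a missing price is excluded
by_two = toy.groupby(['country', 'price'])['points'].count()

print(mean_price)
print(stats)
print(by_two)
```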

### Sorting rows

In [None]:
reviews.sort_values(by=['points','province'])

In [None]:
reviews.sort_values(by='points', ascending=False).reset_index()
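
After sorting, the rows keep their original index labels. `reset_index()` renumbers them and stores the old labels in a new `index` column, and passing `drop=True` discards them instead. A minimal sketch on invented toy data:

```python
import pandas as pd

# Toy data with invented values, only for illustration
toy = pd.DataFrame({
    'points': [87, 92, 87],
    'province': ['Sicily', 'Oregon', 'Alsace'],
})

# Sort by points, and by province to break ties
by_both = toy.sort_values(by=['points', 'province'])

# Highest score first; old index labels are kept in an 'index' column
kept = toy.sort_values(by='points', ascending=False).reset_index()

# Same ordering, but the old index labels are discarded
ranked = toy.sort_values(by='points', ascending=False).reset_index(drop=True)

print(by_both)
print(kept)
print(ranked)
```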

### Missing data

Handling missing data is an essential step. Pandas provides functions such as `isnull`, `notnull` and `fillna` to locate and fill in missing values.

In [None]:
reviews['country'].isnull()

The `isnull` / `notnull` masks can be used to select rows:

In [None]:
reviews[reviews['country'].isnull()]

In [None]:
reviews['region_2'].fillna("Unknown")

In [None]:
reviews['price'][reviews['price'].isnull()]

In [None]:
# fillna(method='bfill') is deprecated in recent pandas versions; bfill() is the equivalent call
reviews['price'].bfill()
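
The filling strategies can be compared on a small invented `Series`. This is only a sketch of the options; which one is appropriate depends on the data.

```python
import numpy as np
import pandas as pd

# Toy prices with invented values, only for illustration
price = pd.Series([np.nan, 15.0, np.nan, 30.0, np.nan])

constant = price.fillna(0.0)            # replace gaps with a fixed value
with_mean = price.fillna(price.mean())  # replace gaps with the mean of the known values
backward = price.bfill()                # copy the next known value backwards
forward = price.ffill()                 # copy the previous known value forwards

print(backward)
print(forward)
```

Backward fill cannot fill a gap at the end of the column, and forward fill cannot fill one at the start, so some missing values may remain. All of these calls return a new `Series`; `price` itself is unchanged.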

### Replacing values

In [None]:
reviews['taster_twitter_handle'].replace("@kerinokeefe", "@kerino")

In [None]:
reviews['taster_twitter_handle'] = reviews['taster_twitter_handle'].replace("@kerinokeefe", "@kerino")
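
The two cells above differ in one point: `replace` returns a new `Series`, so the change is only kept when the result is assigned back. A sketch on invented toy data, which also shows that a dictionary can map several values at once (the `@voss` handle is a made-up replacement):

```python
import pandas as pd

# Toy handles, only for illustration
handles = pd.Series(['@kerinokeefe', '@vossroger', None, '@kerinokeefe'])

# Single replacement: handles itself is not modified
renamed = handles.replace('@kerinokeefe', '@kerino')

# Several replacements at once with a dictionary
mapped = handles.replace({'@kerinokeefe': '@kerino', '@vossroger': '@voss'})

print(renamed)
print(mapped)
```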

### Renaming

We can rename index labels or columns with the `rename` function.

In [None]:
reviews.rename(columns={'points': 'score', 'price':'precio'})

In [None]:
reviews.rename(index={0: 'firstEntry', 1: 'secondEntry'})
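
Like most pandas methods, `rename` returns a new `DataFrame` instead of changing the original. A minimal sketch on invented toy data:

```python
import pandas as pd

# Toy data with invented values, only for illustration
toy = pd.DataFrame({'points': [87, 90], 'price': [15.0, 20.0]})

# Rename columns: keys are the old names, values the new ones
renamed_cols = toy.rename(columns={'points': 'score', 'price': 'precio'})

# Rename index labels in the same way
renamed_rows = toy.rename(index={0: 'firstEntry', 1: 'secondEntry'})

print(renamed_cols.columns)
print(renamed_rows.index)
print(toy.columns)  # the original is unchanged
```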

### Combining datasets

In [None]:
df1 = reviews.iloc[:50000, :]
df2 = reviews.iloc[50000:, :]
df2.shape

In [None]:
pd.concat([df1, df2])

In [None]:
dfl = reviews.iloc[:, :8]
dfr = reviews.iloc[:, 8:]
dfr.shape

In [None]:
dfl.join(dfr)
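
Both operations can be checked by splitting a small `DataFrame` and putting it back together: `pd.concat` stacks the pieces by rows, and `join` lines them up side by side using the index. A sketch on invented toy data:

```python
import pandas as pd

# Toy data with invented values, only for illustration
toy = pd.DataFrame({
    'country': ['Italy', 'Spain', 'US', 'France'],
    'points': [87, 90, 88, 91],
    'price': [20.0, 15.0, 30.0, 45.0],
})

# Split by rows and stack the pieces again
top = toy.iloc[:2, :]
bottom = toy.iloc[2:, :]
stacked = pd.concat([top, bottom])

# Split by columns and join the pieces on their shared index
left = toy.iloc[:, :1]
right = toy.iloc[:, 1:]
side_by_side = left.join(right)

print(stacked.equals(toy))
print(side_by_side.equals(toy))
```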

# Proposed exercises

- The three most frequent countries
- The title of the most expensive wine
- The wines tasted by the taster of that wine
- Group by province and find which ones have the highest score
- The Spanish variety with the highest score
- The most expensive Spanish variety and its `region_1`

In [None]:
df = reviews

In [None]:
df.head(5)

In [None]:
df['country'].value_counts().head(3)

In [None]:
# Title of the most expensive wine, plus its taster (needed for the next exercise)
df.loc[df['price'] == df['price'].max(), ['title', 'taster_name']]

In [None]:
df[df['taster_name']=='Roger Voss']

In [None]:
df.head(5)

In [None]:
df.groupby(by=['province'])['points'].mean().sort_values(ascending=False).keys()[0]

The Spanish variety with the highest score

In [None]:
df[df['country']=='Spain'][['variety','points']].groupby(by=['variety'])['points'].mean().sort_values(ascending=False)

The most expensive Spanish variety and its `region_1`

In [None]:
df[df['country']=='Spain'].groupby(by=['variety'])['price'].mean().sort_values(ascending=False)

In [None]:
df[(df['variety']=='Carignan') & (df['country']=='Spain')]['region_1'].unique()
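
The last two cells can also be chained so that the variety name does not have to be typed by hand. The sketch below is one possible way to do it with `idxmax()`, shown on invented toy data, not on the real dataset.

```python
import pandas as pd

# Toy data with invented values, only for illustration
toy = pd.DataFrame({
    'country': ['Spain', 'Spain', 'Spain', 'Italy'],
    'variety': ['Tempranillo', 'Carignan', 'Carignan', 'Nebbiolo'],
    'price': [20.0, 80.0, 60.0, 200.0],
    'region_1': ['Rioja', 'Priorat', 'Montsant', 'Barolo'],
})

spain = toy[toy['country'] == 'Spain']

# Variety with the highest mean price among Spanish wines
priciest_variety = spain.groupby('variety')['price'].mean().idxmax()

# Regions where that variety appears
regions = spain.loc[spain['variety'] == priciest_variety, 'region_1'].unique()

print(priciest_variety)
print(regions)
```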

## References
 - Iker and Mikel (UPV/EHU) lasaiker@fastmail.com, mikelbarrene@gmail.com