# Can we predict whether a white wine will be good? #

There is a dataset of the chemical properties of wines and their ratings in tastings, which comes from this paper:

>P. Cortez, A. Cerdeira, F. Almeida, T. Matos and J. Reis. Modeling wine preferences by data mining from physicochemical properties. In Decision Support Systems, Elsevier, 47(4):547-553, 2009. 

It covers more than 4,000 Portuguese vinho verde wines. The goal is to understand the data and provide a simple visual guide to the properties a good white wine should have.


In [8]:
import pandas as pd
import numpy as np
from sklearn import linear_model
import matplotlib.pyplot as plt
plt.style.use('bmh')
%matplotlib notebook

In [9]:
wine = pd.read_csv('data/winequality-white.csv',delimiter=';')
wine.describe()

| | fixed acidity | volatile acidity | citric acid | residual sugar | chlorides | free sulfur dioxide | total sulfur dioxide | density | pH | sulphates | alcohol | quality |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| count | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 | 4898.0 |
| mean | 6.854788 | 0.278241 | 0.334192 | 6.391415 | 0.045772 | 35.308085 | 138.360657 | 0.994027 | 3.188267 | 0.489847 | 10.514267 | 5.877909 |
| std | 0.843868 | 0.100795 | 0.12102 | 5.072058 | 0.021848 | 17.007137 | 42.498065 | 0.002991 | 0.151001 | 0.114126 | 1.230621 | 0.885639 |
| min | 3.8 | 0.08 | 0.0 | 0.6 | 0.009 | 2.0 | 9.0 | 0.98711 | 2.72 | 0.22 | 8.0 | 3.0 |
| 25% | 6.3 | 0.21 | 0.27 | 1.7 | 0.036 | 23.0 | 108.0 | 0.991723 | 3.09 | 0.41 | 9.5 | 5.0 |
| 50% | 6.8 | 0.26 | 0.32 | 5.2 | 0.043 | 34.0 | 134.0 | 0.99374 | 3.18 | 0.47 | 10.4 | 6.0 |
| 75% | 7.3 | 0.32 | 0.39 | 9.9 | 0.05 | 46.0 | 167.0 | 0.9961 | 3.28 | 0.55 | 11.4 | 6.0 |
| max | 14.2 | 1.1 | 1.66 | 65.8 | 0.346 | 289.0 | 440.0 | 1.03898 | 3.82 | 1.08 | 14.2 | 9.0 |
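
One quick way to start understanding a table like this is to look at how strongly each property correlates with the quality score. The sketch below is only an illustration: it uses a small synthetic table (the `toy` frame and its three made-up properties are assumptions, not the wine data) so that it runs without the CSV file. The same `corr()` call could be applied to the real `wine` frame.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the wine table (NOT the real data): "alcohol" is built
# to rise with quality, "density" to fall, and "pH" to be unrelated to it.
rng = np.random.default_rng(0)
quality = rng.integers(3, 10, size=500)
toy = pd.DataFrame({
    "alcohol": 8.0 + 0.5 * quality + rng.normal(0, 0.5, size=500),
    "density": 1.0 - 0.001 * quality + rng.normal(0, 0.001, size=500),
    "pH": rng.normal(3.2, 0.15, size=500),
    "quality": quality,
})

# Pearson correlation of every property with the quality score,
# sorted from most negative to most positive.
corr_with_quality = toy.corr()["quality"].drop("quality").sort_values()
print(corr_with_quality)
```

A correlation near zero suggests the property says little about quality on its own; it does not rule out nonlinear or combined effects.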


In [None]:
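fig = plt.figure(2)
# One panel per physicochemical property (11 of them; 'quality' is excluded).
ax = [fig.add_subplot(3, 4, i) for i in range(1, 12)]

# Fit each property as a linear function of quality.
# zip() stops after the 11 models, so the last column ('quality') is skipped.
models = [linear_model.LinearRegression() for _ in range(11)]
for column, model in zip(wine.columns, models):
    # Series.reshape() and DataFrame.as_matrix() no longer exist in pandas;
    # convert to NumPy arrays with to_numpy() first.
    model.fit(wine['quality'].to_numpy().reshape(-1, 1),
              wine[column].to_numpy().reshape(-1, 1))

# Blue dots: mean of each property among wines with the same quality score.
for qual, group in wine.groupby('quality'):
    for column, axis in zip(group.columns, ax):
        axis.plot(qual, group[column].mean(), 'ob')
        axis.set_title(column + ' (avg)', fontsize=10)

# Red dashed line: the fitted regression over the observed quality range (3 to 9).
qual = np.arange(3, 10)
for model, axi in zip(models, ax):
    axi.plot(qual, model.coef_[0][0] * qual + model.intercept_[0],
             'r--', linewidth=4, label='Regression')
    axi.legend(fontsize=6)

fig.tight_layout()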

![](fig/wines.png)
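
Each dashed line in the figure is an ordinary least-squares fit of one property against the quality score. As a minimal sketch of what such a fit recovers, the example below uses `np.polyfit` on made-up numbers (the data, the 0.5 slope, and the names `prop` and `scaled_slope` are assumptions for illustration, not values from the wine dataset). Dividing the slope by the property's standard deviation is one possible way to compare properties measured on very different scales, such as density and total sulfur dioxide.

```python
import numpy as np

# Made-up data: a property that rises by 0.5 units per quality point, plus noise.
rng = np.random.default_rng(1)
quality = rng.integers(3, 10, size=1000).astype(float)
prop = 8.0 + 0.5 * quality + rng.normal(0, 0.3, size=1000)

# Degree-1 least-squares fit: prop ~ slope * quality + intercept.
slope, intercept = np.polyfit(quality, prop, deg=1)

# Slope expressed in standard deviations of the property per quality point,
# so that properties with different units can be compared.
scaled_slope = slope / prop.std()

print(f"slope={slope:.3f}, intercept={intercept:.3f}, scaled={scaled_slope:.3f}")
```

A positive scaled slope means the property tends to be higher in better-rated wines, a negative one that it tends to be lower.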

# Python #

But the undisputed star of Jupyter is Python, which is gradually becoming the de facto language for data analysis, slowly displacing R, SAS, Matlab...

What matters is not the languages themselves, but the enormous ecosystem of tools that has grown up thanks to Python's openness and ease of use.

# Putting NFQ on a map #

How hard can it be to draw an interactive map of Madrid and place the geolocation of the NFQ offices on it?

In [12]:
import folium
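# The original used tiles="Stamen toner"; recent folium releases no longer
# bundle Stamen tiles, so a built-in light basemap is used here as a substitute.
madrid = folium.Map(location=[40.429857, -3.685812], tiles="CartoDB positron",
                    zoom_start=15)
nfqsolutions = folium.Marker([40.429857, -3.685812], popup='NFQ Solutions')
madrid.add_child(nfqsolutions)  # add_children() is deprecated in favor of add_child()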
madrid.save('madrid.html')
madrid