<h1 align="center">Red Wine Quality Analysis</h1>

<img src="https://impossible.works/thumbs/impossible-news/international-news/ti-tha-ginei-me-ta-makedonika-emporika-simata-meta-ti-simfwnia/simata-1024x576.jpg" width="500">

### Thank you for opening my notebook.

This notebook predicts the quality of red wine using several classifiers from scikit-learn, one of the most popular and useful machine learning libraries, and from LightGBM, a gradient boosting library that has performed well lately. Before that, the whole dataset is explored through plots of the data, its parameters and the dependencies between them.

If you find this notebook entertaining, interesting or useful, please leave a comment about what could be improved in this and in future notebooks. <b>While you are there, be sure to share your thoughts about this notebook!</b>

Contents:
- [Part One: Working with data](#p1)
  - [Import Python libraries](#1)
  - [Loading the dataset](#2)
  - [Quick view of raw data](#3)
  - [Create a binary quality column](#4)
  - [Visualization of dataset](#5)
    - [The distribution of unique quality values](#6)
    - [Correlation heatmap](#7)
  - [Dependence of quality on different parameters](#8)

- [Part Two: The use of machine learning for classification](#p2)
  - [Import libraries: sklearn and lightgbm](#10)
    - [A set of classifiers](#11)
  - [Short description of each algorithm](#12)
    - [1. Logistic Regression](#13)
    - [2. k-Nearest Neighbors (kNN)](#14)
    - [3. Support Vector Machine (SVM)](#15)
    - [4. Multilayer Perceptron classifier (MLPClassifier)](#16)
    - [5. ExtraTreesClassifier and RandomForestClassifier](#17)
    - [6. Fisher's linear discriminant or LinearDiscriminantAnalysis (LDA)](#18)
    - [7. Gradient Boosting (in our case LGBMClassifier)](#19)
  - [Life-simplifying functions](#20)
  - [Training and testing standard models](#21)
  - [Visualization of the first results](#22)
  - [Configure the models](#23)
    - [SVC: parameters and tuning](#24)
    - [ExtraTreesClassifier and RandomForestClassifier: parameters and tuning](#25)
    - [LGBMClassifier: parameters and tuning](#26)
  - [Visualize the results of tuned models](#27)
  
English is not the author's native language, so the author uses a translator.
The author's native language is Russian.

# Working with data <a id="p1"></a>
## Import Python libraries <a id="1"></a>

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import matplotlib.gridspec as gridspec
%matplotlib inline

import warnings
warnings.simplefilter("ignore")

## Loading the dataset <a id="2"></a>

In [None]:
# read from a regular csv file
data = pd.read_csv("../input/winequality-red.csv")

## Quick view of raw data <a id="3"></a>

In [None]:
# Column reference.
# Input variables (based on physicochemical tests):
#  1 - fixed acidity
#  2 - volatile acidity
#  3 - citric acid
#  4 - residual sugar
#  5 - chlorides
#  6 - free sulfur dioxide
#  7 - total sulfur dioxide
#  8 - density
#  9 - pH (acidity of the medium)
# 10 - sulphates
# 11 - alcohol
# Output variable:
# 12 - quality (score between 0 and 10)
data.head(10)

In [None]:
data.tail()

In [None]:
data.info()

In [None]:
data.describe()

In [None]:
print("Number of unique values in each column:\n")
for i in data.columns:
    print(i, len(data[i].unique()))

## Create a binary quality column <a id="4"></a>
bad: quality < 6.5<br>
good: quality > 6.5

In [None]:
data['bin_quality'] = pd.cut(data['quality'], bins=[0, 6.5, 10], labels=["bad", "good"])

In [None]:
data.head(10)

## Visualization of dataset <a id="5"></a>
### The distribution of unique quality values <a id="6"></a>

In [None]:
fig, ax = plt.subplots(nrows=2, ncols=2, figsize=(14, 10))

# Share of each class in percent; sort_index keeps labels and values aligned
quality_percentage = data["quality"].value_counts(normalize=True).sort_index() * 100
bin_quality_percentage = data["bin_quality"].value_counts(normalize=True).sort_index() * 100

sns.countplot(x="quality", data=data, ax=ax[0, 0])
sns.countplot(x="bin_quality", data=data, ax=ax[0, 1])

sns.barplot(x=list(quality_percentage.index), y=quality_percentage.values, ax=ax[1, 0])
ax[1, 0].set_xlabel("quality")

sns.barplot(x=list(bin_quality_percentage.index), y=bin_quality_percentage.values, ax=ax[1, 1])
ax[1, 1].set_xlabel("bin_quality")

for i in range(2):
    ax[1, i].set_ylabel("The percentage of the total number")
    ax[1, i].set_yticks(range(0, 101, 10))
    ax[1, i].set_yticklabels([str(tick) + "%" for tick in range(0, 101, 10)])
    for j in range(2):
        ax[i, j].yaxis.grid()
        ax[i, j].set_axisbelow(True)

### Correlation heatmap <a id="7"></a>

A positive correlation means that as the value of the first parameter increases, the second one tends to increase as well. For example, with a higher content of acid in the red wine, the wine has a higher density, since the acids themselves are denser than water.

A negative correlation means that as the value of the first parameter increases, the value of the second parameter tends to decrease. For example, with a higher acid content in red wine, the pH value will be lower, indicating an acidic environment.

The closer the correlation value is to zero, the weaker the linear relationship between the two parameters.
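
As a rough illustration of the three cases above, the Pearson coefficient can be computed by hand. This is only a sketch in plain Python: `pearson` is a hypothetical helper and the numbers are made up, not taken from the wine dataset. The notebook itself relies on `data.corr()`.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Made-up illustrative values, not taken from the dataset
acid = [1.0, 2.0, 3.0, 4.0, 5.0]
density_like = [2.1, 4.0, 6.2, 7.9, 10.1]   # rises together with acid
ph_like = [3.9, 3.6, 3.4, 3.1, 2.8]         # falls as acid rises
unrelated = [1.0, -1.0, 0.0, -1.0, 1.0]     # no linear trend

r_pos = pearson(acid, density_like)    # close to +1
r_neg = pearson(acid, ph_like)         # close to -1
r_zero = pearson(acid, unrelated)      # zero: no linear relationship
```

A value near zero only rules out a linear relationship; two parameters can still depend on each other in a non-linear way.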

In [None]:
plt.figure(figsize=[9, 9])
# Correlate the numeric columns only; bin_quality is categorical
corr = data.select_dtypes(include="number").corr()
sns.heatmap(corr, square=True, cmap="Spectral_r", center=0);

## Dependence of quality on different parameters <a id="8"></a>

In [None]:
#  The function takes a column index and the y-axis limits.
#  It draws a histogram of the values in that column and two bar plots
#  showing the mean of that column for each quality score and for each
#  binary quality class.

def drawing_two_barplots(column, ylims):
    column_name = data.columns[column]
    fig = plt.figure(figsize=(14, 12))
    gs = gridspec.GridSpec(2, 2)
    ax0 = fig.add_subplot(gs[0, :])
    ax1 = fig.add_subplot(gs[1, 0])
    ax2 = fig.add_subplot(gs[1, 1])

    sns.distplot(data[column_name], kde=False, ax=ax0)
    sns.barplot(x="quality", y=column_name, data=data, ax=ax1)
    sns.barplot(x="bin_quality", y=column_name, data=data, ax=ax2)
    for ax in (ax1, ax2):
        ax.set_ylim(ylims[0], ylims[1])
        ax.set_yticks(np.linspace(ylims[0], ylims[1], 11))
        ax.yaxis.grid()
        ax.set_axisbelow(True)

In [None]:
drawing_two_barplots(0, [0, 10])

In [None]:
drawing_two_barplots(1, [0, 1.2])

In [None]:
drawing_two_barplots(2, [0, 0.5])

In [None]:
drawing_two_barplots(3, [0, 3.6])

In [None]:
drawing_two_barplots(4, [0, 0.18])

In [None]:
drawing_two_barplots(5, [0, 20])

In [None]:
drawing_two_barplots(6, [0, 60])

In [None]:
drawing_two_barplots(7, [0.994, 0.999])

In [None]:
drawing_two_barplots(8, [3.1, 3.5])

In [None]:
drawing_two_barplots(9, [0, 0.9])

In [None]:
drawing_two_barplots(10, [0, 13])

# The use of machine learning for classification <a id="p2"></a>
## Import libraries: sklearn and lightgbm <a id="10"></a>

In [None]:
from sklearn.model_selection import train_test_split, cross_val_score, RandomizedSearchCV
from sklearn.preprocessing import StandardScaler, normalize

from sklearn.naive_bayes           import GaussianNB
from sklearn.linear_model          import LogisticRegression
from sklearn.neighbors             import KNeighborsClassifier
from sklearn.svm                   import SVC
from sklearn.tree                  import DecisionTreeClassifier
from sklearn.neural_network        import MLPClassifier
from sklearn.ensemble              import ExtraTreesClassifier
from sklearn.ensemble              import RandomForestClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.discriminant_analysis import QuadraticDiscriminantAnalysis

from lightgbm import LGBMClassifier

FEATURES = slice(0,-2, 1)

### A set of classifiers <a id="11"></a>

In [None]:
model_names = ['LogisticRegression',
               'KNeighborsClassifier',
               'SVC',
               'MLPClassifier',
               'ExtraTreesClassifier',
               'RandomForestClassifier',
               'LinearDiscriminantAnalysis',
               'LGBMClassifier']

classifiers = [LogisticRegression, # Logistic regression
               KNeighborsClassifier, # k-nearest neighbors
               SVC, # Support vector machine
               MLPClassifier, # Three-layer perceptron
               ExtraTreesClassifier, # Extra (randomized) trees
               RandomForestClassifier, # Random forest
               LinearDiscriminantAnalysis, # Linear discriminant analysis
               LGBMClassifier] # Gradient boosting

## Short description of each algorithm <a id="12"></a>
The descriptions below are taken from Wikipedia without additional explanation.

### 1. Logistic Regression <a id="13"></a>

<img src="https://plot.ly/~florianh/140.png">

In statistics, the logistic model (or logit model) is a widely used statistical model that, in its basic form, uses a logistic function to model a binary dependent variable; many more complex extensions exist. In regression analysis, logistic regression (or logit regression) is estimating the parameters of a logistic model; it is a form of binomial regression.

Mathematically, a binary logistic model has a dependent variable with two possible values, such as pass/fail, win/lose, alive/dead or healthy/sick; these are represented by an indicator variable, where the two values are labeled "0" and "1". In the logistic model, the log-odds (the logarithm of the odds) for the value labeled "1" is a linear combination of one or more independent variables ("predictors"); the independent variables can each be a binary variable (two classes, coded by an indicator variable) or a continuous variable (any real value). The corresponding probability of the value labeled "1" can vary between 0 (certainly the value "0") and 1 (certainly the value "1"), hence the labeling; the function that converts log-odds to probability is the logistic function, hence the name. The unit of measurement for the log-odds scale is called a logit, from logistic unit, hence the alternative names.

Analogous models with a different sigmoid function instead of the logistic function can also be used, such as the probit model; the defining characteristic of the logistic model is that increasing one of the independent variables multiplicatively scales the odds of the given outcome at a constant rate, with each independent variable having its own parameter; for a binary independent variable this generalizes the odds ratio.

Logistic regression was developed by statistician David Cox in 1958. The binary logistic regression model has extensions to more than two levels of the dependent variable: categorical outputs with more than two values are modelled by multinomial logistic regression, and if the multiple categories are ordered, by ordinal logistic regression, for example the proportional odds ordinal logistic model.

The model itself simply models probability of output in terms of input, and does not perform statistical classification (it is not a classifier), though it can be used to make a classifier, for instance by choosing a cutoff value and classifying inputs with probability greater than the cutoff as one class, below the cutoff as the other; this is a common way to make a binary classifier. The coefficients are generally not computed by a closed-form expression, unlike linear least squares.
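
The description above can be sketched in a few lines of plain Python: log-odds as a linear combination of predictors, the logistic function that turns them into a probability, and a cutoff that turns the probability into a class. The weights, bias and function names are hypothetical and chosen only for illustration; they are not fitted to the wine data.

```python
import math

def logistic(log_odds):
    """Convert log-odds (a logit) to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-log_odds))

def predict_proba(features, weights, bias):
    """The log-odds are a linear combination of the predictors."""
    log_odds = bias + sum(w * x for w, x in zip(weights, features))
    return logistic(log_odds)

def classify(features, weights, bias, cutoff=0.5):
    """Turn the probability into a class label by applying a cutoff."""
    return 1 if predict_proba(features, weights, bias) > cutoff else 0

# Hypothetical coefficients for two predictors
weights = [0.8, -1.5]
bias = 0.2

p = predict_proba([1.0, 0.0], weights, bias)
odds_before = p / (1 - p)
p_next = predict_proba([2.0, 0.0], weights, bias)   # first predictor raised by 1
odds_after = p_next / (1 - p_next)

# Raising a predictor by one unit multiplies the odds by exp(weight)
odds_ratio = odds_after / odds_before
```

Here `odds_ratio` equals `math.exp(0.8)` up to rounding, which is the "multiplicatively scales the odds at a constant rate" property mentioned in the text.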

### 2. k-Nearest Neighbors (kNN) <a id="14"></a>

<img src="https://blog.knowledgent.com/wp-content/uploads/2016/01/PartII-1024x765.jpg" width=600>

In pattern recognition, the k-nearest neighbors algorithm (k-NN) is a non-parametric method used for classification and regression. In both cases, the input consists of the k closest training examples in the feature space. The output depends on whether k-NN is used for classification or regression:

- In k-NN classification, the output is a class membership. An object is classified by a majority vote of its neighbors, with the object being assigned to the class most common among its k nearest neighbors (k is a positive integer, typically small). If k = 1, then the object is simply assigned to the class of that single nearest neighbor.

- In k-NN regression, the output is the property value for the object. This value is the average of the values of its k nearest neighbors.

k-NN is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The k-NN algorithm is among the simplest of all machine learning algorithms.

Both for classification and regression, a useful technique can be used to assign weight to the contributions of the neighbors, so that the nearer neighbors contribute more to the average than the more distant ones. For example, a common weighting scheme consists in giving each neighbor a weight of 1/d, where d is the distance to the neighbor.

The neighbors are taken from a set of objects for which the class (for k-NN classification) or the object property value (for k-NN regression) is known. This can be thought of as the training set for the algorithm, though no explicit training step is required.

A peculiarity of the k-NN algorithm is that it is sensitive to the local structure of the data.
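
A minimal sketch of k-NN classification with an optional 1/d weighting, written in plain Python on a made-up 2-D toy set. `knn_classify` and the data are assumptions for illustration only; the notebook uses scikit-learn's `KNeighborsClassifier`.

```python
import math
from collections import defaultdict

def knn_classify(train_points, train_labels, query, k=3, weighted=False):
    """Classify `query` by an (optionally 1/d-weighted) vote of its k nearest neighbors."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    votes = defaultdict(float)
    for distance, label in distances[:k]:
        # With weighting, nearer neighbors count more; epsilon avoids division by zero
        weight = 1.0 / (distance + 1e-12) if weighted else 1.0
        votes[label] += weight
    return max(votes, key=votes.get)

# Hypothetical toy data with two classes
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (3.0, 3.0), (3.2, 2.9), (2.9, 3.1)]
labels = ["bad", "bad", "bad", "good", "good", "good"]

near_bad = knn_classify(points, labels, (1.1, 1.0), k=3)
near_good = knn_classify(points, labels, (3.0, 3.0), k=3, weighted=True)
```

There is no training step: all the work happens at prediction time, which is why the method is called lazy learning.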

### 3. Support Vector Machine (SVM) <a id="15"></a>
In our case it is used as a classifier, hence the name Support Vector Classifier (SVC).

<img src="https://upload.wikimedia.org/wikipedia/commons/thumb/7/72/SVM_margin.png/440px-SVM_margin.png" width=500>

In machine learning, support vector machines (SVMs, also support vector networks) are supervised learning models with associated learning algorithms that analyze data used for classification and regression analysis. Given a set of training examples, each marked as belonging to one or the other of two categories, an SVM training algorithm builds a model that assigns new examples to one category or the other, making it a non-probabilistic binary linear classifier (although methods such as Platt scaling exist to use SVM in a probabilistic classification setting). An SVM model is a representation of the examples as points in space, mapped so that the examples of the separate categories are divided by a clear gap that is as wide as possible. New examples are then mapped into that same space and predicted to belong to a category based on which side of the gap they fall.

In addition to performing linear classification, SVMs can efficiently perform a non-linear classification using what is called the kernel trick, implicitly mapping their inputs into high-dimensional feature spaces.
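
To make the kernel trick a little more concrete, the sketch below checks, for one pair of 2-D points, that a degree-2 polynomial kernel gives the same number as a dot product in an explicitly constructed 3-D feature space. The function names and the points are hypothetical; this is an illustration of the idea, not how `SVC` is implemented.

```python
import math

def poly_kernel(x, y):
    """Homogeneous polynomial kernel of degree 2: k(x, y) = (x . y)^2."""
    dot = sum(a * b for a, b in zip(x, y))
    return dot ** 2

def explicit_map(x):
    """Explicit feature map for 2-D inputs whose dot product equals poly_kernel."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)

x = (1.0, 2.0)
y = (3.0, -1.0)

# Computed in the original 2-D space
kernel_value = poly_kernel(x, y)
# Computed after mapping both points to 3-D
mapped_value = sum(a * b for a, b in zip(explicit_map(x), explicit_map(y)))
```

Both values agree, so the classifier can work in the higher-dimensional space without ever computing the mapped coordinates.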

When data is unlabelled, supervised learning is not possible, and an unsupervised learning approach is required, which attempts to find natural clustering of the data to groups, and then map new data to these formed groups. The support vector clustering algorithm, created by Hava Siegelmann and Vladimir Vapnik, applies the statistics of support vectors, developed in the support vector machines algorithm, to categorize unlabeled data, and is one of the most widely used clustering algorithms in industrial applications.

### 4. Multilayer Perceptron classifier (MLPClassifier) <a id="16"></a>

<img src="https://i.imgur.com/3KqBd3C.png" width=600>

A multilayer perceptron (MLP) is a class of feedforward artificial neural network. An MLP consists of, at least, three layers of nodes: an input layer, a hidden layer and an output layer. Except for the input nodes, each node is a neuron that uses a nonlinear activation function. MLP utilizes a supervised learning technique called backpropagation for training. Its multiple layers and non-linear activation distinguish MLP from a linear perceptron. It can distinguish data that is not linearly separable.

Multilayer perceptrons are sometimes colloquially referred to as "vanilla" neural networks, especially when they have a single hidden layer.
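
As a small illustration of "data that is not linearly separable", the sketch below hard-codes a network with one hidden layer that reproduces XOR. The weights are picked by hand rather than learned with backpropagation, the output node is kept linear for brevity, and `tiny_mlp` is a hypothetical name.

```python
def relu(value):
    """Nonlinear activation used by the hidden neurons."""
    return max(0.0, value)

def tiny_mlp(x1, x2):
    """Input layer (2 nodes) -> hidden layer (2 ReLU neurons) -> output layer (1 node)."""
    h1 = relu(x1 + x2)          # hidden neuron 1
    h2 = relu(x1 + x2 - 1.0)    # hidden neuron 2
    return h1 - 2.0 * h2        # output node combines the hidden activations

# Evaluate the network on all four XOR inputs
xor_outputs = {(a, b): tiny_mlp(a, b) for a in (0, 1) for b in (0, 1)}
```

No single straight line separates the inputs `(0, 1)` and `(1, 0)` from `(0, 0)` and `(1, 1)`, yet the hidden layer makes the problem solvable.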

### 5. ExtraTreesClassifier and RandomForestClassifier <a id="17"></a>
Without going into the details of the algorithms, the two work on a similar principle, so they are described together.

<img src="https://adpearance.com/images/blog/Spam_Trees.png" width=600>

Random forests or random decision forests are an ensemble learning method for classification, regression and other tasks, that operate by constructing a multitude of decision trees at training time and outputting the class that is the mode of the classes (classification) or mean prediction (regression) of the individual trees. Random decision forests correct for decision trees' habit of overfitting to their training set.

The first algorithm for random decision forests was created by Tin Kam Ho using the random subspace method, which, in Ho's formulation, is a way to implement the "stochastic discrimination" approach to classification proposed by Eugene Kleinberg.

An extension of the algorithm was developed by Leo Breiman and Adele Cutler, and "Random Forests" is their trademark. The extension combines Breiman's "bagging" idea and random selection of features, introduced first by Ho and later independently by Amit and Geman in order to construct a collection of decision trees with controlled variance. 
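
Two ingredients from the description can be sketched in plain Python: drawing a bootstrap sample (the "bagging" step) and combining the predictions of individual trees by mode or mean. The helper names and the votes are hypothetical; fitting the trees themselves is left out.

```python
import random
from collections import Counter
from statistics import mean

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (the 'bagging' step)."""
    return [rng.choice(rows) for _ in rows]

def forest_classify(tree_votes):
    """Classification: the mode of the individual trees' predicted classes."""
    return Counter(tree_votes).most_common(1)[0][0]

def forest_regress(tree_predictions):
    """Regression: the mean of the individual trees' predictions."""
    return mean(tree_predictions)

rng = random.Random(0)
sample = bootstrap_sample(list(range(10)), rng)   # some rows repeat, some are left out

# Hypothetical predictions of five trees for one wine
label = forest_classify(["good", "bad", "bad", "bad", "good"])
value = forest_regress([5.0, 6.0, 7.0])
```

Because each tree sees a different bootstrap sample and a random subset of features, their individual errors partly cancel out in the vote.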

### 6. Fisher's linear discriminant or LinearDiscriminantAnalysis (LDA) <a id="18"></a>

<img src="http://coxdocs.org/lib/exe/fetch.php?media=perseus:user:activities:matrixprocessing:learning:lda2.png">

Linear discriminant analysis (LDA), normal discriminant analysis (NDA), or discriminant function analysis is a generalization of Fisher's linear discriminant, a method used in statistics, pattern recognition and machine learning to find a linear combination of features that characterizes or separates two or more classes of objects or events. The resulting combination may be used as a linear classifier, or, more commonly, for dimensionality reduction before later classification.

LDA is closely related to analysis of variance (ANOVA) and regression analysis, which also attempt to express one dependent variable as a linear combination of other features or measurements. However, ANOVA uses categorical independent variables and a continuous dependent variable, whereas discriminant analysis has continuous independent variables and a categorical dependent variable (i.e. the class label). Logistic regression and probit regression are more similar to LDA than ANOVA is, as they also explain a categorical variable by the values of continuous independent variables. These other methods are preferable in applications where it is not reasonable to assume that the independent variables are normally distributed, which is a fundamental assumption of the LDA method.

LDA is also closely related to principal component analysis (PCA) and factor analysis in that they both look for linear combinations of variables which best explain the data. LDA explicitly attempts to model the difference between the classes of data. PCA on the other hand does not take into account any difference in class, and factor analysis builds the feature combinations based on differences rather than similarities. Discriminant analysis is also different from factor analysis in that it is not an interdependence technique: a distinction between independent variables and dependent variables (also called criterion variables) must be made.

LDA works when the measurements made on independent variables for each observation are continuous quantities. When dealing with categorical independent variables, the equivalent technique is discriminant correspondence analysis.

Discriminant analysis is used when groups are known a priori (unlike in cluster analysis). Each case must have a score on one or more quantitative predictor measures, and a score on a group measure. In simple terms, discriminant function analysis is classification - the act of distributing things into groups, classes or categories of the same type. 
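
For two classes, Fisher's linear discriminant has a compact closed form: the projection direction is the inverse of the within-class scatter matrix applied to the difference of the class means. The sketch below computes it in plain Python for made-up 2-D points; all names and values are hypothetical, and scikit-learn's `LinearDiscriminantAnalysis` handles the general case.

```python
def mean_vector(points):
    """Mean of a list of 2-D points."""
    n = len(points)
    return [sum(p[0] for p in points) / n, sum(p[1] for p in points) / n]

def scatter_matrix(points, center):
    """2x2 scatter matrix: sum of outer products of the centered points."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for p in points:
        d = [p[0] - center[0], p[1] - center[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class_a, class_b):
    """Fisher's discriminant for two classes: w = Sw^-1 (mean_b - mean_a)."""
    mean_a, mean_b = mean_vector(class_a), mean_vector(class_b)
    sa, sb = scatter_matrix(class_a, mean_a), scatter_matrix(class_b, mean_b)
    sw = [[sa[i][j] + sb[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [mean_b[0] - mean_a[0], mean_b[1] - mean_a[1]]
    return [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]

def project(point, w):
    """Project a 2-D point onto the direction w."""
    return point[0] * w[0] + point[1] * w[1]

# Made-up points for two classes
class_a = [(1.0, 2.0), (2.0, 3.0), (3.0, 3.0), (2.0, 1.0)]
class_b = [(6.0, 5.0), (7.0, 8.0), (8.0, 7.0), (7.0, 6.0)]

w = fisher_direction(class_a, class_b)
proj_a = [project(p, w) for p in class_a]
proj_b = [project(p, w) for p in class_b]
```

After projection onto `w` the two classes occupy separate intervals on a single axis, which is the sense in which LDA also serves as dimensionality reduction.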

### 7. Gradient Boosting (in our case LGBMClassifier) <a id="19"></a>

<img src="https://pbs.twimg.com/media/DSTrDtVUIAA3CZb.jpg">

Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.

The idea of gradient boosting originated in the observation by Leo Breiman that boosting can be interpreted as an optimization algorithm on a suitable cost function. Explicit regression gradient boosting algorithms were subsequently developed by Jerome H. Friedman, simultaneously with the more general functional gradient boosting perspective of Llew Mason, Jonathan Baxter, Peter Bartlett and Marcus Frean. The latter two papers introduced the view of boosting algorithms as iterative functional gradient descent algorithms. That is, algorithms that optimize a cost function over function space by iteratively choosing a function (weak hypothesis) that points in the negative gradient direction. This functional gradient view of boosting has led to the development of boosting algorithms in many areas of machine learning and statistics beyond regression and classification. 
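
The stage-wise idea can be sketched for the simplest case, squared loss on 1-D data, where the negative gradient is just the residual. Each stage fits a one-split stump (a weak model) to the current residuals and adds a damped copy of it to the ensemble. This is a toy illustration with hypothetical names and made-up data, not LightGBM's algorithm.

```python
def fit_stump(xs, residuals):
    """Best single-split regressor (a weak model) on 1-D inputs under squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def gradient_boost(xs, ys, n_stages=20, learning_rate=0.5):
    """Stage-wise boosting for squared loss: each stump fits the current residuals."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(xs)
    for _ in range(n_stages):
        # For squared loss the negative gradient is the residual y - F(x)
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x) for p, x in zip(predictions, xs)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)

def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up 1-D regression data
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.1, 6.1]

mse_one_stage = mse(gradient_boost(xs, ys, n_stages=1), xs, ys)
mse_twenty_stages = mse(gradient_boost(xs, ys, n_stages=20), xs, ys)
```

Adding stages lowers the training error here; on real data the number of stages and the learning rate have to be tuned to avoid overfitting.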

## Life-simplifying functions <a id="20"></a>

In [None]:
#  This function takes an instance of the model, data and labels as input, 
#  and there is an optional parameter that indicates the number of splits (rounds) to validate. 
#  The function returns the average value of cross-validation, as well as the standard deviation.

def cross_val_mean_std(clsf, data, labels, cv=5):
    cross_val = cross_val_score(clsf, data, labels, cv=cv)
    cross_val_mean = cross_val.mean() * 100
    cross_val_std = cross_val.std() * 100
    return round(cross_val_mean, 3), round(cross_val_std, 3)
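
For readers unfamiliar with cross-validation, here is a simplified sketch of what happens behind `cross_val_score`: the sample is cut into `cv` folds, each fold serves once as the validation part, and the per-fold scores are summarized by their mean and standard deviation. The splitter below is a plain, unshuffled k-fold written for illustration (scikit-learn uses a stratified variant for classifiers), and the fold scores are made up.

```python
from statistics import mean, pstdev

def k_fold_indices(n_samples, n_splits=5):
    """Yield (train_idx, valid_idx) pairs; fold sizes differ by at most one."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
                  for i in range(n_splits)]
    start = 0
    for size in fold_sizes:
        valid = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        yield train, valid
        start += size

folds = list(k_fold_indices(12, n_splits=5))

# Hypothetical per-fold accuracies, summarized the same way as cross_val_mean_std
fold_scores = [0.60, 0.62, 0.58, 0.61, 0.59]
summary = (round(mean(fold_scores) * 100, 3), round(pstdev(fold_scores) * 100, 3))
```

Every sample lands in exactly one validation fold, so each observation is used for both training and validation, but never in the same round.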

In [None]:
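#  This function takes a model class, training and test data,
#  and optional parameters for that model as input.
#  It returns a dictionary with the cross-validation results on the
#  training and test data, followed by the fitted model, which we can
#  use later if necessary.
#  Note: cross_val_score clones and refits the estimator on every fold, so the
#  "test" scores come from cross-validation within the test sample,
#  not from the model fitted on the training sample.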

def train_and_validate_model(model, train, train_labels, test, test_labels, parameters=None):
    
    if parameters is not None:
        model = model(**parameters)
    else:
        model = model()
        
    model.fit(train, train_labels)
    train_valid = cross_val_mean_std(model, train, train_labels)
    test_valid = cross_val_mean_std(model, test, test_labels)
        
    res_of_valid = {"train_mean": train_valid[0], "train_std": train_valid[1],
                    "test_mean":  test_valid[0],  "test_std":  test_valid[1]}
    
    return res_of_valid, model

In [None]:
#  This function takes a dictionary derived from the work of a past function 
#  that contains a cross-validation result for one or more models and 
#  creates a Pandas table (which returns), optionally adding postfix to the column names.

def create_table_with_scores(res_of_valid, postfix=""):
    # Scalars need an explicit one-row index; sequences get one row per element
    if not hasattr(res_of_valid["test_std"], "__len__"):
        index = [0]
    else:
        index = range(len(res_of_valid["test_std"]))

    table = pd.DataFrame({"Test mean score" + postfix:  res_of_valid["test_mean"],
                          "Test std score" + postfix:   res_of_valid["test_std"],
                          "Train mean score" + postfix: res_of_valid["train_mean"],
                          "Train std score" + postfix:  res_of_valid["train_std"]}, 
                          index=index)
    return table

In [None]:
#  This function takes a list of Pandas tables that are created by the function above, 
#  then it takes a list of names in text format (the length of the lists must match), 
#  there is an optional argument - the number of the column to sort, if necessary.
#  Returns one large table that consists of a list of tables that the function has accepted, 
#  as well as a new column with model names from the second argument. 
#  If the third parameter was specified, the function returns the table with sorting.

def table_of_results(model_results, model_names=None, col_sort_by=None):
    # DataFrame.append was removed in pandas 2.0; concat stacks the tables instead
    res = pd.concat(model_results)
    if model_names is not None:
        names = []
        for i, j in enumerate(model_names):
            names += [j] * len(model_results[i])
        res["Model name"] = names
    if col_sort_by is not None:
        sort_by = res.columns[col_sort_by]
        res = res.sort_values(by=sort_by, ascending=False)
    res = res.reset_index(drop=True)
    return res

In [None]:
#  This function takes in a large table from the previous function as well 
#  as column numbers to draw a scatter chart. 
#  This function is used to draw cross-validation results from training and test data
#  and compare different models or the same models with different parameters.

def graph_for_the_results_table(table, col_x, col_y, col_style):
    x = table.columns[col_x]
    y = table.columns[col_y]
    style = table.columns[col_style]
    plt.figure(figsize=[8, 8])
    min_lim = min(min(table[x]), min(table[y]))
    max_lim = max(max(table[x]), max(table[y]))
    ax = sns.scatterplot(x=x, y=y, hue=style, style=style, data=table, s=100)
    ax.set_xlim(min_lim - 0.01 * max_lim, max_lim + 0.01 * max_lim)
    ax.set_ylim(min_lim - 0.01 * max_lim, max_lim + 0.01 * max_lim)
    ax.grid()
    ax.set_axisbelow(True)

## Training and testing standard models <a id="21"></a>
Variables prefixed with "b_" belong to the binary classification task.

Split the data into training and test samples. <br>The labels are split as well, both for the multiclass and for the binary classification.<br>Standardize the features for better learning outcomes.

In [None]:
train, test, train_labels, test_labels = train_test_split(data[data.columns[FEATURES]], 
                                                          data[data.columns[-2:]], 
                                                          test_size=0.25, random_state=3)

b_train_labels = np.array(train_labels)[:, 1]
b_test_labels = np.array(test_labels)[:, 1]

train_labels = np.array(train_labels)[:, 0].astype(int)
test_labels = np.array(test_labels)[:, 0].astype(int)

sc = StandardScaler()
train = sc.fit_transform(train)
# Reuse the statistics learned on the training sample; do not refit on test
test = sc.transform(test)
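
A common convention is to estimate the scaling statistics on the training sample only and then reuse them for the test sample, so that no information from the test data leaks into preprocessing. The sketch below shows the idea for a single column in plain Python; the helper names and the values are made up.

```python
from statistics import mean, pstdev

def fit_scaler(column):
    """Learn the mean and population standard deviation from training values."""
    return mean(column), pstdev(column)

def transform(column, mu, sigma):
    """Apply previously learned statistics: z = (x - mu) / sigma."""
    return [(x - mu) / sigma for x in column]

# Made-up values for one feature
train_alcohol = [9.0, 10.0, 11.0, 12.0]
test_alcohol = [10.5, 13.0]

mu, sigma = fit_scaler(train_alcohol)                 # statistics from train only
train_scaled = transform(train_alcohol, mu, sigma)
test_scaled = transform(test_alcohol, mu, sigma)      # the same statistics reused
```

After this step the training column has zero mean and unit variance, while the test column is expressed in the same units and therefore need not be centered at zero.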

Training and testing of models with standard parameters.

In [None]:
classifiers_scores = []
b_classifiers_scores = []

classifiers_importance = []

for i, clsf in enumerate(classifiers):
    t = [0, 0]
    
    res_of_valid, t[0] = train_and_validate_model(clsf, train, train_labels, test, test_labels)
    b_res_of_valid, t[1] = train_and_validate_model(clsf, train, b_train_labels, test, b_test_labels)
    
    classifiers_importance.append(t)
    
    classifiers_scores.append(create_table_with_scores(res_of_valid, " ('quality')"))
    b_classifiers_scores.append(create_table_with_scores(b_res_of_valid, " ('bin_quality')"))
    
classifiers_scores = table_of_results(classifiers_scores, model_names, 0)
b_classifiers_scores = table_of_results(b_classifiers_scores, model_names, 0)

## Visualization of the first results <a id="22"></a>
#### For classification

In [None]:
classifiers_scores

In [None]:
graph_for_the_results_table(classifiers_scores, 0, 2, 4)

In [None]:
graph_for_the_results_table(classifiers_scores, 1, 3, 4)

#### For binary classification

In [None]:
b_classifiers_scores

In [None]:
graph_for_the_results_table(b_classifiers_scores, 0, 2, 4)

In [None]:
graph_for_the_results_table(b_classifiers_scores, 1, 3, 4)

#### Importance of parameters for each type of classification

In [None]:
importances = []
b_importances = []

for clsf, b_clsf in classifiers_importance:
    if hasattr(clsf, "feature_importances_"):
        importances.append(clsf.feature_importances_)
        b_importances.append(b_clsf.feature_importances_)

feature_names = list(data.columns[FEATURES])

# Note: LightGBM reports split counts by default, while the sklearn ensembles
# report importances that sum to 1, so the plain mean is dominated by LightGBM.
fig, (ax1, ax2) = plt.subplots(ncols=2, figsize=(16, 6))
fig.suptitle("The average importance of features")
sns.barplot(x=feature_names, y=np.mean(importances, axis=0), ax=ax1)
sns.barplot(x=feature_names, y=np.mean(b_importances, axis=0), ax=ax2)
ax1.set_title("quality")
ax2.set_title("bin_quality")
ax1.get_yaxis().set_visible(False)
ax2.get_yaxis().set_visible(False)
ax1.tick_params(axis="x", labelrotation=90)
ax2.tick_params(axis="x", labelrotation=90);

## Configure the models <a id="23"></a>
We will tune four models: SVC, ExtraTreesClassifier, RandomForestClassifier and LGBMClassifier, because by my estimates they give above-average results.

In [None]:
#  This function takes model and parameters to find optimal, training and test data, 
#  postfix for column names, number of iterations and partitions to cross-validate 
#  for RandomizedSearchCV.
#  Returns only the table with the results.

def tuning_models(model, params, train, train_labels, 
                                 test, test_labels, postfix="", iterations=50, cv=5):
    
    random_search = RandomizedSearchCV(model(), params, n_iter=iterations,
                                       scoring='accuracy', cv=cv)
    random_search.fit(train, train_labels)

    # Collect the parameter sets whose mean CV score is above average,
    # best first, skipping duplicates
    mean_test_scores = random_search.cv_results_['mean_test_score']
    candidates = random_search.cv_results_["params"]
    params_set_updated = []
    for idx in np.argsort(mean_test_scores)[::-1]:
        if (mean_test_scores[idx] > np.mean(mean_test_scores)
                and candidates[idx] not in params_set_updated):
            params_set_updated.append(candidates[idx])
    
    results = []
    for i in params_set_updated:
        res_of_valid, _ = train_and_validate_model(model, train, train_labels, test, test_labels, parameters=i)
        results.append(create_table_with_scores(res_of_valid, postfix))
    
    results_table = table_of_results(results)
    return results_table
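
The two ideas inside `tuning_models`, drawing random parameter combinations and keeping the ones that score above average, can be sketched without scikit-learn. The helper names, the grid and the scores below are hypothetical and only illustrate the selection logic.

```python
import random

def sample_parameters(param_grid, n_iter, seed=0):
    """Draw n_iter random combinations, one value per parameter."""
    rng = random.Random(seed)
    return [{name: rng.choice(list(values)) for name, values in param_grid.items()}
            for _ in range(n_iter)]

def above_average(candidates, scores):
    """Keep unique candidates scoring above the mean, best first."""
    threshold = sum(scores) / len(scores)
    ranked = sorted(zip(scores, range(len(scores))), reverse=True)
    kept = []
    for score, idx in ranked:
        if score > threshold and candidates[idx] not in kept:
            kept.append(candidates[idx])
    return kept

# Hypothetical grid and a random draw from it
grid = {"C": [0.1, 1.0, 10.0], "kernel": ["rbf", "linear"]}
samples = sample_parameters(grid, n_iter=5)

# Hypothetical candidates with made-up mean CV scores
candidates = [{"C": 1}, {"C": 2}, {"C": 1}, {"C": 3}]
scores = [0.65, 0.7, 0.65, 0.4]
kept = above_average(candidates, scores)
```

Random sampling covers a large grid with a fixed budget of fits, at the cost of possibly missing the single best combination.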

### SVC: parameters and tuning <a id="24"></a>

In [None]:
params = {"kernel": ["rbf", "poly", "linear", "sigmoid"],
          "C": np.arange(0.1, 1.5, 0.1), 
          "gamma": list(np.arange(0.1, 1.5, 0.1)) + ["auto"],
          "probability": [True, False],
          "shrinking": [True, False]}

In [None]:
svc_res = tuning_models(SVC, params, train, train_labels, 
                        test, test_labels, " ('quality')", 100)

b_svc_res = tuning_models(SVC, params, train, b_train_labels, 
                          test, b_test_labels, " ('bin_quality')", 100)

### ExtraTreesClassifier and RandomForestClassifier: parameters and tuning <a id="25"></a>

In [None]:
# "auto" meant "sqrt" for these classifiers and was removed in newer scikit-learn
params = {"n_estimators": np.arange(1, 500, 2),
          "max_depth": list(np.arange(2, 100, 2)) + [None],
          "min_samples_leaf": np.arange(1, 20, 1),
          "min_samples_split": np.arange(2, 20, 2),
          "max_features": ["sqrt", "log2", None]}

In [None]:
extra_res = tuning_models(ExtraTreesClassifier, params, train, train_labels, 
                          test, test_labels, " ('quality')", 100)

b_extra_res = tuning_models(ExtraTreesClassifier, params, train, b_train_labels, 
                            test, b_test_labels, " ('bin_quality')", 100)

forest_res = tuning_models(RandomForestClassifier, params, train, train_labels, 
                           test, test_labels, " ('quality')", 100)

b_forest_res = tuning_models(RandomForestClassifier, params, train, b_train_labels, 
                             test, b_test_labels, " ('bin_quality')", 100)

### LGBMClassifier: parameters and tuning <a id="26"></a>

In [None]:
params = {"boosting_type": ["gbdt"],
          "num_leaves": np.arange(2, 100, 2),
          "max_depth": list(np.arange(2, 100, 2)) + [-1],
          "learning_rate": [0.001, 0.003, 0.006, 0.01, 0.02, 0.04, 0.06, 0.08, 0.1, 0.12, 0.14, 0.17, 0.2, 0.3, 0.4],
          "n_estimators": np.arange(2, 300, 5),
          "reg_alpha": np.arange(0, 1, 0.1),
          "reg_lambda": np.arange(0, 1, 0.1)}

In [None]:
lgb_res = tuning_models(LGBMClassifier, params, train, train_labels, 
                        test, test_labels, " ('quality')", 100)

b_lgb_res = tuning_models(LGBMClassifier, params, train, b_train_labels, 
                          test, b_test_labels, " ('bin_quality')", 100);

## Visualize the results of tuned models <a id="27"></a>
#### For classification

In [None]:
all_results = table_of_results([svc_res, extra_res, forest_res, lgb_res], 
                               ["SVC", "ExtraTrees", "RandomForest", "LightGBM"], 0)
all_results.head(10)

In [None]:
graph_for_the_results_table(all_results, 0, 2, 4)

In [None]:
graph_for_the_results_table(all_results, 1, 3, 4)

#### For binary classification

In [None]:
b_all_results = table_of_results([b_svc_res, b_extra_res, b_forest_res, b_lgb_res], 
                                 ["SVC", "ExtraTrees", "RandomForest", "LightGBM"], 0)
b_all_results.head(10)

In [None]:
graph_for_the_results_table(b_all_results, 0, 2, 4)

In [None]:
graph_for_the_results_table(b_all_results, 1, 3, 4)