### Naive Bayes with Multiple Labels

• So far, you have learned Naive Bayes classification with binary labels.

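• Now you will learn about multi-class classification with Naive Bayes, where the label can take more than two values.

• Note that the name "multinomial Naive Bayes" usually refers to the variant that models discrete count features (such as word counts), not to multi-class classification in general. Naive Bayes handles multiple classes in any of its variants, including the Gaussian one used below.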

• For example, you might want to classify a news article as technology, entertainment, politics, or sports.
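Whatever the number of classes, the prediction rule is the same: for each class, combine the prior with the likelihood of each observed feature and pick the class with the highest score. A minimal sketch for the news example, using made-up (hypothetical) priors and word likelihoods and working in log space to avoid numerical underflow:

```python
import math

# Hypothetical priors P(topic) for four news topics (illustrative numbers only)
priors = {"technology": 0.3, "entertainment": 0.2, "politics": 0.25, "sports": 0.25}

# Hypothetical likelihoods P(word | topic) for the words seen in an article
likelihoods = {
    "technology":    {"match": 0.01, "score": 0.02},
    "entertainment": {"match": 0.02, "score": 0.03},
    "politics":      {"match": 0.01, "score": 0.01},
    "sports":        {"match": 0.08, "score": 0.06},
}

def predict(words):
    """Return the topic with the largest log prior + sum of log likelihoods."""
    scores = {}
    for topic, prior in priors.items():
        score = math.log(prior)
        for w in words:
            score += math.log(likelihoods[topic][w])
        scores[topic] = score
    # argmax over all classes, however many there are
    return max(scores, key=scores.get)

print(predict(["match", "score"]))  # sports
```

Nothing in `predict` depends on there being two classes; adding a fifth topic only means adding one more entry to each dictionary.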

• For the model-building part, you can use the wine dataset, a well-known multi-class classification problem: "This dataset is the result of a chemical analysis of wines grown in the same region in Italy but derived from three different cultivars."

• The dataset comprises 13 features (alcohol, malic_acid, ash, alcalinity_of_ash, magnesium, total_phenols, flavanoids, nonflavanoid_phenols, proanthocyanins, color_intensity, hue, od280/od315_of_diluted_wines, proline) and a label giving the wine cultivar.

• The label has three wine classes: class_0, class_1, and class_2. Here you can build a model to classify the type of wine.

• The dataset is available in the scikit-learn library.

#### Loading Data

• Let's first load the required wine dataset from scikit-learn datasets.

```python
# Import scikit-learn dataset library
from sklearn.datasets import load_wine

# Load the dataset
wine = load_wine()
```

#### Exploring Data
• You can print the target and feature names to make sure you have the right dataset:

```python
# Print the names of the features
print("Features of wine dataset:", wine.feature_names)

# Print the label names (wine classes)
print("\nLabel type of wine:", wine.target_names)
```

```text
Features of wine dataset: ['alcohol', 'malic_acid', 'ash', 'alcalinity_of_ash', 'magnesium', 'total_phenols', 'flavanoids', 'nonflavanoid_phenols', 'proanthocyanins', 'color_intensity', 'hue', 'od280/od315_of_diluted_wines', 'proline']

Label type of wine: ['class_0' 'class_1' 'class_2']
```


• It's a good idea to always explore your data a bit, so you know what you're working with.

• Below, the shape of the feature matrix, its first five rows, and the target labels for the whole dataset are displayed.

```python
# Shape of the feature matrix: (samples, features)
wine.data.shape
```

```text
(178, 13)
```

```python
# First five rows of the feature matrix
wine.data[0:5]
```

```text
array([[1.423e+01, 1.710e+00, 2.430e+00, 1.560e+01, 1.270e+02, 2.800e+00,
        3.060e+00, 2.800e-01, 2.290e+00, 5.640e+00, 1.040e+00, 3.920e+00,
        1.065e+03],
       [1.320e+01, 1.780e+00, 2.140e+00, 1.120e+01, 1.000e+02, 2.650e+00,
        2.760e+00, 2.600e-01, 1.280e+00, 4.380e+00, 1.050e+00, 3.400e+00,
        1.050e+03],
       [1.316e+01, 2.360e+00, 2.670e+00, 1.860e+01, 1.010e+02, 2.800e+00,
        3.240e+00, 3.000e-01, 2.810e+00, 5.680e+00, 1.030e+00, 3.170e+00,
        1.185e+03],
       [1.437e+01, 1.950e+00, 2.500e+00, 1.680e+01, 1.130e+02, 3.850e+00,
        3.490e+00, 2.400e-01, 2.180e+00, 7.800e+00, 8.600e-01, 3.450e+00,
        1.480e+03],
       [1.324e+01, 2.590e+00, 2.870e+00, 2.100e+01, 1.180e+02, 2.800e+00,
        2.690e+00, 3.900e-01, 1.820e+00, 4.320e+00, 1.040e+00, 2.930e+00,
        7.350e+02]])
```

```python
# Display the wine labels (0: class_0, 1: class_1, 2: class_2)
wine.target
```

```text
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2])
```
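Reading class balance off a long label array is error-prone. A short sketch that counts the samples per class with NumPy's `bincount`, which works here because the labels are the integers 0, 1, and 2:

```python
import numpy as np
from sklearn.datasets import load_wine

wine = load_wine()

# Count how many samples carry each integer label (0, 1, 2)
counts = np.bincount(wine.target)

# Pair each count with its class name
for name, n in zip(wine.target_names, counts):
    print(name, n)
```

The classes are moderately imbalanced (59, 71, and 48 samples), which is worth keeping in mind when splitting the data.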

#### Splitting Data

• First, separate the data into independent variables (features) and the dependent variable (label). The wine object already provides them as `wine.data` and `wine.target`.

• Then split them into training and test sets.

```python
# Import train_test_split function
from sklearn.model_selection import train_test_split

# Split dataset into training set and test set
# 70% training set and 30% test set
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=109
)
```
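With `test_size=0.3`, scikit-learn rounds the test-set size up, so the 178 samples split into 124 training and 54 test rows. The sketch below also passes `stratify=wine.target`, an optional argument the code above does not use, which keeps the class proportions roughly equal in both parts:

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split

wine = load_wine()

# Same split as above, plus stratify= to preserve class proportions (optional change)
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=109, stratify=wine.target
)

print("Train shape:", X_train.shape)  # (124, 13)
print("Test shape:", X_test.shape)    # (54, 13)

# Per-class counts in each part; stratification keeps their ratios close to 59:71:48
print("Train class counts:", np.bincount(y_train))
print("Test class counts:", np.bincount(y_test))
```

Stratifying changes which rows land in the test set, so an accuracy computed after this split need not match the 0.91 reported below.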

#### Model Generation
• After splitting, fit a Gaussian Naive Bayes model on the training set and predict labels for the test-set features. The Gaussian variant suits this data because all 13 wine features are continuous measurements.

```python
# Import Gaussian Naive Bayes model
from sklearn.naive_bayes import GaussianNB

# Create a Gaussian classifier
gnb = GaussianNB()

# Train the model using the training set
gnb.fit(X_train, y_train)

# Predict the labels of the test set
y_predict = gnb.predict(X_test)
```
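`GaussianNB` models each feature, within each class, as a normal distribution. The from-scratch sketch below re-implements that idea with NumPy: it estimates per-class priors, means, and variances, then picks the class with the largest joint log-likelihood. The variance floor is an assumption chosen to mimic `GaussianNB`'s default `var_smoothing=1e-9` (scaled by the largest feature variance); treat agreement with scikit-learn as something to check rather than a guarantee.

```python
import numpy as np
from sklearn.datasets import load_wine
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=109
)

classes = np.unique(y_train)

# Per-class priors, means and variances estimated from the training set
priors = np.array([np.mean(y_train == c) for c in classes])
means = np.array([X_train[y_train == c].mean(axis=0) for c in classes])
variances = np.array([X_train[y_train == c].var(axis=0) for c in classes])

# Small variance floor (assumed to mirror GaussianNB's default var_smoothing=1e-9)
variances += 1e-9 * X_train.var(axis=0).max()

def predict(X):
    # Joint log-likelihood: log P(c) + sum over features of log N(x_i; mean_ci, var_ci)
    log_prior = np.log(priors)                                        # (n_classes,)
    log_norm = -0.5 * np.sum(np.log(2 * np.pi * variances), axis=1)   # (n_classes,)
    diff = X[:, None, :] - means[None, :, :]          # (n_samples, n_classes, n_features)
    log_exp = -0.5 * np.sum(diff ** 2 / variances[None, :, :], axis=2)
    jll = log_prior + log_norm + log_exp              # (n_samples, n_classes)
    return classes[np.argmax(jll, axis=1)]

manual_pred = predict(X_test)
sklearn_pred = GaussianNB().fit(X_train, y_train).predict(X_test)
print("Agreement with GaussianNB:", np.mean(manual_pred == sklearn_pred))
```

Working with log-probabilities turns the product over 13 features into a sum, which avoids underflow when many small densities are multiplied.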

#### Evaluating Model

• After model generation, check the accuracy by comparing the actual and predicted test labels.

```python
# Import sklearn metrics module for accuracy calculation
from sklearn.metrics import accuracy_score

# Model accuracy: how often is the classifier correct?
print("Model accuracy:", round(accuracy_score(y_test, y_predict), 2))
```

```text
Model accuracy: 0.91
```
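Accuracy alone does not show which classes get confused with each other. A confusion matrix and a per-class report, both standard scikit-learn metrics, give a finer view of a multi-class result. The sketch repeats the split and model from above so it runs on its own; the exact counts depend on that split, so none are claimed here.

```python
from sklearn.datasets import load_wine
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

wine = load_wine()
X_train, X_test, y_train, y_test = train_test_split(
    wine.data, wine.target, test_size=0.3, random_state=109
)
y_predict = GaussianNB().fit(X_train, y_train).predict(X_test)

# Rows are true classes, columns are predicted classes (fixed label order 0, 1, 2)
cm = confusion_matrix(y_test, y_predict, labels=[0, 1, 2])
print(cm)

# Per-class precision, recall and F1, labelled with the wine class names
print(classification_report(y_test, y_predict, labels=[0, 1, 2],
                            target_names=list(wine.target_names)))
```

The diagonal of the matrix holds the correct predictions, so its sum divided by the number of test samples equals the accuracy printed above.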


### Zero Probability Problem

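• Suppose that, in the training data, no tuple of the class "risky loan" has a particular feature value, such as low income. The estimated conditional probability of that value given the class is then zero, and because Naive Bayes multiplies the per-feature probabilities, the posterior probability of that class becomes zero regardless of the other features. This is known as the zero probability (or zero frequency) problem, because the observed frequency of that feature value within the class is zero.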

• The usual solution is the Laplacian correction, also called Laplace smoothing or add-one smoothing, which is one of several smoothing techniques. It adds one to the count of every feature value within each class. If the dataset is large enough, adding one to each count barely changes the estimated probabilities, but it guarantees that no probability is exactly zero.

• For example, suppose the class "risky loan" has 1000 training tuples. Among them, the income column has 0 tuples with low income, 990 with medium income, and 10 with high income. Without the Laplacian correction, the probabilities of these values are 0 (from 0/1000), 0.990 (from 990/1000), and 0.010 (from 10/1000).

• Now apply the Laplacian correction: add 1 to the count of each of the three income values, which also adds 3 to the denominator (1000 + 3 = 1003). The smoothed probabilities are:

• Low income: 1/1003 ≈ 0.001

• Medium income: 991/1003 ≈ 0.988

• High income: 11/1003 ≈ 0.011
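The arithmetic of the loan example can be checked with a short sketch that applies add-one smoothing, (count + α) / (total + α·k) with α = 1 and k = 3 income values, using exact fractions. The function name `smoothed_probs` is hypothetical, not from any library.

```python
from fractions import Fraction

# Counts of each income value among the 1000 "risky loan" training tuples (from the example)
counts = {"low": 0, "medium": 990, "high": 10}

def smoothed_probs(counts, alpha=1):
    """Laplace (add-alpha) estimate: (count + alpha) / (total + alpha * number_of_values)."""
    total = sum(counts.values())
    k = len(counts)
    return {v: Fraction(c + alpha, total + alpha * k) for v, c in counts.items()}

# Unsmoothed estimates: count / total, which gives exactly zero for "low"
raw = {v: Fraction(c, sum(counts.values())) for v, c in counts.items()}
smoothed = smoothed_probs(counts)

for v in counts:
    print(f"{v:>6}: raw={float(raw[v]):.3f}  smoothed={float(smoothed[v]):.3f}")
```

Because α·k is added to the denominator, the smoothed probabilities still sum to exactly 1; a larger α pulls the estimates further toward a uniform distribution.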

### Advantages

• It is a simple and fast method that is often reasonably accurate in practice.

• Training is cheap: it only needs per-class counts or summary statistics (such as means and variances), so the computational cost is very low.

• It scales efficiently to large datasets.

• It tends to perform better with discrete (categorical) input variables than with continuous ones, which require an assumed distribution such as the Gaussian.

• It handles multi-class prediction problems directly.

• It also performs well on text analytics problems, such as spam filtering and document classification.

• When the independence assumption holds, a Naive Bayes classifier can outperform other models such as logistic regression, and it needs less training data.
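For text, the variant usually used is multinomial Naive Bayes, which models word counts. A minimal sketch on a hypothetical four-headline corpus, using scikit-learn's `CountVectorizer` and `MultinomialNB`; with `alpha=1.0`, `MultinomialNB` applies the same Laplace smoothing described in the zero probability section:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Hypothetical toy corpus: two sports and two politics headlines
docs = [
    "team won the match",
    "striker scored a goal in the match",
    "election results announced",
    "parliament passed the new bill",
]
labels = ["sports", "sports", "politics", "politics"]

# Turn each document into a vector of word counts
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(docs)

# alpha=1.0 is add-one (Laplace) smoothing, so unseen word/class pairs never get zero probability
model = MultinomialNB(alpha=1.0).fit(X, labels)

new_docs = ["the team scored", "election bill"]
print(model.predict(vectorizer.transform(new_docs)))  # ['sports' 'politics']
```

Real text classifiers use far larger corpora, but the pipeline (count words, fit, predict) stays the same.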

### Disadvantages
• It assumes that features are independent given the class. In practice, a model almost never gets a set of predictors that are entirely independent.

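• If a feature value never occurs with a particular class in the training data, its estimated conditional probability is zero, which makes that class's posterior probability zero no matter what the other features say. This is known as the Zero Probability/Frequency Problem; without smoothing such as the Laplacian correction, the model can never predict that class for an input containing that value.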