# Introduction
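This notebook uses the Glass Identification Data Set from the UCI Machine Learning Repository. It trains several classifiers to predict the type of glass from its refractive index and oxide composition, then compares their accuracy.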

## Contents:
  1. [Variable Description](#1)
  2. [Import Libraries](#2)
  3. [Load the Dataset](#3)
  4. [Train Test Split](#4)
  5. [Feature Scaling](#5)
  6. [Classification Models](#6)
    * [Decision Tree Classification](#7)
    * [Random Forest Classification](#8)
    * [K-NN](#9)
    * [Support Vector Machine](#10)
    * [Naive Bayes](#11)
    * [Logistic Regression](#12)
  7. [Results](#13)

<a id= 1> </a>
# 1. Variable Description

### Independent Variables
  1. Id number: 1 to 214 (removed from CSV file)
  2. RI: refractive index
  3. Na: Sodium (unit measurement: weight percent in corresponding oxide, as are attributes 4-10)
  4. Mg: Magnesium
  5. Al: Aluminum
  6. Si: Silicon
  7. K: Potassium
  8. Ca: Calcium
  9. Ba: Barium
  10. Fe: Iron

### Dependent Variable
  Type of glass: (class attribute)
  * building windows, float processed: 1
  * building windows, non-float processed: 2
  * vehicle windows, float processed: 3
  * vehicle windows, non-float processed (none in this database): 4
  * containers: 5
  * tableware: 6
  * headlamps: 7
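
The class codes listed above can be turned into readable labels with a plain dictionary. The sketch below is illustrative only: `GLASS_TYPES`, `sample_types`, and `sample_labels` are hypothetical names, and the sample values are made up rather than taken from the CSV file.

```python
import pandas as pd

# Class codes and names as listed in the variable description.
GLASS_TYPES = {
    1: "building windows, float processed",
    2: "building windows, non-float processed",
    3: "vehicle windows, float processed",
    4: "vehicle windows, non-float processed",
    5: "containers",
    6: "tableware",
    7: "headlamps",
}

# Hypothetical sample of the 'Type' column, for illustration only.
sample_types = pd.Series([1, 2, 7, 5], name="Type")

# Series.map looks up each code in the dictionary.
sample_labels = sample_types.map(GLASS_TYPES)
print(sample_labels.tolist())
```

The same `map` call could be applied to the `Type` column once the data is loaded, for example to label plots.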


<a id= 2> </a>
# 2. Import Libraries

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

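from sklearn.metrics import accuracy_score

# plot_confusion_matrix was removed in scikit-learn 1.2.
# On newer versions, fall back to ConfusionMatrixDisplay.from_estimator,
# which accepts the same (estimator, X, y, cmap=..., normalize=...) arguments.
try:
    from sklearn.metrics import plot_confusion_matrix
except ImportError:
    from sklearn.metrics import ConfusionMatrixDisplay
    plot_confusion_matrix = ConfusionMatrixDisplay.from_estimator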

import warnings
warnings.filterwarnings("ignore")

<a id= 3> </a>
# 3. Load the Dataset

In [None]:
df = pd.read_csv('../input/glass/glass.csv')

In [None]:
df

In [None]:
df.info()

In [None]:
df.corr()['Type'].sort_values()

In [None]:
plt.figure(figsize= (10,7))
sns.heatmap(df.corr(), annot = True, fmt= ' .1g')

## Check for null values

In [None]:
df.notnull().all()
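
`df.notnull().all()` returns one boolean per column, `True` when that column has no missing entries. A per-column count can be easier to read. The sketch below uses a small made-up frame (`toy` is a hypothetical name, not the glass data) with one missing value to show both views.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame with one missing value, for illustration only.
toy = pd.DataFrame({"RI": [1.52, 1.51, np.nan], "Na": [13.6, 13.9, 13.5]})

# Number of missing entries in each column.
null_counts = toy.isnull().sum()

# Single flag: True if any cell in the frame is missing.
has_nulls = toy.isnull().any().any()

print(null_counts)
print(has_nulls)
```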

In [None]:
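# Features: every column except the target 'Type' and the 'Ca' column.
X = df.drop(columns= ['Type', 'Ca']).values
# Target: the glass type, stored in the last column ('Type').
y = df.iloc[:, -1].values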
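
The target above is selected by position, which relies on `Type` being the last column. A sketch of the same selection done by column name is shown here on a made-up frame (`toy`, `X_toy`, and `y_toy` are hypothetical names and the values are invented for illustration).

```python
import pandas as pd

# Hypothetical two-row frame with a subset of the columns, for illustration only.
toy = pd.DataFrame({
    "RI": [1.52, 1.51],
    "Na": [13.6, 13.9],
    "Ca": [8.7, 7.8],
    "Type": [1, 2],
})

# Drop the target and 'Ca' from the features, as in the cell above.
X_toy = toy.drop(columns=["Type", "Ca"]).to_numpy()

# Select the target by name instead of by position.
y_toy = toy["Type"].to_numpy()

print(X_toy.shape, y_toy)
```

Selecting by name keeps working if the column order in the file changes.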

## Distributions

In [None]:
for column in df.columns[:-1]:
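  # sns.distplot is deprecated; histplot with kde=True draws a comparable plot.
  sns.histplot(df[column], color= 'y', kde= True, stat= 'density')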
  plt.grid(True)
  plt.show()

<a id= 4> </a>
# 4. Train Test Split

In [None]:
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
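
The split above is purely random. When classes are unevenly sized, one option is to pass `stratify` so that each class keeps roughly the same share in both sets. This is not what the notebook does; it is a sketch on synthetic data, and `X_toy`, `y_toy`, and the 70/20/10 class sizes are assumptions made for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced data: 100 samples across three classes (70 / 20 / 10).
rng = np.random.default_rng(0)
X_toy = rng.normal(size=(100, 3))
y_toy = np.array([1] * 70 + [2] * 20 + [7] * 10)

# stratify=y_toy keeps the class proportions in both the train and test sets.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.2, random_state=0, stratify=y_toy
)

labels, counts = np.unique(y_te, return_counts=True)
print(dict(zip(labels.tolist(), counts.tolist())))
```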

<a id= 5> </a>
# 5. Feature Scaling

In [None]:
from sklearn.preprocessing import StandardScaler

sc = StandardScaler()
X_train = sc.fit_transform(X_train)
X_test = sc.transform(X_test)
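
The scaler is fitted on the training set only, and the test set is transformed with those same training statistics. The sketch below checks this on synthetic data by reproducing the transformation by hand. The arrays `train` and `test` are hypothetical stand-ins for `X_train` and `X_test`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Synthetic stand-ins for the training and test features.
rng = np.random.default_rng(0)
train = rng.normal(loc=5.0, scale=2.0, size=(80, 2))
test = rng.normal(loc=5.0, scale=2.0, size=(20, 2))

scaler = StandardScaler()
train_scaled = scaler.fit_transform(train)  # learns mean and std from train
test_scaled = scaler.transform(test)        # reuses the training statistics

# Manual version: subtract the training mean, divide by the training std.
manual = (test - train.mean(axis=0)) / train.std(axis=0)
print(np.allclose(test_scaled, manual))
```

Fitting on the training data alone keeps information about the test set out of the preprocessing step.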

<a id= 6> </a>
# 6. Classification Models

In [None]:
accuracies = {}

<a id= 7> </a>
## Decision Tree Classification

In [None]:
from sklearn.tree import DecisionTreeClassifier

classifier = DecisionTreeClassifier(criterion= 'entropy', random_state= 0)
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

### Confusion Matrix

In [None]:
plot_confusion_matrix(classifier, X_test, y_test, cmap = plt.cm.BuGn, normalize = 'true')
plt.show()
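
With `normalize = 'true'`, each row of the confusion matrix is divided by the number of samples whose true label is that row's class, so every row sums to 1 and the diagonal shows per-class recall. The numeric sketch below uses made-up labels (`y_true_toy` and `y_pred_toy` are hypothetical) to show the raw and row-normalized matrices side by side.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Made-up true and predicted labels, for illustration only.
y_true_toy = np.array([1, 1, 1, 1, 2, 2, 7, 7])
y_pred_toy = np.array([1, 1, 2, 1, 2, 1, 7, 7])

# Raw counts: rows are true classes, columns are predicted classes.
raw = confusion_matrix(y_true_toy, y_pred_toy)

# Row-normalized: each row divided by the number of true samples in that class.
row_normalized = confusion_matrix(y_true_toy, y_pred_toy, normalize="true")

print(raw)
print(row_normalized)
```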

### Accuracy

In [None]:
accuracy = accuracy_score(y_test, y_pred)
accuracies['Decision Tree Classification'] = accuracy

print(accuracy)
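
Accuracy here is the fraction of test samples whose predicted class equals the true class. A short sketch with made-up labels (hypothetical `y_true_toy` and `y_pred_toy`) compares a hand computation with `accuracy_score`.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Made-up labels: three of the five predictions are correct.
y_true_toy = np.array([1, 2, 2, 7, 5])
y_pred_toy = np.array([1, 2, 1, 7, 7])

# Fraction of matching positions.
manual_accuracy = np.mean(y_true_toy == y_pred_toy)

# Same quantity from scikit-learn.
library_accuracy = accuracy_score(y_true_toy, y_pred_toy)

print(manual_accuracy, library_accuracy)
```

Because accuracy weights every sample equally, frequent classes dominate the score. The normalized confusion matrix shows how each class fares individually.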

<a id= 8> </a>
## Random Forest Classification

In [None]:
from sklearn.ensemble import RandomForestClassifier

classifier = RandomForestClassifier(n_estimators = 10, criterion = 'entropy', random_state = 0)
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)


### Confusion Matrix

In [None]:
plot_confusion_matrix(classifier, X_test, y_test, cmap = plt.cm.BuGn, normalize = 'true')
plt.show()

### Accuracy

In [None]:
accuracy = accuracy_score(y_test, y_pred)
accuracies['Random Forest Classification'] = accuracy

print(accuracy)

<a id= 9> </a>
## K-NN

In [None]:
from sklearn.neighbors import KNeighborsClassifier

# metric = 'minkowski', p = 2 means Euclidean distance.
classifier = KNeighborsClassifier(n_neighbors = 5, metric = 'minkowski', p = 2)
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)
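
The comment in the cell above notes that the Minkowski metric with `p = 2` is the Euclidean distance. A small NumPy sketch makes this concrete. The helper `minkowski_distance` is a hypothetical function written for illustration, not part of scikit-learn.

```python
import numpy as np

def minkowski_distance(a, b, p):
    """Minkowski distance of order p between two 1-D points."""
    return np.sum(np.abs(a - b) ** p) ** (1.0 / p)

a = np.array([0.0, 0.0])
b = np.array([3.0, 4.0])

euclidean = minkowski_distance(a, b, p=2)  # straight-line distance
manhattan = minkowski_distance(a, b, p=1)  # sum of absolute differences

print(euclidean, manhattan)
print(np.isclose(euclidean, np.linalg.norm(a - b)))
```

Since K-NN relies on distances between samples, features on larger scales would dominate without the scaling step applied earlier.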

### Confusion Matrix

In [None]:
plot_confusion_matrix(classifier, X_test, y_test, cmap = plt.cm.BuGn, normalize = 'true')
plt.show()

### Accuracy

In [None]:
accuracy = accuracy_score(y_test, y_pred)
accuracies['K-NN'] = accuracy

print(accuracy)

<a id= 10> </a>
## Support Vector Machine

In [None]:
from sklearn.svm import SVC

classifier = SVC(kernel = 'rbf', random_state = 0)
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)

### Confusion Matrix

In [None]:
plot_confusion_matrix(classifier, X_test, y_test, cmap = plt.cm.BuGn, normalize = 'true')
plt.show()

### Accuracy

In [None]:
accuracy = accuracy_score(y_test, y_pred)
accuracies['Support Vector Machine'] = accuracy

print(accuracy)

<a id= 11> </a>
## Naive Bayes

In [None]:
from sklearn.naive_bayes import GaussianNB

classifier = GaussianNB()
classifier.fit(X_train, y_train)

y_pred = classifier.predict(X_test)

### Confusion Matrix

In [None]:
plot_confusion_matrix(classifier, X_test, y_test, cmap = plt.cm.BuGn, normalize = 'true')
plt.show()

### Accuracy

In [None]:
accuracy = accuracy_score(y_test, y_pred)
accuracies['Naive Bayes'] = accuracy

print(accuracy)

<a id= 12> </a>
## Logistic Regression

In [None]:
from sklearn.linear_model import LogisticRegression

classifier = LogisticRegression()
classifier.fit(X_train, y_train)
y_pred = classifier.predict(X_test)

### Confusion Matrix

In [None]:
plot_confusion_matrix(classifier, X_test, y_test, cmap = plt.cm.BuGn, normalize= 'true')
plt.show()

### Accuracy

In [None]:
accuracy = accuracy_score(y_test, y_pred)
accuracies['Logistic Regression'] = accuracy

print(accuracy)

<a id= 13> </a>
# 7. Results


In [None]:
accuracy_df = pd.DataFrame(list(accuracies.items()), columns = ['Model Name', 'Accuracy Score'])
accuracy_df

In [None]:
f, ax = plt.subplots(figsize = (8,6))
sns.set_color_codes('pastel')
sns.barplot(y = 'Model Name', x = 'Accuracy Score', data = accuracy_df, color = 'pink')
plt.show()
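
Each model section above repeats the same fit, predict, and score steps. One possible way to condense that pattern is a loop over a dictionary of models. The sketch below is an assumption-laden illustration: `evaluate_models` is a hypothetical helper, and the data comes from `make_classification` rather than the glass CSV, so the scores say nothing about the glass data.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

def evaluate_models(models, X_train, X_test, y_train, y_test):
    """Fit each model and return a {name: test accuracy} dictionary."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy_score(y_test, model.predict(X_test))
    return scores

# Synthetic three-class data standing in for the glass features.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=8, n_informative=5, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0
)

demo_models = {
    "Decision Tree Classification": DecisionTreeClassifier(criterion="entropy", random_state=0),
    "K-NN": KNeighborsClassifier(n_neighbors=5),
    "Naive Bayes": GaussianNB(),
}

demo_scores = evaluate_models(demo_models, X_tr, X_te, y_tr, y_te)
print(demo_scores)
```

The returned dictionary has the same shape as `accuracies`, so it could feed the results table and bar plot unchanged.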