## Introduction





* **In this notebook, I grouped the wine quality scores into three categories (good, mid and bad). Then I explored the resulting data with data visualization libraries.**

* **For prediction I used K-Nearest Neighbors, Support Vector Machine and Random Forest models.** 

* **To conclude, I compared the models by the accuracy scores of their predictions.**


> **Please leave a comment and upvote the kernel if you liked it.**

**Basic Imports**

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

**Get The Data**

In [None]:
df = pd.read_csv("../input/winequality-red.csv")
df.head(3)

**Classify The Quality**

In [None]:
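# Map each numeric quality score to a category:
# below 5 -> "Bad", above 6 -> "Good", 5 or 6 -> "Mid"
quality = df["quality"].values
category = []
for num in quality:
    if num < 5:
        category.append("Bad")
    elif num > 6:
        category.append("Good")
    else:
        category.append("Mid")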

In [None]:
# Create the new data: attach the category column and drop the numeric quality column
category = pd.DataFrame(data=category, columns=["category"])
data = pd.concat([df, category], axis=1)
data = data.drop(columns="quality")

In [None]:
data.head(3)
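
The same thresholds can also be expressed without an explicit loop. The following is a minimal sketch, assuming `pd.cut` as an alternative to the loop above; the `sample` frame is hypothetical data standing in for `df`, not the wine dataset itself.

```python
import numpy as np
import pandas as pd

# Hypothetical sample of quality scores standing in for df["quality"]
sample = pd.DataFrame({"quality": [3, 4, 5, 6, 7, 8]})

# Bins are right-inclusive: (-inf, 4] -> Bad, (4, 6] -> Mid, (6, inf) -> Good,
# which matches "below 5", "5 or 6" and "above 6" for integer scores
sample["category"] = pd.cut(
    sample["quality"],
    bins=[-np.inf, 4, 6, np.inf],
    labels=["Bad", "Mid", "Good"],
)
print(sample["category"].tolist())
```

Note that `pd.cut` returns a categorical column; `sample["category"].astype(str)` would give plain strings like the loop produces.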

## Exploratory Data Analysis

**Let's explore the data!**

___
**Here I counted the number of samples in each class and checked the correlations between the columns**

In [None]:
plt.figure(figsize=(10,6))
# Pass the column as x explicitly; recent seaborn versions no longer accept it positionally
sns.countplot(x=data["category"], palette="muted")
data["category"].value_counts()

In [None]:
plt.figure(figsize=(12,6))
sns.heatmap(df.corr(),annot=True)

**According to the heatmap, the alcohol-quality and density-alcohol relationships are the most promising ones to explore further**

In [None]:
plt.figure(figsize=(12,6))
sns.barplot(x=df["quality"],y=df["alcohol"],palette="Reds")

In [None]:
# jointplot creates its own figure, so its size is set with height rather than plt.figure
sns.jointplot(x=df["alcohol"], y=df["density"], kind="hex", height=8)

**Setting features and labels, and encoding the categorical data**

**LabelEncoder assigns codes in alphabetical order of the labels (Bad=0, Good=1, Mid=2)**

In [None]:
# Features: every column except the last; labels: the last column (category)
X = data.iloc[:, :-1].values
y = data.iloc[:, -1].values

In [None]:
from sklearn.preprocessing import LabelEncoder
labelencoder_y =LabelEncoder()
y= labelencoder_y.fit_transform(y)
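
To check which integer each category received, the fitted encoder's `classes_` attribute can be inspected. This is a minimal sketch on a hypothetical label array (`labels_demo`), not on the notebook's `y`.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# Hypothetical labels standing in for the category column
labels_demo = np.array(["Good", "Mid", "Bad", "Mid"])

encoder = LabelEncoder()
codes = encoder.fit_transform(labels_demo)

# classes_ holds the sorted unique labels; the position of each label is its code
mapping = {str(name): int(code) for code, name in enumerate(encoder.classes_)}
print(mapping)
print(codes.tolist())
```

`encoder.inverse_transform(codes)` converts the integers back to the original labels, which is handy when reading a classification report.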

## Training and Testing Data
**Now that we've explored the data a bit, let's go ahead and split the data into training and testing sets.**

In [None]:
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2,random_state=0)
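
When the classes are unevenly sized, a plain random split can leave the test set with a different class mix than the training set. One optional variation, not used in the cell above, is the `stratify` argument of `train_test_split`. The sketch below uses hypothetical toy labels (`y_demo`) to show the effect.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical imbalanced labels standing in for the encoded categories
y_demo = np.array([0] * 10 + [1] * 20 + [2] * 70)
X_demo = np.arange(len(y_demo)).reshape(-1, 1)

# stratify=y_demo keeps the class proportions the same in both subsets
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.2, random_state=0, stratify=y_demo
)

test_counts = np.bincount(y_te, minlength=3)
print(test_counts)
```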

**Scaling the data to improve predictions**

In [None]:
from sklearn.preprocessing import StandardScaler
sc_X = StandardScaler()
X_train = sc_X.fit_transform(X_train)
X_test = sc_X.transform(X_test)
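
The scaler is fitted on the training set only, and the test set is transformed with those same statistics. The sketch below reproduces that arithmetic with NumPy on hypothetical toy arrays (`train_demo`, `test_demo`) to show what `fit_transform` and `transform` compute.

```python
import numpy as np

# Hypothetical toy data: two features on very different scales
train_demo = np.array([[1.0, 10.0], [2.0, 20.0], [3.0, 30.0]])
test_demo = np.array([[2.0, 40.0]])

# "fit": learn per-column mean and standard deviation from the training data only
# (population standard deviation, ddof=0, as StandardScaler uses)
mean = train_demo.mean(axis=0)
std = train_demo.std(axis=0)

# "transform": apply the training statistics to both sets
train_scaled = (train_demo - mean) / std
test_scaled = (test_demo - mean) / std

print(train_scaled)
print(test_scaled)
```

Reusing the training mean and standard deviation for the test set keeps information about the test data out of the preprocessing step.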

## Training the Model and Predicting the Test Data 

Now it's time to train our models on the training data and predict the test data with each of them!

## Support Vector Machine

In [None]:
from sklearn.svm import SVC
svc = SVC()
svc.fit(X_train,y_train)
pred_svc =svc.predict(X_test)

In [None]:
from sklearn.metrics import classification_report,accuracy_score
print(classification_report(y_test,pred_svc))

## K-Nearest Neighbors

In [None]:
from sklearn.neighbors import KNeighborsClassifier
knn = KNeighborsClassifier()
knn.fit(X_train,y_train)
pred_knn=knn.predict(X_test)
print(classification_report(y_test, pred_knn))

## Conclusion

**Time to compare the results!**

In [None]:
conclusion = pd.DataFrame({'models': ["SVC","KNN"],
                           'accuracies': [accuracy_score(y_test,pred_svc),accuracy_score(y_test,pred_knn)]})
conclusion

Here SVC is more accurate than KNN on the test set.
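
Accuracy is the fraction of test samples whose predicted label matches the true label. As a rough sanity check, it can be compared with the accuracy of always predicting the most frequent class. The sketch below uses hypothetical label arrays (`y_true_demo`, `y_pred_demo`), not the notebook's results.

```python
import numpy as np

# Hypothetical true and predicted labels
y_true_demo = np.array([2, 2, 1, 0, 2, 1])
y_pred_demo = np.array([2, 2, 2, 0, 2, 1])

# Accuracy: share of positions where prediction and truth agree (5 of 6 here)
accuracy = float(np.mean(y_true_demo == y_pred_demo))

# Baseline: always predict the most frequent true class (class 2, 3 of 6 here)
majority_class = int(np.bincount(y_true_demo).argmax())
baseline = float(np.mean(y_true_demo == majority_class))

print(accuracy, baseline)
```

A model whose accuracy is only slightly above this baseline is learning little beyond the class frequencies.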