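In this notebook, you will use Support Vector Machines (SVM), specifically their regression variant, Support Vector Regression (SVR), to build and train a model that predicts a diamond's price from its other features.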

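SVM works by mapping data into a high-dimensional feature space, usually implicitly through a kernel function, so that data points can be separated even when they are not linearly separable in the original space. In that space the algorithm finds a separating hyperplane that maximizes the margin between the categories, and a new record is assigned to a category according to which side of the hyperplane it falls on. SVR applies the same idea to a continuous target: it fits a function that keeps as many training points as possible within a distance epsilon of it, and uses that function to predict values (here, the price) for new records.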
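To make the "not linearly separable" case concrete, here is a minimal sketch on synthetic data (two concentric rings from scikit-learn's `make_circles`, not the diamonds data). It compares a linear-kernel SVM with an RBF-kernel SVM, where the RBF kernel plays the role of the implicit mapping to a higher-dimensional space.

```python
# Compare a linear SVM with an RBF-kernel SVM on data that is not linearly separable
from sklearn.datasets import make_circles
from sklearn.svm import SVC

# Two concentric rings: no straight line in the 2-D input space separates them
X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear_clf = SVC(kernel="linear").fit(X, y)
# The RBF kernel implicitly maps the points into a higher-dimensional space
rbf_clf = SVC(kernel="rbf").fit(X, y)

linear_acc = linear_clf.score(X, y)  # mean training accuracy
rbf_acc = rbf_clf.score(X, y)
print(f"linear kernel accuracy: {linear_acc:.2f}")
print(f"RBF kernel accuracy:    {rbf_acc:.2f}")
```

The linear kernel can only draw a straight boundary, so it misclassifies much of one ring, while the RBF kernel finds a boundary that separates the rings.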

# Importing Needed Packages and Data

In [None]:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
import sklearn
from sklearn.utils import shuffle
from sklearn import svm
from sklearn.model_selection import train_test_split

In [None]:
df = pd.read_csv("../input/diamonds/diamonds.csv")

# Understanding the Data

In [None]:
df.head()

In [None]:
# Drop the unnamed column, a row index that was saved into the CSV
df.drop(columns="Unnamed: 0", inplace=True)
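An alternative, assuming the first CSV column is only a saved row index, is to have pandas use it as the index while reading, via the `index_col` parameter of `pd.read_csv`. The sketch below uses a hypothetical two-row toy CSV rather than the real file.

```python
import io

import pandas as pd

# Hypothetical toy CSV shaped like diamonds.csv: the first, unnamed
# column is a saved row index, not a feature
csv_text = """,carat,cut,price
1,0.23,Ideal,326
2,0.21,Premium,326
"""

# Reading it naively produces an extra "Unnamed: 0" column
naive = pd.read_csv(io.StringIO(csv_text))
print(naive.columns.tolist())

# index_col=0 turns that column into the index, so nothing needs dropping later
clean = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(clean.columns.tolist())
```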

In [None]:
df.head()

In [None]:
df.describe()

# Data Exploration

In [None]:
plt.figure(figsize=(10, 8))
plt.scatter(x="depth", y="price", data=df)

# x is depth and y is price, so label the axes accordingly
plt.xlabel("Depth")
plt.ylabel("Price")

In [None]:
sns.countplot(x="color", data=df)

In [None]:
sns.countplot(x="cut", data=df)
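`countplot` orders the bars by first appearance in the data, not by quality grade. A minimal sketch of putting the cut grades back in their natural order, using a hypothetical toy column in place of `df["cut"]`:

```python
import pandas as pd

# Ordinal order of the cut grades, from lowest to highest quality
cut_order = ["Fair", "Good", "Very Good", "Premium", "Ideal"]

# Hypothetical toy column standing in for df["cut"]
cut = pd.Series(["Ideal", "Premium", "Good", "Ideal", "Very Good", "Ideal", "Fair"])

# value_counts() sorts by frequency; reindex puts the grades back in quality order
counts = cut.value_counts().reindex(cut_order, fill_value=0)
print(counts)
```

The same list can be passed to seaborn as `sns.countplot(x="cut", data=df, order=cut_order)` so the bars appear in grade order.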

# Data Pre-processing and Selection

In [None]:
df["cut"].unique()

In [None]:
df["color"].unique()

In [None]:
df["clarity"].unique()

In [None]:
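# Ordinal encodings: 1 is the lowest grade (Fair, J, I1), higher numbers are better grades
cut_dict = {"Fair": 1, "Good": 2, "Very Good": 3, "Premium": 4, "Ideal": 5}
color_dict = {"J": 1, "I": 2, "H": 3, "G": 4, "F": 5, "E": 6, "D": 7}  # J (most tinted) ... D (colorless)
clarity_dict = {"I1": 1, "SI2": 2, "SI1": 3, "VS2": 4, "VS1": 5, "VVS2": 6, "VVS1": 7, "IF": 8}  # I1 (included) ... IF (internally flawless)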


In [None]:
df["cut"] = df["cut"].map(cut_dict)
df["color"] = df["color"].map(color_dict)
df["clarity"] = df["clarity"].map(clarity_dict)
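Instead of writing the numbers by hand, each grade can be encoded by its position in an explicit worst-to-best list using an ordered `pd.Categorical`. This is a sketch under the assumption that the standard diamond grading scales (color J to D, clarity I1 to IF) are the intended order; the toy frame is hypothetical.

```python
import pandas as pd

# Grade scales from worst to best (assumed standard diamond grading order)
color_order = ["J", "I", "H", "G", "F", "E", "D"]
clarity_order = ["I1", "SI2", "SI1", "VS2", "VS1", "VVS2", "VVS1", "IF"]

# Hypothetical toy frame standing in for the diamonds data
toy = pd.DataFrame({"color": ["E", "J", "D"], "clarity": ["SI2", "IF", "VS1"]})

# .codes gives each value's 0-based position in the category list
# (labels not in the list get -1); +1 makes the codes start at 1
toy["color"] = pd.Categorical(toy["color"], categories=color_order, ordered=True).codes + 1
toy["clarity"] = pd.Categorical(toy["clarity"], categories=clarity_order, ordered=True).codes + 1
print(toy)
```

Keeping the order in one list makes it easy to audit, and the same list can be reused for plotting.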

In [None]:
df.head()

In [None]:
df.info()

In [None]:
df.isnull().sum()
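This null check also catches encoding mistakes: `Series.map` with a dictionary returns NaN for any label missing from the dictionary. A small sketch with a hypothetical label ("Excellent") that the cut mapping does not cover:

```python
import pandas as pd

# Hypothetical toy column with one label the mapping does not cover
cut = pd.Series(["Ideal", "Premium", "Excellent"])
cut_dict = {"Fair": 1, "Good": 2, "Very Good": 3, "Premium": 4, "Ideal": 5}

encoded = cut.map(cut_dict)  # labels missing from the dict become NaN
print(encoded.isnull().sum())  # 1

# Show which raw labels were not covered by the mapping
print(cut[encoded.isnull()].unique())
```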

In [None]:
# Shuffle the rows; random_state makes the order reproducible
df = shuffle(df, random_state=4)
df.head()
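A quick sketch of what `sklearn.utils.shuffle` does on a hypothetical toy frame: it permutes whole rows, and a fixed `random_state` gives the same order every run. Note that `train_test_split` already shuffles by default (`shuffle=True`), so this step is optional before splitting.

```python
import pandas as pd
from sklearn.utils import shuffle

# Hypothetical toy frame where column b is always a + 10
toy = pd.DataFrame({"a": range(10), "b": range(10, 20)})

# Fixing random_state makes the shuffled order reproducible across runs
s1 = shuffle(toy, random_state=4)
s2 = shuffle(toy, random_state=4)
print(s1.equals(s2))  # True

# Rows are permuted, not altered: each row keeps its values together
print((s1["b"] - s1["a"] == 10).all())  # True
```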

In [None]:
X = df.drop("price", axis=1).values
y = df["price"].values

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)
print('Train set:', X_train.shape, y_train.shape)
print('Test set:', X_test.shape, y_test.shape)
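When `test_size` is a float, scikit-learn rounds the number of test rows up and gives the rest to the training set. A small sketch with 101 synthetic rows shows the rounding:

```python
import math

import numpy as np
from sklearn.model_selection import train_test_split

n_samples = 101
X = np.arange(n_samples * 2).reshape(n_samples, 2)  # synthetic features
y = np.arange(n_samples)                            # synthetic target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# Test rows: ceil(0.2 * 101) = 21; train rows: 101 - 21 = 80
expected_test = math.ceil(0.2 * n_samples)
print(X_train.shape, X_test.shape)  # (80, 2) (21, 2)
```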

In [None]:
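# SVR with a linear kernel; SVR is sensitive to feature scale, and this libsvm-based
# solver's fit time grows faster than quadratically with the number of rows
clf = svm.SVR(kernel='linear')
clf.fit(X_train, y_train)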
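Because SVMs are sensitive to feature scale, scikit-learn's documentation recommends standardizing the inputs. A minimal sketch, assuming a `StandardScaler` + `SVR` pipeline is acceptable here, on synthetic features with deliberately different scales (hypothetical stand-ins for columns such as carat, depth, and x/y/z):

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)

# Synthetic features on very different scales; the target depends linearly on them
Z = rng.normal(size=(300, 3))
X = Z * np.array([0.5, 100.0, 0.01])
y = 3 * Z[:, 0] + 2 * Z[:, 1] - Z[:, 2] + rng.normal(scale=0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=4)

# Inside the pipeline, the scaler is fit on the training split only,
# then the same transformation is applied to the test split
model = make_pipeline(StandardScaler(), SVR(kernel="linear"))
model.fit(X_train, y_train)
r2 = model.score(X_test, y_test)
print(f"test R^2: {r2:.3f}")
```

For datasets with tens of thousands of rows, `sklearn.svm.LinearSVR` is another option the scikit-learn docs suggest, since it scales better than the kernel-based `SVR`.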

In [None]:
# For regressors, score() returns R^2 (coefficient of determination) on the test set
clf.score(X_test, y_test)
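For a regressor, `score` is the coefficient of determination, R^2 = 1 - SS_res / SS_tot: 1.0 is a perfect fit, and it can be negative for a model worse than predicting the mean. The sketch below checks this on synthetic data and also reports mean absolute error, which is in the same units as the target (for this notebook, price).

```python
import numpy as np
from sklearn.metrics import mean_absolute_error, r2_score
from sklearn.svm import SVR

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))  # synthetic features
y = 4 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.2, size=100)

reg = SVR(kernel="linear").fit(X, y)
y_pred = reg.predict(X)

# R^2 computed by hand from residual and total sums of squares
ss_res = np.sum((y - y_pred) ** 2)
ss_tot = np.sum((y - y.mean()) ** 2)
r2_manual = 1 - ss_res / ss_tot

print(reg.score(X, y), r2_score(y, y_pred), r2_manual)
print("MAE:", mean_absolute_error(y, y_pred))
```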

Please leave feedback.
Thank you very much :)