# Social Media Ads Analysis & Prediction

#### Notice: In my previous notebook, I used the KNN algorithm to predict who would make a purchase based on salary and age. In this notebook I use the SVM algorithm to try to get higher accuracy and to separate the points in the scatter plot more cleanly.

![](https://www.threegirlsmedia.com/wp-content/uploads/2020/11/3G.SocialMediaAds.1.21.2019.png)

## The purpose of the case study:

### The main purpose of the study is to predict whether a customer will make a purchase, based on age and income

### Importing Libraries

In [1]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
sns.set()

### Importing Dataset & Extracting Features

In [4]:
df = pd.read_csv('../input/social-ads/social-network-ads.csv')
df.head(10)

In [12]:
X = df.iloc[:, :-1].values
X

In [13]:
y = df.iloc[:, -1].values
y

#### The X values (age and salary) are on very different scales, so they need to be scaled. The data preprocessing below fixes that.
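To make the scaling issue concrete, here is a minimal sketch of what standardization does to each column: subtract the column mean, then divide by the column standard deviation. The ages and salaries in `sample` are made-up values for illustration, not rows from the dataset.

```python
import numpy as np

# Hypothetical (age, salary) rows, invented for illustration only
sample = np.array([[19.0, 19000.0],
                   [35.0, 20000.0],
                   [26.0, 43000.0],
                   [47.0, 150000.0]])

# Column-wise mean and population standard deviation (ddof=0),
# which is what StandardScaler uses by default
mean = sample.mean(axis=0)
std = sample.std(axis=0)

# z = (x - mean) / std puts both columns on a comparable scale
scaled = (sample - mean) / std

print(scaled.round(2))
```

After this step both columns have mean 0 and standard deviation 1, so the salary no longer dominates distance calculations just because its numbers are larger.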

### Data Splitting & Preprocessing

In [14]:
# First, split the data into training and test sets
from sklearn.model_selection import train_test_split

In [33]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

In [34]:
# Now scale the X values with StandardScaler (fitted on the training set only)
from sklearn.preprocessing import StandardScaler

In [35]:
sc = StandardScaler()

In [36]:
X_train = sc.fit_transform(X_train)
X_train

In [37]:
X_test = sc.transform(X_test)
X_test

#### Now that the features are scaled, we can build the model
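As an alternative to scaling by hand, scikit-learn's `make_pipeline` can bundle the scaler and the classifier into one object, so raw values can be passed straight to `predict`. The sketch below is only an illustration of that idea: it runs on synthetic two-feature data from `make_classification` rather than the ads dataset, and names such as `X_demo` and `demo_accuracy` are placeholders.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic two-feature data standing in for (age, salary); not the real dataset
X_demo, y_demo = make_classification(n_samples=400, n_features=2,
                                     n_informative=2, n_redundant=0,
                                     random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo,
                                          test_size=0.25, random_state=0)

# The pipeline fits the scaler on the training data only,
# then applies the same scaling before every prediction
model = make_pipeline(StandardScaler(), SVC(kernel='linear', random_state=0))
model.fit(X_tr, y_tr)

demo_accuracy = model.score(X_te, y_te)
print(demo_accuracy)
```

With this layout there is no separate `sc.transform(...)` call to forget when predicting on new rows.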

## Using Linear Kernel

### Model Training & Predicting

In [56]:
from sklearn.svm import SVC

In [57]:
classifier = SVC(kernel='linear', random_state=0)

In [58]:
classifier.fit(X_train, y_train)

In [59]:
# Sanity check: predict for a 40-year-old with a salary of 75000 :)
print(classifier.predict(sc.transform([[40, 75000]])))

In [42]:
y_pred = classifier.predict(X_test)
y_pred

In [43]:
y_test

In [44]:
# Optional: show each prediction next to its true label (columns: predicted, actual)
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

#### Now let's evaluate the model

### Model Evaluation

In [45]:
from sklearn.metrics import accuracy_score, confusion_matrix
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

An accuracy of 90% is very good, but I want more than that this time, so I will try another kernel.
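Accuracy is only one of the numbers that can be read off a confusion matrix. As a rough sketch, the snippet below derives accuracy, precision and recall from a 2x2 matrix laid out the way `confusion_matrix` returns it (rows are true labels, columns are predictions). The counts in `cm_demo` are hypothetical, chosen to give 90% accuracy; they are not this notebook's output.

```python
import numpy as np

# Hypothetical counts, laid out as [[TN, FP], [FN, TP]]
cm_demo = np.array([[60, 5],
                    [5, 30]])
tn, fp, fn, tp = cm_demo.ravel()

accuracy = (tp + tn) / cm_demo.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)             # share of predicted buyers who really bought
recall = tp / (tp + fn)                # share of real buyers the model found

print(f"accuracy={accuracy:.2f} precision={precision:.2f} recall={recall:.2f}")
```

Looking at precision and recall alongside accuracy helps when the two classes are not equally frequent.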

### Visualising Results

##### Visualising the Training set results

In [49]:
from matplotlib.colors import ListedColormap
X_set, y_set = sc.inverse_transform(X_train), y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('SVM (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()
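The grid above uses `step = 0.25` on both axes, which is very fine for a salary axis. As a back-of-the-envelope check, the sketch below counts the grid points for an assumed age range of 18 to 60 and an assumed salary range of 15000 to 150000 (illustrative ranges, not read from the data), and compares that with a coarser salary step. The step of 500 and the helper `grid_points` are hypothetical choices for this example.

```python
import numpy as np

# Assumed feature ranges for illustration; check df.describe() for the real ones
age_min, age_max = 18, 60
sal_min, sal_max = 15000, 150000

def grid_points(age_step, sal_step):
    """Number of points np.meshgrid would produce for the given steps."""
    n_age = len(np.arange(age_min - 10, age_max + 10, age_step))
    n_sal = len(np.arange(sal_min - 1000, sal_max + 1000, sal_step))
    return n_age * n_sal

fine = grid_points(0.25, 0.25)    # same steps as the plotting cell
coarse = grid_points(0.25, 500)   # coarser salary step

print(fine, coarse)
```

If the plot is slow or runs out of memory, a larger salary step is one thing to try; the decision boundary will just look slightly blockier.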

##### Visualising the Test set results

In [50]:
from matplotlib.colors import ListedColormap
X_set, y_set = sc.inverse_transform(X_test), y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('SVM (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

## Using RBF Kernel

### Model Training & Predicting

In [60]:
from sklearn.svm import SVC

In [61]:
classifier = SVC(kernel = 'rbf', random_state = 0)

In [62]:
classifier.fit(X_train, y_train)

In [64]:
print(classifier.predict(sc.transform([[40, 75000]])))

In [65]:
y_pred = classifier.predict(X_test)
y_pred

In [67]:
y_test

In [68]:
print(np.concatenate((y_pred.reshape(len(y_pred),1), y_test.reshape(len(y_test),1)),1))

### Model Evaluation

In [69]:
from sklearn.metrics import confusion_matrix, accuracy_score
cm = confusion_matrix(y_test, y_pred)
print(cm)
accuracy_score(y_test, y_pred)

#### This is better than the linear kernel, so I will accept this accuracy score
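A single train/test split can favour one kernel by chance. One way to double-check the comparison is k-fold cross-validation, sketched here with scikit-learn's `cross_val_score`. The data comes from `make_moons`, a synthetic stand-in rather than the ads dataset, and `cv_means` is just a placeholder name.

```python
from sklearn.datasets import make_moons
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic, non-linearly separable data used only for illustration
X_moons, y_moons = make_moons(n_samples=400, noise=0.25, random_state=0)

cv_means = {}
for kernel in ('linear', 'rbf'):
    # Scaling lives inside the pipeline, so each fold is scaled
    # using only its own training part
    model = make_pipeline(StandardScaler(), SVC(kernel=kernel, random_state=0))
    scores = cross_val_score(model, X_moons, y_moons, cv=5)
    cv_means[kernel] = scores.mean()

print(cv_means)
```

Averaging over five folds gives a steadier estimate than one split, which makes the kernel comparison easier to trust.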

### Visualising Results

##### Visualising the Training set results

In [70]:
from matplotlib.colors import ListedColormap
X_set, y_set = sc.inverse_transform(X_train), y_train
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Kernel SVM (Training set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

##### Visualising the Test set results

In [71]:
from matplotlib.colors import ListedColormap
X_set, y_set = sc.inverse_transform(X_test), y_test
X1, X2 = np.meshgrid(np.arange(start = X_set[:, 0].min() - 10, stop = X_set[:, 0].max() + 10, step = 0.25),
                     np.arange(start = X_set[:, 1].min() - 1000, stop = X_set[:, 1].max() + 1000, step = 0.25))
plt.contourf(X1, X2, classifier.predict(sc.transform(np.array([X1.ravel(), X2.ravel()]).T)).reshape(X1.shape),
             alpha = 0.75, cmap = ListedColormap(('red', 'green')))
plt.xlim(X1.min(), X1.max())
plt.ylim(X2.min(), X2.max())
for i, j in enumerate(np.unique(y_set)):
    plt.scatter(X_set[y_set == j, 0], X_set[y_set == j, 1], color = ListedColormap(('red', 'green'))(i), label = j)
plt.title('Kernel SVM (Test set)')
plt.xlabel('Age')
plt.ylabel('Estimated Salary')
plt.legend()
plt.show()

## Please leave an upvote and a comment to help me continue my data science journey and improve my work. Thanks!