# Ad click purchase prediction - Logistic Regression

In this project we will work with an advertising data set that indicates whether or not a particular user clicked on a digital advertisement. We will try to build a model that predicts whether a user will click on an ad based on that user's features.

This data set contains the following features:

1. 'User ID': unique identifier for the consumer
2. 'Age': customer age in years
3. 'EstimatedSalary': average income of the consumer
4. 'Gender': whether the consumer is male or female
5. 'Purchased': 0 or 1, indicating whether the consumer clicked on the ad

In [None]:
import pandas as pd
import numpy as np 
import matplotlib.pyplot as plt 
import seaborn as sns


In [None]:
data = pd.read_csv("Social_Network_Ads.csv")

In [None]:
data.head()

In [None]:
data.describe()

# Exploratory Data Analysis

Let's use seaborn to explore the data!


**Create a histogram of Age**

In [None]:
sns.set_style('whitegrid')
data['Age'].hist(bins=30)
plt.xlabel('Age')



# The majority of the company's leads appear to fall in the 35 to 48 age range

**Create a jointplot showing Estimated Salary versus Age**

In [None]:
sns.jointplot(x='Age',y='EstimatedSalary',data= data)

In [None]:
sns.pairplot(data)

# Feature Engineering

### 1. Convert the categorical variables into numerical form 

In [None]:
Sex  = pd.get_dummies(data['Gender'] , drop_first = True)
Sex

In [None]:
data['Sex'] = Sex
data = data.drop('Gender' , axis =1)
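
To make the effect of `drop_first=True` concrete, here is a minimal sketch on a small made-up `Gender` column (the values are illustrative and are not rows from the data set). With two categories, `get_dummies` keeps a single indicator column for the category that sorts last (`Male`), so `Female` becomes the implicit baseline. The `astype(int)` cast is an addition for this sketch, since newer pandas versions return booleans by default.

```python
import pandas as pd

# Hypothetical toy column; not rows from Social_Network_Ads.csv
toy = pd.DataFrame({'Gender': ['Male', 'Female', 'Female', 'Male']})

# drop_first=True drops the first category in sorted order ('Female'),
# leaving one indicator column named 'Male'
dummies = pd.get_dummies(toy['Gender'], drop_first=True)

# Cast to int so the column holds 0/1 regardless of the pandas version
toy['Sex'] = dummies['Male'].astype(int)
print(toy)
```

Keeping only one of the two indicator columns avoids feeding the model two perfectly correlated features.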

### 2. Standardize the data 

In [None]:
from sklearn.preprocessing import StandardScaler

In [None]:
ss = StandardScaler()

In [None]:
ss.fit(data.drop('Purchased', axis =1 ))

In [None]:
scaled_featured = ss.transform( data.drop('Purchased', axis =1 ))

In [None]:
scaled_featured

In [None]:
# Label the scaled columns with the feature names (every column except the target 'Purchased')
scale = pd.DataFrame(scaled_featured, columns=data.drop('Purchased', axis=1).columns)

In [None]:
scale
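
As a quick check of what standardization does, the sketch below applies `StandardScaler` to a tiny made-up array (the numbers are hypothetical) and compares the result with the formula `(x - mean) / std` computed by hand. After scaling, each column should have a mean of roughly 0 and a standard deviation of roughly 1.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical toy features: age in years and salary
toy = np.array([[20.0, 30000.0],
                [30.0, 50000.0],
                [40.0, 70000.0]])

scaler = StandardScaler()
scaled = scaler.fit_transform(toy)

# Same result by hand: (x - mean) / std, using the population std (ddof=0)
manual = (toy - toy.mean(axis=0)) / toy.std(axis=0)

print(scaled.mean(axis=0))  # approximately 0 for each column
print(scaled.std(axis=0))   # approximately 1 for each column
```

Putting features such as age and salary on the same scale keeps the larger-valued column from dominating the regularized fit.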

# Logistic Regression
Now it's time to do a train test split, and train our model!



**Split the data into a training set and a testing set using train_test_split**

In [None]:
data.head(1)

In [None]:
x = scale
y = data['Purchased']

In [None]:
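from sklearn.model_selection import train_test_split

# Hold out 30% of the rows for testing; random_state makes the split reproducible
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size=0.3, random_state=50)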
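
The sketch below illustrates the two arguments used in the split, on hypothetical toy arrays rather than the ad data: `test_size=0.3` holds out 30% of the rows, and a fixed `random_state` makes the shuffle repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 10 samples, 2 features, binary target
X_toy = np.arange(20).reshape(10, 2)
y_toy = np.array([0, 1] * 5)

# test_size=0.3 holds out 30% of the rows; random_state fixes the shuffle
X_tr, X_te, y_tr, y_te = train_test_split(X_toy, y_toy, test_size=0.3, random_state=50)

# Repeating the call with the same random_state gives the same split
X_tr2, X_te2, _, _ = train_test_split(X_toy, y_toy, test_size=0.3, random_state=50)

print(X_tr.shape, X_te.shape)
```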

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
logistic_model  = LogisticRegression()

In [None]:
logistic_model.fit(X_train,y_train)
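
To show what the fitted model computes, here is a minimal sketch on made-up one-feature data (the arrays and the name `model` are illustrative, not part of this notebook). It rebuilds the predicted probability by passing the linear score through the sigmoid function, assuming the default binary `LogisticRegression` settings.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, binary target
X_toy = np.array([[-2.0], [-1.0], [-0.5], [0.5], [1.0], [2.0]])
y_toy = np.array([0, 0, 0, 1, 1, 1])

model = LogisticRegression()
model.fit(X_toy, y_toy)

# Linear score w*x + b for each sample
scores = X_toy @ model.coef_.T + model.intercept_

# The sigmoid maps the score to the probability of class 1
manual_proba = 1.0 / (1.0 + np.exp(-scores.ravel()))

# A sample is labelled 1 when that probability exceeds 0.5
manual_pred = (manual_proba > 0.5).astype(int)
print(manual_proba.round(3))
```

The hand-computed probabilities should match `model.predict_proba(X_toy)[:, 1]`, and the thresholded labels should match `model.predict(X_toy)`.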

# Predictions and Evaluations
**Now predict values for the testing data.**

In [None]:
pred = logistic_model.predict(X_test)

In [None]:
from sklearn.metrics import classification_report , confusion_matrix

In [None]:
print( confusion_matrix (y_test , pred))

In [None]:
print( classification_report(y_test , pred))
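
The figures in the classification report can be derived from the confusion matrix. The sketch below does this for hypothetical labels and predictions (these are not results from this notebook), using scikit-learn's layout in which rows are true classes and columns are predicted classes.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix,
                             precision_score, recall_score)

# Hypothetical labels and predictions, not results from this notebook
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 0])
y_hat = np.array([0, 0, 1, 0, 1, 1, 0, 1, 1, 0])

# For binary labels the matrix is laid out as:
# [[TN, FP],
#  [FN, TP]]
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted 1s that are truly 1
recall = tp / (tp + fn)                     # share of true 1s that were predicted as 1

print(accuracy, precision, recall)
```

Precision and recall here refer to class 1; the report also lists the same quantities for class 0.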

In [None]:
# The model reaches 88% accuracy on the test set

# Great Job!