# Logistic Regression Project #

In this project we will work with a fake advertising data set that indicates whether or not a particular internet user clicked on an advertisement on a company website. We will try to build a model that predicts whether a user will click on an ad based on that user's features.

This data set contains the following features:

* 'Daily Time Spent on Site': consumer time on site in minutes
* 'Age': customer age in years
* 'Area Income': Avg. Income of geographical area of consumer
* 'Daily Internet Usage': Avg. minutes a day consumer is on the internet
* 'Ad Topic Line': Headline of the advertisement
* 'City': City of consumer
* 'Male': Whether or not consumer was male
* 'Country': Country of consumer
* 'Timestamp': Time at which consumer clicked on Ad or closed window
* 'Clicked on Ad': 0 or 1, indicating whether the consumer clicked on the ad

## Import Libraries ##

In [1]:
%matplotlib notebook

In [2]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

## Get the data ##

In [3]:
ad_data = pd.read_csv("advertising.csv")

In [4]:
ad_data.head()

| | Daily Time Spent on Site | Age | Area Income | Daily Internet Usage | Ad Topic Line | City | Male | Country | Timestamp | Clicked on Ad |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 68.95 | 35 | 61833.9 | 256.09 | Cloned 5thgeneration orchestration | Wrightburgh | 0 | Tunisia | 2016-03-27 00:53:11 | 0 |
| 1 | 80.23 | 31 | 68441.85 | 193.77 | Monitored national standardization | West Jodi | 1 | Nauru | 2016-04-04 01:39:02 | 0 |
| 2 | 69.47 | 26 | 59785.94 | 236.5 | Organic bottom-line service-desk | Davidton | 0 | San Marino | 2016-03-13 20:35:42 | 0 |
| 3 | 74.15 | 29 | 54806.18 | 245.89 | Triple-buffered reciprocal time-frame | West Terrifurt | 1 | Italy | 2016-01-10 02:31:19 | 0 |
| 4 | 68.37 | 35 | 73889.99 | 225.58 | Robust logistical utilization | South Manuel | 0 | Iceland | 2016-06-03 03:36:18 | 0 |


In [4]:
ad_data.describe()

| | Daily Time Spent on Site | Age | Area Income | Daily Internet Usage | Male | Clicked on Ad |
|---|---|---|---|---|---|---|
| count | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 | 1000.0 |
| mean | 65.0002 | 36.009 | 55000.00008 | 180.0001 | 0.481 | 0.5 |
| std | 15.853615 | 8.785562 | 13414.634022 | 43.902339 | 0.499889 | 0.50025 |
| min | 32.6 | 19.0 | 13996.5 | 104.78 | 0.0 | 0.0 |
| 25% | 51.36 | 29.0 | 47031.8025 | 138.83 | 0.0 | 0.0 |
| 50% | 68.215 | 35.0 | 57012.3 | 183.13 | 0.0 | 0.5 |
| 75% | 78.5475 | 42.0 | 65470.635 | 218.7925 | 1.0 | 1.0 |
| max | 91.43 | 61.0 | 79484.8 | 269.96 | 1.0 | 1.0 |


# Exploratory Data Analysis #

Using seaborn to explore the data!

**Creating a histogram of the Age**

In [5]:
sns.set_style('darkgrid')
sns.histplot(x='Age', data=ad_data, color='blue', bins=40)

<AxesSubplot:xlabel='Age', ylabel='Count'>

**Creating a jointplot showing Area Income versus Age.**

In [6]:
sns.jointplot(x='Age', y='Area Income', color='blue', data=ad_data)

<seaborn.axisgrid.JointGrid at 0x138cdc77340>

**Creating a jointplot showing the KDE distributions of 'Daily Time Spent on Site' vs. 'Age'.**

In [8]:
sns.jointplot(x='Age', y='Daily Time Spent on Site', kind='kde', color='red', data=ad_data, fill=True)

<seaborn.axisgrid.JointGrid at 0x138cf80fc10>

**Creating a jointplot of 'Daily Time Spent on Site' vs. 'Daily Internet Usage'**

In [9]:
sns.jointplot(x='Daily Time Spent on Site', y='Daily Internet Usage', color='green', data=ad_data)

<seaborn.axisgrid.JointGrid at 0x138cfd3d3d0>

# Logistic Regression #

Now it's time to do a train/test split and train our model!

**Splitting the data into training set and testing set using train_test_split**

In [15]:
from sklearn.model_selection import train_test_split

In [16]:
X = ad_data[['Daily Time Spent on Site', 'Age', 'Area Income', 'Daily Internet Usage', 'Male']]
y = ad_data['Clicked on Ad']

In [17]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
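
To make the effect of `test_size` and `random_state` concrete, here is a minimal sketch on synthetic data. The arrays `X_demo` and `y_demo` are hypothetical stand-ins with the same number of rows as the advertising data; they are not the real features. The `stratify` argument shown at the end is an optional extra that the notebook itself does not use.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 1000 rows, 5 numeric features, balanced 0/1 labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 5))
y_demo = np.repeat([0, 1], 500)

# test_size=0.3 holds out 30% of the rows; random_state fixes the shuffle.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)
print(X_tr.shape, X_te.shape)  # (700, 5) (300, 5)

# Repeating the call with the same random_state reproduces the same split.
_, X_te_again, _, _ = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101
)
same_split = np.array_equal(X_te, X_te_again)
print(same_split)  # True

# Optional: stratify=y keeps the class proportions equal in both subsets.
_, _, _, y_te_strat = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=101, stratify=y_demo
)
print(y_te_strat.mean())  # 0.5
```

With 1000 rows, a 30% hold-out gives the 300 test samples that appear as the total support in the classification report.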

**Fitting a logistic regression model on the training set.**

In [18]:
from sklearn.linear_model import LogisticRegression

In [19]:
logmodel = LogisticRegression()

In [20]:
logmodel.fit(X_train, y_train)

LogisticRegression()
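
A fitted binary logistic regression scores each row with a linear combination of its features and passes that score through the sigmoid function to get a probability. The sketch below checks this on synthetic data; `X_demo`, `y_demo`, and the `sigmoid` helper are hypothetical and only illustrate the idea, while `coef_`, `intercept_`, and `predict_proba` are the scikit-learn attributes and method used to compare against.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical two-feature data with a noisy linear decision boundary.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
noise = rng.normal(scale=0.5, size=200)
y_demo = (X_demo[:, 0] - X_demo[:, 1] + noise > 0).astype(int)

model = LogisticRegression().fit(X_demo, y_demo)

def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

# Linear score: w . x + b, using the learned weights and intercept.
scores = X_demo @ model.coef_.ravel() + model.intercept_[0]
manual_proba = sigmoid(scores)

# Column 1 of predict_proba is the estimated probability of class 1.
sklearn_proba = model.predict_proba(X_demo)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))  # True
```

Applied to the advertising model, the same relationship would hold for `logmodel` with its five features, assuming the default binary setup used above.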

## Predictions and Evaluations ##

**Now predicting values for the testing data.**

In [21]:
prediction = logmodel.predict(X_test)
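
For a binary model, `predict` effectively labels a row as class 1 when its estimated probability exceeds 0.5. The following sketch, on hypothetical synthetic data, compares `predict` with a manual 0.5 cutoff on `predict_proba` and then applies a stricter cutoff of 0.8. The 0.8 value is an arbitrary illustration, not something the notebook uses.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical data: the label depends on the sum of two features plus noise.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(300, 2))
noise = rng.normal(scale=0.7, size=300)
y_demo = (X_demo.sum(axis=1) + noise > 0).astype(int)

model = LogisticRegression().fit(X_demo, y_demo)
proba_click = model.predict_proba(X_demo)[:, 1]  # P(class 1) per row

# Default behaviour versus a manual 0.5 threshold on the probability.
default_pred = model.predict(X_demo)
manual_pred = (proba_click > 0.5).astype(int)
agreement = (default_pred == manual_pred).mean()

# A stricter threshold can only reduce the number of predicted positives.
strict_pred = (proba_click > 0.8).astype(int)
print(agreement, default_pred.sum(), strict_pred.sum())
```

Raising the threshold typically trades recall for precision on class 1, which can matter if false positives and false negatives have different costs.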

**Creating a classification report for the model.**

In [22]:
from sklearn.metrics import classification_report

In [23]:
print(classification_report(y_test, prediction))

              precision    recall  f1-score   support

           0       0.91      0.95      0.93       157
           1       0.94      0.90      0.92       143

    accuracy                           0.93       300
   macro avg       0.93      0.93      0.93       300
weighted avg       0.93      0.93      0.93       300
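
To show how the numbers in the report relate to each other, the sketch below recomputes precision, recall, F1, and accuracy from confusion-matrix counts. The four counts are an assumption: they are not printed anywhere in the notebook, but were reconstructed as integers consistent with the reported supports (157 and 143) and the rounded scores. The helper `precision_recall_f1` is hypothetical.

```python
# Counts inferred from the rounded report (an assumption, not notebook output).
tn, fp = 149, 8    # true class 0: predicted 0 / predicted 1 (support 157)
fn, tp = 14, 129   # true class 1: predicted 0 / predicted 1 (support 143)

def precision_recall_f1(hits, false_alarms, misses):
    """Return (precision, recall, f1) for one class from its raw counts."""
    precision = hits / (hits + false_alarms)
    recall = hits / (hits + misses)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# For class 1 the hits are true positives; for class 0 the roles are mirrored.
p1, r1, f1_1 = precision_recall_f1(tp, fp, fn)
p0, r0, f1_0 = precision_recall_f1(tn, fn, fp)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"class 0: precision={p0:.2f} recall={r0:.2f} f1={f1_0:.2f}")
# class 0: precision=0.91 recall=0.95 f1=0.93
print(f"class 1: precision={p1:.2f} recall={r1:.2f} f1={f1_1:.2f}")
# class 1: precision=0.94 recall=0.90 f1=0.92
print(f"accuracy={accuracy:.2f}")
# accuracy=0.93
```

Under these assumed counts, the model misses 14 of the 143 users who clicked and wrongly flags 8 of the 157 who did not.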

