# Predictive Model: Logistic Regression for Binary Classification

# Description:
### Business resources are often scarce, so meaningful information is essential for making the right decisions and crafting an effective strategy. A deeper understanding of the data supports informed decision-making. In the "advertising" dataset, the business wants to understand customer behavior: whether a customer will click on an online advertisement during their internet usage. A logistic regression model is used to answer this binary classification question.

# Project Objective:
### To develop a machine learning model, using logistic regression, that predicts whether a customer clicks on the ad (the "Clicked on Ad" column).

# Process:
### This project starts with a basic descriptive analysis, followed by exploratory data analysis, visualization of the features, data cleansing, and scaling to prepare the data for a machine learning model from the sklearn library. The model is then evaluated with a classification report and a confusion matrix, two standard evaluation tools for binary classification models.

# Positive Impact:
### The business will gain actionable insights by predicting customer behavior. If the model succeeds, the business can craft a more effective strategy to boost its advertisement exposure on the website. A deeper understanding of the data helps businesses make the best available decisions and deploy effective business strategies.

### Importing libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [None]:
ad_data = pd.read_csv("../input/advertisingcsv/advertising.csv")

In [None]:
ad_data.head()

### Note that the dataset contains a mixture of data types, such as object and float.

In [None]:
ad_data.info()
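
### Since the process calls for data cleansing, a quick count of missing values and a list of non-numeric columns can complement info(). The sketch below runs on a small made-up frame (the name sample and all of its values are hypothetical, not taken from advertising.csv); the same calls should apply to ad_data.

```python
import pandas as pd

# Hypothetical miniature frame; the values are invented for illustration only.
sample = pd.DataFrame({
    "Age": [35, 31, 26, 29],
    "Area Income": [61000.0, 68000.0, None, 54000.0],
    "City": ["City A", "City B", "City C", None],
    "Clicked on Ad": [0, 0, 1, 1],
})

# Number of missing entries in each column.
missing_per_column = sample.isnull().sum()

# Columns that are not numeric and would need encoding or exclusion
# before being passed to a scikit-learn model.
non_numeric_columns = sample.select_dtypes(exclude="number").columns.tolist()

print(missing_per_column)
print(non_numeric_columns)
```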

### Running a basic descriptive analysis on the dataset as a first glimpse before diving in.


In [None]:
ad_data.describe()

# Exploratory Data Analysis
### A quick way to visualize the feature distributions and the relationships between features

In [None]:
# histplot replaces the deprecated distplot; it draws no KDE curve by default
sns.histplot(ad_data["Age"], bins=30, color="red")

In [None]:
sns.jointplot(x="Age", y="Area Income", data=ad_data)

In [None]:
sns.jointplot(x="Age", y="Daily Time Spent on Site", data=ad_data, color="red", kind="kde", fill=True, palette="seismic")

In [None]:
sns.jointplot(x="Daily Time Spent on Site", y="Daily Internet Usage", data=ad_data, color="green")

In [None]:
sns.pairplot(ad_data, hue="Clicked on Ad")
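
### The relationships seen in the plots can also be put into numbers with pairwise Pearson correlations on the numeric columns. This is a minimal sketch on synthetic data: the frame demo, its column names, and the built-in linear relationship are assumptions for illustration. On the real data, the same select_dtypes(...).corr() chain could be applied to ad_data.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 200

# Synthetic frame: the second column is built to depend on the first.
demo = pd.DataFrame({"time_on_site": rng.normal(65, 15, n)})
demo["internet_usage"] = 2.0 * demo["time_on_site"] + rng.normal(0, 5, n)
demo["label"] = ["a"] * n  # non-numeric column, excluded from the correlation

# Keep only numeric columns, then compute the Pearson correlation matrix.
corr = demo.select_dtypes(include="number").corr()
print(corr)
```

### The resulting matrix can be passed to sns.heatmap(corr, annot=True) for a visual summary.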

### Preparing the dataset for the train/test split.

In [None]:
from sklearn.model_selection import train_test_split

### Creating variables for X and y. The target feature is "Clicked on Ad".

In [None]:
y = ad_data["Clicked on Ad"]
X = ad_data[["Age", "Daily Time Spent on Site", "Male", "Daily Internet Usage", "Area Income"]]

### The test size is set to 33% and the random state to 42. Both values are arbitrary; fixing the random state simply makes the split reproducible.

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)
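
### If the two classes were imbalanced, a plain random split could leave the test set with a different class ratio than the full data. One option, sketched here on made-up arrays (X_demo and y_demo are hypothetical, with an assumed 80/20 class ratio), is the stratify argument of train_test_split, which keeps the ratio roughly equal in both splits.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Made-up data: 100 rows, 2 features, 20% positive class.
X_demo = np.arange(200).reshape(100, 2)
y_demo = np.array([0] * 80 + [1] * 20)

# stratify=y_demo preserves the class proportions in the train and test splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.33, random_state=42, stratify=y_demo
)

print("positive share in train:", y_tr.mean())
print("positive share in test:", y_te.mean())
```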

### Importing the logistic regression model from the sklearn library

In [None]:
from sklearn.linear_model import LogisticRegression

In [None]:
logress = LogisticRegression()
logress.fit(X_train, y_train)
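
### The process description mentions scaling, which the fit above does not apply. One possible way to add it is to chain StandardScaler and LogisticRegression in a pipeline, so that the scaler is fitted on the training split only. The sketch below uses synthetic data from make_classification as a stand-in for the advertising features; the names X_syn, y_syn, and scaled_model are assumptions for illustration.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the five advertising features (assumption).
X_syn, y_syn = make_classification(n_samples=300, n_features=5, random_state=0)
# Inflate one column so its scale dwarfs the others, as income does next to age.
X_syn[:, 0] *= 10_000

Xs_train, Xs_test, ys_train, ys_test = train_test_split(
    X_syn, y_syn, test_size=0.33, random_state=42
)

# The pipeline standardizes each feature, then fits the logistic regression.
scaled_model = make_pipeline(StandardScaler(), LogisticRegression())
scaled_model.fit(Xs_train, ys_train)

scaled_accuracy = scaled_model.score(Xs_test, ys_test)
print(f"Test accuracy with scaling: {scaled_accuracy:.2f}")
```

### With the real data, the equivalent would be make_pipeline(StandardScaler(), LogisticRegression()).fit(X_train, y_train).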

In [None]:
prediction = logress.predict(X_test)
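
### For a binary problem, predict returns the class whose estimated probability is higher. That probability is the sigmoid of a linear score, so a positive score corresponds to a probability above 0.5. The sketch below checks this on synthetic data (X_toy, y_toy, and clf are hypothetical names, not part of the notebook).

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem standing in for the advertising data (assumption).
X_toy, y_toy = make_classification(n_samples=200, n_features=5, random_state=0)
clf = LogisticRegression().fit(X_toy, y_toy)

# Linear score w.x + b for each row.
scores = clf.decision_function(X_toy)

# Sigmoid of the score gives the estimated probability of class 1.
manual_proba = 1.0 / (1.0 + np.exp(-scores))
proba = clf.predict_proba(X_toy)[:, 1]

# A positive score (probability above 0.5) is labeled as class 1.
manual_labels = (scores > 0).astype(int)

print(np.allclose(manual_proba, proba))
print(np.array_equal(manual_labels, clf.predict(X_toy)))
```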

### It is time to evaluate the model using the classification report and confusion matrix

In [None]:
from sklearn.metrics import classification_report, confusion_matrix

In [None]:
print(classification_report(y_test, prediction))
print(confusion_matrix(y_test, prediction))
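
### The numbers in the classification report can be derived from the confusion matrix. In sklearn, rows are the true classes and columns are the predicted classes, so for a binary problem the flattened matrix reads TN, FP, FN, TP. The sketch below uses hand-made labels (y_true_demo and y_pred_demo are hypothetical, not model output) to show how the precision and recall of the positive class, and the accuracy, follow from those four counts.

```python
from sklearn.metrics import confusion_matrix

# Hand-made labels for illustration only.
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# Rows are true classes, columns are predicted classes.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)  # share of predicted clicks that were real clicks
recall = tp / (tp + fn)     # share of real clicks that the model found
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"TN={tn} FP={fp} FN={fn} TP={tp}")
print(f"precision={precision:.2f} recall={recall:.2f} accuracy={accuracy:.2f}")
```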