___

<a href='http://www.pieriandata.com'> <img src='../Pierian_Data_Logo.png' /></a>
___
# Logistic Regression Project 

In this project we will work with a fake advertising data set that indicates whether or not a particular internet user clicked on an advertisement. We will try to build a model that predicts whether a user will click on an ad based on that user's features.

This data set contains the following features:

* 'Daily Time Spent on Site': consumer time on site in minutes
* 'Age': customer age in years
* 'Area Income': Avg. Income of geographical area of consumer
* 'Daily Internet Usage': Avg. minutes a day consumer is on the internet
* 'Ad Topic Line': Headline of the advertisement
* 'City': City of consumer
* 'Male': Whether or not consumer was male
* 'Country': Country of consumer
* 'Timestamp': Time at which consumer clicked on Ad or closed window
* 'Clicked on Ad': 0 or 1, indicating whether the consumer clicked on the ad

## Import Libraries

**Import a few libraries you think you'll need (Or just import them as you go along!)**

In [None]:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

## Get the Data
**Read in the advertising.csv file and set it to a data frame called df.**

In [None]:
df = pd.read_csv('../input/advertise/advertising.csv')

**Check the head of df**

In [None]:
df.head()

**Use info() and describe() on df**

In [None]:
df.info()

In [None]:
df.describe()

## Exploratory Data Analysis

Let's use seaborn to explore the data!

Try recreating the plots shown below!

**Create a histogram of Age**

In [None]:
sns.set_style('whitegrid')

In [None]:
sns.histplot(data=df,x='Age')

**Create a jointplot showing Area Income versus Age.**

In [None]:
df.columns

In [None]:
sns.jointplot(y='Age', x='Area Income',data=df)

**Create a jointplot showing the kde distributions of 'Daily Time Spent on Site' vs. 'Age'.**

In [None]:
# jointplot builds its own figure, so a preceding plt.figure() call only adds an empty one; pass height= to resize
sns.jointplot(y='Age', x='Daily Time Spent on Site',kind='kde', data=df)

**Create a jointplot of 'Daily Time Spent on Site' vs. 'Daily Internet Usage'**

In [None]:
sns.jointplot(y='Daily Internet Usage', x='Daily Time Spent on Site',data=df)

**Finally, create a pairplot with the hue defined by the 'Clicked on Ad' column.**

In [None]:
sns.pairplot(data=df, hue='Clicked on Ad')

In [None]:
df.head()

# Logistic Regression

Now it's time to do a train test split, and train our model!

You'll have the freedom here to choose columns that you want to train on!

In [None]:
from sklearn.model_selection import train_test_split

In [None]:
df.head()

In [None]:
df.columns

In [None]:
# Drop the text and timestamp columns so that only numeric columns remain
df.drop(['Ad Topic Line', 'City', 'Country','Timestamp'], axis=1, inplace=True)

In [None]:
df

In [None]:
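# Features: every remaining column except the target
X = df.drop('Clicked on Ad', axis=1)
# Target: 1 if the user clicked on the ad, 0 otherwise
y = df['Clicked on Ad']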

In [None]:
X.head()

In [None]:
y.head()

**Split the data into a training set and a testing set using train_test_split**

In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

In [None]:
X_train.info()

In [None]:
X_test.info()
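
As a rough illustration of what `test_size=0.3` and `random_state=42` do, here is a minimal sketch on toy data. The arrays `X_toy` and `y_toy` are hypothetical stand-ins, not the advertising data.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy data (hypothetical): 10 rows, one feature, alternating labels
X_toy = np.arange(10).reshape(-1, 1)
y_toy = np.array([0, 1] * 5)

# test_size=0.3 holds out 30% of the rows for testing
X_tr, X_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42
)

# Repeating the call with the same random_state reproduces the same split
X_tr2, X_te2, y_tr2, y_te2 = train_test_split(
    X_toy, y_toy, test_size=0.3, random_state=42
)

print(len(X_tr), len(X_te))          # 7 training rows, 3 test rows
print(np.array_equal(X_te, X_te2))   # the two test sets are identical
```

Fixing `random_state` makes the split repeatable, so the evaluation numbers further down can be reproduced between runs.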

In [None]:
from sklearn.linear_model import LogisticRegression
logreg = LogisticRegression()

**Train and fit a logistic regression model on the training set.**

In [None]:
logreg.fit(X_train,y_train)

In [None]:
logreg
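
Features such as 'Age' and 'Area Income' are likely on very different scales, and on such data the default solver may be slow to converge or emit a `ConvergenceWarning`. One optional variation, not part of the original exercise, is to standardize the features before fitting. The sketch below uses synthetic stand-in data, and the names `X_demo`, `y_demo` and `scaled_logreg` are hypothetical.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (hypothetical, not the advertising data set):
# one feature in the tens, one in the tens of thousands
rng = np.random.default_rng(0)
n_rows = 200
time_on_site = rng.normal(65, 15, n_rows)
area_income = rng.normal(55000, 13000, n_rows)
X_demo = np.column_stack([time_on_site, area_income])
y_demo = (time_on_site < 65).astype(int)  # toy rule: shorter visits click

# Standardize each feature, then fit logistic regression on the scaled values
scaled_logreg = make_pipeline(StandardScaler(), LogisticRegression())
scaled_logreg.fit(X_demo, y_demo)

# Accuracy on the same data the model was fitted on (for illustration only)
train_accuracy = scaled_logreg.score(X_demo, y_demo)
print(train_accuracy)
```

In the notebook, the same pipeline could be fitted with `X_train` and `y_train` in place of the toy arrays. Raising `max_iter` on `LogisticRegression` is another common way to deal with convergence warnings.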

## Predictions and Evaluations
**Now predict values for the testing data.**

In [None]:
predictions = logreg.predict(X_test)

In [None]:
predictions
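
For context on what `predict` returns: a binary logistic regression model passes a linear score through the logistic (sigmoid) function to get a probability, and predicts class 1 when that probability exceeds 0.5. The sketch below shows this with made-up coefficients. The weights, intercept and feature values are hypothetical and are not the values learned by `logreg`.

```python
import math

def sigmoid(score):
    """Map a real-valued linear score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-score))

def predict_label(features, weights, intercept, threshold=0.5):
    """Return (predicted class, probability of class 1) for one observation."""
    score = intercept + sum(w * x for w, x in zip(weights, features))
    probability = sigmoid(score)
    return int(probability > threshold), probability

# Hypothetical coefficients for two features
demo_weights = [-0.1, 0.05]
demo_intercept = 2.0

label, prob = predict_label([60.0, 30.0], demo_weights, demo_intercept)
print(label, round(prob, 4))
```

With a fitted scikit-learn model, `logreg.predict_proba(X_test)` returns the per-class probabilities behind each prediction.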

In [None]:
type(y_test)

In [None]:
sns.histplot(predictions-y_test)

In [None]:
sns.scatterplot(x=y_test, y=predictions)

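*The scatter plot above is not informative: both the true labels and the predictions are binary, so every observation lands on one of only four points, and the overlapping markers hide how many observations each point holds. The histogram of `predictions - y_test` is the better choice here, because it shows how many predictions were correct (0), how many were false positives (1), and how many were false negatives (-1).*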
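
The counts hidden behind those four overlapping points can be tabulated directly. This is a minimal sketch using `pd.crosstab` on small made-up label vectors. `y_true_demo` and `y_pred_demo` are hypothetical and are not the model's output.

```python
import pandas as pd

# Hypothetical true labels and predictions for eight observations
y_true_demo = pd.Series([0, 0, 1, 1, 1, 0, 1, 0], name='actual')
y_pred_demo = pd.Series([0, 1, 1, 1, 0, 0, 1, 0], name='predicted')

# Count how many observations fall on each (actual, predicted) pair
pair_counts = pd.crosstab(y_true_demo, y_pred_demo)
print(pair_counts)
```

In the notebook, something like `pd.crosstab(y_test.to_numpy(), predictions)` should give the corresponding table for the real predictions.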


**Create a classification report for the model.**

In [None]:
from sklearn.metrics import classification_report

In [None]:
print(classification_report(y_test, predictions))

In [None]:
from sklearn.metrics import confusion_matrix

In [None]:
confusion_matrix(y_test,predictions)
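
To connect the two outputs above: the precision, recall and F1 figures in the classification report can be derived from the confusion matrix. The sketch below works through the arithmetic for class 1 with made-up counts. `demo_matrix` is hypothetical and is not this model's result.

```python
# Hypothetical counts, laid out the way scikit-learn's confusion_matrix
# does for labels [0, 1]: rows are true classes, columns are predicted classes
demo_matrix = [[50, 10],
               [5, 35]]
(tn, fp), (fn, tp) = demo_matrix

precision = tp / (tp + fp)   # of the predicted clicks, how many were real
recall = tp / (tp + fn)      # of the real clicks, how many were found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.3f} recall={recall:.3f} "
      f"f1={f1:.3f} accuracy={accuracy:.3f}")
```

The report repeats the same calculation with class 0 treated as the positive class, which is why it lists one row per class.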

## Great Job!