In any advertising agency, it is important to predict which users are most likely to respond to targeted advertisements, since those users are the most profitable to reach.

## Click-Through Rate Prediction Model with Machine Learning

By predicting the click-through rate, an advertising company can select the visitors who are most likely to respond to its ads by analyzing their browsing history and showing the ads most relevant to each user’s interests.

This task matters to every advertising agency because the commercial value of online promotions depends on how users respond to them. Knowing how users respond is valuable because it allows the company to select the ads that are most relevant to them.

Now let’s get started with building a click-through rate prediction model with Machine Learning by importing the dataset:

In [1]:
import pandas as pd
data = pd.read_csv(r'C:\Users\SHREE\Downloads\Python CODES\Click-Through Rate Prediction with Machine Learning\advertising.csv')
print(data.head())

   Daily Time Spent on Site  Age  Area Income  Daily Internet Usage  \
0                     68.95   35     61833.90                256.09   
1                     80.23   31     68441.85                193.77   
2                     69.47   26     59785.94                236.50   
3                     74.15   29     54806.18                245.89   
4                     68.37   35     73889.99                225.58   

                           Ad Topic Line            City  Male     Country  \
0     Cloned 5thgeneration orchestration     Wrightburgh     0     Tunisia   
1     Monitored national standardization       West Jodi     1       Nauru   
2       Organic bottom-line service-desk        Davidton     0  San Marino   
3  Triple-buffered reciprocal time-frame  West Terrifurt     1       Italy   
4          Robust logistical utilization    South Manuel     0     Iceland   

             Timestamp  Clicked on Ad  
0  2016-03-27 00:53:11              0  
1  2016-04-04 01:39:02  

Now let’s have a look at the data to see if we have any null values in the dataset:

In [2]:
print(data.isnull().sum())

Daily Time Spent on Site    0
Age                         0
Area Income                 0
Daily Internet Usage        0
Ad Topic Line               0
City                        0
Male                        0
Country                     0
Timestamp                   0
Clicked on Ad               0
dtype: int64


Before moving forward, let’s have a look at the names of all the columns in the dataset:

In [3]:
print(data.columns)

Index(['Daily Time Spent on Site', 'Age', 'Area Income',
       'Daily Internet Usage', 'Ad Topic Line', 'City', 'Male', 'Country',
       'Timestamp', 'Clicked on Ad'],
      dtype='object')


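Now let’s prepare the data so that it can easily be fitted to the machine learning model. Here I will keep the first seven columns and then drop the two text columns, Ad Topic Line and City. The slice also leaves out Country and Timestamp, so five numeric features remain: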

In [4]:
x=data.iloc[:,0:7]
x=x.drop(['Ad Topic Line','City'],axis=1)
x

     Daily Time Spent on Site  Age  Area Income  Daily Internet Usage  Male
0                       68.95   35     61833.90                256.09     0
1                       80.23   31     68441.85                193.77     1
2                       69.47   26     59785.94                236.50     0
3                       74.15   29     54806.18                245.89     1
4                       68.37   35     73889.99                225.58     0
..                        ...  ...          ...                   ...   ...
995                     72.97   30     71384.57                208.58     1
996                     51.30   45     67782.17                134.42     1
997                     51.63   51     42415.72                120.37     1
998                     55.55   19     41920.79                187.95     0
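
The positional slice in the cell above depends on the column order of the CSV file. A minimal sketch of the same selection done by column name, using a small stand-in frame built from the first rows printed earlier (the names `sample`, `feature_cols`, `x_by_name` and `x_by_position` are illustrative and not part of the original notebook):

```python
import pandas as pd

# Stand-in for the first rows of the dataset (values copied from the head() output).
sample = pd.DataFrame({
    "Daily Time Spent on Site": [68.95, 80.23, 69.47],
    "Age": [35, 31, 26],
    "Area Income": [61833.90, 68441.85, 59785.94],
    "Daily Internet Usage": [256.09, 193.77, 236.50],
    "Ad Topic Line": ["Cloned 5thgeneration orchestration",
                      "Monitored national standardization",
                      "Organic bottom-line service-desk"],
    "City": ["Wrightburgh", "West Jodi", "Davidton"],
    "Male": [0, 1, 0],
})

# Name the numeric features explicitly instead of relying on column positions.
feature_cols = ["Daily Time Spent on Site", "Age", "Area Income",
                "Daily Internet Usage", "Male"]
x_by_name = sample[feature_cols]

# The positional version used in the notebook cell, applied to the same frame.
x_by_position = sample.iloc[:, 0:7].drop(["Ad Topic Line", "City"], axis=1)

# Check whether both approaches select the same columns and values.
print(x_by_name.equals(x_by_position))
```

Selecting by name keeps working if columns are later reordered or added, which is the main reason one might prefer it.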


In [5]:
y=data.iloc[:,9]
y

0      0
1      0
2      0
3      0
4      0
      ..
995    1
996    1
997    1
998    0
999    1
Name: Clicked on Ad, Length: 1000, dtype: int64
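
Because the target is coded as 0 or 1, its mean is the share of rows that ended in a click, which is one way to read an observed click-through rate. A small sketch with made-up labels (the `clicks` series below is illustrative and not taken from the dataset):

```python
import pandas as pd

# Toy click labels (1 = clicked, 0 = did not click); illustrative values only.
clicks = pd.Series([0, 0, 1, 0, 1, 1, 0, 1], name="Clicked on Ad")

# For a 0/1 target, the mean is the fraction of rows with a click.
observed_ctr = clicks.mean()

# value_counts shows how balanced the two classes are.
class_counts = clicks.value_counts().sort_index()

print(f"Observed CTR: {observed_ctr:.2f}")
print(class_counts)
```

Running the same two calls on `y` would show how balanced the real target is, which helps when judging an accuracy figure later.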

Now I will split the data into training and test sets. Here I will use 70 per cent of the data for training and 30 per cent for testing:

In [6]:
from sklearn.model_selection import train_test_split
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.3,random_state=4)
print(x_train.shape)
print(y_train.shape)
print(x_test.shape)
print(y_test.shape)

(700, 5)
(700,)
(300, 5)
(300,)
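
The split above is purely random. As a hedged variation, `train_test_split` also accepts a `stratify` argument that keeps the class proportions the same in both parts. The sketch below uses synthetic data (`X_demo` and `y_demo` are made-up stand-ins, not the advertising dataset):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in: 1000 rows, 5 features, perfectly balanced 0/1 labels.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(1000, 5))
y_demo = np.array([0] * 500 + [1] * 500)

# stratify=y_demo keeps the share of each class equal in train and test.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=4, stratify=y_demo
)

print(X_tr.shape, X_te.shape)
print(y_tr.mean(), y_te.mean())
```

Whether stratification is needed here depends on how balanced the real target is. With a roughly even split of clicks and non-clicks, a random split is usually close enough.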


## Logistic Regression Model

Now I will use a logistic regression model to predict whether each user in the test set will click on the ad:

In [7]:
from sklearn.linear_model import LogisticRegression
Lr=LogisticRegression(C=0.01,random_state=0)
Lr.fit(x_train,y_train)
y_pred=Lr.predict(x_test)
print(y_pred)

[0 0 0 1 1 0 0 1 0 1 1 0 0 0 1 1 0 1 0 1 1 0 1 1 1 0 0 1 0 0 0 1 1 1 0 1 1
 0 1 0 0 1 0 1 0 1 0 0 0 0 1 1 0 1 0 1 0 0 0 1 1 1 1 0 1 0 0 0 0 1 1 0 0 0
 0 1 1 1 0 1 0 1 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0 0 0 0 0 1 0 0 0 0 0 0 1 1 0
 1 0 0 1 1 1 0 0 1 0 1 1 0 1 1 0 0 1 1 0 0 0 0 0 1 0 0 1 0 1 1 0 0 0 1 0 0
 1 0 0 1 0 0 1 0 0 1 0 0 0 0 1 1 1 1 1 0 1 1 0 0 1 0 1 0 0 0 1 1 0 0 0 1 0
 0 1 0 1 0 1 1 1 1 0 1 1 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 1 1 1 1 1 1 1 0
 0 1 0 0 0 1 1 0 0 0 0 1 0 1 0 0 1 1 0 0 0 1 1 0 1 1 0 1 1 0 1 1 1 1 1 1 1
 1 0 1 0 1 0 0 1 0 0 0 1 1 1 0 1 0 1 0 1 0 0 0 0 0 0 1 0 0 1 0 1 1 1 0 0 1
 1 0 0 0]
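
The features in this dataset sit on very different scales (Area Income is in the tens of thousands, Age in the tens). `LogisticRegression` applies L2 regularization by default, a smaller `C` means a stronger penalty, and that penalty is sensitive to feature scale. One common option, shown here only as a sketch on synthetic data, is to standardize the features inside a pipeline. The data-generating rule and all variable names below are assumptions made for illustration:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in with two features on very different scales.
rng = np.random.default_rng(0)
n = 1000
time_on_site = rng.normal(65, 15, n)
income = rng.normal(55000, 13000, n)
X_demo = np.column_stack([time_on_site, income])

# Hypothetical rule: less time on site and lower income mean more clicks.
logit = -(time_on_site - 65) / 5 - (income - 55000) / 5000
y_demo = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(
    X_demo, y_demo, test_size=0.3, random_state=4
)

# StandardScaler is fitted on the training data only, then applied to the test data.
model = make_pipeline(StandardScaler(), LogisticRegression(C=0.01, random_state=0))
model.fit(X_tr, y_tr)

test_accuracy = model.score(X_te, y_te)
print(test_accuracy)
```

This is not the model used in the article. It only illustrates how scaling could be added without leaking test-set statistics into training.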


In [8]:
y_pred_proba=Lr.predict_proba(x_test)
print(y_pred_proba)

[[9.59792501e-01 4.02074986e-02]
 [9.09776229e-01 9.02237714e-02]
 [7.01565507e-01 2.98434493e-01]
 [3.22512399e-01 6.77487601e-01]
 [2.61683053e-02 9.73831695e-01]
 [7.93135584e-01 2.06864416e-01]
 [9.81079720e-01 1.89202796e-02]
 [1.51174357e-02 9.84882564e-01]
 [8.13527820e-01 1.86472180e-01]
 [4.66011549e-03 9.95339885e-01]
 [4.49152136e-01 5.50847864e-01]
 [5.79962255e-01 4.20037745e-01]
 [9.71411345e-01 2.85886551e-02]
 [6.80813498e-01 3.19186502e-01]
 [1.57934217e-03 9.98420658e-01]
 [4.27634403e-03 9.95723656e-01]
 [9.78083451e-01 2.19165489e-02]
 [3.14819973e-03 9.96851800e-01]
 [9.84776758e-01 1.52232419e-02]
 [7.54927542e-03 9.92450725e-01]
 [4.39604876e-02 9.56039512e-01]
 [8.83260862e-01 1.16739138e-01]
 [3.43304798e-01 6.56695202e-01]
 [4.64330398e-01 5.35669602e-01]
 [1.04935760e-01 8.95064240e-01]
 [9.40619354e-01 5.93806460e-02]
 [9.64504455e-01 3.54955446e-02]
 [3.88141195e-02 9.61185880e-01]
 [7.55872151e-01 2.44127849e-01]
 [9.85954424e-01 1.40455764e-02]
 [9.635106
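
Each row of the `predict_proba` output holds one probability per class, ordered as in the model's `classes_` attribute, so the second column is the estimated probability of a click. A minimal sketch on synthetic data (`X_demo`, `y_demo` and `clf` are illustrative names) showing how these probabilities relate to the labels returned by `predict`:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary problem: the label depends mostly on the first feature.
rng = np.random.default_rng(1)
X_demo = rng.normal(size=(200, 3))
y_demo = (X_demo[:, 0] + 0.5 * rng.normal(size=200) > 0).astype(int)

clf = LogisticRegression().fit(X_demo, y_demo)
proba = clf.predict_proba(X_demo)

# Columns follow clf.classes_, so column 1 is the probability of class 1.
click_proba = proba[:, 1]

# Thresholding the class-1 probability at 0.5 gives the predicted labels.
labels_from_proba = (click_proba > 0.5).astype(int)

print(clf.classes_)
print(np.allclose(proba.sum(axis=1), 1.0))
print(np.array_equal(labels_from_proba, clf.predict(X_demo)))
```

Keeping the probabilities rather than only the labels makes it possible to rank users by how likely they are to click, or to choose a threshold other than 0.5.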

Now let’s have a look at the accuracy of the model:

In [9]:
from sklearn.metrics import accuracy_score
print(accuracy_score(y_test,y_pred))

0.8733333333333333


So we have an accuracy of around 87 per cent, which is not bad for this kind of problem. Finally, let’s have a look at the F1 score:

In [10]:
from sklearn.metrics import f1_score
print(f1_score(y_test,y_pred))

0.8652482269503545
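
The F1 score is the harmonic mean of precision and recall. To make that concrete, here is a small worked sketch on hand-made labels (`y_true_demo` and `y_pred_demo` are illustrative and unrelated to the model output above) that derives the score from the confusion matrix and compares it with `f1_score`:

```python
from sklearn.metrics import confusion_matrix, f1_score

# Hand-made labels: 4 negatives and 6 positives, with 3 mistakes in the predictions.
y_true_demo = [0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
y_pred_demo = [0, 0, 0, 1, 1, 1, 1, 1, 0, 0]

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true_demo, y_pred_demo).ravel()

precision = tp / (tp + fp)   # share of predicted clicks that were real clicks
recall = tp / (tp + fn)      # share of real clicks that were predicted
f1_manual = 2 * precision * recall / (precision + recall)

print(tn, fp, fn, tp)
print(round(f1_manual, 4), round(f1_score(y_true_demo, y_pred_demo), 4))
```

Passing `y_test` and `y_pred` to `confusion_matrix` in the same way would show which kind of error the model makes more often.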
