# Building a Logistic Regression

Create a logistic regression based on the bank data provided. 

The data is based on the marketing campaign efforts of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).

Note that the first column of the dataset is the index.

Source: [Moro et al., 2014] S. Moro, P. Cortez and P. Rita. A Data-Driven Approach to Predict the Success of Bank Telemarketing. Decision Support Systems, Elsevier, 62:22-31, June 2014


In [None]:
import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

## Import the relevant libraries

In [None]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import statsmodels.api as sm

## Load the data

Load the ‘Example-bank-data.csv’ dataset.

In [None]:
raw_data = pd.read_csv(r'/kaggle/input/Example-bank-data.csv')
raw_data.head()

We want to know whether the bank marketing strategy was successful, so we need to transform the outcome variable into 0s and 1s in order to perform a logistic regression.

In [None]:
# Work on a copy so the raw data stays untouched
data = raw_data.copy()

In [None]:
# Drop the saved index column, which carries no predictive information
data = data.drop(['Unnamed: 0'], axis=1)
data.head()
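Because the first column of the file is just the saved index, an alternative to dropping `Unnamed: 0` afterwards is to read that column as the index at load time. A minimal sketch, assuming a small made-up CSV (the rows below are illustrative and not taken from the bank data):

```python
import io
import pandas as pd

# Hypothetical three-row CSV that mimics a file saved together with its index.
csv_text = """,duration,y
0,117,no
1,274,yes
2,167,no
"""

# index_col=0 tells pandas to use the first column as the row index,
# so no 'Unnamed: 0' column is created in the first place.
sample = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(sample.columns.tolist())
```

Both approaches leave the same feature columns; which one to use is a matter of preference.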

In [None]:
data['y'] = data['y'].map({'yes':1, 'no':0})
data.head()
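`Series.map` with a dictionary turns any value that is not one of the keys into `NaN` instead of raising an error, so it can be worth checking that every label was actually mapped. A small sketch on a made-up series (the labels are illustrative only):

```python
import pandas as pd

# Hypothetical labels, including one unexpected spelling ('Yes').
labels = pd.Series(['yes', 'no', 'Yes', 'no'])

# Values missing from the dictionary silently become NaN.
encoded = labels.map({'yes': 1, 'no': 0})

# Count the labels that failed to map; a non-zero count signals a problem.
n_unmapped = int(encoded.isna().sum())
print(n_unmapped)
```

On the notebook's data, the same check would be `data['y'].isna().sum()` after the mapping step.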

In [None]:
data.describe()

### Declare the dependent and independent variables

In [None]:
y = data['y']          # dependent variable: subscription outcome (0 or 1)
x1 = data['duration']  # independent variable

### Simple Logistic Regression

Run the regression and visualize it on a scatter plot (no need to plot the line).

In [None]:
# statsmodels does not add an intercept automatically, so add a constant column
x = sm.add_constant(x1)
reg_log = sm.Logit(y, x)
results_log = reg_log.fit()
print(results_log.summary())
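The summary reports a constant and a coefficient for `duration` on the log-odds scale. The sketch below shows how such a pair of numbers translates into a predicted probability and an odds ratio. The values of `b0` and `b1` are placeholders chosen for illustration, not the estimates fitted in this notebook, and `predicted_probability` is a hypothetical helper.

```python
import numpy as np

def predicted_probability(duration, b0, b1):
    """Logistic function: P(y = 1) = 1 / (1 + exp(-(b0 + b1 * duration)))."""
    log_odds = b0 + b1 * np.asarray(duration, dtype=float)
    return 1.0 / (1.0 + np.exp(-log_odds))

# Placeholder coefficients (assumptions, not results from the regression above).
b0, b1 = -2.0, 0.005

# Probabilities for three example durations.
probs = predicted_probability([0, 400, 800], b0, b1)

# exp(b1) is the multiplicative change in the odds per one-unit increase in duration.
odds_ratio = np.exp(b1)

print(probs.round(3), round(float(odds_ratio), 4))
```

With the fitted model, the corresponding quantities should be available as `results_log.params` for the coefficients and `results_log.predict(x)` for the predicted probabilities.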

In [None]:
plt.scatter(x1, y, color='C1')
plt.xlabel('Duration', fontsize=20)
plt.ylabel('Subscription', fontsize=20)
plt.show()