# Building a Logistic Regression
Create a logistic regression based on the bank data provided.

The data is based on the marketing campaign efforts of a Portuguese banking institution. The classification goal is to predict whether the client will subscribe to a term deposit (variable y).

Note that the first column of the dataset is the index.

# Import the relevant libraries

In [None]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
sns.set()

from scipy import stats
stats.chisqprob = lambda chisq, df: stats.chi2.sf(chisq, df)

# Load the data

In [None]:
df = pd.read_csv("Example_bank_data.csv")
df.head()

We want to know whether the bank marketing strategy was successful, so we need to transform the outcome variable into 0s and 1s in order to perform a logistic regression.

We create a copy of the data before we start altering it, so the original data we loaded stays unchanged.

In [None]:
data = df.copy()

The `Unnamed: 0` column only holds the row index and is not used in the model, so we drop it.

In [None]:
data = data.drop(['Unnamed: 0'], axis = 1)
data.head()

We use the map function to change any 'yes' values to 1 and 'no' values to 0. 

In [None]:
data['y'] = data['y'].map({'yes':1, 'no':0})
data.head()
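One thing worth checking after a mapping like this: `map` turns any value that is not a key of the dictionary into `NaN`. The sketch below uses a hypothetical toy column (not the bank data) with one unexpected label to show how to count such values before modelling.

```python
import pandas as pd

# Hypothetical toy column; the last value is deliberately not in the mapping.
toy = pd.Series(['yes', 'no', 'yes', 'unknown'])
mapped = toy.map({'yes': 1, 'no': 0})

# Values missing from the dictionary become NaN, so count them before modelling.
n_unmapped = int(mapped.isna().sum())
print(mapped.tolist(), n_unmapped)
```

If the count is zero for `data['y']`, every row was either 'yes' or 'no' and the outcome is ready for the regression.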

# Check the descriptive statistics

In [None]:
data.describe()

# Declare the dependent and independent variables

In [None]:
y = data['y']
x1 = data['duration']

# Simple Logistic Regression
Run the regression and visualize it on a scatter plot (no need to plot the line).

In [None]:
x = sm.add_constant(x1)
reg_log = sm.Logit(y,x)
results_log = reg_log.fit()
results_log.summary()
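To read the summary, it can help to see how the two fitted coefficients turn a duration into a probability. The sketch below is only an illustration: `b0` and `b1` are hypothetical placeholder values, not the output of the regression above, and `logistic` is a helper defined here. In the notebook, the real coefficients would come from `results_log.params`.

```python
import math

def logistic(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients for illustration only; read the real ones
# from results_log.params after fitting.
b0, b1 = -1.70, 0.0051

# The model's log-odds are b0 + b1 * duration; the logistic function
# converts them into the predicted probability that y = 1.
for duration in (0, 300, 600):
    p = logistic(b0 + b1 * duration)
    print(duration, round(p, 3))

# exp(b1) is the multiplicative change in the odds of y = 1
# for a one-unit increase in duration.
odds_ratio = math.exp(b1)
print(odds_ratio)
```

With a positive slope, as assumed here, longer durations give higher predicted probabilities of subscription.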

# Plot the scatter plot

In [None]:
plt.scatter(x1,y,color = 'C1')
plt.title("Duration vs Subscription",fontsize = 20)
plt.xlabel('Duration', fontsize = 20)
plt.ylabel('Subscription', fontsize = 20)
plt.show()

In [None]:
np.set_printoptions(formatter={'float': lambda x: "{0:0.2f}".format(x)})

In [None]:
results_log.predict()

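The outcome column originally contained 'yes' and 'no', which we mapped to 1 and 0. `predict()` returns the estimated probability that y = 1 for each observation. `pred_table()` summarises those predictions against the actual values: row 0 and row 1 are the actual classes, and column 0 and column 1 are the predicted classes.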

In [None]:
np.array(data['y'])
results_log.pred_table()

The table returned by `pred_table()` has no labels, so we put it in a DataFrame and name the rows (actual values) and the columns (predicted values).

In [None]:
cm_df = pd.DataFrame(results_log.pred_table())
cm_df.columns = ['Predicted 0','Predicted 1']
cm_df = cm_df.rename(index={0: 'Actual 0',1:'Actual 1'})
cm_df

# Test the accuracy

In [None]:
cm = np.array(cm_df)
accuracy_train = (cm[0,0]+cm[1,1])/cm.sum()
accuracy_train
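The same confusion matrix and accuracy can be rebuilt by hand, which makes the calculation above easier to check. This is a minimal sketch on hypothetical values (`actual` and `probs` are made up, not taken from the bank data), assuming a cutoff of 0.5, the default threshold of `pred_table()`.

```python
import numpy as np

# Hypothetical outcomes and predicted probabilities, for illustration only.
actual = np.array([0, 0, 1, 1, 0, 1])
probs = np.array([0.20, 0.60, 0.70, 0.40, 0.10, 0.90])

# Classify as 1 when the predicted probability reaches the threshold.
threshold = 0.5
predicted = (probs >= threshold).astype(int)

# cm[i, j] counts observations with actual class i and predicted class j.
cm = np.zeros((2, 2), dtype=int)
np.add.at(cm, (actual, predicted), 1)

# Accuracy is the share of observations on the main diagonal.
accuracy = (cm[0, 0] + cm[1, 1]) / cm.sum()
print(cm)
print(accuracy)
```

In the notebook, `np.array(data['y'])` and `results_log.predict()` would take the place of `actual` and `probs`. Because the model is evaluated on the same data it was fitted on, this is training accuracy.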

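# Conclusion
Logistic regression models a binary outcome: here, whether a client subscribed ('yes', coded as 1) or not ('no', coded as 0). Using `duration` as the single predictor, the fitted model estimates each client's probability of subscribing, and the confusion matrix shows how many clients in each category the model classifies correctly on the training data. Note the direction of the relationship: y is modelled as depending on duration, not the other way around.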