# Hands-On Exercise 6.1:
# Working with Logistic Regression in Python
***

## Objectives

#### In this exercise, you will use logistic regression in Python to predict a target variable from several predictor variables. The goal is to show how a logistic regression model trained on an existing data set can be used to predict unknown values.

### Overview

You will work with a data set called Adult, which you will import from a CSV file. You will:<br>
● Review the data within the data set and transform it into a format suitable for use in logistic regression<br>
● Examine the predictor variables<br>
● Train a logistic regression model that can be used to make future predictions<br><br>

**Pre-step: Execute the following cell in order to suppress warning messages**

In [None]:
import warnings
warnings.filterwarnings("ignore")

1. ❏ Import **pandas** and **numpy**

In [None]:
import pandas as pd
import numpy as np

2. ❏ Read the **Adult.csv** dataset into a dataframe and preview it

In [None]:
AdultData = pd.read_csv('Adult.csv')
AdultData.head()

3. ❏ Check whether any column contains missing values

In [None]:
AdultData.isnull().any()
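
The check above only reports whether a column has missing values. As a minimal sketch, the cell below also counts them. The toy dataframe is hypothetical; its column names only mimic the Adult data set.

```python
import numpy as np
import pandas as pd

# Hypothetical toy frame standing in for AdultData (values are made up)
toy = pd.DataFrame({
    "AGE": [39, 50, np.nan, 28],
    "OCCUPATION": ["Sales", None, "Tech-support", "Sales"],
    "CAPITALGAIN": [2174, 0, 0, 0],
})

has_missing = toy.isnull().any()     # True/False for each column
missing_counts = toy.isnull().sum()  # number of missing cells in each column

print(has_missing)
print(missing_counts)
```

The same two calls should work unchanged on `AdultData`.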

4. ❏ Check how many different values there are for each feature

In [None]:
AdultData.nunique()

5. ❏ Examine the structure of the dataset using the **.shape, .columns and .dtypes** attributes

In [None]:
AdultData.shape

In [None]:
AdultData.columns

In [None]:
AdultData.dtypes

6a. ❏ Preview and dummy code the **RELATIONSHIP** variable

In [None]:
AdultData['RELATIONSHIP'].head()

In [None]:
REL = pd.get_dummies(AdultData['RELATIONSHIP'], drop_first=True)

REL.head()
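
To see what `drop_first=True` does, here is a small sketch on a hypothetical three-category series (the values are illustrative, not taken from Adult.csv). The first category in sorted order is dropped and becomes the baseline that the remaining dummy columns are compared against.

```python
import pandas as pd

# Hypothetical stand-in for AdultData['RELATIONSHIP']
toy = pd.Series(["Husband", "Wife", "Own-child", "Husband"], name="RELATIONSHIP")

full = pd.get_dummies(toy)                      # one column per category
reduced = pd.get_dummies(toy, drop_first=True)  # first category dropped

print(list(full.columns))
print(list(reduced.columns))
```

A row whose remaining dummy columns are all zero (or False) belongs to the dropped baseline category.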

6b. ❏ Preview and dummy code the **OCCUPATION** variable

In [None]:
AdultData['OCCUPATION'].head()

In [None]:
OCC = pd.get_dummies(AdultData['OCCUPATION'], drop_first=True)

OCC.head()

7. ❏ Join these two dummy coded variables to **'AGE', 'CAPITALGAIN', 'EDUCATIONNUM'** to make up the predictor variable dataset

In [None]:
X = REL.join(OCC.join(AdultData[['AGE','CAPITALGAIN','EDUCATIONNUM']]))
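
The nested `join` aligns the dataframes on their row index. As a sketch of an equivalent approach, `pd.concat` with `axis=1` places the same pieces side by side. The toy data and the names `X_join` and `X_concat` are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical toy frame; column names mimic the Adult data set
toy = pd.DataFrame({
    "RELATIONSHIP": ["Husband", "Wife", "Own-child"],
    "OCCUPATION": ["Sales", "Tech-support", "Sales"],
    "AGE": [39, 50, 28],
})

rel = pd.get_dummies(toy["RELATIONSHIP"], drop_first=True)
occ = pd.get_dummies(toy["OCCUPATION"], drop_first=True)

# Nested join, as in the exercise
X_join = rel.join(occ.join(toy[["AGE"]]))

# Column-wise concatenation of the same three pieces
X_concat = pd.concat([rel, occ, toy[["AGE"]]], axis=1)

print(list(X_join.columns))
print(list(X_concat.columns))
```

Both versions rely on the pieces sharing the same row index, which holds here because they all come from one dataframe.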

8. ❏ Create a target variable Series containing **'ABOVE50K'**

In [None]:
y = AdultData['ABOVE50K']

9. ❏ Split the target and predictor variables into training and test datasets using the **train_test_split** function from **sklearn.model_selection**

In [None]:
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)
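
The sketch below shows what `test_size=0.25` and `random_state=0` do, using a hypothetical 20-row array in place of the Adult predictors: a quarter of the rows go to the test set, and repeating the call with the same seed reproduces the same split.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical toy data: 20 rows, 2 features, alternating 0/1 target
X_toy = np.arange(40).reshape(20, 2)
y_toy = np.array([0, 1] * 10)

x_tr, x_te, y_tr, y_te = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0
)
print(x_tr.shape, x_te.shape)

# Same seed, same split
x_tr2, x_te2, y_tr2, y_te2 = train_test_split(
    X_toy, y_toy, test_size=0.25, random_state=0
)
print(np.array_equal(x_te, x_te2))
```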

10. ❏ Import the **LogisticRegression** class from **sklearn.linear_model**

In [None]:
from sklearn.linear_model import LogisticRegression

11. ❏ Instantiate the logistic regression model

In [None]:
logisticRegr = LogisticRegression()

12. ❏ Train the model on the data

In [None]:
logisticRegr.fit(x_train, y_train)

13. ❏ Examine the coefficients produced by the model

In [None]:
logisticRegr.coef_
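
The raw `coef_` array is easier to read when paired with the feature names. The sketch below does this on hypothetical synthetic data (the data-generating rule and the `max_iter=1000` setting are assumptions for illustration) and exponentiates the coefficients, which gives the multiplicative change in the odds for a one-unit increase in a feature with the others held fixed.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical synthetic data: the target depends on EDUCATIONNUM only
rng = np.random.default_rng(0)
X_toy = pd.DataFrame({
    "AGE": rng.integers(20, 60, size=200),
    "EDUCATIONNUM": rng.integers(5, 16, size=200),
})
y_toy = (X_toy["EDUCATIONNUM"] + rng.normal(0, 2, size=200) > 10).astype(int)

model = LogisticRegression(max_iter=1000).fit(X_toy, y_toy)

# coef_ has shape (1, n_features) for a binary target
coefs = pd.Series(model.coef_[0], index=X_toy.columns)
odds_ratios = np.exp(coefs)

print(coefs)
print(odds_ratios)
```

With the exercise's objects, the equivalent would be `pd.Series(logisticRegr.coef_[0], index=X.columns)`.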

14. ❏ Use the model to make predictions for the test dataset

In [None]:
predictions = logisticRegr.predict(x_test)
print(predictions)
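
`predict()` returns hard class labels. As a sketch on hypothetical one-feature data, the cell below shows that for a binary target these labels correspond to thresholding the positive-class probability from `predict_proba()` at 0.5.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: one feature, binary 0/1 target
X_toy = np.array([[1], [2], [3], [4], [5], [6], [7], [8]])
y_toy = np.array([0, 0, 0, 1, 0, 1, 1, 1])

model = LogisticRegression().fit(X_toy, y_toy)

proba = model.predict_proba(X_toy)[:, 1]  # probability of class 1
labels = model.predict(X_toy)             # hard 0/1 labels

print(np.round(proba, 3))
print(labels)
print(np.array_equal(labels, (proba > 0.5).astype(int)))
```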

15. ❏ Use the **score()** method to get the accuracy of the model

In [None]:
score = logisticRegr.score(x_test, y_test)
print(score)

16. ❏ Calculate the **Mean Square Error (MSE)**

In [None]:
mse = np.mean((predictions-y_test)**2)
print(mse)
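
Assuming the target is coded as 0 and 1, each squared error is either 0 (correct) or 1 (wrong), so the MSE of the hard predictions equals the misclassification rate, that is, 1 minus the accuracy. A minimal sketch with hypothetical label arrays:

```python
import numpy as np

# Hypothetical true labels and hard predictions, both coded 0/1
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1])

mse = np.mean((y_pred - y_true) ** 2)  # fraction of wrong predictions
accuracy = np.mean(y_pred == y_true)   # fraction of correct predictions

print(mse, accuracy, 1 - accuracy)
```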

17. ❏ Import **confusion_matrix** from **sklearn.metrics**

In [None]:
from sklearn.metrics import confusion_matrix

18. ❏ Create and display a **confusion matrix**

In [None]:
cm = confusion_matrix(y_test, predictions)
print(cm)
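
In scikit-learn's confusion matrix, rows are the true classes and columns are the predicted classes. For a binary 0/1 target the four cells can be unpacked with `ravel()`, as sketched here on hypothetical label arrays; the precision and recall lines are added for illustration.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical true labels and hard predictions, both coded 0/1
y_true = np.array([0, 1, 1, 0, 1, 0, 0, 1])
y_pred = np.array([0, 1, 0, 0, 1, 1, 0, 1])

cm_toy = confusion_matrix(y_true, y_pred)
tn, fp, fn, tp = cm_toy.ravel()  # row-major order: [[tn, fp], [fn, tp]]

precision = tp / (tp + fp)  # share of predicted positives that are correct
recall = tp / (tp + fn)     # share of actual positives that were found

print(cm_toy)
print(tn, fp, fn, tp)
print(precision, recall)
```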

19. ❏ Import **roc_auc_score** and **roc_curve** from **sklearn.metrics**

In [None]:
from sklearn.metrics import roc_auc_score
from sklearn.metrics import roc_curve

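20. ❏ Calculate the **AUC score**

In [None]:
# Score with the predicted probability of the positive class rather than hard 0/1 labels
roc_auc_score(y_test, logisticRegr.predict_proba(x_test)[:, 1])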
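
`roc_auc_score` accepts either hard labels or continuous scores as its second argument, and the two can give different values because hard labels discard the ranking information. A minimal sketch with hypothetical scores:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Hypothetical true labels and predicted positive-class probabilities
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.10, 0.40, 0.35, 0.45])

# Hard labels at a 0.5 cut-off: every case is predicted as class 0 here
hard_labels = (scores > 0.5).astype(int)

auc_from_scores = roc_auc_score(y_true, scores)
auc_from_labels = roc_auc_score(y_true, hard_labels)

print(auc_from_scores)
print(auc_from_labels)
```

In this toy case the scores rank three of the four positive/negative pairs correctly, while the hard labels cannot separate any of them.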

21. ❏ Import **matplotlib.pyplot**

In [None]:
import matplotlib.pyplot as plt

22. ❏ Calculate **false positive rates, true positive rates**, and **thresholds** using the **roc_curve()** function and plot the ROC curve

In [None]:
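# Use the predicted probability of the positive class (column 1) as the score
fpr, tpr, thresholds = roc_curve(y_test, logisticRegr.predict_proba(x_test)[:,1])
plt.figure()
plt.plot(fpr, tpr)
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')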
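
To make the arrays returned by `roc_curve()` concrete, the sketch below runs it on four hypothetical scores and checks that the area under the resulting points, computed with `sklearn.metrics.auc`, matches `roc_auc_score`. Plotting is omitted so the cell focuses on the numbers.

```python
import numpy as np
from sklearn.metrics import auc, roc_auc_score, roc_curve

# Hypothetical true labels and predicted positive-class probabilities
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

fpr_toy, tpr_toy, thr_toy = roc_curve(y_true, scores)

# Each threshold yields one (fpr, tpr) point; the curve runs from (0, 0) to (1, 1)
print(fpr_toy)
print(tpr_toy)

area = auc(fpr_toy, tpr_toy)  # trapezoidal area under the points
print(area, roc_auc_score(y_true, scores))
```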

## <center>**Congratulations! You have completed the exercise.**</center>


# <center>**This is the end of the exercise.**</center>