# CAS Machine Learning Demo

## Overview of the Predictive Modeling Case
A financial services company offers a home equity line of credit to its clients. The
company has extended several thousand lines of credit in the past, and approximately
20% of these accepted applicants have defaulted on their loans. Using geographic,
demographic, and financial variables, the company wants to build a model that
predicts whether an applicant will default.

## Data
After analyzing the data, the company selected a subset of 12 predictor (or input)
variables to model whether each applicant defaulted. The response (or target) variable,
BAD, indicates whether an applicant defaulted on the home equity line of credit.
The following table lists these variables from the <a href = "https://support.sas.com/documentation/onlinedoc/viya/examples.htm">**HMEQ**</a> data set, along with
their model role, measurement level, and description.

| Name      | Model Role | Measurement Level | Description                                                            |
|:----------|:-----------|:------------------|:-----------------------------------------------------------------------|
| BAD       | Target     | Binary            | 1 = applicant defaulted on loan or delinquent, 0 = applicant paid loan |
| CLAGE     | Input      | Interval          | Age of oldest credit line in months                                    |
| CLNO      | Input      | Interval          | Number of credit lines                                                 |
| DEBTINC   | Input      | Interval          | Debt-to-income ratio                                                   |
| DELINQ    | Input      | Interval          | Number of delinquent credit lines                                      |
| DEROG     | Input      | Interval          | Number of derogatory reports                                           |
| JOB       | Input      | Nominal           | Occupational categories                                                |
| LOAN      | Input      | Interval          | Amount of loan request                                                 |
| MORTDUE   | Input      | Interval          | Amount due on existing mortgage                                        |
| NINQ      | Input      | Interval          | Number of recent credit inquiries                                      |
| REASON    | Input      | Binary            | DebtCon = debt consolidation, HomeImp = home improvement               |
| VALUE     | Input      | Interval          | Value of current property                                              |
| YOJ       | Input      | Interval          | Years at present job                                                   |
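
The roles in the table can be captured in plain Python so that later cells can build the input lists passed to CAS actions (for example, the `inputs` parameter of `simple.freq`). This is a minimal sketch: the `HMEQ_VARIABLES` dictionary and the `inputs_by_level` helper are hypothetical names, not part of SWAT.

```python
# Hypothetical metadata mirroring the HMEQ variable table:
# name -> (model role, measurement level)
HMEQ_VARIABLES = {
    "BAD": ("Target", "Binary"),
    "CLAGE": ("Input", "Interval"),
    "CLNO": ("Input", "Interval"),
    "DEBTINC": ("Input", "Interval"),
    "DELINQ": ("Input", "Interval"),
    "DEROG": ("Input", "Interval"),
    "JOB": ("Input", "Nominal"),
    "LOAN": ("Input", "Interval"),
    "MORTDUE": ("Input", "Interval"),
    "NINQ": ("Input", "Interval"),
    "REASON": ("Input", "Binary"),
    "VALUE": ("Input", "Interval"),
    "YOJ": ("Input", "Interval"),
}


def inputs_by_level(variables, level):
    """Return the input variable names with the given measurement level, in table order."""
    return [name for name, (role, lvl) in variables.items()
            if role == "Input" and lvl == level]


target = [name for name, (role, _) in HMEQ_VARIABLES.items() if role == "Target"][0]
interval_inputs = inputs_by_level(HMEQ_VARIABLES, "Interval")
class_inputs = (inputs_by_level(HMEQ_VARIABLES, "Nominal")
                + inputs_by_level(HMEQ_VARIABLES, "Binary"))

print(target)                                     # BAD
print(len(interval_inputs) + len(class_inputs))   # 12
print(class_inputs)                               # ['JOB', 'REASON']
```

Lists such as `class_inputs` could then be passed as `inputs=class_inputs` instead of retyping column names in each action call.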

# Load Packages

In [None]:
# Imports the necessary packages

import swat
import pandas as pd
from matplotlib import pyplot as plt
%matplotlib inline
# Displays messages returned by the CAS server in the notebook
swat.options.cas.print_messages = True

# Connect to CAS

In [None]:
# Connects to the CAS server (host name, port, user name, password);
# replace these values with your own connection details

conn = swat.CAS("server.demo.sas.com", 30570, "christine", "Student1")

In [None]:
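# Sets the session timeout to 12 hours (the time parameter is in seconds)
# and then displays the session status

mytime = 60 * 60 * 12
conn.session.timeout(time=mytime)
conn.session.sessionStatus()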

# Load Data onto the Server

In [None]:
# Reads hmeq.csv into an in-memory CAS table and returns a CASTable object that references it

hmeq = conn.read_csv("data/hmeq.csv", casout=dict(name="hmeq", replace=True))

In [None]:
# Lists the in-memory tables in the active caslib

conn.table.tableInfo()

In [None]:
# Checks the type of hmeq: a swat CASTable that references the server-side table, not a local DataFrame

type(hmeq)

# Exploratory Analysis

In [None]:
# Displays the first five rows of the table

hmeq.head()

In [None]:
# Displays the type of the head() result, which is downloaded to the client as a swat SASDataFrame

type(hmeq.head())

In [None]:
# Displays the shape of the table (rows, columns)

hmeq.shape

In [None]:
# Displays the columns in the dataset

hmeq.columns

In [None]:
# Displays information about the entire table

hmeq.info()

In [None]:
# Counts the distinct and missing values in each column

hmeq.distinct()
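
`hmeq.distinct()` runs the `simple.distinct` action, which, as we understand it, reports the number of distinct values and the number of missing values for each column. The sketch below computes those two counts in plain Python on a few invented rows, with `None` standing in for a missing value. Excluding missing values from the distinct count is an assumption of this sketch and may differ from the action's convention.

```python
# Invented sample rows; None marks a missing value
rows = [
    {"JOB": "Office", "REASON": "HomeImp", "DEBTINC": 34.8},
    {"JOB": "Sales", "REASON": None, "DEBTINC": None},
    {"JOB": "Office", "REASON": "DebtCon", "DEBTINC": 41.2},
    {"JOB": None, "REASON": "DebtCon", "DEBTINC": None},
]


def distinct_and_missing(rows):
    """Return {column: (n_distinct, n_missing)}; missing values are excluded from n_distinct."""
    summary = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        summary[col] = (len(set(present)), len(values) - len(present))
    return summary


for col, (n_distinct, n_missing) in distinct_and_missing(rows).items():
    print(f"{col:8} distinct={n_distinct} missing={n_missing}")
```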

In [None]:
# Displays the proportion of rows at each level of the target variable

hmeq["BAD"].value_counts(normalize=True)
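
With `normalize=True`, `value_counts` returns each level's share of the rows rather than its raw count. A stdlib sketch of that arithmetic, using a made-up list of target values (not the real HMEQ data) chosen to echo the roughly 20% default rate described in the overview:

```python
from collections import Counter


def normalized_counts(values):
    """Return {level: proportion}, ordered from most to least frequent."""
    counts = Counter(values)
    total = sum(counts.values())
    return {level: n / total for level, n in counts.most_common()}


# Made-up target column: 1 = defaulted, 0 = paid
bad = [0, 0, 1, 0, 0, 0, 0, 1, 0, 0]
print(normalized_counts(bad))  # {0: 0.8, 1: 0.2}
```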

In [None]:
# Summarizes the target variable; for a 0/1 target, the mean is the default rate

hmeq["BAD"].describe()

In [None]:
# Removes duplicate rows, replaces the hmeq table with the result, and checks the new shape

hmeq.drop_duplicates(casout={"name": "hmeq", "replace": True})
hmeq.shape
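
Dropping duplicates keeps one copy of each fully identical row. The sketch below keeps the first occurrence and preserves the original order, which matches the pandas default (`keep='first'`). It is a plain-Python illustration only; in-memory CAS tables generally have no guaranteed row order, so "first" is less meaningful on the server side.

```python
def drop_duplicates(rows):
    """Return rows with exact duplicates removed, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # rows are tuples or lists of hashable values
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


# Invented rows: (BAD, LOAN, REASON)
rows = [
    (1, 1700, "HomeImp"),
    (0, 2500, "DebtCon"),
    (1, 1700, "HomeImp"),  # exact duplicate of the first row
]
deduped = drop_duplicates(rows)
print(len(rows), "->", len(deduped))  # 3 -> 2
```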

In [None]:
# Explores the distribution of the numeric variables

hmeq.describe(include=['numeric'])
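
For each numeric column, `describe` reports pandas-style summary statistics such as the count, mean, standard deviation, minimum, quartiles, and maximum. The sketch computes those for one invented LOAN column with Python's `statistics` module. Quartile conventions differ between tools, so the `inclusive` method used here may not match the CAS percentiles exactly.

```python
import statistics


def summarize(values):
    """Count, mean, sample std, min, quartiles, and max of the non-missing values."""
    present = [v for v in values if v is not None]
    q1, q2, q3 = statistics.quantiles(present, n=4, method="inclusive")
    return {
        "count": len(present),
        "mean": statistics.mean(present),
        "std": statistics.stdev(present),
        "min": min(present),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(present),
    }


# Invented LOAN amounts; None marks a missing value
loan = [1100, 1300, None, 1500, 1700, 1800]
for stat, value in summarize(loan).items():
    print(f"{stat:>5}: {value:,.1f}")
```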

In [None]:
# Explores the distribution of the categorical variables

hmeq.describe(include=['character'])

In [None]:
# Plots a histogram of each numeric variable

hmeq.hist(figsize=(10, 10))
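
A histogram groups an interval variable into equal-width bins and counts the rows in each bin. A stdlib sketch of that binning, using invented YOJ values and a hypothetical `equal_width_bins` helper; plotting libraries choose their own default bin counts and edge rules, so their counts can differ at bin boundaries.

```python
def equal_width_bins(values, n_bins):
    """Count values in n_bins equal-width bins spanning [min, max]; the last bin includes max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for v in values:
        idx = int((v - lo) / width) if width else 0
        counts[min(idx, n_bins - 1)] += 1  # clamp so v == hi lands in the last bin
    edges = [lo + i * width for i in range(n_bins + 1)]
    return edges, counts


# Invented years-at-present-job values
yoj = [0, 1, 2, 2, 3, 5, 7, 8, 10, 20]
edges, counts = equal_width_bins(yoj, 4)
for left, right, n in zip(edges, edges[1:], counts):
    print(f"{left:g}-{right:g}: {'#' * n}")
```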

In [None]:
# Computes the Pearson correlation matrix of the numeric variables

hmeq.corr()
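
Each entry of the correlation matrix is a Pearson coefficient, r = cov(x, y) / (s_x * s_y). A minimal stdlib version for two columns, with invented VALUE and MORTDUE figures in which MORTDUE is exactly 0.6 times VALUE, so the correlation is 1:

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Invented values: MORTDUE = 0.6 * VALUE
value = [50000, 60000, 70000, 80000, 90000]
mortdue = [30000, 36000, 42000, 48000, 54000]
print(pearson(value, mortdue))  # 1.0
```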

# Explore the Data Using CAS Actions

In [None]:
# Loads the simple action set

conn.loadActionSet('simple')

In [None]:
# Checks the distribution of the categorical columns

conn.simple.freq(
    table=hmeq,
    inputs=["JOB", "REASON"]
)

# A more concise way of writing the above: the action set name can be omitted

conn.freq(
    table=hmeq,
    inputs=["JOB", "REASON"]
)

In [None]:
# Computes the correlations with the simple.correlation action instead of hmeq.corr()

conn.correlation(
    table=hmeq,
    inputs=["LOAN", "VALUE", "MORTDUE"]
)

In [None]:
# Loads the sampling action set and lists the actions it contains

conn.loadActionSet('sampling')
actions = conn.builtins.help(actionSet='sampling')
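
The `sampling` action set is used to sample and partition tables, for example into training and validation sets. The notebook stops at listing its actions, so the sketch below only illustrates the idea of a stratified split in plain Python: rows are sampled separately within each target level so that both partitions keep the same default rate. The function name `stratified_split`, the 70% training fraction, and the seed are assumptions for illustration, not parameters of the CAS actions.

```python
import random
from collections import defaultdict


def stratified_split(rows, target, train_frac=0.7, seed=12345):
    """Split rows into (train, valid) lists, sampling train_frac of each target level."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for row in rows:
        by_level[row[target]].append(row)
    train, valid = [], []
    for level_rows in by_level.values():
        shuffled = level_rows[:]
        rng.shuffle(shuffled)
        cut = round(train_frac * len(shuffled))
        train.extend(shuffled[:cut])
        valid.extend(shuffled[cut:])
    return train, valid


def default_rate(part):
    """Share of rows with BAD == 1."""
    return sum(r["BAD"] for r in part) / len(part)


# Invented data: 20 defaulted (BAD=1) and 80 paid (BAD=0) applicants
rows = [{"id": i, "BAD": 1 if i < 20 else 0} for i in range(100)]
train, valid = stratified_split(rows, "BAD")
print(len(train), len(valid))                    # 70 30
print(default_rate(train), default_rate(valid))  # 0.2 0.2
```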

# End the Session

In [None]:
# Ends the CAS session

conn.session.endSession()