# Credit Risk Assessment

Credit risk is the risk that a borrower defaults on a debt by failing to make required payments. A borrower who defaults can cost a financial institution a lot of money; at the same time, false negatives (i.e. declining applicants who are capable of repaying the loan) mean lost interest income. In the following, we perform some exploratory analysis using the German Credit Data stored in the UCI Machine Learning Repository [here](https://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)).

## Retrieve Data Source from ODL

The dsdbc module is delivered with the z/OS IzODA Anaconda distribution. It enables Python applications to access the z/OS IzODA Mainframe Data Service. The Data Service component, Optimized Data Layer (ODL, previously known as MDS), provides optimized, virtualized, and parallelized access to both IBM Z data sources and off-platform data sources. Refer to the [IBM Knowledge Center](https://www.ibm.com/support/knowledgecenter/) for product documentation (search: "Open Data Analytics"). Once the connection is established, we use it to retrieve the data and store it in a dataframe using the Python library [pandas](http://pandas.pydata.org/).

In [None]:
import dsdbc
import pandas as pd

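#Suppress warnings (for example, font warnings) to keep the notebook output clean.
#Note that this hides all warnings, not only font-related ones.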
import warnings
warnings.filterwarnings("ignore")
warnings.simplefilter("ignore", category=PendingDeprecationWarning)

#Get the ODL database connection. For ssid, enter the subsystem ID of the
#local Data Service server. If it is not specified, the subsystem is
#selected based on the server group, 'sgrp'; if that is not provided either,
#the first subsystem with a Data Service is used. For more information,
#run help(dsdbc)
ssid = "<SUBSYSTEM_ID_OF_LOCAL_DATA_SERVICE_SERVER>"
conn = dsdbc.connect(SSID=ssid)

#Query Execution
sql = ('(select checkingAccount as "checkingAccount",'
        'duration as "duration",'
        'creditHistory as "creditHistory",'
        'purpose as "purpose",'
        'amount as "amount",'
        'savingsAccount as "savingsAccount",'
        'gender as "gender",'
        'age as "age",'
        'dependents as "dependents",'
        'risk as "risk" from credit_data)')
credit_risk_df = pd.read_sql(sql, conn)
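
`pd.read_sql` accepts a DB-API style connection, so the query pattern above can be tried without access to a Data Service server. The following is a minimal sketch that uses an in-memory SQLite database as a stand-in for ODL. The `demo_` names and the rows are made up for illustration only; they are not values from the German Credit Data, and SQLite is not how dsdbc works internally.

```python
import sqlite3

import pandas as pd

# In-memory SQLite database standing in for the ODL connection
# (assumption: used only so the example runs anywhere).
demo_conn = sqlite3.connect(":memory:")
demo_conn.execute(
    "create table credit_data "
    "(duration integer, amount integer, age integer, risk integer)"
)
# Hypothetical rows for illustration only.
demo_conn.executemany(
    "insert into credit_data values (?, ?, ?, ?)",
    [(6, 1169, 67, 0), (48, 5951, 22, 1), (12, 2096, 49, 0)],
)

# Same pattern as the query above: alias each column with a quoted name
# so the dataframe columns are spelled exactly as written.
demo_sql = ('select duration as "duration",'
            'amount as "amount",'
            'age as "age",'
            'risk as "risk" from credit_data')
demo_df = pd.read_sql(demo_sql, demo_conn)
demo_conn.close()

print(demo_df.shape)
print(list(demo_df.columns))
```

Adjacent string literals inside the parentheses are concatenated by Python into a single SQL statement, which keeps a long column list readable.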

## Data Analysis with Pandas

With the data in a pandas dataframe, we can run a variety of analyses on the mainframe data.

In [None]:
#Visualize the first couple of rows in our dataframe
credit_risk_df.head()

In [None]:
#Look at the datatypes within our dataframe
credit_risk_df.dtypes
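
Label-like columns often arrive as generic text columns, which is worth checking before grouping or plotting. A minimal sketch, assuming a made-up dataframe `toy_df` that only borrows two column names from the query above, of inspecting the dtypes and converting a label column to the pandas `category` dtype:

```python
import pandas as pd

# Hypothetical rows for illustration only.
toy_df = pd.DataFrame({
    "purpose": ["car", "education", "car"],
    "amount": [1169, 5951, 2096],
})

# One dtype per column; text columns show up as object or string dtype.
print(toy_df.dtypes)

# Convert the label column to a categorical dtype.
toy_df["purpose"] = toy_df["purpose"].astype("category")
purpose_levels = toy_df["purpose"].cat.categories.tolist()
print(purpose_levels)
```

Whether the real `credit_risk_df` columns need this conversion depends on how the data is stored in ODL, which the `dtypes` output above will show.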

In [None]:
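#Calculate the averages of the numeric feature columns, grouped by risk,
#where 0 is good risk and 1 is bad risk. numeric_only=True skips any text
#columns, which recent pandas versions refuse to average.
credit_risk_df.groupby('risk').mean(numeric_only=True)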
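
To make the grouped averages concrete, here is a small sketch on a made-up dataframe (`toy_df` and its values are hypothetical, not taken from the dataset). It passes `numeric_only=True` so that the text column is skipped rather than averaged.

```python
import pandas as pd

# Hypothetical rows for illustration only; 0 is good risk, 1 is bad risk.
toy_df = pd.DataFrame({
    "risk": [0, 0, 1, 1],
    "age": [30, 40, 25, 35],
    "amount": [1000, 3000, 4000, 6000],
    "purpose": ["car", "education", "car", "repairs"],
})

# One row per risk class, one column per numeric feature.
means_by_risk = toy_df.groupby("risk").mean(numeric_only=True)
print(means_by_risk)
```

In this toy data the good-risk group averages an age of 35 and an amount of 2000, while the bad-risk group averages 30 and 5000.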

In [None]:
#Statistics describing the feature column, age
credit_risk_df['age'].describe()
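
Summary statistics for age can be taken one step further by relating age to risk. The following is a sketch under stated assumptions: the band edges (30 and 50), the band labels, and the rows are all made up for illustration and are not from the dataset.

```python
import pandas as pd

# Hypothetical rows for illustration only; 0 is good risk, 1 is bad risk.
toy_df = pd.DataFrame({
    "age": [22, 28, 25, 35, 41, 52, 67],
    "risk": [1, 1, 0, 1, 0, 0, 0],
})

# right=False makes each band include its lower edge: [0, 30), [30, 50), [50, 120).
toy_df["age_band"] = pd.cut(
    toy_df["age"],
    bins=[0, 30, 50, 120],
    labels=["under 30", "30-49", "50+"],
    right=False,
)

# Because risk is coded 0/1, its mean per band is the share of bad-risk rows.
bad_rate_by_band = toy_df.groupby("age_band", observed=True)["risk"].mean()
print(bad_rate_by_band)
```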

## Visualization with Matplotlib

We will use matplotlib to plot trends in the data. One useful visualization is what each gender is requesting money for. In this dataset, the purposes include buying a car, furniture, a television, repairs, domestic appliances, education- or business-related expenses, and "other".

In [None]:
import matplotlib.pyplot as plt
#Lines that start with % are IPython magic commands. Below, we specify that
#we want matplotlib plots to be rendered inside the notebook
%matplotlib inline

In [None]:
gender_vs_purpose = credit_risk_df.groupby(['purpose', 'gender']).size().unstack()
gender_vs_purpose.plot(kind='line', marker='o', figsize=(17,10))
plt.title("What are people borrowing for?")
plt.ylabel("# of People")
plt.xlabel("Purpose for borrowing credit")

In this dataset, the most common reason for borrowing, for both men and women, is to buy a car. In every category except domestic appliances, more men than women are requesting a loan.
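
Raw counts can be misleading when one gender makes up more of the applicants overall, so it can help to compare shares instead. A minimal sketch on made-up rows (the `toy_df` values and the `male`/`female` labels are assumptions for illustration) that builds the same purpose-by-gender table and then normalizes each gender column to sum to 1:

```python
import pandas as pd

# Hypothetical rows for illustration only.
toy_df = pd.DataFrame({
    "purpose": ["car", "car", "car", "education", "education", "repairs"],
    "gender": ["male", "male", "female", "male", "female", "male"],
})

# Rows are purposes, columns are genders, cells are applicant counts.
counts = toy_df.groupby(["purpose", "gender"]).size().unstack(fill_value=0)

# Divide each column by its total to get the share of each purpose per gender.
shares = counts / counts.sum()
print(counts)
print(shares)
```

In this toy data, men outnumber women for car loans (2 versus 1), yet car loans make up the same share (0.5) of each gender's requests.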

These visuals help us understand the data better. Patterns like these are hard to pick out by eye from the raw table; pandas and matplotlib make them much easier to find.