# German Loan Data

## Instructions

### Banking: Loan Approval Case
In this use case, each entry in the dataset represents a person who takes a credit loan from a bank. The learning task is to classify each person as either a good or bad credit risk according to the set of attributes.

You can find the data `german_credit_data.csv` saved under the [data](../data) folder:<br>
NOTE: At this point, **DO NOT check the reference website**
- Look into the data using appropriate functions and extract its fields.
- For each dataset, describe what the data is about and which fields it contains.
    - Which columns contain continuous variables and which columns contain categorical variables?

You need to answer the questions and perform the task below:
- What are the mean age, mean credit amount, and mean duration?
- What are the three most common loan purposes?
- Who takes the majority of loans: male or female applicants?

Note:
- You are NOT ALLOWED to import other libraries or packages
- You can write your own functions
- Your answers should be readable, with appropriate comments
- You can refer to the [markdown cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) if you are not familiar with Markdown

### Reference
This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit

The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29

## Import libraries 

In [57]:
# Usual libraries are imported here
import os
import yaml
import dask.dataframe as dd
import pandas as pd
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Please perform your tasks below and answer the questions

In [58]:
# Read the csv file

data = pd.read_csv("german_credit_data.csv") 

# Loop over the columns of the dataset and inspect each dtype.
# Use the dtype as a first guess at whether a column is continuous or categorical.
# Caveat: this is only a heuristic. An integer-coded column such as Job may
# really hold category codes, so check its distinct values before relying on it.

for i, col in enumerate(data.columns):
    print("Column Name:".ljust(15), col.ljust(20))
    print("dtypes:".ljust(15), str(data[col].dtype).ljust(20))
    if i == 0:
        # The first column ("Unnamed: 0") is the row index written out with the CSV
        print("Variable types:".ljust(15), "index".ljust(20), "\n")
    else:
        if data[col].dtype != "object":
            vtype = "Continuous Variables"
        else:
            vtype = "Categorical Variables"
        print("Variable types:".ljust(15), vtype.ljust(20), "\n")

Column Name:    Unnamed: 0          
dtypes:         int64               
Variable types: index                

Column Name:    Age                 
dtypes:         int64               
Variable types: Continuous Variables 

Column Name:    Sex                 
dtypes:         object              
Variable types: Categorical Variables 

Column Name:    Job                 
dtypes:         int64               
Variable types: Continuous Variables 

Column Name:    Housing             
dtypes:         object              
Variable types: Categorical Variables 

Column Name:    Saving accounts     
dtypes:         object              
Variable types: Categorical Variables 

Column Name:    Checking account    
dtypes:         object              
Variable types: Categorical Variables 

Column Name:    Credit amount       
dtypes:         int64               
Variable types: Continuous Variables 

Column Name:    Duration            
dtypes:         int64               
Variable types: Continuous Variables 
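
The dtype check above labels every numeric column as continuous, which can mislabel integer-coded categories. A rough refinement is to also look at how many distinct values a numeric column holds. The sketch below runs on a small made-up frame rather than the real CSV; the helper name `guess_variable_type` and the threshold `max_categories=10` are assumptions chosen for illustration, not part of the assignment.

```python
import pandas as pd

def guess_variable_type(series, max_categories=10):
    """Hypothetical helper: guess 'categorical' or 'continuous' for one column."""
    # Non-numeric (text) columns are treated as categorical outright.
    if not pd.api.types.is_numeric_dtype(series):
        return "categorical"
    # Numeric columns with only a few distinct values are likely category codes.
    if series.nunique() <= max_categories:
        return "categorical"
    return "continuous"

# Toy data for illustration only; the values are invented, not taken from the CSV.
toy = pd.DataFrame({
    "Age": list(range(20, 40)),      # 20 distinct values
    "Job": [0, 1, 2, 3] * 5,         # 4 distinct integer codes
    "Sex": ["male", "female"] * 10,  # text labels
})

guesses = {col: guess_variable_type(toy[col]) for col in toy.columns}
print(guesses)
```

The threshold is arbitrary: a short-range numeric column could still be a genuine count, so the result is a prompt for manual inspection rather than a final answer.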

In [59]:
# Find the mean age, mean credit amount, and mean duration
mean_age = data['Age'].mean()
mean_creditamt = data['Credit amount'].mean()
mean_duration = data['Duration'].mean()

# Round each mean to the nearest whole number for display
print(f"The mean age, credit amount and duration were {mean_age:.0f}, "
      f"{mean_creditamt:.0f} and {mean_duration:.0f} respectively.")

The mean age, credit amount and duration were 36, 3271 and 21 respectively.
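
As an alternative to three separate `.mean()` calls, the columns can be selected together and averaged in one step. This is a minimal sketch on an invented toy frame (the numbers are not from the dataset); only the column names mirror the ones used above.

```python
import pandas as pd

# Toy data for illustration only; the values are invented, not taken from the CSV.
toy = pd.DataFrame({
    "Age": [25, 35, 45],
    "Credit amount": [1000, 2000, 6000],
    "Duration": [12, 24, 36],
})

# Select the numeric columns of interest and average them in a single call.
# The result is a Series indexed by column name.
means = toy[["Age", "Credit amount", "Duration"]].mean()

# Round to whole numbers for display.
for name, value in means.items():
    print(f"Mean {name}: {value:.0f}")
```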


In [73]:
# Find the three most common loan purposes

# value_counts() sorts by frequency in descending order, so the first three
# index labels are the most common purposes
top_purposes = data["Purpose"].value_counts().index[:3]
print("The three most common loan purposes were: ", top_purposes[0],
      ", ", top_purposes[1], " and ", top_purposes[2], sep="")

The three most common loan purposes were: car, radio/TV and furniture/equipment
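
Because `value_counts()` returns counts sorted from most to least frequent, the top three categories can be taken with `head(3)`, which also keeps the counts alongside the labels. The sketch below uses an invented toy column, so the ranking it prints says nothing about the real data.

```python
import pandas as pd

# Toy data for illustration only; the values are invented, not taken from the CSV.
toy = pd.Series(
    ["business"] * 4 + ["car"] * 3 + ["radio/TV"] * 2 + ["education"],
    name="Purpose",
)

# Count each category once, then keep the three most frequent.
top_counts = toy.value_counts().head(3)
top_three = top_counts.index.tolist()

for purpose, count in top_counts.items():
    print(f"{purpose}: {count}")
```

If two categories have the same count, which of them appears first should not be relied on, so ties near the cut-off deserve a manual look.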


In [82]:
# Find the majority loan taker
# value_counts() sorts in descending order, so the first entry is the majority group
sex_counts = data["Sex"].value_counts()
print("The majority loan taker was ", sex_counts.index[0], " with ",
      sex_counts.iloc[0], " applications", sep="")

The majority loan taker was male with 690 applications
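
A raw count is easier to interpret next to its share of all applications. One way to get that, sketched here on an invented toy column, is `value_counts(normalize=True)`, which returns proportions instead of counts.

```python
import pandas as pd

# Toy data for illustration only; the values are invented, not taken from the CSV.
toy = pd.Series(["male", "male", "male", "female"], name="Sex")

# normalize=True converts counts into fractions that sum to 1.
shares = toy.value_counts(normalize=True)

# idxmax() returns the label with the largest share.
majority = shares.idxmax()
print(f"{majority}: {shares[majority]:.0%} of applications")
```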
