# German Loan Data

## Instructions

### Banking: Loan Approval Case
In this use case, each entry in the dataset represents a person who takes out a loan from a bank. The learning task is to classify each person as a good or bad credit risk based on their attributes.

You can find the data `german_credit_data.csv` saved under the [data](../data) folder:<br>
NOTE: At this point, **DO NOT check the reference website**
- Look into the data using appropriate functions and extract its fields.
- Describe what the data is about and which fields it contains.
    - Which columns contain continuous variables and which columns contain categorical variables?
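One way to look into the fields is to build a per-column summary of data type, number of distinct values, and number of missing values. The sketch below uses only pandas and NumPy, which the notebook already imports. It runs on a small hypothetical frame made of the first five rows shown further down, not on the full file. The helper name `describe_fields` is illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: the first five rows of the data, typed in by hand
sample = pd.DataFrame({
    "Age": [67, 22, 49, 45, 53],
    "Sex": ["male", "female", "male", "male", "male"],
    "Job": [2, 2, 1, 2, 2],
    "Housing": ["own", "own", "own", "free", "free"],
    "Saving accounts": [np.nan, "little", "little", "little", "little"],
    "Checking account": ["little", "moderate", np.nan, "little", "little"],
    "Credit amount": [1169, 5951, 2096, 7882, 4870],
    "Duration": [6, 48, 12, 42, 24],
    "Purpose": ["radio/TV", "radio/TV", "education", "furniture/equipment", "car"],
})

def describe_fields(df):
    """Return one row per column: dtype, number of distinct values, missing count."""
    return pd.DataFrame({
        "dtype": df.dtypes.astype(str),   # storage type pandas inferred
        "n_unique": df.nunique(),         # distinct non-missing values
        "n_missing": df.isnull().sum(),   # NaN / None entries
    })

summary = describe_fields(sample)
print(sample.shape)  # (rows, columns)
print(summary)
```

Columns with few distinct values are candidates for categorical variables. Applied to the loaded DataFrame, the same helper summarizes all fields at once.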

You need to answer the questions and perform the tasks below:
- What are the mean age, mean credit amount, and mean duration?
- What are the three major purposes of the loans?
- Who are the majority of loan takers, male or female?

Note:
- You are NOT ALLOWED to import other libraries or packages
- You can write your own functions
- Your answers should be readable, with appropriate comments
- You can refer to the [markdown cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) if you are not familiar with Markdown

### Reference
This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit

The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29

## Import libraries 

In [16]:
# Usual libraries are imported here
import os
import yaml
import dask.dataframe as dd
import pandas as pd
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Please perform your tasks below and answer the questions

In [17]:
# read data (path is relative to the notebook; avoiding os.chdir lets the cell be re-run safely)
credit_data = pd.read_csv(os.path.join("..", "data", "german_credit_data.csv"))
credit_data.head(n=5)

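|   | Unnamed: 0 | Age | Sex | Job | Housing | Saving accounts | Checking account | Credit amount | Duration | Purpose |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 67 | male | 2 | own | NaN | little | 1169 | 6 | radio/TV |
| 1 | 1 | 22 | female | 2 | own | little | moderate | 5951 | 48 | radio/TV |
| 2 | 2 | 49 | male | 1 | own | little | NaN | 2096 | 12 | education |
| 3 | 3 | 45 | male | 2 | free | little | little | 7882 | 42 | furniture/equipment |
| 4 | 4 | 53 | male | 2 | free | little | little | 4870 | 24 | car |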
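The `Unnamed: 0` column looks like a row index that was saved into the file. A minimal sketch, assuming the file's first column has an empty header and holds that index, shows how `read_csv(..., index_col=0)` turns it into the DataFrame index instead of a data column. The CSV text is the first three rows above, and `io.StringIO` is used only to keep the sketch self-contained without a file on disk.

```python
import io
import pandas as pd

# Assumed layout: an unnamed first column holding the saved row index
csv_text = """,Age,Sex,Job,Housing,Saving accounts,Checking account,Credit amount,Duration,Purpose
0,67,male,2,own,,little,1169,6,radio/TV
1,22,female,2,own,little,moderate,5951,48,radio/TV
2,49,male,1,own,little,,2096,12,education
"""

# index_col=0 uses the first column as the index instead of a data column
df = pd.read_csv(io.StringIO(csv_text), index_col=0)
print(df.columns.tolist())
print(df)
```

Empty fields are read as NaN, which is how the missing `Saving accounts` and `Checking account` values appear in the output.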


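1. `Unnamed: 0`: a row index with no repeated values; the unique key of each loan record.
2. `Age`: age of the borrower, in years.
3. `Sex`: sex of the borrower (`male` or `female`).
4. `Job`: job category coded as an integer (0 = unskilled and non-resident, 1 = unskilled and resident, 2 = skilled, 3 = highly skilled).
5. `Housing`: housing situation of the borrower (`own`, `rent`, or `free`).
6. `Saving accounts`: balance level of the borrower's savings account (`little`, `moderate`, `quite rich`, `rich`); missing for some borrowers.
7. `Checking account`: balance level of the borrower's checking account (`little`, `moderate`, `rich`); missing for some borrowers.
8. `Credit amount`: amount of credit the borrower takes from the bank (in Deutsche Mark).
9. `Duration`: loan period, in months.
10. `Purpose`: purpose of the loan (e.g. `car`, `radio/TV`, `education`).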

<br>
Continuous (numerical) columns: Age, Credit amount, Duration<br>
Categorical columns: Sex, Job (integer-coded categories), Housing, Saving accounts, Checking account, Purpose<br>
`Unnamed: 0` is a row index rather than a variable.<br>
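Because `Job` is stored as integers, a check on the data type alone would call it continuous. A simple heuristic, sketched below as an assumption rather than a standard rule, treats a numeric column as continuous only when it has more than a chosen number of distinct values. The function `split_columns`, the threshold `max_codes`, and the sample rows are hypothetical; on the full data a larger threshold (for example 10) would be a reasonable starting point.

```python
import pandas as pd

def split_columns(df, max_codes=10, exclude=()):
    """Hypothetical heuristic: numeric columns with more than `max_codes`
    distinct values are continuous; all other columns are categorical."""
    continuous, categorical = [], []
    for col in df.columns:
        if col in exclude:
            continue  # skip index-like columns that are not variables
        is_numeric = pd.api.types.is_numeric_dtype(df[col])
        if is_numeric and df[col].nunique() > max_codes:
            continuous.append(col)
        else:
            categorical.append(col)
    return continuous, categorical

# First five rows of a subset of the columns, typed in by hand
sample = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2, 3, 4],
    "Age": [67, 22, 49, 45, 53],
    "Sex": ["male", "female", "male", "male", "male"],
    "Job": [2, 2, 1, 2, 2],
    "Credit amount": [1169, 5951, 2096, 7882, 4870],
    "Duration": [6, 48, 12, 42, 24],
    "Purpose": ["radio/TV", "radio/TV", "education", "furniture/equipment", "car"],
})

# A small threshold only because the sample has five rows
cont, cat = split_columns(sample, max_codes=3, exclude=("Unnamed: 0",))
print("continuous:", cont)
print("categorical:", cat)
```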


In [41]:
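# data description: number of records (rows) and fields (columns)
n_rows, n_cols = credit_data.shape
print("Number of rows:", n_rows)
print("Number of columns:", n_cols)

# count missing values per column (only displayed if it is the last line of a cell)
missing_counts = credit_data.isnull().sum()
# fill missing values with 'little' (treats a missing account as a low balance)
loan_df = credit_data.fillna('little')

Number of rows: 1000
Number of columns: 10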
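`fillna('little')` replaces every missing value in every column with the same label. A minimal alternative, sketched on a hypothetical three-row sample, fills only the two account columns by passing a dict to `DataFrame.fillna`. The label `"unknown"` is an illustrative choice that keeps "no information" separate from a genuine `little` balance; whether that is preferable depends on the analysis.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mirroring the first three rows of the data
sample = pd.DataFrame({
    "Age": [67, 22, 49],
    "Saving accounts": [np.nan, "little", "little"],
    "Checking account": ["little", "moderate", np.nan],
})

# Missing values per column, as a Series indexed by column name
missing = sample.isnull().sum()
print(missing)

# Fill only the account columns, leaving any other column untouched
filled = sample.fillna({"Saving accounts": "unknown", "Checking account": "unknown"})
print(filled)
```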


In [43]:
# mean age, mean duration, and mean credit amount
mean_age = loan_df['Age'].mean()
mean_duration = loan_df['Duration'].mean()
mean_credit_amount = loan_df['Credit amount'].mean()
print("Mean age:", mean_age)
print("Mean duration:", mean_duration)
print("Mean credit amount:", mean_credit_amount)

Mean age: 35.546
Mean duration: 20.903
Mean credit amount: 3271.258
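Since the notes allow writing your own functions, the same means can be computed in plain Python without any import. The sketch below defines a small `mean` helper (a hypothetical name) and applies it to the first five rows shown earlier, so its results differ from the full-data means above.

```python
def mean(values):
    """Arithmetic mean of a non-empty sequence of numbers."""
    if len(values) == 0:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)

# First five rows only (a subset, not the full data)
ages = [67, 22, 49, 45, 53]
credit_amounts = [1169, 5951, 2096, 7882, 4870]
durations = [6, 48, 12, 42, 24]

print(mean(ages))            # 47.2
print(mean(credit_amounts))  # 4393.6
print(mean(durations))       # 26.4
```

Called as `mean(loan_df['Age'].tolist())`, it should agree with `loan_df['Age'].mean()` up to floating-point rounding.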


In [44]:
loan_df['Purpose'].value_counts()

car                    337
radio/TV               280
furniture/equipment    181
business                97
education               59
repairs                 22
vacation/others         12
domestic appliances     12
Name: Purpose, dtype: int64

Top three loan purposes: 1. car, 2. radio/TV, 3. furniture/equipment
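The top three can also be selected programmatically, together with each purpose's share of all loans. The sketch below rebuilds the counts from the `value_counts()` output above as a pandas Series and uses `Series.nlargest`; on the DataFrame itself, `loan_df['Purpose'].value_counts().head(3)` and `value_counts(normalize=True)` give the same ranking and shares.

```python
import pandas as pd

# Counts copied from the value_counts() output above
purpose_counts = pd.Series({
    "car": 337, "radio/TV": 280, "furniture/equipment": 181, "business": 97,
    "education": 59, "repairs": 22, "vacation/others": 12, "domestic appliances": 12,
})

top3 = purpose_counts.nlargest(3)       # three largest counts, descending
shares = top3 / purpose_counts.sum()    # fraction of all loans per purpose
print(top3.index.tolist())
print(shares.round(3))
print(round(float(shares.sum()), 3))    # combined share of the top three
```

With 1000 loans in total, the three largest purposes account for 798 of them, roughly 80%.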

In [None]:
# count borrowers by sex to find the majority
loan_df['Sex'].value_counts()
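Following the note that you may write your own functions, the majority question can also be answered with a plain-Python count. The helper `majority` is hypothetical, and the sample is the `Sex` column of the first five rows only, so its share is not the full-data result.

```python
def majority(labels):
    """Return (most frequent label, its share) using a plain dict count."""
    counts = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    # max() keeps the first label seen if two labels tie
    winner = max(counts, key=counts.get)
    return winner, counts[winner] / len(labels)

# Sex column of the first five rows (a subset, not the full data)
sexes = ["male", "female", "male", "male", "male"]
print(majority(sexes))  # ('male', 0.8)
```

For the full data, `majority(loan_df['Sex'].tolist())` gives the majority sex and its share directly.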