# German Loan Data

## Instructions

### Banking: Loan Approval Case
In this use case, each entry in the dataset represents a person who takes a credit loan from a bank. The learning task is to classify each person as either a good or bad credit risk according to the set of attributes.

You can find the data file `german_credit_data.csv` in the [data](../data) folder:<br>
NOTE: At this point, **DO NOT check the reference website**
- Look into the data using appropriate functions and extract the fields in the data.
- For each dataset, describe what the data is about and which fields it contains.
    - Which columns contain continuous variables and which columns contain categorical variables?

You need to answer the questions and perform the tasks below:
- What are the mean age, mean credit amount, and mean duration?
- What are the three most common loan purposes?
- Who takes the majority of loans, male or female?

Note:
- You are NOT ALLOWED to import other libraries or packages
- You can write your own functions
- Your answers should be readable, with appropriate comments
- You can refer to the [markdown cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) if you are not familiar with Markdown

### Reference
This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit

The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29

## Import libraries 

In [2]:
# Usual libraries are imported here
import os
import yaml
import dask.dataframe as dd
import pandas as pd
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Please perform your tasks below and answer the questions

### 0. Read data

In [4]:
# Relative path to the data folder mentioned in the instructions
df = pd.read_csv('../data/german_credit_data.csv')

In [5]:
df.head()

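| | Unnamed: 0 | Age | Sex | Job | Housing | Saving accounts | Checking account | Credit amount | Duration | Purpose |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 67 | male | 2 | own | NaN | little | 1169 | 6 | radio/TV |
| 1 | 1 | 22 | female | 2 | own | little | moderate | 5951 | 48 | radio/TV |
| 2 | 2 | 49 | male | 1 | own | little | NaN | 2096 | 12 | education |
| 3 | 3 | 45 | male | 2 | free | little | little | 7882 | 42 | furniture/equipment |
| 4 | 4 | 53 | male | 2 | free | little | little | 4870 | 24 | car |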


In [8]:
df.columns

Index(['Unnamed: 0', 'Age', 'Sex', 'Job', 'Housing', 'Saving accounts',
       'Checking account', 'Credit amount', 'Duration', 'Purpose'],
      dtype='object')
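
The instructions ask which columns are continuous and which are categorical. One way to check this is sketched below, using only the five rows printed by `df.head()` rather than the full file. The names `sample` and `assumed_codes` are illustrative. Treating `Job` as a categorical code is an assumption based on its small integer values; the data alone does not confirm it.

```python
import pandas as pd

# The five rows shown by df.head(); None marks the missing entries.
sample = pd.DataFrame({
    "Age": [67, 22, 49, 45, 53],
    "Sex": ["male", "female", "male", "male", "male"],
    "Job": [2, 2, 1, 2, 2],
    "Housing": ["own", "own", "own", "free", "free"],
    "Saving accounts": [None, "little", "little", "little", "little"],
    "Checking account": ["little", "moderate", None, "little", "little"],
    "Credit amount": [1169, 5951, 2096, 7882, 4870],
    "Duration": [6, 48, 12, 42, 24],
    "Purpose": ["radio/TV", "radio/TV", "education",
                "furniture/equipment", "car"],
})

# Split columns by dtype: numeric columns are candidates for continuous
# variables, everything else is treated as categorical.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

# ASSUMPTION: 'Job' is an integer code for a category, not a measurement.
assumed_codes = ["Job"]
continuous_cols = [c for c in numeric_cols if c not in assumed_codes]
categorical_cols = categorical_cols + assumed_codes

# Count missing values per column to see which fields are incomplete.
missing = sample.isna().sum()

print("continuous: ", continuous_cols)
print("categorical:", categorical_cols)
print(missing[missing > 0])
```

On the real data, the same calls can be applied to `df` directly. `df.dtypes` and `df.nunique()` are also useful for spotting integer columns that are really codes.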

### 1. What are the mean age, mean credit amount, and mean duration?

In [13]:
avg_age = df['Age'].mean()
avg_credit_amount = df['Credit amount'].mean()
avg_duration = df['Duration'].mean()

print(f'mean age: {avg_age}\nmean credit amount: {avg_credit_amount}\nmean duration: {avg_duration}')

mean age: 35.546
mean credit amount: 3271.258
mean duration: 20.903
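
The three means can also be computed in a single call by selecting the columns together. The sketch below uses only the five rows printed by `df.head()`, so its numbers differ from the full-data means reported above. `sample` and `means` are illustrative names.

```python
import pandas as pd

# Numeric columns from the five rows shown by df.head().
sample = pd.DataFrame({
    "Age": [67, 22, 49, 45, 53],
    "Credit amount": [1169, 5951, 2096, 7882, 4870],
    "Duration": [6, 48, 12, 42, 24],
})

# DataFrame.mean() returns one mean per selected column as a Series.
means = sample[["Age", "Credit amount", "Duration"]].mean().round(2)

for name, value in means.items():
    print(f"mean {name.lower()}: {value}")
```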


### 2. What are the three most common loan purposes?

In [26]:
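# value_counts() sorts by frequency, so the first three labels are the most common
pur = list(df['Purpose'].value_counts()[:3].index)

print(f"The three most common loan purposes are '{pur[0]}', '{pur[1]}', and '{pur[2]}'.")

The three most common loan purposes are 'car', 'radio/TV', and 'furniture/equipment'.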
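
It can help to report how many loans fall under each purpose, not just the labels. The sketch below shows the pattern on a small made-up list of purposes. The counts are hypothetical and chosen only so the ordering is unambiguous; they are not taken from the dataset.

```python
import pandas as pd

# HYPOTHETICAL toy data: these counts are invented for illustration.
purposes = pd.Series(
    ["car"] * 4
    + ["radio/TV"] * 3
    + ["furniture/equipment"] * 2
    + ["education"] * 1,
    name="Purpose",
)

# value_counts() sorts in descending order of frequency; head(3) keeps the top three.
top3 = purposes.value_counts().head(3)

# normalize=True returns shares of the total instead of raw counts.
top3_share = purposes.value_counts(normalize=True).head(3)

for label in top3.index:
    print(f"{label}: {top3[label]} loans ({top3_share[label]:.0%})")
```

On the real data, replacing `purposes` with `df['Purpose']` should give the corresponding counts and shares.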


### 3. Who takes the majority of loans, male or female?

In [27]:
df['Sex'].value_counts()

male      690
female    310
Name: Sex, dtype: int64

In [28]:
print('Males are the majority of loan takers.')

Males are the majority of loan takers.
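
The answer can also be derived from the counts rather than typed by hand. This sketch rebuilds the `value_counts()` result shown above as a Series. `counts`, `majority`, and `share` are illustrative names.

```python
import pandas as pd

# Counts copied from the df['Sex'].value_counts() output above.
counts = pd.Series({"male": 690, "female": 310}, name="Sex")

# idxmax() returns the label with the largest count.
majority = counts.idxmax()

# Dividing by the total gives each group's share of all loan takers.
share = counts / counts.sum()

print(f"{majority} is the majority ({share[majority]:.0%} of loan takers)")
```

On the real data, `df['Sex'].value_counts(normalize=True)` should give the same shares directly.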
