# German Loan Data

## Instructions

### Banking: Loan Approval Case
In this use case, each entry in the dataset represents a person who takes a credit loan from a bank. The learning task is to classify each person as either a good or bad credit risk according to the set of attributes.

You can find the data file `german_credit_data.csv` in the [data](../data) folder.<br>
NOTE: At this point, **DO NOT check the reference website**.
- Inspect the data using appropriate functions and extract its fields.
- Describe what the data is about and which fields it contains.
    - Which columns contain continuous variables and which columns contain categorical variables?

You need to answer the questions and perform the tasks below:
- What are the mean age, mean credit amount, and mean duration?
- What are the three most common loan purposes?
- Who takes the majority of loans, males or females?

Note:
- You are NOT ALLOWED to import any other library or package
- You can write your own functions
- Your answers should be readable, with appropriate comments
- You can refer to the [markdown cheatsheet](https://github.com/adam-p/markdown-here/wiki/Markdown-Cheatsheet) if you are not familiar with Markdown

### Reference
This dataset was sourced from Kaggle: https://www.kaggle.com/uciml/german-credit

The original source is: https://archive.ics.uci.edu/ml/datasets/Statlog+%28German+Credit+Data%29

## Import libraries 

In [2]:
# Usual libraries are imported here
import os
import yaml
import dask.dataframe as dd
import pandas as pd
import matplotlib
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

## Please perform your tasks below and answer the questions

 - This part is to describe the Data.

In [3]:
german_credit_data = pd.read_csv('../data/german_credit_data.csv')
german_credit_data.info()
german_credit_data.head(10)

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 10 columns):
 #   Column            Non-Null Count  Dtype 
---  ------            --------------  ----- 
 0   Unnamed: 0        1000 non-null   int64 
 1   Age               1000 non-null   int64 
 2   Sex               1000 non-null   object
 3   Job               1000 non-null   int64 
 4   Housing           1000 non-null   object
 5   Saving accounts   817 non-null    object
 6   Checking account  606 non-null    object
 7   Credit amount     1000 non-null   int64 
 8   Duration          1000 non-null   int64 
 9   Purpose           1000 non-null   object
dtypes: int64(5), object(5)
memory usage: 78.2+ KB


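| | Unnamed: 0 | Age | Sex | Job | Housing | Saving accounts | Checking account | Credit amount | Duration | Purpose |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 67 | male | 2 | own | NaN | little | 1169 | 6 | radio/TV |
| 1 | 1 | 22 | female | 2 | own | little | moderate | 5951 | 48 | radio/TV |
| 2 | 2 | 49 | male | 1 | own | little | NaN | 2096 | 12 | education |
| 3 | 3 | 45 | male | 2 | free | little | little | 7882 | 42 | furniture/equipment |
| 4 | 4 | 53 | male | 2 | free | little | little | 4870 | 24 | car |
| 5 | 5 | 35 | male | 1 | free | NaN | NaN | 9055 | 36 | education |
| 6 | 6 | 53 | male | 2 | own | quite rich | NaN | 2835 | 24 | furniture/equipment |
| 7 | 7 | 35 | male | 3 | rent | little | moderate | 6948 | 36 | car |
| 8 | 8 | 61 | male | 1 | own | rich | NaN | 3059 | 12 | radio/TV |
| 9 | 9 | 28 | male | 3 | own | little | moderate | 5234 | 30 | car |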


The dataset `german_credit_data.csv` has 1000 entries and 10 columns. The columns hold each person's row index, age, sex, job classification, housing status (such as rent or own), savings account status, checking account status, credit amount, loan duration, and loan purpose. Two columns contain missing values: `Saving accounts` (817 non-null) and `Checking account` (606 non-null).

Continuous variable columns: Age, Credit amount, Duration

Categorical variable columns: Sex, Job (integer-coded classes), Housing, Saving accounts, Checking account, Purpose

`Unnamed: 0` is a row index, not a variable describing the person.
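The split between numeric and text columns can also be read off programmatically. The following is a minimal sketch, assuming a small hand-built `sample` frame (the first five rows of the `head()` output, without the index column) in place of the full file. Note that dtype alone is not enough: `Job` is stored as an integer but encodes a class, so it still has to be moved to the categorical group by hand.

```python
import pandas as pd

# Illustrative sample: first five rows from the head() output,
# typed in by hand so the sketch runs without the CSV file.
sample = pd.DataFrame({
    "Age": [67, 22, 49, 45, 53],
    "Sex": ["male", "female", "male", "male", "male"],
    "Job": [2, 2, 1, 2, 2],
    "Housing": ["own", "own", "own", "free", "free"],
    "Saving accounts": [None, "little", "little", "little", "little"],
    "Checking account": ["little", "moderate", None, "little", "little"],
    "Credit amount": [1169, 5951, 2096, 7882, 4870],
    "Duration": [6, 48, 12, 42, 24],
    "Purpose": ["radio/TV", "radio/TV", "education",
                "furniture/equipment", "car"],
})

# Columns stored as numbers (includes the integer-coded Job column).
numeric_cols = sample.select_dtypes(include="number").columns.tolist()

# Columns stored as text; these are categorical.
text_cols = sample.select_dtypes(exclude="number").columns.tolist()

# Number of missing values per column.
missing_per_column = sample.isna().sum()

print("numeric:", numeric_cols)
print("text:", text_cols)
print(missing_per_column[missing_per_column > 0])
```

On the full dataset the same calls would be applied to `german_credit_data` instead of `sample`.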

 - This part is to answer the Questions.

What are the mean age, mean credit amount, and mean duration?

In [5]:
print('mean age:', np.mean(german_credit_data['Age']))
print('mean credit amount:', np.mean(german_credit_data['Credit amount']))
print('mean duration:', np.mean(german_credit_data['Duration']))

mean age: 35.546
mean credit amount: 3271.258
mean duration: 20.903
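The three means can also be computed in a single call by selecting the columns first. This is a sketch under the assumption of a small hand-built `sample` frame (the first five rows of the `head()` output), so its numbers differ from the full-dataset means printed above.

```python
import pandas as pd

# Illustrative sample: first five rows from the head() output.
sample = pd.DataFrame({
    "Age": [67, 22, 49, 45, 53],
    "Credit amount": [1169, 5951, 2096, 7882, 4870],
    "Duration": [6, 48, 12, 42, 24],
})

# DataFrame.mean() returns one mean per selected column as a Series.
means = sample[["Age", "Credit amount", "Duration"]].mean()
print(means)
```

With the full data, replacing `sample` by `german_credit_data` should reproduce the three values above.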


What are the three most common loan purposes?

In [12]:
purpose = german_credit_data[['Unnamed: 0', 'Purpose']].groupby('Purpose').count()
purpose.sort_values('Unnamed: 0', ascending=False)[0:3]

| Purpose | Unnamed: 0 |
|---|---|
| car | 337 |
| radio/TV | 280 |
| furniture/equipment | 181 |
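An alternative to `groupby(...).count()` is `Series.value_counts()`, which counts and sorts in one step and does not depend on the `Unnamed: 0` helper column. A minimal sketch, assuming a hand-built `purposes` Series that holds only the ten `Purpose` values from the `head()` output (so its counts are not the full-dataset counts):

```python
import pandas as pd

# Illustrative sample: the Purpose column of the first ten rows.
purposes = pd.Series(
    ["radio/TV", "radio/TV", "education", "furniture/equipment", "car",
     "education", "furniture/equipment", "car", "radio/TV", "car"],
    name="Purpose",
)

# value_counts() sorts by frequency in descending order by default,
# so head(3) keeps the three most frequent purposes.
top_three = purposes.value_counts().head(3)
print(top_three)
```

On the full data this would read `german_credit_data['Purpose'].value_counts().head(3)`.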


Who takes the majority of loans, males or females?

In [19]:
sex = german_credit_data[['Unnamed: 0', 'Sex']].groupby('Sex').count()
sex

| Sex | Unnamed: 0 |
|---|---|
| female | 310 |
| male | 690 |


The majority of loan takers are male: 690 of the 1000 entries, versus 310 female.
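The answer can be stated as a share rather than a raw count. The sketch below assumes a stand-in `sex` Series rebuilt from the counts in the table above (690 male, 310 female) rather than read from the file.

```python
import pandas as pd

# Stand-in for the Sex column, rebuilt from the counts reported above.
sex = pd.Series(["male"] * 690 + ["female"] * 310, name="Sex")

# normalize=True returns relative frequencies instead of counts.
shares = sex.value_counts(normalize=True)

# idxmax() returns the label with the largest share.
majority = shares.idxmax()
print(majority, round(shares[majority], 2))
```

On the full data the same two calls would be applied to `german_credit_data['Sex']`.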