# Statlog (German Credit Data) Data Set
## Description
This dataset classifies people, each described by a set of attributes, as good or bad credit risks.

The column attributes are as follows:

| Id | Attribute | Domain |
| -   | ----------- |----------- |
| 1   | Status of existing checking account | A11 : ... < 0 DM, A12 : 0 <= ... < 200 DM, A13 : ... >= 200 DM / salary assignments for at least 1 year, A14 : no checking account |
| 2   | Duration in months | numerical |
| 3   | Credit history | A30 : no credits taken/ all credits paid back duly, A31 : all credits at this bank paid back duly, A32 : existing credits paid back duly till now, A33 : delay in paying off in the past, A34 : critical account/ other credits existing (not at this bank) |
| 4   | Purpose | A40 : car (new), A41 : car (used), A42 : furniture/equipment, A43 : radio/television, A44 : domestic appliances, A45 : repairs, A46 : education, A47 : (vacation - does not exist?), A48 : retraining, A49 : business, A410 : others |
| 5   | Credit amount | numerical  |
| 6   | Savings account/bonds | A61 : ... < 100 DM, A62 : 100 <= ... < 500 DM, A63 : 500 <= ... < 1000 DM, A64 : ... >= 1000 DM, A65 : unknown/ no savings account |
| 7   | Present employment since | A71 : unemployed, A72 : ... < 1 year, A73 : 1 <= ... < 4 years, A74 : 4 <= ... < 7 years, A75 : ... >= 7 years |
| 8   | Installment rate in percentage of disposable income | numerical |
| 9   | Personal status and sex | A91 : male : divorced/separated, A92 : female : divorced/separated/married, A93 : male : single, A94 : male : married/widowed, A95 : female : single |
| 10   | Other debtors/guarantors | A101 : none, A102 : co-applicant, A103 : guarantor |
| 11   | Present residence since | numerical |
| 12   | Property | A121 : real estate, A122 : if not A121 : building society savings agreement/ life insurance, A123 : if not A121/A122 : car or other, not in attribute 6, A124 : unknown / no property |
| 13   | Age in years | numerical |
| 14   | Other installment plans | A141 : bank, A142 : stores, A143 : none |
| 15   | Housing | A151 : rent, A152 : own, A153 : for free |
| 16   | Number of existing credits at this bank | numerical |
| 17   | Job | A171 : unemployed/ unskilled - non-resident, A172 : unskilled - resident, A173 : skilled employee / official, A174 : management/ self-employed/ highly qualified employee/ officer |
| 18   | Number of people being liable to provide maintenance for | numerical |
| 19   | Telephone | A191 : none, A192 : yes, registered under the customer's name |
| 20   | Foreign worker | A201 : yes, A202 : no |
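
The codes in the table can be turned back into readable labels with a lookup table. The sketch below is only an illustration: the column names (`checking_status`, `housing`), the helper `decode_codes`, and the two sample rows are assumptions and do not come from the dataset files. It applies to the categorical form of the data, where values appear as codes such as `A11`.

```python
import pandas as pd

# Partial code-to-label lookup copied from the attribute table above.
# The column names used as keys are hypothetical.
CODE_LABELS = {
    "checking_status": {
        "A11": "< 0 DM",
        "A12": "0 <= ... < 200 DM",
        "A13": ">= 200 DM / salary assignments for at least 1 year",
        "A14": "no checking account",
    },
    "housing": {"A151": "rent", "A152": "own", "A153": "for free"},
}


def decode_codes(frame: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of frame with known attribute codes replaced by labels."""
    decoded = frame.copy()
    for column, labels in CODE_LABELS.items():
        if column in decoded.columns:
            decoded[column] = decoded[column].map(labels)
    return decoded


# Two made-up rows, for illustration only (not taken from the dataset).
sample = pd.DataFrame(
    {"checking_status": ["A11", "A14"], "housing": ["A152", "A151"]}
)
decoded_sample = decode_codes(sample)
```

Codes missing from the lookup become `NaN` under `Series.map`, which makes an incomplete mapping easy to spot.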

## Importing and processing the dataset

In [8]:
import os                        # for os.path.exists
import json                      # for loading metadata
import urllib.request            # for downloading remote files
import numpy as np
import pandas as pd

In [6]:
def download(remoteurl: str, localfile: str):
    """
    Download remoteurl to ../../datasets/classification/<localfile>,
    unless that file already exists.
    Returns the full local path as a string.
    """
    localfile = "../../datasets/classification/" + localfile
    if not os.path.exists(localfile):
        print("Downloading %s..." % localfile)
        urllib.request.urlretrieve(remoteurl, localfile)
    return localfile
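
The function above assumes the target folder already exists. A possible variant, sketched below, creates the folder on first use. The name `download_to` and its `base_dir` parameter are hypothetical, and the demo uses a temporary `file://` URL in place of a real remote address so that it runs offline.

```python
import os
import tempfile
import urllib.request
from pathlib import Path


def download_to(remoteurl: str, localfile: str, base_dir: str) -> str:
    """Download remoteurl into base_dir/localfile unless it already exists."""
    target = os.path.join(base_dir, localfile)
    if not os.path.exists(target):
        os.makedirs(base_dir, exist_ok=True)  # create the folder on first use
        urllib.request.urlretrieve(remoteurl, target)
    return target


# Offline check: a file:// URL stands in for the real remote URL.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "source.txt"
    source.write_text("1 2 3\n")
    saved = download_to(
        source.as_uri(),
        "copy.txt",
        os.path.join(tmp, "datasets", "classification"),
    )
    saved_text = Path(saved).read_text()
```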

In [10]:
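# german.data-numeric is the all-numeric variant of the dataset: each row has
# 24 numerical attributes rather than the 20 mixed attributes in the table above.
url = "https://archive.ics.uci.edu/ml/machine-learning-databases/statlog/german/german.data-numeric"
data = np.loadtxt(download(url, "german-data"))
y = data[:, -1].astype(np.int32)   # labels: last column (1 = good, 2 = bad)
X = data[:, :-1].astype(np.int32)  # features: all columns except the last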
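
Before modelling, it can help to confirm the array shapes and the class balance. The following is a minimal sketch of that check. The helper name `split_features_labels` is hypothetical, and `demo` is a small made-up array standing in for the downloaded file; on the real data the same check should report 1000 rows, with label 1 (good) more frequent than label 2 (bad).

```python
import numpy as np


def split_features_labels(data: np.ndarray):
    """Split a 2-D array into features (all but last column) and labels (last column)."""
    X = data[:, :-1].astype(np.int32)
    y = data[:, -1].astype(np.int32)
    return X, y


# Made-up stand-in with 3 feature columns and a label column (1 = good, 2 = bad).
demo = np.array(
    [
        [6, 4, 12, 1],
        [48, 2, 60, 2],
        [12, 4, 21, 1],
        [24, 3, 39, 2],
        [30, 1, 10, 1],
    ],
    dtype=float,
)
X_demo, y_demo = split_features_labels(demo)

# Count how many rows fall into each class.
labels, counts = np.unique(y_demo, return_counts=True)
class_counts = dict(zip(labels.tolist(), counts.tolist()))
```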

## Importing libraries and splitting the data

In [12]:
import matplotlib as mpl
import matplotlib.pyplot as plt
import sklearn
import sklearn.tree
import sklearn.neighbors
import sklearn.ensemble
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

In [14]:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)
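
Because the two classes are not equally frequent, one option is to pass `stratify` so that the train and test sets keep the same class ratio. The sketch below is not the notebook's own method: it uses synthetic random data with the same shape as the numeric dataset (1000 rows, 24 features, 700/300 labels), and the decision tree settings are arbitrary. The variable names with the `_syn`, `_tr`, and `_te` suffixes are assumptions chosen to avoid clashing with the notebook's variables.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the real data: random features, 700 "good" / 300 "bad".
rng = np.random.default_rng(0)
X_syn = rng.integers(0, 5, size=(1000, 24))
y_syn = np.array([1] * 700 + [2] * 300)

# stratify=y_syn keeps the class ratio (about) the same in both splits.
X_tr, X_te, y_tr, y_te = train_test_split(
    X_syn, y_syn, test_size=0.3, random_state=0, stratify=y_syn
)

# Fit a shallow tree as a simple baseline and score it on the held-out split.
clf = DecisionTreeClassifier(max_depth=3, random_state=0)
clf.fit(X_tr, y_tr)
test_accuracy = clf.score(X_te, y_te)
```

With random features the accuracy carries no meaning; the point is only the shape of the workflow.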