# Notebook for milestone 1 of the credit scoring project

## Objective

This notebook performs an exploratory analysis of the credit scoring data.

## Data

The data is the Statlog (German Credit Data) dataset from the UCI Machine Learning Repository: https://archive.ics.uci.edu/dataset/144/statlog+german+credit+data



In [1]:
!pip install ucimlrepo

Collecting ucimlrepo
  Downloading ucimlrepo-0.0.7-py3-none-any.whl.metadata (5.5 kB)
Downloading ucimlrepo-0.0.7-py3-none-any.whl (8.0 kB)
Installing collected packages: ucimlrepo
Successfully installed ucimlrepo-0.0.7


In [16]:
from ucimlrepo import fetch_ucirepo

# Fetch the Statlog (German Credit Data) dataset (UCI id 144)
statlog_german_credit_data = fetch_ucirepo(id=144)

# Features and targets as pandas DataFrames
X = statlog_german_credit_data.data.features
y = statlog_german_credit_data.data.targets

# Variable information and dataset metadata
print(statlog_german_credit_data.variables)
print(statlog_german_credit_data.metadata)



           name     role         type     demographic  \
0    Attribute1  Feature  Categorical            None   
1    Attribute2  Feature      Integer            None   
2    Attribute3  Feature  Categorical            None   
3    Attribute4  Feature  Categorical            None   
4    Attribute5  Feature      Integer            None   
5    Attribute6  Feature  Categorical            None   
6    Attribute7  Feature  Categorical           Other   
7    Attribute8  Feature      Integer            None   
8    Attribute9  Feature  Categorical  Marital Status   
9   Attribute10  Feature  Categorical            None   
10  Attribute11  Feature      Integer            None   
11  Attribute12  Feature  Categorical            None   
12  Attribute13  Feature      Integer             Age   
13  Attribute14  Feature  Categorical            None   
14  Attribute15  Feature  Categorical           Other   
15  Attribute16  Feature      Integer            None   
16  Attribute17  Feature  Categ

In [17]:
statlog_german_credit_data.variables.description

0                   Status of existing checking account
1                                              Duration
2                                        Credit history
3                                               Purpose
4                                         Credit amount
5                                 Savings account/bonds
6                              Present employment since
7     Installment rate in percentage of disposable i...
8                               Personal status and sex
9                            Other debtors / guarantors
10                              Present residence since
11                                             Property
12                                                  Age
13                              Other installment plans
14                                              Housing
15              Number of existing credits at this bank
16                                                  Job
17    Number of people being liable to provide m
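The descriptions above are a natural starting point for readable column names. The following is a minimal sketch, assuming a hypothetical helper `to_snake_case`; it is not necessarily how the project's `attribute_column_names` was built, and it uses only descriptions that appear complete in the output above.

```python
import re


def to_snake_case(description: str) -> str:
    """Lowercase the text and collapse runs of non-alphanumerics into '_'."""
    return re.sub(r"[^a-z0-9]+", "_", description.lower()).strip("_")


# A few complete descriptions copied from the output above.
descriptions = {
    "Attribute1": "Status of existing checking account",
    "Attribute2": "Duration",
    "Attribute6": "Savings account/bonds",
    "Attribute10": "Other debtors / guarantors",
}

# Hypothetical rename mapping: original attribute name -> snake_case name.
column_names = {attr: to_snake_case(text) for attr, text in descriptions.items()}
print(column_names)
```

Names generated this way tend to be long, so a hand-curated mapping with shorter names may be preferable in practice.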

In [18]:
X.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1000 entries, 0 to 999
Data columns (total 20 columns):
 #   Column       Non-Null Count  Dtype 
---  ------       --------------  ----- 
 0   Attribute1   1000 non-null   object
 1   Attribute2   1000 non-null   int64 
 2   Attribute3   1000 non-null   object
 3   Attribute4   1000 non-null   object
 4   Attribute5   1000 non-null   int64 
 5   Attribute6   1000 non-null   object
 6   Attribute7   1000 non-null   object
 7   Attribute8   1000 non-null   int64 
 8   Attribute9   1000 non-null   object
 9   Attribute10  1000 non-null   object
 10  Attribute11  1000 non-null   int64 
 11  Attribute12  1000 non-null   object
 12  Attribute13  1000 non-null   int64 
 13  Attribute14  1000 non-null   object
 14  Attribute15  1000 non-null   object
 15  Attribute16  1000 non-null   int64 
 16  Attribute17  1000 non-null   object
 17  Attribute18  1000 non-null   int64 
 18  Attribute19  1000 non-null   object
 19  Attribute20  1000 non-null  
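Since `X.info()` shows a mix of `object` and `int64` columns, it can help to split the columns by type before exploring them. A minimal sketch on a toy frame follows (the values are taken from the first rows of the dataset, and the variable names `numeric_cols` and `categorical_cols` are illustrative only):

```python
import pandas as pd

# Toy frame mimicking a few of the dataset's columns.
sample = pd.DataFrame(
    {
        "Attribute1": ["A11", "A12", "A14"],   # categorical codes
        "Attribute2": [6, 48, 12],             # duration in months
        "Attribute3": ["A34", "A32", "A34"],   # categorical codes
        "Attribute5": [1169, 5951, 2096],      # credit amount
    }
)

# Numeric columns, and everything else treated as categorical.
numeric_cols = sample.select_dtypes(include="number").columns.tolist()
categorical_cols = sample.select_dtypes(exclude="number").columns.tolist()

print("numeric:", numeric_cols)
print("categorical:", categorical_cols)
```

The same two calls can be applied to `X` directly; numeric columns usually call for `describe()` and histograms, categorical ones for `value_counts()`.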

In [29]:
print(statlog_german_credit_data.metadata['additional_info']['variable_info'])

Attribute 1:  (qualitative)      
 Status of existing checking account
             A11 :      ... <    0 DM
	       A12 : 0 <= ... <  200 DM
	       A13 :      ... >= 200 DM / salary assignments for at least 1 year
               A14 : no checking account

Attribute 2:  (numerical)
	      Duration in month

Attribute 3:  (qualitative)
	      Credit history
	      A30 : no credits taken/ all credits paid back duly
              A31 : all credits at this bank paid back duly
	      A32 : existing credits paid back duly till now
              A33 : delay in paying off in the past
	      A34 : critical account/  other credits existing (not at this bank)

Attribute 4:  (qualitative)
	      Purpose
	      A40 : car (new)
	      A41 : car (used)
	      A42 : furniture/equipment
	      A43 : radio/television
	      A44 : domestic appliances
	      A45 : repairs
	      A46 : education
	      A47 : (vacation - does not exist?)
	      A48 : retraining
	      A49 : business
	      A410 : others

A
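The `variable_info` text follows a regular enough pattern (`A<digits> : label`) that the code-to-label dictionary can be extracted programmatically. The sketch below is one possible way to do that, assuming a hypothetical function `parse_code_labels`; the project's own `utils` module may well have been written by hand instead.

```python
import re

# Excerpt of the variable_info text shown above (irregular whitespace kept).
variable_info = """
Attribute 1:  (qualitative)
 Status of existing checking account
             A11 :      ... <    0 DM
\t       A12 : 0 <= ... <  200 DM
\t       A13 :      ... >= 200 DM / salary assignments for at least 1 year
               A14 : no checking account

Attribute 2:  (numerical)
\t      Duration in month
"""

# A code line looks like "A11 : label"; "Attribute 1:" does not match
# because the pattern requires digits immediately after the leading "A".
CODE_LINE = re.compile(r"^\s*(A\d+)\s*:\s*(.+?)\s*$")


def parse_code_labels(text: str) -> dict:
    """Return {code: label}, with runs of whitespace in labels collapsed."""
    labels = {}
    for line in text.splitlines():
        match = CODE_LINE.match(line)
        if match:
            code, label = match.groups()
            labels[code] = " ".join(label.split())
    return labels


code_labels = parse_code_labels(variable_info)
print(code_labels)
```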

## Using the metadata to transform variables and column names

In [46]:
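# Project-local helpers: code-to-label mapping and readable column names
from utils import attribute_mappings, attribute_column_names

# Only the last expression in a cell is displayed
attribute_mappings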

{'A11': '< 0 DM',
 'A12': '0 <= ... < 200 DM',
 'A13': '... >= 200 DM / salary assignments for at least 1 year',
 'A14': 'no checking account',
 'A30': 'no credits taken/ all credits paid back duly',
 'A31': 'all credits at this bank paid back duly',
 'A32': 'existing credits paid back duly till now',
 'A33': 'delay in paying off in the past',
 'A34': 'critical account/ other credits existing (not at this bank)',
 'A40': 'car (new)',
 'A41': 'car (used)',
 'A42': 'furniture/equipment',
 'A43': 'radio/television',
 'A44': 'domestic appliances',
 'A45': 'repairs',
 'A46': 'education',
 'A47': '(vacation - does not exist?)',
 'A48': 'retraining',
 'A49': 'business',
 'A410': 'others',
 'A61': '... < 100 DM',
 'A62': '100 <= ... < 500 DM',
 'A63': '500 <= ... < 1000 DM',
 'A64': '... >= 1000 DM',
 'A65': 'unknown/ no savings account',
 'A71': 'unemployed',
 'A72': '... < 1 year',
 'A73': '1 <= ... < 4 years',
 'A74': '4 <= ... < 7 years',
 'A75': '... >= 7 years',
 'A91': 'male: divorced/s

In [47]:
# Decode the category codes and rename the columns on a new DataFrame.
# Reassigning X, rather than using inplace=True on the frame returned by
# ucimlrepo, avoids pandas' SettingWithCopyWarning.
X = X.replace(attribute_mappings).rename(columns=attribute_column_names)
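One detail worth checking in this decoding step is that `DataFrame.replace` with plain (non-regex) values matches whole cell values, so a code such as `A41` does not corrupt `A410`. The sketch below illustrates this on a toy frame; `raw`, `code_labels`, and `new_names` are illustrative stand-ins, not the project's actual `utils` contents.

```python
import pandas as pd

# Toy frame using the original attribute names and codes from the metadata.
raw = pd.DataFrame(
    {
        "Attribute4": ["A41", "A410", "A43"],  # purpose codes
        "Attribute2": [6, 48, 12],             # duration in months
    }
)

# Illustrative stand-ins for attribute_mappings / attribute_column_names.
code_labels = {"A41": "car (used)", "A410": "others", "A43": "radio/television"}
new_names = {"Attribute4": "purpose", "Attribute2": "duration"}

# replace() and rename() both return new objects, so `raw` is left untouched.
decoded = raw.replace(code_labels).rename(columns=new_names)
print(decoded)
```

Passing the whole dictionary in a single call also replaces the explicit loop over codes, and working on returned objects rather than `inplace=True` sidesteps copy-versus-view ambiguity.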


In [48]:
X.head()

Unnamed: 0,duration,credit_history,purpose,credit_amount,savings,employment,installment_rate,personal_status_sex,other_debtors,residence_since,property,age,installment_plans,housing,existing_credits,job,num_dependents,telephone,foreign_worker,risk
0,< 0 DM,6,critical account/ other credits existing (not ...,radio/television,1169,unknown/ no savings account,... >= 7 years,4,male: single,none,4,real estate,67,none,own,2,skilled employee / official,1,"yes, registered under the customer's name",yes
1,0 <= ... < 200 DM,48,existing credits paid back duly till now,radio/television,5951,... < 100 DM,1 <= ... < 4 years,2,female: divorced/separated/married,none,2,real estate,22,none,own,1,skilled employee / official,1,none,yes
2,no checking account,12,critical account/ other credits existing (not ...,education,2096,... < 100 DM,4 <= ... < 7 years,2,male: single,none,3,real estate,49,none,own,1,unskilled - resident,2,none,yes
3,< 0 DM,42,existing credits paid back duly till now,furniture/equipment,7882,... < 100 DM,4 <= ... < 7 years,2,male: single,guarantor,4,if not A121: building society savings agreemen...,45,none,for free,1,skilled employee / official,2,none,yes
4,< 0 DM,24,delay in paying off in the past,car (new),4870,... < 100 DM,1 <= ... < 4 years,3,male: single,none,4,unknown / no property,53,none,for free,2,skilled employee / official,2,none,yes
