
# **Bank Loan Data**

Loan data analysis is a core task for banks and other financial institutions. Historical loan-application data can reveal patterns that help predict the probability that a loan will default. This kind of analysis lets financial institutions make informed decisions about whom to lend to, based on applicants' repayment capacity and financial track record.

In this exercise we work with a dataset containing information on clients who have applied for loans at a bank. The main task is to apply Exploratory Data Analysis (EDA) techniques to understand the client patterns and characteristics that may be related to loan default. The aim is to identify early signals indicating that an applicant is likely to be unable to repay the loan.

## Problem definition:

The goal is to identify the factors that contribute to the probability that a client will not repay the requested loan. Through exploratory analysis, we examine key variables in the dataset to find possible correlations between client characteristics and default risk. Decisions derived from this analysis can be used to improve the loan-approval process and reduce the bank's risk of loss.

The analysis should answer the question: what type of client is most likely not to repay a loan? This information is key to designing more precise lending and financial risk-management strategies.
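
One common way to start answering that question is to compare the observed default rate across groups of clients. The sketch below is only illustrative: it uses a small invented frame (the values are not taken from the dataset) with the real column names `NAME_INCOME_TYPE` and `TARGET`, and relies on the fact that the mean of a 0/1 target within a group equals that group's default rate.

```python
import pandas as pd

# Toy data: values are invented for illustration only.
toy = pd.DataFrame({
    "NAME_INCOME_TYPE": ["Working", "Working", "Pensioner",
                         "Working", "Pensioner", "State servant"],
    "TARGET": [1, 0, 0, 1, 0, 0],
})

# Mean of a 0/1 target per group = observed default rate of that group.
# The group size is kept alongside, since rates on tiny groups are unreliable.
default_rate = (
    toy.groupby("NAME_INCOME_TYPE")["TARGET"]
    .agg(default_rate="mean", n_clients="size")
    .sort_values("default_rate", ascending=False)
)
print(default_rate)
```

Reporting the group size next to the rate helps avoid over-reading differences that come from very few clients.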

The steps are:

1. Initial data analysis and initial preprocessing
2. Correlations, treatment of missing values and outliers
3. Treatment of categorical variables: encoding
4. Applying algorithms
5. Evaluation on the test sample

### **Importing libraries**

In [1]:
import os
import pandas as pd
import plotly.express as px

In [2]:
pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)

In [3]:
pd.credit = pd.read_csv("/content/application_data.csv")

In [None]:
# Relative path to the file
ruta_csv = os.path.join('data', 'Raw', 'application_data.csv')

# Read the CSV file
pd.credit = pd.read_csv(ruta_csv)

### **Future variables**

Future variables contain information that would not be available at decision time, that is, when the loan application is being evaluated. Examples are data collected or updated after the initial decision has been made. Including them in the model would introduce an information bias (data leakage), because the model would have access to data that would not be known in a real situation.

To select future variables we relied on a general assessment of the context and on column names that appear to indicate data collected or computed after an initial event, in this case a credit or loan application. This is not fully reliable, because it depends on the specific context of the problem and on the exact meaning of each variable in our dataset.

After reviewing all the variables, we found that none of them provides information about events occurring after the prediction time, which is what would qualify a variable as future.

`SK_ID_CURR` is the unique identifier of an existing loan. It carries no future information, only a present or past record.

`TARGET` is the target variable.

`NAME_CONTRACT_TYPE`: identifies the type of loan contract, which is fixed at the start of the contract and is not future information.

Profile data such as `DAYS_BIRTH`, `CODE_GENDER`, `CNT_CHILDREN`, `AMT_INCOME_TOTAL`, `FLAG_OWN_CAR`, `FLAG_OWN_REALTY`, `NAME_INCOME_TYPE`, `NAME_EDUCATION_TYPE`, `NAME_FAMILY_STATUS`, `NAME_HOUSING_TYPE`, etc. are attributes that are fixed or change very rarely and do not depend on future events. They reflect the client's situation at evaluation time and are therefore safe to include in the model.

The AMT values are computed at application time: these amounts are determined by the requested amount (`AMT_CREDIT`), the contract terms (`AMT_ANNUITY`), or the price of the financed good (`AMT_GOODS_PRICE`). They are defined and fixed from the moment the client is evaluated.

Other fields such as `FLAG_EMP_PHONE`, `FLAG_WORK_PHONE`, `FLAG_CONT_MOBILE`, `FLAG_PHONE`, `FLAG_EMAIL`, `FLAG_MOBIL`, among others, are current data based on information provided by the client.

`REGION_RATING_CLIENT` and `REGION_RATING_CLIENT_W_CITY` are based on current geographic information.

The Credit Bureau enquiries (`AMT_REQ_CREDIT_BUREAU_*`) show how many times the institution has queried the client's credit history over different periods (hour, day, week, month, quarter, year).

Each enquiry can be regarded as information obtained before or at the time of the loan application, as part of the client's history. If these counts are available when the risk is assessed, they are not future variables. They only reflect the client's prior behaviour in terms of access to credit, not the repayment outcome.

We conclude that all these variables reflect past or present characteristics and behaviour of the client, not events after the prediction. They therefore do not bias the model with information from a period later than the one we want to analyse.

Had there been any future variables, the procedure would have been the following:


```python
# Future variables identified (placeholder names)
list_future_variables = [
    'NOMBRE_VARIABLE_FUTURA_1',
    'NOMBRE_VARIABLE_FUTURA_2',
    'NOMBRE_VARIABLE_FUTURA_3',
    'NOMBRE_VARIABLE_FUTURA_4',
    'NOMBRE_VARIABLE_FUTURA_X'
]

# Drop the future variables from the DataFrame loaded above
data_cleaned = pd.credit.drop(columns=list_future_variables)

# Confirm that the variables have been removed
print("Columns remaining after dropping future variables:")
print(data_cleaned.columns)
```
From then on we would work with the new dataset without future variables, `data_cleaned`.

Since there are no future variables, we continue working with the original dataset, `pd.credit`.
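
`DataFrame.drop` raises a `KeyError` by default when a listed column does not exist, so a misspelled name in the list would stop the cell. A slightly more defensive sketch is shown below on a toy frame; `PAID_LATE_AFTER_GRANT` and `NOT_IN_THIS_DATASET` are hypothetical names invented for the example and do not exist in the real data.

```python
import pandas as pd

# Toy frame; "PAID_LATE_AFTER_GRANT" is a hypothetical future variable.
toy = pd.DataFrame({
    "SK_ID_CURR": [100002, 100003],
    "AMT_CREDIT": [406597.5, 1293502.5],
    "PAID_LATE_AFTER_GRANT": [1, 0],
})

list_future_variables = ["PAID_LATE_AFTER_GRANT", "NOT_IN_THIS_DATASET"]

# Separate the names that exist from those that do not,
# so that typos are reported instead of raising or being silently ignored.
present = [c for c in list_future_variables if c in toy.columns]
missing = [c for c in list_future_variables if c not in toy.columns]

toy_cleaned = toy.drop(columns=present)

print("Dropped:", present)
print("Not found (check spelling):", missing)
```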

### **General analysis of the table**
Dimensions

In [None]:
print(pd.credit.shape, pd.credit.drop_duplicates().shape)

(5789, 122) (5789, 122)
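
Comparing the shape before and after `drop_duplicates()` only detects rows that are identical in every column. Since `SK_ID_CURR` is meant to be a unique identifier, it may also be worth checking the key on its own. The sketch below illustrates the difference on an invented toy frame in which one ID is repeated with different values.

```python
import pandas as pd

# Toy frame: ID 100003 appears twice, but the two rows are not identical.
toy = pd.DataFrame({
    "SK_ID_CURR": [100002, 100003, 100003, 100004],
    "AMT_CREDIT": [406597.5, 1293502.5, 1300000.0, 135000.0],
})

# Fully identical rows (what comparing shapes after drop_duplicates detects)
n_exact_duplicates = int(toy.duplicated().sum())

# Uniqueness of the identifier alone
ids_unique = toy["SK_ID_CURR"].is_unique
repeated_ids = (
    toy.loc[toy["SK_ID_CURR"].duplicated(keep=False), "SK_ID_CURR"]
    .unique()
    .tolist()
)

print(n_exact_duplicates, ids_unique, repeated_ids)
```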


In [None]:
pd.credit

Unnamed: 0,SK_ID_CURR,TARGET,NAME_CONTRACT_TYPE,CODE_GENDER,FLAG_OWN_CAR,FLAG_OWN_REALTY,CNT_CHILDREN,AMT_INCOME_TOTAL,AMT_CREDIT,AMT_ANNUITY,AMT_GOODS_PRICE,NAME_TYPE_SUITE,NAME_INCOME_TYPE,NAME_EDUCATION_TYPE,NAME_FAMILY_STATUS,NAME_HOUSING_TYPE,REGION_POPULATION_RELATIVE,DAYS_BIRTH,DAYS_EMPLOYED,DAYS_REGISTRATION,DAYS_ID_PUBLISH,OWN_CAR_AGE,FLAG_MOBIL,FLAG_EMP_PHONE,FLAG_WORK_PHONE,FLAG_CONT_MOBILE,FLAG_PHONE,FLAG_EMAIL,OCCUPATION_TYPE,CNT_FAM_MEMBERS,REGION_RATING_CLIENT,REGION_RATING_CLIENT_W_CITY,WEEKDAY_APPR_PROCESS_START,HOUR_APPR_PROCESS_START,REG_REGION_NOT_LIVE_REGION,REG_REGION_NOT_WORK_REGION,LIVE_REGION_NOT_WORK_REGION,REG_CITY_NOT_LIVE_CITY,REG_CITY_NOT_WORK_CITY,LIVE_CITY_NOT_WORK_CITY,ORGANIZATION_TYPE,EXT_SOURCE_1,EXT_SOURCE_2,EXT_SOURCE_3,APARTMENTS_AVG,BASEMENTAREA_AVG,YEARS_BEGINEXPLUATATION_AVG,YEARS_BUILD_AVG,COMMONAREA_AVG,ELEVATORS_AVG,ENTRANCES_AVG,FLOORSMAX_AVG,FLOORSMIN_AVG,LANDAREA_AVG,LIVINGAPARTMENTS_AVG,LIVINGAREA_AVG,NONLIVINGAPARTMENTS_AVG,NONLIVINGAREA_AVG,APARTMENTS_MODE,BASEMENTAREA_MODE,YEARS_BEGINEXPLUATATION_MODE,YEARS_BUILD_MODE,COMMONAREA_MODE,ELEVATORS_MODE,ENTRANCES_MODE,FLOORSMAX_MODE,FLOORSMIN_MODE,LANDAREA_MODE,LIVINGAPARTMENTS_MODE,LIVINGAREA_MODE,NONLIVINGAPARTMENTS_MODE,NONLIVINGAREA_MODE,APARTMENTS_MEDI,BASEMENTAREA_MEDI,YEARS_BEGINEXPLUATATION_MEDI,YEARS_BUILD_MEDI,COMMONAREA_MEDI,ELEVATORS_MEDI,ENTRANCES_MEDI,FLOORSMAX_MEDI,FLOORSMIN_MEDI,LANDAREA_MEDI,LIVINGAPARTMENTS_MEDI,LIVINGAREA_MEDI,NONLIVINGAPARTMENTS_MEDI,NONLIVINGAREA_MEDI,FONDKAPREMONT_MODE,HOUSETYPE_MODE,TOTALAREA_MODE,WALLSMATERIAL_MODE,EMERGENCYSTATE_MODE,OBS_30_CNT_SOCIAL_CIRCLE,DEF_30_CNT_SOCIAL_CIRCLE,OBS_60_CNT_SOCIAL_CIRCLE,DEF_60_CNT_SOCIAL_CIRCLE,DAYS_LAST_PHONE_CHANGE,FLAG_DOCUMENT_2,FLAG_DOCUMENT_3,FLAG_DOCUMENT_4,FLAG_DOCUMENT_5,FLAG_DOCUMENT_6,FLAG_DOCUMENT_7,FLAG_DOCUMENT_8,FLAG_DOCUMENT_9,FLAG_DOCUMENT_10,FLAG_DOCUMENT_11,FLAG_DOCUMENT_12,FLAG_DOCUMENT_13,FLAG_DOCUMENT_14,FLAG_DOCUMENT_15,FLAG_DOCUMENT_16,FLAG_DOCUMENT_17,FLAG_DOCUMENT_18,FLAG_DOCUMENT_19,FLAG_DOCUMENT_20,FLAG_DOCUMENT_21,AMT_REQ_CREDIT_BUREAU_HOUR,AMT_REQ_CREDIT_BUREAU_DAY,AMT_REQ_CREDIT_BUREAU_WEEK,AMT_REQ_CREDIT_BUREAU_MON,AMT_REQ_CREDIT_BUREAU_QRT,AMT_REQ_CREDIT_BUREAU_YEAR
0,100002,1,Cash loans,M,N,Y,0,202500.0,406597.5,24700.5,351000.0,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,0.018801,-9461,-637,-3648.0,-2120,,1,1,0,1,1,0,Laborers,1.0,2,2,WEDNESDAY,10,0,0,0,0,0,0,Business Entity Type 3,0.083037,0.262949,0.139376,0.0247,0.0369,0.9722,0.6192,0.0143,0.00,0.0690,0.0833,0.1250,0.0369,0.0202,0.0190,0.0000,0.0000,0.0252,0.0383,0.9722,0.6341,0.0144,0.0000,0.0690,0.0833,0.1250,0.0377,0.0220,0.0198,0.0000,0.0000,0.0250,0.0369,0.9722,0.6243,0.0144,0.00,0.0690,0.0833,0.1250,0.0375,0.0205,0.0193,0.0000,0.0000,reg oper account,block of flats,0.0149,"Stone, brick",No,2.0,2.0,2.0,2.0,-1134.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
1,100003,0,Cash loans,F,N,N,0,270000.0,1293502.5,35698.5,1129500.0,Family,State servant,Higher education,Married,House / apartment,0.003541,-16765,-1188,-1186.0,-291,,1,1,0,1,1,0,Core staff,2.0,1,1,MONDAY,11,0,0,0,0,0,0,School,0.311267,0.622246,,0.0959,0.0529,0.9851,0.7960,0.0605,0.08,0.0345,0.2917,0.3333,0.0130,0.0773,0.0549,0.0039,0.0098,0.0924,0.0538,0.9851,0.8040,0.0497,0.0806,0.0345,0.2917,0.3333,0.0128,0.0790,0.0554,0.0000,0.0000,0.0968,0.0529,0.9851,0.7987,0.0608,0.08,0.0345,0.2917,0.3333,0.0132,0.0787,0.0558,0.0039,0.0100,reg oper account,block of flats,0.0714,Block,No,1.0,0.0,1.0,0.0,-828.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,100004,0,Revolving loans,M,Y,Y,0,67500.0,135000.0,6750.0,135000.0,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,0.010032,-19046,-225,-4260.0,-2531,26.0,1,1,1,1,1,0,Laborers,1.0,2,2,MONDAY,9,0,0,0,0,0,0,Government,,0.555912,0.729567,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,-815.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,100006,0,Cash loans,F,N,Y,0,135000.0,312682.5,29686.5,297000.0,Unaccompanied,Working,Secondary / secondary special,Civil marriage,House / apartment,0.008019,-19005,-3039,-9833.0,-2437,,1,1,0,1,0,0,Laborers,2.0,2,2,WEDNESDAY,17,0,0,0,0,0,0,Business Entity Type 3,,0.650442,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,0.0,2.0,0.0,-617.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,
4,100007,0,Cash loans,M,N,Y,0,121500.0,513000.0,21865.5,513000.0,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,0.028663,-19932,-3038,-4311.0,-3458,,1,1,0,1,0,0,Core staff,1.0,2,2,THURSDAY,11,0,0,0,0,1,1,Religion,,0.322738,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,-1106.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5784,106764,0,Cash loans,F,N,Y,0,112050.0,450000.0,22977.0,450000.0,Unaccompanied,Working,Secondary / secondary special,Married,House / apartment,0.024610,-10249,-2474,-4482.0,-2346,,1,1,0,1,1,0,Sales staff,2.0,2,2,SUNDAY,9,0,0,0,0,0,0,Trade: type 3,0.371027,0.575786,0.609276,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,0.0,1.0,0.0,-1641.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0
5785,106765,0,Cash loans,F,N,Y,2,63000.0,112068.0,12199.5,99000.0,Family,Commercial associate,Secondary / secondary special,Married,House / apartment,0.018850,-14853,-1867,-3521.0,-4378,,1,1,0,1,0,0,,4.0,2,2,FRIDAY,15,0,0,0,0,0,0,Kindergarten,,0.592031,,0.1464,0.0000,0.9767,0.6804,0.0260,0.00,0.0345,0.1667,0.2083,0.0488,0.1194,0.0615,0.0000,0.0198,0.1492,0.0000,0.9767,0.6929,0.0262,0.0000,0.0345,0.1667,0.2083,0.0500,0.1304,0.0640,0.0000,0.0209,0.1478,0.0000,0.9767,0.6847,0.0262,0.00,0.0345,0.1667,0.2083,0.0497,0.1214,0.0626,0.0000,0.0202,reg oper account,block of flats,0.0668,"Stone, brick",No,2.0,0.0,2.0,0.0,-1201.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,
5786,106766,0,Cash loans,M,Y,N,0,270000.0,1350000.0,35743.5,1350000.0,,Working,Higher education,Civil marriage,With parents,0.030755,-11457,-1764,-5239.0,-3707,6.0,1,1,0,1,0,0,IT staff,2.0,2,2,WEDNESDAY,17,0,0,0,0,0,0,Business Entity Type 3,,0.703121,0.203252,0.1567,0.1142,0.9866,0.8164,0.0440,0.16,0.1379,0.3333,0.3750,0.0536,0.1252,0.1589,0.0116,0.0487,0.1597,0.1185,0.9866,0.8236,0.0444,0.1611,0.1379,0.3333,0.3750,0.0548,0.1368,0.1656,0.0117,0.0515,0.1582,0.1142,0.9866,0.8189,0.0443,0.16,0.1379,0.3333,0.3750,0.0545,0.1274,0.1618,0.0116,0.0497,reg oper account,block of flats,0.1587,"Stone, brick",No,0.0,0.0,0.0,0.0,-527.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5787,106767,0,Cash loans,F,N,N,0,67500.0,153504.0,15084.0,144000.0,Unaccompanied,Pensioner,Lower secondary,Single / not married,House / apartment,0.018634,-25054,365243,-1404.0,-4797,,1,0,0,1,0,0,,1.0,2,2,SUNDAY,10,0,0,0,0,0,0,XNA,,0.255691,,,,0.9781,,,,,,,,,0.0175,,,,,0.9782,,,,,,,,,0.0182,,,,,0.9781,,,,,,,,,0.0178,,,,block of flats,0.0152,,No,0.0,0.0,0.0,0.0,-1403.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,


Data types

In [None]:
pd.credit.dtypes.to_dict()

{'SK_ID_CURR': dtype('int64'),
 'TARGET': dtype('int64'),
 'NAME_CONTRACT_TYPE': dtype('O'),
 'CODE_GENDER': dtype('O'),
 'FLAG_OWN_CAR': dtype('O'),
 'FLAG_OWN_REALTY': dtype('O'),
 'CNT_CHILDREN': dtype('int64'),
 'AMT_INCOME_TOTAL': dtype('float64'),
 'AMT_CREDIT': dtype('float64'),
 'AMT_ANNUITY': dtype('float64'),
 'AMT_GOODS_PRICE': dtype('float64'),
 'NAME_TYPE_SUITE': dtype('O'),
 'NAME_INCOME_TYPE': dtype('O'),
 'NAME_EDUCATION_TYPE': dtype('O'),
 'NAME_FAMILY_STATUS': dtype('O'),
 'NAME_HOUSING_TYPE': dtype('O'),
 'REGION_POPULATION_RELATIVE': dtype('float64'),
 'DAYS_BIRTH': dtype('int64'),
 'DAYS_EMPLOYED': dtype('int64'),
 'DAYS_REGISTRATION': dtype('float64'),
 'DAYS_ID_PUBLISH': dtype('int64'),
 'OWN_CAR_AGE': dtype('float64'),
 'FLAG_MOBIL': dtype('int64'),
 'FLAG_EMP_PHONE': dtype('int64'),
 'FLAG_WORK_PHONE': dtype('int64'),
 'FLAG_CONT_MOBILE': dtype('int64'),
 'FLAG_PHONE': dtype('int64'),
 'FLAG_EMAIL': dtype('int64'),
 'OCCUPATION_TYPE': dtype('O'),
 'CNT_FAM_MEMB

### **Exploring and treating the target variable**

In [None]:
# rename_axis names the index, so reset_index yields a 'TARGET' column
# regardless of how value_counts labels its index in the installed pandas version
pd_plot_target = pd.credit['TARGET'].value_counts(normalize=True).mul(100).rename_axis('TARGET').rename('percent').reset_index()

pd_plot_target_conteo = pd.credit['TARGET'].value_counts().rename_axis('TARGET').rename('count').reset_index()

pd_plot_target_pc = pd.merge(pd_plot_target, pd_plot_target_conteo, on='TARGET', how='inner')

print(pd_plot_target_pc)

   TARGET    percent  count
0       0  92.261185   5341
1       1   7.738815    448


In [None]:
fig = px.bar(pd_plot_target_pc, x='TARGET', y='percent', text='count')

fig.update_layout(
    title='Distribution of the target variable',
    xaxis_title='TARGET',
    yaxis_title='Percentage (%)',
    template='plotly_white'
)

fig.show()

The X axis shows the possible values of the `TARGET` variable: 0 (clients WITHOUT payment difficulties) and 1 (clients WITH payment difficulties).

That is:
- if `TARGET` = 0: the client kept up with payments.
- if `TARGET` = 1: the client had missed payments or financial problems.

The Y axis shows the percentage of observations in each class.

Class 0 (no payment difficulties) accounts for about 92.3% of the total (5,341 clients).
Class 1 (payment difficulties) is much smaller, about 7.7% of the total (448 clients).

The chart reveals a class-imbalance problem, which is common in financial datasets: most clients have no payment problems (0), while defaults (1) are a minority.
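
To put the imbalance in numbers, the short sketch below takes the class counts printed above (5341 and 448) and computes the majority-to-minority ratio and the accuracy of a trivial model that always predicts class 0. It is only an illustration of why plain accuracy can be a misleading metric on data like this; the variable names are not part of the original notebook.

```python
import pandas as pd

# Class counts as printed in the value_counts output above.
counts = pd.Series({0: 5341, 1: 448}, name="count")

# How many non-defaulting clients there are per defaulting client
imbalance_ratio = counts.max() / counts.min()

# Accuracy of a model that always predicts the majority class (TARGET = 0)
majority_baseline_accuracy = counts.max() / counts.sum()

print(f"Imbalance ratio: {imbalance_ratio:.2f} to 1")
print(f"Always-predict-0 accuracy: {majority_baseline_accuracy:.4f}")
```

Any model evaluated later would need to beat this baseline on a metric that is sensitive to the minority class, not just on accuracy.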


### **Choosing row and column thresholds for dropping missing values**

In [None]:
# Count missing values per column and per row
pd_series_null_columns = pd.credit.isnull().sum().sort_values(ascending=False)
pd_series_null_rows = pd.credit.isnull().sum(axis=1).sort_values(ascending=False)
print(pd_series_null_columns.shape, pd_series_null_rows.shape)

(122,) (5789,)


In [None]:
# Build DataFrames to hold the null counts
pd_null_columnas = pd.DataFrame(pd_series_null_columns, columns=['nulos_columnas'])
pd_null_filas = pd.DataFrame(pd_series_null_rows, columns=['nulos_filas'])

In [None]:
# Add the fraction of nulls (count divided by number of rows or columns)
pd_null_columnas['porcentaje_columnas'] = pd_null_columnas['nulos_columnas'] / pd.credit.shape[0]
pd_null_columnas = pd_null_columnas.sort_values(by='porcentaje_columnas', ascending=False)
pd_null_filas['porcentaje_filas'] = pd_null_filas['nulos_filas'] / pd.credit.shape[1]
pd_null_filas = pd_null_filas.sort_values(by='porcentaje_filas', ascending=False)

In [None]:
# Show initial statistics
print(f"Dimensiones iniciales del dataset: {pd.credit.shape}")

print(pd_null_columnas.head())
print(pd_null_filas.head())

Dimensiones iniciales del dataset: (5789, 122)
                          nulos_columnas  porcentaje_columnas
COMMONAREA_MODE                     3990             0.689238
COMMONAREA_MEDI                     3990             0.689238
COMMONAREA_AVG                      3990             0.689238
NONLIVINGAPARTMENTS_MODE            3970             0.685783
NONLIVINGAPARTMENTS_MEDI            3970             0.685783
      nulos_filas  porcentaje_filas
3498           59          0.483607
3525           58          0.475410
3718           58          0.475410
5088           57          0.467213
3878           57          0.467213


This code lists the share of nulls per row and per column in descending order. The highest values are about 68.9% for columns and 48.4% for rows, both well below 90%, so nothing needs to be removed.

Had any row or column exceeded 90%, the procedure would have been the following:

```python
# Thresholds for dropping columns and rows
threshold_columnas = 0.9
threshold_filas = 0.9

# Keep columns with less than 90% missing values
list_vars_not_null = list(pd_null_columnas[pd_null_columnas['porcentaje_columnas'] < threshold_columnas].index)
data_filtered_columns = pd.credit.loc[:, list_vars_not_null]
print(f"Dimensions after dropping columns with {threshold_columnas*100}% or more nulls: {data_filtered_columns.shape}")

# Keep rows with less than 90% missing values
data_filtered = data_filtered_columns[data_filtered_columns.isnull().sum(axis=1) / data_filtered_columns.shape[1] < threshold_filas]
print(f"Dimensions after dropping rows with {threshold_filas*100}% or more nulls: {data_filtered.shape}")
```
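
The same filtering can be written more compactly with `isnull().mean()`, which returns the fraction of nulls directly (the mean of a boolean mask). The sketch below applies it to a small invented frame so the effect of each threshold is visible; the threshold names match the ones above, everything else is illustrative.

```python
import numpy as np
import pandas as pd

# Toy frame: column "b" is entirely missing and the last row is entirely missing.
toy = pd.DataFrame({
    "a": [1.0, 2.0, 3.0, 4.0, np.nan],
    "b": [np.nan, np.nan, np.nan, np.nan, np.nan],
    "c": [1.0, np.nan, 3.0, np.nan, np.nan],
})

threshold_columnas = 0.9
threshold_filas = 0.9

# Fraction of nulls per column; keep columns below the threshold
col_null_frac = toy.isnull().mean()
kept_cols = toy.loc[:, col_null_frac < threshold_columnas]

# Fraction of nulls per row, computed on the remaining columns
row_null_frac = kept_cols.isnull().mean(axis=1)
filtered = kept_cols[row_null_frac < threshold_filas]

print(col_null_frac.to_dict())
print(filtered.shape)
```

Computing the row fractions after the column filter means that rows are judged only on the columns that are actually kept.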

### **Types: categorical and numerical variables**

We compute the number of unique values of each variable in the dataset. This gives a first idea of which variables may be categorical and which numerical. We set a threshold of 50 unique values, which we consider adequate for an initial split given that the dataset has a considerable number of rows. The threshold can be adjusted later as the analysis progresses.

In practice, variables with fewer than 50 unique values are treated as categorical, while numeric variables with 50 or more unique values are treated as numerical. This rests on the premise that categorical variables generally take a limited number of distinct values, whereas numerical variables tend to take many more.

On reviewing the variables initially classified as categorical, we noticed that some of them are in fact numerical despite having fewer than 50 unique values. One example is `OBS_30_CNT_SOCIAL_CIRCLE`, which has only 33 unique values but counts the observations in the client's social circle with observable 30 days past due (DPD), which clearly makes it a numerical variable.

Based on this, we reclassified this and other similar variables as numerical, since their nature and context indicate that they should be treated as such regardless of how many unique values they have.

In [None]:
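# Number of distinct values per column
dict_nunique = {col: pd.credit[col].nunique() for col in pd.credit.columns}
# Columns below the 50-unique-value threshold are categorical candidates
filtrado_dict = {key: value for key, value in dict_nunique.items() if value < 50}

list_var_cat = list(filtrado_dict.keys())
# Numeric-dtype columns that did not fall below the threshold
list_var_continuous = [col for col in pd.credit.select_dtypes(include='number').columns if col not in list_var_cat]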

manual_numeric_vars = [
    'AMT_REQ_CREDIT_BUREAU_HOUR',
    'AMT_REQ_CREDIT_BUREAU_DAY',
    'AMT_REQ_CREDIT_BUREAU_WEEK',
    'AMT_REQ_CREDIT_BUREAU_MON',
    'AMT_REQ_CREDIT_BUREAU_QRT',
    'AMT_REQ_CREDIT_BUREAU_YEAR',
    'OBS_30_CNT_SOCIAL_CIRCLE',
    'DEF_30_CNT_SOCIAL_CIRCLE',
    'OBS_60_CNT_SOCIAL_CIRCLE',
    'DEF_60_CNT_SOCIAL_CIRCLE',
    'ELEVATORS_MODE',
    'ENTRANCES_MODE',
    'FLOORSMAX_MODE',
    'FLOORSMIN_MODE',
    'ELEVATORS_MEDI',
    'ENTRANCES_MEDI',
    'FLOORSMAX_MEDI',
    'FLOORSMIN_MEDI',
    'HOUR_APPR_PROCESS_START',
    'CNT_FAM_MEMBERS',
    'CNT_CHILDREN'
]

list_var_cat = [col for col in list_var_cat if col not in manual_numeric_vars]
list_var_continuous += manual_numeric_vars


print("Variables categóricas:", list_var_cat)
print("Variables numéricas:", list_var_continuous)



Variables categóricas: ['TARGET', 'NAME_CONTRACT_TYPE', 'CODE_GENDER', 'FLAG_OWN_CAR', 'FLAG_OWN_REALTY', 'NAME_TYPE_SUITE', 'NAME_INCOME_TYPE', 'NAME_EDUCATION_TYPE', 'NAME_FAMILY_STATUS', 'NAME_HOUSING_TYPE', 'OWN_CAR_AGE', 'FLAG_MOBIL', 'FLAG_EMP_PHONE', 'FLAG_WORK_PHONE', 'FLAG_CONT_MOBILE', 'FLAG_PHONE', 'FLAG_EMAIL', 'OCCUPATION_TYPE', 'REGION_RATING_CLIENT', 'REGION_RATING_CLIENT_W_CITY', 'WEEKDAY_APPR_PROCESS_START', 'REG_REGION_NOT_LIVE_REGION', 'REG_REGION_NOT_WORK_REGION', 'LIVE_REGION_NOT_WORK_REGION', 'REG_CITY_NOT_LIVE_CITY', 'REG_CITY_NOT_WORK_CITY', 'LIVE_CITY_NOT_WORK_CITY', 'NONLIVINGAPARTMENTS_MODE', 'FONDKAPREMONT_MODE', 'HOUSETYPE_MODE', 'WALLSMATERIAL_MODE', 'EMERGENCYSTATE_MODE', 'FLAG_DOCUMENT_2', 'FLAG_DOCUMENT_3', 'FLAG_DOCUMENT_4', 'FLAG_DOCUMENT_5', 'FLAG_DOCUMENT_6', 'FLAG_DOCUMENT_7', 'FLAG_DOCUMENT_8', 'FLAG_DOCUMENT_9', 'FLAG_DOCUMENT_10', 'FLAG_DOCUMENT_11', 'FLAG_DOCUMENT_12', 'FLAG_DOCUMENT_13', 'FLAG_DOCUMENT_14', 'FLAG_DOCUMENT_15', 'FLAG_DOCUMENT_1
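
With this kind of rule it can be useful to check that the two lists do not overlap and to see which columns end up in neither of them: a text column with many distinct values is not below the threshold and is not numeric either. The sketch below reproduces the logic on an invented toy frame, with `max_unique = 3` standing in for the threshold of 50; it is an illustration, not a statement about which real columns are affected.

```python
import pandas as pd

# Toy frame with invented values; column names mirror the dataset.
toy = pd.DataFrame({
    "TARGET": [0, 1, 0, 0],
    "CODE_GENDER": ["M", "F", "F", "M"],
    "AMT_CREDIT": [406597.5, 1293502.5, 135000.0, 312682.5],
    "ORGANIZATION_TYPE": ["School", "Government", "Religion", "XNA"],
})

max_unique = 3  # stands in for the threshold of 50 used above

# Same rule as above: few distinct values -> categorical candidate
list_var_cat = [c for c in toy.columns if toy[c].nunique() < max_unique]
list_var_continuous = [
    c for c in toy.select_dtypes(include="number").columns
    if c not in list_var_cat
]

# Sanity checks: no column in both lists, and which columns are in neither
overlap = set(list_var_cat) & set(list_var_continuous)
unclassified = [
    c for c in toy.columns
    if c not in list_var_cat and c not in list_var_continuous
]

print("Overlap:", overlap)
print("Unclassified:", unclassified)
```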

In [None]:
pd.credit[list_var_cat] = pd.credit[list_var_cat].astype("category")
pd.credit[list_var_continuous] = pd.credit[list_var_continuous].astype(float)

print(pd.credit.dtypes)

SK_ID_CURR                       float64
TARGET                          category
NAME_CONTRACT_TYPE              category
CODE_GENDER                     category
FLAG_OWN_CAR                    category
FLAG_OWN_REALTY                 category
CNT_CHILDREN                     float64
AMT_INCOME_TOTAL                 float64
AMT_CREDIT                       float64
AMT_ANNUITY                      float64
AMT_GOODS_PRICE                  float64
NAME_TYPE_SUITE                 category
NAME_INCOME_TYPE                category
NAME_EDUCATION_TYPE             category
NAME_FAMILY_STATUS              category
NAME_HOUSING_TYPE               category
REGION_POPULATION_RELATIVE       float64
DAYS_BIRTH                       float64
DAYS_EMPLOYED                    float64
DAYS_REGISTRATION                float64
DAYS_ID_PUBLISH                  float64
OWN_CAR_AGE                     category
FLAG_MOBIL                      category
FLAG_EMP_PHONE                  category
FLAG_WORK_PHONE 

### **Initial preprocessing of some variables**

At this point we tidy up the DataFrame to make it cleaner and more readable. First, we convert all column names to lowercase to keep a uniform convention. We also strip any leading and trailing whitespace from the strings in all columns of type "object".

In addition, we transform the variable `WEEKDAY_APPR_PROCESS_START`. Instead of keeping the days of the week as text, we replace them with numbers representing their order (Monday = 1, Tuesday = 2, etc.). This simplifies later encoding, allowing the days to be represented as "Weekday_1", "Weekday_2", etc., which can be useful in subsequent analysis.

We have not identified other aspects that need cleaning at this stage, so we consider this DataFrame preprocessed and ready for missing-value treatment, outlier detection, correlation analysis, and so on.

In [None]:
pd.credit.columns = pd.credit.columns.str.lower()

pd.credit = pd.credit.apply(lambda x: x.str.strip() if x.dtype == "object" else x)

weekday_mapping = {
    'MONDAY': 1, 'TUESDAY': 2, 'WEDNESDAY': 3,
    'THURSDAY': 4, 'FRIDAY': 5, 'SATURDAY': 6, 'SUNDAY': 7
}

pd.credit['weekday_appr_process_start'] = pd.credit['weekday_appr_process_start'].map(weekday_mapping)
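
`Series.map` with a dictionary returns `NaN` for any value that is not a key of the dictionary, so an unexpected spelling would silently become a missing value. The sketch below shows one way to detect that on invented toy data; `"HOLIDAY"` is a hypothetical bad value added only to trigger the check.

```python
import pandas as pd

# Toy column with stray whitespace, mixed case and one unexpected value.
toy = pd.DataFrame({
    "WEEKDAY_APPR_PROCESS_START": ["MONDAY", " SUNDAY", "Friday", "WEDNESDAY", "HOLIDAY"],
})
toy.columns = toy.columns.str.lower()

weekday_mapping = {
    'MONDAY': 1, 'TUESDAY': 2, 'WEDNESDAY': 3,
    'THURSDAY': 4, 'FRIDAY': 5, 'SATURDAY': 6, 'SUNDAY': 7
}

col = "weekday_appr_process_start"

# Normalise the text before mapping so spacing and case do not matter
normalised = toy[col].str.strip().str.upper()
mapped = normalised.map(weekday_mapping)

# Values that the dictionary did not cover become NaN; list them explicitly
unmapped = toy.loc[mapped.isna(), col].tolist()

print(mapped.tolist())
print("Unmapped values:", unmapped)
```

An empty `unmapped` list on the real column would confirm that the mapping introduced no new missing values.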


In [None]:
pd.credit.head()

Unnamed: 0,sk_id_curr,target,name_contract_type,code_gender,flag_own_car,flag_own_realty,cnt_children,amt_income_total,amt_credit,amt_annuity,amt_goods_price,name_type_suite,name_income_type,name_education_type,name_family_status,name_housing_type,region_population_relative,days_birth,days_employed,days_registration,days_id_publish,own_car_age,flag_mobil,flag_emp_phone,flag_work_phone,flag_cont_mobile,flag_phone,flag_email,occupation_type,cnt_fam_members,region_rating_client,region_rating_client_w_city,weekday_appr_process_start,hour_appr_process_start,reg_region_not_live_region,reg_region_not_work_region,live_region_not_work_region,reg_city_not_live_city,reg_city_not_work_city,live_city_not_work_city,organization_type,ext_source_1,ext_source_2,ext_source_3,apartments_avg,basementarea_avg,years_beginexpluatation_avg,years_build_avg,commonarea_avg,elevators_avg,entrances_avg,floorsmax_avg,floorsmin_avg,landarea_avg,livingapartments_avg,livingarea_avg,nonlivingapartments_avg,nonlivingarea_avg,apartments_mode,basementarea_mode,years_beginexpluatation_mode,years_build_mode,commonarea_mode,elevators_mode,entrances_mode,floorsmax_mode,floorsmin_mode,landarea_mode,livingapartments_mode,livingarea_mode,nonlivingapartments_mode,nonlivingarea_mode,apartments_medi,basementarea_medi,years_beginexpluatation_medi,years_build_medi,commonarea_medi,elevators_medi,entrances_medi,floorsmax_medi,floorsmin_medi,landarea_medi,livingapartments_medi,livingarea_medi,nonlivingapartments_medi,nonlivingarea_medi,fondkapremont_mode,housetype_mode,totalarea_mode,wallsmaterial_mode,emergencystate_mode,obs_30_cnt_social_circle,def_30_cnt_social_circle,obs_60_cnt_social_circle,def_60_cnt_social_circle,days_last_phone_change,flag_document_2,flag_document_3,flag_document_4,flag_document_5,flag_document_6,flag_document_7,flag_document_8,flag_document_9,flag_document_10,flag_document_11,flag_document_12,flag_document_13,flag_document_14,flag_document_15,flag_document_16,flag_document_17,flag_document_18,flag_document_19,flag_document_20,flag_document_21,amt_req_credit_bureau_hour,amt_req_credit_bureau_day,amt_req_credit_bureau_week,amt_req_credit_bureau_mon,amt_req_credit_bureau_qrt,amt_req_credit_bureau_year
0,100002.0,1,Cash loans,M,N,Y,0.0,202500.0,406597.5,24700.5,351000.0,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,0.018801,-9461.0,-637.0,-3648.0,-2120.0,,1,1,0,1,1,0,Laborers,1.0,2,2,3,10.0,0,0,0,0,0,0,Business Entity Type 3,0.083037,0.262949,0.139376,0.0247,0.0369,0.9722,0.6192,0.0143,0.0,0.069,0.0833,0.125,0.0369,0.0202,0.019,0.0,0.0,0.0252,0.0383,0.9722,0.6341,0.0144,0.0,0.069,0.0833,0.125,0.0377,0.022,0.0198,0.0,0.0,0.025,0.0369,0.9722,0.6243,0.0144,0.0,0.069,0.0833,0.125,0.0375,0.0205,0.0193,0.0,0.0,reg oper account,block of flats,0.0149,"Stone, brick",No,2.0,2.0,2.0,2.0,-1134.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
1,100003.0,0,Cash loans,F,N,N,0.0,270000.0,1293502.5,35698.5,1129500.0,Family,State servant,Higher education,Married,House / apartment,0.003541,-16765.0,-1188.0,-1186.0,-291.0,,1,1,0,1,1,0,Core staff,2.0,1,1,1,11.0,0,0,0,0,0,0,School,0.311267,0.622246,,0.0959,0.0529,0.9851,0.796,0.0605,0.08,0.0345,0.2917,0.3333,0.013,0.0773,0.0549,0.0039,0.0098,0.0924,0.0538,0.9851,0.804,0.0497,0.0806,0.0345,0.2917,0.3333,0.0128,0.079,0.0554,0.0,0.0,0.0968,0.0529,0.9851,0.7987,0.0608,0.08,0.0345,0.2917,0.3333,0.0132,0.0787,0.0558,0.0039,0.01,reg oper account,block of flats,0.0714,Block,No,1.0,0.0,1.0,0.0,-828.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,100004.0,0,Revolving loans,M,Y,Y,0.0,67500.0,135000.0,6750.0,135000.0,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,0.010032,-19046.0,-225.0,-4260.0,-2531.0,26.0,1,1,1,1,1,0,Laborers,1.0,2,2,1,9.0,0,0,0,0,0,0,Government,,0.555912,0.729567,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,-815.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,100006.0,0,Cash loans,F,N,Y,0.0,135000.0,312682.5,29686.5,297000.0,Unaccompanied,Working,Secondary / secondary special,Civil marriage,House / apartment,0.008019,-19005.0,-3039.0,-9833.0,-2437.0,,1,1,0,1,0,0,Laborers,2.0,2,2,3,17.0,0,0,0,0,0,0,Business Entity Type 3,,0.650442,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,0.0,2.0,0.0,-617.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,
4,100007.0,0,Cash loans,M,N,Y,0.0,121500.0,513000.0,21865.5,513000.0,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,0.028663,-19932.0,-3038.0,-4311.0,-3458.0,,1,1,0,1,0,0,Core staff,1.0,2,2,4,11.0,0,0,0,0,1,1,Religion,,0.322738,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,-1106.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0


In [None]:
pd.credit.shape

(5789, 122)

In [None]:
pd.credit.to_csv('pd.credit_procesado.csv', index=False)
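
CSV files do not store pandas dtypes, so the `category` columns set earlier come back as plain integers or strings when the file is read again. The sketch below illustrates this with an in-memory buffer and an invented toy frame, and shows one possible way to restore the dtype by passing `dtype=` to `read_csv`; the column selection is an assumption made for the example.

```python
import io
import pandas as pd

# Toy frame with two categorical columns and one float column.
toy = pd.DataFrame({
    "target": pd.Series([1, 0, 0]).astype("category"),
    "code_gender": pd.Series(["M", "F", "M"]).astype("category"),
    "amt_credit": [406597.5, 1293502.5, 135000.0],
})

# Write to an in-memory CSV instead of a file on disk
buffer = io.StringIO()
toy.to_csv(buffer, index=False)

# Plain read: the category dtype is not preserved
buffer.seek(0)
plain = pd.read_csv(buffer)

# Read again, asking for category dtype on the relevant columns
buffer.seek(0)
restored = pd.read_csv(
    buffer, dtype={"target": "category", "code_gender": "category"}
)

print(plain.dtypes)
print(restored.dtypes)
```

If preserving dtypes matters for later steps, keeping the list of categorical columns alongside the CSV makes this reload straightforward.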

In [None]:
pd.credit

Unnamed: 0,sk_id_curr,target,name_contract_type,code_gender,flag_own_car,flag_own_realty,cnt_children,amt_income_total,amt_credit,amt_annuity,amt_goods_price,name_type_suite,name_income_type,name_education_type,name_family_status,name_housing_type,region_population_relative,days_birth,days_employed,days_registration,days_id_publish,own_car_age,flag_mobil,flag_emp_phone,flag_work_phone,flag_cont_mobile,flag_phone,flag_email,occupation_type,cnt_fam_members,region_rating_client,region_rating_client_w_city,weekday_appr_process_start,hour_appr_process_start,reg_region_not_live_region,reg_region_not_work_region,live_region_not_work_region,reg_city_not_live_city,reg_city_not_work_city,live_city_not_work_city,organization_type,ext_source_1,ext_source_2,ext_source_3,apartments_avg,basementarea_avg,years_beginexpluatation_avg,years_build_avg,commonarea_avg,elevators_avg,entrances_avg,floorsmax_avg,floorsmin_avg,landarea_avg,livingapartments_avg,livingarea_avg,nonlivingapartments_avg,nonlivingarea_avg,apartments_mode,basementarea_mode,years_beginexpluatation_mode,years_build_mode,commonarea_mode,elevators_mode,entrances_mode,floorsmax_mode,floorsmin_mode,landarea_mode,livingapartments_mode,livingarea_mode,nonlivingapartments_mode,nonlivingarea_mode,apartments_medi,basementarea_medi,years_beginexpluatation_medi,years_build_medi,commonarea_medi,elevators_medi,entrances_medi,floorsmax_medi,floorsmin_medi,landarea_medi,livingapartments_medi,livingarea_medi,nonlivingapartments_medi,nonlivingarea_medi,fondkapremont_mode,housetype_mode,totalarea_mode,wallsmaterial_mode,emergencystate_mode,obs_30_cnt_social_circle,def_30_cnt_social_circle,obs_60_cnt_social_circle,def_60_cnt_social_circle,days_last_phone_change,flag_document_2,flag_document_3,flag_document_4,flag_document_5,flag_document_6,flag_document_7,flag_document_8,flag_document_9,flag_document_10,flag_document_11,flag_document_12,flag_document_13,flag_document_14,flag_document_15,flag_document_16,flag_document_17,flag_document_18,flag_document_19,flag_document_20,flag_document_21,amt_req_credit_bureau_hour,amt_req_credit_bureau_day,amt_req_credit_bureau_week,amt_req_credit_bureau_mon,amt_req_credit_bureau_qrt,amt_req_credit_bureau_year
0,100002.0,1,Cash loans,M,N,Y,0.0,202500.0,406597.5,24700.5,351000.0,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,0.018801,-9461.0,-637.0,-3648.0,-2120.0,,1,1,0,1,1,0,Laborers,1.0,2,2,3,10.0,0,0,0,0,0,0,Business Entity Type 3,0.083037,0.262949,0.139376,0.0247,0.0369,0.9722,0.6192,0.0143,0.00,0.0690,0.0833,0.1250,0.0369,0.0202,0.0190,0.0000,0.0000,0.0252,0.0383,0.9722,0.6341,0.0144,0.0000,0.0690,0.0833,0.1250,0.0377,0.0220,0.0198,0.0000,0.0000,0.0250,0.0369,0.9722,0.6243,0.0144,0.00,0.0690,0.0833,0.1250,0.0375,0.0205,0.0193,0.0000,0.0000,reg oper account,block of flats,0.0149,"Stone, brick",No,2.0,2.0,2.0,2.0,-1134.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0
1,100003.0,0,Cash loans,F,N,N,0.0,270000.0,1293502.5,35698.5,1129500.0,Family,State servant,Higher education,Married,House / apartment,0.003541,-16765.0,-1188.0,-1186.0,-291.0,,1,1,0,1,1,0,Core staff,2.0,1,1,1,11.0,0,0,0,0,0,0,School,0.311267,0.622246,,0.0959,0.0529,0.9851,0.7960,0.0605,0.08,0.0345,0.2917,0.3333,0.0130,0.0773,0.0549,0.0039,0.0098,0.0924,0.0538,0.9851,0.8040,0.0497,0.0806,0.0345,0.2917,0.3333,0.0128,0.0790,0.0554,0.0000,0.0000,0.0968,0.0529,0.9851,0.7987,0.0608,0.08,0.0345,0.2917,0.3333,0.0132,0.0787,0.0558,0.0039,0.0100,reg oper account,block of flats,0.0714,Block,No,1.0,0.0,1.0,0.0,-828.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
2,100004.0,0,Revolving loans,M,Y,Y,0.0,67500.0,135000.0,6750.0,135000.0,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,0.010032,-19046.0,-225.0,-4260.0,-2531.0,26.0,1,1,1,1,1,0,Laborers,1.0,2,2,1,9.0,0,0,0,0,0,0,Government,,0.555912,0.729567,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,-815.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
3,100006.0,0,Cash loans,F,N,Y,0.0,135000.0,312682.5,29686.5,297000.0,Unaccompanied,Working,Secondary / secondary special,Civil marriage,House / apartment,0.008019,-19005.0,-3039.0,-9833.0,-2437.0,,1,1,0,1,0,0,Laborers,2.0,2,2,3,17.0,0,0,0,0,0,0,Business Entity Type 3,,0.650442,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,2.0,0.0,2.0,0.0,-617.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,
4,100007.0,0,Cash loans,M,N,Y,0.0,121500.0,513000.0,21865.5,513000.0,Unaccompanied,Working,Secondary / secondary special,Single / not married,House / apartment,0.028663,-19932.0,-3038.0,-4311.0,-3458.0,,1,1,0,1,0,0,Core staff,1.0,2,2,4,11.0,0,0,0,0,1,1,Religion,,0.322738,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,0.0,0.0,0.0,0.0,-1106.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
5784,106764.0,0,Cash loans,F,N,Y,0.0,112050.0,450000.0,22977.0,450000.0,Unaccompanied,Working,Secondary / secondary special,Married,House / apartment,0.024610,-10249.0,-2474.0,-4482.0,-2346.0,,1,1,0,1,1,0,Sales staff,2.0,2,2,7,9.0,0,0,0,0,0,0,Trade: type 3,0.371027,0.575786,0.609276,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,1.0,0.0,1.0,0.0,-1641.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,4.0
5785,106765.0,0,Cash loans,F,N,Y,2.0,63000.0,112068.0,12199.5,99000.0,Family,Commercial associate,Secondary / secondary special,Married,House / apartment,0.018850,-14853.0,-1867.0,-3521.0,-4378.0,,1,1,0,1,0,0,,4.0,2,2,5,15.0,0,0,0,0,0,0,Kindergarten,,0.592031,,0.1464,0.0000,0.9767,0.6804,0.0260,0.00,0.0345,0.1667,0.2083,0.0488,0.1194,0.0615,0.0000,0.0198,0.1492,0.0000,0.9767,0.6929,0.0262,0.0000,0.0345,0.1667,0.2083,0.0500,0.1304,0.0640,0.0000,0.0209,0.1478,0.0000,0.9767,0.6847,0.0262,0.00,0.0345,0.1667,0.2083,0.0497,0.1214,0.0626,0.0000,0.0202,reg oper account,block of flats,0.0668,"Stone, brick",No,2.0,0.0,2.0,0.0,-1201.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,
5786,106766.0,0,Cash loans,M,Y,N,0.0,270000.0,1350000.0,35743.5,1350000.0,,Working,Higher education,Civil marriage,With parents,0.030755,-11457.0,-1764.0,-5239.0,-3707.0,6.0,1,1,0,1,0,0,IT staff,2.0,2,2,3,17.0,0,0,0,0,0,0,Business Entity Type 3,,0.703121,0.203252,0.1567,0.1142,0.9866,0.8164,0.0440,0.16,0.1379,0.3333,0.3750,0.0536,0.1252,0.1589,0.0116,0.0487,0.1597,0.1185,0.9866,0.8236,0.0444,0.1611,0.1379,0.3333,0.3750,0.0548,0.1368,0.1656,0.0117,0.0515,0.1582,0.1142,0.9866,0.8189,0.0443,0.16,0.1379,0.3333,0.3750,0.0545,0.1274,0.1618,0.0116,0.0497,reg oper account,block of flats,0.1587,"Stone, brick",No,0.0,0.0,0.0,0.0,-527.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0
5787,106767.0,0,Cash loans,F,N,N,0.0,67500.0,153504.0,15084.0,144000.0,Unaccompanied,Pensioner,Lower secondary,Single / not married,House / apartment,0.018634,-25054.0,365243.0,-1404.0,-4797.0,,1,0,0,1,0,0,,1.0,2,2,7,10.0,0,0,0,0,0,0,XNA,,0.255691,,,,0.9781,,,,,,,,,0.0175,,,,,0.9782,,,,,,,,,0.0182,,,,,0.9781,,,,,,,,,0.0178,,,,block of flats,0.0152,,No,0.0,0.0,0.0,0.0,-1403.0,0.0,0.0,0.0,0.0,1.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,,,,,,


In [None]:
# RELATIVE PATH
# pd.credit_procesado.to_csv(r'..\data\Processed\archivo_procesado.csv', index=False)
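
The commented-out line above targets a relative path written with Windows-style backslashes. As a minimal sketch, and not necessarily how the notebook was meant to do it, the same export can be written portably with `pathlib`. The helper name `save_processed` and the toy DataFrame are hypothetical and only there for illustration; the directory layout (`data/Processed`) and the file name are taken from the commented line.

```python
import tempfile
from pathlib import Path

import pandas as pd


def save_processed(df, out_dir, filename="archivo_procesado.csv"):
    """Write df to out_dir/filename without the index and return the path written.

    Hypothetical helper: creates the target directory if it does not exist yet.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    out_path = out_dir / filename
    df.to_csv(out_path, index=False)
    return out_path


# Toy stand-in for the real DataFrame, using two of its column names.
demo = pd.DataFrame({"sk_id_curr": [100002, 100003], "target": [1, 0]})

# A temporary directory keeps the example from touching the real project tree.
with tempfile.TemporaryDirectory() as tmp:
    written = save_processed(demo, Path(tmp) / "data" / "Processed")
    reloaded = pd.read_csv(written)  # read back to confirm the round trip
```

In the notebook itself, the call would presumably look like `save_processed(pd.credit, Path("..") / "data" / "Processed")`, assuming the notebook runs from a folder that sits next to `data`.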