# Loan Default Prediction


For this project we will use a sample of the Lending Club data. The goal is to predict whether a given user will default, based on the information the platform collects. This will help us improve the lending methodology/pipeline.


# Description



Contains the loans of this platform:

    Period: 2007-2017Q3.
    887 thousand observations; sample of 100 thousand.
    150 variables.
    Target: loan status.



# Objective

Perform an ETL and an EDA.

## ETL

0. Clean the data so that, at the end of the ETL, it is in `tidy` format.
1. Make sure to load and read the data.
2. Create a table that stores each column name and its data type: (`column_name`, `type`).
3. Think carefully about the correct data type. Why did you choose string/object, float, or int? There are no wrong answers as such, but you must justify your decision.
4. Handle missing values or NaNs appropriately. Justify each decision.







## EDA

0. Prepare the data for a data pipeline.
1. Remove useless columns.
2. Impute values.
3. Maintain replicability and reproducibility.

**Don't forget to write your justifications in cells so you remember them when you have to explain your work.** You can add as many cells as you need for your explanation and code; just keep the structure.

# ETL

In [None]:
import pandas as pd
import numpy as np

You will get 2 errors; fix them using what was covered in class.  
Tip: they are fixed with additional arguments to the `read_csv` function.  
Documentation: https://pandas.pydata.org/docs/reference/api/pandas.read_csv.html
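The export does not show the two error messages, so the sketch below only assumes what they are: a mixed-types `DtypeWarning` (the warning residue in the output points at the `read_csv` line) and the index saved inside the CSV coming back as an extra `Unnamed: 0` column. It runs on a tiny hypothetical CSV held in memory, not on the real file.

```python
import io

import pandas as pd

# Hypothetical CSV that mimics LoansData_sample.csv: the first column has no
# header because it is an index written by an earlier DataFrame.to_csv call.
csv_text = """,id,member_id,term
0,38098114,,60 months
1,36805548,,36 months
2,37842129,,60 months
"""

# Default read: the saved index reappears as a data column named 'Unnamed: 0'.
default = pd.read_csv(io.StringIO(csv_text))
print(default.columns.tolist())

# index_col=0 uses that first column as the index instead of a data column.
# low_memory=False lets pandas infer each column's dtype from the whole file
# instead of chunk by chunk, which is what raises DtypeWarning on large files.
fixed = pd.read_csv(io.StringIO(csv_text), index_col=0, low_memory=False)
print(fixed.columns.tolist())
```

If `index_col=0` is adopted on the real file, the later `del loans['Unnamed: 0']` is no longer needed.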

In [None]:
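# low_memory=False: infer each column's dtype from the whole file instead of
# chunk by chunk, which avoids the mixed-types DtypeWarning
loans = pd.read_csv('https://github.com/sonder-art/fdd_prim_2023/blob/main/codigo/pandas/LoansData_sample.csv.gz?raw=true', compression='gzip', low_memory=False)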

loans




Unnamed: 0.1,Unnamed: 0,id,member_id,loan_amnt,funded_amnt,funded_amnt_inv,term,int_rate,installment,grade,...,hardship_payoff_balance_amount,hardship_last_payment_amount,disbursement_method,debt_settlement_flag,debt_settlement_flag_date,settlement_status,settlement_date,settlement_amount,settlement_percentage,settlement_term
0,0,38098114,,15000.0,15000.0,15000.0,60 months,12.39,336.64,C,...,,,Cash,N,,,,,,
1,1,36805548,,10400.0,10400.0,10400.0,36 months,6.99,321.08,A,...,,,Cash,N,,,,,,
2,2,37842129,,21425.0,21425.0,21425.0,60 months,15.59,516.36,D,...,,,Cash,N,,,,,,
3,3,37612354,,12800.0,12800.0,12800.0,60 months,17.14,319.08,D,...,,,Cash,N,,,,,,
4,4,37662224,,7650.0,7650.0,7650.0,36 months,13.66,260.20,C,...,,,Cash,N,,,,,,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
99995,99995,22454240,,8400.0,8400.0,8400.0,36 months,9.17,267.79,B,...,,,Cash,N,,,,,,
99996,99996,11396920,,10000.0,10000.0,10000.0,36 months,12.99,336.90,C,...,,,Cash,N,,,,,,
99997,99997,8556176,,30000.0,30000.0,30000.0,60 months,20.99,811.44,E,...,,,Cash,N,,,,,,
99998,99998,24023408,,8475.0,8475.0,8475.0,36 months,24.99,336.92,F,...,,,Cash,N,,,,,,


## Table (column_name, type)

Review the `pd.DataFrame.dtypes` attribute: https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.dtypes.html

In [None]:
column_types = loans.dtypes
column_types

Unnamed: 0                 int64
id                         int64
member_id                float64
loan_amnt                float64
funded_amnt              float64
                          ...   
settlement_status         object
settlement_date           object
settlement_amount        float64
settlement_percentage    float64
settlement_term          float64
Length: 151, dtype: object
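The cell above returns a Series indexed by column name rather than the two-column table requested in step 2 of the ETL. A minimal sketch of the conversion, run on a small hypothetical stand-in for `loans` so it works on its own:

```python
import pandas as pd

# Hypothetical stand-in for `loans`, using values visible in the first output.
loans_demo = pd.DataFrame({
    'id': [38098114, 36805548],
    'loan_amnt': [15000.0, 10400.0],
    'term': ['60 months', '36 months'],
})

# reset_index turns the dtypes Series into a two-column DataFrame.
column_types_demo = loans_demo.dtypes.reset_index()
column_types_demo.columns = ['column_name', 'type']
# Keep the dtype as text so the table is easy to export or compare.
column_types_demo['type'] = column_types_demo['type'].astype(str)
print(column_types_demo)
```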

## Load column descriptions

The following table describes the meaning of each column.

In [None]:


datos_dict = pd.read_excel(
    'https://resources.lendingclub.com/LCDataDictionary.xlsx')
datos_dict.columns = ['feature', 'description']


In [None]:
datos_dict

Unnamed: 0,feature,description
0,acc_now_delinq,The number of accounts on which the borrower i...
1,acc_open_past_24mths,Number of trades opened in past 24 months.
2,addr_state,The state provided by the borrower in the loan...
3,all_util,Balance to credit limit on all trades
4,annual_inc,The self-reported annual income provided by th...
...,...,...
148,settlement_amount,The loan amount that the borrower has agreed t...
149,settlement_percentage,The settlement amount as a percentage of the p...
150,settlement_term,The number of months that the borrower will be...
151,,
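The dictionary output above ends with an empty row (151). A minimal sketch, on hypothetical excerpts of both tables, of dropping such rows and attaching each description to the dtype table with a left join, so that columns missing from the dictionary stand out:

```python
import numpy as np
import pandas as pd

# Hypothetical excerpts of the dtype table and of datos_dict.
column_types_demo = pd.DataFrame({
    'column_name': ['loan_amnt', 'annual_inc', 'grade'],
    'type': ['float64', 'float64', 'object'],
})
datos_dict_demo = pd.DataFrame({
    'feature': ['annual_inc', 'loan_amnt', np.nan],
    'description': ['placeholder text 1', 'placeholder text 2', np.nan],
})

# Remove rows where every field is missing, like row 151 above.
datos_dict_demo = datos_dict_demo.dropna(how='all')

# Left join: every data column is kept; undocumented ones get NaN.
documented = column_types_demo.merge(
    datos_dict_demo, how='left', left_on='column_name', right_on='feature'
).drop(columns='feature')
print(documented)
```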


### Pickle

Write code to **save** and **load** the `datos_dict` DataFrame created in the previous cells in **pickle** format.

In [None]:
# Code to save
datos_dict.to_pickle('datos_dict.pkl')

In [None]:
# Code to load
datos_dict_loaded = pd.read_pickle('datos_dict.pkl')
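To confirm that saving and loading preserves the table exactly, the round trip can be checked with `pd.testing.assert_frame_equal`. A minimal sketch on a hypothetical stand-in for `datos_dict`, written to a temporary directory so no file is left behind:

```python
import os
import tempfile

import pandas as pd

# Hypothetical stand-in for datos_dict.
datos_dict_demo = pd.DataFrame({'feature': ['acc_now_delinq', 'annual_inc'],
                                'description': ['desc 1', 'desc 2']})

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'datos_dict.pkl')
    datos_dict_demo.to_pickle(path)   # save
    reloaded = pd.read_pickle(path)   # load

# Raises AssertionError if values, dtypes, index or column names differ.
pd.testing.assert_frame_equal(datos_dict_demo, reloaded)
print('round trip OK')
```

Unpickling can execute arbitrary code, so only load pickle files from sources you trust.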


## Data Types

Apply the transformations or casts you consider necessary so that each column has an appropriate data type. When you are done, recreate the `column_types` table with the new types.

Don't forget to write down your justifications so you remember them when you have to explain your work.

In [None]:
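# Drop the index column that was saved inside the CSV; it carries no information
del loans['Unnamed: 0']

# IDs are labels, not quantities (nobody adds or averages them), so store them as strings.
# Note: member_id appears to be empty in this sample, and astype(str) turns NaN into the text 'nan'
loans['id'] = loans['id'].astype(str)
loans['member_id'] = loans['member_id'].astype(str)

# Dates as datetime so they can be compared, sorted and subtracted
loans['settlement_date'] = pd.to_datetime(loans['settlement_date'])

# A column with a small, fixed set of values is stored more compactly as 'category'
loans['settlement_status'] = loans['settlement_status'].astype('category')

# Recreate the column_types table with the new types
column_types = loans.dtypes.reset_index()
column_types.columns = ['feature', 'dtype']
column_types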

Unnamed: 0,feature,dtype
0,id,object
1,member_id,object
2,loan_amnt,float64
3,funded_amnt,float64
4,funded_amnt_inv,float64
...,...,...
145,settlement_status,category
146,settlement_date,datetime64[ns]
147,settlement_amount,float64
148,settlement_percentage,float64


## **EDA**

4 criteria for dropping columns:

*   Low correlation with the target variable
*   High correlation among themselves
*   Many missing values/NAs
*   Knowledge of the phenomenon/business

First, we drop the columns with low correlation with the target variable.

Before anything else, we keep only the rows whose `loan_status` is either 'Charged Off' (default) or 'Fully Paid'.
We also encode the target as `charged_off` = 1 if the loan was not repaid (Charged Off) and 0 if it was fully paid.
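A minimal sketch of this filtering and encoding on hypothetical `loan_status` values. Taking `.copy()` of the filtered rows makes the result an independent DataFrame, so the later assignments do not trigger `SettingWithCopyWarning`:

```python
import numpy as np
import pandas as pd

# Hypothetical loan_status values; only the two final outcomes are kept.
loans_demo = pd.DataFrame({'loan_status': ['Fully Paid', 'Charged Off', 'Current',
                                           'Fully Paid', 'Late (31-120 days)']})

subset = loans_demo.loc[
    loans_demo['loan_status'].isin(['Fully Paid', 'Charged Off'])
].copy()

# 1 = charged off (not repaid), 0 = fully paid.
subset['charged_off'] = (subset['loan_status'] == 'Charged Off').astype(np.uint8)
subset = subset.drop(columns='loan_status')
print(subset)
```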


In [None]:
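# .copy() makes `dataset` an independent DataFrame and avoids SettingWithCopyWarning
dataset = loans.loc[loans['loan_status'].isin(['Fully Paid', 'Charged Off'])].copy()
# Target: 1 = Charged Off (not repaid), 0 = Fully Paid
dataset['charged_off'] = (dataset['loan_status'] == 'Charged Off').astype(np.uint8)
dataset.drop('loan_status', axis=1, inplace=True)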



Now we select the columns whose absolute correlation with `charged_off` is below 0.03.

In [None]:
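# numeric_only=True: correlation is only defined for numeric columns
# (recent pandas versions raise an error on text columns otherwise)
correlacion = dataset.corr(numeric_only=True)
correlacion_y = abs(correlacion['charged_off'])
drop_corr = sorted(list(correlacion_y[correlacion_y < 0.03].index))
print(f'Numero de columnas a quitar {len(drop_corr)} \n',drop_corr)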



Numero de columnas a quitar 28 
 ['acc_now_delinq', 'chargeoff_within_12_mths', 'collections_12_mths_ex_med', 'delinq_2yrs', 'delinq_amnt', 'mo_sin_old_il_acct', 'mths_since_last_delinq', 'mths_since_last_major_derog', 'mths_since_last_record', 'mths_since_recent_bc_dlq', 'mths_since_recent_revol_delinq', 'num_accts_ever_120_pd', 'num_bc_sats', 'num_bc_tl', 'num_il_tl', 'num_rev_accts', 'num_tl_120dpd_2m', 'num_tl_30dpd', 'num_tl_90g_dpd_24m', 'pct_tl_nvr_dlq', 'pub_rec', 'pub_rec_bankruptcies', 'revol_bal', 'tax_liens', 'tot_coll_amt', 'total_acc', 'total_bal_ex_mort', 'total_il_high_credit_limit']


In [None]:
dataset.drop(labels=drop_corr, axis=1, inplace=True)
dataset.shape



(86138, 122)

## Handling NaNs or missing values

Handle missing data. Choose an appropriate strategy depending on the data type you assigned to each column.


As a strategy, we drop the columns in which the fraction of NaNs is greater than 0.25.
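A minimal sketch of this rule on a hypothetical DataFrame. `isnull().mean()` gives the fraction of missing values per column, and the comparison is strict, so a column with exactly 25% NaNs is kept:

```python
import numpy as np
import pandas as pd

# Hypothetical columns with 75%, 25% and 0% missing values.
demo = pd.DataFrame({
    'mostly_missing': [np.nan, np.nan, np.nan, 1.0],
    'some_missing':   [1.0, np.nan, 3.0, 4.0],
    'complete':       [1.0, 2.0, 3.0, 4.0],
})

missing_per = demo.isnull().mean()
drop_miss = sorted(missing_per[missing_per > 0.25].index)
demo = demo.drop(columns=drop_miss)
print(missing_per.to_dict(), drop_miss)
```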

In [None]:
missing_per = dataset.isnull().mean().sort_values(ascending=False)
drop_miss = sorted(list(missing_per[missing_per > 0.25].index))
print(f'Numero de columnas a quitar {len(drop_miss)} \n',drop_miss)

Numero de columnas a quitar 52 
 ['all_util', 'annual_inc_joint', 'debt_settlement_flag_date', 'deferral_term', 'desc', 'dti_joint', 'hardship_amount', 'hardship_dpd', 'hardship_end_date', 'hardship_last_payment_amount', 'hardship_length', 'hardship_loan_status', 'hardship_payoff_balance_amount', 'hardship_reason', 'hardship_start_date', 'hardship_status', 'hardship_type', 'il_util', 'inq_fi', 'inq_last_12m', 'max_bal_bc', 'mths_since_rcnt_il', 'next_pymnt_d', 'open_acc_6m', 'open_act_il', 'open_il_12m', 'open_il_24m', 'open_rv_12m', 'open_rv_24m', 'orig_projected_additional_accrued_interest', 'payment_plan_start_date', 'revol_bal_joint', 'sec_app_chargeoff_within_12_mths', 'sec_app_collections_12_mths_ex_med', 'sec_app_earliest_cr_line', 'sec_app_fico_range_high', 'sec_app_fico_range_low', 'sec_app_inq_last_6mths', 'sec_app_mort_acc', 'sec_app_mths_since_last_major_derog', 'sec_app_num_rev_accts', 'sec_app_open_acc', 'sec_app_open_act_il', 'sec_app_revol_util', 'settlement_amount', 's

In [None]:
dataset.drop(labels=drop_miss, axis=1, inplace=True)
dataset.shape



(86138, 70)

We now have only 70 columns, fewer than half of the original ones.

# Intuition

Next, we can use the document [Credit Risk Analysis in Peer to Peer Lending Data set: Lending Club](https://digitalcommons.bard.edu/cgi/viewcontent.cgi?article=1299&context=senproj_s2019) to keep removing columns.


In [None]:
elegidas = ['charged_off','funded_amnt','addr_state', 'annual_inc',
            'application_type', 'dti', 'earliest_cr_line', 'emp_length',
            'emp_title', 'fico_range_high', 'fico_range_low', 'grade',
            'home_ownership', 'initial_list_status', 'installment',
            'int_rate', 'loan_amnt', 'loan_status', 'mort_acc',
            'open_acc', 'pub_rec', 'pub_rec_bankruptcies', 'purpose',
            'revol_bal', 'revol_util', 'sub_grade', 'term', 'title',
            'total_acc', 'verification_status', 'zip_code',
            'last_pymnt_amnt','num_actv_rev_tl', 'mo_sin_rcnt_rev_tl_op',
            'mo_sin_old_rev_tl_op',"bc_util","bc_open_to_buy","avg_cur_bal",
            "acc_open_past_24mths" ]
len(elegidas)

From the columns chosen above, we also remove the following 10 because they are too granular, redundant, or of little importance:

*   zip_code - Redundant with 'addr_state' and too granular.
*   title - Usually a description of the loan's purpose, so it can be redundant with 'purpose'.
*   emp_title - Too specific; it varies a lot between individuals.
*   application_type - Unless the share of joint applications is significant, this variable may not be very relevant.
*   initial_list_status - It is more an administrative attribute of the loan than an indicator of borrower behavior.
*   sub_grade - Redundant if 'grade' is already considered.
*   term - While the loan term matters, if all loans have similar terms this column might not add much value.
*   acc_open_past_24mths - Although relevant, it may be correlated with other, more direct credit metrics.
*   avg_cur_bal - The information could be implicit in other variables such as 'total_acc' and 'revol_bal'.
*   bc_open_to_buy - May be redundant with 'revol_bal' and 'revol_util'.
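Some of the reasons above depend on a condition ("if all loans have similar terms", "unless the share of joint applications is significant"). One quick way to check such a condition is the share of rows taken by the most frequent value. A sketch with a hypothetical helper `share_of_top_value` and made-up values; on the real data you could pass, for example, `dataset['term']` before the drop cell below runs.

```python
import pandas as pd


def share_of_top_value(series: pd.Series) -> float:
    """Hypothetical helper: fraction of non-missing rows taken by the most frequent value."""
    return float(series.value_counts(normalize=True).iloc[0])


# Made-up values, only to show the output format.
demo = pd.DataFrame({
    'term': ['36 months', '36 months', '60 months', '36 months'],
    'application_type': ['Individual'] * 4,
})

for col in demo.columns:
    print(col, share_of_top_value(demo[col]))
```

A share close to 1.0 supports dropping the column as nearly constant; a more balanced split suggests keeping it.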

Thus, `elegidas` becomes:

In [None]:
elegidas = ['charged_off','funded_amnt','addr_state', 'annual_inc', 'dti',
            'earliest_cr_line', 'emp_length', 'fico_range_high', 'fico_range_low',
            'grade', 'home_ownership', 'installment', 'int_rate', 'loan_amnt',
            'loan_status', 'mort_acc', 'open_acc', 'pub_rec', 'pub_rec_bankruptcies',
            'purpose', 'revol_bal', 'revol_util', 'total_acc', 'verification_status',
            'last_pymnt_amnt','num_actv_rev_tl', 'mo_sin_rcnt_rev_tl_op',
            'mo_sin_old_rev_tl_op',"bc_util"
            ]
len(elegidas)

29

In [None]:
drop_no_intuitivas = [col for col in dataset.columns if col not in elegidas]
dataset.drop(labels=drop_no_intuitivas , axis=1, inplace=True)



In [None]:
dataset.describe()

Unnamed: 0,loan_amnt,funded_amnt,int_rate,installment,annual_inc,dti,fico_range_low,fico_range_high,open_acc,revol_util,last_pymnt_amnt,bc_util,mo_sin_old_rev_tl_op,mo_sin_rcnt_rev_tl_op,mort_acc,num_actv_rev_tl,charged_off
count,86138.0,86138.0,86138.0,86138.0,86138.0,86138.0,86138.0,86138.0,86138.0,86094.0,86138.0,85089.0,86138.0,86138.0,86138.0,86138.0,86138.0
mean,14106.526446,14106.526446,13.00236,430.737187,73843.11,18.532747,692.462966,696.463024,11.746453,54.582777,4757.453184,63.808959,183.524333,12.796896,1.74888,5.762358,0.187559
std,8391.139221,8391.139221,4.397419,251.653139,59293.52,8.538247,29.731549,29.731848,5.433122,23.515901,6466.767327,27.051347,93.26643,16.224586,2.091488,3.224598,0.390362
min,1000.0,1000.0,6.0,30.42,4000.0,0.0,660.0,664.0,1.0,0.0,0.0,0.0,3.0,0.0,0.0,0.0,0.0
25%,7800.0,7800.0,9.49,248.48,45000.0,12.07,670.0,674.0,8.0,37.2,358.5225,44.1,118.0,3.0,0.0,3.0,0.0
50%,12000.0,12000.0,12.99,370.48,62473.72,17.95,685.0,689.0,11.0,54.9,1241.23,67.7,167.0,8.0,1.0,5.0,0.0
75%,20000.0,20000.0,15.61,568.005,90000.0,24.5,705.0,709.0,14.0,72.5,7303.205,87.5,232.0,15.0,3.0,7.0,0.0
max,35000.0,35000.0,26.06,1408.13,7500000.0,39.99,845.0,850.0,84.0,180.3,36234.44,255.2,718.0,372.0,34.0,38.0,1.0


In [None]:

dataset.shape

(86138, 23)

Only 22 explanatory columns remain (plus the `charged_off` target). We also drop 'home_ownership' because of its low correlation with the target, and likewise 'verification_status'.

In [None]:
dataset.drop(columns=['home_ownership'], inplace=True)
dataset.shape



(86138, 22)

In [None]:
pd.get_dummies(dataset[['verification_status', 'charged_off']], columns=['verification_status'],
               drop_first=True).corr()

In [None]:
dataset.drop(columns=['verification_status'], inplace=True)
dataset.shape



(86138, 21)

# Numeric feature selection

We drop the explanatory variables that are highly correlated with each other.
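A minimal sketch of the upper-triangle technique used in the cells below, on hypothetical columns where `b` is an exact multiple of `a`. Masking everything except the strict upper triangle (`k=1` also hides the diagonal of 1.0s) means each pair is inspected once, and the first column of a highly correlated pair is kept. Unlike the notebook cell, this sketch compares `abs()` values so strongly negative pairs would also be caught; that is a design choice, not what the cell does.

```python
import numpy as np
import pandas as pd

# Hypothetical features: 'b' = 2 * 'a'; 'c' has correlation -0.8 with both.
demo = pd.DataFrame({
    'a': [1.0, 2.0, 3.0, 4.0, 5.0],
    'b': [2.0, 4.0, 6.0, 8.0, 10.0],
    'c': [5.0, 3.0, 4.0, 1.0, 2.0],
})
corr = demo.corr()

# Keep only entries above the diagonal; the rest become NaN.
mask = np.triu(np.ones(corr.shape), k=1).astype(bool)
upper = corr.where(mask)

# Drop a column if it is highly correlated with any column to its left.
to_drop = [col for col in upper.columns if (upper[col].abs() > 0.9).any()]
print(to_drop)
```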

In [None]:
feature_correlation = dataset.loc[:, dataset.columns != 'charged_off'].corr(numeric_only=True)
feature_correlation



Unnamed: 0,loan_amnt,funded_amnt,int_rate,installment,annual_inc,dti,fico_range_low,fico_range_high,open_acc,revol_util,last_pymnt_amnt,bc_util,mo_sin_old_rev_tl_op,mo_sin_rcnt_rev_tl_op,mort_acc,num_actv_rev_tl
loan_amnt,1.0,1.0,0.081992,0.95672,0.375339,0.004018,0.145428,0.145427,0.189387,0.112913,0.483553,0.035254,0.179374,0.055022,0.241957,0.153578
funded_amnt,1.0,1.0,0.081992,0.95672,0.375339,0.004018,0.145428,0.145427,0.189387,0.112913,0.483553,0.035254,0.179374,0.055022,0.241957,0.153578
int_rate,0.081992,0.081992,1.0,0.078197,-0.116058,0.203275,-0.387206,-0.387204,-0.008519,0.207596,0.099688,0.246673,-0.153259,-0.124424,-0.078238,0.08248
installment,0.95672,0.95672,0.078197,1.0,0.36855,0.005816,0.105179,0.105178,0.178559,0.124198,0.397907,0.051599,0.163325,0.044402,0.21437,0.159338
annual_inc,0.375339,0.375339,-0.116058,0.36855,1.0,-0.211716,0.103118,0.103118,0.143201,0.045471,0.199016,-0.009169,0.159436,0.043779,0.252862,0.072766
dti,0.004018,0.004018,0.203275,0.005816,-0.211716,1.0,-0.065496,-0.065497,0.280576,0.17561,-0.0484,0.183582,0.029643,-0.023149,-0.068522,0.255055
fico_range_low,0.145428,0.145428,-0.387206,0.105179,0.103118,-0.065496,1.0,1.0,0.039114,-0.418144,0.103164,-0.459832,0.116304,0.103095,0.089883,-0.15662
fico_range_high,0.145427,0.145427,-0.387204,0.105178,0.103118,-0.065497,1.0,1.0,0.039113,-0.418144,0.103165,-0.459832,0.116305,0.103097,0.089883,-0.156621
open_acc,0.189387,0.189387,-0.008519,0.178559,0.143201,0.280576,0.039114,0.039113,1.0,-0.147787,0.078452,-0.106107,0.130899,-0.227453,0.112286,0.664857
revol_util,0.112913,0.112913,0.207596,0.124198,0.045471,0.17561,-0.418144,-0.418144,-0.147787,1.0,0.012169,0.825019,-0.011862,0.187007,0.010235,0.077046


In [None]:
# Use only the upper triangle of the matrix
# Remember that the correlation matrix is symmetric
# (built-in bool: np.bool was deprecated in NumPy 1.20 and later removed)
upper = feature_correlation.where(np.triu(np.ones(
    feature_correlation.shape), k=1).astype(bool))

# Find variables with correlation above the threshold (0.9)
to_drop = [column for column in upper.columns if any(upper[column] > 0.9)]

to_drop




['funded_amnt', 'installment', 'fico_range_high']

In [None]:
dataset.drop(to_drop, axis=1, inplace=True)
dataset.shape

(86138, 18)

Thus, at the end of our analysis, 17 explanatory variables remain (plus the target), about 11% of the original 150 variables.

From here on, machine learning tools can be used to build the predictive model.

# JSONs

Write code to **save** and **load** a JSON file that stores the `estrategia` (strategy) and `valor` (value) you used to **impute**. For example: if there is a column called `columna 3` imputed with the mean, and another called `columna 4` for which you chose the word 'missing', the JSON should contain:  
  
 `{"columna 3": {"estrategia": "mean", "valor": 3.4}, "columna 4": {"estrategia": "identificador", "valor": "missing"}}`  

 That is, each column with an imputation method points to another dictionary where the **key** `estrategia` briefly describes the method and the **key** `valor` holds the value used. In general:   
 `{"column name": {"estrategia": "strategy description", "valor": "value used"}}`.


If you use more than one method, you can nest them in a list  
  `[{...},{...}]`.  

Even if a column was not imputed, you must add it to the JSON.

The idea is that anyone else can load the JSON file with your function, understand what you did, and replicate it easily. There is no single correct answer, but you will have to justify and explain your decisions.
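Since every column must appear in the JSON, even those without imputation, the plan can be generated for all columns at once. A sketch with a hypothetical helper `build_imputation_plan`; its rules (median for numeric columns with NaNs, the identifier 'missing' for other columns with NaNs, 'none' for complete columns) are assumptions for illustration, not the notebook's final decision.

```python
import numpy as np
import pandas as pd


def build_imputation_plan(df: pd.DataFrame) -> dict:
    """Hypothetical helper: one {'estrategia', 'valor'} entry per column.

    Assumed rules: complete columns -> 'none'; numeric columns with NaNs ->
    'median'; any other column with NaNs -> 'identificador' with 'missing'.
    """
    plan = {}
    for col in df.columns:
        if not df[col].isna().any():
            plan[col] = {'estrategia': 'none', 'valor': None}
        elif pd.api.types.is_numeric_dtype(df[col]):
            plan[col] = {'estrategia': 'median', 'valor': float(df[col].median())}
        else:
            plan[col] = {'estrategia': 'identificador', 'valor': 'missing'}
    return plan


# Hypothetical data with the column names used in this notebook.
demo = pd.DataFrame({
    'revol_util': [10.0, np.nan, 30.0],
    'emp_length': ['10+ years', None, '2 years'],
    'loan_amnt': [1000.0, 2000.0, 3000.0],
})
plan = build_imputation_plan(demo)
print(plan)
```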

### Imputation

In [None]:
import json

import numpy as np
import pandas as pd

# Hypothetical example DataFrame (replace it with your own data):
# 'columna_3' is numeric and its missing values are imputed with the mean;
# 'columna_4' is categorical and its missing values get the identifier 'faltante'
df = pd.DataFrame({
    'columna_3': [1.0, np.nan, 5.0],
    'columna_4': ['a', None, 'b'],
})

# Step 1: decide the imputation strategy for each column
estrategias_de_imputacion = {
    'columna_3': {'estrategia': 'media', 'valor': df['columna_3'].mean()},
    'columna_4': {'estrategia': 'identificador', 'valor': 'faltante'}
}

# Step 2: apply the imputation. Assigning the result back avoids fillna(inplace=True)
# on a single column, which is chained assignment and may not update df in recent pandas
df['columna_3'] = df['columna_3'].fillna(estrategias_de_imputacion['columna_3']['valor'])
df['columna_4'] = df['columna_4'].fillna(estrategias_de_imputacion['columna_4']['valor'])

# Steps 3 & 4: save the strategies to a JSON file
with open('estrategias_de_imputacion.json', 'w') as archivo:
    json.dump(estrategias_de_imputacion, archivo, indent=4)

# Step 5: function to load the imputation strategies and apply them to a DataFrame
def aplicar_imputacion(archivo_json, dataframe):
    with open(archivo_json, 'r') as archivo:
        estrategias = json.load(archivo)

    dataframe = dataframe.copy()  # do not modify the caller's DataFrame
    for columna, info_estrategia in estrategias.items():
        # Both strategies store the fill value under 'valor'
        if info_estrategia['estrategia'] in ('media', 'identificador'):
            dataframe[columna] = dataframe[columna].fillna(info_estrategia['valor'])
        # Add more strategies as needed
    return dataframe

# Apply the imputation function
df = aplicar_imputacion('estrategias_de_imputacion.json', df)


### Code to save and load JSONs
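A minimal sketch of reusable save/load functions; `save_json`, `load_json` and `_to_builtin` are hypothetical names. The standard `json` module cannot serialize NumPy integers such as `np.int64` (a common result of pandas aggregations), so `_to_builtin` is passed as `default=` to convert NumPy scalars with `.item()`. The `columna 3` entry reuses the example from the instructions; `columna 5` is made up.

```python
import json
import os
import tempfile

import numpy as np


def _to_builtin(obj):
    # Called by json only for objects it cannot serialize natively.
    if isinstance(obj, np.generic):
        return obj.item()  # NumPy scalar -> equivalent Python int/float/bool
    raise TypeError(f'Not JSON serializable: {type(obj).__name__}')


def save_json(data: dict, path: str) -> None:
    with open(path, 'w', encoding='utf-8') as f:
        json.dump(data, f, indent=4, ensure_ascii=False, default=_to_builtin)


def load_json(path: str) -> dict:
    with open(path, 'r', encoding='utf-8') as f:
        return json.load(f)


estrategias = {
    'columna 3': {'estrategia': 'mean', 'valor': np.float64(3.4)},
    'columna 4': {'estrategia': 'identificador', 'valor': 'missing'},
    'columna 5': {'estrategia': 'none', 'valor': np.int64(7)},
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'estrategias_de_imputacion.json')
    save_json(estrategias, path)
    cargadas = load_json(path)

print(cargadas)
```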