# Data Science and MLOps Bootcamp

<img src="https://i.ibb.co/5RM26Cw/LOGO-COLOR2.png" width="500px">

Created at [escueladedatosvivos.ai](https://escueladedatosvivos.ai) 🚀.

Questions? The site offers guided AI support, a community, and access to certification.

<br>

---  

# 0) Dataset 🤑

We use a dataset from the City of Buenos Aires, Argentina: [Encuesta anual de hogares 2019](https://data.buenosaires.gob.ar/dataset/encuesta-anual-hogares).
<br>The dataset was modified to fit this lesson.

# 1) Loading the data 📕

In [1]:
import pandas as pd
from funpymodeling.exploratory import freq_tbl, status

In [2]:
# In this case we want to display all the columns
pd.set_option('display.max_columns', None)

In [3]:
data = pd.read_csv("data/encuesta-anual-hogares-2019.csv", sep=',') 

*Note:* although the default value of `sep` in `read_csv` is the comma `,`, I always make it explicit, because files sometimes come separated by semicolons or by another separator such as a tab.
<br>This is good practice, and it also applies when writing files.
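
The effect of the separator can be seen on a tiny example. This is only a sketch: the in-memory text below is hypothetical and stands in for a semicolon-separated file, it is not the course dataset.

```python
import io
import pandas as pd

# Hypothetical in-memory file separated by semicolons (not the course dataset)
raw = "comuna;edad\n5;18\n2;50\n"

# With the default separator each row lands in a single column
wrong = pd.read_csv(io.StringIO(raw))

# Making sep explicit parses the two columns as intended
right = pd.read_csv(io.StringIO(raw), sep=';')

# The same idea applies when writing: state the separator explicitly
out = io.StringIO()
right.to_csv(out, sep=';', index=False)

print(wrong.shape, right.shape)
```

Reading with the wrong separator does not raise an error, which is why stating it explicitly helps catch the problem early.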

# 2) Initial inspection 👀

In [4]:
# First 5 records
data.head(5)

Unnamed: 0,id,nhogar,miembro,comuna,dominio,edad,sexo,parentesco_jefe,situacion_conyugal,num_miembro_padre,num_miembro_madre,estado_ocupacional,cat_ocupacional,calidad_ingresos_lab,ingreso_total_lab,calidad_ingresos_no_lab,ingreso_total_no_lab,calidad_ingresos_totales,ingresos_totales,calidad_ingresos_familiares,ingresos_familiares,ingreso_per_capita_familiar,estado_educativo,sector_educativo,nivel_actual,nivel_max_educativo,años_escolaridad,lugar_nacimiento,afiliacion_salud,hijos_nacidos_vivos,cantidad_hijos_nac_vivos
0,1,1,1,5,Resto de la Ciudad,18,Mujer,Jefe,Soltero/a,Padre no vive en el hogar,Madre no vive en el hogar,Inactivo,No corresponde,No tuvo ingresos,0,Tuvo ingresos y declara monto,6000,Tuvo ingresos y declara monto,6000,Tuvo ingresos y declara monto,18000,9000,Asiste,Estatal/publico,Universitario,Otras escuelas especiales,12,PBA excepto GBA,Solo obra social,No,No corresponde
1,1,1,2,5,Resto de la Ciudad,18,Mujer,Otro no familiar,Soltero/a,Padre no vive en el hogar,Madre no vive en el hogar,Inactivo,No corresponde,No tuvo ingresos,0,Tuvo ingresos y declara monto,12000,Tuvo ingresos y declara monto,12000,Tuvo ingresos y declara monto,18000,9000,Asiste,Estatal/publico,Universitario,Otras escuelas especiales,12,Otra provincia,Solo plan de medicina prepaga por contratación...,No,No corresponde
2,2,1,1,2,Resto de la Ciudad,18,Varon,Jefe,Soltero/a,Padre no vive en el hogar,2,Inactivo,No corresponde,No tuvo ingresos,0,No tuvo ingresos,0,No tuvo ingresos,0,Tuvo ingresos pero no declara monto,100000,33333,Asiste,Privado religioso,Universitario,Otras escuelas especiales,12,CABA,Solo plan de medicina prepaga por contratación...,,No corresponde
3,2,1,2,2,Resto de la Ciudad,50,Mujer,Padre/Madre/Suegro/a,Viudo/a,No corresponde,No corresponde,Ocupado,Asalariado,Tuvo ingresos y declara monto,70000,Tuvo ingresos pero no declara monto,30000,Tuvo ingresos pero no declara monto,100000,Tuvo ingresos pero no declara monto,100000,33333,No asiste pero asistió,No corresponde,No corresponde,Secundario/medio comun,17,CABA,Solo prepaga o mutual via OS,Si,2
4,2,1,3,2,Resto de la Ciudad,17,Varon,Otro familiar,Soltero/a,Padre no vive en el hogar,2,Inactivo,No corresponde,No tuvo ingresos,0,No tuvo ingresos,0,No tuvo ingresos,0,Tuvo ingresos pero no declara monto,100000,33333,Asiste,Privado religioso,Secundario/medio comun,EGB (1° a 9° año),10,CABA,Solo plan de medicina prepaga por contratación...,,No corresponde


In [5]:
status(data)

Unnamed: 0,variable,q_nan,p_nan,q_zeros,p_zeros,unique,type
0,id,0,0.0,0,0.0,5795,int64
1,nhogar,0,0.0,0,0.0,7,int64
2,miembro,0,0.0,0,0.0,19,int64
3,comuna,0,0.0,0,0.0,15,int64
4,dominio,0,0.0,0,0.0,2,object
5,edad,0,0.0,128,0.008939,101,int64
6,sexo,0,0.0,0,0.0,2,object
7,parentesco_jefe,0,0.0,0,0.0,9,object
8,situacion_conyugal,1,7e-05,0,0.0,7,object
9,num_miembro_padre,0,0.0,0,0.0,9,object
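
To make the columns of the `status` table above easier to read, here is a rough pandas-only approximation of the same kind of summary. It is a sketch on a hypothetical toy DataFrame and is not how `funpymodeling` implements `status`; details such as how missing values count towards `unique` are assumptions.

```python
import pandas as pd

# Toy data (hypothetical), not the survey dataset
toy = pd.DataFrame({
    "edad": [0, 18, 50, 17],
    "situacion_conyugal": ["Soltero/a", None, "Viudo/a", "Soltero/a"],
})

summary = pd.DataFrame({
    "variable": toy.columns,
    "q_nan": toy.isna().sum().values,        # number of missing values
    "p_nan": toy.isna().mean().values,       # share of missing values
    "q_zeros": (toy == 0).sum().values,      # number of zeros
    "p_zeros": (toy == 0).mean().values,     # share of zeros
    "unique": toy.nunique().values,          # distinct non-null values
    "type": toy.dtypes.astype(str).values,   # dtype of each column
})

print(summary)
```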


## Dropping certain columns 📍

In [6]:
data = data.drop(['id','hijos_nacidos_vivos'], axis=1)

In [7]:
data.head(5)

Unnamed: 0,nhogar,miembro,comuna,dominio,edad,sexo,parentesco_jefe,situacion_conyugal,num_miembro_padre,num_miembro_madre,estado_ocupacional,cat_ocupacional,calidad_ingresos_lab,ingreso_total_lab,calidad_ingresos_no_lab,ingreso_total_no_lab,calidad_ingresos_totales,ingresos_totales,calidad_ingresos_familiares,ingresos_familiares,ingreso_per_capita_familiar,estado_educativo,sector_educativo,nivel_actual,nivel_max_educativo,años_escolaridad,lugar_nacimiento,afiliacion_salud,cantidad_hijos_nac_vivos
0,1,1,5,Resto de la Ciudad,18,Mujer,Jefe,Soltero/a,Padre no vive en el hogar,Madre no vive en el hogar,Inactivo,No corresponde,No tuvo ingresos,0,Tuvo ingresos y declara monto,6000,Tuvo ingresos y declara monto,6000,Tuvo ingresos y declara monto,18000,9000,Asiste,Estatal/publico,Universitario,Otras escuelas especiales,12,PBA excepto GBA,Solo obra social,No corresponde
1,1,2,5,Resto de la Ciudad,18,Mujer,Otro no familiar,Soltero/a,Padre no vive en el hogar,Madre no vive en el hogar,Inactivo,No corresponde,No tuvo ingresos,0,Tuvo ingresos y declara monto,12000,Tuvo ingresos y declara monto,12000,Tuvo ingresos y declara monto,18000,9000,Asiste,Estatal/publico,Universitario,Otras escuelas especiales,12,Otra provincia,Solo plan de medicina prepaga por contratación...,No corresponde
2,1,1,2,Resto de la Ciudad,18,Varon,Jefe,Soltero/a,Padre no vive en el hogar,2,Inactivo,No corresponde,No tuvo ingresos,0,No tuvo ingresos,0,No tuvo ingresos,0,Tuvo ingresos pero no declara monto,100000,33333,Asiste,Privado religioso,Universitario,Otras escuelas especiales,12,CABA,Solo plan de medicina prepaga por contratación...,No corresponde
3,1,2,2,Resto de la Ciudad,50,Mujer,Padre/Madre/Suegro/a,Viudo/a,No corresponde,No corresponde,Ocupado,Asalariado,Tuvo ingresos y declara monto,70000,Tuvo ingresos pero no declara monto,30000,Tuvo ingresos pero no declara monto,100000,Tuvo ingresos pero no declara monto,100000,33333,No asiste pero asistió,No corresponde,No corresponde,Secundario/medio comun,17,CABA,Solo prepaga o mutual via OS,2
4,1,3,2,Resto de la Ciudad,17,Varon,Otro familiar,Soltero/a,Padre no vive en el hogar,2,Inactivo,No corresponde,No tuvo ingresos,0,No tuvo ingresos,0,No tuvo ingresos,0,Tuvo ingresos pero no declara monto,100000,33333,Asiste,Privado religioso,Secundario/medio comun,EGB (1° a 9° año),10,CABA,Solo plan de medicina prepaga por contratación...,No corresponde


In [8]:
from funpymodeling.exploratory import profiling_num

profiling_num(data)

Unnamed: 0,variable,mean,std_dev,variation_coef,p_0.01,p_0.05,p_0.25,p_0.5,p_0.75,p_0.95,p_0.99
0,nhogar,1.009638,0.126376,0.125169,1.0,1.0,1.0,1.0,1.0,1.0,1.0
1,miembro,2.144982,1.354969,0.631693,1.0,1.0,1.0,2.0,3.0,5.0,6.0
2,comuna,7.620644,4.236359,0.555906,1.0,1.0,4.0,8.0,11.0,15.0,15.0
3,edad,38.81549,23.11017,0.595385,1.0,4.0,20.0,37.0,57.0,78.0,89.0
4,ingreso_total_lab,20078.62644,34698.173111,1.728115,0.0,0.0,0.0,2500.0,30000.0,80000.0,140000.0
5,ingreso_total_no_lab,6016.234583,16065.350052,2.670333,0.0,0.0,0.0,0.0,4000.0,32000.0,73916.66
6,ingresos_totales,26094.861024,37152.503186,1.423748,0.0,0.0,0.0,16000.0,37000.0,85505.0,150820.0
7,ingresos_familiares,70212.818423,62685.684278,0.892795,4600.0,12000.0,30000.0,54000.0,90000.0,175100.0,300000.0
8,ingreso_per_capita_familiar,26192.009638,27463.908496,1.048561,1285.44,3729.7,10500.0,19900.0,33500.0,70000.0,120000.0


In [9]:
# .T stands for transpose
data.describe().T

Unnamed: 0,count,mean,std,min,25%,50%,75%,max
nhogar,14319.0,1.009638,0.126376,1.0,1.0,1.0,1.0,7.0
miembro,14319.0,2.144982,1.354969,1.0,1.0,2.0,3.0,19.0
comuna,14319.0,7.620644,4.236359,1.0,4.0,8.0,11.0,15.0
edad,14319.0,38.81549,23.11017,0.0,20.0,37.0,57.0,100.0
ingreso_total_lab,14319.0,20078.62644,34698.173111,0.0,0.0,2500.0,30000.0,1000000.0
ingreso_total_no_lab,14319.0,6016.234583,16065.350052,0.0,0.0,0.0,4000.0,500000.0
ingresos_totales,14319.0,26094.861024,37152.503186,0.0,0.0,16000.0,37000.0,1000000.0
ingresos_familiares,14319.0,70212.818423,62685.684278,0.0,30000.0,54000.0,90000.0,1000000.0
ingreso_per_capita_familiar,14319.0,26192.009638,27463.908496,0.0,10500.0,19900.0,33500.0,1000000.0


# 3) Discretization 📈➜📊

## 3.1) By equal frequency

In [10]:
data['ingreso_total_lab'] = pd.qcut(data['ingreso_total_lab'], q=10, duplicates='drop')
data['ingreso_total_no_lab'] = pd.qcut(data['ingreso_total_no_lab'], q=4, duplicates='drop')
data['ingresos_familiares'] = pd.qcut(data['ingresos_familiares'], q=8)
data['ingreso_per_capita_familiar'] = pd.qcut(data['ingreso_per_capita_familiar'], q=10)

In [11]:
data.head(5)

Unnamed: 0,nhogar,miembro,comuna,dominio,edad,sexo,parentesco_jefe,situacion_conyugal,num_miembro_padre,num_miembro_madre,estado_ocupacional,cat_ocupacional,calidad_ingresos_lab,ingreso_total_lab,calidad_ingresos_no_lab,ingreso_total_no_lab,calidad_ingresos_totales,ingresos_totales,calidad_ingresos_familiares,ingresos_familiares,ingreso_per_capita_familiar,estado_educativo,sector_educativo,nivel_actual,nivel_max_educativo,años_escolaridad,lugar_nacimiento,afiliacion_salud,cantidad_hijos_nac_vivos
0,1,1,5,Resto de la Ciudad,18,Mujer,Jefe,Soltero/a,Padre no vive en el hogar,Madre no vive en el hogar,Inactivo,No corresponde,No tuvo ingresos,"(-0.001, 2500.0]",Tuvo ingresos y declara monto,"(4000.0, 500000.0]",Tuvo ingresos y declara monto,6000,Tuvo ingresos y declara monto,"(-0.001, 20800.0]","(8700.0, 12000.0]",Asiste,Estatal/publico,Universitario,Otras escuelas especiales,12,PBA excepto GBA,Solo obra social,No corresponde
1,1,2,5,Resto de la Ciudad,18,Mujer,Otro no familiar,Soltero/a,Padre no vive en el hogar,Madre no vive en el hogar,Inactivo,No corresponde,No tuvo ingresos,"(-0.001, 2500.0]",Tuvo ingresos y declara monto,"(4000.0, 500000.0]",Tuvo ingresos y declara monto,12000,Tuvo ingresos y declara monto,"(-0.001, 20800.0]","(8700.0, 12000.0]",Asiste,Estatal/publico,Universitario,Otras escuelas especiales,12,Otra provincia,Solo plan de medicina prepaga por contratación...,No corresponde
2,1,1,2,Resto de la Ciudad,18,Varon,Jefe,Soltero/a,Padre no vive en el hogar,2,Inactivo,No corresponde,No tuvo ingresos,"(-0.001, 2500.0]",No tuvo ingresos,"(-0.001, 4000.0]",No tuvo ingresos,0,Tuvo ingresos pero no declara monto,"(90000.0, 124000.0]","(30000.0, 38300.0]",Asiste,Privado religioso,Universitario,Otras escuelas especiales,12,CABA,Solo plan de medicina prepaga por contratación...,No corresponde
3,1,2,2,Resto de la Ciudad,50,Mujer,Padre/Madre/Suegro/a,Viudo/a,No corresponde,No corresponde,Ocupado,Asalariado,Tuvo ingresos y declara monto,"(56000.0, 1000000.0]",Tuvo ingresos pero no declara monto,"(4000.0, 500000.0]",Tuvo ingresos pero no declara monto,100000,Tuvo ingresos pero no declara monto,"(90000.0, 124000.0]","(30000.0, 38300.0]",No asiste pero asistió,No corresponde,No corresponde,Secundario/medio comun,17,CABA,Solo prepaga o mutual via OS,2
4,1,3,2,Resto de la Ciudad,17,Varon,Otro familiar,Soltero/a,Padre no vive en el hogar,2,Inactivo,No corresponde,No tuvo ingresos,"(-0.001, 2500.0]",No tuvo ingresos,"(-0.001, 4000.0]",No tuvo ingresos,0,Tuvo ingresos pero no declara monto,"(90000.0, 124000.0]","(30000.0, 38300.0]",Asiste,Privado religioso,Secundario/medio comun,EGB (1° a 9° año),10,CABA,Solo plan de medicina prepaga por contratación...,No corresponde
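
Some columns end up with fewer intervals than requested: `ingreso_total_no_lab`, for example, shows only two intervals although `q=4` was passed. The sketch below illustrates why `duplicates='drop'` has that effect, using hypothetical income values with many zeros.

```python
import pandas as pd

# Hypothetical income values with many zeros, similar in spirit to the income columns
income = pd.Series([0, 0, 0, 0, 0, 0, 1000, 2000, 3000, 4000])

# The quartile edges are 0, 0, 0, 1750, 4000: the repeated 0 makes plain qcut fail
try:
    pd.qcut(income, q=4)
    failed = False
except ValueError:
    failed = True

# duplicates='drop' removes the repeated edges, so fewer than 4 bins come back
binned = pd.qcut(income, q=4, duplicates='drop')
n_bins = binned.cat.categories.size

print(failed, n_bins)
```

The resulting bins no longer hold equal shares of the rows, so it is worth checking their frequencies after binning.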


## 3.2) Now using equal distance

In [12]:
data['edad'] = pd.cut(data['edad'],bins=5)

In [13]:
data.head()

Unnamed: 0,nhogar,miembro,comuna,dominio,edad,sexo,parentesco_jefe,situacion_conyugal,num_miembro_padre,num_miembro_madre,estado_ocupacional,cat_ocupacional,calidad_ingresos_lab,ingreso_total_lab,calidad_ingresos_no_lab,ingreso_total_no_lab,calidad_ingresos_totales,ingresos_totales,calidad_ingresos_familiares,ingresos_familiares,ingreso_per_capita_familiar,estado_educativo,sector_educativo,nivel_actual,nivel_max_educativo,años_escolaridad,lugar_nacimiento,afiliacion_salud,cantidad_hijos_nac_vivos
0,1,1,5,Resto de la Ciudad,"(-0.1, 20.0]",Mujer,Jefe,Soltero/a,Padre no vive en el hogar,Madre no vive en el hogar,Inactivo,No corresponde,No tuvo ingresos,"(-0.001, 2500.0]",Tuvo ingresos y declara monto,"(4000.0, 500000.0]",Tuvo ingresos y declara monto,6000,Tuvo ingresos y declara monto,"(-0.001, 20800.0]","(8700.0, 12000.0]",Asiste,Estatal/publico,Universitario,Otras escuelas especiales,12,PBA excepto GBA,Solo obra social,No corresponde
1,1,2,5,Resto de la Ciudad,"(-0.1, 20.0]",Mujer,Otro no familiar,Soltero/a,Padre no vive en el hogar,Madre no vive en el hogar,Inactivo,No corresponde,No tuvo ingresos,"(-0.001, 2500.0]",Tuvo ingresos y declara monto,"(4000.0, 500000.0]",Tuvo ingresos y declara monto,12000,Tuvo ingresos y declara monto,"(-0.001, 20800.0]","(8700.0, 12000.0]",Asiste,Estatal/publico,Universitario,Otras escuelas especiales,12,Otra provincia,Solo plan de medicina prepaga por contratación...,No corresponde
2,1,1,2,Resto de la Ciudad,"(-0.1, 20.0]",Varon,Jefe,Soltero/a,Padre no vive en el hogar,2,Inactivo,No corresponde,No tuvo ingresos,"(-0.001, 2500.0]",No tuvo ingresos,"(-0.001, 4000.0]",No tuvo ingresos,0,Tuvo ingresos pero no declara monto,"(90000.0, 124000.0]","(30000.0, 38300.0]",Asiste,Privado religioso,Universitario,Otras escuelas especiales,12,CABA,Solo plan de medicina prepaga por contratación...,No corresponde
3,1,2,2,Resto de la Ciudad,"(40.0, 60.0]",Mujer,Padre/Madre/Suegro/a,Viudo/a,No corresponde,No corresponde,Ocupado,Asalariado,Tuvo ingresos y declara monto,"(56000.0, 1000000.0]",Tuvo ingresos pero no declara monto,"(4000.0, 500000.0]",Tuvo ingresos pero no declara monto,100000,Tuvo ingresos pero no declara monto,"(90000.0, 124000.0]","(30000.0, 38300.0]",No asiste pero asistió,No corresponde,No corresponde,Secundario/medio comun,17,CABA,Solo prepaga o mutual via OS,2
4,1,3,2,Resto de la Ciudad,"(-0.1, 20.0]",Varon,Otro familiar,Soltero/a,Padre no vive en el hogar,2,Inactivo,No corresponde,No tuvo ingresos,"(-0.001, 2500.0]",No tuvo ingresos,"(-0.001, 4000.0]",No tuvo ingresos,0,Tuvo ingresos pero no declara monto,"(90000.0, 124000.0]","(30000.0, 38300.0]",Asiste,Privado religioso,Secundario/medio comun,EGB (1° a 9° año),10,CABA,Solo plan de medicina prepaga por contratación...,No corresponde
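
The difference between `pd.cut` (equal distance) and `pd.qcut` (equal frequency) can be sketched on a handful of hypothetical ages spanning 0 to 100, the same range that `edad` has in the dataset.

```python
import pandas as pd

# Hypothetical ages spanning 0..100
ages = pd.Series([0, 17, 18, 50, 100])

# Equal distance: 5 intervals of the same width over the observed range
by_width = pd.cut(ages, bins=5)

# Equal frequency, for comparison: intervals holding roughly the same number of rows
by_freq = pd.qcut(ages, q=5)

print(by_width.value_counts(sort=False))
print(by_freq.value_counts(sort=False))
```

With `pd.cut` the interval edges depend only on the minimum and maximum, so some intervals may be almost empty; with `pd.qcut` the edges follow the data, so the widths vary.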


# 4) Data preparation 🔧

## 4.1) Changing data types

In [14]:
data['comuna'] = data['comuna'].astype(str)
data['nhogar'] = data['nhogar'].astype(str)

In [15]:
status(data)

Unnamed: 0,variable,q_nan,p_nan,q_zeros,p_zeros,unique,type
0,nhogar,0,0.0,0,0.0,7,object
1,miembro,0,0.0,0,0.0,19,int64
2,comuna,0,0.0,0,0.0,15,object
3,dominio,0,0.0,0,0.0,2,object
4,edad,0,0.0,0,0.0,5,category
5,sexo,0,0.0,0,0.0,2,object
6,parentesco_jefe,0,0.0,0,0.0,9,object
7,situacion_conyugal,1,7e-05,0,0.0,7,object
8,num_miembro_padre,0,0.0,0,0.0,9,object
9,num_miembro_madre,0,0.0,0,0.0,11,object


## 4.2) Filling null values

In [16]:
data['años_escolaridad'].unique()

array(['12', '17', '10', '8', 'Ningun año de escolaridad aprobado', '11',
       '9', '13', '7', '16', '14', '15', '5', '6', '2', '19', '4', '1',
       '3', '18', nan], dtype=object)

In [17]:
# Replace a single value
data['años_escolaridad'] = data['años_escolaridad'].replace('Ningun año de escolaridad aprobado', '0')

In [18]:
data['años_escolaridad'] = data['años_escolaridad'].astype(float).astype("Int32")
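
The two-step conversion above can be illustrated with a small sketch. The values are hypothetical; the point is that the column holds numeric strings plus a missing value, so it goes through `float` (which can hold NaN) before reaching the nullable `Int32` dtype.

```python
import pandas as pd

# Hypothetical object column mixing numeric strings and a missing value
years = pd.Series(['12', '0', float('nan'), '7'], dtype=object)

# float can represent NaN, so the strings are parsed as floats first...
as_float = years.astype(float)

# ...and the nullable "Int32" dtype then stores integers while keeping <NA>
as_int = as_float.astype("Int32")

print(as_int)
```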

In [19]:
data['años_escolaridad'], saved_bins = pd.qcut(data['años_escolaridad'], q=5, retbins=True)

In [20]:
data['años_escolaridad']

0         (11.0, 12.0]
1         (11.0, 12.0]
2         (11.0, 12.0]
3         (16.0, 19.0]
4          (7.0, 11.0]
             ...      
14314    (-0.001, 7.0]
14315      (7.0, 11.0]
14316     (11.0, 12.0]
14317    (-0.001, 7.0]
14318     (12.0, 16.0]
Name: años_escolaridad, Length: 14319, dtype: category
Categories (5, interval[float64, right]): [(-0.001, 7.0] < (7.0, 11.0] < (11.0, 12.0] < (12.0, 16.0] < (16.0, 19.0]]
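
The edges returned by `retbins=True` (stored above in `saved_bins`) are typically kept so that new records can be binned with exactly the same intervals. A minimal sketch of that idea, assuming hypothetical schooling values and using `pd.cut` with the stored edges:

```python
import pandas as pd

# Hypothetical training values for years of schooling
train_years = pd.Series([0, 5, 7, 9, 11, 12, 12, 14, 16, 19])

# retbins=True also returns the numeric edges computed from the training data
train_binned, edges = pd.qcut(train_years, q=5, retbins=True)

# New records are binned with pd.cut and the stored edges, not with a fresh qcut
new_years = pd.Series([3, 12, 18])
new_binned = pd.cut(new_years, bins=edges, include_lowest=True)

print(edges)
print(new_binned)
```

Values outside the stored range would come back as NaN, so they need their own treatment.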

In [21]:
status(data)

Unnamed: 0,variable,q_nan,p_nan,q_zeros,p_zeros,unique,type
0,nhogar,0,0.0,0,0.0,7,object
1,miembro,0,0.0,0,0.0,19,int64
2,comuna,0,0.0,0,0.0,15,object
3,dominio,0,0.0,0,0.0,2,object
4,edad,0,0.0,0,0.0,5,category
5,sexo,0,0.0,0,0.0,2,object
6,parentesco_jefe,0,0.0,0,0.0,9,object
7,situacion_conyugal,1,7e-05,0,0.0,7,object
8,num_miembro_padre,0,0.0,0,0.0,9,object
9,num_miembro_madre,0,0.0,0,0.0,11,object


## Dropping NaN values

In [22]:
data = data.dropna(subset=['situacion_conyugal', 'sector_educativo', 'lugar_nacimiento', 'afiliacion_salud'])
 
# Resetting the indices using df.reset_index()
data = data.reset_index(drop=True)

In [23]:
status(data)

Unnamed: 0,variable,q_nan,p_nan,q_zeros,p_zeros,unique,type
0,nhogar,0,0.0,0,0.0,7,object
1,miembro,0,0.0,0,0.0,19,int64
2,comuna,0,0.0,0,0.0,15,object
3,dominio,0,0.0,0,0.0,2,object
4,edad,0,0.0,0,0.0,5,category
5,sexo,0,0.0,0,0.0,2,object
6,parentesco_jefe,0,0.0,0,0.0,9,object
7,situacion_conyugal,0,0.0,0,0.0,7,object
8,num_miembro_padre,0,0.0,0,0.0,9,object
9,num_miembro_madre,0,0.0,0,0.0,11,object


## Adding the missing category to años_escolaridad

In [24]:
data['años_escolaridad']=data['años_escolaridad'].cat.add_categories("desconocido")

In [25]:
data['años_escolaridad']=data['años_escolaridad'].fillna(value="desconocido")
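
The category is added before filling because a categorical column only accepts values that belong to its declared categories. A small sketch with hypothetical values:

```python
import pandas as pd

# Hypothetical binned column with one missing value
binned = pd.cut(pd.Series([1, 8, None, 15]), bins=[0, 7, 12, 19])

# Filling with a label that is not a declared category is rejected by pandas
try:
    binned.fillna("desconocido")
    rejected = False
except (TypeError, ValueError):
    rejected = True

# Declaring the category first makes the fill valid
filled = binned.cat.add_categories("desconocido").fillna("desconocido")

print(rejected)
print(filled)
```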

In [26]:
status(data)

Unnamed: 0,variable,q_nan,p_nan,q_zeros,p_zeros,unique,type
0,nhogar,0,0.0,0,0.0,7,object
1,miembro,0,0.0,0,0.0,19,int64
2,comuna,0,0.0,0,0.0,15,object
3,dominio,0,0.0,0,0.0,2,object
4,edad,0,0.0,0,0.0,5,category
5,sexo,0,0.0,0,0.0,2,object
6,parentesco_jefe,0,0.0,0,0.0,9,object
7,situacion_conyugal,0,0.0,0,0.0,7,object
8,num_miembro_padre,0,0.0,0,0.0,9,object
9,num_miembro_madre,0,0.0,0,0.0,11,object


## 4.3) Custom handling of missing data

In [27]:
data.head(5)

Unnamed: 0,nhogar,miembro,comuna,dominio,edad,sexo,parentesco_jefe,situacion_conyugal,num_miembro_padre,num_miembro_madre,estado_ocupacional,cat_ocupacional,calidad_ingresos_lab,ingreso_total_lab,calidad_ingresos_no_lab,ingreso_total_no_lab,calidad_ingresos_totales,ingresos_totales,calidad_ingresos_familiares,ingresos_familiares,ingreso_per_capita_familiar,estado_educativo,sector_educativo,nivel_actual,nivel_max_educativo,años_escolaridad,lugar_nacimiento,afiliacion_salud,cantidad_hijos_nac_vivos
0,1,1,5,Resto de la Ciudad,"(-0.1, 20.0]",Mujer,Jefe,Soltero/a,Padre no vive en el hogar,Madre no vive en el hogar,Inactivo,No corresponde,No tuvo ingresos,"(-0.001, 2500.0]",Tuvo ingresos y declara monto,"(4000.0, 500000.0]",Tuvo ingresos y declara monto,6000,Tuvo ingresos y declara monto,"(-0.001, 20800.0]","(8700.0, 12000.0]",Asiste,Estatal/publico,Universitario,Otras escuelas especiales,"(11.0, 12.0]",PBA excepto GBA,Solo obra social,No corresponde
1,1,2,5,Resto de la Ciudad,"(-0.1, 20.0]",Mujer,Otro no familiar,Soltero/a,Padre no vive en el hogar,Madre no vive en el hogar,Inactivo,No corresponde,No tuvo ingresos,"(-0.001, 2500.0]",Tuvo ingresos y declara monto,"(4000.0, 500000.0]",Tuvo ingresos y declara monto,12000,Tuvo ingresos y declara monto,"(-0.001, 20800.0]","(8700.0, 12000.0]",Asiste,Estatal/publico,Universitario,Otras escuelas especiales,"(11.0, 12.0]",Otra provincia,Solo plan de medicina prepaga por contratación...,No corresponde
2,1,1,2,Resto de la Ciudad,"(-0.1, 20.0]",Varon,Jefe,Soltero/a,Padre no vive en el hogar,2,Inactivo,No corresponde,No tuvo ingresos,"(-0.001, 2500.0]",No tuvo ingresos,"(-0.001, 4000.0]",No tuvo ingresos,0,Tuvo ingresos pero no declara monto,"(90000.0, 124000.0]","(30000.0, 38300.0]",Asiste,Privado religioso,Universitario,Otras escuelas especiales,"(11.0, 12.0]",CABA,Solo plan de medicina prepaga por contratación...,No corresponde
3,1,2,2,Resto de la Ciudad,"(40.0, 60.0]",Mujer,Padre/Madre/Suegro/a,Viudo/a,No corresponde,No corresponde,Ocupado,Asalariado,Tuvo ingresos y declara monto,"(56000.0, 1000000.0]",Tuvo ingresos pero no declara monto,"(4000.0, 500000.0]",Tuvo ingresos pero no declara monto,100000,Tuvo ingresos pero no declara monto,"(90000.0, 124000.0]","(30000.0, 38300.0]",No asiste pero asistió,No corresponde,No corresponde,Secundario/medio comun,"(16.0, 19.0]",CABA,Solo prepaga o mutual via OS,2
4,1,3,2,Resto de la Ciudad,"(-0.1, 20.0]",Varon,Otro familiar,Soltero/a,Padre no vive en el hogar,2,Inactivo,No corresponde,No tuvo ingresos,"(-0.001, 2500.0]",No tuvo ingresos,"(-0.001, 4000.0]",No tuvo ingresos,0,Tuvo ingresos pero no declara monto,"(90000.0, 124000.0]","(30000.0, 38300.0]",Asiste,Privado religioso,Secundario/medio comun,EGB (1° a 9° año),"(7.0, 11.0]",CABA,Solo plan de medicina prepaga por contratación...,No corresponde


In [28]:
status(data)

Unnamed: 0,variable,q_nan,p_nan,q_zeros,p_zeros,unique,type
0,nhogar,0,0.0,0,0.0,7,object
1,miembro,0,0.0,0,0.0,19,int64
2,comuna,0,0.0,0,0.0,15,object
3,dominio,0,0.0,0,0.0,2,object
4,edad,0,0.0,0,0.0,5,category
5,sexo,0,0.0,0,0.0,2,object
6,parentesco_jefe,0,0.0,0,0.0,9,object
7,situacion_conyugal,0,0.0,0,0.0,7,object
8,num_miembro_padre,0,0.0,0,0.0,9,object
9,num_miembro_madre,0,0.0,0,0.0,11,object


In [29]:
data['nivel_max_educativo']=data['nivel_max_educativo'].fillna(value="desconocido")

In [44]:
status(data)

Unnamed: 0,variable,q_nan,p_nan,q_zeros,p_zeros,unique,type
0,nhogar,0,0.0,0,0.0,7,object
1,miembro,0,0.0,0,0.0,19,int64
2,comuna,0,0.0,0,0.0,15,object
3,dominio,0,0.0,0,0.0,2,object
4,edad,0,0.0,0,0.0,5,category
5,sexo,0,0.0,0,0.0,2,object
6,parentesco_jefe,0,0.0,0,0.0,9,object
7,situacion_conyugal,0,0.0,0,0.0,7,object
8,num_miembro_padre,0,0.0,0,0.0,9,object
9,num_miembro_madre,0,0.0,0,0.0,11,object


### Saving the answer

In [50]:
respuesta1 = status(data)
respuesta1.to_csv('data/tarea_respuesta1.csv', index=False)

# 5) One hot encoding ✂️

In [31]:
data_ohe = pd.get_dummies(data) 

In [32]:
data_ohe

Unnamed: 0,miembro,ingresos_totales,nhogar_1,nhogar_2,nhogar_3,nhogar_4,nhogar_5,nhogar_6,nhogar_7,comuna_1,comuna_10,comuna_11,comuna_12,comuna_13,comuna_14,comuna_15,comuna_2,comuna_3,comuna_4,comuna_5,comuna_6,comuna_7,comuna_8,comuna_9,dominio_Resto de la Ciudad,dominio_Villas de emergencia,"edad_(-0.1, 20.0]","edad_(20.0, 40.0]","edad_(40.0, 60.0]","edad_(60.0, 80.0]","edad_(80.0, 100.0]",sexo_Mujer,sexo_Varon,parentesco_jefe_Conyugue o pareja,parentesco_jefe_Hijo/a - Hijastro/a,parentesco_jefe_Jefe,parentesco_jefe_Nieto/a,parentesco_jefe_Otro familiar,parentesco_jefe_Otro no familiar,parentesco_jefe_Padre/Madre/Suegro/a,parentesco_jefe_Servicio domestico y sus familiares,parentesco_jefe_Yerno/nuera,situacion_conyugal_Casado/a,situacion_conyugal_Divorciado/a,situacion_conyugal_No corresponde,situacion_conyugal_Separado/a de unión o matrimonio,situacion_conyugal_Soltero/a,situacion_conyugal_Unido/a,situacion_conyugal_Viudo/a,num_miembro_padre_1,num_miembro_padre_2,num_miembro_padre_3,num_miembro_padre_4,num_miembro_padre_5,num_miembro_padre_6,num_miembro_padre_7,num_miembro_padre_No corresponde,num_miembro_padre_Padre no vive en el hogar,num_miembro_madre_1,num_miembro_madre_15,num_miembro_madre_2,num_miembro_madre_3,num_miembro_madre_4,num_miembro_madre_5,num_miembro_madre_6,num_miembro_madre_7,num_miembro_madre_9,num_miembro_madre_Madre no vive en el hogar,num_miembro_madre_No corresponde,estado_ocupacional_Desocupado,estado_ocupacional_Inactivo,estado_ocupacional_Ocupado,cat_ocupacional_Asalariado,cat_ocupacional_No corresponde,cat_ocupacional_Patron/empleador,cat_ocupacional_Trabajador familiar,cat_ocupacional_Trabajador por cuenta propia,calidad_ingresos_lab_No corresponde,calidad_ingresos_lab_No tuvo ingresos,calidad_ingresos_lab_Tuvo ingresos pero no declara monto,calidad_ingresos_lab_Tuvo ingresos y declara monto,"ingreso_total_lab_(-0.001, 2500.0]","ingreso_total_lab_(2500.0, 15000.0]","ingreso_total_lab_(15000.0, 25000.0]","ingreso_total_lab_(25000.0, 37000.0]","ingreso_total_lab_(37000.0, 56000.0]","ingreso_total_lab_(56000.0, 1000000.0]",calidad_ingresos_no_lab_No corresponde,calidad_ingresos_no_lab_No tuvo ingresos,calidad_ingresos_no_lab_Tuvo ingresos pero no declara monto,calidad_ingresos_no_lab_Tuvo ingresos y declara monto,"ingreso_total_no_lab_(-0.001, 4000.0]","ingreso_total_no_lab_(4000.0, 500000.0]",calidad_ingresos_totales_No corresponde,calidad_ingresos_totales_No tuvo ingresos,calidad_ingresos_totales_Tuvo ingresos pero no declara monto,calidad_ingresos_totales_Tuvo ingresos y declara monto,calidad_ingresos_familiares_No tuvo ingresos,calidad_ingresos_familiares_Tuvo ingresos pero no declara monto,calidad_ingresos_familiares_Tuvo ingresos y declara monto,"ingresos_familiares_(-0.001, 20800.0]","ingresos_familiares_(20800.0, 30000.0]","ingresos_familiares_(30000.0, 42000.0]","ingresos_familiares_(42000.0, 54000.0]","ingresos_familiares_(54000.0, 70000.0]","ingresos_familiares_(70000.0, 90000.0]","ingresos_familiares_(90000.0, 124000.0]","ingresos_familiares_(124000.0, 1000000.0]","ingreso_per_capita_familiar_(-0.001, 5400.0]","ingreso_per_capita_familiar_(5400.0, 8700.0]","ingreso_per_capita_familiar_(8700.0, 12000.0]","ingreso_per_capita_familiar_(12000.0, 15016.0]","ingreso_per_capita_familiar_(15016.0, 19900.0]","ingreso_per_capita_familiar_(19900.0, 24000.0]","ingreso_per_capita_familiar_(24000.0, 30000.0]","ingreso_per_capita_familiar_(30000.0, 38300.0]","ingreso_per_capita_familiar_(38300.0, 52340.0]","ingreso_per_capita_familiar_(52340.0, 1000000.0]",estado_educativo_Asiste,estado_educativo_No asiste pero asistió,estado_educativo_Nunca asistio,sector_educativo_Estatal/publico,sector_educativo_No corresponde,sector_educativo_Privado no religioso,sector_educativo_Privado religioso,nivel_actual_Jardin maternal,nivel_actual_No corresponde,nivel_actual_Otras escuelas especiales,nivel_actual_Postgrado,nivel_actual_Primario adultos,nivel_actual_Primario comun,nivel_actual_Primario especial,nivel_actual_Sala de 3,nivel_actual_Sala de 4,nivel_actual_Sala de 5,nivel_actual_Secundario/medio adultos,nivel_actual_Secundario/medio comun,nivel_actual_Terciario/superior no universitario,nivel_actual_Universitario,nivel_max_educativo_EGB (1° a 9° año),nivel_max_educativo_No corresponde,nivel_max_educativo_Otras escuelas especiales,nivel_max_educativo_Primario comun,nivel_max_educativo_Primario especial,nivel_max_educativo_Sala de 5,nivel_max_educativo_Secundario/medio comun,nivel_max_educativo_desconocido,"años_escolaridad_(-0.001, 7.0]","años_escolaridad_(7.0, 11.0]","años_escolaridad_(11.0, 12.0]","años_escolaridad_(12.0, 16.0]","años_escolaridad_(16.0, 19.0]",años_escolaridad_desconocido,lugar_nacimiento_CABA,lugar_nacimiento_Otra provincia,lugar_nacimiento_PBA excepto GBA,lugar_nacimiento_PBA sin especificar,lugar_nacimiento_Pais limitrofe,lugar_nacimiento_Pais no limitrofe,lugar_nacimiento_Partido GBA,afiliacion_salud_Otros,afiliacion_salud_Solo obra social,afiliacion_salud_Solo plan de medicina prepaga por contratación voluntaria,afiliacion_salud_Solo prepaga o mutual via OS,afiliacion_salud_Solo sistema publico,cantidad_hijos_nac_vivos_1,cantidad_hijos_nac_vivos_10,cantidad_hijos_nac_vivos_11,cantidad_hijos_nac_vivos_12,cantidad_hijos_nac_vivos_15,cantidad_hijos_nac_vivos_2,cantidad_hijos_nac_vivos_3,cantidad_hijos_nac_vivos_4,cantidad_hijos_nac_vivos_5,cantidad_hijos_nac_vivos_6,cantidad_hijos_nac_vivos_7,cantidad_hijos_nac_vivos_8,cantidad_hijos_nac_vivos_9,cantidad_hijos_nac_vivos_No corresponde
0,1,6000,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
1,2,12000,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
2,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
3,2,100000,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,1,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
4,3,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
14305,1,24000,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,1,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
14306,2,11000,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1
14307,3,11000,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,1,0,0,1,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0
14308,4,11000,1,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,1,1,0,1,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,1,0,1,0,0,1,0,0,0,0,1,0,0,1,0,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,1,0,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,0,0,0,0,1,0,0,0,0,0,1,0,0,0,0,0,0,0,1,0,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0


In [33]:
data_ohe.columns

Index(['miembro', 'ingresos_totales', 'nhogar_1', 'nhogar_2', 'nhogar_3',
       'nhogar_4', 'nhogar_5', 'nhogar_6', 'nhogar_7', 'comuna_1',
       ...
       'cantidad_hijos_nac_vivos_15', 'cantidad_hijos_nac_vivos_2',
       'cantidad_hijos_nac_vivos_3', 'cantidad_hijos_nac_vivos_4',
       'cantidad_hijos_nac_vivos_5', 'cantidad_hijos_nac_vivos_6',
       'cantidad_hijos_nac_vivos_7', 'cantidad_hijos_nac_vivos_8',
       'cantidad_hijos_nac_vivos_9',
       'cantidad_hijos_nac_vivos_No corresponde'],
      dtype='object', length=179)

In [34]:
import pickle

with open('categories_ohe.pickle', 'wb') as handle:
    pickle.dump(data_ohe.columns, handle, protocol=pickle.HIGHEST_PROTOCOL)

## Load this new data and adapt it to the training columns

In [35]:
new_data = pd.read_csv("new_data.csv", sep=',')

In [36]:
with open('categories_ohe.pickle', 'rb') as handle:
    ohe_tr = pickle.load(handle)

In [37]:
ohe_tr

Index(['miembro', 'ingresos_totales', 'nhogar_1', 'nhogar_2', 'nhogar_3',
       'nhogar_4', 'nhogar_5', 'nhogar_6', 'nhogar_7', 'comuna_1',
       ...
       'cantidad_hijos_nac_vivos_15', 'cantidad_hijos_nac_vivos_2',
       'cantidad_hijos_nac_vivos_3', 'cantidad_hijos_nac_vivos_4',
       'cantidad_hijos_nac_vivos_5', 'cantidad_hijos_nac_vivos_6',
       'cantidad_hijos_nac_vivos_7', 'cantidad_hijos_nac_vivos_8',
       'cantidad_hijos_nac_vivos_9',
       'cantidad_hijos_nac_vivos_No corresponde'],
      dtype='object', length=179)
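
The save/load round trip for the column names can be checked in isolation. The following is a minimal sketch, assuming a temporary directory and a made-up three-column `Index` (both are illustrative and not part of the notebook):

```python
import os
import pickle
import tempfile

import pandas as pd

# Hypothetical column names standing in for data_ohe.columns
cols = pd.Index(["miembro", "sexo_Mujer", "sexo_Varon"])

# Work inside a temporary directory that is removed when the block exits
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "categories_ohe.pickle")

    with open(path, "wb") as handle:
        pickle.dump(cols, handle, protocol=pickle.HIGHEST_PROTOCOL)

    with open(path, "rb") as handle:
        restored = pickle.load(handle)

# The restored Index should carry the same labels in the same order
print(restored.equals(cols))
```

Pickle files should only be loaded when they come from a trusted source, since unpickling can execute arbitrary code.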

In [38]:
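# One-hot encode the new data and align it to the training columns; dummy columns absent from new_data come back as NaN
pd.get_dummies(new_data).reindex(columns=ohe_tr)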

Unnamed: 0,miembro,ingresos_totales,nhogar_1,nhogar_2,nhogar_3,nhogar_4,nhogar_5,nhogar_6,nhogar_7,comuna_1,comuna_10,comuna_11,comuna_12,comuna_13,comuna_14,comuna_15,comuna_2,comuna_3,comuna_4,comuna_5,comuna_6,comuna_7,comuna_8,comuna_9,dominio_Resto de la Ciudad,dominio_Villas de emergencia,"edad_(-0.1, 20.0]","edad_(20.0, 40.0]","edad_(40.0, 60.0]","edad_(60.0, 80.0]","edad_(80.0, 100.0]",sexo_Mujer,sexo_Varon,parentesco_jefe_Conyugue o pareja,parentesco_jefe_Hijo/a - Hijastro/a,parentesco_jefe_Jefe,parentesco_jefe_Nieto/a,parentesco_jefe_Otro familiar,parentesco_jefe_Otro no familiar,parentesco_jefe_Padre/Madre/Suegro/a,parentesco_jefe_Servicio domestico y sus familiares,parentesco_jefe_Yerno/nuera,situacion_conyugal_Casado/a,situacion_conyugal_Divorciado/a,situacion_conyugal_No corresponde,situacion_conyugal_Separado/a de unión o matrimonio,situacion_conyugal_Soltero/a,situacion_conyugal_Unido/a,situacion_conyugal_Viudo/a,num_miembro_padre_1,num_miembro_padre_2,num_miembro_padre_3,num_miembro_padre_4,num_miembro_padre_5,num_miembro_padre_6,num_miembro_padre_7,num_miembro_padre_No corresponde,num_miembro_padre_Padre no vive en el hogar,num_miembro_madre_1,num_miembro_madre_15,num_miembro_madre_2,num_miembro_madre_3,num_miembro_madre_4,num_miembro_madre_5,num_miembro_madre_6,num_miembro_madre_7,num_miembro_madre_9,num_miembro_madre_Madre no vive en el hogar,num_miembro_madre_No corresponde,estado_ocupacional_Desocupado,estado_ocupacional_Inactivo,estado_ocupacional_Ocupado,cat_ocupacional_Asalariado,cat_ocupacional_No corresponde,cat_ocupacional_Patron/empleador,cat_ocupacional_Trabajador familiar,cat_ocupacional_Trabajador por cuenta propia,calidad_ingresos_lab_No corresponde,calidad_ingresos_lab_No tuvo ingresos,calidad_ingresos_lab_Tuvo ingresos pero no declara monto,calidad_ingresos_lab_Tuvo ingresos y declara monto,"ingreso_total_lab_(-0.001, 2500.0]","ingreso_total_lab_(2500.0, 15000.0]","ingreso_total_lab_(15000.0, 25000.0]","ingreso_total_lab_(25000.0, 
37000.0]","ingreso_total_lab_(37000.0, 56000.0]","ingreso_total_lab_(56000.0, 1000000.0]",calidad_ingresos_no_lab_No corresponde,calidad_ingresos_no_lab_No tuvo ingresos,calidad_ingresos_no_lab_Tuvo ingresos pero no declara monto,calidad_ingresos_no_lab_Tuvo ingresos y declara monto,"ingreso_total_no_lab_(-0.001, 4000.0]","ingreso_total_no_lab_(4000.0, 500000.0]",calidad_ingresos_totales_No corresponde,calidad_ingresos_totales_No tuvo ingresos,calidad_ingresos_totales_Tuvo ingresos pero no declara monto,calidad_ingresos_totales_Tuvo ingresos y declara monto,calidad_ingresos_familiares_No tuvo ingresos,calidad_ingresos_familiares_Tuvo ingresos pero no declara monto,calidad_ingresos_familiares_Tuvo ingresos y declara monto,"ingresos_familiares_(-0.001, 20800.0]","ingresos_familiares_(20800.0, 30000.0]","ingresos_familiares_(30000.0, 42000.0]","ingresos_familiares_(42000.0, 54000.0]","ingresos_familiares_(54000.0, 70000.0]","ingresos_familiares_(70000.0, 90000.0]","ingresos_familiares_(90000.0, 124000.0]","ingresos_familiares_(124000.0, 1000000.0]","ingreso_per_capita_familiar_(-0.001, 5400.0]","ingreso_per_capita_familiar_(5400.0, 8700.0]","ingreso_per_capita_familiar_(8700.0, 12000.0]","ingreso_per_capita_familiar_(12000.0, 15016.0]","ingreso_per_capita_familiar_(15016.0, 19900.0]","ingreso_per_capita_familiar_(19900.0, 24000.0]","ingreso_per_capita_familiar_(24000.0, 30000.0]","ingreso_per_capita_familiar_(30000.0, 38300.0]","ingreso_per_capita_familiar_(38300.0, 52340.0]","ingreso_per_capita_familiar_(52340.0, 1000000.0]",estado_educativo_Asiste,estado_educativo_No asiste pero asistió,estado_educativo_Nunca asistio,sector_educativo_Estatal/publico,sector_educativo_No corresponde,sector_educativo_Privado no religioso,sector_educativo_Privado religioso,nivel_actual_Jardin maternal,nivel_actual_No corresponde,nivel_actual_Otras escuelas especiales,nivel_actual_Postgrado,nivel_actual_Primario adultos,nivel_actual_Primario comun,nivel_actual_Primario 
especial,nivel_actual_Sala de 3,nivel_actual_Sala de 4,nivel_actual_Sala de 5,nivel_actual_Secundario/medio adultos,nivel_actual_Secundario/medio comun,nivel_actual_Terciario/superior no universitario,nivel_actual_Universitario,nivel_max_educativo_EGB (1° a 9° año),nivel_max_educativo_No corresponde,nivel_max_educativo_Otras escuelas especiales,nivel_max_educativo_Primario comun,nivel_max_educativo_Primario especial,nivel_max_educativo_Sala de 5,nivel_max_educativo_Secundario/medio comun,nivel_max_educativo_desconocido,"años_escolaridad_(-0.001, 7.0]","años_escolaridad_(7.0, 11.0]","años_escolaridad_(11.0, 12.0]","años_escolaridad_(12.0, 16.0]","años_escolaridad_(16.0, 19.0]",años_escolaridad_desconocido,lugar_nacimiento_CABA,lugar_nacimiento_Otra provincia,lugar_nacimiento_PBA excepto GBA,lugar_nacimiento_PBA sin especificar,lugar_nacimiento_Pais limitrofe,lugar_nacimiento_Pais no limitrofe,lugar_nacimiento_Partido GBA,afiliacion_salud_Otros,afiliacion_salud_Solo obra social,afiliacion_salud_Solo plan de medicina prepaga por contratación voluntaria,afiliacion_salud_Solo prepaga o mutual via OS,afiliacion_salud_Solo sistema publico,cantidad_hijos_nac_vivos_1,cantidad_hijos_nac_vivos_10,cantidad_hijos_nac_vivos_11,cantidad_hijos_nac_vivos_12,cantidad_hijos_nac_vivos_15,cantidad_hijos_nac_vivos_2,cantidad_hijos_nac_vivos_3,cantidad_hijos_nac_vivos_4,cantidad_hijos_nac_vivos_5,cantidad_hijos_nac_vivos_6,cantidad_hijos_nac_vivos_7,cantidad_hijos_nac_vivos_8,cantidad_hijos_nac_vivos_9,cantidad_hijos_nac_vivos_No corresponde
0,1,4000,,,,,,,,,,,,,,,,,,,,,,,1,,,,,,,0,1,,,1,,0,,,,,,,,,1,,,,,,,,,,,1,,,,,,,,,,1,,1,0,0,0,1,,,0,,1,0,0,,,,,,,,0,,1,,,,,0,1,,0,1,,,,,,,,,,,,,,,,,,,1,,,0,,1,0,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,0,1,0,,,,,0,0,1,,,,,,,,,,,,,,,,1
1,1,22000,,,,,,,,,,,,,,,,,,,,,,,1,,,,,,,0,1,,,1,,0,,,,,,,,,1,,,,,,,,,,,1,,,,,,,,,,1,,0,1,0,0,1,,,0,,1,0,0,,,,,,,,0,,1,,,,,0,1,,0,1,,,,,,,,,,,,,,,,,,,1,,,1,,0,0,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,0,0,1,,,,,0,1,0,,,,,,,,,,,,,,,,1
2,1,25000,,,,,,,,,,,,,,,,,,,,,,,1,,,,,,,0,1,,,1,,0,,,,,,,,,1,,,,,,,,,,,1,,,,,,,,,,1,,0,0,1,1,0,,,0,,0,0,1,,,,,,,,0,,1,,,,,0,1,,1,0,,,,,,,,,,,,,,,,,,,1,,,0,,1,0,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,1,0,0,,,,,1,0,0,,,,,,,,,,,,,,,,1
3,2,30000,,,,,,,,,,,,,,,,,,,,,,,1,,,,,,,1,0,,,0,,1,,,,,,,,,1,,,,,,,,,,,1,,,,,,,,,,1,,0,0,1,0,0,,,1,,0,1,0,,,,,,,,1,,0,,,,,1,0,,1,0,,,,,,,,,,,,,,,,,,,1,,,0,,0,1,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,1,0,0,,,,,0,0,1,,,,,,,,,,,,,,,,1
4,1,20000,,,,,,,,,,,,,,,,,,,,,,,1,,,,,,,0,1,,,1,,0,,,,,,,,,1,,,,,,,,,,,1,,,,,,,,,,1,,0,1,0,0,1,,,0,,1,0,0,,,,,,,,0,,1,,,,,0,1,,0,1,,,,,,,,,,,,,,,,,,,1,,,0,,1,0,,,,,,,,,,,,,,1,,,1,,,,,,,,,,,,0,1,0,,,,,0,0,1,,,,,,,,,,,,,,,,1


In [39]:
# Same alignment, replacing the NaN in the missing dummy columns with 0
d_tr_ohe_2 = pd.get_dummies(new_data).reindex(columns=ohe_tr).fillna(0)

In [40]:
d_tr_ohe_2

Unnamed: 0,miembro,ingresos_totales,nhogar_1,nhogar_2,nhogar_3,nhogar_4,nhogar_5,nhogar_6,nhogar_7,comuna_1,comuna_10,comuna_11,comuna_12,comuna_13,comuna_14,comuna_15,comuna_2,comuna_3,comuna_4,comuna_5,comuna_6,comuna_7,comuna_8,comuna_9,dominio_Resto de la Ciudad,dominio_Villas de emergencia,"edad_(-0.1, 20.0]","edad_(20.0, 40.0]","edad_(40.0, 60.0]","edad_(60.0, 80.0]","edad_(80.0, 100.0]",sexo_Mujer,sexo_Varon,parentesco_jefe_Conyugue o pareja,parentesco_jefe_Hijo/a - Hijastro/a,parentesco_jefe_Jefe,parentesco_jefe_Nieto/a,parentesco_jefe_Otro familiar,parentesco_jefe_Otro no familiar,parentesco_jefe_Padre/Madre/Suegro/a,parentesco_jefe_Servicio domestico y sus familiares,parentesco_jefe_Yerno/nuera,situacion_conyugal_Casado/a,situacion_conyugal_Divorciado/a,situacion_conyugal_No corresponde,situacion_conyugal_Separado/a de unión o matrimonio,situacion_conyugal_Soltero/a,situacion_conyugal_Unido/a,situacion_conyugal_Viudo/a,num_miembro_padre_1,num_miembro_padre_2,num_miembro_padre_3,num_miembro_padre_4,num_miembro_padre_5,num_miembro_padre_6,num_miembro_padre_7,num_miembro_padre_No corresponde,num_miembro_padre_Padre no vive en el hogar,num_miembro_madre_1,num_miembro_madre_15,num_miembro_madre_2,num_miembro_madre_3,num_miembro_madre_4,num_miembro_madre_5,num_miembro_madre_6,num_miembro_madre_7,num_miembro_madre_9,num_miembro_madre_Madre no vive en el hogar,num_miembro_madre_No corresponde,estado_ocupacional_Desocupado,estado_ocupacional_Inactivo,estado_ocupacional_Ocupado,cat_ocupacional_Asalariado,cat_ocupacional_No corresponde,cat_ocupacional_Patron/empleador,cat_ocupacional_Trabajador familiar,cat_ocupacional_Trabajador por cuenta propia,calidad_ingresos_lab_No corresponde,calidad_ingresos_lab_No tuvo ingresos,calidad_ingresos_lab_Tuvo ingresos pero no declara monto,calidad_ingresos_lab_Tuvo ingresos y declara monto,"ingreso_total_lab_(-0.001, 2500.0]","ingreso_total_lab_(2500.0, 15000.0]","ingreso_total_lab_(15000.0, 25000.0]","ingreso_total_lab_(25000.0, 
37000.0]","ingreso_total_lab_(37000.0, 56000.0]","ingreso_total_lab_(56000.0, 1000000.0]",calidad_ingresos_no_lab_No corresponde,calidad_ingresos_no_lab_No tuvo ingresos,calidad_ingresos_no_lab_Tuvo ingresos pero no declara monto,calidad_ingresos_no_lab_Tuvo ingresos y declara monto,"ingreso_total_no_lab_(-0.001, 4000.0]","ingreso_total_no_lab_(4000.0, 500000.0]",calidad_ingresos_totales_No corresponde,calidad_ingresos_totales_No tuvo ingresos,calidad_ingresos_totales_Tuvo ingresos pero no declara monto,calidad_ingresos_totales_Tuvo ingresos y declara monto,calidad_ingresos_familiares_No tuvo ingresos,calidad_ingresos_familiares_Tuvo ingresos pero no declara monto,calidad_ingresos_familiares_Tuvo ingresos y declara monto,"ingresos_familiares_(-0.001, 20800.0]","ingresos_familiares_(20800.0, 30000.0]","ingresos_familiares_(30000.0, 42000.0]","ingresos_familiares_(42000.0, 54000.0]","ingresos_familiares_(54000.0, 70000.0]","ingresos_familiares_(70000.0, 90000.0]","ingresos_familiares_(90000.0, 124000.0]","ingresos_familiares_(124000.0, 1000000.0]","ingreso_per_capita_familiar_(-0.001, 5400.0]","ingreso_per_capita_familiar_(5400.0, 8700.0]","ingreso_per_capita_familiar_(8700.0, 12000.0]","ingreso_per_capita_familiar_(12000.0, 15016.0]","ingreso_per_capita_familiar_(15016.0, 19900.0]","ingreso_per_capita_familiar_(19900.0, 24000.0]","ingreso_per_capita_familiar_(24000.0, 30000.0]","ingreso_per_capita_familiar_(30000.0, 38300.0]","ingreso_per_capita_familiar_(38300.0, 52340.0]","ingreso_per_capita_familiar_(52340.0, 1000000.0]",estado_educativo_Asiste,estado_educativo_No asiste pero asistió,estado_educativo_Nunca asistio,sector_educativo_Estatal/publico,sector_educativo_No corresponde,sector_educativo_Privado no religioso,sector_educativo_Privado religioso,nivel_actual_Jardin maternal,nivel_actual_No corresponde,nivel_actual_Otras escuelas especiales,nivel_actual_Postgrado,nivel_actual_Primario adultos,nivel_actual_Primario comun,nivel_actual_Primario 
especial,nivel_actual_Sala de 3,nivel_actual_Sala de 4,nivel_actual_Sala de 5,nivel_actual_Secundario/medio adultos,nivel_actual_Secundario/medio comun,nivel_actual_Terciario/superior no universitario,nivel_actual_Universitario,nivel_max_educativo_EGB (1° a 9° año),nivel_max_educativo_No corresponde,nivel_max_educativo_Otras escuelas especiales,nivel_max_educativo_Primario comun,nivel_max_educativo_Primario especial,nivel_max_educativo_Sala de 5,nivel_max_educativo_Secundario/medio comun,nivel_max_educativo_desconocido,"años_escolaridad_(-0.001, 7.0]","años_escolaridad_(7.0, 11.0]","años_escolaridad_(11.0, 12.0]","años_escolaridad_(12.0, 16.0]","años_escolaridad_(16.0, 19.0]",años_escolaridad_desconocido,lugar_nacimiento_CABA,lugar_nacimiento_Otra provincia,lugar_nacimiento_PBA excepto GBA,lugar_nacimiento_PBA sin especificar,lugar_nacimiento_Pais limitrofe,lugar_nacimiento_Pais no limitrofe,lugar_nacimiento_Partido GBA,afiliacion_salud_Otros,afiliacion_salud_Solo obra social,afiliacion_salud_Solo plan de medicina prepaga por contratación voluntaria,afiliacion_salud_Solo prepaga o mutual via OS,afiliacion_salud_Solo sistema publico,cantidad_hijos_nac_vivos_1,cantidad_hijos_nac_vivos_10,cantidad_hijos_nac_vivos_11,cantidad_hijos_nac_vivos_12,cantidad_hijos_nac_vivos_15,cantidad_hijos_nac_vivos_2,cantidad_hijos_nac_vivos_3,cantidad_hijos_nac_vivos_4,cantidad_hijos_nac_vivos_5,cantidad_hijos_nac_vivos_6,cantidad_hijos_nac_vivos_7,cantidad_hijos_nac_vivos_8,cantidad_hijos_nac_vivos_9,cantidad_hijos_nac_vivos_No corresponde
0,1,4000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0,1,0.0,0.0,1,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,1,0,0,0,1,0.0,0.0,0,0.0,1,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,1,0.0,0.0,0.0,0.0,0,1,0.0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0,0.0,1,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1,0,0.0,0.0,0.0,0.0,0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
1,1,22000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0,1,0.0,0.0,1,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0,1,0,0,1,0.0,0.0,0,0.0,1,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,1,0.0,0.0,0.0,0.0,0,1,0.0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,1,0.0,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0,1,0.0,0.0,0.0,0.0,0,1,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
2,1,25000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0,1,0.0,0.0,1,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0,0,1,1,0,0.0,0.0,0,0.0,0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,1,0.0,0.0,0.0,0.0,0,1,0.0,1,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0,0.0,1,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0,0,0.0,0.0,0.0,0.0,1,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
3,2,30000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,1,0,0.0,0.0,0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0,0,1,0,0,0.0,0.0,1,0.0,0,1,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0,0.0,0.0,0.0,0.0,1,0,0.0,1,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0,0.0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0,0,0.0,0.0,0.0,0.0,0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
4,1,20000,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0,1,0.0,0.0,1,0.0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0,1,0,0,1,0.0,0.0,0,0.0,1,0,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,0.0,1,0.0,0.0,0.0,0.0,0,1,0.0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,0,0.0,1,0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1,0.0,0.0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0,1,0,0.0,0.0,0.0,0.0,0,0,1,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,0.0,1
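
The alignment step above can be reproduced on a toy example. This is a sketch under stated assumptions: the two small frames and their values are invented, and `fill_value=0` is used inside `reindex` as an alternative to the separate `.fillna(0)` call in the notebook.

```python
import pandas as pd

# Toy training data (hypothetical values, not taken from the survey)
train = pd.DataFrame({"sexo": ["Mujer", "Varon", "Mujer"], "miembro": [1, 2, 1]})
train_cols = pd.get_dummies(train, dtype=int).columns

# New data: the category "Mujer" is missing and "Otro" was never seen in training
new = pd.DataFrame({"sexo": ["Varon", "Otro"], "miembro": [1, 3]})

# reindex keeps only the training columns, in training order;
# columns missing from the new data are created and filled with 0
aligned = pd.get_dummies(new, dtype=int).reindex(columns=train_cols, fill_value=0)

print(aligned.columns.tolist())
print(aligned.isna().sum().sum())
```

Note that `reindex` also silently drops `sexo_Otro`, so a row with an unseen category ends up with zeros in every `sexo_*` column. Whether that is acceptable depends on the model, so it may be worth counting such rows before predicting.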


In [41]:
# In this case we want to display all rows
pd.set_option('display.max_rows', None)

In [42]:
status(d_tr_ohe_2)

Unnamed: 0,variable,q_nan,p_nan,q_zeros,p_zeros,unique,type
0,miembro,0,0.0,0,0.0,2,int64
1,ingresos_totales,0,0.0,0,0.0,5,int64
2,nhogar_1,0,0.0,5,1.0,1,float64
3,nhogar_2,0,0.0,5,1.0,1,float64
4,nhogar_3,0,0.0,5,1.0,1,float64
5,nhogar_4,0,0.0,5,1.0,1,float64
6,nhogar_5,0,0.0,5,1.0,1,float64
7,nhogar_6,0,0.0,5,1.0,1,float64
8,nhogar_7,0,0.0,5,1.0,1,float64
9,comuna_1,0,0.0,5,1.0,1,float64
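
The `status` output shows `q_nan` equal to 0 for the columns listed. The same check can be automated before saving or predicting. Below is a minimal sketch, assuming a hypothetical helper named `check_alignment` and invented toy data; it is not part of `funpymodeling`.

```python
import pandas as pd


def check_alignment(df, expected_cols):
    """Return True if df matches the expected layout; raise ValueError otherwise."""
    if list(df.columns) != list(expected_cols):
        raise ValueError("columns differ from the training layout")
    if df.isna().any().any():
        raise ValueError("NaN values remain after alignment")
    return True


# Hypothetical training layout and new data missing one dummy column
expected = ["miembro", "sexo_Mujer", "sexo_Varon"]
raw = pd.DataFrame({"miembro": [1, 2], "sexo_Varon": [1, 0]})

aligned = raw.reindex(columns=expected, fill_value=0)
ok = check_alignment(aligned, expected)
print(ok)
```

Calling the helper on `raw.reindex(columns=expected)` (without a fill value) raises `ValueError`, because `sexo_Mujer` is then filled with NaN.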


### Save answer 2

In [49]:
d_tr_ohe_2.to_csv('data/tarea_respuesta2.csv', index=False)