# Challenge - Data manipulation and transformation (Part I)

1. To complete this challenge you will need the following files:
● incidents.pkl
● officers.pkl
● subjects.pkl
a. Load the data and create a DataFrame from each file.

In [1]:
# First, import the pandas library.
import pandas as pd

# Load each pickle file into its own DataFrame.
incidents = pd.read_pickle("incidents.pkl")
officers = pd.read_pickle("officers.pkl")
subjects = pd.read_pickle("subjects.pkl")

b. Build a table that joins the three tables.

In [8]:
# To decide how to perform the join, inspect the column names of each table.
incidents.columns

Index(['case_number', 'date', 'location', 'subject_statuses', 'subject_weapon',
       'subjects', 'subject_count', 'officers', 'officer_count',
       'grand_jury_disposition', 'attorney_general_forms_url', 'summary_url',
       'summary_text', 'latitude', 'longitude'],
      dtype='object')

In [9]:
officers.columns

Index(['case_number', 'race', 'gender', 'last_name', 'first_name',
       'full_name'],
      dtype='object')

In [10]:
subjects.columns

Index(['case_number', 'race', 'gender', 'last_name', 'first_name',
       'full_name'],
      dtype='object')

In [2]:
# Join the three tables on 'case_number' in two steps.
# 'suffixes' only renames columns that exist on both sides of a merge. incidents and officers
# share only the key, so the first merge adds no suffix. In the second merge the officer
# columns keep their plain names (race, gender, ...) and the subject columns get "_sub".

union_parcial = incidents.merge(officers, on='case_number', how='inner', suffixes=("_inc", "_off"))
union_final = union_parcial.merge(subjects, on='case_number', how='inner', suffixes=(None, "_sub"))

# Display the result.
union_final

Unnamed: 0,case_number,date,location,subject_statuses,subject_weapon,subjects,subject_count,officers,officer_count,grand_jury_disposition,...,race,gender,last_name,first_name,full_name,race_sub,gender_sub,last_name_sub,first_name_sub,full_name_sub
0,44523A,2013-02-23,3000 Chihuahua Street,Injured,Handgun,"Curry, James L/M",1,"Patino, Michael L/M; Fillingim, Brian W/M",2,No Bill,...,L,M,Patino,Michael,"Patino, Michael",L,M,Curry,James,"Curry, James"
1,44523A,2013-02-23,3000 Chihuahua Street,Injured,Handgun,"Curry, James L/M",1,"Patino, Michael L/M; Fillingim, Brian W/M",2,No Bill,...,W,M,Fillingim,Brian,"Fillingim, Brian",L,M,Curry,James,"Curry, James"
2,121982X,2010-05-03,1300 N. Munger Boulevard,Injured,Handgun,"Chavez, Gabriel L/M",1,"Padilla, Gilbert L/M",1,No Bill,...,L,M,Padilla,Gilbert,"Padilla, Gilbert",L,M,Chavez,Gabriel,"Chavez, Gabriel"
3,605484T,2007-08-12,200 S. Stemmons Freeway,Other,Shotgun,"Salinas, Nick L/M",1,"Poston, Jerry W/M",1,See Summary,...,W,M,Poston,Jerry,"Poston, Jerry",L,M,Salinas,Nick,"Salinas, Nick"
4,384832T,2007-05-26,7900 S. Loop 12,Shoot and Miss,Unarmed,"Smith, James B/M; Dews, Antonio B/M; Spearman,...",3,"Mondy, Michael B/M",1,,...,B,M,Mondy,Michael,"Mondy, Michael",B,M,Smith,James,"Smith, James"
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
372,165193-2016,2016-07-07,801 Main Street,Deceased,Assault Rifle,"Johnson, Micah B/M",1,"Edwards, Henry W/M; Wells, Giovanni B/M; Junge...",12,,...,W,M,Michaels,Mark,"Michaels, Mark",B,M,Johnson,Micah,"Johnson, Micah"
373,165193-2016,2016-07-07,801 Main Street,Deceased,Assault Rifle,"Johnson, Micah B/M",1,"Edwards, Henry W/M; Wells, Giovanni B/M; Junge...",12,,...,W,M,Borchardt,Jeremy,"Borchardt, Jeremy",B,M,Johnson,Micah,"Johnson, Micah"
374,165193-2016,2016-07-07,801 Main Street,Deceased,Assault Rifle,"Johnson, Micah B/M",1,"Edwards, Henry W/M; Wells, Giovanni B/M; Junge...",12,,...,W,M,Craig,Robert,"Craig, Robert",B,M,Johnson,Micah,"Johnson, Micah"
375,165193-2016,2016-07-07,801 Main Street,Deceased,Assault Rifle,"Johnson, Micah B/M",1,"Edwards, Henry W/M; Wells, Giovanni B/M; Junge...",12,,...,W,M,Cannon,Elmar,"Cannon, Elmar",B,M,Johnson,Micah,"Johnson, Micah"
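
Since the officer columns end up without a suffix, one option is to rename the non-key columns before merging so that both sides are labelled explicitly. The following is a minimal sketch on toy data; the frames, the helper `add_suffix_except_key`, and the `_off` naming are illustrative assumptions and not part of the original notebook.

```python
import pandas as pd

# Toy stand-ins for the three tables (hypothetical data, not the real files).
incidents_toy = pd.DataFrame({"case_number": ["A1", "B2"], "location": ["X St", "Y Ave"]})
officers_toy = pd.DataFrame({"case_number": ["A1", "A1", "B2"], "gender": ["M", "F", "M"]})
subjects_toy = pd.DataFrame({"case_number": ["A1", "B2"], "gender": ["F", "M"]})


def add_suffix_except_key(df, suffix, key="case_number"):
    """Append a suffix to every column name except the join key."""
    return df.rename(columns={c: f"{c}{suffix}" for c in df.columns if c != key})


merged_toy = (
    incidents_toy
    .merge(add_suffix_except_key(officers_toy, "_off"), on="case_number", how="inner")
    .merge(add_suffix_except_key(subjects_toy, "_sub"), on="case_number", how="inner")
)

# Each officer of a case is paired with each subject of that case.
print(merged_toy.columns.tolist())
print(len(merged_toy))
```

With this naming, `gender_off` and `gender_sub` are unambiguous, at the cost of diverging from the column names used in the rest of this notebook.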


c. Check whether there are duplicate rows; if so, remove them.

In [3]:
# First, count the fully duplicated rows.
print(union_final.duplicated().sum())

# There are no duplicate rows.

0
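
Had the count been greater than zero, the duplicates could be removed with `drop_duplicates`. A small sketch on a toy frame (the data is hypothetical) shows both steps:

```python
import pandas as pd

# Toy frame with one exact duplicate row (hypothetical data).
toy = pd.DataFrame({
    "case_number": ["A1", "A1", "B2"],
    "full_name": ["Doe, Jane", "Doe, Jane", "Roe, Sam"],
})

# duplicated() flags every repeat of an earlier row; the first occurrence is not flagged.
n_dupes = int(toy.duplicated().sum())

# drop_duplicates() keeps the first occurrence of each row by default.
deduped = toy.drop_duplicates().reset_index(drop=True)

print(n_dupes, len(deduped))
```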


d. How many subjects of gender F are there in the resulting DataFrame?

In [13]:
# Count the 'F' values in the subject gender column (the one with the "_sub" suffix).
# Each row is an officer-subject pair, so this counts rows rather than distinct people.
conteo_f = union_final['gender_sub'].value_counts().get('F', 0)

# Print the result.
print(f"Number of subjects of gender 'F': {conteo_f}")

Number of subjects of gender 'F': 9


e. How many case numbers have at least one female suspect?

In [28]:
# Keep only the rows where the subject's gender is 'F'.
casos_con_mujeres = union_final[union_final['gender_sub'] == 'F']

# unique() returns the distinct 'case_number' values in the filtered result; len() counts them.
numero_de_casos_unicos = len(casos_con_mujeres['case_number'].unique())

# Print the result.
print(f"Number of cases with at least one female suspect: {numero_de_casos_unicos}")

Number of cases with at least one female suspect: 7
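
The two answers above differ (9 rows versus 7 cases) because the merged table holds one row per officer-subject pair. A minimal sketch on toy data illustrates how row counts, distinct subjects, and distinct cases can diverge; the frame is hypothetical and identifying a subject by `case_number` plus `full_name_sub` is an assumption.

```python
import pandas as pd

# Toy merged table: case A1 has two officers and one female subject (hypothetical data).
merged_toy = pd.DataFrame({
    "case_number": ["A1", "A1", "B2", "C3"],
    "full_name": ["Smith, Ann", "Jones, Bob", "Lee, Kim", "Diaz, Leo"],  # officers
    "gender_sub": ["F", "F", "M", "F"],
    "full_name_sub": ["Doe, Jane", "Doe, Jane", "Roe, Sam", "Poe, Amy"],
})

women = merged_toy[merged_toy["gender_sub"] == "F"]

rows_f = len(women)  # rows where the subject is female
distinct_subjects_f = len(women[["case_number", "full_name_sub"]].drop_duplicates())
cases_f = women["case_number"].nunique()  # same as len(...unique())

print(rows_f, distinct_subjects_f, cases_f)
```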


f. Build a pivot table with the officer's gender in the rows and the subject's gender in the
columns. How do you interpret the values shown in this view?

In [8]:
# Build the pivot table.

pivot1 = pd.pivot_table(
    data=union_final,
    index='gender',
    columns='gender_sub',
    values='case_number',
    aggfunc='count'
)

# Interpretation:
# Each cell counts the rows of the merged table for one combination of officer gender and
# subject gender. Every row is an officer-subject pair within a case, so the cells count
# pairings, not distinct cases.

# Display the result.
pivot1

gender_sub,F,M
gender,Unnamed: 1_level_1,Unnamed: 2_level_1
F,2,18
M,7,350


2. To continue with this challenge, you will need the file "Cleaned_DS_Jobs.csv"
a. Load the data and create a DataFrame from it.

In [38]:
# Load the data into a DataFrame.
df = pd.read_csv("Cleaned_DS_Jobs.csv")

# Display the DataFrame.
df

Unnamed: 0,Job Title,Salary Estimate,Job Description,Rating,Company Name,Location,Headquarters,Size,Type of ownership,Industry,...,company_age,python,excel,hadoop,spark,aws,tableau,big_data,job_simp,seniority
0,Sr Data Scientist,137-171,Description\n\nThe Senior Data Scientist is re...,3.1,Healthfirst,"New York, NY","New York, NY",1001 to 5000 employees,Nonprofit Organization,Insurance Carriers,...,27,0,0,0,0,1,0,0,data scientist,senior
1,Data Scientist,137-171,"Secure our Nation, Ignite your Future\n\nJoin ...",4.2,ManTech,"Chantilly, VA","Herndon, VA",5001 to 10000 employees,Company - Public,Research & Development,...,52,0,0,1,0,0,0,1,data scientist,na
2,Data Scientist,137-171,Overview\n\n\nAnalysis Group is one of the lar...,3.8,Analysis Group,"Boston, MA","Boston, MA",1001 to 5000 employees,Private Practice / Firm,Consulting,...,39,1,1,0,0,1,0,0,data scientist,na
3,Data Scientist,137-171,JOB DESCRIPTION:\n\nDo you have a passion for ...,3.5,INFICON,"Newton, MA","Bad Ragaz, Switzerland",501 to 1000 employees,Company - Public,Electrical & Electronic Manufacturing,...,20,1,1,0,0,1,0,0,data scientist,na
4,Data Scientist,137-171,Data Scientist\nAffinity Solutions / Marketing...,2.9,Affinity Solutions,"New York, NY","New York, NY",51 to 200 employees,Company - Private,Advertising & Marketing,...,22,1,1,0,0,0,0,0,data scientist,na
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
655,Data Scientist,105-167,Summary\n\nWe’re looking for a data scientist ...,3.6,TRANZACT,"Fort Lee, NJ","Fort Lee, NJ",1001 to 5000 employees,Company - Private,Advertising & Marketing,...,31,1,1,1,0,0,1,1,data scientist,na
656,Data Scientist,105-167,Job Description\nBecome a thought leader withi...,0.0,JKGT,"San Francisco, CA",-1,-1,-1,-1,...,-1,0,0,0,0,0,0,0,data scientist,na
657,Data Scientist,105-167,Join a thriving company that is changing the w...,0.0,AccessHope,"Irwindale, CA",-1,-1,-1,-1,...,-1,1,1,1,0,0,1,0,data scientist,na
658,Data Scientist,105-167,100 Remote Opportunity As an AINLP Data Scient...,5.0,ChaTeck Incorporated,"San Francisco, CA","Santa Clara, CA",1 to 50 employees,Company - Private,Advertising & Marketing,...,-1,1,0,1,1,0,0,1,data scientist,na


b. Treat the following list of values as nulls:
["na", "NA", -1, "0", "-1", "null", "n/a", "N/A", "NULL"]
(hint: use the replace method to replace the listed values with np.nan)

In [17]:
# First, import the library we need.
import numpy as np

# List the values that should be treated as nulls.
valores_a_reemplazar = ["na", "NA", -1, "0", "-1", "null", "n/a", "N/A", "NULL"]

# Replace every listed value with np.nan, returning a new DataFrame.
df_limpio = df.replace(to_replace=valores_a_reemplazar, value=np.nan, inplace=False)

# Display the new DataFrame to check that the values were replaced.
df_limpio

Unnamed: 0,Job Title,Salary Estimate,Job Description,Rating,Company Name,Location,Headquarters,Size,Type of ownership,Industry,...,company_age,python,excel,hadoop,spark,aws,tableau,big_data,job_simp,seniority
0,Sr Data Scientist,137-171,Description\n\nThe Senior Data Scientist is re...,3.1,Healthfirst,"New York, NY","New York, NY",1001 to 5000 employees,Nonprofit Organization,Insurance Carriers,...,27.0,0,0,0,0,1,0,0,data scientist,senior
1,Data Scientist,137-171,"Secure our Nation, Ignite your Future\n\nJoin ...",4.2,ManTech,"Chantilly, VA","Herndon, VA",5001 to 10000 employees,Company - Public,Research & Development,...,52.0,0,0,1,0,0,0,1,data scientist,
2,Data Scientist,137-171,Overview\n\n\nAnalysis Group is one of the lar...,3.8,Analysis Group,"Boston, MA","Boston, MA",1001 to 5000 employees,Private Practice / Firm,Consulting,...,39.0,1,1,0,0,1,0,0,data scientist,
3,Data Scientist,137-171,JOB DESCRIPTION:\n\nDo you have a passion for ...,3.5,INFICON,"Newton, MA","Bad Ragaz, Switzerland",501 to 1000 employees,Company - Public,Electrical & Electronic Manufacturing,...,20.0,1,1,0,0,1,0,0,data scientist,
4,Data Scientist,137-171,Data Scientist\nAffinity Solutions / Marketing...,2.9,Affinity Solutions,"New York, NY","New York, NY",51 to 200 employees,Company - Private,Advertising & Marketing,...,22.0,1,1,0,0,0,0,0,data scientist,
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
655,Data Scientist,105-167,Summary\n\nWe’re looking for a data scientist ...,3.6,TRANZACT,"Fort Lee, NJ","Fort Lee, NJ",1001 to 5000 employees,Company - Private,Advertising & Marketing,...,31.0,1,1,1,0,0,1,1,data scientist,
656,Data Scientist,105-167,Job Description\nBecome a thought leader withi...,0.0,JKGT,"San Francisco, CA",,,,,...,,0,0,0,0,0,0,0,data scientist,
657,Data Scientist,105-167,Join a thriving company that is changing the w...,0.0,AccessHope,"Irwindale, CA",,,,,...,,1,1,1,0,0,1,0,data scientist,
658,Data Scientist,105-167,100 Remote Opportunity As an AINLP Data Scient...,5.0,ChaTeck Incorporated,"San Francisco, CA","Santa Clara, CA",1 to 50 employees,Company - Private,Advertising & Marketing,...,,1,0,1,1,0,0,1,data scientist,
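
In the output above, `company_age` changes from integers to floats (27 becomes 27.0) because the column now holds NaN, while the 0/1 flag columns are unchanged because the list contains the string "0", not the integer 0. A minimal sketch on toy data (hypothetical rows, same marker list) reproduces this behaviour:

```python
import numpy as np
import pandas as pd

# Toy frame mixing numeric and text columns (hypothetical data).
toy = pd.DataFrame({
    "company_age": [27, -1, 52],
    "seniority": ["senior", "na", "na"],
    "python": [0, 1, 0],
})

null_markers = ["na", "NA", -1, "0", "-1", "null", "n/a", "N/A", "NULL"]
cleaned = toy.replace(null_markers, np.nan)

# Count how many nulls each column now has.
print(cleaned.isna().sum())
```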


c. Remove all rows with missing data. (hint: use the .dropna() method)

In [21]:
# Drop the rows with missing data and reset the index so it has no gaps.
df_limpio = df_limpio.dropna().reset_index(drop=True)

# Display df_limpio.
df_limpio

Unnamed: 0,Job Title,Salary Estimate,Job Description,Rating,Company Name,Location,Headquarters,Size,Type of ownership,Industry,...,company_age,python,excel,hadoop,spark,aws,tableau,big_data,job_simp,seniority
0,Sr Data Scientist,137-171,Description\n\nThe Senior Data Scientist is re...,3.1,Healthfirst,"New York, NY","New York, NY",1001 to 5000 employees,Nonprofit Organization,Insurance Carriers,...,27.0,0,0,0,0,1,0,0,data scientist,senior
1,Senior Research Statistician- Data Scientist,75-131,Acuity is seeking a Senior Research Statistici...,4.8,Acuity Insurance,"Sheboygan, WI","Sheboygan, WI",1001 to 5000 employees,Company - Private,Insurance Carriers,...,95.0,0,0,0,0,0,0,0,data scientist,senior
2,Senior Analyst/Data Scientist,75-131,At Edmunds were driven to make car buying easi...,3.4,Edmunds.com,"Santa Monica, CA","Santa Monica, CA",501 to 1000 employees,Company - Private,Internet,...,54.0,1,1,0,0,1,1,0,data scientist,senior
3,Senior Data Scientist,75-131,Klaviyo is looking for Senior Data Scientists ...,4.8,Klaviyo,"Boston, MA","Boston, MA",201 to 500 employees,Company - Private,Computer Hardware & Software,...,8.0,0,0,0,0,0,0,0,data scientist,senior
4,Senior Data Scientist,75-131,Benson Hill empowers innovators to develop mor...,3.5,Benson Hill,"Saint Louis, MO","Saint Louis, MO",201 to 500 employees,Company - Private,Biotech & Pharmaceuticals,...,8.0,1,1,0,0,0,0,1,data scientist,senior
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
75,Senior Data Engineer,138-158,Senior Data Engineer\n\nMaster’s degree in Inf...,2.9,Affinity Solutions,"New York, NY","New York, NY",51 to 200 employees,Company - Private,Advertising & Marketing,...,22.0,1,0,1,1,0,1,0,data engineer,senior
76,Senior Data Scientist,80-132,Job Requisition ID #\n20WD40666\nJob Title\nSe...,4.0,Autodesk,"San Francisco, CA","San Rafael, CA",5001 to 10000 employees,Company - Public,Computer Hardware & Software,...,38.0,1,0,1,1,0,0,1,data scientist,senior
77,Senior Data Scientist,87-141,"Secure our Nation, Ignite your Future\n\nJob S...",4.2,ManTech,"Alexandria, VA","Herndon, VA",5001 to 10000 employees,Company - Public,Research & Development,...,52.0,1,0,1,1,0,1,1,data scientist,senior
78,Senior Data Scientist,105-167,"About Us\n\nAt GutCheck, we pioneered agile ma...",3.8,GutCheck,"Denver, CO","Denver, CO",51 to 200 employees,Company - Private,Advertising & Marketing,...,11.0,0,0,0,0,0,0,0,data scientist,senior
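
Judging by the index values shown, the table shrinks from 659 rows to 79 after this step, since `dropna()` removes a row if any column is null. A quick way to see which columns drive the loss, and to restrict the check to selected columns, is sketched below on toy data; the rows and the choice of `Size` as the column of interest are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Toy frame with nulls in different columns (hypothetical data).
toy = pd.DataFrame({
    "Size": ["51 to 200 employees", np.nan, "10000+ employees"],
    "seniority": ["senior", "senior", np.nan],
})

# Nulls per column: shows which columns are responsible for dropped rows.
missing_per_column = toy.isna().sum()

# Drop rows with a null in any column (what the exercise asks for).
all_complete = toy.dropna()

# Drop rows only when the listed columns are null.
size_known = toy.dropna(subset=["Size"])

print(missing_per_column)
print(len(all_complete), len(size_known))
```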


d. From the “Salary Estimate” column, generate two columns: Salario Estimado Mínimo (minimum
estimated salary) and Salario Estimado Máximo (maximum). (hint: use the apply method on the column.)

In [32]:
# First, define the parsing function.
def parse_salary(salary_estimate):
    try:
        # Split the range on the hyphen '-'.
        min_salario, max_salario = salary_estimate.split('-')

        # Convert both parts to integers.
        return [int(min_salario.strip()), int(max_salario.strip())]

    except ValueError:
        # Raised when the split does not yield exactly two parts or int() fails.
        return [np.nan, np.nan]

# Apply the function and expand the result into two columns.
df_limpio[['Salario Estimado Mínimo', 'Salario Estimado Máximo']] = \
    df_limpio['Salary Estimate'].apply(parse_salary).apply(pd.Series)

# Display the result.
df_limpio

Unnamed: 0,Job Title,Salary Estimate,Job Description,Rating,Company Name,Location,Headquarters,Size,Type of ownership,Industry,...,excel,hadoop,spark,aws,tableau,big_data,job_simp,seniority,Salario Estimado Mínimo,Salario Estimado Máximo
0,Sr Data Scientist,137-171,Description\n\nThe Senior Data Scientist is re...,3.1,Healthfirst,"New York, NY","New York, NY",1001 to 5000 employees,Nonprofit Organization,Insurance Carriers,...,0,0,0,1,0,0,data scientist,senior,137,171
1,Senior Research Statistician- Data Scientist,75-131,Acuity is seeking a Senior Research Statistici...,4.8,Acuity Insurance,"Sheboygan, WI","Sheboygan, WI",1001 to 5000 employees,Company - Private,Insurance Carriers,...,0,0,0,0,0,0,data scientist,senior,75,131
2,Senior Analyst/Data Scientist,75-131,At Edmunds were driven to make car buying easi...,3.4,Edmunds.com,"Santa Monica, CA","Santa Monica, CA",501 to 1000 employees,Company - Private,Internet,...,1,0,0,1,1,0,data scientist,senior,75,131
3,Senior Data Scientist,75-131,Klaviyo is looking for Senior Data Scientists ...,4.8,Klaviyo,"Boston, MA","Boston, MA",201 to 500 employees,Company - Private,Computer Hardware & Software,...,0,0,0,0,0,0,data scientist,senior,75,131
4,Senior Data Scientist,75-131,Benson Hill empowers innovators to develop mor...,3.5,Benson Hill,"Saint Louis, MO","Saint Louis, MO",201 to 500 employees,Company - Private,Biotech & Pharmaceuticals,...,1,0,0,0,0,1,data scientist,senior,75,131
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
75,Senior Data Engineer,138-158,Senior Data Engineer\n\nMaster’s degree in Inf...,2.9,Affinity Solutions,"New York, NY","New York, NY",51 to 200 employees,Company - Private,Advertising & Marketing,...,0,1,1,0,1,0,data engineer,senior,138,158
76,Senior Data Scientist,80-132,Job Requisition ID #\n20WD40666\nJob Title\nSe...,4.0,Autodesk,"San Francisco, CA","San Rafael, CA",5001 to 10000 employees,Company - Public,Computer Hardware & Software,...,0,1,1,0,0,1,data scientist,senior,80,132
77,Senior Data Scientist,87-141,"Secure our Nation, Ignite your Future\n\nJob S...",4.2,ManTech,"Alexandria, VA","Herndon, VA",5001 to 10000 employees,Company - Public,Research & Development,...,0,1,1,0,1,1,data scientist,senior,87,141
78,Senior Data Scientist,105-167,"About Us\n\nAt GutCheck, we pioneered agile ma...",3.8,GutCheck,"Denver, CO","Denver, CO",51 to 200 employees,Company - Private,Advertising & Marketing,...,0,0,0,0,0,0,data scientist,senior,105,167
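
The exercise asks for `apply`, but the same split can be sketched with vectorized string methods, which avoids calling a Python function per row. The toy frame below is hypothetical, and the "unknown" entry is included only to show how a malformed value would be handled under this approach.

```python
import pandas as pd

# Toy salary ranges, including one malformed entry (hypothetical data).
toy = pd.DataFrame({"Salary Estimate": ["137-171", "75-131", "unknown"]})

# Split on the first hyphen into two columns; a missing part becomes null.
parts = toy["Salary Estimate"].str.split("-", n=1, expand=True)

# Convert to numbers; anything that is not numeric becomes NaN.
toy["Salario Estimado Mínimo"] = pd.to_numeric(parts[0], errors="coerce")
toy["Salario Estimado Máximo"] = pd.to_numeric(parts[1], errors="coerce")

print(toy)
```

Unlike `parse_salary`, this version yields floating-point columns whenever a value is missing.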


e. Recode the Size column using the values in the following table: (hint: use dictionary-based replacement with the replace method on the column.)

In [34]:
# Define the mapping dictionary.
recodificacion = {
    '10000+ employees': 'Mega Empresas',
    '5001 to 10000 employees': 'Grandes Empresas',
    '1001 to 5000 employees': 'Medianas Empresas',
    '501 to 1000 employees': 'Microempresas',
    '201 to 500 employees': 'Pequeñas Empresas',
    '51 to 200 employees': 'Pequeñas Grandes Empresas',
    'Unknown': 'Empresas sin Información'
}

# Apply the replacement.
df_limpio['Size'] = df_limpio['Size'].replace(recodificacion)

# Display the result.
df_limpio

Unnamed: 0,Job Title,Salary Estimate,Job Description,Rating,Company Name,Location,Headquarters,Size,Type of ownership,Industry,...,hadoop,spark,aws,tableau,big_data,job_simp,seniority,Salario Estimado Mínimo,Salario Estimado Máximo,Size_Recodificado
0,Sr Data Scientist,137-171,Description\n\nThe Senior Data Scientist is re...,3.1,Healthfirst,"New York, NY","New York, NY",Medianas Empresas,Nonprofit Organization,Insurance Carriers,...,0,0,1,0,0,data scientist,senior,137,171,Medianas Empresas
1,Senior Research Statistician- Data Scientist,75-131,Acuity is seeking a Senior Research Statistici...,4.8,Acuity Insurance,"Sheboygan, WI","Sheboygan, WI",Medianas Empresas,Company - Private,Insurance Carriers,...,0,0,0,0,0,data scientist,senior,75,131,Medianas Empresas
2,Senior Analyst/Data Scientist,75-131,At Edmunds were driven to make car buying easi...,3.4,Edmunds.com,"Santa Monica, CA","Santa Monica, CA",Microempresas,Company - Private,Internet,...,0,0,1,1,0,data scientist,senior,75,131,Microempresas
3,Senior Data Scientist,75-131,Klaviyo is looking for Senior Data Scientists ...,4.8,Klaviyo,"Boston, MA","Boston, MA",Pequeñas Empresas,Company - Private,Computer Hardware & Software,...,0,0,0,0,0,data scientist,senior,75,131,Pequeñas Empresas
4,Senior Data Scientist,75-131,Benson Hill empowers innovators to develop mor...,3.5,Benson Hill,"Saint Louis, MO","Saint Louis, MO",Pequeñas Empresas,Company - Private,Biotech & Pharmaceuticals,...,0,0,0,0,1,data scientist,senior,75,131,Pequeñas Empresas
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
75,Senior Data Engineer,138-158,Senior Data Engineer\n\nMaster’s degree in Inf...,2.9,Affinity Solutions,"New York, NY","New York, NY",Pequeñas Grandes Empresas,Company - Private,Advertising & Marketing,...,1,1,0,1,0,data engineer,senior,138,158,Pequeñas Grandes Empresas
76,Senior Data Scientist,80-132,Job Requisition ID #\n20WD40666\nJob Title\nSe...,4.0,Autodesk,"San Francisco, CA","San Rafael, CA",Grandes Empresas,Company - Public,Computer Hardware & Software,...,1,1,0,0,1,data scientist,senior,80,132,Grandes Empresas
77,Senior Data Scientist,87-141,"Secure our Nation, Ignite your Future\n\nJob S...",4.2,ManTech,"Alexandria, VA","Herndon, VA",Grandes Empresas,Company - Public,Research & Development,...,1,1,0,1,1,data scientist,senior,87,141,Grandes Empresas
78,Senior Data Scientist,105-167,"About Us\n\nAt GutCheck, we pioneered agile ma...",3.8,GutCheck,"Denver, CO","Denver, CO",Pequeñas Grandes Empresas,Company - Private,Advertising & Marketing,...,0,0,0,0,0,data scientist,senior,105,167,Pequeñas Grandes Empresas
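
With `replace`, any value that is not a key in the dictionary is left as it was. The raw data includes a '1 to 50 employees' size that the dictionary above does not cover, so such a value would pass through unchanged if it survived the earlier cleaning. A small sketch (toy Series, two labels borrowed from the dictionary above) contrasts `replace` with `map`, which turns unmapped values into NaN and therefore makes gaps easy to spot:

```python
import pandas as pd

# Subset of the recoding dictionary, used here only for illustration.
size_labels = {
    "10000+ employees": "Mega Empresas",
    "201 to 500 employees": "Pequeñas Empresas",
}

# Toy column with one value that has no entry in the dictionary.
sizes = pd.Series(["10000+ employees", "1 to 50 employees", "201 to 500 employees"])

with_replace = sizes.replace(size_labels)  # unmapped values stay as they are
with_map = sizes.map(size_labels)          # unmapped values become NaN

# List the original values that the dictionary does not cover.
unmapped = sizes[with_map.isna()].unique().tolist()
print(unmapped)
```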


f. Finally, build a pivot table showing the mean minimum estimated salary and the mean maximum estimated salary by company size. (hint: use pd.pivot_table to build the view from the generated columns.)

In [37]:
# Build the pivot table.
df_salario = pd.pivot_table(
    data=df_limpio,
    index='Size',
    values=['Salario Estimado Mínimo', 'Salario Estimado Máximo'],
    aggfunc='mean',  # Average each salary column per company size.
)

# Round to whole numbers for readability.
df_salario = df_salario.round(0).astype(int)

# Display the result.
print("\nPivot table: average salary by company size:")
df_salario


Pivot table: average salary by company size:


Unnamed: 0_level_0,Salario Estimado Máximo,Salario Estimado Mínimo
Size,Unnamed: 1_level_1,Unnamed: 2_level_1
Empresas sin Información,110,73
Grandes Empresas,139,92
Medianas Empresas,137,94
Mega Empresas,151,98
Microempresas,146,100
Pequeñas Empresas,141,94
Pequeñas Grandes Empresas,138,101
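
A pivot table with a single index and a mean aggregation computes the same numbers as a `groupby` followed by `mean`. The sketch below checks this on toy data (hypothetical sizes and salaries); it is meant as a cross-check, not as a replacement for the `pd.pivot_table` call the exercise asks for.

```python
import pandas as pd

# Toy salaries for two company sizes (hypothetical data).
toy = pd.DataFrame({
    "Size": ["A", "A", "B"],
    "Salario Estimado Mínimo": [70, 90, 100],
    "Salario Estimado Máximo": [110, 130, 150],
})

salary_cols = ["Salario Estimado Mínimo", "Salario Estimado Máximo"]

by_pivot = pd.pivot_table(toy, index="Size", values=salary_cols, aggfunc="mean")
by_group = toy.groupby("Size")[salary_cols].mean()

# Compare the two results column by column, ignoring column order.
same_values = bool((by_pivot[salary_cols].values == by_group[salary_cols].values).all())
print(same_values)
```

The `groupby` form keeps the columns in the order they were selected, which can be convenient when the minimum should appear before the maximum.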
