# Exploratory data analysis - Answering Questions

The human resources department of a multinational company has stored data on last year's internal promotions. The company wants to use this data to find out whether there are specific patterns that determine whether an employee is promoted. It also wants to know whether it can take any measures in the future to help improve its employees' career development.

To that end, the company asks you to:

* Carry out an exploratory data analysis, detailing the most relevant findings.
* Build a classification model that predicts the probability that an employee will be promoted, based on the historical data available.
* Develop a Dash dashboard that summarizes the most relevant findings of the exploratory analysis and can advise an employee on the actions they can take to increase their probability of promotion.

What recommendations would you give the human resources department based on the data?

## Data description
The dataset contains the following variables:

* employee_id: Employee identifier
* department: Employee's department
* region: Employee's region
* education: Level of education
* gender: Employee's gender
* recruitment_channel: How the employee was recruited
* no_of_trainings: Number of training courses the employee completed in the last year
* age: Employee's age
* previous_year_rating: Rating obtained in the performance evaluation for previous years
* length_of_service: Years of service
* awards_won: Whether the employee won any award during the last year
* avg_training_score: Average score in the training courses completed
* is_promoted: 1 if the employee was promoted, 0 otherwise.
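Before analysing the data it can help to confirm that the loaded file matches this description. The following is a minimal sketch, assuming the column names as they appear in the CSV header and that `awards_won` and `is_promoted` are 0/1 flags (inferred from the sample rows, not stated in the source); `check_schema` and `EXPECTED_COLUMNS` are hypothetical names.

```python
import pandas as pd

# Column names as they appear in the CSV header (assumption based on df.head()).
EXPECTED_COLUMNS = [
    "employee_id", "department", "region", "education", "gender",
    "recruitment_channel", "no_of_trainings", "age", "previous_year_rating",
    "length_of_service", "awards_won", "avg_training_score", "is_promoted",
]


def check_schema(df: pd.DataFrame) -> list:
    """Return a list of problems; an empty list means the frame looks as documented."""
    problems = [f"missing column: {c}" for c in EXPECTED_COLUMNS if c not in df.columns]
    # Assumed binary flags: every non-null value should be 0 or 1.
    for flag in ("awards_won", "is_promoted"):
        if flag in df.columns and not df[flag].dropna().isin([0, 1]).all():
            problems.append(f"{flag} is not a 0/1 flag")
    return problems
```

Calling `check_schema(df)` right after `pd.read_csv` would then return an empty list if the file follows the description.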

```python
import pandas as pd
import numpy as np
import plotly.graph_objects as go
```

```python
df = pd.read_csv('../data/trabajo1.csv')
df.head(3)
```

| | employee_id | department | region | education | gender | recruitment_channel | no_of_trainings | age | previous_year_rating | length_of_service | awards_won | avg_training_score | is_promoted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 65438 | Sales & Marketing | region_7 | Master's & above | f | sourcing | 1 | 35 | 5.0 | 8 | 0 | 49.0 | 0 |
| 1 | 65141 | Operations | region_22 | Bachelor's | m | other | 1 | 30 | 5.0 | 4 | 0 | 60.0 | 0 |
| 2 | 7513 | Sales & Marketing | region_19 | Bachelor's | m | sourcing | 1 | 34 | 3.0 | 7 | 0 | 50.0 | 0 |


# How is each department composed?

```python
df_department = pd.read_pickle('../data/datos_departamentos.pkl')
df_department.head(3)
```

| | department | total_people | male_staff | female_staff | total_promotions | promotions_male_staff | promotions_female_staff | percentage_promotions | mean_age | median_age | mean_prev_year_rating | mean_awards_won |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Sales & Marketing | 16840 | 13686 | 3154 | 1213 | 1037 | 176 | 0.072031 | 34.860629 | 33.0 | 3.067937 | 0.021437 |
| 1 | Operations | 11348 | 6671 | 4677 | 1023 | 581 | 442 | 0.090148 | 36.073669 | 35.0 | 3.632156 | 0.023088 |
| 2 | Technology | 7138 | 4350 | 2788 | 768 | 491 | 277 | 0.107593 | 34.86719 | 33.0 | 3.158677 | 0.025918 |
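The notebook loads this per-department summary from a pickle whose construction is not shown. As a hedged sketch, a table with the same columns could be derived from `df` with a named-aggregation `groupby`. The function name `summarize_departments` is hypothetical, and the assumptions (gender coded `'m'`/`'f'`, `is_promoted` coded 0/1, `percentage_promotions` stored as a fraction rather than a percentage) are inferred from the sample rows; the last one is consistent with the first row, since 1213 / 16840 ≈ 0.072031.

```python
import pandas as pd


def summarize_departments(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical reconstruction of a per-department summary table."""
    is_m = df["gender"] == "m"
    is_f = df["gender"] == "f"
    promoted = df["is_promoted"] == 1
    # 0/1 indicator columns so that every statistic becomes a simple sum or mean.
    flags = df.assign(
        male=is_m.astype(int),
        female=is_f.astype(int),
        prom_male=(is_m & promoted).astype(int),
        prom_female=(is_f & promoted).astype(int),
    )
    summary = flags.groupby("department", sort=False).agg(
        total_people=("employee_id", "size"),
        male_staff=("male", "sum"),
        female_staff=("female", "sum"),
        total_promotions=("is_promoted", "sum"),
        promotions_male_staff=("prom_male", "sum"),
        promotions_female_staff=("prom_female", "sum"),
        percentage_promotions=("is_promoted", "mean"),  # fraction in [0, 1]
        mean_age=("age", "mean"),
        median_age=("age", "median"),
        mean_prev_year_rating=("previous_year_rating", "mean"),  # NaN ratings skipped
        mean_awards_won=("awards_won", "mean"),
    )
    return summary.reset_index()
```

`sort=False` keeps departments in order of first appearance; comparing its output against the pickle would show whether the stored table was built the same way.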


```python
# One trace per department so each bubble gets its own hover text.
# x: share of employees promoted, y: mean previous-year rating, size: headcount / 100.
traces = []
for department in df_department['department']:
    dep = df_department[df_department['department'] == department]
    traces.append(go.Scatter(
        x=dep['percentage_promotions'],
        y=dep['mean_prev_year_rating'],
        mode='markers',
        marker_size=dep['total_people'] / 100,
        hovertemplate=(
            '<b>{0}</b><br><br>'.format(department)
            # Bug fix: use this department's headcount (the original always looked up "HR")
            + '<b>Total people:</b> {0}<br>'.format(dep['total_people'].values[0])
            + '<b>Promotion:</b> {0}%<br>'.format(round(dep['percentage_promotions'].values[0] * 100, 2))
            + '<b>Prev. year rating:</b> %{y:.1f}<br>'
            + '<extra></extra>'  # hide the secondary hover box with the trace name
        ),
        showlegend=False,
        name='',
    ))

layout = go.Layout(title="Department composition",
                   xaxis_title="% of promotions",
                   yaxis_title="Prev. year rating")

fig = go.Figure(data=traces, layout=layout)
fig.show()
```
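With `marker_size = total_people / 100`, the marker diameter grows linearly with headcount, so Sales & Marketing (16,840 people) is drawn about 168 px wide. One option, following the bubble-chart recipe in the Plotly documentation, is to size markers by area and use `sizeref` to cap the largest bubble. The helper `area_sizeref` below is a hypothetical sketch of that calculation, and the 60 px cap is an arbitrary assumption.

```python
def area_sizeref(sizes, max_marker_px=60.0):
    """Return a Plotly marker.sizeref for sizemode="area".

    Follows the documented rule sizeref = 2 * max(size) / max_marker_px ** 2,
    so the largest value is drawn roughly max_marker_px pixels across.
    """
    largest = max(sizes)
    if largest <= 0:
        raise ValueError("sizes must contain at least one positive value")
    return 2.0 * largest / max_marker_px ** 2
```

Each `go.Scatter` could then receive `marker=dict(size=dep['total_people'], sizemode='area', sizeref=ref)`, with `ref = area_sizeref(df_department['total_people'])` computed once over all departments so that every trace shares the same scale and bubble areas, rather than diameters, are proportional to headcount.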


# How does each department hire?


```python
departments = list(df['department'].unique())
recruitment_channels = list(df['recruitment_channel'].unique())
```

```python
departments
```

```
['Sales & Marketing',
 'Operations',
 'Technology',
 'Analytics',
 'R&D',
 'Procurement',
 'Finance',
 'HR',
 'Legal']
```

We split the departments into large and small ones so that the visualization is clearer.

```python
departments_large = ["Sales & Marketing", "Operations", "Technology", "Procurement"]
departments_small = [dep for dep in departments if dep not in departments_large]
```
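The large/small grouping here is chosen by hand. If the split should instead follow headcount, a sketch like the following could derive both lists from `df`; the function `split_by_headcount` and any threshold passed to it are hypothetical, not values taken from the analysis.

```python
import pandas as pd


def split_by_headcount(df: pd.DataFrame, threshold: int):
    """Split departments into (large, small) lists by number of employees.

    `threshold` is a hypothetical cut-off: departments with at least that
    many rows in `df` are considered large.
    """
    counts = df["department"].value_counts()  # sorted by headcount, descending
    large = counts[counts >= threshold].index.tolist()
    small = counts[counts < threshold].index.tolist()
    return large, small
```

For example, `departments_large, departments_small = split_by_headcount(df, threshold=5000)` would replace the hard-coded list, with 5000 being only an illustrative value.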

```python
def generateIcicleDepartmentAndRecruitment(departments_to_generate, title):
    """Icicle chart of root -> department -> recruitment channel, sized by headcount.

    Relies on the global `df` and `recruitment_channels` defined above.
    """
    labels = ["Departments"]
    parents = [""]
    ids = ["Departments"]
    values = [0]  # root placeholder; set to the total once all departments are counted
    tot_value = 0
    for department in departments_to_generate:
        df_dep = df[df['department'] == department]
        dep_count = len(df_dep)

        parents.append("Departments")
        labels.append(department)
        ids.append(department)
        values.append(dep_count)
        tot_value += dep_count

        for recruitment_channel in recruitment_channels:
            labels.append(recruitment_channel)
            parents.append(department)
            # ids must be unique, so each channel is prefixed with its department
            ids.append(department + " - " + recruitment_channel)
            values.append((df_dep['recruitment_channel'] == recruitment_channel).sum())

    values[0] = tot_value

    fig = go.Figure(go.Icicle(
        ids=ids,
        labels=labels,
        parents=parents,
        values=values,
        branchvalues="total",  # a parent's value is the total of its children
        root_color="lightgrey",
    ))
    fig.update_layout(margin=dict(t=50, l=25, r=25, b=25), title=title)
    fig.show()
```
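The function above filters `df` once per department and once more per channel. As a hedged alternative sketch, the same `ids`/`labels`/`parents`/`values` lists can be built from two `groupby` counts. `icicle_hierarchy` is a hypothetical name, and unlike the original it omits channels with zero employees in a department instead of adding them with value 0.

```python
import pandas as pd


def icicle_hierarchy(df: pd.DataFrame, departments, root="Departments"):
    """Build (ids, labels, parents, values) for root -> department -> channel."""
    sub = df[df["department"].isin(departments)]
    dep_sizes = sub.groupby("department").size()
    pair_sizes = sub.groupby(["department", "recruitment_channel"]).size()

    ids, labels, parents, values = [root], [root], [""], [int(len(sub))]
    for dep in departments:
        ids.append(dep)
        labels.append(dep)
        parents.append(root)
        values.append(int(dep_sizes.get(dep, 0)))
    # One leaf per (department, channel) pair that actually occurs in the data.
    for (dep, channel), n in pair_sizes.items():
        ids.append(f"{dep} - {channel}")
        labels.append(channel)
        parents.append(dep)
        values.append(int(n))
    return ids, labels, parents, values
```

The four lists can be passed straight to `go.Icicle(ids=..., labels=..., parents=..., values=..., branchvalues="total")`. With `branchvalues="total"`, every parent's value must be at least the sum of its children, which holds here because each level is counted from the same rows.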

```python
generateIcicleDepartmentAndRecruitment(departments_large, "Larger company departments")
generateIcicleDepartmentAndRecruitment(departments_small, "Smaller company departments")
```