# 2023: Week 32 - HR Month - Reshaping Generations

### Inputs
1. The list of generations, with start and end date (generations.csv)
2. The employee dimension table from last week (ee_dim_v2.csv)
3. The DC/month assignments from last week (ee_monthly_v2.csv)

### Requirements
- Input the data
- Add a new column, generation_name, that combines the generation name with its start/end years, for example “Generation X (1965-1980)”
  - If the generation doesn’t have a start year, the text should be “(born in or before XXXX)”
  - If the generation doesn’t have an end year, the text should be “(born in or after XXXX)”
- Calculate each employee’s birth year from date_of_birth
- Join the employee data to the generation data to get the generation name for each employee
  - If the employee’s birth date is missing, generation_name should be “Not provided”
- Join the monthly data to the employee data on employee_id
- Calculate the employee’s age (in full years) as of month_end_date
- Calculate the employee’s age range in 5-year increments, and name that column age_range:
  - Employees under 20 should be grouped into “Under 20 years”
  - Employees aged 20 to 69 should be grouped into 5-year increments (“20-24 years”, “25-29 years”, etc.)
  - Employees aged 70 and over should be grouped into “70+ years”
  - If the employee’s birth date is missing, age_range should be “Not provided”
- Output the data (two datasets):
  - For the employee data, keep only the original columns plus the new generation_name column, and remove the date_of_birth and birth year columns
  - For the monthly data, keep only the original columns plus the new age_range column
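The two age steps can be sketched as follows. This is a minimal illustration, not the author's code: full years are the difference in calendar years, minus one when the birthday has not yet come round by the as-of date, and the bands follow the rules listed above. The function names `age_in_full_years` and `age_range_label` and the sample dates are hypothetical.

```python
import pandas as pd


def age_in_full_years(dob: pd.Series, as_of: pd.Series) -> pd.Series:
    """Whole years between dob and as_of; NaN where dob is missing (NaT)."""
    years = as_of.dt.year - dob.dt.year
    # True where the birthday has not yet occurred in the as_of year
    before_birthday = (as_of.dt.month < dob.dt.month) | (
        (as_of.dt.month == dob.dt.month) & (as_of.dt.day < dob.dt.day)
    )
    return years - before_birthday.astype(int)


def age_range_label(age: float) -> str:
    """Map an age to the requirement's bands."""
    if pd.isna(age):
        return 'Not provided'
    if age < 20:
        return 'Under 20 years'
    if age >= 70:
        return '70+ years'
    lower = int(age // 5 * 5)  # e.g. 24 -> 20, 25 -> 25
    return f'{lower}-{lower + 4} years'


# Hypothetical sample: one birthday already passed, one not yet, one missing
dob = pd.to_datetime(pd.Series(['1999-05-17', '1946-10-26', None]))
month_end = pd.to_datetime(pd.Series(['2023-05-31', '2023-01-31', '2023-01-31']))
ages = age_in_full_years(dob, month_end)
labels = ages.apply(age_range_label)
```

Missing birth dates propagate as NaN through the subtraction, so the "Not provided" rule is handled in one place, inside the labelling function.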

# Input and clean the data

```python
import os
import pandas as pd
import numpy as np
import datetime as dt

# Load the three inputs
dim_df = pd.read_csv('ee_dim_v2.csv')
monthly_df = pd.read_csv('ee_monthly_v2.csv')
gen_df = pd.read_csv('generations.csv')
```

```python
# Normalise headers: lowercase, trim whitespace, spaces to underscores
dim_df.columns = dim_df.columns.str.lower().str.strip().str.replace(' ', '_')
monthly_df.columns = monthly_df.columns.str.lower().str.strip().str.replace(' ', '_')
gen_df.columns = gen_df.columns.str.lower().str.strip().str.replace(' ', '_')
```

```python
# Parse the employee dimension dates (day/month/year)
dim_df.date_of_birth = pd.to_datetime(dim_df.date_of_birth, format='%d/%m/%Y')
dim_df.hire_date = pd.to_datetime(dim_df.hire_date, format='%d/%m/%Y')
dim_df.leave_date = pd.to_datetime(dim_df.leave_date, format='%d/%m/%Y')
```

```python
# Parse the monthly assignment dates (day/month/year)
monthly_df.month_end_date = pd.to_datetime(monthly_df.month_end_date, format='%d/%m/%Y')
monthly_df.hire_date = pd.to_datetime(monthly_df.hire_date, format='%d/%m/%Y')
monthly_df.leave_date = pd.to_datetime(monthly_df.leave_date, format='%d/%m/%Y')
```

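### Add generation name

```python
# Birth year; missing birth dates become NaN, so this column is float
dim_df['birth_year'] = dim_df.date_of_birth.dt.year

# Use 0 as a sentinel for a missing start/end year (open-ended generations)
gen_df.start_year = gen_df.start_year.fillna(0)
gen_df.end_year = gen_df.end_year.fillna(0)
```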
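The birth_year values in the output table appear as floats (1999.0) because `.dt.year` returns NaN for a missing date, and NaN forces a float dtype. A small sketch on hypothetical dates shows this, along with pandas' nullable `Int64` dtype as one option for keeping whole-number years with a missing marker.

```python
import pandas as pd

# Hypothetical sample: one known and one missing birth date
dob = pd.to_datetime(pd.Series(['1999-05-17', None]))

as_float = dob.dt.year                # float64: 1999.0 and NaN
as_int = dob.dt.year.astype('Int64')  # nullable integer: 1999 and <NA>
```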

```python
# Build labels such as "Silent Generation (1928 - 1945)"
start_str = gen_df.start_year.astype('int').astype('str')
end_str = gen_df.end_year.astype('int').astype('str')
gen_df['generation_name'] = gen_df.generation + ' (' + start_str + ' - ' + end_str + ')'
# Open-ended generations (missing year filled with 0 above)
gen_df['generation_name'] = np.where(gen_df.start_year == 0, gen_df.generation + ' (born in or before ' + end_str + ')', gen_df.generation_name)
gen_df['generation_name'] = np.where(gen_df.end_year == 0, gen_df.generation + ' (born in or after ' + start_str + ')', gen_df.generation_name)
```
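The cell above spaces the years as `' - '`, while the requirement's example is “Generation X (1965-1980)”. A minimal sketch of the exact requirement format follows. It works on the raw NaN values rather than the 0 sentinel, and the sample rows are hypothetical, not taken from generations.csv.

```python
import pandas as pd

# Hypothetical sample rows; the real generations.csv may differ
gen = pd.DataFrame({
    'generation': ['Greatest Generation', 'Generation X', 'Generation Alpha'],
    'start_year': [None, 1965, 2013],
    'end_year': [1927, 1980, None],
})


def format_generation(row) -> str:
    """Label a generation using the requirement's wording."""
    start, end = row.start_year, row.end_year
    if pd.isna(start):
        return f'{row.generation} (born in or before {int(end)})'
    if pd.isna(end):
        return f'{row.generation} (born in or after {int(start)})'
    return f'{row.generation} ({int(start)}-{int(end)})'


gen['generation_name'] = gen.apply(format_generation, axis=1)
```

Checking for NaN directly avoids a sentinel that could, in principle, collide with a real value.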
    

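```python
# Look up the generation name for a birth year
def get_generation(year):
    """Return the generation_name for a birth year, or 'Not provided' if it is missing."""
    if pd.isna(year):
        return 'Not provided'
    # No start year (filled with 0): born in or before that generation's end year
    open_start = gen_df[gen_df.start_year == 0]
    if not open_start.empty and year <= open_start.end_year.iloc[0]:
        return open_start.generation_name.iloc[0]
    # No end year (filled with 0): born in or after that generation's start year
    open_end = gen_df[gen_df.end_year == 0]
    if not open_end.empty and year >= open_end.start_year.iloc[0]:
        return open_end.generation_name.iloc[0]
    # Bounded generations: start_year <= year <= end_year
    match = gen_df[(gen_df.start_year <= year) & (gen_df.end_year >= year)]
    return match.generation_name.iloc[0] if not match.empty else 'Not provided'

# Apply the lookup to every employee
dim_df['generation_name'] = dim_df.birth_year.apply(get_generation)
```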
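The requirements phrase this step as a join, whereas the cell above looks up each year one row at a time. One join-based alternative is sketched here with `pd.merge_asof`, which matches each birth year to the latest generation start year at or before it. It assumes generations do not overlap. `merge_asof` does not accept missing key values, so missing birth years are set aside first and the open start year is filled with `-inf`. The frames and the "Gen A/B/C" labels are hypothetical stand-ins for dim_df and gen_df.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for dim_df and gen_df
employees = pd.DataFrame({'employee_id': ['E1', 'E2', 'E3', 'E4'],
                          'birth_year': [1999, 1928, None, 1920]})
generations = pd.DataFrame({
    'generation_name': ['Gen A (born in or before 1927)', 'Gen B (1928-1945)',
                        'Gen C (born in or after 1946)'],
    'start_year': [None, 1928, 1946],
    'end_year': [1927, 1945, None],
})

# merge_asof needs sorted, non-null keys on both sides
known = employees.dropna(subset=['birth_year']).sort_values('birth_year')
lookup = (generations.assign(start_year=generations.start_year.fillna(-np.inf))
          .sort_values('start_year'))
merged = pd.merge_asof(known, lookup, left_on='birth_year',
                       right_on='start_year', direction='backward')

# Guard against gaps: drop matches past that generation's end year
too_late = merged.end_year.notna() & (merged.birth_year > merged.end_year)
merged.loc[too_late, 'generation_name'] = np.nan

# Bring the names back onto every employee, in the original order
result = employees.merge(merged[['employee_id', 'generation_name']],
                         on='employee_id', how='left')
result['generation_name'] = result.generation_name.fillna('Not provided')
```

Employee E2 (born 1928) lands in the generation that starts in 1928, because `direction='backward'` allows exact matches by default.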

```python
dim_df
```

| | employee_id | guid | first_name | last_name | date_of_birth | nationality | gender | email | hire_date | leave_date | birth_year | generation_name |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | E00001 | 44eca9c9-0081-4d69-906e-73285b7f6dd2 | Terry | Cooper | 1999-05-17 | US | male | terry.cooper@example.com | 2022-09-07 | 2022-08-30 | 1999.0 | Generation Z (1997 - 2012) |
| 1 | E00003 | 16d3207c-8a95-4abb-9d6b-46215ab73d83 | William | Phillips | 1998-08-14 | GB | male | william.phillips@example.com | 2023-03-20 | 2023-06-20 | 1998.0 | Generation Z (1997 - 2012) |
| 2 | E00006 | afededdb-6441-423b-9917-872e22597dd5 | Patsy | Davis | 2000-11-06 | US | female | patsy.davis@example.com | 2022-09-04 | 2023-06-10 | 2000.0 | Generation Z (1997 - 2012) |
| 3 | E00007 | a55792b2-b0ac-464e-bcd9-68865662156b | Florence | Hart | 2000-12-03 | GB | female | florence.hart@example.com | 2022-04-22 | NaT | 2000.0 | Generation Z (1997 - 2012) |
| 4 | E00009 | 3af26094-7a1a-4bd9-ad56-a158670fb732 | Jayden | Wells | 2000-10-16 | US | male | jayden.wells@example.com | 2023-03-17 | NaT | 2000.0 | Generation Z (1997 - 2012) |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 532 | E01511 | 59ce341d-ca3e-43a3-aead-76f137c968f3 | Stanley | Montgomery | 1956-06-16 | GB | male | stanley.montgomery@example.com | 2022-03-27 | 2023-01-21 | 1956.0 | Baby Boomers (1946 - 1964) |
| 533 | E01514 | 8a3ad32b-ad80-4b33-b246-4396d9eeffb0 | Melvin | Williams | 1947-12-28 | GB | male | melvin.williams@example.com | 2023-03-03 | NaT | 1947.0 | Baby Boomers (1946 - 1964) |
| 534 | E01522 | fa3d652f-b03e-40c6-a43a-cf8cc16e5867 | Heather | Morgan | 1946-10-26 | GB | female | heather.morgan@example.com | 2022-12-03 | 2023-01-16 | 1946.0 | Baby Boomers (1946 - 1964) |
| 535 | E01523 | 22e7b134-44d5-480f-81de-77acd5a9a4d5 | Darrell | Little | 1946-12-10 | GB | male | darrell.little@example.com | 2019-02-28 | 2020-06-18 | 1946.0 | Baby Boomers (1946 - 1964) |
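The remaining requirements join the monthly rows to the employee table and shape the two outputs. Below is a minimal sketch on hypothetical miniature frames, where `dim` and `monthly` stand in for dim_df and monthly_df. Only employee_id and date_of_birth are brought across, because both real tables also carry hire_date and leave_date, and a full merge would suffix those as `_x`/`_y`. `validate='many_to_one'` makes pandas raise if an employee_id repeats in the dimension table, and `indicator=True` flags monthly rows with no matching employee.

```python
import pandas as pd

# Hypothetical miniature stand-ins for dim_df and monthly_df
dim = pd.DataFrame({
    'employee_id': ['E1', 'E2'],
    'first_name': ['Terry', 'Patsy'],
    'date_of_birth': pd.to_datetime(['1999-05-17', None]),
    'birth_year': [1999.0, float('nan')],
    'generation_name': ['Generation Z (1997 - 2012)', 'Not provided'],
})
monthly = pd.DataFrame({
    'employee_id': ['E1', 'E1', 'E2', 'E9'],
    'month_end_date': pd.to_datetime(
        ['2023-05-31', '2023-06-30', '2023-06-30', '2023-06-30']),
})

# Employee output: original columns plus generation_name, minus the helpers
ee_out = dim.drop(columns=['date_of_birth', 'birth_year'])

# Left join keeps every monthly row; validate checks dim has one row per employee
joined = monthly.merge(
    dim[['employee_id', 'date_of_birth']],
    on='employee_id', how='left', validate='many_to_one', indicator=True,
)
unmatched = joined.loc[joined['_merge'] == 'left_only', 'employee_id'].tolist()
joined = joined.drop(columns='_merge')
```

From `joined`, the age and age_range columns can be derived from date_of_birth and month_end_date. The monthly output then keeps the original monthly columns plus age_range.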
