# Mentorship Rewards Optimization

**Context**
The Rewards Program team is preparing a major evaluation of how effectively rewards drive user engagement and satisfaction. They need insights to optimize reward offerings, identify key user segments, and understand how each reward type affects user behavior. Accurate data analysis is essential for making data-driven decisions about reward offerings and tailoring them to different user segments.

In [64]:
# Import the required libraries
import pandas as pd
import numpy as np

In [99]:
# Load the data set
path = 'C:\\Users\\ADMIN\\MentorshipRewards_Analysis\\MentorshipRewards_analysis\\assets\\Mentorship_Sessions.xlsx'
df = pd.read_excel(path, sheet_name = 'Mentorship_Sessions')

# Explore the data set
print(df)
df.dtypes

     Unnamed: 0  Mentor_ID     Mentor_Name    Mentee_Name  Session_Number  \
0        2001.0     1003.0     Sarah Clark      Bob Brown             1.0   
1        2002.0     1003.0     Emily Davis    Carol White             2.0   
2        2003.0     1005.0    James Wilson     Jane Smith             2.0   
3           4.0     1005.0  David Thompson            NaN             2.0   
4        2001.0     1004.0     Emily Davis      Bob Brown             1.0   
..          ...        ...             ...            ...             ...   
104      2002.0     1003.0     Michael Lee    Carol White             2.0   
105        54.0     1003.0     Michael Lee  Alice Johnson             2.0   
106      2003.0     1005.0     Michael Lee     Jane Smith             1.0   
107        46.0     1005.0     Michael Lee    Carol White             2.0   
108        45.0     1004.0     Michael Lee     Jane Smith             1.0   

     Session_Duration_Min Job_Info_Completed Session_Date  Points_Awarded  

Unnamed: 0              float64
Mentor_ID               float64
Mentor_Name              object
Mentee_Name              object
Session_Number          float64
Session_Duration_Min    float64
Job_Info_Completed       object
Session_Date             object
Points_Awarded          float64
dtype: object
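
The preview above hints at two data-quality questions worth checking before cleaning: some values in the first column are much smaller than the rest (4, 54, 46, 45 versus 2001 to 2003), and the same `Mentor_ID` appears next to several mentor names. The sketch below shows one way to surface both on a few toy rows copied from the preview. The four-digit threshold is an assumption, not a documented rule of the data set.

```python
import pandas as pd

# Toy rows modelled on the preview above (not the real file)
sample = pd.DataFrame({
    "Unnamed: 0": [2001.0, 2002.0, 2003.0, 4.0, 54.0],
    "Mentor_ID": [1003.0, 1003.0, 1005.0, 1005.0, 1003.0],
    "Mentor_Name": ["Sarah Clark", "Emily Davis", "James Wilson",
                    "David Thompson", "Michael Lee"],
})

# Assumption: valid mentee IDs are four-digit numbers such as 2001
suspect_ids = sample[sample["Unnamed: 0"] < 1000]

# Count how many distinct names share each Mentor_ID
names_per_mentor = sample.groupby("Mentor_ID")["Mentor_Name"].nunique()
conflicting = names_per_mentor[names_per_mentor > 1]

print(suspect_ids)
print(conflicting)
```

Whether these rows are errors or legitimate records is a question for the data owners; the check only flags them for review.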

# Task 1. Data Cleaning

In [100]:
###--- TASK 1: DATA CLEANING ---###
# 1. Rename Unnamed Columns
df.rename(columns={'Unnamed: 0': 'Mentee_ID'}, inplace=True)

# Verify changes
print(df.columns)

Index(['Mentee_ID', 'Mentor_ID', 'Mentor_Name', 'Mentee_Name',
       'Session_Number', 'Session_Duration_Min', 'Job_Info_Completed',
       'Session_Date', 'Points_Awarded'],
      dtype='object')


In [101]:
# 2. Handling Missing Values
# Check for missing values
MissingVals1 = df.isnull().sum()
print(MissingVals1)


Mentee_ID                 1
Mentor_ID                 1
Mentor_Name               0
Mentee_Name               2
Session_Number            1
Session_Duration_Min      2
Job_Info_Completed        1
Session_Date              1
Points_Awarded          109
dtype: int64


In [102]:
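# Handle the missing values by dropping or filling in rows/columns
# Note: dropna() returns a new DataFrame. The result is not assigned here, so df keeps
# these rows (the counts printed below are unchanged). Assign it back with
# df = df.dropna(subset=[...]) to actually remove them.
df.dropna(subset=['Mentee_ID','Mentor_ID', 'Mentee_Name', 'Job_Info_Completed', 'Session_Date'])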

# Fill NaN values in Session Number and duration with mean
Session_Number_Mean = df['Session_Number'].mean()
Session_Duration_Mean = df['Session_Duration_Min'].mean()
df['Session_Number'] = df['Session_Number'].fillna(Session_Number_Mean)
df['Session_Duration_Min'] = df['Session_Duration_Min'].fillna(Session_Duration_Mean)
df['Points_Awarded'] = df['Points_Awarded'].fillna(0)  # Replace with starting value of 0

# Verify Changes
MissingVals2 = df.isnull().sum()
print(MissingVals2)

Mentee_ID               1
Mentor_ID               1
Mentor_Name             0
Mentee_Name             2
Session_Number          0
Session_Duration_Min    0
Job_Info_Completed      1
Session_Date            1
Points_Awarded          0
dtype: int64
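
The counts above still show missing values in the ID, name, and date columns because `dropna` does not modify the frame in place. A minimal sketch of the assign-back pattern on toy data follows. The choice of median for durations and mode for session numbers is an illustrative alternative to the mean used in this notebook, not the notebook's method.

```python
import numpy as np
import pandas as pd

# Toy data with gaps in each column (values are made up)
toy = pd.DataFrame({
    "Mentee_ID": [2001.0, np.nan, 2003.0, 2002.0, 2001.0],
    "Session_Number": [1.0, 2.0, np.nan, 2.0, 2.0],
    "Session_Duration_Min": [40.0, 30.0, 45.0, np.nan, 60.0],
})

# dropna returns a new frame, so keep the result
cleaned = toy.dropna(subset=["Mentee_ID"]).copy()

# Median is less sensitive to outliers than the mean
duration_median = cleaned["Session_Duration_Min"].median()
cleaned["Session_Duration_Min"] = cleaned["Session_Duration_Min"].fillna(duration_median)

# Session numbers are counts, so fill with the most frequent value
session_mode = cleaned["Session_Number"].mode().iloc[0]
cleaned["Session_Number"] = cleaned["Session_Number"].fillna(session_mode)

print(cleaned.isnull().sum())
```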


In [91]:
# 3. Handling duplicates
duplicates = df.duplicated().sum()
print(f"Total duplicates: {duplicates}")

Total duplicates: 0
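
`df.duplicated()` with no arguments only flags rows that match in every column. In the preview, rows 0 and 4 both record session 1 for mentee 2001 (Bob Brown) but with different mentors, so a check on a subset of key columns may be informative. The sketch below uses toy data, and the choice of `Mentee_ID` plus `Session_Number` as the key is an assumption.

```python
import pandas as pd

# Toy rows: the first two share mentee and session number but differ in mentor
toy = pd.DataFrame({
    "Mentee_ID": ["2001", "2001", "2002"],
    "Mentor_ID": ["1003", "1004", "1003"],
    "Session_Number": [1, 1, 2],
})

# Full-row comparison finds nothing
full_row_dupes = int(toy.duplicated().sum())

# Comparing only the assumed key columns; keep=False marks every member of a group
key_dupes = toy[toy.duplicated(subset=["Mentee_ID", "Session_Number"], keep=False)]

print(f"Full-row duplicates: {full_row_dupes}")
print(key_dupes)
```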


In [106]:
# 4. Correct the data types
# Note: astype(str) turns missing values into the literal string 'nan' (see row 3 below)
df['Mentee_Name'] = df['Mentee_Name'].astype(str)
df['Mentor_Name'] = df['Mentor_Name'].astype(str)
# IDs were read as floats (e.g. 2001.0); keep only the part before the decimal point
df['Mentee_ID'] = df['Mentee_ID'].astype(str).str.split('.').str[0]
df['Mentor_ID'] = df['Mentor_ID'].astype(str).str.split('.').str[0]
df['Session_Date'] = pd.to_datetime(df['Session_Date'], format='%Y-%m-%d')
# astype(int) truncates, so mean-filled values lose their fractional part
df['Session_Number'] = df['Session_Number'].astype(int)
df['Session_Duration_Min'] = df['Session_Duration_Min'].astype(int)
df['Points_Awarded'] = df['Points_Awarded'].astype(int)


# Verify Changes
print(df.dtypes)
print(df.head())

Mentee_ID                       object
Mentor_ID                       object
Mentor_Name                     object
Mentee_Name                     object
Session_Number                   int32
Session_Duration_Min             int32
Job_Info_Completed              object
Session_Date            datetime64[ns]
Points_Awarded                   int32
dtype: object
  Mentee_ID Mentor_ID     Mentor_Name  Mentee_Name  Session_Number  \
0      2001      1003     Sarah Clark    Bob Brown               1   
1      2002      1003     Emily Davis  Carol White               2   
2      2003      1005    James Wilson   Jane Smith               2   
3         4      1005  David Thompson          nan               2   
4      2001      1004     Emily Davis    Bob Brown               1   

   Session_Duration_Min Job_Info_Completed Session_Date  Points_Awarded  
0                    40                Yes   2023-01-01               0  
1                    30                Yes   2023-01-08           
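
Row 3 in the output above shows `nan` as a mentee name: `astype(str)` converts missing values into ordinary text, so `isnull()` no longer detects them. One possible alternative, sketched here on toy data, is to use pandas' nullable `Int64` and `string` dtypes, which keep missing values as `<NA>`. This is an illustrative option rather than the approach taken in this notebook.

```python
import numpy as np
import pandas as pd

# Toy data with one missing ID and one missing name
toy = pd.DataFrame({
    "Mentee_ID": [2001.0, np.nan, 4.0],
    "Mentee_Name": ["Bob Brown", np.nan, "Jane Smith"],
})

# Plain str conversion hides the gap behind the text 'nan'
as_plain_str = toy["Mentee_ID"].astype(str)

# Nullable integer first (drops the '.0'), then nullable string
ids = toy["Mentee_ID"].astype("Int64").astype("string")
names = toy["Mentee_Name"].astype("string")

print(as_plain_str.tolist())
print(ids.isna().sum(), names.isna().sum())
```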

In [113]:
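# 5. Standardize the Job_Info_Completed variable
# Trim whitespace and normalize case so variants such as ' yes' or 'NO' become 'Yes'/'No'
df['Job_Info_Completed'] = df['Job_Info_Completed'].str.strip().str.capitalize()

# Verify changes
df['Job_Info_Completed']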


0      Yes
1      Yes
2      Yes
3      Yes
4       No
      ... 
104     No
105    Yes
106    Yes
107    Yes
108     No
Name: Job_Info_Completed, Length: 109, dtype: object
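
If the column contains spellings beyond `Yes` and `No`, an explicit lookup table makes the standardization auditable. The sketch below is a minimal example on made-up values; the `variants` table and the sample entries are hypothetical and not taken from the data set.

```python
import numpy as np
import pandas as pd

# Made-up raw values, including a gap and an unexpected entry
raw = pd.Series(["Yes", " yes", "NO", "y", "n", np.nan, "maybe"],
                name="Job_Info_Completed")

# Hypothetical lookup from lower-cased variants to canonical labels
variants = {"yes": "Yes", "y": "Yes", "no": "No", "n": "No"}

# Trim and lower-case, then map; anything not in the table becomes NaN
normalized = raw.str.strip().str.lower().map(variants)

# Values that were present but not recognized, for manual review
unrecognized = raw[raw.notna() & normalized.isna()]

print(normalized.value_counts(dropna=False))
print(unrecognized.tolist())
```

Listing the unrecognized values separately keeps genuine gaps distinct from entries that need a decision.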

In [114]:
# Save the cleaned data set
df.to_excel('Mentorship_Session_Cleaned.xlsx', index=False)