**OBJECTIVES:**

Our aim is to predict the reason for leaving a job (WHYLEFTN) using a Random Forest Classifier model.

In [1]:
import seaborn as sns
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report

# Load the dataset
df = pd.read_csv('/content/sasdata.csv')
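
Before modelling, it can help to check how often the target itself is missing and how the observed classes are distributed, since both affect how the accuracy should be read later. The sketch below uses a small made-up frame rather than the real file; only the `WHYLEFTN` column name comes from this notebook, and `AGE_GROUP` and all values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the real data; only the WHYLEFTN name is from the notebook
demo = pd.DataFrame({
    "WHYLEFTN": [9.0, 9.0, 9.0, np.nan, 5.0, 9.0, np.nan, 7.0],
    "AGE_GROUP": [3, 4, np.nan, 2, 5, 3, 4, 1],  # hypothetical feature column
})

# Share of rows where the target itself is missing
missing_target_share = demo["WHYLEFTN"].isna().mean()

# Relative frequency of each observed class (NaN is excluded by default)
class_shares = demo["WHYLEFTN"].value_counts(normalize=True)

print(f"Missing target share: {missing_target_share:.2f}")
print(class_shares)
```

Running the same two checks on `df` would show whether one class dominates and how many labels would be filled in by imputation.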








In [2]:
# Define the target variable
target = 'WHYLEFTN'

# Select features (excluding the target and irrelevant columns)
features = [col for col in df.columns if col != target and col not in ['Unnamed: 0', 'REC_NUM', 'SURVYEAR', 'SURVMNTH']]

# Impute missing values first instead of dropping every row that has one.
# The 'most_frequent' strategy works for both categorical and numeric columns.
imputer = SimpleImputer(strategy='most_frequent')
# Note: the target is included here, so missing WHYLEFTN values are also
# filled with the most frequent class.
df_imputed = pd.DataFrame(imputer.fit_transform(df[[target] + features]), columns=[target] + features)

# Encode categorical features in the imputed DataFrame
for feature in features:
    if df_imputed[feature].dtype == 'object':  # Only encode object (string) columns
        # Fit a fresh encoder per column so each column has its own mapping
        df_imputed[feature] = LabelEncoder().fit_transform(df_imputed[feature])

# Split data into training and testing sets
X = df_imputed[features]
y = df_imputed[target]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Scaling is left disabled: tree-based models such as random forests
# do not depend on feature scale.
# scaler = StandardScaler()
# X_train = scaler.fit_transform(X_train)
# X_test = scaler.transform(X_test)

# Train the RandomForestClassifier model
model = RandomForestClassifier(random_state=42)
model.fit(X_train, y_train)

# Make predictions
y_pred = model.predict(X_test)

# Evaluate the model
accuracy = accuracy_score(y_test, y_pred)
report = classification_report(y_test, y_pred)

print(f"Accuracy: {accuracy}")
print(report)

Accuracy: 0.9832132184918263
              precision    recall  f1-score   support

         0.0       0.83      0.65      0.73        54
         1.0       0.77      0.94      0.85        69
         2.0       0.55      0.55      0.55        11
         3.0       0.80      0.57      0.67        14
         4.0       0.90      0.46      0.61        41
         5.0       0.95      1.00      0.98       313
         6.0       0.73      0.76      0.74        70
         7.0       1.00      1.00      1.00       199
         8.0       0.83      0.50      0.62        10
         9.0       0.99      1.00      1.00     21433
        10.0       0.65      0.65      0.65       233
        11.0       0.60      0.12      0.19        26
        12.0       0.64      0.57      0.60       203
        13.0       0.55      0.21      0.31        80

    accuracy                           0.98     22756
   macro avg       0.77      0.64      0.68     22756
weighted avg       0.98      0.98      0.98     22756
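
To put the 0.98 accuracy in context, it can be compared with what a model would score by always predicting the most common class. The sketch below takes the per-class support values from the report above and computes that majority-class baseline; the variable names are illustrative only.

```python
# Per-class support copied from the classification report above
support = {
    0.0: 54, 1.0: 69, 2.0: 11, 3.0: 14, 4.0: 41, 5.0: 313, 6.0: 70,
    7.0: 199, 8.0: 10, 9.0: 21433, 10.0: 233, 11.0: 26, 12.0: 203, 13.0: 80,
}

total = sum(support.values())

# The class with the largest support in the test set
majority_class = max(support, key=support.get)

# Accuracy obtained by predicting the majority class for every row
baseline_accuracy = support[majority_class] / total

print(f"Test rows: {total}")
print(f"Majority class: {majority_class}")
print(f"Majority-class baseline accuracy: {baseline_accuracy:.4f}")
```

Class 9.0 alone accounts for roughly 94% of the test rows, so the overall accuracy is driven mostly by that class. The macro-averaged recall of 0.64 in the report is a more informative summary of how well the smaller classes are predicted.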

**CONCLUSION**

For the Government:

Identify key drivers of job loss and market trends.
Target resources for job seekers and skills development.
Evaluate and adjust labor market policies.

For Citizens:

Personalized career guidance and early intervention for job seekers.
Improved job satisfaction and working conditions.
Better economic outlook and reduced unemployment.

Ultimately, the model can help the Canadian government create a more dynamic and responsive labor market that benefits both individuals and the economy as a whole.
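
One possible way to look for the "key drivers" mentioned above is to inspect the feature importances of a fitted random forest. The sketch below is an illustration on synthetic data from `make_classification`, not a result from this notebook; the feature names and the `demo_model` variable are hypothetical. With the notebook's own objects, the same idea would use `model.feature_importances_` together with `features`.

```python
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in data (assumption: not the survey data used above)
X_demo, y_demo = make_classification(
    n_samples=500, n_features=6, n_informative=3, random_state=42
)
feature_names = [f"feature_{i}" for i in range(X_demo.shape[1])]

# Fit a small forest on the synthetic data
demo_model = RandomForestClassifier(n_estimators=50, random_state=42)
demo_model.fit(X_demo, y_demo)

# Impurity-based importances, sorted from most to least important
importances = pd.Series(
    demo_model.feature_importances_, index=feature_names
).sort_values(ascending=False)

print(importances)
```

Impurity-based importances describe what the model relied on, not causal effects, so they are best read as a starting point for further analysis.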