### Predict employee attrition in an organization from the features and dataset given below, using a classification model based on the KNN algorithm

Features:
- Age: Age of the employee (numerical).
- JobRole: The job role/position of the employee (categorical).
- MonthlyIncome: Employee's monthly salary (numerical).
- JobSatisfaction: A rating from 1 to 4 indicating the employee's satisfaction with the job (numerical).
- YearsAtCompany: Number of years the employee has been at the company (numerical).
- Attrition: Target label indicating whether the employee left the company (1 for attrition, 0 for no attrition).

```csv
Age,JobRole,MonthlyIncome,JobSatisfaction,YearsAtCompany,Attrition
29,Sales Executive,4800,3,4,1
35,Research Scientist,6000,4,8,0
40,Laboratory Technician,3400,2,6,0
28,Sales Executive,4300,3,3,1
45,Manager,11000,4,15,0
25,Research Scientist,3500,1,2,1
50,Manager,12000,4,20,0
30,Sales Executive,5000,2,5,0
37,Laboratory Technician,3100,2,9,0
26,Research Scientist,4500,3,2,1
```

In [None]:
# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.preprocessing import StandardScaler



### Load the data

In [None]:
df = pd.read_csv('dataset/EmployeeAtrition.csv')
df

Unnamed: 0,Age,JobRole,MonthlyIncome,JobSatisfaction,YearsAtCompany,Attrition
0,29,Sales Executive,4800,3,4,1
1,35,Research Scientist,6000,4,8,0
2,40,Laboratory Technician,3400,2,6,0
3,28,Sales Executive,4300,3,3,1
4,45,Manager,11000,4,15,0
5,25,Research Scientist,3500,1,2,1
6,50,Manager,12000,4,20,0
7,30,Sales Executive,5000,2,5,0
8,37,Laboratory Technician,3100,2,9,0
9,26,Research Scientist,4500,3,2,1
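
Hand-edited CSV files sometimes carry a space after a comma (for example `, Research Scientist`), and pandas then treats the padded value as a separate category, which would add spurious one-hot columns later. A minimal sketch, assuming a hypothetical in-memory copy of a few rows with two padded `JobRole` values (not the file loaded above), shows how `skipinitialspace=True` guards against this:

```python
import io

import pandas as pd

# Hypothetical in-memory CSV: the last two rows have a space after the comma.
csv_text = """Age,JobRole,MonthlyIncome,JobSatisfaction,YearsAtCompany,Attrition
29,Sales Executive,4800,3,4,1
35,Research Scientist,6000,4,8,0
40,Laboratory Technician,3400,2,6,0
45,Manager,11000,4,15,0
37, Laboratory Technician,3100,2,9,0
26, Research Scientist,4500,3,2,1
"""

# Default parsing keeps the leading space, so padded roles count as new categories.
raw = pd.read_csv(io.StringIO(csv_text))
print("Distinct roles (default parsing):", raw["JobRole"].nunique())

# skipinitialspace=True drops spaces that directly follow the delimiter.
clean = pd.read_csv(io.StringIO(csv_text), skipinitialspace=True)
print("Distinct roles (skipinitialspace):", clean["JobRole"].nunique())
```

An alternative with the same effect on an already loaded frame is `df['JobRole'] = df['JobRole'].str.strip()`.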


In [None]:
X = df[['Age', 'MonthlyIncome','JobSatisfaction','YearsAtCompany']]
# One-hot encode the 'JobRole' categorical variable
X_cat = pd.get_dummies(df['JobRole'], prefix='JobRole', drop_first=True)

X = pd.concat([X, X_cat], axis=1)
print(X)
Y = df[['Attrition']]
Y

   Age  MonthlyIncome  JobSatisfaction  YearsAtCompany  JobRole_Manager  \
0   29           4800                3               4            False   
1   35           6000                4               8            False   
2   40           3400                2               6            False   
3   28           4300                3               3            False   
4   45          11000                4              15             True   
5   25           3500                1               2            False   
6   50          12000                4              20             True   
7   30           5000                2               5            False   
8   37           3100                2               9            False   
9   26           4500                3               2            False   

   JobRole_Research Scientist  JobRole_Sales Executive  
0                       False                     True  
1                        True                    False  
2  

Unnamed: 0,Attrition
0,1
1,0
2,0
3,1
4,0
5,1
6,0
7,0
8,0
9,1
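
As a sketch of what `drop_first=True` does in the cell above: `pd.get_dummies` sorts the categories and drops the first one, so `Laboratory Technician` becomes the implicit baseline, encoded as zeros in all three indicator columns. The `dtype=int` argument below is an optional addition (not used in the original cell) that yields 0/1 columns instead of booleans.

```python
import pandas as pd

# JobRole values in the same order as the dataset rows.
job_roles = pd.Series(
    [
        "Sales Executive", "Research Scientist", "Laboratory Technician",
        "Sales Executive", "Manager", "Research Scientist", "Manager",
        "Sales Executive", "Laboratory Technician", "Research Scientist",
    ],
    name="JobRole",
)

# drop_first=True removes the alphabetically first category from the output.
dummies = pd.get_dummies(job_roles, prefix="JobRole", drop_first=True, dtype=int)
print(list(dummies.columns))

# Rows belonging to the dropped baseline have zeros in every indicator column.
baseline_rows = dummies[job_roles == "Laboratory Technician"]
print(baseline_rows)
```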


In [None]:
X_train, X_test, y_train, y_test = train_test_split(X, Y, test_size=0.2, random_state=42)
print(X_train.shape, X_test.shape, y_train.shape, y_test.shape)
print(X_train)

print(X_test)

   Age  MonthlyIncome  JobSatisfaction  YearsAtCompany  JobRole_Manager  \
5   25           3500                1               2            False   
0   29           4800                3               4            False   
7   30           5000                2               5            False   
2   40           3400                2               6            False   
9   26           4500                3               2            False   
4   45          11000                4              15             True   
3   28           4300                3               3            False   
6   50          12000                4              20             True   

   JobRole_Research Scientist  JobRole_Sales Executive  
5                        True                    False  
0                       False                     True  
7                       False                     True  
2                       False                    False  
9                        True          
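
With only ten rows, a plain random split can leave one class out of the two-row test set entirely. One possible remedy, shown here as a sketch rather than as the notebook's method, is the `stratify` argument of `train_test_split`, which keeps the class proportions roughly equal in both parts. For brevity the example uses only two of the numeric columns.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Two numeric features and the target, copied from the dataset above.
X = pd.DataFrame(
    {
        "Age": [29, 35, 40, 28, 45, 25, 50, 30, 37, 26],
        "MonthlyIncome": [4800, 6000, 3400, 4300, 11000, 3500, 12000, 5000, 3100, 4500],
    }
)
y = pd.Series([1, 0, 0, 1, 0, 1, 0, 0, 0, 1], name="Attrition")

# stratify=y asks the splitter to preserve the 6:4 class ratio as closely as it can.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

print("Train class counts:", y_train.value_counts().to_dict())
print("Test class counts:", y_test.value_counts().to_dict())
```

Even with stratification, two test rows are too few for a meaningful accuracy estimate, so the result should be read as a smoke test only.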

### Scaling the features

In [None]:
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
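
KNN classifies by distance, so a feature with a large numeric range can dominate the result. The sketch below, which uses the numeric columns of the first four dataset rows and fits the scaler on all of them purely for illustration, compares the Euclidean distance between two employees before and after standardization:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Columns: Age, MonthlyIncome, JobSatisfaction, YearsAtCompany (dataset rows 0 to 3).
X_num = np.array(
    [
        [29, 4800, 3, 4],
        [35, 6000, 4, 8],
        [40, 3400, 2, 6],
        [28, 4300, 3, 3],
    ],
    dtype=float,
)

# Unscaled: the distance is almost identical to the income difference alone.
raw_dist = np.linalg.norm(X_num[0] - X_num[1])
income_gap = abs(X_num[0, 1] - X_num[1, 1])
print("Raw distance:", raw_dist, "| income gap:", income_gap)

# Standardized: every column has mean 0 and unit variance, so each contributes comparably.
X_scaled = StandardScaler().fit_transform(X_num)
scaled_dist = np.linalg.norm(X_scaled[0] - X_scaled[1])
print("Scaled distance:", scaled_dist)
```

In the actual workflow the scaler is fit on `X_train` only and then applied to `X_test`, as in the cell above, so that no information from the test rows leaks into the scaling parameters.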

### KNN Algorithm

In [None]:
from sklearn.metrics import accuracy_score, classification_report
# Train KNN
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train_scaled, y_train.values.ravel())  # y_train is a one-column DataFrame; flatten it to the 1-D shape sklearn expects

# Predict & evaluate
y_pred = knn.predict(X_test_scaled)
print('Accuracy:', accuracy_score(y_test, y_pred))
print(classification_report(y_test, y_pred))

# test with sample data
sample_data = np.array([[35, 6000, 4, 5, 0, 0, 0, 0, 0, 1, 0

Accuracy: 0.5
              precision    recall  f1-score   support

           0       1.00      0.50      0.67         2
           1       0.00      0.00      0.00         0

    accuracy                           0.50         2
   macro avg       0.50      0.25      0.33         2
weighted avg       1.00      0.50      0.67         2



  return self._fit(X, y)
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
  _warn_prf(average, modifier, f"{metric.capitalize()} is", result.shape[0])
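
The report above is based on two test rows, neither of which belongs to class 1, so precision and recall for that class are undefined. A possible alternative for such a small dataset, offered as a sketch and not as the notebook's own method, is leave-one-out cross-validation with the scaler and classifier wrapped in a pipeline. The sample employee at the end is hypothetical: the four numeric values echo the start of the sample row above, while the `Sales Executive` role is an assumption made for illustration.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, classification_report
from sklearn.model_selection import LeaveOneOut, cross_val_predict
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

df = pd.DataFrame(
    {
        "Age": [29, 35, 40, 28, 45, 25, 50, 30, 37, 26],
        "JobRole": [
            "Sales Executive", "Research Scientist", "Laboratory Technician",
            "Sales Executive", "Manager", "Research Scientist", "Manager",
            "Sales Executive", "Laboratory Technician", "Research Scientist",
        ],
        "MonthlyIncome": [4800, 6000, 3400, 4300, 11000, 3500, 12000, 5000, 3100, 4500],
        "JobSatisfaction": [3, 4, 2, 3, 4, 1, 4, 2, 2, 3],
        "YearsAtCompany": [4, 8, 6, 3, 15, 2, 20, 5, 9, 2],
        "Attrition": [1, 0, 0, 1, 0, 1, 0, 0, 0, 1],
    }
)
num_cols = ["Age", "MonthlyIncome", "JobSatisfaction", "YearsAtCompany"]

X = pd.concat(
    [df[num_cols], pd.get_dummies(df["JobRole"], prefix="JobRole", drop_first=True, dtype=int)],
    axis=1,
)
y = df["Attrition"]  # a 1-D Series, so no shape warning is raised

# The pipeline refits the scaler inside every fold, avoiding leakage from the held-out row.
model = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
y_loo = cross_val_predict(model, X, y, cv=LeaveOneOut())
print("Leave-one-out accuracy:", accuracy_score(y, y_loo))
print(classification_report(y, y_loo, zero_division=0))

# Hypothetical new employee; the JobRole value is an assumption for illustration.
sample = pd.DataFrame(
    [{"Age": 35, "MonthlyIncome": 6000, "JobSatisfaction": 4,
      "YearsAtCompany": 5, "JobRole": "Sales Executive"}]
)
sample_X = pd.concat(
    [sample[num_cols], pd.get_dummies(sample["JobRole"], prefix="JobRole", dtype=int)],
    axis=1,
).reindex(columns=X.columns, fill_value=0)  # align with the training columns

model.fit(X, y)
print("Predicted attrition:", model.predict(sample_X))
```

Reindexing the sample against `X.columns` guarantees that it has the same seven features, in the same order, as the data the model was trained on.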
