---
<h1 align='center'>Integration: Predict Absenteeism on New Data</h1>

This notebook demonstrates how to:

* Load a previously trained **logistic regression model** and custom **scaler**.
* Use the model to **predict excessive absenteeism** on a new dataset that has the same structure as the training data.
* Export results for integration with business intelligence tools like **Tableau**.

---
## 1. Import Required Class

```python
# Import the absenteeism model class from the module
from Absenteeism_Module import *
```

---
## 2. Load and Initialize Model

```python
# Initialize the model by loading the trained logistic regression model and scaler
model = absenteeism_model('model', 'scaler')  # files must exist in the current working directory
```
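The notebook does not say how the `'model'` and `'scaler'` files are serialized. Assuming they are Python pickles (a common choice, but an assumption here), the sketch below shows the save/load round trip and fails early with a clear message if a file is missing from the working directory. The dictionaries stand in for the fitted objects, and `save_artifacts` / `load_artifacts` are hypothetical helpers, not part of `Absenteeism_Module`.

```python
import os
import pickle
import tempfile

# Hypothetical stand-ins for the fitted objects. The real notebook loads a
# fitted logistic regression and a fitted custom scaler instead.
trained_model = {"coef": [0.8, -0.3], "intercept": -0.5}
trained_scaler = {"mean": [10.0, 2.0], "std": [4.0, 1.0]}


def save_artifacts(directory, model_obj, scaler_obj):
    """Pickle the two objects to files named 'model' and 'scaler' (hypothetical helper)."""
    for name, obj in (("model", model_obj), ("scaler", scaler_obj)):
        with open(os.path.join(directory, name), "wb") as fh:
            pickle.dump(obj, fh)


def load_artifacts(directory):
    """Unpickle 'model' and 'scaler', failing early if either file is missing."""
    loaded = []
    for name in ("model", "scaler"):
        path = os.path.join(directory, name)
        if not os.path.exists(path):
            raise FileNotFoundError(f"expected {path!r}; check the working directory")
        with open(path, "rb") as fh:
            loaded.append(pickle.load(fh))
    return loaded[0], loaded[1]


# Round trip inside a throwaway directory so nothing is left on disk.
with tempfile.TemporaryDirectory() as tmp:
    save_artifacts(tmp, trained_model, trained_scaler)
    model_obj, scaler_obj = load_artifacts(tmp)

print(model_obj == trained_model, scaler_obj == trained_scaler)  # True True
```

Only unpickle files you trust, since `pickle.load` can execute arbitrary code. If the scaler is an instance of a custom class, that class must also be importable under the name it was pickled with when the file is loaded, which may be one reason the notebook imports everything from `Absenteeism_Module`.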

---
## 3. Load and Preprocess New Data

```python
# Load and clean the new absenteeism dataset
# Make sure the dataset format matches the structure used in training
model.load_and_clean_data('absenteeism_new_data.csv')  # Replace with your actual file name
```
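Because `load_and_clean_data` expects the new file to match the structure of the training data, a cheap safeguard is to compare the CSV header against the columns the pipeline needs before calling it. The column list below is a hypothetical placeholder, not the actual schema, and `missing_columns` is an illustrative helper outside `Absenteeism_Module`.

```python
import csv
import io

# Hypothetical column list: replace it with the raw columns your training
# pipeline actually expected. These names are illustrative only.
REQUIRED_COLUMNS = ["ID", "Reason for Absence", "Date", "Transportation Expense"]


def missing_columns(csv_file, required):
    """Return the required column names absent from the CSV header row."""
    reader = csv.reader(csv_file)
    header = next(reader, [])  # empty file -> empty header
    present = {name.strip() for name in header}
    return [col for col in required if col not in present]


# In-memory sample; in practice pass open('absenteeism_new_data.csv', newline='').
sample = io.StringIO("ID,Reason for Absence,Date\n11,26,07/07/2015\n")
print(missing_columns(sample, REQUIRED_COLUMNS))  # ['Transportation Expense']
```

An empty result means every required column is present; anything else is worth fixing before the cleaning step runs.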

---
## 4. Generate Predictions (Preview)

```python
# Output predicted categories (0 or 1)
model.predicted_output_category()
```

```
array([0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0,
       1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0])
```
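A quick sanity check on this preview is the share of employees flagged as excessively absent. The sketch counts the classes in the array printed above (copied by hand); in the notebook, the return value of `model.predicted_output_category()` could be passed in directly, since `Counter` accepts any iterable.

```python
from collections import Counter

# Predicted categories copied from the preview output above (40 observations).
categories = [0, 1, 0, 0, 1, 1, 1, 0, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 0, 1, 0, 0,
              1, 1, 0, 1, 0, 1, 0, 1, 0, 0, 0, 0, 1, 1, 1, 0, 0, 0]

counts = Counter(categories)               # maps each class label to its count
share_flagged = counts[1] / len(categories)
print(f"{counts[1]} of {len(categories)} flagged ({share_flagged:.1%})")
```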

```python
# Output predicted probabilities for each observation
model.predicted_probability()
```

```
array([0.14004594, 0.86011634, 0.25983593, 0.23083138, 0.71077038,
       0.69594337, 0.56185642, 0.14004594, 0.13006987, 0.51140023,
       0.42763561, 0.63221588, 0.37928795, 0.14004594, 0.07957207,
       0.18164398, 0.63221588, 0.53271359, 0.37055204, 0.54436287,
       0.13546677, 0.06396433, 0.52310519, 0.52310519, 0.06396433,
       0.52572276, 0.39038508, 0.62953916, 0.13546677, 0.62953916,
       0.24448927, 0.13546677, 0.47930791, 0.25520956, 0.96424235,
       0.91040548, 0.78970596, 0.02032853, 0.26229371, 0.06334238])
```
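In a logistic regression, each probability is the logistic (sigmoid) function applied to a linear score of the scaled features, $p = 1 / (1 + e^{-(\beta_0 + \sum_i \beta_i x_i)})$. The sketch below computes one such probability; the weights, intercept, and feature values are hypothetical, since the real coefficients live in the fitted model loaded from the `'model'` file.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow in exp(-z) for large negative z
    return e / (1.0 + e)


def logistic_probability(features, weights, intercept):
    """Probability of class 1: sigmoid of intercept + sum of weight * feature."""
    if len(features) != len(weights):
        raise ValueError("features and weights must have the same length")
    score = intercept + sum(w * x for w, x in zip(weights, features))
    return sigmoid(score)


# Hypothetical coefficients and an already-scaled feature vector (illustration only).
weights = [1.2, -0.4, 0.7]
intercept = -0.3
scaled_features = [0.5, 1.0, -0.2]

p = logistic_probability(scaled_features, weights, intercept)
print(round(p, 4))  # the score is about -0.24, so p is a little below 0.5
```

Because the features pass through the scaler before the linear score is computed, identical raw inputs always yield identical probabilities.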

---
## 5. Attach Predictions to the Dataset

```python
# Return the cleaned dataset with the predicted probability and class appended
predicted_results = model.predicted_outputs()
predicted_results.head()
```

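| | reason_group_1 | reason_group_2 | reason_group_3 | reason_group_4 | month | transportation_expense_dollars | age | body_mass_index | children | pets | probability | prediction |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 0 | 0.0 | 0 | 1 | 6 | 179 | 30 | 19 | 0 | 0 | 0.140046 | 0 |
| 1 | 1 | 0.0 | 0 | 0 | 6 | 361 | 28 | 27 | 1 | 4 | 0.860116 | 1 |
| 2 | 0 | 0.0 | 0 | 1 | 6 | 155 | 34 | 25 | 2 | 0 | 0.259836 | 0 |
| 3 | 0 | 0.0 | 0 | 1 | 6 | 179 | 40 | 22 | 2 | 0 | 0.230831 | 0 |
| 4 | 1 | 0.0 | 0 | 0 | 6 | 155 | 34 | 25 | 2 | 0 | 0.71077 | 1 |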
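`predicted_outputs()` returns the cleaned data with the two model outputs appended as the last columns. The row-wise idea can be sketched with plain dictionaries; `attach_outputs` is a hypothetical helper, and the module itself presumably works on a pandas DataFrame (implied by `.head()` and `.to_csv()`), which this sketch does not reproduce. The essential requirement is alignment: exactly one probability and one prediction per row, in the same order.

```python
# Two cleaned rows (a subset of the columns in the table above) plus the
# model outputs for them, copied from the preview.
rows = [
    {"month": 6, "age": 30, "pets": 0},
    {"month": 6, "age": 28, "pets": 4},
]
probabilities = [0.140046, 0.860116]
predictions = [0, 1]


def attach_outputs(rows, probabilities, predictions):
    """Return copies of rows with 'probability' and 'prediction' appended (hypothetical helper)."""
    if not (len(rows) == len(probabilities) == len(predictions)):
        raise ValueError("need exactly one probability and one prediction per row")
    combined = []
    for row, prob, pred in zip(rows, probabilities, predictions):
        out = dict(row)  # copy, so the input rows are left unchanged
        out["probability"] = prob
        out["prediction"] = pred
        combined.append(out)
    return combined


for record in attach_outputs(rows, probabilities, predictions):
    print(record)
```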


#### Understanding the Prediction Outputs

The final DataFrame contains two key output columns:

* **`probability`**:
  This column contains the **predicted probability** (as float values between 0 and 1) that a given employee will be **excessively absent**, meaning they are expected to miss **more than 3 hours** of work. These probabilities are generated by the logistic regression model.

* **`prediction`**:
  This is a **binary classification** (0 or 1) derived from the `probability` column.

  * A value of `1` indicates a predicted probability of **50% or higher**, meaning the employee is likely to be excessively absent.
  * A value of `0` indicates a predicted probability of **less than 50%**, meaning the employee is not expected to be excessively absent.

Together, these columns allow us to interpret both the model's confidence and the final classification decision.
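The mapping from `probability` to `prediction` can be checked against the preview rows. This sketch uses a strict `>` comparison with a 0.5 threshold, which is how scikit-learn's `LogisticRegression.predict` behaves for binary problems (a probability of exactly 0.5 maps to 0). Whether `Absenteeism_Module` relies on `predict` is an assumption, and the tie case rarely matters in practice. `to_category` is a hypothetical helper.

```python
# Probabilities and predictions copied from the first five rows shown above.
probabilities = [0.140046, 0.860116, 0.259836, 0.230831, 0.71077]
reported = [0, 1, 0, 0, 1]


def to_category(p, threshold=0.5):
    """Map a probability to 1 if it exceeds the threshold, else 0."""
    return 1 if p > threshold else 0


derived = [to_category(p) for p in probabilities]
print(derived == reported)  # True
```

Passing a different `threshold` trades missed absentees against false alarms; the `probability` column is what makes that adjustment possible after the fact.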

---
## 6. Save Final Output 

```python
# Export the prediction results for Tableau integration or business use
predicted_results.to_csv('absenteeism_predictions.csv', index=False)
```
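With `index=False`, `to_csv` writes the column header followed by one line per row and omits the DataFrame's integer index, which would otherwise appear as an extra, unnamed first column in the file. The stdlib sketch below mimics that layout for two preview rows (a subset of the columns) so you can see what the exported file looks like; it illustrates the format and is not a replacement for the pandas call.

```python
import csv
import io

# Two result rows taken from the preview table above (subset of columns).
results = [
    {"age": 30, "pets": 0, "probability": 0.140046, "prediction": 0},
    {"age": 28, "pets": 4, "probability": 0.860116, "prediction": 1},
]

# In-memory buffer standing in for open('absenteeism_predictions.csv', 'w', newline='').
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=list(results[0]))
writer.writeheader()      # header row only, no index column
writer.writerows(results)

print(buffer.getvalue())
```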

---