# Model Prototyping

This notebook prototypes a predictive model for crew fatigue, using simple linear regression as the baseline. It covers loading the processed data, splitting it into training and test sets, training and evaluating the model, and saving it, followed by a brief discussion of hyperparameter tuning and other next steps.

**Objectives:**
- Build a baseline predictive model for crew fatigue.
- Evaluate model performance using appropriate metrics.
- Lay the groundwork for more advanced modeling approaches.

In [1]:
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import joblib

# Load the processed crew data
df = pd.read_csv('../datasets/processed/crew_data_processed.csv')
print('Data loaded. Shape:', df.shape)

# Define features and target
# Placeholder: 'fatigue_score' is used as both feature and target, so the model
# learns the identity mapping and the test error will be close to zero.
# Replace X with real predictor columns before interpreting any metric.
X = df[['fatigue_score']]
y = df['fatigue_score']

# Split the data into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
print('Training set size:', X_train.shape, 'Test set size:', X_test.shape)

In [2]:
# Train a simple linear regression model
model = LinearRegression()
model.fit(X_train, y_train)

# Evaluate the model on the test set
predictions = model.predict(X_test)
mse = mean_squared_error(y_test, predictions)
print('Mean Squared Error on Test Set:', mse)

# Display model parameters
print('Model Coefficients:', model.coef_)
print('Model Intercept:', model.intercept_)

In [3]:
# Save the trained model for future use (create the target directory if it is missing)
import os
os.makedirs('../models', exist_ok=True)
joblib.dump(model, '../models/crew_fatigue_model.pkl')
print('Model saved to ../models/crew_fatigue_model.pkl')
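
A saved model is only useful if it can be restored. The following is a minimal sketch, not part of the original notebook, that round-trips a model through `joblib.dump` and `joblib.load` on toy data. The names `X_toy`, `y_toy`, and `demo_model.pkl` are illustrative assumptions, and a temporary directory stands in for `../models/`.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data (assumption): a noiseless line y = 2x + 1, used only for illustration.
rng = np.random.default_rng(0)
X_toy = rng.uniform(0, 10, size=(50, 1))
y_toy = 2.0 * X_toy[:, 0] + 1.0

original = LinearRegression().fit(X_toy, y_toy)

# Write the model to a temporary directory, then read it back.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "demo_model.pkl"  # hypothetical file name
    joblib.dump(original, model_path)
    restored = joblib.load(model_path)

# The restored model should reproduce the original predictions.
same_predictions = bool(
    np.allclose(original.predict(X_toy), restored.predict(X_toy))
)
print("Reloaded model reproduces predictions:", same_predictions)
```

Pickled scikit-learn models are generally safest to reload with the same scikit-learn version that produced them, so recording that version next to the file may be worthwhile.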

## Next Steps and Considerations

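- **Hyperparameter Tuning:** Explore grid search or random search to optimize model parameters. Plain linear regression has little to tune, so this applies mainly to regularized variants (e.g., Ridge, Lasso) and more complex models.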
- **Feature Engineering:** Incorporate additional features from the dataset to improve predictive power.
- **Model Evaluation:** Use additional metrics (e.g., MAE, R²) and cross-validation to better assess performance.
- **Advanced Models:** Consider more complex models (e.g., ensemble methods, neural networks) if baseline performance is insufficient.
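
The evaluation and tuning steps listed above could be sketched as follows. This is a hedged illustration rather than the project's method: it runs on synthetic data, the feature names `hours_on_duty` and `hours_slept` are hypothetical and not taken from the crew dataset, and `Ridge` is swapped in for plain linear regression only because it exposes a hyperparameter (`alpha`) to search over.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV, KFold, cross_validate
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data (assumption): column names and the relationship
# between them are invented purely for illustration.
rng = np.random.default_rng(42)
n_rows = 200
demo = pd.DataFrame({
    "hours_on_duty": rng.uniform(4, 14, n_rows),
    "hours_slept": rng.uniform(3, 9, n_rows),
})
demo["fatigue_score"] = (
    0.5 * demo["hours_on_duty"]
    - 0.8 * demo["hours_slept"]
    + rng.normal(0, 0.5, n_rows)
)

X_demo = demo[["hours_on_duty", "hours_slept"]]
y_demo = demo["fatigue_score"]

# 5-fold cross-validation reporting both MAE and R^2.
cv = KFold(n_splits=5, shuffle=True, random_state=42)
pipeline = make_pipeline(StandardScaler(), Ridge())
scores = cross_validate(
    pipeline,
    X_demo,
    y_demo,
    cv=cv,
    scoring={"mae": "neg_mean_absolute_error", "r2": "r2"},
)
cv_mae = -scores["test_mae"].mean()  # scorer returns negated MAE
cv_r2 = scores["test_r2"].mean()
print("Cross-validated MAE:", cv_mae)
print("Cross-validated R^2:", cv_r2)

# Grid search over the Ridge regularization strength.
alpha_grid = [0.01, 0.1, 1.0, 10.0]
search = GridSearchCV(
    pipeline,
    param_grid={"ridge__alpha": alpha_grid},
    scoring="neg_mean_absolute_error",
    cv=cv,
)
search.fit(X_demo, y_demo)
print("Best alpha:", search.best_params_["ridge__alpha"])
```

Wrapping the scaler and the model in one pipeline means the scaler is refit inside each fold, which avoids leaking test-fold statistics into training.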

## Conclusion

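The baseline model has been prototyped end to end: data loading, splitting, training, evaluation, and saving. Because the current feature is a placeholder identical to the target, the reported error does not reflect real predictive performance. The working pipeline, together with the next steps outlined above, provides a foundation for further model refinement and development within UCODTS.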