# EnginML Tutorial Notebook

This notebook demonstrates how to use the `EnginML` package for common machine learning tasks. The package is designed to make machine learning accessible to engineers with minimal programming experience.

## Setup

First, let's make sure the package is installed and import the necessary functions.

In [None]:
# Uncomment to install the package if needed
# !pip install EnginML

# Import required libraries
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# Import EnginML functions
try:
    from EnginML import fit_regression, fit_classification, fit_clustering, load_csv_or_excel, save_report
except ImportError:
    # If the package is not installed, fall back to a local checkout by
    # adding the parent of the current working directory to the import path
    import sys
    import os
    sys.path.append(os.path.dirname(os.getcwd()))
    from EnginML import fit_regression, fit_classification, fit_clustering, load_csv_or_excel, save_report
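
If the import cell fails, it can help to check which packages Python can actually locate before debugging further. The following is a minimal sketch using only the standard library; `package_available` is a hypothetical helper name and is not part of `EnginML`.

```python
import importlib.util


def package_available(name: str) -> bool:
    """Return True if a top-level package can be located without importing it."""
    return importlib.util.find_spec(name) is not None


# Report the status of each library this notebook relies on
for pkg in ("numpy", "pandas", "matplotlib", "EnginML"):
    status = "found" if package_available(pkg) else "missing"
    print(f"{pkg}: {status}")
```

A package reported as missing needs to be installed, or its parent directory added to `sys.path`, before the import cell will succeed.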

## 1. Regression Example

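Let's start with a regression example using the sample dataset, `sample_data.csv`. The file is expected in the working directory and must contain a `target` column.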

In [None]:
# Load the sample data
df = pd.read_csv('sample_data.csv')

# Display the first few rows
print("Dataset shape:", df.shape)
df.head()

In [None]:
# Prepare features and target for regression
X = df.drop(columns=['target']).values
y = df['target'].values
feature_names = df.drop(columns=['target']).columns.tolist()

print("Features:", feature_names)
print("X shape:", X.shape)
print("y shape:", y.shape)

In [None]:
# Fit a regression model
print("Fitting Random Forest regression model...")
result = fit_regression(X, y, model="random_forest")

# Print metrics
print("\nModel Performance:")
for name, value in result["metrics"].items():
    print(f"  {name}: {value:.4f}")
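
The metric names inside `result["metrics"]` are not listed in this notebook. As an assumption for illustration, two metrics commonly reported for regression are the root mean squared error (RMSE) and the coefficient of determination (R²). The sketch below computes both by hand with the standard library; `rmse` and `r_squared` are hypothetical helper names, not `EnginML` functions.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error: typical size of a prediction error, in target units."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Fraction of the target's variance explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))  # residual sum of squares
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)             # total sum of squares
    return 1.0 - ss_res / ss_tot


# Tiny hand-made example: only the last prediction is off, by 1.0
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.0, 2.0, 3.0, 5.0]
print(rmse(y_true, y_pred))       # 0.5
print(r_squared(y_true, y_pred))  # 0.8
```

Lower RMSE is better, while R² is better the closer it gets to 1.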

In [None]:
# Try a different model
print("Fitting KNN regression model...")
result_knn = fit_regression(X, y, model="knn")

# Print metrics
print("\nModel Performance:")
for name, value in result_knn["metrics"].items():
    print(f"  {name}: {value:.4f}")
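
With two fitted models, a side-by-side view makes the comparison easier to read than two separate printouts. The sketch below is one possible way to do this; `format_metric_table` is a hypothetical helper, and the metric names and numbers in the example are placeholders, not output from `EnginML`.

```python
def format_metric_table(metrics_by_model):
    """Build a plain-text table with one row per metric and one column per model."""
    models = list(metrics_by_model)

    # Collect the union of metric names, keeping first-seen order
    names = []
    for metrics in metrics_by_model.values():
        for name in metrics:
            if name not in names:
                names.append(name)

    rows = [f"{'metric':<12}" + "".join(f"{m:>16}" for m in models)]
    for name in names:
        cells = []
        for m in models:
            value = metrics_by_model[m].get(name)
            # Show "n/a" when a model does not report this metric
            cells.append(f"{value:>16.4f}" if value is not None else f"{'n/a':>16}")
        rows.append(f"{name:<12}" + "".join(cells))
    return "\n".join(rows)


# Placeholder numbers for illustration only; not output from EnginML
example = {
    "random_forest": {"r2": 0.90, "rmse": 1.25},
    "knn": {"r2": 0.85, "rmse": 1.50},
}
print(format_metric_table(example))
```

In this notebook, the same helper could be called as `format_metric_table({"random_forest": result["metrics"], "knn": result_knn["metrics"]})`.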

In [None]:
# Save the report
report_path = save_report(
    result, X, y, 
    task_type="regression",
    feature_names=feature_names,
    output_path="regression_notebook_report.html"
)
print(f"Report saved to: {report_path}")

## 2. Classification Example

Now let's create a synthetic classification dataset and run a classification model.

In [None]:
# Create a synthetic classification dataset
from sklearn.datasets import make_classification

X_cls, y_cls = make_classification(
    n_samples=100, 
    n_features=4, 
    n_informative=2, 
    n_redundant=0, 
    n_classes=2,
    random_state=42
)

# Create feature names
cls_feature_names = [f'feature{i+1}' for i in range(X_cls.shape[1])]

# Display dataset info
print("Classification dataset shape:", X_cls.shape)
print("Target distribution:")
pd.Series(y_cls).value_counts()

In [None]:
# Fit a classification model
print("Fitting Random Forest classification model...")
cls_result = fit_classification(X_cls, y_cls, model="random_forest")

# Print metrics
print("\nModel Performance:")
for name, value in cls_result["metrics"].items():
    print(f"  {name}: {value:.4f}")
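
The exact metrics returned in `cls_result["metrics"]` are not listed here. As an assumption for illustration, the sketch below shows how four common binary classification metrics follow from the counts of true/false positives and negatives. `binary_classification_metrics` is a hypothetical helper, not an `EnginML` function.

```python
def binary_classification_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 from true and predicted labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or never present
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hand-made labels: 3 true positives, 3 true negatives, 1 false positive, 1 false negative
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
for name, value in binary_classification_metrics(y_true, y_pred).items():
    print(f"  {name}: {value:.4f}")
```

Accuracy alone can mislead when one class dominates, which is why precision and recall are usually read alongside it.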

In [None]:
# Save the classification report
cls_report_path = save_report(
    cls_result, X_cls, y_cls, 
    task_type="classification",
    feature_names=cls_feature_names,
    output_path="classification_notebook_report.html"
)
print(f"Report saved to: {cls_report_path}")

## 3. Clustering Example

Finally, let's perform clustering on a synthetic two-dimensional dataset with three well-separated groups.

In [None]:
# Create a synthetic clustering dataset
from sklearn.datasets import make_blobs

X_clust, y_true = make_blobs(
    n_samples=200, 
    centers=3, 
    cluster_std=0.60, 
    random_state=42
)

# Plot the true clusters
plt.figure(figsize=(8, 6))
plt.scatter(X_clust[:, 0], X_clust[:, 1], c=y_true, cmap='viridis', alpha=0.7)
plt.title('True Clusters')
plt.colorbar(label='Cluster')
plt.show()

In [None]:
# Fit a clustering model
print("Fitting K-Means clustering model...")
clust_result = fit_clustering(X_clust, model="kmeans", n_clusters=3)

# Print metrics
print("\nModel Performance:")
for name, value in clust_result["metrics"].items():
    print(f"  {name}: {value:.4f}")
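
Clustering has no target to compare against, so its metrics describe the structure of the clusters themselves. Which metrics `fit_clustering` reports is not shown here. As one illustration, K-Means minimizes the within-cluster sum of squared distances (often called inertia), which can be computed from points and labels alone. `inertia` below is a hypothetical helper written with the standard library.

```python
def inertia(points, labels):
    """Sum of squared distances from each point to the centroid of its cluster."""
    # Group points by their cluster label
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)

    total = 0.0
    for members in clusters.values():
        dim = len(members[0])
        # Centroid is the per-dimension mean of the cluster's members
        centroid = [sum(p[d] for p in members) / len(members) for d in range(dim)]
        total += sum(
            sum((p[d] - centroid[d]) ** 2 for d in range(dim)) for p in members
        )
    return total


# Two clusters with centroids (0, 1) and (10, 2)
points = [(0.0, 0.0), (0.0, 2.0), (10.0, 0.0), (10.0, 4.0)]
labels = [0, 0, 1, 1]
print(inertia(points, labels))  # 10.0
```

In this notebook the same function could be applied as `inertia(X_clust.tolist(), list(clust_result['labels']))`. Lower values indicate tighter clusters, but inertia always decreases as the number of clusters grows, so it is only comparable between runs with the same `n_clusters`.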

In [None]:
# Plot the predicted clusters
plt.figure(figsize=(8, 6))
plt.scatter(X_clust[:, 0], X_clust[:, 1], c=clust_result['labels'], cmap='viridis', alpha=0.7)
plt.title('Predicted Clusters (K-Means)')
plt.colorbar(label='Cluster')
plt.show()

In [None]:
# Try a different clustering algorithm
print("Fitting Gaussian Mixture Model...")
gmm_result = fit_clustering(X_clust, model="gmm", n_clusters=3)

# Print metrics
print("\nModel Performance:")
for name, value in gmm_result["metrics"].items():
    print(f"  {name}: {value:.4f}")

# Plot the predicted clusters
plt.figure(figsize=(8, 6))
plt.scatter(X_clust[:, 0], X_clust[:, 1], c=gmm_result['labels'], cmap='viridis', alpha=0.7)
plt.title('Predicted Clusters (GMM)')
plt.colorbar(label='Cluster')
plt.show()
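
Cluster label numbers are arbitrary, so K-Means might call a group `0` while the Gaussian Mixture Model calls the same group `2`. Comparing the two label arrays element by element is therefore misleading. One permutation-invariant alternative is the Rand index, which counts the fraction of point pairs that both labelings treat the same way. The sketch below is a plain standard-library version; `rand_index` is a hypothetical helper, not part of `EnginML`.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two labelings agree (together vs. apart)."""
    agree = 0
    total = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        # The pair counts as agreement if both labelings group it, or both split it
        if same_a == same_b:
            agree += 1
        total += 1
    return agree / total


# Same grouping, different label numbers
kmeans_like = [0, 0, 1, 1, 2, 2]
gmm_like = [2, 2, 0, 0, 1, 1]
print(rand_index(kmeans_like, gmm_like))  # 1.0
```

Applied to this notebook, `rand_index(list(clust_result['labels']), list(gmm_result['labels']))` would return a value close to 1 when the two algorithms find essentially the same groups.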

In [None]:
# Save the clustering report
clust_report_path = save_report(
    clust_result, X_clust, 
    task_type="clustering",
    feature_names=['Feature 1', 'Feature 2'],
    output_path="clustering_notebook_report.html"
)
print(f"Report saved to: {clust_report_path}")

## Conclusion

This notebook demonstrated how to use the `EnginML` package for three common machine learning tasks:

1. **Regression**: Predicting continuous values
2. **Classification**: Predicting discrete class labels
3. **Clustering**: Grouping similar data points

The package provides a simple, consistent interface for these tasks, making machine learning accessible to engineers with minimal programming experience. The HTML reports generated by the package provide visualizations and metrics to help understand model performance.

For more information, see the README.md file and the examples directory in the package repository.