# <font color="#418FDE" size="6.5" uppercase>**From Idea To Use**</font>

>Last update: 20260201.
    
By the end of this lecture, you will be able to:
- Outline the main stages of a simple machine learning project lifecycle. 
- Relate each stage of the lifecycle to concepts learned earlier in the course. 
- Explain why ongoing monitoring and revision are important after a model is first deployed. 


## **1. ML Project Lifecycle**

### **1.1. Defining the ML Problem**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_10/Lecture_B/image_01_01.jpg?v=1769975369" width="250">



>* Turn vague AI ideas into specific problems
>* Define inputs, outputs, stakeholders, and real-world use

>* Turn plain-language goals into specific ML tasks
>* Define examples, features, labels, and practical constraints

>* Identify ethical, legal, and data constraints early
>* Ensure models remain fair, transparent, and accountable
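One way to make these decisions explicit is to write them down as a small structured record before any data work starts. The sketch below is illustrative only: `ProblemSpec`, its fields, and the clinic no-show example are hypothetical, not part of any library or of the course materials.

```python
from dataclasses import dataclass, field
from typing import List

# Task types this sketch recognizes (an assumption for illustration).
KNOWN_TASK_TYPES = {"binary classification", "multiclass classification", "regression"}


@dataclass
class ProblemSpec:
    """Hypothetical record of how a plain-language goal becomes an ML task."""

    goal: str            # the plain-language goal from stakeholders
    task_type: str       # e.g. "binary classification" or "regression"
    example_unit: str    # what one example (one row of data) represents
    features: List[str]  # inputs available at prediction time
    label: str           # the outcome the model should predict
    metric: str          # how success will be judged
    constraints: List[str] = field(default_factory=list)  # ethical, legal, data limits

    def open_questions(self) -> List[str]:
        """Return gaps that should be resolved before any model is built."""
        questions = []
        if self.task_type not in KNOWN_TASK_TYPES:
            questions.append(f"Unclear task type: {self.task_type!r}")
        if not self.features:
            questions.append("No features have been identified.")
        if not self.constraints:
            questions.append("No ethical, legal, or data constraints are recorded.")
        return questions


spec = ProblemSpec(
    goal="Help a clinic reduce missed appointments",
    task_type="binary classification",
    example_unit="one scheduled appointment",
    features=["days_until_appointment", "previous_no_shows", "reminder_sent"],
    label="no_show",
    metric="recall on no-shows, reviewed alongside precision",
    constraints=["Use only data available at booking time",
                 "Do not use protected attributes as features"],
)
print(spec.open_questions())  # an empty list means no gaps were detected
```

A vague idea such as "use AI somehow" fails these checks immediately, which is the point: the gaps surface before any data is collected.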



### **1.2. Building Models from Data**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_10/Lecture_B/image_01_02.jpg?v=1769975380" width="250">



>* Prepare and clean data into useful features
>* Choose suitable models, balancing performance and interpretability

>* Model training adjusts parameters to match patterns
>* Hyperparameter choices affect fit and generalization

>* Model building is iterative, testing many options
>* Aim for models that balance accuracy, robustness, and fairness
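The iterative loop described above (train, compare, keep what generalizes) can be sketched with NumPy. In this minimal example the polynomial degree plays the role of a hyperparameter; the toy data, the 28/12 random split, and the candidate degrees (1, 3, 6) are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical toy data: a noisy linear trend, similar to this section's example.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 40)
y = 2.0 * x + 3.0 + rng.normal(0.0, 1.0, size=x.size)

# Shuffle before splitting so validation points span the whole x range.
shuffled = rng.permutation(x.size)
train_idx, val_idx = shuffled[:28], shuffled[28:]


def validation_mse(degree):
    """Fit a polynomial on the training rows and score it on validation rows."""
    coeffs = np.polyfit(x[train_idx], y[train_idx], deg=degree)  # training
    preds = np.polyval(coeffs, x[val_idx])                        # prediction
    return float(np.mean((y[val_idx] - preds) ** 2))


# The degree is a hyperparameter: we choose it; polyfit learns the coefficients.
results = {degree: validation_mse(degree) for degree in (1, 3, 6)}
best_degree = min(results, key=results.get)

for degree, mse in results.items():
    print(f"degree {degree}: validation MSE = {mse:.3f}")
print("Degree with lowest validation MSE:", best_degree)
```

Because the degree is picked by validation error rather than training error, a flexible model that merely memorizes the training rows is not automatically preferred.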



In [None]:
#@title Python Code - Building Models from Data

# This script shows building simple models from data.
# We use tiny synthetic data for clarity.
# Focus is on training and evaluating responsibly.

# import numerical, tabular, and plotting libraries.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

# set deterministic random seed for reproducibility.
np.random.seed(42)

# create tiny synthetic dataset for a regression task.
size = 30
x = np.linspace(0, 10, size)

# generate target values with a simple linear pattern.
noise = np.random.normal(loc=0.0, scale=1.0, size=size)
y = 2.0 * x + 3.0 + noise

# place data into a pandas dataframe structure.
data = pd.DataFrame({"feature_x": x, "target_y": y})

# check dataset shape to ensure expected small size.
rows, cols = data.shape
assert rows == size and cols == 2

# split data into training and testing subsets.
train_fraction = 0.7
train_size = int(train_fraction * size)

# use simple index based splitting for clarity; because x is sorted,
# the test rows hold the largest x values (an extrapolation check).
train_data = data.iloc[:train_size].copy()
test_data = data.iloc[train_size:].copy()

# separate features and targets for training and testing.
X_train = train_data["feature_x"].values
y_train = train_data["target_y"].values

# build a design matrix: a column of ones (intercept) plus the feature.
X_train_matrix = np.vstack([np.ones_like(X_train), X_train]).T

# compute linear regression parameters using normal equation.
XtX = X_train_matrix.T @ X_train_matrix
Xty = X_train_matrix.T @ y_train

# solve for parameters with small regularization for stability.
lambda_identity = 1e-6 * np.eye(XtX.shape[0])
params = np.linalg.solve(XtX + lambda_identity, Xty)

# extract intercept and slope from learned parameters.
intercept, slope = params[0], params[1]

# prepare testing features and targets for evaluation.
X_test = test_data["feature_x"].values
y_test = test_data["target_y"].values

# build the same design matrix for the testing features.
X_test_matrix = np.vstack([np.ones_like(X_test), X_test]).T

# generate predictions on testing data using learned model.
y_pred = X_test_matrix @ params

# compute mean squared error as evaluation metric.
errors = y_test - y_pred
mse = float(np.mean(errors ** 2))

# print concise summary of lifecycle style steps.
print("Data rows and columns:", rows, cols)
print("Training size and testing size:", train_size, size - train_size)
print("Learned intercept and slope:", round(intercept, 2), round(slope, 2))
print("Test mean squared error:", round(mse, 3))

# create scatter plot of data and learned regression line.
plt.figure(figsize=(6, 4))
plt.scatter(data["feature_x"], data["target_y"], label="data points")

# create line values using learned model parameters.
line_x = np.linspace(0, 10, 50)
line_y = intercept + slope * line_x

# plot learned regression line for visual comparison.
plt.plot(line_x, line_y, color="red", label="learned model")

# add labels and legend to explain the visualization.
plt.xlabel("feature_x value")
plt.ylabel("target_y value")

# add title connecting plot to project lifecycle stage.
plt.title("Building a simple model from data")
plt.legend()

# display the final plot to complete this teaching example.
plt.show()




### **1.3. Model Evaluation Choices**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_10/Lecture_B/image_01_03.jpg?v=1769975425" width="250">



>* Define good performance for the specific context
>* Choose metrics based on which errors are acceptable or harmful

>* Split data into training and unseen test sets
>* Testing on unseen data prevents overly optimistic performance estimates

>* Compare models, balancing accuracy, simplicity, and transparency
>* Include fairness, robustness, and real-world trade-offs
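Accuracy alone hides which mistakes a model makes. Below is a minimal sketch of two standard metrics built from confusion counts (true positives `tp`, false positives `fp`, false negatives `fn`): precision asks how many flagged cases were real, and recall asks how many real cases were flagged. The counts are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision: share of predicted positives that are real.
    Recall: share of real positives that were caught."""
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    return precision, recall


# Hypothetical confusion counts for two screening models on the same test set
# (both see 10 real positives, since tp + fn = 10 in each case).
cautious = precision_recall(tp=8, fp=6, fn=2)   # flags many cases
strict = precision_recall(tp=5, fp=1, fn=5)     # flags few cases

print("cautious: precision=%.3f recall=%.3f" % cautious)
print("strict:   precision=%.3f recall=%.3f" % strict)
```

Neither model is better in general: the higher-recall model suits settings where a missed positive is the costlier error, and the higher-precision model suits settings where false alarms are costlier.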



In [None]:
#@title Python Code - Model Evaluation Choices

# This script illustrates simple model evaluation choices.
# We compare two tiny models on a toy dataset.
# Focus on accuracy versus different error types.

# No extra installs are required for this script.
# Required libraries are available in this environment.

# Import numpy for simple numeric operations.
import numpy as np

# Set a deterministic random seed for reproducibility.
np.random.seed(42)

# Create a tiny synthetic binary classification dataset.
features = np.array([
    [0.1, 0.2],
    [0.2, 0.1],
    [0.8, 0.9],
    [0.9, 0.8],

    [0.15, 0.18],
    [0.82, 0.88],
    [0.4, 0.3],
    [0.7, 0.6],
])

# Create labels where 1 means positive class and 0 negative.
labels = np.array([0, 0, 1, 1, 0, 1, 0, 1])

# Validate shapes before continuing to avoid mistakes.
assert features.shape[0] == labels.shape[0]

# Define a simple function to split train and test.
def train_test_split_indices(n_samples, test_fraction):
    # Compute test size from the fraction, truncated to an integer (at least 1).
    test_size = max(1, int(n_samples * test_fraction))

    # Shuffle indices; the global seed set above makes this reproducible.
    indices = np.random.permutation(n_samples)

    # Split indices into train and test subsets.
    test_idx = indices[:test_size]
    train_idx = indices[test_size:]

    return train_idx, test_idx

# Get train and test indices using the helper function.
train_idx, test_idx = train_test_split_indices(len(labels), 0.25)

# Create train and test sets using the computed indices.
X_train, y_train = features[train_idx], labels[train_idx]

# Create held out test features and labels for evaluation.
X_test, y_test = features[test_idx], labels[test_idx]

# Define a simple threshold based model using feature sums.
def predict_threshold(data, threshold):
    # Compute scores as sum of feature values per row.
    scores = data.sum(axis=1)

    # Predict positive class when score meets or exceeds threshold.
    predictions = (scores >= threshold).astype(int)

    return predictions

# Define a helper to compute confusion counts.
def confusion_counts(y_true, y_pred):
    # True positives are predicted one and actually one.
    tp = int(((y_true == 1) & (y_pred == 1)).sum())

    # False positives are predicted one but actually zero.
    fp = int(((y_true == 0) & (y_pred == 1)).sum())

    # False negatives are predicted zero but actually one.
    fn = int(((y_true == 1) & (y_pred == 0)).sum())

    # True negatives are predicted zero and actually zero.
    tn = int(((y_true == 0) & (y_pred == 0)).sum())

    return tp, fp, fn, tn

# Define a helper to compute accuracy safely.
def accuracy_score(y_true, y_pred):
    # Ensure shapes match before computing accuracy.
    assert y_true.shape == y_pred.shape

    # Compute proportion of correct predictions overall.
    correct = (y_true == y_pred).sum()

    return float(correct) / float(len(y_true))

# Model A prefers catching positives using lower threshold.
model_a_threshold = 1.0

# Model B prefers avoiding false alarms using higher threshold.
model_b_threshold = 1.5

# Get predictions from model A on the test set.
y_pred_a = predict_threshold(X_test, model_a_threshold)

# Get predictions from model B on the test set.
y_pred_b = predict_threshold(X_test, model_b_threshold)

# Compute confusion counts for model A.
metrics_a = confusion_counts(y_test, y_pred_a)

# Compute confusion counts for model B.
metrics_b = confusion_counts(y_test, y_pred_b)

# Compute accuracy for model A on the test set.
acc_a = accuracy_score(y_test, y_pred_a)

# Compute accuracy for model B on the test set.
acc_b = accuracy_score(y_test, y_pred_b)

# Unpack confusion counts for model A for readable printing.
tp_a, fp_a, fn_a, tn_a = metrics_a

# Unpack confusion counts for model B.
tp_b, fp_b, fn_b, tn_b = metrics_b

# Print a short header explaining the comparison.
print("Comparing two simple models on held out test data.")

# Print accuracy for both models using formatted strings.
print("Model A accuracy prioritizing recall:", round(acc_a, 3))

# Print accuracy for the second model with different threshold.
print("Model B accuracy prioritizing precision:", round(acc_b, 3))

# Print confusion details for model A showing error types.
print("Model A TP FP FN TN:", tp_a, fp_a, fn_a, tn_a)

# Print confusion details for model B showing error types.
print("Model B TP FP FN TN:", tp_b, fp_b, fn_b, tn_b)

# Final line prints a reminder about evaluation choices.
print("Choose thresholds based on which mistakes matter more in context.")




## **2. Linking Core Concepts**

### **2.1. Connecting Features And Targets**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_10/Lecture_B/image_02_01.jpg?v=1769975482" width="250">



>* Features are inputs; targets are desired outcomes
>* Choosing them well underpins safe, reliable projects

>* Define the decision, then choose a target
>* Select reliable features that reflect real goals

>* Features and targets embed historical values and biases
>* Rechecking them ensures fair, ethical model decisions
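A minimal sketch of separating features from the target, assuming rows arrive as Python dictionaries. The loan data, column names, and helper `split_features_target` are hypothetical. A target such as past default reflects earlier lending decisions, so it can carry the historical biases mentioned above.

```python
# Hypothetical loan records; every column name here is an illustrative assumption.
records = [
    {"income": 42000, "years_employed": 3, "missed_payments": 0, "defaulted": 0},
    {"income": 28000, "years_employed": 1, "missed_payments": 2, "defaulted": 1},
    {"income": 61000, "years_employed": 8, "missed_payments": 0, "defaulted": 0},
]


def split_features_target(rows, feature_names, target_name):
    """Return a feature matrix X and target vector y from a list of dict rows."""
    if target_name in feature_names:
        # Using the answer as an input would make the model look perfect but useless.
        raise ValueError("The target cannot also be used as a feature.")
    X = [[row[name] for name in feature_names] for row in rows]
    y = [row[target_name] for row in rows]
    return X, y


# The decision is "approve or review a loan"; the chosen target is past default.
X, y = split_features_target(
    records, ["income", "years_employed", "missed_payments"], "defaulted"
)
print(X)  # [[42000, 3, 0], [28000, 1, 2], [61000, 8, 0]]
print(y)  # [0, 1, 0]
```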



### **2.2. Training and Evaluation**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_10/Lecture_B/image_02_02.jpg?v=1769975493" width="250">



>* Training ties together data splits, loss, optimization, and generalization
>* Model iteratively adjusts parameters to reduce prediction errors

>* Hold out data to test generalization performance
>* Use metrics and splits to spot overfitting and other failures

>* Balance bias and variance using training and evaluation results
>* Check fairness so models meet ethical standards
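The code in this section evaluates on a single 70/30 split. A common refinement is k-fold cross-validation, which repeats the holdout idea so every row is used for evaluation exactly once. This sketch swaps gradient descent for NumPy's least-squares `np.polyfit` to stay short; the helper names and toy data are assumptions.

```python
import numpy as np


def k_fold_indices(n_samples, k, seed=0):
    """Shuffle row indices once, then cut them into k roughly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)


def cross_validated_mse(x, y, degree, k=5):
    """Train on k-1 folds, evaluate on the held-out fold, repeat k times."""
    folds = k_fold_indices(len(x), k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([f for j, f in enumerate(folds) if j != i])
        coeffs = np.polyfit(x[train_idx], y[train_idx], deg=degree)
        preds = np.polyval(coeffs, x[val_idx])
        scores.append(float(np.mean((y[val_idx] - preds) ** 2)))
    return float(np.mean(scores)), float(np.std(scores))


# Hypothetical data resembling this section's toy regression problem.
rng = np.random.default_rng(42)
x = np.linspace(0.0, 10.0, 20)
y = 2.0 * x + 1.0 + rng.normal(0.0, 1.0, size=x.size)

mean_mse, std_mse = cross_validated_mse(x, y, degree=1)
print(f"5-fold MSE: {mean_mse:.3f} +/- {std_mse:.3f}")
```

The spread across folds is useful on its own: a large standard deviation suggests the single-split estimate could be misleading.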



In [None]:
#@title Python Code - Training and Evaluation

# This script illustrates simple training and evaluation stages.
# We use tiny synthetic data for a regression example.
# Focus on connecting lifecycle ideas to earlier concepts.

# Required third party libraries would be installed like this.
# pip install numpy.

# Import numpy for numeric arrays and simple math.
import numpy as np

# Set deterministic random seed for reproducible tiny dataset.
np.random.seed(42)

# Create small synthetic feature data and target values.
X_all = np.linspace(0.0, 10.0, 20).reshape(-1, 1)

y_all = 2.0 * X_all.reshape(-1) + 1.0 + np.random.normal(
    loc=0.0,
    scale=1.0,
    size=X_all.shape[0],
)

# Validate shapes before splitting into training and test sets.
assert X_all.shape[0] == y_all.shape[0]

# Define simple index based split for training and test sets.
split_index = int(0.7 * X_all.shape[0])

# Slice arrays to obtain training subset for model fitting.
X_train = X_all[:split_index]

y_train = y_all[:split_index]

# Slice arrays to obtain test subset for later evaluation.
X_test = X_all[split_index:]

y_test = y_all[split_index:]

# Initialize the slope (weight) of the linear model.
weight = 0.0

# Initialize bias term representing intercept of linear model.
bias = 0.0

# Set learning rate and training epochs for gradient descent.
learning_rate = 0.01

epochs = 200

# Define helper function to compute mean squared error loss.
def mean_squared_error(y_true, y_pred):
    # Compute squared differences and average them for loss.
    return float(np.mean((y_true - y_pred) ** 2))


# Training loop performs gradient descent on training data.
for epoch in range(epochs):
    # Compute current predictions using linear model parameters.
    y_pred_train = weight * X_train.reshape(-1) + bias

    # Compute prediction errors, which both gradients below use.
    error = y_pred_train - y_train

    # Gradient for weight uses feature values and error term.
    grad_w = float(2.0 * np.mean(error * X_train.reshape(-1)))

    # Gradient for bias uses average error across training examples.
    grad_b = float(2.0 * np.mean(error))

    # Update parameters by stepping opposite gradient direction.
    weight -= learning_rate * grad_w

    # Update bias similarly to reduce training loss gradually.
    bias -= learning_rate * grad_b

# After training compute predictions on both training and test sets.
y_pred_train_final = weight * X_train.reshape(-1) + bias

y_pred_test_final = weight * X_test.reshape(-1) + bias

# Compute mean squared error on training data for comparison.
train_mse = mean_squared_error(y_train, y_pred_train_final)

# Compute mean squared error on test data for generalization.
test_mse = mean_squared_error(y_test, y_pred_test_final)

# Print the NumPy version and learned parameters succinctly.
print("NumPy version:", np.__version__, "| weight and bias:", round(weight, 3), round(bias, 3))

# Print training and test losses to compare fit and generalization.
print("Training MSE and test MSE values:", round(train_mse, 3), round(test_mse, 3))

# Print short interpretation linking training and evaluation concepts.
print("Training MSE measures fit; test MSE estimates generalization to unseen data.")



### **2.3. Better Data Better Models**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_10/Lecture_B/image_02_03.jpg?v=1769975533" width="250">



>* Model quality depends on clean, representative data
>* Data collection choices shape accuracy and fairness

>* Data quality needs careful measurement and representation choices
>* Biased training data encodes discrimination into models

>* Unbalanced data can hide real fairness problems
>* Ongoing, reflective data work builds trustworthy models
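A minimal sketch of how an aggregate metric can hide a problem in an under-represented group: report accuracy and row counts per group alongside the overall figure. The labels, predictions, and group names are hypothetical.

```python
import numpy as np

# Hypothetical predictions; group "B" is under-represented (2 of 10 rows).
y_true = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0, 0, 0])
groups = np.array(["A"] * 8 + ["B"] * 2)


def accuracy_by_group(y_true, y_pred, groups):
    """Return {group: (row_count, accuracy)} so small groups stay visible."""
    report = {}
    for g in np.unique(groups):
        mask = groups == g
        report[str(g)] = (int(mask.sum()), float(np.mean(y_true[mask] == y_pred[mask])))
    return report


overall = float(np.mean(y_true == y_pred))
print("overall accuracy:", overall)                            # 0.9
print("by group:", accuracy_by_group(y_true, y_pred, groups))  # {'A': (8, 1.0), 'B': (2, 0.5)}
```

An overall accuracy of 0.9 looks healthy, yet the model is right only half the time for group B; with just two rows, that estimate is also very uncertain, which is itself a data-collection problem.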



## **3. Monitoring Deployed Models**

### **3.1. Adapting To Data Shifts**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_10/Lecture_B/image_03_01.jpg?v=1769975547" width="250">



>* Real-world data changes, causing data shifts
>* Unmonitored models keep applying outdated patterns, harming decisions

>* User groups and contexts change, causing data shifts
>* Models that are not updated can become inaccurate, unfair, and unreliable

>* Plan strategies to detect and handle data change
>* Update, validate, and review models to stay ethical
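One common way to detect a shift in a single numeric feature is the population stability index (PSI), which compares how training-time values and recent values fall into the same bins. This is a minimal NumPy sketch under simplifying assumptions (one feature, quantile bins from the training data, synthetic normal data). A frequently quoted rule of thumb treats PSI above roughly 0.2 as a meaningful shift, but that cutoff is a heuristic to calibrate per project, not a standard.

```python
import numpy as np


def population_stability_index(reference, current, bins=10, eps=1e-6):
    """Compare a feature's distribution at training time with recent data."""
    # Bin edges come from reference quantiles, giving bins with roughly
    # equal shares of the reference data.
    inner_edges = np.quantile(reference, np.linspace(0.0, 1.0, bins + 1)[1:-1])
    # np.digitize puts values below the first edge in bin 0 and values at or
    # above the last edge in bin (bins - 1), so out-of-range values still count.
    ref_frac = np.bincount(np.digitize(reference, inner_edges), minlength=bins) / len(reference)
    cur_frac = np.bincount(np.digitize(current, inner_edges), minlength=bins) / len(current)
    # Clip empty bins to eps to avoid log(0).
    ref_frac = np.clip(ref_frac, eps, None)
    cur_frac = np.clip(cur_frac, eps, None)
    return float(np.sum((cur_frac - ref_frac) * np.log(cur_frac / ref_frac)))


rng = np.random.default_rng(7)
training_values = rng.normal(0.0, 1.0, 5000)   # feature values seen in training
similar_values = rng.normal(0.0, 1.0, 5000)    # new data from the same source
shifted_values = rng.normal(1.0, 1.0, 5000)    # new data whose mean moved

psi_similar = population_stability_index(training_values, similar_values)
psi_shifted = population_stability_index(training_values, shifted_values)
print(f"PSI similar: {psi_similar:.3f}, PSI shifted: {psi_shifted:.3f}")
```

A high PSI only says the inputs changed; whether the model's decisions got worse still has to be checked against fresh labels.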



### **3.2. Spotting Performance Drift**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_10/Lecture_B/image_03_02.jpg?v=1769975559" width="250">



>* Performance drift is gradual loss of model reliability
>* Monitoring drift prevents hidden harm and lost trust

>* Compare current model performance to original baseline
>* Track metrics over time to catch emerging problems

>* Drift can change errors, fairness, and impact
>* Monitor many metrics to catch subtle misalignment
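Comparing against a baseline can be as simple as checking each monitoring period against the accuracy measured before launch. In this sketch, the weekly accuracies, the 0.90 baseline, and the 0.05 tolerance are hypothetical; a moving average is added because gradual drift is easier to see in smoothed values.

```python
# Hypothetical weekly accuracy after deployment, and the accuracy measured
# on the held-out test set before launch (the baseline).
baseline_accuracy = 0.90
weekly_accuracy = [0.91, 0.90, 0.89, 0.86, 0.84, 0.82]


def weeks_with_drift(history, baseline, tolerance=0.05):
    """Return week numbers where accuracy fell more than `tolerance` below baseline."""
    return [week for week, acc in enumerate(history, start=1)
            if baseline - acc > tolerance]


def moving_average(values, window=3):
    """Smooth noisy weekly values so a gradual decline is easier to see."""
    return [sum(values[i - window + 1:i + 1]) / window
            for i in range(window - 1, len(values))]


print("Flagged weeks:", weeks_with_drift(weekly_accuracy, baseline_accuracy))
print("3-week averages:", [round(v, 3) for v in moving_average(weekly_accuracy)])
```

Here weeks 5 and 6 are flagged, while week 4 (0.86) is still within tolerance even though the trend is already downward; the moving average makes that trend visible earlier.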



### **3.3. Scheduled Model Checkups**

<img src="https://cdn.jsdelivr.net/gh/mhrafiei/contents@main/LFF/Machine Learning for Beginners/Module_10/Lecture_B/image_03_03.jpg?v=1769975569" width="250">



>* Plan regular reviews of deployed models’ behavior
>* Routine checks catch hidden issues before serious harm

>* Plan fixed metrics, datasets, and review questions
>* Use consistent checkups to spot trends and disparities

>* Regular checkups trigger updates, retraining, and documentation
>* They align model changes with goals and users
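A scheduled checkup works best when the metrics, thresholds, and evaluation data are fixed in advance so results are comparable over time. This sketch assumes hypothetical thresholds, metric names, and quarterly dates; `run_checkup` and `CheckupResult` are illustrative helpers, not an established API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, List

# Hypothetical minimum values agreed on before deployment.
THRESHOLDS = {"accuracy": 0.85, "lowest_group_accuracy": 0.80}


@dataclass
class CheckupResult:
    checkup_date: date
    metrics: Dict[str, float]
    actions: List[str] = field(default_factory=list)


def run_checkup(checkup_date, metrics, thresholds=THRESHOLDS):
    """Compare this checkup's metrics with fixed thresholds and list follow-ups."""
    result = CheckupResult(checkup_date, dict(metrics))
    for name, minimum in thresholds.items():
        value = metrics.get(name)
        if value is None:
            result.actions.append(f"{name}: not measured, fix the checkup process")
        elif value < minimum:
            result.actions.append(
                f"{name}: {value:.2f} < {minimum:.2f}, investigate and consider retraining"
            )
    if not result.actions:
        result.actions.append("all checks passed, document results")
    return result


# Two hypothetical quarterly checkups on the same fixed evaluation set.
history = [
    run_checkup(date(2026, 1, 1), {"accuracy": 0.90, "lowest_group_accuracy": 0.86}),
    run_checkup(date(2026, 4, 1), {"accuracy": 0.88, "lowest_group_accuracy": 0.78}),
]
for result in history:
    print(result.checkup_date, result.actions)
```

In the second checkup overall accuracy still passes, but the weakest group has slipped below its threshold, which is exactly the kind of disparity a fixed schedule is meant to surface.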



# <font color="#418FDE" size="6.5" uppercase>**From Idea To Use**</font>


In this lecture, you learned to:
- Outline the main stages of a simple machine learning project lifecycle. 
- Relate each stage of the lifecycle to concepts learned earlier in the course. 
- Explain why ongoing monitoring and revision are important after a model is first deployed. 

<font color='yellow'>Congratulations on completing this course!</font>