<a href="https://colab.research.google.com/github/chefs-kiss/ML_J2026/blob/main/PA2_Evaluation_with_BreastCancer.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

Name:

Who you worked with:

## Objectives
The goals of this project are to:
*   Implement different cross-validation techniques to evaluate model performance
* Audit a model to discuss the ethical considerations in model selection and performance evaluation

## Overview
In this assignment, you will explore the Wisconsin Breast Cancer dataset and focus on key aspects of model evaluation and resampling. You will comment on the complexity of the models in a prepared workflow, discuss resampling techniques, and consider what different evaluation metrics imply about model performance. You will also audit the algorithms using the ethical matrix framework discussed in class.

## Schedule
Here is the suggested schedule for working on this project:
*   Over the weekend, read through the project instructions and complete Task 0.
*   By Sunday, 2/23, complete Tasks 1-2 of the project, and start Task 3 of the project.
*   By Tuesday, 2/25, complete Tasks 3-4 of the project, and start Task 5.
*   By Wednesday, 2/26, complete Task 5, check your solutions against the grading rubric (included at the end of this workbook), and submit your workbook URL through Moodle.

This project is due on Thursday, 2/27, by 11:59pm.


# Task 0: Breast Cancer Workflow

You will use the Breast Cancer dataset from sklearn.datasets. It contains features of cell nuclei obtained from breast cancer biopsies, and the target variable indicates whether the tumor is malignant or benign.


In [None]:
from sklearn.datasets import load_breast_cancer
import pandas as pd

# Load the dataset
data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name="target")

## ✏ Question 1: Describe dataset

* Describe the type of data in our dataset.
* What is our target?
* What does our feature set contain?
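
A quick inspection of the loaded objects can help with these questions. The sketch below is only one possible starting point; it reloads the data so that it runs on its own and prints a few summaries without interpreting them.

```python
from sklearn.datasets import load_breast_cancer
import pandas as pd

data = load_breast_cancer()
X = pd.DataFrame(data.data, columns=data.feature_names)
y = pd.Series(data.target, name="target")

# Number of samples (rows) and features (columns)
print("Shape of X:", X.shape)

# Data type shared by the feature columns
print(X.dtypes.value_counts())

# The labels that the integer codes in y stand for
print("Target names:", list(data.target_names))

# How many samples fall in each class
print(y.value_counts())
```

`X.describe()` and `print(data.DESCR)` are two further calls that may be useful here.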

## ✏ Question 2: The who behind the data
To answer the following questions, you may have to search the internet with a query like "wisconsin breast cancer dataset who is in the data" or something similar.
* Can you find who curated this dataset?
* Include a url to cite this information.
* Can you find the demographics of the individuals in the dataset?
* In your opinion, why would these types of questions be important to know when dealing with the data?

# Task 1: Model Complexity


Before jumping into evaluation and cross-validation, we're going to start by performing basic preprocessing and setting up three models for comparison: a null model (also called a baseline model), a basic model, and a complex model.

**Null Model**: This will predict the majority class from our target variable.

**Basic Model**: This will be a Logistic Regression classifier.

**Complex Model**: This will be a Random Forest classifier.
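
To see what "predict the majority class" means in practice, here is a minimal sketch on made-up data. The arrays `X_toy` and `y_toy` and the name `null_demo` are hypothetical and are not part of the assignment workflow.

```python
import numpy as np
from sklearn.dummy import DummyClassifier

# Hypothetical toy data: 6 samples of class 1 and 2 samples of class 0
X_toy = np.arange(8).reshape(-1, 1)
y_toy = np.array([1, 1, 1, 1, 1, 1, 0, 0])

null_demo = DummyClassifier(strategy="most_frequent")
null_demo.fit(X_toy, y_toy)

# The features are ignored: every prediction is the majority class
toy_pred = null_demo.predict(X_toy)
print(toy_pred)

# Accuracy equals the share of the majority class in the labels being scored
toy_accuracy = null_demo.score(X_toy, y_toy)
print(toy_accuracy)
```

Any model worth using should do better than this baseline on the metrics that matter for the problem.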

In [None]:
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, confusion_matrix


X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


scaler = StandardScaler()
X_train = scaler.fit_transform(X_train)
X_test = scaler.transform(X_test)


null_model = DummyClassifier(strategy='most_frequent', random_state=42)
basic_model = LogisticRegression(solver='liblinear', random_state=42)
complex_model = RandomForestClassifier(random_state=42)


null_model.fit(X_train, y_train)
basic_model.fit(X_train, y_train)
complex_model.fit(X_train, y_train)


null_pred = null_model.predict(X_test)
basic_pred = basic_model.predict(X_test)
complex_pred = complex_model.predict(X_test)


print("Null Model Accuracy:", accuracy_score(y_test, null_pred))
print("Basic Model Accuracy:", accuracy_score(y_test, basic_pred))
print("Complex Model Accuracy:", accuracy_score(y_test, complex_pred))


print("Null Model Confusion Matrix:", confusion_matrix(y_test, null_pred), sep="\n")
print("Basic Model Confusion Matrix:", confusion_matrix(y_test, basic_pred), sep="\n")
print("Complex Model Confusion Matrix:", confusion_matrix(y_test, complex_pred), sep="\n")

## 💻 Question 3: Comments

* Add comments to the code above that describe what each section of code is doing (to the best of your ability). You may want to consult our workbook on cross-validation (EVL2).

## ✏ Question 4: Accuracy Discussion

* Compare the accuracy of the three models. Does increasing model complexity drastically change the accuracy? How does the null (baseline) model compare to the basic and complex models?


# Task 2: Resampling

For this task, we're going to use different cross-validation techniques to evaluate the models' performance more robustly.

* Stratified K-Fold Cross-Validation (to maintain class distribution) using scikit-learn `StratifiedKFold`
* Repeated Cross-Validation (to get more robust performance metrics) using scikit-learn `RepeatedStratifiedKFold`
* Bootstrapping (random sampling with replacement) using scikit-learn `resample`
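
Before applying these techniques to the real data, it may help to see what they do to a small set of labels. The sketch below uses hypothetical toy arrays (`X_toy`, `y_toy`) to show that stratified folds preserve the class ratio and that a bootstrap sample repeats some rows while leaving others out.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.utils import resample

# Hypothetical toy labels: 20 samples, 25% in class 0 and 75% in class 1
X_toy = np.arange(20).reshape(-1, 1)
y_toy = np.array([0] * 5 + [1] * 15)

# Stratified K-fold: each held-out fold keeps the 25/75 class ratio
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
fold_ratios = []
for train_idx, val_idx in cv.split(X_toy, y_toy):
    fold_ratios.append(y_toy[val_idx].mean())
print("Share of class 1 per fold:", fold_ratios)

# Bootstrap: draw as many row indices as the original has, with replacement,
# so some rows appear more than once and others not at all
boot_idx = resample(np.arange(20), random_state=0)
print("Rows drawn:", len(boot_idx))
print("Distinct rows drawn:", len(np.unique(boot_idx)))
```

Repeated stratified K-fold simply reruns the first procedure several times with different shuffles, which yields `n_splits * n_repeats` scores.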

In [None]:
from sklearn.utils import resample
from sklearn.model_selection import cross_val_score, StratifiedKFold, RepeatedStratifiedKFold
import numpy as np

In [None]:
# Stratified K-Fold Cross-Validation
stratified_cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
stratified_scores = cross_val_score(basic_model, X_train, y_train, cv=stratified_cv)
print("Stratified K-Fold Cross-Validation Scores:", stratified_scores)
print("Mean Accuracy:", np.mean(stratified_scores))

In [None]:
# Repeated Cross-Validation
repeated_cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=42)
repeated_scores = cross_val_score(basic_model, X_train, y_train, cv=repeated_cv)
print("Repeated Cross-Validation Scores:", repeated_scores)
print("Mean Accuracy:", np.mean(repeated_scores))

In [None]:
# Bootstrapping (Resampling)
from sklearn.base import clone

bootstrap_scores = []
for i in range(50):
    X_resampled, y_resampled = resample(X_train, y_train, random_state=i)
    boot_model = clone(basic_model).fit(X_resampled, y_resampled)
    score = boot_model.score(X_test, y_test)
    bootstrap_scores.append(score)

print("Bootstrapping Accuracy (50 resamples):", np.mean(bootstrap_scores))

## 💻 Question 5: Comments
This is similar to Question 3.
* Add comments to each of the code chunks above.

## ✏ Question 6: Accuracy Discussion
This is the same as Question 4, but now considering the resampled accuracy metrics.
* Describe how you would compare the accuracy of the three models.


## ✏ Question 7: Continued evaluation

Even if a model has high accuracy, it may not be the best choice for our given situation.

* Describe why this may be.
* What would be a solution for this problem?

# Task 3: Evaluation Metrics

In this task, we will evaluate the performance of the models using different evaluation metrics such as accuracy, precision, recall, and F1 score.
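
As a reminder of how these metrics are defined, the following sketch computes each one by hand from confusion-matrix counts and prints it next to the scikit-learn result. The labels `y_true` and `y_hat` are hypothetical toy values, not output from the models above.

```python
import numpy as np
from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score)

# Hypothetical true labels and predictions (1 is the positive class)
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_hat = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# For binary labels, ravel() returns the counts in the order tn, fp, fn, tp
tn, fp, fn, tp = confusion_matrix(y_true, y_hat).ravel()

accuracy = (tp + tn) / (tp + tn + fp + fn)  # share of all predictions that are correct
precision = tp / (tp + fp)                  # share of predicted positives that are correct
recall = tp / (tp + fn)                     # share of actual positives that were found
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two

print("Accuracy:", accuracy, accuracy_score(y_true, y_hat))
print("Precision:", precision, precision_score(y_true, y_hat))
print("Recall:", recall, recall_score(y_true, y_hat))
print("F1 Score:", f1, f1_score(y_true, y_hat))
```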

In [None]:
from sklearn.metrics import precision_score, recall_score, f1_score

# Evaluate each fitted model on the test set
models = [(null_model, "null model"), (basic_model, "basic model"), (complex_model, "complex model")]
for model, name in models:
  y_pred = model.predict(X_test)
  print(f"{name}: {model}")
  print("Accuracy:", round(accuracy_score(y_test, y_pred), 3))
  print("Precision:", round(precision_score(y_test, y_pred), 3))
  print("Recall:", round(recall_score(y_test, y_pred), 3))
  print("F1 Score:", round(f1_score(y_test, y_pred), 3), "\n")


## Questions 8-11: Interpret Metrics

For each model, interpret the following evaluation metrics:
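
When interpreting precision and recall, it matters which class counts as "positive". By default, scikit-learn's binary metrics treat the label 1 as positive (`pos_label=1`), and in this dataset 0 is coded as malignant and 1 as benign. The sketch below, with hypothetical toy labels, shows how the `pos_label` argument changes which class a metric describes.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.metrics import recall_score

data = load_breast_cancer()
# Index 0 is the name for target value 0, index 1 for target value 1
print(list(data.target_names))

# Hypothetical labels and predictions using the same 0/1 coding
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1, 1, 1])
y_hat = np.array([0, 0, 1, 1, 1, 1, 1, 1, 1, 1])

# Default (pos_label=1): share of class-1 samples that were predicted as 1
recall_class_1 = recall_score(y_true, y_hat)

# pos_label=0: share of class-0 samples that were predicted as 0
recall_class_0 = recall_score(y_true, y_hat, pos_label=0)

print("Recall for class 1:", recall_class_1)
print("Recall for class 0:", recall_class_0)
```

The same argument is accepted by `precision_score` and `f1_score`, so you may want to report the metrics for both classes before answering the questions below.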

## ✏ Question 8: Accuracy

Null model:

Basic model:

Complex model:


## ✏ Question 9: Precision

Null model:

Basic model:

Complex model:

## ✏ Question 10: Recall

Null model:

Basic model:

Complex model:

## ✏ Question 11: Which is the best
Once you have interpreted all of the metrics, choose which model is best for this problem. You'll need to consider the trade-offs between precision, recall, and accuracy and how they affect the model's suitability for real-world application, especially in a healthcare context.

* Discuss which of the models is the best performing.

# Task 4: Ethical Matrix and Audit

In this task, you will create an ethical matrix to audit the algorithms you have been working with. The goal is to analyze the broader implications of your machine learning model and how it might affect different stakeholders. Following the guidelines outlined in class, your ethical audit should include key questions that explore the potential harms, benefits, fairness, and accountability of the algorithm.

## ✏ Question 12: Define the key stakeholders

In any algorithmic system, there are various stakeholders who will be affected by the decisions made by the model. For example, one stakeholder is the patient being screened for breast cancer.

* Add at least three more stakeholders:

## Questions 13-15: Identify the ethical dimensions

For each stakeholder, consider the potential benefits and harms of using the model. Also, think about the ethical principles of fairness and accountability.

The main issues we've covered in class are listed below:
* **Benefits**: What positive outcomes might each stakeholder experience if the model is deployed?
* **Harms**: What potential negative consequences might arise for each stakeholder? Could the model cause harm or lead to incorrect conclusions?
* **Fairness**: Is the model equally fair to all stakeholders, especially those from under-represented or vulnerable groups? Does the model avoid reinforcing bias?
* **Accountability**: Who is responsible if the model makes a mistake? What steps should be taken if the model's predictions are inaccurate or harmful?

If you decide to consider different issues, make sure to update the text boxes below to account for that. For example, if you want to consider something like medical advances you might replace the issue `fairness` with `medical advances`.

## ✏ Question 13:

Stakeholder 1:

Benefits:

Harms:

Fairness:

Accountability:



## ✏ Question 14:

Stakeholder 2:

Benefits:

Harms:

Fairness:

Accountability:



## ✏ Question 15:

Stakeholder 3:

Benefits:

Harms:

Fairness:

Accountability:



## ✏ Question 16: Mitigating Harm

Based on your matrix, come up with two concrete actions that can be taken to mitigate potential harms.

Example action: we'd like to ensure the model is interpretable for healthcare providers so they can understand and trust its predictions. This means we'd need to choose a model that is less complex but still robust enough to give good results.

* First option:


* Second option:

# Task 5: Reflection

Take a moment to reflect on this assignment.

## ✏ Question 17:

What did you like about it? What could be improved? Your answers will not affect your overall grade. This feedback will be used to improve future programming assignments.



# Submission

You will submit your work through Moodle. For this project, submit the URL of your Colab workbook. Make sure sharing is enabled on your notebook, then create a link and use it as your submission.

# Grading
The points for the following accomplishments total 20. The fraction of points earned out of 20 will be multiplied by 5 to get your final score (e.g., 17 points earned gives 17/20 * 5 = 4.25).
* (1pt) Task0 q1: You have described the information in the dataset, and identified target and feature sets.
* (1pt) Task0 q2: You have sourced information about the dataset and included a URL.
* (1pt) Task0 q2: You have considered why this would be important.
* (2pt) Task1 q3 and Task2 q5: You have added informative comments to the code
* (1pt) Task1 q4 and Task2 q6: You have discussed an alternative to accuracy.
* (1pt) Task2 q7: Two reasonable options have been provided.
* (1pt) Task3 q8: You have correctly interpreted all three accuracy metrics.
* (1pt) Task3 q9: You have correctly interpreted all three precision metrics.
* (1pt) Task3 q10: You have correctly interpreted all three recall metrics.
* (1pt) Task3 q11: You have correctly identified the best model out of the null, basic, and complex models.
* (1pt) Task4 q12: You've identified at least three stakeholders.
* (3pt) Task4 q13-15: You've filled out the rest of the ethical matrix.
* (1pt) Task4 q13-15: Your ethical matrix is thoughtful.
* (1pt) Task4 q16: You have identified two ways to mitigate harm.
* (1pt) Task4 q16: Your solutions to mitigate harm are thoughtful.
* (1pt) Task5 q17: You have reflected on the assignment.
* (1pt) Task5 q17: Your reflection is thoughtful.
