# MBAI 448 | Week 4 Assignment: Deep Learning

##### Assignment Overview

This assignment explores how deep learning can be applied to a real-world quality control problem in manufacturing. It is organized into three Acts:

- Act I: Understand the problem and context
- Act II: Prototype a solution with AI technology
- Act III: Socialize the work with stakeholders

##### Assignment Tools

This assignment assumes you will be working with GitHub Copilot in VS Code or Google Colab, and will require you to submit your chat history along with this notebook. If you are curious about how to work effectively with GitHub Copilot, please consult the [VS Code documentation](https://code.visualstudio.com/docs/copilot/overview).

Submissions that demonstrate thoughtless interaction with Copilot (e.g., asking Copilot to just read the notebook and produce all the outputs) will receive reduced credit.

## Business Goal / Case Statement
Lower costs and improve efficiency by automatically identifying defective parts in production.

## Assignment Context

**Relevant Industry and/or Business Function:** Manufacturing

**Description:** TUV Limited produces mass market industrial components and offers precision machining services. The company wants to maintain historical levels of production despite facing pandemic-related staffing challenges. Your boss, the Chief Operating Officer, wants to explore automating aspects of the quality control process involving the flagging of defective or substandard products.

## The Data

**Data Location:** `'./images.zip/...'`

### Act I: Understand the problem and context

#### Step 0: Scope the work in `agents.md`

Before moving forward, create a file named `agents.md` in the project root directory (most likely the same directory that contains this notebook). This file specifies the intended role of AI in this project and serves as reference context for GitHub Copilot as you work.

Your `agents.md` must include the following five sections:

##### 1. What we're building
A one-sentence "elevator pitch" describing the prototype and its primary output (e.g., "An automated defect detection system that classifies manufacturing components as defective or non-defective using deep learning.").

##### 2. How AI helps solve the business problem
2–4 bullet points explaining the specific value-add of the AI components. Focus on the transition from the business "pain point" to the AI "solution."

##### 3. Key file locations and data structure
List the paths that matter (e.g., `./mbai448_week04_assignment.ipynb`, `./images.zip/...`).

##### 4. High-level execution plan
A step-by-step outline of the build process (e.g., 1. Data loading and inspection, 2. Pretrained model evaluation, 3. Model interpretation, 4. Fine-tuning, 5. Performance evaluation). Feel free to ask Copilot for help (or take a peek at the steps in Act II below) to get a sense of how to structure the work.

##### 5. Code conventions and constraints
To ensure the prototype remains manageable, add 1-2 bullet points specifying that code be as simple and straightforward as possible, using standard libraries unless instructed otherwise.

### Act II: Prototype a solution with AI technology

## Prototyping a Deep Learning Classifier for Quality Assurance

In this act, you will prototype an image classification system using a pretrained deep learning model. The goal is not to build a production system, but to understand how such a model behaves when adapted to a specific task.

Throughout this act, use GitHub Copilot as a development assistant, following a disciplined loop in every step:

- **Plan**: Have Copilot draft a clear, plain-language plan describing what needs to happen and in what order.
- **Validate**: Review and refine that plan to ensure it does exactly what the step requires—no more, no less.
- **Execute**: Have Copilot implement the validated plan in code.
- **Check**: Perform one or two concrete actions that confirm the code worked and that you understand the result.

This is exploratory prototyping. The goal is to remain in contact with the system's real behavior at all times.

---

#### Environment Setup

To run this notebook in Google Colab, you'll need to connect to Google Drive and install the required packages.
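
The Colab connection can be sketched as follows. This is a minimal example, assuming the `google.colab` module that Colab provides and its conventional mount point `/content/drive`; the flag name `IN_COLAB` is illustrative, and installing packages is left to you.

```python
# Detect whether the notebook is running in Google Colab.
try:
    import google.colab  # noqa: F401  (only importable inside Colab)
    IN_COLAB = True
except ImportError:
    IN_COLAB = False

if IN_COLAB:
    from google.colab import drive

    # Prompts for authorization, then exposes Drive under /content/drive.
    drive.mount("/content/drive")
else:
    print("Not running in Colab; using the local file system.")
```

Outside Colab the import fails and the cell falls through to the local branch, so the same notebook can run in both environments.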

If running locally in VS Code, you may want to create and activate a Python virtual environment.

##### On macOS/Linux:
```
python -m venv venv
source venv/bin/activate
```

##### On Windows:
```
python -m venv venv
venv\Scripts\activate
```

Once your virtual environment is activated, you can set it as the kernel for this notebook in the top right corner of your notebook pane.

---

## Step 1: Load and inspect the image data

Before using any model, you need to understand what data you are working with and how it is organized.

### Plan
Have Copilot create a plan to:
- load the image dataset from disk
- split it into training, validation, and test sets
- display class labels and counts
- show a small sample of labeled images

### Validate
Ensure the plan:
- makes dataset structure visible rather than implicit
- clearly distinguishes training, validation, and test data
- allows you to see actual image–label pairs

### Execute
Implement the validated plan in code.

### Check
- Print the number of images in each split.
- Display a small grid of images with their labels.

**Food for thought:** What kinds of variation (lighting, angle, background) are present in this data?
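
As a starting point for the split described in this step, here is a minimal sketch of a stratified train/validation/test split over `(path, label)` pairs. The class names (`ok`, `defective`), the file paths, and the 70/15/15 proportions are assumptions for illustration; the real dataset may be organized differently.

```python
import random
from collections import Counter, defaultdict


def stratified_split(items, train_frac=0.7, val_frac=0.15, seed=42):
    """Split (path, label) pairs into train/val/test, preserving class balance."""
    by_label = defaultdict(list)
    for path, label in items:
        by_label[label].append((path, label))

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    splits = {"train": [], "val": [], "test": []}
    for label in sorted(by_label):
        group = by_label[label]
        rng.shuffle(group)
        n_train = round(len(group) * train_frac)
        n_val = round(len(group) * val_frac)
        splits["train"] += group[:n_train]
        splits["val"] += group[n_train:n_train + n_val]
        splits["test"] += group[n_train + n_val:]  # remainder goes to test
    return splits


# Hypothetical stand-in for the real file listing: 60 "ok" and 40 "defective" images.
items = [(f"ok/{i}.jpg", "ok") for i in range(60)]
items += [(f"defective/{i}.jpg", "defective") for i in range(40)]

splits = stratified_split(items)
for name, subset in splits.items():
    counts = Counter(label for _, label in subset)
    print(name, len(subset), dict(counts))
```

Splitting within each class keeps the defect rate roughly equal across the three sets, which makes later comparisons between validation and test results easier to trust.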

```
# write Step 1 code below
```



---

## Step 2: Load a pretrained image classifier and inspect what it outputs

Before applying a model, you should understand what kind of predictions it is designed to produce.

### Plan
Have Copilot create a plan to:
- load a pretrained ResNet image classifier (the [torchvision models documentation](https://docs.pytorch.org/vision/stable/models.html) lists several)
- print or summarize its structure in a readable way
- identify how many output categories it predicts
- clarify what those categories represent

### Validate
Ensure the plan:
- makes it clear the model predicts ImageNet categories
- identifies the final prediction layer in plain language
- produces visible output you can reference later

### Execute
Implement the validated plan in code.

### Check
- Print the model summary.
- Confirm the number and meaning of the output classes.

**Food for thought:** Why might a pretrained model be useful even when its labels don't match your task?

```
# write Step 2 code below
```



---

## Step 3: Observe how the pretrained model behaves on your images

This establishes baseline behavior before any task-specific adaptation.

### Plan
Have Copilot create a plan to:
- run inference on a small set of your images
- display each image with its top predicted class names
- show confidence or probability scores

### Validate
Ensure the plan:
- prints readable class names, not numeric IDs
- does not train or modify the model
- makes prediction confidence visible

### Execute
Run the inference code.

### Check
- Inspect predictions for several images.
- Confirm you can explain what the model is calling each image.

**Food for thought:** Why does the model think your images are animals?
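
The confidence scores in this step come from converting the model's raw outputs (logits) into probabilities. The sketch below shows that conversion and a top-k ranking in plain Python; the four-class label list and the logit values are made up for illustration, whereas a real ImageNet model produces 1,000 logits per image.

```python
import math


def softmax(logits):
    """Convert raw model scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k(logits, class_names, k=3):
    """Return the k most probable (class_name, probability) pairs."""
    probs = softmax(logits)
    ranked = sorted(zip(class_names, probs), key=lambda pair: pair[1], reverse=True)
    return ranked[:k]


# Hypothetical logits for one image over a toy four-class label set.
class_names = ["nail", "screw", "hook", "goldfish"]
logits = [2.0, 1.0, 0.5, -1.0]

for name, prob in top_k(logits, class_names):
    print(f"{name}: {prob:.1%}")
```

Because softmax always distributes 100% across the available classes, a high score only means "most likely among the labels the model knows," not "correct."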

```
# write Step 3 code below
```



---

## Step 4: Inspect what the pretrained model is paying attention to

Predictions alone don't explain *why* the model behaves as it does.

### Plan
Have Copilot create a plan to:
- apply a model interpretation method to the pretrained classifier (the Captum Python library should work)
- generate visual explanations for selected predictions
- overlay those explanations on the original images

### Validate
Ensure the plan:
- produces visual outputs you can inspect
- applies interpretation to specific examples
- includes both a reasonable and a surprising prediction

### Execute
Generate interpretation visualizations.

### Check
- Display at least two interpretation examples.
- Confirm that highlighted regions are visible and meaningful.

**Food for thought:** What does this suggest about how the model "sees" defects?
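
To build intuition for what an interpretation method computes, here is a minimal sketch of occlusion sensitivity: mask one patch at a time and record how much the score drops. This is a hand-rolled illustration in NumPy, not Captum's implementation, and `toy_score` is a hypothetical stand-in for a real model's class score.

```python
import numpy as np


def occlusion_map(image, score_fn, patch=4, baseline=0.0):
    """Score drop when each patch is masked; a larger drop marks a more important region."""
    height, width = image.shape
    base_score = score_fn(image)
    heat = np.zeros((height, width), dtype=float)
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            occluded = image.copy()
            occluded[top:top + patch, left:left + patch] = baseline
            heat[top:top + patch, left:left + patch] = base_score - score_fn(occluded)
    return heat


# Toy stand-in for a model: the "score" is the mean brightness of the top-left corner.
def toy_score(img):
    return float(img[:4, :4].mean())


image = np.ones((8, 8))
heat = occlusion_map(image, toy_score)
print(heat)
```

In this toy case only the top-left patch lights up, because it is the only region the scoring function depends on. With a real classifier, the same kind of map can be overlaid on the image to show which regions drive a prediction.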

```
# write Step 4 code below
```



---

## Step 5: Adapt the model to your task (fine-tuning)

Now you modify the model so it learns to distinguish defective from non-defective parts.

### Plan
Have Copilot create a plan to:
- replace the model's final prediction layer with task-specific labels
- train the model on your labeled images
- track training and validation behavior

### Validate
Ensure the plan:
- uses your task labels rather than ImageNet labels
- updates model weights using your dataset
- makes training progress observable

### Execute
Run the fine-tuning process.

### Check
- Confirm training completes successfully.
- Plot or print training and validation metrics.

**Food for thought:** What information is the model learning that it did not have before?

```
# write Step 5 code below
```



---

## Step 6: Evaluate and compare performance before and after fine-tuning

You need evidence that adaptation changed behavior in meaningful ways.

### Plan
Have Copilot create a plan to:
- evaluate the fine-tuned model on test data
- generate a confusion matrix
- compare results to the untuned model

### Validate
Ensure the plan:
- uses the same evaluation data for both models
- makes different error types visible
- avoids mixing training and test data

### Execute
Run the evaluation and comparison.

### Check
- Produce confusion matrices for both models.
- Identify one improvement and one remaining weakness.

**Food for thought:** Which kind of error would be most costly in production?
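
To make the error types concrete, the sketch below counts the four confusion-matrix cells for a binary defect task and attaches a cost to each kind of mistake. The labels, predictions, and per-error costs are all hypothetical; substitute your own test results and figures that reflect the business.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive="defective"):
    """Count TP, FP, FN, and TN, treating `positive` as the class of interest."""
    counts = Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            counts["TP" if pred == positive else "FN"] += 1
        else:
            counts["FP" if pred == positive else "TN"] += 1
    return counts


# Hypothetical test-set labels and predictions.
y_true = ["defective", "defective", "defective", "ok", "ok", "ok", "ok", "ok"]
y_pred = ["defective", "defective", "ok", "ok", "ok", "ok", "defective", "ok"]

c = confusion_counts(y_true, y_pred)
recall = c["TP"] / (c["TP"] + c["FN"])      # share of real defects that were caught
precision = c["TP"] / (c["TP"] + c["FP"])   # share of flags that were real defects

# Hypothetical costs: a missed defect is assumed to cost far more than a false alarm.
COST_FN, COST_FP = 50.0, 5.0
total_cost = c["FN"] * COST_FN + c["FP"] * COST_FP

print(dict(c), f"recall={recall:.2f}", f"precision={precision:.2f}", f"cost={total_cost}")
```

Two models with the same accuracy can have very different costs once false negatives and false positives are priced separately, which is worth having on hand for the Act III conversations.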

```
# write Step 6 code below
```



---

## Step 7: Re-inspect model behavior after adaptation

Adaptation can change what the model attends to—not just accuracy.

### Plan
Have Copilot create a plan to:
- reapply the same interpretation method to the fine-tuned model
- use the same example images as before
- present before-and-after comparisons

### Validate
Ensure the plan:
- reuses the same interpretation technique
- compares like with like
- produces clearly comparable outputs

### Execute
Generate post-fine-tuning interpretation results.

### Check
- Display at least one before-and-after comparison.
- Confirm whether attention shifted toward defect-relevant regions.

**Food for thought:** If this model drifts over time, what would alert you first?
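
A before-and-after comparison is easier to discuss with a number attached. One option, sketched here under stated assumptions, is the share of total attribution that falls inside a region you have marked as the defect. The attribution maps and the mask below are invented 4x4 arrays; in practice they would come from your interpretation method and a hand-drawn annotation.

```python
import numpy as np


def attribution_share(heatmap, region_mask):
    """Fraction of total absolute attribution that falls inside the masked region."""
    weights = np.abs(heatmap)
    total = weights.sum()
    if total == 0:
        return 0.0
    return float(weights[region_mask].sum() / total)


# Hypothetical attribution maps for the same image, before and after fine-tuning.
before = np.array([[4, 4, 0, 0],
                   [4, 4, 0, 0],
                   [0, 0, 1, 1],
                   [0, 0, 1, 1]], dtype=float)
after = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0],
                  [0, 0, 4, 4],
                  [0, 0, 4, 4]], dtype=float)

# Hypothetical hand-drawn mask marking where the defect is (bottom-right quadrant).
defect_mask = np.zeros((4, 4), dtype=bool)
defect_mask[2:, 2:] = True

print(f"before: {attribution_share(before, defect_mask):.0%}")
print(f"after:  {attribution_share(after, defect_mask):.0%}")
```

A rising share suggests the model is attending more to the defect region, though a single image is weak evidence; repeating the measurement across several examples gives a more reliable picture.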

```
# write Step 7 code below
```



---

## End of Act II

At this point, you should have direct evidence of how a pretrained model behaves, how fine-tuning changes that behavior, and what tradeoffs remain. Use these observations to inform Act III discussions with stakeholders.

Before moving on to Act III, create a file named `README.md` in the project root.

This README should capture the current state of the prototype as if you were handing it off to a colleague. Keep it concise and grounded in what actually exists.

### 1. What this prototype does
In one sentence, clearly describe the capability that was built and the problem it is intended to address.

### 2. How it works (at a high level)
In a few bullet points, specify:
- what data the system operates over,
- what representation or model it uses,
- how results are produced.

### 3. Limitations and open questions
Briefly note:
- the most important limitations you observed or can foresee, and
- any open questions that would need to be addressed before broader use.

This README will be used as reference context in Act III.

---

## Act III: Socialize the work with stakeholders

You have built and evaluated a working prototype for automated defect detection. Now you need to think about what it would mean to use this system in practice.

In this act, you will have conversations with three colleagues who each engage with the system from a different professional perspective. Each one surfaces a distinct set of pressures that emerge when a deep learning model is introduced into a real manufacturing environment.

Your goal is not to convince them that the model is "good," but to reckon with how its behavior intersects with human judgment, organizational responsibility, and operational reality.

---

### Colleague Perspectives

You will speak with:

- A **Manufacturing Operations Manager** focused on how automated defect detection changes day-to-day workflows on the factory floor — including when operators trust the system, when they override it, and how errors affect throughput and staffing.

- A **Quality & Compliance Lead** focused on accountability, auditability, and risk — including what happens when defective parts slip through, how decisions are justified after the fact, and whether the system can be relied on in regulated or safety-critical contexts.

- A **Production Economics Manager** focused on efficiency and cost — including the tradeoffs between false positives and false negatives, how the system affects production velocity, and whether gains in automation introduce new forms of friction elsewhere in the process.

---

### How to Approach These Conversations

Each conversation should feel like a real internal discussion about a live prototype.

In these interactions, you should be prepared to:
- explain how the model behaves in concrete terms,
- reference evidence from your prototype (e.g., confusion matrices, example predictions, interpretation outputs),
- articulate tradeoffs clearly in plain, cross-functional language,
- and acknowledge uncertainty where it exists.

These colleagues are not trying to block the work — but they are responsible for understanding its implications within their domains.

When a colleague has enough information to understand the risks, assumptions, and consequences involved, the conversation will naturally come to a close.

---

### Submission

- Save the notebook you have been working in, along with the other files you created in your repo (e.g., `agents.md`, `README.md`).
- Export your Copilot Chat and save it as a `.txt`, `.json`, or `.md` file in the same directory as the files above.
- If you worked in Google Colab, stop the session in which the notebook was running.
- **Upload your completed Notebook to the [Canvas page for Assignment 4](https://canvas.northwestern.edu/courses/245397/assignments/1668983).**