**Jacob Petty, Sardor Nodirov, and Saad Khan**

Spring 2026

CS 443: Bio-inspired Machine Learning

Project 1: Hebbian Learning

#### Week 4: Decoding class labels from Competitive Hebbian Network activations

You will use single layer artificial neural networks to **decode** (i.e. predict) the class label corresponding to each MNIST and CIFAR-10 sample **encoded** (i.e. processed) by your Competitive Hebbian Network. This will take advantage of the Competitive Hebbian Network weights that you saved off last week. Once you obtain the Competitive Hebbian Network activations, you will predict the class labels and compute the classification accuracy for each dataset obtained by this **encoder-decoder** neural network architecture.

In [None]:
import time
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf

plt.style.use(['seaborn-v0_8-colorblind', 'seaborn-v0_8-darkgrid'])
plt.show()
plt.rcParams.update({'font.size': 18})

np.set_printoptions(suppress=True, precision=3)

%load_ext autoreload
%autoreload 2

## Task 9: Decode class labels from Hebbian network activations

The goal of this task is to train your linear and nonlinear decoder networks to classify MNIST digits (then later on CIFAR-10 classes) based on the Hebbian activations to each sample.

In [None]:
from image_datasets import get_dataset, train_val_split
from decoder_nets import LinearDecoder, NonlinearDecoder
from hebb_net import HebbNet

### 9a. Preparing decoder inputs

In the cell below:
1. Load the MNIST train, validation, and test sets. Use the default validation split.
2. Process them with your Competitive Hebbian Network (i.e. compute their corresponding netIn values) to get the input for your decoders.

**Tips:**
- Your Hebbian network constructor has a keyword argument that you can use to load the weights from a previously trained network. You should **NOT** retrain your Hebbian network here!
- When creating your Hebbian network object, remember to build it with the same hyperparameters as you did last week (e.g. number of neurons, `k` value).
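As a rough illustration of what "processing" samples with the network means, the sketch below computes net input as a matrix product between flattened samples and a weight matrix. The helper name `compute_net_in`, the `(features, neurons)` weight layout, and the random stand-in arrays are assumptions made for illustration; your `HebbNet` class may organize this differently.

```python
import numpy as np

def compute_net_in(x, wts):
    """Net input of every encoder neuron to every sample.

    x:   (N, M) flattened samples
    wts: (M, H) encoder weights (assumed layout: features x neurons)
    """
    return x @ wts

rng = np.random.default_rng(0)
x_demo = rng.random((5, 784))           # stand-in for 5 flattened MNIST images
wts_demo = rng.normal(size=(784, 16))   # stand-in for saved weights (16 neurons)
net_in_demo = compute_net_in(x_demo, wts_demo)
print(net_in_demo.shape)
```

The same computation would be applied separately to the train, validation, and test splits so that each decoder input has one row per sample and one column per Hebbian neuron.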

### 9b. Train linear decoder (MNIST)

Train your softmax classifier on the Hebbian network `net_in` activations obtained from processing the MNIST training set. Keep default hyperparameters except:
1. Play around with the mini-batch size. Try starting with `256` and adjust as needed.
2. Feel free to change the `patience` depending on your patience. A patience around `3-7` should produce good results.
3. Try a learning rate decay patience of `3-5` with max decays set to `3-4`. Adjust as needed.
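To build intuition for how the early stopping patience interacts with the learning rate decay patience, here is a small sketch that replays a list of validation losses and records when a decay or a stop would be triggered. It is one possible interpretation, not the implementation in your decoder class: the function name, the "no new best loss for `patience` epochs" rule, and the decay factor of `0.5` are all assumptions.

```python
def simulate_training_schedule(val_losses, lr=1e-3, patience=5, lr_patience=3,
                               lr_decay_factor=0.5, lr_max_decays=3):
    """Replay validation losses; report learning rate decays and early stop."""
    best = float('inf')
    epochs_since_best = 0    # drives early stopping
    epochs_since_decay = 0   # drives learning rate decay
    num_decays = 0
    events = []
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best = loss
            epochs_since_best = 0
            epochs_since_decay = 0
        else:
            epochs_since_best += 1
            epochs_since_decay += 1
        if epochs_since_decay >= lr_patience and num_decays < lr_max_decays:
            lr *= lr_decay_factor
            num_decays += 1
            epochs_since_decay = 0
            events.append((epoch, 'decay', lr))
        if epochs_since_best >= patience:
            events.append((epoch, 'stop', lr))
            break
    return events

# Made-up validation losses that stop improving after epoch 1
demo_events = simulate_training_schedule(
    [1.0, 0.9, 0.95, 0.96, 0.97, 0.98, 0.99, 1.0])
print(demo_events)
```

Because the learning rate patience is shorter than the early stopping patience, the sketch gets a chance to lower the learning rate before training is abandoned, which is the reason for keeping the two values in that order.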

#### Guidelines

- **DO NOT COPY AND PASTE HYPERPARAMETERS FROM THE IRIS TEST CODE CELLS!** They will not work well for MNIST 🙄
- Training should be fairly quick (no more than a few minutes).
- **Remember:** you are **NOT** training the linear classifier **on MNIST**; you are training it on the `net_in` values produced by the Competitive Hebbian Network that you trained last week and loaded above!

Your results should be comparable to those of the linear decoder trained directly on the MNIST samples.

### 9c. Analyze Linear Decoder performance

Create a well-labeled plot showing the training and validation loss over epochs. Place the test accuracy of the linear classifier in your plot title.

**Note:** Because the initial training and validation losses are so large compared to the final values, plotting all the values may obscure details about the eventual loss curve. I would suggest trimming out the training and validation loss values for the first few epochs to highlight the long-term trend.
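One way to trim the early epochs is sketched below. The function name, the `skip` parameter, and the made-up loss values are illustrative assumptions, and the sketch assumes the training and validation losses were recorded once per epoch so that the two lists have equal length.

```python
import numpy as np
import matplotlib.pyplot as plt

def plot_trimmed_losses(train_loss, val_loss, test_acc, skip=3):
    """Plot training/validation loss, omitting the first `skip` epochs."""
    epochs = np.arange(len(train_loss))[skip:]
    fig, ax = plt.subplots()
    ax.plot(epochs, np.asarray(train_loss)[skip:], label='Train')
    ax.plot(epochs, np.asarray(val_loss)[skip:], label='Validation')
    ax.set_xlabel('Epoch')
    ax.set_ylabel('Loss')
    ax.set_title(f'Linear decoder (test acc: {100 * test_acc:.1f}%)')
    ax.legend()
    return fig, ax

# Made-up loss values purely to exercise the function
demo_train = [9.0, 4.0, 2.0, 0.5, 0.4, 0.35, 0.3]
demo_val = [8.0, 3.5, 1.8, 0.6, 0.5, 0.45, 0.44]
fig, ax = plot_trimmed_losses(demo_train, demo_val, test_acc=0.9, skip=3)
```

Keeping the x-axis in original epoch numbers (starting at `skip` rather than 0) makes it clear to the reader that the early epochs were left out.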

### 9d. Train nonlinear decoder (MNIST)

Repeat what you did for the linear classifier with the nonlinear classifier.

**Note there is one additional step:** Once you get the Hebbian network `net_in` values for the train/validation/test sets, the nonlinear decoding network proposed by Krotov & Hopfield (2019) assumes that the Hebbian network `net_in` values ($h_{ij}$) that serve as the input to the decoder are transformed by the following activation function:

$$x_{ij} = \max(h_{ij}, 0)^n$$

where $h_{ij}$ are the Hebbian network `net_in` values. In other words, apply ReLU to the `net_in` values then raise the result to the power `n`. By default, we assume that the hyperparameter $n=4.0$.

You may implement this preprocessing step in the `preprocess_nonlinear` function in `image_datasets.py`. **This additional ReLU step needs to be performed on the `net_in` values representing each of the decoder train, validation, AND test sets!!**
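A minimal NumPy sketch of this transformation is shown below. The function name `relu_power_sketch` is hypothetical and its signature is not necessarily the one expected for `preprocess_nonlinear`.

```python
import numpy as np

def relu_power_sketch(net_in, n=4.0):
    """Apply ReLU to the net_in values, then raise the result to the power n."""
    return np.maximum(net_in, 0.0) ** n

h_demo = np.array([[-1.0, 0.5, 2.0]])
x_demo = relu_power_sketch(h_demo)
# Negative entries become 0, 0.5 becomes 0.5**4, and 2.0 becomes 2.0**4
print(x_demo)
```

The ReLU has to come first: raising a negative `net_in` value to an even power would otherwise turn it into a positive input.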

**For full credit** your goal is to have your encoder-decoder system achieve either validation or test accuracy ≥ 97.5%. If this goal is not met, there will be point reductions, depending on how far your system falls below this target.

Here are suggested non-default hyperparameter values:
- patience: 5-9
- learning rate patience: below the regular patience, 3-5
- maximum learning rate decays: 3-4
- loss exponent: the default should do well, though you may be able to do a bit better with an exponent of `5.0`.

**Notes, reminders, and guidelines:**
- Remember that the nonlinear decoder uses the $L^p$ loss! 
- Training should take longer than your linear decoder, but not a lot longer (rough estimate 10-20 minutes).
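As a reminder of the general shape of this loss, here is a sketch that follows the form used by Krotov & Hopfield (2019): decoder outputs in $[-1, 1]$ are compared against targets coded as $+1$ for the true class and $-1$ elsewhere, and the absolute differences are raised to the power $m$. The function name, the averaging over samples, and the choice to pass `m` explicitly are assumptions; defer to the version in your own `NonlinearDecoder`.

```python
import numpy as np

def lp_loss_sketch(net_act, y, m):
    """Sum over classes of |output - target|^m, averaged over samples.

    net_act: (N, C) decoder outputs in [-1, 1] (e.g. after tanh)
    y:       (N,) int class labels
    m:       loss exponent
    """
    num_samps, num_classes = net_act.shape
    targets = -np.ones((num_samps, num_classes))
    targets[np.arange(num_samps), y] = 1.0  # +1 for true class, -1 elsewhere
    return np.mean(np.sum(np.abs(net_act - targets) ** m, axis=1))

y_demo = np.array([0, 2])
perfect = np.array([[1.0, -1.0, -1.0],
                    [-1.0, -1.0, 1.0]])
undecided = np.zeros((2, 3))
print(lp_loss_sketch(perfect, y_demo, m=4.0))    # perfect outputs give zero loss
print(lp_loss_sketch(undecided, y_demo, m=4.0))  # every |0 - target| equals 1
```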

In [None]:
from image_datasets import preprocess_nonlinear

### 9e: Analyze Nonlinear Decoder performance (MNIST)

Create a well-labeled plot showing the training and validation loss over training epochs. Place the accuracy of the nonlinear classifier on the test set in your plot title.

### 9f. Train linear decoder (CIFAR-10)

Train the linear decoder on the Hebbian network `net_in` values obtained from the CIFAR-10 dataset to decode the class labels. Repeat your data loading and training protocols.

**Notes:**
- If you configured your Hebbian network as suggested for CIFAR-10, with 2nd place neurons being inhibited (`k=2`), set `k=2` when creating and loading your Hebbian network below.
- The same hyperparameters you used to decode MNIST should work fine here too.

Make a well-labeled plot showing the training and validation loss over the course of training. Place the test set accuracy in the plot title.

### 9g. Train nonlinear decoder (CIFAR-10)

Train the nonlinear decoder on the Hebbian network `net_in` values obtained from the CIFAR-10 dataset to decode the class labels. Repeat your data loading and training protocols.

If you configured your Hebbian network as suggested for CIFAR-10, with 2nd place neurons being inhibited (`k=2`), set `k=2` when creating and loading your Hebbian network below.

**Suggested non-default hyperparameters:**
- Power `n` to raise Hebbian activations when applying ReLU: `2.0`
- Loss exponent: `4.0`
- patience: 5-9
- learning rate patience: below the regular patience, 3-5
- maximum learning rate decays: 3-4

**For full credit** your goal is to have your encoder-decoder system achieve either validation or test accuracy ≥ 47%. If this goal is not met, there will be point reductions, depending on how far your system falls below this target.

Make a well-labeled plot showing the training and validation loss over the course of training. Place the test set accuracy in the plot title.

### 9h. Questions

**Question 14:** Describe one specific case (*linear or nonlinear decoder paired with MNIST or CIFAR-10*) where the learning rate decay made a substantial difference in the validation accuracy that your encoder-decoder system achieved. What results would you have achieved with and without this technique? *You should be able to answer this question based on your training print outs; you do NOT need to retrain your networks to answer this question.*

**Answer 14:** YOUR ANSWER HERE

## Extensions

### General guidelines

1. Never integrate extensions into your base project so that they change the expected behavior of core functions. If your extension changes the core design/behavior, no problem, duplicate your working base project and add features from there.
2. Check the rubric to keep in mind how extensions on this project will be graded.
3. While I may consult your code and "written log" of what you did, **I am grading your extensions based on what you present in your 3-5 min video.**
4. I suggest documenting your explorations in a "log" or "lab notebook" style (i.e. documenting your thought/progression/discovery/learning process). I'm not grading your writing, so you can keep it succinct. **Whatever is most useful to you to remember what you did.**
5. I suggest taking a hypothesis driven approach. For example "I was curious about X so I explored Y. I found Z, which was not what I expected because..., so then tried A..."
6. Make plots to help showcase your results.
7. **More is not necessarily better.** Generally, a small number of "in-depth" extensions count for more than many "shallow" extensions.

### AI guidelines

You may use AI in almost any capacity for extensions. However, keep in mind:
1. There is no need to use AI at all!
2. You are welcome to use AI as a tool (e.g. automate something that is tedious, help you get unstuck, etc.). However, you should be coding, you should be thinking, you should be writing, you should be creating. If you are spending most (or even close to most) of your time typing into a chatbot and copy-pasting, you have probably gone too far with AI use.
3. I don't find large volumes of AI generated code/text/plots to be particularly impressive and you risk losing my interest while grading. Remember: I'm grading your extensions based on your video presentation. **More is not necessarily better.**

### Video guidelines

1. Please try to keep your video to 5 minutes (*I have other projects to grade!*). If you turn in a longer video, I make no promise that I will watch more than 5 minutes.
2. Your screen should be shared as you show me what you did. A live video of your face should also appear somewhere on the screen (e.g. picture-in-picture overlay / split screen).
3. Your partner should join you for the video and you should take turns talking, but, if necessary, it is fine to have only one team member present when recording the video.
4. Do not simply read text from your notebook, do not read from a prepared script. I am not grading how polished your video presentation is (see extension grading criteria on rubric). 
5. I am looking for original and creative explorations sparked by your curiosity/interest/passion in a topic. This should be apparent in your video.
6. Be natural; don't feel the need to impress me with fancy language. If it is helpful, imagine that we are talking one-on-one about your extension. Tell me what you did :)

### Extension ideas

#### 1. Compare encoder-decoder model to end-to-end training

Compare your results from this project to other end-to-end artificial neural networks trained directly on MNIST or CIFAR-10 (e.g. MLPs, CNNs). I would suggest keeping hyperparameters constant for a fair comparison. There is a lot to explore here! Here are a few questions to examine:
- Are there differences in how rapidly the systems learn their inputs (e.g. number of training epochs needed to achieve "good" accuracy on the validation set)?
- What test accuracy is achievable?
- How does the total training time compare?

In your analysis, account for the complexity/number of parameters in each system.
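A quick way to compare model sizes is to count weights and biases layer by layer, as in the sketch below. The helper name, the choice of 500 Hebbian neurons, and the assumption that the Hebbian encoder has no bias term are all illustrative; plug in the sizes of your own networks.

```python
def dense_param_count(layer_sizes, bias=True):
    """Number of trainable parameters in a stack of dense layers."""
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out + (n_out if bias else 0)
    return total

num_hidden = 500  # hypothetical number of Hebbian neurons
encoder_params = dense_param_count([784, num_hidden], bias=False)
decoder_params = dense_param_count([num_hidden, 10])
mlp_params = dense_param_count([784, num_hidden, 10])
print(encoder_params + decoder_params, mlp_params)
```

With matched layer sizes the two systems have nearly the same number of parameters, so differences in accuracy or training time are more likely to reflect the learning rule than the model capacity.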

#### 2. Hyperparameter explorations

Explore how the hyperparameters affect classification accuracy. Can you improve upon the results from the base project?

- Remember, the encoder has numerous hyperparameters to experiment with. For example, remember that you can also control the dimension of the "embedding" performed by the Hebbian network (i.e. number of neurons in the net), the amount of inhibition, etc. 
- Use a grid or random search for the encoder and/or decoder networks to optimize performance.
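A grid search can be sketched in a few lines of plain Python. Everything here is a placeholder: `toy_evaluate` stands in for code that would train your encoder and/or decoder and return the validation accuracy, and the hyperparameter names and values are examples only.

```python
import itertools

def grid_search(param_grid, evaluate):
    """Try every combination in param_grid; return the best (score, params)."""
    names = list(param_grid)
    best_score, best_params = float('-inf'), None
    for values in itertools.product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(**params)
        if score > best_score:
            best_score, best_params = score, params
    return best_score, best_params

def toy_evaluate(num_neurons, k):
    """Placeholder score; replace with training + validation accuracy."""
    return -abs(num_neurons - 500) / 1000 - abs(k - 2) / 10

grid = {'num_neurons': [100, 500, 1000], 'k': [2, 4, 6]}
best_score, best_params = grid_search(grid, toy_evaluate)
print(best_params)
```

For a random search, you could instead sample a fixed number of combinations (e.g. with `random.choice` per hyperparameter) and pass each one to the same scoring function.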

#### 3. Use your CS 343 Softmax network as the linear decoder

This will require a few updates to support the Adam optimizer (that you implemented in the CS 343 CNN project) and validation sets.

Copy `softmax_layer.py` from your CS343 MLP project to your working directory. Also copy `optimizer.py` from your CS343 CNN project.

Make the following changes to `fit()` in `softmax_layer.py`:
1. Switch your optimizer from SGD to Adam. This will involve creating two `Adam` objects: one for the weights, one for the bias. Also, be sure to set the Adam learning rate based on the value passed into `fit()`.
2. Add support in `fit()` for a validation set by adding the keyword arguments: `x_val=None, y_val=None`. If `verbose > 0` print out the accuracy and loss over the entire validation set. 
3. If `verbose > 0` convert your print outs to happen in terms of epochs rather than iterations (e.g. every epoch, not every 100 iterations). Add a keyword argument `val_freq=50` to specify how often (in epochs) to check and print out the validation accuracy and loss. Be sure to always print out the validation accuracy and loss on the first and last epoch regardless of the `val_freq` value.
4. Have `fit()` return both the train and validation loss as Python lists or ndarrays. In cases when you do not pass in a validation set, the returned validation loss list may be `None` and that's ok.
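The reporting rule in step 3 (every `val_freq` epochs, plus always the first and last epoch) can be captured in a small helper. This is only a sketch: the function name is hypothetical and it assumes epochs are counted from 0.

```python
def should_report(epoch, num_epochs, val_freq=50):
    """True on the first epoch, the last epoch, and every val_freq epochs."""
    is_first = epoch == 0
    is_last = epoch == num_epochs - 1
    return is_first or is_last or epoch % val_freq == 0

reported = [e for e in range(120) if should_report(e, 120)]
print(reported)
```

Inside `fit()`, a check like this would guard the block that computes and prints the validation accuracy and loss.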

The network should train similarly to your TensorFlow version. Compare/analyze runtime performance.

#### 4. Encode an image dataset of your choice with the Hebbian network

For example, Fashion MNIST, STL-10 or CIFAR-100. If your images contain color, I suggest either converting to grayscale or flattening the color channels when constructing your feature vectors (e.g. `(32, 32, 3)` color image made into a `(3072,)` vector).
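Both options can be sketched with NumPy reshaping. The helper names are hypothetical, the random array stands in for a batch of real images, and the grayscale conversion is a simple unweighted channel average (other weightings exist).

```python
import numpy as np

def flatten_images(imgs):
    """Reshape (N, ...) images into (N, M) feature vectors."""
    return imgs.reshape(imgs.shape[0], -1)

def to_grayscale(imgs):
    """Average the color channels of (N, H, W, C) images."""
    return imgs.mean(axis=-1)

imgs_demo = np.random.default_rng(0).random((4, 32, 32, 3))
color_feats = flatten_images(imgs_demo)               # channels flattened
gray_feats = flatten_images(to_grayscale(imgs_demo))  # grayscale, then flattened
print(color_feats.shape, gray_feats.shape)
```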

Some areas to explore:
- Visualize the weights. Analyze how hyperparameters affect the structure.
- Compare decoding accuracy


#### 5. Learning rate decay alternatives

Krotov & Hopfield (2019) not only used step decays, similar to the approach taken in the project, but in some cases also linear decays, where the learning rate decayed by a fixed amount on every epoch. There are many other ways to decay the learning rate. Implement the linear decay or your own variant (for either encoder and/or decoder network) and explore whether it improves decoding performance.
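A linear decay can be sketched as a list of per-epoch learning rates. The function name and the start/end values are assumptions for illustration, not the settings used in the paper.

```python
def linear_lr_schedule(lr_start, lr_end, num_epochs):
    """Learning rate for each epoch, decreasing by a fixed amount per epoch."""
    if num_epochs == 1:
        return [lr_start]
    step = (lr_start - lr_end) / (num_epochs - 1)
    return [lr_start - step * epoch for epoch in range(num_epochs)]

lrs_demo = linear_lr_schedule(0.02, 0.0, num_epochs=5)
print(lrs_demo)
```

During training you would look up the current epoch's value at the start of each epoch and assign it to the optimizer's learning rate.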

#### 6. Early stopping alternatives

There are many other ways to implement early stopping. For example, you could abort training when the current val loss exceeds the recent moving average. Implement this or your own variant (for either encoder and/or decoder network) and explore whether it improves decoding performance.
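The moving-average rule mentioned above could look something like the following sketch. The function name and the window size are assumptions, and "recent moving average" is interpreted here as the mean of the `window` validation losses that precede the newest one.

```python
def should_stop_moving_average(val_losses, window=5):
    """Stop when the newest validation loss exceeds the mean of the
    `window` losses recorded just before it."""
    if len(val_losses) <= window:
        return False  # not enough history yet
    recent = val_losses[-window - 1:-1]
    return val_losses[-1] > sum(recent) / window

print(should_stop_moving_average([1.0, 0.9, 0.8, 0.7], window=3))  # still improving
print(should_stop_moving_average([0.5, 0.5, 0.5, 0.6], window=3))  # loss went up
```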

#### 7. Confusion matrix and error analysis of MNIST classification

For one or both classifiers, make a confusion matrix of the digit classifications. Use your confusion matrix to gain insight into misclassifications. Run follow-up analyses/training sessions to explore patterns in more depth. For example, if two classes are frequently confused with each other, do neurons in the Hebbian network develop receptive fields that resemble each class? Are the weights resembling the two classes strongly correlated (and how?)? To what degree are inhibitory weights learned for these neurons? What happens if you train the Hebbian network on only samples belonging to the two classes: do samples of either class become less/more confusable? And so forth...
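A confusion matrix can be built directly in NumPy, as sketched below. The function name, the convention of rows as true classes and columns as predicted classes, and the tiny made-up label arrays are illustrative choices.

```python
import numpy as np

def confusion_matrix(y_true, y_pred, num_classes):
    """Rows are true classes, columns are predicted classes."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return cm

y_true_demo = np.array([0, 1, 2, 2, 1])
y_pred_demo = np.array([0, 2, 2, 2, 1])
cm_demo = confusion_matrix(y_true_demo, y_pred_demo, num_classes=3)

# Most frequent misclassification: largest off-diagonal entry
off_diag = cm_demo.copy()
np.fill_diagonal(off_diag, 0)
worst_pair = np.unravel_index(off_diag.argmax(), off_diag.shape)
print(cm_demo)
print(worst_pair)
```

The `worst_pair` index gives a (true class, predicted class) pair that would be a natural starting point for the follow-up analyses.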

<!-- ### 8. Implement the Generalized Hebbian Algorithm (GHA) and compare to PCA

The GHA provides an incremental version of PCA: it computes PCA one sample at a time over a number of training epochs. This approach can be helpful when you want to run PCA on a large dataset, but the dataset is too large to fit in your computer's memory (e.g. perhaps STL-10 at full 96x96 resolution).

Implement GHA then show for a large dataset (e.g. STL-10) that GHA computes the PCA representation, whereas regular PCA (e.g. from CS251/2) fails. Plot what the image samples look like over training epochs when projected to PCA space and then back to the original data space (i.e. filtered by the learned principal components / network weights). If this sounds interesting, please see me for guidance. -->

#### 8. Experiment with different decoder architectures

Create one or more different nonlinear decoders in TensorFlow (e.g. MLP, CNN). Compare performance/accuracy with the nonlinear one in the project.