# Widrow-Hoff learning from Spellman's (1996) example

## Data preparation

First, we import the necessary libraries and functions:
- `chain` from `itertools` makes *flattening* lists of lists straightforward
- `Counter` from `collections` counts the elements of a list, producing something like a frequency dictionary
- `dok_matrix` from `scipy.sparse` makes sparse matrices (here, our n-hot encoding matrices) easy to build and handle
- we also make use of
    + `numpy` for all kinds of numeric computing in Python
    + `pandas` (the Python Data Analysis Library) for dealing with **PANel DAta Sets** (essentially spreadsheets)

In [1]:
from itertools import chain         # embedded list to chain of lists
from collections import Counter     # count unique items
from scipy.sparse import dok_matrix # sparse logical (binary) matrix
import numpy as np
import pandas as pd

We are ready to read the table (formatted as `csv`, i.e., comma-separated values) and show the first few lines:

In [2]:
tomato = pd.read_csv('spellman.csv')
tomato.head()

|   | Cues | Outcomes |
|---|------|----------|
| 0 | pot | NO.TOMATO |
| 1 | pot_red_blue | TOMATO |
| 2 | pot_red | TOMATO |
| 3 | pot_red | NO.TOMATO |
| 4 | pot_blue | TOMATO |


- rows $\rightarrow$ learning events
- columns $\rightarrow$
    + Cues : input units
    + Outcomes : output units (a.k.a., targets or teachers or criteria)

Since the beginning of our work with **Naive Discrimination Learning** (NDL), we have developed a certain "taste" for formatting the data. In particular, we use two *special characters* (jokers):
- `#`: when dealing with continuous text, we use hashtags to mark the beginning and the end of an item (usually a word)
- `_`: separates the items (cues or outcomes) within a single learning event
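The two jokers can be illustrated with three small helper functions. The names `mark_item`, `join_items` and `split_items` are hypothetical and were chosen for this sketch only. They simply apply the conventions described above.

```python
# Hypothetical helpers illustrating the two NDL "jokers" described above.

def mark_item(item):
    """Wrap an item (usually a word from continuous text) in '#' boundary markers."""
    return '#' + item + '#'

def join_items(items):
    """Join the cues (or outcomes) of one learning event with the '_' separator."""
    return '_'.join(items)

def split_items(event):
    """Recover the individual items of one learning event."""
    return event.split('_')

print(mark_item('tomato'))                 # #tomato#
print(join_items(['pot', 'red', 'blue']))  # pot_red_blue
print(split_items('pot_red_blue'))         # ['pot', 'red', 'blue']
```

Because `join_items` and `split_items` are inverses of each other, a learning event can be stored as a single `_`-joined string in the `csv` file and turned back into a list of items when it is read.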

Thus, the table above shows that the 2nd learning trial (index 1) has all three input units present. If we split each cell on `_`, Python "understands" that a given learning trial has three elements (or two, or one):

In [3]:
tomato = tomato.apply(lambda col: col.str.split('_'))  # split every cell on '_' (applymap is deprecated in recent pandas)
tomato.head()

|   | Cues | Outcomes |
|---|------|----------|
| 0 | [pot] | [NO.TOMATO] |
| 1 | [pot, red, blue] | [TOMATO] |
| 2 | [pot, red] | [TOMATO] |
| 3 | [pot, red] | [NO.TOMATO] |
| 4 | [pot, blue] | [TOMATO] |
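The splitting step can be tried without `spellman.csv`. This sketch rebuilds only the five events shown above, inline, and assumes that the full file may contain more events. It then counts how many cues each learning event contains after the split.

```python
import io
import pandas as pd

# Assumption: only the first five rows shown above are reproduced here;
# the full spellman.csv may contain more learning events.
csv_text = """Cues,Outcomes
pot,NO.TOMATO
pot_red_blue,TOMATO
pot_red,TOMATO
pot_red,NO.TOMATO
pot_blue,TOMATO
"""
events = pd.read_csv(io.StringIO(csv_text))

# Split every cell on '_' column by column with the .str accessor
events = events.apply(lambda col: col.str.split('_'))

# Number of cues present in each learning event
n_cues_per_event = events['Cues'].map(len)
print(n_cues_per_event.tolist())  # [1, 3, 2, 2, 2]
```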


In this example we have very few cues and outcomes, so we can specify them by hand:

In [4]:
all_cues = ['pot', 'red', 'blue']
all_outcomes = ['TOMATO', 'NO.TOMATO']

If you have many cues and/or outcomes, Python can extract the unique ones for you!

[GitHub](https://github.com/striatum/Zagreb-2019.git)
[![Binder](https://mybinder.org/badge_logo.svg)](https://mybinder.org/v2/gh/striatum/Zagreb-2019/master?filepath=Automatic_cue_outcome_handling.ipynb)
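A minimal sketch of this automatic extraction uses the `chain` and `Counter` helpers imported earlier. It assumes the split cue and outcome lists of the five events shown above, and it is not necessarily how the linked notebook does it.

```python
from itertools import chain
from collections import Counter

# Split cue and outcome lists of the five events shown above
cue_lists = [['pot'], ['pot', 'red', 'blue'], ['pot', 'red'], ['pot', 'red'], ['pot', 'blue']]
outcome_lists = [['NO.TOMATO'], ['TOMATO'], ['TOMATO'], ['NO.TOMATO'], ['TOMATO']]

# chain.from_iterable flattens the list of lists; Counter tallies each item
cue_freq = Counter(chain.from_iterable(cue_lists))
outcome_freq = Counter(chain.from_iterable(outcome_lists))

# Counter keeps insertion order (Python 3.7+), so its keys are the
# unique items in order of first appearance
auto_cues = list(cue_freq)
auto_outcomes = list(outcome_freq)

print(cue_freq)       # Counter({'pot': 5, 'red': 3, 'blue': 2})
print(auto_cues)      # ['pot', 'red', 'blue']
print(auto_outcomes)  # ['NO.TOMATO', 'TOMATO']
```

The order of the automatically extracted outcomes differs from the hand-specified `all_outcomes`. Wrapping the lists in `sorted()` gives an order that does not depend on the sequence of learning events.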


Let Python know how many learning events (trials), input units (cues), and output units (outcomes) we have:

In [5]:
num_events = len(tomato.index)
num_cues = len(all_cues)
num_outcomes = len(all_outcomes)

The table format above might be easy for us humans to read, but learning machines prefer a more explicit format where everything is "spelled out". Thus, many ready-made libraries and packages for Machine Learning (ML) use what is called **n-hot encoding**.

Let's make one and it will be clear what that means...

**[1]** Create two empty matrices, one for cues, one for outcomes:

In [6]:
cue_matrix = dok_matrix((num_events, num_cues), dtype=bool)          # np.bool was removed in NumPy 1.24
outcome_matrix = dok_matrix((num_events, num_outcomes), dtype=bool)

**[2]** Run through our original table row-by-row and "flag" whether each cue is present (1) or absent (0), and likewise whether each outcome is present (1) or absent (0):

In [7]:
for idx, row in tomato.iterrows():
	for cue in row['Cues']:
		cue_index = all_cues.index(cue)
		cue_matrix[idx, cue_index] = True
	for outcome in row['Outcomes']:
		outcome_index = all_outcomes.index(outcome)
		outcome_matrix[idx, outcome_index] = True

**[3]** This step is not strictly necessary, but it helps us read the resulting tables of cues and outcomes. Sparse matrices are convenient for Python and other programming languages but less so for us humans. Machines could struggle with our preferred ways of spreadsheeting and, vice versa, we would struggle to read machine-preferred tables!

In [8]:
cue_nhot_matrix = pd.DataFrame(cue_matrix.todense().tolist(),
	columns=all_cues, dtype=int)
outcome_nhot_matrix = pd.DataFrame(outcome_matrix.todense().tolist(),
	columns=all_outcomes, dtype=int)

In [9]:
pd.concat([cue_nhot_matrix.reset_index(drop=True), outcome_nhot_matrix], axis=1).head()

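|   | pot | red | blue | TOMATO | NO.TOMATO |
|---|-----|-----|------|--------|-----------|
| 0 | 1 | 0 | 0 | 0 | 1 |
| 1 | 1 | 1 | 1 | 1 | 0 |
| 2 | 1 | 1 | 0 | 1 | 0 |
| 3 | 1 | 1 | 0 | 0 | 1 |
| 4 | 1 | 0 | 1 | 1 | 0 |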
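As a cross-check, and not as the method used above, pandas' `Series.str.get_dummies` builds the same n-hot columns directly from the `_`-joined strings. This sketch hard-codes the first five events. `get_dummies` sorts its columns alphabetically, so the result is reindexed to the hand-specified order.

```python
import pandas as pd

all_cues = ['pot', 'red', 'blue']
all_outcomes = ['TOMATO', 'NO.TOMATO']

# The first five events, still in their '_'-joined string form
raw = pd.DataFrame({
    'Cues': ['pot', 'pot_red_blue', 'pot_red', 'pot_red', 'pot_blue'],
    'Outcomes': ['NO.TOMATO', 'TOMATO', 'TOMATO', 'NO.TOMATO', 'TOMATO'],
})

# One 0/1 column per unique item, reordered to match all_cues / all_outcomes
cue_nhot = raw['Cues'].str.get_dummies(sep='_').reindex(columns=all_cues, fill_value=0)
outcome_nhot = raw['Outcomes'].str.get_dummies(sep='_').reindex(columns=all_outcomes, fill_value=0)

combined = pd.concat([cue_nhot, outcome_nhot], axis=1)
print(combined)
```

The printed matrix should match the n-hot table above row for row. The dense version is fine for a toy example, but the sparse `dok_matrix` route scales better when there are thousands of cues.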


## The training

Now, we are ready to start our training. The only missing bit is to make our own *function* that will handle the weight updating:


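$$ \Delta w_{i} = \gamma \times (t - y_{in}) \times x_{i} $$

where $\gamma$ is the learning rate, $x_i$ is the value of cue $i$ (1 if present, 0 if absent), $t$ is the target value of the outcome (1 if present, 0 if absent), and $y_{in} = \sum_j w_j x_j$ is the summed input (activation) that the outcome receives from all cues present in the current trial.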


In [10]:
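def update_weights(cues, outcomes, old_weights, learning_rate):
	cues = np.asarray(cues).ravel()          # x_i: 0/1 value of every cue in this trial
	outcomes = np.asarray(outcomes).ravel()  # t: 0/1 target of every outcome
	activation = cues @ old_weights          # y_in: summed input to every outcome
	# Widrow-Hoff update: gamma * (t - y_in) * x_i for every cue-outcome pair
	return old_weights + learning_rate * np.outer(cues, outcomes - activation)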

First, we need to prepare a matrix of zeros that will keep track of all weights, one for each cue-outcome combination:

In [11]:
weight_matrix = np.zeros((num_cues, num_outcomes))

Second, we ask Python to go through the tables of cues and outcomes row-by-row (that is, learning trial by learning trial) and update the weights accordingly:

$$
\Delta w_{i} =
\left\{
\begin{array}{lll}
    \text{cue ABSENT} & \rightarrow \textsf{nothing happens} & \rightarrow 0 \\
    \text{cue PRESENT} \;\&\; \text{outcome PRESENT} & \rightarrow \textsf{positive evidence} & \rightarrow \gamma \times (1 - y_{in}) \times x_{i} \\
    \text{cue PRESENT} \;\&\; \text{outcome ABSENT} & \rightarrow \textsf{negative evidence} & \rightarrow \gamma \times (0 - y_{in}) \times x_{i}
\end{array}
\right.
$$
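The three cases can be seen in a single update. In this sketch `pot` and `red` are present, `blue` is absent, and TOMATO is the outcome. The starting weights are hypothetical and chosen only to make the negative evidence visible.

```python
import numpy as np

learning_rate = 0.01                 # gamma
# Hypothetical starting weights; rows: pot, red, blue; columns: TOMATO, NO.TOMATO
weights = np.array([[0.0, 0.01],
                    [0.0, 0.0],
                    [0.0, 0.0]])
x = np.array([1, 1, 0])              # pot and red present, blue absent
t = np.array([1, 0])                 # TOMATO present, NO.TOMATO absent

y_in = x @ weights                   # summed input per outcome: [0.0, 0.01]
delta = learning_rate * np.outer(x, t - y_in)

print(delta)
# blue row (cue absent): all zeros, nothing happens
# TOMATO column (outcome present): gamma * (1 - 0.0) for pot and red
# NO.TOMATO column (outcome absent): gamma * (0 - 0.01) for pot and red
```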


Let's first train only on the first three learning trials (events) to see whether Python will give us the same results that we got in our "paper-and-pencil" exercise:

In [12]:
for idx in range(3):
	current_cues = np.matrix(cue_nhot_matrix.iloc[idx,])
	current_outcomes = np.matrix(outcome_nhot_matrix.iloc[idx,])
	weight_matrix = update_weights(
		cues=current_cues,
		outcomes=current_outcomes,
		old_weights=weight_matrix,
		learning_rate = 0.01)

Finally, we can display the table of weights:

In [13]:
pd.DataFrame(weight_matrix, index=all_cues, columns=all_outcomes)

|      | TOMATO | NO.TOMATO |
|------|--------|-----------|
| pot  | 0.0198 | 0.009802 |
| red  | 0.0198 | -0.000198 |
| blue | 0.01   | -0.0001 |
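The "paper-and-pencil" arithmetic behind this table can be written out trial by trial. It uses $\gamma = 0.01$, starts all weights at 0, and sets $x_i = 1$ for every present cue, so that factor is left out below. Absent cues are not updated. This is a worked check of the rule above and not additional notebook output.

$$
\begin{aligned}
&\text{Trial 1 (pot} \rightarrow \text{NO.TOMATO):} && y_{in}^{\text{TOMATO}} = 0, \quad y_{in}^{\text{NO.TOMATO}} = 0 \\
&\quad \Delta w_{\text{pot},\,\text{TOMATO}} = 0.01 \times (0 - 0) = 0 && \Delta w_{\text{pot},\,\text{NO.TOMATO}} = 0.01 \times (1 - 0) = 0.01 \\
&\text{Trial 2 (pot, red, blue} \rightarrow \text{TOMATO):} && y_{in}^{\text{TOMATO}} = 0, \quad y_{in}^{\text{NO.TOMATO}} = 0.01 \\
&\quad \Delta w_{i,\,\text{TOMATO}} = 0.01 \times (1 - 0) = 0.01 && \Delta w_{i,\,\text{NO.TOMATO}} = 0.01 \times (0 - 0.01) = -0.0001 \\
&\quad \Rightarrow w_{\text{pot},\,\text{NO.TOMATO}} = 0.0099, \quad w_{\text{red},\,\text{NO.TOMATO}} = -0.0001 && \\
&\text{Trial 3 (pot, red} \rightarrow \text{TOMATO):} && y_{in}^{\text{TOMATO}} = 0.01 + 0.01 = 0.02, \quad y_{in}^{\text{NO.TOMATO}} = 0.0099 - 0.0001 = 0.0098 \\
&\quad \Delta w_{i,\,\text{TOMATO}} = 0.01 \times (1 - 0.02) = 0.0098 && \Delta w_{i,\,\text{NO.TOMATO}} = 0.01 \times (0 - 0.0098) = -0.000098 \\
&\quad \Rightarrow w_{\text{pot}} = (0.0198,\ 0.009802), \quad w_{\text{red}} = (0.0198,\ -0.000198), && w_{\text{blue}} = (0.01,\ -0.0001)
\end{aligned}
$$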


### Train over the full learning session (all learning trials or events)

In [14]:
weight_matrix_full = np.zeros((num_cues, num_outcomes))

In [15]:
for idx in range(num_events):
	current_cues = np.matrix(cue_nhot_matrix.iloc[idx,])
	current_outcomes = np.matrix(outcome_nhot_matrix.iloc[idx,])
	weight_matrix_full = update_weights(
		cues=current_cues,
		outcomes=current_outcomes,
		old_weights=weight_matrix_full,
		learning_rate = 0.01)

In [16]:
pd.DataFrame(weight_matrix_full, index=all_cues, columns=all_outcomes)

|      | TOMATO   | NO.TOMATO |
|------|----------|-----------|
| pot  | 0.092879 | 0.080708 |
| red  | 0.071453 | 0.015968 |
| blue | 0.049492 | 0.043993 |
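The whole training loop can also be wrapped in one function that works on plain NumPy arrays instead of `np.matrix`, a class NumPy discourages for new code. The function name `train_widrow_hoff` is hypothetical. This sketch is run on the n-hot rows of the first three events only, because the full event list is not reproduced here. It should give the same weights as the three-trial table above.

```python
import numpy as np

def train_widrow_hoff(cues, outcomes, learning_rate=0.01, weights=None):
    """Apply the Widrow-Hoff (delta) rule trial by trial.

    cues:     (n_events, n_cues) 0/1 array, n-hot encoded
    outcomes: (n_events, n_outcomes) 0/1 array, n-hot encoded
    Returns the (n_cues, n_outcomes) weight matrix after the last event.
    """
    cues = np.asarray(cues, dtype=float)
    outcomes = np.asarray(outcomes, dtype=float)
    if weights is None:
        weights = np.zeros((cues.shape[1], outcomes.shape[1]))
    else:
        weights = np.array(weights, dtype=float)  # copy, so the caller's array is untouched
    for x, t in zip(cues, outcomes):
        y_in = x @ weights                                # summed input to every outcome
        weights += learning_rate * np.outer(x, t - y_in)  # delta-rule update
    return weights

# n-hot rows of the first three events
# (cue columns: pot, red, blue; outcome columns: TOMATO, NO.TOMATO)
first_cues = [[1, 0, 0], [1, 1, 1], [1, 1, 0]]
first_outcomes = [[0, 1], [1, 0], [1, 0]]
three_trial_weights = train_widrow_hoff(first_cues, first_outcomes)
print(three_trial_weights)
```

Passing the full `cue_nhot_matrix` and `outcome_nhot_matrix` (for example via `.to_numpy()`) instead of the three hard-coded rows should reproduce the full-session weights, because the function performs the same sequential updates as the loop above.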
