# Chapter 4. Learning involves attention

* Psygrammer (싸이그래머) / QGM: Part 3 - Connectionist models [1]
* 김무성

# Contents

* INTRODUCTION
    - Shifts of attention facilitate learning
    - Shifts of attention can be assessed by subsequent learning
* INTRA- AND EXTRADIMENSIONAL SHIFTS
    - Experiment design and results
    - A connectionist model with attentional learning
        - Activation propagation
        - Learning of attention and associations
        - Fit of the model
* BLOCKING OF ASSOCIATIVE LEARNING
    - Experiment design and results
    - A connectionist model with attentional learning
        - Fit of the model
* THE INVERSE BASE RATE EFFECT
    - Experiment design and results
        - Fit of the model
* SUMMARY AND CONCLUSION
    - Relation to other learning models
    - Attentional shifting and learning are rational

#### See also
* [2] Attention (cognitive psychology) - http://www.unt.edu/rss/class/mike/5640/Attention.ppt
* [3] Pusan National University cognitive psychology lecture notes: attention - http://ocw.pusan.ac.kr/front/phtml/0101_04.php?aidx=2&cidx=89&act=view&idx=295&p=1&key=&keyword=#
* [4] Attention (in relation to consciousness) - http://www.slideshare.net/drjayeshpatidar/attention-44340875
* [5] Attention (in relation to consciousness) - http://www.slideshare.net/ArulLawrence/attention-14003200

# INTRODUCTION
* Shifts of attention facilitate learning
* Shifts of attention can be assessed by subsequent learning

A thesis of this chapter is that 
* connectionist learning models must incorporate 
    - rapidly shifting selective attention and 
    - the ability to learn attentional redistributions. 
* This kind of attentional shifting is not only necessary 
    - to mimic learning by humans and other animals, 
    - it is also a highly effective and rational solution 
        - to the demands of learning many new associations 
            - as quickly as possible. 
* This chapter describes 
    - three experiments (one previously published and two new) that
        - demonstrate the action of attentional learning.

## Shifts of attention facilitate learning

A basic fact of learning is that people quickly learn new associations without rapidly forgetting old associations.
* mushrooms example
    - Consider a hypothetical situation in which 
        - an animal 
            - learns that 
                - mushrooms with 
                    - a round top and 
                    - smooth texture are 
                - tasty and nutritious. 
        - After successfully using this knowledge for some time, 
            - the animal encounters 
                - a new mushroom with 
                    - a smooth texture but 
                    - a flat top. 
                - This mushroom turns out to induce nausea. 
        - <font color="red">How is the animal to quickly learn about this new kind of mushroom, without destroying still-useful knowledge about the old kind of mushroom?</font>
        - If the animal learns to associate 
            - both features of the new mushroom with nausea, 
        - then it will <font color="red">inappropriately destroy part of its previous knowledge</font> about healthy mushrooms, 
            - i.e. the previous association from smooth texture to edibility will be destroyed. 
        - On the other hand, if the <font color="blue">old association is retained</font>, it generates a <font color="blue">conflicting response</font>, 
            - i.e. eating the mushroom.

#### To facilitate learning

* To facilitate learning about the new case, it would be advantageous to selectively attend to the distinctive feature, viz. the flat top, and learn to associate this feature with nausea.

#### shifts of attention

* By <font color="red">selectively attending to the distinctive feature</font>, 
    - previous knowledge is preserved, and 
    - new learning is facilitated.
* Not only should <font color="red">attention be shifted</font> in this way to facilitate learning, 
    - but the <font color="red">shifted attentional distribution should itself be learned</font> :
        - Whenever the animal encounters a mushroom with smooth texture and flat top, 
            - it should shift attention to the flat top, away from the smooth texture.

## Shifts of attention can be assessed by subsequent learning

#### attention

The term “attention”, as used here, 
* refers to both  
    - <font color="red">the influence of a feature on an immediate response</font> and 
    - <font color="red">the influence of a feature on learning</font>. 
* If a feature is being strongly attended to, 
    - then that feature should have 
        - a strong influence 
            - on the <font color="red">immediate response</font> and 
            - on the <font color="red">imminent learning</font>. 
* This latter influence of attention on learning 
    - is sometimes referred to as the <font color="blue">feature’s associability</font>. 
* In this chapter, these two influences of attention are treated synonymously. 

#### subsequent learning ability

Because redistribution of attention is a learned response to stimuli, the <font color="red">degree of attentional learning</font> can be <font color="blue">assayed by examining</font> <font color="red">subsequent learning ability</font>.
* If a person has learned that 
    - a particular feature is 
        - highly indicative of an appropriate response, 
    - then, presumably, the person has also learned 
        - to attend to that feature. 
* If subsequent training makes 
    - a different feature relevant to new responses, 
    - then learning about this new correspondence 
        - should be relatively slow, 
            - because the person will have to unlearn 
                - the attention 
                    - given to the now-irrelevant feature.
* In general, learned attention to features or dimensions 
    - can be inferred 
        - from the ease with which subsequent associations are learned. 
    - This technique is used in all three examples presented below.

# INTRA- AND EXTRADIMENSIONAL SHIFTS
* Experiment design and results
* A connectionist model with attentional learning

#### See also
* [6] Intra-Extra Dimensional Set Shift (IED) - http://www.cambridgecognition.com/tests/intra-extra-dimensional-set-shift-ied
* [7] extradimensional shift task - http://www.cognitiveatlas.org/term/Extradimensional_shift_task

A traditional learning paradigm in psychology investigates <font color="red">perseveration of learned attention across phases of training</font>. 
* In the first phase, 
    - participants learn that
        - one stimulus dimension 
            - is relevant to the outcome 
        - while other dimensions 
            - are irrelevant. 
* In the second phase, 
    - the mapping of stimuli to outcomes changes so that either 
        - a different dimension is relevant (= extradimensional shift) or 
        - the same dimension remains relevant (= intradimensional shift).
    - The former change of relevance is called <font color="red">extradimensional shift</font>, and
    - the latter change is called <font color="red">intradimensional shift</font>.

Many studies in many species have <font color="blue">shown that intradimensional shift is easier than extradimensional shift</font>. This fact can be explained by the hypothesis that subjects learn to attend to the relevant dimension, and that this learned attention perseverates into the second phase.

In this section of the chapter, 
* a recent experiment 
    - demonstrating this difference is summarized, and 
* a connectionist model that 
    - incorporates 
        - attentional learning 
            - is shown to fit the data, 
        - whereas the model 
            - cannot fit the data 
                - <font color="red">if its attentional learning mechanism is “turned off”</font>.

## Experiment design and results

Consider the <font color="red">simple line drawings of freight train box cars</font> shown in Figure 4.1. 
* They vary on three binary dimensions: 
    - height, 
    - door position, and 
    - wheel color.

In an experiment conducted in my lab (Kruschke, 1996b), 
* people learned to classify these cars into one of two routes.

<img src="figures/cap4.1.png" width=600 />

Figure 4.2 indicates the mapping of cars to routes.

<img src="figures/cap4.2.png" width=600 />

Human learning performance in this experiment is shown in Figure 4.3. It can be seen that people learned the intradimensional shift much faster than the extradimensional shift [t(118) = 3.65, SEdiff = .026, p < .0001 two-tailed].

<img src="figures/cap4.3.png" width=600 />

## A connectionist model with attentional learning
* Activation propagation
* Learning of attention and associations
* Fit of the model

#### dimensions & exemplars
* The advantage of intradimensional shift over extradimensional shift suggests that there is learned attention to dimensions.
* Conjunctive combinations of dimensional values must also be encoded in the model.
* Much research suggests that people can and do encode configurations of values, also called exemplars, during learning.

#### AMBRY
* See also: [8] AMBRY GitHub - https://github.com/MaLomm/AMBRY
* The model fit to these data was called AMBRY by Kruschke (1996b) because it is a variant of the ALCOVE model (Kruschke, 1992). 
* The architecture of (part of) AMBRY is shown in Figure 4.4.

<img src="figures/cap4.5.png" width=600 />

### Activation propagation

#### input node

* In AMBRY, each dimension is encoded by a separate input node. 
* If $ψ_{i}$ denotes the <font color="blue">psychological scale value</font> of the stimulus on dimension $i$, then the activation of input node $i$ is simply that scale value:
    <img src="figures/cap4.4.png" width=600 />
    - Because the experiment counter-balanced the assignment of physical dimensions in Figure 4.1 to abstract dimensions in Figure 4.2, 
        - the dimensional values were simply assumed to be 1.0 and 2.0; 
        - e.g. for the short car, 
            - $ψ_{height}$ = 1.0, and 
        - for the tall car, 
            - $ψ_{height}$ = 2.0.

#### exemplar node

An exemplar node is significantly activated only by stimuli that are fairly similar to the exemplar represented by the node. In other words, each exemplar node has a limited <font color="red">“receptive field”</font> in stimulus space. Formally, the activation value is given by:
    <img src="figures/eq4.2.png" width=600 />

* where $c$ is a constant called the <font color="blue">specificity</font> that determines the narrowness of the receptive field, 
* where $α_{i}$ is the <font color="blue">attention strength</font> on the ith dimension, and 
* where $ψ_{ji}$ is the <font color="blue">scale value</font> of the jth exemplar on the ith dimension. 
* Because stimulus values are either 1.0 or 2.0, the values of $ψ_{ji}$ are either 1.0 or 2.0.
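
The exemplar activation can be sketched in code. The equation itself appears only as an image in these notes, so this sketch assumes the ALCOVE-style form $a^{ex}_{j} = \exp(-c \sum_{i} α_{i} |ψ_{ji} - a^{in}_{i}|)$, which is consistent with the symbols defined above. The specificity and attention values are hypothetical, not fitted parameters.

```python
import itertools
import numpy as np

def exemplar_activations(stimulus, exemplars, attention, c):
    """Assumed form: a_j = exp(-c * sum_i alpha_i * |psi_ji - a_i_in|)."""
    # Attention-weighted city-block distance from the stimulus to each exemplar
    distance = np.abs(exemplars - stimulus) @ attention
    return np.exp(-c * distance)

# All 8 exemplars over three binary dimensions
# (height, door position, wheel color), coded 1.0 or 2.0
exemplars = np.array(list(itertools.product([1.0, 2.0], repeat=3)))

attention = np.ones(3) / 3.0   # hypothetical: equal attention on each dimension
c = 2.0                        # hypothetical specificity

stimulus = np.array([1.0, 1.0, 1.0])
act = exemplar_activations(stimulus, exemplars, attention, c)
print(act.round(3))
```

The exemplar that matches the stimulus exactly has distance 0 and therefore activation 1; activation falls off exponentially as the exemplar differs on more dimensions.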

Figure 4.5 shows the activation profile of an exemplar node in a two-dimensional stimulus space. 
* It is this pyramid-shaped activation profile that is used to represent the exemplar nodes in Figure 4.4.

<img src="figures/cap4.6.png" width=600 />

#### attention

* Importantly, equation (2) implies that 
    - increasing the attention strength on a dimension 
        - has the effect of magnifying differences on that dimension, 
    - so that differences along the dimension have 
        - a larger influence on the similarity. 
* Thus, if a dimension is relevant to a categorization, 
    - the attention strength on that dimension 
        - can be increased 
            - to better distinguish the exemplars
                - from the two categories. 
* On the other hand, an irrelevant dimension 
    - can have its attention decreased, 
    - so that differences along that dimension 
        - do not needlessly impede learning.
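
A small numeric sketch of this effect, assuming the same exponential, attention-weighted city-block similarity as in ALCOVE-style models. The attention vectors and the specificity value are hypothetical and chosen only for illustration.

```python
import numpy as np

def similarity(x, y, attention, c=1.0):
    """Assumed similarity: exp(-c * sum_i alpha_i * |x_i - y_i|)."""
    return float(np.exp(-c * np.sum(attention * np.abs(x - y))))

# Dimensions: (height, door position, wheel color)
short_car = np.array([1.0, 1.0, 1.0])
tall_car = np.array([2.0, 1.0, 1.0])      # differs only in height
other_wheels = np.array([1.0, 1.0, 2.0])  # differs only in wheel color

even = np.array([1.0, 1.0, 1.0]) / 3.0    # attention spread evenly
focused = np.array([1.0, 0.0, 0.0])       # all attention on height

sim_even = similarity(short_car, tall_car, even)
sim_focused = similarity(short_car, tall_car, focused)
sim_irrelevant = similarity(short_car, other_wheels, focused)
print(sim_even, sim_focused, sim_irrelevant)
```

With attention focused on height, the short and tall cars become less similar (easier to tell apart), while a difference on the unattended wheel-color dimension no longer reduces similarity at all.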

#### attention bias

The attention strengths are indicated in Figure 4.4 by the arrows from a “bias” node (which is always activated) to the boxes marked with Xs above the input nodes. 
* The boxes are marked with Xs to indicate that each attentional strength is a multiplier on the input.

<img src="figures/cap4.5.png" width=600 />

#### category node

* <font color="red">Activation from the exemplar nodes is propagated to category nodes via weighted connections</font>, illustrated in Figure 4.4 by the arrows from exemplar nodes to category nodes. 
* <font color="red">The activation of each category node</font> is determined by a standard <font color="red">linear combination of weighted exemplar-node activations</font>. 
* Finally, the activations of the category nodes are converted to choice probabilities by a ratio rule, such that the probability of choosing a category corresponds with the activation of the category relative to the total activation of all categories. 
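
The two steps above can be sketched as follows. The weights and exemplar activations are made-up numbers. The exponentiated form of the ratio rule with a scaling constant `phi` is an assumption borrowed from ALCOVE-style models; it keeps the ratio well defined when category activations are negative.

```python
import numpy as np

def category_activations(weights, exemplar_act):
    """Linear combination: a_k = sum_j w_kj * a_j_ex."""
    return weights @ exemplar_act

def choice_probabilities(category_act, phi=1.0):
    """Ratio rule on exponentiated activations (assumed form)."""
    # Subtracting the max does not change the ratio; it avoids overflow
    scaled = np.exp(phi * (category_act - category_act.max()))
    return scaled / scaled.sum()

# Hypothetical values: 3 exemplar nodes, 2 category nodes
exemplar_act = np.array([1.0, 0.5, 0.1])
weights = np.array([[0.8, 0.2, -0.1],
                    [-0.3, 0.4, 0.9]])

cat = category_activations(weights, exemplar_act)
probs = choice_probabilities(cat, phi=2.0)
print(cat, probs)
```

Larger values of `phi` make the choice more deterministic; `phi = 0` gives equal probabilities regardless of activation.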

### Learning of attention and associations

The association weights between the exemplar nodes and the category nodes are learned by standard back-propagation of error (Rumelhart et al., 1986).

#### attentional learning rate

Of importance here is to note that the learning of the attention strengths is based on error reduction, and the amount or speed of learning is governed by a single parameter called the <font color="red">attentional learning rate</font>.
* When this attentional learning rate is fixed at zero, 
    - the model has no ability to learn to selectively attend to relevant dimensions 
        - (but it can still learn categorizations because of the learnable association weights between exemplars and categories).
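
A minimal sketch of error-driven learning with and without attentional learning. It assumes ALCOVE-style gradient descent on squared error, with the exemplar activation $\exp(-c \sum_{i} α_{i} |ψ_{ji} - a^{in}_{i}|)$ and linear category nodes. The category structure (only the first dimension is relevant), the teacher coding, and all parameter values are hypothetical; they are not the design or fitted values from Kruschke (1996b).

```python
import itertools
import numpy as np

def train_step(x, t, exemplars, w, alpha, c, lr_w, lr_alpha):
    """One gradient-descent step on E = 0.5 * sum_k (t_k - a_k)^2."""
    diff = np.abs(exemplars - x)            # |psi_ji - a_i_in|, shape (J, I)
    a_ex = np.exp(-c * (diff @ alpha))      # exemplar activations, shape (J,)
    err = t - w @ a_ex                      # teacher minus category activation
    back = err @ w                          # error propagated back to exemplars
    grad_alpha = (back * a_ex * c) @ diff   # dE/dalpha_i, shape (I,)
    w_new = w + lr_w * np.outer(err, a_ex)  # delta rule on association weights
    # Attention strengths are kept non-negative
    alpha_new = np.clip(alpha - lr_alpha * grad_alpha, 0.0, None)
    return w_new, alpha_new

# 8 exemplars over three binary dimensions coded 1.0 / 2.0
exemplars = np.array(list(itertools.product([1.0, 2.0], repeat=3)))
# Hypothetical structure: category depends only on the first dimension
teachers = np.where(exemplars[:, [0]] == 1.0, [1.0, 0.0], [0.0, 1.0])

def run(lr_alpha, epochs=50):
    w = np.zeros((2, len(exemplars)))
    alpha = np.full(3, 1.0 / 3.0)
    for _ in range(epochs):
        for x, t in zip(exemplars, teachers):
            w, alpha = train_step(x, t, exemplars, w, alpha,
                                  c=2.0, lr_w=0.1, lr_alpha=lr_alpha)
    return w, alpha

w_frozen, alpha_frozen = run(lr_alpha=0.0)   # attentional learning "turned off"
w_learn, alpha_learn = run(lr_alpha=0.1)
print("frozen attention: ", alpha_frozen)
print("learned attention:", alpha_learn)
```

With `lr_alpha=0.0` the attention strengths stay at their initial values while the association weights still change. With a positive rate, the printout lets you check how attention is redistributed across the relevant and irrelevant dimensions under these settings.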

<font color="blue">The model is therefore referred to as a type of connectionist model, and is never referred to as a neural network model.</font>

### Fit of the model

The top graph of Figure 4.6 shows the predictions of AMBRY when fitted to the data shown in Figure 4.3.

<font color="red">Attentional learning is critical to account for the data.</font>

<img src="figures/cap4.7.png" width=600 />

<img src="figures/cap4.8.png" width=600 />

<img src="figures/cap4.9.png" width=600 />

# BLOCKING OF ASSOCIATIVE LEARNING
* Experiment design and results
* A connectionist model with attentional learning

Suppose two cues, A and B, are presented to a learner, followed by an outcome. Typically both cues will acquire moderate associative strength with the outcome. On the other hand, if the subject was trained in a previous phase to learn that A by itself predicts the outcome, then the associative strength from B seems to be very weak. It appears that the prior training with A has blocked, i.e. prevented, learning about B, despite the fact that B is now just as predictive of the outcome as A. 

For more than 30 years there have been two prominent theories of blocking.
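* The dominant theory, formalized in the <font color="red">Rescorla–Wagner model</font> (1972), holds that learning is driven by prediction error: because A already predicts the outcome, little error remains and so little is learned about B.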
* A competing theory, first suggested by Sutherland & Mackintosh (1971) and extended by Mackintosh (1975), claims that there is in fact <font color="blue">something learned about the redundant relevant cue</font>; namely, that it is irrelevant.
    - In other words, subjects learn to <font color="blue">suppress attention to the redundant cue</font>. 
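
For contrast with the attentional account, the error-driven account can be illustrated with a small simulation of the Rescorla–Wagner update, in which every present cue changes by `lr * (lam - summed prediction)`. The learning rate, trial counts, and variable names are hypothetical choices for illustration only.

```python
def rescorla_wagner(trials, n_cues, lr=0.2, lam=1.0):
    """Return associative strengths after a sequence of trials.

    Each trial is (present_cues, outcome_occurred).
    """
    v = [0.0] * n_cues
    for present, outcome in trials:
        prediction = sum(v[i] for i in present)
        error = (lam if outcome else 0.0) - prediction
        for i in present:
            v[i] += lr * error   # all present cues share the same error term
    return v

A, B = 0, 1
# Blocking group: A alone predicts the outcome, then A and B together
blocking = [((A,), True)] * 50 + [((A, B), True)] * 50
# Control group: only the compound training
control = [((A, B), True)] * 50

v_block = rescorla_wagner(blocking, n_cues=2)
v_ctrl = rescorla_wagner(control, n_cues=2)
print("blocking:", v_block)
print("control: ", v_ctrl)
```

In the blocking group, A has absorbed nearly all of the available associative strength before B appears, so B ends with almost none. In the control group, A and B each end near half of `lam`. Note that this model has no attention term; it reproduces weak learning about B but not any learned suppression of attention to B.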

<font color="red">This ubiquitous learning phenomenon, blocking, involves learned attention. Models of natural learning should incorporate mechanisms of learned attention</font>.

## Experiment design and results

<img src="figures/cap4.10.png" width=600 />

<img src="figures/cap4.11.png" width=600 />

## A connectionist model with attentional learning
* Fit of the model

Figure 4.7 depicts the ADIT model introduced by Kruschke (1996a), as extended by Kruschke (2001); the extended model is referred to here as the EXIT model. EXIT is very similar in spirit to the AMBRY model described in the previous section (Figure 4.4). 

<img src="figures/cap4.12.png" width=600 />

Formally, the activation of the $k$th category node is determined as:

<img src="figures/cap4.13.png" width=600 />

where $w_{ki}$ is the associative weight from the $i$th cue to the $k$th category, $α_{i}$ is the attention allocated to the $i$th cue, and $a^{in}_{i}$ is the activation of the $i$th cue.

The change in the associative weight from the $i$th cue to the $k$th output (denoted $\Delta w_{ki}$) is given by:

<img src="figures/cap4.14.png" width=600 />

where $t_{k}$ is the teacher value (correct response) for the $k$th category node.
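
A minimal sketch of these two steps. Both equations appear only as images in these notes, so the forms used here are assumptions based on the symbols defined above: an attention-weighted sum $a_{k} = \sum_{i} w_{ki} α_{i} a^{in}_{i}$ and a delta rule $\Delta w_{ki} = λ (t_{k} - a_{k}) α_{i} a^{in}_{i}$. The learning rate `lr` (λ) and all numeric values are hypothetical.

```python
import numpy as np

def category_activation(w, alpha, a_in):
    """Assumed form: a_k = sum_i w_ki * alpha_i * a_i_in."""
    return w @ (alpha * a_in)

def weight_change(w, alpha, a_in, t, lr):
    """Assumed delta rule: dw_ki = lr * (t_k - a_k) * alpha_i * a_i_in."""
    err = t - category_activation(w, alpha, a_in)
    return lr * np.outer(err, alpha * a_in)

a_in = np.array([1.0, 1.0])   # cues A and B are both present
w = np.zeros((1, 2))          # one outcome node, no prior associations
t = np.array([1.0])           # the outcome occurs

# Attention split evenly vs. shifted toward cue A
dw_even = weight_change(w, np.array([0.5, 0.5]), a_in, t, lr=0.1)
dw_shift = weight_change(w, np.array([0.9, 0.1]), a_in, t, lr=0.1)
print(dw_even, dw_shift)
```

Under these assumptions, a cue that receives little attention also receives a small weight change, which is how suppressed attention to a redundant cue would slow learning about it.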

### Fit of the model

<img src="figures/cap4.15.png" width=600 />

# THE INVERSE BASE RATE EFFECT
* Experiment design and results

<img src="figures/cap4.16.png" width=600 />

## Experiment design and results
* Fit of the model

<img src="figures/cap4.17.png" width=600 />

<img src="figures/cap4.18.png" width=600 />

### Fit of the model

<img src="figures/cap4.19.png" width=600 />
<img src="figures/cap4.20.png" width=600 />

# SUMMARY AND CONCLUSION
* Relation to other learning models
* Attentional shifting and learning are rational

## Relation to other learning models

## Attentional shifting and learning are rational

# References

* [1] Connectionist Models in Cognitive Psychology - http://www.amazon.com/Connectionist-Cognitive-Psychology-Studies-Cognition/dp/0415646901/
* [2] Attention (cognitive psychology) - http://www.unt.edu/rss/class/mike/5640/Attention.ppt
* [3] Pusan National University cognitive psychology lecture notes: attention - http://ocw.pusan.ac.kr/front/phtml/0101_04.php?aidx=2&cidx=89&act=view&idx=295&p=1&key=&keyword=#
* [4] Attention (in relation to consciousness) - http://www.slideshare.net/drjayeshpatidar/attention-44340875
* [5] Attention (in relation to consciousness) - http://www.slideshare.net/ArulLawrence/attention-14003200
* [6] Intra-Extra Dimensional Set Shift (IED) - http://www.cambridgecognition.com/tests/intra-extra-dimensional-set-shift-ied
* [7] extradimensional shift task - http://www.cognitiveatlas.org/term/Extradimensional_shift_task
* [8] AMBRY github - https://github.com/MaLomm/AMBRY