In [None]:
import numpy as np
import pandas as pd
from fastcore.foundation import patch

## A (very) simple Person.
Let's create people who can say their name.

In [None]:
class Person():
    def __init__(self, name):
        self.name = name
        
    def say_hi(self):
        print("Hi, I'm %s!"%self.name)

In [None]:
paul = Person("Paul")

In [None]:
marie = Person("Marie")

In [None]:
paul.say_hi()

Hi, I'm Paul!


In [None]:
marie.say_hi()

Hi, I'm Marie!


## Our first "learning" model

The random model is the simplest one. It does not actually learn anything; it simply assigns the same choice probability to each possible action.

In [None]:
class RandomModel():
    def __init__(self):
        self.person = None # set later by Person.set_learning_model
    
    def get_choice_probabilities(self, actions):
        choice_probabilities = {}
        for action in actions:
            choice_probabilities[action] = 1/len(actions)
        
        return choice_probabilities

In [None]:
rando = RandomModel()
possible_actions = ['a','b']
rando.get_choice_probabilities(possible_actions)

{'a': 0.5, 'b': 0.5}

Let's allow people to have learning models.

In [None]:
@patch
def set_learning_model(self:Person, model):
    self.learning_model = model
    model.person = self

In [None]:
paul.set_learning_model(RandomModel())
type(paul.learning_model)

__main__.RandomModel

And to make choices based on these learning models.

In [None]:
@patch
def choose_action(self:Person, possible_actions):
    '''Note: This only works for two actions'''
    # Getting choice probabilities from the learning model
    choice_probabilities = self.learning_model.get_choice_probabilities(possible_actions)
    actions = list(choice_probabilities.keys())
    # Choosing the first action with its choice probability, otherwise the second
    if np.random.random() < choice_probabilities[actions[0]]:
        chosen_action = actions[0]
    else:
        chosen_action = actions[1]
    return chosen_action

> Warning: This choice function only works for two actions.  Are there functions that work for more actions?
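
One possible answer is NumPy's `Generator.choice`, which draws from any number of options with given probabilities. The following is only a sketch: `choose_from_probabilities` is a hypothetical helper that is not part of the `Person` class above, and the probabilities and the seed are made up for illustration.

```python
import numpy as np

def choose_from_probabilities(choice_probabilities, rng=None):
    """Draw one action from a dict mapping action -> probability.

    Hypothetical helper for illustration; not part of the notebook's classes.
    """
    rng = np.random.default_rng() if rng is None else rng
    actions = list(choice_probabilities.keys())
    probabilities = np.array(list(choice_probabilities.values()), dtype=float)
    # Draw an index instead of the action itself, so actions can be any Python object
    index = rng.choice(len(actions), p=probabilities)
    return actions[index]

rng = np.random.default_rng(0)  # seeded only to make the sketch reproducible
probabilities = {'a': 0.2, 'b': 0.3, 'c': 0.5}  # made-up example values
draws = [choose_from_probabilities(probabilities, rng) for _ in range(10000)]
share_c = draws.count('c') / len(draws)  # should be close to 0.5
```

Because the helper only needs a dictionary of probabilities, it would work with the output of any model's `get_choice_probabilities`.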

Paul chooses an action.

In [None]:
paul.choose_action(['a','b'])

'a'

Does Paul choose randomly?

In [None]:
pd.Series([paul.choose_action(['a','b']) == 'a' for i in range(1000)]).mean()

0.55

Yes! :)

## A more complicated model
### RescorlaWagnerModel

In [None]:
class RescorlaWagnerModel():
    def __init__(self, alpha, beta):
        self.person = None
        self.alpha = alpha # A Rescorla-Wagner Model has a learning rate...
        self.beta = beta # ...and an inverse temperature parameter
        self.expected_reward_memory = {} # It can memorize expected rewards but starts with no knowledge of the world

In [None]:
resco = RescorlaWagnerModel(alpha = .2, beta = 4)

> A Rescorla-Wagner Model associates each action with an expected reward.  If it has not yet encountered an action, it assigns an expected reward of .5 to it (Note: Again, this only seems to work for two actions).

In [None]:
@patch
def get_expected_reward_for_action(self:RescorlaWagnerModel, action):
    # If we haven't encountered the action, we set its expected reward to .5 and remember it
    if action not in self.expected_reward_memory:
        self.expected_reward_memory[action] = .5
    # We return the expected reward associated with the action
    return self.expected_reward_memory[action]

The model starts with no knowledge of the world.

In [None]:
resco.expected_reward_memory

{}

It encounters an action and sets its expected reward to .5.

In [None]:
resco.get_expected_reward_for_action('a')

0.5

After encountering an action, the model remembers its expected reward.

In [None]:
resco.expected_reward_memory

{'a': 0.5}

Let's write a function that can consider several actions at once:

In [None]:
@patch
def get_expected_rewards_for_possible_actions(self:RescorlaWagnerModel, actions):
    expected_rewards = {}
    for action in actions:
        expected_rewards[action] = self.get_expected_reward_for_action(action)
    return expected_rewards

In [None]:
resco.get_expected_rewards_for_possible_actions(['b','c'])

{'b': 0.5, 'c': 0.5}

In [None]:
resco.expected_reward_memory

{'a': 0.5, 'b': 0.5, 'c': 0.5}

Based on the expected rewards, we can write the model's choice function.

In [None]:
@patch
def get_choice_probabilities(self:RescorlaWagnerModel, actions):
    expected_rewards = self.get_expected_rewards_for_possible_actions(actions)
    expected_reward_values = np.array(list(expected_rewards.values()))
    # Softmax: exponentiate the expected rewards scaled by beta, then normalize so they sum to 1
    exponentiated = np.exp(expected_reward_values * self.beta)
    choice_probabilities = exponentiated / exponentiated.sum()
    choice_probabilities = dict(zip(actions, choice_probabilities)) # turning them into a dictionary
    return choice_probabilities

In [None]:
resco.get_choice_probabilities(['b','c'])

{'b': 0.5, 'c': 0.5}
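
The choice function above is a softmax over the expected rewards, with `beta` controlling how strongly the model prefers the better option. A minimal standalone sketch of that behaviour follows. `softmax_choice_probabilities` is a hypothetical helper, the expected rewards are made-up values, and subtracting the maximum is an added numerical-stability step that the method above does not use.

```python
import numpy as np

def softmax_choice_probabilities(expected_rewards, beta):
    """Softmax over a dict of action -> expected reward (hypothetical standalone helper)."""
    values = np.array(list(expected_rewards.values()), dtype=float)
    # Subtracting the maximum leaves the probabilities unchanged but avoids overflow for large beta
    weights = np.exp(beta * (values - values.max()))
    return dict(zip(expected_rewards.keys(), weights / weights.sum()))

# Equal expected rewards give equal probabilities, whatever beta is
equal = softmax_choice_probabilities({'b': 0.5, 'c': 0.5}, beta=4)

# With unequal expected rewards, a larger beta makes the choice more deterministic
low_beta = softmax_choice_probabilities({'b': 0.8, 'c': 0.2}, beta=1)
high_beta = softmax_choice_probabilities({'b': 0.8, 'c': 0.2}, beta=10)
```

With `beta = 0` every action would be equally likely regardless of the expected rewards, which is the behaviour of the random model.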

Now a person with a Rescorla-Wagner Model should already be able to choose actions.

In [None]:
richard = Person('Richard')
richard.set_learning_model(RescorlaWagnerModel(alpha = .2, beta = 4))

In [None]:
richard.choose_action(['a','b'])

'a'

At this point the Rescorla-Wagner Model acts the same as the random model.  This is because we have not given it the ability to learn (to update its expected reward values).  Therefore the values stay at their initial value of .5.

Let's give the model the chance to learn by associating actions with rewards.

> Warning: Rewards have to be between 0 and 1, presumably because the initial expected reward of .5 assumes that range.

In [None]:
@patch
def learn(self:RescorlaWagnerModel, action, reward):
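    # Prediction error: how much better or worse the reward was than expected
    self.prediction_error = reward - self.get_expected_reward_for_action(action)
    # Move the expected reward towards the received reward by a fraction alpha of the error
    self.expected_reward_memory[action] = self.expected_reward_memory[action] + self.alpha * self.prediction_error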

In [None]:
resco.expected_reward_memory

{'a': 0.5, 'b': 0.5, 'c': 0.5}

In [None]:
resco.learn('a',1)

In [None]:
resco.expected_reward_memory

{'a': 0.6, 'b': 0.5, 'c': 0.5}
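
The step from .5 to .6 can be traced by hand: the prediction error is 1 - .5 = .5, and a fifth of it (alpha = .2) is added to the old value. The sketch below repeats that update with plain variables instead of the model class, assuming the reward is always 1, and compares the result with the closed form that follows from applying the same update n times.

```python
alpha = 0.2
reward = 1
initial_expected_reward = 0.5

# Apply the same update rule as in learn() five times in a row
expected_reward = initial_expected_reward
history = [expected_reward]
for trial in range(5):
    prediction_error = reward - expected_reward
    expected_reward = expected_reward + alpha * prediction_error
    history.append(expected_reward)

# With a constant reward, the remaining gap to the reward shrinks by a factor (1 - alpha) per update
closed_form = [reward - (reward - initial_expected_reward) * (1 - alpha) ** n for n in range(6)]
```

The gap between expected and received reward never quite closes; it only shrinks geometrically, which is why a value very close to, but not exactly, 1 can show up after many updates.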

For learning to occur, of course, the participant has to remember her last action (at least until receiving a reward):

In [None]:
@patch
def choose_and_remember_action(self:Person, possible_actions):
    action = self.choose_action(possible_actions)
    self.last_action = action
    return action

In [None]:
richard.choose_and_remember_action(['a','b'])

'b'

In [None]:
richard.last_action

'b'

Now when the person gets rewarded, the model can learn.

In [None]:
@patch
def get_rewarded(self:Person, reward):
    self.learning_model.learn(self.last_action, reward)

In [None]:
richard.get_rewarded(1)

In [None]:
richard.learning_model.expected_reward_memory

{'a': 0.5, 'b': 0.6}
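
Taken together, one trial consists of choosing, getting rewarded, and learning. The following self-contained sketch runs that loop with plain dictionaries rather than the `Person` and `RescorlaWagnerModel` classes. The environment (`reward_probabilities`), the number of trials, and the seed are made-up assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)  # seeded only to make the sketch reproducible
alpha, beta = 0.2, 4
reward_probabilities = {'a': 0.8, 'b': 0.2}  # hypothetical environment: chance of reward per action
expected_rewards = {'a': 0.5, 'b': 0.5}      # same initial value as in the model above
chosen_actions = []

for trial in range(500):
    actions = list(expected_rewards.keys())
    values = np.array([expected_rewards[action] for action in actions])
    # Choose: softmax over the current expected rewards
    weights = np.exp(beta * values)
    action = actions[rng.choice(len(actions), p=weights / weights.sum())]
    # Get rewarded: reward is 1 with the action's reward probability, otherwise 0
    reward = float(rng.random() < reward_probabilities[action])
    # Learn: move the expected reward towards the received reward
    expected_rewards[action] += alpha * (reward - expected_rewards[action])
    chosen_actions.append(action)
```

Because every update is a weighted average of the old value and a reward of 0 or 1, the expected rewards stay between 0 and 1 throughout.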

In [None]:
richard.choose_and_remember_action(['a','b'])

'a'

In [None]:
for i in range(100000):
    richard.get_rewarded(1) # Jackpot!!!

In [None]:
richard.learning_model.expected_reward_memory

{'a': 0.9999999999999998, 'b': 0.6}

Richard should be pretty fond of 'a' now.  Note that he still chooses 'a' in only about 80% of cases.  Can you imagine why?

In [None]:
pd.Series([richard.choose_action(['a','b']) == 'a' for i in range(1000)]).mean()

0.82
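
One way to check the figure of about 80% is to plug Richard's expected rewards into the softmax by hand. This sketch assumes the values 1.0 for 'a' (rounded from the output above) and 0.6 for 'b', together with the beta of 4 that his model was created with.

```python
import numpy as np

beta = 4
expected_rewards = np.array([1.0, 0.6])  # 'a' (rounded) and 'b', taken from the output above

# Same softmax as in get_choice_probabilities
weights = np.exp(beta * expected_rewards)
probability_a = weights[0] / weights.sum()  # about 0.83
```

The expected reward of 'b' is still 0.6, so with this inverse temperature the softmax keeps giving 'b' a choice probability of roughly 17%. A larger beta, or a lower expected reward for 'b', would push the probability of 'a' closer to 1.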