# Week 04 Notes - Model Free Learning

## Dopamine in Neuroscience

The human brain is wondrous in its capabilities. The rules that govern its prowess at so many tasks are becoming slightly clearer every day. In this video, I'll highlight how four key reinforcement learning algorithms help explain how the human brain works, specifically through the lens of the neurotransmitter known as 'dopamine'. These algorithms have been used to help train everything from autopilot systems for airplanes to video game bots. TD-Learning, Rescorla-Wagner, Kalman Filters, and Bayesian Learning, all in one go!

**Notes**:
- Associative Learning Theory:
    - Describes the process by which a person or animal learns an association between 2 stimuli
    - Reinforcement Learning is the acquisition of associations between states, actions and rewards
- Rescorla-Wagner Model:
    - Prediction error based learning model
    - Stimuli acquire value when there is a mismatch between the prediction and the outcome
    - Groundbreaking because
        - Able to explain the conditioning phenomena
        - Useful in early Natural Language Processing Systems
    - Limitation: it only estimates a single value (a point estimate) for each stimulus, with no measure of uncertainty
- Kalman Filter:
    - States that uncertainty grows over time due to the random diffusion of the weights
    - This uncertainty can be reduced by observing the data
    - Uses a series of measurements observed over time and produces estimates of unknown variables that tend to be more accurate than those based on a single measurement alone
    - Has numerous applications:
        - Navigation
        - Control of Vehicles
        - Time Series Analysis
        - Robotics
    - Has also been used to model the central nervous system's control of movement
- Temporal Difference Learning:
    - Extended the Rescorla-Wagner (R-W) model by introducing a discount factor into the prediction error, which defines how much a reward matters to an agent depending on when in time it's received
    - We can make rewards that happen in the near term worth more
    - Most closely associated with 2 researchers, Richard Sutton and Andrew Barto
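
The Rescorla-Wagner and Kalman Filter notes above can be sketched side by side. This is a minimal illustration, not the code from the video: the function names, the learning rate `alpha`, and the noise parameters `process_var` and `obs_var` are all assumptions chosen for the example. Both functions apply the same prediction-error update; the Kalman version also tracks a variance, so its effective learning rate (the gain) shrinks as uncertainty is reduced by observing data.

```python
def rescorla_wagner(rewards, alpha=0.1, v0=0.0):
    """Return the value estimate of a single stimulus after each trial."""
    v = v0
    history = []
    for r in rewards:
        delta = r - v        # prediction error: outcome minus prediction
        v += alpha * delta   # move the estimate a fraction alpha toward the outcome
        history.append(v)
    return history


def kalman_filter_1d(rewards, process_var=0.01, obs_var=1.0, mean0=0.0, var0=1.0):
    """Track a mean and a variance for the value instead of a single number."""
    mean, var = mean0, var0
    history = []
    for r in rewards:
        var += process_var            # uncertainty grows as the weight diffuses
        gain = var / (var + obs_var)  # Kalman gain acts as an adaptive learning rate
        mean += gain * (r - mean)     # same prediction-error form as Rescorla-Wagner
        var *= 1.0 - gain             # observing data shrinks the uncertainty
        history.append((mean, var))
    return history


if __name__ == "__main__":
    trials = [1.0] * 20  # hypothetical experiment: the stimulus is always rewarded
    print(rescorla_wagner(trials)[-1])
    print(kalman_filter_1d(trials)[-1])
```

With a fixed `alpha`, Rescorla-Wagner learns at the same rate on every trial, while the Kalman gain starts high when the estimate is uncertain and settles lower as evidence accumulates.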


**Takeaways**:
- Associative learning is a learning process in which a new response becomes associated with a particular stimulus
- When we build mathematical models of learning, we can use distributions instead of single values to help represent uncertainty about the world
- Temporal Difference Learning is a Model Free Learning technique that predicts the expected value of a variable occurring at the end of a sequence of states
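
The last takeaway can be illustrated with a small TD(0) sketch. The environment here is a hypothetical five-state chain invented for this example: the agent always steps forward, and a reward of 1 arrives only on the final step. The function name and the values of `alpha` and `gamma` are likewise assumptions.

```python
def td0_chain(n_states=5, episodes=500, alpha=0.1, gamma=0.9):
    """Estimate state values on a forward-only chain with TD(0)."""
    # One extra entry at the end stands for the terminal state; its value stays 0.
    values = [0.0] * (n_states + 1)
    for _ in range(episodes):
        for s in range(n_states):
            s_next = s + 1
            # Reward arrives only on the transition into the terminal state.
            reward = 1.0 if s_next == n_states else 0.0
            # TD error: discounted one-step lookahead minus the current prediction.
            td_error = reward + gamma * values[s_next] - values[s]
            values[s] += alpha * td_error
    return values[:n_states]


if __name__ == "__main__":
    for state, value in enumerate(td0_chain()):
        print(state, round(value, 3))
```

States closer to the reward end up with higher values, and each step further away is discounted by another factor of `gamma`, which is how the discount factor makes near-term rewards worth more.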

**Learning Resources**:
- [Youtube Video](https://www.youtube.com/watch?v=-vhYoS3751g)
- [Code Link](https://github.com/llSourcell/Mathematics_of_Dopamine)
- [Youtube: The Rescorla-Wagner Model](https://www.youtube.com/watch?v=pYyUSh1veoo)
- [Youtube: TD Learning - Richard S. Sutton](https://www.youtube.com/watch?v=LyCpuLikLyQ)
- [Youtube: Special Topics - The Kalman Filter](https://www.youtube.com/watch?v=CaCcOwJPytQ)
- [Youtube: Bayesian Learning](https://www.youtube.com/watch?v=C2OUfJW5UNM)
- [PDF: A Unifying Probabilistic View of Associative Learning](https://dash.harvard.edu/bitstream/handle/1/23845336/4633133.pdf?sequence=1&isAllowed=y)
- [Book: Chapter 9 Temporal-Difference Learning](https://web.stanford.edu/group/pdplab/pdphandbook/handbookch10.html)



## Reading Assignment: Model Based vs. Model Free Learning
- [Model Based vs. Model Free Reading Assignment](https://www.theschool.ai/wp-content/uploads/2018/09/Move37-Reading-Assignment-Model-Based-vs-Model-Free-1.pdf)

**Model**: a representation of the environment that our agent can use to plan. When we have a defined set of state transition probabilities, we say we are working with a model. Reinforcement learning can be applied with or without a model, or even used to define a model.

A complete model of the environment is required to do Dynamic Programming. If our agent doesn't have a complete map of what to expect, we can instead employ what is called **model-free learning**, where the agent learns via trial and error.
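
To make "working with a model" concrete, here is a minimal sketch of Dynamic Programming (value iteration) on a hypothetical five-state corridor. The corridor, the `model[state][action]` layout, and the function names are assumptions made up for this example. The point is that the planner reads transition probabilities and rewards straight from the model and never has to act in the environment.

```python
def build_corridor_model(n_states=5):
    """Hypothetical model: model[state][action] -> list of (probability, next_state, reward)."""
    model = {}
    for s in range(n_states - 1):  # the last state is terminal and has no actions
        left = max(0, s - 1)
        right = s + 1
        model[s] = {
            0: [(1.0, left, 0.0)],                                      # action 0: move left
            1: [(1.0, right, 1.0 if right == n_states - 1 else 0.0)],   # action 1: move right
        }
    return model


def value_iteration(model, n_states=5, gamma=0.9, tol=1e-8):
    """Sweep Bellman optimality backups over all states until the values stop changing."""
    values = [0.0] * n_states
    while True:
        biggest_change = 0.0
        for s, actions in model.items():
            # Expected return of each action under the model; keep the best one.
            best = max(
                sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
                for outcomes in actions.values()
            )
            biggest_change = max(biggest_change, abs(best - values[s]))
            values[s] = best
        if biggest_change < tol:
            return values


if __name__ == "__main__":
    print(value_iteration(build_corridor_model()))
```

Every backup loops over all states and all outcomes, which is only possible because the full model is known and the state space is tiny.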

For some board games such as Chess and Go, although we can accurately model the environment's dynamics, the state space is so large that solving the Bellman optimality equation exactly is computationally infeasible. This is where Model-free Learning methods shine. We handle this situation by optimizing for a smaller subset of states that are frequently encountered, at the cost of knowing less about the infrequently visited states.
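
As a rough illustration of learning by trial and error, the sketch below runs tabular Q-learning on a hypothetical five-state corridor. The `step` function, the reward of 1 at the right-hand end, and all hyperparameters are assumptions for this example. The agent only ever sees sampled transitions returned by `step`; it never reads transition probabilities.

```python
import random


def step(state, action, n_states=5):
    """Hypothetical corridor environment: action 0 moves left, action 1 moves right."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == n_states - 1  # reaching the right-hand end finishes the episode
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def q_learning(n_states=5, episodes=500, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Learn action values purely from sampled experience."""
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(n_states)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: explore at random sometimes, and whenever the values tie.
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                action = rng.randrange(2)
            else:
                action = 0 if q[state][0] > q[state][1] else 1
            next_state, reward, done = step(state, action, n_states)
            # Bootstrap from the best next action unless the episode has ended.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


if __name__ == "__main__":
    for state, (left, right) in enumerate(q_learning()):
        print(state, round(left, 3), round(right, 3))
```

States the agent visits often get many updates and accurate values, while rarely visited state-action pairs stay poorly estimated, which is the trade-off described above.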

Further Reading:
- [Medium: Model Free Reinforcement Learning Algorithms](https://medium.com/deep-math-machine-learning-ai/ch-12-1-model-free-reinforcement-learning-algorithms-monte-carlo-sarsa-q-learning-65267cb8d1b4)
- [Book: Temporal-Difference Learning (RL: An Introduction Chapter 6)](http://incompleteideas.net/book/the-book.html)
- [PDF: Reward-Based Learning, Model-Based and Model-Free (2014)](https://www.quentinhuys.com/pub/HuysEa14-ModelBasedModelFree.pdf)
- [Blog: Temporal Difference Methods: Model-Free Deep RL for Model-Based Control (2018)](https://bair.berkeley.edu/blog/2018/04/26/tdm/)
- [arXiv: Temporal Difference Methods: Model-Free Deep RL for Model-Based Control (2018)](https://arxiv.org/abs/1802.09081)


## Homework Assignment: Q Learning

## Temporal Difference Learning

## Quiz: Model Free Learning

## Q Learning Tutorial for Ride Sharing

## Quantum Interview
