# Week 05 Notes - RL in Continuous Spaces <a class="tocSkip">

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Inverse-and-Forward-Kinematics" data-toc-modified-id="Inverse-and-Forward-Kinematics-1">Inverse and Forward Kinematics</a></span><ul class="toc-item"><li><span><a href="#Notes" data-toc-modified-id="Notes-1.1">Notes</a></span></li><li><span><a href="#Learning-Resources" data-toc-modified-id="Learning-Resources-1.2">Learning Resources</a></span></li></ul></li><li><span><a href="#Augmented-Random-Search-Tutorial-(Teach-a-Robot-to-Walk)" data-toc-modified-id="Augmented-Random-Search-Tutorial-(Teach-a-Robot-to-Walk)-2">Augmented Random Search Tutorial (Teach a Robot to Walk)</a></span><ul class="toc-item"><li><span><a href="#Notes" data-toc-modified-id="Notes-2.1">Notes</a></span></li><li><span><a href="#Learning-Resources" data-toc-modified-id="Learning-Resources-2.2">Learning Resources</a></span></li></ul></li><li><span><a href="#Midterm-Assignment-(Make-a-Bipedal-Robot-Walk)" data-toc-modified-id="Midterm-Assignment-(Make-a-Bipedal-Robot-Walk)-3">Midterm Assignment (Make a Bipedal Robot Walk)</a></span></li><li><span><a href="#Inverse-Kinematics" data-toc-modified-id="Inverse-Kinematics-4">Inverse Kinematics</a></span></li><li><span><a href="#Kalman-Filters" data-toc-modified-id="Kalman-Filters-5">Kalman Filters</a></span></li><li><span><a href="#Continuous-Action-Space---Study-Guide" data-toc-modified-id="Continuous-Action-Space---Study-Guide-6">Continuous Action Space - Study Guide</a></span></li><li><span><a href="#Continuous-Action-Space-Quiz" data-toc-modified-id="Continuous-Action-Space-Quiz-7">Continuous Action Space Quiz</a></span></li><li><span><a href="#Quantum-Machine-Learning-(Livestream)" data-toc-modified-id="Quantum-Machine-Learning-(Livestream)-8">Quantum Machine Learning (Livestream)</a></span></li></ul></div>

## Inverse and Forward Kinematics

**Video Description**:

Robotics is a vast field of study, encompassing theories across multiple scientific disciplines. In this video, we'll program a robotic arm in a simulated environment to pick up an object. Along the way, we'll learn about both forward and inverse kinematics. We'll optimize our arm's trajectory using calculus and observe how its angles change over time, measuring them with trigonometry. We'll code this in Python as an example of machine learning applied to robotic manipulation. Enjoy!


### Notes

- Kinematics is the branch of classical mechanics that describes the motion of points, objects, and systems of objects without reference to the forces that cause the motion.

We can summarize the behavior of our system in two equations:

- **Rotation**: The global rotation $r_i$ of a joint is the sum of the local rotations $\alpha_k$ of all the previous joints:

$$ r_i = \sum_{k=0}^{i-1} \alpha_{k} $$

- **Position**: The global position $P_i$ of a joint is given by:

$$ P_i = P_{i-1} + \operatorname{rotate}\left(D_i,\ P_{i-1},\ \sum_{k=0}^{i-1} \alpha_{k}\right) $$
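
The position equation can be sketched for a planar (2D) arm as follows. This is a minimal illustration, not the code from the video: it assumes angles in radians, treats $D_i$ as the rest-pose offset of joint $i$ from joint $i-1$, and interprets `rotate` as rotating that offset by the accumulated angle. The function names `rotate` and `forward_kinematics` are hypothetical.

```python
import numpy as np

def rotate(offset, angle):
    """Rotate a 2D offset vector counter-clockwise by `angle` radians."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([c * offset[0] - s * offset[1],
                     s * offset[0] + c * offset[1]])

def forward_kinematics(base, offsets, angles):
    """Return the joint positions P_0..P_n of a planar arm.

    base    -- P_0, the position of the first joint
    offsets -- D_1..D_n, rest-pose offset of each joint from the previous one
    angles  -- alpha_0..alpha_{n-1}, local rotation of each joint (radians)
    """
    positions = [np.asarray(base, dtype=float)]
    total_angle = 0.0
    for offset, alpha in zip(offsets, angles):
        total_angle += alpha  # running sum alpha_0 + ... + alpha_{i-1}
        rotated = rotate(np.asarray(offset, dtype=float), total_angle)
        positions.append(positions[-1] + rotated)
    return positions

# Two unit-length segments: the first joint turns +90 degrees,
# the second turns -90 degrees relative to the first.
positions = forward_kinematics(
    base=[0.0, 0.0],
    offsets=[[1.0, 0.0], [1.0, 0.0]],
    angles=[np.pi / 2, -np.pi / 2],
)
```

Because each segment is rotated by the sum of all earlier joint angles, turning a joint near the base moves every joint after it, which is what makes the inverse problem harder than the forward one.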


### Learning Resources

- [Youtube Video: Robotic Manipulation Explained](https://www.youtube.com/watch?v=mCI-f71MAvY)
- [Code Link: Robotic Manipulation](https://github.com/llSourcell/Robotic_Manipulation)
- [Robotiq Blog: How to Calculate a Robot's Forward Kinematics in 5 Easy Steps](https://blog.robotiq.com/how-to-calculate-a-robots-forward-kinematics-in-5-easy-steps)
- [MIT Course 6.141 Lecture Notes: Forward and Inverse Kinematics](http://courses.csail.mit.edu/6.141/spring2011/pub/lectures/Lec14-Manipulation-II.pdf)
- [Blog: The Mathematics of Forward Kinematics](https://www.alanzucconi.com/2017/04/06/forward-kinematics/)
- [Applied Go: Inverse Kinematics](https://appliedgo.net/roboticarm/)
- [Lecture Notes: Robot Manipulator Kinematics](http://www.ent.mrt.ac.lk/~rohan/teaching/ME5144/LectureNotes/Lec%205%20Kinematics.pdf)

## Augmented Random Search Tutorial (Teach a Robot to Walk)

**Video Description**:

Learn one of the most advanced reinforcement learning algorithms to emerge in 2018, which has advanced the field of robotics by leaps and bounds, Augmented Random Search. Follow along with the coding tutorial and teach your own robot how to walk in less than an hour!


### Notes

**Augmented Random Search**:

- Shallow learning algorithm
- Random noise
- Genetic evolution
- Cutting edge performance on locomotion tasks

**How it works (simplified)**:

1. Add random noise ($\delta$) to the weights ($\theta$)
2. Run a test
3. If the reward improves, keep the new weights
4. Otherwise discard
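
The four steps above can be sketched on a toy problem. This is only an illustration under stated assumptions: `run_episode` is a hypothetical stand-in for an environment rollout (it rewards weights that are close to a fixed target), and the noise scale and iteration count are arbitrary.

```python
import numpy as np

def run_episode(theta):
    """Hypothetical stand-in for an environment rollout: higher is better."""
    target = np.array([1.0, -2.0, 0.5])
    return -np.sum((theta - target) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(3)
best_reward = run_episode(theta)
initial_reward = best_reward

for _ in range(500):
    delta = rng.normal(scale=0.1, size=theta.shape)  # 1. random noise
    candidate = theta + delta
    reward = run_episode(candidate)                  # 2. run a test
    if reward > best_reward:                         # 3. keep if the reward improves
        theta, best_reward = candidate, reward
    # 4. otherwise the candidate is simply discarded
```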

Instead of computing gradients with backpropagation, we estimate an update direction with a much simpler technique called the Method of Finite Differences and use it to update our weights.

**Method of Finite Differences**:

1. Generate random noise ($\delta$) of the same shape as the weights ($\theta$)
2. Clone two versions of our current weights
3. Add the noise to $\theta{[+]}$, subtract from $\theta{[-]}$
4. Test out both versions one episode each, collect $r[+]$, $r[-]$
5. Update the weights with: $\theta \leftarrow \theta + \alpha\,(r[+] - r[-])\,\delta$
6. Test and repeat for maximum performance
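
A minimal sketch of these steps on a toy problem, assuming a hypothetical `run_episode` function in place of a real environment and an arbitrary learning rate `alpha`:

```python
import numpy as np

def run_episode(theta):
    """Hypothetical stand-in for an environment rollout: higher is better."""
    target = np.array([1.0, -2.0, 0.5])
    return -np.sum((theta - target) ** 2)

rng = np.random.default_rng(0)
theta = np.zeros(3)
alpha = 0.02  # learning rate (assumed value)
initial_reward = run_episode(theta)

for _ in range(300):
    delta = rng.normal(size=theta.shape)      # 1. noise with the same shape as theta
    theta_pos = theta + delta                 # 2-3. positive copy of the weights
    theta_neg = theta - delta                 #      negative copy of the weights
    r_pos = run_episode(theta_pos)            # 4. one episode for each copy
    r_neg = run_episode(theta_neg)
    theta += alpha * (r_pos - r_neg) * delta  # 5. finite-difference update

final_reward = run_episode(theta)             # 6. test the result
```

The difference `r_pos - r_neg` acts as an estimate of how the reward changes along the random direction `delta`, so the update moves the weights further in directions that helped and away from directions that hurt.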

This algorithm works best when the inputs are normalized, i.e. rescaled so that each input dimension has roughly zero mean and unit standard deviation. We do this with a standard normalization formula, using running statistics of the observations.

**Normalize the Inputs**:

- Normalized = (Inputs - Observation_mean)/ Observation_sigma
- To track the mean, we keep a running average:
    - mean = last_mean + (observation - last_mean) / num_observations
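
The running statistics can be tracked in a small helper class. The sketch below is one possible way to do it, not necessarily how the tutorial code does: the class name `RunningNormalizer` is hypothetical, the variance is tracked with Welford's online update, and the small variance floor (`1e-8`) is an assumption added to avoid division by zero.

```python
import numpy as np

class RunningNormalizer:
    """Tracks a running mean and variance for each input dimension."""

    def __init__(self, num_inputs):
        self.count = 0
        self.mean = np.zeros(num_inputs)
        self.sum_sq_diff = np.zeros(num_inputs)  # running sum of squared deviations

    def observe(self, observation):
        self.count += 1
        last_mean = self.mean.copy()
        # mean = last_mean + (observation - last_mean) / num_observations
        self.mean = last_mean + (observation - last_mean) / self.count
        # Welford's update for the sum of squared deviations from the mean
        self.sum_sq_diff += (observation - last_mean) * (observation - self.mean)

    def normalize(self, inputs):
        variance = self.sum_sq_diff / max(self.count, 1)
        sigma = np.sqrt(np.maximum(variance, 1e-8))  # floor is an assumption
        return (inputs - self.mean) / sigma

normalizer = RunningNormalizer(num_inputs=2)
for obs in [np.array([1.0, 10.0]), np.array([2.0, 20.0]), np.array([3.0, 30.0])]:
    normalizer.observe(obs)

centered = normalizer.normalize(np.array([2.0, 20.0]))  # the mean maps to zero
```

Note that the two input dimensions have very different scales, but after normalization they are comparable, which is the point of this step.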

**Training Loop**:

1. Generate num_deltas random deltas
2. For each delta, run one episode with the positive variation and one with the negative variation (two episodes per delta)
3. Collect rollouts as $(r[+], r[-], \delta)$ tuples
4. Calculate the standard deviation of all rewards (sigma_rewards)
5. Sort the rollouts by $\max(r[+], r[-])$ and select the best num_best_deltas rollouts
6. $step = \sum (r[+] - r[-])\,\delta$, summed over the best rollouts
7. theta += learning_rate / (num_best_deltas * sigma_rewards) * step
8. Evaluate: play an episode with the new weights to measure improvement
9. Continue until the desired performance is reached
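
The loop above can be sketched as follows on a toy problem. This is a simplified illustration, not the tutorial's implementation: `run_episode` is a hypothetical stand-in for an environment rollout, the hyper-parameter values are arbitrary, and the `noise` scale applied to each delta is an extra assumption that does not appear in the steps above.

```python
import numpy as np

def run_episode(theta):
    """Hypothetical stand-in for an environment rollout: higher is better."""
    target = np.array([1.0, -2.0, 0.5])
    return -np.sum((theta - target) ** 2)

def ars_step(theta, rng, num_deltas=8, num_best_deltas=4,
             learning_rate=0.02, noise=0.1):
    # 1-3. Sample deltas and collect (r[+], r[-], delta) rollouts.
    deltas = [rng.normal(size=theta.shape) for _ in range(num_deltas)]
    rollouts = [(run_episode(theta + noise * d),
                 run_episode(theta - noise * d), d) for d in deltas]
    # 4. Standard deviation of all collected rewards.
    all_rewards = [r for r_pos, r_neg, _ in rollouts for r in (r_pos, r_neg)]
    sigma_rewards = max(np.std(all_rewards), 1e-8)  # floor is an assumption
    # 5. Keep the rollouts with the highest max(r[+], r[-]).
    best = sorted(rollouts, key=lambda r: max(r[0], r[1]),
                  reverse=True)[:num_best_deltas]
    # 6. Sum the reward differences times their deltas.
    step = sum((r_pos - r_neg) * d for r_pos, r_neg, d in best)
    # 7. Scaled weight update.
    return theta + learning_rate / (num_best_deltas * sigma_rewards) * step

rng = np.random.default_rng(0)
theta = np.zeros(3)
initial_reward = run_episode(theta)
for _ in range(200):              # 9. repeat; a real loop would stop on a target score
    theta = ars_step(theta, rng)
final_reward = run_episode(theta)  # 8. evaluate the new weights
```

Dividing by `sigma_rewards` keeps the size of each update roughly independent of the scale of the rewards, so the same learning rate can work early in training (large reward differences) and late in training (small ones).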

**Dependencies**:

- OpenAI Gym (```pip install gym```)
- Box2d (```pip install box2d```)
- PyBullet environments (```pip install pybullet```) [optional]

**Things to Try**:

- Try to code this yourself!
- Play around with the hyper-parameters
- Try other environments (PyBullet Half Cheetah, Lunar Lander Continuous)
- What other tasks does ARS get good results on?
- One way to make this faster is to employ multi-processing to utilize more than one CPU core to run several episodes in parallel


### Learning Resources

- [Youtube Video: Augmented Random Search Tutorial - How to Train Robots to Walk!](https://www.youtube.com/watch?time_continue=1&v=2P2Dj5PX5cg)
- [Code Link: ARS](https://github.com/colinskow/move37/tree/master/ars)
- [Original Research Paper](https://arxiv.org/abs/1803.07055)
- [Math is Fun: Dot Product Tutorial](https://www.mathsisfun.com/algebra/matrix-multiplying.html)
- [Math is Fun: Standard Deviation Tutorial](https://www.mathsisfun.com/data/standard-deviation.html)
- [GitHub: ARS with multiprocessing](https://github.com/modestyachts/ARS)

## Midterm Assignment (Make a Bipedal Robot Walk)

## Inverse Kinematics

## Kalman Filters

## Continuous Action Space - Study Guide

## Continuous Action Space Quiz

## Quantum Machine Learning (Livestream)