# Week 07 Notes - Policy Based Methods <a class="tocSkip">

<h1>Table of Contents<span class="tocSkip"></span></h1>
<div class="toc"><ul class="toc-item"><li><span><a href="#Neuroevolution-Meta-Learning" data-toc-modified-id="Neuroevolution-Meta-Learning-1">Neuroevolution Meta-Learning</a></span></li><li><span><a href="#Policy-Search-Algorithms" data-toc-modified-id="Policy-Search-Algorithms-2">Policy Search Algorithms</a></span></li><li><span><a href="#Evolutionary-Algorithms-Study-Guide" data-toc-modified-id="Evolutionary-Algorithms-Study-Guide-3">Evolutionary Algorithms Study Guide</a></span></li><li><span><a href="#Evolutionary-Algorithms-Quiz" data-toc-modified-id="Evolutionary-Algorithms-Quiz-4">Evolutionary Algorithms Quiz</a></span><ul class="toc-item"><li><ul class="toc-item"><li><span><a href="#Question-1" data-toc-modified-id="Question-1-4.0.1">Question 1</a></span></li><li><span><a href="#Question-2" data-toc-modified-id="Question-2-4.0.2">Question 2</a></span></li><li><span><a href="#Question-3" data-toc-modified-id="Question-3-4.0.3">Question 3</a></span></li><li><span><a href="#Question-4" data-toc-modified-id="Question-4-4.0.4">Question 4</a></span></li><li><span><a href="#Question-5" data-toc-modified-id="Question-5-4.0.5">Question 5</a></span></li><li><span><a href="#Question-6" data-toc-modified-id="Question-6-4.0.6">Question 6</a></span></li><li><span><a href="#Question-7" data-toc-modified-id="Question-7-4.0.7">Question 7</a></span></li><li><span><a href="#Question-8" data-toc-modified-id="Question-8-4.0.8">Question 8</a></span></li><li><span><a href="#Question-9" data-toc-modified-id="Question-9-4.0.9">Question 9</a></span></li></ul></li></ul></li><li><span><a href="#Homework-Assignment-(Neuroevolution)" data-toc-modified-id="Homework-Assignment-(Neuroevolution)-5">Homework Assignment (Neuroevolution)</a></span></li><li><span><a href="#Control-Theory" data-toc-modified-id="Control-Theory-6">Control Theory</a></span></li></ul></div>

# Neuroevolution Meta-Learning


**Video Description:**

Meta learning describes the concept of 'learning to learn'. What if we could have AI learn how to optimize itself? An AI could learn the optimal hyper-parameters, architecture, and even dataset! It's a really interesting topic, and in this video I'll describe some meta learning techniques and focus on one in particular: deep neuro-evolution. We'll build an image classifier using a deep neuro-evolutionary algorithm. Enjoy!


**Notes**

- Meta-Learning: a meta-level AI trains (optimizes) a bottom-level AI
- Why Meta Learning?
    - Faster AI Systems
    - More Adaptable to Environmental Changes
    - Generalizes to more tasks
- Neuroevolution, a specific meta-learning technique, is the process of using an evolutionary algorithm to learn neural network architectures (and/or their weights)
- Evolutionary Algorithms:
    1. Initial Population
    2. The environment changes
    3. Only one sub-population is suited to the new environment
    4. The sub-population has a higher probability of reproduction
    5. The sub-population spreads in the environment

![evolutionary algorithms process](imgs/move_37_evolutionary_algorithms_process.jpg)

<br/>

- Inter-Life Learning: learning across generations, through evolution via natural selection. E.g. Evolutionary Algorithms.
- Intra-Life Learning: how an animal learns during its lifetime by interacting with its environment. E.g. Neural Networks.
- Google's AmoebaNet: an image classifier architecture found through evolutionary architecture search, an example of neuroevolution at scale


**Take Aways**

- Meta Learning is the process of learning to learn, where an AI optimizes one or several other AIs
- Evolutionary algorithms use concepts from the evolutionary process like mutation and natural selection to solve complex problems
- A Meta Learning technique called Neuroevolution uses Evolutionary Algorithms to optimize Neural Networks specifically


**Learning Resources**

- [Youtube Video](https://www.youtube.com/watch?v=2z0ofe2lpz4)
- [Code Link: Neural Network Genetic Algorithm](https://github.com/harvitronix/neural-network-genetic-algorithm)
- [UTexas: Tutorial on Evolution of Neural Networks (2013)](http://nn.cs.utexas.edu/?neuroevolution-tutorial-ijcnn2013)
- [Medium: Let's evolve a neural network with a genetic algorithm](https://blog.coast.ai/lets-evolve-a-neural-network-with-a-genetic-algorithm-code-included-8809bece164)
- [Medium: Paper Repro: Deep Neuroevolution](https://towardsdatascience.com/paper-repro-deep-neuroevolution-756871e00a66)
- [Youtube: Introduction to Neuroevolution - The Nature of Code](https://www.youtube.com/watch?v=lu5ul7z4icQ)

# Policy Search Algorithms

Policy search refers to methods that directly learn the policy for solving a Markov Decision Process (MDP). Policy Gradient methods are a subset of this wide class of algorithms.

Q-Learning, Fitted-Q, LSTD-Q, etc. are examples of action-search (value-based) algorithms. They learn action values and derive the policy by choosing, for every state (or feature vector, if we use value function approximation), the action with the highest estimated value.

This is different from Policy Search algorithms, where one searches the policy space directly.
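The contrast can be made concrete with a minimal tabular Q-learning sketch of an action-search method. It uses a hypothetical 5-state chain: the agent moves left or right and receives reward 1 at the right end. No policy is stored anywhere. Instead, the policy is read off the learned Q-table by taking the best action in each state. The environment, the `step` helper and the hyperparameters are illustrative assumptions, not part of the course material.

```python
import numpy as np

N_STATES, GOAL = 5, 4        # hypothetical chain: states 0..4, state 4 is terminal
LEFT, RIGHT = 0, 1

def step(state, action):
    """Deterministic chain dynamics; reward 1 only when the goal is reached."""
    nxt = max(state - 1, 0) if action == LEFT else state + 1
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL

def q_learning(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, 2))                       # Q-table: one value per (state, action)
    for _ in range(episodes):
        s = 0
        for _ in range(100):                          # cap the episode length
            if rng.random() < epsilon:
                a = int(rng.integers(2))              # explore: random action
            else:                                     # exploit, breaking ties at random
                a = int(rng.choice(np.flatnonzero(q[s] == q[s].max())))
            s2, r, done = step(s, a)
            target = r if done else r + gamma * q[s2].max()
            q[s, a] += alpha * (target - q[s, a])     # move Q(s, a) toward the TD target
            s = s2
            if done:
                break
    return q

q = q_learning()
greedy_policy = q[:GOAL].argmax(axis=1)   # "action search": argmax over actions in each state
print(greedy_policy)                       # should be RIGHT (1) in every non-terminal state
```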

Policy Gradient methods parameterize the policy directly and adjust its parameters by gradient ascent on the expected return, which is why they are considered a subclass of Policy Search Algorithms. Actor-Critic methods are a closely related subclass. In addition to the policy (the actor), they learn an estimate of the value function (the critic). They use this estimate in place of raw sampled returns, which reduces the variance of the policy-gradient estimate.
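Here is a minimal sketch of the policy-gradient idea. It assumes a hypothetical 3-armed bandit (a single-state MDP) and a softmax policy with parameters `theta`. The policy itself is updated with the REINFORCE estimate $(r - b)\nabla_\theta \log \pi_\theta(a)$. The running-average baseline `b` is only a crude stand-in for a critic, but it shows the same variance-reduction idea that actor-critic methods pursue with a learned value function.

```python
import numpy as np

def softmax(z):
    z = z - z.max()                 # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def reinforce_bandit(arm_means, steps=2000, lr=0.1, baseline_rate=0.01, seed=0):
    """REINFORCE with a softmax policy over the arms of a Gaussian bandit.

    The running-average baseline is a crude stand-in for a critic's value estimate.
    """
    rng = np.random.default_rng(seed)
    n_arms = len(arm_means)
    theta = np.zeros(n_arms)        # policy parameters w
    baseline = 0.0                  # running estimate of the expected reward
    for _ in range(steps):
        probs = softmax(theta)
        a = rng.choice(n_arms, p=probs)              # sample an action from the policy
        r = rng.normal(arm_means[a], 1.0)            # noisy reward for that action
        grad_log_pi = -probs                         # grad of log softmax(theta)[a] ...
        grad_log_pi[a] += 1.0                        # ... equals onehot(a) - probs
        theta += lr * (r - baseline) * grad_log_pi   # gradient ascent on expected reward
        baseline += baseline_rate * (r - baseline)   # update the baseline after the step
    return softmax(theta)

probs = reinforce_bandit(arm_means=[0.0, 0.5, 1.5])
print(np.round(probs, 3))           # most of the probability mass should be on arm 2
```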

Global Optimization Algorithms, such as the Cross-Entropy Method or Optimistic Optimization methods, are another subclass of Policy Search Algorithms.
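The Cross-Entropy Method treats the return as a black-box function of the policy parameters. It samples parameters from a Gaussian, keeps the best-scoring fraction (the elite), refits the Gaussian to them, and repeats. The sketch below assumes a hypothetical point-mass task with a linear feedback policy. The dynamics, cost weights, population size and elite size are all illustrative choices.

```python
import numpy as np

def episode_return(w, steps=50, dt=0.1):
    """Return of the linear policy u = w[0]*x + w[1]*v on a hypothetical point
    mass that should be driven from x = 1 to the origin (higher is better)."""
    x, v, total = 1.0, 0.0, 0.0
    for _ in range(steps):
        u = w[0] * x + w[1] * v
        v += u * dt                               # simple Euler integration
        x += v * dt
        total -= x ** 2 + 0.01 * u ** 2           # penalize distance and control effort
    return total

def cross_entropy_method(f, dim, iters=30, pop=50, n_elite=10, seed=0):
    rng = np.random.default_rng(seed)
    mu, sigma = np.zeros(dim), np.ones(dim)       # Gaussian search distribution over w
    for _ in range(iters):
        samples = rng.normal(mu, sigma, size=(pop, dim))
        scores = np.array([f(w) for w in samples])
        elite = samples[np.argsort(scores)[-n_elite:]]   # keep the top-scoring parameters
        mu = elite.mean(axis=0)                          # refit the distribution to the elite
        sigma = elite.std(axis=0) + 1e-3                 # small floor avoids premature collapse
    return mu

w_best = cross_entropy_method(episode_return, dim=2)
print(w_best, episode_return(w_best), episode_return(np.zeros(2)))
```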


**Difference between Policy Search and Policy Iteration:**

Policy Search seeks the optimal policy in the policy space.
- A policy $\pi\ :\ S \rightarrow A$ is a mapping from states to actions
- If the state and action sets are finite, there is a finite number of deterministic policies ($|A|^{|S|}$), and an optimal policy lies somewhere within this set
- If the state-action sets are continuous (e.g. subsets of vector spaces), then the optimal policy may lie in an infinite-dimensional vector space
- The common approach is to parameterize the policy with some parameter $w \in W$, so that the search is performed over the parameter space $W$, typically a subset of a finite-dimensional vector space such as $\mathbb{R}^d$.
- For either discrete or continuous policies, we can formulate the search as an optimization problem and use any optimization algorithm to obtain a good local optimum that maximizes the return:
    - Monte Carlo tree search
    - Cross-Entropy
    - Genetic Algorithms
    - Gradient Ascent (i.e. gradient descent on the negative return)
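For a tiny finite MDP, the search space is small enough to enumerate. The sketch below assumes a hypothetical 2-state, 2-action MDP. It evaluates each of the $|A|^{|S|} = 4$ deterministic policies exactly by solving the Bellman equation $V^\pi = R^\pi + \gamma P^\pi V^\pi$, then keeps the policy with the highest return from state 0. Here the best choice is to move to state 1 and stay there, which gives $V(0) = 0 + 0.9 \cdot 2 / (1 - 0.9) = 18$.

```python
import itertools
import numpy as np

# Hypothetical 2-state, 2-action MDP: P[a, s, s'] transition probabilities, R[s, a] rewards.
P = np.array([
    [[1.0, 0.0], [0.0, 1.0]],   # action 0 ("stay"): remain in the current state
    [[0.0, 1.0], [1.0, 0.0]],   # action 1 ("move"): switch to the other state
])
R = np.array([
    [1.0, 0.0],                 # state 0: staying pays 1, moving pays 0
    [2.0, 0.0],                 # state 1: staying pays 2, moving pays 0
])
GAMMA = 0.9

def evaluate(policy):
    """Exact policy evaluation: solve (I - gamma * P_pi) V = R_pi."""
    states = np.arange(len(policy))
    actions = list(policy)
    P_pi = P[actions, states]                 # row s is P[policy[s], s, :]
    R_pi = R[states, actions]                 # entry s is R[s, policy[s]]
    return np.linalg.solve(np.eye(len(policy)) - GAMMA * P_pi, R_pi)

# With finite S and A there are |A|**|S| deterministic policies, so we can try them all.
n_states, n_actions = R.shape
all_policies = list(itertools.product(range(n_actions), repeat=n_states))
best = max(all_policies, key=lambda pi: evaluate(pi)[0])   # maximize the return from state 0
print(len(all_policies), best, evaluate(best))
```

Exhaustive enumeration grows as $|A|^{|S|}$, which is why larger or continuous problems fall back on parameterized policies and the optimizers listed above.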

# Evolutionary Algorithms Study Guide

[Evolutionary Algorithms - Study Guide](https://www.theschool.ai/wp-content/uploads/2018/10/Evolutionary-Algorithms-Study-Guide.pdf)


**What are Evolutionary algorithms?**

- A style of optimization inspired by the study of genetics and evolution
    - A population of solutions is proposed for a given problem
    - The best solutions are held aside, and the rest are removed
    - By introducing change through crossover and/or mutation, the algorithm explores the search space
    - If no new genetic material via mutation is introduced, the process tends to stagnate due to a limited gene pool


**What kinds of Evolutionary Algorithms are out there?**

- **Genetic Algorithm**: formulate a string of numbers (traditionally binary) to represent a genome and iteratively improve based on a fitness function in order to solve a problem
- **Evolution Strategies**: like a genetic algorithm, but the genome is a vector of real numbers, varied mainly by (Gaussian) mutation
- **Genetic programming**: Genetic algorithms applied to generating programs
- **Neuroevolution**: Evolutionary algorithms applied to neural networks (their weights and/or architectures)
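As a concrete instance of an evolution strategy, here is a minimal (1+1)-ES, one of the oldest ES variants. It minimizes a hypothetical sphere function over a real-valued vector. The step-size factors and iteration count are illustrative. Practical ES implementations (for example CMA-ES) adapt the mutation distribution far more carefully.

```python
import numpy as np

def sphere(x):
    return float(np.sum(x ** 2))   # hypothetical objective to minimize; optimum at x = 0

def one_plus_one_es(f, x0, sigma=1.0, iters=2000, seed=0):
    """(1+1)-ES: one parent and one Gaussian-mutated child per iteration.

    The step size follows Rechenberg's 1/5 success rule: multiplying sigma by 1.5
    on success and by 1.5 ** -0.25 on failure leaves it unchanged on average
    exactly when one child in five succeeds.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    for _ in range(iters):
        child = x + sigma * rng.standard_normal(x.shape)   # mutate the real-valued genome
        fc = f(child)
        if fc <= fx:                                       # selection: child replaces parent
            x, fx = child, fc
            sigma *= 1.5                                   # success: enlarge the step size
        else:
            sigma *= 1.5 ** -0.25                          # failure: shrink it
    return x, fx

x_best, f_best = one_plus_one_es(sphere, x0=[3.0, -2.0, 1.0, 4.0, -5.0])
print(f_best)
```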


**What are the steps in an Evolutionary Algorithm?**

1. **Initialization**: create an initial population of solutions
2. **Selection**: evaluate members with a fitness function and keep the fittest
3. **Genetic Operators**: apply mutation alone, or mutation and crossover
    - **Mutation**: vary the genes based on random noise
    - **Crossover**: swap genes between successful members of the population

    Repeat steps 2 and 3 until...
4. **Termination**: stop after reaching a maximum runtime or a performance threshold
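The four steps map directly onto a small genetic algorithm. This sketch uses the classic toy OneMax problem, where fitness is the number of 1 bits in a binary genome. It applies truncation selection, elitism, one-point crossover and bit-flip mutation. These operator choices and all sizes are illustrative, not prescribed by the study guide.

```python
import numpy as np

def onemax(genome):
    return int(genome.sum())   # hypothetical fitness: count of 1 bits (maximum = n_bits)

def genetic_algorithm(n_bits=30, pop_size=40, mutation_rate=None, max_gens=200, seed=0):
    rng = np.random.default_rng(seed)
    if mutation_rate is None:
        mutation_rate = 1.0 / n_bits                      # about one bit flip per child
    # 1. Initialization: random binary genomes
    pop = rng.integers(0, 2, size=(pop_size, n_bits))
    for gen in range(max_gens):
        # 2. Selection: score every member and keep the fitter half as parents
        fitness = np.array([onemax(g) for g in pop])
        if fitness.max() == n_bits:                       # 4. Termination: threshold reached
            break
        parents = pop[np.argsort(fitness)[-(pop_size // 2):]]
        # 3. Genetic operators (elitism: the best parent is copied over unchanged)
        children = [parents[-1].copy()]
        while len(children) < pop_size:
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, n_bits)                  # one-point crossover
            child = np.concatenate([a[:cut], b[cut:]])
            flip = rng.random(n_bits) < mutation_rate      # bit-flip mutation
            children.append(np.where(flip, 1 - child, child))
        pop = np.array(children)
    fitness = np.array([onemax(g) for g in pop])
    return pop[fitness.argmax()], gen

best, generations = genetic_algorithm()
print(onemax(best), generations)
```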


**What are the advantages of Evolutionary Algorithms over other methods?**

- They can cover a large search space
- They can find highly creative solutions
- They don't require a gradient


**What kinds of problems are Evolutionary Algorithms suited for?**

- Problems with a large search space of possible solutions
- Problems where you can't calculate a gradient
- Black box engineering: problems where you don't have a very informative model
- Quantum computing: designing quantum algorithms can be counter-intuitive


**How can Evolutionary Algorithms be combined with neural networks?**

- The weights can be trained with an evolutionary algorithm (conventional neuroevolution)
- Neural Architecture search: can be used to find optimal Neural Net architectures
- Use a Neural Network to get the features and evolution to get the policy
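Here is a minimal sketch of conventional neuroevolution (the first bullet). The genome is the flattened weight vector of a fixed-topology network, and evolution replaces backpropagation. XOR stands in for a real task such as a Gym environment. The 2-4-1 topology, elitist truncation selection, mutation scale and generation count are illustrative assumptions.

```python
import numpy as np

X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
Y = np.array([0, 1, 1, 0], dtype=float)     # XOR targets
N_PARAMS = 17                               # W1 (2x4) + b1 (4) + W2 (4) + b2 (1)

def forward(params, x):
    """Fixed-topology 2-4-1 network whose genome is its flattened weight vector."""
    W1 = params[0:8].reshape(2, 4)
    b1 = params[8:12]
    W2 = params[12:16]
    b2 = params[16]
    h = np.tanh(x @ W1 + b1)                        # hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))    # sigmoid output in (0, 1)

def loss(params):
    return float(np.mean((forward(params, X) - Y) ** 2))

def evolve(pop_size=50, n_elite=10, sigma=0.3, generations=300, seed=0):
    rng = np.random.default_rng(seed)
    pop = rng.normal(0.0, 1.0, size=(pop_size, N_PARAMS))
    for _ in range(generations):
        losses = np.array([loss(p) for p in pop])
        elite = pop[np.argsort(losses)[:n_elite]]                    # lowest loss = fittest
        parents = elite[rng.integers(n_elite, size=pop_size - n_elite)]
        children = parents + sigma * rng.normal(size=parents.shape)  # Gaussian mutation
        pop = np.vstack([elite, children])                           # elitism: keep the elite
    losses = np.array([loss(p) for p in pop])
    return pop[np.argmin(losses)], losses.min()

best, best_loss = evolve()
print(best_loss, np.round(forward(best, X), 2))
```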


**Further Information**

- [Introduction to Evolutionary Algorithms](https://towardsdatascience.com/introduction-to-evolutionary-algorithms-a8594b484ac)
- [An introduction to Evolutionary Algorithms - Dr. Shahin Rostami](https://www.youtube.com/watch?v=L--IxUH4fac)
- [Multi-Objective Problems - Dr. Shahin Rostami](https://www.youtube.com/watch?v=56JOMkPvoKs)
- [What exactly are genetic algorithms and what sort of problems are they good for?](https://ai.stackexchange.com/questions/240/what-exactly-are-genetic-algorithms-and-what-sort-of-problems-are-they-good-for)
- [Evolutionary algorithms: A critical review and its future prospects (2016, pay-wall)](https://ieeexplore.ieee.org/document/7955308?reload=true)
- [Evolutionary-Neural Hybrid Agents for Architecture Search (2019, under review)](https://openreview.net/pdf?id=S1eBzhRqK7)


**Researchers**

- [Jeff Clune](https://www.youtube.com/watch?v=eCy-vUXXF_g)
- [Kenneth Stanley](https://www.youtube.com/watch?v=XWUsl24zYOU)
- [Wolfgang Banzhaf](https://www.youtube.com/watch?v=tj5-H6ECxyM)
- [Lee Spector](https://www.youtube.com/watch?v=UWoWBiMowLI)
- [Publications of Dr. A.E. Eiben](https://www.cs.vu.nl/~gusz/index.php/my-publications/)

# Evolutionary Algorithms Quiz

### Question 1


- []


**Explanation**:


### Question 2

- [] 

**Explanation**:


### Question 3

- [] 

**Explanation**:


### Question 4

- [] 

**Explanation**:


### Question 5

- [] 

**Explanation**:


### Question 6

- [] 

**Explanation**:


### Question 7

- [] 

**Explanation**:


### Question 8

- [] 

**Explanation**:


### Question 9

- [] 

**Explanation**:



# Homework Assignment (Neuroevolution)

This week's homework assignment is to design a neuro-evolution algorithm that will learn how to optimally play any one of the OpenAI Gym environments. Here is an [example notebook](https://github.com/ikergarcia1996/NeuroEvolution-Flappy-Bird) to get started. If your agent can learn how to play the game using both neural networks and an evolutionary algorithm, you'll successfully complete the assignment. Good luck!

# Control Theory


**Video Description:**

Boston Dynamics released yet another incredible video of its bipedal humanoid robot, this time performing parkour by jumping on a series of boxes. In this video, I'll explain how it works at both a hardware and software level. Their real value lies in the specific type of software they are using. We don't know for sure what it is, but we can take some educated guesses based on a combination of what's been revealed so far and what's worked in other humanoid robots. Prepare yourselves for some mechanical engineering and control theory, Wizards. Enjoy!


**Take Aways**

- Boston Dynamics' Atlas robot is powered by a very efficient Hydraulic Power Unit (HPU), which converts mechanical energy into hydraulic energy
- Zero Moment Point (ZMP) algorithms help bipedal robots walk, run, and jump by computing a trajectory whose ZMP stays inside the support polygon of the feet, which keeps the robot balanced
- Boston Dynamics likely uses a stack of different Dynamic Control Algorithms to help its robot perform motions
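The ZMP condition is easiest to see in the cart-table (linear inverted pendulum) model. This standard simplification assumes a constant center-of-mass (CoM) height $z_c$. Under it, the ZMP along one axis is $p = x - \frac{z_c}{g}\ddot{x}$. The sketch computes the ZMP for a hypothetical CoM sway and checks that it stays inside the foot's support region. It illustrates the criterion only and makes no claim about what Boston Dynamics actually runs.

```python
import numpy as np

G = 9.81  # gravitational acceleration in m/s^2

def zmp_cart_table(com_x, com_x_ddot, com_height):
    """ZMP along one horizontal axis under the cart-table / linear inverted pendulum
    model, which assumes a constant CoM height: p = x - (z_c / g) * x_ddot."""
    return np.asarray(com_x) - (com_height / G) * np.asarray(com_x_ddot)

def is_balanced(zmp, foot_min, foot_max):
    """In this model the robot does not tip over while the ZMP stays inside the support region."""
    zmp = np.asarray(zmp)
    return bool(np.all((zmp >= foot_min) & (zmp <= foot_max)))

# Hypothetical side-to-side CoM sway: 2 cm amplitude at 1 Hz, CoM height 0.9 m.
t = np.arange(0.0, 1.0, 0.01)
omega = 2 * np.pi
com = 0.02 * np.sin(omega * t)
com_acc = -omega ** 2 * com                 # exact second derivative of the sinusoid
zmp = zmp_cart_table(com, com_acc, com_height=0.9)
print(zmp.max(), is_balanced(zmp, foot_min=-0.10, foot_max=0.10))
```

With a 0.9 m CoM height, the 2 cm sway moves the ZMP about 4.6 times as far (roughly ±9.2 cm). This is why faster motions need carefully planned CoM trajectories to keep the ZMP under the feet.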


**Learning Resources**

- [Youtube Video](https://www.youtube.com/watch?v=lXZ6y3lMymM)
- [Code Link: Boston Dynamics Atlas Explained](https://github.com/llSourcell/Boston_Dynamics_Atlas_Explained)
- [Boston Dynamics Atlas](https://www.bostondynamics.com/atlas)
- [PDF: CMU Human-Supervised Control of the ATLAS Humanoid Robot for Traversing Doors](https://www.cs.cmu.edu/~cga/drc/door-submitted.pdf)
- [BDI Atlas Robot Interface 3.0.0](http://gazebosim.org/tutorials?tut=drcsim_atlas_robot_interface&branch=issue_24_atlas_robot_interface_drcsim_4)
- [PDF: Introduction to Control Systems](http://www.ent.mrt.ac.lk/~rohan/teaching/EN5001/Reading/DORFCH1.pdf)
- [PDF: Introduction to Robotics](http://engineering.nyu.edu/mechatronics/smart/Archive/intro_to_rob/Intro2Robotics.pdf)
- [Youtube: Parkour Atlas](https://www.youtube.com/watch?v=LikxFZZO2sk)