# Deep Q-Network (DQN)

This notebook implements a DQN agent for OpenAI Gym's LunarLander-v2 environment.

## LunarLander-v2

https://github.com/openai/gym/blob/master/gym/envs/box2d/lunar_lander.py

Created by Oleg Klimov. Licensed on the same terms as the rest of OpenAI Gym.

Rocket trajectory optimization is a classic topic in Optimal Control.

According to Pontryagin's maximum principle, it is optimal to either fire the engine at full throttle or turn it off.

That is why it is reasonable for this environment to have discrete actions (engine on or off).

Key facts for understanding LunarLander:
- Landing pad is always at coordinates (0,0). 
- The coordinates are the first two numbers in the state vector. 
- The reward for moving from the top of the screen to the landing pad and reaching zero speed is about 100 to 140 points.
- If the lander moves away from the landing pad, it loses that reward again.
- The episode finishes if the lander crashes or comes to rest, receiving an additional -100 or +100 points, respectively.
- Each leg with ground contact is +10 points.
- Firing the main engine is -0.3 points each frame. 
- Firing the side engine is -0.03 points each frame.
- The environment is considered solved at 200 points.
- Landing outside the landing pad is possible. 
- Fuel is infinite, so an agent can learn to fly and then land on its first attempt. 
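
The list above gives 200 points as the solved threshold but does not say how it is measured. A minimal sketch of a solved check follows, assuming the common convention of averaging episode scores over the last 100 episodes. The window size and the names `is_solved`, `SOLVED_SCORE`, and `WINDOW` are illustrative assumptions, not part of the environment.

```python
from collections import deque

SOLVED_SCORE = 200.0  # threshold stated in the list above
WINDOW = 100          # assumption: average over the last 100 episodes


def is_solved(scores, window=WINDOW, threshold=SOLVED_SCORE):
    """Return True once the mean of the last `window` scores reaches `threshold`."""
    recent = deque(scores, maxlen=window)  # keeps only the most recent scores
    if len(recent) < window:
        return False  # not enough episodes yet to fill the window
    return sum(recent) / len(recent) >= threshold
```

A training loop could call `is_solved(scores)` after each episode and stop early when it returns `True`.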

Four discrete actions are available (action indices 0 to 3, in this order):
1. Do nothing.
2. Fire left orientation engine.
3. Fire main engine.
4. Fire right orientation engine.
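
To make the action indices and the per-frame fuel costs concrete, here is a small sketch. It assumes the actions map to indices 0 to 3 in the order listed; the `Action` enum, `FUEL_COST` table, and `fuel_penalty` helper are hypothetical names introduced for illustration, and the costs are the ones quoted in the list above.

```python
from enum import IntEnum


class Action(IntEnum):
    """Illustrative names for the four discrete actions (indices assumed 0..3)."""
    NOOP = 0
    FIRE_LEFT = 1
    FIRE_MAIN = 2
    FIRE_RIGHT = 3


# Per-frame fuel cost in reward points, taken from the description above.
FUEL_COST = {
    Action.NOOP: 0.0,
    Action.FIRE_LEFT: 0.03,
    Action.FIRE_MAIN: 0.3,
    Action.FIRE_RIGHT: 0.03,
}


def fuel_penalty(actions):
    """Total (negative) reward contribution from firing engines over a sequence of actions."""
    return -sum(FUEL_COST[Action(a)] for a in actions)
```

For example, `fuel_penalty([2, 2, 1, 0])` sums two main-engine frames and one side-engine frame. This accounts only for the fuel terms, not for the shaping or terminal rewards.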

Please see the source code for details.
https://github.com/openai/gym/blob/master/gym/envs/box2d/lunar_lander.py
- To see a heuristic landing, run: python gym/envs/box2d/lunar_lander.py
- To play yourself, run: python examples/agents/keyboard_agent.py LunarLander-v2


References:

https://github.com/RMiftakhov/LunarLander-v2-drlnd
    
https://www.katnoria.com/nb_dqn_lunar/
    
https://drawar.github.io/blog/2019/05/12/lunar-lander-dqn.html
    
https://pytorch.org/tutorials/intermediate/reinforcement_q_learning.html

### 1. Import the necessary packages

In [3]:
%matplotlib inline
%config InlineBackend.figure_format = 'retina'

In [4]:
import gym
import random
import torch
import numpy as np
from collections import deque
import matplotlib.pyplot as plt

### 2. Instantiate the environment and agent

In [5]:
env = gym.make('LunarLander-v2')
env.seed(0)
print('State shape: ', env.observation_space.shape)
print('Number of actions: ', env.action_space.n)

State shape:  (8,)
Number of actions:  4
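
The 8-element state can be easier to work with when its components are named. The sketch below assumes the layout used in the environment source (position, velocity, angle, angular velocity, and two leg-contact flags); only the first two entries are confirmed by the description above, and the field names, `label_state`, and `distance_to_pad` are illustrative.

```python
import math
from collections import namedtuple

# Assumed ordering of the 8 state components; field names are illustrative.
LanderState = namedtuple(
    "LanderState",
    [
        "x", "y",                # position relative to the landing pad at (0, 0)
        "vx", "vy",              # linear velocity
        "angle",                 # lander tilt
        "angular_velocity",
        "left_leg_contact",      # 1.0 if the leg touches the ground, else 0.0
        "right_leg_contact",
    ],
)


def label_state(observation):
    """Wrap a length-8 observation in a named tuple for readable access."""
    return LanderState(*observation)


def distance_to_pad(state):
    """Euclidean distance from the lander to the landing pad at (0, 0)."""
    return math.hypot(state.x, state.y)
```

With the environment above, `label_state(env.reset())` would give named access to the initial observation in Gym versions where `reset()` returns the observation directly.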
