<a href="https://colab.research.google.com/github/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_12_01_ai_gym.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# T81-558: Applications of Deep Neural Networks
**Module 12: Reinforcement Learning**
* Instructor: [Jeff Heaton](https://sites.wustl.edu/jeffheaton/), McKelvey School of Engineering, [Washington University in St. Louis](https://engineering.wustl.edu/Programs/Pages/default.aspx)
* For more information visit the [class website](https://sites.wustl.edu/jeffheaton/t81-558/).
* Modified by uramoon@kw.ac.kr

# Module 12 Video Material

* **Part 12.1: Introduction to the OpenAI Gym** [[Video]](https://www.youtube.com/watch?v=_KbUxgyisjM&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_12_01_ai_gym.ipynb)
* Part 12.2: Introduction to Q-Learning [[Video]](https://www.youtube.com/watch?v=A3sYFcJY3lA&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_12_02_qlearningreinforcement.ipynb)
* Part 12.3: Keras Q-Learning in the OpenAI Gym [[Video]](https://www.youtube.com/watch?v=qy1SJmsRhvM&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_12_03_keras_reinforce.ipynb)
* Part 12.4: Atari Games with Keras Neural Networks [[Video]](https://www.youtube.com/watch?v=co0SwPWoZh0&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_12_04_atari.ipynb)
* Part 12.5: Application of Reinforcement Learning [[Video]](https://www.youtube.com/watch?v=1jQPP3RfwMI&list=PLjy4p-07OYzulelvJ5KVaT2pDlxivl_BN) [[Notebook]](https://github.com/jeffheaton/t81_558_deep_learning/blob/master/t81_558_class_12_05_apply_rl.ipynb)


# Part 12.1: Introduction to the OpenAI Gym

Gym is an API for reinforcement learning created by OpenAI.

[OpenAI Gym](https://gym.openai.com/) aims to provide an easy-to-setup general-intelligence benchmark with various environments. The goal is to standardize how environments are defined in AI research publications to make published research more easily reproducible. The project claims to provide the user with a simple interface. As of June 2017, developers can only use Gym with Python. 

OpenAI Gym is installed on your local machine with pip. There are a few significant limitations to be aware of:

* OpenAI Gym Atari only **directly** supports Linux and Macintosh
* OpenAI Gym Atari can be used with Windows; however, it requires a particular [installation procedure](https://towardsdatascience.com/how-to-install-openai-gym-in-a-windows-environment-338969e24d30)
* OpenAI Gym cannot directly render animated games in Google CoLab.

Because OpenAI Gym requires a graphics display, an embedded video is the only way to display Gym in Google CoLab. The presentation of OpenAI Gym game animations in Google CoLab is discussed later in this module.

## OpenAI Gym Leaderboard

OpenAI Gym has a leaderboard, similar to Kaggle; however, it is much more informal. The user's local machine performs all scoring. As a result, the OpenAI Gym leaderboard is strictly an "honor system." The leaderboard is maintained in the following GitHub repository:

* [OpenAI Gym Leaderboard](https://github.com/openai/gym/wiki/Leaderboard)

You must provide a write-up with sufficient instructions to reproduce your result if you submit a score. A video of your results is suggested but not required.

## Looking at Gym Environments

Gym provides environments in which reinforcement learning can be performed.
The centerpiece of Gym is the environment, which defines the "game" in which your reinforcement algorithm will compete. An environment does not need to be a game; however, it describes the following game-like features:
* **action space**: The list of actions that can be taken at each step
* **observation space**: The state that can currently be observed

Before we begin to look at Gym, it is essential to understand some of the terminology used by this library.

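* **Agent** - The machine learning program that takes an action at each step; the next state depends on the action taken.
* **Episode** - A sequence of consecutive steps. An episode ends when the agent fails or reaches a preset maximum number of steps.
* **Render** - Gym can draw what happens during an episode, frame by frame.
* **Reward** - A signal the agent receives for its actions; depending on the environment, it can arrive at every step or only when the episode ends.
* **Non-deterministic** - In some environments, rewards are given stochastically (for example, a lottery).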
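
To make the terminology above concrete, here is a minimal sketch of the agent-environment loop. `CoinFlipEnv` and `run_episode` are hypothetical names written for illustration and are not part of Gym; the class only mimics the `reset`/`step` interface used later in this notebook, where `step` returns an observation, a reward, a done flag, and an info dictionary.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment (not part of Gym) with a Gym-like interface."""

    def __init__(self, max_steps=10, seed=None):
        self.max_steps = max_steps
        self.rng = random.Random(seed)
        self.steps = 0

    def reset(self):
        # Start a new episode and return the first observation.
        self.steps = 0
        return 0

    def step(self, action):
        # The environment flips a coin; the agent is rewarded for guessing it.
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        self.steps += 1
        done = self.steps >= self.max_steps  # the episode ends at the step limit
        return coin, reward, done, {}


def run_episode(env, policy):
    """Run one episode and return (number of steps, total reward)."""
    observation = env.reset()
    total_reward, steps, done = 0.0, 0, False
    while not done:
        action = policy(observation)
        observation, reward, done, info = env.step(action)
        total_reward += reward
        steps += 1
    return steps, total_reward


# A random agent: it ignores the observation and guesses 0 or 1.
agent_rng = random.Random(0)
steps, total = run_episode(CoinFlipEnv(max_steps=10, seed=1),
                           lambda obs: agent_rng.randint(0, 1))
print(f"Episode length: {steps}, total reward: {total}")
```

The episode here always lasts exactly `max_steps` steps because this toy environment has no failure condition; real Gym environments usually end an episode early when the agent fails.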

It is important to note that many Gym environments report that they are not non-deterministic even though they use random numbers to process actions. Based on the Gym GitHub issue tracker, the non-deterministic property appears to mean that an environment still behaves randomly even when it is given a consistent seed value. A program can use an environment's seed method to seed that environment's random number generator.
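
The effect of a consistent seed can be illustrated without Gym. The sketch below uses Python's standard `random.Random` as a stand-in for an environment's random number generator; `sample_actions` is a hypothetical helper, not a Gym function.

```python
import random


def sample_actions(seed, n_actions=3, n_steps=5):
    """Draw n_steps random actions from a generator seeded with `seed`."""
    rng = random.Random(seed)
    return [rng.randrange(n_actions) for _ in range(n_steps)]


first = sample_actions(seed=42)
second = sample_actions(seed=42)
print(first == second)  # the same seed reproduces the same sequence
```

An environment that is deterministic in this sense uses random numbers, yet replays identically whenever it is given the same seed and the same actions.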

The Gym library allows us to query some of these attributes from environments. I created the following function to query gym environments.


In [9]:
import gym

# Print information about the environment with the given name
def query_environment(name):
    env = gym.make(name)
    spec = gym.spec(name)
    print(f"Action Space: {env.action_space}")
    print(f"Observation Space: {env.observation_space}")
    print(f"Max Episode Steps: {spec.max_episode_steps}")
    print(f"Nondeterministic: {spec.nondeterministic}")
    print(f"Reward Range: {env.reward_range}")
    print(f"Reward Threshold: {spec.reward_threshold}")


## Looking at the MountainCar Environment
We will look at the **MountainCar-v0** environment, which challenges an underpowered car to escape the valley between two mountains. The following code describes the Mountain Car environment.

In [10]:
query_environment("MountainCar-v0")

Action Space: Discrete(3)
Observation Space: Box(-1.2000000476837158, 0.6000000238418579, (2,), float32)
Max Episode Steps: 200
Nondeterministic: False
Reward Range: (-inf, inf)
Reward Threshold: -110.0


### TODO: Answer the MountainCar Questions
Hint: https://www.gymlibrary.ml/environments/classic_control/mountain_car/


In [None]:
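# Q: Which actions can be taken at each step in MountainCar-v0? Describe what each action means.
# A: 0 - Accelerate to the left, 1 - Don't accelerate, 2 - Accelerate to the right

# Q: The observation consists of two real numbers. What does each one mean?
# A: First value: position of the car along the x-axis, second value: velocity of the car

# Q: The goal of reinforcement learning is to collect as much reward as possible.
# In this environment, a reward of -1 is given at every step the car is not at the goal. How should the agent act?
# A: It should reach the flag at the top of the right hill as quickly as possible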
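
As a rough illustration of the last answer, the sketch below simulates the car without Gym. The dynamics constants (force 0.001, gravity 0.0025, speed limit 0.07, position range -1.2 to 0.6, goal at 0.5) are assumptions based on the environment documentation linked above, the start state is fixed instead of random, and `mountain_car_step` and `momentum_policy` are hypothetical names. The policy pushes in the direction the car is already moving, a simple hand-written heuristic and not a learned agent.

```python
import math


def mountain_car_step(position, velocity, action):
    """One step of the assumed MountainCar-v0 dynamics (action: 0, 1, or 2)."""
    velocity += (action - 1) * 0.001 - 0.0025 * math.cos(3 * position)
    velocity = max(-0.07, min(0.07, velocity))
    position += velocity
    position = max(-1.2, min(0.6, position))
    if position == -1.2 and velocity < 0:
        velocity = 0.0  # the car stops when it hits the left wall
    return position, velocity


def momentum_policy(velocity):
    # Push in the direction the car is already moving to build momentum.
    return 2 if velocity >= 0 else 0


position, velocity = -0.5, 0.0  # assumed start near the valley floor
steps = 0
while position < 0.5 and steps < 200:
    action = momentum_policy(velocity)
    position, velocity = mountain_car_step(position, velocity, action)
    steps += 1

print(f"Steps taken: {steps}, total reward: {-steps}")
```

Because every step costs -1, fewer steps means a higher total reward, which is why rocking back and forth to gain speed beats pushing right from the start.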

## Looking at the CartPole Environment


In [11]:
query_environment("CartPole-v1")

Action Space: Discrete(2)
Observation Space: Box(-3.4028234663852886e+38, 3.4028234663852886e+38, (4,), float32)
Max Episode Steps: 500
Nondeterministic: False
Reward Range: (-inf, inf)
Reward Threshold: 475.0


### TODO: Answer the CartPole Questions
Hint: https://www.gymlibrary.ml/environments/classic_control/cart_pole/


In [None]:
# Q: Which actions can be taken at each step in CartPole-v1? Describe what each action means.
# A: 0 - Push cart to the left, 1 - Push cart to the right


# Q: The observation consists of four real numbers. What does each one mean?
# A: First value: Cart Position, second value: Cart Velocity, third value: Pole Angle, fourth value: Pole Angular Velocity

# Q: What are the two episode termination conditions described in the Observation Space section?
# A: First condition: Pole Angle is greater than ±12°
# Second condition: Cart Position is greater than ±2.4 (center of the cart reaches the edge of the display)


# Q: The goal of reinforcement learning is to collect as much reward as possible.
# In this environment, a reward of +1 is given at every step. How should the agent act?
# A: It should keep the pole upright for as long as possible
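
The two termination conditions above can be written as a small check. This is a sketch under the assumption that the observed pole angle is in radians, so the 12 degree limit is converted first; `violates_limits` is a hypothetical helper and not a Gym function, and it does not cover the separate step limit of 500.

```python
import math

ANGLE_LIMIT = math.radians(12)  # about 0.2095 rad
POSITION_LIMIT = 2.4


def violates_limits(observation):
    """Return True if a CartPole observation breaks either limit."""
    cart_position, _cart_velocity, pole_angle, _pole_angular_velocity = observation
    return abs(cart_position) > POSITION_LIMIT or abs(pole_angle) > ANGLE_LIMIT


print(violates_limits([0.0, 0.0, 0.05, 0.0]))  # upright and centered
print(violates_limits([0.0, 0.0, 0.25, 0.0]))  # pole tilted too far
print(violates_limits([2.5, 0.0, 0.0, 0.0]))   # cart off the track
```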

## Downloading the Atari ROM Files

Note: If you see a warning above, you can safely ignore it; it is a relatively minor bug in OpenAI Gym.

Atari games, like Breakout, can use an observation space that is either the Atari screen (210x160 pixels) or the RAM of the Atari (128 bytes) to determine the state of the game. Yes, that's bytes, not kilobytes!
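
To get a feel for the difference between the two observation types, the short sketch below counts the values in each. The three color channels are taken from the `Box(0, 255, (210, 160, 3), uint8)` observation space reported for the image version of Breakout later in this notebook.

```python
import math

screen_shape = (210, 160, 3)  # height, width, RGB channels
ram_shape = (128,)            # bytes of Atari RAM

screen_values = math.prod(screen_shape)
ram_values = math.prod(ram_shape)

print(f"Screen observation: {screen_values} values")
print(f"RAM observation: {ram_values} values")
print(f"Ratio: {screen_values / ram_values}")
```

The screen observation is several hundred times larger than the RAM observation, which is one reason image-based agents typically rely on neural networks.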

In [12]:
# HIDE OUTPUT
!wget http://www.atarimania.com/roms/Roms.rar 
!unrar x -o+ /content/Roms.rar >/dev/null
!python -m atari_py.import_roms /content/ROMS >/dev/null

--2022-05-29 13:15:55--  http://www.atarimania.com/roms/Roms.rar
Resolving www.atarimania.com (www.atarimania.com)... 195.154.81.199
Connecting to www.atarimania.com (www.atarimania.com)|195.154.81.199|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 19583716 (19M) [application/x-rar-compressed]
Saving to: ‘Roms.rar’


2022-05-29 13:16:54 (331 KB/s) - ‘Roms.rar’ saved [19583716/19583716]



## Two Versions of the Breakout Game Environment
https://www.gymlibrary.ml/environments/atari/breakout/

In [13]:
# Environment that observes a 210 x 160, 3-channel color image, just as a human player would (each pixel value is 0 to 255)
query_environment("Breakout-v0")

Action Space: Discrete(4)
Observation Space: Box(0, 255, (210, 160, 3), uint8)
Max Episode Steps: 10000
Nondeterministic: False
Reward Range: (-inf, inf)
Reward Threshold: None


In [14]:
# Environment that observes the 128 bytes of memory of the Atari console (each byte is 0 to 255)
query_environment("Breakout-ram-v0")

Action Space: Discrete(4)
Observation Space: Box(0, 255, (128,), uint8)
Max Episode Steps: 10000
Nondeterministic: False
Reward Range: (-inf, inf)
Reward Threshold: None


## Render OpenAI Gym Environments from CoLab

Let's render!

It is possible to visualize the game your agent is playing, even on CoLab. This section provides information on generating a video in CoLab that shows you an episode of the game your agent is playing. I based this video process on suggestions found [here](https://colab.research.google.com/drive/1flu31ulJlgiRL1dnN2ir8wGh9p7Zij2t).

Begin by installing **pyvirtualdisplay** and **python-opengl**.

In [15]:
# You do not need to understand this cell.
# HIDE OUTPUT
!pip install gym pyvirtualdisplay > /dev/null 2>&1
!apt-get install -y xvfb python-opengl ffmpeg > /dev/null 2>&1

Next, we install the needed requirements to display an Atari game.

In [16]:
# You do not need to understand this cell.
# HIDE OUTPUT
!apt-get update > /dev/null 2>&1
!apt-get install cmake > /dev/null 2>&1
!pip install --upgrade setuptools 2>&1
!pip install ez_setup > /dev/null 2>&1
!pip install gym[atari] > /dev/null 2>&1

Looking in indexes: https://pypi.org/simple, https://us-python.pkg.dev/colab-wheels/public/simple/


Next, we define the functions used to show the video by adding it to the CoLab notebook.

In [17]:
# You do not need to understand this cell.
import gym
from gym.wrappers import Monitor
import glob
import io
import base64
from IPython.display import HTML
from pyvirtualdisplay import Display
from IPython import display as ipythondisplay

display = Display(visible=0, size=(1400, 900))
display.start()

"""
Utility functions to enable video recording of gym environment 
and displaying it.
To enable video, just do "env = wrap_env(env)"
"""


def show_video():
    mp4list = glob.glob('video/*.mp4')
    if len(mp4list) > 0:
        mp4 = mp4list[0]
        video = io.open(mp4, 'r+b').read()
        encoded = base64.b64encode(video)
        ipythondisplay.display(HTML(data='''<video alt="test" autoplay 
                loop controls style="height: 400px;">
                <source src="data:video/mp4;base64,{0}" type="video/mp4" />
             </video>'''.format(encoded.decode('ascii'))))
    else:
        print("Could not find video")


def wrap_env(env):
    env = Monitor(env, './video', force=True)
    return env
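
The core idea in `show_video` is to embed the recorded file directly in the page as a base64 data URI, so the browser needs no separate file to load. The sketch below isolates that step with the standard library only; `video_tag_from_bytes` is a hypothetical helper, and the bytes passed in are a placeholder and not a real video.

```python
import base64


def video_tag_from_bytes(video_bytes):
    """Build an HTML <video> tag with the video embedded as a base64 data URI."""
    encoded = base64.b64encode(video_bytes).decode("ascii")
    return (
        '<video autoplay loop controls style="height: 400px;">'
        f'<source src="data:video/mp4;base64,{encoded}" type="video/mp4" />'
        "</video>"
    )


tag = video_tag_from_bytes(b"not a real video")  # placeholder bytes
print(tag)
```

In the notebook, the resulting string is what gets wrapped in `HTML(...)` and passed to the IPython display function.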


Now we are ready to play the game. We use a simple random agent.<br>
In the game Atlantis, the player can take the following actions at each step:
0. Do nothing
1. Fire the center gun
2. Fire the right gun
3. Fire the left gun

In [18]:
# HIDE OUTPUT
#env = wrap_env(gym.make("MountainCar-v0"))
env = wrap_env(gym.make("Atlantis-v0"))

observation = env.reset()

while True:
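    # Do the following at every step

    # Render the current frame
    env.render()

    # Choose an action
    action = env.action_space.sample()  # chosen at random

    # Returns the new observation, the reward, whether the episode has ended, and additional info
    observation, reward, done, info = env.step(action)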

    if done:
        break

env.close()
show_video()


## TODO: Build a Random Agent for Kung Fu Master
https://www.gymlibrary.ml/environments/atari/kung_fu_master/<br>
Build an agent that takes a random action at every step and render its gameplay.

In [19]:
env = wrap_env(gym.make("KungFuMaster-v4"))

In [20]:
observation = env.reset()

while True:
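    # Do the following at every step

    # Render the current frame
    env.render()

    # Choose an action
    action = env.action_space.sample()  # chosen at random

    # Returns the new observation, the reward, whether the episode has ended, and additional info
    observation, reward, done, info = env.step(action)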

    if done:
        break

env.close()
show_video()

## TODO: Build a CartPole Agent
https://www.gymlibrary.ml/environments/classic_control/cart_pole/<br>
Using simple hand-written rules, build an agent that survives for at least 150 steps.


In [26]:
env = wrap_env(gym.make("CartPole-v1"))
observation = env.reset()

i = 0
done = False
while not done:
    i += 1

    # Rule-based agent: follow the pole angle, with a cart-velocity override
    if observation[2] < 0:
        # Pole leans left: push left, unless the cart is already moving left fast
        action = 1 if observation[1] < -0.9 else 0
    else:
        # Pole leans right: push right, unless the cart is already moving right fast
        action = 0 if observation[1] > 0.9 else 1

    # Returns the new observation, the reward, whether the episode has ended, and additional info
    observation, reward, done, info = env.step(action)

    print(f"Step {i}: Observation={observation}, Action={action}, Reward={reward}")

    env.render()

env.close()
show_video()

Step 1: Observation=[-0.04258608  0.22115541  0.00566218 -0.25709026], Action=1, Reward=1.0
Step 2: Observation=[-3.81629716e-02  4.16196064e-01  5.20375465e-04 -5.47981881e-01], Action=1, Reward=1.0
Step 3: Observation=[-0.02983905  0.6113107  -0.01043926 -0.84050081], Action=1, Reward=1.0
Step 4: Observation=[-0.01761284  0.41633281 -0.02724928 -0.551119  ], Action=0, Reward=1.0
Step 5: Observation=[-0.00928618  0.22160397 -0.03827166 -0.26714446], Action=0, Reward=1.0
Step 6: Observation=[-0.0048541   0.02704853 -0.04361455  0.01322588], Action=0, Reward=1.0
Step 7: Observation=[-0.00431313 -0.16742168 -0.04335003  0.29183517], Action=0, Reward=1.0
Step 8: Observation=[-0.00766156 -0.36189958 -0.03751333  0.57053705], Action=0, Reward=1.0
Step 9: Observation=[-0.01489956 -0.55647592 -0.02610259  0.85116989], Action=0, Reward=1.0
Step 10: Observation=[-0.02602907 -0.75123244 -0.00907919  1.13553186], Action=0, Reward=1.0
Step 11: Observation=[-0.04105372 -0.94623443  0.01363145  1.42

For typical Atari games, a deep learning model analyzes the screen image and selects a desirable action for the current state.<br>
Good work!