# Reinforcement Learning
## (Book: Artificial Intelligence with Python)

**Reinforcement learning** is the process of learning what to do, that is, how to map situations to actions, so as to maximize a reward. In most machine learning paradigms, a learning agent is told which actions to take to achieve a given result. In reinforcement learning, the agent is not told which actions to take. Instead, it must discover which actions yield the highest reward by trying them out. An action tends to affect not only the immediate reward but also the next situation, and through it all subsequent rewards.
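
The idea of discovering good actions by trying them out can be sketched without any library. The example below is only an illustration under simplifying assumptions: there is a single situation, each action has a fixed reward, and the agent uses an epsilon-greedy rule (a common textbook strategy, not something this notebook implements). The function name `run_trial_and_error` and its parameters are hypothetical.

```python
import random


def run_trial_and_error(rewards, steps=500, epsilon=0.1, seed=0):
    """Estimate the value of each action purely by trying actions out.

    `rewards[a]` is the (hypothetical) fixed reward for taking action `a`.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(rewards)  # running average reward per action
    counts = [0] * len(rewards)       # how often each action was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: pick any action at random
            action = rng.randrange(len(rewards))
        else:
            # Exploit: pick the action with the best estimate so far
            action = max(range(len(rewards)), key=lambda a: estimates[a])

        reward = rewards[action]
        counts[action] += 1
        # Incremental update of the running average for this action
        estimates[action] += (reward - estimates[action]) / counts[action]

    return estimates, counts


estimates, counts = run_trial_and_error(rewards=[0.0, 1.0])
best_action = max(range(len(estimates)), key=lambda a: estimates[a])
print("Best action found:", best_action)
```

Because this toy problem has no state, it does not show how an action changes the next situation; the Gym environments used later in the notebook do.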

## Creating an Environment

In [1]:
# !pip install gym

In [2]:
import gym

In [3]:
# Map short input names to the environment IDs registered in the
# OpenAI Gym package. IDs are version-specific: newer Gym releases
# register, for example, Pendulum-v1, Taxi-v3 and FrozenLake-v1.

name_map = {
    "cartpole": "CartPole-v0",
    "mountaincar": "MountainCar-v0",
    "pendulum": "Pendulum-v0",
    "taxi": "Taxi-v1",
    "lake": "FrozenLake-v0",
}
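
The comment above mentions an input argument string. One way such a string could be supplied is from the command line. The sketch below assumes an `--input-env` flag and a helper called `build_arg_parser`; both names are illustrative and not taken from this notebook. The mapping is repeated so that the block runs on its own.

```python
import argparse

# Same mapping as above, repeated so this sketch is self-contained
name_map = {
    "cartpole": "CartPole-v0",
    "mountaincar": "MountainCar-v0",
    "pendulum": "Pendulum-v0",
    "taxi": "Taxi-v1",
    "lake": "FrozenLake-v0",
}


def build_arg_parser():
    """Build a parser that only accepts the short names in name_map."""
    parser = argparse.ArgumentParser(description="Run a Gym environment")
    parser.add_argument(
        "--input-env",
        dest="input_env",
        required=True,
        choices=sorted(name_map),
        help="Short name of the environment to run",
    )
    return parser


# Parse an explicit argument list here; a script would call parse_args()
args = build_arg_parser().parse_args(["--input-env", "cartpole"])
env_id = name_map[args.input_env]
print(env_id)
```

Restricting `choices` to the keys of `name_map` means an unknown name is rejected by the parser with a usage message instead of raising a `KeyError` later.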

In [4]:
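def make_environment(input_env):
    # Create the environment based on the input argument and reset it
    env = gym.make(name_map[input_env])
    env.reset()

    for _ in range(1000):
        # Render the environment
        env.render()

        # Take a random action and restart the episode once it ends
        _, _, done, _ = env.step(env.action_space.sample())
        if done:
            env.reset()

    # Release the rendering window and other resources
    env.close()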

In [5]:
# !pip install pygame

In [6]:
make_environment("cartpole")

If you want to render in human mode, initialize the environment in this way: gym.make('EnvName', render_mode='human') and don't call the render method.
See here for more information: https://www.gymlibrary.ml/content/api/


In [7]:
make_environment("mountaincar")
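
The warning printed by the first run suggests passing `render_mode='human'` to `gym.make` instead of calling `env.render()`. A minimal sketch of that suggestion follows. The helper name `make_human_env` and its `make` parameter are hypothetical, and whether the `render_mode` keyword is accepted depends on the installed Gym version.

```python
def make_human_env(env_id, make=None):
    """Create an environment that renders itself in a window.

    `make` is a hypothetical hook for swapping the factory function;
    by default it is gym.make.
    """
    if make is None:
        import gym  # imported lazily so the helper can be tried without Gym
        make = gym.make

    # With render_mode="human", Gym versions that support the keyword draw
    # each frame themselves, so the step loop does not call env.render().
    return make(env_id, render_mode="human")
```

With this helper, the loop in `make_environment` would create its environment with `make_human_env(name_map[input_env])` and drop the `env.render()` call.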

## Building a Learning Agent

In [8]:
import gym

In [9]:
def make_environment(input_env):
    env = gym.make(name_map[input_env])

    for _ in range(20):
        # Reset the environment
        observation = env.reset()

        # Iterate 100 times
        for i in range(100):
            # Render the environment
            env.render()

            # Print the current observation
            print(observation)

            # Take a random action
            action = env.action_space.sample()

            # Extract the observation, reward, status and 
            # other info based on the action taken
            observation, reward, done, info = env.step(action)
            
            # Check if it's done
            if done:
                print(f"Episode finished after {i+1} timesteps")
                break

    # Close the environment once all episodes have run
    env.close()
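
The loop above reads `reward` but never uses it, even though reward is the quantity a reinforcement learning agent tries to maximize. The sketch below, with the hypothetical name `run_episode`, runs one episode with random actions and returns the step count and the accumulated reward. It accepts both the four-value `step` result used in this notebook and the five-value result of newer Gym versions, and it assumes observations are not themselves tuples.

```python
def run_episode(env, max_steps=100):
    """Run one episode with random actions; return (steps, total_reward)."""
    result = env.reset()
    # Newer Gym versions return (observation, info) from reset()
    observation = result[0] if isinstance(result, tuple) else result

    steps = 0
    total_reward = 0.0
    for _ in range(max_steps):
        action = env.action_space.sample()
        outcome = env.step(action)
        if len(outcome) == 5:
            # Newer API: terminated and truncated are reported separately
            observation, reward, terminated, truncated, info = outcome
            done = terminated or truncated
        else:
            # Older API, as used in this notebook
            observation, reward, done, info = outcome

        steps += 1
        total_reward += reward
        if done:
            break

    return steps, total_reward
```

A call such as `steps, total = run_episode(gym.make(name_map["cartpole"]))` gives a baseline for a random policy. In CartPole each step earns a reward of 1, so the total reward should equal the number of steps survived.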

In [10]:
make_environment("cartpole")

[ 0.00141893 -0.03185625 -0.02622089  0.00476723]
[ 0.00078181 -0.22659254 -0.02612554  0.2890632 ]
[-0.00375005 -0.03110797 -0.02034428 -0.01174364]
[-0.0043722  -0.22593233 -0.02057915  0.27445164]
[-0.00889085 -0.0305229  -0.01509012 -0.0246503 ]
[-0.00950131 -0.22542523 -0.01558313  0.26323357]
[-0.01400981 -0.42032132 -0.01031845  0.55096096]
[-0.02241624 -0.22505598  0.00070076  0.25504497]
[-0.02691736 -0.42018792  0.00580166  0.54794884]
[-0.03532112 -0.6153909   0.01676064  0.8424541 ]
[-0.04762894 -0.81073755  0.03360972  1.1403604 ]
[-0.06384369 -0.6160707   0.05641693  0.85840434]
[-0.0761651  -0.81191397  0.07358502  1.1682795 ]
[-0.09240338 -0.61782235  0.09695061  0.89954454]
[-0.10475983 -0.42413852  0.1149415   0.6388419 ]
[-0.1132426  -0.6206596   0.12771833  0.9653969 ]
[-0.12565579 -0.427463    0.14702627  0.7154095 ]
[-0.13420504 -0.234649    0.16133447  0.47237906]
[-0.13889803 -0.0421294   0.17078204  0.23457664]
[-0.13974062  0.15019329  0.17547359  0.00025563]


[-0.06321099 -0.34519088  0.08705562  0.68992394]
[-0.07011481 -0.15137798  0.1008541   0.42586753]
[-0.07314237  0.04218162  0.10937145  0.16660437]
[-0.07229874  0.23558186  0.11270353 -0.08967146]
[-0.06758711  0.03904     0.1109101   0.23633566]
[-0.0668063  -0.15747736  0.11563682  0.5618413 ]
[-0.06995586  0.03584815  0.12687364  0.3077112 ]
[-0.06923889 -0.1608318   0.13302787  0.6375607 ]
[-0.07245553  0.03220889  0.14577909  0.38955298]
[-0.07181135 -0.1646488   0.15357015  0.7244148 ]
[-0.07510433 -0.36152333  0.16805844  1.0612235 ]
[-0.08233479 -0.55842346  0.18928291  1.4015896 ]
Episode finished after 18 timesteps
[0.00421887 0.01010809 0.02786367 0.00235209]
[ 0.00442103 -0.18540215  0.02791071  0.3036945 ]
[0.00071299 0.00931114 0.03398461 0.01994298]
[ 0.00089921 -0.1862813   0.03438346  0.32315177]
[-0.00282641  0.00833461  0.0408465   0.0415072 ]
[-0.00265972  0.20284775  0.04167664 -0.23801361]
[ 0.00139723  0.39735028  0.03691637 -0.51726466]
[ 0.00934424  0.591933

[ 0.03006247 -0.6009716   0.01684053  0.91048294]
[ 0.01804304 -0.40608156  0.03505019  0.6231402 ]
[ 0.00992141 -0.21146607  0.047513    0.34169894]
[ 0.00569209 -0.40723068  0.05434697  0.648978  ]
[-0.00245253 -0.21290626  0.06732654  0.3738919 ]
[-0.00671065 -0.40891677  0.07480437  0.6870206 ]
[-0.01488899 -0.604993    0.08854479  1.0022844 ]
[-0.02698885 -0.80117923  0.10859048  1.3214091 ]
[-0.04301243 -0.6075842   0.13501866  1.0645899 ]
[-0.05516411 -0.80420953  0.15631045  1.3964186 ]
[-0.0712483  -0.6113388   0.18423882  1.156408  ]
[-0.08347508 -0.8083209   0.20736699  1.5007408 ]
Episode finished after 15 timesteps
[-0.01812849  0.02862358 -0.0271618  -0.03560494]
[-0.01755602  0.2241243  -0.0278739  -0.33673245]
[-0.01307353  0.02940987 -0.03460855 -0.05296812]
[-0.01248533  0.22501053 -0.03566791 -0.3563663 ]
[-0.00798512  0.42062098 -0.04279524 -0.6600795 ]
[ 0.0004273   0.22611988 -0.05599683 -0.38117293]
[ 0.00494969  0.4219904  -0.06362029 -0.6909726 ]
[ 0.0133895   
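
Each printed row is a CartPole observation with four components: cart position, cart velocity, pole angle (in radians) and pole angular velocity. The sketch below gives these components names and checks them against the limits at which CartPole ends an episode, namely a cart position beyond ±2.4 or a pole angle beyond ±12 degrees. The names `CartPoleState` and `is_out_of_bounds` are illustrative, and the two sample rows are copied from the output above.

```python
import math
from collections import namedtuple

# Field order follows the CartPole observation vector
CartPoleState = namedtuple(
    "CartPoleState",
    ["cart_position", "cart_velocity", "pole_angle", "pole_angular_velocity"],
)

# Limits beyond which CartPole terminates the episode
MAX_CART_POSITION = 2.4
MAX_POLE_ANGLE = math.radians(12)  # about 0.2094 rad


def is_out_of_bounds(state):
    """Return True if the state lies outside the termination limits."""
    return (
        abs(state.cart_position) > MAX_CART_POSITION
        or abs(state.pole_angle) > MAX_POLE_ANGLE
    )


# First row of the output and the last row printed before one of the
# "Episode finished after 15 timesteps" messages
first = CartPoleState(0.00141893, -0.03185625, -0.02622089, 0.00476723)
last = CartPoleState(-0.08347508, -0.8083209, 0.20736699, 1.5007408)
print(is_out_of_bounds(first), is_out_of_bounds(last))
```

Both rows are still within the limits, because the notebook prints each observation before taking the next step. The observation that actually ends an episode is never printed.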

## Screenshots of the Gym Environments Rendered with pygame

![Image](./image-1.png)
![Image](./image-2.png)
![Image](./image-3.png)
![Image](./image-4.png)
![Image](./image-5.png)
![Image](./image-6.png)