# Q-Learning

A simple script for understanding the Q-learning algorithm.

The basic idea is to have a function
$$
Q(s_{n}, a_{n})
$$
with $s_0 = s$ the initial state and $a_0 = a$ the initial action. It scores every action $a_n$ available in state $s_n$, and the agent decides on the next action by picking the one with the highest score.
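
As a minimal sketch of how such a function decides on an action, assume $Q$ is stored as a small table with one row per state and one column per action. The table values and the helper name `greedy_action` are made up for illustration and are not part of the script below.

```python
import numpy as np

# Hypothetical toy Q-table: 3 states (rows) x 2 actions (columns).
q_table = np.array([
    [0.0, 1.5],
    [2.0, -1.0],
    [0.3, 0.3],
])


def greedy_action(q_table, state):
    """Return the index of the action with the highest Q-value in `state`."""
    return int(np.argmax(q_table[state]))


# Pick the greedy action for every state of the toy table.
actions = [greedy_action(q_table, s) for s in range(3)]
print(actions)  # [1, 0, 0]
```

When two actions have the same value (state 2 here), `np.argmax` returns the first one.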

The values of $Q$ are learned by repeatedly applying the update formula
$$
Q^{new}(s_n, a_n)
=
Q(s_{n}, a_{n}) +
\alpha
\left(
    r_n + \gamma \cdot \max \limits_{a} Q(s_{n+1},a) - Q(s_{n},a_{n})
\right)
$$
where $r_n$ is the reward received for moving from state $s_n \rightarrow s_{n+1}$ by taking action $a_n$, $\alpha$ is the learning rate and $\gamma$ the discount factor of the algorithm (both chosen by the user).
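
To make the update concrete, here is a single step worked out in Python. The function name `q_update` and the input numbers are illustrative assumptions; only the rule itself comes from the standard Q-learning update. The sketch also checks that the rule can be written equivalently as $(1-\alpha) \cdot Q(s_n, a_n) + \alpha \cdot (r_n + \gamma \cdot \max_a Q(s_{n+1}, a))$.

```python
def q_update(current_q, reward, max_future_q, alpha, gamma):
    """One Q-learning step: move current_q towards the bootstrapped target."""
    target = reward + gamma * max_future_q
    return current_q + alpha * (target - current_q)


# Illustrative numbers: Q(s_n, a_n) = -1.0, r_n = -1.0, max_a Q(s_{n+1}, a) = -0.5
new_q = q_update(current_q=-1.0, reward=-1.0, max_future_q=-0.5,
                 alpha=0.1, gamma=0.95)

# Algebraically equivalent "weighted average" form of the same update.
blended_q = (1 - 0.1) * -1.0 + 0.1 * (-1.0 + 0.95 * -0.5)

print(round(new_q, 4))  # -1.0475
```

With $\alpha = 0.1$ the old estimate moves only 10% of the way towards the target $-1.475$, which is why many episodes are needed before the values settle.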

In [3]:


import gym
import numpy as np

env = gym.make('MountainCar-v0')

print(env.observation_space.low)
print(env.observation_space.high)
print(env.action_space.n)

LEARNING_RATE = 0.1
# how much future rewards count: 0 = only the immediate reward matters, close to 1 = future rewards count almost as much
DISCOUNT = 0.95
# number of training episodes (complete runs of the environment)
EPISODES = 4000

# print the episode number and render the environment every SHOW_EVERY episodes
SHOW_EVERY = 2000

# number of buckets per observation dimension (a tunable value, worth experimenting with)
number_buckets = 20
DISCRETE_OS_SIZE = [number_buckets] * len(env.observation_space.high)
discrete_os_win_size = (env.observation_space.high-env.observation_space.low)/DISCRETE_OS_SIZE

# randomness in action picking, rate for exploration
epsilon = 0.5
START_EPSILON_DECAYING = 1
END_EPSILON_DECAYING = EPISODES // 2

epsilon_decay_value = epsilon/(END_EPSILON_DECAYING-START_EPSILON_DECAYING)

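# range of the random initial Q-values (MountainCar gives a reward of -1 per step, so Q-values are negative)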
low = -2
high = 0

# table holding one Q-value for every combination of discretized state and action
q_table = np.random.uniform(low=low, high=high, size=(DISCRETE_OS_SIZE + [env.action_space.n]))


def get_discrete_state(state):
    discrete_state = (state - env.observation_space.low)/discrete_os_win_size
    return tuple(discrete_state.astype(int))


for episode in range(EPISODES):
    if episode % SHOW_EVERY == 0:
        print(episode)
        render = True
    else:
        render = False
    
    discrete_state_coords = get_discrete_state(env.reset())


    done = False
    while not done:
        if np.random.random() > epsilon:
            action = np.argmax(q_table[discrete_state_coords])
        else:
            action = np.random.randint(0, env.action_space.n)
        new_state, reward, done, _ = env.step(action)

        new_discrete_state_coords = get_discrete_state(new_state)
        if render:
            env.render()

        if not done:
            max_future_q = np.max(q_table[new_discrete_state_coords])
            current_q = q_table[discrete_state_coords + (action, )]

            new_q = (1-LEARNING_RATE) * current_q + LEARNING_RATE * (reward + DISCOUNT * max_future_q)

            q_table[discrete_state_coords + (action, )] = new_q
        elif new_state[0] >= env.goal_position:
            q_table[discrete_state_coords + (action, )] = 0
            print(f"Done by episode: {episode}")

        discrete_state_coords = new_discrete_state_coords
    if END_EPSILON_DECAYING >= episode >= START_EPSILON_DECAYING:
        epsilon -= epsilon_decay_value
env.close()

[-1.2  -0.07]
[0.6  0.07]
3
0
Done by episode: 579
Done by episode: 863
Done by episode: 872
Done by episode: 874
Done by episode: 885
Done by episode: 936
Done by episode: 939
Done by episode: 941
Done by episode: 942
Done by episode: 944
Done by episode: 946
Done by episode: 947
Done by episode: 948
Done by episode: 949
Done by episode: 950
Done by episode: 959
Done by episode: 963
Done by episode: 976
Done by episode: 1031
Done by episode: 1060
Done by episode: 1088
Done by episode: 1098
Done by episode: 1099
Done by episode: 1100
Done by episode: 1101
Done by episode: 1126
Done by episode: 1130
Done by episode: 1133
Done by episode: 1136
Done by episode: 1137
Done by episode: 1141
Done by episode: 1142
Done by episode: 1144
Done by episode: 1146
Done by episode: 1147
Done by episode: 1148
Done by episode: 1160
Done by episode: 1184
Done by episode: 1191
Done by episode: 1192
Done by episode: 1203
Done by episode: 1264
Done by episode: 1268
Done by episode: 1269
Done by episode: 127

Done by episode: 2049
Done by episode: 2050
Done by episode: 2051
Done by episode: 2053
Done by episode: 2054
Done by episode: 2055
Done by episode: 2056
Done by episode: 2057
Done by episode: 2058
Done by episode: 2059
Done by episode: 2060
Done by episode: 2061
Done by episode: 2062
Done by episode: 2063
Done by episode: 2064
Done by episode: 2065
Done by episode: 2066
Done by episode: 2068
Done by episode: 2069
Done by episode: 2070
Done by episode: 2072
Done by episode: 2073
Done by episode: 2074
Done by episode: 2075
Done by episode: 2076
Done by episode: 2077
Done by episode: 2078
Done by episode: 2079
Done by episode: 2081
Done by episode: 2082
Done by episode: 2083
Done by episode: 2084
Done by episode: 2086
Done by episode: 2087
Done by episode: 2088
Done by episode: 2089
Done by episode: 2091
Done by episode: 2093
Done by episode: 2095
Done by episode: 2119
Done by episode: 2123
Done by episode: 2125
Done by episode: 2131
Done by episode: 2196
Done by episode: 2197
Done by ep

Done by episode: 2701
Done by episode: 2702
Done by episode: 2703
Done by episode: 2704
Done by episode: 2705
Done by episode: 2706
Done by episode: 2707
Done by episode: 2709
Done by episode: 2710
Done by episode: 2711
Done by episode: 2712
Done by episode: 2713
Done by episode: 2714
Done by episode: 2715
Done by episode: 2716
Done by episode: 2717
Done by episode: 2718
Done by episode: 2719
Done by episode: 2725
Done by episode: 2728
Done by episode: 2729
Done by episode: 2731
Done by episode: 2732
Done by episode: 2733
Done by episode: 2734
Done by episode: 2735
Done by episode: 2736
Done by episode: 2737
Done by episode: 2738
Done by episode: 2739
Done by episode: 2740
Done by episode: 2741
Done by episode: 2744
Done by episode: 2745
Done by episode: 2746
Done by episode: 2747
Done by episode: 2749
Done by episode: 2751
Done by episode: 2752
Done by episode: 2754
Done by episode: 2755
Done by episode: 2762
Done by episode: 2763
Done by episode: 2764
Done by episode: 2765
Done by ep

Done by episode: 3297
Done by episode: 3298
Done by episode: 3299
Done by episode: 3300
Done by episode: 3301
Done by episode: 3302
Done by episode: 3303
Done by episode: 3304
Done by episode: 3305
Done by episode: 3306
Done by episode: 3307
Done by episode: 3309
Done by episode: 3311
Done by episode: 3312
Done by episode: 3313
Done by episode: 3316
Done by episode: 3319
Done by episode: 3320
Done by episode: 3321
Done by episode: 3322
Done by episode: 3323
Done by episode: 3325
Done by episode: 3326
Done by episode: 3327
Done by episode: 3328
Done by episode: 3329
Done by episode: 3330
Done by episode: 3331
Done by episode: 3332
Done by episode: 3334
Done by episode: 3335
Done by episode: 3336
Done by episode: 3338
Done by episode: 3339
Done by episode: 3340
Done by episode: 3341
Done by episode: 3342
Done by episode: 3343
Done by episode: 3344
Done by episode: 3345
Done by episode: 3346
Done by episode: 3347
Done by episode: 3348
Done by episode: 3349
Done by episode: 3350
Done by ep

Done by episode: 3741
Done by episode: 3742
Done by episode: 3743
Done by episode: 3744
Done by episode: 3745
Done by episode: 3746
Done by episode: 3747
Done by episode: 3748
Done by episode: 3749
Done by episode: 3750
Done by episode: 3751
Done by episode: 3752
Done by episode: 3753
Done by episode: 3754
Done by episode: 3755
Done by episode: 3756
Done by episode: 3757
Done by episode: 3758
Done by episode: 3759
Done by episode: 3760
Done by episode: 3761
Done by episode: 3762
Done by episode: 3763
Done by episode: 3764
Done by episode: 3765
Done by episode: 3766
Done by episode: 3767
Done by episode: 3768
Done by episode: 3769
Done by episode: 3770
Done by episode: 3771
Done by episode: 3772
Done by episode: 3773
Done by episode: 3774
Done by episode: 3775
Done by episode: 3776
Done by episode: 3777
Done by episode: 3778
Done by episode: 3779
Done by episode: 3780
Done by episode: 3781
Done by episode: 3782
Done by episode: 3783
Done by episode: 3784
Done by episode: 3785
Done by ep

In [4]:
env.close()
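
The discretization done by `get_discrete_state` above can be checked in isolation, without `gym`. The sketch below hardcodes the observation bounds printed by the cell above and uses the same 20 buckets. The `np.clip` call is an addition that is not in the original script: without it, an observation exactly at the upper bound can map to index 20, which is outside the table.

```python
import numpy as np

# Bounds as printed by the cell above (position, velocity).
obs_low = np.array([-1.2, -0.07])
obs_high = np.array([0.6, 0.07])

number_buckets = 20
bucket_size = (obs_high - obs_low) / number_buckets


def get_discrete_state(state):
    """Map a continuous observation to integer bucket coordinates."""
    coords = ((state - obs_low) / bucket_size).astype(int)
    # Assumption / addition: clip so the upper bound stays inside the table.
    coords = np.clip(coords, 0, number_buckets - 1)
    return tuple(int(c) for c in coords)


middle = get_discrete_state(np.array([-0.5, 0.01]))
lowest = get_discrete_state(obs_low)
highest = get_discrete_state(obs_high)

print(middle)   # (7, 11)
print(lowest)   # (0, 0)
print(highest)  # (19, 19)
```

The returned tuple can be used directly as an index into a table of shape `(20, 20, 3)`, as the training loop does with `q_table[discrete_state_coords]`.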
