AI

Reinforcement Learning

In this repository you can find code that visualizes how different learning algorithms and parameters change the solution to the RL problem.

We don't want to tell the agent what the rules are. Instead we let it explore: if it gets to the goal, a reward is given, and the agent will want to keep taking good steps to receive more reward. In state s it is important to take the right action a in order to reach the goal faster or collect more reward.

RL parameters:

Discount factor (gamma) - some rewards arrive only in the future and are not worth as much as the rewards we get now, so how much weight do we want to give them?
Learning rate (alpha) - how much importance we give to new knowledge
Exploration rate (epsilon) - with exploration rate decay
Inverse temperature (beta) - with inverse temperature increase
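To make the role of the discount factor concrete, here is a minimal sketch of how a discounted return is computed. The function name `discounted_return` and the reward sequence are illustrative and are not taken from the repository code.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the reward at step t is weighted by gamma**t."""
    total = 0.0
    for t, reward in enumerate(rewards):
        total += (gamma ** t) * reward
    return total


# A single reward of 1 received after three steps without reward.
rewards = [0, 0, 0, 1]
print(discounted_return(rewards, gamma=1.0))  # 1.0: future rewards count fully
print(discounted_return(rewards, gamma=0.5))  # 0.125: the same reward is worth less
```

With gamma close to 1 the agent values distant rewards almost as much as immediate ones; with a small gamma it becomes short-sighted.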

Algorithms:

Here we consider two algorithms for the Q-table update: Sarsa and Q-Learning.

In Sarsa the agent takes an action, gets a reward, picks the next action and then updates the Q-table using that next action, meaning that it learns about the policy it is actually following. In Q-Learning the agent takes an action, gets a reward and then updates as if the next action were the one it assumes to be optimal, whichever action it actually picks next.
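The difference between the two updates can be sketched as follows. This is a minimal illustration with a Q-table stored as a list of lists; the function names are hypothetical, terminal-state handling is left out, and the repository scripts may organize this differently.

```python
def sarsa_update(Q, s, a, r, s_next, a_next, alpha, gamma):
    """On-policy: the target uses the action a_next the agent will actually take."""
    target = r + gamma * Q[s_next][a_next]
    Q[s][a] += alpha * (target - Q[s][a])


def q_learning_update(Q, s, a, r, s_next, alpha, gamma):
    """Off-policy: the target uses the best action in s_next, whatever is taken next."""
    target = r + gamma * max(Q[s_next])
    Q[s][a] += alpha * (target - Q[s][a])


# Two states, two actions. State 1 has one good and one bad action.
Q_sarsa = [[0.0, 0.0], [1.0, -1.0]]
Q_qlearn = [[0.0, 0.0], [1.0, -1.0]]

# Same transition (s=0, a=0, r=0, s_next=1); Sarsa's next action is the bad one.
sarsa_update(Q_sarsa, 0, 0, 0.0, 1, a_next=1, alpha=0.5, gamma=1.0)
q_learning_update(Q_qlearn, 0, 0, 0.0, 1, alpha=0.5, gamma=1.0)

print(Q_sarsa[0][0])   # -0.5
print(Q_qlearn[0][0])  # 0.5
```

Sarsa's estimate is pulled down by the exploratory action it actually takes, while Q-Learning ignores it and backs up the greedy value.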

Action selection:
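The two action selection policies used in this repository, epsilon-greedy and softmax, can be sketched as below. This is an illustrative version using only the standard library; the function names and the `rng` parameter are assumptions and not the repository's API.

```python
import math
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """With probability epsilon pick a random action, otherwise the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def softmax_probabilities(q_values, beta):
    """Boltzmann distribution over actions with inverse temperature beta."""
    # Subtract the maximum before exponentiating for numerical stability.
    q_max = max(q_values)
    weights = [math.exp(beta * (q - q_max)) for q in q_values]
    total = sum(weights)
    return [w / total for w in weights]


def softmax_action(q_values, beta, rng=random):
    """Sample an action according to the softmax probabilities."""
    probs = softmax_probabilities(q_values, beta)
    return rng.choices(range(len(q_values)), weights=probs, k=1)[0]
```

With beta = 0 the softmax policy is uniformly random, and as beta increases it approaches the greedy policy. Likewise, decaying epsilon moves epsilon-greedy from exploration towards exploitation.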

Environments:

FrozenLake https://gym.openai.com/envs/FrozenLake-v0

One-dimensional discrete state space with a one-dimensional discrete action space. Tile types: (F, S, H, G), i.e. frozen, start, hole, goal.

The agent starts from S and receives a reward only when reaching the goal G.

Here you can see how choosing the right action can change the reward dramatically:

softmax action-taking policy	random action-taking policy

As expected, exploring more at the beginning and exploiting more towards the end gives more reward than just exploring. Also notice that these results came from initializing the Q-table to zeros; try initializing it to the mean of a Gaussian and see the much worse results.

MountainCar https://gym.openai.com/envs/MountainCar-v0

Two-dimensional continuous state space (position, velocity) with a one-dimensional discrete action space (left, neutral, right).
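A Q-table needs discrete states, so a continuous observation has to be mapped to table indices first. The sketch below shows one common way to do this by binning; the number of bins and the helper names are assumptions, and the repository scripts may discretize differently.

```python
# Bounds of the MountainCar-v0 observation space.
POSITION_RANGE = (-1.2, 0.6)
VELOCITY_RANGE = (-0.07, 0.07)
N_BINS = 20  # assumed resolution, not taken from the repository


def to_bin(value, low, high, n_bins):
    """Map a value in [low, high] to an integer bin index in [0, n_bins - 1]."""
    ratio = (value - low) / (high - low)
    index = int(ratio * n_bins)
    # Clamp so that value == high (or slightly out of range) stays valid.
    return min(max(index, 0), n_bins - 1)


def discretize(observation):
    """Turn a (position, velocity) pair into a pair of Q-table indices."""
    position, velocity = observation
    return (
        to_bin(position, *POSITION_RANGE, N_BINS),
        to_bin(velocity, *VELOCITY_RANGE, N_BINS),
    )


# One Q-value per (position bin, velocity bin, action); 3 actions: left, neutral, right.
Q = [[[0.0] * 3 for _ in range(N_BINS)] for _ in range(N_BINS)]

pos_bin, vel_bin = discretize((-0.5, 0.0))
print(Q[pos_bin][vel_bin])  # the three action values for that cell
```

More bins give a finer approximation but a larger table that needs more episodes to fill.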

The learning rate determines how much importance we give to new knowledge. Giving it less importance can result in slower convergence; giving it high importance can lead to forgetting more and, in the case of noisy observations, can result in inaccurate predictions.

Here you can see, for Sarsa with softmax at episode 125, how changing the learning rate changes the system behaviour:
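This trade-off can be seen in isolation with the same incremental rule that the Q-table update uses, applied to a single estimate. The sample sequence below is made up for illustration and the function name is hypothetical.

```python
def running_estimate(samples, alpha, initial=0.0):
    """Apply estimate += alpha * (sample - estimate) and keep every intermediate value."""
    estimate = initial
    history = []
    for sample in samples:
        estimate += alpha * (sample - estimate)
        history.append(estimate)
    return history


# Noisy observations of a quantity whose true value is 1.0 (3.0 is an outlier).
samples = [1.0, 1.0, 1.0, 1.0, 3.0, 1.0, 1.0, 1.0]

slow = running_estimate(samples, alpha=0.1)
fast = running_estimate(samples, alpha=0.5)

# After four clean samples the high learning rate is much closer to 1.0 ...
print(round(slow[3], 4), round(fast[3], 4))
# ... but it also jumps further when the noisy sample arrives.
print(round(slow[4] - slow[3], 4), round(fast[4] - fast[3], 4))
```

The low learning rate converges slowly but is barely disturbed by the outlier, while the high learning rate converges quickly and overreacts to it.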

learning rate - 0.5	learning rate - 0.1

You can see that at episode 64 the car reaches the goal, and Q-learning does so in fewer steps than Sarsa.

The number of episodes determines how much knowledge of the environment we receive. With more experience we expect better results (although this can lead to overfitting).

Epsilon-greedy with learning rate - 0.1:

Q-learn episode 8	Q-learn episode 64	Sarsa episode 64

How to run:

Use SARSA/Q-learning algorithm with epsilon-greedy/softmax policy

FrozenLake env: FrozenLake_QL_SARSA.py, FrozenLake_QL_SARSA_options.py
MountainCar env: MountainCar_QL.py, MountainCar_QL_options.py

Use Deep SARSA/Q-learning algorithm with epsilon-greedy/softmax policy

FrozenLake env: FrozenLake_DQN.py, FrozenLake_DQN_options.py
MountainCar env: MountainCar_DQN.py, MountainCar_DQN_options.py

You can play with the different parameters, as shown in each options file. For help run:

python MountainCar_QL.py -h

Run with default parameters:

python MountainCar_QL.py

Run with algorithm sarsa, policy softmax, learning rate 0.2 and 300 episodes:

python MountainCar_QL.py -a sarsa -p softmax -lr 0.2 -n 300
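For readers who want to add their own options, the sketch below shows how flags like these could be declared with `argparse`. Only the short flags `-a`, `-p`, `-lr` and `-n` come from the command above; the long option names, accepted choices and default values are assumptions for illustration and may not match the actual options files.

```python
import argparse


def build_parser():
    """Hypothetical parser exposing the four flags used in the command above."""
    parser = argparse.ArgumentParser(description="Tabular RL on MountainCar (sketch)")
    # Long option names, choices and defaults below are assumptions for illustration.
    parser.add_argument("-a", "--algorithm",
                        choices=["sarsa", "q-learning"], default="q-learning")
    parser.add_argument("-p", "--policy",
                        choices=["epsilon-greedy", "softmax"], default="epsilon-greedy")
    parser.add_argument("-lr", "--learning-rate", type=float, default=0.1)
    parser.add_argument("-n", "--episodes", type=int, default=100)
    return parser


# Parse the same arguments as in the command line example.
args = build_parser().parse_args(
    ["-a", "sarsa", "-p", "softmax", "-lr", "0.2", "-n", "300"]
)
print(args.algorithm, args.policy, args.learning_rate, args.episodes)
```

Run the real scripts with `-h` to see the options they actually accept.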
