
Demo

You can see a live demo at this link.

Before Evening

A project where a simple JavaScript car game is combined with an autonomous driving agent powered by TensorFlow.js. The agent is based on a reinforcement learning model that follows the principles of Q-learning.

The project includes a pre-trained model that controls the vehicle in the Before Evening game. It also offers a UI for training a new model in the browser: users can change certain model parameters and create their own autonomous agent.

The project consists of two front-end applications and one back-end application:

  • The first front-end app is a JavaScript module under ./src. It is the main car game, inspired by Javascript Racer.

  • The second front-end app is an Angular web app under ./implementations/before-evening. It imports the car game module and creates or loads a TensorFlow.js model to train and test in real time in the browser. It provides a simple UI where the user can set training parameters, train a model, or test the already trained model.

  • The back-end app is a Node.js module at ./implementations/node-tensorflow. It uses the TensorFlow.js package for Node.js together with the car game module to train a model. The Node.js app exists to test training faster; because it uses the same model and Q-learning method, the output model is identical in the browser and in Node.js.

The goal of the project is to create an autonomous driving model using only the browser, in order to explore the capabilities of the WebGL-based TensorFlow.js machine learning platform. A well-performing pre-trained autonomous driving agent model is stored at ./implementations/shared/before-evening-{version}.
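
As a rough sketch of how such a pre-trained model can be loaded and queried with TensorFlow.js in the browser (the model URL, state shape, and helper names below are illustrative assumptions, not the project's actual API):

import * as tf from '@tensorflow/tfjs';

// Load a saved model (model.json plus weight shards) from a URL.
// The path is hypothetical; the real files live under ./implementations/shared.
async function loadPretrainedAgent(modelUrl: string): Promise<tf.LayersModel> {
  return tf.loadLayersModel(modelUrl);
}

// Pick the action with the highest predicted Q-value for a given game state.
function predictAction(model: tf.LayersModel, state: number[]): number {
  return tf.tidy(() => {
    const qValues = model.predict(tf.tensor2d([state])) as tf.Tensor;
    return qValues.argMax(-1).dataSync()[0];
  });
}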

Installation

Dependencies

  • Node.js >= 14.x.x

Web App

To run the front-end application and test the pre-trained model or train your own model in the browser:

cd ./implementations/before-evening/
npm i
npm run start

Then open localhost:4200 in your browser.

Node.js training

To run the Node.js application and train your own model in the terminal/command line:

cd ./implementations/node-tensorflow/
npm i
npm run build
node ./build/implementations/node-tensorflow/src/index.js

Theory

Q-learning is a simple way for agents to learn how to act optimally in controlled Markovian domains. It amounts to an incremental method for dynamic programming which imposes limited computational demands. It works by successively improving its evaluations of the quality of particular actions at particular states. [1]

Algorithm

After Δt steps into the future, the agent will decide some next step. The weight for this step is calculated as γ^Δt, where γ (the discount factor) is a number between 0 and 1 (0 ≤ γ ≤ 1) that has the effect of valuing rewards received earlier more highly than those received later (reflecting the value of a "good start"). γ may also be interpreted as the probability to succeed (or survive) at every step Δt.

The core of the algorithm is a simple value-iteration update:

Q(s_t, a_t) ← Q(s_t, a_t) + α · [ r_t + γ · max_a Q(s_{t+1}, a) − Q(s_t, a_t) ]

where r_t is the reward received when moving from state s_t to state s_{t+1}, and α is the learning rate (0 < α ≤ 1).

An episode of the algorithm ends when state s_{t+1} is a final or terminal state. However, Q-learning can also learn in non-episodic tasks (as a result of the property of convergent infinite series). If the discount factor is lower than 1, the action values are finite even if the problem can contain infinite loops. For all final states s_f, Q(s_f, a) is never updated, but is set to the reward value r observed for state s_f. In most cases, Q(s_f, a) can be taken to equal zero. [2]
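
A minimal tabular sketch of this update rule in TypeScript (the state encoding, action count, and helper names are hypothetical; the actual project approximates Q with a TensorFlow.js network rather than a table):

// Q-table mapping an encoded state to one Q-value per action.
type QTable = Map<string, number[]>;

function qValues(table: QTable, state: string, numActions: number): number[] {
  if (!table.has(state)) table.set(state, new Array(numActions).fill(0));
  return table.get(state)!;
}

// One Q-learning update: Q(s,a) <- Q(s,a) + alpha * (target - Q(s,a)).
function qUpdate(
  table: QTable,
  state: string,
  action: number,
  reward: number,
  nextState: string,
  terminal: boolean,
  alpha = 0.1,   // learning rate, 0 < alpha <= 1
  gamma = 0.95,  // discount factor, 0 <= gamma <= 1
  numActions = 3
): void {
  const q = qValues(table, state, numActions);
  // For terminal states the bootstrap term is dropped: the target is just r.
  const target = terminal
    ? reward
    : reward + gamma * Math.max(...qValues(table, nextState, numActions));
  q[action] += alpha * (target - q[action]);
}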

Variables

Learning rate

The learning rate or step size determines to what extent newly acquired information overrides old information. Thus, the learning rate controls how fast we modify our estimates. One expects to start with a high learning rate, which allows fast changes, and to lower the learning rate as time progresses. [3] The agent learns by receiving rewards after every action. It keeps track of these rewards and then selects actions that it believes will maximize the reward it gains, not necessarily only for the next action, but in the long run. The agent usually goes through the same environment many times in order to learn how to find the optimal actions. [4]
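
For illustration, such a decaying schedule could look like this (the constants are illustrative, not the project's actual settings):

// Start with a high learning rate for fast changes, then anneal toward a floor.
function learningRate(step: number, initial = 0.5, min = 0.01, decay = 0.999): number {
  return Math.max(min, initial * Math.pow(decay, step));
}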

Discount factor

The discount factor gamma determines the importance of future rewards. A factor of 0 will make the agent "myopic" (or short-sighted) by only considering current rewards, while a factor approaching 1 will make it strive for a long-term high reward. [2]

Exploration Vs. Exploitation

The agent usually goes through the same environment many times in order to learn how to find the optimal actions. Balancing exploration and exploitation is particularly important here: the agent may have found a good goal on one path, but there may be an even better one on another path. Without exploration, the agent will always return to the first goal, and the better goal will not be found. Or the goal may lie behind very low-reward areas that the agent would avoid without exploration. On the other hand, if the agent explores too much, it cannot stick to a path; in fact, it is not really learning: it cannot exploit its knowledge, and so acts as though it knows nothing. Thus, it is critical to find a good balance between the two, to ensure that the agent is really learning to take the optimal actions. [5]
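
The common epsilon-greedy strategy captures this trade-off; a minimal sketch (parameter values are illustrative):

// With probability epsilon, explore a random action; otherwise exploit the
// action with the best current Q-value estimate.
function chooseAction(qValuesForState: number[], epsilon: number): number {
  if (Math.random() < epsilon) {
    return Math.floor(Math.random() * qValuesForState.length); // explore
  }
  return qValuesForState.indexOf(Math.max(...qValuesForState)); // exploit
}

// Decaying epsilon shifts the balance from exploration toward exploitation over time.
function decayEpsilon(epsilon: number, min = 0.05, decay = 0.995): number {
  return Math.max(min, epsilon * decay);
}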

References

[1] Watkins, C. J. C. H.; Dayan, P. Q-learning. Machine Learning, 1992, 8(3): 279-292.

[2] Q-learning. Wikipedia.

[3] Even-Dar, E.; Mansour, Y.; Bartlett, P. Learning Rates for Q-learning. Journal of Machine Learning Research, 2003, 5(1).

[4] Sutton, R. S.; Barto, A. G. Reinforcement learning: an introduction. Trends in Cognitive Sciences, 1999, 3(9): 360.

[5] Coggan, M. Exploration and exploitation in reinforcement learning. Research supervised by Prof. Doina Precup, CRA-W DMP Project at McGill University, 2004.
