finalproj767

Final project for COMP-767 at McGill University

Group: Joao Pedro de Carvalho (McGill ID 260642102), Peyman Kafaei (McGill ID 260780776), Stefano Giacomazzi Dantas (McGill ID 260642029)

Goal

Study the learning behaviour of "Learning Heuristics for the TSP by Policy Gradient" in terms of episodes and hyperparameter values.

Investigate the effects of removing the baseline (the critic) from the objective function, and assess the impact on learning.
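To make the role of the baseline concrete, the sketch below compares the per-sample weights that multiply the log-probability gradient in REINFORCE, with and without a baseline. It is a minimal NumPy illustration and not the code from the notebooks: the function name `reinforce_weights`, the sample tour lengths and the constant stand-in for a critic are all hypothetical.

```python
import numpy as np

def reinforce_weights(tour_lengths, baseline=None):
    """Per-sample weights that multiply grad log pi in REINFORCE.

    tour_lengths: shape (batch,), cost of each sampled tour.
    baseline: optional shape (batch,), e.g. a critic's predicted cost.
    """
    tour_lengths = np.asarray(tour_lengths, dtype=float)
    if baseline is None:
        return tour_lengths                      # raw cost, no baseline
    return tour_lengths - np.asarray(baseline, dtype=float)

# Hypothetical batch of sampled tour lengths.
lengths = np.array([4.2, 3.9, 4.5, 4.0])
# Hypothetical critic output: here simply the batch mean for every sample.
critic = np.full_like(lengths, lengths.mean())

w_no_baseline = reinforce_weights(lengths)
w_baseline = reinforce_weights(lengths, critic)

# Mean squared weight is a rough proxy for gradient magnitude and noise.
print(np.mean(w_no_baseline ** 2), np.mean(w_baseline ** 2))
```

With the baseline, the weights are centred around zero, so their mean square is much smaller; this is the usual motivation for keeping a baseline in the objective.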

Implement TD learning instead of the default Monte Carlo sampling in REINFORCE.
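The difference between the two targets can be sketched as follows. This is an illustrative NumPy example, not the implementation in n_step_return.py: the function names, the toy rewards and the value estimates are assumptions made for the example.

```python
import numpy as np

def monte_carlo_returns(rewards, gamma=1.0):
    """Full-episode discounted return G_t for every step t."""
    returns = np.zeros(len(rewards))
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns

def n_step_returns(rewards, values, n, gamma=1.0):
    """n-step TD target: up to n real rewards, then bootstrap from a value estimate.

    values has length T + 1; values[T] is the terminal value (normally 0).
    """
    T = len(rewards)
    targets = np.zeros(T)
    for t in range(T):
        end = min(t + n, T)                       # truncate at episode end
        g = sum(gamma ** (k - t) * rewards[k] for k in range(t, end))
        targets[t] = g + gamma ** (end - t) * values[end]
    return targets

# Hypothetical episode: rewards are negative step distances.
rewards = [-1.0, -2.0, -0.5, -1.5]
values = [-4.0, -3.5, -2.0, -1.0, 0.0]            # hypothetical critic estimates

mc = monte_carlo_returns(rewards)
td1 = n_step_returns(rewards, values, n=1)
td_full = n_step_returns(rewards, values, n=len(rewards))
print(mc, td1, td_full)
```

When n covers the whole episode and the terminal value is zero, the n-step target coincides with the Monte Carlo return; smaller n leans more on the value estimates.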

Implement eligibility traces and investigate their performance.
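As a reminder of the mechanism, here is a small tabular TD(lambda) update with accumulating traces. It is a sketch under simplifying assumptions (tabular values, a single episode, a terminal value of zero) and does not reproduce how the project applies traces to the neural model; the function name `td_lambda_episode` and the toy episode are hypothetical.

```python
import numpy as np

def td_lambda_episode(states, rewards, values, alpha=0.1, gamma=1.0, lam=0.8):
    """One pass of tabular TD(lambda) with accumulating traces.

    states: visited state indices s_0..s_T (length T + 1, s_T is terminal).
    rewards: reward received on each transition (length T).
    values: 1-D array of state-value estimates; an updated copy is returned.
    """
    values = np.array(values, dtype=float)
    traces = np.zeros_like(values)
    T = len(rewards)
    for t in range(T):
        s, s_next = states[t], states[t + 1]
        v_next = 0.0 if t == T - 1 else values[s_next]   # terminal value is 0
        delta = rewards[t] + gamma * v_next - values[s]  # TD error
        traces *= gamma * lam                            # decay all traces
        traces[s] += 1.0                                 # mark the current state
        values += alpha * delta * traces                 # credit recent states
    return values

# Hypothetical 3-step episode ending in terminal state 3.
states = [0, 1, 2, 3]
rewards = [-1.0, -1.0, -1.0]

v_td0 = td_lambda_episode(states, rewards, np.zeros(4), alpha=0.5, lam=0.0)
v_mc_like = td_lambda_episode(states, rewards, np.zeros(4), alpha=0.5, lam=1.0)
print(v_td0, v_mc_like)
```

Setting `lam=0` recovers one-step TD, where only the state just left is updated, while `lam=1` spreads each TD error over all earlier states of the episode.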

Requirements

Usage

The code was adapted from https://github.com/MichelDeudon/encode-attend-navigate

The parameters are located at the beginning of the code in the .py files, or in the Config block in the .ipynb notebooks.

The main parameters are:

Data

'--batch_size' : the batch size for training
'--max_length' : number of cities for training
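To illustrate what these two values control, the sketch below builds a batch of random TSP instances and computes the length of a tour. It assumes cities drawn uniformly in the unit square and is not taken from data_generator.py; the function names `random_tsp_batch` and `tour_length` are hypothetical.

```python
import numpy as np

def random_tsp_batch(batch_size, max_length, seed=0):
    """Batch of instances, shape (batch_size, max_length, 2).

    Assumption: 2-D cities sampled uniformly in the unit square.
    """
    rng = np.random.default_rng(seed)
    return rng.random((batch_size, max_length, 2))

def tour_length(cities, tour):
    """Length of the closed tour visiting `cities` in the order given by `tour`."""
    ordered = cities[tour]
    # Pair each city with its successor; the roll closes the loop.
    diffs = ordered - np.roll(ordered, -1, axis=0)
    return float(np.linalg.norm(diffs, axis=1).sum())

batch = random_tsp_batch(batch_size=8, max_length=10)

# Sanity check on a unit square visited along its perimeter.
square = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
print(batch.shape, tour_length(square, [0, 1, 2, 3]))
```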

Model

The NN parameters (number of attention heads, number of neurons, etc.) can also be changed, but the performance may then differ from the reported results.

Train / test parameters

'--nb_steps' : number of epochs for training
'--lr_start' : actor learning rate
'--lr_decay_rate' : learning rate decay rate
'--temperature' : temperature for the policy distribution
'--C' : clipping parameter

To train a model, run blocks 1.DataGenerator, 2.Config, 3.Model and 4.Train in the Jupyter Notebook (for .ipynb files). For a .py file, just execute the script.
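The flags listed above have the shape of argparse options. A minimal sketch of such a configuration block is shown here; the default values are placeholders chosen for illustration and should not be read as the values used in the notebooks, and `build_parser` is a hypothetical helper name.

```python
import argparse

def build_parser():
    """Parser exposing the main parameters named in this README.

    All defaults are illustrative placeholders (assumptions).
    """
    parser = argparse.ArgumentParser(description="TSP policy-gradient config (sketch)")
    # Data
    parser.add_argument('--batch_size', type=int, default=256, help='batch size for training')
    parser.add_argument('--max_length', type=int, default=50, help='number of cities for training')
    # Train / test parameters
    parser.add_argument('--nb_steps', type=int, default=20000, help='number of epochs for training')
    parser.add_argument('--lr_start', type=float, default=0.001, help='actor learning rate')
    parser.add_argument('--lr_decay_rate', type=float, default=0.96, help='learning rate decay rate')
    parser.add_argument('--temperature', type=float, default=1.0, help='temperature for the policy distribution')
    parser.add_argument('--C', type=float, default=10.0, help='clipping parameter')
    return parser

# Passing an explicit list avoids picking up the notebook kernel's own arguments.
args = build_parser().parse_args(['--max_length', '20', '--temperature', '2.0'])
print(args.batch_size, args.max_length, args.temperature, args.C)
```

Inside a notebook, `parse_args([])` returns the defaults, which can then be overridden by editing the list.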

Files Specification

K1_Neural_Reinforce.ipynb : Model with K = 1
K5_Neural_Reinforce.ipynb : Model with K = 5
Neural_Reinforce-NoCritic.ipynb : Model without baseline
Neural_Reinforce.ipynb* : base model
PlotMemory.ipynb : code used to plot the memory learning curves
n_step_return.py : n-step TD and eligibility traces implementation; its main parameters are at the top of the file
data_generator.py* : generates the trajectory
graph.py : code used to generate plots
utils.py* : utility functions

* original or adapted from https://github.com/MichelDeudon/encode-attend-navigate

