
Massi 2022

Model-based and model-free replay mechanisms for reinforcement learning in neurorobotics

Table of Contents
  1. About the Project
  2. Requirements
  3. Usage

About the Project

This repository is related to the article:
Model-based and model-free replay mechanisms for reinforcement learning in neurorobotics (2022, Accepted)
Elisa Massi, Rémi Dromnelle, Julianne Mailly, Jeanne Barthélemy, Julien Canitrot, Esther Poniatowski, Benoît Girard and Mehdi Khamassi.
Institute of Intelligent Systems and Robotics, CNRS, Sorbonne University, F-75005
Paris, France
It contains the code and data used and generated for the part:
3 Simulation of individual replay strategies with an autonomously learned state decomposition
Keywords: hippocampal replay, reinforcement learning, neurorobotics, model-based, model-free

Project Link: https://github.com/esther-poniatowski/Massi2022

Goals of the modeling

To study the implications of offline learning in spatial navigation, from rodent behavior to robotics, the article investigates the role of several Reinforcement Learning (RL) algorithms by simulating artificial agents. The agents' task mimics the classical Morris water maze task (Morris, 1981). The environment is a circular maze, consistent with the original experimental paradigm in terms of environment/robot size ratio. The goal of the task is to navigate the environment until reaching the rewarded location, starting from a fixed initial point. Agents learn over 50 trials, and the reward location is changed in the middle of the simulation (trial 25). In this robotic framework, the task is a Markov decision process (MDP), in which agents visit discrete states using a finite set of discrete actions.
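For orientation, the protocol can be summarized by the following minimal, hypothetical sketch (the agent and env objects, their methods, and the reward state indices are placeholders; the actual implementation is in simulations_MF_MB.py):

    # Minimal sketch of the protocol, assuming hypothetical `agent` and `env` objects.
    N_TRIALS = 50             # total number of trials per agent
    SWITCH_TRIAL = 25         # trial at which the reward location is moved
    REWARD_STATES = (18, 34)  # placeholder state indices for the two reward locations

    def run_protocol(agent, env):
        """One simulated agent: behave during each trial, replay between trials."""
        for trial in range(N_TRIALS):
            reward_state = REWARD_STATES[0] if trial < SWITCH_TRIAL else REWARD_STATES[1]
            env.set_reward(reward_state)     # the reward location changes at mid-simulation
            transitions = agent.behave(env)  # online learning during behavior
            agent.replay(transitions)        # offline learning during the inter-trial interval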

The learning performances of the agents are tested here in two conditions:

  • Deterministic environment: In this version of the task, any action a performed in a given state always leads the agent to the same arrival state (with probability 1).
  • Stochastic environment: In this version of the task, performing an action in a given state can lead to several possible arrival states (non-null probabilities for several states).
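As a toy illustration (not the repository's data format), the two conditions differ only in how the transition probabilities are distributed; the actual transition matrices are provided in the .txt files shipped with the project.

    import numpy as np

    rng = np.random.default_rng(0)
    n_states, n_actions = 5, 2

    # Deterministic condition: each (state, action) pair leads to a single arrival state.
    T_det = np.zeros((n_states, n_actions, n_states))
    T_det[0, 0, 1] = 1.0               # action 0 in state 0 always reaches state 1

    # Stochastic condition: probability mass is spread over several arrival states.
    T_sto = np.zeros((n_states, n_actions, n_states))
    T_sto[0, 0, [1, 2]] = [0.8, 0.2]   # action 0 in state 0 reaches state 1 or state 2

    def step(T, state, action):
        """Sample an arrival state from a transition matrix."""
        return rng.choice(T.shape[-1], p=T[state, action])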

Four learning strategies are compared (a minimal sketch of the replay orderings follows the list). Three of them include replay of the experienced state-action-state transitions during each inter-trial interval.

  • Model Free (MF) No replay: In this classical reinforcement learning framework, the artificial agent learns only online, during behavior.
  • Model Free (MF) Backward replay: This agent stores the most recently experienced state-action-state transitions in a memory buffer, and replays them from the most recent (rewarded) one to the most remote one.
  • Model Free (MF) Shuffled replay: This agent stores the most recently experienced state-action-state transitions in a memory buffer, and replays them in random order.
  • Model Based (MB) Prioritized sweeping: This agent learns a model of the task (transition and reward functions) and replays state-action pairs ordered by priority, given by the magnitude of the expected update of their value, propagating value changes to predecessor states. Note that one more replay strategy (Most diverse sequence replay) appears in the code, but is not investigated in the related article.
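For intuition, the ordering performed by each replay scheme can be sketched as follows (a simplified, hypothetical sketch; the actual implementations, including the Most diverse sequence strategy, are in algorithms_MF_MB.py).

    import random

    def backward_order(buffer):
        """MF backward replay: from the most recent (rewarded) transition to the most remote one."""
        return list(reversed(buffer))

    def shuffled_order(buffer):
        """MF shuffled replay: the same transitions, in random order."""
        shuffled = list(buffer)
        random.shuffle(shuffled)
        return shuffled

    def prioritized_order(buffer, priority):
        """MB prioritized sweeping (ordering only): largest expected value updates first."""
        return sorted(buffer, key=priority, reverse=True)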

Contributors & Contacts

(back to top)

Requirements

All code is written in Python and uses the following libraries:

  • numpy
  • random
  • bisect
  • itertools
  • copy
  • scipy
  • bioinfokit
  • statsmodels
  • similaritymeasures
  • pandas
  • pickle
  • matplotlib
  • seaborn

These packages can be installed with the following command:

pip install -r requirements.txt

(back to top)

Usage

INFO

More details are provided in the Jupyter notebooks and the code comments.

Structure of the project

The project is made up of the following files and directories:

  • Two Jupyter notebooks guide the execution of the main functionalities.
    • Navigation_generate_data.ipynb can be used to generate data, with arbitrary parameters and different versions of the task.
    • Navigation_alanysis.ipynb provides graphical visualization of the results, reproducing in particular the figures of the article.
  • Nine python files correspond to the modules called by the Jupyter notebooks.
  • The folder Data/ is the location where generated data are stored.
    • It already contains most of the data files required to plot the figures from the Jupyter notebooks. File formats are either .csv (for dataframes) or .pickle (for dictionaries, arrays, lists).
    • The sub-folder Data_indiv/ specifically contains detailed data for 100 individual artificial agents.
  • The folder Figures/ is the location where generated figures can be saved. It already contains the file map1.pgm necessary to plot one type of figure, representing the environment.
  • Three .txt files contain the transition matrices which define the properties of the environment.
  • The folder data+code_2generate_the_paper_figures/ contains all the data and scripts needed to generate the figures of Section 3 of the paper.

Modules

Except for the module parameters_MF_MB.py, all modules contain only functions (no scripts), which are called by the Jupyter notebooks.
More details about those modules and functions are available in the code documentation (accessed via help()).

  • parameters_MF_MB.py - Defines the parameters of the simulation and the transition matrices. All the parameters are collected in a dictionary, which is provided to the main functions as a default argument (as the module is imported in the preamble of all other files); see the sketch after this list.
  • algorithms_MF_MB.py - Implements the reinforcement learning procedure and the different replay strategies, necessary to perform one trial (behavior + replay).
  • simulations_MF_MB.py - Generates simulations of n_individuals (100) agents over n_trials (50) trials, in a given environmental condition (deterministic/stochastic). Saves data in the appropriate folder.
  • analyzes_MF_MB.py - Extracts relevant features of the data: computes summary statistics, performs statistical analyses...
  • figures_MF_MB - Generates the main figures of the article.
  • figures_indiv, figures_pop, figures_qvalue_map, figures_utils - Other graphical functions to display results more flexibly in exploratory investigations.
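The parameter-passing pattern described above for parameters_MF_MB.py can be illustrated with a short sketch (hypothetical names and values, except n_individuals and n_trials; only the pattern of a shared dictionary used as a default argument is taken from the description).

    # Hypothetical sketch of the shared parameter dictionary.
    params = {"n_individuals": 100, "n_trials": 50, "alpha": 0.1, "gamma": 0.9}

    # Other modules import the dictionary and take it as a default argument,
    # so simulations run with the default parameters unless overridden.
    def simulate(condition, p=params):
        """Run a simulation for one environmental condition with the given parameters."""
        print(f"Simulating {p['n_individuals']} agents over {p['n_trials']} trials ({condition})")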

(back to top)
