<p align="center">
  <img width="350" src="https://github.com/gorkamunoz/rl_opts/blob/master/nbs/figs/logo_midjourney_scaled.png?raw=true">
</p>
<h1 align="center">RL-OptS</h1>
<h4 align="center">Projective Simulation and Reinforcement Learning of Optimal Search strategies</h4>

<p align="center">
  <a href="https://zenodo.org/badge/latestdoi/424986383"><img src="https://zenodo.org/badge/424986383.svg" alt="DOI"></a>
  <a href="https://badge.fury.io/py/rl_opts"><img src="https://badge.fury.io/py/rl_opts.svg" alt="PyPI version"></a>
  <a href="https://badge.fury.io/py/b"><img src="https://img.shields.io/badge/python-3.9-red" alt="Python version"></a>
</p>


This library provides the tools needed to study, replicate and
develop Projective Simulation agents that learn efficient strategies in target search problems, as
well as benchmark baselines with which to compare them. The library is
based on four publications:

- [“Optimal foraging strategies can be learned”](https://arxiv.org/abs/2303.06050) by *G. Muñoz-Gil, A. López-Incera, L. J. Fiderer* and *H. J. Briegel* (2024). Here we developed
  agents able to learn how to forage efficiently in environments with multiple targets.

- ["Learning how to find targets in the micro-world: the case of intermittent active Brownian particles"](https://pubs.rsc.org/en/content/articlehtml/2024/sm/d3sm01680c) by *M. Caraglio, H. Kaur, L. Fiderer, A. López-Incera, H. J. Briegel, T. Franosch*, and *G. Muñoz-Gil* (2024). In this case, we study the ability of agents to learn how to switch from passive to active diffusion to enhance their target search efficiency.

- [“Learning to reset in target search problems”](https://arxiv.org/abs/2503.11330) by *G. Muñoz-Gil, H. J. Briegel* and *M. Caraglio* (2025). Here we extended the agents to be able to
  reset to the origin, a feature that has revolutionized target search problems in recent years.
  
- [“Run-and-Tumble Particles Learning Chemotaxis”](https://arxiv.org/pdf/2507.23519) by *N. Tovazzi, G. Muñoz-Gil* and *M. Caraglio* (2025). In this work we explore the ability of active particles to adapt their run-and-tumble strategy to reach targets, based on their chemotactic response, just as bacteria do in the real world!



### Installation

You can access all these tools by installing the Python package `rl_opts` from PyPI:
```bash
pip install rl-opts
```
You can also clone the [source repository](https://github.com/gorkamunoz/rl_opts) and run the following from the parent folder of the cloned repo:
```bash
pip install -e rl_opts
```
This will install both the library and the necessary packages. 
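
To confirm that the installation worked, you can query the installed version from Python. This is a minimal sketch that uses only the standard library; it assumes the distribution is registered under the name `rl_opts`, and the helper `installed_version` is a hypothetical name introduced here, not part of the library:

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="rl_opts"):
    """Return the installed version of `dist_name`, or None if it is absent.

    `installed_version` is an illustrative helper, not part of rl_opts.
    """
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("rl_opts is not installed in this environment")
else:
    print(f"rl_opts version: {found}")
```

If the second message is printed, the package is visible to the interpreter you are running, which is worth checking when several virtual environments are in use.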

### Tutorials

We have prepared a series of tutorials to guide you through the most important functionalities of the package. You can find them in the [Tutorials folder](https://github.com/gorkamunoz/rl_opts/tree/master/nbs/tutorials) of the GitHub repository or in the Tutorials tab of our [webpage](https://gorkamunoz.github.io/rl_opts/), with notebooks that help you navigate the package and reproduce the results of our papers via minimal examples. In particular, we have four tutorials:

- <a href="tutorials/tutorial_learning.ipynb" style="text-decoration:none">Learning to forage with RL </a> : shows how to train an RL agent based on Projective Simulation to search for randomly distributed targets in environments like the ones considered in our paper.
- <a href="tutorials/tutorial_reset.ipynb" style="text-decoration:none">Learning to reset in target search problems </a> : shows how to train an RL agent similar to the previous one, but with the ability to reset to the origin, an action that is learned alongside its spatial dynamics.
- <a href="tutorials/tutorial_imitation.ipynb" style="text-decoration:none">Imitation learning </a> : shows how to train an RL agent to imitate the policy of an expert equipped with a pre-trained policy. The latter is based on the benchmark strategies common in the literature.
- <a href="tutorials/tutorial_benchmarks.ipynb" style="text-decoration:none">Foraging benchmarks: beyond Lévy walks </a> : shows how to launch various benchmark strategies with which to compare the trained RL agents.
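
For readers new to Projective Simulation (PS), the sketch below illustrates the kind of agent the tutorials train. It is a plain NumPy toy written for this README, not the `rl_opts` API: the class `MinimalPSAgent`, its parameters and the one-state task are all hypothetical. It assumes the standard two-layer PS rules from the literature, namely action probabilities proportional to the h-values, a glow matrix that marks recently used edges, and the update `h <- h - gamma * (h - 1) + g * reward`.

```python
import numpy as np


class MinimalPSAgent:
    """Two-layer Projective Simulation agent (illustrative, not the rl_opts API)."""

    def __init__(self, num_states, num_actions, gamma=0.01, eta=0.1, seed=0):
        self.h = np.ones((num_states, num_actions))   # h-values (edge weights)
        self.g = np.zeros((num_states, num_actions))  # glow (edge eligibility)
        self.gamma = gamma  # forgetting: pulls h back towards its initial value 1
        self.eta = eta      # glow damping: how fast past edges lose eligibility
        self.rng = np.random.default_rng(seed)

    def policy(self, state):
        """Action probabilities, proportional to the h-values of `state`."""
        return self.h[state] / self.h[state].sum()

    def act(self, state):
        """Sample an action, damp all glow values and mark the used edge."""
        action = self.rng.choice(self.h.shape[1], p=self.policy(state))
        self.g *= 1.0 - self.eta
        self.g[state, action] = 1.0
        return action

    def learn(self, reward):
        """Apply h <- h - gamma * (h - 1) + g * reward to every edge."""
        self.h += -self.gamma * (self.h - 1.0) + self.g * reward


# Hypothetical toy task: a single state in which only action 1 is rewarded.
# eta=1.0 removes glow memory, which is enough when rewards are immediate.
agent = MinimalPSAgent(num_states=1, num_actions=2, eta=1.0)
for _ in range(500):
    action = agent.act(0)
    agent.learn(reward=1.0 if action == 1 else 0.0)

probs = agent.policy(0)
print("Learned action probabilities:", probs)
```

After training, the agent should strongly prefer the rewarded action. The forgetting term keeps the h-values bounded, so the policy never becomes fully deterministic. For the actual agents and environments, refer to the tutorials listed above.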



### Cite

We kindly ask you to cite us if any of the previous material was useful for you. You can either cite this library:
``` latex
@software{rlopts,
  author       = {Mu\~noz-Gil, Gorka and L\'opez-Incera, Andrea and Caraglio, Michele and Fiderer, Lukas J. and Briegel, Hans J.},
  title        = {\uppercase{RL}-\uppercase{O}pt\uppercase{S}: Reinforcement Learning of Optimal Search Strategies},
  month        = jan,
  year         = 2024,
  publisher    = {Zenodo},
  version      = {v1.0},
  doi          = {10.5281/zenodo.10450489},
  url          = {https://doi.org/10.5281/zenodo.7727873}}
```

or the works it's based on:
``` latex
@article{munoz2024optimal,
  title={Optimal foraging strategies can be learned},
  author={Mu{\~n}oz-Gil, Gorka and L{\'o}pez-Incera, Andrea and Fiderer, Lukas J and Briegel, Hans J},
  journal={New Journal of Physics},
  volume={26},
  number={1},
  pages={013010},
  year={2024},
  publisher={IOP Publishing}
}
```

```latex
@misc{munoz2025learning,
      title={Learning to reset in target search problems}, 
      author={Gorka Muñoz-Gil and Hans J. Briegel and Michele Caraglio},
      year={2025},
      eprint={2503.11330},
      archivePrefix={arXiv},
      primaryClass={cond-mat.stat-mech},
      url={https://arxiv.org/abs/2503.11330}, 
}
```