Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration
The inputs to the reinforcement learning algorithm are located in `./utils/params.yaml` and include:
- LTL task: `ltl_task` and `obstacle_list`
- Grid size: `grid_size` (e.g., if `grid_size=50`, then the MDP models a 50x50 grid world, resulting in an MDP with 2,500 states)
- Number of episodes: `episodes`
- Maximum number of steps per episode: `max_steps`
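For illustration, the sketch below shows one way these inputs could be read with PyYAML. The flat key layout is an assumption about the file's structure, not a guarantee of the repo's actual format.

```python
# Illustrative sketch (not the repo's code): loading the inputs listed
# above from ./utils/params.yaml, assuming a flat key layout.
import yaml

with open("./utils/params.yaml") as f:
    params = yaml.safe_load(f)

ltl_task = params["ltl_task"]            # the LTL task specification
obstacle_list = params["obstacle_list"]  # obstacle locations in the grid
grid_size = params["grid_size"]          # e.g., 50 -> 50x50 grid, 2,500 states
episodes = params["episodes"]            # number of training episodes
max_steps = params["max_steps"]          # maximum steps per episode
```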
This repo was implemented using Python 3.8 and tested on Ubuntu 20.04.
To run our method, type `python3 main.py --type X`, where X can take values in {1, 2, 3, 4, 5, 6}. The value of X selects different decay rates for the parameters \epsilon, \delta_b, and \delta_e of our method (a hypothetical sketch of this mapping follows the list below).
- Values of X
- X = 1: Corresponds to Biased-1 used in [A] for the 10x10 and 20x20 MDPs
- X = 2: Corresponds to Biased-2 used in [A] for the 10x10 and 20x20 MDPs
- X = 3: Corresponds to Biased-3 used in [A] for the 10x10 and 20x20 MDPs
- X = 4: Corresponds to Biased-1 used in [A] for the 50x50 MDP
- X = 5: Corresponds to Biased-2 used in [A] for the 50x50 MDP
- X = 6: Corresponds to Biased-3 used in [A] for the 50x50 MDP
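For orientation only, here is a minimal sketch of how a `--type` flag might be mapped to decay schedules for \epsilon, \delta_b, and \delta_e. The polynomial schedule form, the exponents in `DECAY_RATES`, and the even split between \delta_b and \delta_e are all assumptions made for illustration; they are not the values or code used in [A].

```python
# Hypothetical sketch (not the repo's code): mapping --type to decay
# schedules for epsilon, delta_b, and delta_e. All numbers below are
# illustrative placeholders, not the values used in [A].
import argparse

# Assumed decay exponents per --type; purely illustrative.
DECAY_RATES = {
    1: 0.4, 2: 0.6, 3: 0.8,  # e.g., profiles for the 10x10/20x20 MDPs
    4: 0.4, 5: 0.6, 6: 0.8,  # e.g., profiles for the 50x50 MDP
}

def schedules(episode, rate):
    """Polynomially decaying exploration parameters (assumed form)."""
    eps = episode ** (-rate)   # total exploration probability
    delta_b = 0.5 * eps        # probability of taking the biased action
    delta_e = eps - delta_b    # probability of taking a uniform random action
    return eps, delta_b, delta_e

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--type", type=int, choices=range(1, 7), required=True)
    args = parser.parse_args()
    for episode in (1, 10, 100, 1000):
        print(episode, schedules(episode, DECAY_RATES[args.type]))
```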
To run our method with the biased part disabled (i.e., \delta_b=0), use the following command:
`python3 random_explore.py`
The code returns a deterministic policy (greedy with respect to the learned Q-value function) along with its satisfaction probability. The latter is computed using the true MDP transition probabilities, which are unknown to the learning algorithm and are used only for evaluation.
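For reference, extracting such a greedy policy from a tabular Q-function is a per-state argmax. The sketch below assumes a NumPy Q-table of shape (num_states, num_actions); it is illustrative rather than the repo's implementation.

```python
# Illustrative sketch (not the repo's code): extracting the deterministic
# greedy policy from a tabular Q-value function.
import numpy as np

num_states, num_actions = 2500, 4        # assumed sizes, e.g., a 50x50 grid
Q = np.zeros((num_states, num_actions))  # stands in for the learned Q-table

# In each state, the greedy policy picks the action with the largest Q-value.
policy = np.argmax(Q, axis=1)            # policy[s] is the action for state s
```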
[A] Y. Kantaros and J. Wang, "Sample-Efficient Reinforcement Learning with Temporal Logic Objectives: Leveraging the Task Specification to Guide Exploration," IEEE Transactions on Automatic Control, 2024.