route-optim


Description

This project aims to solve the route optimisation problem for an individual vehicle using reinforcement learning (RL).

Given a start terminal and an end terminal, a vehicle follows the route computed by RL.

For the case of multiple vehicles and demands, gmmrr/fleet-route-optim is another version that deals with vehicles operating as a fleet.

This repo is part of Guanming's capstone project.

Result

Dijkstra Algorithm

It is executed as a reference baseline.

Processing Time: 0.09724 seconds
Travelled Time: 6.15 mins
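
For reference, a generic Dijkstra sketch over a weighted adjacency list, where edge weights stand in for travel times (not the project's implementation):

import heapq

def dijkstra(graph, start, end):
    # graph: dict mapping node -> list of (neighbour, travel_time) pairs
    dist = {start: 0.0}
    prev = {}
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == end:
            break
        if d > dist.get(node, float("inf")):
            continue
        for neighbour, cost in graph.get(node, []):
            nd = d + cost
            if nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                prev[neighbour] = node
                heapq.heappush(heap, (nd, neighbour))
    # Walk predecessors back from end to recover the route
    path, node = [end], end
    while node != start:
        node = prev[node]
        path.append(node)
    return path[::-1], dist[end]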

Q Learning Algorithm

Last Episode: 120
Processing Time: 42.869093 seconds
Travelled Time: 6.67 mins

SARSA Algorithm

Similar to Q Learning, but exploration is triggered at a given exploration_rate.
The bumps in the learning curve are caused by these explorations.

Last Episode: 465
Processing Time: 161.478861 seconds
Travelled Time: 6.65 mins

The outcomes of both Q Learning and SARSA are slightly worse than those of the classic Dijkstra algorithm. When the map is fully known, Dijkstra is one of the best choices thanks to its highly efficient performance. Although the map in this project is limited in size by the number of nodes available from OSM, the RL results are close enough to Dijkstra's. The SARSA result is also slightly better than its Q Learning counterpart, which indicates the benefit of the on-policy updates that SARSA uses.
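
For reference, the core difference between the two update rules, written with the README's learning_rate and discount_factor names (a generic sketch, not the project's agent.py):

from collections import defaultdict

q = defaultdict(float)  # Q-table keyed by (state, action)

def q_learning_update(q, s, a, r, s_next, actions, learning_rate=0.9, discount_factor=0.1):
    # Off-policy: bootstrap from the greedy value of the next state
    target = r + discount_factor * max(q[(s_next, a2)] for a2 in actions)
    q[(s, a)] += learning_rate * (target - q[(s, a)])

def sarsa_update(q, s, a, r, s_next, a_next, learning_rate=0.9, discount_factor=0.1):
    # On-policy: bootstrap from the action actually taken in the next state
    target = r + discount_factor * q[(s_next, a_next)]
    q[(s, a)] += learning_rate * (target - q[(s, a)])

SARSA's target follows the behaviour policy, including its exploratory moves, which is what produces the bumps in its learning curve.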

Project Scope

  1. Congestion is randomly generated
    Congested edges are randomly chosen from the edge space, and the congestion level can be set to low, medium, or high in fleet_environment.py.

  2. Speed is a constant
    The network downloaded from the OSM website classifies each edge by type (e.g. primary, secondary, or residential highway), and each type has a defined speed. Acceleration is not taken into consideration, so this remains some distance from the practical case.

  3. Traffic lights are set on a 90-second interval
    Although this is close to the practical case, it is still not realistic: in reality, signals are controlled by a program rather than following a fixed pattern.

  4. The termination condition of RL
    Convergence is declared when the travelled time (rounded to the second decimal place) is consistent across 5 consecutive episodes, as sketched below.
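
A minimal sketch of such a convergence check (the function name and the per-episode time list are assumptions for illustration, not the project's actual code):

def has_converged(episode_times, window=5):
    # Converged when the last `window` travelled times, rounded to 2 decimal places, are identical
    if len(episode_times) < window:
        return False
    recent = [round(t, 2) for t in episode_times[-window:]]
    return len(set(recent)) == 1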

Setup

  1. Download SUMO (https://sumo.dlr.de/docs/Downloads.php)
  2. Clone this repository to your local machine
  3. Install the necessary packages:
$ pip3 install -r requirements.txt
  4. Update main.py with your SUMO directory to set the environment variable:
def sumo_config():
    os.environ["SUMO_HOME"] = '$SUMO_HOME' # -- change to your path to $SUMO_HOME
    ...
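
If the script also imports SUMO's Python tools (e.g. traci or sumolib), the usual pattern from the SUMO documentation is to add the tools directory to the Python path as well; a sketch of how sumo_config might continue (an assumption, not the project's exact code):

import os
import sys

def sumo_config():
    os.environ["SUMO_HOME"] = '/your/path/to/sumo'  # same variable as above
    # Make SUMO's bundled Python tools (traci, sumolib) importable
    tools = os.path.join(os.environ["SUMO_HOME"], "tools")
    if tools not in sys.path:
        sys.path.append(tools)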
  5. Add your netedit network file and update the network_file variable:
network_file = './network_files/ncku_network.net.xml'

More on the OSM website: https://www.openstreetmap.org/
The config command is saved in ./network_files/config.txt

  6. Add your traffic_light file:
tls = tls_from_tllxml('./network_files/ncku_network.tll.xml')

This file can be exported from Netedit; see https://sumo.dlr.de/docs/Netedit/index.html for details.

  7. Edit start_node and end_node in main.py:
# 02 Configure network variables
start_node = "864831599"  # any node ID within the network can be used
end_node = "5739293224"
  8. Run the code:
$ python3 main.py
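
Before running, it can help to verify that start_node and end_node actually exist in the loaded network. An optional check using sumolib (not part of main.py; the file path and node IDs are the ones shown above):

import sumolib

# Load the network and confirm the chosen terminals are among its nodes
net = sumolib.net.readNet('./network_files/ncku_network.net.xml')
node_ids = {node.getID() for node in net.getNodes()}
for terminal in ("864831599", "5739293224"):
    assert terminal in node_ids, f"{terminal} is not a node in the network"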

Customisable Section

In agent.py, we can set

# Hyperparameters for Q_Learning
learning_rate = 0.9  # alpha
discount_factor = 0.1  # gamma

# Hyperparameters for SARSA
learning_rate = 0.9  # alpha
discount_factor = 0.1  # gamma
exploration_rate = 0.1  # ratio of exploration and exploitation

and we have

reward_lst = [-50, -50, -30, 100, 50, -1]

They are defined as [invalid_action_reward, dead_end_reward, loop_reward, completion_reward, bonus_reward, continue_reward] respectively.
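
For context, exploration_rate is the ε in ε-greedy action selection; a generic sketch of how such a parameter is typically used in SARSA (not necessarily how agent.py implements it):

import random

def choose_action(q, state, actions, exploration_rate=0.1):
    # Explore with probability exploration_rate, otherwise exploit the best-known action
    if random.random() < exploration_rate:
        return random.choice(actions)
    return max(actions, key=lambda a: q.get((state, a), 0.0))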
