Revised "Controlling Your Traffic Lights via RL"

flow-project · Jul 9, 2019 · 70a78d9
1 parent c897095
commit 70a78d9
Showing 1 changed file with 107 additions and 45 deletions.
diff --git a/tutorials/tutorial11_traffic_lights.ipynb b/tutorials/tutorial11_traffic_lights.ipynb
@@ -342,9 +342,10 @@
    "metadata": {},
    "source": [
     "## 6. Controlling Your Traffic Lights via RL\n",
-    "This is where we switch from the `grid.py` experiment script to `green_wave.py`. \n",
     "\n",
-    "To control traffic lights via RL, no `tl_logic` element is necessary. This is because rllab is controlling all the parameters you were able to customize in the prior sections. Your `additional_net_params` should look something like this: "
+    "Here we switch from the non-RL experiment script to the RL experiment script.\n",
+    "\n",
+    "To control traffic lights via RL, no `tl_logic` element is necessary. This is because RLlib controls all the parameters you were able to customize in the prior sections. The `additional_net_params` should look something like this: "
    ]
   },
   {
@@ -362,7 +363,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "This will enable the program to recognize all nodes as traffic lights. The experiment then gives control to the environment; we are using `TrafficLightGridEnv`, which is an environment created for RL that applies RL-specified traffic light actions (e.g. change the state) via TraCI at each timestep.\n",
+    "This will enable the program to recognize all nodes as traffic lights. The experiment then gives control to the environment; we are using `TrafficLightGridEnv`, an environment created to apply RL-specified traffic light actions (e.g. changing the state) via TraCI.\n",
     "\n",
     "This is all you need to run an RL experiment! It is worth taking a look at the `TrafficLightGridEnv` class to further understanding of the experiment internals. The rest of this tutorial is an optional walkthrough through the various components of `TrafficLightGridEnv`:\n",
     "\n",
@@ -382,7 +383,7 @@
     "self.last_change = np.zeros((self.rows * self.cols, 1))\n",
     "# keeps track of the direction of the intersection (the direction that is currently being allowed to flow. 0 indicates flow from top to bottom, and 1 indicates flow from left to right.)\n",
     "self.direction = np.zeros((self.rows * self.cols, 1))\n",
-    "# value of 0 indicates that the intersection is in a red-yellow state. 1 indicates that the intersection is in a red-green state.\n",
+    "# value of 1 indicates that the intersection is in a red-yellow state. 0 indicates that the intersection is in a red-green state.\n",
     "self.currently_yellow = np.zeros((self.rows * self.cols, 1))"
    ]
   },
@@ -392,9 +393,9 @@
    "source": [
     "* The variable `self.last_change` indicates the last time the lights were allowed to change from a red-green state to a red-yellow state.\n",
     "* The variable `self.direction` indicates the direction of the intersection, i.e. the direction that is currently being allowed to flow. 0 indicates flow from top to bottom, and 1 indicates flow from left to right.\n",
-    "* The variable `self.currently_yellow` with a value of 0 indicates that the traffic light is in a red-yellow state. 1 indicates that the traffic light is in a red-green state.\n",
+    "* The variable `self.currently_yellow` indicates whether the traffic light is in a red-yellow state (value of 1) or a red-green state (value of 0).\n",
     "\n",
-    "`self.last_change` is contingent on an instance variable `self.min_switch_time`. This is a variable that can be set in `additional_env_params` with the key name \"switch_time\". Setting switch_time enables more control over the RL experiment by preventing traffic lights from switching until `switch_time` timesteps have occurred. In practice, this can be used to prevent flickering."
+    "`self.last_change` is contingent on an instance variable `self.min_switch_time`, which can be set in `additional_env_params` with the key name `switch_time`. Setting `switch_time` gives more control over the RL experiment by preventing traffic lights from switching until `switch_time` timesteps have elapsed. In practice, this can be used to prevent flickering."
    ]
   },
   {
@@ -406,20 +407,27 @@
     "additional_env_params = {\"target_velocity\": 50, \"switch_time\": 3.0}"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Elements of RL for Controlling Traffic Lights"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {
     "collapsed": true
    },
    "source": [
-    "### Action Space"
+    "#### Action Space"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The action space for RL-controlled traffic lights directly matches the number of traffic lights in the system. Each traffic light node corresponds to one action. The action space is thus defined as:"
+    "The action space can be defined in different ways depending on the need. In this example, the action space for RL-controlled traffic lights directly matches the number of traffic intersections in the system. Each intersection (traffic light node) corresponds to an action. The action space is thus defined as:"
    ]
   },
   {
@@ -430,22 +438,35 @@
    "source": [
     "@property\n",
     "def action_space(self):\n",
-    "    return Box(low=0, high=1, shape=(self.k.traffic_light.num_traffic_lights,),\n",
-    "               dtype=np.float32)"
+    "    if self.discrete:\n",
+    "        return Discrete(2 ** self.num_traffic_lights)\n",
+    "    else:\n",
+    "        return Box(\n",
+    "            low=-1,\n",
+    "            high=1,\n",
+    "            shape=(self.num_traffic_lights,),\n",
+    "            dtype=np.float32)"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Note that the variable `num_traffic_lights` is actually the number of intersections in the grid system, not the number of traffic lights. The number of traffic lights in our example is 4 times the number of intersections."
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### Observation Space"
+    "#### Observation Space"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "The existing observation space for our existing traffic lights experiments is designed to be a fully observable state space with these metrics in mind. For each vehicle, we want to know its velocity, its distance (in [unit]) from the next intersection, and the unique edge it is traveling on. For each traffic light, we want to know its current state (i.e. what direction it is flowing), when it last changed, and whether it was yellow. Recall that the traffic light states are encoded in `self.min_switch_time`.  "
+    "The observation space can be defined in different ways depending on the need. The observation space for our existing traffic light experiments is designed to be fully observable, using the following metrics. For each vehicle, we want to know its velocity, its distance (in [unit]) from the next intersection, and the unique edge it is traveling on. For each traffic light, we want to know its current state (i.e. which direction is currently allowed to flow), when it last changed, and whether it is currently yellow."
    ]
   },
   {
@@ -456,24 +477,34 @@
    "source": [
     "@property\n",
     "def observation_space(self):\n",
-    "    speed = Box(low=0, high=1, shape=(self.initial_vehicles.num_vehicles,),\n",
-    "                dtype=np.float32)\n",
-    "    dist_to_intersec = Box(low=0., high=np.inf,\n",
-    "                           shape=(self.initial_vehicles.num_vehicles,),\n",
-    "                           dtype=np.float32)\n",
-    "    edge_num = Box(low=0., high=1, shape=(self.initial_vehicles.num_vehicles,),\n",
-    "                   dtype=np.float32)\n",
-    "    traffic_lights = Box(low=0., high=np.inf,\n",
-    "                         shape=(3 * self.rows * self.cols,),\n",
-    "                         dtype=np.float32)\n",
-    "    return Tuple((speed, dist_to_intersec, edge_num, traffic_lights))"
+    "    speed = Box(\n",
+    "        low=0,\n",
+    "        high=1,\n",
+    "        shape=(self.initial_vehicles.num_vehicles,),\n",
+    "        dtype=np.float32)\n",
+    "    dist_to_intersec = Box(\n",
+    "        low=0.,\n",
+    "        high=np.inf,\n",
+    "        shape=(self.initial_vehicles.num_vehicles,),\n",
+    "        dtype=np.float32)\n",
+    "    edge_num = Box(\n",
+    "        low=0.,\n",
+    "        high=1,\n",
+    "        shape=(self.initial_vehicles.num_vehicles,),\n",
+    "        dtype=np.float32)\n",
+    "    traffic_lights = Box(\n",
+    "        low=0.,\n",
+    "        high=1,\n",
+    "        shape=(3 * self.rows * self.cols,),\n",
+    "        dtype=np.float32)\n",
+    "    return Tuple((speed, dist_to_intersec, edge_num, traffic_lights))"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "### State Space"
+    "#### State Space"
    ]
   },
   {
@@ -491,31 +522,62 @@
    "source": [
     "def get_state(self):\n",
     "    # compute the normalizers\n",
-    "    max_speed = self.k.scenario.max_speed()\n",
-    "    grid_array = self.net_params.additional_params[\"grid_array\"]\n",
-    "    max_dist = max(grid_array[\"short_length\"], \n",
-    "                   grid_array[\"long_length\"],\n",
-    "                   grid_array[\"inner_length\"])\n",
-    "\n",
-    "    # get the state arrays\n",
-    "    speeds = [self.k.vehicle.get_speed(veh_id) / max_speed for veh_id in\n",
-    "              self.k.vehicle.get_ids()]\n",
-    "    dist_to_intersec = [self.get_distance_to_intersection(veh_id)/max_dist\n",
-    "                        for veh_id in self.k.vehicle.get_ids()]\n",
-    "    edges = [self._convert_edge(self.k.vehicle.get_edge(veh_id)) / (\n",
-    "        self.k.scenario.num_edges - 1) for veh_id in self.k.vehicle.get_ids()]\n",
-    "\n",
-    "    state = [speeds, dist_to_intersec, edges,\n",
-    "             self.last_change.flatten().tolist()]\n",
-    "    return np.array(state)"
+    "    grid_array = self.net_params.additional_params[\"grid_array\"]\n",
+    "    max_dist = max(grid_array[\"short_length\"],\n",
+    "                   grid_array[\"long_length\"],\n",
+    "                   grid_array[\"inner_length\"])\n",
+    "\n",
+    "    # get the state arrays\n",
+    "    speeds = [\n",
+    "        self.k.vehicle.get_speed(veh_id) / self.k.scenario.max_speed()\n",
+    "        for veh_id in self.k.vehicle.get_ids()\n",
+    "    ]\n",
+    "    dist_to_intersec = [\n",
+    "        self.get_distance_to_intersection(veh_id) / max_dist\n",
+    "        for veh_id in self.k.vehicle.get_ids()\n",
+    "    ]\n",
+    "    edges = [\n",
+    "        self._convert_edge(self.k.vehicle.get_edge(veh_id)) /\n",
+    "        (self.k.scenario.network.num_edges - 1)\n",
+    "        for veh_id in self.k.vehicle.get_ids()\n",
+    "    ]\n",
+    "\n",
+    "    state = [\n",
+    "        speeds, dist_to_intersec, edges,\n",
+    "        self.last_change.flatten().tolist(),\n",
+    "        self.direction.flatten().tolist(),\n",
+    "        self.currently_yellow.flatten().tolist()\n",
+    "    ]\n",
+    "    return np.array(state)"
    ]
   },
   {
-   "cell_type": "code",
-   "execution_count": null,
+   "cell_type": "markdown",
    "metadata": {},
-   "outputs": [],
-   "source": []
+   "source": [
+    "#### Reward"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "...."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "#### Apply RL Actions"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "..."
+   ]
   }
  ],
  "metadata": {