Merge pull request #205 from rte-france/dev_1.5.2

Ready for version 1.5.2
rte-france · May 10, 2021 · 7a8a307
2 parents 8960180 + c288bb4
commit 7a8a307
Showing 46 changed files with 1,626 additions and 174 deletions.
diff --git a/.gitignore b/.gitignore
@@ -303,6 +303,9 @@ test_bug_discord1.py
 test_networkx.py
 test_issue185.py
 test_can_make_opponent.py
+enigma_nili.py
+test_issue196.py
+test_increasingreward.py
 
 # profiling files
 **.prof
diff --git a/CHANGELOG.rst b/CHANGELOG.rst
@@ -24,10 +24,42 @@ Change Log
 
 [1.5.2] - 2021-xx-yy
 -----------------------
-
+- [BREAKING]: allow the opponent to choose the duration of its attack. This breaks the previous "Opponent.attack(...)"
+  signature by adding an object in the return value. All code provided with grid2op is compatible with this
+  new change. (For previously coded opponents, the only thing you have to do to make them compliant with
+  the new interface is, in the `opponent.attack(...)` function, to return `whatever_you_returned_before, None` instead
+  of simply `whatever_you_returned_before`.)
+- [FIXED]: `Issue#196 <https://github.com/rte-france/Grid2Op/issues/196>`_ an issue related to the
+  low / high of the observation if using the gym_compat module. Some more protections
+  are enforced now.
+- [FIXED]: `Issue#196 <https://github.com/rte-france/Grid2Op/issues/196>`_ an issue related to the scaling when negative
+  numbers are used (in these cases low / high would be mixed up)
+- [FIXED]: an issue with the `IncreasingFlatReward` reward types
+- [FIXED]: a bug due to the conversion of int to float in the range of the `BoxActionSpace` for the `gym_compat` module
+- [FIXED]: a bug in the `BoxGymActSpace`, `BoxGymObsSpace`, `MultiDiscreteActSpace` and `DiscreteActSpace`
+  where the order of the attribute for the conversion
+  was encoded in a set. We enforced a sorted list now. We did not manage to find a bug caused by this issue, but
+  it is definitely possible. This has been fixed now.
+- [FIXED]: a bug where, when an observation was set to a "game over" state, some of its attributes were below the
+  maximum values allowed in the `BoxGymObsSpace`
+- [ADDED]: a reward `EpisodeDurationReward` that is always 0 unless at the end of an episode where it returns a float
+  proportional to the number of steps made from the beginning of the environment.
+- [ADDED]: in the `Observation` the possibility to retrieve the current number of steps
+- [ADDED]: easier function to manipulate the max number of iterations we want to perform directly from the environment
+- [ADDED]: function to retrieve the maximum duration of the current episode.
+- [ADDED]: a new kind of opponent that is able to attack at "more random" times with "more random" duration.
+  See the `GeometricOpponent`.
+- [IMPROVED]: on windows at least, grid2op does not work with gym < 0.17.2. Checks are performed in order to make sure
+  the installed open ai gym package meets this requirement (see issue
+  `Issue#185 <https://github.com/rte-france/Grid2Op/issues/185>`_ )
+- [IMPROVED]: the seed of openAI gym for composed action space (see issue `https://github.com/openai/gym/issues/2166`):
+  while waiting for an official fix, grid2op will use the solution proposed there
+  (https://github.com/openai/gym/issues/2166#issuecomment-803984619)
 
 [1.5.1] - 2021-04-15
 -----------------------
+- [FIXED]: `Issue#194 <https://github.com/rte-france/Grid2Op/issues/194>`_: (post release): change the name
+  of the file `platform.py` that could be mixed with the python "platform" module to `_glop_platform_info.py`
 - [FIXED]: `Issue #187 <https://github.com/rte-france/Grid2Op/issues/187>`_: improve the computation and the
   documentation of the `RedispReward`. This has an impact on the `env.reward_range` of all environments using this
   reward, because the old "reward_max" was not correct.

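The [BREAKING] entry for 1.5.2 can be illustrated without grid2op itself. The sketch below uses two hypothetical stand-in classes (`OldStyleOpponent`, `NewStyleOpponent`) and a simplified `attack` signature, not the real grid2op base class. It only shows the shape of the return value described in the changelog; the dictionary is a placeholder for whatever the opponent returned before.

```python
class OldStyleOpponent:
    """Hypothetical pre-1.5.2 opponent: attack() returns a single object."""

    def attack(self, observation):
        # placeholder standing in for "whatever_you_returned_before"
        return {"set_line_status": [(0, -1)]}


class NewStyleOpponent(OldStyleOpponent):
    """Same opponent adapted to the 1.5.2 convention: two return values."""

    def attack(self, observation):
        previous_return = super().attack(observation)
        # the changelog asks existing opponents to append None
        return previous_return, None


attack, duration = NewStyleOpponent().attack(observation=None)
```

Callers that unpack two values keep working with any opponent migrated this way, since the first element is unchanged.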
diff --git a/docs/conf.py b/docs/conf.py
@@ -81,6 +81,7 @@
 
 
 def setup(app):
-  app.add_javascript('custom.js')
-  if app.config.language == 'ja':
+    # app.add_javascript('custom.js')
+    app.add_js_file('custom.js')
+    if app.config.language == 'ja':
         app.config.intersphinx_mapping['py'] = ('https://docs.python.org/ja/3', None)
diff --git a/getting_started/11_IntegrationWithExistingRLFrameworks.ipynb b/getting_started/11_IntegrationWithExistingRLFrameworks.ipynb
@@ -505,7 +505,89 @@
     "            trainer.train()\n",
     "    finally:   \n",
     "        # shutdown ray\n",
-    "        ray.shutdown()"
+    "        ray.shutdown()\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Because we are approximating a physical system with real equations, and limited computational power\n",
+    "regardless of the \"backend\" / \"powergrid simulator\" used internally by grid2op, it is sometimes possible\n",
+    "that an observation obs[\"gen_p\"] is not exactly in the range \n",
+    "env.observation_space[\"gen_p\"].low, env.observation_space[\"gen_p\"].high.\n",
+    "\n",
+    "In this \"pathological\" cases we recommend to manually change the low / high value of the `gen_p` part of the observation space, for example by adding, after the definition of self.observation_space something like:\n",
+    "\n",
+    "```python\n",
+    "        # 4. specific to rllib\n",
+    "        self.action_space = self.env_gym.action_space\n",
+    "        self.observation_space = self.env_gym.observation_space\n",
+    "        self.observation_space[\"gen_p\"].low[:] = -np.inf\n",
+    "        self.observation_space[\"gen_p\"].high[:] = np.inf\n",
+    "```\n",
+    "\n",
+    "More information at https://github.com/rte-france/Grid2Op/issues/196\n",
+    "\n",
+    "**NB** these cases can be spotted with an error like:\n",
+    "\n",
+    "```\n",
+    "RayTaskError(ValueError): ray::RolloutWorker.par_iter_next() (pid=378, ip=172.28.0.2)\n",
+    "  File \"python/ray/_raylet.pyx\", line 480, in ray._raylet.execute_task\n",
+    "  File \"python/ray/_raylet.pyx\", line 432, in ray._raylet.execute_task.function_executor\n",
+    "  File \"/usr/local/lib/python3.7/dist-packages/ray/util/iter.py\", line 1152, in par_iter_next\n",
+    "    return next(self.local_it)\n",
+    "  File \"/usr/local/lib/python3.7/dist-packages/ray/rllib/evaluation/rollout_worker.py\", line 327, in gen_rollouts\n",
+    "    yield self.sample()\n",
+    "  File \"/usr/local/lib/python3.7/dist-packages/ray/rllib/evaluation/rollout_worker.py\", line 662, in sample\n",
+    "    batches = [self.input_reader.next()]\n",
+    "  File \"/usr/local/lib/python3.7/dist-packages/ray/rllib/evaluation/sampler.py\", line 95, in next\n",
+    "    batches = [self.get_data()]\n",
+    "  File \"/usr/local/lib/python3.7/dist-packages/ray/rllib/evaluation/sampler.py\", line 224, in get_data\n",
+    "    item = next(self.rollout_provider)\n",
+    "  File \"/usr/local/lib/python3.7/dist-packages/ray/rllib/evaluation/sampler.py\", line 620, in _env_runner\n",
+    "    sample_collector=sample_collector,\n",
+    "  File \"/usr/local/lib/python3.7/dist-packages/ray/rllib/evaluation/sampler.py\", line 1056, in _process_observations_w_trajectory_view_api\n",
+    "    policy_id).transform(raw_obs)\n",
+    "  File \"/usr/local/lib/python3.7/dist-packages/ray/rllib/models/preprocessors.py\", line 257, in transform\n",
+    "    self.check_shape(observation)\n",
+    "  File \"/usr/local/lib/python3.7/dist-packages/ray/rllib/models/preprocessors.py\", line 68, in check_shape\n",
+    "    observation, self._obs_space)\n",
+    "ValueError: ('Observation ({}) outside given space ({})!', OrderedDict([('actual_dispatch', array([0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.,\n",
+    "       0., 0., 0., 0., 0.], dtype=float32)), ('gen_p', array([0.        , 0.14583334, 0.        , 0.5376    , 0.        ,\n",
+    "       0.13690476, 0.        , 0.        , 0.13988096, 0.        ,\n",
+    "       0.        , 0.        , 0.        , 0.        , 0.        ,\n",
+    "       0.        , 0.        , 0.10416667, 0.        , 0.9975    ,\n",
+    "       0.        , 0.0872582 ], dtype=float32)), ('load_p', array([-8.33333358e-02,  1.27543859e+01, -3.14843726e+00, -4.91228588e-02,\n",
+    "       -7.84314200e-02,  2.70270016e-02,  4.51001197e-01, -7.63358772e-02,\n",
+    "       -8.42104480e-02, -7.90961310e-02, -2.31212564e-02, -7.31706619e-02,\n",
+    "       -5.47945984e-02, -5.57769537e-02, -4.65115122e-02,  0.00000000e+00,\n",
+    "       -6.25000373e-02, -2.98508592e-02,  0.00000000e+00,  2.59741265e-02,\n",
+    "       -5.12821227e-02,  2.12766770e-02, -4.38757129e-02,  1.45455096e-02,\n",
+    "       -1.45278079e-02, -3.63636017e-02,  7.14286715e-02,  1.03358915e-02,\n",
+    "        8.95522386e-02,  4.81927246e-02, -1.76759213e-02,  1.11111533e-02,\n",
+    "        1.00000061e-01, -5.28445065e-01,  3.00833374e-01,  7.76839375e-01,\n",
+    "       -7.07498193e-01], dtype=float32)), ('rho', array([0.49652272, 0.42036632, 0.12563582, 0.22375877, 0.54946697,\n",
+    "       0.08844228, 0.05907034, 0.10975129, 0.13002895, 0.14068729,\n",
+    "       0.17318982, 0.6956544 , 0.38796344, 0.67179894, 0.22992906,\n",
+    "       0.25189328, 0.15049867, 0.09095841, 0.35627988, 0.35627988,\n",
+    "       0.36776555, 0.27249542, 0.6269728 , 0.62393713, 0.3464659 ,\n",
+    "       0.35879263, 0.22755426, 0.35994047, 0.36117986, 0.12019955,\n",
+    "       0.03638522, 0.2805753 , 0.5809281 , 0.6191531 , 0.5243356 ,\n",
+    "       0.60382956, 0.35834518, 0.35867074, 0.3580954 , 0.6681824 ,\n",
+    "       0.3441911 , 0.6081861 , 0.34460714, 0.18246886, 0.10307808,\n",
+    "       0.46778303, 0.47179568, 0.45407027, 0.30089107, 0.30089107,\n",
+    "       0.34481782, 0.3182735 , 0.35940355, 0.21895139, 0.19766088,\n",
+    "       0.63653564, 0.46778303, 0.4566811 , 0.64398617], dtype=float32)), ('topo_vect', array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
+    "       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
+    "       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
+    "       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
+    "       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
+    "       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
+    "       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
+    "       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,\n",
+    "       1], dtype=int32))]), Dict(actual_dispatch:Box(-1.0, 1.0, (22,), float32), gen_p:Box(0.0, 1.2000000476837158, (22,), float32), load_p:Box(-inf, inf, (37,), float32), rho:Box(0.0, inf, (59,), float32), topo_vect:Box(-1, 2, (177,), int32)))\n",
+    "```"
    ]
   },
   {
@@ -1085,7 +1167,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.8.5"
+   "version": "3.8.3"
   }
  },
  "nbformat": 4,

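When the `ValueError` quoted in the notebook cell above appears, it can be hard to see which entry left its bounds. The sketch below is a plain-Python diagnostic under stated assumptions: `out_of_bounds` is a hypothetical helper (not part of grid2op, gym or ray), observations are given as dictionaries of lists, and bounds as `(low, high)` pairs with made-up values.

```python
import math


def out_of_bounds(observation, bounds):
    """Return {key: [indices]} for every value outside its (low, high) pair."""
    bad = {}
    for key, values in observation.items():
        low, high = bounds[key]
        idx = [i for i, v in enumerate(values) if not (low <= v <= high)]
        if idx:
            bad[key] = idx
    return bad


# illustrative numbers only: the last gen_p value slightly exceeds the upper bound
bounds = {"gen_p": (0.0, 1.2), "rho": (0.0, math.inf)}
obs = {"gen_p": [0.0, 0.5376, 1.2000001], "rho": [0.49, 0.42]}

violations = out_of_bounds(obs, bounds)

# widening the bounds, as the notebook recommends for gen_p, removes the violation
bounds["gen_p"] = (-math.inf, math.inf)
violations_after = out_of_bounds(obs, bounds)
```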
diff --git a/grid2op/Action/ActionSpace.py b/grid2op/Action/ActionSpace.py
@@ -143,6 +143,6 @@ def _is_legal(self, action, env):
         """
         if env is None:
             warnings.warn("Cannot performed legality check because no environment is provided.")
-            return True
+            return True, None
         is_legal, reason = self.legal_action(action, env)
         return is_legal, reason
diff --git a/grid2op/Action/BaseAction.py b/grid2op/Action/BaseAction.py
@@ -412,7 +412,11 @@ def __init__(self):
         mycls = type(self)
         if mycls.shunt_added is False and mycls.shunts_data_available:
             mycls.shunt_added = True
+
+            mycls.attr_list_vect = copy.deepcopy(mycls.attr_list_vect)
             mycls.attr_list_vect += ["shunt_p", "shunt_q", "shunt_bus"]
+
+            mycls.authorized_keys = copy.deepcopy(mycls.authorized_keys)
             mycls.authorized_keys.add("shunt")
             mycls.attr_nan_list_set.add("shunt_p")
             mycls.attr_nan_list_set.add("shunt_q")
@@ -712,6 +716,26 @@ def get_topological_impact(self, powerline_status=None):
             component ``True`` if the substation is impacted by the action, and ``False`` otherwise. See
             :attr:`BaseAction._subs_impacted` for more information.
 
+        Examples
+        --------
+
+        You can use this function like this:
+
+        .. code-block:: python
+
+            import grid2op
+            env_name = ...  # choose an environment
+            env = grid2op.make(env_name)
+
+            # get an action
+            action = env.action_space.sample()
+            # inspect its impact
+            lines_impacted, subs_impacted = action.get_topological_impact()
+
+            for line_id in np.where(lines_impacted)[0]:
+                print(f"The line {env.name_line[line_id]} with id {line_id} is impacted by this action")
+
+            print(action)
         """
         if powerline_status is None:
             isnotconnected = np.full(self.n_line, fill_value=True, dtype=dt_bool)

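The two `copy.deepcopy` lines added in `BaseAction.__init__` are presumably there because an in-place `+=` on an inherited class attribute mutates the object shared with the parent class. The sketch below shows that Python behaviour with hypothetical classes (`Base`, `Child`) and made-up attribute contents; it is not grid2op code.

```python
import copy


def make_classes():
    """Build a fresh parent / child pair sharing one class-level list."""

    class Base:
        attr_list_vect = ["set_bus", "change_bus"]

    class Child(Base):
        pass

    return Base, Child


# without a copy: += extends the list owned by Base, so Base changes too
Base, Child = make_classes()
Child.attr_list_vect += ["shunt_p"]
leaked_into_base = Base.attr_list_vect

# with a copy first: Child gets its own list and Base is left untouched
Base2, Child2 = make_classes()
Child2.attr_list_vect = copy.deepcopy(Child2.attr_list_vect)
Child2.attr_list_vect += ["shunt_p"]
base_after_copy = Base2.attr_list_vect
```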
diff --git a/grid2op/Chronics/ChronicsHandler.py b/grid2op/Chronics/ChronicsHandler.py
@@ -69,9 +69,9 @@ def __init__(self, chronicsClass=ChangeNothing, time_interval=timedelta(minutes=
         try:
             self._real_data = self.chronicsClass(time_interval=time_interval, max_iter=self.max_iter,
                                                  **self.kwargs)
-        except TypeError:
+        except TypeError as exc_:
             raise ChronicsError("Impossible to build a chronics of type {} with arguments in "
-                                "{}".format(chronicsClass, self.kwargs))
+                                "{}".format(chronicsClass, self.kwargs)) from exc_
 
     @property
     def real_data(self):
@@ -87,6 +87,28 @@ def next_time_step(self):
         res = self._real_data.load_next()
         return res
 
+    def max_episode_duration(self):
+        """
+        Returns
+        -------
+        max_duration: ``int``
+            The maximum duration of the current episode
+        Notes
+        -----
+        Using this function (which we do not recommend) you will receive "-1" for "infinite duration" otherwise
+        you will receive a positive integer
+
+        """
+        tmp = self.max_iter
+        if tmp == -1:
+            # tmp = -1 means "infinite duration" but in this case, i can have a limit
+            # due to the data used (especially if read from files)
+            tmp = self._real_data.max_timestep()
+        else:
+            # i can also have a limit on the maximum number of data in the chronics (especially if read from files)
+            tmp = min(tmp, self._real_data.max_timestep())
+        return tmp
+
     def get_name(self):
         """
         This method retrieve a unique name that is used to serialize episode data on
@@ -104,15 +126,18 @@ def set_max_iter(self, max_iter):
         Parameters
         ----------
         max_iter: ``int``
-            The maximum number of timestep in the chronics.
-
-        Returns
-        -------
+            The maximum number of steps that can be done before reaching the end of the episode
 
         """
 
         if not isinstance(max_iter, int):
             raise Grid2OpException("The maximum number of iterations possible for this chronics, before it ends.")
+        if max_iter == 0:
+            raise Grid2OpException("The maximum number of iteration should be > 0 (or -1 if you mean "
+                                   "\"don't limit it\")")
+        elif max_iter < -1:
+            raise Grid2OpException("The maximum number of iteration should be > 0 (or -1 if you mean "
+                                   "\"don't limit it\")")
         self.max_iter = max_iter
         self._real_data.max_iter = max_iter
 

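The duration logic and the new `max_iter` checks in this file can be summarised as two small functions. This is a standalone sketch, not the grid2op implementation: `resolve_max_duration` and `check_max_iter` are hypothetical names, and built-in exceptions replace `Grid2OpException` so the block runs on its own.

```python
def resolve_max_duration(max_iter, data_max_timestep):
    """Mirror of the arithmetic in ChronicsHandler.max_episode_duration."""
    if max_iter == -1:
        # no limit requested: the data itself may still impose one
        return data_max_timestep
    # otherwise the shorter of the two limits applies
    # (as in the diff, a data limit of -1 wins the min() and yields -1)
    return min(max_iter, data_max_timestep)


def check_max_iter(max_iter):
    """Mirror of the validation added to set_max_iter."""
    if not isinstance(max_iter, int):
        raise TypeError("max_iter should be an int")
    if max_iter == 0 or max_iter < -1:
        raise ValueError("max_iter should be > 0 (or -1 for no limit)")
    return max_iter


unlimited = resolve_max_duration(-1, 288)
shortened = resolve_max_duration(100, 288)
capped_by_data = resolve_max_duration(500, 288)
```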
diff --git a/grid2op/Chronics/GridValue.py b/grid2op/Chronics/GridValue.py
@@ -608,10 +608,9 @@ def max_timestep(self):
         Returns
         -------
         res: ``int``
-            -1 if possibly infinite length of a positive integer representing the maximum duration of this episode
+            -1 if possibly infinite length or a positive integer representing the maximum duration of this episode
 
         """
-        # warnings.warn("Class {} has possibly and infinite duration.".format(type(self).__name__))
         return self.max_iter
 
     def shuffle(self, shuffler=None):

diff --git a/grid2op/Environment/BaseEnv.py b/grid2op/Environment/BaseEnv.py
@@ -1488,7 +1488,7 @@ def step(self, action):
                 dictionary with keys:
 
                     - "disc_lines": a numpy array (or ``None``) saying, for each powerline if it has been disconnected
-                        due to overflow
+                      due to overflow
                     - "is_illegal" (``bool``) whether the action given as input was illegal
                     - "is_ambiguous" (``bool``) whether the action given as input was ambiguous.
                     - "is_dispatching_illegal" (``bool``) was the action illegal due to redispatching
@@ -1551,6 +1551,7 @@ def step(self, action):
         lines_attacked, subs_attacked = None, None
         conv_ = None
         init_line_status = copy.deepcopy(self.backend.get_line_status())
+
         beg_step = time.time()
         try:
             beg_ = time.time()

diff --git a/grid2op/Environment/Environment.py b/grid2op/Environment/Environment.py
@@ -244,13 +244,6 @@ def _init_backend(self,
                                                             actionClass=CompleteAction,
                                                             legal_action=self._game_rules.legal_action)
 
-        self._helper_observation_class = ObservationSpace.init_grid(gridobj=bk_type)
-        self._observation_space = self._helper_observation_class(gridobj=bk_type,
-                                                                 observationClass=observationClass,
-                                                                 actionClass=actionClass,
-                                                                 rewardClass=rewardClass,
-                                                                 env=self)
-
         # handles input data
         if not isinstance(chronics_handler, ChronicsHandler):
             raise Grid2OpException(
@@ -263,6 +256,15 @@ def _init_backend(self,
                                          names_chronics_to_backend=names_chronics_to_backend)
         self.names_chronics_to_backend = names_chronics_to_backend
 
+        # this needs to be done after the chronics handler: rewards might need information
+        # about the chronics to work properly.
+        self._helper_observation_class = ObservationSpace.init_grid(gridobj=bk_type)
+        self._observation_space = self._helper_observation_class(gridobj=bk_type,
+                                                                 observationClass=observationClass,
+                                                                 actionClass=actionClass,
+                                                                 rewardClass=rewardClass,
+                                                                 env=self)
+
         # test to make sure the backend is consistent with the chronics generator
         self.chronics_handler.check_validity(self.backend)
         self.delta_time_seconds = dt_float(self.chronics_handler.time_interval.seconds)
@@ -314,6 +316,39 @@ def _init_backend(self,
         # reset everything to be consistent
         self._reset_vectors_and_timings()
 
+    def max_episode_duration(self):
+        """
+        Return the maximum duration (in number of steps) of the current episode.
+
+        Notes
+        -----
+        For a possibly infinite episode, the duration is returned as `np.iinfo(np.int32).max`, which corresponds
+        to the maximum 32 bit integer (`2147483647`)
+
+        """
+        tmp = dt_int(self.chronics_handler.max_episode_duration())
+        if tmp < 0:
+            tmp = dt_int(np.iinfo(dt_int).max)
+        return tmp
+
+    def set_max_iter(self, max_iter):
+        """
+
+        Parameters
+        ----------
+        max_iter: ``int``
+            The maximum number of iterations you can do before reaching the end of the episode. Set it to "-1" for
+            a possibly infinite episode duration.
+
+        Notes
+        -------
+
+        Maximum length of the episode can depend on the chronics used. See :attr:`Environment.chronics_handler` for
+        more information
+
+        """
+        self.chronics_handler.set_max_iter(max_iter)
+
     @property
     def _helper_observation(self):
         return self._observation_space
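
The environment-level `max_episode_duration` added above turns the handler's "-1 means infinite" convention into a large positive integer. A minimal sketch of that conversion follows, assuming `dt_int` is a 32 bit integer as the docstring states; `env_max_episode_duration` is a hypothetical free function used only for illustration, and plain Python integers stand in for NumPy types.

```python
INT32_MAX = 2**31 - 1  # same value as np.iinfo(np.int32).max


def env_max_episode_duration(handler_duration):
    """Map a negative (infinite) handler duration to the int32 maximum."""
    if handler_duration < 0:
        return INT32_MAX
    return handler_duration


infinite_case = env_max_episode_duration(-1)
finite_case = env_max_episode_duration(288)
```

Returning a finite integer in both cases lets callers compare the current step against the maximum duration without special-casing -1.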