Update to support tuple state #14

sbsekiguchi · 2021-06-30T10:09:40Z

This PR updates the code to support tuple states.
The following features were implemented.

Updated EnvironmentInfo
- Because gym.Discrete has an empty shape, action_shape and state_shape of EnvironmentInfo also returned an empty tuple when the action or state space was discrete. I fixed them to return (1, ) instead, so action_shape and state_shape can be used directly to create nnabla variables.
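
The shape fallback described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the code in this PR: `DiscreteSpace`, `BoxSpace`, and `shape_of` are hypothetical stand-ins, written so the snippet runs without gym or nnabla.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DiscreteSpace:
    """Hypothetical stand-in for a discrete gym space, whose shape is ()."""
    n: int
    shape: Tuple[int, ...] = ()


@dataclass(frozen=True)
class BoxSpace:
    """Hypothetical stand-in for a continuous space with an explicit shape."""
    shape: Tuple[int, ...]


def shape_of(space) -> Tuple[int, ...]:
    """Return a shape that is safe to unpack into a variable shape."""
    if isinstance(space, DiscreteSpace):
        # A discrete value is stored as a single index per sample,
        # so report (1,) instead of the empty tuple.
        return (1,)
    return tuple(space.shape)


batch_size = 32
print((batch_size, *shape_of(DiscreteSpace(n=4))))
print((batch_size, *shape_of(BoxSpace(shape=(3,)))))
```

With an empty shape, `(batch_size, *shape)` would collapse to `(batch_size,)`, which is why the discrete case previously needed its own branch.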

Example (the first version is the code before this change, the second is the code after it):

    def _setup_training_variables(self, batch_size) -> TrainingVariables:
        # Training input variables
        s_current_var = nn.Variable((batch_size, *self._env_info.state_shape))
        if self._env_info.is_discrete_action_env():
            a_current_var = nn.Variable((batch_size, 1))
        else:
            a_current_var = nn.Variable((batch_size, self._env_info.action_dim))
        s_next_var = nn.Variable((batch_size, *self._env_info.state_shape))
        reward_var = nn.Variable((batch_size, 1))
        gamma_var = nn.Variable((1, 1))
        non_terminal_var = nn.Variable((batch_size, 1))
        weight_var = nn.Variable((batch_size, 1))

        training_variables = TrainingVariables(batch_size=batch_size,
                                               s_current=s_current_var,
                                               a_current=a_current_var,
                                               reward=reward_var,
                                               gamma=gamma_var,
                                               non_terminal=non_terminal_var,
                                               s_next=s_next_var,
                                               weight=weight_var)
        return training_variables

    def _setup_training_variables(self, batch_size) -> TrainingVariables:
        # Training input variables
        s_current_var = create_variable(batch_size, self._env_info.state_shape)
        a_current_var = create_variable(batch_size, self._env_info.action_shape)  # action_shape can be used instead of action_dim
        s_next_var = create_variable(batch_size, self._env_info.state_shape)
        reward_var = nn.Variable((batch_size, 1))
        gamma_var = nn.Variable((1, 1))
        non_terminal_var = nn.Variable((batch_size, 1))
        weight_var = nn.Variable((batch_size, 1))

        training_variables = TrainingVariables(batch_size=batch_size,
                                               s_current=s_current_var,
                                               a_current=a_current_var,
                                               reward=reward_var,
                                               gamma=gamma_var,
                                               non_terminal=non_terminal_var,
                                               s_next=s_next_var,
                                               weight=weight_var)
        return training_variables

Implemented three helper functions to support tuple states:
- create_variable (utils/misc.py)
- set_data_to_variable (utils/data.py)
- add_batch_dimension (utils/data.py)
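
As a rough idea of what such helpers might do, here is a dependency-free sketch. It is not the implementation in this PR: the `_sketch` functions, `is_tuple_shape`, and `FakeVariable` (a stand-in for nn.Variable with a data slot `d`) are all hypothetical, and plain lists stand in for arrays.

```python
from typing import Tuple


class FakeVariable:
    """Hypothetical stand-in for nn.Variable: holds a shape and a data slot `d`."""

    def __init__(self, shape: Tuple[int, ...]):
        self.shape = shape
        self.d = None


def is_tuple_shape(shape) -> bool:
    """True when `shape` is a tuple of shapes, e.g. ((4,), (2, 3))."""
    return len(shape) > 0 and all(isinstance(s, tuple) for s in shape)


def create_variable_sketch(batch_size, shape):
    """Create one variable per sub-state, or a single variable for a plain shape."""
    if is_tuple_shape(shape):
        return tuple(FakeVariable((batch_size, *s)) for s in shape)
    return FakeVariable((batch_size, *shape))


def set_data_to_variable_sketch(variable, data):
    """Assign data to a variable, element by element when both are tuples."""
    if isinstance(variable, tuple):
        for v, d in zip(variable, data):
            v.d = d
    else:
        variable.d = data


def add_batch_dimension_sketch(data):
    """Wrap each item in a one-element list, i.e. a leading batch axis of size 1."""
    if isinstance(data, tuple):
        return tuple([d] for d in data)
    return [data]


state_shape = ((4,), (2, 3))  # a tuple state made of two sub-states
variables = create_variable_sketch(8, state_shape)
print([v.shape for v in variables])
```

The point of the pattern is that callers pass either a plain shape or a tuple of shapes and do not need to branch themselves.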
In each algorithm, I added a class method, is_support_env, which returns a bool indicating whether the algorithm supports the given environment.
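
A support check of this kind could look roughly like the sketch below. Everything except the method name is an assumption made for illustration: `EnvInfoStub`, its fields, the `env_info` argument, and the two algorithm classes are hypothetical and do not mirror the real classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvInfoStub:
    """Hypothetical, reduced stand-in for the environment info object."""
    state_shape: tuple
    discrete_action: bool

    def is_tuple_state_env(self) -> bool:
        # A tuple state is described by a tuple of shapes.
        return len(self.state_shape) > 0 and all(
            isinstance(s, tuple) for s in self.state_shape)


class DiscreteOnlyAlgorithm:
    """Hypothetical algorithm: discrete actions and non-tuple states only."""

    @classmethod
    def is_support_env(cls, env_info) -> bool:
        return env_info.discrete_action and not env_info.is_tuple_state_env()


class TupleStateAlgorithm:
    """Hypothetical algorithm that accepts any state layout."""

    @classmethod
    def is_support_env(cls, env_info) -> bool:
        return True


env_info = EnvInfoStub(state_shape=((4,), (2, 3)), discrete_action=True)
for algorithm in (DiscreteOnlyAlgorithm, TupleStateAlgorithm):
    print(algorithm.__name__, algorithm.is_support_env(env_info))
```

Because it is a class method, the check can run before an algorithm instance is constructed.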

ishihara-y

LGTM

Update to support tuple state

c2212ae

sbsekiguchi self-assigned this Jun 30, 2021

sbsekiguchi requested a review from ishihara-y June 30, 2021 10:17

ishihara-y approved these changes Jul 1, 2021

ishihara-y merged commit a673c54 into master Jul 1, 2021

ishihara-y deleted the feature/20210517-support-tuple-state-action branch July 1, 2021 01:29
