## Creating a custom `gym` environment for the Inventory Management problem Part 2: Implementing `__init__()`

Task: Create a **custom `gym` environment** based on our knowledge of the states, actions, state transitions, and reward function. In this part, we implement `__init__()`, which must define the action space and the observation space.

<img src="images/shop.png" width="500"/>

<img src="images/state_action_transition_rewards.png" width="1000"/>

In [1]:
import gym
import numpy as np
from gym.spaces import Box


class InventoryEnv(gym.Env):
    def __init__(self):
        """
        Must define self.observation_space and self.action_space here
        """
        
        # Define action space: bounds, space type, shape
        
        # Bound: Shelf space is limited
        self.max_capacity = 4000
        
        # Space type: Better to use Box than Discrete, since Discrete will lead to too many output nodes in the NN
        
        # Shape: RLlib cannot handle scalar actions, so use a numpy array with shape (1,)
        self.action_space = Box(low=np.array([0]), high=np.array([self.max_capacity]))
        
        # Define observation space: bounds, space type, shape
        
        # Shape: The lead time controls the shape of observation space
        self.lead_time = 5
        self.obs_dim = self.lead_time + 4
        
        # Bounds: Define low and high of the remaining observation space elements
        obs_low = np.zeros((self.obs_dim,))
        
        self.max_mean_daily_demand = 200
        self.max_unit_selling_price = 100 
        self.max_daily_holding_cost_per_unit = 5
        
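        # The first lead_time elements are bounded by the shelf capacity; the last four follow.
        # Note: max_unit_selling_price bounds two consecutive elements (presumably the
        # selling price and the unit cost; this reading is an assumption).
        obs_high = np.array([self.max_capacity for _ in range(self.lead_time)] +
                            [self.max_mean_daily_demand, self.max_unit_selling_price,
                             self.max_unit_selling_price, self.max_daily_holding_cost_per_unit
                             ]
                            )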
        self.observation_space = Box(low=obs_low, high=obs_high)

    def reset(self):
        """
        Returns: the observation of the initial state
        Reset the environment to initial state so that a new episode (independent of previous ones) may start
        """
        raise NotImplementedError

    def step(self, action):
        """
        Returns: the next observation, the reward, done and optionally additional info
        """
        raise NotImplementedError

    def render(self, mode="human"):
        """
        Returns: None
        Show the current environment state e.g. the graphical window in `CartPole-v1`
        This method must be implemented, but it is OK to have an empty implementation if rendering is not
        important
        """
        pass

    def close(self):
        """
        Returns: None
        This method is optional. Used to clean up all resources (threads, graphical windows, etc.)
        """
        pass
    
    def seed(self, seed=None):
        """
        Returns: List of seeds
        This method is optional. Used to set seeds for the environment's random number generator for 
        obtaining deterministic behavior
        """
        return [seed]  # Placeholder: no random number generator is seeded yet
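
The bounds defined in `__init__()` can be sanity-checked without `gym`. The following is a minimal, dependency-free sketch using plain Python lists rather than `Box`. The helper names `build_obs_bounds` and `within_bounds` are hypothetical and not part of `gym` or the class above, and the order of the last four elements simply follows the order used in `obs_high`.

```python
def build_obs_bounds(lead_time=5, max_capacity=4000, max_mean_daily_demand=200,
                     max_unit_selling_price=100, max_daily_holding_cost_per_unit=5):
    """Return (low, high) lists with lead_time + 4 elements each."""
    high = [max_capacity] * lead_time + [
        max_mean_daily_demand,
        max_unit_selling_price,
        max_unit_selling_price,  # same bound reused, as in the class above
        max_daily_holding_cost_per_unit,
    ]
    low = [0] * len(high)
    return low, high


def within_bounds(obs, low, high):
    """True if obs has the right length and every element lies in [low, high]."""
    return len(obs) == len(low) and all(l <= x <= h for x, l, h in zip(obs, low, high))


low, high = build_obs_bounds()
print(len(high))  # 9 = lead_time + 4
print(within_bounds([100, 0, 0, 50, 0, 120, 30, 20, 1], low, high))   # True
print(within_bounds([5000, 0, 0, 0, 0, 120, 30, 20, 1], low, high))   # False
```

Changing `lead_time` changes the length of both lists, which is why the lead time controls the shape of the observation space.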

| Observation/Action | Sample | `gym` `Space` |
| --- | --- | --- |
| (Array of) floating point numbers | `[1.2, 3.4, 4.1, 0.9]` | `Box` |
| Integer | `1` | `Discrete` |
| Array of integers | `[2, 1]` | `MultiDiscrete` |

- [Documentation of all `gym` `Space`s](https://www.gymlibrary.dev/api/spaces/)
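
The table above can be read as a simple decision rule. As a rough illustration only, here is a sketch of that rule in plain Python. The function `suggest_space` is a hypothetical helper, not part of `gym`, and it returns the name of the space type as a string rather than constructing a space.

```python
def _is_int(x):
    # bool is a subclass of int in Python; the table does not cover booleans
    return isinstance(x, int) and not isinstance(x, bool)


def _is_number(x):
    return _is_int(x) or isinstance(x, float)


def suggest_space(sample):
    """Map a sample observation/action to a gym Space name, following the table."""
    if _is_int(sample):
        return "Discrete"
    if isinstance(sample, float):
        return "Box"
    if isinstance(sample, (list, tuple)) and sample:
        if all(_is_int(x) for x in sample):
            return "MultiDiscrete"
        if all(_is_number(x) for x in sample):
            return "Box"
    raise TypeError(f"no suggestion for {sample!r}")


print(suggest_space([1.2, 3.4, 4.1, 0.9]))  # Box
print(suggest_space(1))                     # Discrete
print(suggest_space([2, 1]))                # MultiDiscrete
```

This rule is only a starting point. The `InventoryEnv` class above deliberately uses `Box` for its action space instead of `Discrete`, because `Discrete` would lead to too many output nodes in the neural network.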