# Implement the `reset()` method of `InventoryEnvHard` (with goodwill penalty)

The `reset()` method defines the initial state of an episode. Since `goodwill_penalty_per_unit` is now part of the state, `reset()` must also pick a value for it.

In each episode of `InventoryEnvHard`, we want to randomly pick a different value for `goodwill_penalty_per_unit` within the allowed range, that is, between 0 and `max_goodwill_penalty_per_unit`. We can draw this value from the **uniform distribution**.
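As a reminder of how uniform sampling works with NumPy's random generator, here is a minimal sketch. The names `max_value`, `sample_scaled` and `sample_direct` are placeholders chosen for this illustration and are not part of the exercise code. Both options draw a value between 0 and the upper bound.

```python
from numpy.random import default_rng

rng = default_rng()

# Hypothetical upper bound, used only for this illustration
max_value = 10

# Option 1: draw from [0, 1) and scale by the upper bound
sample_scaled = rng.uniform() * max_value

# Option 2: pass the bounds directly to uniform()
sample_direct = rng.uniform(low=0, high=max_value)

print(sample_scaled, sample_direct)
```

The `reset()` code from the video lesson uses the first style, so following it keeps the method consistent.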

The following code block already includes the correct `__init__()` method from the last exercise. Your job is to define the `reset()` method.

To get you started, I have copied the `reset()` code from the video lesson into the `reset()` method of the `InventoryEnvHard` class. Edit it so that it returns the random initial state for the harder problem.
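Once you have edited `reset()`, one way to sanity-check it is to confirm that the returned observation has `lead_time + 5` entries and stays within the bounds set in `__init__()`. The sketch below uses only NumPy. The helper `is_valid_observation` is a hypothetical name introduced here, and the bounds are copied from the `__init__()` method of this exercise.

```python
import numpy as np

# Bounds copied from __init__() in this exercise
lead_time = 5
obs_dim = lead_time + 5
obs_low = np.zeros((obs_dim,))
obs_high = np.array([4000] * lead_time + [200, 100, 100, 5, 10])


def is_valid_observation(obs, low, high):
    """Hypothetical helper: True if obs matches the bounds' shape and lies within them."""
    obs = np.asarray(obs, dtype=float)
    # The shape test runs first, so mismatched lengths never reach the comparison
    return obs.shape == low.shape and bool(np.all((obs >= low) & (obs <= high)))


# A 9-element observation (one entry short) fails the check
print(is_valid_observation(np.zeros(9), obs_low, obs_high))   # False
# A 10-element observation within the bounds passes
print(is_valid_observation(np.zeros(10), obs_low, obs_high))  # True
```

With your own environment instance `env`, the same check could be run as `is_valid_observation(env.reset(), env.observation_space.low, env.observation_space.high)`.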

In [None]:
import gym
from gym.spaces import Box
import numpy as np
from numpy.random import default_rng


class InventoryEnvHard(gym.Env):
    def __init__(self):
        """
        Must define self.observation_space and self.action_space here
        """
        self.max_capacity = 4000

        self.action_space = Box(low=np.array([0]), high=np.array([self.max_capacity]))

        self.lead_time = 5
        self.obs_dim = self.lead_time + 5

        self.max_mean_daily_demand = 200
        self.max_unit_selling_price = 100
        self.max_daily_holding_cost_per_unit = 5
        self.max_goodwill_penalty_per_unit = 10

        obs_low = np.zeros((self.obs_dim,))
        obs_high = np.array([self.max_capacity for _ in range(self.lead_time)] +
                            [self.max_mean_daily_demand, self.max_unit_selling_price,
                             self.max_unit_selling_price, self.max_daily_holding_cost_per_unit,
                             self.max_goodwill_penalty_per_unit
                             ]
                            )
        self.observation_space = Box(low=obs_low, high=obs_high)
        
        self.rng = default_rng()
        self.current_obs = None
        
    def reset(self):
        """
        Returns: the observation of the initial state
        Reset the environment to initial state so that a new episode (independent of previous ones) may start
        """
        mean_daily_demand = self.rng.uniform() * self.max_mean_daily_demand
        selling_price = self.rng.uniform() * self.max_unit_selling_price
        buying_price = self.rng.uniform() * selling_price
        daily_holding_cost_per_unit = self.rng.uniform() * min(buying_price,
                                                               self.max_daily_holding_cost_per_unit
                                                               )
        
        self.current_obs = np.array([0 for _ in range(self.lead_time)] +
                                    [mean_daily_demand, selling_price, buying_price,
                                     daily_holding_cost_per_unit,
                                     ]
                                    )
        return self.current_obs

    def step(self, action):
        """
        Returns: the next observation, the reward, done and optionally additional info
        """
        raise NotImplementedError

    def render(self, mode="human"):
        """
        Returns: None
        Show the current environment state e.g. the graphical window in `CartPole-v1`
        This method must be implemented, but it is OK to have an empty implementation if rendering is not
        important
        """
        pass

    def close(self):
        """
        Returns: None
        This method is optional. Used to clean up all resources (threads, graphical windows, etc.)
        """
        pass
    
    def seed(self, seed=None):
        """
        Returns: List of seeds
        This method is optional. Used to set seeds for the environment's random number generator for 
        obtaining deterministic behavior
        """
        return