# Inventory management with penalty for unfulfilled demand

## Introduction

<img src="images/2.png" width="500"/>

If a customer comes but your inventory is empty, you have to send the customer away empty-handed (unfulfilled demand). It's a bad experience for the customer, and every time this happens, the customer is less likely to come back to your shop and more likely to go to a competitor. Basically, unfulfilled demand leads to a loss of customer goodwill, and that leads to loss of potential profits.

We do not account for this in the course lessons. But in the course exercises, you are tasked with solving the harder (and more realistic) problem that takes this phenomenon into account. That way you can get your hands dirty and practice what you are learning. Are you up for the challenge?


## Modeling goodwill loss due to unfulfilled demand

Goodwill loss is best modeled at the level of individual customers. But since our model works at the level of aggregated demand (and not individual customers), we will take the simple approach of penalizing the agent each time demand is unfulfilled. This means adding the following term to the reward function.

$$-k \max(0, d - I)$$

Here, $d$ is the realized demand for the day (sampled from a Poisson distribution), $I$ is the on-hand inventory, and $k$ is the penalty per unit of unfulfilled demand. We will call $k$ `goodwill_penalty_per_unit` in our code.
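To make the penalty term concrete, here is a minimal sketch of how it could be computed for a single day. The function name `goodwill_penalty` and its argument names are my own choices for illustration; they are not part of the course code.

```python
def goodwill_penalty(demand, on_hand_inventory, goodwill_penalty_per_unit):
    """Return the (non-positive) reward term -k * max(0, d - I).

    Hypothetical helper for illustration only.
    """
    # Units of demand that could not be served from stock
    unfulfilled_demand = max(0, demand - on_hand_inventory)
    return -goodwill_penalty_per_unit * unfulfilled_demand


# 12 units demanded, 10 in stock, k = 2.5: two units go unfulfilled
shortfall_penalty = goodwill_penalty(12, 10, 2.5)

# 5 units demanded, 10 in stock: all demand is served, so no penalty
no_penalty = goodwill_penalty(5, 10, 2.5)
```

The term is zero whenever the inventory covers the demand, so it only ever lowers the reward on days with a stock-out.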

## State

We want to solve the problem for a defined range of the `goodwill_penalty_per_unit` parameter. 

$$ 0 \le \mathrm{goodwill\_penalty\_per\_unit} \le 10 $$

In each simulation, we will use a different value of this parameter, chosen at random from this range. Since the value changes between simulations, `goodwill_penalty_per_unit` needs to be part of our state. The modified state looks as follows.
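As a rough sketch of what "choosing randomly" could look like, the snippet below draws one value per simulation. The text does not say which distribution is used, so the uniform distribution here is an assumption, and the names `MAX_GOODWILL_PENALTY_PER_UNIT` and `sample_goodwill_penalty_per_unit` are hypothetical.

```python
import numpy as np

# Upper end of the range we want to cover
MAX_GOODWILL_PENALTY_PER_UNIT = 10


def sample_goodwill_penalty_per_unit(rng):
    """Draw one penalty value for a new simulation.

    Assumption: values are drawn uniformly from the range [0, 10].
    """
    return rng.uniform(0, MAX_GOODWILL_PENALTY_PER_UNIT)


rng = np.random.default_rng(seed=0)

# One draw per simulation, e.g. at the start of each episode
sampled_penalties = [sample_goodwill_penalty_per_unit(rng) for _ in range(5)]
```

Because the agent sees a different penalty in each simulation, it can only act well if the current value is visible in the state.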


$$
\begin{aligned} 
    s_t & = [ \mathrm{on\_hand\_inventory}_{t}, a_{t-4}, a_{t-3}, a_{t-2}, a_{t-1}, \lambda, \\
        & \phantom{ = [ } \mathrm{unit\_selling\_price}, \mathrm{unit\_buying\_price}, \\
        & \phantom{ = [ } \mathrm{daily\_unit\_holding\_price}, \mathrm{goodwill\_penalty\_per\_unit}]
\end{aligned}
$$

Here we have added `goodwill_penalty_per_unit` at the end of the array.
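The ordering of the state entries can be sketched as a small helper that assembles the array. This is only an illustration of the layout in the equation above: `build_state` and its argument names (including `pending_orders` for the four past actions and `mean_daily_demand` for $\lambda$) are hypothetical, and the numbers are made up.

```python
import numpy as np


def build_state(on_hand_inventory, pending_orders, mean_daily_demand,
                unit_selling_price, unit_buying_price,
                daily_unit_holding_price, goodwill_penalty_per_unit):
    """Assemble the state array in the order given by the state definition.

    Hypothetical helper; pending_orders holds [a_{t-4}, a_{t-3}, a_{t-2}, a_{t-1}].
    """
    return np.array(
        [on_hand_inventory, *pending_orders, mean_daily_demand,
         unit_selling_price, unit_buying_price,
         daily_unit_holding_price, goodwill_penalty_per_unit],
        dtype=np.float32,
    )


# Made-up example values, chosen only to show the layout
state = build_state(
    on_hand_inventory=120,
    pending_orders=[0, 50, 0, 30],
    mean_daily_demand=40,
    unit_selling_price=80,
    unit_buying_price=60,
    daily_unit_holding_price=2,
    goodwill_penalty_per_unit=2.5,
)
```

With the new entry, the array has ten elements, and `goodwill_penalty_per_unit` is the last one.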

The maximum value of `goodwill_penalty_per_unit` that we will consider is `10`. We need this value to define the `high` bound of the observation space.

## Your task

In the following code block, I have defined the `InventoryEnvHard` class for simulating the inventory management problem with a goodwill penalty for unfulfilled demand. In the `__init__()` method, I have copied the code from the video lesson. This code corresponds to `InventoryEnv`, i.e. the environment without the goodwill penalty. Edit the `__init__()` method to account for the state definition in the harder problem.

In [None]:
import gym
from gym.spaces import Box
import numpy as np


class InventoryEnvHard(gym.Env):
    # Edit the __init__() method to match the state definition in the harder problem
    def __init__(self):
        """
        Must define self.observation_space and self.action_space here
        """

        self.max_capacity = 4000
        
        self.action_space = Box(low=np.array([0]), high=np.array([self.max_capacity]))
        
        self.lead_time = 5
        self.obs_dim = self.lead_time + 4
        
        self.max_mean_daily_demand = 200
        self.max_unit_selling_price = 100
        self.max_daily_holding_cost_per_unit = 5
        
        obs_low = np.zeros((self.obs_dim,))
        obs_high = np.array([self.max_capacity for _ in range(self.lead_time)] +
                            [self.max_mean_daily_demand, self.max_unit_selling_price,
                             self.max_unit_selling_price, self.max_daily_holding_cost_per_unit
                             ]
                            )
        self.observation_space = Box(low=obs_low, high=obs_high)

    def reset(self):
        """
        Returns: the observation of the initial state
        Reset the environment to initial state so that a new episode (independent of previous ones) may start
        """
        raise NotImplementedError

    def step(self, action):
        """
        Returns: the next observation, the reward, done and optionally additional info
        """
        raise NotImplementedError

    def render(self, mode="human"):
        """
        Returns: None
        Show the current environment state e.g. the graphical window in `CartPole-v1`
        This method must be implemented, but it is OK to have an empty implementation if rendering is not
        important
        """
        pass

    def close(self):
        """
        Returns: None
        This method is optional. Used to clean up all resources (threads, graphical windows, etc.).
        """
        pass
    
    def seed(self, seed=None):
        """
        Returns: List of seeds
        This method is optional. Used to set seeds for the environment's random number generator for 
        obtaining deterministic behavior
        """
        return [seed]