### Knowing the probability distribution of product demand can help determine the optimal order quantity.

The Newsvendor problem is a situation where there is uncertain demand for a perishable product.

Newspapers are an example. A newsvendor does not know how many newspapers they will sell on a given day, so they have to estimate the order quantity. If they order too few and demand is high, they lose potential income. If they order too many and demand is low, they lose the money spent on the unsold papers.

This problem can be optimized if we know the distribution of the demand *D*, the overage cost *Co* (the cost of ordering one newspaper more than the demand, *d+1*), and the underage cost *Cu* (the cost of ordering one newspaper fewer than the demand, *d-1*).

Suppose the CDF of the demand distribution is *F(.)* and *y* is the order quantity. 

The optimal order quantity is *y = F^-1 (Cu / (Cu + Co))*, where *Cu / (Cu + Co)* is the critical value.
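For readers who want to see where this formula comes from, here is a short sketch of the standard derivation. It assumes demand is continuous with CDF *F*, and it writes *G(y)* (a symbol not used elsewhere in this notebook) for the expected cost of ordering *y*.

```latex
\begin{aligned}
G(y) &= C_o \, \mathbb{E}\big[(y - D)^+\big] + C_u \, \mathbb{E}\big[(D - y)^+\big] \\
G'(y) &= C_o \, F(y) - C_u \, \big(1 - F(y)\big) \\
G'(y^*) = 0 \;&\Longrightarrow\; F(y^*) = \frac{C_u}{C_u + C_o}
\;\Longrightarrow\; y^* = F^{-1}\!\left(\frac{C_u}{C_u + C_o}\right)
\end{aligned}
```

Because *G* is convex in *y*, the point where the derivative vanishes is a minimum of the expected cost.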

We do not always know how the demand is distributed.

**But how can we estimate the distribution?**

In [None]:
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)
import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import norm

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

## Import Sales/Demand Data and Clean

The ``quantity`` column represents the demand for the products.

In [None]:
train_df = pd.read_csv("/kaggle/input/predict-demand/train.csv")
# train[6479:6490]
train = train_df.drop(train_df.index[6480:])  # after row 6479, all the values are NaN

In [None]:
train.tail()

In [None]:
test = pd.read_csv("/kaggle/input/predict-demand/test.csv")
test.tail()

In [None]:
train['date'] = pd.to_datetime(train['date']) # change date to datetime data type
train.info()

In [None]:
train.head()

In [None]:
pd.DataFrame(train.groupby('city').count()['id'])
# Here we see that these are cities in Greece.

In [None]:
pd.DataFrame(train.groupby('brand').count()['id'])
# Since there are multiple brands, we will just focus on the demand for one of them.

#### Let us consider the demand of only one product, Gazoza. Why did I choose that? Because Gazoza is a fun name.

In [None]:
gazoza = train[train['brand'] == 'gazoza'].reset_index(drop=True)

In [None]:
gazoza.head()

In [None]:
# q = gazoza.groupby(['date'], as_index=False).sum()
# sns.lineplot(x =q['date'], y=q['quantity'])

In [None]:
gazoza.describe()

In [None]:
# let's see how the demand of this product is distributed
sns.displot(gazoza['quantity'], kde = True, color = 'g')

In [None]:
sns.displot(data = gazoza, x = 'quantity', kde = True, col = 'container', color = 'g')

### Now let us try to fit a normal distribution to the demand of Gazoza in Greece
To do this, we find the sample average of the demand to approximate the mean.
We also find the sample standard deviation to approximate the true standard deviation.

In [None]:
# find the sample mean and sample standard deviation
mean = np.mean(gazoza['quantity'])
std = np.std(gazoza['quantity'])


In [None]:
# Plot the histogram.
plt.figure(figsize=(8, 6), dpi=80)
plt.hist(x = gazoza['quantity'], 
         density=True, alpha = 0.6, 
         color='g', bins = 25, linewidth=1, edgecolor='black')


# Plot the PDF.
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = norm.pdf(x, mean, std)
plt.plot(x, p, 'k', linewidth=2)
title = "Fit results: mu = %.2f,  std = %.2f" % (mean, std)
plt.title(title)

plt.show()

(The distribution is not really normal. We'll deal with that in a bit.)

Now that we have fit a normal distribution to the demand data, we can calculate the optimal order quantity.

In [None]:
def orderQuan(mean, std, Cu, Co):
    # let y be the optimal order quantity
    criticalVal = Cu/(Cu+Co) # Critical value for the optimal order quantity
    y = norm.ppf(criticalVal, mean, std) # Inverse normal CDF at the critical value gives the optimal order quantity
    return y

Now let's assume that the cost of underage Cu (not under-age!) is 0.30 euros and the cost of overage Co is 0.40 euros.

The optimal order quantity can be calculated as follows.

In [None]:
orderQuan(mean, std, 0.3, 0.4)
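As a sanity check on the formula, the following sketch compares the critical-value order quantity with a brute-force search over simulated demand. The demand parameters (normal with mean 1000 and standard deviation 200), the helper `expected_cost`, and all `_demo` names are assumptions made for illustration. They are not taken from the Gazoza data.

```python
import numpy as np
from scipy.stats import norm

def expected_cost(y, demand, Cu, Co):
    """Average newsvendor cost of ordering y against simulated demand draws."""
    overage = np.maximum(y - demand, 0.0)   # units ordered but not sold
    underage = np.maximum(demand - y, 0.0)  # units demanded but not available
    return np.mean(Co * overage + Cu * underage)

rng = np.random.default_rng(0)
mu_demo, sigma_demo = 1000.0, 200.0  # assumed demand parameters
Cu_demo, Co_demo = 0.3, 0.4
demand_draws = rng.normal(mu_demo, sigma_demo, size=100_000)

# Order quantity from the critical-value formula
y_star = norm.ppf(Cu_demo / (Cu_demo + Co_demo), mu_demo, sigma_demo)

# Brute-force search over a grid of candidate order quantities
candidates = np.linspace(mu_demo - 2 * sigma_demo, mu_demo + 2 * sigma_demo, 201)
costs = np.array([expected_cost(y, demand_draws, Cu_demo, Co_demo) for y in candidates])
y_grid_best = candidates[np.argmin(costs)]

print('Formula: %.1f, grid search: %.1f' % (y_star, y_grid_best))
```

The two values should agree up to the grid spacing and simulation noise, which suggests that the critical value does pick out the cost-minimizing order quantity.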

However, the histogram is skewed. This suggests that the true distribution of the demand is not a normal distribution. Hence we have to find other ways to estimate the distribution of the demand. After all, we need the distribution to solve for the optimal order quantity.

## Parametric Probability Density Distribution Estimation

Here, we can find the average and standard deviation of the sample data and use them to estimate the true distribution.

Let's assume that we know the true distribution of the demand to be a normal with mean of 1000 and standard deviation of 200.

How can we use different techniques to approximate the distribution of the demand?

Let us generate a sample of size 1000 from a distribution of our choice.

In [None]:
mu, sigma = 1000, 200
sample = np.random.normal(mu, sigma, size=1000)
plt.figure(figsize=(8, 6), dpi=80)
plt.hist(sample, bins = 25, edgecolor='black')

In [None]:
sample_mean = np.mean(sample)
sample_std = np.std(sample)
print('Mean = %.3f,  Standard Deviation = %.3f' % (sample_mean, sample_std))

In [None]:
# Plot the Histogram
plt.figure(figsize=(8, 6), dpi=80)
plt.hist(sample, bins = 25, edgecolor='black', alpha = 0.6, density = True) 

# Plot the PDF.
xmin2, xmax2 = plt.xlim()
x2 = np.linspace(xmin2, xmax2, 100)
p2 = norm.pdf(x2, sample_mean, sample_std)
plt.plot(x2, p2, 'k', linewidth=2)
title2 = "Fit results: mu = %.2f,  std = %.2f" % (sample_mean, sample_std)
plt.title(title2)

plt.show()

We could have also gotten the same result by utilizing the ``stats.norm.fit()`` function. More on that later.

In [None]:
# for Cu = 190 and Co = 274
orderQuan(sample_mean, sample_std, Cu=190, Co=274)

This is a straightforward process when the distribution is normal. 

When the distribution is not normal, it may not be as easy to compute the parameters of the distribution. But we can have the computer do the computation.

### In the following code, we will try to fit multiple distributions and find the one with the smallest sum of squared errors (SSE).

The ``stats.<distribution>.fit()`` function returns the MLEs for the shape, location, and scale parameters from the data. MLE stands for Maximum Likelihood Estimate. Starting estimates for the fit can be given as input arguments.

For the normal distribution, the same results could have been achieved by using ``stats.norm.fit()``. We can do this for multiple distributions and see which one fits best.
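To make the return value of ``fit()`` concrete, here is a small sketch on synthetic data. The samples and the `demo_` names are assumptions for illustration only. For the normal distribution the MLEs are the sample mean and the standard deviation computed with `ddof=0`. Distributions with shape parameters return them first, followed by location and scale.

```python
import numpy as np
import scipy.stats as st

rng = np.random.default_rng(1)
demo_normal = rng.normal(1000, 200, size=1000)        # assumed synthetic sample
demo_gamma = rng.gamma(shape=2.0, scale=500.0, size=1000)

# Normal: fit() returns (loc, scale), i.e. the MLEs of the mean and standard deviation
loc_hat, scale_hat = st.norm.fit(demo_normal)
print('norm.fit: loc = %.3f, scale = %.3f' % (loc_hat, scale_hat))
print('sample:   mean = %.3f, std(ddof=0) = %.3f'
      % (demo_normal.mean(), demo_normal.std(ddof=0)))

# Gamma: fit() returns (shape, loc, scale); shape parameters always come first
gamma_params = st.gamma.fit(demo_gamma)
print('gamma.fit returns %d parameters: %s' % (len(gamma_params), (gamma_params,)))
```

This ordering is why the fitting code in this notebook slices the parameter tuple as `params[:-2]`, `params[-2]`, and `params[-1]`.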


[I also want to point out that there is a risk of data snooping with this method.]

In [None]:
%matplotlib inline
import warnings
import matplotlib
import scipy.stats as st
import statsmodels as sm

In [None]:
matplotlib.rcParams['figure.figsize'] = (16.0, 12.0)
matplotlib.style.use('ggplot')

# Create models from data
def best_fit_distribution(data, bins=200, ax=None):
    """Model data by finding best fit distribution to data"""
    # Get histogram of original data
    y, x = np.histogram(data, bins=bins, density=True)
    x = (x + np.roll(x, -1))[:-1] / 2.0

    # Distributions to check
    DISTRIBUTIONS = [        
        st.alpha,st.beta,st.betaprime,st.chi,st.chi2,st.cosine,st.dgamma,st.dweibull,st.erlang,
        st.expon,st.exponnorm,st.exponweib,st.exponpow,st.f,st.genlogistic,st.genpareto,st.gennorm,
        st.genexpon,st.genextreme,st.gausshyper,st.gamma,st.gengamma,st.invgamma,st.invgauss,
        st.invweibull,st.johnsonsb,st.johnsonsu,st.laplace,st.logistic,st.loggamma,st.loglaplace,
        st.lognorm,st.lomax,st.maxwell,st.nakagami,st.norm,st.pareto,st.pearson3,st.powerlaw,
        st.powerlognorm,st.reciprocal,st.triang,st.tukeylambda,st.uniform,st.weibull_min,st.weibull_max
    ]

    # Best holders
    best_distribution = st.norm
    best_params = (0.0, 1.0)
    best_sse = np.inf

    # Estimate distribution parameters from data
    for distribution in DISTRIBUTIONS:

        # Try to fit the distribution
        try:
            # Ignore warnings from data that can't be fit
            with warnings.catch_warnings():
                warnings.filterwarnings('ignore')

                # fit dist to data
                params = distribution.fit(data)

                # Separate parts of parameters
                arg = params[:-2]
                loc = params[-2]
                scale = params[-1]

                # Calculate fitted PDF and error with fit in distribution
                pdf = distribution.pdf(x, loc=loc, scale=scale, *arg)
                sse = np.sum(np.power(y - pdf, 2.0))

                # if an axis was passed in, add the fitted PDF to the plot
                try:
                    if ax:
                        pd.Series(pdf, x).plot(ax=ax)
                except Exception:
                    pass

                # identify if this distribution is better
                if best_sse > sse > 0:
                    best_distribution = distribution
                    best_params = params
                    best_sse = sse

        except Exception:
            pass

    return (best_distribution.name, best_params)


In [None]:
# Use the Gazoza demand (quantity) as the data to fit
data = pd.Series(gazoza['quantity'])

# Find best fit distribution
best_fit_name, best_fit_params = best_fit_distribution(data, 200)
print("The best distribution is: " + best_fit_name)
print("The parameters are: ")
print(best_fit_params)
best_dist = getattr(st, best_fit_name)

In [None]:
# Plot the Histogram
plt.figure(figsize=(8, 6), dpi=80)
plt.hist(data, bins = 25, edgecolor='black', alpha = 0.6, density = True) 

# Plot the PDF.
xmin5, xmax5 = plt.xlim()
x5 = np.linspace(xmin5, xmax5, 100)
paramSample = st.johnsonsu.fit(data)
p5 = st.johnsonsu.pdf(x5, paramSample[0], paramSample[1], paramSample[2], paramSample[3])
plt.plot(x5, p5, 'k', linewidth=2)
title5 = ""
plt.title(title5)

plt.show()

**Now that we know the parameters of the distribution, we can easily compute the order quantity using the Newsvendor model formula**

In [None]:
Cu=0.3
Co=0.4
criticalVal = Cu/(Cu+Co) # Critical Value for the optimal order quantity
st.johnsonsu.ppf(criticalVal, paramSample[0], paramSample[1], paramSample[2], paramSample[3]) 
# Calculation for optimal order quantity

For the same values of Cu and Co, using the normal distribution, we got an order quantity of 40431.270, which is 3366.1535 more than what we got using the Johnson SU distribution.

#### Let's see how the distribution fits for the demand from the test set (Demand for 2018).

In [None]:
gazozaTest = test[test['brand'] == 'gazoza'].reset_index(drop=True)

In [None]:
# Plot the Histogram
plt.figure(figsize=(8, 6), dpi=80)
plt.hist(gazozaTest['quantity'], bins = 25, edgecolor='black', alpha = 0.6, density = True) 
plt.hist(data, bins = 25, edgecolor='black', alpha = 0.2, density = True) 

# Plot the PDF.
xmin6, xmax6 = plt.xlim()
x6 = np.linspace(xmin6, xmax6, 100)
paramSample = st.johnsonsu.fit(data)
p6 = st.johnsonsu.pdf(x6, paramSample[0], paramSample[1], paramSample[2], paramSample[3])
plt.plot(x6, p6, 'k', linewidth=2)
title6 = ""
plt.title(title6)

plt.show()

#### And let's see how it fits for other years.

In [None]:
# Plot the Histogram
plt.figure(figsize=(8, 6), dpi=80)
plt.hist(gazoza[pd.DatetimeIndex(gazoza['date']).year ==2014]['quantity'], 
         bins = 25, edgecolor='black', alpha = 0.6, density = True) 
plt.hist(data, bins = 25, edgecolor='black', alpha = 0.2, density = True) 

# Plot the PDF.
xmin6, xmax6 = plt.xlim()
x6 = np.linspace(xmin6, xmax6, 100)
paramSample = st.johnsonsu.fit(data)
p6 = st.johnsonsu.pdf(x6, paramSample[0], paramSample[1], paramSample[2], paramSample[3])
plt.plot(x6, p6, 'k', linewidth=2)
title6 = ""
plt.title(title6)

plt.show()

In [None]:
def sse_johnsonsu(sse_data,train_data, bins =200):
    y, x = np.histogram(sse_data, bins=bins, density=True)
    x = (x + np.roll(x, -1))[:-1] / 2.0
    # Ignore warnings from data that can't be fit
    with warnings.catch_warnings():
        warnings.filterwarnings('ignore')
        # fit dist to data
        params = st.johnsonsu.fit(train_data)
        # Separate parts of parameters
        arg = params[:-2]
        loc = params[-2]
        scale = params[-1]
        # Calculate fitted PDF and error with fit in distribution
        pdf = st.johnsonsu.pdf(x, loc=loc, scale=scale, *arg)
        sse = np.sum(np.power(y - pdf, 2.0))
    return sse

print("SSE for train data is: %.9f" % (sse_johnsonsu(data, data)))
print("SSE for test data is: %.9f" % (sse_johnsonsu(gazozaTest['quantity'], data)))
print("SSE for 2014 data is: %.9f" % (sse_johnsonsu(gazoza[pd.DatetimeIndex(gazoza['date']).year ==2014]['quantity'], data)))
print("SSE for 2015 data is: %.9f" % (sse_johnsonsu(gazoza[pd.DatetimeIndex(gazoza['date']).year ==2015]['quantity'], data)))
print("SSE for 2016 data is: %.9f" % (sse_johnsonsu(gazoza[pd.DatetimeIndex(gazoza['date']).year ==2016]['quantity'], data)))
print("SSE for 2017 data is: %.9f" % (sse_johnsonsu(gazoza[pd.DatetimeIndex(gazoza['date']).year ==2017]['quantity'], data)))

## Non-Parametric Density Estimation
1. Using Kernel Density Estimation

read more: https://machinelearningmastery.com/probability-density-estimation/

In [None]:
from sklearn.neighbors import KernelDensity

In [None]:
# the library expects the data to be 2D, so let's reshape it
sampleArray = sample.reshape((len(sample), 1))

In [None]:
# We can try multiple bandwidth values and kernels. I started with bandwidth=2, kernel='gaussian'
# and kept increasing the bandwidth up to the value used here.
model = KernelDensity(bandwidth=50, kernel='gaussian')
# It is important to note that using a low bandwidth can result in overfitting.

model.fit(sampleArray)
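Instead of increasing the bandwidth by hand, one option is to let cross-validation choose it. The sketch below is not what this notebook did. It runs scikit-learn's `GridSearchCV` on an assumed synthetic sample, relying on `KernelDensity.score`, which returns the total log-likelihood of held-out data. The grid of candidate bandwidths is an arbitrary choice.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(2)
# Assumed synthetic demand sample, reshaped to 2D as the library expects
demo_sample = rng.normal(1000, 200, size=500).reshape(-1, 1)

# Candidate bandwidths to compare (hypothetical grid)
bandwidth_grid = {'bandwidth': np.linspace(10, 150, 15)}

# With no explicit scoring, GridSearchCV uses KernelDensity.score,
# i.e. the log-likelihood of each held-out fold
search = GridSearchCV(KernelDensity(kernel='gaussian'), bandwidth_grid, cv=5)
search.fit(demo_sample)

best_bandwidth = search.best_params_['bandwidth']
print('Cross-validated bandwidth: %.1f' % best_bandwidth)
```

A very small bandwidth scores poorly on held-out folds because the density spikes around the training points, which is the overfitting mentioned in the comment above.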

In [None]:
# Plot the Histogram
plt.figure(figsize=(8, 6), dpi=80)
plt.hist(sampleArray, bins = 25, edgecolor='black', alpha = 0.6, density = True) 

# Plot the PDF.
# values = np.asarray([value for value in range(200, 1600)])
# values = values.reshape((len(values), 1))
xmin3, xmax3 = plt.xlim()
x3 = np.linspace(xmin3, xmax3, 100)
x3 = x3.reshape((len(x3), 1))
pKernel = model.score_samples(x3)
pKernel = np.exp(pKernel)
title3 = "Fit results using Kernel Density Estimators"
plt.plot(x3[:], pKernel)
plt.title(title3)

plt.show()

In [None]:
model.get_params(deep=True)

Now we can generate a sample from the kernel density model that we have fitted.

In [None]:
kernel_sample = model.sample(n_samples=1000, random_state=None)

In [None]:
plt.hist(kernel_sample, bins = 25, edgecolor='black', alpha = 0.6, density = False, color = 'c') 

In [None]:
print('Kernel Sample Means = %.2f, and Standard Deviation = %.2f'% (np.mean(kernel_sample), np.std(kernel_sample)))

We can see that the non-parametric method also yielded a good estimate of the true mean and true standard deviation.

Let's apply this non-parametric approach to the gazoza demand data.

In [None]:
model2 = KernelDensity(bandwidth=1000, kernel='gaussian')

gazozaArray = gazoza.loc[:,'quantity'].values
gazozaArray = gazozaArray.reshape((len(gazozaArray),1))

model2.fit(gazozaArray)

In [None]:
# Plot the Histogram
plt.figure(figsize=(8, 6), dpi=80)
plt.hist(gazozaArray, bins = 25, edgecolor='black', alpha = 0.6, density = True, color = 'g') 

# Plot the PDF.
# values = np.asarray([value for value in range(200, 1600)])
# values = values.reshape((len(values), 1))
xmin4, xmax4 = plt.xlim()
x4 = np.linspace(xmin4, xmax4, 100)
x4 = x4.reshape((len(x4), 1))
pGazoza = model2.score_samples(x4)
pGazoza = np.exp(pGazoza)
title4 = "Fit results using Kernel Density Estimators for Gazoza Data"
plt.plot(x4[:], pGazoza)
plt.title(title4)

plt.show()

In [None]:
# Plot the Histogram
plt.figure(figsize=(8, 6), dpi=80)
plt.hist(gazozaTest['quantity'], bins = 25, edgecolor='black', alpha = 0.5, density = True, color = 'm') 
plt.hist(gazozaArray, bins = 25, edgecolor='black', alpha = 0.2, density = True, color = 'g') 

# Plot the PDF.
# values = np.asarray([value for value in range(200, 1600)])
# values = values.reshape((len(values), 1))
xmin4, xmax4 = plt.xlim()
x4 = np.linspace(xmin4, xmax4, 100)
x4 = x4.reshape((len(x4), 1))
pGazoza = model2.score_samples(x4)
pGazoza = np.exp(pGazoza)
title4 = "Fit results using Kernel Density Estimators for Gazoza Data"
plt.plot(x4[:], pGazoza)
plt.title(title4)

plt.show()

#### But how can we use this non-parametric approach in the Newsvendor model?

- The critical value *Cu / (Cu + Co)* is equivalent to the area under the curve (AUC) of the probability density function to the left of the optimal order quantity. Thus, once we know the critical value, we can find the order quantity whose AUC equals that value.
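One possible way to carry this out is sketched below. For a Gaussian kernel, the area to the left of *y* under the kernel density estimate is the average of normal CDFs centred on each observation, so the order quantity can be found with a root finder. The function `kde_order_quantity` and the synthetic demand are assumptions for illustration. The bandwidth argument is meant to match the one given to `KernelDensity`.

```python
import numpy as np
from scipy.optimize import brentq
from scipy.stats import norm

def kde_order_quantity(data, bandwidth, Cu, Co):
    """Order quantity whose Gaussian-KDE area to the left equals Cu / (Cu + Co)."""
    data = np.asarray(data, dtype=float).ravel()
    critical_val = Cu / (Cu + Co)

    def cdf_minus_target(y):
        # CDF of a Gaussian KDE: mean of normal CDFs centred on each data point
        return norm.cdf((y - data) / bandwidth).mean() - critical_val

    # Bracket the root well outside the data range, where the CDF is ~0 and ~1
    lower = data.min() - 10 * bandwidth
    upper = data.max() + 10 * bandwidth
    return brentq(cdf_minus_target, lower, upper)

rng = np.random.default_rng(3)
demo_demand = rng.normal(1000, 200, size=1000)  # assumed synthetic demand

y_kde = kde_order_quantity(demo_demand, bandwidth=50, Cu=0.3, Co=0.4)
y_normal = norm.ppf(0.3 / (0.3 + 0.4), 1000, 200)
print('KDE order quantity: %.1f, normal-theory order quantity: %.1f' % (y_kde, y_normal))
```

On the Gazoza data the same function could be called as `kde_order_quantity(gazoza['quantity'], bandwidth=1000, Cu=0.3, Co=0.4)`, using the bandwidth chosen for `model2`.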