# Risk Management

Informed decisions are made based on data. However, we do not need all of the data to make a decision, because collecting it would take forever for even a single decision. This is why sampling is an essential part of making informed decisions.

![Risk Management in a Nutshell](https://www.invensislearning.com/blog/wp-content/uploads/2020/07/Common-Examples-of-Risk-Management-1068x552-1-1024x529.jpg)


If we talk about sampling, we have to touch on statistics real quick. I am not an expert in math or statistics, but this is my understanding of statistics based on [Investopedia](https://www.investopedia.com/terms/s/statistics.asp):

> Statistics is a branch of applied mathematics that involves the collection, description, analysis, and inference of conclusions from quantitative data.

Here is a definition of sampling {cite}`sampling`:

> Sampling is the selection of a subset (a statistical sample) of individuals from within a statistical population to estimate characteristics of the whole population.
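
To make the definition concrete, here is a minimal Python sketch of a simple random sample. The population is entirely hypothetical: 10,000 made-up customers, exactly 25% of whom prefer Cookie n Cream. The only point is to show a small sample being used to estimate a characteristic of the whole population.

```python
import random

# Hypothetical population: 10,000 customers, exactly 2,500 of whom prefer "Cookie n Cream".
# These numbers are invented purely to illustrate sampling.
population = ["Cookie n Cream"] * 2_500 + ["Other"] * 7_500

# Draw a simple random sample of 200 customers without replacement.
rng = random.Random(42)  # fixed seed so the sketch is reproducible
sample = rng.sample(population, k=200)

# Estimate the population proportion from the sample alone.
p_hat = sample.count("Cookie n Cream") / len(sample)
print(f"Sample estimate: {p_hat:.3f} (true proportion: 0.250)")
```

The estimate will usually be close to 0.25 but not exactly equal to it. That gap is the sampling error the next formula helps control.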

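To determine a sample size, I found that Cochran's formula {cite}`cochran_formula` can be used to calculate an ideal size from a few factors: the confidence level, the desired margin of error, and the estimated proportion of the population with the attribute of interest.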

```{math}
:label: sampling-formula
n_0 = \frac{Z^2pq}{e^2}
```

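> where:
> - n0 is the ideal sample size
> - Z is the z-value for the desired confidence level (e.g. about 1.96 for 95% confidence)
> - e is the desired level of precision (i.e. the margin of error)
> - p is the (estimated) proportion of the population which has the attribute in question
> - q is 1 - p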
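
As a rough illustration, the formula can be evaluated in Python. This sketch uses assumed inputs that are not from the original: a 95% confidence level, a 5% margin of error, and p = 0.5. The value p = 0.5 is the most conservative guess because it maximizes pq. Z, the z-value for the chosen confidence level, comes from the standard library's `statistics.NormalDist`. The function name `cochran_sample_size` is hypothetical.

```python
import math
from statistics import NormalDist


def cochran_sample_size(confidence: float, p: float, e: float) -> float:
    """Return Cochran's n0 = Z^2 * p * q / e^2 (assumes a large population)."""
    # Two-sided z-value for the chosen confidence level, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    q = 1 - p
    return z**2 * p * q / e**2


# Assumed inputs: 95% confidence, p = 0.5 (worst case), 5% margin of error.
n0 = cochran_sample_size(confidence=0.95, p=0.5, e=0.05)
print(math.ceil(n0))  # round up to a whole number of respondents; prints 385
```

Under these assumptions n0 is about 384.1, which rounds up to 385 respondents.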

Honestly, I will keep my example simple for the purpose of this project.

## Example: Earnest Ice Cream

![Earnest Ice Cream](https://earnestice.wpenginepowered.com/wp-content/uploads/2021/11/Home-Med-WereHiring-1.jpg)

For example, suppose I want to run an online survey to find out which Earnest Ice Cream flavors customers like most. This would help Earnest decide which flavors to produce more or less of, thereby reducing the risk of producing something people do not like. Earnest has 4 locations and a large number of customers, so there is no way I could collect everyone's opinion. My goal is to collect at least 200 responses.
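
Because 200 responses is a fixed target, the sample-size formula can also be run in reverse to see what margin of error 200 responses give. Solving $n_0 = \frac{Z^2pq}{e^2}$ for $e$ gives $e = Z\sqrt{pq/n}$. The sketch below assumes 95% confidence and p = 0.5. It also treats the survey as a simple random sample from a large population, which a voluntary online survey may not be. `margin_of_error` is a hypothetical helper name.

```python
import math
from statistics import NormalDist


def margin_of_error(n: int, confidence: float = 0.95, p: float = 0.5) -> float:
    """Solve n0 = Z^2 * p * q / e^2 for e, given a sample size n."""
    # Two-sided z-value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return z * math.sqrt(p * (1 - p) / n)


e = margin_of_error(200)
print(f"Margin of error with 200 responses: about {e:.1%}")  # about 6.9%
```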

Assuming I already have the results and want to build a quick DataFrame for exploration, I can use code like this:

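```python
import pandas as pd

# One list per flavor: [flavor name, dietary category, number of votes]
item1 = ['Cookie and Cream', 'Regular', 20]
item2 = ['Matcha', 'Vegan', 15]
# ...one list for each remaining flavor

column_names = ['Ice Cream Flavor', 'Dietary', 'Vote']

# Add the remaining items to the data list as well
table_1 = pd.DataFrame(data=[item1, item2], columns=column_names)
```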

To display the table, I can print it:

```python
print(table_1)
```

From the table, we can sort the results from the highest vote count to the lowest and calculate each flavor's share of the votes to answer our questions:

> - Which flavors should Earnest produce more of?
> - Which flavors should Earnest produce less of?

The formula for each flavor's percentage is simple enough that I don't need a reference to spell it out:
$$
P = \frac{x}{y} \times 100
$$

> where:
>
> $P$ is the percentage of votes for a flavor
>
> $x$ is the number of votes for that flavor (one observation)
>
> $y$ is the total number of votes (all observations)
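
Applied to the survey results used later in this section, the percentage formula can be computed for every flavor at once with pandas column arithmetic. For example, Cookie n Cream's 50 of 200 votes gives 50 / 200 × 100 = 25%. This is a minimal sketch, and the `Percent` column name is my own choice.

```python
import pandas as pd

# Survey results from the Earnest example in this section: [flavor, dietary, votes]
rows = [
    ['Whiskey Hazelnut', 'Regular', 5], ['Cookie n Cream', 'Regular', 50],
    ['Vegan Choco', 'Vegan', 20], ['Matcha', 'Regular', 15],
    ['Vegan Maple Walnut', 'Vegan', 10], ['Espresso Flake', 'Regular', 17],
    ['Oatmeal Brown Sugar', 'Regular', 3], ['Serious Chocolate', 'Regular', 45],
    ['London Fog', 'Regular', 24], ['Salted Caramel', 'Regular', 11],
]
df = pd.DataFrame(rows, columns=['Flavor', 'Dietary', 'Vote'])

# P = x / y * 100: each flavor's votes divided by the total votes, as a percentage
df['Percent'] = df['Vote'] / df['Vote'].sum() * 100
print(df.sort_values('Percent', ascending=False).to_string(index=False))
```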

```python
# Build the DataFrame
import pandas as pd

item1 = ['Whiskey Hazelnut', 'Regular', 5]
item2 = ['Cookie n Cream', 'Regular', 50]
item3 = ['Vegan Choco', 'Vegan', 20]
item4 = ['Matcha', 'Regular', 15]
item5 = ['Vegan Maple Walnut', 'Vegan', 10]
item6 = ['Espresso Flake', 'Regular', 17]
item7 = ['Oatmeal Brown Sugar', 'Regular', 3]
item8 = ['Serious Chocolate', 'Regular', 45]
item9 = ['London Fog', 'Regular', 24]
item10 = ['Salted Caramel', 'Regular', 11]
column_names = ['Flavor', 'Dietary', 'Vote']

earnest_df = pd.DataFrame(data=[item1, item2, item3, item4, item5, item6, item7, item8, item9, item10], columns=column_names)
earnest_df
```


| | Flavor | Dietary | Vote |
|---|---|---|---|
| 0 | Whiskey Hazelnut | Regular | 5 |
| 1 | Cookie n Cream | Regular | 50 |
| 2 | Vegan Choco | Vegan | 20 |
| 3 | Matcha | Regular | 15 |
| 4 | Vegan Maple Walnut | Vegan | 10 |
| 5 | Espresso Flake | Regular | 17 |
| 6 | Oatmeal Brown Sugar | Regular | 3 |
| 7 | Serious Chocolate | Regular | 45 |
| 8 | London Fog | Regular | 24 |
| 9 | Salted Caramel | Regular | 11 |


```python
# Calculate the total number of votes
earnest_total_vote = earnest_df[['Vote']].sum()
earnest_total_vote
```

```
Vote    200
dtype: int64
```

```python
# Sort the DataFrame from the highest vote to the lowest vote
earnest_df = earnest_df.sort_values('Vote', ascending=False)
earnest_df
```

| | Flavor | Dietary | Vote |
|---|---|---|---|
| 1 | Cookie n Cream | Regular | 50 |
| 7 | Serious Chocolate | Regular | 45 |
| 8 | London Fog | Regular | 24 |
| 2 | Vegan Choco | Vegan | 20 |
| 5 | Espresso Flake | Regular | 17 |
| 3 | Matcha | Regular | 15 |
| 9 | Salted Caramel | Regular | 11 |
| 4 | Vegan Maple Walnut | Vegan | 10 |
| 0 | Whiskey Hazelnut | Regular | 5 |
| 6 | Oatmeal Brown Sugar | Regular | 3 |


```python
# Show the three favorite ice cream flavors
earnest_df_top3 = earnest_df.head(3)
earnest_df_top3
```

| | Flavor | Dietary | Vote |
|---|---|---|---|
| 1 | Cookie n Cream | Regular | 50 |
| 7 | Serious Chocolate | Regular | 45 |
| 8 | London Fog | Regular | 24 |


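```python
# Show the three least favorite ice cream flavors.
# tail(3) takes the last three rows of the sorted DataFrame; slicing by
# index labels (e.g. .loc[4:6]) only works here by coincidence of the labels.
earnest_df_bottom3 = earnest_df.tail(3)
earnest_df_bottom3
```

| | Flavor | Dietary | Vote |
|---|---|---|---|
| 4 | Vegan Maple Walnut | Vegan | 10 |
| 0 | Whiskey Hazelnut | Regular | 5 |
| 6 | Oatmeal Brown Sugar | Regular | 3 |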
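
As an alternative sketch, pandas' `DataFrame.nlargest` and `DataFrame.nsmallest` pick the top and bottom rows by a column directly. The result therefore does not depend on the DataFrame having been sorted first or on its index labels. Note that `nsmallest` returns rows in ascending order of votes, so the least popular flavor comes first.

```python
import pandas as pd

# Same survey results as the Earnest example: [flavor, dietary, votes]
rows = [
    ['Whiskey Hazelnut', 'Regular', 5], ['Cookie n Cream', 'Regular', 50],
    ['Vegan Choco', 'Vegan', 20], ['Matcha', 'Regular', 15],
    ['Vegan Maple Walnut', 'Vegan', 10], ['Espresso Flake', 'Regular', 17],
    ['Oatmeal Brown Sugar', 'Regular', 3], ['Serious Chocolate', 'Regular', 45],
    ['London Fog', 'Regular', 24], ['Salted Caramel', 'Regular', 11],
]
df = pd.DataFrame(rows, columns=['Flavor', 'Dietary', 'Vote'])

# Select rows by vote count directly; no prior sort_values() call is needed
top3 = df.nlargest(3, 'Vote')       # most votes first
bottom3 = df.nsmallest(3, 'Vote')   # fewest votes first
print(top3)
print(bottom3)
```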


```python
# Jupyter notebook shell command: install the plotting library
!pip install altair vega_datasets
```



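```python
import altair as alt

# Bar chart of votes per flavor; sort='-y' orders the bars from most to fewest votes
earnest_chart = alt.Chart(earnest_df).mark_bar().encode(
    x=alt.X('Flavor', sort='-y'),
    y='Vote'
)
earnest_chart
```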

This is the end of the data sampling exploration example for the Earnest scenario.
The conclusion is that, based on these 200 responses, Earnest should consider producing more Cookie n Cream, Serious Chocolate, and London Fog, and may want to consider producing less Oatmeal Brown Sugar, Whiskey Hazelnut, and Vegan Maple Walnut.