# Problem

Suppose Candy Company XYZ produces lollipops. The company claims that 30% of the lollipops are cherry, 60% are grape, and 10% are lime.


    
Given a random sampling of 100 lollipops with 50 cherry, 45 grape, and 5 lime, determine whether or not this is consistent with the company's claim. Use a 0.05 level of significance.

In [1]:
import numpy as np
import scipy.stats as stats
import scipy

### Step 1: Stating the Hypotheses:

**Null hypothesis:** The proportions of cherry, grape, and lime are 30%, 60%, and 10%, respectively. <br>
**Alternative hypothesis:** At least one of these proportions differs from its claimed value.

### Step 2: Calculate the Degrees of Freedom

Degrees of freedom = k - 1 <br>
There are three categories (cherry, grape, and lime), so k = 3 <br>
Therefore, the degrees of freedom = 3 - 1 = 2

### Step 3: Calculate the expected values

The expected value for each flavor is its claimed proportion multiplied by the sample size of 100 lollipops.

In [2]:
ex_cherry = 100 * 0.30
ex_grape = 100 * 0.60
ex_lime = 100 * 0.10

In [3]:
# recording the observed values
ob_cherry = 50
ob_grape = 45
ob_lime = 5
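The expected counts can also be derived from the sample size and the claimed proportions in one step. This is a minimal sketch; the names `claimed_proportions`, `observed_counts`, `sample_size`, and `expected_counts` are illustrative assumptions and are not used elsewhere in this notebook.

```python
import numpy as np

# Claimed proportions for cherry, grape, and lime (from the problem statement)
claimed_proportions = np.array([0.30, 0.60, 0.10])

# Observed counts in the random sample (cherry, grape, lime)
observed_counts = np.array([50, 45, 5])

# Expected count per flavor = sample size * claimed proportion
sample_size = observed_counts.sum()
expected_counts = sample_size * claimed_proportions

print(expected_counts)
```

Deriving the expected counts from the observed total keeps the two sets of counts on the same scale, which the chi-square comparison relies on.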

### Step 4: Calculate chi-square test statistic

The chi-squared test statistic is calculated as follows: 

$\chi^2 = \sum \frac{(\text{Observed} - \text{Expected})^2}{\text{Expected}}$

where $\chi^2$ is the chi square value

In [4]:
# sum of (observed - expected)^2 / expected over the three flavors
chi = (
    (ob_cherry - ex_cherry) ** 2 / ex_cherry
    + (ob_grape - ex_grape) ** 2 / ex_grape
    + (ob_lime - ex_lime) ** 2 / ex_lime
)

In [5]:
chi

19.583333333333336
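As a cross-check, the same sum can be written in vectorized form with NumPy. This is only a sketch; the names `observed`, `expected`, and `chi_vectorized` are assumptions introduced for illustration.

```python
import numpy as np

# Observed and expected counts for cherry, grape, and lime
observed = np.array([50, 45, 5], dtype=float)
expected = np.array([30, 60, 10], dtype=float)

# Element-wise (observed - expected)^2 / expected, summed over all flavors
chi_vectorized = np.sum((observed - expected) ** 2 / expected)

print(chi_vectorized)
```

The per-flavor contributions are 400/30, 225/60, and 25/10, so most of the statistic comes from the excess of cherry lollipops.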

Here, we calculated the chi-square value manually, but on its own it does not tell us much. We need to look up its corresponding p-value in a chi-square table with 2 degrees of freedom. This gives a p-value of less than 0.0001, well below the 0.05 significance level. Alternatively, we can use a statistical package to do the work for us in the following step.
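The table lookup can also be reproduced in code. The sketch below assumes the chi-square distribution object `scipy.stats.chi2`, using its survival function `sf` for the p-value and its percent-point function `ppf` for the critical value; the variable names are illustrative only.

```python
from scipy import stats

chi_stat = 19.583333333333336  # test statistic computed in Step 4
dof = 2                        # degrees of freedom from Step 2
alpha = 0.05                   # significance level from the problem statement

# p-value: probability of a statistic at least this large under the null hypothesis
p_from_stat = stats.chi2.sf(chi_stat, dof)

# Critical value: the statistic must exceed this to reject at level alpha
critical_value = stats.chi2.ppf(1 - alpha, dof)

print('p-value: %1.7f' % p_from_stat)
print('critical value: %1.3f' % critical_value)
```

Comparing the statistic with the critical value and comparing the p-value with the significance level are equivalent ways of reaching the same decision.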

### Step 5: Calculate p value

In [6]:
observed_values = np.array([ob_cherry, ob_grape, ob_lime])
n = observed_values.sum()

expected_values = np.array([ex_cherry, ex_grape, ex_lime])

chi_square_stat, p_value = stats.chisquare(observed_values, f_exp=expected_values)

print('At 5 %s level of significance, the p-value is %1.7f' %('%', p_value))

At 5 % level of significance, the p-value is 0.0000559


### Step 6: Decide whether to reject the null hypothesis

The p-value is 0.0000559, which is far smaller than 0.05, so we reject the null hypothesis. The sample is not consistent with the company's claim: if the claimed proportions were true, a sample that deviates this much from the expected counts would be very unlikely to occur by chance alone.
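The decision rule can be written out explicitly. This is a minimal sketch that recomputes the p-value so it runs on its own; the names `alpha` and `decision` are assumptions introduced for illustration.

```python
import numpy as np
from scipy import stats

alpha = 0.05  # significance level from the problem statement

observed_values = np.array([50, 45, 5])
expected_values = np.array([30.0, 60.0, 10.0])

# Chi-square goodness-of-fit test against the claimed distribution
chi_square_stat, p_value = stats.chisquare(observed_values, f_exp=expected_values)

# Reject the null hypothesis only when the p-value falls below alpha
if p_value < alpha:
    decision = 'reject the null hypothesis'
else:
    decision = 'fail to reject the null hypothesis'

print(decision)
```

When the p-value is at or above the significance level, the conclusion is "fail to reject" rather than "accept", because the test does not prove that the claimed proportions are correct.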