## Question 1
The following table shows the number of 6-point scores per match in American rugby during the 1979 season.

![](table1.png)

Based on these results, we create a Poisson distribution whose parameter is the sample mean, λ = 2.435. At the 0.05 significance level, is there any reason to believe that the number of scores is a Poisson variable?

In [4]:
import numpy as np
import pandas as pd
import scipy.stats as st
from scipy.stats import poisson
from scipy.stats import binom

In [5]:
# your answer here
# our observation O:
O = [35,99,104,110,62,25,10,3]
alpha = 0.05
mu = 2.435
poisson_dist = poisson(mu)

# Poisson probabilities P(X = k) for k = 0, ..., 6
poisson_pmfs = np.array([poisson_dist.pmf(i) for i in range(0,7)])
# Probability of 7 or more scores: 1 minus the sum of the probabilities for k = 0, ..., 6
poisson_pmfs_tail = 1-sum(poisson_pmfs)
# Final probabilities: the seven pmf values followed by the tail probability
with_tail = np.append(poisson_pmfs, poisson_pmfs_tail)
# Expected counts: each probability times the total number of observations
E = with_tail * sum(O)

stats, p_value = st.chisquare(f_obs = O, f_exp  = E)
print(f"p_value is: {p_value}")
if p_value < alpha:
    print("We can reject the null hypothesis :  there is no reason to believe that at a .05 level the number of scores is a Poisson variable")
else: 
    print("We can not reject the null hypothesis: there is reason to believe that at a .05 level the number of scores is a Poisson variable")


p_value is: 0.483688906853727
We can not reject the null hypothesis: there is reason to believe that at a .05 level the number of scores is a Poisson variable
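
Because λ = 2.435 is the sample mean, it is estimated from the same data that are being tested. A common convention in that case is to remove one further degree of freedom for the estimated parameter. The sketch below reruns the test both ways using the `ddof` argument of `scipy.stats.chisquare`. The names `p_plain` and `p_adjusted` are illustrative and not part of the original solution.

```python
import numpy as np
from scipy.stats import chisquare, poisson

O = np.array([35, 99, 104, 110, 62, 25, 10, 3])
mu = 2.435  # sample mean given in the exercise

# Probabilities for 0..6 scores, plus one tail bin for 7 or more
pmfs = poisson(mu).pmf(np.arange(7))
probs = np.append(pmfs, 1 - pmfs.sum())
E = probs * O.sum()

# ddof=0 (default): k - 1 = 7 degrees of freedom, as in the cell above
stat, p_plain = chisquare(f_obs=O, f_exp=E)
# ddof=1: k - 1 - 1 = 6 degrees of freedom, one removed for the estimated mean
_, p_adjusted = chisquare(f_obs=O, f_exp=E, ddof=1)

print(f"statistic: {stat:.3f}")
print(f"p-value with 7 df: {p_plain:.4f}")
print(f"p-value with 6 df: {p_adjusted:.4f}")
```

The test statistic is identical in both calls; only the reference chi-square distribution changes, so the adjusted p-value is the smaller of the two.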


## BONUS/OPTIONAL - Question 2
Let's analyze a discrete distribution. To analyze the number of defective items in a factory in the city of Medellín, we took a random sample of n = 60 articles and observed the number of defectives in the following table:

![](table2.png)

A Poisson distribution was proposed, since it is defined for x = 0, 1, 2, 3, ..., using the following model:

![](image1.png)

For some extra insights check the following link: https://online.stat.psu.edu/stat504/node/63/ 

Does the number of defective items follow this distribution?

In [34]:
# your code here
defective_quantity = np.array([0, 1, 2, 3, 4])
O = np.array([32,15, 0, 9, 4])
# mu = ∑(x * f(x)) / n
mu = ((0*32)+(1*15)+(3*15)+(4*4))/60
alpha = 0.05
poisson_dist = poisson(mu)

# Poisson probability P(X = k) for each observed number of defective items
pmf_0 = poisson_dist.pmf(0)
pmf_1 = poisson_dist.pmf(1)
pmf_2 = poisson_dist.pmf(2)
pmf_3 = poisson_dist.pmf(3)
pmf_4 = poisson_dist.pmf(4)


# Collect the pmf values for each number of defective items
pmfs = np.array([pmf_0, pmf_1, pmf_2, pmf_3, pmf_4])
# Expected frequencies: each probability times the total number of observations
E = pmfs * sum(O)
# Rescale the expected frequencies so they sum to the observed total
# (the pmf values for x = 0..4 alone do not sum to 1)
E_norm = E / np.sum(E) * np.sum(O)

stats, p_value = st.chisquare(f_obs = O, f_exp  = E_norm, ddof = 1)

print(f"p_value is: {p_value}")
if p_value < alpha:
    print("We can reject the null hypothesis :  there is no reason to believe that at a .05 level the number of defective items follow Poisson distribution")
else: 
    print("We can not reject the null hypothesis: there is reason to believe that at a .05 level the number of defective items follow Poisson distribution")



p_value is: 3.008342324775625e-07
We can reject the null hypothesis :  there is no reason to believe that at a .05 level the number of defective items follow Poisson distribution
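
As a cross-check on the hand-typed mean, the sample mean can also be computed directly from the two arrays. This is a minimal sketch, assuming the frequencies in `O` are entered as in the cell above. The names `mu_from_arrays` and `mu_typed` are illustrative.

```python
import numpy as np

defective_quantity = np.array([0, 1, 2, 3, 4])
O = np.array([32, 15, 0, 9, 4])  # frequencies as typed in the cell above

# Sample mean = sum(x * frequency) / n, computed from the arrays
mu_from_arrays = np.sum(defective_quantity * O) / np.sum(O)
# The hand-typed expression from the cell above, reproduced for comparison
mu_typed = ((0*32) + (1*15) + (3*15) + (4*4)) / 60

print(f"mean from arrays:     {mu_from_arrays:.4f}")
print(f"mean as typed above:  {mu_typed:.4f}")
```

The two values differ (58/60 versus 76/60) because the typed expression contains a `3*15` term while `O` lists 9 observations for x = 3. Which one matches the data depends on the table, so the table should be consulted before relying on either value.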


## Question 3
On each of 200 days, a quality control engineer takes a sample of 10 tires that come out of an assembly line and records the number of tires with defects. Based on the data that follows, the engineer would like to verify whether it is true that 5% of all tires have defects (that is, whether the sample comes from a binomial population with n = 10 and p = 0.05).

![](table3.png)


In [40]:
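# your answer here
# H0: the number of defective tires per sample follows Binomial(n = 10, p = 0.05)
alpha = 0.05
n = 10
p = 0.05
# Observed day counts; the last category is treated below as the tail (2 or more defects)
O = np.array([138, 53, 9])
Population = O.sum()

binom_dist = binom(n,p)
# Binomial probabilities for 0 and 1 defective tires
binom_pmfs = np.array([binom_dist.pmf(i) for i in range(2)])
# Tail probability: 2 or more defective tires
pmf_tail = 1 - binom_pmfs.sum()
with_tail = np.append(binom_pmfs, pmf_tail)

# Expected day counts under H0
E = with_tail * Population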

stats, p_value = st.chisquare(O, E)

print(stats)
print(f"P_value: {p_value}")

if p_value < alpha:
    print("We can reject the null hypothesis")
else:
    print("We can not reject the null hypothesis")






8.306179519542757
P_value: 0.015715783395951262
We can reject the null hypothesis
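
To see where the statistic comes from, the sketch below rebuilds it by hand as the sum of (O - E)² / E over the three categories and reads the p-value from a chi-square distribution with k - 1 = 2 degrees of freedom. It assumes, as the cell above does, that the last category is the tail of 2 or more defects. The names `contributions` and `df` are illustrative.

```python
import numpy as np
from scipy.stats import binom, chi2

O = np.array([138, 53, 9])

# Binomial(10, 0.05) probabilities for 0 and 1 defects, plus the tail (2 or more)
probs = binom(10, 0.05).pmf([0, 1])
probs = np.append(probs, 1 - probs.sum())
E = probs * O.sum()

# Per-category contribution to the chi-square statistic
contributions = (O - E) ** 2 / E
stat = contributions.sum()

# No parameter is estimated from the data, so df = k - 1
df = len(O) - 1
p_value = chi2.sf(stat, df)

for obs, exp, c in zip(O, E, contributions):
    print(f"observed {obs:4d}  expected {exp:8.3f}  contribution {c:.3f}")
print(f"statistic: {stat:.4f}, p-value: {p_value:.4f}")
```

Printing the contributions shows which category drives the rejection, which the single p-value does not reveal.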


## Question 4
A researcher gathers information about the physical activity patterns of fifth-grade children at a public primary school. He defines three categories of physical activity (Low, Medium, High). He also asks about the regular consumption of sugary drinks at school and defines two categories (Yes = consumed, No = not consumed). We would like to evaluate, at the 5% significance level, whether there is an association between patterns of physical activity and the consumption of sugary drinks for the children of this school. The results are in the following table:

![](table4.png)

In [47]:
#your answer here

# H0: there is no association between physical activity and the consumption of sugary drinks
# H1: there is an association between physical activity and the consumption of sugary drinks

alpha = 0.05
category = np.array([[32, 12],
                     [14, 22],
                     [6, 9]])
stats, p_value, df, E = st.chi2_contingency(category)
print(p_value)
if p_value < alpha:
    print("We can reject the null hypothesis: There is association between physical activity and sugary drinks consumption")
else:
    print("We can not reject the null hypothesis. We can not say that there is association between physical activity and sugary drinks consumption")

0.004719280137040844
We can reject the null hypothesis: There is association between physical activity and sugary drinks consumption
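
The expected frequencies that `chi2_contingency` returns can be reproduced by hand: under independence, each expected cell count is (row total × column total) / grand total, and the degrees of freedom are (rows - 1) × (columns - 1). The sketch below compares the manual values with the SciPy result. The names `E_manual` and `dof_manual` are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[32, 12],
                     [14, 22],
                     [6, 9]])

# Marginal totals, kept as 2-D so they broadcast into a full table
row_totals = observed.sum(axis=1, keepdims=True)   # shape (3, 1)
col_totals = observed.sum(axis=0, keepdims=True)   # shape (1, 2)

# Expected counts under independence
E_manual = row_totals * col_totals / observed.sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)

stat, p_value, dof, expected = chi2_contingency(observed)

print(E_manual)
print(f"degrees of freedom: {dof_manual}")
print(f"matches SciPy: {np.allclose(E_manual, expected)}")
```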
