In [48]:
import pandas as pd
import numpy as np
import scipy.stats as st

## Question 1
The following table indicates the number of 6-point scores in an American rugby match in the 1979 season.

![](table1.png)

Based on these results, we fit a Poisson distribution whose parameter is the sample mean, λ = 2.435. At the 0.05 significance level, is it reasonable to believe that the number of scores is a Poisson variable?

In [49]:
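from scipy.stats import poisson

n = 448  # total number of observations
f_obs = [35, 99, 104, 110, 62, 25, 10, 3]  # observed frequencies from the table
mu = 2.435  # sample mean, used as the Poisson parameter
poisson_dist = poisson(mu)
# expected frequencies under Poisson(mu) for 0 to 7 scores
f_exc = [poisson_dist.pmf(i) * n for i in range(8)]

# st.chisquare returns (statistic, p-value), in that order
tStat, pValue = st.chisquare(f_obs, f_exc)
print("Chi-square statistic:{0} P-Value:{1}".format(tStat, pValue))

Chi-square statistic:5.526588649191276 P-Value:0.5959787428784398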


In [50]:
# The p-value is greater than 0.05, so we fail to reject the null hypothesis: the data are consistent with a Poisson distribution.
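
The cell above computes expected frequencies only for 0 to 7 scores, so they sum to slightly less than 448, and it keeps the default degrees of freedom even though the Poisson parameter was estimated from the sample. The following is a minimal sketch of one common adjustment, not the notebook's original method. It assumes (the table image is not reproduced here) that the last category can be read as "7 or more"; the names `probs`, `f_exp`, `statistic` and `p_value` are illustrative.

```python
import numpy as np
from scipy.stats import poisson, chisquare

n = 448
f_obs = np.array([35, 99, 104, 110, 62, 25, 10, 3])
mu = 2.435

# Probabilities for 0..6 scores; the last cell is ASSUMED to mean "7 or more"
probs = poisson.pmf(np.arange(7), mu)
probs = np.append(probs, poisson.sf(6, mu))  # sf(6) = P(X >= 7)
f_exp = probs * n  # expected frequencies now sum to n

# ddof=1 removes one extra degree of freedom because mu was estimated from the data
statistic, p_value = chisquare(f_obs, f_exp, ddof=1)
print("Chi-square statistic:", statistic)
print("P-Value:", p_value)
```

With `ddof=1` the reference distribution has 8 - 1 - 1 = 6 degrees of freedom instead of 7.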

## Question 2
Let's analyze a discrete distribution. To study the number of defective items in a factory in the city of Medellín, we took a random sample of n = 60 articles and recorded the number of defectives in the following table:

![](table2.png)

A Poisson distribution was proposed, since it is defined for x = 0, 1, 2, 3, ..., using the following model:

![](image1.png)

Does the number of defective items follow this distribution?

In [51]:
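n = 60  # sample size
items = [0,1,3,4]  # number of defectives, as entered in the original cell
obs = [32,15,9,4]  # observed frequencies

# Estimate lambda (the expected number of occurrences in a given time interval)
# as the frequency-weighted sample mean
lam = sum(x * f for x, f in zip(items, obs)) / sum(obs)

poisson_dist = poisson(lam)
# Expected frequencies for x = 0, 1, 2, 3. Note that `items` lists 0, 1, 3, 4,
# so both should be checked against the table.
f_exc = [poisson_dist.pmf(i) * n for i in range(4)]

# st.chisquare returns (statistic, p-value), in that order
tStat, pValue = st.chisquare(obs, f_exc)
print("Chi-square statistic:{0} P-Value:{1}".format(tStat, pValue))

Chi-square statistic:6.3034965141828545 P-Value:0.09774272956839174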


In [52]:
# The p-value is greater than 0.05, so we fail to reject the null hypothesis: the number of defective items is consistent with this distribution.
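
The estimate of lambda used above is just the frequency-weighted mean of the observed values. As a hedged illustration, the sketch below recomputes it with `np.average` and flags cells whose expected frequency falls below 5, a common rule of thumb for the chi-square approximation. The values of `items` and `obs` are copied from the cell above; the threshold of 5 and the name `small_cells` are assumptions introduced here.

```python
import numpy as np
from scipy.stats import poisson

items = np.array([0, 1, 3, 4])  # values as entered in the cell above
obs = np.array([32, 15, 9, 4])  # observed frequencies

# Frequency-weighted sample mean, used as the estimate of lambda
lam = np.average(items, weights=obs)

# Expected frequencies for x = 0, 1, 2, 3, matching range(4) in the cell above
expected = poisson.pmf(np.arange(4), lam) * obs.sum()

# Rule-of-thumb check (assumed threshold): expected frequencies should be at least 5
small_cells = expected < 5

print("lambda estimate:", lam)
print("expected frequencies:", expected.round(2))
print("cells with expected frequency below 5:", small_cells)
```

When a cell is flagged, one usual remedy is to pool it with a neighboring category before running the test.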

## Question 3
A quality control engineer takes a daily sample of 10 tires coming off an assembly line. Based on the number of defective tires observed over 200 days (shown below), the engineer would like to verify whether 5% of all tires have defects, that is, whether the sample comes from a binomial population with n = 10 and p = 0.05.

![](table3.png)


In [56]:
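from scipy.stats import binom

n = 10  # tires per daily sample
p = 0.05  # hypothesized proportion of defective tires

obs_tires = [138,53,9]  # observed frequencies from the table
binomial_dist = binom(n,p)
# expected frequencies over 200 days for 0, 1 and 2 defective tires
f_exp = [binomial_dist.pmf(i)*200 for i in range(3)]

# st.chisquare returns (statistic, p-value), in that order
tStat, pValue = st.chisquare(obs_tires, f_exp)
print("Chi-square statistic:{0} P-Value:{1}".format(tStat, pValue))

Chi-square statistic:6.730152995610064 P-Value:0.034559372300758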


In [58]:
# We reject the null hypothesis: the sample does not appear to come from a binomial population with n = 10 and p = 0.05.
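
The same decision can be reached by comparing the test statistic with a critical value. The sketch below is an illustration, not the original method. It recomputes the statistic by hand from the frequencies in the cell above (expected counts for 0, 1 and 2 defective tires only, as in that cell) and compares it with the 95th percentile of a chi-square distribution with k - 1 = 2 degrees of freedom. The names `statistic`, `critical` and `p_value` are illustrative.

```python
import numpy as np
from scipy.stats import binom, chi2

obs = np.array([138, 53, 9])  # observed frequencies over 200 days
expected = binom.pmf(np.arange(3), 10, 0.05) * 200  # 0, 1, 2 defective tires

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected
statistic = ((obs - expected) ** 2 / expected).sum()

df = len(obs) - 1  # k - 1 degrees of freedom, no estimated parameters
critical = chi2.ppf(0.95, df)  # critical value at the 0.05 level
p_value = chi2.sf(statistic, df)  # upper-tail probability

print("statistic:", statistic)
print("critical value:", critical)
print("p-value:", p_value)
print("reject H0:", statistic > critical)
```

Rejecting when the statistic exceeds the critical value is equivalent to rejecting when the p-value is below 0.05.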

## Question 4
A researcher gathers information about the physical activity patterns of fifth-grade children at a public primary school. He defines three categories of physical activity (Low, Medium, High). He also asks about the regular consumption of sugary drinks at school, and defines two categories (Yes = consumed, No = not consumed). We would like to evaluate whether there is an association between patterns of physical activity and the consumption of sugary drinks for the children of this school, at the 5% significance level. The results are in the following table:

![](table4.png)

In [62]:
yes = [32,14,6]
no = [12,22,9]

table = np.array([yes,no])

chi2, p, dof, ex = st.chi2_contingency(table)
print('The test statistic is {0} \nThe p-value of the test is {1} \nDegrees of freedom = {2} \nThe expected frequencies (array), based on the marginal sums of the table are {3}'.format(chi2,p,dof,ex))

The test statistic is 10.712198008709638 
The p-value of the test is 0.004719280137040844 
Degrees of freedom = 2 
The expected frequencies (array), based on the marginal sums of the table are [[24.08421053 19.70526316  8.21052632]
 [19.91578947 16.29473684  6.78947368]]


In [None]:
# The p-value is less than 0.05, so we reject the null hypothesis of independence: there is an association between patterns of physical activity and the consumption of sugary drinks for the children of this school.
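
The expected frequencies printed above come from the marginal sums of the table: each cell is (row total × column total) / grand total. The sketch below reproduces that calculation by hand under the same data. The names `row_totals`, `col_totals`, `expected`, `statistic` and `p_value` are illustrative and not part of the original notebook.

```python
import numpy as np
from scipy.stats import chi2

# Rows: sugary drinks (Yes, No); columns: three physical activity categories
table = np.array([[32, 14, 6],
                  [12, 22, 9]])

row_totals = table.sum(axis=1, keepdims=True)  # shape (2, 1)
col_totals = table.sum(axis=0, keepdims=True)  # shape (1, 3)
total = table.sum()

# Expected frequency of each cell under independence
expected = row_totals * col_totals / total

# Pearson chi-square statistic and its degrees of freedom
statistic = ((table - expected) ** 2 / expected).sum()
dof = (table.shape[0] - 1) * (table.shape[1] - 1)
p_value = chi2.sf(statistic, dof)

print(expected)
print("statistic:", statistic, "dof:", dof, "p-value:", p_value)
```

These values should agree with the output of `st.chi2_contingency(table)` shown above.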