## Question 1
The following table indicates the number of 6-point scores in an American rugby match in the 1979 season.

![](table1.png)

Based on these results, we create a Poisson distribution with the sample mean as its parameter, λ = 2.435. Is there any reason to believe, at the 0.05 significance level, that the number of scores is a Poisson variable?

In [1]:
# your answer here

import numpy as np
import pandas as pd
import seaborn as sns
from scipy.stats import ttest_1samp, ttest_rel, ttest_ind
from scipy.stats import f_oneway
from scipy.stats import poisson
from scipy import stats
import matplotlib.pyplot as plt


# poisson = how many events in timeframe

# H0: The sample data follow the poisson distribution
# H1: The sample data do not follow the poisson distribution

# For distribution tests, small p-values indicate that you can reject the null hypothesis and conclude 
# that your data were not drawn from a population with the specified distribution.

total_times = 448  # of observations
mu = 2.435         # sample mean parameter, used to create poisson distribution   

times = [35,99,104,110,62,25,10,3]  # y
scores = [0,1,2,3,4,5,6,7]  # x


n_scores = len(scores) # amount of probabilities to calculate

probability = [poisson.pmf(i,mu) for i in range(n_scores -1)] # 7 probability calculations assuming poisson 
probability.append(1-sum(probability)) # 8th probability calculation (because it's 7+)

exp_val = [p*total_times for p in probability] # with probability from poisson distribution --> calculate expected values

stats.chisquare(times, f_exp=exp_val) # comparison if exp_values are similar to given values 

# p-value = 0.48 > 0.05, so we can't reject the null hypothesis: the observed counts are consistent with a Poisson distribution.

Power_divergenceResult(statistic=6.491310681109821, pvalue=0.4836889068537269)
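
Because the Poisson parameter was estimated from the same sample, one degree of freedom is usually subtracted for the fitted parameter. The sketch below shows one way to do that with the `ddof` argument of `scipy.stats.chisquare`. The names `observed`, `expected` and `result_fitted` are illustrative and not part of the original cell, and counting the "7+" bin as exactly 7 when recomputing the mean is an assumption.

```python
import numpy as np
from scipy import stats
from scipy.stats import poisson

observed = np.array([35, 99, 104, 110, 62, 25, 10, 3])  # matches with 0..6 and 7+ scores
values = np.arange(len(observed))
n_obs = observed.sum()                                   # 448 matches

# sample mean; the "7+" bin is counted as exactly 7 (an assumption)
sample_mean = (values * observed).sum() / n_obs

mu = 2.435                                               # parameter given in the exercise
probs = poisson.pmf(values[:-1], mu)                     # P(X = 0), ..., P(X = 6)
probs = np.append(probs, 1 - probs.sum())                # P(X >= 7)
expected = probs * n_obs

# ddof=1 removes one extra degree of freedom for the estimated mean: df = 8 - 1 - 1 = 6
result_fitted = stats.chisquare(observed, f_exp=expected, ddof=1)
print(sample_mean, result_fitted.statistic, result_fitted.pvalue)
```

The statistic is the same as in the cell above; only the reference distribution has fewer degrees of freedom. The p-value stays above 0.05, so the conclusion does not change.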

## Question 2
The following are the ordered values of a random sample of SAT scores (university entrance exam) for several students: 852, 875, 910, 933, 957, 963, 981, 998, 1010, 1015, 1018, 1023, 1035, 1048, 1063. In previous years, the scores were described by N(985, 50), that is, a normal distribution with mean 985 and standard deviation 50. Based on the sample, is there any reason to believe that the distribution of scores has changed this year? Use the significance level alpha = 0.05.

In [8]:
#your answer here: one sample, did something change in the distribution?
# H0: The sample data follow the same distribution
# H1: The sample data do not follow the same distribution
# assuming that it's a normal distribution

scores = [852, 875, 910, 933, 957, 963, 981, 998, 1010, 1015, 1018, 1023, 1035, 1048, 1063] # test results

# loc and scale from year before to create the normal probability distribution

n = stats.norm(loc = 985, scale = 50) # frozen normal distribution with last year's parameters, used as the reference distribution

stats.kstest(scores, n.cdf)  # Kolmogorov-Smirnov test of the sample against the reference CDF (cumulative distribution function)

# p-value = 0.85 > 0.05 -> we cannot reject H0: no evidence that the score distribution has changed

KstestResult(statistic=0.1581291279406798, pvalue=0.847406396427736)
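
The statistic reported by `kstest` is the largest vertical gap between the empirical CDF of the sample and the reference CDF. As a rough check, it can be recomputed by hand. This is a minimal sketch; the names `d_plus`, `d_minus` and `d_manual` are illustrative only.

```python
import numpy as np
from scipy import stats

sat_scores = np.array([852, 875, 910, 933, 957, 963, 981, 998,
                       1010, 1015, 1018, 1023, 1035, 1048, 1063])
n_scores = len(sat_scores)

# reference CDF N(985, 50) evaluated at the sorted sample
reference = stats.norm(loc=985, scale=50)
cdf_vals = reference.cdf(np.sort(sat_scores))

ranks = np.arange(1, n_scores + 1)
d_plus = np.max(ranks / n_scores - cdf_vals)         # empirical CDF above the reference
d_minus = np.max(cdf_vals - (ranks - 1) / n_scores)  # empirical CDF below the reference
d_manual = max(d_plus, d_minus)

d_scipy = stats.kstest(sat_scores, reference.cdf).statistic
print(d_manual, d_scipy)
```

Both values should agree with the `statistic` field of the `KstestResult` shown above.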

## Question 3
Let's analyze a discrete distribution. To analyze the number of defective items in a factory in the city of Medellín, we took a random sample of n = 60 articles and observed the number of defectives in the following table:

![](table4.png)

A Poisson distribution was proposed, since it is defined for x = 0, 1, 2, 3, ..., using the following model:

![](image1.png)

Does the distribution of defective items follow this distribution?

In [11]:
# your code here: how many times an event happened
# H0: The sample data follow the poisson distribution
# H1: The sample data do not follow the poisson distribution

# For distribution tests, small p-values indicate that you can reject the null hypothesis and conclude 
# that your data were not drawn from a population with the specified distribution.

total_observation = 60

freq = [32,15,0,9,4]
def_items = [0,1,2,3,4]

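# sample mean number of defectives, used as the Poisson parameter (the term 2*0 is zero and is omitted)
mu = (0*32 + 1*15 + 3*9 + 4*4) / total_observation 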

n_def_items = len(def_items)

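# P(X = i) for i = 0..4; the tail P(X > 4) is left out, so these probabilities do not sum exactly to 1
prob = [poisson.pmf(i,mu) for i in range(n_def_items)]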

exp_val = [p*total_observation for p in prob] 

stats.chisquare(freq, f_exp=exp_val)

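# p-value (about 1.3e-07) is far below 0.05, so we reject the null hypothesis:
# the number of defective items does not follow the proposed Poisson distribution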

Power_divergenceResult(statistic=37.72656768931596, pvalue=1.2759420913385983e-07)
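
In the cell above, the five Poisson probabilities sum to slightly less than 1, so the expected counts add up to less than 60, and recent SciPy versions may reject input whose observed and expected totals disagree. A possible variant, sketched below, treats the last bin as "4 or more" (an assumption, mirroring the tail handling in Question 1) and subtracts one degree of freedom for the estimated mean. The names `mu_hat` and `result_lumped` are illustrative.

```python
import numpy as np
from scipy import stats
from scipy.stats import poisson

freq = np.array([32, 15, 0, 9, 4])        # articles with 0, 1, 2, 3, 4 defectives
items = np.arange(len(freq))
n_total = freq.sum()                       # 60 articles
mu_hat = (items * freq).sum() / n_total    # sample mean, 58/60

# P(X = 0..3), then the last bin is treated as "4 or more" (an assumption)
probs = poisson.pmf(items[:-1], mu_hat)
probs = np.append(probs, 1 - probs.sum())
expected = probs * n_total                 # now sums to 60

# ddof=1 because mu_hat was estimated from the data: df = 5 - 1 - 1 = 3
result_lumped = stats.chisquare(freq, f_exp=expected, ddof=1)
print(expected.round(2), result_lumped)
```

The p-value remains far below 0.05, so the rejection of the Poisson model still holds. Two of the expected counts are below 5, so the chi-square approximation is rough here; merging the sparse bins would be a common next step.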

## Question 4
A quality control engineer takes a daily sample of 10 tires coming off an assembly line. Based on the number of defective tires observed over 200 days (data below), the engineer would like to verify whether 5% of all tires have defects (that is, whether the sample comes from a binomial population with n = 10 and p = 0.05).

![](table6.png)


In [14]:
# your answer here: is the sample binomial distributed? (n=10, p=0.05)

from scipy.stats import binom

# binomial = k success in n trials

# H0: The sample data follow the binomial distribution
# H1: The sample data do not follow the binomial distribution

# For distribution tests, small p-values indicate that you can reject the null hypothesis and conclude 
# that your data were not drawn from a population with the specified distribution.

total_observations = 200

n = 10  # sample size

ob_val = [138,53,9]
defect = [0,1,2]

n_defect = len(defect)


probability = [binom.pmf(i,n,0.05) for i in range(n_defect -1)] # 2 probability calculations assuming binomial(n=10, p=0.05)
probability.append(1-sum(probability)) # 3rd probability calculation (because it's 2+)

exp_val = [p*total_observations for p in probability] 


stats.chisquare(ob_val, f_exp=exp_val)

# If the p-value is less than α, we reject the null hypothesis H0 in favor of H1
# p-value = 0.0157 < 0.05
# The sample data do not follow the binomial distribution with n = 10 and p = 0.05

Power_divergenceResult(statistic=8.30617951954273, pvalue=0.015715783395951474)
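
The same decision can be reached by comparing the statistic with the chi-square critical value instead of reading the p-value. The sketch below assumes the three bins are 0, 1 and "2 or more" defective tires, as in the cell above; `critical` and `reject_h0` are illustrative names.

```python
from scipy import stats
from scipy.stats import binom, chi2

observed = [138, 53, 9]            # days with 0, 1, and 2 or more defective tires
n_days = sum(observed)             # 200 days
n_tires, p_defect = 10, 0.05

probs = [binom.pmf(0, n_tires, p_defect),
         binom.pmf(1, n_tires, p_defect),
         binom.sf(1, n_tires, p_defect)]   # P(X >= 2) = P(X > 1)
expected = [p * n_days for p in probs]

statistic, pvalue = stats.chisquare(observed, f_exp=expected)

alpha = 0.05
dof = len(observed) - 1            # no parameter was estimated from the data
critical = chi2.ppf(1 - alpha, dof)
reject_h0 = statistic > critical
print(statistic, critical, reject_h0)
```

Since n and p are fixed by the hypothesis rather than estimated, the degrees of freedom are simply the number of bins minus one.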

## Question 5
A researcher gathers information about the patterns of physical activity (AF) of fifth-grade children at a public primary school. He defines three categories of physical activity (1 = Low, 2 = Medium, 3 = High). He also asks about the regular consumption of sugary drinks at school, and defines two categories (1 = consumed, 0 = not consumed). We would like to evaluate whether there is an association between patterns of physical activity and the consumption of sugary drinks for the children of this school, at the 5% significance level. The results are in the following table: 

![](table5.png)

In [15]:
# your answer here - patterns of physical activity (AF) and sugar consumption - chi-square test for independence 
# compares two variables in a contingency table to see if they are related

low = [32,12]
medium = [14,22]
high = [6,9]

# lists (not sets) keep the row and column labels in the intended order
df = pd.DataFrame([low,medium,high], columns = ["yes","no"], index = ["low","medium","high"])
df

Unnamed: 0,yes,no
low,32,12
medium,14,22
high,6,9


In [16]:
# H0: Variable sugar consumption and Variable physical activity are independent
# H1: They are not independent

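# note: this call uses the "no" counts as the expected frequencies for the "yes" counts
# (a goodness-of-fit comparison); the chi-square test of independence on the full
# contingency table is stats.chi2_contingency(df)
stats.chisquare(df.yes, f_exp=df.no)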

Power_divergenceResult(statistic=37.24242424242424, pvalue=8.182958137426038e-09)
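
A chi-square test of independence works on the whole contingency table and derives the expected counts from the row and column totals. A minimal sketch with `scipy.stats.chi2_contingency` follows. The row labels and their order (low, medium, high) are assumed from the variable names in the cell above, and `table` is an illustrative name.

```python
import pandas as pd
from scipy import stats

table = pd.DataFrame(
    [[32, 12], [14, 22], [6, 9]],
    columns=["yes", "no"],
    index=["low", "medium", "high"],
)

# expected counts come from (row total * column total) / grand total
chi2_stat, p_value, dof, expected = stats.chi2_contingency(table)
print(chi2_stat, p_value, dof)
print(pd.DataFrame(expected, index=table.index, columns=table.columns).round(2))
```

With (3 - 1) * (2 - 1) = 2 degrees of freedom, a p-value below 0.05 leads to rejecting H0, which points to an association between physical activity and the consumption of sugary drinks.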