# Universidad Panamericana
**Inferential Statistics**

Julio César Galindo López \
Economics \
Fourth semester


Juan Álvaro Díaz Raimond Kedilhac

# Problem 2

A random sample $X_1, X_2, X_3, \ldots, X_{100}$ is given from a distribution with known variance $Var(X_i) = 16$. For the observed sample, the sample mean is $\bar{X} = 23.5$. Find an approximate $95\%$ confidence interval for $\theta =\mathbb{E}[X_i]$.

In [None]:
import scipy.stats as sp
import numpy as np

sigma2 = 16
x_bar = 23.5
n = 100
z = sp.norm.ppf(0.975)

theta = (x_bar - z*np.sqrt(sigma2/n), x_bar + z*np.sqrt(sigma2/n))
print(theta)

(22.71601440618398, 24.28398559381602)


# Problem 3

To estimate the portion of voters who plan to vote for Candidate A in an election, a random sample of size $n$ from the voters is chosen. The sampling is done with replacement. Let $\theta$ be the portion of voters who plan to vote for Candidate A among all voters. How large does $n$ need to be so that we can obtain a $90\%$ confidence interval with $3\%$ margin of error? That is, how large must $n$ be so that

$$\mathbb{P}\left( \bar{X} - 3\% \leq \theta \leq \bar{X} + 3\%  \right) \geq 90\% $$

In [None]:
from sympy import *
import scipy.stats as sp
import numpy as np

z = sp.norm.ppf(0.95)
p = 0.5
n = symbols('n')

n_sample = solve(z*p/n**0.5 - 0.03)[0]
print(f'n = {n_sample:.0f}')

n = 752
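
The same sample size can also be obtained in closed form, without a symbolic solver. The sketch below assumes the worst case $\theta(1-\theta) \leq 1/4$ (that is, $p = 0.5$ as above) and rounds up, since $n$ must be an integer. The helper name `required_sample_size` is a hypothetical choice for illustration.

```python
import math
from scipy.stats import norm

def required_sample_size(confidence, margin):
    """Smallest n with z_{alpha/2} * 0.5 / sqrt(n) <= margin (worst-case p = 0.5)."""
    alpha = 1 - confidence
    z = norm.ppf(1 - alpha / 2)
    # Solve z * 0.5 / sqrt(n) = margin for n, then round up to an integer.
    return math.ceil((z / (2 * margin)) ** 2)

n_required = required_sample_size(0.90, 0.03)
print(f'n = {n_required}')
```

Rounding up rather than to the nearest integer guarantees that the margin of error does not exceed the target.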


# Problem 4

a) Let $X$ be a random variable such that $R_X⊂[a,b]$, i.e., we always have $a \leq X \leq b$. Show that

$$ Var(X) \leq \frac{(b-a)^2}{4} $$

\begin{align}
Y &= X - \frac{a + b}{2} \\
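Var[X] &= Var[Y] \text{, since shifting by a constant does not change the variance} \\
&= \mathbb{E}[Y^2] - \mu^2_Y \\
&\leq  \mathbb{E}[Y^2] \text{, since } \mu^2_Y \geq 0 \\
&\leq \frac{(b-a)^2}{4} \text{, since } |Y| \leq \frac{b-a}{2} \text{ implies } Y^2 \leq \frac{(b-a)^2}{4}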
\end{align}
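
The bound can also be checked numerically. The sketch below is only an illustration, not part of the proof: it assumes a two-point distribution on $\{a, b\}$ with $\mathbb{P}(X=b)=p$ and verifies that its variance never exceeds $(b-a)^2/4$, with equality at $p = 1/2$. The endpoints $a = 2$, $b = 7$ and the function name are arbitrary choices.

```python
import numpy as np

def two_point_variance(a, b, p):
    """Variance of X with P(X = b) = p and P(X = a) = 1 - p."""
    mean = (1 - p) * a + p * b
    return (1 - p) * (a - mean) ** 2 + p * (b - mean) ** 2

a, b = 2.0, 7.0                    # arbitrary endpoints for illustration
bound = (b - a) ** 2 / 4           # the claimed upper bound on Var(X)
ps = np.linspace(0, 1, 101)
variances = np.array([two_point_variance(a, b, p) for p in ps])

print(f'largest variance = {variances.max():.4f}, bound = {bound:.4f}')
```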

b) Let $X_1, X_2, X_3, \ldots, X_n$ be a random sample from an unknown distribution with CDF $F_X(x)$ such that $R_X⊂[a,b]$. Specifically, $\mathbb{E}[X]$ and $Var(X)$ are unknown. Find a $(1−\alpha)100\%$ confidence interval for $\theta=\mathbb{E}[X]$. Assume that $n$ is large.

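For large $n$, the interval based on the sample standard deviation $s$ satisfies

$$\mathbb{P} \left(  \bar{X} - \frac{z_{\alpha/2} \, s}{\sqrt{n}}  \leq \theta \leq \bar{X} + \frac{z_{\alpha/2} \, s}{\sqrt{n}}   \right) \approx 1 - \alpha$$

From part a) we have $\sigma \leq \sigma_{max} = \frac{b-a}{2}$, so we can substitute $\sigma_{max}$ and obtain a conservative interval whose coverage is at least approximately $1 - \alpha$:

$$\mathbb{P} \left(  \bar{X} - \frac{z_{\alpha/2} \, (b-a)}{2\sqrt{n}}  \leq \theta \leq \bar{X} + \frac{z_{\alpha/2} \, (b-a)}{2\sqrt{n}}   \right) \gtrsim 1 - \alpha$$

Thus a $(1-\alpha)100\%$ confidence interval for $\theta$ is

$$\left[ \bar{X} - \frac{z_{\alpha/2} \, (b-a)}{2\sqrt{n}}, \bar{X} + \frac{z_{\alpha/2} \, (b-a)}{2\sqrt{n}}   \right]$$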
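
As a sketch of how this interval could be computed, the hypothetical helper below takes the sample mean, the sample size, and the range endpoints, and uses $\sigma_{max} = (b-a)/2$ in place of the unknown standard deviation. The numbers in the usage lines are made up for illustration and do not come from the problem.

```python
import numpy as np
from scipy.stats import norm

def conservative_ci(x_bar, n, a, b, alpha=0.05):
    """Large-sample CI for the mean of a variable bounded in [a, b]."""
    z = norm.ppf(1 - alpha / 2)
    # Replace the unknown sigma by its upper bound (b - a) / 2.
    half_width = z * (b - a) / (2 * np.sqrt(n))
    return x_bar - half_width, x_bar + half_width

# Illustrative values only: data in [0, 1], n = 400, observed mean 0.42.
low, high = conservative_ci(0.42, 400, 0.0, 1.0)
print(f'({low:.4f}, {high:.4f})')
```

Because the true $\sigma$ can be much smaller than $(b-a)/2$, this interval is typically wider than one based on the sample standard deviation.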


# Problem 5

A random sample $X_1, X_2, X_3, \ldots, X_{144}$ is given from a distribution with unknown variance $Var(X_i)=\sigma^2$. For the observed sample, the sample mean is $\bar{X}=55.2$, and the sample variance is $S^2=34.5$. Find a $99\%$ confidence interval for $\theta = \mathbb{E}[X_i]$.

In [None]:
import scipy.stats as sp
import numpy as np

n = 144
x_bar = 55.2
s2 = 34.5
z = sp.norm.ppf(0.995)

theta = (x_bar - z*np.sqrt(s2/n), x_bar + z*np.sqrt(s2/n))
print(theta)

(53.939202377859694, 56.46079762214031)


# Problem 6

A random sample $X_1, X_2, X_3, ..., X_{16}$ is given from a normal distribution with unknown mean $\mu=\mathbb{E}[X_i]$ and unknown variance $Var(X_i)=\sigma^2$. For the observed sample, the sample mean is $\bar{X}=16.7$, and the sample variance is $S^2=7.5$.
* Find a $95\%$ confidence interval for $\mu$.
* Find a $95\%$ confidence interval for $\sigma^2$.

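In [None]:
import scipy.stats as sp
import numpy as np

x_bar = 16.7
s2 = 7.5
n = 16
# The sample is small (n = 16) and sigma is unknown, so we use the
# t distribution with n-1 degrees of freedom instead of the normal
t = sp.t.ppf(0.975, df = n-1)

mu = (x_bar - t*np.sqrt(s2/n), x_bar + t*np.sqrt(s2/n))
print(f'({mu[0]:.4f}, {mu[1]:.4f})')

(15.2407, 18.1593)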


In [None]:
import scipy.stats as sp
import numpy as np

x_bar = 16.7
s2 = 7.5
n = 16
Lchi2 = sp.chi2.ppf(0.975, n-1)
Rchi2 = sp.chi2.ppf(0.025, n-1)

sigma2 = ( (n-1)*s2/Lchi2, (n-1)*s2/Rchi2)
print(sigma2)

(4.092636501481853, 17.965110906541934)


# Example 8.24

Let $X_1,X_2,\ldots,X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution, where $\mu$ is unknown but $\sigma$ is known. Design a level $\alpha$ test to choose between

$$H_0: \mu = \mu_0$$
$$H_1: \mu \neq \mu_0$$


## Z-test
We accept $H_0$ if
$$\left| \sqrt{n}\frac{\bar{X} - \mu_0}{\sigma} \right| \leq c \text{, where } c = z_{\alpha/2} $$
and reject it otherwise.

# Example 8.25

For the above example (Example 8.24), find $\beta$, the probability of type II error, as a function of $\mu$.

\begin{align}
\beta(\mu) &= \mathbb{P}(\text{accept } H_0 \mid \mu) = \mathbb{P}\left( \left| \sqrt{n}\frac{\bar{X} - \mu_0}{\sigma}\right| \leq c \right) \\
&\text{When the true mean is } \mu \text{, } \sqrt{n}\frac{\bar{X} - \mu_0}{\sigma} \sim N\left(\sqrt{n}\frac{\mu - \mu_0}{\sigma}, 1\right) \text{, so} \\
\beta(\mu) &= \Phi\left( c + \sqrt{n}\frac{\mu_0 - \mu}{\sigma} \right) - \Phi\left( -c + \sqrt{n}\frac{\mu_0 - \mu}{\sigma} \right) \text{, where } c = z_{\alpha/2}
\end{align}
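
A short numerical sketch of $\beta(\mu)$ may help here. Under a true mean $\mu$, the statistic $\sqrt{n}(\bar{X} - \mu_0)/\sigma$ is $N(\sqrt{n}(\mu - \mu_0)/\sigma, 1)$, so $\beta(\mu)$ is the probability that it falls in $[-z_{\alpha/2}, z_{\alpha/2}]$. The function name and the parameter values below are assumptions chosen only for illustration.

```python
import numpy as np
from scipy.stats import norm

def type_ii_error(mu, mu0, sigma, n, alpha):
    """beta(mu) for the two-sided z-test of H0: mu = mu0 with known sigma."""
    c = norm.ppf(1 - alpha / 2)
    shift = np.sqrt(n) * (mu0 - mu) / sigma
    # P(-c <= W <= c) when W ~ N(-shift, 1)
    return norm.cdf(c + shift) - norm.cdf(-c + shift)

# Illustrative parameters (not from the example): mu0 = 0, sigma = 1, n = 25.
for mu in (0.0, 0.2, 0.5, 1.0):
    print(f'mu = {mu}: beta = {type_ii_error(mu, 0.0, 1.0, 25, 0.05):.4f}')
```

At $\mu = \mu_0$ the function returns $1 - \alpha$, and it decreases as $\mu$ moves away from $\mu_0$, which is the expected behavior of a type II error probability.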

# Example 8.26

Let $X_1,X_2,\ldots,X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution, where $\mu$ and $\sigma$ are unknown. Design a level $\alpha$ test to choose between
* $H_0: \mu = \mu_0$
* $H_1: \mu \neq \mu_0$

## T-test
Since $\sigma$ is unknown, we replace it with the sample standard deviation $S$ and accept $H_0$ if
$$\left| \sqrt{n}\frac{\bar{X} - \mu_0}{S} \right| \leq  t_{\alpha/2, n-1} $$

# Example 8.27

The average adult male height in a certain country is 170 cm. We suspect that the men in a certain city in that country might have a different average height due to some environmental factors. We pick a random sample of size 9 from the adult males in the city and obtain the following values for their heights (in cm):

176.2, 157.9, 160.1, 180.9, 165.1, 167.2, 162.9, 155.7, 166.2

Assume that the height distribution in this population is normally distributed. Here, we need to decide between

* $H_0: \mu = 170$
* $H_1: \mu \neq 170$

In [None]:
import pingouin as pg

height = [176.2, 157.9, 160.1, 180.9, 165.1, 167.2, 162.9, 155.7, 166.2]
pg.ttest(height, 170, confidence = 0.95)
# Thus we fail to reject the null hypothesis.

Unnamed: 0,T,dof,alternative,p-val,CI95%,cohen-d,BF10,power
T-test,-1.527834,8,two-sided,0.165074,"[159.46, 172.14]",0.509278,0.772,0.270709
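
As a cross-check, the same statistic and P-value can be obtained with `scipy.stats.ttest_1samp`, which performs a two-sided one-sample t-test by default. This is a sketch using a different library than the cell above, not a replacement for it.

```python
from scipy import stats

height = [176.2, 157.9, 160.1, 180.9, 165.1, 167.2, 162.9, 155.7, 166.2]

# One-sample, two-sided t-test of H0: mu = 170
result = stats.ttest_1samp(height, popmean=170)

print(f'T = {result.statistic:.4f}, p-value = {result.pvalue:.4f}')
```

Since the P-value is larger than $0.05$, this agrees with the conclusion that we fail to reject the null hypothesis.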


# Example 8.28

Let $X_1,X_2,\ldots,X_n$ be a random sample from a $N(\mu,\sigma^2)$ distribution, where $\mu$ is unknown and $\sigma$ is known. Design a level $\alpha$ test to choose between

* $H_0: \mu \leq \mu_0$
* $H_1: \mu > \mu_0$

## Z-test
We accept $H_0$ if
$$ \sqrt{n}\frac{\bar{X} - \mu_0}{\sigma}  \leq  z_{\alpha} $$
and reject it otherwise.
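
A minimal sketch of this one-sided test follows. The function name and the numbers in the usage lines are hypothetical and serve only to show the mechanics: compute the statistic, compare it with $z_\alpha$, and report the right-tail P-value.

```python
import numpy as np
from scipy.stats import norm

def one_sided_z_test(x_bar, mu0, sigma, n, alpha=0.05):
    """Test H0: mu <= mu0 against H1: mu > mu0 with known sigma."""
    w = np.sqrt(n) * (x_bar - mu0) / sigma
    critical = norm.ppf(1 - alpha)   # z_alpha
    p_value = norm.sf(w)             # P(Z > w)
    return w, p_value, w > critical

# Illustrative numbers only.
w, p_value, reject = one_sided_z_test(x_bar=10.5, mu0=10.0, sigma=2.0, n=64)
print(f'W = {w:.2f}, p-value = {p_value:.4f}, reject H0: {reject}')
```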

# Problem 1

Let $X \sim Geometric(\theta)$. We observe $X$ and we need to decide between

* $H_0: \theta = \theta_0 = 0.5$
* $H_1: \theta = \theta_1 = 0.1$

a) Design a level 0.05 test $(\alpha=0.05)$ to decide between $H_0$ and $H_1$. \
b) Find the probability of type-II error $\beta$.

In [None]:
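# a)
from sympy import symbols, Sum, solve
k, c = symbols('k c')
p0 = 0.5

# P(X >= c | H0) = 1 - P(X <= c-1 | H0), with geometric PMF (1-p0)**(k-1) * p0
G = 1 - Sum((1-p0)**(k-1)*p0, (k, 1, c-1)).doit()
c_value = solve(G <= 0.05, c)
# c must be an integer, so we reject H0 when X >= 6
c_value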

5.32192809488736 <= c

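In [None]:
# b)
from sympy import symbols, Sum
k = symbols('k')
p1 = 0.1

# beta = P(X < 6 | H1): sum the geometric PMF under theta = 0.1 for k = 1, ..., 5
G = Sum((1-p1)**(k-1)*p1, (k, 1, 6-1)).doit()
print(f'beta = {float(G):.5f}')

beta = 0.40951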
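
The threshold and the type II error can also be computed with `scipy.stats.geom`, whose support starts at $k = 1$ like the geometric distribution used here. The sketch below assumes the test rejects $H_0$ for large $X$ (under $\theta = 0.1$ the first success tends to arrive later) and evaluates $\beta$ under $\theta = 0.1$.

```python
from scipy.stats import geom

alpha = 0.05
theta0, theta1 = 0.5, 0.1

# Smallest integer c with P(X >= c | theta0) = (1 - theta0)**(c - 1) <= alpha.
c = 1
while geom.sf(c - 1, theta0) > alpha:
    c += 1

# Type II error: we accept H0 (X < c) although theta = theta1.
beta = geom.cdf(c - 1, theta1)
print(f'c = {c}, beta = {beta:.5f}')
```

With $c = 6$, the type II error is $\beta = 1 - 0.9^5 = 0.40951$.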

# Problem 2

Let $X_1,X_2,X_3, X_4$ be a random sample from a $N(\mu,1)$ distribution, where $\mu$ is unknown. Suppose that we have observed the following values

2.82, 2.71, 3.22, 2.67

We would like to decide between
* $H_0: \mu = 2$
* $H_1: \mu \neq 2$


a) Assuming $\alpha=0.1$, do you accept $H_0$ or $H_1$?


In [None]:
# a)
import scipy.stats as sp
import numpy as np

sample = [2.82, 2.71, 3.22, 2.67]
x_bar = np.mean(sample)
s2 = 1

W = (x_bar - 2)/np.sqrt(s2/len(sample))
# Two-sided test with alpha = 0.1, so the critical value is z_{0.05}
z = sp.norm.ppf(0.95)

if abs(W)>z:
    print(f'W = {W:.2f}. We reject the null hypothesis')
else:
    print(f'W = {W:.2f}. We fail to reject the null hypothesis')

W = 1.71. We reject the null hypothesis


b) If we require significance level $\alpha$, find $\beta$ as a function of $\mu$ and $\alpha$.

\begin{align}
\beta(\mu) &= \mathbb{P}(\text{type II error}) = \mathbb{P}(\text{accept } H_0 \mid \mu) \\
&= \mathbb{P}(|W| < z_{\alpha/2}) \text{, where } W = 2(\bar{X} - 2) \sim N(2(\mu-2), 1) \\
&= \Phi(z_{\alpha/2} - 2(\mu-2)) - \Phi(-z_{\alpha/2} - 2(\mu-2))
\end{align}

# Problem 3

Let $X_1,X_2,...,X_{100}$ be a random sample from an unknown distribution. After observing this sample, the sample mean and the sample variance are calculated to be $\bar{X} = 21.32$, $S^2 = 27.6$. Design a level 0.05 test to choose between

* $H_0: \mu = 20$
* $H_1: \mu > 20$

Do you accept or reject $H_0$?

In [None]:
import scipy.stats as sp
import numpy as np

x_bar = 21.32
s2 = 27.6
n = 100


W = (x_bar - 20)/np.sqrt(s2/n)
# One-sided (right-tailed) test at level 0.05: reject when W > z_{0.05}
z = sp.norm.ppf(0.95)

if W>z:
    print(f'W = {W:.2f}. We reject the null hypothesis')
else:
    print(f'W = {W:.2f}. We fail to reject the null hypothesis')

W = 2.51. We reject the null hypothesis


# Problem 4

Let $X_1, X_2, X_3, X_4$ be a random sample from a $N(\mu, \sigma^2)$ distribution, where $\mu$ and $\sigma$ are unknown. Suppose that we have observed the following values $3.58, 10.03, 4.77, 14.66$. We would like to decide between

* $H_0: \mu \geq 10$
* $H_1: \mu < 10$

Assuming $\alpha = 0.05$, do you accept $H_0$ or $H_1$?

In [None]:
import pingouin as pg

sample = [3.58, 10.03, 4.77, 14.66]
pg.ttest(sample, 10, alternative = 'less', confidence = 0.95)
# Thus we fail to reject the null hypothesis.


Unnamed: 0,T,dof,alternative,p-val,CI95%,cohen-d,BF10,power
T-test,-0.681718,3,less,0.272165,"[-inf, 14.27]",0.340859,1.026,0.13497


# Problem 5

Let $X_1,X_2,\ldots,X_{81}$ be a random sample from an unknown distribution. After observing this sample, the sample mean and the sample variance are calculated to be $\bar{X} = 8.25$, $S^2 = 14.6$. Design a test to decide between

* $H_0: \mu = 9$
* $H_1: \mu < 9$

and calculate the P-value for the observed data.

In [None]:
import scipy.stats as sp
import numpy as np

W = (8.25-9)/np.sqrt(14.6/81)
# Left-tailed test: the P-value is P(Z <= W)
p_value = sp.norm.cdf(W)
print(f'P-value is {p_value:.4f}')

P-value is 0.0387


# Problem 15

Let $X_1, X_2, X_3, X_4, X_5$ be a random sample from a $N(\mu,1)$ distribution, where $\mu$ is unknown. Suppose that we have observed the following values 5.45, 4.23, 7.22, 6.94, 5.98. We would like to decide between

* $H_0: \mu = 5$
* $H_1: \mu \neq 5$

a) Define a test statistic to test the hypotheses and draw a conclusion assuming $\alpha=0.05$. \
b) Find a 95% confidence interval around $\bar{X}$. Is $\mu_0$ included in the interval? How does the exclusion of $\mu_0$ from the interval relate to the hypotheses we are testing?

In [None]:
# a)
import numpy as np
import scipy.stats as sp
sample = [5.45, 4.23, 7.22, 6.94, 5.98]

W = (np.mean(sample) - 5)/(np.sqrt(1/len(sample)))
z = sp.norm.ppf(0.975)

if abs(W)>z:
    print(f'W = {W:.2f}. We reject the null hypothesis')
else:
    print(f'W = {W:.2f}. We fail to reject the null hypothesis')

W = 2.16. We reject the null hypothesis


In [None]:
# b)

mu = (np.mean(sample) - z/np.sqrt(len(sample)), np.mean(sample) + z/np.sqrt(len(sample)))
print(mu)
print('The hypothesized mean mu_0 = 5 is not in the 95% confidence interval, thus we reject the null hypothesis.')

(5.087477459423419, 6.840522540576582)
The hypothesized mean mu_0 = 5 is not in the 95% confidence interval, thus we reject the null hypothesis.
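
The relationship between the two parts can be illustrated directly: with known $\sigma$, the two-sided z-test at level $\alpha$ rejects $H_0: \mu = \mu_0$ exactly when $\mu_0$ lies outside the $(1-\alpha)100\%$ confidence interval. The sketch below checks this for a few candidate values of $\mu_0$; the helper names and the candidate values other than $5$ are assumptions for illustration.

```python
import numpy as np
from scipy.stats import norm

sample = [5.45, 4.23, 7.22, 6.94, 5.98]
sigma, alpha = 1.0, 0.05
n, x_bar = len(sample), np.mean(sample)
z = norm.ppf(1 - alpha / 2)
half_width = z * sigma / np.sqrt(n)

def rejects(mu0):
    """Two-sided z-test decision for H0: mu = mu0."""
    return abs(np.sqrt(n) * (x_bar - mu0) / sigma) > z

def outside_interval(mu0):
    """True when mu0 falls outside the (1 - alpha) confidence interval."""
    return not (x_bar - half_width <= mu0 <= x_bar + half_width)

# The two columns of booleans should always agree.
for mu0 in (5.0, 5.5, 6.0, 7.0):
    print(mu0, rejects(mu0), outside_interval(mu0))
```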


# Problem 16

Let $X_1,\ldots,X_9$ be a random sample from a $N(\mu,1)$ distribution, where $\mu$ is unknown. Suppose that we have observed the following values
16.34, 18.57, 18.22, 16.94, 15.98, 15.23, 17.22, 16.54, 17.54. We would like to decide between

* $H_0: \mu = 16$
* $H_1: \mu \neq 16$

a) Find a 90% confidence interval around $\bar{X}$. Is $\mu_0$ included in the interval? How does this relate to our hypothesis test? \
b) Define a test statistic to test the hypotheses and draw a conclusion assuming $\alpha = 0.1$.

In [None]:
# a)
import numpy as np
import scipy.stats as sp

sample = [16.34, 18.57, 18.22, 16.94, 15.98, 15.23, 17.22, 16.54, 17.54]
z = sp.norm.ppf(0.95)

mu = (np.mean(sample) - z/np.sqrt(len(sample)), np.mean(sample) + z/np.sqrt(len(sample)))

print(mu)
print('The hypothesized mean mu_0 = 16 is not in the 90% confidence interval, thus we reject the null hypothesis.')

(16.405048791016174, 17.501617875650492)
The hypothesized mean mu_0 = 16 is not in the 90% confidence interval, thus we reject the null hypothesis.


In [None]:
# b)
# z = z_{0.05} from the previous cell is the critical value for alpha = 0.1
W = (np.mean(sample) - 16)/(np.sqrt(1/len(sample)))

if abs(W)>z:
    print(f'W = {W:.2f}. We reject the null hypothesis')
else:
    print(f'W = {W:.2f}. We fail to reject the null hypothesis')

W = 2.86. We reject the null hypothesis


# Problem 17

Let $X_1, X_2 ,\ldots, X_{150}$ be a random sample from an unknown distribution. After observing this sample, the sample mean and the sample variance are calculated to be $\bar{X} = 52.28$, $S^2 = 30.9$.

Design a level 0.05 test to choose between

* $H_0: \mu = 50$
* $H_1: \mu > 50$

Do you accept or reject $H_0$?

In [None]:
import numpy as np
import scipy.stats as sp

W = (52.28 - 50)/np.sqrt(30.9/150)
# One-sided (right-tailed) test at level 0.05: reject when W > z_{0.05}
z = sp.norm.ppf(0.95)

if W>z:
    print(f'W = {W:.2f}. We reject the null hypothesis')
else:
    print(f'W = {W:.2f}. We fail to reject the null hypothesis')

W = 5.02. We reject the null hypothesis


# Problem 18

Let $X_1, X_2, X_3, X_4, X_5$ be a random sample from a $N(\mu,\sigma^2)$ distribution, where $\mu$ and $\sigma$ are both unknown. Suppose that we have observed the following values 27.72, 22.24, 32.86, 19.66, 35.34. We would like to decide between

* $H_0: \mu \geq 30$
* $H_1: \mu < 30$

Assuming $\alpha = 0.05$, what do you conclude?

In [None]:
import numpy as np
import scipy.stats as sp

sample = [27.72, 22.24, 32.86, 19.66, 35.34]

W = (np.mean(sample) - 30)/np.sqrt(np.var(sample, ddof = 1)/len(sample))
t = sp.t.ppf(0.05, df = len(sample) - 1)

if W>t:
    print(f'W = {W:.2f}. We fail to reject the null hypothesis')
else:
    print(f'W = {W:.2f}. We reject the null hypothesis')

W = -0.81. We fail to reject the null hypothesis


# Problem 19

Let $X_1, X_2 ,..., X_{121}$ be a random sample from an unknown distribution. After observing this sample, the sample mean and the sample variance are calculated to be $\bar{X} = 29.25$, $S^2 = 20.7$. Design a test to decide between

* $H_0: \mu = 30$
* $H_1: \mu < 30$

and calculate the P-value for the observed data.

In [None]:
import numpy as np
import scipy.stats as sp

x_bar = 29.25
s2 = 20.7

W = (x_bar - 30)/np.sqrt(s2/121)
z = sp.norm.ppf(0.05)

if W>z:
    print(f'W = {W:.2f}. We fail to reject the null hypothesis')
else:
    print(f'W = {W:.2f}. We reject the null hypothesis')

p_value = sp.norm.cdf(W)
print(f'The p-value is {p_value:.2f}')

W = -1.81. We reject the null hypothesis
The p-value is 0.03


# Problem 20

Suppose we would like to test the hypothesis that at least 10% of students suffer from allergies. We collect a random sample of 225 students and 21 of them suffer from allergies.

a) State the null and alternative hypotheses. \

* $H_0: p \geq 10\%$
* $H_1: p < 10\%$

where $p$ is the proportion of students who suffer from allergies.

b) Obtain a test statistic and a P-value. \
c) State the conclusion at the $\alpha = 0.05$ level.

In [None]:
# b)
import numpy as np
import scipy.stats as sp

# Test statistic evaluated at the boundary value p = 0.1
W = (21/225 - 0.1)/np.sqrt(0.1*0.9 / 225)
# Left-tailed test: the P-value is P(Z <= W)
p_value = sp.norm.cdf(W)
print(f'The p-value is {p_value:.2f}')

The p-value is 0.37


In [None]:
# c)
z = sp.norm.ppf(0.05)

if W>z:
    print(f'W = {W:.2f}. We fail to reject the null hypothesis')
else:
    print(f'W = {W:.2f}. We reject the null hypothesis')

W = -0.33. We fail to reject the null hypothesis
