# Exercises on hypothesis testing, confidence intervals, probability, and t-tests

1. [CO₂ Emissions & Car Performance Analysis](#-problem-co-emissions--car-performance-analysis)
2. [Battery Life of Electric Scooters](#-new-scenario-battery-life-of-electric-scooters)


## 🚗 Problem: CO₂ Emissions & Car Performance Analysis


You're analyzing data from a recent survey of fuel consumption and CO₂ emissions of cars from different brands. You randomly select 50 cars from the dataset and compute the following (summary statistics are fictional for practice):

- Sample size: 50
- Mean CO₂ emission: 250 g/km
- Standard deviation of CO₂ emissions: 40 g/km
- Sample mean fuel consumption (L/100km): 8.2
- Standard deviation of fuel consumption: 1.5

A regulation agency claims that the average CO₂ emission for this category of cars should be no more than 240 g/km.

**🔢 Questions:**

### 1. Hypothesis Testing
Using a 5% significance level, test the claim made by the regulation agency about CO₂ emissions.

**a. Clearly state the null and alternative hypotheses.**
- H0: μ ≤ 240 (the agency's target is met)
- H1: μ > 240 (the true mean exceeds the target)

**b. Compute the relevant test statistic and determine whether to reject the null.**
Since n > 30, I can apply the z-test (the sample standard deviation stands in for σ).

z = (observed mean - claimed mean) / (std / sqrt(n))
z = (250 - 240) / (40 / sqrt(50))

z ≈ 1.77

For a one-tailed test at the 5% significance level, the critical value is z(0.95) = 1.645.
Because z = 1.77 > 1.645, z falls inside the rejection region, so I can reject the null hypothesis.


**c. What’s the p-value and how would you interpret it?**
The z-table gives the cumulative probability from the left, P(Z < z), so the right-tail p-value is 1 - P(Z < z).

p-value = P(Z > 1.7678) = 1 − P(Z < 1.7678) = 1 − 0.9615 = 0.0385

p-value = 0.0385 < 0.05     ⇒    Reject H0


✅ Interpretation: At 5% significance, there’s enough evidence to conclude that the true average CO₂ emission is higher than 240 g/km.

- A sample mean of 250 is somewhat unlikely (p = 0.039) if the true average is really 240. That gives us evidence to doubt the null hypothesis.

- If the true mean CO₂ emission is 240, the probability of obtaining a sample mean of 250 or more (i.e., as extreme or more extreme) is approximately 3.9%
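
The test statistic and p-value above can be double-checked numerically. This is a minimal sketch using only Python's standard library (`statistics.NormalDist`); the variable names are illustrative, and it treats the sample standard deviation as if it were the population value, as the z-test above does.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the problem statement
n = 50
sample_mean = 250.0   # g/km
sample_sd = 40.0      # g/km
mu0 = 240.0           # agency's target, g/km
alpha = 0.05

std_normal = NormalDist()  # standard normal: mean 0, sd 1

# Test statistic: distance of the sample mean from mu0, in standard errors
standard_error = sample_sd / sqrt(n)
z = (sample_mean - mu0) / standard_error

# Right-tailed test: critical value and p-value
z_critical = std_normal.inv_cdf(1 - alpha)
p_value = 1 - std_normal.cdf(z)

print(f"z = {z:.3f}, critical value = {z_critical:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if p_value < alpha else "Fail to reject H0")
```

Comparing `z` with `z_critical` and comparing `p_value` with `alpha` are two views of the same decision, so they always agree.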


### 2. Confidence Interval
Construct a 95% confidence interval for the true average CO₂ emission of these cars.

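Margin of error: z(1 - alpha/2) * std / sqrt(n)
For alpha = 0.05, z(0.975) = 1.96

1.96 * 40 / sqrt(50) = 11.09

The 95% confidence interval is 250 ± 11.09, i.e. from 238.91 to 261.09 g/km.

**a. Based on your interval, does the agency’s target of 240 g/km seem plausible?**
It's plausible, since 240 lies inside the interval, although it's very close to the lower bound. This does not contradict the rejection in question 1: that test was one-tailed (critical value 1.645), while this interval is two-sided (critical value 1.96) and therefore wider.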
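
The interval can be reproduced with a few lines of standard-library Python. This is a sketch under the same assumption as above (the z-based interval with the sample standard deviation in place of σ); the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the problem statement
n, sample_mean, sample_sd = 50, 250.0, 40.0
confidence = 0.95
target = 240.0  # agency's target, g/km

# Two-sided critical value z(1 - alpha/2), about 1.96 for 95%
z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)

margin = z_star * sample_sd / sqrt(n)
lower, upper = sample_mean - margin, sample_mean + margin

print(f"margin of error = {margin:.2f}")
print(f"95% CI = ({lower:.1f}, {upper:.1f})")
print("Target inside interval:", lower <= target <= upper)
```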

### 3. Probability
Assuming CO₂ emissions follow a normal distribution:

**a. What’s the probability that a randomly selected car emits more than 280 g/km?**
1. Compute the z-score
    z = (280 - 250) / 40 = 0.75
2. Look up the cumulative probability
    P(Z < 0.75) = 0.7734

Then P(Z > 0.75) = 1 - 0.7734 = 0.2266

✅ Final answer:
There’s a 22.66% chance that a randomly selected car emits more than 280 g/km.


**b. What’s the probability that a car emits between 230 and 270 g/km?**

1. Compute both z scores
    - For 230, z1 = (230-250)/ 40 = −0.5
    - For 270, z2 = (270 - 250) / 40 = 0.5

2. Look up the cumulative probabilities
    - P(Z < 0.5) = 0.6915
    - P(Z < -0.5) = 0.3085

3. Subtract to get the area in between
    - P(230 < X < 270) = 0.6915 − 0.3085 = 0.3830

✅ Final answer:
There’s a 38.30% chance that a randomly selected car emits between 230 and 270 g/km.
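
Both probabilities in this section can be checked without a z-table. The sketch below assumes, as the exercise does, that emissions are normal with mean 250 and standard deviation 40, and uses `statistics.NormalDist` from the standard library.

```python
from statistics import NormalDist

# Assumed model from the exercise: emissions ~ Normal(250, 40), in g/km
emissions = NormalDist(mu=250, sigma=40)

# a. Upper tail: P(X > 280) = 1 - P(X < 280)
p_above_280 = 1 - emissions.cdf(280)

# b. Area between two values: P(230 < X < 270) = P(X < 270) - P(X < 230)
p_between = emissions.cdf(270) - emissions.cdf(230)

print(f"P(X > 280)       = {p_above_280:.4f}")
print(f"P(230 < X < 270) = {p_between:.4f}")
```

Passing the mean and standard deviation to `NormalDist` does the standardization internally, so no manual z-score is needed.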

### 4. Bonus: Fuel Efficiency and CO₂
Let’s say you also divide the 50 cars into two groups:

Group A (25 fuel-efficient cars): mean CO₂ = 230 g/km, std dev = 35 g/km

Group B (25 standard cars): mean CO₂ = 270 g/km, std dev = 30 g/km

You want to test whether the average CO₂ emissions of Group A and B differ significantly.

**a. What kind of test would you use here?**
Independent two-sample t-test (because you’re comparing the means of two independent groups with known sample statistics).

**b. State the null and alternative hypotheses.**
- H0: μ(group A) = μ(group B)
- H1: μ(group A) ≠ μ(group B)

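**c. Would the difference be considered statistically significant at a 0.05 level?**
Step 1: Use the formula for the two-sample t-test (equal n, unequal variances):

#### Two-Sample t-Test Formula (Unequal Variances)

t = (x1 - x2) / sqrt( (std1^2/n1) + (std2^2/n2) )
degrees of freedom: (n1 - 1) + (n2 - 1) as a simple approximation; with unequal variances, the Welch-Satterthwaite formula gives a slightly smaller value.

Step 2: Plug in the numbers:

t = (230 - 270) / sqrt( (35^2/25) + (30^2/25) ) = -40 / sqrt(85) ≈ -4.34
degrees of freedom: 24 + 24 = 48

Step 3: Compare with the critical value. For a two-tailed test at the 0.05 level with df = 48, the critical value is about ±2.01. Since |t| = 4.34 > 2.01, reject H0: the difference between the two groups is statistically significant.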
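
The statistic and degrees of freedom for Groups A and B can be computed as follows. This is a sketch, not a full t-test: it assumes the standard error is built from the variances (standard deviation squared) divided by the sample sizes, and the critical value of about 2.01 is read from a standard t-table for a two-tailed 0.05 test near 48 degrees of freedom rather than computed. Variable names are illustrative.

```python
from math import sqrt

# Group summaries from the problem statement
n_a, mean_a, sd_a = 25, 230.0, 35.0   # fuel-efficient cars
n_b, mean_b, sd_b = 25, 270.0, 30.0   # standard cars

# Squared standard error contributed by each group: s^2 / n
se2_a = sd_a**2 / n_a
se2_b = sd_b**2 / n_b

# Two-sample t statistic (unequal variances)
t_stat = (mean_a - mean_b) / sqrt(se2_a + se2_b)

# Simple degrees of freedom and the Welch-Satterthwaite approximation
df_simple = (n_a - 1) + (n_b - 1)
df_welch = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))

# Assumption: two-tailed critical value taken from a t-table (alpha = 0.05)
T_CRITICAL = 2.01
reject_h0 = abs(t_stat) > T_CRITICAL

print(f"t = {t_stat:.3f}, df (simple) = {df_simple}, df (Welch) = {df_welch:.1f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

With equal group sizes the t statistic is the same for the pooled and the unequal-variance versions; only the degrees of freedom differ, and here the difference is too small to change the decision.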

---

## 🧪 New Scenario: Battery Life of Electric Scooters

A company claims its new electric scooter model has an average battery life of 32 kilometers on a full charge.

You randomly sample 60 scooters, and the sample shows:

Sample mean: 30.5 km

Sample standard deviation: 5 km

### 1. Hypothesis Testing
You want to test the company’s claim at a 5% significance level.

**a. State the null and alternative hypotheses.**
- H0: μ = 32 km
- H1: μ ≠ 32 km


**b. Calculate the test statistic and determine whether to reject the null.**
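z-score: (30.5 - 32) / (5 / sqrt(60)) = -2.32

For a two-tailed test at the 5% level, the critical values are ±z(0.975) = ±1.96.

- Since |z| = 2.32 > 1.96, the z-score falls into the rejection region, so I can reject the null hypothesis.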

**c. What is the p-value, and how do you interpret it?**
Two-tailed p-value: 2 * P(Z < -2.32) = 2 * 0.0102 ≈ 0.02

- As the p-value is smaller than the significance level, I can reject the null hypothesis.
If the true mean really were 32 km, there would be only about a 2% chance of observing a sample mean at least this far from 32 km.
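
Because the alternative here is two-sided, the p-value counts both tails. A minimal standard-library sketch of that calculation, with illustrative variable names and the sample standard deviation used in place of σ:

```python
from math import sqrt
from statistics import NormalDist

# Summary statistics from the scenario
n = 60
sample_mean = 30.5   # km
sample_sd = 5.0      # km
mu0 = 32.0           # company's claimed mean, km
alpha = 0.05

std_normal = NormalDist()

z = (sample_mean - mu0) / (sample_sd / sqrt(n))

# Two-tailed test: split alpha across both tails for the critical value,
# and double the tail area beyond |z| for the p-value
z_critical = std_normal.inv_cdf(1 - alpha / 2)
p_value = 2 * (1 - std_normal.cdf(abs(z)))

print(f"z = {z:.3f}, critical values = ±{z_critical:.3f}, p-value = {p_value:.3f}")
print("Reject H0" if abs(z) > z_critical else "Fail to reject H0")
```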


### 2. Confidence Interval
Construct a 95% confidence interval for the true average battery life.

Margin of error: z(1 - alpha/2) * std / sqrt(n) = 1.96 * 0.6455 = 1.265

The confidence interval is 30.5 ± 1.265, i.e. from 29.23 to 31.77 km.

**a. Does the confidence interval support the company’s claim?**
No. The claimed mean of 32 km lies outside the interval, so the data do not support the company's claim.
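
The same interval calculation can be wrapped in a small helper so it is reusable across exercises. The function name `z_confidence_interval` is hypothetical (it is not from any library), and the sketch assumes a z-based interval with the sample standard deviation standing in for σ.

```python
from math import sqrt
from statistics import NormalDist


def z_confidence_interval(mean, sd, n, confidence=0.95):
    """Return (lower, upper) for a z-based confidence interval of the mean."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    margin = z_star * sd / sqrt(n)
    return mean - margin, mean + margin


# Scooter sample from the scenario
lower, upper = z_confidence_interval(mean=30.5, sd=5.0, n=60)
claimed_mean = 32.0  # km

print(f"95% CI = ({lower:.2f}, {upper:.2f})")
print("Claim inside interval:", lower <= claimed_mean <= upper)
```

A claimed mean that falls outside the 95% interval corresponds to rejecting it in a two-tailed test at the 5% level, which matches the result of the hypothesis test.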



### 3. Probability
Assume the battery life follows a normal distribution with a mean of 30.5 km and a standard deviation of 5 km (from the sample).

**a. What is the probability that a randomly chosen scooter lasts more than 35 km?**
First calculate the z-value of 35:

z = (35 - 30.5) / 5 = 0.9

P(Z < 0.9) = 0.8159 (this is the cumulative probability up to that point; since we want the probability above it, we take the complement)
1 - P(Z < 0.9) = 0.18406
From: https://datatab.net/tutorial/z-distribution

Note: Most online calculators will give you the upper tail directly: P(Z > 0.9) = 0.18406

So there is a probability of about 18.4% that a randomly chosen scooter will last more than 35 km.


**b. What’s the probability that a scooter lasts between 28 km and 33 km?**

Z(28) = (28-30.5)/5 = -0.5
Z(33) = (33-30.5)/5 = 0.5

P(z<0.5) = 0.6915
P(z<-0.5) = 0.3085

P(28<x<33) = 0.6915 - 0.3085 = 0.383

So there's a probability of about 38.3% that a scooter lasts between 28 km and 33 km.
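
Both scooter probabilities can be verified in a few lines. This sketch assumes the normal model stated in the exercise (mean 30.5 km, standard deviation 5 km) and shows the manual z-score route for part a and the direct route for part b.

```python
from statistics import NormalDist

# Assumed model from the exercise: battery life ~ Normal(30.5, 5), in km
battery = NormalDist(mu=30.5, sigma=5)
std_normal = NormalDist()  # standard normal for the z-score route

# a. Standardize 35 km, then take the upper tail: P(Z > z) = 1 - P(Z < z)
z_35 = (35 - battery.mean) / battery.stdev
p_over_35 = 1 - std_normal.cdf(z_35)

# b. Work directly on the original scale: P(28 < X < 33)
p_28_to_33 = battery.cdf(33) - battery.cdf(28)

print(f"z(35) = {z_35:.2f}, P(X > 35) = {p_over_35:.4f}")
print(f"P(28 < X < 33) = {p_28_to_33:.4f}")
```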


### 4. Two-Sample Comparison
Suppose you test another model of scooter (Model B) with this data:

Sample size = 60

Mean battery life = 33.2 km

Standard deviation = 6 km

You want to check if Model B has a significantly higher average battery life than Model A (the first one tested).

**a. What kind of test should you use?**
Independent Two-sample t-test

**b. State the null and alternative hypotheses.**
- `H0: μ(model A) = μ(model B)`
- `H1: μ(model A) < μ(model B)` --> we use a one-tailed test because we're testing whether Model B is better, not just different.

**c. Is the difference statistically significant at a 5% level?**
- t = (x1 - x2) / sqrt( (std1^2/n1) + (std2^2/n2) )
- degrees of freedom: (n1 - 1) + (n2 - 1)

(30.5 - 33.2) / sqrt( (25/60) + (36/60) ) = -2.678
degrees of freedom: 118

- From the t-distribution table, the one-tailed critical value is t(0.05, df=118) = 1.658. Because H1 is μ(model A) < μ(model B) and t is computed as A minus B, the rejection region is t < -1.658.

- Decision rule
    - If the t-statistic < -1.658 → reject H₀.
    - If not → fail to reject H₀.

- Here t = -2.678 < -1.658 (one-tailed p-value of roughly 0.004), so you reject the null hypothesis. This means the battery life of Model B is significantly higher than that of Model A at the 5% significance level.
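
The one-tailed comparison can be sketched in plain Python. The critical value 1.658 is the table value quoted above, not something the code computes, and the Welch-Satterthwaite degrees of freedom are included only to show that they stay close to 118. Variable names are illustrative.

```python
from math import sqrt

# Sample summaries from the scenario
n_a, mean_a, sd_a = 60, 30.5, 5.0   # Model A
n_b, mean_b, sd_b = 60, 33.2, 6.0   # Model B

# Squared standard error contributed by each sample: s^2 / n
se2_a = sd_a**2 / n_a
se2_b = sd_b**2 / n_b

# t statistic computed as A minus B, so "B is higher" gives a negative t
t_stat = (mean_a - mean_b) / sqrt(se2_a + se2_b)

df_simple = (n_a - 1) + (n_b - 1)
df_welch = (se2_a + se2_b) ** 2 / (se2_a**2 / (n_a - 1) + se2_b**2 / (n_b - 1))

# One-tailed critical value from the t-table (alpha = 0.05, df = 118)
T_CRITICAL_ONE_TAILED = 1.658

# Left-tailed rejection region because H1 is mu_A < mu_B
reject_h0 = t_stat < -T_CRITICAL_ONE_TAILED

print(f"t = {t_stat:.3f}, df (simple) = {df_simple}, df (Welch) = {df_welch:.1f}")
print("Reject H0" if reject_h0 else "Fail to reject H0")
```

The sign convention matters: swapping the order to B minus A flips the sign of t, and the rejection region then becomes t > 1.658.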

