
You are a data analyst working for a company that manufactures batteries. The company claims that their batteries have an average lifespan of 500 hours. You suspect that the actual lifespan is lower, so you decide to test this claim.


---



**Given:**

Sample Size (𝑛): 35 batteries

Significance Level (𝛼): 0.05

**Hypotheses**:
*   Null Hypothesis ($H_0$): $\mu \geq 500$ hours (The average battery lifespan is at least 500 hours)
*   Alternative Hypothesis ($H_1$): $\mu < 500$ hours (The average battery lifespan is less than 500 hours)

Because $H_1$ only considers lifespans below 500 hours, we need a **left-tailed test**.

Hypothesized mean under $H_0$: 𝜇 = 500 hours



---


The data for each battery is in this data file:


In [1]:
import pandas as pd
df = pd.read_csv('/content/battery_lifespan_large_data.csv')
df

Unnamed: 0,Battery Lifespan (hours)
0,496.109226
1,483.490064
2,498.75698
3,488.993613
4,492.083063
5,488.982934
6,513.522782
7,494.865028
8,484.422891
9,503.225449




---


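We were not given the population standard deviation (sigma), so we estimate it with the **sample standard deviation**, calculated from the dataframe above using the sample mean of the battery lifespans. Because sigma is unknown, the test statistic follows a t distribution rather than a normal one.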

In [4]:
# getting all the battery life values into a variable:

batterylife_values = df['Battery Lifespan (hours)'].values
blv = batterylife_values

# calculating the mean of all blv values:
mean_blv = blv.mean()
print(f'The sample mean (x̄) of our 35 batteries is:\n\t{round(mean_blv, 3)} hours')

The sample mean (x̄) of our 35 batteries is:
	493.517 hours


With the sample mean (x̄) now known, let's get the sample standard deviation (sigma bar):

$\bar{\sigma} = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n-1}}$
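As a quick sanity check on the formula above, the sketch below compares a hand-written n - 1 computation with Python's built-in `statistics.stdev`, which uses the same divisor. The five values are made up for illustration and are not taken from the battery dataset.

```python
import math
import statistics

# Hypothetical toy sample (NOT the battery dataset)
toy_sample = [496.0, 483.5, 498.8, 489.0, 492.1]

n = len(toy_sample)
mean = sum(toy_sample) / n
squared_deviations = sum((x - mean) ** 2 for x in toy_sample)

# Sample standard deviation: divide by n - 1
manual_sample_sd = math.sqrt(squared_deviations / (n - 1))
# Population standard deviation: divide by n
manual_population_sd = math.sqrt(squared_deviations / n)

print(manual_sample_sd, statistics.stdev(toy_sample))
print(manual_population_sd, statistics.pstdev(toy_sample))
```

The sample version is always slightly larger than the population version, because it divides the same sum by a smaller number.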


In [14]:
import math

# sum of squared deviations from the sample mean
deviation_sum = 0
for value in blv:
    deviation_sum += (value - mean_blv) ** 2

# population standard deviation (divides by n), kept for comparison only
psd = math.sqrt(deviation_sum / len(blv))

# sample standard deviation (divides by n - 1, matching the formula above)
ssd = math.sqrt(deviation_sum / (len(blv) - 1))

print(f"\nThe Sample standard deviation (sigma bar) of the dataset is:\n\t{round(ssd, 3)}")


The Sample standard deviation (sigma bar) of the dataset is:
	8.868


So $\bar{\sigma}$ = 8.868

$\bar{\sigma}$ = S



---


Now that we have S we can calculate:
*   $S_{\bar{x}}$ = $\frac{S}{\sqrt{n}}$
*   t score = $\frac{\bar{x} - \mu}{S_{\bar{x}}}$

In [8]:
# calculating S sub x bar (the standard error of the mean):
ssubxbar = ssd / math.sqrt(len(blv))
print(f'\nS sub x bar =\n\t{round(ssubxbar, 3)}')



S sub x bar =
	1.499


So $S_{\bar{x}}$ = 1.499

With that we can go on and calculate the t score:

In [9]:
# calculating the t score:

mu = 500  # hypothesized mean under the null hypothesis

tscore = (mean_blv - mu) / ssubxbar
print(f'\nThe t score =\n\t{round(tscore, 3)}')



The t score =
	-4.325


So our t value = -4.325
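The manual steps above can be cross-checked against SciPy's one-sample t-test. The sketch below is a minimal example on made-up numbers, not the battery dataset. It assumes SciPy 1.6 or later, where `ttest_1samp` accepts the `alternative` argument.

```python
import math
import statistics
from scipy import stats

# Hypothetical toy sample (NOT the battery dataset)
toy_sample = [496.0, 483.5, 498.8, 489.0, 492.1, 489.0, 513.5, 494.9]
mu_0 = 500  # hypothesized mean under H0

# Manual route: standard error, then t score
standard_error = statistics.stdev(toy_sample) / math.sqrt(len(toy_sample))
manual_t = (statistics.mean(toy_sample) - mu_0) / standard_error

# SciPy route: alternative='less' makes it a left-tailed test
result = stats.ttest_1samp(toy_sample, popmean=mu_0, alternative='less')

print(manual_t, result.statistic, result.pvalue)
```

If the two t values agree, the standard deviation and standard error were most likely computed with the right divisor.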



---



We can now use scipy, instead of a printed t table, to get the critical value for our significance level and degrees of freedom (n - 1 = 34).

In [10]:
# getting the critical value from the t score table:

from scipy.stats import t

alpha = 0.05  # Significance level
n = len(blv)  # Sample size
d_f = n - 1  # Degrees of freedom

# For a left-tailed test:
critical_value = t.ppf(alpha, d_f)

print(f'The critical_value is:\n\t{round(critical_value, 3)}')

The critical_value is:
	-1.691
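An equivalent way to reach the decision is to compare a p-value with alpha instead of comparing the t score with the critical value. The sketch below uses `t.cdf` for the left-tail area. The `t_stat` value is a hypothetical placeholder chosen for illustration, not the notebook's result.

```python
from scipy.stats import t

alpha = 0.05
d_f = 34        # n - 1 for n = 35, as in the cell above
t_stat = -2.0   # hypothetical t score, for illustration only

critical_value = t.ppf(alpha, d_f)  # left-tail cutoff
p_value = t.cdf(t_stat, d_f)        # left-tail area below t_stat

# Both rules lead to the same decision
print(t_stat < critical_value, p_value < alpha)
```

For a left-tailed test the two rules always agree, since `t.cdf` and `t.ppf` are inverses of each other.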




---


With both the t value and the critical value, we can compare them and reach a conclusion:

# CONCLUSIONS:  

The t value computed above is far below the critical value of -1.691. Because this is a **left-tailed test**, the rejection region lies to the left of the critical value, so the t value falls well **inside the rejection region**.
Hence, at the 5% significance level, the null hypothesis **is rejected**.

Based on the sample of 35 batteries, there is sufficient evidence to conclude that the **average lifespan of a battery is less than 500 hours.** A hypothesis test does not prove this with certainty: a true null hypothesis would still be rejected in up to 5% of samples.
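The decision rule used here can be written as a small helper. This is only a sketch: the function name `left_tailed_decision` is hypothetical and the t scores passed to it are illustrative values. Only the critical value -1.691 comes from the notebook.

```python
def left_tailed_decision(t_stat: float, critical_value: float) -> str:
    """Return the decision for a left-tailed test."""
    # In a left-tailed test the rejection region is everything
    # below the (negative) critical value.
    if t_stat < critical_value:
        return "reject H0"
    return "fail to reject H0"


print(left_tailed_decision(-2.5, -1.691))  # t score inside the rejection region
print(left_tailed_decision(-1.0, -1.691))  # t score outside the rejection region
```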

## Recommendations:  

*   a. Production Costs vs. Battery Lifespan:  
Analyzing the correlation between production costs and battery lifespan could indicate whether lower production costs strongly reduce lifespan. This could help the production team refine its components and processes. 

*   b. Competitors' Batteries:  
Assessing the lifespan of competitors' batteries, if not already done, would help the business strategy team refine our market positioning and product line. 

*   c. Integrate Other Data Sources:  
Combine this analysis with other relevant data sources, such as pricing, promotions, or product availability, to understand the broader context. This multi-variable analysis can help pinpoint additional factors influencing sales and optimize the overall strategy accordingly.

---

