### One-sample t-test

The one-sample t-test is used when the population standard deviation $\sigma$ is unknown and is estimated by the sample standard deviation *S*.

$t\text{-statistic} = \frac{\bar{X} - \mu}{S / \sqrt{n}}$
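
To make the formula concrete, here is a minimal sketch that computes the statistic by hand and cross-checks it against SciPy. The sample values and the hypothesized mean `mu_0` are made up for illustration and are not taken from the movie data.

```python
import numpy as np
from scipy import stats

# Hypothetical sample, for illustration only (not the movie data)
sample = np.array([12.1, 9.8, 11.4, 10.6, 13.0, 9.5, 10.9, 11.7])
mu_0 = 10.0  # hypothesized population mean (assumed value)

n = sample.size
x_bar = sample.mean()
s = sample.std(ddof=1)  # sample standard deviation (n - 1 in the denominator)

# t-statistic = (sample mean - hypothesized mean) / (S / sqrt(n))
t_manual = (x_bar - mu_0) / (s / np.sqrt(n))

# Cross-check against SciPy's one-sample t-test
t_scipy, _ = stats.ttest_1samp(sample, mu_0)
print(t_manual, t_scipy)
```

Note the `ddof=1` argument: NumPy's default `std()` divides by *n*, whereas the t-statistic needs the sample standard deviation, which divides by *n* - 1.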


For example, Aravind Productions (AP) is a newly formed movie production house based in Mumbai, India. AP wants to understand the cost of producing a Bollywood movie. The industry believes that a production house will require at least INR 500 million (50 crore) on average. Bollywood movie production cost is assumed to follow a normal distribution. The production costs of 40 Bollywood movies, in millions of rupees, are given in the file *bollywoodmovies.csv*. Conduct an appropriate hypothesis test at $\alpha$ = 0.05 to check whether the belief about the average production cost is correct.


In [None]:
import pandas as pd
# Read the CSV file into a DataFrame
df_bollywood_movies = pd.read_csv('bollywoodmovies.csv')
print(df_bollywood_movies)

$H_{0}$: $\mu =$ 500

$H_{A}$: $\mu \ne$ 500


*scipy.stats.ttest_1samp()* can be used to perform this test. Its two main parameters are:

- a : array_like - the sample observations
- popmean : float - the expected value of the mean under the null hypothesis.

In [12]:
# Perform the one-sample t-test
from scipy import stats
# Assuming 'production_cost' is the column we want to perform the t-test on
t_statistic, p_value = stats.ttest_1samp(df_bollywood_movies['production_cost'], 500)
print(f"T-statistic: {t_statistic}, P-value: {p_value}")

# Compare the p-value with the significance level alpha = 0.05
alpha = 0.05
if p_value < alpha:
    print("Reject the null hypothesis: The mean production cost is significantly different from 500.")
else:
    print("Fail to reject the null hypothesis: The mean production cost is not significantly different from 500.")

T-statistic: -2.2845532872667547, P-value: 0.027862556406761777
Reject the null hypothesis: The mean production cost is significantly different from 500.
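
As a rough check on how the p-value follows from the t-statistic, the sketch below recomputes it from the t-distribution with *n* - 1 degrees of freedom. It assumes the file contains exactly the 40 observations stated in the problem, with no missing values; under that assumption the result should be close to the p-value reported above.

```python
from scipy import stats

# Values taken from the example above
t_statistic = -2.2845532872667547
n = 40            # number of movies stated in the problem (assumed to match the file)
dof = n - 1       # degrees of freedom for a one-sample t-test
alpha = 0.05

# Two-sided p-value: probability of a |t| at least this large under H0
p_two_sided = 2 * stats.t.sf(abs(t_statistic), dof)

# Equivalent decision rule: compare |t| with the two-sided critical value
t_critical = stats.t.ppf(1 - alpha / 2, dof)
reject_h0 = abs(t_statistic) > t_critical

print(f"p-value: {p_two_sided:.4f}, critical value: {t_critical:.4f}, reject H0: {reject_h0}")
```

Comparing the p-value with $\alpha$ and comparing |t| with the critical value are two views of the same decision, so both lead to the same conclusion.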
