# Hypothesis 1: Effect of Inflation on Average Movie Revenue


*   **Null Hypothesis (H₀):** There is no significant difference in average movie revenue between low-inflation and high-inflation years.
*   **Alternative Hypothesis (H₁):** There is a significant difference in average movie revenue between low-inflation and high-inflation years.

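A two-sample independent t-test (Welch's variant, which does not assume equal variances) was performed to compare movie revenues between low- and high-inflation years. Films were assigned to the low group when their `mean_inflation` value fell below the sample mean of 7.03%, and to the high group otherwise.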

**Results:**
*   Mean Revenue (Low Inflation): $106.78 M
*   Mean Revenue (High Inflation): $100.60 M
*   t-statistic: 0.698
*   p-value: 0.485


**Conclusion:**
Since the p-value (0.485) is greater than 0.05, we fail to reject the null hypothesis.
There is no statistically significant evidence of a difference in average movie revenue between low- and high-inflation years in the 2000–2010 period.
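
To make the test statistic less of a black box, the sketch below computes Welch's t, the Welch-Satterthwaite degrees of freedom, and the two-sided p-value by hand, then compares them with `ttest_ind(..., equal_var=False)`. The revenues here are synthetic (drawn from a log-normal distribution chosen only for illustration), and `welch_t` is a hypothetical helper, not part of the notebook.

```python
import numpy as np
from scipy.stats import ttest_ind, t as t_dist

# Synthetic revenues in millions of dollars (illustrative only, not the movie dataset).
rng = np.random.default_rng(0)
low = rng.lognormal(mean=4.0, sigma=1.0, size=400)
high = rng.lognormal(mean=4.0, sigma=1.0, size=150)


def welch_t(a, b):
    """Return (t, df, p) for a two-sided Welch's unequal-variance t-test."""
    # Squared standard error of each group mean.
    va = a.var(ddof=1) / a.size
    vb = b.var(ddof=1) / b.size
    t_stat = (a.mean() - b.mean()) / np.sqrt(va + vb)
    # Welch-Satterthwaite approximation to the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (a.size - 1) + vb ** 2 / (b.size - 1))
    p = 2 * t_dist.sf(abs(t_stat), df)
    return t_stat, df, p


t_manual, df_manual, p_manual = welch_t(low, high)
t_scipy, p_scipy = ttest_ind(low, high, equal_var=False)

print(f"manual: t={t_manual:.3f}, df={df_manual:.1f}, p={p_manual:.3f}")
print(f"scipy:  t={t_scipy:.3f}, p={p_scipy:.3f}")
```

The two results should agree to floating-point precision, which confirms that the `equal_var=False` option corresponds to the unequal-variance formula rather than the pooled-variance one.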

In [4]:
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import ttest_ind, chi2_contingency

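# Load the cleaned movie dataset.
movies_df = pd.read_csv('/content/Movies_CLEANED.csv')

# Split point: the mean of the inflation column across all films.
threshold = movies_df['mean_inflation'].mean()

# Revenue for films below the threshold (low) and at or above it (high).
low_inflation_revenue = movies_df.loc[movies_df['mean_inflation'] < threshold, 'revenue']
high_inflation_revenue = movies_df.loc[movies_df['mean_inflation'] >= threshold, 'revenue']

# Welch's t-test: equal_var=False drops the equal-variance assumption.
t_stat, p_value = ttest_ind(low_inflation_revenue, high_inflation_revenue, equal_var=False)

# The threshold is printed as a percentage; revenue is converted to millions of dollars.
threshold_percent = threshold
low_mean = low_inflation_revenue.mean() / 1e6
high_mean = high_inflation_revenue.mean() / 1e6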

print("\n--- Hypothesis 1 Test Results ---")
print(f"Inflation Threshold: {threshold_percent:.2f}%")
print(f"Mean Revenue (Low Inflation): ${low_mean:.2f}M")
print(f"Mean Revenue (High Inflation): ${high_mean:.2f}M")
print(f"t-statistic: {t_stat:.3f}")
print(f"p-value: {p_value:.3f}")



--- Hypothesis 1 Test Results ---
Inflation Threshold: 7.03%
Mean Revenue (Low Inflation): $106.78M
Mean Revenue (High Inflation): $100.60M
t-statistic: 0.698
p-value: 0.485


# Hypothesis 2: Effect of Inflation on Genre Distribution


*   **Null Hypothesis (H₀):** Genre distribution is independent of inflation level.
*   **Alternative Hypothesis (H₁):** Genre distribution is dependent on inflation level.



I selected the five most frequent movie genres and constructed a contingency table of their frequencies across low- and high-inflation years.
A chi-square test of independence was then applied to examine the relationship between inflation level and genre distribution.
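
A minimal sketch of how such a contingency table comes out of a list-valued genre column is shown below. The three films and their titles are hypothetical toy data; only the `explode` and `crosstab` steps mirror the analysis.

```python
import pandas as pd

# Hypothetical toy data: three films, one of which has two genres.
toy = pd.DataFrame({
    "title": ["A", "B", "C"],
    "genres": [["Drama", "Romance"], ["Drama"], ["Action"]],
    "Inflation Level": ["Low", "High", "Low"],
})

# explode() emits one row per (film, genre) pair.
exploded = toy.explode("genres")

# Rows are inflation levels, columns are genres, cells are counts.
table = pd.crosstab(exploded["Inflation Level"], exploded["genres"])

print(table)
print("films:", len(toy), "| counted observations:", int(table.to_numpy().sum()))
```

Because a film with several genres is counted once per genre, the table can contain more observations than there are films. The chi-square test treats each count as an independent observation, so this is worth keeping in mind when reading the result.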

**Results:**
*   Chi-square statistic: 6.24
*   Degrees of Freedom: 4
*   p-value: 0.182

**Conclusion:**
Since the p-value (0.182) is greater than 0.05, we fail to reject the null hypothesis.
There is no statistically significant evidence that the distribution of the five most frequent genres differs between low- and high-inflation years in the 2000–2010 period.
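
As a check, the statistic can be recomputed directly from the contingency table printed in the cell output. The sketch below hard-codes those counts (so it does not need the CSV file) and also reports the expected counts and each genre's contribution to the statistic; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import chi2_contingency

genres = ["Action", "Comedy", "Drama", "Romance", "Thriller"]

# Observed counts copied from the notebook output.
observed = np.array([
    [112, 163, 188, 101, 116],   # High inflation
    [305, 470, 537, 213, 353],   # Low inflation
])

chi2, p_value, dof, expected = chi2_contingency(observed)

# Per-cell contribution to the statistic: (observed - expected)^2 / expected.
contributions = (observed - expected) ** 2 / expected

print(f"chi2={chi2:.3f}, dof={dof}, p={p_value:.3f}")
print(f"smallest expected count: {expected.min():.1f}")
for name, share in zip(genres, contributions.sum(axis=0)):
    print(f"{name:<9} contributes {share:.3f}")
```

Every expected count is well above 5, so the chi-square approximation is reasonable for this table. Most of the statistic comes from the Romance column, but the overall test still falls short of significance at the 0.05 level.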

In [5]:
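import ast

import pandas as pd
from scipy.stats import chi2_contingency

movies_df = pd.read_csv('/content/Movies_CLEANED.csv')

# Label each film by whether its inflation value is below the sample mean.
threshold = movies_df['mean_inflation'].mean()
movies_df['Inflation Level'] = movies_df['mean_inflation'].apply(lambda x: 'Low' if x < threshold else 'High')

# Genres may be stored as stringified lists; parse them with ast.literal_eval rather than eval.
if isinstance(movies_df['genres'].iloc[0], str):
    movies_df['genres'] = movies_df['genres'].apply(ast.literal_eval)

# One row per (film, genre) pair, with surrounding whitespace removed from genre names.
movies_exploded = movies_df.explode('genres')
movies_exploded['genres'] = movies_exploded['genres'].str.strip()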


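# Keep only the five most frequent genres.
top_genres = movies_exploded['genres'].value_counts().head(5).index
filtered_df = movies_exploded[movies_exploded['genres'].isin(top_genres)]

# Rows: inflation level; columns: genre; cells: number of (film, genre) pairs.
contingency_table = pd.crosstab(filtered_df['Inflation Level'], filtered_df['genres'])

# Chi-square test of independence between inflation level and genre.
chi2, p_value, dof, expected = chi2_contingency(contingency_table)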

print("--- Hypothesis 2 Test Results ---")
print(contingency_table)
print(f"\nChi-square statistic: {chi2:.3f}")
print(f"Degrees of Freedom: {dof}")
print(f"p-value: {p_value:.3f}")

--- Hypothesis 2 Test Results ---
genres           Action  Comedy  Drama  Romance  Thriller
Inflation Level                                          
High                112     163    188      101       116
Low                 305     470    537      213       353

Chi-square statistic: 6.235
Degrees of Freedom: 4
p-value: 0.182
