## Introduction to Hypothesis Testing

Hypothesis testing is a statistical method for drawing inferences about population parameters from sample data. It helps us decide whether an observed effect is statistically significant or could plausibly have arisen by chance alone.
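To make "arisen by chance" concrete, here is a minimal sketch of a permutation test. This resampling technique is swapped in purely for illustration; the notebook itself uses a t-test below. The two samples are hypothetical, generated with NumPy. Shuffling the pooled values simulates a world where the null hypothesis holds, and the p-value is the share of shuffles that produce a difference at least as large as the observed one.

```python
import numpy as np

# Hypothetical data: two small samples invented for illustration only.
rng = np.random.default_rng(1)
group_a = rng.normal(175, 5, 30)
group_b = rng.normal(180, 5, 30)

observed_diff = group_b.mean() - group_a.mean()

# Under the null hypothesis the group labels are interchangeable, so shuffle
# them many times and record the difference in means each shuffle produces.
pooled = np.concatenate([group_a, group_b])
n_a = len(group_a)
n_perm = 10_000
perm_diffs = np.empty(n_perm)
for i in range(n_perm):
    shuffled = rng.permutation(pooled)
    perm_diffs[i] = shuffled[n_a:].mean() - shuffled[:n_a].mean()

# Two-sided p-value: share of shuffles at least as extreme as the observed
# difference (the +1 terms count the observed arrangement itself).
p_value = (np.sum(np.abs(perm_diffs) >= abs(observed_diff)) + 1) / (n_perm + 1)
print("observed difference:", observed_diff)
print("permutation p-value:", p_value)
```

A small p-value means label shuffles rarely reproduce a gap as large as the real one, which is evidence against "no difference".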

In [None]:
import numpy as np
from scipy import stats

In [None]:
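np.random.seed(0)  # fix the seed so the simulated data are reproducible
# Simulate heights for two groups of 100 people each: means 175 and 180, standard deviation 5
groupA_heights = np.random.normal(175, 5, 100)
groupB_heights = np.random.normal(180, 5, 100)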

In [None]:
# Independent two-sample t-test (Student's version; assumes equal variances by default)
t_statistic, p_value = stats.ttest_ind(groupA_heights, groupB_heights)
print("t-statistic:", t_statistic)
print("p-value:", p_value)
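To see what `stats.ttest_ind` computes by default, the sketch below recomputes the pooled-variance (Student's) t-statistic and its two-sided p-value by hand on the same simulated heights and compares them with SciPy's result. The intermediate variable names are illustrative.

```python
import numpy as np
from scipy import stats

np.random.seed(0)
groupA_heights = np.random.normal(175, 5, 100)
groupB_heights = np.random.normal(180, 5, 100)

n_a, n_b = len(groupA_heights), len(groupB_heights)
mean_a, mean_b = groupA_heights.mean(), groupB_heights.mean()
# Sample variances with ddof=1 (unbiased estimator)
var_a = groupA_heights.var(ddof=1)
var_b = groupB_heights.var(ddof=1)

# Pooled variance: the default ttest_ind (equal_var=True) assumes one shared variance
pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2)
t_manual = (mean_a - mean_b) / np.sqrt(pooled_var * (1 / n_a + 1 / n_b))

# Two-sided p-value from the t distribution with n_a + n_b - 2 degrees of freedom
df = n_a + n_b - 2
p_manual = 2 * stats.t.sf(abs(t_manual), df)

t_scipy, p_scipy = stats.ttest_ind(groupA_heights, groupB_heights)
print("manual t:", t_manual, "scipy t:", t_scipy)
print("manual p:", p_manual, "scipy p:", p_scipy)
```

The negative t-statistic reflects that group A's mean is smaller; the sign depends only on the order of the arguments.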

## Setting Up Null and Alternative Hypotheses

* Null Hypothesis (H0): This represents the default assumption, typically stating no effect or no difference.

* Alternative Hypothesis (H1 or Ha): This is the claim we look for evidence of, stating that an effect or difference exists. We reject H0 in favor of H1 only when the observed data would be sufficiently unlikely if H0 were true.
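The choice of H1 changes the test. A minimal sketch on the simulated heights: the default two-sided alternative (the means differ) versus a one-sided alternative (group A's mean is smaller), with a conventional significance level of `alpha = 0.05` assumed for the decision. The `alternative` argument of `stats.ttest_ind` requires SciPy 1.6 or later.

```python
import numpy as np
from scipy import stats

np.random.seed(0)
groupA_heights = np.random.normal(175, 5, 100)
groupB_heights = np.random.normal(180, 5, 100)

alpha = 0.05  # significance level, fixed before looking at the data

# H0: mean(A) == mean(B);  H1 (two-sided): mean(A) != mean(B)
_, p_two_sided = stats.ttest_ind(groupA_heights, groupB_heights)

# H0: mean(A) >= mean(B);  H1 (one-sided): mean(A) < mean(B)
_, p_one_sided = stats.ttest_ind(groupA_heights, groupB_heights, alternative="less")

for name, p in [("two-sided", p_two_sided), ("one-sided (less)", p_one_sided)]:
    decision = "reject H0" if p < alpha else "fail to reject H0"
    print(f"{name}: p = {p:.3g} -> {decision}")
```

When the t-statistic points in the hypothesized direction, the one-sided p-value is half the two-sided one.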

## Introduction to t-tests and chi-square tests

* T-tests: T-tests are used to determine if there is a significant difference between the means of two groups.

* Chi-square tests: Chi-square tests of independence assess whether two categorical variables are associated, using a contingency table of observed counts.
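The default `stats.ttest_ind` call used in this notebook is Student's t-test, which assumes both groups share a common variance. When that is doubtful, passing `equal_var=False` switches to Welch's t-test. A minimal sketch on hypothetical samples with different spreads and sizes, checking Welch's statistic against its formula:

```python
import numpy as np
from scipy import stats

# Hypothetical samples with clearly different spreads and sizes
rng = np.random.default_rng(42)
sample_1 = rng.normal(50, 2, 40)
sample_2 = rng.normal(52, 10, 25)

# Student's t-test (default): assumes one shared population variance
t_student, p_student = stats.ttest_ind(sample_1, sample_2)

# Welch's t-test: drops the equal-variance assumption
t_welch, p_welch = stats.ttest_ind(sample_1, sample_2, equal_var=False)

# Welch's statistic by hand: each group's variance is divided by its own size
se_welch = np.sqrt(sample_1.var(ddof=1) / len(sample_1) + sample_2.var(ddof=1) / len(sample_2))
t_welch_manual = (sample_1.mean() - sample_2.mean()) / se_welch

print("Student:", t_student, p_student)
print("Welch:  ", t_welch, p_welch)
```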

In [None]:
import numpy as np
from scipy.stats import chi2_contingency

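# 2x3 contingency table of observed counts: rows are the categories of one
# variable, columns the categories of the other
values = np.array([[10, 20, 30], [6, 9, 17]])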
chi2, p, dof, expected = chi2_contingency(values)

print("Chi-square statistic:", chi2)
print("p-value:", p)
print("Degrees of freedom:", dof)
print("Expected frequencies:")
print(expected)
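The numbers `chi2_contingency` returns can be reproduced by hand. Each expected count is row total × column total / grand total, and the statistic sums (observed − expected)² / expected over all cells. This table has (2 − 1) × (3 − 1) = 2 degrees of freedom, so SciPy applies no Yates continuity correction (it only does so when the degrees of freedom equal 1), and the manual values should match exactly.

```python
import numpy as np
from scipy import stats

observed = np.array([[10, 20, 30], [6, 9, 17]])

# Expected count per cell under independence: row total * column total / grand total
row_totals = observed.sum(axis=1, keepdims=True)
col_totals = observed.sum(axis=0, keepdims=True)
expected_manual = row_totals * col_totals / observed.sum()

# Pearson chi-square statistic, degrees of freedom (rows - 1) * (cols - 1), and p-value
stat_manual = ((observed - expected_manual) ** 2 / expected_manual).sum()
dof_manual = (observed.shape[0] - 1) * (observed.shape[1] - 1)
p_manual = stats.chi2.sf(stat_manual, dof_manual)

stat_scipy, p_scipy, dof_scipy, expected_scipy = stats.chi2_contingency(observed)
print("manual:", stat_manual, p_manual, dof_manual)
print("scipy: ", stat_scipy, p_scipy, dof_scipy)
```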

Next, try the same ideas on a bigger dataset loaded from `data.csv`.

In [None]:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import r2_score

# Assumes data.csv with 'Duration' and 'Calories' columns is in the working directory
data = pd.read_csv('data.csv')
data


In [None]:
data.dropna(inplace=True)

# Predict Calories from Duration; scikit-learn expects a 2-D feature array
x = data['Duration'].values.reshape(-1, 1)
y = data['Calories']

# Hold out 20% of the rows to evaluate the model
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(x_train, y_train)

# R² on the held-out test set
y_pred = model.predict(x_test)
r_2 = r2_score(y_test, y_pred)
print("R²:", r_2)
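Regression also has a built-in hypothesis test: H0 says the slope is zero (Duration does not predict Calories linearly). `scipy.stats.linregress` reports the p-value for that test. Because `data.csv` is not reproduced here, the sketch uses hypothetical synthetic stand-ins for the two columns. Note that this R² is computed on all rows (in-sample), whereas the cell above measures R² on a held-out test split, so the two are not directly comparable.

```python
import numpy as np
from scipy import stats

# Hypothetical stand-ins for the Duration and Calories columns (not the real data.csv)
rng = np.random.default_rng(7)
duration = rng.uniform(15, 120, 200)
calories = 5 * duration + rng.normal(0, 40, 200)

# linregress tests H0: slope == 0 against a two-sided alternative
result = stats.linregress(duration, calories)
print("slope:", result.slope)
print("intercept:", result.intercept)
print("R² (in-sample):", result.rvalue ** 2)
print("p-value for slope:", result.pvalue)
```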

In [None]:
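# chi2_contingency expects a contingency table of counts for categorical
# variables, not raw numeric columns. Duration and Calories are numeric, so
# bin each into quartile categories first (quartiles are an illustrative
# choice; other binnings give different results) and cross-tabulate.
duration_bins = pd.qcut(data['Duration'], q=4, duplicates='drop')
calorie_bins = pd.qcut(data['Calories'], q=4, duplicates='drop')
table = pd.crosstab(duration_bins, calorie_bins)

chi2, p, dof, expected = chi2_contingency(table)

print("Chi-square statistic:", chi2)
print("p-value:", p)
print("Degrees of freedom:", dof)
print("Expected frequencies:")
print(expected)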
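Because Duration and Calories are both numeric, a correlation test is another way to ask whether they are related, and it avoids choosing bins. A minimal sketch with `scipy.stats.pearsonr`, whose H0 is "no linear correlation". The data below are hypothetical synthetic stand-ins, not the contents of `data.csv`.

```python
import numpy as np
from scipy import stats

# Hypothetical numeric columns standing in for Duration and Calories
rng = np.random.default_rng(3)
duration = rng.uniform(15, 120, 150)
calories = 4 * duration + rng.normal(0, 60, 150)

# H0: the two variables have zero linear (Pearson) correlation
r, p = stats.pearsonr(duration, calories)
print("Pearson r:", r)
print("p-value:", p)
```

Pearson's r only captures linear association; a strong non-linear relationship can still give a small r.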