# Hypothesis Testing 

This hypothesis test is a preliminary check on whether the stock's daily price direction shows a statistically significant bias, that is, whether days that close above the open occur more or less often than half the time. A significant bias would indicate a pattern that the classification model could build on, while its absence means the model has to find its signal in other features. Either way, the result provides a reference point for predicting price direction in a highly volatile and complex stock market.



## Define the Hypotheses
- **Null Hypothesis (H0)**: The proportion of days where the closing price is higher than the opening price (*p*) is equal to 0.50 (*p* = 0.50)
- **Alternative Hypothesis (H1)**: The proportion of days where the closing price is higher than the opening price (*p*) is not equal to 0.50 (*p* &ne; 0.50)

This is a two-tailed test since we are interested in deviations in both directions.

## Collect Data 
Use the provided datasets (`AMZN_train.csv`, `AMZN_val.csv`, `AMZN_test.csv`) to calculate the proportion of days where the closing price (*Close*) is higher than the opening price (*Open*).

In [1]:
import pandas as pd 

# Load the train, validation, and test splits
df_train = pd.read_csv('/Users/DELL/Desktop/Projects/Prediciton of Stock Price/myworkspace/datasets/AMZN_train.csv')
df_val = pd.read_csv('/Users/DELL/Desktop/Projects/Prediciton of Stock Price/myworkspace/datasets/AMZN_val.csv')
df_test = pd.read_csv('/Users/DELL/Desktop/Projects/Prediciton of Stock Price/myworkspace/datasets/AMZN_test.csv')

print(type(df_train), type(df_val), type(df_test))



<class 'pandas.core.frame.DataFrame'> <class 'pandas.core.frame.DataFrame'> <class 'pandas.core.frame.DataFrame'>


In [2]:
# Concatenate the three splits into one sample; ignore_index avoids duplicate row labels
df_combined = pd.concat([df_train, df_val, df_test], ignore_index=True)



## Compute the Observed Proportion 

In [3]:
# Compute the observed proportion (p_obs)
p_obs = (df_combined['Close'] > df_combined['Open']).mean()
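
One detail of this definition is worth making explicit: because the comparison is strict (`Close > Open`), a day that closes exactly at its opening price counts as "not higher". The sketch below uses a small made-up list of (open, close) pairs, not the AMZN data, to show how up, down, and flat days could be tallied separately. The function name `direction_counts` and the sample values are hypothetical.

```python
def direction_counts(prices):
    """Tally up, down, and flat days from (open, close) pairs."""
    counts = {"up": 0, "down": 0, "flat": 0}
    for open_price, close_price in prices:
        if close_price > open_price:
            counts["up"] += 1
        elif close_price < open_price:
            counts["down"] += 1
        else:
            counts["flat"] += 1
    return counts


# Illustrative values only (not taken from the AMZN datasets)
toy_prices = [(10.0, 10.5), (10.5, 10.2), (10.2, 10.2), (10.2, 10.9)]
counts = direction_counts(toy_prices)
n_days = sum(counts.values())

# Same definition as the notebook: strict "close above open" over all days,
# so the flat day lowers the proportion just like a down day would
p_up = counts["up"] / n_days
print(counts, p_up)
```

If flat days turn out to be common in a dataset, it may be worth reporting them alongside the proportion, since they are counted against the "up" outcome under this definition.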

## Determine the Test Statistic (Z)

In [4]:
import numpy as np

# Hypothesized proportion (50%)
p_0 = 0.50

n = len(df_combined)  # Total number of days
z = (p_obs - p_0) / np.sqrt(p_0 * (1 - p_0) / n)
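
For reference, the statistic computed above is the standard one-proportion z-statistic, written here with $\hat{p}$ for `p_obs`, $p_0$ for `p_0`, and $n$ for `n`. The normal approximation behind it is usually considered reasonable when $n p_0$ and $n(1 - p_0)$ are both at least about 10; that is a common rule of thumb, not something this notebook checks explicitly.

$$
z = \frac{\hat{p} - p_0}{\sqrt{\dfrac{p_0\,(1 - p_0)}{n}}}
$$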



## Calculate the p-value

In [5]:
from scipy.stats import norm
# Two-tailed p-value; norm.sf(x) is 1 - norm.cdf(x) and stays accurate for large |z|
p_value = 2 * norm.sf(abs(z))
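
The steps so far can be bundled into one small function. The sketch below is a standard-library restatement (no SciPy) that relies on the identity $2\,(1 - \Phi(|z|)) = \operatorname{erfc}(|z| / \sqrt{2})$ for the standard normal CDF $\Phi$. The function name `one_proportion_z_test` and the example counts are hypothetical and are not taken from the AMZN data.

```python
import math


def one_proportion_z_test(successes, n, p_0=0.5):
    """Two-sided one-proportion z-test; returns (p_obs, z, p_value)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p_obs = successes / n
    # Standard error of the proportion under the null hypothesis
    standard_error = math.sqrt(p_0 * (1 - p_0) / n)
    z = (p_obs - p_0) / standard_error
    # 2 * (1 - Phi(|z|)) equals erfc(|z| / sqrt(2)) for the standard normal
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return p_obs, z, p_value


# Hypothetical example: 60 "up" days out of 100
p_obs_demo, z_demo, p_value_demo = one_proportion_z_test(60, 100)
print(f"p_obs={p_obs_demo:.2f}, z={z_demo:.2f}, p_value={p_value_demo:.4f}")
```

With the notebook's variables, the equivalent call would be `one_proportion_z_test((df_combined['Close'] > df_combined['Open']).sum(), len(df_combined))`, which should agree with the SciPy-based values up to floating-point rounding.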

## Make a Decision

In [6]:
alpha = 0.05
if p_value < alpha:
    result = "Reject the null hypothesis (there is a significant trend)"
else:
    result = "Fail to reject the null hypothesis (no significant trend)"

## Conclusion

In [7]:
# Output the results
print(f"Observed Proportion: {p_obs}")
print(f"Z-Statistic: {z}")
print(f"P-Value: {p_value}")
print(f"Result: {result}")

Observed Proportion: 0.5003455425017277
Z-Statistic: 0.05257699129677698
P-Value: 0.958068949893516
Result: Fail to reject the null hypothesis (no significant trend)


The result is "Fail to reject the null hypothesis (no significant trend)". With an observed proportion of about 0.5003 and a p-value of about 0.958, far above the 0.05 significance level, the data do not provide enough evidence that the proportion of days where the closing price is higher than the opening price differs from 50%.

In other words, days that close above the open are about as frequent as days that do not, and there is no statistically significant directional trend.

From a trading perspective, this suggests that the daily open-to-close movement of Amazon stock shows no significant bias toward either "buying" or "selling" in the available data; the split is consistent with a balanced, coin-flip-like outcome. Failing to reject the null hypothesis does not prove that the movement is random, only that this test found no evidence of a bias.