We fit a multiple linear regression to examine the relationship between the "Returns" column and the "CCI", "RSI", and "MACD" columns, with one model fitted separately for each stock symbol. This lets us measure the extent to which changes in the three independent variables (CCI, RSI, and MACD) are associated with changes in the dependent variable (Returns).

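A fitted model provides the coefficient, standard error, t-statistic, p-value, and confidence interval for each predictor variable, along with an F-statistic and p-value for the model as a whole. From each fit we keep the R-squared and adjusted R-squared, the overall F-statistic and its p-value, and the coefficient and 95% confidence interval for each predictor. We use the F-test p-value to determine whether the predictors, taken together, are significantly associated with the dependent variable.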

The null hypothesis for the multiple linear regression states that there is no linear relationship between the independent variables (CCI, RSI, and MACD) and the dependent variable (Returns). In other words, the regression coefficients for all the predictors are zero, and any observed association between the independent and dependent variables is due to chance.

The alternative hypothesis states that there is a relationship between at least one independent variable and the dependent variable. This means that the regression coefficient for one or more predictors is non-zero, and the observed association between the independent and dependent variables is not due to chance alone.
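
Written out, the model and the two hypotheses described above take the following form. The symbols $\beta_0, \dots, \beta_3$ and $\varepsilon_i$ are notation introduced here for illustration, and $i$ indexes the observations for a single stock.

```latex
\begin{align*}
\text{Returns}_i &= \beta_0 + \beta_1\,\text{CCI}_i + \beta_2\,\text{RSI}_i + \beta_3\,\text{MACD}_i + \varepsilon_i \\
H_0 &: \beta_1 = \beta_2 = \beta_3 = 0 \\
H_1 &: \beta_j \neq 0 \ \text{for at least one } j \in \{1, 2, 3\}
\end{align*}
```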

Therefore, we test the null hypothesis using the p-value of the overall F-test in each stock's regression output. If the p-value is less than the chosen significance level (0.05), we reject the null hypothesis and conclude that at least one predictor is significantly associated with the dependent variable. If the p-value is greater than or equal to the significance level, we fail to reject the null hypothesis: the data do not provide sufficient evidence of a relationship, which is not the same as showing that no relationship exists. The p-value attached to an individual coefficient tests the narrower hypothesis that this one coefficient is zero, given the other predictors in the model.

In [1]:
#cd C:/UW/WI23/BIOSTAT557/Project/

C:\UW\WI23\BIOSTAT557\Project


In [2]:
import warnings
warnings.filterwarnings('ignore')

import pandas as pd
import numpy as np
import statsmodels.api as sm

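# Load the dataset
df = pd.read_csv("Top100stocks_indicator.csv")
# Compute daily returns within each symbol so that no return spans two different stocks
df['Returns'] = df.groupby('SYMBOL')['CLOSE'].pct_change()
df.dropna(inplace=True)
df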

Unnamed: 0,TIMESTAMP,SYMBOL,OPEN,HIGH,LOW,CLOSE,LAST,PREVCLOSE,TOTTRDQTY,TOTTRDVAL,TOTALTRADES,Typical Price,SMA,Mean Deviation,CCI,RSI,MACD,Signal,Histogram,Returns
66,28-10-2021,ZOMATO,135.35,136.50,131.50,135.60,135.50,134.80,11936370,1.592582e+09,82472,134.533333,138.136765,2.913725,-82.447286,47.377622,-0.427251,-0.058903,-0.368348,0.005935
67,29-10-2021,ZOMATO,135.00,135.00,130.50,131.55,131.70,135.60,12925877,1.705536e+09,107530,132.350000,137.867647,2.867921,-128.261252,42.744479,-0.781654,-0.203453,-0.578201,-0.029867
68,01-11-2021,ZOMATO,133.60,134.30,132.00,132.65,132.55,131.55,6200243,8.259720e+08,52656,132.983333,137.569118,2.760943,-110.729908,43.457944,-0.962663,-0.355295,-0.607368,0.008362
69,02-11-2021,ZOMATO,132.70,133.60,131.75,132.45,132.40,132.65,7367792,9.772073e+08,63828,132.600000,137.223529,2.626024,-117.377197,43.525741,-1.109463,-0.506128,-0.603334,-0.001508
70,03-11-2021,ZOMATO,132.90,132.90,127.50,128.35,128.70,132.45,15919925,2.058662e+09,128004,129.583333,136.834804,2.619738,-184.534273,39.857143,-1.538899,-0.712683,-0.826217,-0.030955
...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...,...
141586,26-12-2022,ACC,2372.00,2448.50,2360.00,2423.65,2429.15,2374.10,480611,1.165814e+09,33035,2410.716667,2532.664216,117.982771,-68.906981,28.940073,-1.850621,30.858689,-32.709310,0.020871
141587,27-12-2022,ACC,2430.95,2487.95,2428.10,2478.00,2486.00,2423.65,260633,6.408237e+08,17590,2464.683333,2532.916667,117.570026,-38.690890,37.407735,-6.263760,23.434199,-29.697960,0.022425
141588,28-12-2022,ACC,2475.00,2477.00,2447.55,2455.35,2456.00,2478.00,201331,4.953475e+08,10875,2459.966667,2534.337255,118.382656,-41.881466,33.905704,-11.456804,16.455999,-27.912803,-0.009140
141589,29-12-2022,ACC,2455.00,2467.70,2411.15,2447.85,2446.70,2455.35,352346,8.594227e+08,25719,2442.233333,2535.506863,119.860107,-51.879107,33.531613,-15.993157,9.966168,-25.959325,-0.003055
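
Because the file stacks many symbols in one table, the way returns are computed at the boundary between two symbols matters. The toy frame below (hypothetical symbols and prices, not taken from the dataset) contrasts a column-wide `pct_change` with a per-symbol one.

```python
import pandas as pd

# Toy data (hypothetical): two symbols stacked in one frame
toy = pd.DataFrame({
    "SYMBOL": ["A", "A", "B", "B"],
    "CLOSE": [10.0, 11.0, 100.0, 110.0],
})

# Column-wide change: the first row of B is compared with the last row of A
toy["naive"] = toy["CLOSE"].pct_change()

# Per-symbol change: the first row of each symbol has no prior close, so it is NaN
toy["grouped"] = toy.groupby("SYMBOL")["CLOSE"].pct_change()

print(toy)
```

In the column-wide version, the first row of symbol B shows a spurious return of about 809% because it is measured against the last close of symbol A. The grouped version leaves that row as NaN, so a later `dropna` removes it.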


In [3]:
# Define the function to perform the regression analysis for a given stock symbol
def analyze_stock(df, stock):
    # Load the data for the stock symbol
    symbol_df = df[df["SYMBOL"] == stock]

    if len(symbol_df) != 0:
        # Define the independent variables (predictors)
        X = symbol_df[['CCI', 'RSI', 'MACD']]

        # Define the dependent variable (response)
        Y = symbol_df['Returns']

        # Add a constant to the predictors for the intercept
        X = sm.add_constant(X)

        # Fit the linear regression model
        model = sm.OLS(Y, X).fit()

        # Extract goodness-of-fit measures and the overall F-test from the fitted model
        rsquared = model.rsquared
        fvalue = model.fvalue
        pvalue = model.f_pvalue  # p-value of the overall F-test, not of a single coefficient
        adj_rsquared = model.rsquared_adj
        
        # Extract the coefficients and 95% confidence intervals
        coeffs = model.params
        ci = model.conf_int(alpha=0.05)
        
        # Create a dataframe with the results
        result = pd.DataFrame({'Stock': [stock],
                               'R-squared': [rsquared],
                               'Adj. R-squared': [adj_rsquared],
                               'F-value': [fvalue],
                               'P-value': [pvalue],
                               'CCI Coeff': [coeffs['CCI']],
                               'CCI CI Lower': [ci.loc['CCI'][0]],
                               'CCI CI Upper': [ci.loc['CCI'][1]],
                               'RSI Coeff': [coeffs['RSI']],
                               'RSI CI Lower': [ci.loc['RSI'][0]],
                               'RSI CI Upper': [ci.loc['RSI'][1]],
                               'MACD Coeff': [coeffs['MACD']],
                               'MACD CI Lower': [ci.loc['MACD'][0]],
                               'MACD CI Upper': [ci.loc['MACD'][1]]})

        return result


# Define a list of stock symbols to analyze
stocks = df['SYMBOL'].unique()

# Iterate over the list of stock symbols and combine the results into a single dataframe
results = pd.concat([analyze_stock(df, stock) for stock in stocks])

# Flag the stocks whose overall F-test rejects H0 at the 5% level
results['pflag'] = results['P-value'] < 0.05
print('Total Stocks:', len(results))
print('Reject H0:', results['pflag'].sum())

# Print the combined results
print(results)


Total Stocks: 90
Reject H0: 90
         Stock  R-squared  Adj. R-squared    F-value       P-value  CCI Coeff  \
0       ZOMATO   0.131062        0.122042  14.529921  7.686604e-09   0.000049   
0        WIPRO   0.142159        0.140608  91.641561  7.095468e-55   0.000113   
0   MCDOWELL-N   0.117090        0.115494  73.338010  1.534267e-44   0.000134   
0   ULTRACEMCO   0.117318        0.115722  73.499708  1.239733e-44   0.000072   
0   TORNTPHARM   0.116324        0.114727  72.795326  3.138767e-44   0.000094   
..         ...        ...             ...        ...           ...        ...   
0         ATGL   0.150001        0.143901  24.588476  1.136374e-14   0.000076   
0   ADANIPORTS   0.102206        0.100583  62.954552  1.509917e-38   0.000074   
0   ADANIGREEN   0.144898        0.142457  59.364436  1.868024e-35   0.000075   
0     ADANIENT   0.094849        0.093212  57.947913  1.267869e-35   0.000039   
0          ACC   0.137149        0.135588  87.898391  8.732964e-53   0.000084 
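
The reported F-value can be cross-checked against R-squared: for a model with k predictors and n observations, F = (R² / k) / ((1 − R²) / (n − k − 1)). The sketch below does this for the ZOMATO row. The sample size is not printed in the output, so the value of n used here is an assumption, backed out from the reported R-squared and adjusted R-squared.

```python
# Values copied from the ZOMATO row of the results table
r2 = 0.131062
adj_r2 = 0.122042
k = 3  # predictors: CCI, RSI, MACD

# Assumption: n is not shown in the output. Since
# adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1),
# the ratio below equals (n - 1) / (n - k - 1) and can be solved for n.
ratio = (1 - adj_r2) / (1 - r2)
n = round(k / (ratio - 1) + k + 1)  # inferred sample size

# Overall F-statistic implied by R-squared, k, and n
f_stat = (r2 / k) / ((1 - r2) / (n - k - 1))
print(n, round(f_stat, 2))  # prints: 293 14.53
```

The result agrees with the F-value of 14.529921 in the table up to the rounding of the printed R-squared values.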

In [4]:
# Save the per-stock regression results
results.to_csv('LR_Results.csv', index=False)