Null Hypothesis (H0): There is no significant correlation between the market values of Ethereum and other cryptocurrencies.

Alternative Hypothesis (H1): Market values of Ethereum exhibit a significant correlation with other cryptocurrencies.


Johansen’s Test: 

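Applicability: This test determines whether two or more time series are cointegrated, i.e. whether some linear combination of these non-stationary series is stationary. It can be used to analyse whether the ETH price and the prices of other cryptocurrencies move together in the long run.

Methodology: It fits a vector error-correction model (VECM) by maximum likelihood. It then uses the trace and maximum-eigenvalue statistics to determine the number of cointegrating relationships among the series (the cointegration rank).

Statistical Power: It handles several variables and several cointegrating relationships at once, so it suits complex financial models of markets involving multiple variables.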


Engle-Granger Test, ADF Test (Augmented Dickey-Fuller Test): 

Applicability: The Engle-Granger method is mainly used for cointegration analysis of two variables. It works in two steps. First, it estimates the long-run relationship between two non-stationary time series. Second, it applies the ADF test to check whether the residuals are stationary.
This method can be used to test whether the price movements of ETH and another cryptocurrency stay in a long-run equilibrium.

Methodology: The long-run relationship is estimated with a simple linear (OLS) regression. Cointegration is then judged by a unit root test on the regression residuals: if the residuals are stationary, the two series are cointegrated.


In [37]:
import numpy as np  # needed by check_and_clean_data below
import pandas as pd

def process_file(file_path, currency_code):
    df = pd.read_csv(file_path)
    df['Date'] = pd.to_datetime(df['Date']).dt.strftime('%d-%m-%Y')
    df.set_index('Date', inplace=True)
    columns_to_keep = ['Close']

    # Convert columns to keep to float, removing commas
    for col in columns_to_keep:
        df[col] = df[col].replace(',', '', regex=True).astype(float)

    df = df[columns_to_keep]
    df.columns = [f"{col}_{currency_code}" for col in columns_to_keep]
    return df

file_paths = {
    'ETH-EUR.csv': 'ETH',
    'BTC-EUR.csv': 'BTC',
    'DOGE-EUR.csv': 'DOGE',
    'USDT-EUR.csv': 'USDT',
    'XRP-EUR.csv': 'XRP'
}

dataframes = {code: process_file(file, code) for file, code in file_paths.items()}
combined_df = pd.concat(dataframes.values(), axis=1)
print(combined_df.head())


def check_and_clean_data(df):
    # Check for NaN and Inf values
    if df.isna().any().any() or np.isinf(df).any().any():
        # Handle NaN values with forward fill and backward fill
        df = df.ffill().bfill()
        
        # Handle Inf values
        df.replace([np.inf, -np.inf], np.nan, inplace=True)
        # Again handle any NaNs that might have been introduced
        df = df.ffill().bfill()

    return df

# Clean data before analysis
combined_df_clean = check_and_clean_data(combined_df)

            Close_ETH  Close_BTC  Close_DOGE  Close_USDT  Close_XRP
Date                                                               
04-12-2023    2061.30    39728.0    0.082351      0.9225     0.5759
03-12-2023    2007.32    37994.0    0.078228      0.9191     0.5726
02-12-2023    1992.83    37965.0    0.079103      0.9193     0.5718
01-12-2023    1922.37    37353.0    0.076899      0.9194     0.5629
30-11-2023    1880.00    36662.0    0.076479      0.9190     0.5578
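The rows above appear in descending date order, and the index is a day-month-year string. Some of the later tests depend on the direction of time, most obviously Granger causality. One option is therefore to keep a `DatetimeIndex` and sort it ascending before analysis. The sketch below assumes the CSV files have `Date` and `Close` columns as in `process_file`. The inline CSV text is made-up sample data, and `load_close` is a hypothetical helper name.

```python
import io

import pandas as pd


def load_close(source, currency_code):
    """Hypothetical helper: read Date/Close into a chronologically sorted float Series."""
    df = pd.read_csv(source)
    df['Date'] = pd.to_datetime(df['Date'])
    close = df.set_index('Date')['Close']
    # Strip thousands separators (if any) before converting to float
    close = close.astype(str).str.replace(',', '', regex=False).astype(float)
    return close.sort_index().rename(f'Close_{currency_code}')


# Made-up sample rows, newest first, with quoted thousands separators
eth_csv = io.StringIO('Date,Close\n2023-12-04,"2,061.30"\n2023-12-03,"2,007.32"\n2023-12-02,"1,992.83"\n')
btc_csv = io.StringIO('Date,Close\n2023-12-04,"39,728.0"\n2023-12-03,"37,994.0"\n2023-12-02,"37,965.0"\n')

combined = pd.concat([load_close(eth_csv, 'ETH'), load_close(btc_csv, 'BTC')], axis=1)
print(combined)
```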


In [66]:
import numpy as np
from scipy import signal

column1 = combined_df_clean['Close_ETH']
column2 = combined_df_clean['Close_BTC']
column3 = combined_df_clean['Close_DOGE']

# Cross-correlation of ETH with BTC and the lag at which it peaks
correlation2 = signal.correlate(column1, column2, mode="full")
lags2 = signal.correlation_lags(column1.size, column2.size, mode="full")
lag2 = lags2[np.argmax(correlation2)]
print(lag2)

# Cross-correlation of ETH with DOGE and the lag at which it peaks
correlation3 = signal.correlate(column1, column3, mode="full")
lags3 = signal.correlation_lags(column1.size, column3.size, mode="full")
lag3 = lags3[np.argmax(correlation3)]
print(lag3)



0
0
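Applying `signal.correlate` to raw, all-positive price levels has a drawback. The un-normalized sum is largest where the two series overlap the most, which is near lag 0, so a peak at 0 says little about any lead-lag structure. The sketch below standardizes the inputs first. Returns would be a better input than prices, though that is a further choice. The sketch then recovers a known delay from synthetic data. `peak_lag` is a hypothetical helper. Under scipy's convention, if the second series trails the first by d steps, the peak appears at lag -d.

```python
import numpy as np
from scipy import signal


def peak_lag(a, b):
    """Hypothetical helper: lag at which the standardized cross-correlation of a and b peaks."""
    a = (np.asarray(a, dtype=float) - np.mean(a)) / np.std(a)
    b = (np.asarray(b, dtype=float) - np.mean(b)) / np.std(b)
    corr = signal.correlate(a, b, mode="full") / len(a)
    lags = signal.correlation_lags(len(a), len(b), mode="full")
    best = np.argmax(corr)
    return lags[best], corr[best]


rng = np.random.default_rng(1)
x = rng.standard_normal(500)
delay = 5
y = np.concatenate([rng.standard_normal(delay), x[:-delay]])  # y[t] = x[t - 5]

lag, peak = peak_lag(x, y)
print(lag, round(float(peak), 2))
```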


In [50]:
#Pearson correlation coefficient 

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

pearsoncorr = combined_df_clean.corr(method='pearson')
pearsoncorr


            Close_ETH  Close_BTC  Close_DOGE  Close_USDT  Close_XRP
Close_ETH    1.000000   0.907844   -0.111456   -0.519716   0.654732
Close_BTC    0.907844   1.000000   -0.235942   -0.395861   0.764494
Close_DOGE  -0.111456  -0.235942    1.000000    0.001937  -0.287613
Close_USDT  -0.519716  -0.395861    0.001937    1.000000  -0.423699
Close_XRP    0.654732   0.764494   -0.287613   -0.423699   1.000000
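Correlating the price levels of trending series can produce large coefficients and tiny p-values even when the series are unrelated. This spurious correlation is worth keeping in mind when reading the table above. A common alternative is to correlate daily log returns. The sketch below shows this on two independent simulated price paths. The simulated data and the helper name `return_correlations` are illustrative assumptions.

```python
import numpy as np
import pandas as pd
from scipy.stats import pearsonr


def return_correlations(prices: pd.DataFrame, target: str) -> pd.DataFrame:
    """Hypothetical helper: Pearson r and p-value of target's log returns vs every other column."""
    returns = np.log(prices).diff().dropna()
    rows = {}
    for col in returns.columns.drop(target):
        r, p = pearsonr(returns[target], returns[col])
        rows[col] = {"r": r, "p_value": p}
    return pd.DataFrame(rows).T


rng = np.random.default_rng(7)
n_days = 1000
# Two independent geometric random walks: no true relationship between them
prices = pd.DataFrame({
    "Close_A": 100 * np.exp(np.cumsum(0.02 * rng.standard_normal(n_days))),
    "Close_B": 50 * np.exp(np.cumsum(0.03 * rng.standard_normal(n_days))),
})

print("Levels r:", prices.corr().loc["Close_A", "Close_B"])
result = return_correlations(prices, "Close_A")
print(result)
```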


In [62]:
from scipy.stats import pearsonr

eth = combined_df_clean['Close_ETH']
# Currency codes and the labels used in the printed messages
others = {'BTC': 'BTC', 'DOGE': 'Doge', 'USDT': 'USDT', 'XRP': 'XRP'}

# Pearson coefficient and two-sided p-value of ETH against each other coin
results = {code: pearsonr(eth, combined_df_clean[f'Close_{code}']) for code in others}

for coef, p in results.values():
    print(coef, p)

# p < 0.05: reject H0 (no correlation) at the 5% level
for code, (coef, p) in results.items():
    if p < 0.05:
        print(f'There is a relationship between ETH and {others[code]}')
    else:
        print(f'No relationship between ETH and {others[code]}')


0.9078437271733143 1.8414022684760425e-139
-0.11145571585436825 0.03303767044798627
-0.5197162289208814 1.0365023968894856e-26
0.654731733702213 3.600771476020092e-46
There is a relationship between ETH and BTC
There is a relationship between ETH and Doge
There is a relationship between ETH and USDT
There is a relationship between ETH and XRP


In [49]:
# Spearman rank correlation coefficient

spearmancorr = combined_df_clean.corr(method='spearman')
spearmancorr



            Close_ETH  Close_BTC  Close_DOGE  Close_USDT  Close_XRP
Close_ETH    1.000000   0.877586    0.022579   -0.544049   0.642452
Close_BTC    0.877586   1.000000   -0.165999   -0.472440   0.768780
Close_DOGE   0.022579  -0.165999    1.000000   -0.091918  -0.325920
Close_USDT  -0.544049  -0.472440   -0.091918    1.000000  -0.326462
Close_XRP    0.642452   0.768780   -0.325920   -0.326462   1.000000
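The Spearman coefficient is the Pearson coefficient computed on ranks. It therefore measures monotonic association rather than strictly linear association, which may help explain why some ETH pairs differ between the two tables. The small check below uses synthetic data with an exactly monotonic but non-linear relation to illustrate the difference.

```python
import numpy as np
from scipy.stats import pearsonr, rankdata, spearmanr

x = np.linspace(0.0, 5.0, 100)
y = np.exp(x)  # strictly increasing in x, but not linear

pearson_r, _ = pearsonr(x, y)
spearman_r, _ = spearmanr(x, y)
# With no ties, Spearman equals Pearson applied to the ranks
pearson_on_ranks, _ = pearsonr(rankdata(x), rankdata(y))

print(f"Pearson: {pearson_r:.3f}, Spearman: {spearman_r:.3f}, Pearson on ranks: {pearson_on_ranks:.3f}")
```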


In [60]:
from scipy.stats import spearmanr

eth = combined_df_clean['Close_ETH']
# Currency codes and the labels used in the printed messages
others = {'BTC': 'BTC', 'DOGE': 'Doge', 'USDT': 'USDT', 'XRP': 'XRP'}

# Spearman coefficient and two-sided p-value of ETH against each other coin
results = {code: spearmanr(eth, combined_df_clean[f'Close_{code}']) for code in others}

for coef, p in results.values():
    print(coef, p)

# p < 0.05: reject H0 (no monotonic association) at the 5% level
for code, (coef, p) in results.items():
    if p < 0.05:
        print(f'There is a relationship between ETH and {others[code]}')
    else:
        print(f'No relationship between ETH and {others[code]}')


0.8775859412469664 2.8702008855510624e-118
0.022578551471406815 0.6668098282853209
-0.5440490054815174 1.387068576153082e-29
0.6424515607232271 5.472371168856873e-44
There is a relationship between ETH and BTC
No relationship between ETH and Doge
There is a relationship between ETH and USDT
There is a relationship between ETH and XRP


In [31]:
# Granger causality
from statsmodels.tsa.stattools import grangercausalitytests

# Currency codes for the other cryptocurrencies
other_currencies = ['BTC', 'DOGE', 'USDT', 'XRP']
max_lag = 10  # Maximum lag order

# Iterate over each currency and analyze its relationship with ETH
for code in other_currencies:
    # statsmodels tests whether the SECOND column Granger-causes the FIRST,
    # i.e. whether lagged values (earlier rows) of Close_{code} help predict Close_ETH
    granger_test_results = grangercausalitytests(
        combined_df_clean[['Close_ETH', f'Close_{code}']], max_lag, verbose=False
    )

    # Print the currency pair being analyzed
    print(f"Analyzing relationship between ETH and {code}:\n")

    # Smallest lag whose SSR-based chi-squared test is significant at the 5% level
    lag_with_significant_causality = None
    for lag in range(1, max_lag + 1):
        p_value = granger_test_results[lag][0]['ssr_chi2test'][1]
        if p_value < 0.05:
            lag_with_significant_causality = lag
            break

    if lag_with_significant_causality is not None:
        print(f"Hypothesis (H1 - Delay in Correlation) is supported: A discernible delay exists in the correlation between the market values of ETH and {code} at lag {lag_with_significant_causality}.")
    else:
        print(f"Hypothesis (H0 - Delay in Correlation) is supported: There is no discernible delay in correlation between the market values of ETH and {code}.")

    print("\n" + "-"*50 + "\n")


Analyzing relationship between ETH and BTC:

Hypothesis (H0 - Delay in Correlation) is supported: There is no discernible delay in correlation between the market values of ETH and BTC.

--------------------------------------------------

Analyzing relationship between ETH and DOGE:

Hypothesis (H1 - Delay in Correlation) is supported: A discernible delay exists in the correlation between the market values of ETH and DOGE at lag 1.

--------------------------------------------------

Analyzing relationship between ETH and USDT:

Hypothesis (H1 - Delay in Correlation) is supported: A discernible delay exists in the correlation between the market values of ETH and USDT at lag 3.

--------------------------------------------------

Analyzing relationship between ETH and XRP:

Hypothesis (H0 - Delay in Correlation) is supported: There is no discernible delay in correlation between the market values of ETH and XRP.

--------------------------------------------------





In [34]:
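from statsmodels.tsa.vector_ar.vecm import coint_johansen

# Function to perform the Johansen cointegration test
# (det_order=-1: no deterministic terms; k_ar_diff=1: one lagged difference in the VECM)
def perform_johansens_test(data, det_order=-1, k_ar_diff=1):
    test_result = coint_johansen(data, det_order=det_order, k_ar_diff=k_ar_diff)
    return test_result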

# Performing Johansen cointegration test between 'Close_USDT' and each of the other currencies
for code in file_paths.values():
    if code == 'USDT':
        continue  # Skip comparison of 'Close_USDT' with itself

    # Prepare a DataFrame with only the 'Close' prices of USDT and the other currency
    pair_columns = [f'Close_USDT', f'Close_{code}']
    pair_df = combined_df[pair_columns].dropna()  # Drop rows with missing values

    # Perform the Johansen cointegration test on the pair
    print(f"Testing cointegration between Close_USDT and Close_{code}:")
    johansen_test_results = perform_johansens_test(pair_df)

    # Display the results for the pair
    print("Eigenvalues:", johansen_test_results.eig)
    print("Trace Statistics:", johansen_test_results.lr1)
    print("Critical Values (Trace):", johansen_test_results.cvt)
    print("Max-Eigen Statistics:", johansen_test_results.lr2)
    print("Critical Values (Max-Eigen):", johansen_test_results.cvm)

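    # Conclusion at the 95% level: cvt/cvm columns are 90%, 95%, 99%,
    # and row 0 tests H0: no cointegration (rank 0)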
    trace_stat, crit_value_trace = johansen_test_results.lr1[0], johansen_test_results.cvt[0, 1]
    max_stat, crit_value_max = johansen_test_results.lr2[0], johansen_test_results.cvm[0, 1]
    if trace_stat > crit_value_trace and max_stat > crit_value_max:
        print(f"Cointegration exists between Close_USDT and Close_{code}, which rejects the null hypothesis.")
    else:
        print(f"No cointegration between Close_USDT and Close_{code}, which fails to reject the null hypothesis.")
    print("\n" + "-"*50 + "\n")


Testing cointegration between Close_USDT and Close_ETH:
Eigenvalues: [0.01245503 0.00025067]
Trace Statistics: [4.64057124 0.09100441]
Critical Values (Trace): [[10.4741 12.3212 16.364 ]
 [ 2.9762  4.1296  6.9406]]
Max-Eigen Statistics: [4.54956682 0.09100441]
Critical Values (Max-Eigen): [[ 9.4748 11.2246 15.0923]
 [ 2.9762  4.1296  6.9406]]
No cointegration between Close_USDT and Close_ETH, which fails to reject the null hypothesis.

--------------------------------------------------

Testing cointegration between Close_USDT and Close_BTC:
Eigenvalues: [1.60068940e-02 5.09730534e-05]
Trace Statistics: [5.87601257 0.01850369]
Critical Values (Trace): [[10.4741 12.3212 16.364 ]
 [ 2.9762  4.1296  6.9406]]
Max-Eigen Statistics: [5.85750888 0.01850369]
Critical Values (Max-Eigen): [[ 9.4748 11.2246 15.0923]
 [ 2.9762  4.1296  6.9406]]
No cointegration between Close_USDT and Close_BTC, which fails to reject the null hypothesis.

--------------------------------------------------

Testing 

In [35]:
from statsmodels.tsa.vector_ar.vecm import coint_johansen

# Function to perform the Johansen cointegration test
def perform_johansens_test(data, det_order=-1, k_ar_diff=1):
    test_result = coint_johansen(data, det_order=det_order, k_ar_diff=k_ar_diff)
    return test_result


# Performing Johansen cointegration test between 'Close_ETH' and each of the other currencies
for code in file_paths.values():
    if code == 'ETH':
        continue  # Skip comparison of 'Close_ETH' with itself

    # Prepare a DataFrame with only the 'Close' prices of ETH and the other currency
    pair_columns = [f'Close_ETH', f'Close_{code}']
    pair_df = combined_df[pair_columns].dropna()  # Drop rows with missing values

    # Perform the Johansen cointegration test on the pair
    print(f"Testing cointegration between Close_ETH and Close_{code}:")
    johansen_test_results = perform_johansens_test(pair_df)

    # Display the results for this pair
    print("Eigenvalues:", johansen_test_results.eig)
    print("Trace Statistics:", johansen_test_results.lr1)
    print("Critical Values (Trace):", johansen_test_results.cvt)
    print("Max-Eigen Statistics:", johansen_test_results.lr2)
    print("Critical Values (Max-Eigen):", johansen_test_results.cvm)

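    # Conclusion at the 95% level: cvt/cvm columns are 90%, 95%, 99%,
    # and row 0 tests H0: no cointegration (rank 0)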
    trace_stat, crit_value_trace = johansen_test_results.lr1[0], johansen_test_results.cvt[0, 1]
    max_stat, crit_value_max = johansen_test_results.lr2[0], johansen_test_results.cvm[0, 1]
    if trace_stat > crit_value_trace and max_stat > crit_value_max:
        print(f"Cointegration exists between Close_ETH and Close_{code}, which rejects the null hypothesis.")
    else:
        print(f"No cointegration between Close_ETH and Close_{code}, which fails to reject the null hypothesis.")
    print("\n" + "-"*50 + "\n")


Testing cointegration between Close_ETH and Close_BTC:
Eigenvalues: [0.02086987 0.0013698 ]
Trace Statistics: [8.15351101 0.49757877]
Critical Values (Trace): [[10.4741 12.3212 16.364 ]
 [ 2.9762  4.1296  6.9406]]
Max-Eigen Statistics: [7.65593224 0.49757877]
Critical Values (Max-Eigen): [[ 9.4748 11.2246 15.0923]
 [ 2.9762  4.1296  6.9406]]
No cointegration between Close_ETH and Close_BTC, which fails to reject the null hypothesis.

--------------------------------------------------

Testing cointegration between Close_ETH and Close_DOGE:
Eigenvalues: [0.01220348 0.00192135]
Trace Statistics: [5.15523696 0.69812187]
Critical Values (Trace): [[10.4741 12.3212 16.364 ]
 [ 2.9762  4.1296  6.9406]]
Max-Eigen Statistics: [4.45711509 0.69812187]
Critical Values (Max-Eigen): [[ 9.4748 11.2246 15.0923]
 [ 2.9762  4.1296  6.9406]]
No cointegration between Close_ETH and Close_DOGE, which fails to reject the null hypothesis.

--------------------------------------------------

Testing cointegrat

In [25]:
import pandas as pd
import numpy as np
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

def check_and_clean_data(df):
    # Check for NaN and Inf values
    if df.isna().any().any() or np.isinf(df).any().any():
        # Handle NaN values with forward fill and backward fill
        df = df.ffill().bfill()
        
        # Handle Inf values
        df.replace([np.inf, -np.inf], np.nan, inplace=True)
        # Again handle any NaNs that might have been introduced
        df = df.ffill().bfill()

    return df

# Clean data before analysis
combined_df_clean = check_and_clean_data(combined_df)

# Engle-Granger Cointegration Test
def engle_granger_cointegration_test(y, x):
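    # OLS regression of y on x (no intercept term is added here)
    model = sm.OLS(y, x)
    result = model.fit()

    # Residuals of the estimated long-run relationship
    residuals = result.resid

    # ADF test on the residuals; adfuller reports standard unit-root critical values,
    # not the ones for residual-based cointegration tests (used by statsmodels' coint)
    adf_test_result = adfuller(residuals)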
    return residuals, adf_test_result

# Iterate over each currency pair
for code in file_paths.values():
    if code == 'USDT':
        continue

    # Ensure using cleaned data
    y = combined_df_clean[f'Close_USDT']
    x = combined_df_clean[f'Close_{code}']

    # Perform Engle-Granger Test
    residuals, adf_test_result = engle_granger_cointegration_test(y, x)
    print(f"Engle-Granger Test between Close_USDT and Close_{code}:")
    print("ADF Statistic: %f" % adf_test_result[0])
    print("p-value: %f" % adf_test_result[1])
    print("Critical Values:")
    for key, value in adf_test_result[4].items():
        print('\t%s: %.3f' % (key, value))

    # Interpretation of ADF Test results
    if adf_test_result[1] < 0.05:
        print(f"The series is stationary. Reject the null hypothesis. Suggests cointegration.")
    else:
        print(f"The series is non-stationary. Fail to reject the null hypothesis. No cointegration.")
    print("\n" + "-"*50 + "\n")


Engle-Granger Test between Close_USDT and Close_ETH:
ADF Statistic: -1.945244
p-value: 0.311085
Critical Values:
	1%: -3.448
	5%: -2.869
	10%: -2.571
The series is non-stationary. Fail to reject the null hypothesis. No cointegration.

--------------------------------------------------

Engle-Granger Test between Close_USDT and Close_BTC:
ADF Statistic: -2.655877
p-value: 0.081990
Critical Values:
	1%: -3.448
	5%: -2.869
	10%: -2.571
The series is non-stationary. Fail to reject the null hypothesis. No cointegration.

--------------------------------------------------

Engle-Granger Test between Close_USDT and Close_DOGE:
ADF Statistic: -1.587185
p-value: 0.490028
Critical Values:
	1%: -3.448
	5%: -2.869
	10%: -2.571
The series is non-stationary. Fail to reject the null hypothesis. No cointegration.

--------------------------------------------------

Engle-Granger Test between Close_USDT and Close_XRP:
ADF Statistic: -1.876473
p-value: 0.343209
Critical Values:
	1%: -3.448
	5%: -2.870
	

In [23]:
import numpy as np  # needed by check_and_clean_data below
import statsmodels.api as sm
from statsmodels.tsa.stattools import adfuller

def check_and_clean_data(df):
    # Check for NaN and Inf values
    if df.isna().any().any() or np.isinf(df).any().any():
        # Handle NaN values with forward fill and backward fill
        df = df.ffill().bfill()
        
        # Handle Inf values
        df.replace([np.inf, -np.inf], np.nan, inplace=True)
        # Again handle any NaNs that might have been introduced
        df = df.ffill().bfill()

    return df

# Clean data before analysis
combined_df_clean = check_and_clean_data(combined_df)

# Engle-Granger Cointegration Test
def engle_granger_cointegration_test(y, x):
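    # OLS regression of y on x (no intercept term is added here)
    model = sm.OLS(y, x)
    result = model.fit()

    # Residuals of the estimated long-run relationship
    residuals = result.resid

    # ADF test on the residuals; adfuller reports standard unit-root critical values,
    # not the ones for residual-based cointegration tests (used by statsmodels' coint)
    adf_test_result = adfuller(residuals)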
    return residuals, adf_test_result

# Iterate over each currency pair
for code in file_paths.values():
    if code == 'ETH':
        continue

    # Ensure using cleaned data
    y = combined_df_clean[f'Close_ETH']
    x = combined_df_clean[f'Close_{code}']

    # Perform Engle-Granger Test
    residuals, adf_test_result = engle_granger_cointegration_test(y, x)
    print(f"Engle-Granger Test between Close_ETH and Close_{code}:")
    print("ADF Statistic: %f" % adf_test_result[0])
    print("p-value: %f" % adf_test_result[1])
    print("Critical Values:")
    for key, value in adf_test_result[4].items():
        print('\t%s: %.3f' % (key, value))

    # Interpretation of ADF Test results
    if adf_test_result[1] < 0.05:
        print(f"The series is stationary. Reject the null hypothesis. Suggests cointegration.")
    else:
        print(f"The series is non-stationary. Fail to reject the null hypothesis. No cointegration.")
    print("\n" + "-"*50 + "\n")


Engle-Granger Test between Close_ETH and Close_BTC:
ADF Statistic: -2.229643
p-value: 0.195663
Critical Values:
	1%: -3.448
	5%: -2.869
	10%: -2.571
The series is non-stationary. Fail to reject the null hypothesis. No cointegration.

--------------------------------------------------

Engle-Granger Test between Close_ETH and Close_DOGE:
ADF Statistic: 1.821257
p-value: 0.998390
Critical Values:
	1%: -3.449
	5%: -2.870
	10%: -2.571
The series is non-stationary. Fail to reject the null hypothesis. No cointegration.

--------------------------------------------------

Engle-Granger Test between Close_ETH and Close_USDT:
ADF Statistic: -1.948358
p-value: 0.309665
Critical Values:
	1%: -3.448
	5%: -2.869
	10%: -2.571
The series is non-stationary. Fail to reject the null hypothesis. No cointegration.

--------------------------------------------------

Engle-Granger Test between Close_ETH and Close_XRP:
ADF Statistic: -2.765842
p-value: 0.063321
Critical Values:
	1%: -3.449
	5%: -2.870
	10%: