#### Import libraries

In [1]:
from datetime import date

import numpy as np
import pandas as pd
import yfinance as yf
from pandas_datareader import data as pdr

# Route pandas_datareader's Yahoo Finance calls through yfinance
yf.pdr_override()

#### 1.	Returns	

#### a.	Download 1-2 years of price history of a stock.

In [2]:
start_date = '2020-01-01'
end_date = '2021-10-30'

# Daily closing prices for Bitcoin
btc = pdr.get_data_yahoo('BTC-USD', start=start_date, end=end_date)
btc = pd.DataFrame(btc['Close'])
btc = btc.rename(columns={'Close': 'Bitcoin'})

# Daily closing prices for Ethereum
eth = pdr.get_data_yahoo('ETH-USD', start=start_date, end=end_date)
eth = pd.DataFrame(eth['Close'])
eth = eth.rename(columns={'Close': 'Ethereum'})

# Combine both series into one DataFrame aligned on the date index
data = pd.concat([btc, eth], axis=1)
data



                 Bitcoin     Ethereum
Date
2020-01-01   7200.174316   130.802002
2020-01-02   6985.470215   127.410179
2020-01-03   7344.884277   134.171707
2020-01-04   7410.656738   135.069366
2020-01-05   7411.317383   136.276779
...                  ...          ...
2021-10-26  60363.792969  4131.102051
2021-10-27  58482.386719  3930.257324
2021-10-28  60622.136719  4287.318848
2021-10-29  62227.964844  4414.746582


#### b. Compute its log return. 

In [3]:
# Log return: ln(P_t) - ln(P_{t-1}); 'Date' is the index, so every column is a price series
df_ret = np.log(data) - np.log(data.shift(1))
# The first row has no previous price, so its return is NaN and is dropped
df_ret = df_ret.dropna()
df_ret

              Bitcoin   Ethereum
Date
2020-01-02  -0.030273  -0.026273
2020-01-03   0.050172   0.051709
2020-01-04   0.008915   0.006668
2020-01-05   0.000089   0.008899
2020-01-06   0.047161   0.057235
...               ...        ...
2021-10-26  -0.043377  -0.020788
2021-10-27  -0.031664  -0.049839
2021-10-28   0.035934   0.086957
2021-10-29   0.026144   0.029289
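
As a small illustration of the log-return calculation above, the sketch below uses a made-up price series (the numbers are hypothetical, not market data). It shows that daily log returns add up to the log return of the whole period, and that each log return equals ln(1 + simple return).

```python
import numpy as np

# Hypothetical closing prices, chosen only for illustration
prices = np.array([100.0, 105.0, 99.75, 110.0])

# Daily log return: ln(P_t) - ln(P_{t-1})
log_ret = np.diff(np.log(prices))

# Log returns are additive over time: their sum is the log return of the whole period
total_log_ret = log_ret.sum()
period_log_ret = np.log(prices[-1] / prices[0])

# Simple returns, for comparison; each log return equals ln(1 + simple return)
simple_ret = prices[1:] / prices[:-1] - 1
```

This additivity is one reason log returns are convenient for multi-day analysis.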


#### c. Compute the mean, standard deviation, skewness, and excess kurtosis of its log return. 

In [4]:
print('Mean:\n\n',df_ret.mean())

Mean:

 Bitcoin     0.003240
Ethereum    0.005269
dtype: float64


In [5]:
print('Standard Deviation:\n\n', df_ret.std())

Standard Deviation:

 Bitcoin     0.041921
Ethereum    0.055786
dtype: float64


In [6]:
print('Skewness:\n\n', df_ret.skew())

Skewness:

 Bitcoin    -2.038285
Ethereum   -1.727125
dtype: float64


In [7]:
# pandas' kurt() returns excess kurtosis (0 for a normal distribution)
print('Excess kurtosis:\n\n', df_ret.kurt())

Excess kurtosis:

 Bitcoin     24.196784
Ethereum    17.128672
dtype: float64
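
To make the four statistics concrete, here is a minimal sketch that computes them from central moments with NumPy. The helper `sample_moments` is hypothetical and uses the plain (biased) moment definitions, whereas pandas' `std()`, `skew()` and `kurt()` apply small-sample corrections, so values on real data will differ slightly from the figures above.

```python
import numpy as np

def sample_moments(x):
    """Return mean, standard deviation, skewness and excess kurtosis.

    Uses plain (biased) central moments; pandas applies small-sample
    corrections, so its results differ slightly on finite samples.
    """
    x = np.asarray(x, dtype=float)
    mean = x.mean()
    dev = x - mean
    m2 = np.mean(dev ** 2)          # second central moment (population variance)
    m3 = np.mean(dev ** 3)          # third central moment
    m4 = np.mean(dev ** 4)          # fourth central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3  # subtract 3 so a normal distribution scores 0
    return mean, np.sqrt(m2), skew, excess_kurt

# Symmetric toy sample: skewness is zero and the tails are thinner than normal
mean, std, skew, excess_kurt = sample_moments([-2, -1, 0, 1, 2])
```

Negative skewness, as found for both assets, means the left tail (large losses) is heavier than the right, and a large positive excess kurtosis means extreme days occur more often than under a normal distribution.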


#### d. Repeat for a second stock.  
Done above: Bitcoin and Ethereum are downloaded and analysed together in parts a to c.

#### e.	Compute the covariance and the correlation. Explain their difference. How do you convert one to the other?

In [8]:
print('Correlation:\n\n', df_ret.corr())

Correlation:

            Bitcoin  Ethereum
Bitcoin   1.000000  0.816289
Ethereum  0.816289  1.000000


In [9]:
print('Covariance:\n\n', df_ret.cov())

Covariance:

            Bitcoin  Ethereum
Bitcoin   0.001757  0.001909
Ethereum  0.001909  0.003112


Covariance and correlation both measure how two variables move together. Covariance indicates the direction of a linear relationship, but its magnitude depends on the scale of the variables, so it is hard to interpret on its own. Correlation measures both the direction and the strength of the linear relationship: it is covariance standardised to lie between -1 and 1. The difference between the two concepts is therefore that correlation values are standardised whereas covariance values are not.

The correlation coefficient of two variables is their covariance divided by the product of their standard deviations. We can also go in the opposite direction, as shown below.

Converting a p x p correlation matrix to a covariance matrix requires the variances (or standard deviations) of the p variables. The ij-th element of the correlation matrix is related to the corresponding element of the covariance matrix by Rij = Sij / mij, where mij is the product of the standard deviations of the i-th and j-th variables, so Sij = Rij * mij. Equivalently, the correlation matrix can be rescaled by pre- and post-multiplying it by a diagonal matrix containing the standard deviations:

In [10]:
df_std = df_ret.std()
df_corr = df_ret.corr()

In [11]:
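# Scale each correlation by the product of the two standard deviations: S_ij = R_ij * s_i * s_j
df_cov = df_corr * np.outer(df_std, df_std)

In [12]:
df_cov

           Bitcoin  Ethereum
Bitcoin   0.001757  0.001909
Ethereum  0.001909  0.003112


The result matches the covariance matrix computed directly with df_ret.cov() above, so the correlation matrix has been converted back into covariance.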
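
The rescaling between the two matrices can be sketched with NumPy on simulated data. The return series below is randomly generated and purely hypothetical; the point is only that S = D R D recovers the covariance matrix, where D is the diagonal matrix of standard deviations, and that dividing by the outer product of the standard deviations goes the other way.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical return series: 250 daily observations of 2 correlated assets
returns = rng.normal(size=(250, 2)) @ np.array([[0.04, 0.0], [0.03, 0.04]])

cov = np.cov(returns, rowvar=False)        # sample covariance matrix (ddof=1)
corr = np.corrcoef(returns, rowvar=False)  # correlation matrix
std = returns.std(axis=0, ddof=1)          # sample standard deviations

D = np.diag(std)
cov_from_corr = D @ corr @ D               # S = D R D
corr_from_cov = cov / np.outer(std, std)   # R_ij = S_ij / (s_i * s_j)
```

A quick sanity check on any such conversion is that the result is symmetric and that its diagonal holds the variances.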

#### 2.	Build your own transition

#### a. Divide the data into 2 uneven parts: the first part is 80% of your data, and the second part is 20%. 


In [13]:
df_train = df_ret.head(542)
df_test = df_ret.tail(122)
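
Rather than hard-coding row counts, the split point can be derived from the length of the data. The sketch below is one possible way to do this; `chronological_split` is a hypothetical helper, and the 10-element array stands in for the return series.

```python
import numpy as np

def chronological_split(values, train_frac=0.8):
    """Split a time-ordered sequence into a leading train part and a trailing test part."""
    n_train = int(len(values) * train_frac)   # number of rows in the first part
    return values[:n_train], values[n_train:]

# Hypothetical series of 10 daily observations
observations = np.arange(10)
train, test = chronological_split(observations)
```

With a DataFrame the same idea would use `.iloc[:n_train]` and `.iloc[n_train:]`; keeping the split chronological avoids using later days to describe earlier ones.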

#### b. Categorize each day in the 1-2 year price history as belonging to one of four categories:
    i. Both stocks up
    ii. Stock #1 up, stock #2 down
    iii. Stock #1 down, stock #2 up
    iv. Both stocks down

In [21]:
conditions = [
    (df_train['Bitcoin'] > 0) & (df_train['Ethereum'] > 0),
    (df_train['Bitcoin'] > 0) & (df_train['Ethereum'] < 0),
    (df_train['Bitcoin'] < 0) & (df_train['Ethereum'] > 0),
    (df_train['Bitcoin'] < 0) & (df_train['Ethereum'] < 0)]
choices = ['uu', 'ud', 'du', 'dd']
# Work on an explicit copy so the new column is not written to a slice of df_ret
df_train = df_train.copy()
# Days on which either return is exactly zero match none of the conditions and are labelled 'error'
df_train['Classification'] = np.select(conditions, choices, default='error')
df_train


              Bitcoin   Ethereum  Classification
Date
2020-01-02  -0.030273  -0.026273              dd
2020-01-03   0.050172   0.051709              uu
2020-01-04   0.008915   0.006668              uu
2020-01-05   0.000089   0.008899              uu
2020-01-06   0.047161   0.057235              uu
...               ...        ...             ...
2021-06-26   0.017188   0.008797              uu
2021-06-27   0.073747   0.078638              uu
2021-06-28  -0.006233   0.049665              du
2021-06-29   0.040785   0.038261              uu


In [16]:
# Share of training days in each category
for label in ['uu', 'dd', 'du', 'ud']:
    print(label, ':', round((df_train['Classification'] == label).mean(), 3))

uu : 0.461
dd : 0.345
du : 0.113
ud : 0.081
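
The classification step can be illustrated on a handful of made-up returns. In this sketch the two arrays are hypothetical, and the default label `'flat'` is an assumption introduced here to show what happens when a return is exactly zero and none of the four conditions applies.

```python
import numpy as np

# Hypothetical daily log returns for two assets
asset_1 = np.array([0.02, -0.01, 0.03, -0.02, 0.00])
asset_2 = np.array([0.01, 0.02, -0.01, -0.03, 0.01])

conditions = [
    (asset_1 > 0) & (asset_2 > 0),   # both up
    (asset_1 > 0) & (asset_2 < 0),   # asset 1 up, asset 2 down
    (asset_1 < 0) & (asset_2 > 0),   # asset 1 down, asset 2 up
    (asset_1 < 0) & (asset_2 < 0),   # both down
]
# Days matching none of the conditions (a zero return) receive the default label
labels = np.select(conditions, ['uu', 'ud', 'du', 'dd'], default='flat')

# Share of days in each category
states, counts = np.unique(labels, return_counts=True)
shares = dict(zip(states.tolist(), (counts / labels.size).tolist()))
```

On daily crypto data exact zeros are rare, but checking for the default label is a cheap way to confirm that every day landed in one of the four categories.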


#### c. Build a transition matrix of portfolio direction that shows your portfolio in four scenarios:
    i.  From moving together to moving together. That means starting from uu or dd and going to uu or dd.
    ii. From moving together to moving apart. That means starting from uu or dd and going to ud or du.
    iii. From moving apart to moving together. That means starting from ud or du and going to uu or dd.
    iv. From moving apart to moving apart. That means starting from ud or du and going to ud or du.

In [19]:
transitions = list(df_train['Classification'])
# Each column (today's state) is normalised to sum to 1, giving P(tomorrow | today)
round(pd.crosstab(pd.Series(transitions[1:], name='Tomorrow'), pd.Series(transitions[:-1], name='Today'), normalize='columns'), 3)

Today        dd     du     ud     uu
Tomorrow
dd        0.273  0.383  0.318  0.392
du        0.086  0.117  0.114  0.132
ud        0.086  0.033  0.114  0.084
uu        0.556  0.467  0.455  0.392
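
The four scenarios listed in part c can also be summarised in a 2 x 2 matrix by grouping uu and dd as "together" and ud and du as "apart". The sketch below is one way to do that with the standard library; `together_apart_matrix` is a hypothetical helper, the example sequence is made up, and the function assumes the input contains only the four labels.

```python
from collections import Counter

def together_apart_matrix(states):
    """Estimate P(tomorrow's group | today's group) from a list of uu/ud/du/dd labels."""
    group = {'uu': 'together', 'dd': 'together', 'ud': 'apart', 'du': 'apart'}
    grouped = [group[s] for s in states]
    pairs = Counter(zip(grouped[:-1], grouped[1:]))   # (today, tomorrow) counts
    matrix = {}
    for today in ('together', 'apart'):
        total = sum(pairs[(today, nxt)] for nxt in ('together', 'apart'))
        matrix[today] = {
            nxt: pairs[(today, nxt)] / total if total else float('nan')
            for nxt in ('together', 'apart')
        }
    return matrix

# Hypothetical label sequence, for illustration only
example = ['uu', 'dd', 'ud', 'uu', 'uu', 'du', 'ud', 'dd']
matrix = together_apart_matrix(example)
```

Here `matrix[today][tomorrow]` is read row-wise, so each inner dictionary sums to 1. Applied to the training labels, it would be called on `list(df_train['Classification'])`.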


#### d. Discuss the similarities or differences of the two transition matrices.  

Because the two assets are highly correlated, they move in the same direction most of the time. As the category frequencies show, both moved up on about 46 percent of the training days and both moved down on about 35 percent, so they moved together on roughly 81 percent of days. The transition matrix also shows that after a 'dd' day there is a sizeable probability (about 56 percent) of a rebound to 'uu' the following day. The two tables are consistent with each other, but the transition matrix is more detailed: rather than the unconditional frequency of each move, it gives the probability of tomorrow's move conditional on today's move.

#### e. Is the process Markovian? Be sure to comment how this relates to mean-reversion and momentum.


A process is Markovian if predictions about its future can be made from its current state alone, and those predictions are just as good as ones made with knowledge of the process's entire history. In other words, conditional on the current state, the future is independent of the past. We cannot say that Bitcoin and Ethereum returns (even when classified only as up or down) are Markovian, because they are influenced by many other variables. Well-regarded models nevertheless treat asset returns as random walks, so there is logic and value in analysing them with the tools commonly used for Markovian processes, as done above. However, any predictions or investments based on this assumption must be approached with extreme caution.

It is also important to incorporate (at least) momentum and mean-reversion into this analysis, as they give the investor a more comprehensive view of the asset's short- and long-run dynamics. In the overnight matrix, for instance, 'uu' follows 'dd' about 56 percent of the time but follows 'uu' only about 39 percent of the time, a pattern more suggestive of short-run mean-reversion than of momentum. Both effects rely on information in past returns: if such strategies have been extensively researched and shown to work in specific cases, we cannot claim that returns are memoryless, which is a required property of a Markovian process.
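
One informal way to probe the memoryless property is to compare probabilities conditioned on today alone with probabilities conditioned on today and yesterday. The sketch below does this by counting; `conditional_freq` is a hypothetical helper and the sequence is artificial, built so that the answer is obvious. It is an illustration of the idea, not a statistical test.

```python
from collections import Counter, defaultdict

def conditional_freq(states, order):
    """Estimate P(next state | previous `order` states) by counting."""
    counts = defaultdict(Counter)
    for i in range(order, len(states)):
        history = tuple(states[i - order:i])
        counts[history][states[i]] += 1
    return {
        history: {nxt: c / sum(ctr.values()) for nxt, c in ctr.items()}
        for history, ctr in counts.items()
    }

# Artificial sequence with memory: the pattern u, u, d repeats
sequence = ['u', 'u', 'd'] * 20

first_order = conditional_freq(sequence, order=1)    # condition on today only
second_order = conditional_freq(sequence, order=2)   # condition on yesterday and today
```

In this toy sequence, knowing only that today is 'u' leaves tomorrow a coin flip, while knowing that the last two days were 'u', 'u' makes 'd' certain. If adding yesterday changes the estimated probabilities materially on real labels, a first-order Markov description is doubtful.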

#### f. Suppose you built a table with changes that are 3 days apart. For example, this table would show changes from Day 1 to Day 4, Day 2 to Day 5, etc. The four categories remain the same: uu, ud, du, and dd. Here is the question: would you expect to be able to use the overnight table (e.g. Day 1 to Day 2, Day 2 to Day 3, etc.) and derive the 3-days-apart table? Why or why not?


In [23]:
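transitions = list(df_train['Classification'])
# Pair each day with the day three rows later: both slices use an offset of 3
round(pd.crosstab(pd.Series(transitions[3:], name='3 days apart'), pd.Series(transitions[:-3], name='Today'), normalize='columns'), 3)

The output recorded below came from pairing transitions[1:] with transitions[:-4], which is still a one-day offset on a slightly shorter sample. It therefore nearly reproduces the overnight matrix and should be regenerated with the 3-day offset above.

Today            dd     du     ud     uu
3 days apart
dd            0.273  0.390  0.318  0.395
du            0.086  0.119  0.114  0.125
ud            0.086  0.034  0.114  0.085
uu            0.556  0.458  0.455  0.395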
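
If the process were a time-homogeneous first-order Markov chain, the 3-days-apart table could be derived from the overnight table by multiplying the matrix by itself three times (the Chapman-Kolmogorov relation). The sketch below applies that idea to the overnight matrix from part c; `empirical_lagged_matrix` is a hypothetical helper for building the observed table at any lag, so the two can be compared. Large gaps between the implied and observed tables would suggest the Markov assumption does not hold.

```python
import numpy as np

states = ['dd', 'du', 'ud', 'uu']
# Overnight matrix copied from the table in part c:
# rows = tomorrow, columns = today, so each column sums to about 1
P = np.array([
    [0.273, 0.383, 0.318, 0.392],
    [0.086, 0.117, 0.114, 0.132],
    [0.086, 0.033, 0.114, 0.084],
    [0.556, 0.467, 0.455, 0.392],
])

# Under the Markov assumption, the 3-step matrix is the overnight matrix cubed
P3_implied = np.linalg.matrix_power(P, 3)

def empirical_lagged_matrix(labels, lag):
    """Column-normalised counts of (state `lag` days later | state today)."""
    idx = {s: k for k, s in enumerate(states)}
    counts = np.zeros((len(states), len(states)))
    for today, later in zip(labels[:-lag], labels[lag:]):
        counts[idx[later], idx[today]] += 1
    col_sums = counts.sum(axis=0, keepdims=True)
    # Columns with no observations are left as zeros instead of dividing by zero
    return np.divide(counts, col_sums, out=np.zeros_like(counts), where=col_sums > 0)

# Tiny made-up sequence, only to show the call
toy = ['uu', 'dd', 'uu', 'dd', 'uu', 'dd']
toy_lag3 = empirical_lagged_matrix(toy, lag=3)
```

On the training data, the observed table would come from `empirical_lagged_matrix(list(df_train['Classification']), lag=3)`, assuming every label is one of the four states.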
