# MLR in cryptocurrency pricing analysis: Proposal

In [1]:
library(broom)
install.packages("latex2exp")
library(latex2exp)
library(tidyverse)
library(repr)
library(digest)
library(gridExtra)

Updating HTML index of packages in '.Library'

Making 'packages.html' ...
 done

── Attaching packages ─────────────────────────────────────── tidyverse 1.3.1 ──

✔ ggplot2 3.3.5     ✔ purrr   0.3.4
✔ tibble  3.1.5     ✔ dplyr   1.0.7
✔ tidyr   1.1.4     ✔ stringr 1.4.0
✔ readr   2.0.2     ✔ forcats 0.5.1

── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()

Attaching package: ‘gridExtra’

The following object is masked from ‘package:dplyr’:

    combine


![Ethereum illustration from ethereum.org](https://ethereum.org/static/28214bb68eb5445dcb063a72535bc90c/f3583/hero.png)

## Introduction

### Background
Cryptocurrency is a decentralized, "purely peer-to-peer version of electronic cash" (Nakamoto, 2009). ETH (Ether) is the native cryptocurrency of the Ethereum platform (Buterin, n.d.). Second only to Bitcoin in market capitalization, ETH rose about tenfold in price in the year following November 2020 ([Yahoo Finance](https://finance.yahoo.com/quote/ETH-USD/)). This surge in cryptocurrency values has motivated studies that apply computational tools to price prediction.
### Purpose
This study will apply multiple linear regression (MLR), together with model assessment and selection methods, to the ETH dataset. The aim is to build a model that explains the linear relationship between the input variables and the ETH price, and that can predict future ETH prices.
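
As a rough illustration of the kind of workflow described above, the sketch below fits a full MLR model and then applies backward selection by AIC. It is only an assumption about how the analysis might look: the data frame `toy_eth` is simulated (not the real dataset), only three of the input columns are mimicked, and AIC-based `step()` is just one of several possible selection methods.

```r
# Simulated stand-in for the ETH data: all values are synthetic, not real.
set.seed(301)
n <- 200
toy_eth <- data.frame(
  block_size        = rnorm(n, mean = 50000, sd = 8000),
  mining_difficulty = rnorm(n, mean = 5, sd = 1),
  reddit_index      = rnorm(n, mean = 1000, sd = 300)
)

# In this toy setup, price depends on the first two inputs only;
# reddit_index is unrelated noise.
toy_eth$price <- 100 + 0.02 * toy_eth$block_size +
  300 * toy_eth$mining_difficulty + rnorm(n, sd = 50)

# Full MLR model using every candidate input.
full_model <- lm(price ~ ., data = toy_eth)

# Backward selection by AIC (one possible selection method).
selected_model <- step(full_model, direction = "backward", trace = 0)

# Inspect which inputs were kept and how well the model fits.
names(coef(selected_model))
summary(selected_model)$adj.r.squared
```

With the real data, `toy_eth` would be replaced by `eth_data` (with `date` excluded from the inputs), and the selected model would be assessed on data held out from fitting.
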
### Dataset

In [9]:
eth_data <- read_csv('https://raw.githubusercontent.com/gzzen/stat-301-group-project/main/eth_dataset.csv')
head(eth_data, 3)
tail(eth_data, 3)

Rows: 493 Columns: 47

── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl  (46): reddit_index, block_size, mining_difficulty, ethe_open_interest, ...
date  (1): date


ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.



| date | reddit_index | block_size | mining_difficulty | ethe_open_interest | activate_address | on_chain_transaction_vol | unit_computing_income | daily_block_reward | block_gen_rate | ⋯ | trade_vol_l | trade_vol_x | short | long | ethe_otc_premium | block_count | avg_computing_power | transaction_vol | supply | reddit_follower_growth_rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `<date>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | ⋯ | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 2020-06-24 | 2004.993 | 38301.08 | 2405277000000000.0 | 1702398 | 566754 | 5530577 | 0.0202 | 3723607 | 13.4 | ⋯ | 0 | 0 | 141651691 | 137850993 | 3.5046 | 6450 | 191623300000000.0 | 1042273 | 108818350 | 0.086374 |
| 2020-06-25 | 2035.98 | 35299.83 | 2406899000000000.0 | 1702281 | 584713 | 2795780 | 0.0197 | 3632507 | 13.4 | ⋯ | 0 | 0 | 128435656 | 115084909 | 3.0171 | 6448 | 191503100000000.0 | 1017408 | 108831273 | 0.101543 |
| 2020-06-26 | 2056.894 | 36471.11 | 2387370000000000.0 | 1702281 | 554079 | 1809875 | 0.0194 | 3507628 | 13.6 | ⋯ | 0 | 0 | 78961382 | 73166365 | 3.0171 | 6365 | 187735500000000.0 | 1017192 | 108844030 | 0.104361 |


| date | reddit_index | block_size | mining_difficulty | ethe_open_interest | activate_address | on_chain_transaction_vol | unit_computing_income | daily_block_reward | block_gen_rate | ⋯ | trade_vol_l | trade_vol_x | short | long | ethe_otc_premium | block_count | avg_computing_power | transaction_vol | supply | reddit_follower_growth_rate |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| `<date>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | ⋯ | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` | `<dbl>` |
| 2021-10-27 | 272.2137 | 67421.64 | 1.000351e+16 | 3128215 | 812670 | 1804725 | 0.083 | 59779364 | 13.5 | ⋯ | 163367864 | 88796720 | 3930069648 | 3823507926 | -0.0205612 | 6408 | 782147200000000.0 | 1510502 | 115168748 | 0.37011 |
| 2021-10-28 | 254.6987 | 64129.1 | 1.006534e+16 | 3128001 | 826217 | 2972658 | 0.0858 | 63507365 | 13.6 | ⋯ | 180195352 | 152798432 | 4755173026 | 4677848018 | -0.0289822 | 6368 | 784279700000000.0 | 1456333 | 115181507 | 0.45518 |
| 2021-10-29 | 264.5458 | 62412.12 | 1.023876e+16 | 3128001 | 765907 | 1313643 | 0.0835 | 65172687 | 13.6 | ⋯ | 102822580 | 41716992 | 2839980504 | 2612652448 | -0.0289822 | 6345 | 798609000000000.0 | 1379095 | 115194221 | 0.391936 |


The raw data come from [qkl123](https://www.qkl123.com/project/eth/data), which provides a comprehensive collection of indicators for ETH. The intermediate dataset `eth_data` was then assembled by merging columns in Python ([merge notebook](https://github.com/gzzen/stat-301-group-project/blob/main/proposal/dataset/merge_dataset.ipynb)). Below we take a quick look at the columns of the data.

In [10]:
colnames(eth_data)

The columns of the dataset fall into five categories:
- `price` is the response variable, and `date` is used to select the desired time interval.
- `block_size`, `mining_difficulty`, etc. describe the state of the blockchain.
- `business_vol`, `outflow_retail`, etc. are spot market indicators.
- `long_margin_call`, `margin_call_l`, etc. are futures market indicators.
- `reddit_index`, `ethe_otc_premium`, etc. do not belong to a specific category, but are interesting to consider.
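
To make the role of `date` and of the column categories concrete, here is a minimal sketch of restricting the data to a time window and keeping the response together with one category of inputs. The data frame `toy_eth`, its values, the date window, and the vector `blockchain_vars` are all hypothetical and chosen only for illustration; only the column names follow the dataset.

```r
# Hypothetical toy rows: column names follow the dataset, values are made up.
toy_eth <- data.frame(
  date              = as.Date("2021-01-01") + 0:5,
  price             = c(730, 775, 975, 1040, 1100, 1210),
  block_size        = c(41000, 42500, 43000, 44800, 45100, 46000),
  mining_difficulty = c(3.9, 4.0, 4.0, 4.1, 4.2, 4.2),
  reddit_index      = c(900, 950, 1200, 1300, 1250, 1400)
)

# A few of the blockchain-category inputs (illustrative subset only).
blockchain_vars <- c("block_size", "mining_difficulty")

# Keep rows inside an arbitrary date window.
in_window <- toy_eth$date >= as.Date("2021-01-02") &
  toy_eth$date <= as.Date("2021-01-04")

# Keep the date, the response, and the chosen category of inputs.
eth_subset <- toy_eth[in_window, c("date", "price", blockchain_vars)]

nrow(eth_subset)
colnames(eth_subset)
```

The same pattern would apply to `eth_data`, with the window and the category vectors chosen to match the analysis.
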

## References

Nakamoto, S. (2009). Bitcoin: A peer-to-peer electronic cash system. Retrieved from https://bitcoin.org/bitcoin.pdf.

Buterin, V. (n.d.). Ethereum whitepaper. Retrieved from https://ethereum.org/en/whitepaper/. 