# Links I used for this project 
- [Make a table in Jupyter](https://stackoverflow.com/questions/35160256/how-do-i-output-lists-as-a-table-in-jupyter-notebook)
- [Crypto Data metrics](https://coinmetrics.io/tools/)
- [Youtube ML tutorial](https://www.youtube.com/watch?v=rAdAVcS4aL0&list=PLQVvvaa0QuDd0flgGphKCej-9jp-QdzZ3&index=4)
- [Predicting Stock Prices using deep learning](https://towardsdatascience.com/getting-rich-quick-with-machine-learning-and-stock-market-predictions-696802da94fe)
- [Moving Average Crossover](https://www.europeanproceedings.com/files/data/article/44/1143/article_44_1143_pdf_100.pdf)
- [Common MAC Periods](https://www.investopedia.com/ask/answers/122414/what-are-most-common-periods-used-creating-moving-average-ma-lines.asp)
- [Algorithmic Financial Trading](https://www.researchgate.net/publication/324802031_Algorithmic_Financial_Trading_with_Deep_Convolutional_Neural_Networks_Time_Series_to_Image_Conversion_Approach)



# Me Navigating My Way Through the World of Data Science

<br/><br/>


<img src="photo/climbing.jpg" width = "60%">


<br/><br/>

### About Me:

I spend a lot of time climbing, playing music, and adventuring, but I have started to miss my STEM roots. Applied machine learning is something I feel naturally passionate about, so I am doing these projects for fun to find out whether it's something I want to pursue. Please leave comments and tell me what I need to improve. Thank you! Also feel free to critique my GitHub; I could use any feedback I can get.

__email:__ jamorsicato@gmail.com

__github:__ https://github.com/jamorsicato

__LinkedIn:__ https://www.linkedin.com/in/jonathan-morsicato-089977196/

__instagram:__ @jonnymorsicato (if you feel like checking out some Colorado mountain adventures)



In [2]:
# General imports 

import numpy as np 
import pandas as pd
import sklearn
from sklearn.model_selection import train_test_split
from sklearn import preprocessing
from sklearn.metrics import mean_squared_error, r2_score

# Standard library imports
import os
import time
from datetime import datetime 

#Make table for Data Exploration 
from IPython.display import HTML, display
import tabulate

# family imports
## families are a broad type of model
from sklearn.ensemble import RandomForestRegressor

# cross validation tools
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV

# plotting imports 
import matplotlib.pyplot as plt

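# Save our model for the future (joblib is an alternative to pickle)
## `from sklearn.externals import joblib` did not load: it was removed from newer scikit-learn releases.
## `import joblib` is the replacement; for now we may just use pickle at the end to save our model.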
import pickle

# Following "Algorithmic Financial trading" 

## Overview 

- __General Workflow__
    - dataset extract/transform, (DONE)
    - labelling data (label all data as buy/sell/hold)
    - image creation,
    - CNN analysis
    - financial evaluation phases
- __Goal__: determine the best buy, sell, and hold positions for crypto prices

<img src="photo/algo_trading_flowchart.png" width = "80%">

- __Image Creation:__ 
    - For each day 
    - RSI, Williams %R, WMA, EMA, SMA, HMA, Triple EMA, CCI, CMO, MACD, PPO, ROC, CMFI, DMI, and PSI values
    - Intervals 6 - 20 days 
    - We want oscillator-style analysis indicators that we can turn into a signal image
    - A 15x15 image is generated from 15 technical indicators, each computed over 15 different intervals
    - It's important that the order of the indicators is fixed and can't be changed
    
- __Labeling__ 
    - (This is NOT how I am going to do it) Label each point "Hold", "Buy", or "Sell" by finding the top and bottom points in a sliding window. Bottom points are labeled "Buy", top points are labeled "Sell", and the remaining points are labeled "Hold".
    - Make one model that uses labels from a simple min/max in a sliding window
    - Make another training set that uses a min/max on an exponential moving average
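
The image-creation step above (indicators by intervals) can be sketched roughly as follows. This is a minimal illustration, not the paper's implementation: it assumes only three indicators (SMA, EMA, and rate of change) instead of fifteen, so each day produces a 3x15 matrix rather than a 15x15 image. The names `indicator_rows` and `indicator_image` and the synthetic price series are hypothetical.

```python
import numpy as np
import pandas as pd

INTERVALS = range(6, 21)  # 6..20 days inclusive -> 15 columns

def indicator_rows(price: pd.Series) -> dict:
    """Return {indicator name: DataFrame with one column per interval}."""
    sma = pd.DataFrame({n: price.rolling(n).mean() for n in INTERVALS})
    ema = pd.DataFrame({n: price.ewm(span=n, adjust=False).mean() for n in INTERVALS})
    roc = pd.DataFrame({n: price.pct_change(n) for n in INTERVALS})
    return {"SMA": sma, "EMA": ema, "ROC": roc}

def indicator_image(price: pd.Series, day: int) -> np.ndarray:
    """Stack indicators in a fixed order: rows = indicators, columns = intervals."""
    rows = indicator_rows(price)
    order = ("SMA", "EMA", "ROC")  # the row order must never change between days
    return np.vstack([rows[name].iloc[day].to_numpy() for name in order])

# Synthetic random-walk prices, only for illustration
rng = np.random.default_rng(0)
price = pd.Series(100 + rng.normal(0, 1, 60).cumsum())
img = indicator_image(price, day=40)
print(img.shape)
```

With all fifteen indicators in `order`, the same stacking would give the 15x15 image; the values would still need scaling before being fed to a CNN.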
    
## Method
- Libraries Used: Apache Spark, Keras, and TensorFlow

## Workflow

- Where are we going to get the data? (Do not request data at a very high rate; that can look like a DDoS attack) (Contact companies)
- What estimator am I going to use? (Looks like I will use SGD or Linear SVC)
- Technical analysis or fundamental analysis
- CNN (Convolutional Neural Network) or LSTM (Long Short-Term Memory)
- Organize and decide which metrics are useful
- Need to label our data set with a Buy, Hold, or Sell label for every data point
- Build training and testing sets
- Pickle and export the model
- Build an app with ML into an investing spreadsheet?



## Goal: Predict tomorrow's bitcoin price 

- Input{} Output{price USD}


## Design

- I want this ML algorithm to begin to tell me when to buy and sell crypto. It's important to note that this is probably not going to work, as this model will be pretty naive, but I think it would be interesting. We always have access to current and previous data, but we always wish we knew the future.

- Inputs 
    - Moving average crossover (time lagged) (NEED TO CALC FROM PRICE)
    - volume (FROM DATA SET)
    - Hash difficulty (FROM DATA SET)
    - Market Cap 
    - Bitcoin Difficulty 
    - Addresses (sum of unique addresses in the network)
    
- Output 
    - Price
    
   
   
- PricePrediction(i) = {x(i-k),y(i-k),z(i-k),..., etc} __where__ i = current time period, and k = time lag
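
One way to build the time-lagged inputs described above is to shift each feature column by k rows, so the features from day i-k sit next to the price on day i. This is a sketch under assumptions: the helper `make_lagged` and the toy column `volume` are hypothetical, and a single lag k is used for every feature.

```python
import pandas as pd

def make_lagged(df: pd.DataFrame, feature_cols, target_col: str, k: int = 1):
    """Pair features from day i-k with the target on day i."""
    X = df[feature_cols].shift(k).add_suffix(f"_lag{k}")
    y = df[target_col]
    # The first k rows have no lagged features, so drop them
    data = pd.concat([X, y], axis=1).dropna()
    return data.drop(columns=target_col), data[target_col]

toy = pd.DataFrame({
    "volume": [10.0, 11.0, 12.0, 13.0, 14.0],
    "PriceUSD": [1.0, 2.0, 3.0, 4.0, 5.0],
})
X, y = make_lagged(toy, ["volume"], "PriceUSD", k=2)
print(X.join(y))
```

Because the features are shifted forward, the model never sees same-day information when predicting the price.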
    
    
## Questions

- Why does something have value?
- What makes bitcoin's price go up or down?
    - price increases when buying pressure goes up
    - regulations on its buy/sell
    - 
    
    
## Notes

- dataset extract/transform, (DONE)
- labelling data (label all data as buy/sell/hold)
- image creation,
- CNN analysis
- financial evaluation phases.

- Apache Spark, Keras, and TensorFlow to create and analyze the images and perform big data analytics

- Use a sliding window for training and testing (train on years 1 + i through 5 + i, test on year 6 + i, then set i = i + 1)
    - Using the sliding window allows for model refinement by retesting and retraining
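
The sliding train/test window from the note above can be sketched as a small generator. The function name `walk_forward_splits` and the example year range are assumptions for illustration; the 5-year train and 1-year test sizes come from the note.

```python
def walk_forward_splits(years, train_years=5, test_years=1):
    """Yield (train, test) lists of years, sliding forward one year at a time."""
    years = sorted(years)
    i = 0
    while i + train_years + test_years <= len(years):
        train = years[i : i + train_years]
        test = years[i + train_years : i + train_years + test_years]
        yield train, test
        i += 1

splits = list(walk_forward_splits(range(2012, 2020)))
for train, test in splits:
    print(train, "->", test)
```

Each split retrains on the five most recent years and tests on the year right after them, so the test data always lies in the future relative to the training data.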



In [10]:
## Load Data For Use into pandas dataframe
# I need to get better at scraping data directly from a website

# for the initial model I will only use btc, after i know it works I will build the model to include others
df = pd.read_csv("/Users/jonnymorsicato/Desktop/Data Science/machine_learning_practice/data/btc.csv")

# data frame with the definitions for all columns in the data set
df_m = pd.read_csv("/Users/jonnymorsicato/Desktop/Data Science/machine_learning_practice/data/metrics_info.csv")

# We will calc everything in USD for ease of presentation

# Columns of interest
# - date
# - market cap - 5 - the sum USD value of the current supply
# - difficulty - 7 - difficulty of finding a new block in that interval
# - Addresses - 0 - sum of unique addresses that were active in the network
# - SplyFF - 28 - number of native assets that are available for trade
# - HashRate - 14 - the mean hash rate
# - TxTfrCnt - 30 - sum of transfers in that interval. Movement of native token between transactions (demand?)
# - Lagged MAC - MAKE THIS, lag an exponential moving average crossover; DEMA = 2*EMA - EMA(EMA)

# - PriceUSD - 23 - closing price of BTC at 00:00 UTC   OUTPUT VALUE


# Questions, what makes something have value
# why 

In [4]:
display(HTML(tabulate.tabulate(df_m, tablefmt='html')))

| 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
|---|---|---|---|---|---|---|---|---|---|
| 0 | AdrActCnt | Addresses, active, count | Addresses | Active | Sum | Addresses | 1 day | 1 block | The sum count of unique addresses that were active in the network (either as a recipient or originator of a ledger change) that interval. All parties in a ledger change action (recipients and originators) are counted. Individual addresses are not double-counted if previously active. |
| 1 | BlkCnt | Block, count | Network Usage | Blocks | Sum | Blocks | 1 day | | The sum count of blocks created that interval that were included in the main (base) chain. |
| 2 | BlkSizeByte | Block, size, bytes | Network Usage | Blocks | Sum | Bytes | 1 day | 1 block | The sum of the size (in bytes) of all blocks created that interval. |
| 3 | BlkSizeMeanByte | Block, size, mean, bytes | Network Usage | Blocks | Mean | Bytes | 1 day | | The mean size (in bytes) of all blocks created that day. |
| 4 | CapMVRVCur | Capitalization, MVRV, current supply | Market | Market Capitalization | Ratio | Dimensionless | 1 day | | The ratio of the sum USD value of the current supply to the sum "realized" USD value of the current supply. |
| 5 | CapMrktCurUSD | Capitalization, market, current supply, USD | Market | Market Capitalization | Product | USD | 1 day | | The sum USD value of the current supply. Also referred to as network value or market capitalization. |
| 6 | CapRealUSD | Capitalization, realized, USD | Market | Market Capitalization | Product | USD | 1 day | | The sum USD value based on the USD closing price on the day that a native unit last moved (i.e., last transacted) for all native units. |
| 7 | DiffMean | Difficulty, mean | Mining | Difficulty | Mean | Dimensionless | 1 day | 1 block | The mean difficulty of finding a hash that meets the protocol-designated requirement (i.e., the difficulty of finding a new block) that interval. The requirement is unique to each applicable cryptocurrency protocol. Difficulty is adjusted periodically by the protocol as a function of how much hashing power is being deployed by miners. |
| 8 | FeeMeanNtv | Fees, transaction, mean, native units | Fees and Revenue | Fees | Mean | Native units | 1 day | 1 block | The mean fee per transaction in native units that interval. |
| 9 | FeeMeanUSD | Fees, transaction, mean, USD | Fees and Revenue | Fees | Mean | USD | 1 day | 1 block | The USD value of the mean fee per transaction that interval. |


In [5]:
df.head()

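| | date | AdrActCnt | BlkCnt | BlkSizeByte | BlkSizeMeanByte | CapMVRVCur | CapMrktCurUSD | CapRealUSD | DiffMean | FeeMeanNtv | ... | TxTfrValAdjUSD | TxTfrValMeanNtv | TxTfrValMeanUSD | TxTfrValMedNtv | TxTfrValMedUSD | TxTfrValNtv | TxTfrValUSD | VtyDayRet180d | VtyDayRet30d | VtyDayRet60d |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2009-01-03 | 0 | 0 | 0 | | | | 0.0 | | | ... | | | | | | 0.0 | | | | |
| 1 | 2009-01-04 | 0 | 0 | 0 | | | | 0.0 | | | ... | | | | | | 0.0 | | | | |
| 2 | 2009-01-05 | 0 | 0 | 0 | | | | 0.0 | | | ... | | | | | | 0.0 | | | | |
| 3 | 2009-01-06 | 0 | 0 | 0 | | | | 0.0 | | | ... | | | | | | 0.0 | | | | |
| 4 | 2009-01-07 | 0 | 0 | 0 | | | | 0.0 | | | ... | | | | | | 0.0 | | | | |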


In [17]:
# new data frame with the values I want

# columns I want (a list, not a set, so the column order is the same on every run)
cols_want = ["date","CapMrktCurUSD","DiffMean","AdrActCnt","SplyFF","HashRate","TxTfrCnt","PriceUSD"]

# # make all column names lower case for ease of use
# dfc.columns = map(str.lower, df.columns)

# make new dataframe with the columns I want (copy so later edits do not touch df)
df_a = df[cols_want].copy()

# snag price and date for creation of trading values below
pricUSD = df_a.PriceUSD
date = df_a.date

In [16]:
df_a.describe()

| | HashRate | CapMrktCurUSD | SplyFF | PriceUSD | TxTfrCnt | AdrActCnt | DiffMean |
|---|---|---|---|---|---|---|---|
| count | 4415.0 | 3860.0 | 4421.0 | 3860.0 | 4421.0 | 4421.0 | 4415.0 |
| mean | 20467820.0 | 59237950000.0 | 10531920.0 | 3358.532886 | 336412.6 | 353990.5 | 2805668000000.0 |
| std | 39036740.0 | 100746600000.0 | 4377399.0 | 5514.230986 | 307920.1 | 345336.7 | 5355028000000.0 |
| min | 1.988411e-07 | 177670.5 | 0.0 | 0.050541 | 0.0 | 0.0 | 1.0 |
| 25% | 11.23733 | 482056600.0 | 8100838.0 | 44.285891 | 16808.0 | 17741.0 | 1577913.0 |
| 50% | 308850.4 | 7096020000.0 | 12605020.0 | 526.897718 | 252820.0 | 237524.0 | 42557750000.0 |
| 75% | 20375280.0 | 109681400000.0 | 13936850.0 | 6343.913488 | 606990.0 | 653042.0 | 2603077000000.0 |
| max | 174939200.0 | 866928500000.0 | 14550070.0 | 46548.161991 | 2041653.0 | 1344921.0 | 21434400000000.0 |


# Technical Analysis Strategy

# Simple Moving Average (200 Day)

- Calculate the simple moving average (SMA) for a 200-day period
- Crossover points (price crossing the SMA) are good buy or sell indicators for long-term positions
- Plot the 200-day average against PriceUSD, with green dots on buys and red dots on sells
- Calculate how much 200 dollars would be worth after 5 years of trading the buy/sell points
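
The plan above can be sketched as a tiny backtest. This is a simplified illustration under stated assumptions: it ignores fees and slippage, holds either all cash or all BTC, and acts one day after the crossover so it never uses information it would not have had. The function name `sma_crossover_backtest` and the toy prices are hypothetical.

```python
import pandas as pd

def sma_crossover_backtest(price: pd.Series, window: int = 200, cash: float = 200.0):
    """Hold the asset while yesterday's price closed above yesterday's SMA.

    Returns (signals, final_value); signals is +1 on entry days, -1 on exit days.
    """
    sma = price.rolling(window).mean()
    # Compare yesterday's values so today's position uses only past data
    in_market = price.shift(1) > sma.shift(1)
    signals = in_market.astype(int).diff().fillna(0)
    daily_ret = price.pct_change().fillna(0.0)
    # Earn the daily return only on days we are in the market
    equity = cash * (1 + daily_ret.where(in_market, 0.0)).cumprod()
    return signals, equity.iloc[-1]

toy = pd.Series([1, 2, 3, 4, 5, 6, 5, 4, 3, 2], dtype=float)
signals, final_value = sma_crossover_backtest(toy, window=3)
print(signals[signals != 0])
print(round(final_value, 2))
```

On real data the call would be `sma_crossover_backtest(price_series)` with the default 200-day window; the nonzero entries of `signals` are the points to plot as green (+1) and red (-1) dots.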


# Moving Average Crossover

- The assumption behind this is that we can trade with the momentum of the market. If there was positive momentum an hour ago, it will probably keep going the same way until the moving averages start to converge and the momentum shifts.

- I will use an exponential crossover
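
A minimal sketch of an exponential moving average crossover, assuming a fast and a slow EMA built with pandas `ewm`. The 20/50 defaults are only one of the period pairs under consideration, and the function name `ema_crossover` and the V-shaped toy series are hypothetical.

```python
import pandas as pd

def ema_crossover(price: pd.Series, fast: int = 20, slow: int = 50) -> pd.Series:
    """Return +1 where the fast EMA crosses above the slow EMA, -1 where it crosses below, else 0."""
    ema_fast = price.ewm(span=fast, adjust=False).mean()
    ema_slow = price.ewm(span=slow, adjust=False).mean()
    above = (ema_fast > ema_slow).astype(int)
    return above.diff().fillna(0).astype(int)

# Toy series: falls from 30 to 1, then rises back to 30
toy = pd.Series(list(range(30, 0, -1)) + list(range(1, 31)), dtype=float)
signals = ema_crossover(toy, fast=3, slow=8)
print(signals[signals != 0])
```

The single +1 appears a few days after the bottom rather than on it, which is the lag that every moving average crossover carries.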


## Questions
- Difference between a standard (simple) and an exponential crossover
- Common problems (lags price, not fast enough for decisions)
- Best periods to use? 20 vs 50 day? 200 vs 50?
- SMA or EMA
- Moving average price crossover
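
To get a feel for the SMA versus EMA question and the lag problem, here is a small experiment on a made-up step series (the price jumps from 0 to 1 on day 20). The 10-day period is an arbitrary choice for illustration.

```python
import pandas as pd

step = pd.Series([0.0] * 20 + [1.0] * 20)  # price jumps from 0 to 1 on day 20
n = 10
sma = step.rolling(n).mean()
ema = step.ewm(span=n, adjust=False).mean()

comparison = pd.DataFrame({"price": step, "SMA": sma, "EMA": ema}).iloc[18:32]
print(comparison.round(3))
```

On the day of the jump the EMA has already moved further than the SMA because it weights the newest price more heavily, but the SMA reaches the new level exactly after 10 days while the EMA only approaches it gradually.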

In [20]:
# Labeling
# for every price value we need a buy, sell, or hold label
# y is our price

# pseudo code
# calc SMA
# calc EMA
# calc 20/50 cross over

# procedure Labeling()
#   windowSize = 11 days
#   while (counterRow < numberOfDaysInFile)
#     counterRow++
#     if (counterRow > windowSize)
#       windowBeginIndex = counterRow - windowSize
#       windowEndIndex = windowBeginIndex + windowSize - 1
#       windowMiddleIndex = (windowBeginIndex + windowEndIndex) / 2
#       for (i = windowBeginIndex; i <= windowEndIndex; i++)
#         number = closePriceList.get(i)
#         if (number < min)
#           min = number
#           minIndex = closePriceList.indexOf(min)
#         if (number > max)
#           max = number
#           maxIndex = closePriceList.indexOf(max)
#       if (maxIndex == windowMiddleIndex)
#         result = "SELL"
#       elif (minIndex == windowMiddleIndex)
#         result = "BUY"
#       else
#         result = "HOLD"

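# Sketch of the labeling procedure above: a day is SELL if it is the highest
# price in the 11-day window centered on it, BUY if it is the lowest, else HOLD.
window = 11
half = window // 2
numDays = len(date)
prices = pricUSD.to_numpy(dtype=float)
buySellHold = ["HOLD"] * numDays  # days too close to either end stay HOLD

for mid in range(half, numDays - half):
    win = prices[mid - half : mid + half + 1]
    if np.isnan(win).any():
        continue  # skip windows with missing prices (the earliest rows have none)
    if np.argmax(win) == half:
        buySellHold[mid] = "SELL"
    elif np.argmin(win) == half:
        buySellHold[mid] = "BUY"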