<a class="anchor" id="0.1"></a>

## Table of Contents

1. [Import libraries and set parameters](#1)
1. [Download data](#2)
1. [EDA](#3)
   - [3.1 Market Cap](#3.1)
   - [3.2 Cryptocurrency data](#3.2)
   - [3.3 Cryptocurrency features data](#3.3)
   - [3.4 Stationarity check](#3.4)
   - [3.5 Identification of seasonality](#3.5)
   - [3.6 EDA with Pandas Profiling Report](#3.6)   
1. [FE](#4)
   - [4.1 FE with TSFRESH](#4.1)
   - [4.2 FE from technical features (Finance knowledge and Data Science)](#4.2)
   - [4.3 Analysis of anomalies](#4.3)
       - [4.3.1 Analysis of anomalies for "Close"](#4.3.1)
       - [4.3.2 Analysis of anomalies for the first data difference "Close_diff"](#4.3.2)
   - [4.4 Analysis of the impact of COVID-19 on the cryptocurrency rate](#4.4)
   - [4.5 Get target, training, validation and test datasets for ML models](#4.5)
1. [Model training and forecasting](#5)
    - [5.1 Facebook Prophet](#5.1)
    - [5.2 ARIMA](#5.2)
        - [5.2.1 How to find the order of differencing (d) in ARIMA model](#5.2.1)
        - [5.2.2 How to find the order of the AR term (p)](#5.2.2)
        - [5.2.3 How to find the order of the MA term (q)](#5.2.3)
        - [5.2.4 How to build the ARIMA Model with manually defined parameters](#5.2.4)
        - [5.2.5 How to build the ARIMA automatically](#5.2.5)
    - [5.3 Other ML models (Multi-factors models)](#5.3)
        - [5.3.1 Set parameters for many models](#5.3.1)
        - [5.3.2 Models training and forecasting](#5.3.2)
    - [5.4 Choosing the main optimal model and forecasting](#5.4)
    - [5.5 Feature importance study](#5.5)    

## 1. Import libraries and set parameters

In [23]:
# Import libraries
import random
import os
import numpy as np 
import pandas as pd 
import requests
import pandas_datareader as web

# Date
import datetime as dt
from datetime import date, timedelta, datetime

# EDA
import matplotlib.pyplot as plt
from matplotlib import rcParams
import plotly.express as px
import plotly.graph_objects as go
from plotly.offline import init_notebook_mode
init_notebook_mode(connected=True)
import pandas_profiling as pp

# FE
from tsfresh import extract_features, select_features, extract_relevant_features
from tsfresh.utilities.dataframe_functions import impute
from sklearn.inspection import permutation_importance
import eli5
from eli5.sklearn import PermutationImportance
import shap

# Time Series - EDA and Modelling
import statsmodels.api as sm
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.stattools import adfuller
from statsmodels.tsa.seasonal import seasonal_decompose
# statsmodels.tsa.arima_model.ARIMA was removed in statsmodels 0.13; use the replacement class
from statsmodels.tsa.arima.model import ARIMA

# Metrics
from sklearn.metrics import r2_score
from sklearn.metrics import mean_squared_error, mean_absolute_percentage_error

# Modeling and preprocessing
from sklearn.preprocessing import StandardScaler, MinMaxScaler
from sklearn.model_selection import train_test_split, GridSearchCV
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR, LinearSVR
from sklearn.neighbors import KNeighborsRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.ensemble import BaggingRegressor, AdaBoostRegressor
from sklearn.neural_network import MLPRegressor
from prophet import Prophet
import xgboost as xgb
from xgboost import XGBRegressor
import lightgbm as lgb
from lightgbm import LGBMRegressor

import warnings
warnings.filterwarnings("ignore")

ModuleNotFoundError: No module named 'tsfresh'

In [22]:
# !pip install pandas_datareader
# import sys
# !{sys.executable} -m pip install pandas-profiling
# A bare `conda install` line is parsed as Python and raises SyntaxError;
# inside a notebook cell use the %conda magic (or comment the line out).
%conda install -c conda-forge pandas-profiling

In [9]:
# Which EDA & FE techniques to use?
is_EDA_with_Pandas_Profiling = True # or False - build the Pandas Profiling report or not?
is_EDA_with_COVID19_data = True # or False - run EDA with COVID-19 data or not?
is_anomalies = True # or False - take anomalies into account or not?

In [10]:
# What type of model to use?
is_Prophet = True   # or False - Facebook Prophet
is_ARIMA = True     # or False - ARIMA and AutoARIMA
is_other_ML = True  # or False - multi-factor models: trees, neural networks, etc.

In [19]:
# Automatic ARIMA model building for time series
if is_ARIMA:
    !pip install pmdarima
    import pmdarima as pm

Collecting pmdarima
  Using cached pmdarima-2.0.1-cp39-cp39-win_amd64.whl (571 kB)
Collecting statsmodels>=0.13.2
  Using cached statsmodels-0.13.2-cp39-cp39-win_amd64.whl (9.1 MB)
Collecting numpy>=1.21
  Using cached numpy-1.23.2-cp39-cp39-win_amd64.whl (14.7 MB)
  Using cached numpy-1.22.4-cp39-cp39-win_amd64.whl (14.7 MB)
Collecting packaging>=21.3
  Using cached packaging-21.3-py3-none-any.whl (40 kB)
Installing collected packages: numpy, packaging, statsmodels, pmdarima
  Attempting uninstall: numpy
    Found existing installation: numpy 1.20.3
    Uninstalling numpy-1.20.3:


ERROR: Could not install packages due to an OSError: [WinError 5] Access is denied: 'c:\\programdata\\anaconda3\\lib\\site-packages\\numpy-1.20.3.dist-info\\direct_url.json'
Consider using the `--user` option or check the permissions.



ModuleNotFoundError: No module named 'pmdarima'
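
The failed imports above come from packages that are missing in the active environment. A minimal sketch, using only the standard library, of checking for optional packages before importing them and switching off the matching model flag. The helper `is_installed`, the `flags` dictionary and the flag-to-package mapping are illustrative assumptions, not part of the original notebook:

```python
import importlib.util

def is_installed(module_name: str) -> bool:
    """Return True if the top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None

# Hypothetical mapping from the notebook's model flags to the packages they need.
optional_packages = {
    "is_Prophet": "prophet",
    "is_ARIMA": "pmdarima",
}

# Stand-in for the notebook's boolean switches.
flags = {"is_Prophet": True, "is_ARIMA": True}

for flag, package in optional_packages.items():
    if flags[flag] and not is_installed(package):
        print(f"{package} is not installed; setting {flag} = False")
        flags[flag] = False
```

With a check like this, a missing optional package disables one model family instead of stopping the whole notebook.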

In [17]:
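# Set random state
def fix_all_seeds(seed):
    """Seed NumPy and Python's `random` module so that runs are repeatable."""
    np.random.seed(seed)
    random.seed(seed)
    # Only affects Python processes started after this point;
    # the hash seed of the already running kernel does not change.
    os.environ['PYTHONHASHSEED'] = str(seed)

random_state = 42
fix_all_seeds(random_state)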
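
A quick way to confirm that the seeding works is to reseed and compare the random draws. The sketch below repeats the `fix_all_seeds` definition so that it runs on its own; the variable names `first` and `second` are illustrative:

```python
import os
import random

import numpy as np

def fix_all_seeds(seed):
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)

# Draw once after seeding ...
fix_all_seeds(42)
first = (random.random(), np.random.rand(3))

# ... then reseed with the same value and draw again.
fix_all_seeds(42)
second = (random.random(), np.random.rand(3))

# Both generators should reproduce exactly the same values.
print(first[0] == second[0], np.array_equal(first[1], second[1]))
```

Libraries that keep their own random generator (for example estimators that take a `random_state` argument) still need the seed passed to them explicitly.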

**TASK:** Try experimenting with different values of `forecasting_days`.

In [18]:
# Set main parameters
cryptocurrency = 'BTC'
target = 'Close'
forecasting_days = 10  # forecast horizon in days; must be greater than 1
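
This section does not show how `forecasting_days` is used later, so the following is only a sketch under one plausible assumption: the last `forecasting_days` observations are held out as the forecast window. The helper `split_last_days` and the synthetic series are hypothetical and stand in for the real price data:

```python
import numpy as np
import pandas as pd

forecasting_days = 10  # same constraint as in the notebook: must be > 1

# Synthetic stand-in for a daily "Close" series (not real BTC prices).
dates = pd.date_range("2022-01-01", periods=100, freq="D")
prices = pd.Series(np.linspace(100.0, 199.0, num=100), index=dates, name="Close")

def split_last_days(series, n_days):
    """Split a series into history and the final n_days hold-out window."""
    if not 1 < n_days < len(series):
        raise ValueError("n_days must be > 1 and smaller than the series length")
    return series.iloc[:-n_days], series.iloc[-n_days:]

train, test = split_last_days(prices, forecasting_days)
print(len(train), len(test))
```

Changing `forecasting_days` and rerunning the split is one simple way to see how the horizon length affects the amount of history left for training.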