# Exploratory Data Analysis Focusing on Moving Averages with Cryptocurrency Data

### Getting the Data

I have uploaded the data to this [Kaggle Dataset](https://www.kaggle.com/paulrohan2020/crypto-data). It is a zipped file with 26,320 `.csv` files containing the top cryptocurrencies by market cap worldwide, as listed on https://coinmarketcap.com/. After 20:45:05 on August 4, data was collected every five minutes for three months.

I have also uploaded the 9,432 `.csv` files on which the EDA below is based to my [GitHub repository](https://github.com/rohan-paul/Cryptocurrency-Kaggle/tree/main/Notebooks/kaggle/input/crypto-data).

The dataset contains **CoinMarketCap data** from August 4 to November 4, 2017.

Filenames encode the date and time at which the data was collected, in `ymd-hms` format. For example, the data in `cr_20170804-210505.csv` was collected on August 4, 2017 at 21:05:05.
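The filename is the only record of when a snapshot was taken, so it helps to turn it into a real timestamp. Below is a minimal sketch. It assumes every file follows the `cr_YYYYMMDD-HHMMSS.csv` pattern described above. The helper name `parse_snapshot_time` is hypothetical.

```python
from datetime import datetime


def parse_snapshot_time(file_name: str) -> datetime:
    """Convert a filename such as 'cr_20170804-210505.csv' into a datetime.

    Assumes a literal 'cr_' prefix, a YYYYMMDD-HHMMSS timestamp and a '.csv' suffix.
    """
    return datetime.strptime(file_name, "cr_%Y%m%d-%H%M%S.csv")


print(parse_snapshot_time("cr_20170804-210505.csv"))  # 2017-08-04 21:05:05
```

If a column stores full paths rather than bare filenames, apply `os.path.basename` to each value first.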

### The Columns in the Data

`symbol`, `ranking by market cap`, `name`, `market cap`, `price`, `circulating supply`, `volume`, `% 1h`, `% 24h`, `% 1wk`

# Some Basics on Moving Averages

##### Moving averages are among the most often-cited indicators in stock market trading and technical analysis, and they are extremely useful for identifying long-term trends. Beyond financial time series, they are used intensively in fields ranging from signal processing to neural networks; basically, they apply to any data that comes in a sequence.

The most commonly used moving averages (MAs) are the simple and the exponential moving average. A simple moving average (SMA) takes the average over a set number of time periods. So a 10-period SMA averages the last 10 periods (usually 10 trading days).

A rolling mean, or moving average (MA), smooths out price data by creating a constantly updated average price, which helps cut down the “noise” in a price chart. Furthermore, a moving average can act as a “resistance” (or support) level. In a downtrend or uptrend, you can expect the price to follow the trend and to be less likely to break through the moving average.

### Factors to choose the Simple Moving Average (SMA) window or period

To find the best period for an SMA, we first need to know how long we are going to keep the stock in our portfolio. Swing traders may want to hold it for 5–10 business days. Position traders may need to raise this threshold to 40–60 days. Portfolio traders who use moving averages as a technical filter in their stock screening plan may focus on 200–300 days.

---

### Now some real-world Exploratory Data Analysis with real cryptocurrency data from CoinMarketCap


In [1]:
from mpl_toolkits.mplot3d import Axes3D
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt 
import numpy as np
import os 
import pandas as pd
import glob

In [2]:
file_dir = '/kaggle/input/crypto-data/Crypto-Coinmarketcap/'

In [3]:
nRowsRead = 1000  # read only the first 1000 rows of each file
# pick the first two snapshot files that actually exist instead of hard-coding their names
snapshot_files = sorted(glob.glob(os.path.join(file_dir, 'cr_*.csv')))
df1 = pd.read_csv(snapshot_files[0], nrows=nRowsRead)
df2 = pd.read_csv(snapshot_files[1], nrows=nRowsRead)

In [None]:
nRow, nCol = df1.shape
print(f'There are {nRow} rows and {nCol} columns')

In [None]:
df1.head(5)


In [None]:
nRow, nCol = df2.shape
print(f'There are {nRow} rows and {nCol} columns')

In [None]:
df2.head(5)


In [None]:
print(df1.shape)
print(df1.dtypes)

In [None]:
!ls $file_dir | wc -l 


In [None]:
all_files = sorted(glob.glob(os.path.join(file_dir, "*.csv")))  # sort so the file order is reproducible
files_list = all_files[:9432]
df1 = pd.read_csv(files_list[0])
df2 = pd.read_csv(files_list[1])

df1.head()

In [None]:
df2.head()

### Code to combine the 9,432 .csv files into a single DataFrame, filter it for `symbol == 'BTC'`, and generate a .csv file from the combined DataFrame to work with

As we can see above, all these files have the same columns, so it seems reasonable to concatenate everything into one DataFrame. However, I want to keep track of the file names, because they are the only reference to the date of the records.

- First, create a list of DataFrames with the filenames in a "file_name" column
- Then, concatenate them all into one big DataFrame
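The two steps above can be sketched as follows. This is an illustration, not the author's original (commented-out) script. The helper name `combine_snapshots` is hypothetical. The sketch also assumes `file_name` should hold the bare filename as the first column.

```python
import os

import pandas as pd


def combine_snapshots(csv_paths):
    """Read every snapshot .csv and stack the rows into one DataFrame.

    A 'file_name' column (inserted as the first column) records which file each
    row came from, because the filename is the only reference to the date.
    """
    frames = []
    for path in csv_paths:
        frame = pd.read_csv(path)
        frame.insert(0, "file_name", os.path.basename(path))
        frames.append(frame)
    # ignore_index=True gives the combined frame a fresh 0..N-1 index
    return pd.concat(frames, ignore_index=True)


# Hypothetical usage with the directory defined earlier in the notebook:
# combined = combine_snapshots(sorted(glob.glob(os.path.join(file_dir, "*.csv"))))
```

Collecting the frames in a list and calling `pd.concat` once is much cheaper than appending to a growing DataFrame inside the loop.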

#### The scripts for this step are commented out, as I don't want to run this process-intensive step of building a single DataFrame from 9,432 .csv files every time.

##### The combined DataFrame contains the symbols of all the cryptocurrencies present in the individual .csv files.
##### But now I want to extract ONLY the symbol 'BTC' (Bitcoin) for further analysis.

Below is the code for that.
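Here is a minimal sketch of the filtering step. It assumes the ticker is stored in a column named `symbol`, as in the column overview at the top. The helper name `filter_symbol` is hypothetical.

```python
import pandas as pd


def filter_symbol(combined: pd.DataFrame, symbol: str = "BTC") -> pd.DataFrame:
    """Keep only the rows for one ticker symbol (Bitcoin by default)."""
    mask = combined["symbol"] == symbol
    # reset_index so the filtered frame has a clean 0..N-1 index
    return combined.loc[mask].reset_index(drop=True)


# Hypothetical usage: btc_rows = filter_symbol(combined)
```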

#### Now generating a .csv file to be used as the training dataset

This file is created from the combined DataFrame (built earlier from the 9,432 .csv files), so that I don't have to rerun the process-intensive step of creating a single DataFrame from 9,432 .csv files every time.

The code for this step is commented out as well, because I ran the script only once to create the file.
I then saved the file (to be used as input) both on Kaggle and on my local machine.
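The one-off save can be as small as the sketch below. `save_training_file` is a hypothetical helper, and `index=False` is an assumption. The default filename matches the file read back in the next cell.

```python
import pandas as pd


def save_training_file(btc_rows: pd.DataFrame, path: str = "train_BTC_combined.csv") -> None:
    """Write the filtered rows to disk once, so later sessions can just read the file."""
    # index=False keeps the throwaway 0..N-1 row index out of the file
    btc_rows.to_csv(path, index=False)


# Later sessions then only need:
# original_btc_train = pd.read_csv("train_BTC_combined.csv")
```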

In [None]:
original_btc_train = pd.read_csv("/kaggle/input/crypto-data/train_BTC_combined.csv")

original_btc_train.head()

In [None]:
original_btc_train.shape

Let's analyze the DataFrame with the `.info()` method. It prints a concise summary of the DataFrame: the column names and their data types, the number of non-null values, and the amount of memory used.

In [None]:
original_btc_train.info()

As shown above, the dataset does not contain null values. However, some columns where I expected numerical (float) values, such as 'market cap', have the object dtype instead.

Also, the columns from 'file_name' up to '1wk' don't have

In [None]:
print("All Features list", original_btc_train.columns.tolist())
print("\nMissing Values", original_btc_train.isnull().any())
print("\nUnique Values ", original_btc_train.nunique())

In [None]:
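# strip thousands separators and the dollar sign; regex=False makes '$' a literal character
original_btc_train['market cap'] = original_btc_train['market cap'].str.replace(',', '', regex=False)
original_btc_train['market cap'] = pd.to_numeric(original_btc_train['market cap'].str.replace('$', '', regex=False))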
original_btc_train['market cap']

In [None]:
original_btc_train.head()

In [None]:
original_btc_train.describe()

In [None]:
original_btc_train['market cap'] = original_btc_train['market cap'].astype('float')  # assign back, otherwise the result is discarded

In [None]:
btc_train = original_btc_train.set_index('file_name')
btc_train.head()

In [None]:
market_cap = btc_train[['market cap']].copy()  # copy() avoids SettingWithCopyWarning when new columns are added
market_cap.head()

# Rolling Mean (Moving Average) — to determine trend

A simple moving average, also called a rolling or running average, is formed by computing the average price of a security over a specific number of periods. Most moving averages are based on closing prices; for example, a 5-day simple moving average is the five-day sum of closing prices divided by five. As its name implies, a moving average is an average that moves: old data is dropped as new data becomes available, causing the average to move along the time scale. The example below shows a 5-day moving average evolving over three days.

```
Daily Closing Prices: 11,12,13,14,15,16,17

First day of 5-day SMA: (11 + 12 + 13 + 14 + 15) / 5 = 13

Second day of 5-day SMA: (12 + 13 + 14 + 15 + 16) / 5 = 14

Third day of 5-day SMA: (13 + 14 + 15 + 16 + 17) / 5 = 15
```

![img](https://i.imgur.com/TpOZqYb.png)

The first day of the moving average simply covers the last five days. The second day of the moving average drops the first data point (11) and adds the new data point (16).
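The "drop the oldest, add the newest" update can be kept as a running sum, so each new SMA value costs constant work instead of re-summing the whole window. Below is a plain-Python sketch using the closing prices from the example. The function name `rolling_sma` is illustrative.

```python
from collections import deque


def rolling_sma(values, window):
    """Yield the simple moving average once `window` values have been seen."""
    buffer = deque()
    running_sum = 0.0
    for value in values:
        buffer.append(value)
        running_sum += value                 # add the newest data point
        if len(buffer) > window:
            running_sum -= buffer.popleft()  # drop the oldest data point
        if len(buffer) == window:
            yield running_sum / window


closing_prices = [11, 12, 13, 14, 15, 16, 17]
print(list(rolling_sma(closing_prices, 5)))  # [13.0, 14.0, 15.0]
```

On very long float series the running sum can accumulate rounding error, so library implementations take extra care there.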

So the simple moving average is the unweighted mean of the previous M data points. The choice of M (the sliding window) depends on the amount of smoothing desired. Increasing M improves the smoothing at the expense of accuracy, because the average reacts more slowly to recent changes.
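In symbols, write $p_t$ for the price in period $t$ (notation chosen here for illustration). The SMA over the previous $M$ points and its one-step update are:

$$\text{SMA}_t = \frac{1}{M}\sum_{i=0}^{M-1} p_{t-i} = \text{SMA}_{t-1} + \frac{p_t - p_{t-M}}{M}$$

For the 5-day example above, the second day is $13 + (16 - 11)/5 = 14$.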

The moving average is used to analyze the time-series data by calculating averages of different subsets of the complete dataset. Since it involves taking the average of the dataset over time, it is also called a moving mean (MM) or rolling mean. Moving averages are widely used in finance to determine trends in the market and in environmental engineering to evaluate standards for environmental quality such as the concentration of pollutants.

The easiest way to calculate the simple moving average is with the `pandas.Series.rolling` method, which provides rolling windows over the data. On the resulting windows, we can perform calculations with a statistical function (in this case the mean). The size of the window (number of periods) is specified by the `window` argument.
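Applied to the toy closing prices from the worked example, `rolling` reproduces the hand-computed values. The sketch also shows what `min_periods` changes for the first rows.

```python
import pandas as pd

closing_prices = pd.Series([11, 12, 13, 14, 15, 16, 17])

# window=5 averages each value with the four before it; the first four
# positions have fewer than 5 observations, so they come back as NaN
sma_5 = closing_prices.rolling(window=5).mean()
print(sma_5.tolist())  # [nan, nan, nan, nan, 13.0, 14.0, 15.0]

# min_periods=1 relaxes that rule and averages whatever is available so far
print(closing_prices.rolling(window=5, min_periods=1).mean().tolist())
# [11.0, 11.5, 12.0, 12.5, 13.0, 14.0, 15.0]
```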

In [None]:
market_cap.rolling(window=3).mean()

In [None]:
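# Each row is one snapshot (data was collected roughly every five minutes), so these
# windows span 3, 30 and 90 snapshots rather than days or months.
# shift(1) makes the value at row t use only the rows up to t-1.
market_cap['ma_rolling_3'] = market_cap['market cap'].rolling(window=3).mean().shift(1)
market_cap['ma_rolling_30'] = market_cap['market cap'].rolling(window=30).mean().shift(1)
market_cap['ma_rolling_90'] = market_cap['market cap'].rolling(window=90).mean().shift(1)
market_cap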

In [None]:
colors = ['steelblue', 'red', 'purple', 'black', 'green']  # one color per plotted column

market_cap.plot(color=colors, linewidth=2, figsize=(20,6))


# Weighted moving average

$$\text{WMA}_t = \frac{w_0\,p_t + w_1\,p_{t-1} + \dots + w_{n-1}\,p_{t-n+1}}{w_0 + w_1 + \dots + w_{n-1}}$$

Here, $p_{t-i}$ is the price $i$ periods ago and $w_i$ is its weighting factor.

**A weighted moving average assigns a specific weight to each observation and then averages the weighted observations. The most recent observation is assigned a greater weight than those in the distant past.**

**Example**

Assume that we want a weighted moving average of four stock prices, $70, $66, $68, and $69, with the first price being the most recent. The weights are expressed in tenths (4 + 3 + 2 + 1 = 10).

Using this information, the weighting of the most recent price is 4/10, the period before that 3/10, the next period before that 2/10, and the initial period 1/10.

The weighted average of the four prices is calculated as follows:

#### WMA = [70 x (4/10)] + [66 x (3/10)] + [68 x (2/10)] + [69 x (1/10)]

WMA = $28 + $19.80 + $13.60 + $6.90 = $68.30
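The same arithmetic can be checked with NumPy's `np.average`, which divides the weighted sum by the sum of the weights.

```python
import numpy as np

# Prices ordered from most recent to oldest, as in the example above
prices = np.array([70, 66, 68, 69])
weights = np.array([4, 3, 2, 1])  # the most recent price gets the largest weight

# np.average computes sum(prices * weights) / weights.sum() (= 683 / 10 here),
# so the weights do not need to be normalised to 1 beforehand
wma = np.average(prices, weights=weights)
print(round(wma, 2))  # 68.3
```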

![img](https://i.imgur.com/MZO1bbC.png)

The accuracy of this model depends largely on your choice of weighting factors. If the time series pattern changes, you must also adapt the weighting factors.

The weighting factors can also be entered as percentages. Their sum does not have to be 100%, as long as the weighted sum is divided by the sum of the weights.

In [None]:
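def weighted_mov_avg(weights):
    """Return a function that computes the weighted average of one rolling window.

    Each window arrives oldest-to-newest, so the last weight applies to the most recent value.
    """
    def calc(x):
        # divide by the sum of the weights (not the window length) so any weights work
        return np.sum(weights * x) / np.sum(weights)
    return calc

# weights 0.5, 1, 1.5: the most recent value in each 3-row window counts the most
market_cap['wma_rolling_3'] = market_cap['market cap'].rolling(window=3).apply(
    weighted_mov_avg(np.array([0.5, 1, 1.5])), raw=True).shift(1)
market_cap.plot(color=colors, linewidth=2, figsize=(20,6))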

# Exponentially weighted moving average (EMA)

![img](https://i.imgur.com/bFRUfN3.png)

The formula states that the value of the moving average (S) at time t is a mix between the value of the raw signal (x) at time t and the previous value of the moving average itself, i.e. at t-1. In other words, it is a value between the previous EMA and the current price. The degree of mixing is controlled by the parameter a (a value between 0 and 1).

The 'a' above is called the smoothing factor. It is sometimes also denoted **𝛼** (alpha) and is defined as:

![img](https://i.imgur.com/DCXU7Vc.jpg)

where 𝑛 is the number of days in our span. Therefore, a 10-day EMA has a smoothing factor of 2 / (10 + 1) ≈ 0.1818.

By simply rearranging the terms, the formula can also be written as:

### Exponential moving average = (Closing Price - Previous EMA) * (2/(n + 1)) + Previous EMA

So,
- If a = 10% (small), most of the contribution comes from the previous value of the moving average. In this case, the “smoothing” is very strong.
- If a = 90% (large), most of the contribution comes from the current value of the signal. In this case, the “smoothing” is minimal.
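To make the recursion concrete, here is a minimal plain-Python sketch. The helper `ema_recursive` and the toy prices are illustrative, not taken from the dataset. The sketch seeds the EMA with the first observation, which is also what pandas does with `adjust=False`. Its output therefore matches `.ewm(span=..., adjust=False).mean()`.

```python
import pandas as pd


def ema_recursive(values, span):
    """EMA via S_t = alpha * x_t + (1 - alpha) * S_{t-1}, seeded with the first value."""
    alpha = 2 / (span + 1)  # smoothing factor from the span n
    ema = []
    for x in values:
        if not ema:
            ema.append(float(x))  # seed: the first EMA equals the first observation
        else:
            prev = ema[-1]
            ema.append((x - prev) * alpha + prev)  # the rearranged form of the formula
    return ema


prices = [10, 11, 12, 13, 12, 11]
manual = ema_recursive(prices, span=3)  # span=3 gives alpha = 0.5
via_pandas = pd.Series(prices).ewm(span=3, adjust=False).mean().tolist()
print(manual)  # [10.0, 10.5, 11.25, 12.125, 12.0625, 11.53125]
print(via_pandas)
```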


Looking at the documentation, we can note that the `.ewm()` method has an `adjust` parameter that defaults to `True`. This parameter adjusts the weights to account for the imbalance in the beginning periods (for more detail, see the Exponentially weighted windows section of the [pandas documentation](https://pandas.pydata.org/pandas-docs/stable/user_guide/computation.html#exponentially-weighted-windows)). The cells below pass `adjust=False`, which gives the recursive formula above.
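For comparison, here is a sketch of what the default `adjust=True` computes, following the weighting described in the pandas documentation. The point $i$ steps back gets weight $(1-\alpha)^i$, and the weighted sum is divided by the sum of the weights actually used so far. The helper name `ema_adjusted` is hypothetical.

```python
import numpy as np
import pandas as pd


def ema_adjusted(values, alpha):
    """adjust=True form: weights (1 - alpha)**i, normalised over the points seen so far."""
    values = np.asarray(values, dtype=float)
    out = []
    for t in range(len(values)):
        # i = 0 is the current point, i = t is the first point in the series
        weights = (1 - alpha) ** np.arange(t + 1)
        window = values[t::-1]  # x_t, x_{t-1}, ..., x_0
        out.append(np.sum(weights * window) / weights.sum())
    return out


prices = [10, 11, 12, 13, 12, 11]
alpha = 2 / (3 + 1)  # the smoothing factor for span=3
print(np.allclose(ema_adjusted(prices, alpha),
                  pd.Series(prices).ewm(span=3, adjust=True).mean()))  # True
```

The two forms differ mainly in the first few rows. As the series grows, the normalising sum approaches $1/\alpha$ and the adjusted weights converge to the recursive ones.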

In [None]:
market_cap['market cap'].ewm(span=3, adjust=False, min_periods=0).mean()

In [None]:
market_cap['ewm_window_3'] = market_cap['market cap'].ewm(span=3, adjust=False, min_periods=0).mean().shift(1)
market_cap

In [None]:
market_cap[['ewm_window_3']].plot(color=colors, linewidth=2, figsize=(20,6))

# What Is Exponential Smoothing?

Exponential smoothing is a time series forecasting method for univariate data. Exponential smoothing forecasting is a weighted sum of past observations, but the model explicitly uses an exponentially decreasing weight for past observations. Specifically, past observations are weighted with a geometrically decreasing ratio.

The underlying idea of an exponential smoothing model is that, at each period, the model learns a bit from the most recent demand observation and remembers a bit of the last forecast it made.

The smoothing parameter (or learning rate) alpha will determine how much importance is given to the most recent demand observation.

![img](https://i.imgur.com/eL82Ugp.jpg)

Where 0 <= alpha <= 1

Alpha is a ratio (or a percentage) of how much importance the model allocates to the most recent observation compared with the demand history. The one-step-ahead forecast for time $T+1$ is a weighted average of all of the observations in the series $y_1, \dots, y_T$.
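That weighted-average view can be checked numerically. The sketch assumes the level is initialised with the first observation, which is one common choice and the same seeding `ewm(alpha=..., adjust=False)` uses. Under that assumption, $y_T$ gets weight $\alpha$, $y_{T-1}$ gets $\alpha(1-\alpha)$, and so on, while $y_1$ keeps the remainder $(1-\alpha)^{T-1}$. The helpers `ses_forecast` and `ses_weights` and the toy series are illustrative.

```python
import numpy as np


def ses_forecast(y, alpha):
    """One-step-ahead forecast for T+1 via simple exponential smoothing."""
    level = y[0]  # initialise the level with the first observation
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level  # learn a bit, remember a bit
    return level


def ses_weights(T, alpha):
    """Weight that each of y_1..y_T receives in that forecast (oldest first)."""
    recent = [alpha * (1 - alpha) ** j for j in range(T - 1)][::-1]
    return np.array([(1 - alpha) ** (T - 1)] + recent)


y = np.array([100.0, 102.0, 101.0, 105.0])
alpha = 0.7
weights = ses_weights(len(y), alpha)
print(weights.round(3), weights.sum())      # the weights sum to 1
print(ses_forecast(y, alpha), weights @ y)  # the same forecast, computed two ways
```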

In [None]:
market_cap['market cap'].ewm(alpha=0.7, adjust=False, min_periods=3).mean()

In [None]:
market_cap['esm_window_3_7'] = market_cap['market cap'].ewm(alpha=0.7, adjust=False, min_periods=3).mean()
market_cap

In [None]:
market_cap[['esm_window_3_7']].plot(color=colors, linewidth=2, figsize=(20,6))

In [None]:
market_cap[['market cap','esm_window_3_7']].plot(color=colors, linewidth=2, figsize=(20,6))

# Trading Strategy based on Moving Average

There are quite a few popular strategies that traders regularly execute based on moving averages. Let's check out one of them.

First, note that a moving average time series (whether SMA or EMA) lags the actual price behaviour. We also assume that when the long-term behaviour of the asset changes, the actual price series will react faster than its EMA. Therefore, we treat a crossing of the two as a potential trading signal.

- When the price of an asset crosses its EMA from below, we close any existing short position and go long (buy) one unit of the asset.

- When the price crosses its EMA from above, we close any existing long position and go short (sell) one unit of the asset.
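The two rules above can be sketched as a signal series. This is a minimal illustration, not a backtest, and it ignores transaction costs and position sizing. The helper `crossover_signals`, the +1/-1/0 encoding, and the use of `adjust=False` are assumptions. A signal computed from the price at row t should only be acted on at a later row, to avoid look-ahead bias.

```python
import pandas as pd


def crossover_signals(price: pd.Series, span: int) -> pd.Series:
    """+1 where price crosses its EMA from below, -1 where it crosses from above, else 0."""
    ema = price.ewm(span=span, adjust=False).mean()
    above = price > ema  # True while the price sits above its EMA
    prev_above = above.shift(1, fill_value=False)
    signal = pd.Series(0, index=price.index)
    signal.loc[above & ~prev_above] = 1   # cross from below: close short, go long
    signal.loc[~above & prev_above] = -1  # cross from above: close long, go short
    return signal


prices = pd.Series([10, 9, 8, 9, 11, 12, 10, 8])
print(crossover_signals(prices, span=3).tolist())  # [0, 0, 0, 1, 0, 0, -1, 0]
```

On the notebook's data, the same call would be `crossover_signals(market_cap['market cap'], span=3)`.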

For more of these strategies, have a look at [this article](https://blackwellglobal.com/3-simple-moving-average-strategies-for-day-trading/).