**This is the final project in Data Analysis.**

Goals:
1. Analyze the cryptocurrency market in the selected time period
2. Try to predict exchange rates (without taking external events into account)
3. Analyze current profitability and risk
4. Prepare forecasted financial statements
5. Analyze the share of cryptocurrencies in the economy

Description of variables

* slug - unique name of cryptocurrency (text)
* symbol - unique short name (text)
* name - name of cryptocurrency (text)
* date - observation date, read as text (temporal)
* ranknow - current rank of the cryptocurrency (ordinal)
* open - starting bid price (numerical)
* high - highest bid price (numerical)
* low - lowest bid price (numerical)
* close - closing bid price (numerical)
* volume - transaction volume for the day (numerical)
* market - market capitalization (numerical)
* close_ratio - position of the closing price within the day's range, (close - low) / (high - low) (numerical)
* spread - difference between the highest and the lowest price (numerical)
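
The two derived columns can be checked against the raw prices. The sketch below assumes `spread = high - low` and `close_ratio = (close - low) / (high - low)`; these definitions are consistent with the first Bitcoin rows of this dataset. The helper name `derive_columns` and the `*_check` column names are hypothetical.

```python
import pandas as pd


def derive_columns(prices: pd.DataFrame) -> pd.DataFrame:
    """Recompute spread and close_ratio from high/low/close (assumed definitions)."""
    out = prices.copy()
    price_range = out["high"] - out["low"]
    out["spread_check"] = price_range
    # Where high == low the ratio is undefined, so the divisor becomes NaN
    out["close_ratio_check"] = (out["close"] - out["low"]) / price_range.where(price_range != 0)
    return out


# First two Bitcoin rows of the dataset (2013-04-28 and 2013-04-29)
sample = pd.DataFrame({
    "open": [135.30, 134.44],
    "high": [135.98, 147.49],
    "low": [132.10, 134.00],
    "close": [134.21, 144.54],
})
checked = derive_columns(sample)
print(checked[["spread_check", "close_ratio_check"]].round(4))
```

The recomputed values can be compared with the `spread` and `close_ratio` columns reported for the same rows (3.88 and 0.5438, 13.49 and 0.7813).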

In [1]:
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load in 

import datetime as dt
import numpy as np # linear algebra
import pandas as pd # data processing, CSV file I/O (e.g. pd.read_csv)

import plotly.express as px
from plotly import tools
import plotly.offline as py
py.init_notebook_mode(connected=True)
import plotly.graph_objs as go


import xgboost as xgb

from sklearn.linear_model import LinearRegression as LinReg
from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score, mean_absolute_error, mean_squared_error

# Input data files are available in the "../input/" directory.
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

import os
for dirname, _, filenames in os.walk('/kaggle/input'):
    for filename in filenames:
        print(os.path.join(dirname, filename))

# Any results you write to the current directory are saved as output.

Read the data and drop the symbols

In [2]:
df = pd.read_csv("crypto-markets.csv")

In [3]:
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 942297 entries, 0 to 942296
Data columns (total 13 columns):
 #   Column       Non-Null Count   Dtype  
---  ------       --------------   -----  
 0   slug         942297 non-null  object 
 1   symbol       942297 non-null  object 
 2   name         942297 non-null  object 
 3   date         942297 non-null  object 
 4   ranknow      942297 non-null  int64  
 5   open         942297 non-null  float64
 6   high         942297 non-null  float64
 7   low          942297 non-null  float64
 8   close        942297 non-null  float64
 9   volume       942297 non-null  float64
 10  market       942297 non-null  float64
 11  close_ratio  942297 non-null  float64
 12  spread       942297 non-null  float64
dtypes: float64(8), int64(1), object(4)
memory usage: 93.5+ MB
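
The summary above shows `date` stored as `object`, that is, as plain strings. A minimal sketch of converting it to a datetime type and selecting a time period follows; the toy frame and the chosen bounds are illustrative assumptions, not part of the original analysis.

```python
import pandas as pd

# Toy frame mirroring the layout of the dataset (three Bitcoin rows)
frame = pd.DataFrame({
    "name": ["Bitcoin"] * 3,
    "date": ["2013-04-28", "2013-04-29", "2013-04-30"],
    "close": [134.21, 144.54, 139.00],
})

# Parse ISO-formatted strings into datetime64 values
frame["date"] = pd.to_datetime(frame["date"], format="%Y-%m-%d")

# Select an inclusive date window (bounds chosen only for illustration)
start, end = pd.Timestamp("2013-04-29"), pd.Timestamp("2013-04-30")
selected = frame[frame["date"].between(start, end)]

print(frame["date"].dtype)
print(selected)
```

With a datetime column, plots and period filters sort chronologically rather than relying on string order.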


In [4]:
df.describe()

| | ranknow | open | high | low | close | volume | market | close_ratio | spread |
|---|---|---|---|---|---|---|---|---|---|
| count | 942297.0 | 942297.0 | 942297.0 | 942297.0 | 942297.0 | 942297.0 | 942297.0 | 942297.0 | 942297.0 |
| mean | 1000.170608 | 348.3522 | 408.593 | 296.2526 | 346.1018 | 8720383.0 | 172506000.0 | 0.459499 | 112.34 |
| std | 587.575283 | 13184.36 | 16163.86 | 10929.31 | 13098.22 | 183980200.0 | 3575590000.0 | 0.32616 | 6783.713 |
| min | 1.0 | 2.5e-09 | 3.2e-09 | 2.5e-10 | 2e-10 | 0.0 | 0.0 | -1.0 | 0.0 |
| 25% | 465.0 | 0.002321 | 0.002628 | 0.002044 | 0.002314 | 175.0 | 29581.0 | 0.1629 | 0.0 |
| 50% | 1072.0 | 0.023983 | 0.026802 | 0.021437 | 0.023892 | 4278.0 | 522796.0 | 0.4324 | 0.0 |
| 75% | 1484.0 | 0.22686 | 0.250894 | 0.204391 | 0.225934 | 119090.0 | 6874647.0 | 0.7458 | 0.03 |
| max | 2072.0 | 2298390.0 | 2926100.0 | 2030590.0 | 2300740.0 | 23840900000.0 | 326502500000.0 | 1.0 | 1770563.0 |


In [5]:
dfnum = df.drop(['slug', 'symbol', 'name', 'date'], axis=1)

In [6]:
dfnum.mean()  # Mean value

ranknow        1.000171e+03
open           3.483522e+02
high           4.085930e+02
low            2.962526e+02
close          3.461018e+02
volume         8.720383e+06
market         1.725060e+08
close_ratio    4.594995e-01
spread         1.123400e+02
dtype: float64

In [7]:
df_range = pd.DataFrame(dfnum.max() - dfnum.min()).T
df_range

| | ranknow | open | high | low | close | volume | market | close_ratio | spread |
|---|---|---|---|---|---|---|---|---|---|
| 0 | 2071.0 | 2298390.0 | 2926100.0 | 2030590.0 | 2300740.0 | 23840900000.0 | 326502500000.0 | 2.0 | 1770563.0 |


In [8]:
dfnum.std()  # Standard deviation

ranknow        5.875753e+02
open           1.318436e+04
high           1.616386e+04
low            1.092931e+04
close          1.309822e+04
volume         1.839802e+08
market         3.575590e+09
close_ratio    3.261605e-01
spread         6.783713e+03
dtype: float64

In [9]:
dfnum.std() ** 2  # Variance is the square of the standard deviation

ranknow        3.452447e+05
open           1.738273e+08
high           2.612703e+08
low            1.194499e+08
close          1.715634e+08
volume         3.384870e+16
market         1.278484e+19
close_ratio    1.063807e-01
spread         4.601876e+07
dtype: float64
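
As a small check of the relationship used above, the sketch below compares `std() ** 2` with pandas' built-in `var()` on a few rows. Both use the sample estimator (`ddof=1`) by default; the toy values are taken from the first Bitcoin rows and the variable names are illustrative.

```python
import numpy as np
import pandas as pd

# A few close and spread values from the first Bitcoin rows
values = pd.DataFrame({
    "close": [134.21, 144.54, 139.00, 116.99, 105.21],
    "spread": [3.88, 13.49, 12.88, 32.17, 33.32],
})

variance = values.var()          # sample variance, ddof=1 by default
std_squared = values.std() ** 2  # square of the sample standard deviation

# The two agree up to floating-point rounding
print(np.allclose(variance, std_squared))

# The population variance (ddof=0) is slightly smaller on a finite sample
population_variance = values.var(ddof=0)
print((population_variance < variance).all())
```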

In [10]:
df.isnull().sum()  # Checking NULLs

slug           0
symbol         0
name           0
date           0
ranknow        0
open           0
high           0
low            0
close          0
volume         0
market         0
close_ratio    0
spread         0
dtype: int64

In [11]:
df = df.drop(['symbol', 'slug'], axis=1)  # Drop useless columns

In [12]:
df.head(10)

| | name | date | ranknow | open | high | low | close | volume | market | close_ratio | spread |
|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | Bitcoin | 2013-04-28 | 1 | 135.3 | 135.98 | 132.1 | 134.21 | 0.0 | 1488567000.0 | 0.5438 | 3.88 |
| 1 | Bitcoin | 2013-04-29 | 1 | 134.44 | 147.49 | 134.0 | 144.54 | 0.0 | 1603769000.0 | 0.7813 | 13.49 |
| 2 | Bitcoin | 2013-04-30 | 1 | 144.0 | 146.93 | 134.05 | 139.0 | 0.0 | 1542813000.0 | 0.3843 | 12.88 |
| 3 | Bitcoin | 2013-05-01 | 1 | 139.0 | 139.89 | 107.72 | 116.99 | 0.0 | 1298955000.0 | 0.2882 | 32.17 |
| 4 | Bitcoin | 2013-05-02 | 1 | 116.38 | 125.6 | 92.28 | 105.21 | 0.0 | 1168517000.0 | 0.3881 | 33.32 |
| 5 | Bitcoin | 2013-05-03 | 1 | 106.25 | 108.13 | 79.1 | 97.75 | 0.0 | 1085995000.0 | 0.6424 | 29.03 |
| 6 | Bitcoin | 2013-05-04 | 1 | 98.1 | 115.0 | 92.5 | 112.5 | 0.0 | 1250317000.0 | 0.8889 | 22.5 |
| 7 | Bitcoin | 2013-05-05 | 1 | 112.9 | 118.8 | 107.14 | 115.91 | 0.0 | 1288693000.0 | 0.7521 | 11.66 |
| 8 | Bitcoin | 2013-05-06 | 1 | 115.98 | 124.66 | 106.64 | 112.3 | 0.0 | 1249023000.0 | 0.3141 | 18.02 |
| 9 | Bitcoin | 2013-05-07 | 1 | 112.25 | 113.44 | 97.7 | 111.5 | 0.0 | 1240594000.0 | 0.8767 | 15.74 |


Traders commonly track the HLC average (along with the OHLC and HL variants); see this [definition](https://www.mypivots.com/dictionary/definition/92/hlc-3).

In [None]:
df['hl_average'] = (df['high'] + df['low']) / 2
df['hlc_average'] = (df['high'] + df['low'] + df['close']) / 3
df['ohlc_average'] = (df['open'] + df['high'] + df['low'] + df['close']) / 4
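
A worked example of the three averages on the first Bitcoin row (2013-04-28) may help to see how they differ. This is only an illustration on one row; the variable names are not part of the notebook.

```python
import pandas as pd

# First Bitcoin row of the dataset
row = pd.Series({"open": 135.30, "high": 135.98, "low": 132.10, "close": 134.21})

hl_average = (row["high"] + row["low"]) / 2                  # midpoint of the daily range
hlc_average = row[["high", "low", "close"]].mean()           # "typical price"
ohlc_average = row[["open", "high", "low", "close"]].mean()  # all four prices

print(round(hl_average, 4), round(hlc_average, 4), round(ohlc_average, 4))
```

All three values fall between the day's low and high; they differ only in how much weight the open and close receive.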

Checking other currencies

In [None]:
top10 = df[(df['ranknow'] >= 1) & (df['ranknow'] <= 10)]
top10.name.unique()

*Volume* - all trades (buys and sells) made during a given period, for example the 24-hour window that CoinMarketCap uses by default.

*Circulating supply* - the number of coins that have been mined and currently exist.

*Marketcap* = circulating supply multiplied by the price of the coin.
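
The market cap relation can be inverted to estimate the circulating supply from the columns of this dataset. The sketch below assumes that `market` was computed from a price close to the daily `close`, which the data does not confirm, so the result is only an approximation; `implied_supply` is a hypothetical helper.

```python
def implied_supply(market_cap: float, price: float) -> float:
    """Estimate circulating supply as market cap divided by price."""
    if price <= 0:
        raise ValueError("price must be positive")
    return market_cap / price


# Bitcoin on 2013-04-28: market = 1488567000.0, close = 134.21
supply = implied_supply(1_488_567_000.0, 134.21)
print(f"{supply:,.0f} coins (approximate)")
```

For this row the estimate comes out at roughly 11 million coins.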

In [None]:
fig = px.pie(top10, values='volume', names='name', title='Cryptocurrencies Top-10 by Transaction Volume')
fig.show()

In [None]:
fig = px.pie(top10, values='market', names='name', title='Cryptocurrencies Top-10 by Market capitalization')
fig.show()
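
To my understanding, `px.pie` groups rows that share the same `names` value and sums their `values`, so the two charts above show each coin's share of volume and of market capitalization summed over every date in the data. The sketch below makes that aggregation explicit on toy data and contrasts it with a single-day snapshot, which may be closer to what "market share" usually means; the coin names and numbers are invented for illustration.

```python
import pandas as pd

# Invented two-coin, two-day example
toy = pd.DataFrame({
    "name": ["A", "A", "B", "B"],
    "date": pd.to_datetime(["2018-01-01", "2018-01-02", "2018-01-01", "2018-01-02"]),
    "volume": [100.0, 300.0, 50.0, 50.0],
    "market": [1000.0, 1200.0, 300.0, 400.0],
})

# Share of total traded volume over the whole period (what the pie sectors represent)
total_volume = toy.groupby("name")["volume"].sum()
volume_share = total_volume / total_volume.sum()

# Market cap share on the most recent date only
latest = toy[toy["date"] == toy["date"].max()]
market_share_latest = latest.set_index("name")["market"] / latest["market"].sum()

print(volume_share)
print(market_share_latest)
```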

In [None]:
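from plotly.subplots import make_subplots  # plotly.tools.make_subplots is deprecated

# Single panel; subplot_titles expects a sequence, hence the trailing comma
fig = make_subplots(rows=1, cols=1, subplot_titles=('Time',))
for name in top10.name.unique():
    currency = top10[top10['name'] == name]
    trace = go.Scatter(x=currency['date'], y=currency['ohlc_average'], name=name)
    fig.add_trace(trace, row=1, col=1)

fig.update_layout(title='Top-10 Cryptocurrencies Comparison')
fig.update_yaxes(title_text='USD', row=1, col=1)
fig.show()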

Adding minor cryptocurrencies that do not have much effect on the market

In [None]:
top10minorCurrencies = df[(df['ranknow'] >= 11) & (df['ranknow'] <= 20)]  # ranks 11-20, ten coins

top10minorCurrencies.name.unique()

In [None]:
fig = px.pie(top10minorCurrencies, values='volume', names='name', title='Minor Cryptocurrencies by Transaction Volume')
fig.show()

In [None]:
fig = px.pie(top10minorCurrencies, values='market', names='name', title='Minor Cryptocurrencies by Market capitalization')
fig.show()

In [None]:
worst_rank = df['ranknow'].max()
top10loserCoins = df[df['ranknow'] > worst_rank - 10]  # the ten lowest-ranked coins

top10loserCoins.name.unique()
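
The rank filters used for the top, minor and lowest-ranked groups all follow the same pattern, so they could be expressed with one helper. This is a sketch using `Series.between`, whose bounds are inclusive by default, so a band of ten ranks runs from `first` to `first + 9`. The helper `rank_band` and the toy frame are hypothetical.

```python
import pandas as pd


def rank_band(frame: pd.DataFrame, first: int, last: int) -> pd.DataFrame:
    """Return the rows whose ranknow lies in [first, last], both ends included."""
    return frame[frame["ranknow"].between(first, last)]


# Toy data: 30 coins ranked 1..30
toy = pd.DataFrame({
    "name": [f"coin{i}" for i in range(1, 31)],
    "ranknow": range(1, 31),
})

top = rank_band(toy, 1, 10)
minor = rank_band(toy, 11, 20)
worst = toy["ranknow"].max()
bottom = rank_band(toy, worst - 9, worst)

print(top["name"].nunique(), minor["name"].nunique(), bottom["name"].nunique())
```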

In [None]:
fig = px.pie(top10loserCoins, values='volume', names='name', title='Loser Cryptocurrencies by Transaction Volume')
fig.show()

In [None]:
fig = px.pie(top10loserCoins, values='market', names='name', title='Loser Cryptocurrencies by Market capitalization')
fig.show()