# CAPSTONE 3. Predicting Major Cryptocurrency Prices
## Data Wrangling

In this notebook we will perform data wrangling for our project. We will:<br>
<ol>1. Retrieve historical data for three major cryptocurrencies:<br>
    <ol><i>1.1. Bitcoin (<b>BTC</b>)<br>
        1.2. Ethereum (<b>ETH</b>)<br>
        1.3. Binance Coin (<b>BNB</b>)</i><br>
    </ol>
    2. Organize the data and make sure it is well defined and ready for the next step: Exploratory Data Analysis
</ol>

In [1]:
#importing all the necessary modules and libraries
import pandas as pd
import os
import glob
from functools import reduce
import datetime as dt
import matplotlib.pyplot as plt

First, let's read all the data we downloaded from YahooFinance.

In [2]:
#creating one dataframe for each token
df_BTC = pd.read_csv('../datasets/btc-usd-max.csv', parse_dates=True)
df_ETH = pd.read_csv('../datasets/eth-usd-max.csv', parse_dates=True)
df_BNB = pd.read_csv('../datasets/bnb-usd-max.csv', parse_dates=True)
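
A note on the date column: `parse_dates=True` only asks pandas to try parsing the index, and no index column is set here, so the timestamps remain plain strings after loading. If real datetimes are wanted later (for example, for resampling or plotting), one option is an explicit conversion. The sketch below uses three rows copied from the BTC preview in this notebook; `raw_csv` and `sample` are illustrative names, not part of the project code.

```python
import io
import pandas as pd

# Three rows in the raw export layout (values taken from the BTC preview in this notebook).
raw_csv = io.StringIO(
    "snapped_at,price,market_cap,total_volume\n"
    "2013-04-28 00:00:00 UTC,135.3,1500518000.0,0.0\n"
    "2013-04-29 00:00:00 UTC,141.96,1575032000.0,0.0\n"
    "2013-04-30 00:00:00 UTC,135.3,1501657000.0,0.0\n"
)

sample = pd.read_csv(raw_csv)

# Convert the timestamp strings explicitly; utc=True yields a timezone-aware column.
sample["snapped_at"] = pd.to_datetime(sample["snapped_at"], utc=True)

print(sample.dtypes)
```

Lexicographic order already matches chronological order for strings in this `YYYY-MM-DD HH:MM:SS` layout, so sorting works either way; the conversion mainly matters for date arithmetic.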

In [3]:
dfs = [df_BTC, df_ETH, df_BNB]
coins = ['BTC', 'ETH', 'BNB']

In [4]:
for df in dfs:
    df.rename({'snapped_at':'Date', 'price':'Price', 'market_cap':'Market_Cap', 'total_volume':'Volume'}, axis=1, inplace=True)
    #sort in place; without inplace=True the sorted copy would be discarded
    df.sort_values(by='Date', ascending=True, inplace=True)
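
As a side note on sorting, `sort_values` returns a new DataFrame by default, so the result has to be kept, either by assignment or with `inplace=True`. Inside a loop over a list of dataframes, plain assignment to the loop variable would not update the originals, which is why `inplace=True` is the convenient choice there. A minimal sketch on a toy frame (`toy` and its values are made up for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "Date": ["2013-04-30", "2013-04-28", "2013-04-29"],
    "Price": [135.3, 135.3, 141.96],
})

# The sorted copy is thrown away here, so toy keeps its original order.
toy.sort_values(by="Date")
unchanged_first = toy["Date"].iloc[0]

# Keeping the result (and renumbering the index) actually sorts the data.
toy = toy.sort_values(by="Date", ignore_index=True)
sorted_first = toy["Date"].iloc[0]

print(unchanged_first, sorted_first)
```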

In [5]:
df_BTC.head(3)

,Date,Price,Market_Cap,Volume
0,2013-04-28 00:00:00 UTC,135.3,1500518000.0,0.0
1,2013-04-29 00:00:00 UTC,141.96,1575032000.0,0.0
2,2013-04-30 00:00:00 UTC,135.3,1501657000.0,0.0


Now let's add the token column to each dataframe.

In [6]:
for df, coin in zip(dfs, coins):
    df['Coin'] = coin

In [7]:
for df in dfs:
    print(df['Coin'][:1])

0    BTC
Name: Coin, dtype: object
0    ETH
Name: Coin, dtype: object
0    BNB
Name: Coin, dtype: object


In [8]:
df_BTC.columns

Index(['Date', 'Price', 'Market_Cap', 'Volume', 'Coin'], dtype='object')

Let's move the 'Coin' column so that it sits right after the 'Date' column.

In [9]:
for df in dfs:
    #remove the 'Coin' column and re-insert it at position 1, right after 'Date'
    col = df.pop("Coin")
    df.insert(1, "Coin", col)

In [10]:
for df in dfs:
    print(df.columns)

Index(['Date', 'Coin', 'Price', 'Market_Cap', 'Volume'], dtype='object')
Index(['Date', 'Coin', 'Price', 'Market_Cap', 'Volume'], dtype='object')
Index(['Date', 'Coin', 'Price', 'Market_Cap', 'Volume'], dtype='object')


Great. All columns are in the right spots. Now let's concatenate our dataframes.

In [11]:
df = pd.concat([df_BTC, df_ETH, df_BNB])
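
Note that `pd.concat` keeps each input frame's own index by default, so the row labels restart at 0 for every coin. That is harmless here because the file is later saved with `index=False`, but label-based lookups such as `df.loc[0]` would return one row per coin. A small sketch of the difference, using made-up ETH prices and hypothetical names `a`, `b`, `stacked`, and `renumbered`:

```python
import pandas as pd

a = pd.DataFrame({"Coin": ["BTC", "BTC"], "Price": [135.3, 141.96]})
b = pd.DataFrame({"Coin": ["ETH", "ETH"], "Price": [1.0, 2.0]})  # made-up prices

# Default behaviour: the original labels are kept, so they repeat.
stacked = pd.concat([a, b])

# ignore_index=True discards the old labels and numbers rows 0..n-1.
renumbered = pd.concat([a, b], ignore_index=True)

print(list(stacked.index))
print(list(renumbered.index))
```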

In [12]:
df.head()

,Date,Coin,Price,Market_Cap,Volume
0,2013-04-28 00:00:00 UTC,BTC,135.3,1500518000.0,0.0
1,2013-04-29 00:00:00 UTC,BTC,141.96,1575032000.0,0.0
2,2013-04-30 00:00:00 UTC,BTC,135.3,1501657000.0,0.0
3,2013-05-01 00:00:00 UTC,BTC,117.0,1298952000.0,0.0
4,2013-05-02 00:00:00 UTC,BTC,103.43,1148668000.0,0.0


In [13]:
#looking how many observations and features we have
df.shape

(6592, 5)

We have 6592 observations and 5 features. Let's check if we have any missing data.

In [14]:
df.isna().sum()

Date          0
Coin          0
Price         0
Market_Cap    2
Volume        0
dtype: int64

Only 2 values are missing, both in the Market_Cap column, so we will simply drop those rows.
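
Before dropping, it can be useful to see which rows are affected and which coin they belong to. The sketch below shows one way to do that on a toy frame; the data is invented for illustration and does not reproduce the actual missing rows in this dataset.

```python
import numpy as np
import pandas as pd

toy = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-02", "2020-01-01", "2020-01-02"],
    "Coin": ["BTC", "BTC", "ETH", "ETH"],
    "Price": [1.0, 2.0, 3.0, 4.0],
    "Market_Cap": [10.0, np.nan, 30.0, np.nan],  # made-up gaps
})

# Rows where Market_Cap is missing.
missing_rows = toy[toy["Market_Cap"].isna()]

# Number of missing Market_Cap values per coin.
missing_per_coin = toy["Market_Cap"].isna().groupby(toy["Coin"]).sum()

print(missing_rows)
print(missing_per_coin)
```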

In [15]:
df.dropna(axis=0, inplace=True)
df.isnull().any()

Date          False
Coin          False
Price         False
Market_Cap    False
Volume        False
dtype: bool

Great. No more missing values. Let's take a look at our data shape once again.

In [16]:
df.shape

(6590, 5)

We only dropped 2 observations. Now let's check if we have any duplicates.

In [17]:
df.duplicated().any()

False
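
`duplicated()` with no arguments only flags rows that match on every column. For a price history, a stricter check is whether any (Date, Coin) pair occurs more than once, even with different prices. A hedged sketch on invented data (`toy`, `full_row_dupes`, and `key_dupes` are illustrative names):

```python
import pandas as pd

toy = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-01", "2020-01-01"],
    "Coin": ["BTC", "BTC", "ETH"],
    "Price": [1.0, 1.5, 3.0],  # made-up values
})

# No two rows are identical in every column, so this check finds nothing.
full_row_dupes = bool(toy.duplicated().any())

# BTC appears twice for the same date, which the key-based check catches.
key_dupes = bool(toy.duplicated(subset=["Date", "Coin"]).any())

print(full_row_dupes, key_dupes)
```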

No duplicates. Our data is ready for the next step - Exploratory Data Analysis.

In [18]:
#saving the data
datapath = 'D://Prog/SDST/My Projects/Capstone3/DW'
#create the folder, including any missing parent folders, if it doesn't exist yet
os.makedirs(datapath, exist_ok=True)
datapath_DW = os.path.join(datapath, 'Data_for_EDA.csv')
if not os.path.exists(datapath_DW):
    df.to_csv(datapath_DW, index=False)
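
The save step can also be written with `pathlib`, followed by a quick read-back to confirm that the file round-trips. This is only a sketch: it writes a toy frame to a temporary directory rather than to the project folder, and `out_dir`, `out_file`, and `reloaded` are hypothetical names.

```python
import tempfile
from pathlib import Path

import pandas as pd

toy = pd.DataFrame({"Date": ["2020-01-01"], "Coin": ["BTC"], "Price": [1.0]})

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for the project folder; parents=True creates missing parent folders.
    out_dir = Path(tmp) / "Capstone3" / "DW"
    out_dir.mkdir(parents=True, exist_ok=True)

    out_file = out_dir / "Data_for_EDA.csv"
    toy.to_csv(out_file, index=False)

    # Read the file back to check that shape and columns survived the round trip.
    reloaded = pd.read_csv(out_file)
    same_shape = reloaded.shape == toy.shape

print(same_shape, list(reloaded.columns))
```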