# CAPSTONE 3. Predicting Next Cryptocurrency Market Cycle Peak
## Data Wrangling

In this notebook we will perform data wrangling for our project. We will:<br>
<ol>1. Retrieve historical data for Bitcoin and nine other major cryptocurrencies:<br>
    <ol><i>1.1. Bitcoin (<b>BTC</b>)<br>
        1.2. Ethereum (<b>ETH</b>)<br>
        1.3. XRP (<b>XRP</b>)<br>
        1.4. Cardano (<b>ADA</b>)<br>
        1.5. Litecoin (<b>LTC</b>)<br>
        1.6. Bitcoin Cash (<b>BCH</b>)<br>
        1.7. Binance Coin (<b>BNB</b>)<br>
        1.8. Stellar (<b>XLM</b>)<br>
        1.9. EOS (<b>EOS</b>)<br>
        1.10. Tezos (<b>XTZ</b>)<br></i>
    </ol>
    2. Organize the data and make sure it is well defined and ready for the next step, Exploratory Data Analysis.
</ol>

In [13]:
#importing all the necessary modules and libraries
import pandas as pd
import os
import glob

In [23]:
#reading the dataset into a dataframe and sorting the data by date
df = pd.read_csv('../datasets/consolidated_coin_data.csv', parse_dates=True).sort_values(by='Date')

In [24]:
#checking what the df looks like
df.head()

Unnamed: 0,Currency,Date,Open,High,Low,Close,Volume,Market Cap
0,tezos,"Dec 04, 2019",1.29,1.32,1.25,1.25,46048752,824588509
1,tezos,"Dec 03, 2019",1.24,1.32,1.21,1.29,41462224,853213342
2,tezos,"Dec 02, 2019",1.25,1.26,1.2,1.24,27574097,817872179
3,tezos,"Dec 01, 2019",1.33,1.34,1.25,1.25,24127567,828296390
4,tezos,"Nov 30, 2019",1.31,1.37,1.31,1.33,28706667,879181680


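Note that `parse_dates=True` on its own only tries to parse the index, so the `Date` column is still read as strings (for example "Dec 04, 2019") and has not been set as the index of our dataframe.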
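As a minimal sketch of how the dates could be parsed at load time, the snippet below passes the column name to `parse_dates` and uses `index_col` to make it the index. The inline sample (rows copied from the preview above) and the names `sample_csv` and `parsed` are assumptions for illustration only; they are not part of the original notebook.

```python
import io
import pandas as pd

# Hypothetical inline sample, copied from the preview above
sample_csv = io.StringIO(
    'Currency,Date,Open,High,Low,Close,Volume,Market Cap\n'
    'tezos,"Dec 04, 2019",1.29,1.32,1.25,1.25,46048752,824588509\n'
    'tezos,"Dec 03, 2019",1.24,1.32,1.21,1.29,41462224,853213342\n'
    'tezos,"Dec 02, 2019",1.25,1.26,1.2,1.24,27574097,817872179\n'
)

# Naming the column tells pandas to parse it as dates;
# index_col then makes the parsed dates the index
parsed = pd.read_csv(sample_csv, parse_dates=['Date'], index_col='Date')

# Sort chronologically now that the index holds real dates
parsed = parsed.sort_index()
print(parsed.index.dtype)
```

With a datetime index, sorting is chronological rather than alphabetical, which matters for month names such as "Dec" and "Nov".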

In [25]:
#overall look at the data
df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28944 entries, 0 to 28943
Data columns (total 8 columns):
 #   Column      Non-Null Count  Dtype 
---  ------      --------------  ----- 
 0   Currency    28944 non-null  object
 1   Date        28944 non-null  object
 2   Open        28944 non-null  object
 3   High        28944 non-null  object
 4   Low         28944 non-null  object
 5   Close       28944 non-null  object
 6   Volume      28944 non-null  object
 7   Market Cap  28944 non-null  object
dtypes: object(8)
memory usage: 1.8+ MB


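We can see that even though the dataset consists mostly of numbers, every column is stored as an 'object' (string) dtype. This typically happens when values contain non-numeric characters such as thousands separators. We will need to convert the price, volume and market cap columns into floats later. The `Date` column is also an object rather than a datetime, so it needs converting as well.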
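A minimal sketch of the numeric conversion, assuming the strings contain comma thousands separators (an assumption, since the preview above does not show the raw values). The sample frame `raw` and the list `numeric_cols` are hypothetical names used only for illustration.

```python
import pandas as pd

# Hypothetical sample: numbers stored as strings, some with thousands separators
raw = pd.DataFrame({
    'Close': ['1.25', '1.29'],
    'Volume': ['46,048,752', '41,462,224'],
})

numeric_cols = ['Close', 'Volume']
converted = raw.copy()
for col in numeric_cols:
    # Strip commas, then convert; errors='coerce' turns unparseable values into NaN
    converted[col] = pd.to_numeric(
        converted[col].str.replace(',', '', regex=False), errors='coerce'
    )

print(converted.dtypes)
```

Using `errors='coerce'` means bad values surface as NaN, so it is worth re-running the missing-value check after the conversion.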

In [17]:
#looking how many observations and features we have
df.shape

(28944, 8)

Now let's see if we have any missing data.

In [18]:
df.isnull().values.any()

False

We don't have any missing values in the dataframe, which is great. Now let's find out if we have duplicated observations.

In [19]:
df.duplicated().values.any()

False

No duplicates. The dataset looks very well-organized so far.
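`df.duplicated()` only flags rows that match in every column. For time series like these, it may also be worth checking that each currency has at most one row per date. The sketch below illustrates the difference on a small made-up frame; `sample` and its values are hypothetical, not taken from the dataset.

```python
import pandas as pd

# Hypothetical sample: the same (Currency, Date) pair appears twice with different prices
sample = pd.DataFrame({
    'Currency': ['tezos', 'tezos', 'bitcoin'],
    'Date': ['Dec 04, 2019', 'Dec 04, 2019', 'Dec 04, 2019'],
    'Close': [1.25, 1.26, 100.0],
})

# Full-row check: misses the repeat because the Close values differ
full_row_dupes = bool(sample.duplicated().any())

# Key-based check: flags the second row for the same currency and date
key_dupes = bool(sample.duplicated(subset=['Currency', 'Date']).any())

print(full_row_dupes, key_dupes)
```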

In [26]:
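#saving the wrangled data for the next step
datapath = 'D://Tutorials/SDST/My Projects/Capstone3/DW'
#create the output folder if it does not exist yet
os.makedirs(datapath, exist_ok=True)
datapath_DW = os.path.join(datapath, 'Data_Wrangled.csv')
#write the file only once, so an existing copy is not overwritten
if not os.path.exists(datapath_DW):
    df.to_csv(datapath_DW, index=False)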