# CAPSTONE 3. Predicting Bitcoin Price
## Data Wrangling

In this notebook we will perform data wrangling for our project. We will:<br>
<ol>1. Retrieve historical data for Bitcoin<br>
    2. Organize it and make sure it's well defined and ready for the next step - Exploratory Data Analysis
</ol>

In [1]:
#importing all the necessary modules and libraries
import pandas as pd
import os
import glob
from functools import reduce
import datetime as dt
import matplotlib.pyplot as plt

First, let's read the data we downloaded from Yahoo Finance.

In [2]:
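#reading the Bitcoin price history into a dataframe
#note: parse_dates=True only attempts to parse the index, so the date column is still read as strings
df_BTC = pd.read_csv('../datasets/btc-usd-max.csv', parse_dates=True)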

In [3]:
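#giving the columns consistent, readable names
df_BTC.rename({'snapped_at':'Date', 'price':'Price', 'market_cap':'Market_Cap', 'total_volume':'Volume'}, axis=1, inplace=True)
#sort_values returns a new dataframe, so we assign the result back to keep the ordering
df_BTC = df_BTC.sort_values(by='Date', ascending=True)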

In [4]:
df_BTC.head(3)

Unnamed: 0,Date,Price,Market_Cap,Volume
0,2013-04-28 00:00:00 UTC,135.3,1500518000.0,0.0
1,2013-04-29 00:00:00 UTC,141.96,1575032000.0,0.0
2,2013-04-30 00:00:00 UTC,135.3,1501657000.0,0.0
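
The `Date` values above are still plain strings ending in `UTC`. As a minimal sketch of how they could be turned into real timestamps and put in chronological order, the example below uses a small hand-made frame (`sample` and its values are illustrative, not the project's data) and `pd.to_datetime` with `utc=True`:

```python
import pandas as pd

# Toy rows mimicking the 'Date' strings shown above (illustrative values only),
# deliberately placed out of order.
sample = pd.DataFrame({
    "Date": [
        "2013-04-30 00:00:00 UTC",
        "2013-04-28 00:00:00 UTC",
        "2013-04-29 00:00:00 UTC",
    ],
    "Price": [135.30, 135.30, 141.96],
})

# Convert the strings to timezone-aware timestamps.
sample["Date"] = pd.to_datetime(sample["Date"], utc=True)

# sort_values returns a new frame, so the result has to be assigned back.
sample = sample.sort_values(by="Date").reset_index(drop=True)

print(sample.dtypes)
print(sample)
```

Whether to convert at this stage or later in the EDA notebook is a design choice; converting early makes date arithmetic and resampling available right away.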


Now let's add a column with the token name to the dataframe.

In [5]:
df_BTC['Coin'] = 'BTC'

In [6]:
df_BTC.columns

Index(['Date', 'Price', 'Market_Cap', 'Volume', 'Coin'], dtype='object')

Let's move the 'Coin' column so that it sits right after the 'Date' column.

In [7]:
col = df_BTC.pop('Coin')
df_BTC.insert(1, 'Coin', col)

In [8]:
print(df_BTC.columns)

Index(['Date', 'Coin', 'Price', 'Market_Cap', 'Volume'], dtype='object')
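
The `pop`/`insert` pair above moves the column in place. A possible alternative, sketched here on a toy frame (`toy` and `new_order` are hypothetical names), is to select the columns in the desired order, which returns a reordered copy and leaves the original untouched:

```python
import pandas as pd

# Toy frame with 'Coin' as the last column, as after the assignment above.
toy = pd.DataFrame({
    "Date": ["2020-01-01"],
    "Price": [7200.0],
    "Coin": ["BTC"],
})

# Selecting with a list of names returns the columns in that order.
new_order = ["Date", "Coin", "Price"]
reordered = toy[new_order]

print(list(reordered.columns))
```

Listing the full order explicitly is easy to read for a handful of columns; `pop` and `insert` scale better when only one column has to move in a wide frame.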


Great. All columns are in the right places. Now let's look at the dataframe's shape.

In [9]:
#looking how many observations and features we have
df_BTC.shape

(3008, 5)

We have 3008 observations and 5 features. Let's check if we have any missing data.

In [10]:
df_BTC.isna().sum()

Date          0
Coin          0
Price         0
Market_Cap    1
Volume        0
dtype: int64

Only one value is missing (in 'Market_Cap'), so we will simply drop that row.

In [11]:
df_BTC.dropna(axis=0, inplace=True)
df_BTC.isnull().any()

Date          False
Coin          False
Price         False
Market_Cap    False
Volume        False
dtype: bool
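
To see exactly which rows a drop like this removes, the incomplete rows can be selected with a boolean mask before calling `dropna`. The following is a minimal sketch on made-up data (`toy`, `incomplete` and `cleaned` are illustrative names, and the values are not from the Bitcoin file):

```python
import numpy as np
import pandas as pd

# Toy frame with a single missing Market_Cap value (illustrative numbers).
toy = pd.DataFrame({
    "Date": ["2017-04-01", "2017-04-02", "2017-04-03"],
    "Price": [1000.0, 1010.0, 1020.0],
    "Market_Cap": [1.6e10, np.nan, 1.7e10],
})

# Rows that contain at least one missing value.
incomplete = toy[toy.isna().any(axis=1)]
print(incomplete)

# Drop those rows; reset_index closes the gap left in the index.
cleaned = toy.dropna(axis=0).reset_index(drop=True)
print(cleaned.shape)
```

Note that dropping a row from a daily series leaves a one-day gap in the dates, which may matter for time-series steps later on.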

Great. No more missing values. Let's find out if we have duplicates.

In [12]:
df_BTC.duplicated().any()

False
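
`duplicated()` with no arguments only flags rows that are identical in every column. For daily price data it may also be worth checking whether any date appears more than once. A small sketch on a toy frame (the data and names are assumptions for illustration):

```python
import pandas as pd

# Two rows share a date but differ in price, so they are not full-row duplicates.
toy = pd.DataFrame({
    "Date": ["2020-01-01", "2020-01-02", "2020-01-02"],
    "Price": [7200.0, 7000.0, 7001.0],
})

# Full-row comparison: no row is an exact copy of an earlier one.
full_row_dupes = toy.duplicated().any()

# Comparison restricted to 'Date': the repeated date is detected.
date_dupes = toy.duplicated(subset="Date").any()

print(full_row_dupes, date_dupes)
```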

No duplicates. Our data is ready for the next step, Exploratory Data Analysis.

In [13]:
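#saving the data
datapath = 'D://Prog/SDST/My Projects/Capstone3/DW'
#makedirs also creates missing parent folders and does not fail if the folder already exists
os.makedirs(datapath, exist_ok=True)
datapath_DW = os.path.join(datapath, 'Data_for_EDA.csv')
#write the file only if it is not there yet, so an existing copy is never overwritten
if not os.path.exists(datapath_DW):
    df_BTC.to_csv(datapath_DW, index=False)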
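
The save step can be tried out without touching the project folder. The sketch below writes a toy frame into a temporary directory (a stand-in for the real output path, created with `tempfile.mkdtemp`) and reads it back to confirm the round trip. It uses `os.makedirs`, which, unlike `os.mkdir`, also creates missing parent folders:

```python
import os
import tempfile

import pandas as pd

# Toy frame standing in for the cleaned data (illustrative values).
toy = pd.DataFrame({"Date": ["2020-01-01"], "Coin": ["BTC"], "Price": [7200.0]})

# A temporary directory stands in for the project's output folder.
base_dir = tempfile.mkdtemp()
out_dir = os.path.join(base_dir, "Capstone3", "DW")  # nested path, not created yet

# Create the nested folders; exist_ok=True makes reruns safe.
os.makedirs(out_dir, exist_ok=True)

out_file = os.path.join(out_dir, "Data_for_EDA.csv")
toy.to_csv(out_file, index=False)

# Read the file back to check that columns and values survived.
restored = pd.read_csv(out_file)
print(restored)
```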