
# Introduction

**Deep Dive into Bitcoin: A Five-Year Analysis Using ETL Methodology**

Bitcoin, a major innovation in digital finance, has captured global attention since its launch in 2009. To understand its price dynamics over the five years from September 8, 2018, to September 8, 2023, this notebook works through the data in three ETL-style stages: Extract (load the CSV), Transform (fix data types and check for missing or duplicate rows), and Load (analyze and visualize). The analysis covers the price trend over time, a candlestick view of recent prices, close prices on a logarithmic scale, average close prices by year (Y), quarter (Q), and month (M), and daily percentage changes in the close price. The aim is a clearer picture of how Bitcoin's price has evolved over this period.

# Load data (Extract)

In [None]:
import pandas as pd

bitcoin = pd.read_csv("data/bitcoin.csv")

In [None]:
bitcoin.tail(5)

Features explanation:

- **date**: the date of the recorded observation.
- **open**: the first traded price of the period (here, one trading day); the starting point of price formation for that period.
- **high**: the highest price reached during the period.
- **low**: the lowest price reached during the period.
- **close**: the last traded price of the period, often used as the indicator of daily performance.
- **adj close**: the close price adjusted for corporate actions such as dividends or stock splits. Bitcoin has neither, so this column should normally match close.
- **volume**: the amount of bitcoin traded during the period, which indicates how active the market was. High trading volume can signal strong interest or elevated volatility.
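The definitions above imply a simple consistency rule: within each period, high should be at least as large as both open and close, and low should be no larger than either. The following is a minimal sketch of such a check on a small hypothetical frame. The capitalized column names follow those used in the code cells of this notebook; the helper name `ohlc_is_consistent` and the sample values are illustrative assumptions, not part of the dataset.

```python
import pandas as pd

def ohlc_is_consistent(df: pd.DataFrame) -> pd.Series:
    """Return True for each row where Low <= min(Open, Close) and High >= max(Open, Close)."""
    body_max = df[["Open", "Close"]].max(axis=1)
    body_min = df[["Open", "Close"]].min(axis=1)
    return (df["High"] >= body_max) & (df["Low"] <= body_min)

# Hypothetical sample rows (not real Bitcoin prices); the last row is deliberately invalid
# because its High (109) is below its Open (110).
sample = pd.DataFrame({
    "Open":  [100.0, 105.0, 110.0],
    "High":  [108.0, 112.0, 109.0],
    "Low":   [98.0, 101.0, 104.0],
    "Close": [105.0, 110.0, 106.0],
})

flags = ohlc_is_consistent(sample)
print(flags.tolist())
```

On the real data, `bitcoin[~ohlc_is_consistent(bitcoin)]` would list any rows that break the rule.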

In [None]:
bitcoin.shape

In [None]:
bitcoin.info()

In [None]:
bitcoin.describe().T

# Data preprocessing (Transform)

#### Is there an incorrect data type?

In [None]:
# convert "Date" from object to datetime

bitcoin["Date"] = pd.to_datetime(bitcoin["Date"])

In [None]:
# .dtype reports the column's element type; type() would only say it is a Series
bitcoin["Date"].dtype

In [None]:
# confirm the data types of all columns

bitcoin.dtypes
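If a date column may contain malformed entries, the conversion can be made tolerant and then verified. The sketch below uses a small hypothetical frame rather than the Bitcoin data; `errors="coerce"` is a standard `pd.to_datetime` option, and the sample values are assumptions chosen for illustration.

```python
import pandas as pd

# Hypothetical raw values, including one entry that is not a valid date.
raw = pd.DataFrame({"Date": ["2023-09-06", "2023-09-07", "not a date"]})

# errors="coerce" turns unparseable entries into NaT instead of raising an error.
raw["Date"] = pd.to_datetime(raw["Date"], errors="coerce")

# Verify the dtype and count the entries that could not be parsed.
is_datetime = pd.api.types.is_datetime64_any_dtype(raw["Date"])
n_invalid = int(raw["Date"].isna().sum())
print(is_datetime, n_invalid)
```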

#### Are there any missing values?

In [None]:
# Check for missing values
missing_values = bitcoin.isnull().sum()

# Display columns with missing values and the count of missing values
missing_values = missing_values[missing_values > 0]

if not missing_values.empty:
    print("Columns with missing values:")
    for column, count in missing_values.items():
        print(f"{column}: {count} missing values")
else:
    print("There are no columns with missing values.")

#### Is there any duplicate data?

In [None]:
if bitcoin.duplicated().any():
    print(f"There are {bitcoin.duplicated().sum()} duplicate rows.")
else:
    print("There are no duplicate rows.")
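The checks above only report problems. If they did find any, one possible cleanup is sketched below on a hypothetical frame. The values are made up, and dropping rows is just one option; filling gaps could be more appropriate for a time series.

```python
import numpy as np
import pandas as pd

# Hypothetical data: row 1 repeats row 0, and row 2 has a missing Close.
toy = pd.DataFrame({
    "Date":  ["2023-09-06", "2023-09-06", "2023-09-07", "2023-09-08"],
    "Close": [100.0, 100.0, np.nan, 103.0],
})

n_missing = int(toy.isnull().sum().sum())   # total number of missing cells
n_duplicates = int(toy.duplicated().sum())  # rows identical to an earlier row

# Drop exact duplicates first, then any rows that still contain missing values.
cleaned = toy.drop_duplicates().dropna().reset_index(drop=True)
print(n_missing, n_duplicates, len(cleaned))
```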

# Let's analyze the data (Load)

In [None]:
# preview the rows from newest to oldest (the result is not stored)

bitcoin.sort_index(ascending=False)

#### How does the bitcoin price trend move from time to time?

In [None]:
# sort from newest to oldest and reset the index

bitcoin_new = bitcoin.sort_index(ascending=False).reset_index()

In [None]:
# drop the old index column left behind by reset_index()

bitcoin_new.drop('index', axis=1, inplace=True)

In [None]:
bitcoin_new.columns

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(20, 10))

colors = ['blue', 'green', 'red', 'purple']

for index, col in enumerate(['Open', 'High', 'Low', 'Close']):
    plt.subplot(2, 2, index + 1)
    plt.plot(bitcoin_new["Date"], bitcoin_new[col], color=colors[index])
    plt.xlabel('Date')
    plt.ylabel('Price (USD)')
    plt.title(col)
    plt.legend([col])
    
plt.suptitle('Bitcoin Price Trends', fontsize=16)
plt.tight_layout()
plt.show()

**Result**

Over the five years, the open, high, low, and close series follow the same overall path and show large swings rather than a steady trend.

#### How does the price of bitcoin move over time using candlesticks?

In [None]:
from plotly.offline import init_notebook_mode

init_notebook_mode(connected=True)

In [None]:
import plotly.graph_objs as go

# bitcoin_new is sorted newest-first, so this takes the 50 most recent days
bitcoin_sampel = bitcoin_new[0:50]

trace = go.Candlestick(x=bitcoin_sampel['Date'],
               high=bitcoin_sampel['High'],
               open=bitcoin_sampel['Open'],
               low=bitcoin_sampel['Low'],
               close=bitcoin_sampel['Close'])

In [None]:
layout = go.Layout(
    title='Bitcoin Price',
    xaxis=dict(
        title='Date',
        showgrid=True,
        gridcolor='lightgray',
        tickformat='%Y-%m-%d',
        showline=True
    ),
    yaxis=dict(
        title='Price',
        showgrid=True,
        gridcolor='lightgray',
        showline=True
    ),
    margin=dict(l=40, r=20, t=40, b=20),
    plot_bgcolor='white',
    hovermode='x', 
    showlegend=False
)

In [None]:
fig = go.Figure(data=[trace], layout=layout)

fig.update_layout(xaxis_rangeslider_visible=False)

# Bitcoin trades every day of the week, so no weekend rangebreaks are applied to the x-axis

fig.show()

**Result**

The candlestick chart covers only the 50 most recent days in the dataset (`bitcoin_sampel`), not the full five years. Within that window, the daily open, high, low, and close values vary noticeably from one day to the next.

#### How to analyze close price with a log scale?

"Log" is short for "logarithm". A logarithm answers the question: to what power must a given base be raised to produce a number? With base 10, for example, the logarithm of 1,000 is 3, because 10 raised to the power 3 is 1,000. A useful consequence is that logarithms turn ratios into differences: log(b) - log(a) = log(b / a).

Logarithms have many uses in data analysis. Here, we apply a logarithmic scale to the close price so that equal percentage moves take up equal vertical distance on the chart, which makes a series that spans very different price levels easier to read.
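As a small worked example of why a log scale helps, the sketch below uses hypothetical prices that double at every step. The raw differences grow with the price level, while the differences of the logarithms stay constant. The numbers are illustrative and are not taken from the dataset.

```python
import math

# Hypothetical prices: each one is double the previous.
prices = [1_000, 2_000, 4_000, 8_000]

# Raw differences grow with the price level.
raw_steps = [b - a for a, b in zip(prices, prices[1:])]

# Differences of logarithms are constant, because log(b) - log(a) = log(b / a).
log_steps = [math.log(b) - math.log(a) for a, b in zip(prices, prices[1:])]

print(raw_steps)
print([round(step, 4) for step in log_steps])
```

`np.log1p(x)`, used in the cells below, computes log(1 + x), which is almost identical to log(x) for prices of this magnitude.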

In [None]:
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
bitcoin_new['Close'].plot(color='blue', linewidth=1.5, label='Closing Price')
plt.title('Bitcoin Closing Price Chart')
plt.xlabel('Data Point (0 = most recent)')
plt.ylabel('Closing Price')
plt.grid(True, linestyle='--', alpha=0.6)
plt.legend(loc='upper right')
plt.tight_layout()
plt.show()

In [None]:
bitcoin_new

In [None]:
bitcoin_new.set_index('Date', inplace=True)

In [None]:
bitcoin_new

In [None]:
plt.figure(figsize=(20,6))

plt.plot(bitcoin_new.index, bitcoin_new['Close'], color='blue', linestyle='-', marker='o', markersize=5, label='Closing Price')
plt.xlabel('Date', fontsize=12)
plt.ylabel('Closing Price', fontsize=12)
plt.title('Bitcoin Closing Price Over Time', fontsize=14)
plt.grid(True, linestyle='--', alpha=0.6)
plt.gca().xaxis.set_major_formatter(plt.matplotlib.dates.DateFormatter('%Y-%m-%d'))
plt.legend()
plt.show()

In [None]:
import numpy as np

# rolling() looks back over the preceding rows, so compute the 30-day SMA in chronological order
close = bitcoin_new['Close'].sort_index()
sma_30d = close.rolling(window=30).mean()
sma_30d_log = np.log1p(sma_30d)

plt.figure(figsize=(14, 6))
plt.suptitle('Bitcoin Price Analysis', fontsize=16)

# Subplot 1: raw prices
plt.subplot(1, 2, 1)
close.plot(color='b', alpha=0.7, label='Original')
sma_30d.plot(color='r', label='30-Day SMA')
plt.title('No Scaling', fontsize=14)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Price', fontsize=12)
plt.grid(True)
plt.legend()

# Subplot 2: log1p-transformed prices
plt.subplot(1, 2, 2)
np.log1p(close).plot(color='g', alpha=0.7, label='Log Close')
sma_30d_log.plot(color='m', label='Log 30-Day SMA')
plt.title('Log Scaling', fontsize=14)
plt.xlabel('Date', fontsize=12)
plt.ylabel('Log Price', fontsize=12)
# the values are already log-transformed, so the y-axis stays linear
plt.grid(True)
plt.legend()

plt.tight_layout(rect=[0, 0.03, 1, 0.95])
plt.show()
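To make the moving average concrete, here is a minimal sketch of a simple moving average on a hypothetical five-day series, using a 3-day window in place of the notebook's 30 days. It shows that `rolling(window=n).mean()` averages the current row with the rows before it, so the first `n - 1` results are empty and the series should be in chronological order.

```python
import pandas as pd

# Hypothetical closing prices, oldest first (not real prices).
close = pd.Series(
    [10.0, 20.0, 30.0, 40.0, 50.0],
    index=pd.date_range("2023-01-01", periods=5, freq="D"),
)

# 3-day simple moving average: mean of the current row and the two rows before it.
sma_3d = close.rolling(window=3).mean()
print(sma_3d.tolist())
```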

#### How to see yearly, quarterly, and monthly close price trends along with daily price changes?

In [None]:
bitcoin_new.head()

In [None]:
# Resample the data
yearly_mean = bitcoin_new['Close'].resample('Y').mean()
quarterly_mean = bitcoin_new['Close'].resample('Q').mean()
monthly_mean = bitcoin_new['Close'].resample('M').mean()

# Create the figure and set its size
plt.figure(figsize=(12, 6))

# Plot the yearly data with a label
yearly_mean.plot(label='Yearly Mean', marker='o', linestyle='-', linewidth=2)

# Plot the quarterly data with a label
quarterly_mean.plot(label='Quarterly Mean', marker='s', linestyle='--', linewidth=2)

# Plot the monthly data with a label
monthly_mean.plot(label='Monthly Mean', marker='^', linestyle='-.', linewidth=2)

# Add a title and axis labels
plt.title('Average Bitcoin Closing Prices')
plt.xlabel('Date')
plt.ylabel('Closing Price')

# Add a legend
plt.legend()

# Display the plot
plt.grid(True)
plt.show()
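Resampling groups the daily rows into calendar buckets and averages each bucket. The same idea can be sketched with `groupby` on calendar periods, shown here on hypothetical data as a comparable alternative rather than as the notebook's method. Unlike `resample`, this approach produces no rows for periods that contain no data.

```python
import pandas as pd

# Hypothetical daily closes spanning two months (not real prices).
dates = pd.to_datetime(["2023-01-30", "2023-01-31", "2023-02-01", "2023-02-02"])
close = pd.Series([10.0, 20.0, 30.0, 50.0], index=dates)

# Group by calendar month; to_period("Q") or to_period("Y") would group
# by quarter or by year in the same way.
monthly_mean = close.groupby(close.index.to_period("M")).mean()
print(monthly_mean)
```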

In [None]:
bitcoin_new['Close']

In [None]:
# pct_change() compares each row with the row before it, so compute it in
# chronological order; the assignment realigns the result on the Date index
bitcoin_new['Close_pct_change'] = bitcoin_new['Close'].sort_index().pct_change()*100
bitcoin_new['Close_pct_change']

In [None]:
# Plot the data
plt.figure(figsize=(12, 6))  # Set the figure size
plt.plot(bitcoin_new['Close_pct_change'], color='blue', linestyle='-', marker='o', markersize=4, label='Pct Change')

# Add title and axis labels
plt.title('Bitcoin Daily Closing Price Percentage Change')
plt.xlabel('Date')
plt.ylabel('Percentage Change')

# Add a grid
plt.grid(True)

# Add a legend
plt.legend()

# Display the plot
plt.show()

In [None]:
import cufflinks as cf

cf.go_offline()

In [None]:
# Interactive version of the same plot

import plotly.express as px

# Plot the percentage change in Bitcoin's closing price
px.line(bitcoin_new, x=bitcoin_new.index, y='Close_pct_change', title='Bitcoin Closing Price Percentage Change Over Time')

**Result**

The daily percentage change in Bitcoin's closing price fluctuated widely over the five years. Measured day over day in chronological order, the largest single-day drop, about 37%, occurred on March 12, 2020, and the largest single-day gain, about 18%, occurred on February 8, 2021.

The drop on March 12, 2020, coincided with the onset of the COVID-19 pandemic, when investors sold off assets across global markets. Other possible contributors include a correction after a prior uptrend, negative news or events in the crypto space, or large-scale selling by investors.

The gain on February 8, 2021, may reflect factors such as increased interest and investment in cryptocurrencies or positive news in the crypto space. The crypto market is known for its high volatility, so large single-day moves like these are not uncommon.
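The dates of the largest daily moves can be read off with `idxmax()` and `idxmin()`. Because `pct_change()` compares each row with the row above it, the result depends on the sort order of the series. The sketch below uses hypothetical prices to illustrate this: a 50% drop in chronological order shows up as a 100% "gain", dated one day earlier, when the same series is sorted newest-first.

```python
import pandas as pd

# Hypothetical closes, oldest first (not real prices).
close = pd.Series(
    [100.0, 50.0, 60.0],
    index=pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03"]),
)

# Chronological order: each day is compared with the day before it.
forward = close.pct_change() * 100

# Newest-first order: each day is compared with the day after it instead.
reverse = close.sort_index(ascending=False).pct_change() * 100

print(forward.idxmin().date(), round(forward.min(), 1))
print(reverse.idxmax().date(), round(reverse.max(), 1))
```

Sorting with `sort_index()` before calling `pct_change()` keeps both the sign and the date of each move aligned with the calendar.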