# Scraping Crypto Historical Data

Today we are going to scrape Bitcoin historical data from [Coin Market Cap](https://coinmarketcap.com/), specifically from [here](https://coinmarketcap.com/currencies/bitcoin/historical-data/), covering April 2013 through December 2019.

First, import all the libraries. We are going to use Beautiful Soup, a Python library, to scrape the data.

In [1]:
import pandas as pd
import matplotlib.pyplot as plt
import cufflinks as cf
%matplotlib inline
import bs4
from bs4 import BeautifulSoup
import requests

Below we paste the link of the page that we are going to scrape. To scrape a different cryptocurrency, we can use a different link from Coin Market Cap.

We are going to pull the first row from the table to check that we get proper results.

In [2]:
page = requests.get("https://coinmarketcap.com/currencies/bitcoin/historical-data/?start=20130428&end=20191229")
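To scrape a different coin or date range, only the coin name in the path and the `start`/`end` values in the query string change. Here is a minimal sketch, assuming the URL pattern used above still holds; `build_history_url` and `BASE_URL` are hypothetical names introduced for illustration, not part of any library.

```python
from urllib.parse import urlencode

# URL pattern copied from the link used above; {slug} is the coin name in the path
BASE_URL = "https://coinmarketcap.com/currencies/{slug}/historical-data/"


def build_history_url(slug, start, end):
    """Return the historical-data URL for one coin.

    slug  -- coin name as it appears in the site's URLs, e.g. "bitcoin"
    start -- first day as a YYYYMMDD string
    end   -- last day as a YYYYMMDD string
    """
    query = urlencode({"start": start, "end": end})
    return BASE_URL.format(slug=slug) + "?" + query


url = build_history_url("bitcoin", "20130428", "20191229")
print(url)
```

The resulting `url` can be passed to `requests.get`. Calling `page.raise_for_status()` before parsing helps catch a failed or blocked request early.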

In [3]:
soup = BeautifulSoup(page.content, 'html.parser')

`find_all` returns its results as a list, so we use indexing to pick out individual elements.

In [4]:
recent_date = soup.find_all('td', class_="cmc-table__cell cmc-table__cell--sticky cmc-table__cell--left")[0].get_text()

In [5]:
recent_open = soup.find_all('td', class_="cmc-table__cell cmc-table__cell--right")[0].get_text()

In [6]:
recent_high = soup.find_all('td', class_="cmc-table__cell cmc-table__cell--right")[1].get_text()

In [7]:
recent_low = soup.find_all('td', class_="cmc-table__cell cmc-table__cell--right")[2].get_text()

In [8]:
recent_close = soup.find_all('td', class_="cmc-table__cell cmc-table__cell--right")[3].get_text()

In [9]:
recent_volume = soup.find_all('td', class_="cmc-table__cell cmc-table__cell--right")[4].get_text()

In [10]:
recent_market_cap = soup.find_all('td', class_="cmc-table__cell cmc-table__cell--right")[5].get_text()
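Each of the cells above runs the same `find_all` search again. As a small sketch of how the list-like result behaves, the search can be run once and then indexed. The HTML below is a made-up fragment that mimics one table row; it is not taken from the live page.

```python
from bs4 import BeautifulSoup

# Made-up HTML that mimics part of one table row; the real page has many more cells
sample_html = """
<table>
  <tr class="cmc-table-row">
    <td class="cmc-table__cell cmc-table__cell--sticky cmc-table__cell--left">Dec 29, 2019</td>
    <td class="cmc-table__cell cmc-table__cell--right">7,317.65</td>
    <td class="cmc-table__cell cmc-table__cell--right">7,513.95</td>
  </tr>
</table>
"""

sample_soup = BeautifulSoup(sample_html, "html.parser")

# Run the search once and keep the list-like result
right_cells = sample_soup.find_all("td", class_="cmc-table__cell cmc-table__cell--right")

print(len(right_cells))           # number of matching cells
print(right_cells[0].get_text())  # first match
print(right_cells[1].get_text())  # second match

# Collect the text of every match in one pass
values = [cell.get_text() for cell in right_cells]
print(values)
```

Indexing past the last match raises `IndexError`, so checking `len(right_cells)` first is a cheap safeguard when the page layout may have changed.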

Below is the command that we are going to use to get the whole table.

In [11]:
row_data = soup.find_all('tr', class_="cmc-table-row")

Now that we have tested our sample results, let's create a loop to store Date, Open, High, Low, Close, Volume and Market Cap.

In [33]:
Date = []
Open = []
High = []
Low = []
Close = []
Volume = []
Market_Cap = []
# Reuse the rows collected in row_data; each <tr> holds one day of data
for row in row_data:
    # Date of the row: the single sticky, left-aligned cell
    date = row.find_all('td', class_="cmc-table__cell cmc-table__cell--sticky cmc-table__cell--left")[0].get_text()
    Date.append(date)

    # The right-aligned cells hold Open, High, Low, Close, Volume and Market Cap, in that order
    cells = row.find_all('td', class_="cmc-table__cell cmc-table__cell--right")

    # Open price
    Open.append(cells[0].get_text())

    # High price
    High.append(cells[1].get_text())

    # Low price
    Low.append(cells[2].get_text())

    # Close price
    Close.append(cells[3].get_text())

    # Total volume during the day
    Volume.append(cells[4].get_text())

    # Market cap
    Market_Cap.append(cells[5].get_text())
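If some row lacks the expected cells (a header row, for instance), indexing into it raises `IndexError` and the loop stops. The following is a sketch of a more defensive variant, not the method used in this notebook: `parse_rows`, `LEFT`, `RIGHT` and `FIELDS` are hypothetical names, and the sample HTML is made up to mimic the table.

```python
from bs4 import BeautifulSoup

LEFT = "cmc-table__cell cmc-table__cell--sticky cmc-table__cell--left"
RIGHT = "cmc-table__cell cmc-table__cell--right"
FIELDS = ["Open", "High", "Low", "Close", "Volume", "Market Cap"]


def parse_rows(html):
    """Return one dict per complete data row found in html."""
    soup = BeautifulSoup(html, "html.parser")
    records = []
    for row in soup.find_all("tr", class_="cmc-table-row"):
        date_cell = row.find("td", class_=LEFT)
        value_cells = row.find_all("td", class_=RIGHT)
        # Skip rows without the expected cells instead of raising IndexError
        if date_cell is None or len(value_cells) < len(FIELDS):
            continue
        record = {"Date": date_cell.get_text()}
        for name, cell in zip(FIELDS, value_cells):
            record[name] = cell.get_text()
        records.append(record)
    return records


# Made-up sample: one complete row and one row that only has a date cell
sample_html = f"""
<table>
  <tr class="cmc-table-row">
    <td class="{LEFT}">Dec 29, 2019</td>
    <td class="{RIGHT}">7,317.65</td>
    <td class="{RIGHT}">7,513.95</td>
    <td class="{RIGHT}">7,279.87</td>
    <td class="{RIGHT}">7,422.65</td>
    <td class="{RIGHT}">22,445,260,000</td>
    <td class="{RIGHT}">134,570,800,000</td>
  </tr>
  <tr class="cmc-table-row">
    <td class="{LEFT}">Dec 28, 2019</td>
  </tr>
</table>
"""

records = parse_rows(sample_html)
print(records)
```

A list of dicts like `records` can be handed straight to `pd.DataFrame(records)`, which avoids keeping seven separate lists in sync.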
    


In [34]:
#Creating a dataframe 
df = pd.DataFrame({'Date':Date,
             'Open':Open,
             'High': High,
             'Low': Low,
             'Close': Close,
             'Volume': Volume,
             'Market Cap': Market_Cap})



In [35]:
#Let's check the data type of every column and whether any column has missing values
df.info()
columns = ['Date', 'Open', 'High', 'Low', 'Close', 'Volume', 'Market Cap']
for col in columns:
    print(df[col].isnull().sum())

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2437 entries, 0 to 2436
Data columns (total 7 columns):
Date          2437 non-null object
Open          2437 non-null object
High          2437 non-null object
Low           2437 non-null object
Close         2437 non-null object
Volume        2437 non-null object
Market Cap    2437 non-null object
dtypes: object(7)
memory usage: 133.4+ KB
0
0
0
0
0
0
0


In [36]:
#Converting the date strings (e.g. "Dec 29, 2019") to datetime in one vectorized call
df['Date'] = pd.to_datetime(df['Date'], format='%b %d, %Y')
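The format string `'%b %d, %Y'` has to match the scraped text exactly. Here is a short standalone sketch of what each code means, assuming an English locale for the month abbreviation:

```python
from datetime import datetime

# %b = abbreviated month name, %d = day of the month, %Y = four-digit year
parsed = datetime.strptime("Dec 29, 2019", "%b %d, %Y")
print(parsed)  # 2019-12-29 00:00:00

# A string that does not match the format raises ValueError
try:
    datetime.strptime("2019-12-29", "%b %d, %Y")
except ValueError as err:
    print("could not parse:", err)
```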

In [37]:
#Removing the thousands separators (commas) and converting the columns to float
columns = ['Open', 'High', 'Low', 'Close', 'Volume', 'Market Cap']
for col in columns:
    df[col] = df[col].str.replace(",","").astype('float')
df.head()

| | Date | Open | High | Low | Close | Volume | Market Cap |
|---|---|---|---|---|---|---|---|
| 0 | 2019-12-29 | 7317.65 | 7513.95 | 7279.87 | 7422.65 | 22445260000.0 | 134570800000.0 |
| 1 | 2019-12-28 | 7289.03 | 7399.04 | 7286.91 | 7317.99 | 21365670000.0 | 132659100000.0 |
| 2 | 2019-12-27 | 7238.14 | 7363.53 | 7189.93 | 7290.09 | 22777360000.0 | 132139500000.0 |
| 3 | 2019-12-26 | 7274.8 | 7388.3 | 7200.39 | 7238.97 | 22787010000.0 | 131200000000.0 |
| 4 | 2019-12-25 | 7325.76 | 7357.02 | 7220.99 | 7275.16 | 21559510000.0 | 131840600000.0 |
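The `astype('float')` conversion above raises `ValueError` as soon as one cell holds something that is not a number. As a hedged alternative sketch, `pd.to_numeric` with `errors="coerce"` turns such cells into `NaN` instead. The sample strings are made up, and the `"-"` placeholder for a missing value is an assumption rather than something observed on the page.

```python
import pandas as pd

# Made-up raw strings; "-" stands in for a value the site did not report
raw = pd.Series(["7,317.65", "22,445,260,000", "-"])

# regex=False treats "," as a literal character rather than a pattern
cleaned = raw.str.replace(",", "", regex=False)

# errors="coerce" turns anything that is not a number into NaN
numbers = pd.to_numeric(cleaned, errors="coerce")
print(numbers)
```

Afterwards, `numbers.isna().sum()` shows how many cells could not be converted.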


In [55]:
#Interactive line chart of the daily prices, rendered offline with cufflinks
cf.go_offline()
df.iplot(x='Date', y=['Open', 'High', 'Low', 'Close'], kind='line')
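Where cufflinks or plotly is not available, a static chart with matplotlib (already imported at the top) shows the same four series. This is a sketch on three rows copied from the table above, standing in for the full `df`; `sample` is a hypothetical name.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend for scripts; drop this line inside a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Three rows copied from the table above, standing in for the full df
sample = pd.DataFrame({
    "Date": pd.to_datetime(["2019-12-27", "2019-12-28", "2019-12-29"]),
    "Open": [7238.14, 7289.03, 7317.65],
    "High": [7363.53, 7399.04, 7513.95],
    "Low": [7189.93, 7286.91, 7279.87],
    "Close": [7290.09, 7317.99, 7422.65],
})

fig, ax = plt.subplots(figsize=(10, 5))
# One line per price column, all sharing the Date axis
for column in ["Open", "High", "Low", "Close"]:
    ax.plot(sample["Date"], sample[column], label=column)
ax.set_xlabel("Date")
ax.set_ylabel("Price")
ax.legend()
```

Replacing `sample` with `df` plots the full history, and `fig.savefig(...)` writes the chart to an image file.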


# Saving Dataset

Let's save df to a CSV file so it can be used for further analysis.

In [57]:
df.to_csv("Crypto_Historical_Data(Bitcoin).csv")
df.head()

| | Date | Open | High | Low | Close | Volume | Market Cap |
|---|---|---|---|---|---|---|---|
| 0 | 2019-12-29 | 7317.65 | 7513.95 | 7279.87 | 7422.65 | 22445260000.0 | 134570800000.0 |
| 1 | 2019-12-28 | 7289.03 | 7399.04 | 7286.91 | 7317.99 | 21365670000.0 | 132659100000.0 |
| 2 | 2019-12-27 | 7238.14 | 7363.53 | 7189.93 | 7290.09 | 22777360000.0 | 132139500000.0 |
| 3 | 2019-12-26 | 7274.8 | 7388.3 | 7200.39 | 7238.97 | 22787010000.0 | 131200000000.0 |
| 4 | 2019-12-25 | 7325.76 | 7357.02 | 7220.99 | 7275.16 | 21559510000.0 | 131840600000.0 |
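Because `to_csv` also writes the row index, reading the file back with a plain `read_csv` produces an extra `Unnamed: 0` column, and `Date` comes back as text. Below is a small round-trip sketch on made-up sample rows, using an in-memory buffer in place of the file on disk:

```python
import io

import pandas as pd

sample = pd.DataFrame({
    "Date": pd.to_datetime(["2019-12-29", "2019-12-28"]),
    "Close": [7422.65, 7317.99],
})

# An in-memory buffer stands in for the file on disk
buffer = io.StringIO()
sample.to_csv(buffer)  # the row index is written as an unnamed first column

buffer.seek(0)
naive = pd.read_csv(buffer)
print(naive.columns.tolist())  # ['Unnamed: 0', 'Date', 'Close']

# index_col=0 restores the index; parse_dates converts Date back to datetime
buffer.seek(0)
restored = pd.read_csv(buffer, index_col=0, parse_dates=["Date"])
print(restored.dtypes)
```

Alternatively, passing `index=False` to `to_csv` leaves the index out of the file altogether.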
