# Cryptocurrency Data Extraction from Binance
- by Chee-Foong
- May 2020

Adapted from this post by **Peter Nistrup**.  Thank you for sharing.

https://medium.com/swlh/retrieving-full-historical-data-for-every-cryptocurrency-on-binance-bitmex-using-the-python-apis-27b47fd8137f

## Install dependencies

In [1]:
# !pip install python-binance

Collecting python-binance
  Downloading python_binance-0.7.5-py2.py3-none-any.whl (29 kB)
Collecting cryptography
  Downloading cryptography-2.9.2-cp35-abi3-manylinux2010_x86_64.whl (2.7 MB)
Collecting pyOpenSSL
  Downloading pyOpenSSL-19.1.0-py2.py3-none-any.whl (53 kB)
Collecting dateparser
  Downloading dateparser-0.7.4-py2.py3-none-any.whl (353 kB)
Collecting autobahn
  Downloading autobahn-20.4.3-py2.py3-none-any.whl (1.1 MB)
Collecting service-identity
  Downloading service_identity-18.1.0-py2.py3-none-any.whl (11 kB)
Collecting Twisted
  Downloading Twisted-20.3.

## Loading the libraries

In [4]:
# IMPORTS
import pandas as pd
import numpy as np

import time
import math
import os.path

from tqdm import notebook  # optional, used for progress bars
from datetime import timedelta, datetime
from dateutil import parser


Register a Binance API key-secret pair to access the data, then store the pair in the JSON file **../binance/api.json**.

In [5]:
import json

with open('../binance/api.json', 'r') as f:
    api = json.load(f)
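
The loader above only works if the JSON file carries the two fields the notebook reads later, `api['key']` and `api['secret']`. A minimal sketch of that layout follows, assuming placeholder values and a hypothetical helper `load_api_credentials` (not part of the original notebook) that fails early when a field is missing:

```python
import json
import os
import tempfile


def load_api_credentials(path):
    """Load a key-secret pair and fail early if a field is missing.

    Hypothetical helper; the field names 'key' and 'secret' match the
    lookups api['key'] and api['secret'] used in this notebook.
    """
    with open(path, 'r') as f:
        api = json.load(f)
    missing = [name for name in ('key', 'secret') if not api.get(name)]
    if missing:
        raise KeyError('api.json is missing: ' + ', '.join(missing))
    return api


# Demonstration with a throwaway file and placeholder values.
demo_path = os.path.join(tempfile.mkdtemp(), 'api.json')
with open(demo_path, 'w') as f:
    json.dump({'key': 'YOUR_API_KEY', 'secret': 'YOUR_API_SECRET'}, f)

demo_api = load_api_credentials(demo_path)
print(sorted(demo_api))
```

Keeping the pair in a separate file, rather than in the notebook itself, also makes it easier to share the notebook without sharing the credentials.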

Create a data folder to store the raw CSV files downloaded from Binance.  These raw files determine which data has already been downloaded and therefore does not need to be downloaded from Binance again.  Downloading the full history from Binance may take a long time.

Set the path of the data folder in the variable **data_folder**.

In [6]:
from binance.client import Client

### API
binance_api_key = api['key']       #Enter your own API-key here
binance_api_secret = api['secret'] #Enter your own API-secret here

### CONSTANTS
binsizes = {"1m": 1, "5m": 5, "1h": 60, "1d": 1440}
binance_client = Client(api_key=binance_api_key, api_secret=binance_api_secret)
data_folder = '../data/'
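
`DataFrame.to_csv` does not create missing directories, so the data folder has to exist before the first download. A minimal sketch of creating it up front is shown here; it uses a temporary directory as a stand-in for `data_folder`, and the name `demo_data_folder` is illustrative only:

```python
import os
import tempfile

# Stand-in for data_folder ('../data/' in the notebook); a temporary
# directory is used here so the example does not touch real files.
demo_data_folder = os.path.join(tempfile.mkdtemp(), 'data')

# exist_ok=True makes the call safe to repeat on later runs.
os.makedirs(demo_data_folder, exist_ok=True)
os.makedirs(demo_data_folder, exist_ok=True)

print(os.path.isdir(demo_data_folder))
```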

The functions below are adapted from **Peter Nistrup**'s write-up.

In [7]:
### FUNCTIONS
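def minutes_of_new_data(symbol, kline_size, data, source):
    """Return the timestamp of the last stored candle and of the newest candle on the exchange."""
    if len(data) > 0:
        # Resume from the last row already saved to disk
        old = parser.parse(data["timestamp"].iloc[-1])
    elif source == "binance":
        # No local data yet: start from a fixed early date
        old = datetime.strptime('1 Jan 2017', '%d %b %Y')

    if source == "binance":
        # Open time (milliseconds since epoch) of the most recent candle
        new = pd.to_datetime(binance_client.get_klines(symbol=symbol, interval=kline_size)[-1][0], unit='ms')

    return old, new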


def get_all_binance(symbol, kline_size, save = False):
    filename = data_folder + '%s-%s-data.csv' % (symbol, kline_size)
    
    if os.path.isfile(filename): 
        data_df = pd.read_csv(filename)
    else: 
        data_df = pd.DataFrame()
        
    oldest_point, newest_point = minutes_of_new_data(symbol, kline_size, data_df, source = "binance")
    delta_min = (newest_point - oldest_point).total_seconds()/60
    available_data = math.ceil(delta_min/binsizes[kline_size])
    
    if oldest_point == datetime.strptime('1 Jan 2017', '%d %b %Y'): 
        print('Downloading all available %s data for %s. Be patient..!' % 
              (kline_size, symbol))
    else: 
        print('Downloading %d minutes of new data available for %s, i.e. %d instances of %s data.' % 
              (delta_min, symbol, available_data, kline_size))
        
    klines = binance_client.get_historical_klines(symbol, kline_size, 
                                                  oldest_point.strftime("%d %b %Y %H:%M:%S"), 
                                                  newest_point.strftime("%d %b %Y %H:%M:%S"))
    data = pd.DataFrame(klines, columns = ['timestamp', 'open', 'high', 'low', 'close', 
                                           'volume', 'close_time', 'quote_av', 'trades', 
                                           'tb_base_av', 'tb_quote_av', 'ignore' ])
    data['timestamp'] = pd.to_datetime(data['timestamp'], unit='ms')
    
    if len(data_df) > 0:
        # DataFrame.append was removed in pandas 2.0; pd.concat works in old and new versions
        data_df = pd.concat([data_df, data])
    else:
        data_df = data
        
    data_df.set_index('timestamp', inplace=True)
    
    if save: 
        data_df.to_csv(filename)
        
    print('All caught up..!')
    return data_df
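
The update logic above does two things: it converts the time gap into a number of candles, and it appends the newly downloaded rows to the stored ones. Because the download restarts from the last stored timestamp, that candle can come back a second time. The sketch below reproduces the arithmetic and shows one way to keep only the newest copy of a repeated timestamp. It runs on synthetic data, assumes both frames are indexed by datetime, and the names `BIN_MINUTES`, `expected_candles` and `merge_klines` are hypothetical, not part of the original functions:

```python
import math
from datetime import datetime

import pandas as pd

# Same mapping as binsizes in the notebook: interval -> minutes per candle.
BIN_MINUTES = {"1m": 1, "5m": 5, "1h": 60, "1d": 1440}


def expected_candles(oldest, newest, kline_size):
    """Number of candles needed to cover the gap, rounded up."""
    delta_min = (newest - oldest).total_seconds() / 60
    return math.ceil(delta_min / BIN_MINUTES[kline_size])


def merge_klines(stored, fresh):
    """Append fresh rows to stored ones, keeping the newest copy of any
    timestamp that appears in both (hypothetical helper)."""
    combined = pd.concat([stored, fresh])
    combined = combined[~combined.index.duplicated(keep='last')]
    return combined.sort_index()


oldest = datetime(2020, 5, 1, 0, 0)
newest = datetime(2020, 5, 1, 9, 33)   # 573 minutes later
n_1m = expected_candles(oldest, newest, "1m")
n_5m = expected_candles(oldest, newest, "5m")

# Synthetic rows: the 00:01 candle is present in both frames.
stored = pd.DataFrame(
    {'close': [1.0, 2.0]},
    index=pd.to_datetime(['2020-05-01 00:00', '2020-05-01 00:01']))
fresh = pd.DataFrame(
    {'close': [2.5, 3.0]},
    index=pd.to_datetime(['2020-05-01 00:01', '2020-05-01 00:02']))

merged = merge_klines(stored, fresh)
print(n_1m, n_5m, len(merged))
```

Keeping the last copy is a design choice: a candle that was still open when it was first stored is replaced by its later version.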

## Get list of coins

Get the cryptocurrency symbols quoted in USDT.

In [8]:
info = binance_client.get_exchange_info()

symbols = info['symbols']
coins = []
others = []

for i, symbol in enumerate(symbols):
    s = symbol['symbol']
    if ('USDT' in s) and (len(s) == 7) :
#         print('{} - {}'.format(i, s))
        coins.append(s)
    elif ('USDT' in s):
        others.append(s)
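
The length check above assumes a three-letter base asset, so longer tickers end up in `others`, and a symbol such as `USDCUSDT` passes the filter and is later shortened to `USD` when the first three letters are used as the column name. That is probably how a column like `USD.2` appears in the preview at the end of this notebook. Each entry returned by `get_exchange_info()` also carries `baseAsset` and `quoteAsset` fields, which allow a filter that does not depend on string length. A minimal sketch on a mocked response (the entries are illustrative, not real exchange output):

```python
# Mock of the 'symbols' list returned by get_exchange_info(); only the
# fields used below are included, and the entries are illustrative.
mock_symbols = [
    {'symbol': 'BTCUSDT', 'baseAsset': 'BTC', 'quoteAsset': 'USDT'},
    {'symbol': 'ETHUSDT', 'baseAsset': 'ETH', 'quoteAsset': 'USDT'},
    {'symbol': 'DOGEUSDT', 'baseAsset': 'DOGE', 'quoteAsset': 'USDT'},
    {'symbol': 'USDCUSDT', 'baseAsset': 'USDC', 'quoteAsset': 'USDT'},
    {'symbol': 'ETHBTC', 'baseAsset': 'ETH', 'quoteAsset': 'BTC'},
]

# Map each USDT-quoted symbol to its full base asset name.
usdt_pairs = {
    entry['symbol']: entry['baseAsset']
    for entry in mock_symbols
    if entry['quoteAsset'] == 'USDT'
}

print(usdt_pairs)
```

The values of `usdt_pairs` could then serve as unique column names in place of the first three letters of each symbol.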

In [9]:
# others

In [10]:
from tqdm import notebook

for symbol in notebook.tqdm(coins):
    try:
        get_all_binance(symbol, '1m', save = True)
    except Exception as e:
        # Report the reason instead of silently swallowing every error
        print('Skipping {}: {}'.format(symbol, e))


Downloading 573 minutes of new data available for BTCUSDT, i.e. 573 instances of 1m data.
All caught up..!
Downloading 572 minutes of new data available for ETHUSDT, i.e. 572 instances of 1m data.
All caught up..!
Downloading 572 minutes of new data available for BNBUSDT, i.e. 572 instances of 1m data.
All caught up..!
Downloading 0 minutes of new data available for BCCUSDT, i.e. 0 instances of 1m data.
All caught up..!
Downloading 572 minutes of new data available for NEOUSDT, i.e. 572 instances of 1m data.
All caught up..!
Downloading 572 minutes of new data available for LTCUSDT, i.e. 572 instances of 1m data.
All caught up..!
Downloading 572 minutes of new data available for ADAUSDT, i.e. 572 instances of 1m data.
All caught up..!
Downloading 572 minutes of new data available for XRPUSDT, i.e. 572 instances of 1m data.
All caught up..!
Downloading 572 minutes of new data available for EOSUSDT, i.e. 572 instances of 1m data.
All caught up..!
Downloading 572 minutes of new data avail

In my analysis, I am only interested in the close price.  Hence, I transform the data and combine all the close prices into a single dataframe.

In [11]:
from functools import reduce

data_list = []

for symbol in notebook.tqdm(coins):
    data = pd.read_csv(data_folder + symbol + '-1m-data.csv', parse_dates=True, index_col='timestamp')
    data = pd.to_numeric(data.close).resample('1min').last()
    data_list.append(data)
    
prices = reduce(lambda left, right: pd.merge(left, right, 
                                             left_on='timestamp', right_on='timestamp', 
                                             how='outer'), data_list)

prices.columns = [coin[0:3] for coin in coins]
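
Every series in `data_list` is named `close`, so the repeated merge relies on pandas adding suffixes to clashing column names; depending on the pandas version this may produce duplicate names or raise an error. An alternative sketch, not the approach used in this notebook, aligns all series in one step with `pd.concat(..., axis=1)`, which performs an outer join on the index by default. The coin names and values below are synthetic:

```python
import pandas as pd

# Two synthetic one-minute close series with partly overlapping timestamps.
idx_a = pd.date_range('2020-05-01 00:00', periods=3, freq='min')
idx_b = pd.date_range('2020-05-01 00:01', periods=3, freq='min')

close_by_coin = {
    'AAA': pd.Series([1.0, 1.1, 1.2], index=idx_a, name='close'),
    'BBB': pd.Series([5.0, 5.1, 5.2], index=idx_b, name='close'),
}

# The dict keys become the column names; missing timestamps become NaN.
demo_prices = pd.concat(close_by_coin, axis=1)
demo_prices.index.name = 'timestamp'

print(demo_prices)
```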





Save the prices to a CSV file for future analysis.

In [12]:
prices.to_csv('../data/prices_backup.csv')

Reload the prices to check the saved file.

In [13]:
prices = pd.read_csv('../data/prices_backup.csv', parse_dates=True, index_col='timestamp')
prices.head()

| timestamp | BTC | ETH | BNB | BCC | NEO | LTC | ADA | XRP | EOS | XLM | ... | BTS | LSK | BNT | LTO | MBL | USD.2 | WTC | XZC | CHR | GXS |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2017-08-17 04:00:00 | 4261.48 | 301.13 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2017-08-17 04:01:00 | 4261.48 | 301.13 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2017-08-17 04:02:00 | 4280.56 | 300.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2017-08-17 04:03:00 | 4261.48 | 300.0 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
| 2017-08-17 04:04:00 | 4261.48 | 301.13 | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | ... | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN | NaN |
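
Most columns in the preview are empty for the first rows, presumably because those pairs have no data that early. A quick way to check this after reloading is to look up the first non-missing timestamp and the share of filled rows per column. The sketch below uses a small synthetic frame in place of `prices`; the column names and values are made up:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the reloaded prices frame.
idx = pd.date_range('2017-08-17 04:00', periods=4, freq='min', name='timestamp')
demo_prices = pd.DataFrame(
    {'AAA': [10.0, 10.1, 10.2, 10.3],
     'BBB': [np.nan, np.nan, 2.0, 2.1]},
    index=idx)

# First timestamp with a real value, per column.
first_seen = demo_prices.apply(lambda col: col.first_valid_index())

# Fraction of rows that hold a value, per column.
coverage = demo_prices.notna().mean()

print(first_seen)
print(coverage)
```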


## Closing Remarks

The functions written by Peter Nistrup made downloading prices from Binance a breeze.  