<h1>Get Binance Data!</h1>
<p>To get historical Binance cryptocurrency price data with the Python requests library, you can query Binance's public API. The Klines (candlestick) endpoint returns historical open, high, low and close prices and volume for a given trading pair at a chosen time interval.</p>
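<p>A minimal sketch of a single Klines request, assuming the same public endpoint used later in this notebook. It only builds the query URL and parses one hand-written sample row, so it runs offline. Each element of a Klines response is a list whose first six fields are the open time (milliseconds), open, high, low, close and volume, with prices sent as strings. The helpers 'klines_url' and 'parse_kline' are hypothetical names used for illustration.</p>

```python
from urllib.parse import urlencode

BASE_URL = "https://data-api.binance.vision/api/v3/klines"

def klines_url(symbol, interval, start_ms, end_ms, limit=1000):
    """Hypothetical helper: build the query URL for one Klines request."""
    query = urlencode({"symbol": symbol, "interval": interval,
                       "startTime": start_ms, "endTime": end_ms, "limit": limit})
    return f"{BASE_URL}?{query}"

def parse_kline(row):
    """Hypothetical helper: keep the first six fields of a kline row and convert them."""
    open_time, o, h, l, c, v = row[:6]
    return {"open_time_ms": int(open_time), "open": float(o), "high": float(h),
            "low": float(l), "close": float(c), "volume": float(v)}

# Illustrative sample row with made-up values (not real market data).
sample = [1510704000000, "6500.00", "6520.50", "6490.10", "6510.00", "12.5",
          1510704059999, "81375.0", 42, "6.1", "39700.0", "0"]

print(klines_url("BTCUSDT", "1m", 1510704000000, 1510790400000))
print(parse_kline(sample))
```

<p>To fetch live rows, the URL can be passed to requests.get(...).json(), which is what 'get_history_bars' does below.</p>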

In [1]:
import pandas as pd
import numpy as np
import os
from pathlib import Path
import requests
import datetime as dt
import threading

<h3>The 'crypto_data' class!</h3>
<ol>
    <li>
        <h3>The '__init__' function:</h3>
        Initializes the instance attributes:
        <ul>
            <li>'self.pairs': The list of trading pairs to download (e.g. 'BTCUSDT').</li>
            <li>'self.thread': The number of threads used for multithreading (default: 20).</li>
            <li>'self.start_date', 'self.end_date': The start and end dates of the data.</li>
            <li>'self.interval': The candle timeframe (default: '1m').</li>
            <li>'self.container': Stores the DataFrames fetched for each of the date ranges produced by the 'date_processing' function.</li>
        </ul>
    </li>
    <li>
        <h3>The 'split' function:</h3>
        This function splits a list a into n nearly equal parts.
        <ul>
            <li>Compute k, m = divmod(len(a), n): k is the base part length and m is the remainder.</li>
            <li>The first m parts have length k + 1 and the remaining n - m parts have length k, so part lengths differ by at most one.</li>
        </ul>
    </li>
    <li>
        <h3>The 'date_processing' function:</h3>
        Divides the period from 'self.start_date' to 'self.end_date' into smaller groups.
        <ul>
            <li>Get a list of all dates from 'self.start_date' to 'self.end_date'.</li>
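            <li>Divide the list into 'self.thread' groups with the 'split' function.</li>
            <li>Return, for each group, its first date and its last date plus one day, so consecutive ranges meet at a shared boundary and no candle is skipped.</li>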
        </ul>
    </li>
    <li>
        <h3>The 'get_history_bars' function:</h3>
        Retrieves historical Binance trading data for a specific pair within a date range and returns it as a DataFrame.
        <ul>
            <li>Send the request; Binance returns the klines as JSON.</li>
            <li>Convert the JSON to a DataFrame and return None if it is empty.</li>
            <li>Keep the first 6 columns (open time, open, high, low, close, volume), name them, cast prices and volume to float, convert the timestamp from milliseconds to seconds, add 'Date' and 'Timeframe' columns, and return the reordered DataFrame.</li>
        </ul>
    </li>
    <li>
        <h3>The 'get_binance_kline_price' function:</h3>
        Because the Binance Klines endpoint returns at most 1000 rows per request, this function fetches historical data in repeated chunks, appends each DataFrame to a list, and concatenates the results.
        <ul>
            <li>Request up to 1000 candles starting at 'last_datetime' (initially 'start_time').</li>
            <li>If 'new_df' is None (the request returned no candles, meaning 'end_time' has been reached), break out of the while loop.</li>
            <li>Otherwise, append it to 'df_list', take the newest timestamp in 'new_df', add 1 second, and request the next chunk.</li>
            <li>After the loop ends, concatenate the DataFrames in 'df_list' and append the result to 'self.container'.</li>
        </ul>
    </li>
    <li>
        <h3>The 'get_symbol_data' function:</h3>
        Retrieves Binance Kline price data using multithreading, concatenates the results, removes duplicates, and returns sorted data.
        <ul>
            <li>Split the date range into 'self.thread' groups with the 'date_processing' function.</li>
            <li>Start one thread per group that runs 'get_binance_kline_price' on that range, then wait for all threads to finish.</li>
            <li>Concatenate 'self.container', remove duplicates, sort by timestamp, and reset 'self.container' to an empty list for the next pair.</li>
        </ul>
    </li>
    <li>
        <h3>The 'filling' function:</h3>
        This function sets the DataFrame index to the 'Date' column, converts it to datetime, and inserts any missing 1-minute intervals, backfilling each one with the values of the next available candle.
    </li>
    <li>
        <h3>The 'get_data' function:</h3>
        This function retrieves and processes data for each trading pair in 'self.pairs', then saves the filled data as a CSV file at 'data/{pair}/{pair}_1m.csv'.
        <ul>
            <li>Create the 'data' folder if it does not exist.</li>
            <li>For each pair, create its subfolder, fetch and fill the data, and save it in CSV format.</li>
        </ul>
    </li>
</ol>
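<p>The arithmetic behind 'split' and 'date_processing' can be checked on a small example. This standalone sketch mirrors the slicing formula used in the class; 'split' and 'date_ranges' here are free functions written for illustration, not methods of 'crypto_data'. With 10 items and 3 parts, divmod gives k = 3 and m = 1, so the first part gets k + 1 items and the other two get k.</p>

```python
from datetime import datetime, timedelta

def split(a, n):
    """Split a into n parts whose lengths differ by at most one (same formula as the class)."""
    k, m = divmod(len(a), n)
    return [a[i*k + min(i, m):(i+1)*k + min(i+1, m)] for i in range(n)]

def date_ranges(start, days, n):
    """Group `days` consecutive days into at most n [first, last + 1 day] ranges."""
    dates = [start + timedelta(days=d) for d in range(days)]
    # Empty groups (more parts than days) are skipped.
    return [[g[0], g[-1] + timedelta(days=1)] for g in split(dates, n) if g]

print([len(p) for p in split(list(range(10)), 3)])  # [4, 3, 3]
for first, end in date_ranges(datetime(2017, 11, 15), 7, 3):
    print(first.date(), "->", end.date())
# 2017-11-15 -> 2017-11-18
# 2017-11-18 -> 2017-11-20
# 2017-11-20 -> 2017-11-22
```

<p>Each range ends exactly where the next one begins, so the candle at every boundary is fetched twice; this is why 'get_symbol_data' removes duplicates after concatenating.</p>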

In [2]:
class crypto_data():
    def __init__(self,pairs,start_date,end_date,interval='1m',thread=20):
        self.pairs=pairs
        self.thread=thread
        self.start_date=start_date
        self.end_date=end_date
        self.interval=interval
        self.container=[]

    def split(self,a, n):
        k, m = divmod(len(a), n)
        return [a[i*k+min(i, m):(i+1)*k+min(i+1, m)] for i in range(n)]
    
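    def date_processing(self):
        dates=pd.date_range(self.start_date,self.end_date,freq='D').to_pydatetime()
        groups=self.split(dates,self.thread)
        # Each range runs from the group's first day to one day past its last day,
        # so consecutive ranges share a boundary. Building new pairs (instead of
        # mutating the array) keeps one-day groups correct; empty groups are skipped.
        return [[g[0],g[-1]+dt.timedelta(days=1)] for g in groups if len(g)>0]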

    def get_history_bars(self,pair,start_date,end_date):
        # The Klines endpoint expects start/end times in milliseconds since the epoch.
        params={
            'symbol': pair,
            'interval': self.interval,
            'startTime': int(start_date.timestamp()*1000),
            'endTime': int(end_date.timestamp()*1000),
            'limit': 1000,  # maximum number of candles per request
        }
        response=requests.get('https://data-api.binance.vision/api/v3/klines',params=params,timeout=30)
        response.raise_for_status()  # fail loudly on HTTP errors such as rate limiting
        df=pd.DataFrame(response.json())
        if df.empty:
            return None

        # Keep open time, open, high, low, close and volume; the other kline fields are dropped.
        df=df.iloc[:, 0:6].copy()
        df.columns=['Timestamp', 'Open', 'High', 'Low', 'Close', 'Volume']
        # Open times are UTC milliseconds; formatting them in UTC avoids duplicate
        # labels when a local clock falls back for daylight saving time.
        df['Date']=pd.to_datetime(df['Timestamp'],unit='ms').dt.strftime('%Y/%m/%d %H:%M:%S')
        value_cols=['Open', 'High', 'Low', 'Close', 'Volume']
        df[value_cols]=df[value_cols].astype(float)  # the API sends these as strings
        df['Timestamp']=df['Timestamp']/1000  # milliseconds -> seconds
        df['Timeframe']=self.interval
        return df[['Date','Timestamp','Timeframe','Open','High','Low','Close','Volume']]

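    def get_binance_kline_price(self,pair,start_time,end_time):
        df_list=[]
        last_datetime=start_time

        # Each request returns at most 1000 candles, so keep paging forward
        # until a request comes back empty or the cursor passes end_time.
        while last_datetime<=end_time:
            new_df=self.get_history_bars(pair,last_datetime,end_time)
            if new_df is None:
                break
            df_list.append(new_df)
            # Resume one second after the newest candle in this chunk.
            last_datetime=dt.datetime.fromtimestamp(new_df['Timestamp'].max())+dt.timedelta(seconds=1)
        # A range with no candles (e.g. before the pair was listed) adds nothing.
        if df_list:
            self.container.append(pd.concat(df_list))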

    def get_symbol_data(self,pair):
        th_list=[]
        # One thread per date range; each appends its DataFrame to self.container.
        for start,end in self.date_processing():
            m=threading.Thread(target=self.get_binance_kline_price,args=(pair,start,end))
            m.start()
            th_list.append(m)
        for th in th_list:
            th.join()
        data=pd.concat(self.container)
        # Adjacent ranges share a boundary candle, so deduplicate on the open time.
        data=data.drop_duplicates(subset=['Timestamp']).sort_values('Timestamp')
        self.container=[]  # reset for the next pair
        return data

    def filling(self,df):
        df=df.set_index('Date')
        df.index=pd.to_datetime(df.index)
        # Insert every missing minute and copy in the next available candle's values.
        df=df.asfreq('1min',method='bfill')
        return df
    
    def get_data(self):
        for pair in self.pairs:
            print(pair)
            raw_data=self.get_symbol_data(pair)
            data=self.filling(raw_data)
            # Create data/ and data/<pair>/ under the current directory if needed.
            out_dir=Path.cwd()/'data'/pair
            out_dir.mkdir(parents=True,exist_ok=True)
            data.to_csv(out_dir/f'{pair}_1m.csv')
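<p>The paging loop in 'get_binance_kline_price' can be exercised without network access by replacing the HTTP call with a stub. In this sketch, 'fake_fetch' is a hypothetical stand-in that serves open times (in seconds) from an in-memory list, at most 'limit' per call, the way the Klines endpoint caps each response. 'fetch_all' repeats the class's strategy: restart one second after the newest candle until a request comes back empty or the cursor passes the end of the range.</p>

```python
def fake_fetch(candles, start_s, end_s, limit=1000):
    """Hypothetical stub: return up to `limit` open times within [start_s, end_s]."""
    return [t for t in candles if start_s <= t <= end_s][:limit]

def fetch_all(candles, start_s, end_s, limit=1000):
    """Page forward through [start_s, end_s]; return all open times and the request count."""
    out, requests_made, cursor = [], 0, start_s
    while cursor <= end_s:
        chunk = fake_fetch(candles, cursor, end_s, limit)
        requests_made += 1
        if not chunk:
            break
        out.extend(chunk)
        cursor = max(chunk) + 1  # resume one second after the newest candle
    return out, requests_made

minutes = list(range(0, 2500 * 60, 60))  # 2500 one-minute candles
got, n = fetch_all(minutes, 0, minutes[-1])
print(len(got), n)  # 2500 3
```

<p>When the last candle sits exactly on the range end, the cursor passes 'end_s' and the loop stops after three requests; if the range extends past the last candle, one extra request returns nothing and ends the loop instead.</p>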

In [3]:
start_date='2017/11/15 00:00:00' #FROM
end_date= dt.datetime.strftime(dt.datetime.now(),'%Y/%m/%d %H:%M:%S') # NOW
pairs=['BTCUSDT','ETHUSDT','BNBUSDT'] # PAIR LIST
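<p>These settings ask for every 1-minute candle since 2017/11/15, and each request returns at most 1000 candles, so each pair needs many requests. A rough lower bound can be computed as below. 'min_requests' is a hypothetical helper; it ignores the extra empty request a thread may make when its range is exhausted and any gaps where the pair did not trade.</p>

```python
from datetime import datetime
import math

def min_requests(start, end, minutes_per_candle=1, limit=1000):
    """Hypothetical helper: lower bound on Klines requests for one pair over [start, end)."""
    fmt = '%Y/%m/%d %H:%M:%S'  # same format as start_date / end_date above
    span = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    candles = int(span.total_seconds() // 60) // minutes_per_candle
    return math.ceil(candles / limit)

# One year of 1-minute candles: 365 * 1440 = 525600 candles.
print(min_requests('2017/11/15 00:00:00', '2018/11/15 00:00:00'))  # 526
```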

In [4]:
crypto=crypto_data(pairs=pairs,start_date=start_date,end_date=end_date)
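<p>This instance keeps the default of 20 threads per pair. As an alternative design, not the one used in 'crypto_data', the same fan-out can be written with concurrent.futures.ThreadPoolExecutor, where each worker returns its DataFrame instead of appending to a shared list. In this sketch, 'fetch_range' is a hypothetical stand-in for 'get_binance_kline_price' that builds a tiny frame locally rather than calling Binance.</p>

```python
from concurrent.futures import ThreadPoolExecutor
import pandas as pd

def fetch_range(pair, start, end):
    """Hypothetical stub: one row per timestamp in [start, end], inclusive."""
    return pd.DataFrame({"Pair": pair, "Timestamp": list(range(start, end + 1))})

def fetch_concurrently(pair, ranges, max_workers=20):
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in the order of `ranges`, not in completion order.
        frames = list(pool.map(lambda r: fetch_range(pair, *r), ranges))
    data = pd.concat(frames, ignore_index=True)
    # Adjacent ranges share their boundary, so drop repeated timestamps.
    data = data.drop_duplicates(subset=["Timestamp"]).sort_values("Timestamp")
    return data.reset_index(drop=True)

result = fetch_concurrently("BTCUSDT", [(0, 10), (10, 20), (20, 30)])
print(len(result))  # 31
```

<p>Returning results from workers avoids shared mutable state, so no container has to be reset between pairs.</p>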

In [5]:
crypto.get_data()

BTCUSDT
ETHUSDT
BNBUSDT
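<p>The 'filling' step relies on pandas' asfreq with method='bfill': every minute missing from the index is inserted and takes the values of the next available candle. A small illustration on made-up prices with one missing minute (00:02):</p>

```python
import pandas as pd

# Made-up closes; the 00:02 candle is missing.
df = pd.DataFrame(
    {"Date": ["2017/11/15 00:00:00", "2017/11/15 00:01:00", "2017/11/15 00:03:00"],
     "Close": [100.0, 101.0, 103.0]}
).set_index("Date")
df.index = pd.to_datetime(df.index)

filled = df.asfreq("1min", method="bfill")
print(filled)
# The inserted 00:02 row copies the 00:03 close (103.0).
```

<p>Backfilled rows repeat the next candle's values rather than recording real trades, so forward filling or leaving NaN may suit some analyses better.</p>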
