# Fetch Bitcoin Data

We need more than 10,000 records.

- Get the *hourly* changes for the last 3-5 years?
- Training on more recent data may work better: get the *minute-by-minute* changes for the last 1-3 years?
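
To compare the two options, a quick back-of-the-envelope calculation helps. The sketch below assumes a cap of 300 candles per request (the figure the download scripts later in this notebook work with) and a market that trades around the clock with no gaps; the function name `estimate` is only illustrative.

```python
import math

CANDLES_PER_REQUEST = 300  # assumption: per-request cap used later in this notebook


def estimate(years, candles_per_day):
    """Return (expected rows, API requests needed) for `years` of history."""
    rows = int(years * 365 * candles_per_day)
    requests_needed = math.ceil(rows / CANDLES_PER_REQUEST)
    return rows, requests_needed


for label, years, per_day in [
    ("hourly, 3 years", 3, 24),
    ("hourly, 5 years", 5, 24),
    ("minute-level, 1 year", 1, 24 * 60),
    ("minute-level, 3 years", 3, 24 * 60),
]:
    rows, reqs = estimate(years, per_day)
    print(f"{label}: {rows:,} rows, {reqs:,} requests")
```

Under these assumptions every option clears the 10,000-record target, whereas a single year of hourly data (8,760 rows) would not.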

Coinbase recently deprecated the Coinbase Pro API, transitioning to their newer **Coinbase Advanced Trade API**. This updated API still provides similar functionality, including historical market data, but the endpoints and URL structure have changed.

Here's an updated example using the **Coinbase Advanced Trade API** for fetching historical price data. Note that the Advanced Trade API requires authentication (an API key and secret), unlike the public endpoints in the deprecated Pro API.

### Step 1: Generate an API Key
1. Log in to your [Coinbase account](https://www.coinbase.com/).
2. Go to **Settings** -> **API** to generate a new API key with the necessary permissions (view permissions should suffice for historical data).
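
Once the key exists, it is usually safer to keep it out of the notebook itself. The following is a minimal sketch that reads the credentials from environment variables; the variable names `COINBASE_API_KEY` and `COINBASE_API_SECRET` and the helper `load_credentials` are hypothetical, not something Coinbase prescribes.

```python
import os


def load_credentials(env=os.environ):
    """Read the API key and secret from environment variables.

    The variable names are hypothetical; use whatever names you export.
    """
    key = env.get("COINBASE_API_KEY")
    secret = env.get("COINBASE_API_SECRET")
    if not key or not secret:
        raise RuntimeError("Set COINBASE_API_KEY and COINBASE_API_SECRET first")
    return key, secret
```

With this in place, `API_KEY, API_SECRET = load_credentials()` could replace hard-coded placeholder strings.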

### Step 2: Fetch Data Using the Advanced Trade API

The new Advanced Trade API endpoint for market data is structured differently. Here's how you can fetch historical data with Python:

```python
import requests
import pandas as pd
from datetime import datetime, timedelta
import time
import hmac
import hashlib
import base64

# Replace these with your API key and secret
API_KEY = "YOUR_API_KEY"
API_SECRET = "YOUR_API_SECRET"

def generate_coinbase_headers(request_path, method="GET", body=""):
    """
    Generates the headers required for authenticated Coinbase API requests.
    """
    timestamp = str(int(time.time()))
    message = timestamp + method + request_path + body
    hmac_key = base64.b64decode(API_SECRET)
    signature = hmac.new(hmac_key, message.encode(), hashlib.sha256).digest()
    signature_b64 = base64.b64encode(signature).decode()
    
    return {
        "CB-ACCESS-KEY": API_KEY,
        "CB-ACCESS-SIGN": signature_b64,
        "CB-ACCESS-TIMESTAMP": timestamp,
        "Content-Type": "application/json"
    }

def fetch_coinbase_data(start=None, end=None, granularity="ONE_HOUR"):
    """
    Fetches historical Bitcoin data using Coinbase's Advanced Trade API.
    `start` and `end` are UNIX timestamps in seconds, passed as strings.
    Note: Coinbase's new Advanced API might not support full hourly historical data.
    """
    request_path = "/api/v3/brokerage/products/BTC-USD/candles"
    params = {
        "start": start,
        "end": end,
        "granularity": granularity  # e.g. "ONE_MINUTE", "ONE_HOUR", "ONE_DAY"
    }
    headers = generate_coinbase_headers(request_path)
    response = requests.get("https://api.coinbase.com" + request_path, headers=headers, params=params)
    
    if response.status_code == 200:
        # Each candle is an object with the keys: start, low, high, open, close, volume
        df = pd.DataFrame(response.json()["candles"],
                          columns=["start", "low", "high", "open", "close", "volume"])
        df = df.astype(float)  # values may arrive as strings
        df = df.rename(columns={"start": "time"})
        df["time"] = pd.to_datetime(df["time"], unit='s')
        return df.sort_values("time")
    else:
        print("Error:", response.status_code, response.text)
        return None

# Example usage: the last 24 hours, expressed as UNIX timestamps
end_ts = int(time.time())
start_ts = end_ts - int(timedelta(days=1).total_seconds())

# Fetch data
df = fetch_coinbase_data(start=str(start_ts), end=str(end_ts))
print(df)
```

### Notes
- **Headers**: Coinbase's Advanced Trade API requires specific headers (`CB-ACCESS-KEY`, `CB-ACCESS-SIGN`, and `CB-ACCESS-TIMESTAMP`) for authentication.
- **Granularity**: The new API uses text-based granularity values (e.g., `"ONE_HOUR"`) instead of a number of seconds.
- **Limits**: This new API has limits on the amount of data you can request in a single call, so you may need to paginate your requests.
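
The pagination mentioned in the last note can be sketched as a generator that splits a date range into windows small enough for one request each. Everything here is an assumption to check against the current API documentation: the 300-candle cap, the granularity names in `GRANULARITY_SECONDS`, and the use of UNIX timestamps. `iter_windows` is a hypothetical helper, not part of any Coinbase library.

```python
from datetime import datetime, timedelta, timezone

# Assumed mapping from granularity names to candle length in seconds.
GRANULARITY_SECONDS = {
    "ONE_MINUTE": 60,
    "FIVE_MINUTE": 300,
    "FIFTEEN_MINUTE": 900,
    "ONE_HOUR": 3600,
    "ONE_DAY": 86400,
}


def iter_windows(start, end, granularity="ONE_HOUR", max_candles=300):
    """Yield (window_start, window_end) pairs that together cover [start, end)."""
    step = timedelta(seconds=GRANULARITY_SECONDS[granularity] * max_candles)
    current = start
    while current < end:
        window_end = min(current + step, end)
        yield current, window_end
        current = window_end


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
end = datetime(2024, 2, 1, tzinfo=timezone.utc)
for window_start, window_end in iter_windows(start, end):
    # Assumption: the endpoint takes UNIX timestamps in seconds.
    print(int(window_start.timestamp()), int(window_end.timestamp()))
```

Each pair would then be passed as `start` and `end` to one request, and the resulting frames concatenated.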

Please ensure that you replace `YOUR_API_KEY` and `YOUR_API_SECRET` with your actual Coinbase API credentials.

In [1]: *(same code as the listing above)*

Error: 503 {"message":"Coinbase Pro API is deprecated."}
None


In [3]:
import requests
import pandas as pd
from datetime import datetime, timedelta
import time

def get_historical_data(start_date, end_date, granularity=3600):
    """
    Downloads historical data from the Coinbase API
    
    Parameters:
    start_date (str): Start date in 'YYYY-MM-DD' format
    end_date (str): End date in 'YYYY-MM-DD' format
    granularity (int): Interval in seconds (3600 = 1 hour)
    
    Returns:
    pandas.DataFrame: DataFrame with the historical data
    """
    
    # Convert dates to datetime objects
    start = datetime.strptime(start_date, '%Y-%m-%d')
    end = datetime.strptime(end_date, '%Y-%m-%d')
    
    # Base URL for API requests
    base_url = "https://api.exchange.coinbase.com"
    
    # Empty list to store data
    all_data = []
    
    # Iterate over periods of 300 candles each (API limitation)
    current_start = start
    while current_start < end:
        # Compute the end of the current period (300 candles of `granularity` seconds)
        current_end = min(current_start + timedelta(seconds=300 * granularity), end)
        
        # Create URL for the request
        endpoint = "/products/BTC-USD/candles"
        params = {
            'start': current_start.isoformat(),
            'end': current_end.isoformat(),
            'granularity': granularity
        }
        
        # Send the request
        response = requests.get(f"{base_url}{endpoint}", params=params)
        
        if response.status_code == 200:
            # Add the data to the list
            data = response.json()
            all_data.extend(data)
            print(f"Downloaded data from {current_start} to {current_end}")
        else:
            print(f"Request error: {response.status_code}")
        
        # Wait to avoid exceeding the API rate limit
        time.sleep(0.5)
        
        # Move to next period
        current_start = current_end

    # Create DataFrame
    # The API returns each candle as [time, low, high, open, close, volume]
    df = pd.DataFrame(all_data, columns=['timestamp', 'low', 'high', 'open', 'close', 'volume'])
    df = df[['timestamp', 'open', 'high', 'low', 'close', 'volume']]
    
    # Convert timestamp to datetime
    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
    
    # Sort by time and drop candles repeated where adjacent windows overlap
    df = df.sort_values('timestamp').drop_duplicates(subset='timestamp')
    
    return df

# Example usage:
start_date = '2024-01-01'
end_date = '2024-02-01'

# Download the data
btc_data = get_historical_data(start_date, end_date)

# Save to a CSV file
btc_data.to_csv('bitcoin_historical_data.csv', index=False)

Downloaded data from 2024-01-01 00:00:00 to 2024-01-13 12:00:00
Downloaded data from 2024-01-13 12:00:00 to 2024-01-26 00:00:00
Downloaded data from 2024-01-26 00:00:00 to 2024-02-01 00:00:00
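
Because a failed request in the loop above only prints an error and moves on, the saved file can contain gaps. A minimal sketch of a completeness check follows; `find_missing_timestamps` is a hypothetical helper, and it assumes the `timestamp` column holds evenly spaced candle start times.

```python
import pandas as pd


def find_missing_timestamps(df, step=pd.Timedelta(hours=1), column="timestamp"):
    """Return the expected timestamps that are absent from `df[column]`."""
    ts = pd.to_datetime(df[column])
    expected = pd.date_range(ts.min(), ts.max(), freq=step)
    return expected.difference(pd.DatetimeIndex(ts))


# Small synthetic example: the 02:00 candle is missing.
sample = pd.DataFrame({"timestamp": pd.to_datetime(
    ["2024-01-01 00:00", "2024-01-01 01:00", "2024-01-01 03:00"])})
missing = find_missing_timestamps(sample)
print(f"{len(missing)} missing candle(s): {list(missing)}")

# For the real file, something like:
# df = pd.read_csv('bitcoin_historical_data.csv', parse_dates=['timestamp'])
# print(find_missing_timestamps(df))
```

Missing candles are not always a download failure; an exchange may simply have no trades in an interval, so the result is best treated as a list of timestamps to inspect.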


In [6]:
import requests
import pandas as pd
from datetime import datetime, timedelta
import time

def get_historical_data_minutes(start_date, end_date, granularity=60):
    """
    Downloads historical data from the Coinbase API at minute intervals
    
    Parameters:
    start_date (str): Start date in 'YYYY-MM-DD' format
    end_date (str): End date in 'YYYY-MM-DD' format
    granularity (int): Interval in seconds (60 = 1 minute)
    
    Returns:
    pandas.DataFrame: DataFrame with the historical data
    """
    
    # Convert dates to datetime objects
    start = datetime.strptime(start_date, '%Y-%m-%d')
    end = datetime.strptime(end_date, '%Y-%m-%d')
    
    # Base URL for API requests
    base_url = "https://api.exchange.coinbase.com"
    
    # Empty list to store data
    all_data = []
    
    # For minute data, we'll get data in 300-minute intervals
    # to comply with API limitations
    current_start = start
    
    while current_start < end:
        # For minute data, we take smaller time windows
        current_end = min(current_start + timedelta(minutes=300), end)
        
        # Create URL for the request
        endpoint = f"/products/BTC-USD/candles"
        params = {
            'start': current_start.isoformat(),
            'end': current_end.isoformat(),
            'granularity': granularity
        }
        
        # Send request with retries on error
        max_retries = 3
        retry_count = 0
        while retry_count < max_retries:
            try:
                response = requests.get(f"{base_url}{endpoint}", params=params)
                if response.status_code == 200:
                    data = response.json()
                    all_data.extend(data)
                    print(f"Downloaded data from {current_start} to {current_end}")
                    break
                elif response.status_code == 429:  # Too Many Requests
                    print("API rate limit reached. Waiting 30 seconds...")
                    time.sleep(30)
                else:
                    print(f"Request error: {response.status_code}")
                    time.sleep(5)
            except Exception as e:
                print(f"Error: {e}")
                time.sleep(5)
            retry_count += 1
        
        # Wait between requests to avoid hitting rate limits
        time.sleep(1)
        
        # Move to next period
        current_start = current_end

    if not all_data:
        raise Exception("No data received from API!")

    # Create DataFrame
    # The API returns each candle as [time, low, high, open, close, volume]
    df = pd.DataFrame(all_data, columns=['timestamp', 'low', 'high', 'open', 'close', 'volume'])
    df = df[['timestamp', 'open', 'high', 'low', 'close', 'volume']]
    
    # Convert timestamp to datetime
    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
    
    # Sort by time and drop candles repeated where adjacent windows overlap
    df = df.sort_values('timestamp').drop_duplicates(subset='timestamp')
    
    # Add additional time columns for easier analysis
    df['date'] = df['timestamp'].dt.date
    df['time'] = df['timestamp'].dt.time
    
    return df

def save_data_with_backup(df, filename='bitcoin_historical_data_sec.csv'):
    """
    Saves data with a backup file
    """
    # Save main file
    df.to_csv(filename, index=False)
    
    # Create backup file with timestamp
    backup_filename = f"backup_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{filename}"
    df.to_csv(backup_filename, index=False)
    
    print(f"Data saved to {filename}")
    print(f"Backup created at {backup_filename}")

# Example usage:
if __name__ == "__main__":
    start_date = '2024-01-01'
    end_date = '2024-02-01'  # For minute data, it's recommended to download smaller periods
    
    try:
        # Download the data
        btc_data = get_historical_data_minutes(start_date, end_date)
        
        # Display information about the data
        print("\nDownloaded data information:")
        print(f"Number of rows: {len(btc_data)}")
        print(f"Time period: from {btc_data['timestamp'].min()} to {btc_data['timestamp'].max()}")
        
        # Save the data
        save_data_with_backup(btc_data)
        
    except Exception as e:
        print(f"An error occurred: {e}")

Downloaded data from 2024-01-01 00:00:00 to 2024-01-01 05:00:00
Downloaded data from 2024-01-01 05:00:00 to 2024-01-01 10:00:00
Downloaded data from 2024-01-01 10:00:00 to 2024-01-01 15:00:00
Downloaded data from 2024-01-01 15:00:00 to 2024-01-01 20:00:00
Downloaded data from 2024-01-01 20:00:00 to 2024-01-02 01:00:00
Downloaded data from 2024-01-02 01:00:00 to 2024-01-02 06:00:00
Downloaded data from 2024-01-02 06:00:00 to 2024-01-02 11:00:00
Downloaded data from 2024-01-02 11:00:00 to 2024-01-02 16:00:00
Downloaded data from 2024-01-02 16:00:00 to 2024-01-02 21:00:00
Downloaded data from 2024-01-02 21:00:00 to 2024-01-03 02:00:00
Downloaded data from 2024-01-03 02:00:00 to 2024-01-03 07:00:00
Downloaded data from 2024-01-03 07:00:00 to 2024-01-03 12:00:00
Downloaded data from 2024-01-03 12:00:00 to 2024-01-03 17:00:00
Downloaded data from 2024-01-03 17:00:00 to 2024-01-03 22:00:00
Downloaded data from 2024-01-03 22:00:00 to 2024-01-04 03:00:00
Downloaded data from 2024-01-04 03:00:00 to 2024-01-04 08:00:00
Downloaded

## Download historical Bitcoin price data from the Coinbase API, with minute-by-minute granularity

Create a script to download historical Bitcoin price data from the Coinbase API, with minute-by-minute granularity.

```python
import requests
import pandas as pd
from datetime import datetime, timedelta
import time

def get_historical_data_minutes(start_date, end_date, granularity=60):
    """
    Downloads historical data from Coinbase API at minute intervals
    
    Parameters:
    start_date (str): Start date in 'YYYY-MM-DD' format
    end_date (str): End date in 'YYYY-MM-DD' format
    granularity (int): Interval in seconds (60 = 1 minute)
    
    Returns:
    pandas.DataFrame: DataFrame with historical data
    """
    
    # Convert dates to datetime objects
    start = datetime.strptime(start_date, '%Y-%m-%d')
    end = datetime.strptime(end_date, '%Y-%m-%d')
    
    # Base URL for API requests
    base_url = "https://api.exchange.coinbase.com"
    
    # Empty list to store data
    all_data = []
    
    # For minute data, we'll get data in 300-minute intervals
    # to comply with API limitations
    current_start = start
    
    while current_start < end:
        # For minute data, we take smaller time windows
        current_end = min(current_start + timedelta(minutes=300), end)
        
        # Create URL for the request
        endpoint = f"/products/BTC-USD/candles"
        params = {
            'start': current_start.isoformat(),
            'end': current_end.isoformat(),
            'granularity': granularity
        }
        
        # Send request with retries on error
        max_retries = 3
        retry_count = 0
        while retry_count < max_retries:
            try:
                response = requests.get(f"{base_url}{endpoint}", params=params)
                if response.status_code == 200:
                    data = response.json()
                    all_data.extend(data)
                    print(f"Downloaded data from {current_start} to {current_end}")
                    break
                elif response.status_code == 429:  # Too Many Requests
                    print("API rate limit reached. Waiting 30 seconds...")
                    time.sleep(30)
                else:
                    print(f"Request error: {response.status_code}")
                    time.sleep(5)
            except Exception as e:
                print(f"Error: {e}")
                time.sleep(5)
            retry_count += 1
        
        # Wait between requests to avoid hitting rate limits
        time.sleep(1)
        
        # Move to next period
        current_start = current_end

    if not all_data:
        raise Exception("No data received from API!")

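    # Create DataFrame
    # The API returns each candle as [time, low, high, open, close, volume]
    df = pd.DataFrame(all_data, columns=['timestamp', 'low', 'high', 'open', 'close', 'volume'])
    df = df[['timestamp', 'open', 'high', 'low', 'close', 'volume']]
    
    # Convert timestamp to datetime
    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
    
    # Sort by time and drop candles repeated where adjacent windows overlap
    df = df.sort_values('timestamp').drop_duplicates(subset='timestamp')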
    
    # Add additional time columns for easier analysis
    df['date'] = df['timestamp'].dt.date
    df['time'] = df['timestamp'].dt.time
    
    return df

def save_data_with_backup(df, filename='bitcoin_historical_data.csv'):
    """
    Saves data with a backup file
    """
    # Save main file
    df.to_csv(filename, index=False)
    
    # Create backup file with timestamp
    backup_filename = f"backup_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{filename}"
    df.to_csv(backup_filename, index=False)
    
    print(f"Data saved to {filename}")
    print(f"Backup created at {backup_filename}")

# Example usage:
if __name__ == "__main__":
    start_date = '2024-01-01'
    end_date = '2024-01-02'  # For minute data, it's recommended to download smaller periods
    
    try:
        # Download the data
        btc_data = get_historical_data_minutes(start_date, end_date)
        
        # Display information about the data
        print("\nDownloaded data information:")
        print(f"Number of rows: {len(btc_data)}")
        print(f"Time period: from {btc_data['timestamp'].min()} to {btc_data['timestamp'].max()}")
        
        # Save the data
        save_data_with_backup(btc_data)
        
    except Exception as e:
        print(f"An error occurred: {e}")

```

To use the code:

1. First, install the required libraries:
```bash
pip install requests pandas
```

2. Save the code to a file (e.g., `coinbase_data.py`) and run it with `python coinbase_data.py`.

Key features of the code:
1. Downloads minute-by-minute Bitcoin price data
2. Automatically handles API rate limits
3. Includes error handling and retries
4. Creates backup files automatically
5. Adds additional time columns for easier analysis
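
The script above retries with fixed pauses (5 seconds after an error, 30 seconds after an HTTP 429). A common alternative, which the script does not use, is exponential backoff, where the wait doubles after each failed attempt. A minimal sketch, with `call_with_backoff` as a hypothetical helper name:

```python
import time


def call_with_backoff(func, max_retries=3, base_delay=1.0, sleep=time.sleep):
    """Call `func()` until it succeeds or `max_retries` attempts have failed.

    Waits base_delay, 2 * base_delay, 4 * base_delay, ... between attempts.
    """
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as exc:
            if attempt == max_retries - 1:
                raise  # out of attempts: surface the last error
            delay = base_delay * 2 ** attempt
            print(f"Attempt {attempt + 1} failed ({exc}); retrying in {delay:.0f}s")
            sleep(delay)
```

To use this with `requests`, the wrapped function would need to call `response.raise_for_status()`, since `requests.get` does not raise on HTTP error codes such as 429 by itself.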

Important notes:
1. It's recommended to download data in smaller time periods (e.g., 1-2 days at a time) due to the large volume of minute data
2. The code pauses for 1 second between requests to respect API rate limits
3. A timestamped backup file is created for each download
4. Each request is attempted up to 3 times; if all attempts fail, that time window is skipped
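
One way to follow the first note is to split a long date range into short chunks and download each chunk separately. The sketch below only produces the date pairs; `daily_ranges` is a hypothetical helper, and the per-chunk file name in the comment is just an example.

```python
from datetime import datetime, timedelta


def daily_ranges(start_date, end_date, days_per_chunk=1):
    """Split [start_date, end_date) into pairs of 'YYYY-MM-DD' strings."""
    start = datetime.strptime(start_date, "%Y-%m-%d")
    end = datetime.strptime(end_date, "%Y-%m-%d")
    step = timedelta(days=days_per_chunk)
    ranges = []
    while start < end:
        chunk_end = min(start + step, end)
        ranges.append((start.strftime("%Y-%m-%d"), chunk_end.strftime("%Y-%m-%d")))
        start = chunk_end
    return ranges


for chunk_start, chunk_end in daily_ranges("2024-01-01", "2024-01-04"):
    # Each pair could be passed to get_historical_data_minutes(chunk_start, chunk_end)
    # and saved to its own file, e.g. f"btc_{chunk_start}.csv" (hypothetical name).
    print(chunk_start, "->", chunk_end)
```

Saving each chunk as soon as it is downloaded means a failure late in a long run does not discard the data already fetched.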

The resulting CSV file will contain the following columns:
- timestamp: The exact time of the data point
- open: Opening price
- high: Highest price during the minute
- low: Lowest price during the minute
- close: Closing price
- volume: Trading volume
- date: Date part of the timestamp
- time: Time part of the timestamp
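
With these columns in place, minute candles can also be aggregated into hourly ones, so a single download can serve both of the granularities considered at the top of this notebook. This is a minimal sketch using pandas resampling on synthetic data; `to_hourly` is a hypothetical helper and expects `timestamp` to already be a datetime column (for a CSV, load it with `parse_dates=['timestamp']`).

```python
import pandas as pd


def to_hourly(minute_df):
    """Aggregate minute candles into hourly OHLCV candles."""
    indexed = minute_df.set_index("timestamp").sort_index()
    hourly = indexed.resample(pd.Timedelta(hours=1)).agg(
        {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
    )
    # Hours without any minute candles have no open price; drop them.
    return hourly.dropna(subset=["open"]).reset_index()


# Synthetic two-hour sample so the example runs without a download.
minutes = pd.date_range("2024-01-01", periods=120, freq=pd.Timedelta(minutes=1))
sample = pd.DataFrame({
    "timestamp": minutes,
    "open": [float(i) for i in range(120)],
    "high": [i + 1.0 for i in range(120)],
    "low": [i - 1.0 for i in range(120)],
    "close": [i + 0.5 for i in range(120)],
    "volume": [1.0] * 120,
})
hourly = to_hourly(sample)
print(hourly)
```

The aggregation rules follow the usual candle definitions: the first open, the highest high, the lowest low, the last close, and the summed volume within each hour.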


In [3]:
import requests
import pandas as pd
from datetime import datetime, timedelta
import time

def get_historical_data_minutes(start_date, end_date, granularity=60):
    """
    Downloads historical data from Coinbase API at minute intervals
    
    Parameters:
    start_date (str): Start date in 'YYYY-MM-DD' format
    end_date (str): End date in 'YYYY-MM-DD' format
    granularity (int): Interval in seconds (60 = 1 minute)
    
    Returns:
    pandas.DataFrame: DataFrame with historical data
    """
    
    # Convert dates to datetime objects
    start = datetime.strptime(start_date, '%Y-%m-%d')
    end = datetime.strptime(end_date, '%Y-%m-%d')
    
    # Base URL for API requests
    base_url = "https://api.exchange.coinbase.com"
    
    # Empty list to store data
    all_data = []
    
    # For minute data, we'll get data in 300-minute intervals
    # to comply with API limitations
    current_start = start
    
    while current_start < end:
        # For minute data, we take smaller time windows
        current_end = min(current_start + timedelta(minutes=300), end)
        
        # Create URL for the request
        endpoint = f"/products/BTC-USD/candles"
        params = {
            'start': current_start.isoformat(),
            'end': current_end.isoformat(),
            'granularity': granularity
        }
        
        # Send request with retries on error
        max_retries = 3
        retry_count = 0
        while retry_count < max_retries:
            try:
                response = requests.get(f"{base_url}{endpoint}", params=params)
                if response.status_code == 200:
                    data = response.json()
                    all_data.extend(data)
                    print(f"Downloaded data from {current_start} to {current_end}")
                    break
                elif response.status_code == 429:  # Too Many Requests
                    print("API rate limit reached. Waiting 30 seconds...")
                    time.sleep(30)
                else:
                    print(f"Request error: {response.status_code}")
                    time.sleep(5)
            except Exception as e:
                print(f"Error: {e}")
                time.sleep(5)
            retry_count += 1
        
        # Wait between requests to avoid hitting rate limits
        time.sleep(1)
        
        # Move to next period
        current_start = current_end

    if not all_data:
        raise Exception("No data received from API!")

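    # Create DataFrame
    # The API returns each candle as [time, low, high, open, close, volume]
    df = pd.DataFrame(all_data, columns=['timestamp', 'low', 'high', 'open', 'close', 'volume'])
    df = df[['timestamp', 'open', 'high', 'low', 'close', 'volume']]
    
    # Convert timestamp to datetime
    df['timestamp'] = pd.to_datetime(df['timestamp'], unit='s')
    
    # Sort by time and drop candles repeated where adjacent windows overlap
    df = df.sort_values('timestamp').drop_duplicates(subset='timestamp')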
    
    # Add additional time columns for easier analysis
    df['date'] = df['timestamp'].dt.date
    df['time'] = df['timestamp'].dt.time
    
    return df

def save_data_with_backup(df, filename='bitcoin_historical_data.csv'):
    """
    Saves data with a backup file
    """
    # Save main file
    df.to_csv(filename, index=False)
    
    # Create backup file with timestamp
    backup_filename = f"backup_{datetime.now().strftime('%Y%m%d_%H%M%S')}_{filename}"
    df.to_csv(backup_filename, index=False)
    
    print(f"Data saved to {filename}")
    print(f"Backup created at {backup_filename}")

# Example usage:
if __name__ == "__main__":
    # Start date in 'YYYY-MM-DD' format
    start_date = '2023-11-01'
    # End date in 'YYYY-MM-DD' format
    end_date = '2024-11-01'  # For minute data, it's recommended to download smaller periods
    
    try:
        # Download the data
        btc_data = get_historical_data_minutes(start_date, end_date)
        
        # Display information about the data
        print("\nDownloaded data information:")
        print(f"Number of rows: {len(btc_data)}")
        print(f"Time period: from {btc_data['timestamp'].min()} to {btc_data['timestamp'].max()}")

        filename = 'bitcoin_historical_data_1_year.csv'
        # Save the data
        save_data_with_backup(btc_data, filename)
        
    except Exception as e:
        print(f"An error occurred: {e}")

Downloaded data from 2023-11-01 00:00:00 to 2023-11-01 05:00:00
Downloaded data from 2023-11-01 05:00:00 to 2023-11-01 10:00:00
Downloaded data from 2023-11-01 10:00:00 to 2023-11-01 15:00:00
Downloaded data from 2023-11-01 15:00:00 to 2023-11-01 20:00:00
Downloaded data from 2023-11-01 20:00:00 to 2023-11-02 01:00:00
Downloaded data from 2023-11-02 01:00:00 to 2023-11-02 06:00:00
Downloaded data from 2023-11-02 06:00:00 to 2023-11-02 11:00:00
Downloaded data from 2023-11-02 11:00:00 to 2023-11-02 16:00:00
Downloaded data from 2023-11-02 16:00:00 to 2023-11-02 21:00:00
Downloaded data from 2023-11-02 21:00:00 to 2023-11-03 02:00:00
Downloaded data from 2023-11-03 02:00:00 to 2023-11-03 07:00:00
Downloaded data from 2023-11-03 07:00:00 to 2023-11-03 12:00:00
Downloaded data from 2023-11-03 12:00:00 to 2023-11-03 17:00:00
Downloaded data from 2023-11-03 17:00:00 to 2023-11-03 22:00:00
Downloaded data from 2023-11-03 22:00:00 to 2023-11-04 03:00:00
Downloaded data from 2023-11-04 03:00:00