# Real-Time Bitcoin Data Processing using cuDF

This notebook demonstrates end-to-end Bitcoin data processing using NVIDIA RAPIDS cuDF for GPU-accelerated analysis.

## Overview

In this notebook, we will:
1. Fetch historical Bitcoin price data
2. Process the data using GPU-accelerated operations
3. Calculate technical indicators for Bitcoin analysis
4. Visualize price trends and indicators
5. Demonstrate real-time data processing
6. Save the processed data and build a simple price forecast

In [None]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

## Import Dependencies

In [None]:
# Standard library imports.
import os
import sys
import time
from datetime import datetime, timedelta

# Third-party library imports.
import pandas as pd
import numpy as np
import plotly.graph_objects as go
from dotenv import load_dotenv

# Add the parent directory to sys.path.
sys.path.append('..')

# Import utility functions from our project.
from utils.cudf_utils import (
    fetch_bitcoin_price, fetch_historical_data, add_to_dataframe,
    compute_moving_averages, compute_volatility, compute_rate_of_change,
    compute_rsi, plot_bitcoin_data, save_to_csv, load_from_csv
)

# Import cuDF for GPU-accelerated data processing.
import cudf

# Load environment variables (for API keys).
load_dotenv()

## Check GPU Availability

First, let's verify that we have a CUDA-capable GPU available for acceleration.
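If the notebook also has to run on machines without a GPU, one common pattern is to choose the DataFrame library once, at import time, and fall back to pandas when cuDF cannot be imported. The sketch below is an assumption rather than part of the project: `select_dataframe_backend` and `xdf` are hypothetical names, and only the simple DataFrame operations shown are assumed to behave the same in both libraries.

```python
import importlib
import types


def select_dataframe_backend() -> types.ModuleType:
    """Return the cudf module if it imports cleanly, otherwise pandas (hypothetical helper)."""
    try:
        # Importing cuDF fails on machines without RAPIDS or a usable CUDA driver.
        return importlib.import_module("cudf")
    except Exception:
        # Fall back to the CPU library, which shares the DataFrame API used here.
        return importlib.import_module("pandas")


xdf = select_dataframe_backend()
print(f"[INFO] DataFrame backend: {xdf.__name__}")

# The same constructor and reduction work with either backend for this simple case.
frame = xdf.DataFrame({"price": [1.0, 2.0, 3.0]})
print(frame["price"].mean())
```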

In [None]:
def check_cuda():
    """Check if CUDA is available and print GPU information"""
    try:
        import cupy as cp
        
        # Query device 0 first so the success message only appears if a GPU actually responds.
        gpu_info = cp.cuda.runtime.getDeviceProperties(0)
        print("[SUCCESS] CUDA is available! Using GPU acceleration with cuDF.")
        print(f"[INFO] GPU Device: {gpu_info['name'].decode()}")
        print(f"[INFO] CUDA Compute Capability: {gpu_info['major']}.{gpu_info['minor']}")
        print(f"[INFO] Total Memory: {gpu_info['totalGlobalMem'] / (1024**3):.2f} GB")
        
        return True
    except ImportError:
        print("[WARNING] CuPy not found. Make sure CUDA is properly configured for GPU acceleration.")
        return False
    except Exception as e:
        print(f"[WARNING] Error initializing CUDA: {e}")
        return False

# Check CUDA availability.
has_cuda = check_cuda()

## Historical Data Analysis

Let's start by fetching and analyzing historical Bitcoin price data.
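When the price API is unreachable or rate-limited, a synthetic series with the same `timestamp` and `price` columns that the cells below rely on can stand in for testing. This is a sketch under stated assumptions: the data is a geometric random walk with one row per day, `make_synthetic_prices` is a hypothetical helper (not part of `utils.cudf_utils`), and the numbers it produces are not real market data.

```python
import numpy as np
import pandas as pd


def make_synthetic_prices(days: int = 90, start_price: float = 30_000.0,
                          daily_vol: float = 0.03, seed: int = 0) -> pd.DataFrame:
    """Generate a daily geometric random walk with `timestamp` and `price` columns."""
    rng = np.random.default_rng(seed)
    # Normally distributed log-returns; their cumulative sum is the log-price path.
    log_returns = rng.normal(loc=0.0, scale=daily_vol, size=days)
    prices = start_price * np.exp(np.cumsum(log_returns))
    timestamps = pd.date_range(end=pd.Timestamp("2024-01-01"), periods=days, freq="D")
    return pd.DataFrame({"timestamp": timestamps, "price": prices})


synthetic = make_synthetic_prices(days=90)
print(synthetic.head())

# To run the GPU cells on this stand-in data (requires cuDF):
# historical_data = cudf.DataFrame.from_pandas(synthetic)
```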

In [None]:
# Define the number of days of historical data to analyze.
days = 90
print(f"[FETCHING] Fetching {days} days of historical Bitcoin price data...")

# Start timing.
start_time = time.time()

# Fetch historical data.
historical_data = fetch_historical_data(days=days)

if historical_data is None or len(historical_data) == 0:
    print("[ERROR] Failed to fetch historical data. Please check your internet connection.")
else:
    fetch_time = time.time() - start_time
    print(f"[SUCCESS] Successfully fetched {len(historical_data)} historical data points in {fetch_time:.2f} seconds.")
    print(f"[DATE RANGE] Date range: {historical_data['timestamp'].min()} to {historical_data['timestamp'].max()}")
    print(f"[PRICE RANGE] Price range: ${historical_data['price'].min():.2f} to ${historical_data['price'].max():.2f}")
    
    # Display the first few rows (inside an if block, display() is needed to render output).
    display(historical_data.head())

## Technical Indicator Calculation

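Now, let's calculate several technical indicators (moving averages, volatility, rate of change, and RSI) with the helper functions from `utils.cudf_utils`, which operate on cuDF DataFrames on the GPU.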
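The `compute_*` helpers are defined in `utils/cudf_utils.py`, which is not shown here. As a reference for what such indicators typically compute, the sketch below implements textbook definitions with pandas so it runs without a GPU; cuDF Series also provide `rolling`, `shift`, `diff`, and `clip`, so similar code should port with few changes. The function name, the column names, the returns-based volatility, and the simple-average RSI are all assumptions and may not match the project's helpers.

```python
import numpy as np
import pandas as pd


def add_indicators(df: pd.DataFrame, ma_windows=(7, 20), vol_window=20,
                   roc_periods=(1, 7), rsi_window=14) -> pd.DataFrame:
    """Add common technical indicators to a frame with a `price` column.

    Column names (ma_<w>, volatility, roc_<n>, rsi) are hypothetical and may
    differ from the project's compute_* helpers.
    """
    out = df.copy()
    price = out["price"]

    # Simple moving average over the last w observations.
    for w in ma_windows:
        out[f"ma_{w}"] = price.rolling(w).mean()

    # Volatility: rolling standard deviation of one-period returns.
    returns = price / price.shift(1) - 1
    out["volatility"] = returns.rolling(vol_window).std()

    # Rate of change in percent: (p_t / p_{t-n} - 1) * 100.
    for n in roc_periods:
        out[f"roc_{n}"] = (price / price.shift(n) - 1) * 100

    # RSI from simple rolling averages of gains and losses (Cutler's variant;
    # Wilder's original uses exponential smoothing instead).
    delta = price.diff()
    avg_gain = delta.clip(lower=0).rolling(rsi_window).mean()
    avg_loss = (-delta).clip(lower=0).rolling(rsi_window).mean()
    out["rsi"] = 100 - 100 / (1 + avg_gain / avg_loss)
    return out


# Toy input: a steady uptrend from 100 to 130 in steps of 1.
prices = pd.DataFrame({"price": np.linspace(100.0, 130.0, 31)})
result = add_indicators(prices)
print(result[["price", "ma_7", "roc_1", "rsi"]].tail(3))
```

On a series with no down moves the average loss is zero, so RSI saturates at 100, which is the expected behavior of the indicator.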

In [None]:
# Process historical data.
print("[PROCESSING] Computing technical indicators...")
start_time = time.time()

if 'historical_data' in locals() and historical_data is not None and len(historical_data) > 0:
    # Compute moving averages.
    historical_data = compute_moving_averages(historical_data, windows=[7, 20, 50])
    
    # Compute volatility.
    historical_data = compute_volatility(historical_data, window=20)
    
    # Compute rate of change.
    historical_data = compute_rate_of_change(historical_data, periods=[1, 7, 30])
    
    # Compute RSI.
    historical_data = compute_rsi(historical_data, window=14)
    
    process_time = time.time() - start_time
    print(f"[SUCCESS] Finished computing indicators in {process_time:.2f} seconds.")
    
    # Display the data with indicators (display() is needed inside an if block).
    display(historical_data.tail())

## Data Visualization

Let's visualize the Bitcoin price data along with technical indicators.

In [None]:
# Create interactive visualization.
if 'historical_data' in locals() and historical_data is not None and len(historical_data) > 0:
    print("[VISUALIZATION] Generating visualization of historical data...")
    
    # Plot Bitcoin data with technical indicators.
    fig = plot_bitcoin_data(historical_data, 
                           title=f"{days}-Day Bitcoin Price Analysis with cuDF")
    
    # Display the plot.
    fig.show()

## Real-Time Data Processing

Now, let's demonstrate real-time data collection and processing.
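`simulate_realtime` is imported from the project's utilities and its body is not shown. A polling collector of this kind usually follows the shape sketched below; this is an assumption about the general pattern, not the project's implementation. `collect_points`, `fetch_price`, and `max_attempts` are hypothetical names, and the fake fetcher stands in for a real price API. The result can be moved to the GPU with `cudf.DataFrame.from_pandas`, as the notebook does later when combining data.

```python
import time
from datetime import datetime, timezone
from itertools import count
from typing import Callable, Optional

import pandas as pd


def collect_points(fetch_price: Callable[[], Optional[float]], interval_seconds: float,
                   num_points: int, max_attempts: Optional[int] = None) -> pd.DataFrame:
    """Poll `fetch_price` until `num_points` prices arrive (hypothetical helper).

    Failed requests (None) are skipped; `max_attempts` bounds the loop so an
    unresponsive API cannot hang the notebook.
    """
    max_attempts = max_attempts if max_attempts is not None else 3 * num_points
    rows = []
    for _ in range(max_attempts):
        price = fetch_price()
        if price is not None:
            rows.append({"timestamp": datetime.now(timezone.utc), "price": float(price)})
            if len(rows) == num_points:
                break
        # Wait before the next request to respect the sampling interval.
        time.sleep(interval_seconds)
    return pd.DataFrame(rows, columns=["timestamp", "price"])


# Usage with a fake fetcher; the third call simulates a failed request.
_calls = count()


def fake_fetch() -> Optional[float]:
    i = next(_calls)
    return None if i == 2 else 40_000.0 + i


live = collect_points(fake_fetch, interval_seconds=0.0, num_points=5)
print(live)
```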

In [None]:
# Define parameters for real-time data collection.
interval_seconds = 3  # seconds between data points.
num_points = 10       # number of data points to collect.

print(f"[COLLECTING] Collecting {num_points} real-time Bitcoin price data points at {interval_seconds}-second intervals...")
print(f"[TIME] This will take approximately {num_points * interval_seconds} seconds. Please wait...")

# Collect real-time data.
realtime_data = None
if has_cuda:  # Only proceed if CUDA is available.
    from utils.cudf_utils import simulate_realtime
    realtime_data = simulate_realtime(interval_seconds=interval_seconds, num_points=num_points)
    
    if realtime_data is not None and len(realtime_data) > 0:
        print(f"[SUCCESS] Successfully collected {len(realtime_data)} real-time data points.")
        
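        # Display the real-time data (display() is needed inside an if block).
        display(realtime_data.head(num_points))
    else:
        print("[ERROR] Failed to collect real-time data.")
else:
    print("[WARNING] Skipping real-time collection because CUDA is not available.")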

## Processing Real-Time Data

Let's process the real-time data and calculate technical indicators.
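The cell below shrinks the indicator windows to fit the handful of real-time points. An alternative, sketched here with pandas for a plain rolling mean, is to keep the nominal window and pass `min_periods`, so that early rows average whatever history exists instead of returning nulls; cuDF's `rolling` accepts the same `min_periods` argument. The price values are made up.

```python
import pandas as pd

prices = pd.Series([100.0, 102.0, 101.0, 105.0])

# Default: min_periods equals the window, so the first window - 1 rows are NaN.
strict = prices.rolling(window=3).mean()

# min_periods=1: early rows average the observations available so far.
lenient = prices.rolling(window=3, min_periods=1).mean()

print(pd.DataFrame({"price": prices, "strict": strict, "lenient": lenient}))
```

The trade-off is that early values average fewer points and are therefore noisier than later ones.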

In [None]:
# Process real-time data.
if 'realtime_data' in locals() and realtime_data is not None and len(realtime_data) > 0:
    print("[PROCESSING] Computing technical indicators for real-time data...")
    
    # Shrink the windows to fit the available data, dropping non-positive and
    # duplicate sizes (with 4 points, both windows would otherwise become 3).
    max_window = len(realtime_data) - 1
    window_sizes = sorted({w for w in (min(3, max_window), min(5, max_window)) if w > 0})
    
    if window_sizes:
        # Compute technical indicators with appropriate window sizes.
        realtime_data = compute_moving_averages(realtime_data, windows=window_sizes)
        realtime_data = compute_volatility(realtime_data, window=window_sizes[0])
        realtime_data = compute_rate_of_change(realtime_data, periods=[1])
        
        print("[SUCCESS] Finished computing indicators for real-time data.")
        
        # Display the processed real-time data (display() is needed inside an if block).
        display(realtime_data.head(num_points))
    else:
        print("[WARNING] Not enough data points for technical indicators.")

## Combining Historical and Real-Time Data

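Now, let's combine the historical and real-time data into a single table. Because the two sources may be sampled at different intervals, rolling indicators whose windows span the boundary mix observations of different granularity, so treat those values as illustrative.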
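When new observations are appended to an existing table, rolling indicators are only meaningful if the rows are in time order with one row per timestamp. The sketch below shows that clean-up step in pandas, keeping only the raw `timestamp` and `price` columns so that indicator columns computed on the historical data alone are not carried over and get recomputed from scratch. `merge_price_frames` and the sample values are hypothetical; cuDF offers `cudf.concat`, `drop_duplicates`, and `sort_values` for the same steps.

```python
import pandas as pd


def merge_price_frames(history: pd.DataFrame, recent: pd.DataFrame) -> pd.DataFrame:
    """Append `recent` to `history` with one row per timestamp, in time order (hypothetical helper)."""
    # Keep only the raw columns; indicators are recomputed after merging.
    combined = pd.concat(
        [history[["timestamp", "price"]], recent[["timestamp", "price"]]],
        ignore_index=True,
    )
    # keep="last" prefers the newer observation when a timestamp appears twice.
    combined = combined.drop_duplicates(subset="timestamp", keep="last")
    return combined.sort_values("timestamp").reset_index(drop=True)


history = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
    "price": [100.0, 101.0, 102.0],
    "ma_7": [float("nan")] * 3,  # stale indicator column, dropped by the merge
})
recent = pd.DataFrame({
    "timestamp": pd.to_datetime(["2024-01-03", "2024-01-04"]),
    "price": [102.5, 103.0],
})

merged = merge_price_frames(history, recent)
print(merged)
```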

In [None]:
# Combine historical and real-time data.
if ('historical_data' in locals() and historical_data is not None and len(historical_data) > 0 and
    'realtime_data' in locals() and realtime_data is not None and len(realtime_data) > 0):
    
    # Convert both DataFrames to pandas for easier concatenation.
    hist_pdf = historical_data.to_pandas()
    real_pdf = realtime_data.to_pandas()
    
    # Concatenate the DataFrames.
    combined_pdf = pd.concat([hist_pdf, real_pdf], ignore_index=True)
    
    # Convert back to cuDF.
    combined_data = cudf.DataFrame.from_pandas(combined_pdf)
    
    print(f"[SUCCESS] Combined data has {len(combined_data)} rows")
    
    # Recalculate technical indicators.
    combined_data = compute_moving_averages(combined_data, windows=[7, 20, 50])
    combined_data = compute_volatility(combined_data, window=20)
    combined_data = compute_rate_of_change(combined_data, periods=[1, 7])
    combined_data = compute_rsi(combined_data, window=14)
    
    # Display the combined data (display() is needed inside an if block).
    display(combined_data.tail())

## Visualizing Combined Data

Let's create a comprehensive visualization of the combined dataset.

In [None]:
# Visualize combined data.
if 'combined_data' in locals() and combined_data is not None and len(combined_data) > 0:
    print("[VISUALIZATION] Generating visualization of combined data...")
    
    # Create plot with combined data.
    fig_combined = plot_bitcoin_data(combined_data, 
                                    title="Bitcoin Price Analysis - Historical + Real-time Data")
    
    # Display the plot.
    fig_combined.show()

## Saving the Data

Let's save the processed data to a CSV file for future reference.
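CSV does not store column types, so timestamps come back as text unless they are parsed on load. `save_to_csv` and `load_from_csv` are project helpers whose internals are not shown; the pandas sketch below illustrates the round trip with explicit date parsing (`cudf.read_csv` also accepts `parse_dates`). The file lives in a temporary directory and its name is a placeholder.

```python
import os
import tempfile

import pandas as pd

frame = pd.DataFrame({
    "timestamp": pd.date_range("2024-01-01", periods=3, freq="D"),
    "price": [42_000.0, 42_500.5, 41_900.25],
})

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "bitcoin_sample.csv")
    # index=False avoids writing the RangeIndex as an extra unnamed column.
    frame.to_csv(path, index=False)

    naive = pd.read_csv(path)                              # timestamp read as text
    parsed = pd.read_csv(path, parse_dates=["timestamp"])  # timestamp read as datetime

print(naive.dtypes)
print(parsed.dtypes)
```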

In [None]:
# Save combined data to CSV.
if 'combined_data' in locals() and combined_data is not None and len(combined_data) > 0:
    filename = "bitcoin_data_combined.csv"
    save_to_csv(combined_data, filename=filename)
    print(f"[SUCCESS] Data saved to {filename}")

## Bitcoin Price Forecasting

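Finally, let's fit a simple linear regression on lagged price features and use it to forecast Bitcoin prices recursively, one day at a time. This step converts the data to pandas and runs on the CPU with scikit-learn.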
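The forecasting function below trains on lag and rolling features built with `shift` and `rolling`, then rebuilds the same features from the tail of the series when it predicts each new step. Those two constructions have to line up index for index, or the model is fed features it never saw in training. The toy check below (illustrative only; the series is made up) compares both constructions for a 7-step lag, a 7-period moving average, and a 7-period percentage change, written as `p / p.shift(7) - 1`, which equals `pct_change(periods=7)` when there are no missing values.

```python
import numpy as np
import pandas as pd

series = pd.Series(np.arange(1.0, 41.0))  # toy prices 1, 2, ..., 40

# Training-style features for the row *after* the last observation:
# append a placeholder row, then apply shift/rolling exactly as in training.
extended = pd.concat([series, pd.Series([np.nan])], ignore_index=True)
train_lag7 = extended.shift(7).iloc[-1]
train_ma7 = extended.rolling(7).mean().shift(1).iloc[-1]
train_pct7 = (extended / extended.shift(7) - 1).shift(1).iloc[-1]

# Forecast-time features computed directly from the tail of the observed series.
tail_lag7 = series.iloc[-7]
tail_ma7 = series.iloc[-7:].mean()
tail_pct7 = series.iloc[-1] / series.iloc[-8] - 1

print(train_lag7, tail_lag7)
print(train_ma7, tail_ma7)
print(train_pct7, tail_pct7)
```

The rule that falls out: `shift(k)` corresponds to `iloc[-k]`, `rolling(w).mean().shift(1)` corresponds to `iloc[-w:].mean()`, and a `k`-period change at forecast time uses `iloc[-1] / iloc[-(k + 1)] - 1`.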

In [None]:
# Import forecasting libraries.
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import StandardScaler

def forecast_bitcoin_prices(historical_data, forecast_days=30):
    """Forecast Bitcoin prices using historical data
    
    Args:
        historical_data (cudf.DataFrame): Historical Bitcoin price data
        forecast_days (int): Number of days to forecast
    
    Returns:
        tuple: (historical_data_pandas, forecast_data_pandas) as pandas DataFrames
    """
    print(f"[FORECASTING] Forecasting Bitcoin prices for the next {forecast_days} days...")
    
    # The lag-30 and 30-period rolling features need more than 30 rows to produce a training sample.
    if historical_data is None or len(historical_data) <= 30:
        print("[ERROR] Insufficient historical data for forecasting. Need more than 30 data points.")
        return None, None
    
    # Convert to pandas for forecasting.
    df = historical_data.to_pandas()
    
    # Sort by timestamp.
    df = df.sort_values('timestamp')
    
    # Set timestamp as index.
    df.set_index('timestamp', inplace=True)
    
    # Keep only the price column for basic forecasting.
    price_series = df['price']
    
    # Create features for regression (using lag features).
    X = np.column_stack([
        price_series.shift(1).values[30:],
        price_series.shift(7).values[30:],
        price_series.shift(14).values[30:],
        price_series.shift(30).values[30:],
        price_series.rolling(7).mean().shift(1).values[30:],
        price_series.rolling(14).mean().shift(1).values[30:],
        price_series.rolling(30).mean().shift(1).values[30:],
        price_series.rolling(7).std().shift(1).values[30:],
        price_series.pct_change(periods=1).shift(1).values[30:],
        price_series.pct_change(periods=7).shift(1).values[30:],
    ])
    
    # Target variable.
    y = price_series.values[30:]
    
    # Remove NaN rows.
    valid_indices = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    X_clean = X[valid_indices]
    y_clean = y[valid_indices]
    
    # Standardize features.
    scaler = StandardScaler()
    X_scaled = scaler.fit_transform(X_clean)
    
    # Train a linear regression model.
    model = LinearRegression()
    model.fit(X_scaled, y_clean)
    
    print("[SUCCESS] Model trained. Generating forecast...")
    
    # Prepare data for forecasting.
    # NOTE: the forecast dates assume one row per day in the historical data.
    forecast_horizon = forecast_days
    forecast_dates = [df.index[-1] + timedelta(days=i+1) for i in range(forecast_horizon)]
    
    # Recursive multi-step forecast: each prediction is appended to `extended`,
    # so the next step's lag and rolling features include it. The indexing mirrors
    # the training features (shift(k) -> iloc[-k], rolling(w).shift(1) -> iloc[-w:]).
    forecast_values = []
    extended = price_series.copy()
    
    for i in range(forecast_horizon):
        # Build the feature vector in the same column order as the training matrix X.
        X_forecast = np.array([[
            extended.iloc[-1],                          # lag 1
            extended.iloc[-7],                          # lag 7
            extended.iloc[-14],                         # lag 14
            extended.iloc[-30],                         # lag 30
            extended.iloc[-7:].mean(),                  # 7-period moving average
            extended.iloc[-14:].mean(),                 # 14-period moving average
            extended.iloc[-30:].mean(),                 # 30-period moving average
            extended.iloc[-7:].std(),                   # 7-period standard deviation
            extended.iloc[-1] / extended.iloc[-2] - 1,  # 1-period percentage change
            extended.iloc[-1] / extended.iloc[-8] - 1,  # 7-period percentage change
        ]])
        
        # Scale the features and predict the next price.
        X_forecast_scaled = scaler.transform(X_forecast)
        forecast_price = model.predict(X_forecast_scaled)[0]
        forecast_values.append(forecast_price)
        
        # Feed the prediction back into the series for the next step.
        extended.loc[forecast_dates[i]] = forecast_price
    
    # Create DataFrame with forecasted values.
    forecast_result = pd.DataFrame({'price': forecast_values}, index=forecast_dates)
    
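    # Add an approximate 95% band from historical volatility: assuming independent
    # returns, the one-step standard deviation grows with the square root of the horizon.
    step_volatility = df['price'].pct_change().std()
    horizon_scale = np.sqrt(np.arange(1, forecast_horizon + 1))
    forecast_result['lower_bound'] = forecast_result['price'] * (1 - 1.96 * step_volatility * horizon_scale)
    forecast_result['upper_bound'] = forecast_result['price'] * (1 + 1.96 * step_volatility * horizon_scale)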
    
    print(f"[SUCCESS] {forecast_days}-day forecast generated with confidence intervals.")
    
    return df, forecast_result

# Run forecasting if we have historical data.
if 'historical_data' in locals() and historical_data is not None and len(historical_data) > 0:
    # Generate forecast.
    historical_df, forecast_df = forecast_bitcoin_prices(historical_data, forecast_days=30)
    
    if historical_df is not None and forecast_df is not None:
        # Display the forecast.
        display(forecast_df.head())

## Visualizing the Forecast

Let's create a visualization of our price forecast with confidence intervals.
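Before plotting, it helps to spell out how a volatility-based band is usually sized. Assuming one-step returns are independent with constant standard deviation $\sigma$ (a strong simplification for Bitcoin), the standard deviation of the cumulative return $h$ steps ahead is approximately $\sigma\sqrt{h}$, which gives an approximate 95% band around the point forecast $\hat{p}_{T+h}$:

$$
\hat{p}_{T+h}\left(1 \pm 1.96\,\sigma\sqrt{h}\right), \qquad h = 1, \dots, H
$$

Here $T$ is the last observed step and $H$ is the forecast horizon (`forecast_days`, 30 in this notebook). The band reflects price volatility only and ignores the regression's own estimation error, so it is a rough guide rather than a calibrated interval.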

In [None]:
def plot_forecast(historical_df, forecast_df, title="Bitcoin Price Forecast"):
    """
    Plot historical data with forecasted prices
    
    Args:
        historical_df (pd.Series): Historical price data with timestamp index
        forecast_df (pd.DataFrame): Forecast data with timestamp index
        title (str): Plot title
    
    Returns:
        plotly.graph_objects.Figure: Plotly figure
    """
    # Create figure.
    fig = go.Figure()
    
    # Add historical price.
    fig.add_trace(go.Scatter(
        x=historical_df.index,
        y=historical_df,
        mode='lines',
        name='Historical Price',
        line=dict(color='#F2A900', width=2)
    ))
    
    # Add forecast line.
    fig.add_trace(go.Scatter(
        x=forecast_df.index,
        y=forecast_df['price'],
        mode='lines',
        name='Forecasted Price',
        line=dict(color='red')
    ))
    
    # Add confidence interval.
    fig.add_trace(go.Scatter(
        x=forecast_df.index.tolist() + forecast_df.index.tolist()[::-1],
        y=forecast_df['upper_bound'].tolist() + forecast_df['lower_bound'].tolist()[::-1],
        fill='toself',
        fillcolor='rgba(255,0,0,0.2)',
        line=dict(color='rgba(255,255,255,0)'),
        name='95% Confidence Interval'
    ))
    
    # Update layout.
    fig.update_layout(
        title=title,
        xaxis_title='Date',
        yaxis_title='Bitcoin Price (USD)',
        hovermode='x unified',
        legend=dict(orientation='h', yanchor='bottom', y=1.02, xanchor='right', x=1)
    )
    
    return fig

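# Prefer the enhanced plot_forecast from the project utilities when it is available;
# otherwise keep the local definition above.
try:
    from utils.plot_forecast import plot_forecast
except ImportError:
    pass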

# Plot forecast if available.
if 'historical_df' in locals() and 'forecast_df' in locals() and historical_df is not None and forecast_df is not None:
    
    print("[VISUALIZATION] Generating forecast visualization...")
    
    # Create forecast plot with enhanced function
    fig_forecast = plot_forecast(
        historical_df['price'], # Pass just the price series, not the whole dataframe
        forecast_df, 
        title="Bitcoin 30-Day Price Forecast with 95% Confidence Interval"
    )
    
    # Display the plot.
    fig_forecast.show()

## Conclusion

In this notebook, we've demonstrated a complete workflow for Bitcoin price data analysis using GPU-accelerated cuDF:

1. **Data Acquisition**: Fetched historical and real-time Bitcoin price data
2. **GPU-Accelerated Processing**: Calculated technical indicators with cuDF
3. **Visualization**: Created interactive charts of price trends and indicators
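4. **Forecasting**: Fitted a simple linear regression on lagged price features to forecast future prices

This workflow shows how cuDF DataFrames can carry the data-processing and indicator steps of a financial analysis pipeline on the GPU, while the forecasting step converts to pandas and runs on the CPU with scikit-learn. This notebook does not measure speedups itself; for timings against pandas, see the performance benchmarks listed below.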

For more information, refer to:
- The cuDF API documentation in `cudf.API.ipynb` and `cudf.API.md`
- Performance benchmarks in `performance_comparison.ipynb` and `performance_comparison.md`