# Ingest Bitcoin Prices using River for Real-Time Processing

This notebook demonstrates how to ingest real-time Bitcoin price data using the CoinGecko API and perform online learning using the River library.

**Goals:**
- Stream live Bitcoin price data
- Use River for incremental training
- Maintain a rolling window of prices
- Extract lag features for prediction
- Visualize real-time predictions and model accuracy


In [3]:
%load_ext autoreload
%autoreload 2
%matplotlib inline

The autoreload extension is already loaded. To reload it, use:
  %reload_ext autoreload


## Imports
This cell initializes the required packages and imports the helper functions from bitcoin_forecast_utils.py.

In [1]:
!pip install river
!pip install pytest
!pip install scikit-learn
!pip install matplotlib
!pip install requests
!pip install streamlit
!pip install numpy as np

[0mCollecting streamlit
  Downloading streamlit-1.40.1-py2.py3-none-any.whl.metadata (8.5 kB)
Collecting altair<6,>=4.0 (from streamlit)
  Downloading altair-5.4.1-py3-none-any.whl.metadata (9.4 kB)
Collecting blinker<2,>=1.0.0 (from streamlit)
  Downloading blinker-1.8.2-py3-none-any.whl.metadata (1.6 kB)
Collecting cachetools<6,>=4.0 (from streamlit)
  Downloading cachetools-5.5.2-py3-none-any.whl.metadata (5.4 kB)
Collecting click<9,>=7.0 (from streamlit)
  Downloading click-8.1.8-py3-none-any.whl.metadata (2.3 kB)
Collecting protobuf<6,>=3.20 (from streamlit)
  Downloading protobuf-5.29.4-cp38-abi3-manylinux2014_x86_64.whl.metadata (592 bytes)
Collecting pyarrow>=7.0 (from streamlit)
  Downloading pyarrow-17.0.0-cp38-cp38-manylinux_2_28_x86_64.whl.metadata (3.3 kB)
Collecting rich<14,>=10.14.0 (from streamlit)
  Downloading rich-13.9.4-py3-none-any.whl.metadata (18 kB)
Collecting tenacity<10,>=8.1.0 (from streamlit)
  Downloading tenacity-9.0.0-py3-none-any.whl.metadata (1.2 kB)
C

Collecting watchdog<7,>=2.1.5 (from streamlit)
  Downloading watchdog-4.0.2-py3-none-manylinux2014_x86_64.whl.metadata (38 kB)
Collecting gitdb<5,>=4.0.1 (from gitpython!=3.1.19,<4,>=3.0.7->streamlit)
  Downloading gitdb-4.0.12-py3-none-any.whl.metadata (1.2 kB)
Collecting markdown-it-py>=2.2.0 (from rich<14,>=10.14.0->streamlit)
  Downloading markdown_it_py-3.0.0-py3-none-any.whl.metadata (6.9 kB)
Collecting smmap<6,>=3.0.1 (from gitdb<5,>=4.0.1->gitpython!=3.1.19,<4,>=3.0.7->streamlit)
  Downloading smmap-5.0.2-py3-none-any.whl.metadata (4.3 kB)
Collecting mdurl~=0.1 (from markdown-it-py>=2.2.0->rich<14,>=10.14.0->streamlit)
  Downloading mdurl-0.1.2-py3-none-any.whl.metadata (1.6 kB)
Downloading streamlit-1.40.1-py2.py3-none-any.whl (8.6 MB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━[0m [32m8.6/8.6 MB[0m [31m21.1 MB/s[0m eta [36m0:00:00[0ma [36m0:00:01[0m
[?25hDownloading altair-5.4.1-py3-none-any.whl (658 kB)
[2K   [90m━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

In [2]:
import logging
# Import libraries in this section.
# Avoid imports like import *, from ... import ..., from ... import *, etc.
import time
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import warnings
import collections

import river
import river.linear_model
import river.tree
import river.metrics
import river.optim
from river import linear_model
from river import metrics
import bitcoin_forecast_utils
from bitcoin_forecast_utils import (
    get_bitcoin_price_with_retry,
    get_coin_ohlc,
    build_rolling_features
)
from collections import deque
from river import preprocessing
from river import tree
import requests
import pickle
import streamlit as st

# Initialize Model, Metric, Rolling Structures
This cell performs essential setup for real-time streaming:

- **Model Initialization**: Uses River's `StandardScaler` and `LinearRegression` in a pipeline to handle online feature normalization and regression.
- **Metric**: Initializes `MAE` (Mean Absolute Error) for evaluating prediction accuracy incrementally.
- **Rolling Window**: Creates a `deque` to maintain the most recent 5 Bitcoin prices for feature engineering.
- **Logging**: Prepares empty lists to store MAE values, predictions, and actual prices for post-streaming visualization and analysis.

In [3]:
# Initialize River model and metric
model = preprocessing.StandardScaler() | linear_model.LinearRegression()
metric = metrics.MAE()

# Rolling window to hold past prices (lagged features)
rolling_prices = deque(maxlen=5)

# Logs for analysis and plotting
mae_log = []
pred_log = []
true_log = []


##  Real-Time Streaming + Online Model Training
This block simulates **real-time model training** using cached OHLC close prices. Here's what each part does:

- **OHLC Fetch**: Retrieves 1 day of close prices from the CoinGecko API via `get_coin_ohlc(days=1)` to mimic a live stream.
- **Streaming Simulation**: Iterates through the close prices one-by-one as if they’re arriving in real time.
- **Rolling Window**: Appends each new price to a `deque`, maintaining only the latest 5 values.
- **Feature Engineering**: Constructs combined features using both price lags and OHLC-derived volatility indicators.
- **Model Prediction**: Makes a prediction using the current model.
- **Model Training**: If a prediction is available, the model is updated with the actual target using `learn_one()`.
- **Metric Update**: The MAE (Mean Absolute Error) is updated incrementally for evaluation.
- **Logging**: Records MAE, prediction, and actual values for later plotting.

This structure supports **streaming learning** with continuous model updates, performance tracking, and feature interaction using River.

In [4]:
# Simulate real-time from cached OHLC close prices
ohlc_df = get_coin_ohlc(days=1)
for step, price in enumerate(ohlc_df["close"].head(30)):
    rolling_prices.append(price)

    if len(rolling_prices) < rolling_prices.maxlen:
        continue

    #features = build_rolling_features(rolling_prices)
    ohlc_features_df = bitcoin_forecast_utils.extract_ohlc_features(ohlc_df)
    features = bitcoin_forecast_utils.build_combined_features(rolling_prices, ohlc_features_df)
    true_price = features["price_lag_0"]
    pred_price = model.predict_one(features)

    if pred_price is not None:
        model.learn_one(features, true_price)
        metric = metric.update(true_price, pred_price)

        mae_log.append(metric.get())
        pred_log.append(pred_price)
        true_log.append(true_price)


INFO:bitcoin_forecast_utils:Loading cached OHLC data



This log message is generated by the `get_coin_ohlc()` function from the `bitcoin_forecast_utils.py` module.

 What It Means:
- The notebook **did not fetch fresh OHLC data** from the CoinGecko API.
- Instead, it **loaded previously saved OHLC data** from a local CSV file (`cached_ohlc.csv`).
- This behavior is controlled by the caching mechanism to improve efficiency and reduce API calls.

 Why This Is Useful:
- **Faster Execution:** Avoids unnecessary API calls.
- **Rate Limit Protection:** Prevents hitting CoinGecko’s usage limits.
- **Reproducibility:** Ensures consistent results during testing or demonstration.

## Print Model Weights
This cell prints the **learned weights** from the linear regression model to understand which features (lags and indicators) are contributing most to predictions.


In [5]:
# Inspect model weights (feature importance)
print("Model Weights:")
for feature, weight in model[-1].weights.items():
    print(f"{feature}: {weight:.4f}")


Model Weights:
price_lag_0: -13885.3678
price_lag_1: -18945.4805
price_lag_2: -10273.9125
price_lag_3: -192.6954
price_lag_4: 11631.5213
range: 0.0000
price_change: 0.0000
price_change_pct: 0.0000
volatility: 0.0000


## Simulated Retry Logic (API Robustness Test)
This cell simulates an API failure scenario to test the robustness of the retry mechanism implemented in `bitcoin_forecast_utils.py`.


In [6]:
#  Simulate API failure to demonstrate retry mechanism
def simulate_api_failure():
    raise requests.exceptions.HTTPError(response=requests.Response())

try:
    simulate_api_failure()
except requests.exceptions.HTTPError:
    print("Retry mechanism would be triggered here (simulated).")


Retry mechanism would be triggered here (simulated).


##  Save the Trained River Model with Pickle
This cell demonstrates how to serialize and persist the trained River model using the `pickle` module.


In [7]:
# Save trained River model to a pickle file
with open("btc_stream_model.pkl", "wb") as f:
    pickle.dump(model, f)

print(" Model saved to btc_stream_model.pkl")


 Model saved to btc_stream_model.pkl


## Basic Streaming with OHLC and Linear Regression
This cell demonstrates a simple pipeline for streaming Bitcoin close prices and updating a linear regression model using lag features.

In [8]:
# 2.12 Basic Streaming with OHLC Close Prices and Linear Regression

ohlc_df = get_coin_ohlc("bitcoin", vs_currency="usd", days=7)
ohlc_prices = ohlc_df["close"].tolist()

rolling_window = deque(maxlen=5)
model = linear_model.LinearRegression()
metric = metrics.MAE()

for price in ohlc_prices[:30]:
    rolling_window.append(price)

    if len(rolling_window) < rolling_window.maxlen:
        continue

    features = build_rolling_features(rolling_window)
    y_pred = model.predict_one(features)
    y_true = features["price_lag_0"]

    if y_pred is not None:
        model.learn_one(features, y_true)
        metric = metric.update(y_true, y_pred)

    rolling_window.append(price)

print(" MAE using OHLC Close Prices:", metric.get())


INFO:bitcoin_forecast_utils:Loading cached OHLC data


 MAE using OHLC Close Prices: 2.457223583842532e+20


## Multi-Model Streaming and Volatility Analysis
This cell demonstrates the use of multiple River models—Linear Regression, Hoeffding Tree Regressor, and a Scaled Linear Regression pipeline—to forecast Bitcoin prices using a streaming approach. It also tracks volatility for interpretability.

In [9]:
# ---- Initialization ----
ohlc_df = get_coin_ohlc(days=1)
close_prices = ohlc_df["close"].head(50)  # You can increase this range
rolling_prices = deque(maxlen=5)

# Models
lr_model = linear_model.LinearRegression()
tree_model = tree.HoeffdingTreeRegressor()
pipeline_model = preprocessing.StandardScaler() | linear_model.LinearRegression()

# Logs
actual_log, lr_log, tree_log, pipe_log, vol_log = [], [], [], [], []

# ---- Streaming Loop ----
for step, price in enumerate(close_prices):
    rolling_prices.append(price)
    if len(rolling_prices) < rolling_prices.maxlen:
        continue

    features = build_rolling_features(rolling_prices)
    actual = price

    # Predictions
    lr_pred = lr_model.predict_one(features) if step > rolling_prices.maxlen else 0
    tree_pred = tree_model.predict_one(features) if step > rolling_prices.maxlen else 0
    pipe_pred = pipeline_model.predict_one(features) if step > rolling_prices.maxlen else 0

    # Logging
    actual_log.append(actual)
    lr_log.append(lr_pred)
    tree_log.append(tree_pred)
    pipe_log.append(pipe_pred)
    vol_log.append(ohlc_df["high"].iloc[step] - ohlc_df["low"].iloc[step])

    # Training
    lr_model.learn_one(features, actual)
    tree_model.learn_one(features, actual)
    pipeline_model.learn_one(features, actual)

# ---- Print Output ----
print(f"{'Step':<6} {'Actual':>10} | {'LR':>10} | {'Tree':>10} | {'Pipe':>10} | {'Volatility':>12}")
print("-" * 65)
for i, (a, l, t, p, v) in enumerate(zip(actual_log, lr_log, tree_log, pipe_log, vol_log)):
    print(f"[{i:<3}] {a:10.2f} | {l:10.2f} | {t:10.2f} | {p:10.2f} | {v:12.2f}")

INFO:bitcoin_forecast_utils:Loading cached OHLC data


Step       Actual |         LR |       Tree |       Pipe |   Volatility
-----------------------------------------------------------------
[0  ]  103478.00 |       0.00 |       0.00 |       0.00 |       101.00
[1  ]  103410.00 |       0.00 |       0.00 |       0.00 |        60.00
[2  ]  103541.00 | -535095027057300275200.00 |  103444.00 |   21615.47 |       125.00
[3  ]  103548.00 | 261075342675663712.00 |  103476.33 |   18389.76 |        19.00
[4  ]  103612.00 | -535509075241683648512.00 |  103494.25 |   15872.55 |        69.00
[5  ]  103541.00 | 287960181817382176.00 |  103517.80 |   18008.36 |        99.00
[6  ]  103501.00 | -535733503818821140480.00 |  103521.67 |   32329.19 |        44.00
[7  ]  103502.00 | 382292482805110784.00 |  103518.71 |   36275.19 |        10.00
[8  ]  103594.00 | -535700148667351302144.00 |  103516.62 |   40645.41 |        61.00
[9  ]  103708.00 | 430093463293556224.00 |  103525.22 |   42786.63 |       174.00
[10 ]  103810.00 | -536177800949068070912.00 |  

 Prints a neatly formatted row-by-row comparison showing:
  - Time step
  - Actual price
  - Predicted prices from LR, Tree, and Pipeline
  - Rolling volatility

## Visualization: Predicted vs Actual Comparison Across Models
This cell provides a comprehensive visual comparison of the performance of multiple models over time and visualizes the volatility of Bitcoin prices.

 Top Plot — Model Predictions vs Actual Price
- Compares predicted Bitcoin prices from:
  - Linear Regression
  - Hoeffding Tree Regressor
  - Pipeline (StandardScaler + Linear Regression)
- Uses distinct colors and markers for clarity.
- Plots them against the actual price across time steps.

**Insight:** The closer the predicted lines are to the actual line, the better the model’s performance.

Bottom Plot — Rolling Volatility (Standard Deviation)
- Displays the rolling volatility of Bitcoin prices (i.e., std deviation of the high-low spread).
- Helps identify periods of high market uncertainty or price fluctuation.

**Insight:** Volatility spikes often indicate sudden price movements which models may struggle to predict accurately.

# Streamlit Integration with Trained River Model
This cell integrates a pre-trained River model into a live Streamlit app for real-time forecasting and dynamic visualization.

In [10]:
# Load trained model
with open("btc_stream_model.pkl", "rb") as f:
    model = pickle.load(f)

# Streamlit UI
st.title(" Real-Time Bitcoin Price Forecasting (River)")

# Fetch price
try:
    current_price = get_bitcoin_price_with_retry()
    st.metric(" Current BTC Price (USD)", f"${current_price:,.2f}")
except Exception as e:
    st.error(f"Failed to fetch price: {e}")
    st.stop()

# Simulate rolling window
if "rolling_prices" not in st.session_state:
    st.session_state.rolling_prices = deque(maxlen=5)

st.session_state.rolling_prices.append(current_price)

# Only forecast if we have enough history
if len(st.session_state.rolling_prices) == 5:
    features = build_rolling_features(st.session_state.rolling_prices)
    prediction = model.predict_one(features)

    # Display prediction
    st.subheader("Predicted Next Price")
    st.success(f"${prediction:,.2f}")

    # Train the model (simulate online learning)
    model.learn_one(features, current_price)

    # Save updated model
    with open("btc_stream_model.pkl", "wb") as f:
        pickle.dump(model, f)

    # Show weights
    st.subheader("Model Weights")
    st.json(model.weights)

    # Plot true vs predicted
    if "price_log" not in st.session_state:
        st.session_state.price_log = []

    st.session_state.price_log.append((current_price, prediction))

    df = pd.DataFrame(st.session_state.price_log, columns=["Actual", "Predicted"])
    st.line_chart(df)

2025-05-15 18:39:51.440 
  command:

    streamlit run /usr/local/lib/python3.8/dist-packages/ipykernel_launcher.py [ARGUMENTS]
2025-05-15 18:39:51.582 Session state does not function when running a script without `streamlit run`


In [11]:
code = '''
import streamlit as st
import time
import pickle
from collections import deque
from bitcoin_forecast_utils import get_bitcoin_price_with_retry, build_rolling_features
from river import linear_model, metrics, preprocessing

# App Config
st.set_page_config(page_title="Bitcoin Forecasting with River", page_icon=":chart_with_upwards_trend:")
st.title("📈 Real-Time Bitcoin Price Forecasting using River")

# Load or initialize model with normalization
if "model" not in st.session_state:
    scaler = preprocessing.StandardScaler()
    regressor = linear_model.LinearRegression()
    st.session_state.model = scaler | regressor
    st.session_state.metric = metrics.MAE()
    st.session_state.rolling_prices = deque(maxlen=5)
    st.session_state.price_log = []

model = st.session_state.model
metric = st.session_state.metric
rolling_prices = st.session_state.rolling_prices
price_log = st.session_state.price_log

# Fetch live price
if st.button("🔄 Refresh BTC Price"):
    get_bitcoin_price_with_retry.cache_clear()
    st.rerun()

try:
    current_price = get_bitcoin_price_with_retry()
    st.metric("📌 Current BTC Price (USD)", f"${current_price:,.2f}")
except Exception as e:
    st.error(f"Failed to fetch price: {e}")
    st.stop()

# Update rolling window
rolling_prices.append(current_price)

# Predict only if enough data is available
if len(rolling_prices) == rolling_prices.maxlen:
    features = build_rolling_features(rolling_prices)
    pred_price = model.predict_one(features)

    # Display prediction
    # TEMP FIX: Add predicted delta to current price
    prediction = model.predict_one(features)
    corrected_prediction = current_price + prediction
    st.subheader("🧠 Predicted Next Price")
    st.success(f"${corrected_prediction:,.2f}")


    # Train model
    model.learn_one(features, current_price)
    metric = metric.update(current_price, pred_price)
    st.session_state.metric = metric

    # Log data
    price_log.append((current_price, pred_price))

# Optional: Show model weights
if st.checkbox("🔍 Show Model Weights"):
    try:
        weights = dict(model[-1].weights)
        st.json(weights)
        st.line_chart(list(weights.values()))
    except Exception as e:
        st.error("Could not display weights: " + str(e))

# Optional: Display log chart
if price_log:
    import pandas as pd
    df = pd.DataFrame(price_log, columns=["Actual", "Predicted"])
    st.line_chart(df)

'''

# Save to a file
with open("streamlit_app.py", "w") as f:
    f.write(code)

print(" Streamlit app saved to streamlit_app.py")


 Streamlit app saved to streamlit_app.py


In [None]:
!streamlit run streamlit_app.py


Collecting usage statistics. To deactivate, set browser.gatherUsageStats to false.
[0m
[0m
[34m[1m  You can now view your Streamlit app in your browser.[0m
[0m
[34m  Local URL: [0m[1mhttp://localhost:8501[0m
[34m  Network URL: [0m[1mhttp://172.17.0.2:8501[0m
[34m  External URL: [0m[1mhttp://76.100.194.142:8501[0m
[0m
