# 🧠 Deep Model Tuning for Bitcoin Birth Date

Hypothesis under test: "Increasing model complexity will help the model digest transits to the natal chart."

Date: **2009-10-10** (Economic Birth / First Rate)
Features: transits to natal + transit aspects + phases (NO houses)

In [1]:
import sys
from pathlib import Path
import pandas as pd
import numpy as np
from itertools import product
from tqdm import tqdm
from datetime import datetime, date, timezone
from sklearn.metrics import classification_report, matthews_corrcoef

PROJECT_ROOT = Path("/home/rut/ostrofun")
sys.path.insert(0, str(PROJECT_ROOT))

from RESEARCH.config import cfg
from RESEARCH.data_loader import load_market_data
from RESEARCH.labeling import create_balanced_labels
from RESEARCH.astro_engine import (
    init_ephemeris,
    calculate_bodies_for_dates_multi,
    calculate_aspects_for_dates,
    calculate_transits_for_dates,
    calculate_phases_for_dates,
    get_natal_bodies,
)
from RESEARCH.features import build_full_features, merge_features_with_labels
from RESEARCH.model_training import split_dataset, prepare_xy, train_xgb_model, tune_threshold, predict_with_threshold, check_cuda_available

In [26]:
# Config
TARGET_DATE = date(2009, 10, 10)
print(f"🧠 Tuning for Birth Date: {TARGET_DATE}")

ASTRO_CONFIG = {
    "coord_mode": "both",
    "orb_mult": 0.1,
    "gauss_window": 200,
    "gauss_std": 70.0,
    "exclude_bodies": None,
}

# Deep Grid Search Space
PARAM_GRID = {
    "n_estimators": [500, 900, 1300],
    "max_depth": [6, 8, 10],  # Try deep trees
    "learning_rate": [0.05, 0.03],
    "colsample_bytree": [0.6, 0.8],
    "subsample": [0.8],
}

🧠 Tuning for Birth Date: 2009-10-10


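The search space above is small enough to enumerate by hand. As a minimal, self-contained sketch (it repeats the `PARAM_GRID` values from the cell above and uses only the standard library), the following shows how `itertools.product` expands the grid and which index maps to which parameter set. The last key varies fastest, so the `learning_rate` and `colsample_bytree` settings cycle inside each `max_depth`, which in turn cycles inside each `n_estimators` value.

```python
from itertools import product

# Same values as the PARAM_GRID cell above, repeated so the sketch runs on its own.
PARAM_GRID = {
    "n_estimators": [500, 900, 1300],
    "max_depth": [6, 8, 10],
    "learning_rate": [0.05, 0.03],
    "colsample_bytree": [0.6, 0.8],
    "subsample": [0.8],
}

keys = list(PARAM_GRID)
# product() iterates the last iterable fastest, so index i maps to a fixed combination.
combinations = [dict(zip(keys, vals)) for vals in product(*PARAM_GRID.values())]

print(len(combinations))   # 3 * 3 * 2 * 2 * 1 combinations
print(combinations[2])     # third combination in enumeration order
```

Because the enumeration order is deterministic, the integer index in the results table later in the notebook identifies the parameter set directly.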
In [27]:
# 1. Prepare Data
print("Loading data...")
df_market = load_market_data()
df_market = df_market[df_market["date"] >= "2017-11-01"].reset_index(drop=True)
df_labels = create_balanced_labels(df_market, ASTRO_CONFIG["gauss_window"], ASTRO_CONFIG["gauss_std"])
settings = init_ephemeris()
_, device = check_cuda_available()

print("Calculating astro...")
df_bodies, geo_by_date, helio_by_date = calculate_bodies_for_dates_multi(
    df_market["date"], settings, coord_mode="both"
)
bodies_by_date = geo_by_date
df_phases = calculate_phases_for_dates(bodies_by_date)

# 2. Build Natal Features
print(f"Building natal features for {TARGET_DATE}...")
natal_dt_str = f"{TARGET_DATE.isoformat()}T12:00:00"
natal_bodies = get_natal_bodies(natal_dt_str, settings)

df_transits = calculate_transits_for_dates(
    bodies_by_date, natal_bodies, settings, 
    orb_mult=ASTRO_CONFIG["orb_mult"]
)

# Aspects between transiting bodies (baseline features)
df_aspects = calculate_aspects_for_dates(
    bodies_by_date, settings, 
    orb_mult=ASTRO_CONFIG["orb_mult"]
)

# 3. Full Dataset
print("Merging dataset...")
df_features = build_full_features(
    df_bodies, df_aspects, df_transits=df_transits, df_phases=df_phases, 
    include_pair_aspects=True,    # Include baseline aspects
    include_transit_aspects=True  # Include natal transits
)
df_dataset = merge_features_with_labels(df_features, df_labels)

print(f"Dataset Shape: {df_dataset.shape}")
print(f"Columns: {len(df_dataset.columns)}")

Loading data...
Loaded 5677 rows from DB for subject=btc
Date range: 2010-07-18 -> 2026-01-31
Labels created: 2814 samples
  UP: 1368 (48.6%)
  DOWN: 1446 (51.4%)
  Date range: 2017-11-01 -> 2025-07-15
Calculating astro...
📍 Calculating GEOCENTRIC coordinates (Earth at the center)...

Calculating bodies: 100%|██████████| 3014/3014 [00:00<00:00, 17278.06it/s]

☀️ Calculating HELIOCENTRIC coordinates (Sun at the center)...

Calculating bodies: 100%|██████████| 3014/3014 [00:00<00:00, 31440.67it/s]

✅ Merged: 78364 records from 2 coordinate systems

Calculating phases & elongations: 100%|██████████| 3014/3014 [00:00<00:00, 215591.39it/s]

✅ Calculated 3014 days: Moon phase + planetary elongations
Building natal features for 2009-10-10...


Calculating transits (orb=0.1): 100%|██████████| 3014/3014 [00:00<00:00, 30595.35it/s]
Calculating aspects (orb=0.1): 100%|██████████| 3014/3014 [00:00<00:00, 59968.65it/s]

Merging dataset...





Merged dataset: 3014 samples (ALL days, forward-filled)
Features: 2040
Dataset Shape: (3014, 2042)
Columns: 2042

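The 3014-row dataset is split by `split_dataset` in the next cell, whose log reports `Train=2109, Val=452, Test=453`. The implementation of `split_dataset` is not shown in this notebook. The sketch below is an assumption: a plain chronological 70/15/15 split (the helper name `chronological_split` and the fractions are hypothetical) that happens to reproduce exactly those sizes and keeps later dates out of the training set.

```python
import pandas as pd

def chronological_split(df, train_frac=0.70, val_frac=0.15):
    """Hypothetical stand-in for split_dataset: cut an already date-sorted frame into three blocks."""
    n = len(df)
    train_end = int(n * train_frac)
    val_end = int(n * (train_frac + val_frac))
    return df.iloc[:train_end], df.iloc[train_end:val_end], df.iloc[val_end:]

# Demo frame with the same number of daily rows as the merged dataset above.
demo = pd.DataFrame({"date": pd.date_range("2017-11-01", periods=3014, freq="D")})
train_df, val_df, test_df = chronological_split(demo)

print(len(train_df), len(val_df), len(test_df))
```

A time-ordered split matters here because the labels come from a smoothed price series: shuffling rows would place near-identical neighboring days in both train and test.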

In [28]:
# 4. Grid Search
print("🚀 Starting Deep Grid Search...")

train_df, val_df, test_df = split_dataset(df_dataset)
feature_cols = [c for c in df_dataset.columns if c not in ["date", "target"]]
X_train, y_train = prepare_xy(train_df, feature_cols)
X_val, y_val = prepare_xy(val_df, feature_cols)
X_test, y_test = prepare_xy(test_df, feature_cols)

results = []
keys = PARAM_GRID.keys()
combinations = list(product(*PARAM_GRID.values()))

for vals in tqdm(combinations, desc="Grid Search"):
    params = dict(zip(keys, vals))
    
    # Train
    model = train_xgb_model(
        X_train, y_train, X_val, y_val, feature_cols, 
        n_classes=2, device=device, early_stopping_rounds=50, verbose=False,
        **params
    )
    
    # Evaluate
    best_t, _ = tune_threshold(model, X_val, y_val, metric="recall_min")
    y_test_pred = predict_with_threshold(model, X_test, threshold=best_t)
    
    report = classification_report(y_test, y_test_pred, output_dict=True, zero_division=0)
    r_min = min(report["0"]["recall"], report["1"]["recall"])
    mcc = matthews_corrcoef(y_test, y_test_pred)
    
    res_row = params.copy()
    res_row["R_MIN"] = r_min
    res_row["MCC"] = mcc
    results.append(res_row)

🚀 Starting Deep Grid Search...
Split: Train=2109, Val=452, Test=453

Parameters: { "verbose" } are not used.

  self.starting_round = model.num_boosted_rounds()

🎯 Best threshold=0.50, RECALL_MIN=0.4109, gap=0.1823
🎯 Best threshold=0.50, RECALL_MIN=0.4746, gap=0.1291
🎯 Best threshold=0.50, RECALL_MIN=0.4109, gap=0.1823
🎯 Best threshold=0.48, RECALL_MIN=0.5382, gap=0.0437
🎯 Best threshold=0.48, RECALL_MIN=0.5455, gap=0.0817
🎯 Best threshold=0.47, RECALL_MIN=0.4218, gap=0.2222
🎯 Best threshold=0.49, RECALL_MIN=0.5455, gap=0.0817
🎯 Best threshold=0.49, RECALL_MIN=0.5706, gap=0.0876
🎯 Best threshold=0.48, RECALL_MIN=0.5455, gap=0.0817
🎯 Best threshold=0.47, RECALL_MIN=0.4218, gap=0.2222
🎯 Best threshold=0.49, RECALL_MIN=0.5455, gap=0.0817
🎯 Best threshold=0.48, RECALL_MIN=0.4400, gap=0.2041
🎯 Best threshold=0.50, RECALL_MIN=0.4109, gap=0.1823
🎯 Best threshold=0.50, RECALL_MIN=0.4746, gap=0.1291
🎯 Best threshold=0.50, RECALL_MIN=0.4109, gap=0.1823
🎯 Best threshold=0.48, RECALL_MIN=0.5382, gap=0.0437
🎯 Best threshold=0.48, RECALL_MIN=0.5455, gap=0.0817
🎯 Best threshold=0.47, RECALL_MIN=0.4218, gap=0.2222
🎯 Best threshold=0.49, RECALL_MIN=0.5455, gap=0.0817
🎯 Best threshold=0.49, RECALL_MIN=0.5706, gap=0.0876
🎯 Best threshold=0.48, RECALL_MIN=0.5455, gap=0.0817
🎯 Best threshold=0.47, RECALL_MIN=0.4218, gap=0.2222
🎯 Best threshold=0.49, RECALL_MIN=0.5455, gap=0.0817
🎯 Best threshold=0.48, RECALL_MIN=0.4400, gap=0.2041
🎯 Best threshold=0.50, RECALL_MIN=0.4109, gap=0.1823
🎯 Best threshold=0.50, RECALL_MIN=0.4746, gap=0.1291
🎯 Best threshold=0.50, RECALL_MIN=0.4109, gap=0.1823
🎯 Best threshold=0.48, RECALL_MIN=0.5382, gap=0.0437
🎯 Best threshold=0.48, RECALL_MIN=0.5455, gap=0.0817
🎯 Best threshold=0.47, RECALL_MIN=0.4218, gap=0.2222
🎯 Best threshold=0.49, RECALL_MIN=0.5455, gap=0.0817
🎯 Best threshold=0.49, RECALL_MIN=0.5706, gap=0.0876
🎯 Best threshold=0.48, RECALL_MIN=0.5455, gap=0.0817
🎯 Best threshold=0.47, RECALL_MIN=0.4218, gap=0.2222
🎯 Best threshold=0.49, RECALL_MIN=0.5455, gap=0.0817
🎯 Best threshold=0.48, RECALL_MIN=0.4400, gap=0.2041

Grid Search: 100%|██████████| 36/36 [00:35<00:00,  1.00it/s]



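Each run in the log reports a best threshold, a `RECALL_MIN` value and a `gap`. `tune_threshold` lives in the project module `RESEARCH.model_training` and its code is not shown here, so the following is only one plausible reading of those log fields, not the project's implementation: scan candidate thresholds on the validation set, keep the one that maximizes the smaller of the two per-class recalls, and report the absolute difference between the recalls as the gap. The names `recall_min_and_gap` and `tune_threshold_sketch`, the threshold range, and the synthetic data are all hypothetical.

```python
import numpy as np
from sklearn.metrics import recall_score

def recall_min_and_gap(y_true, y_pred):
    """Return (min per-class recall, |recall_0 - recall_1|) for binary labels 0/1."""
    rec = recall_score(y_true, y_pred, labels=[0, 1], average=None, zero_division=0)
    return float(rec.min()), float(abs(rec[0] - rec[1]))

def tune_threshold_sketch(proba_up, y_true, thresholds=np.arange(0.30, 0.71, 0.01)):
    """Pick the threshold on P(class 1) that maximizes the minimum per-class recall."""
    best = None
    for t in thresholds:
        y_pred = (proba_up >= t).astype(int)
        r_min, gap = recall_min_and_gap(y_true, y_pred)
        if best is None or r_min > best[1]:
            best = (float(t), r_min, gap)
    return best

# Synthetic validation set: weakly informative probabilities, for illustration only.
rng = np.random.default_rng(0)
y_val = rng.integers(0, 2, size=452)
proba = np.clip(0.5 + 0.1 * (y_val - 0.5) + rng.normal(0, 0.1, size=452), 0, 1)

best_t, r_min, gap = tune_threshold_sketch(proba, y_val)
print(f"Best threshold={best_t:.2f}, RECALL_MIN={r_min:.4f}, gap={gap:.4f}")
```

Maximizing the minimum recall penalizes a model that predicts one class almost all the time, which plain accuracy on a roughly 49/51 label split would not do.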
In [29]:
# 5. Analysis
df_res = pd.DataFrame(results).sort_values("R_MIN", ascending=False)
print("\n🏆 TOP 10 MODELS:")
print(df_res.head(10))

best = df_res.iloc[0]
print("\n🥇 WINNER PARAMS:")
print(best.to_dict())

baseline_rmin = 0.587
if best["R_MIN"] > baseline_rmin:
    print(f"\n🚀 SUCCESS! Deep model beat baseline! ({best['R_MIN']:.3f} > {baseline_rmin})")
else:
    print(f"\n💀 FAILURE. Still can't beat baseline. ({best['R_MIN']:.3f} <= {baseline_rmin})")
    print("Hypothesis: Natal features are just noise.")


🏆 TOP 10 MODELS:
    n_estimators  max_depth  learning_rate  colsample_bytree  subsample  \
2            500          6           0.03               0.6        0.8   
14           900          6           0.03               0.6        0.8   
26          1300          6           0.03               0.6        0.8   
0            500          6           0.05               0.6        0.8   
24          1300          6           0.05               0.6        0.8   
12           900          6           0.05               0.6        0.8   
34          1300         10           0.03               0.6        0.8   
22           900         10           0.03               0.6        0.8   
10           500         10           0.03               0.6        0.8   
30          1300          8           0.03               0.6        0.8   

       R_MIN       MCC  
2   0.602941  0.315097  
14  0.602941  0.315097  
26  0.602941  0.315097  
0   0.597059  0.309950  
24  0.597059  0.309950  
12 