# IDEFAR Demo — News (GCS) → Embeddings (open model) → XGBoost → Daily Prediction

This notebook implements a **standard demo** (no LLM fine-tuning):

- Daily texts (*Bloomberg wrap* style) stored as `.txt` files in **Google Cloud Storage (GCS)**.
- **Embeddings** generated with a **free, open model** (Sentence-Transformers).
- An **XGBoost Regressor** trained to predict a synthetic `stress_target` (demo only).
- Prediction over a separate folder of texts (production/demo).

> Requirements: GCP credentials configured (ADC or a Service Account).  
> No LLM is trained; the model is used only as a feature extractor.

---

## GCS layout

- `gs://<BUCKET>/idefar_demo/train/news/`  → 10 news texts (train)
- `gs://<BUCKET>/idefar_demo/predict/news/` → 2 news texts (predict)
- `gs://<BUCKET>/idefar_demo/artifacts/` → model + metadata (optional)
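
As a minimal sketch of how this layout maps to object paths, the snippet below builds the three `gs://` URIs from a bucket name and a base prefix. The helper name `demo_layout` and the bucket `my-bucket` are illustrative assumptions, not part of the notebook.

```python
from typing import Dict


def demo_layout(bucket: str, base_prefix: str = "idefar_demo") -> Dict[str, str]:
    """Return the gs:// URIs for the train, predict and artifacts folders."""
    base = base_prefix.strip("/")  # tolerate leading/trailing slashes
    root = f"gs://{bucket}/{base}"
    return {
        "train": f"{root}/train/news/",
        "predict": f"{root}/predict/news/",
        "artifacts": f"{root}/artifacts/",
    }


layout = demo_layout("my-bucket")  # "my-bucket" is a placeholder name
for name, uri in layout.items():
    print(f"{name:<10} {uri}")
```

Keeping the prefixes in one place makes it easier to point the demo at a different base folder without touching the rest of the code.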


In [1]:

import os
from pathlib import Path
import json
import re
import hashlib
import numpy as np
import pandas as pd

from sklearn.metrics import mean_absolute_error, mean_squared_error

import xgboost as xgb

print("Ready.")


Ready.


## 1) GCS authentication and configuration

This notebook uses `google-cloud-storage`.

Typical options:

### A) ADC (Application Default Credentials)
In your terminal:
- `gcloud auth application-default login`

### B) Service Account
- export `GOOGLE_APPLICATION_CREDENTIALS=/path/to/key.json`
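
It can help to check which of the two options is in effect before creating a client. The following is a hedged sketch using only the standard library: `credential_source` is a hypothetical helper, and the ADC file location is the default gcloud path on Linux/macOS (Windows uses a different directory). On managed GCP runtimes, credentials may instead come from the metadata server, which this sketch does not check.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def credential_source(env: Optional[Mapping[str, str]] = None,
                      home: Optional[Path] = None) -> str:
    """Guess which credential option is configured (hypothetical helper)."""
    env = os.environ if env is None else env
    home = Path.home() if home is None else home

    # Option B: an explicit key file is used when the variable is set.
    key_path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    if key_path:
        return f"service-account key file: {key_path}"

    # Option A: file written by `gcloud auth application-default login`
    # (assumed default location on Linux/macOS).
    adc_file = home / ".config" / "gcloud" / "application_default_credentials.json"
    if adc_file.exists():
        return f"ADC file: {adc_file}"

    return "no local credentials found"


print(credential_source())
```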


In [2]:
from google.cloud import storage

# --- CONFIGURATION (EDIT)
GCP_PROJECT = os.getenv("GCP_PROJECT", "")  # optional
GCS_BUCKET = os.getenv("GCS_BUCKET", "REEMPLAZAR_BUCKET")  # <- EDIT: replace the placeholder
BASE_PREFIX = os.getenv("GCS_BASE_PREFIX", "idefar_demo")

TRAIN_PREFIX = f"{BASE_PREFIX}/train/news/"
PRED_PREFIX  = f"{BASE_PREFIX}/predict/news/"
ART_PREFIX   = f"{BASE_PREFIX}/artifacts/"

client = storage.Client(project=GCP_PROJECT or None)
bucket = client.bucket(GCS_BUCKET)

print("Bucket:", GCS_BUCKET)
print("Train prefix:", TRAIN_PREFIX)
print("Predict prefix:", PRED_PREFIX)




Bucket: REEMPLAZAR_BUCKET
Train prefix: idefar_demo/train/news/
Predict prefix: idefar_demo/predict/news/
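
The output above shows the placeholder bucket name still in place. A small guard can catch that before any GCS call is made. This is a sketch under stated assumptions: `check_bucket_name` is a hypothetical helper, and the regular expression is a simplified version of the GCS bucket naming rules (3 to 63 characters; lowercase letters, digits, dashes, underscores and dots; starting and ending with a letter or digit).

```python
import re

# Simplified form of the GCS bucket naming rules (assumption: dotted names
# longer than 63 characters are not handled here).
_BUCKET_RE = re.compile(r"^[a-z0-9][a-z0-9._-]{1,61}[a-z0-9]$")


def check_bucket_name(name: str, placeholder: str = "REEMPLAZAR_BUCKET") -> str:
    """Return the name if it looks usable, otherwise raise ValueError."""
    if name == placeholder:
        raise ValueError(
            "GCS_BUCKET still holds the placeholder; "
            "set the environment variable or edit the configuration cell."
        )
    if not _BUCKET_RE.match(name):
        raise ValueError(f"{name!r} does not look like a valid bucket name.")
    return name


print(check_bucket_name("my-demo-bucket"))  # placeholder example name
```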


## 2) Create and upload synthetic news texts to GCS (10 train + 2 predict)

We generate texts of roughly wrap-like length (about 700–1200 words) focused on Argentina and global markets.
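
One way to produce texts in that length range is to repeat a base paragraph until a target word count is reached. The sketch below is illustrative only: `build_wrap`, its parameters and the header format are assumptions, not the generator used to create the texts in this notebook. The body stays within the range as long as the paragraph is shorter than `max_words - min_words`.

```python
import random
from typing import List

BASE_PARAGRAPH = (
    "Markets navigated a mix of local and global cross-currents. "
    "In Argentina, traders focused on policy signals, the FX complex "
    "and funding conditions, while global investors weighed shifts "
    "in rates and risk appetite."
)


def build_wrap(date: str, themes: List[str], paragraph: str = BASE_PARAGRAPH,
               min_words: int = 700, max_words: int = 1200, seed: int = 0) -> str:
    """Assemble a wrap-style text by repeating `paragraph` up to a target length."""
    rng = random.Random(seed)  # seeded so the same inputs give the same text
    per_paragraph = len(paragraph.split())
    if per_paragraph == 0:
        raise ValueError("paragraph must contain at least one word")

    target = rng.randint(min_words, max_words)
    n_paragraphs = -(-target // per_paragraph)  # ceiling division
    # Drop paragraphs if rounding up pushed the body past max_words.
    while n_paragraphs > 1 and n_paragraphs * per_paragraph > max_words:
        n_paragraphs -= 1

    header = f"Argentina & Global Markets Wrap - {date}\n\nKey themes:\n"
    header += "".join(f"- {theme}\n" for theme in themes)
    body = "\n".join([paragraph] * n_paragraphs)
    return f"{header}\n{body}"


text = build_wrap("2024-01-02", ["Argentina: FX", "Global: US yields"])
print(len(text.split()), "words")
```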


In [None]:
# --- Utilities to upload/list/read .txt files in GCS
def gcs_upload_text(bucket, blob_name: str, text: str, content_type="text/plain; charset=utf-8"):
    """Upload `text` to `blob_name` and return the blob name."""
    blob = bucket.blob(blob_name)
    blob.upload_from_string(text, content_type=content_type)
    return blob_name

def gcs_list_txt(bucket, prefix: str):
    """Return the names of all .txt blobs under `prefix`."""
    return [b.name for b in bucket.list_blobs(prefix=prefix) if b.name.endswith(".txt")]

def gcs_read_text(bucket, blob_name: str) -> str:
    """Download a blob and decode it as UTF-8 text."""
    blob = bucket.blob(blob_name)
    return blob.download_as_text(encoding="utf-8")

print("Utils ready.")
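
The file names in this demo follow a `wrap_<split>_<YYYY-MM-DD>.txt` pattern, so the split and date can be recovered from a blob name returned by `gcs_list_txt`. The sketch below assumes that pattern holds for every object under the prefix; `parse_wrap_name` is a hypothetical helper, not part of the notebook.

```python
import re
from datetime import date
from typing import Tuple

# Assumed naming pattern: wrap_<split>_<YYYY-MM-DD>.txt
_WRAP_RE = re.compile(r"wrap_(train|predict)_(\d{4})-(\d{2})-(\d{2})\.txt$")


def parse_wrap_name(blob_name: str) -> Tuple[str, date]:
    """Extract (split, date) from a blob name; raise ValueError if it does not match."""
    match = _WRAP_RE.search(blob_name)
    if match is None:
        raise ValueError(f"unexpected blob name: {blob_name!r}")
    split, year, month, day = match.groups()
    return split, date(int(year), int(month), int(day))


names = [
    "idefar_demo/train/news/wrap_train_2024-01-02.txt",
    "idefar_demo/predict/news/wrap_predict_2024-01-16.txt",
]
for name in names:
    print(parse_wrap_name(name))
```

Raising on unexpected names, rather than skipping them silently, makes stray files in the bucket visible early.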


In [None]:
# --- Generate and upload 10 train + 2 predict texts
# If the objects already exist, change BASE_PREFIX or delete them from the console.

# Business days covered by the demo (weekends skipped).
train_dates = [
    "2024-01-02", "2024-01-03", "2024-01-04", "2024-01-05", "2024-01-08",
    "2024-01-09", "2024-01-10", "2024-01-11", "2024-01-12", "2024-01-15",
]
pred_dates = ["2024-01-16", "2024-01-17"]

# One metadata record per text: the date and the target file name.
train_meta = [{"date": d, "filename": f"wrap_train_{d}.txt"} for d in train_dates]
pred_meta = [{"date": d, "filename": f"wrap_predict_{d}.txt"} for d in pred_dates]

train_texts = ["Argentina & Global Markets Wrap — 2024-01-02\n\nKey themes:\n- Argentina: reservas y flujo comercial\n- Global: Fed pricing\n- Rates/credit: front-end repricing\n\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. 
In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. 
In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.", "Argentina & Global Markets Wrap — 2024-01-03\n\nKey themes:\n- Argentina: tasas en pesos y liquidez\n- Global: US yields\n- Rates/credit: widening spreads\n\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. 
Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. 
Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.", "Argentina & Global Markets Wrap — 2024-01-04\n\nKey themes:\n- Argentina: acciones y Merval\n- Global: Fed pricing\n- Rates/credit: liquidity pockets\n\nMarkets navigated a mix of local and global cross-currents. 
In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. 
In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. 
Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.", "Argentina & Global Markets Wrap — 2024-01-05\n\nKey themes:\n- Argentina: FX y brecha cambiaria\n- Global: Fed pricing\n- Rates/credit: steeper curve\n\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. 
In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. 
In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.", "Argentina & Global Markets Wrap — 2024-01-08\n\nKey themes:\n- Argentina: acciones y Merval\n- Global: oil volatility\n- Rates/credit: widening spreads\n\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. 
Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. 
Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.", "Argentina & Global Markets Wrap — 2024-01-09\n\nKey themes:\n- Argentina: tasas en pesos y liquidez\n- Global: oil volatility\n- Rates/credit: liquidity pockets\n\nMarkets navigated a mix of local and global cross-currents. 
In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. 
In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. 
Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.", "Argentina & Global Markets Wrap — 2024-01-10\n\nKey themes:\n- Argentina: IMF y programa\n- Global: oil volatility\n- Rates/credit: front-end repricing\n\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. 
In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. 
In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.", "Argentina & Global Markets Wrap — 2024-01-11\n\nKey themes:\n- Argentina: curva soberana y spreads\n- Global: China data\n- Rates/credit: front-end repricing\n\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. 
Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. 
Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Late-day trading turned more defensive as spreads widened and liquidity thinned, with notable risk-off behavior.", "Argentina & Global Markets Wrap — 2024-01-12\n\nKey themes:\n- Argentina: curva soberana y spreads\n- Global: US yields\n- Rates/credit: front-end repricing\n\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. 
In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. 
In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.", "Argentina & Global Markets Wrap — 2024-01-15\n\nKey themes:\n- Argentina: reservas y flujo comercial\n- Global: China data\n- Rates/credit: widening spreads\n\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. 
Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. 
Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed.\nMarkets navigated a mix of local and global cross-currents. In Argentina, traders focused on policy signals, the FX complex and funding conditions, while global investors weighed shifts in rates and risk appetite. Price action reflected cautious positioning, with volatility concentrated around macro headlines, flow-driven moves and changes in expectations about the near-term policy path. In parallel, commodities and emerging-market sentiment provided an external backdrop, amplifying intraday swings in sovereign risk and local assets. 
Trading was relatively orderly, with two-way flows and selective demand for carry where liquidity allowed."]
pred_texts  = ["Argentina & Global Markets Wrap — 2024-01-16\n\nKey themes:\n- Argentina: commodities\n- Global: EM flows\n- Rates/credit: duration selloff\n\nThe session featured a tighter link between headlines and positioning. Argentina assets reacted to the tone of policy communication and the perceived balance between FX supply and demand, while global markets remained sensitive to rates and growth surprises. Sovereign pricing moved alongside shifts in risk appetite, and local curves repriced as investors reassessed inflation dynamics and near-term funding needs. Overall, the narrative emphasized uncertainty and optionality, with a premium placed on liquidity and downside protection. Flows were mixed, but the tone leaned cautious as participants waited for clearer policy anchors.\nThe session featured a tighter link between headlines and positioning. Argentina assets reacted to the tone of policy communication and the perceived balance between FX supply and demand, while global markets remained sensitive to rates and growth surprises. Sovereign pricing moved alongside shifts in risk appetite, and local curves repriced as investors reassessed inflation dynamics and near-term funding needs. Overall, the narrative emphasized uncertainty and optionality, with a premium placed on liquidity and downside protection. Flows were mixed, but the tone leaned cautious as participants waited for clearer policy anchors.\nThe session featured a tighter link between headlines and positioning. Argentina assets reacted to the tone of policy communication and the perceived balance between FX supply and demand, while global markets remained sensitive to rates and growth surprises. Sovereign pricing moved alongside shifts in risk appetite, and local curves repriced as investors reassessed inflation dynamics and near-term funding needs. Overall, the narrative emphasized uncertainty and optionality, with a premium placed on liquidity and downside protection. 
Flows were mixed, but the tone leaned cautious as participants waited for clearer policy anchors.\nThe session featured a tighter link between headlines and positioning. Argentina assets reacted to the tone of policy communication and the perceived balance between FX supply and demand, while global markets remained sensitive to rates and growth surprises. Sovereign pricing moved alongside shifts in risk appetite, and local curves repriced as investors reassessed inflation dynamics and near-term funding needs. Overall, the narrative emphasized uncertainty and optionality, with a premium placed on liquidity and downside protection. Flows were mixed, but the tone leaned cautious as participants waited for clearer policy anchors.\nThe session featured a tighter link between headlines and positioning. Argentina assets reacted to the tone of policy communication and the perceived balance between FX supply and demand, while global markets remained sensitive to rates and growth surprises. Sovereign pricing moved alongside shifts in risk appetite, and local curves repriced as investors reassessed inflation dynamics and near-term funding needs. Overall, the narrative emphasized uncertainty and optionality, with a premium placed on liquidity and downside protection. Flows were mixed, but the tone leaned cautious as participants waited for clearer policy anchors.\nThe session featured a tighter link between headlines and positioning. Argentina assets reacted to the tone of policy communication and the perceived balance between FX supply and demand, while global markets remained sensitive to rates and growth surprises. Sovereign pricing moved alongside shifts in risk appetite, and local curves repriced as investors reassessed inflation dynamics and near-term funding needs. Overall, the narrative emphasized uncertainty and optionality, with a premium placed on liquidity and downside protection. 
Flows were mixed, but the tone leaned cautious as participants waited for clearer policy anchors.\nThe session featured a tighter link between headlines and positioning. Argentina assets reacted to the tone of policy communication and the perceived balance between FX supply and demand, while global markets remained sensitive to rates and growth surprises. Sovereign pricing moved alongside shifts in risk appetite, and local curves repriced as investors reassessed inflation dynamics and near-term funding needs. Overall, the narrative emphasized uncertainty and optionality, with a premium placed on liquidity and downside protection. Flows were mixed, but the tone leaned cautious as participants waited for clearer policy anchors.", "Argentina & Global Markets Wrap — 2024-01-17\n\nKey themes:\n- Argentina: FX y brecha cambiaria\n- Global: oil move\n- Rates/credit: credit risk repricing\n\nThe session featured a tighter link between headlines and positioning. Argentina assets reacted to the tone of policy communication and the perceived balance between FX supply and demand, while global markets remained sensitive to rates and growth surprises. Sovereign pricing moved alongside shifts in risk appetite, and local curves repriced as investors reassessed inflation dynamics and near-term funding needs. Overall, the narrative emphasized uncertainty and optionality, with a premium placed on liquidity and downside protection. Toward the close, traders flagged signs of stress in funding conditions and a more cautious stance on duration.\nThe session featured a tighter link between headlines and positioning. Argentina assets reacted to the tone of policy communication and the perceived balance between FX supply and demand, while global markets remained sensitive to rates and growth surprises. Sovereign pricing moved alongside shifts in risk appetite, and local curves repriced as investors reassessed inflation dynamics and near-term funding needs. 
Overall, the narrative emphasized uncertainty and optionality, with a premium placed on liquidity and downside protection. Toward the close, traders flagged signs of stress in funding conditions and a more cautious stance on duration.\nThe session featured a tighter link between headlines and positioning. Argentina assets reacted to the tone of policy communication and the perceived balance between FX supply and demand, while global markets remained sensitive to rates and growth surprises. Sovereign pricing moved alongside shifts in risk appetite, and local curves repriced as investors reassessed inflation dynamics and near-term funding needs. Overall, the narrative emphasized uncertainty and optionality, with a premium placed on liquidity and downside protection. Toward the close, traders flagged signs of stress in funding conditions and a more cautious stance on duration.\nThe session featured a tighter link between headlines and positioning. Argentina assets reacted to the tone of policy communication and the perceived balance between FX supply and demand, while global markets remained sensitive to rates and growth surprises. Sovereign pricing moved alongside shifts in risk appetite, and local curves repriced as investors reassessed inflation dynamics and near-term funding needs. Overall, the narrative emphasized uncertainty and optionality, with a premium placed on liquidity and downside protection. Toward the close, traders flagged signs of stress in funding conditions and a more cautious stance on duration.\nThe session featured a tighter link between headlines and positioning. Argentina assets reacted to the tone of policy communication and the perceived balance between FX supply and demand, while global markets remained sensitive to rates and growth surprises. Sovereign pricing moved alongside shifts in risk appetite, and local curves repriced as investors reassessed inflation dynamics and near-term funding needs. 
Overall, the narrative emphasized uncertainty and optionality, with a premium placed on liquidity and downside protection. Toward the close, traders flagged signs of stress in funding conditions and a more cautious stance on duration.\nThe session featured a tighter link between headlines and positioning. Argentina assets reacted to the tone of policy communication and the perceived balance between FX supply and demand, while global markets remained sensitive to rates and growth surprises. Sovereign pricing moved alongside shifts in risk appetite, and local curves repriced as investors reassessed inflation dynamics and near-term funding needs. Overall, the narrative emphasized uncertainty and optionality, with a premium placed on liquidity and downside protection. Toward the close, traders flagged signs of stress in funding conditions and a more cautious stance on duration.\nThe session featured a tighter link between headlines and positioning. Argentina assets reacted to the tone of policy communication and the perceived balance between FX supply and demand, while global markets remained sensitive to rates and growth surprises. Sovereign pricing moved alongside shifts in risk appetite, and local curves repriced as investors reassessed inflation dynamics and near-term funding needs. Overall, the narrative emphasized uncertainty and optionality, with a premium placed on liquidity and downside protection. Toward the close, traders flagged signs of stress in funding conditions and a more cautious stance on duration."]

# Upload training files
for meta, txt in zip(train_meta, train_texts):
    blob_name = TRAIN_PREFIX + meta["filename"]
    gcs_upload_text(bucket, blob_name, txt)

# Upload prediction files
for meta, txt in zip(pred_meta, pred_texts):
    blob_name = PRED_PREFIX + meta["filename"]
    gcs_upload_text(bucket, blob_name, txt)

print("Uploaded:")
print(" train:", len(train_meta), "files to", TRAIN_PREFIX)
print(" predict:", len(pred_meta), "files to", PRED_PREFIX)

print("GCS train files:", len(gcs_list_txt(bucket, TRAIN_PREFIX)))
print("GCS predict files:", len(gcs_list_txt(bucket, PRED_PREFIX)))


## 3) Build the training dataset from GCS

- Reads the `.txt` files under `train/news/`
- Cleans the text
- Generates a synthetic `stress_target` (demo), **replaceable** with your real index
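
To swap the synthetic target for a real one, one option is to join a daily index onto the news table by date. The sketch below is only an illustration under stated assumptions: the helper `attach_real_target`, the `real_index` frame, and its `stress_index` column are hypothetical names that do not exist in this notebook.

```python
import pandas as pd

def attach_real_target(df_news: pd.DataFrame, real_index: pd.DataFrame) -> pd.DataFrame:
    """Left-join a real index onto the news rows by calendar date.

    Hypothetical helper: assumes `df_news` has a `date` column and
    `real_index` has `date` and `stress_index` columns.
    """
    # Drop any existing (synthetic) target so the rename below cannot collide.
    left = df_news.drop(columns=["stress_target"], errors="ignore")
    right = real_index.copy()
    # Normalize both keys to midnight timestamps so the join is by day.
    left["date"] = pd.to_datetime(left["date"]).dt.normalize()
    right["date"] = pd.to_datetime(right["date"]).dt.normalize()
    merged = left.merge(right[["date", "stress_index"]], on="date", how="left")
    merged = merged.rename(columns={"stress_index": "stress_target"})
    # Rows without a published index value cannot be used for training.
    return merged.dropna(subset=["stress_target"]).reset_index(drop=True)

# Toy data for illustration only.
news = pd.DataFrame({"date": ["2024-01-10", "2024-01-11", "2024-01-12"],
                     "news_text_clean": ["a", "b", "c"]})
index = pd.DataFrame({"date": ["2024-01-10", "2024-01-11"],
                      "stress_index": [0.3, 1.7]})
labeled = attach_real_target(news, index)
print(labeled[["date", "stress_target"]])
```

Days with no index value yet (for example, because the index is published with a lag) are dropped here; they can still be scored at prediction time.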


In [None]:
def clean_text(s: str) -> str:
    s = re.sub(r"\s+", " ", s).strip()
    return s

train_files = sorted(gcs_list_txt(bucket, TRAIN_PREFIX))
rows = []
for fn in train_files:
    txt = gcs_read_text(bucket, fn)
    m = re.search(r"(\d{4}-\d{2}-\d{2})", fn)
    d = pd.to_datetime(m.group(1)) if m else pd.NaT
    rows.append({"date": d, "gcs_path": fn, "news_text": txt, "news_text_clean": clean_text(txt)})

df_train = pd.DataFrame(rows).sort_values("date").reset_index(drop=True)

# Synthetic target (demo): three days bumped to look more stressed, the rest normal
np.random.seed(42)
base = np.random.normal(0, 1, len(df_train)).cumsum() / 10
stress = (base - base.mean()) / (base.std() + 1e-12)

if len(df_train) >= 10:
    stress[3] += 1.5
    stress[4] += 1.2
    stress[7] += 1.8

df_train["stress_target"] = stress.astype(float)

df_train[["date","gcs_path","stress_target"]].head(10)


## 4) Embeddings (free/open model)

We use **Sentence-Transformers** as the embedding extractor (no paid API).

Recommended model for a quick demo:
- `sentence-transformers/all-MiniLM-L6-v2` (384 dimensions)
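
The encoding cell passes `normalize_embeddings=True`, so each embedding has unit length and cosine similarity reduces to a plain dot product. The following minimal sketch shows that property with random vectors standing in for real 384-dimensional embeddings (an assumption made so that no model download is needed); it can serve as a quick check of how similar the demo texts are to each other.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for model output: 4 documents x 384 dimensions (random, not real embeddings).
E = rng.normal(size=(4, 384))

# L2-normalize each row, which is what normalize_embeddings=True does at encode time.
E_unit = E / np.linalg.norm(E, axis=1, keepdims=True)

# For unit vectors, the matrix of pairwise dot products is the cosine-similarity matrix.
cos_sim = E_unit @ E_unit.T
print(np.round(cos_sim, 3))
```

With the real `E_train`, the same `E_train @ E_train.T` product gives pairwise similarities directly, because the rows are already normalized.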


In [None]:
from sentence_transformers import SentenceTransformer

EMB_MODEL = os.getenv("EMB_MODEL", "sentence-transformers/all-MiniLM-L6-v2")
print("Embedding model:", EMB_MODEL)

st_model = SentenceTransformer(EMB_MODEL)

E_train = st_model.encode(
    df_train["news_text_clean"].tolist(),
    show_progress_bar=True,
    convert_to_numpy=True,
    normalize_embeddings=True
)

print("E_train shape:", E_train.shape)


## 5) XGBoost training (simple) with temporal validation

- Input: embeddings
- Output: `stress_target`
- Split: temporal holdout (the last 2 days as validation), for demo purposes
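
The temporal holdout described above can be wrapped in a small function that also rejects degenerate splits. This is a sketch, not part of the notebook: `temporal_holdout` is a hypothetical helper, and it assumes the rows are already sorted by date.

```python
import numpy as np

def temporal_holdout(n_rows: int, n_val: int = 2):
    """Return (train_idx, val_idx) for rows already sorted by date.

    Hypothetical helper: the last `n_val` rows form the validation set,
    so no validation row precedes a training row in time.
    """
    if n_val <= 0 or n_val >= n_rows:
        raise ValueError("n_val must be between 1 and n_rows - 1")
    train_idx = np.arange(0, n_rows - n_val)
    val_idx = np.arange(n_rows - n_val, n_rows)
    return train_idx, val_idx

# Example with the demo size: 10 rows, last 2 held out.
demo_train_idx, demo_val_idx = temporal_holdout(10, n_val=2)
print(demo_train_idx, demo_val_idx)
```

With only two validation rows, MAE and RMSE are illustrative rather than a reliable estimate of out-of-sample error.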


In [None]:
n = len(df_train)
n_val = 2
train_idx = np.arange(0, n - n_val)
val_idx   = np.arange(n - n_val, n)

X_tr, y_tr = E_train[train_idx], df_train.loc[train_idx, "stress_target"].values
X_va, y_va = E_train[val_idx],   df_train.loc[val_idx, "stress_target"].values

model = xgb.XGBRegressor(
    n_estimators=300,
    max_depth=3,
    learning_rate=0.05,
    subsample=0.9,
    colsample_bytree=0.9,
    reg_lambda=1.0,
    random_state=42,
    objective="reg:squarederror",
)

model.fit(X_tr, y_tr)

pred_va = model.predict(X_va)
mae  = mean_absolute_error(y_va, pred_va)
rmse = mean_squared_error(y_va, pred_va) ** 0.5

print("Validation MAE:", mae)
print("Validation RMSE:", rmse)

pd.DataFrame({
    "date": df_train.loc[val_idx, "date"].dt.date.astype(str).values,
    "y_true": y_va,
    "y_pred": pred_va
})


## 6) Save artifacts (locally, and optionally upload to GCS)

The following are saved:
- XGBoost model (`xgb_model.json`)
- `metadata.json` with configuration, metrics, and the embedding model


In [None]:
from pathlib import Path

ART_LOCAL = Path("artifacts")
ART_LOCAL.mkdir(exist_ok=True, parents=True)

model_path = ART_LOCAL / "xgb_model.json"
meta_path  = ART_LOCAL / "metadata.json"

model.save_model(model_path.as_posix())

metadata = {
    "run_ts": pd.Timestamp.utcnow().isoformat(),
    "gcs_bucket": GCS_BUCKET,
    "train_prefix": TRAIN_PREFIX,
    "pred_prefix": PRED_PREFIX,
    "embedding_model": EMB_MODEL,
    "embedding_dim": int(E_train.shape[1]),
    "train_rows": int(len(df_train)),
    "val_rows": int(n_val),
    "metrics": {"MAE": float(mae), "RMSE": float(rmse)},
}

meta_path.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")

print("Saved locally:", model_path, meta_path)

UPLOAD_ARTIFACTS_TO_GCS = os.getenv("UPLOAD_ARTIFACTS_TO_GCS", "1") == "1"
if UPLOAD_ARTIFACTS_TO_GCS and GCS_BUCKET != "REEMPLAZAR_BUCKET":
    # metadata (string)
    gcs_upload_text(bucket, ART_PREFIX + "metadata.json", meta_path.read_text(encoding="utf-8"), content_type="application/json; charset=utf-8")
    # model (file)
    bucket.blob(ART_PREFIX + "xgb_model.json").upload_from_filename(model_path.as_posix(), content_type="application/json")
    print("Uploaded artifacts to:", f"gs://{GCS_BUCKET}/{ART_PREFIX}")
else:
    print("Skipping upload artifacts. Set GCS_BUCKET and UPLOAD_ARTIFACTS_TO_GCS=1 to enable.")


## 7) Prediction (reads another GCS folder)

- Reads the `.txt` files under `predict/news/`
- Generates embeddings with the same model
- Loads the trained XGBoost model
- Returns one score per news item


In [None]:
pred_files = sorted(gcs_list_txt(bucket, PRED_PREFIX))
rows = []
for fn in pred_files:
    txt = gcs_read_text(bucket, fn)
    m = re.search(r"(\d{4}-\d{2}-\d{2})", fn)
    d = pd.to_datetime(m.group(1)) if m else pd.NaT
    rows.append({"date": d, "gcs_path": fn, "news_text": txt, "news_text_clean": clean_text(txt)})

df_pred = pd.DataFrame(rows).sort_values("date").reset_index(drop=True)

E_pred = st_model.encode(
    df_pred["news_text_clean"].tolist(),
    show_progress_bar=True,
    convert_to_numpy=True,
    normalize_embeddings=True
)

model2 = xgb.XGBRegressor()
model2.load_model(model_path.as_posix())

pred_scores = model2.predict(E_pred)

out = df_pred[["date","gcs_path"]].copy()
out["stress_score_pred"] = pred_scores

out


## 8) Production notes (quick)

- Replace the synthetic `stress_target` with your real quantitative index (even if it is published with a lag).
- Pin the embedding model and version it (do not change the model without retraining).
- Schedule:
  - Monthly batch: retrain when a new lagged target arrives.
  - Daily batch: embed the text and call `predict()`.
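
One way to enforce the pinned embedding model in the daily batch is to compare the current setup against the saved `metadata.json` before predicting. The sketch below assumes the metadata keys written by this notebook (`embedding_model`, `embedding_dim`); the function `check_embedding_compat` is a hypothetical helper, not an existing API.

```python
import json

def check_embedding_compat(metadata: dict, emb_model: str, emb_dim: int) -> None:
    """Raise ValueError if the serving-time embedding setup differs from training time."""
    if metadata["embedding_model"] != emb_model:
        raise ValueError(
            f"Embedding model mismatch: trained with {metadata['embedding_model']!r}, "
            f"serving with {emb_model!r}; retrain before predicting."
        )
    if metadata["embedding_dim"] != emb_dim:
        raise ValueError(
            f"Embedding dimension mismatch: trained with {metadata['embedding_dim']}, "
            f"serving with {emb_dim}."
        )

# Illustrative metadata with the same keys this notebook writes to metadata.json.
saved = json.loads(
    '{"embedding_model": "sentence-transformers/all-MiniLM-L6-v2", "embedding_dim": 384}'
)
# Passes silently when the setup matches.
check_embedding_compat(saved, "sentence-transformers/all-MiniLM-L6-v2", 384)
```

In the notebook, the arguments would be `EMB_MODEL` and `E_pred.shape[1]`, with the metadata read from `artifacts/metadata.json`.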
