Trajectory forecasting with adaptive updating

**Main Features**
- Brent Crude, WTI Crude, Dutch TTF Gas, Henry Hub Gas
- Equinor (EQNR.OL): Open, Close, High, Low, Volume, Market Cap
- OSEBX Index: Open, Close, High, Low, Volume
- VIX (volatility index)
- Dollar Index (DXY)

**Relevant Stocks**
- **Norway**: Aker BP (AKRBP), DNO (DNO), Vår Energi (VAR), Petroleum Geo-Services (PGS), BW Offshore (BWO), Frontline (FRO)
- **US/Global**: Exxon (XOM), Chevron (CVX), Shell (SHEL), BP (BP), TotalEnergies (TTE), ConocoPhillips (COP), Occidental (OXY)

**Stock Exchanges**
- S&P 500, NASDAQ, Dow Jones
- FTSE 100, DAX, CAC 40
- Nikkei 225, Hang Seng

**Commodity Prices**
- Gold (XAU), Silver (XAG)
- Coal (API2), Uranium (UX)
- Carbon Credits (EU ETS)

**Currencies**
- USD/NOK, EUR/NOK, GBP/NOK, SEK/NOK, EUR/USD

**Economic Indicators**
- **Interest Rates**: Norway (Norges Bank), US Fed Funds, ECB, BoE, BoJ, PBoC
- **Inflation**: Norway CPI, US CPI, EU HICP
- **Unemployment**: Norway, US, EU rates
- **Analyst Targets**: Equinor consensus price targets, EPS estimates


#### Fetch Dependencies

In [1]:
import yfinance as yf
import pandas as pd
import numpy as np
from datetime import datetime, timedelta
import warnings
warnings.filterwarnings('ignore')

#### Collect Data

In [16]:
# CELL 1: COLLECT DATA

def collect_data(start_date="2021-01-01", end_date=None):
    """Collect stock data and return as DataFrame"""
    end_date = end_date or datetime.now().strftime('%Y-%m-%d')
    
    # Updated tickers based on research
    tickers = {
        # Main stock
        'EQNR.OL': 'equinor',
       
        # Energy commodities
        'BZ=F': 'brent_crude',
        'CL=F': 'wti_crude',
        'TTF=F': 'ttf_gas',
        'NG=F': 'henry_hub',
       
        # Norwegian energy stocks
        'AKRBP.OL': 'aker_bp',
        'DNO.OL': 'dno',
        'VAR.OL': 'var_energi',
        'PGS.OL': 'pgs',
        'BWO.OL': 'bw_offshore',
        'FRO.OL': 'frontline',
       
        # Global energy stocks
        'XOM': 'exxon',
        'CVX': 'chevron',
        'SHEL': 'shell',
        'BP': 'bp',
        'TTE': 'totalenergies',
        'COP': 'conocophillips',
        'OXY': 'occidental',
       
        # Indices
        'OSEBX.OL': 'osebx',
        '^GSPC': 'sp500',
        '^IXIC': 'nasdaq',
        '^DJI': 'dow_jones',
        '^FTSE': 'ftse100',
        '^GDAXI': 'dax',
        '^FCHI': 'cac40',
        '^N225': 'nikkei',
        '^HSI': 'hang_seng',
       
        # Volatility and Dollar
        '^VIX': 'vix',
        'DX-Y.NYB': 'dollar_index',
       
        # Commodities
        'GC=F': 'gold',
        'SI=F': 'silver',
       
        # Currencies
        'NOK=X': 'usd_nok',
        'EURNOK=X': 'eur_nok',
        'GBPNOK=X': 'gbp_nok',
        'SEKNOK=X': 'sek_nok',
        'EURUSD=X': 'eur_usd'
    }
    
    all_data = {}
    
    # Download each ticker separately to avoid alignment issues
    for ticker, name in tickers.items():
        try:
            print(f"Downloading {name}...")
            # Download individually
            data = yf.download(ticker, start=start_date, end=end_date, progress=False)
            
            if len(data) > 0:
                # Only keep OHLC and Volume columns
                cols_to_keep = ['Open', 'High', 'Low', 'Close', 'Volume']
                data = data[[c for c in cols_to_keep if c in data.columns]]
                # Handle column renaming - columns might be strings or tuples
                new_cols = []
                for col in data.columns:
                    if isinstance(col, tuple):
                        col_name = col[0] if len(col) > 0 else str(col)
                    else:
                        col_name = str(col)
                    new_cols.append(f"{name}_{col_name.lower()}")
                data.columns = new_cols
                all_data[name] = data
                print(f"  ✓ {name}: {len(data)} rows")
            else:
                print(f"  ✗ No data received for {ticker}")
                
        except Exception as e:
            print(f"  ✗ Error fetching {ticker}: {e}")
    
    if not all_data:
        print("No data collected")
        return pd.DataFrame()
    
    # Combine using outer join to keep all dates
    df = pd.concat(all_data.values(), axis=1, join='outer')
    print(f"Combined data: {len(df)} rows, {len(df.columns)} columns")
    
    # Check initial NaN percentage
    nan_pct = df.isnull().sum().sum() / df.size * 100
    print(f"Initial NaN percentage: {nan_pct:.2f}%")
    
    # Keep only dates where Equinor traded (removes weekends/holidays)
    if 'equinor_close' in df.columns:
        before_filter = len(df)
        df = df[df['equinor_close'].notna()]
        print(f"Filtered to Equinor trading days: {before_filter} → {len(df)} rows")
    
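    # Forward-fill interior gaps, then back-fill any leading gaps.
    # Caution: bfill copies the first observed value of a ticker whose
    # history starts later into all earlier dates, which is look-ahead
    # information in a forecasting setting.
    df = df.ffill().bfill()
    
    # Trim to the span between the first and last valid index.
    # After ffill().bfill() this is normally a no-op; it is kept as a safeguard.
    first_valid_idx = df.first_valid_index()
    last_valid_idx = df.last_valid_index()
    if first_valid_idx is not None and last_valid_idx is not None:
        df = df.loc[first_valid_idx:last_valid_idx]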
    
    # Final check for NaN percentage
    nan_count = df.isnull().sum().sum()
    if nan_count > 0:
        nan_pct_final = nan_count / df.size * 100
        print(f"Warning: {nan_count} NaN values remain ({nan_pct_final:.2f}%)")
        # Show which columns have NaNs
        nan_cols = df.columns[df.isnull().any()].tolist()
        if nan_cols:
            print(f"  Columns with NaNs: {nan_cols}")
    
    print(f"\nFinal data: {len(df)} rows, {len(df.columns)} columns")
    if len(df) > 0:
        print(f"Date range: {df.index[0].date()} to {df.index[-1].date()}")
        nan_pct_final = df.isnull().sum().sum() / df.size * 100
        print(f"Final NaN percentage: {nan_pct_final:.2f}%")
    else:
        print("WARNING: No data remaining after processing")
    
    return df
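
The back-fill step in `collect_data` writes later values into earlier dates for any series with a shorter history (in the run below, `var_energi` returns 893 rows and `ttf_gas` 1980, against 2683 for `equinor`). The following is a minimal sketch of one possible alternative, assuming you would rather keep leading gaps as NaN and record where each column starts. The helper name `fill_without_lookahead` and the toy frame are hypothetical and not part of the notebook.

```python
import numpy as np
import pandas as pd

def fill_without_lookahead(df):
    """Forward-fill interior gaps only; leading NaNs are left untouched.

    Returns the filled frame and a Series holding, per column, the index
    label of its first valid observation.
    """
    filled = df.ffill()
    first_valid = df.apply(lambda col: col.first_valid_index())
    return filled, first_valid

# Toy data: b_close starts two days later than a_close (illustrative only)
idx = pd.date_range("2024-01-01", periods=5, freq="D")
toy = pd.DataFrame(
    {
        "a_close": [1.0, np.nan, 3.0, 4.0, 5.0],
        "b_close": [np.nan, np.nan, 10.0, np.nan, 12.0],
    },
    index=idx,
)

filled, first_valid = fill_without_lookahead(toy)
print(filled)
print(first_valid)
```

With this approach the late-starting columns still contain NaNs, so a downstream step has to either start the sample at the latest `first_valid` date or drop those columns. Which choice is appropriate depends on how much history the model needs.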

In [12]:
# Run collection
data = collect_data(start_date="2015-01-01")

Downloading equinor...
  ✓ equinor: 2683 rows
Downloading brent_crude...
  ✓ brent_crude: 2686 rows
Downloading wti_crude...
  ✓ wti_crude: 2685 rows
Downloading ttf_gas...
  ✓ ttf_gas: 1980 rows
Downloading henry_hub...
  ✓ henry_hub: 2686 rows
Downloading aker_bp...
  ✓ aker_bp: 2683 rows
Downloading dno...
  ✓ dno: 2683 rows
Downloading var_energi...
  ✓ var_energi: 893 rows
Downloading pgs...
  ✓ pgs: 2400 rows
Downloading bw_offshore...
  ✓ bw_offshore: 2683 rows
Downloading frontline...
  ✓ frontline: 2683 rows
Downloading exxon...
  ✓ exxon: 2685 rows
Downloading chevron...
  ✓ chevron: 2685 rows
Downloading shell...
  ✓ shell: 2685 rows
Downloading bp...
  ✓ bp: 2685 rows
Downloading totalenergies...
  ✓ totalenergies: 2685 rows
Downloading conocophillips...
  ✓ conocophillips: 2685 rows
Downloading occidental...
  ✓ occidental: 2685 rows
Downloading osebx...
  ✓ osebx: 2668 rows
Downloading sp500...
  ✓ sp500: 2685 rows
Downloading nasdaq...
  ✓ nasdaq: 2685 rows
Downloading d

#### Print Data

In [13]:
# CELL 2: PRINT HEAD OF DATA
print(f"Shape: {data.shape}")
print(f"Date range: {data.index.min().date()} to {data.index.max().date()}")
print(f"Columns: {len(data.columns)}")
print(f"Remaining NaNs: {data.isnull().sum().sum()}")
print(f"NaN percentage: {data.isnull().sum().sum() / data.size * 100:.2f}%")
data.head()

Shape: (2683, 180)
Date range: 2015-01-02 to 2025-09-05
Columns: 180
Remaining NaNs: 0
NaN percentage: 0.00%


| Date | equinor_open | equinor_high | equinor_low | equinor_close | equinor_volume | brent_crude_open | brent_crude_high | brent_crude_low | brent_crude_close | brent_crude_volume | ... | sek_nok_open | sek_nok_high | sek_nok_low | sek_nok_close | sek_nok_volume | eur_usd_open | eur_usd_high | eur_usd_low | eur_usd_close | eur_usd_volume |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 2015-01-02 | 86.868472 | 86.999395 | 84.380903 | 85.166458 | 3926948.0 | 57.630001 | 58.220001 | 55.52 | 56.419998 | 16707.0 | ... | 0.9494 | 0.96025 | 0.94248 | 0.94682 | 0.0 | 1.208868 | 1.208956 | 1.20108 | 1.208941 | 0.0 |
| 2015-01-05 | 85.428317 | 85.952018 | 82.155202 | 82.482513 | 6601413.0 | 56.290001 | 56.290001 | 52.669998 | 53.110001 | 30065.0 | ... | 0.95566 | 0.96413 | 0.95356 | 0.9552 | 0.0 | 1.1955 | 1.19759 | 1.188909 | 1.194643 | 0.0 |
| 2015-01-06 | 81.696961 | 83.660831 | 80.845949 | 83.333519 | 6226931.0 | 53.23 | 53.52 | 50.529999 | 51.099998 | 35494.0 | ... | 0.96209 | 0.9778 | 0.959 | 0.96274 | 0.0 | 1.19383 | 1.197 | 1.188693 | 1.193902 | 0.0 |
| 2015-01-07 | 81.762422 | 85.100999 | 81.304183 | 84.119064 | 8661067.0 | 51.060001 | 51.84 | 49.68 | 51.150002 | 37082.0 | ... | 0.97618 | 0.98658 | 0.9671 | 0.97493 | 0.0 | 1.187479 | 1.19 | 1.180401 | 1.187536 | 0.0 |
| 2015-01-08 | 85.297376 | 85.886533 | 83.72628 | 84.839142 | 7062883.0 | 51.0 | 51.889999 | 49.82 | 50.959999 | 29469.0 | ... | 0.96514 | 0.96777 | 0.94491 | 0.96572 | 0.0 | 1.183894 | 1.184806 | 1.175601 | 1.1836 | 0.0 |
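
In the preview, `sek_nok_volume` and `eur_usd_volume` are 0.0 in every displayed row. Only five rows are shown, so this may not hold for the full history, but it is worth checking whether some columns never vary. A small sketch for that check follows, assuming a column with a single distinct value carries no signal; the helper name `constant_columns` and the toy frame are hypothetical.

```python
import pandas as pd

def constant_columns(df):
    """Return names of columns with at most one distinct non-null value."""
    return [c for c in df.columns if df[c].nunique(dropna=True) <= 1]

# Toy frame mirroring the shape of the preview (values are illustrative)
toy = pd.DataFrame(
    {
        "eur_usd_close": [1.2089, 1.1946, 1.1939],
        "eur_usd_volume": [0.0, 0.0, 0.0],
    }
)

flagged = constant_columns(toy)
print(flagged)
```

On the real matrix this would be called as `constant_columns(data)`, and the flagged columns could then be inspected or dropped before modelling.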


#### Save Matrix to CSV

In [None]:
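# CELL 3: SAVE TO CSV
from pathlib import Path
filepath = "data/equinor_data.csv"
# Create the output directory if it does not exist yet
Path(filepath).parent.mkdir(parents=True, exist_ok=True)
data.to_csv(filepath)
print(f"Saved {len(data)} rows to {filepath}")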

Saved 2683 rows to data/equinor_data.csv
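
When the saved file is read back in a later session, the date index has to be restored explicitly, because `to_csv` stores it as an ordinary first column. A minimal round-trip sketch is shown below, assuming the index is the first column as written by `data.to_csv(filepath)`. The helper name `load_saved` and the toy values are hypothetical.

```python
import os
import tempfile

import pandas as pd

def load_saved(filepath):
    """Read the CSV back, using the first column as a DatetimeIndex."""
    return pd.read_csv(filepath, index_col=0, parse_dates=True)

# Toy round trip on three business days (values are illustrative)
idx = pd.date_range("2015-01-02", periods=3, freq="B", name="Date")
toy = pd.DataFrame({"equinor_close": [85.17, 82.48, 83.33]}, index=idx)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.csv")
    toy.to_csv(path)
    restored = load_saved(path)

print(restored.index.dtype)
print(restored.head())
```

For the file produced above, the equivalent call would be `load_saved("data/equinor_data.csv")`.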
