## 2026 EY AI & Data Challenge - Landsat Data Extraction Notebook

This notebook demonstrates Landsat data extraction and the creation of an output file to be used by the benchmark notebook. The baseline data is [Landsat Collection 2 Level 2](https://planetarycomputer.microsoft.com/dataset/landsat-c2-l2) data from the MS Planetary Computer catalog.

**Caution:** This notebook requires significant execution time because 9,319 data points (unique locations and times) are extracted from the Landsat archive. The code takes about 7 hours to run to completion on a typical laptop with a typical internet connection. Shorter run times are likely possible by optimizing the extraction process or by using cloud computing services.


### Load In Dependencies
The following code installs the required Python libraries (listed in the requirements.txt file) in the Snowflake environment so that the rest of the notebook can run. After running this code for the first time, you must “restart” the kernel so the Python libraries become available in the environment. To do this, open the “Connected” menu above the notebook (next to “Run all”) and select the “restart kernel” link. Subsequent runs of the notebook do not require this “restart” step.

In [None]:
!pip install uv
!uv pip install  -r requirements.txt 

In [1]:
import snowflake
from snowflake.snowpark.context import get_active_session
session = get_active_session()

import warnings
warnings.filterwarnings("ignore")

# Data manipulation and analysis
import numpy as np
import pandas as pd

# Planetary Computer tools for STAC API access and authentication
import pystac_client
import planetary_computer as pc
from odc.stac import stac_load
from pystac.extensions.eo import EOExtension as eo

from datetime import date
from tqdm import tqdm
import os
import time
import re

tqdm.pandas()  

### Extracting Landsat Data Using API Calls

The API-based method allows us to efficiently access **Landsat** data for specific coordinates and time periods, ensuring scalability and reproducibility of the process.

Through the API, we can query individual bands or compute indices like **NDMI** on the fly. This approach reduces storage requirements and simplifies data preprocessing, making it ideal for large-scale environmental and water quality analysis.

The **compute_Landsat_raw** function defined below extracts Landsat values for each sampling location using a ~100 m buffer around the point. For each location:

- A bounding box (bbox) is created around the latitude and longitude coordinates.
- The Microsoft Planetary Computer API is queried for Landsat Collection 2 Level-2 imagery with less than 10% cloud cover that intersects the bounding box, within the fixed search window 2011-01-01 to 2015-12-31.
- The scene closest in time to the sample date is selected, and the available bands (**blue**, **green**, **red**, **nir08**, **swir16**, **swir22**, and the thermal band **lwir11**) are loaded.
- The raw pixel values inside the bounding box are stored per band. Median values are computed from them in a later step to reduce the effect of noise and outliers.
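
The scene-selection step in the list above can be sketched without any network access. The snippet below is a minimal illustration, assuming each candidate scene is reduced to a hypothetical `(scene_id, acquisition_time, cloud_cover)` tuple. `pick_nearest_scene` is an illustrative helper, not a function from this notebook or from `pystac_client`.

```python
import pandas as pd

def pick_nearest_scene(scenes, sample_date, max_cloud=10.0):
    """Return the id of the low-cloud scene closest in time to sample_date.

    scenes: iterable of (scene_id, acquisition_time, cloud_cover) tuples,
            a hypothetical stand-in for STAC items.
    """
    target = pd.Timestamp(sample_date, tz="UTC")
    # Keep only scenes below the cloud-cover threshold
    candidates = [
        (sid, pd.Timestamp(ts), cc) for sid, ts, cc in scenes if cc < max_cloud
    ]
    if not candidates:
        return None
    # Choose the scene with the smallest absolute time difference
    best = min(candidates, key=lambda s: abs(s[1] - target))
    return best[0]

# Made-up scenes for illustration only
scenes = [
    ("scene_a", "2013-05-01T08:00:00Z", 3.2),
    ("scene_b", "2013-06-10T08:00:00Z", 45.0),  # too cloudy, filtered out
    ("scene_c", "2013-06-20T08:00:00Z", 7.5),
]
nearest = pick_nearest_scene(scenes, "2013-06-12")
print(nearest)
```

Here `scene_b` is closest in time but is rejected by the cloud filter, so the helper falls back to the next-nearest clear scene.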

**Why the buffer value is 0.00089831**

We want a ~100 m buffer around each point.  
At the equator, 1 degree ≈ 111.32 km.

Therefore, the degree equivalent of 100 m is:

*buffer_deg ≈ 100 m / 111,320 m per degree ≈ 0.00089831*

The code applies this buffer on each side of the point, so the bounding box is roughly 200 m across and covers several 30 m Landsat pixels around each sampling location.
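
The buffer arithmetic can be checked directly. The sketch below assumes about 111,320 m per degree at the equator, which is the figure that reproduces 0.00089831. `bbox_around` is a hypothetical helper, and the coordinates are arbitrary illustrative values.

```python
# Assumption: approximate length of one degree at the equator, in metres
METERS_PER_DEGREE = 111_320

def bbox_around(lat, lon, buffer_m=100.0):
    """Return [min_lon, min_lat, max_lon, max_lat] for a square buffer in degrees."""
    buffer_deg = buffer_m / METERS_PER_DEGREE
    return [lon - buffer_deg, lat - buffer_deg, lon + buffer_deg, lat + buffer_deg]

buffer_deg = 100.0 / METERS_PER_DEGREE
print(round(buffer_deg, 8))  # 0.00089831

# Arbitrary example point (not taken from the dataset)
bbox = bbox_around(lat=-30.0, lon=25.0)
print(bbox)
```

A degree of longitude shrinks with the cosine of the latitude, so away from the equator the same degree offset covers less than 100 m east to west.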


### Extracting features for the training dataset

In [3]:
Water_Quality_df=pd.read_csv('water_quality_training_dataset.csv')
display(Water_Quality_df.head())

In [4]:
Water_Quality_df.shape

### Note

The Landsat data extraction process for all 9,319 locations typically requires more than 7 hours when executed in a single run. During long executions, you may occasionally encounter API limits, timeout errors, or request failures. To avoid these interruptions, we recommend running the extraction in smaller batches.

In this notebook, we provide a sample code snippet demonstrating how to extract data for the first 200 locations. Participants are encouraged to follow the same batching approach to extract data for all 9,319 locations safely and efficiently.

We have already executed the full extraction for all 9,319 locations and saved the output to **landsat_features_training.csv**, which will be used in the benchmark notebook.  
Similarly, participants can extract Landsat features in batches, combine the batch outputs, and save the final merged dataset as **landsat_features_training.csv** to ensure the benchmark notebook runs smoothly.
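
The batching approach described in this note can be sketched as follows. This is a minimal illustration under stated assumptions: `batch_ranges` and `pending_batches` are hypothetical helpers, the batch size of 200 mirrors the 200-location sample mentioned above, and the set of finished batch indices would in practice come from whatever batch outputs already exist on disk.

```python
def batch_ranges(n_rows, batch_size):
    """Yield (batch_index, start, end) tuples, with end exclusive."""
    n_batches = (n_rows + batch_size - 1) // batch_size  # ceiling division
    for b in range(n_batches):
        yield b, b * batch_size, min((b + 1) * batch_size, n_rows)

def pending_batches(n_rows, batch_size, finished):
    """Return the batches whose index is not in the `finished` set."""
    return [t for t in batch_ranges(n_rows, batch_size) if t[0] not in finished]

# Resume a 9,319-row extraction after batches 0 and 1 have completed
todo = pending_batches(n_rows=9319, batch_size=200, finished={0, 1})
print(len(todo), todo[0], todo[-1])
```

Skipping batches that have already finished means a timeout or API failure costs at most one batch of work rather than the whole run.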


### Fetching raw Landsat pixel data via the API (modified to add infrared spectral information)

In [None]:
catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=pc.sign_inplace,
)

def pick_first(keys, asset_keys):
    for k in keys:
        if k in asset_keys:
            return k
    return None

def compute_Landsat_raw(row):
    cols = ["blue", "green", "red", "nir", "swir16", "swir22", "st_b10"]

    try:
        lat, lon = row["Latitude"], row["Longitude"]
        sample_date = pd.to_datetime(row["Sample Date"], dayfirst=True, errors="coerce")
        if pd.isna(sample_date):
            return pd.Series({k: None for k in cols})

        sample_date_utc = (
            sample_date.tz_localize("UTC")
            if sample_date.tzinfo is None
            else sample_date.tz_convert("UTC")
        )

        bbox_size = 0.00089831 * 2
        bbox = [lon - bbox_size/2, lat - bbox_size/2, lon + bbox_size/2, lat + bbox_size/2]

        search = catalog.search(
            collections=["landsat-c2-l2"],
            bbox=bbox,
            datetime="2011-01-01/2015-12-31",
            query={"eo:cloud_cover": {"lt": 10}},
        )
        items = search.item_collection()
        if not items:
            return pd.Series({k: None for k in cols})

        selected_item = min(
            items,
            key=lambda x: abs(pd.to_datetime(x.properties["datetime"]).tz_convert("UTC") - sample_date_utc),
        )

        asset_keys = selected_item.assets.keys()

        nir_key = pick_first(["nir08", "nir"], asset_keys)
        thermal_key = pick_first(["lwir11", "lwir"], asset_keys)

        available_bands = [k for k in ["blue", "green", "red", "swir16", "swir22", nir_key, thermal_key] if k]

        data = stac_load([selected_item], bands=available_bands, bbox=bbox, chunks={}).isel(time=0)

        def get_val(band_name):
            if band_name is None or band_name not in data:
                return None
            return data[band_name].values.flatten().tolist()

        return pd.Series({
            "blue": get_val("blue"),
            "green": get_val("green"),
            "red": get_val("red"),
            "nir": get_val(nir_key),
            "swir16": get_val("swir16"),
            "swir22": get_val("swir22"),
            "st_b10": get_val(thermal_key),  # before 2012 this is mostly the thermal band corresponding to ST_B6
        })

    except Exception:
        return pd.Series({k: None for k in cols})


def run_landsat_raw_in_batches(df, out_dir, batch_size=100, start_batch=0, pause_every=5, pause_seconds=2):
    os.makedirs(out_dir, exist_ok=True)
    points = df[['Latitude', 'Longitude', 'Sample Date']].drop_duplicates().reset_index(drop=True)
    n = len(points)
    n_batches = (n + batch_size - 1) // batch_size
    
    print(f"Total points: {n}, Batches: {n_batches}")

    for b in range(start_batch, n_batches):
        out_path = f"{out_dir}/landsat_raw_batch_{b:05d}.parquet"
        s, e = b * batch_size, min((b + 1) * batch_size, n)
        
        print(f"🚀 Processing batch {b}/{n_batches-1} (rows {s}..{e-1})")
        
        batch_points = points.iloc[s:e].reset_index(drop=True)
        batch_raw = batch_points.progress_apply(compute_Landsat_raw, axis=1)
        
        pd.concat([batch_points, batch_raw], axis=1).to_parquet(out_path, index=False)
        
        if pause_every and (b - start_batch + 1) % pause_every == 0:
            time.sleep(pause_seconds)

    print("🎉 Done.")

def merge_landsat_batches(out_dir, merged_path):
    files = sorted([os.path.join(out_dir, f) for f in os.listdir(out_dir) if re.match(r"^landsat_raw_batch_\d{5}\.parquet$", f)])
    df_all = pd.concat([pd.read_parquet(f) for f in files], ignore_index=True)
    df_all.to_parquet(merged_path, index=False)
    print(f"✅ Merged {df_all.shape}")
    return df_all

In [None]:
run_landsat_raw_in_batches(
    df=Water_Quality_df,
    out_dir="landsat_raw_batches",
    batch_size=100,
    start_batch=0 
    )
    
merge_landsat_batches(
    out_dir="landsat_raw_batches",
    merged_path="landsat_raw_all.parquet"
)

### Save the raw Landsat file and transfer it to GitHub

In [None]:
landsat_raw_all = merge_landsat_batches(
    out_dir="landsat_raw_batches",
    merged_path="/tmp/landsat_raw_all.parquet"
)

In [None]:
session.sql("""
    PUT file:///tmp/landsat_raw_all.parquet
    'snow://workspace/USER$.PUBLIC."EY-AI-and-Data-Challenge"/versions/live/'
    AUTO_COMPRESS=FALSE
    OVERWRITE=TRUE
""").collect()

print("File saved! Refresh the browser to see the files in the sidebar")

### Clean up all temporary batch files from the API-based raw Landsat extraction

In [None]:
import os

def clear_landsat_batch_dir(out_dir="landsat_raw_batches"):
    if not os.path.exists(out_dir):
        print(f"📂 Directory does not exist: {out_dir}")
        return

    files = os.listdir(out_dir)

    if not files:
        print(f"📂 Directory already empty: {out_dir}")
        return

    for f in files:
        path = os.path.join(out_dir, f)
        if os.path.isfile(path):
            os.remove(path)

    print(f"🧹 Cleared all files in: {out_dir}")
    
clear_landsat_batch_dir("landsat_raw_batches")

### Compute features from the raw file, using the median as an example

In [None]:
# --------------------------
# 1) Basics: normalize each cell into a 1D float array
# --------------------------
def parse_pixel_array(val):
    """
    Normalize the contents of a cell into a 1D numpy array (float).
    Supports: None / scalar / list / np.ndarray / string like "[1,2,3]"
    """
    if val is None:
        return np.array([], dtype=float)

    # string like "[1,2,3]" or "1 2 3"
    # (checked before the scalar branch because np.isscalar() is True for str)
    if isinstance(val, str):
        s = val.strip().strip("[]'\"").replace(",", " ")
        s = " ".join(s.split())  # collapse repeated whitespace
        if not s:
            return np.array([], dtype=float)
        return np.fromstring(s, sep=" ", dtype=float)

    # scalar (including np.nan)
    if np.isscalar(val):
        if pd.isna(val):
            return np.array([], dtype=float)
        return np.array([float(val)], dtype=float)

    # list/ndarray
    return np.asarray(val, dtype=float).ravel()


# --------------------------
# 2) Median (a zero median is treated as missing)
# --------------------------
def clean_pixel_median(val, zero_as_nan=True):
    """
    Process the pixel data of a single cell: compute the median and,
    optionally, return NaN when that median is 0.
    """
    try:
        arr = parse_pixel_array(val)
        if arr.size == 0:
            return np.nan

        med = float(np.nanmedian(arr))
        if zero_as_nan and med == 0:
            return np.nan
        return med
    except Exception:
        return np.nan


# --------------------------
# 3) Spectral indices (from the med_ bands)
# --------------------------
def calculate_spectral_indices(df, prefix="med_", eps=1e-10):
    """
    Compute remote-sensing indices from the median bands (written back to df in place).
    Expected columns: med_blue, med_green, med_red, med_nir, med_swir16
    """
    # If a column is missing, produce a NaN column instead of raising an error
    def col(name):
        return df[name] if name in df.columns else np.nan

    b   = col(f"{prefix}blue")
    g   = col(f"{prefix}green")
    r   = col(f"{prefix}red")
    n   = col(f"{prefix}nir")
    s16 = col(f"{prefix}swir16")

    df[f"{prefix}NDMI"]  = (n - s16) / (n + s16 + eps)
    df[f"{prefix}MNDWI"] = (g - s16) / (g + s16 + eps)
    df[f"{prefix}NDVI"]  = (n - r)   / (n + r   + eps)

    df[f"{prefix}NDTI"]  = (r - g)   / (r + g   + eps)
    df[f"{prefix}NDBI"]  = (s16 - n) / (s16 + n + eps)

    df[f"{prefix}UrbanScore"] = df[f"{prefix}NDBI"] - df[f"{prefix}NDVI"]

    # EVI (common formula)
    df[f"{prefix}EVI"] = 2.5 * (n - r) / (n + 6*r - 7.5*b + 1 + eps)

    return df


# --------------------------
# 4) Safe correlation coefficient
# --------------------------
def safe_corr(a, b):
    """
    Compute corr(a, b), handling unequal lengths, NaN values, constant vectors, and similar cases.
    """
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()

    m = min(a.size, b.size)
    if m < 2:
        return np.nan

    a = a[:m]
    b = b[:m]

    mask = np.isfinite(a) & np.isfinite(b)
    a = a[mask]
    b = b[mask]
    if a.size < 2:
        return np.nan
    if np.std(a) == 0 or np.std(b) == 0:
        return np.nan

    return float(np.corrcoef(a, b)[0, 1])


# --------------------------
# 5) Pixel-level features
# --------------------------
def compute_pixel_level_features(
    row,
    eps=1e-10,
    ndvi_thr=0.3,
    ndbi_thr=0.0,
    mndwi_thr=0.0,
    ndti_thr=0.2,
    hot_q=0.75
):
    """
    Compute ratio / corr / LST stats from the pixel arrays in raw_df.
    """

    # --- Extract arrays ---
    b   = parse_pixel_array(row.get("blue"))
    g   = parse_pixel_array(row.get("green"))
    r   = parse_pixel_array(row.get("red"))
    n   = parse_pixel_array(row.get("nir"))
    s16 = parse_pixel_array(row.get("swir16"))
    lst = parse_pixel_array(row.get("st_b10"))  # LST / thermal infrared band pixel array

    # --- Alignment: truncate to the shortest length ---
    def align_min(*arrs):
        sizes = [a.size for a in arrs]
        m = min(sizes) if sizes else 0
        if m <= 0:
            return [np.array([], dtype=float) for _ in arrs]
        return [a[:m] for a in arrs]

    # median_LST (a median of 0 is treated as invalid)
    if lst.size:
        median_LST = float(np.nanmedian(lst))
        if median_LST == 0:
            median_LST = np.nan
    else:
        median_LST = np.nan

    # --- pixel-level indices (align first, then compute, to avoid broadcasting problems) ---
    n_a, r_a = align_min(n, r)
    ndvi = (n_a - r_a) / (n_a + r_a + eps)

    g_a, s16_a = align_min(g, s16)
    mndwi = (g_a - s16_a) / (g_a + s16_a + eps)

    s16_b, n_b = align_min(s16, n)
    ndbi = (s16_b - n_b) / (s16_b + n_b + eps)

    r2_a, g2_a = align_min(r, g)
    ndti = (r2_a - g2_a) / (r2_a + g2_a + eps)

    # --- ratios (mean of a boolean array = proportion) ---
    water_ratio = float(np.nanmean(mndwi > mndwi_thr)) if mndwi.size else np.nan
    veg_ratio   = float(np.nanmean(ndvi  > ndvi_thr))  if ndvi.size else np.nan
    bare_ratio  = float(np.nanmean(ndti  > ndti_thr))  if ndti.size else np.nan

    # urban_ratio: requires ndvi and ndbi to have the same length
    ndbi_a, ndvi_a = align_min(ndbi, ndvi)
    urban_ratio = (
        float(np.nanmean((ndbi_a > ndbi_thr) & (ndvi_a < ndvi_thr)))
        if ndbi_a.size else np.nan
    )

    # --- hot_ratio: share of this row's LST pixels above its own hot_q quantile ---
    if lst.size:
        q = np.nanquantile(lst, hot_q)
        hot_ratio = float(np.nanmean(lst > q)) if np.isfinite(q) else np.nan
    else:
        hot_ratio = np.nan

    # --- correlations with LST ---
    corr_ndvi_lst  = safe_corr(ndvi,  lst)
    corr_ndbi_lst  = safe_corr(ndbi,  lst)
    corr_mndwi_lst = safe_corr(mndwi, lst)

    return pd.Series({
        "median_LST": median_LST,
        "urban_ratio": urban_ratio,
        "water_ratio": water_ratio,
        "veg_ratio": veg_ratio,
        "bare_ratio": bare_ratio,
        "hot_ratio": hot_ratio,
        "corr_NDVI_LST": corr_ndvi_lst,
        "corr_NDBI_LST": corr_ndbi_lst,
        "corr_MNDWI_LST": corr_mndwi_lst,
    })


# --------------------------
# 6) Main pipeline
# --------------------------
def process_landsat_pipeline(raw_df, band_cols=None, prefix="med_"):
    """
    Main processing flow:
    (1) Compute the median of each band -> med_*
    (2) Compute spectral indices from med_*
    (3) Compute pixel-level features from the pixel arrays
    """
    if band_cols is None:
        band_cols = ["blue", "green", "red", "nir", "swir16", "swir22", "st_b10"]

    # Base columns
    base_cols = [c for c in ["Latitude", "Longitude", "Sample Date"] if c in raw_df.columns]
    features_df = raw_df[base_cols].copy()

    print(f"--- 1) Computing per-band medians ({len(band_cols)} bands) ---")
    for col in band_cols:
        if col in raw_df.columns:
            features_df[f"{prefix}{col}"] = raw_df[col].apply(clean_pixel_median)
            print(f"   Done: {prefix}{col}")
        else:
            # Add the column even when it is missing, to avoid a KeyError when computing indices later
            features_df[f"{prefix}{col}"] = np.nan
            print(f"   Missing col -> fill NaN: {prefix}{col}")

    print(f"--- 2) Computing spectral indices (from {prefix} bands) ---")
    features_df = calculate_spectral_indices(features_df, prefix=prefix)

    print(f"--- 3) Computing pixel-level features (ratios / corr / hot_ratio) ---")
    pixel_feat_df = raw_df.apply(compute_pixel_level_features, axis=1)
    features_df = pd.concat([features_df, pixel_feat_df], axis=1)

    print("--- Processing complete ---")
    return features_df

In [None]:
# ==========================================
# Main Execution
# ==========================================

# 1. Load the data
path = "landsat_raw_all.parquet"
landsat_raw = pd.read_parquet(path)


# 2. Run the pipeline
landsat_train_features = process_landsat_pipeline(landsat_raw)

# 3. Save the results
output_path = "/tmp/landsat_features_training_jin3.csv"
landsat_train_features.to_csv(output_path, index=False)
print(f"Results saved to: {output_path}")

# 4) Upload to the Snowflake stage
session.sql(f"""
    PUT file://{output_path}
    'snow://workspace/USER$.PUBLIC."EY-AI-and-Data-Challenge-2"/versions/live/'
    AUTO_COMPRESS=FALSE
    OVERWRITE=TRUE
""").collect()

print("File saved and uploaded! Refresh the browser to see the files in the sidebar.")

# Preview the results
landsat_train_features.head()


**NDMI and MNDWI Indices**

In this notebook, we compute two commonly used water-related indices from the extracted Landsat bands:

- **NDMI (Normalized Difference Moisture Index):**  
  Measures vegetation water content and surface moisture.  
  Computed as *(NIR - SWIR16) / (NIR + SWIR16)*.

- **MNDWI (Modified Normalized Difference Water Index):**  
  Highlights open water features by enhancing water reflectance and suppressing built-up areas.  
  Computed as *(Green - SWIR16) / (Green + SWIR16)*.

An **epsilon value** (*eps = 1e-10*) is added to the denominators to avoid division by zero.  
These indices are widely used in hydrological and water quality analyses for detecting water presence and vegetation moisture levels.
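
As a worked illustration of the two formulas, the sketch below applies them to made-up reflectance values. `normalized_difference` and `dn_to_reflectance` are hypothetical helpers, not functions from this notebook. The scale factor 0.0000275 and offset -0.2 are the values USGS documents for Landsat Collection 2 Level-2 surface reflectance. Whether to apply them before computing indices is a choice this notebook leaves open.

```python
import numpy as np

def normalized_difference(a, b, eps=1e-10):
    """Compute (a - b) / (a + b + eps) element-wise."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return (a - b) / (a + b + eps)

def dn_to_reflectance(dn):
    """Convert Collection 2 Level-2 digital numbers to surface reflectance."""
    return np.asarray(dn, dtype=float) * 0.0000275 - 0.2

# Illustrative reflectance values (made up, not real measurements)
green = np.array([0.10, 0.05])
nir = np.array([0.30, 0.02])
swir16 = np.array([0.20, 0.01])

ndmi = normalized_difference(nir, swir16)     # (NIR - SWIR16) / (NIR + SWIR16)
mndwi = normalized_difference(green, swir16)  # (Green - SWIR16) / (Green + SWIR16)
print(ndmi, mndwi)
```

In this toy data the second pixel has a strongly positive MNDWI, the pattern expected over open water, while the first pixel has a negative MNDWI and a positive NDMI, which is more typical of moist vegetation.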


**Note:** If you're using your own workspace, replace the workspace name in the `PUT` file path (shown above as "EY-AI-and-Data-Challenge" or "EY-AI-and-Data-Challenge-2") with your own workspace name.
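
One way to keep the workspace name in a single place is to build the `PUT` statement from a variable. The sketch below is only string formatting. `build_put_sql` is a hypothetical helper, `"My-Workspace"` is a placeholder, and the `USER$.PUBLIC` prefix is copied from the cells in this notebook.

```python
def build_put_sql(local_path, workspace, schema_path="USER$.PUBLIC"):
    """Assemble a PUT statement like the ones used in this notebook."""
    stage = f'snow://workspace/{schema_path}."{workspace}"/versions/live/'
    return (
        f"PUT file://{local_path}\n"
        f"    '{stage}'\n"
        "    AUTO_COMPRESS=FALSE\n"
        "    OVERWRITE=TRUE"
    )

# "My-Workspace" is a placeholder for your own workspace name
sql = build_put_sql("/tmp/landsat_features_training.csv", "My-Workspace")
print(sql)
```

The resulting string can then be passed to `session.sql(sql).collect()` in the same way as the upload cells in this notebook.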

### Extracting features for the validation dataset

In [11]:
Validation_df=pd.read_csv('submission_template.csv')
display(Validation_df.head())

In [12]:
Validation_df.shape


In [None]:
# -----------------------------
# Run
# -----------------------------
run_landsat_raw_in_batches(
        df=Validation_df,
        out_dir="landsat_validation_raw_batches",
        batch_size=100,
        start_batch=0 
    )
    
merge_landsat_batches(
        out_dir="landsat_validation_raw_batches",
        merged_path="landsat_validation_raw_all.parquet"
    )

In [None]:
merge_landsat_batches(
        out_dir="landsat_validation_raw_batches",
        merged_path="/tmp/landsat_validation_raw_all.parquet"
    )
session.sql("""
    PUT file:///tmp/landsat_validation_raw_all.parquet
    'snow://workspace/USER$.PUBLIC."EY-AI-and-Data-Challenge-2"/versions/live/'
    AUTO_COMPRESS=FALSE
    OVERWRITE=TRUE
""").collect()

print("File saved! Refresh the browser to see the files in the sidebar")

In [None]:
# 1. Load the data
landsat_validation_raw = pd.read_parquet("landsat_validation_raw_all.parquet")

# 2. Run the pipeline
landsat_validation_features = process_landsat_pipeline(landsat_validation_raw)

# 3. Save the results
output_path = "/tmp/landsat_features_validation_200m.csv"
landsat_validation_features.to_csv(output_path, index=False)
print(f"Results saved to: {output_path}")

# 4) Upload to the Snowflake stage
session.sql(f"""
    PUT file://{output_path}
    'snow://workspace/USER$.PUBLIC."EY-AI-and-Data-Challenge-2"/versions/live/'
    AUTO_COMPRESS=FALSE
    OVERWRITE=TRUE
""").collect()

print("File saved and uploaded! Refresh the browser to see the files in the sidebar.")

# Preview the results
landsat_validation_features.head()


**Note:** If you're using your own workspace, replace the workspace name in the `PUT` file path (shown above as "EY-AI-and-Data-Challenge-2") with your own workspace name.