# Machine Learning and Statistics with GEE

This notebook demonstrates how to bridge the gap between Google Earth Engine and the Python scientific stack (`scikit-learn`, `numpy`, `xarray`) for advanced analytics.

In [None]:
import ee
import xarray as xr
import xee
import numpy as np
from sklearn.ensemble import RandomForestRegressor
import matplotlib.pyplot as plt

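# Initialize Earth Engine; Xee recommends the high-volume endpoint for pixel requests.
# Depending on your Earth Engine setup, you may also need to pass project='<your-cloud-project>'.
ee.Initialize(opt_url='https://earthengine-highvolume.googleapis.com')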

## Data Preparation

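We load one year (2021) of Landsat 8 Collection 2 Level-2 surface reflectance scenes (red and NIR bands) around Bengaluru and open the collection as an Xarray dataset through the Xee backend.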

In [None]:
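# Landsat 8 Collection 2 Level-2 surface reflectance around Bengaluru (lon 77.59, lat 12.97)
point = ee.Geometry.Point([77.59, 12.97])
l8 = (
    ee.ImageCollection("LANDSAT/LC08/C02/T1_L2")
    .filterDate('2021-01-01', '2022-01-01')  # end date is exclusive
    .filterBounds(point)
    .select(['SR_B4', 'SR_B5'])  # Red and NIR
)

ds = xr.open_dataset(
    l8,
    engine='ee',
    crs='EPSG:32643',  # UTM zone 43N, so that `scale` is in metres
    scale=100,  # 100 m pixels; with Xee's default EPSG:4326 this would mean 100 degrees
    geometry=point.buffer(5000).bounds(),  # roughly 10 km x 10 km box around the point
)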
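Collection 2 Level-2 surface reflectance bands such as `SR_B4` and `SR_B5` are stored as scaled integers; the USGS conversion to reflectance is `DN * 0.0000275 - 0.2`. Because of the additive offset, a ratio such as NDVI is not the same on raw integers as on reflectance. The NumPy sketch below uses hypothetical DN values for a single pixel to show the size of the gap; `dn_to_reflectance` and `ndvi` are illustrative helpers, and nothing here calls Earth Engine.

```python
import numpy as np

# Collection 2 Level-2 surface reflectance scaling factors published by USGS
SCALE = 0.0000275
OFFSET = -0.2


def dn_to_reflectance(dn):
    """Convert Collection 2 Level-2 SR digital numbers to surface reflectance."""
    return np.asarray(dn, dtype=float) * SCALE + OFFSET


def ndvi(nir, red):
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


# Hypothetical DN values for one vegetated pixel
red_dn = np.array([10000.0])
nir_dn = np.array([20000.0])

ndvi_from_dn = ndvi(nir_dn, red_dn)
ndvi_from_sr = ndvi(dn_to_reflectance(nir_dn), dn_to_reflectance(red_dn))
print(f"NDVI from DN: {ndvi_from_dn[0]:.3f}, from reflectance: {ndvi_from_sr[0]:.3f}")
```

With these values NDVI is about 0.333 from raw DNs but about 0.647 from reflectance, so the scaling step changes the result substantially.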

## Statistical Analysis

With the data in Xarray, spectral indices such as NDVI, (NIR - Red) / (NIR + Red), and per-pixel statistics such as the temporal mean take only a few lines.

In [None]:
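# Level-2 bands are scaled integers: convert to surface reflectance before computing NDVI
red = ds.SR_B4 * 0.0000275 - 0.2
nir = ds.SR_B5 * 0.0000275 - 0.2

ndvi = (nir - red) / (nir + red)
ndvi_mean = ndvi.mean(dim='time')  # skips NaN (missing) values by default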

plt.figure(figsize=(10, 6))
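# Xee orders the spatial dimensions as (x, y); name them explicitly so the map is not transposed
ndvi_mean.plot(x=ndvi_mean.dims[0], y=ndvi_mean.dims[1], cmap='RdYlGn')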
plt.title("Mean NDVI - 2021")
plt.show()
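Two practical gaps remain in a mean NDVI map: the division yields `inf` or `NaN` wherever NIR + Red is zero (possible with surface reflectance, which can be slightly negative), and where scenes are missing or masked, some pixels average far fewer observations than others. The sketch below is plain NumPy on a hypothetical (time, y, x) stack in which NaN marks a missing observation; `safe_ndvi` is a hypothetical helper, not part of any library.

```python
import numpy as np


def safe_ndvi(nir, red):
    """NDVI that returns NaN where NIR + Red is zero or either input is NaN."""
    nir = np.asarray(nir, dtype=float)
    red = np.asarray(red, dtype=float)
    denom = nir + red
    out = np.full(denom.shape, np.nan)
    # Only divide where the denominator is non-zero; other cells stay NaN
    np.divide(nir - red, denom, out=out, where=denom != 0)
    return out


# Hypothetical reflectance stack: 3 time steps x 2 x 2 pixels, NaN = missing observation
red = np.array([
    [[0.05, 0.10], [0.20, np.nan]],
    [[0.06, 0.10], [0.20, 0.10]],
    [[np.nan, 0.10], [0.20, 0.10]],
])
nir = np.array([
    [[0.40, 0.10], [0.30, np.nan]],
    [[0.42, 0.30], [0.30, 0.30]],
    [[np.nan, 0.30], [0.30, 0.30]],
])

v = safe_ndvi(nir, red)
ndvi_mean = np.nanmean(v, axis=0)      # per-pixel temporal mean, ignoring gaps
ndvi_std = np.nanstd(v, axis=0)        # per-pixel temporal variability
n_valid = np.sum(~np.isnan(v), axis=0) # how many observations back each mean
print(n_valid)
```

In Xarray the same per-pixel statistics are available as `ndvi.std(dim='time')` and `ndvi.count(dim='time')`; plotting the count next to the mean shows which pixels rest on only a few scenes.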

## Machine Learning (scikit-learn)

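scikit-learn estimators expect a 2-D feature matrix of shape `(n_samples, n_features)`, so we flatten the band, time, and spatial dimensions into one row per pixel observation, with one column per band.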
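Dropping rows that contain NaN breaks the link between a row and its pixel, which matters once model predictions need to go back onto the map. A minimal NumPy sketch on a random, hypothetical (band, time, y, x) cube keeps a boolean mask so predictions can be scattered back onto the grid; the per-row band mean is only a stand-in for `model.predict`.

```python
import numpy as np

# Hypothetical cube: 2 bands x 3 time steps x 4 x 5 pixels
rng = np.random.default_rng(0)
cube = rng.random((2, 3, 4, 5))
cube[0, 1, 2, 3] = np.nan  # one missing observation

n_bands = cube.shape[0]
X_all = cube.reshape(n_bands, -1).T    # (n_samples, n_bands), C order over (time, y, x)
valid = ~np.isnan(X_all).any(axis=1)   # rows with a value in every band
X = X_all[valid]

# Stand-in for model.predict(X): the mean of the bands in each row
pred = X.mean(axis=1)

# Scatter predictions back onto the (time, y, x) grid, NaN where rows were dropped
pred_grid = np.full(X_all.shape[0], np.nan)
pred_grid[valid] = pred
pred_grid = pred_grid.reshape(cube.shape[1:])
print(X.shape, pred_grid.shape)
```

Because both reshapes use the same (C) ordering, row `i` of `X_all` and flat cell `i` of `pred_grid` always refer to the same pixel observation.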

In [None]:
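# Flatten (band, time, spatial...) into rows of pixel observations, one column per band
bands = ds.to_array()  # dims: ('variable', 'time', spatial dims)
data_matrix = bands.values.reshape(bands.sizes['variable'], -1).T
data_matrix = data_matrix[~np.isnan(data_matrix).any(axis=1)]  # drop rows with any NaN

print(f"Data shape for ML: {data_matrix.shape}")

# Regression needs target values y, one per row of data_matrix (e.g. field measurements).
# None are loaded in this notebook, so fitting is left commented out.
model = RandomForestRegressor(n_estimators=10, random_state=0)
# model.fit(data_matrix, y)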
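Without target values, an unsupervised method such as k-means clustering can still group pixels by their spectral values. `sklearn.cluster.KMeans` is the usual tool; the sketch below swaps in a minimal NumPy implementation of Lloyd's algorithm so the two alternating steps (assign each sample to its nearest centroid, then move each centroid to the mean of its members) are visible. The (red, NIR) samples are hypothetical, generated around a water-like and a vegetation-like reflectance.

```python
import numpy as np


def assign(X, centroids):
    """Index of the nearest centroid for each row of X (squared Euclidean distance)."""
    d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)


def kmeans(X, k, n_iter=50, seed=0):
    """Minimal Lloyd's-algorithm k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    # Start from k distinct samples chosen at random
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = assign(X, centroids)
        # Move each centroid to the mean of its members; keep it in place if the cluster is empty
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    return assign(X, centroids), centroids


# Hypothetical (red, NIR) reflectance samples: a water-like and a vegetation-like group
rng = np.random.default_rng(1)
water = rng.normal([0.05, 0.02], 0.01, size=(50, 2))
veg = rng.normal([0.05, 0.40], 0.01, size=(50, 2))
X = np.vstack([water, veg])

labels, centroids = kmeans(X, k=2)
print(centroids)
```

On real data `X` would be `data_matrix`. For full rasters, `sklearn.cluster.KMeans` or `MiniBatchKMeans` is the more practical choice, since this sketch builds an `n_samples x k x n_features` distance array in memory.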