## **Setup (install, import library, authorization setup)**

In [None]:
#!pip -q install earthengine-api geemap geopandas shapely fiona pandas numpy

In [None]:
import ee, geemap, pandas as pd, numpy as np
import geopandas as gpd, zipfile, os
import matplotlib.pyplot as plt

**Connect Google Colab to your Google Drive and initialize Earth Engine.**

**Why**: Many workflows need file access from Drive and satellite data access from Google Earth Engine (EE).

**How**:

*    Mount Drive (`drive.mount`) → lets Colab read and write your Drive files.
*    Authenticate Earth Engine (`ee.Authenticate()`) → grants access to your EE account.
*    Initialize Earth Engine with your Cloud project ID (`ee.Initialize(project="...")`).

🔑 Make sure to replace "robust-doodad-443716-a5" with the ID of your own Google Cloud project registered for Earth Engine.


In [None]:

from google.colab import drive
drive.mount('/content/drive')
print("Connected to Google Drive.")
ee.Authenticate()                                 # follow the link/prompt in Colab
ee.Initialize(project="robust-doodad-443716-a5")  # replace with your project ID
print("Earth Engine ready.")

## **NLDAS: Download and Export Dataset**



*   Collection: NLDAS-2, North American Land Data Assimilation System Forcing Fields (`NASA/NLDAS/FORA0125_H002`);
https://developers.google.com/earth-engine/datasets/catalog/NASA_NLDAS_FORA0125_H002

*   Bands we'll use: temperature (°C), pressure (Pa), total_precipitation (mm, hourly total), shortwave_radiation (W/m²)
*   Compute the ROI-average of each band for every hourly image
*   Export the results as CSV to Google Drive








**FIRST PARAMETER SETUP**
*   Define the timeframe: set `start_date` and `end_date` for filtering the datasets.
*   Define the ROI: choose a point (`lon`, `lat`) and create a circular buffer of `radius_m` meters around it. This buffer is your ROI.
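To see how this ROI compares with the NLDAS grid, here is a rough back-of-the-envelope sketch. It assumes the 0.125° NLDAS-2 grid spacing and a spherical Earth (radius 6371 km); the constant names are illustrative. It suggests that a 20 m buffer sits well inside a single grid cell, so the ROI "mean" is typically just that one cell's value (unless the point lies right on a cell boundary).

```python
import math

# Assumed values (edit to match your setup)
NLDAS_CELL_DEG = 0.125      # NLDAS-2 grid spacing in degrees
EARTH_RADIUS_KM = 6371.0    # spherical-Earth approximation
lat = 41.08679              # ROI latitude from the parameter cell
radius_m = 20               # ROI buffer radius in meters

km_per_deg = 2 * math.pi * EARTH_RADIUS_KM / 360.0  # ~111.2 km per degree of arc

cell_ns_km = NLDAS_CELL_DEG * km_per_deg                                # north-south extent
cell_ew_km = NLDAS_CELL_DEG * km_per_deg * math.cos(math.radians(lat))  # east-west extent shrinks with latitude
cell_area_m2 = (cell_ns_km * 1000) * (cell_ew_km * 1000)
buffer_area_m2 = math.pi * radius_m ** 2

print(f"NLDAS cell: ~{cell_ew_km:.1f} km (E-W) x {cell_ns_km:.1f} km (N-S)")
print(f"Buffer area / cell area: {buffer_area_m2 / cell_area_m2:.2e}")
```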




In [None]:
start_date = '2023-04-01'   # <-- edit
end_date   = '2023-10-31'   # <-- edit (exclusive: filterDate stops before this date)

lon, lat = -100.7629, 41.08679  # <-- edit (decimal degrees)
radius_m = 20  # <-- edit (buffer radius in meters)

pt = ee.Geometry.Point([lon, lat])
roi = pt.buffer(radius_m)  # circular buffer

**LOAD NLDAS DATA**

We now load the **NLDAS-2 Forcing Fields** dataset (`NASA/NLDAS/FORA0125_H002`) and prepare it for analysis.  

- Filters the ImageCollection by the study period (`start_date` → `end_date`).  
- Renames bands to short labels: `ta`, `pa`, `ppt`, `swin`.  
- Adds a `date` property (human-readable timestamp) to each image.  
- Maps this transformation across the entire ImageCollection.  
- Performs a **quick check**: count images and inspect the first image’s band names.  
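A useful reference value for the image count: NLDAS-2 provides one image per hour, and `filterDate` treats the end date as exclusive, so a gap-free collection should hold 24 images per day in the window. The stdlib sketch below computes that number; `expected_hourly_images` is a hypothetical helper, not part of Earth Engine.

```python
from datetime import date

def expected_hourly_images(start_date: str, end_date: str) -> int:
    """Number of hourly steps in [start_date, end_date), assuming no missing hours."""
    start = date.fromisoformat(start_date)
    end = date.fromisoformat(end_date)  # exclusive, like ee.ImageCollection.filterDate
    return (end - start).days * 24

print(expected_hourly_images('2023-04-01', '2023-10-31'))  # 213 days x 24 = 5112
```

If `n` printed by the cell below is smaller than this value, some hours are missing from the collection for that period.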




In [None]:
# load IC and format the dataset
ic = (ee.ImageCollection('NASA/NLDAS/FORA0125_H002')
    .filterDate(start_date, end_date)
    .select(['temperature','pressure','total_precipitation','shortwave_radiation']))

# Rename the bands and stamp each image with a readable 'date' property
def to_phys(img):
    ta = img.select('temperature').rename('ta')
    pa = img.select('pressure').rename('pa')
    ppt = img.select('total_precipitation').rename('ppt')
    swin = img.select('shortwave_radiation').rename('swin')
    out = ee.Image.cat([ta, pa, ppt, swin]) \
        .set('date', ee.Date(img.get('system:time_start')).format('YYYY-MM-dd HH:mm:ss'))
    return out

hourly_ic = ic.map(to_phys)  # hourly ImageCollection with renamed bands

# quick check of the dataset
n = hourly_ic.size().getInfo()
print(f'NLDAS hourly images: {n}')
if n == 0:
    raise ValueError('No NLDAS images found. Check dates/dataset ID.')

first = ee.Image(hourly_ic.first())
print('First image bands:', first.bandNames().getInfo())



**Reduce ImageCollection to ROI Averages**

We want to compute the **mean values of each band** within the Region of Interest (ROI) for every image in our hourly collection.  

- Uses `reduceRegion` to calculate the spatial average (mean) of each band inside the ROI.  
- Attaches a `datetime_utc` string and a `timestamp_ms` value so we can build a time series later.  
- Converts each image into an `ee.Feature` (a row of data).  
- Collects all features into a `FeatureCollection` (`fc`).  


This step transforms the image stack into a **tabular dataset** that’s ready to export or convert into a pandas DataFrame.  
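To go straight to a pandas DataFrame without exporting, one option is `fc.getInfo()`, which returns a GeoJSON-like dictionary whose `features` each carry a `properties` dict. The sketch below assumes that structure and uses a small hand-made sample with made-up band values instead of real NLDAS output; `fc_info_to_df` is a hypothetical helper. Note that Earth Engine typically refuses `getInfo()` on collections above roughly 5,000 elements, and this date range yields about 5,100 hourly features, which is one reason to prefer the Drive export for the full period.

```python
import pandas as pd

def fc_info_to_df(info: dict) -> pd.DataFrame:
    """Flatten a FeatureCollection getInfo() dict into one row per feature."""
    rows = [feat.get('properties', {}) for feat in info.get('features', [])]
    df = pd.DataFrame(rows)
    # timestamp_ms holds milliseconds since the Unix epoch (UTC)
    df['datetime_utc'] = pd.to_datetime(df['timestamp_ms'], unit='ms', utc=True)
    return df.set_index('datetime_utc').sort_index()

# Illustrative input shaped like fc.limit(2).getInfo(); the band values are made up
sample = {
    'type': 'FeatureCollection',
    'features': [
        {'type': 'Feature', 'geometry': None, 'id': '1',
         'properties': {'ta': 10.0, 'pa': 90000.0, 'ppt': 0.0, 'swin': 0.0,
                        'datetime_utc': '2023-04-01 01:00', 'timestamp_ms': 1680310800000}},
        {'type': 'Feature', 'geometry': None, 'id': '0',
         'properties': {'ta': 11.0, 'pa': 90010.0, 'ppt': 0.0, 'swin': 0.0,
                        'datetime_utc': '2023-04-01 00:00', 'timestamp_ms': 1680307200000}},
    ],
}
print(fc_info_to_df(sample))
```

For a short date range this would be used as `df = fc_info_to_df(fc.getInfo())`.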


In [None]:
# calculate average within roi
def reduce_one(img):
    # mean of each band over the ROI
    stats = img.reduceRegion(
        reducer=ee.Reducer.mean(),
        geometry=roi,
        scale=1000,
        maxPixels=1e13,
        tileScale=4
    )
    # Attach time as properties
    time_start = img.date()  # ee.Date
    return ee.Feature(None, stats)\
        .set('datetime_utc', time_start.format('YYYY-MM-dd HH:mm'))\
        .set('timestamp_ms', time_start.millis())

fc = ee.FeatureCollection(hourly_ic.map(reduce_one))
print("Feature count:", fc.size().getInfo())

# quick print to check dataset
print(fc.limit(3).getInfo())

**Export Hourly Weather Dataset to Google Drive**


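- Uses `ee.batch.Export.table.toDrive` to save the FeatureCollection.  
- File format: CSV, saved as `2023_NLDAS.csv` via `fileNamePrefix` (without it, the file name defaults to the `description`).  
- Destination: Google Drive folder (`GEE_irrigation`).  
- You can **monitor progress** in the Earth Engine **Tasks** tab or with `task_NLDAS.status()`; the CSV appears in Drive only after the task has completed.

In [None]:

task_NLDAS = ee.batch.Export.table.toDrive(
    collection=fc,
    description='NLDAS',
    fileNamePrefix='2023_NLDAS',  # file name read back in the next section
    fileFormat='CSV',
    folder='GEE_irrigation'
)
task_NLDAS.start()
print("NLDAS export started → check the Earth Engine 'Tasks' tab or the GEE_irrigation folder in your Drive.")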
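Because `start()` returns immediately, the CSV may not exist yet when the next section tries to read it. Below is a minimal polling sketch. It assumes the dictionary returned by `ee.batch.Task.status()` and its `'state'` field, with in-progress states such as `READY`, `RUNNING` and `CANCEL_REQUESTED`. `wait_for_task` and its poll/timeout defaults are hypothetical choices, not part of the original workflow.

```python
import time

ACTIVE_STATES = {'READY', 'RUNNING', 'CANCEL_REQUESTED'}  # assumed in-progress states

def wait_for_task(task, poll_s: float = 30.0, timeout_s: float = 3600.0) -> dict:
    """Poll task.status() until the task leaves an in-progress state or the timeout expires."""
    waited = 0.0
    while True:
        status = task.status()
        if status.get('state') not in ACTIVE_STATES:
            return status  # e.g. COMPLETED, FAILED or CANCELLED
        if waited >= timeout_s:
            raise TimeoutError(f"Task still {status.get('state')} after {timeout_s} s")
        time.sleep(poll_s)
        waited += poll_s

# Usage in the notebook (requires Earth Engine):
# status = wait_for_task(task_NLDAS)
# print(status['state'], status.get('error_message', ''))
```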


## **DATA ORGANIZATION and VISUALIZATION**
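-  Extract the solar-noon hours from the hourly NLDAS data, since these hours typically capture peak energy-balance conditions. The filter below keeps hours 11–13 of the `datetime_utc` index, i.e. 11:00–13:00 **UTC**; at this site's longitude (-100.76°) local solar noon falls around 18:30–19:00 UTC, so shift the window or convert to local time if local solar-noon hours are intended.  
-  Merge NLDAS with the Tcns observations to build a combined dataset ready for model calibration/validation.  
-  Visualize the data as a quick check.   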
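Which UTC hours correspond to local solar noon depends on longitude. The rough sketch below assumes the common approximation solar noon (UTC hours) ≈ 12 - lon/15 - EoT/60. It also uses the textbook equation-of-time fit EoT ≈ 9.87 sin 2B - 7.53 cos B - 1.5 sin B (minutes), with B = 360°(N - 81)/365 and N the day of year. `solar_noon_utc` is a hypothetical helper. For the ROI at -100.7629° it gives roughly 18:30–19:00 UTC across the season, far from the `hour` 11–13 filter applied to `datetime_utc`.

```python
import math
from datetime import date

def solar_noon_utc(lon_deg: float, day: date) -> float:
    """Approximate local solar noon in decimal hours UTC (east longitude positive)."""
    n = day.timetuple().tm_yday                   # day of year, 1..366
    b = math.radians(360.0 * (n - 81) / 365.0)
    eot_min = 9.87 * math.sin(2 * b) - 7.53 * math.cos(b) - 1.5 * math.sin(b)  # equation of time, minutes
    return 12.0 - lon_deg / 15.0 - eot_min / 60.0

lon = -100.7629  # ROI longitude from the parameter cell
for d in (date(2023, 4, 1), date(2023, 7, 1), date(2023, 10, 30)):
    minutes = round(solar_noon_utc(lon, d) * 60)
    print(d, f"solar noon ≈ {minutes // 60:02d}:{minutes % 60:02d} UTC")
```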




In [None]:
df = pd.read_csv('/content/drive/MyDrive/GEE_irrigation/2023_NLDAS.csv')
df['datetime_utc'] = pd.to_datetime(df['datetime_utc'], format='%Y-%m-%d %H:%M')
df = df.set_index('datetime_utc')   # DatetimeIndex, in UTC
df['date'] = df.index.date          # just the date part (yyyy-mm-dd)
df['hour'] = df.index.hour          # hour of day (0–23), UTC
print(df.head())

# extract the "solar noon" window; note these are UTC hours of datetime_utc
def filter_solar_hour(data, start_hour=11, end_hour=13):
    hours = data.index.hour
    # .copy() avoids SettingWithCopyWarning when columns are edited later
    out = data[(hours >= start_hour) & (hours <= end_hour)].copy()
    out['hour'] = out.index.hour
    out['date'] = out.index.date
    return out

df_weather = filter_solar_hour(df)
print(df_weather.head())
df_weather.to_csv('/content/drive/MyDrive/GEE_irrigation/NLDAS_solar_weather_2023.csv')
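If the intended window is 11:00–13:00 local time (and the Tcns hours are recorded in local time), one option is to convert the UTC index before filtering. This sketch assumes the site follows the `America/Chicago` time zone (US Central, with daylight saving). `filter_local_hours` is a hypothetical helper, and the toy table stands in for the NLDAS data.

```python
import pandas as pd

def filter_local_hours(data: pd.DataFrame, tz: str = 'America/Chicago',
                       start_hour: int = 11, end_hour: int = 13) -> pd.DataFrame:
    """Keep rows whose local clock hour is in [start_hour, end_hour]; index must be naive UTC."""
    out = data.copy()
    out.index = out.index.tz_localize('UTC').tz_convert(tz)  # naive UTC -> local wall-clock time
    out['date'] = out.index.date
    out['hour'] = out.index.hour
    return out[(out['hour'] >= start_hour) & (out['hour'] <= end_hour)]

# Toy hourly table (placeholder values) covering one July day in UTC
idx = pd.date_range('2023-07-01 00:00', periods=24, freq=pd.Timedelta(hours=1))
toy = pd.DataFrame({'ta': range(24)}, index=idx)
print(filter_local_hours(toy)[['ta', 'date', 'hour']])
```

In July, Central Daylight Time is UTC-5, so local 11–13 corresponds to UTC 16–18. For local standard time without daylight saving, a fixed-offset zone such as `'Etc/GMT+6'` (UTC-6; the sign is inverted in `Etc` zone names) could be passed as `tz` instead. The merge with Tcns only makes sense if both tables use the same clock.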



**Conduct a quick visual check of NLDAS weather data**

In [None]:
# check the air temperature pattern
plt.figure(figsize=(10, 5))  # define the plot size
plt.plot(df.index, df["ta"], label="air temperature", color="green")  # x = timestamps (UTC), y = ta
plt.xlabel("Date")
plt.ylabel("Air temperature (°C)")
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.show()
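Seven months of hourly temperature is dense. A daily summary can make the seasonal pattern and any data gaps easier to see. The sketch below uses pandas `resample` and assumes the `ta` column and the DatetimeIndex built above. `daily_ta_summary` is a hypothetical helper, and the toy series is synthetic.

```python
import numpy as np
import pandas as pd

def daily_ta_summary(data: pd.DataFrame) -> pd.DataFrame:
    """Daily min/mean/max air temperature plus the number of hourly records per day."""
    return data['ta'].resample('D').agg(['min', 'mean', 'max', 'count'])

# Synthetic two-day hourly series: a simple diurnal cycle around 20 °C
idx = pd.date_range('2023-07-01', periods=48, freq=pd.Timedelta(hours=1))
hours = idx.hour.to_numpy()
toy = pd.DataFrame({'ta': 20 + 5 * np.sin(2 * np.pi * (hours - 9) / 24)}, index=idx)
print(daily_ta_summary(toy))
```

`daily_ta_summary(df).plot(y=['min', 'mean', 'max'])` would draw the three daily curves. Days whose `count` is below 24 point to missing hours.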





**Merge with Tcns**
*   Load the Tcns file and convert its `date` and `hour` columns to the same format as the NLDAS data
*   Merge NLDAS and Tcns on date and hour (both tables must use the same time reference, UTC or local)
*   Save the merged data
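Before relying on an inner merge, it can help to see which date/hour keys failed to match, for example because of a UTC vs. local-time mismatch. The sketch below uses `pandas.merge(..., how='outer', indicator=True)` on toy frames. The column names follow the notebook, but the values are placeholders and `merge_report` is a hypothetical helper.

```python
import pandas as pd

def merge_report(weather: pd.DataFrame, tcns: pd.DataFrame) -> pd.Series:
    """Count keys found in both tables, only in weather (left_only), or only in Tcns (right_only)."""
    check = pd.merge(weather[['date', 'hour']], tcns[['date', 'hour']],
                     on=['date', 'hour'], how='outer', indicator=True)
    return check['_merge'].value_counts()

weather = pd.DataFrame({'date': ['2023-07-01'] * 3, 'hour': [11, 12, 13]})
tcns = pd.DataFrame({'date': ['2023-07-01'] * 3, 'hour': [12, 13, 14]})
print(merge_report(weather, tcns))
```

A large `left_only` or `right_only` count after `merge_report(df_weather, df_tcns)` would suggest the two tables use different hour conventions.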



In [None]:
# merge with Tcns
df_tcns = pd.read_csv('/content/drive/MyDrive/GEE_irrigation/2023_Tcns.csv',index_col=0)
print(df_tcns.head())

# Normalize both date columns to 'YYYY-MM-DD' strings so the merge keys match
df_weather['date'] = pd.to_datetime(df_weather['date']).dt.strftime("%Y-%m-%d")
df_tcns['date'] = pd.to_datetime(df_tcns['date']).dt.strftime("%Y-%m-%d")

# Ensure hour is integer
df_weather['hour'] = df_weather['hour'].astype(int)
df_tcns['hour'] = df_tcns['hour'].astype(int)

# === Merge on date + hour ===
df_merged = pd.merge(
    df_weather[['date','hour','pa','ppt','swin','ta']],  # keep weather cols
    df_tcns[['date','hour','Tcns']],                     # keep Tcns only
    on=['date','hour'],
    how='inner'   # use 'left' if you want all weather rows, even without Tcns
)

# === Save to CSV ===
df_merged.to_csv("/content/drive/MyDrive/GEE_irrigation/NLDAS_solar_weather_2023_Tcns.csv", index=False)

print("✅ Merge complete:", len(df_merged), "rows,", df_merged.shape[1], "columns")
print(df_merged.head())