<a href="https://colab.research.google.com/github/sinajahangir/Cload-Data-Retrieval/blob/main/NLDASRetrieval_v1.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

First version: August 2025
Sina Jahangir

Downloads and processes North American Land Data Assimilation System (NLDAS-2) data using the Google Earth Engine (GEE) API.

The data have 1/8th-degree grid spacing and hourly temporal resolution.

Extracts time-series data for points (x, y).

Returns the results as a pandas DataFrame and saves them to a CSV file for analysis or plotting.
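
To make the 1/8th-degree spacing concrete, the following minimal sketch snaps a point to the center of the grid cell that contains it. It assumes the standard NLDAS domain, whose western and southern edges lie at 125°W and 25°N. The function name `snap_to_nldas_grid` and the constants are illustrative and are not part of this notebook.

```python
import math

# Assumed NLDAS grid definition (illustrative constants, not taken from this notebook)
GRID_SPACING = 0.125   # degrees (1/8th degree)
WEST_EDGE = -125.0     # assumed western edge of the domain (degrees east)
SOUTH_EDGE = 25.0      # assumed southern edge of the domain (degrees north)

def snap_to_nldas_grid(lat, lon):
    """Return the (lat, lon) of the center of the 1/8th-degree cell containing the point."""
    col = math.floor((lon - WEST_EDGE) / GRID_SPACING)
    row = math.floor((lat - SOUTH_EDGE) / GRID_SPACING)
    center_lon = WEST_EDGE + (col + 0.5) * GRID_SPACING
    center_lat = SOUTH_EDGE + (row + 0.5) * GRID_SPACING
    return center_lat, center_lon

print(snap_to_nldas_grid(40.03, -100.07))
```

Two points that snap to the same cell center can be expected to return the same time series, so this is a quick way to spot duplicate sites before downloading.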

Reference:
NLDAS is a collaboration project among several groups: NOAA/NCEP's Environmental Modeling Center (EMC), NASA's Goddard Space Flight Center (GSFC), Princeton University, the University of Washington, the NOAA/NWS Office of Hydrological Development (OHD), and the NOAA/NCEP Climate Prediction Center (CPC). NLDAS is a core project with support from NOAA's Climate Prediction Program for the Americas (CPPA).

Key Features:

✅ Custom data preprocessing (feature selection)

✅ Modular framework for efficiency and reproducibility

✅ GEE utilization through Google Cloud

MIT License
Copyright (c) [2025] [Sina Jahangir]

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.

# Install dependencies

In [None]:
# Install libraries
!pip install earthengine-api # library used to access ee
# Library import
import ee
import pandas as pd
import os
import geopandas as gpd
import json



# Access GEE

In [None]:
# Initialization
## Google Authentication
ee.Authenticate()  # Authenticate with your Google account
## Google Earth Engine API
# Initialize the library
# Change the project ID to the one you defined on Google Cloud
ee.Initialize(project='earthengine-433017')

print(ee.String('Hello from the Earth Engine servers!').getInfo())
## Access Google drive
from google.colab import drive
drive.mount('/content/drive')

Hello from the Earth Engine servers!
Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).


# Set settings

In [None]:
#Change directory to where Shapefiles are saved
# This is for data retrieval based on shapefile
os.chdir('/content/drive/MyDrive/Bio Runoff data') #change this

In [None]:
# Create the directory used to save the results if it does not already exist
'''
Note: the folder parameter of Google Earth Engine (GEE) Export.table.toDrive
does not support subfolder paths (like 'parent/child').
The folder must be a top-level folder in your Google Drive (My Drive);
nested folders will not be created.
'''
savefolder_name='Bio Runoff data'
if not os.path.exists('/content/drive/MyDrive/%s'%(savefolder_name)):
    os.mkdir('/content/drive/MyDrive/%s'%(savefolder_name))
else:
    print("Directory already exists.")

Directory already exists.
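
The existence check above can also be written with `os.makedirs(..., exist_ok=True)`, which removes the need for the explicit `if`/`else`. The sketch below is one possible variant rather than the notebook's own code. The helper name `ensure_top_level_folder` is hypothetical, and it rejects names containing a path separator to reflect the top-level-folder restriction described in the note above. It uses a temporary directory so that it also runs outside Colab.

```python
import os
import tempfile

def ensure_top_level_folder(root, folder_name):
    """Create root/folder_name if needed; return (path, created)."""
    # A name such as 'parent/child' would be a nested path, which is rejected here
    if '/' in folder_name or '\\' in folder_name:
        raise ValueError("folder_name must be a single top-level folder name")
    path = os.path.join(root, folder_name)
    created = not os.path.isdir(path)
    os.makedirs(path, exist_ok=True)  # no error if the folder already exists
    return path, created

# Demonstration in a temporary directory (in Colab, root would be '/content/drive/MyDrive')
with tempfile.TemporaryDirectory() as root:
    _, created_first = ensure_top_level_folder(root, 'Bio Runoff data')
    _, created_second = ensure_top_level_folder(root, 'Bio Runoff data')
    print(created_first, created_second)
```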


# Data retrieval functions

In [None]:
from datetime import datetime, timedelta
def get_dataset_for_point(lat, lon, start_date='1980-01-01', end_date='2023-12-31', save_path='Example', chunk='year'):
    """Retrieve hourly NLDAS-2 forcing data for a point over a time range and save it as a CSV file."""
    variables=[
            "temperature",          # Air temperature at 2m (K)
            "specific_humidity",    # Specific humidity (kg/kg)
            "wind_u",               # Zonal wind (m/s)
            "wind_v",               # Meridional wind (m/s)
            "pressure",             # Surface pressure (Pa)
            "shortwave_radiation",  # Downward shortwave radiation (W/m^2)
            "longwave_radiation",   # Downward longwave radiation (W/m^2)
            "total_precipitation",         # Precipitation (kg/m^2)
            'potential_evaporation',       # Potential evaporation (kg/m^2/s)
        ]
    # Define the point of interest
    point = ee.Geometry.Point(lon, lat)

    # Split the time range into fixed-length chunks (365 or 30 days) so each request stays small
    start = datetime.strptime(start_date, "%Y-%m-%d")
    end = datetime.strptime(end_date, "%Y-%m-%d")

    if chunk == "year":
        step = timedelta(days=365)
    elif chunk == "month":
        step = timedelta(days=30)
    else:
        raise ValueError("chunk must be 'year' or 'month'")

    dfs = []
    current = start
    while current < end:
        next_date = min(end, current + step)

        dataset = (
            ee.ImageCollection("NASA/NLDAS/FORA0125_H002")
            .filterDate(current.strftime("%Y-%m-%d"), next_date.strftime("%Y-%m-%d"))
            .select(variables)
        )

        try:
            raw = dataset.getRegion(point, scale=12500).getInfo()
            df = pd.DataFrame(raw[1:], columns=raw[0])
            df["time"] = pd.to_datetime(df["time"], unit="ms")
            df = df.sort_values("time").reset_index(drop=True)
            drop_cols = [c for c in ["longitude", "latitude", "id"] if c in df.columns]
            df = df.drop(columns=drop_cols)
            dfs.append(df)
        except Exception as e:
            print(f"⚠️ Skipped {current} to {next_date}: {e}")

        current = next_date

    # Combine all chunks
    if dfs:
        full_df = pd.concat(dfs).drop_duplicates(subset="time").sort_values("time").reset_index(drop=True)
        full_df.to_csv(f'{save_path}_NLDAS.csv', index=False)
        return full_df
    else:
        raise RuntimeError("No data retrieved.")
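

To see which request windows the `while` loop in `get_dataset_for_point` produces, the same stepping logic can be isolated in a small generator. This is a sketch for inspection only; `iter_chunks` is a hypothetical helper and not part of the notebook. Because the steps are fixed at 365 or 30 days, the windows drift relative to calendar years and months. Since `filterDate` treats its end date as exclusive, consecutive windows do not overlap, and data for `end_date` itself are not requested.

```python
from datetime import datetime, timedelta

def iter_chunks(start_date, end_date, chunk='year'):
    """Yield (start, end) date strings using the same fixed-length steps as the loop above."""
    steps = {'year': timedelta(days=365), 'month': timedelta(days=30)}
    if chunk not in steps:
        raise ValueError("chunk must be 'year' or 'month'")
    current = datetime.strptime(start_date, '%Y-%m-%d')
    end = datetime.strptime(end_date, '%Y-%m-%d')
    while current < end:
        # The last window is clipped so that it never runs past end_date
        next_date = min(end, current + steps[chunk])
        yield current.strftime('%Y-%m-%d'), next_date.strftime('%Y-%m-%d')
        current = next_date

for window in iter_chunks('1980-01-01', '1982-01-01'):
    print(window)
```

For this range the first window ends on 1980-12-31 rather than 1981-01-01, because 1980 is a leap year and the step is 365 days.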

# Extract and save the data as CSV file(s)

In [None]:
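# Site table; the loop below expects the columns 'Site', 'lat', and 'long'
df = pd.read_csv('SitesInfo.csv')
# range(5, 6) processes only row 5; widen the range to download more sites
for ii in range(5, 6):
    site = df.loc[ii, 'Site']
    print(f'Download started for {site}')
    get_dataset_for_point(df.loc[ii, 'lat'], df.loc[ii, 'long'], start_date='1980-01-01', end_date='2023-12-31', save_path=site)
    print(f'Download completed for {site}')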

Download started for B
Download completed for B
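
After a download finishes, one simple sanity check is to compare the number of rows in the saved CSV with the number of hourly time steps in the requested range. The sketch below computes that expected count, assuming one record per hour and an exclusive end date (as with `filterDate`). `expected_hourly_rows` is a hypothetical helper and not part of the notebook.

```python
from datetime import datetime

def expected_hourly_rows(start_date, end_date):
    """Number of hourly time steps in [start_date, end_date), with the end date exclusive."""
    start = datetime.strptime(start_date, '%Y-%m-%d')
    end = datetime.strptime(end_date, '%Y-%m-%d')
    return int((end - start).total_seconds() // 3600)

# 1980 is a leap year, so a full year holds 366 * 24 hourly steps
print(expected_hourly_rows('1980-01-01', '1981-01-01'))
```

A row count below this value in the returned DataFrame may simply mean that one or more chunks were skipped, which the function reports in its warning messages.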
