# Accessing Salient's forecasts through Cloudflare

Salient has made its native forecasts for North America available for its `v9` models and for its GemAI v2 model, released in October 2025.

This notebook shows how to access the forecasts directly through their Zarr stores so you can analyze, backtest, and plug the data into your own processes without making an API request for every forecast init date.

Salient provides access to these forecasts through a customized URL and access keys. This notebook assumes they are stored in your run environment as:

- `SALIENT_DIRECT_URL`: A URL like `https://{url_string}.r2.cloudflarestorage.com`
- `SALIENT_DIRECT_ID`: The access key ID
- `SALIENT_DIRECT_SECRET`: The access key secret
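
Before opening any stores, it can be useful to confirm that these variables are actually set. The following is a minimal sketch using only the standard library; the helper name `missing_direct_access_vars` is hypothetical and is not part of `salientsdk`.

```python
import os

# Variable names taken from the list above.
REQUIRED_VARS = ("SALIENT_DIRECT_URL", "SALIENT_DIRECT_ID", "SALIENT_DIRECT_SECRET")


def missing_direct_access_vars(env=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_direct_access_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```

Passing a plain dictionary as `env` makes the check easy to exercise without touching the real environment.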

In [None]:
import os
import sys

try:
    import salientsdk as sk
except ModuleNotFoundError as e:
    # Fall back to a local checkout of the SDK one directory up, if present.
    if os.path.exists("../salientsdk"):
        sys.path.append(os.path.abspath(".."))
        import salientsdk as sk
    else:
        raise ModuleNotFoundError("Install salient SDK with: pip install salientsdk") from e

sk.set_file_destination("direct-access")
sk.login("SALIENT_USERNAME", "SALIENT_PASSWORD")


# Dataset access

If you're already comfortable with `xarray`, you may just want access to the `dask`-backed datasets for an entire region or the `global` region. Each request requires a `Location` object. The `region` can be any of `["africa", "asia", "europe", "global", "north-america", "russia", "south-america", "south-pacific"]`.

In [None]:
loc_region = sk.Location(region="global")

### Accessing GEM forecasts

The GEM datasets are provided in their native format from `2020-01-01` to the current forecast release and support the following values:

- `variable`: `["cc", "cdd", "dewpoint", "hdd", "heat_index", "hgt500", "mslp", "rh", "precip", "temp", "tmax", "tmin", "tsi", "wind_chill", "wgst", "wspd", "wspd100"]`
- `field`: `["vals_ens", "vals"]`, where `vals_ens` returns the 200 ensemble member values and `vals` returns the data in the same quantile space as `v9`.
- `model`: `["gem"]`
- `timescale`: `["daily"]`
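
Since an unsupported value only surfaces once a request is made, a quick local check against the lists above can catch typos early. This is a sketch only: `check_gem_request` is a hypothetical helper, not an SDK function, and the allowed values are copied from the list above.

```python
# Supported values copied from the list above.
GEM_VARIABLES = {
    "cc", "cdd", "dewpoint", "hdd", "heat_index", "hgt500", "mslp", "rh", "precip",
    "temp", "tmax", "tmin", "tsi", "wind_chill", "wgst", "wspd", "wspd100",
}
GEM_FIELDS = {"vals_ens", "vals"}


def _as_list(value):
    """Accept either a single string or an iterable of strings."""
    return [value] if isinstance(value, str) else list(value)


def check_gem_request(variable, field):
    """Return a list of problems; an empty list means the request looks valid.

    Hypothetical helper for illustration only.
    """
    problems = [
        f"unsupported variable: {v}" for v in _as_list(variable) if v not in GEM_VARIABLES
    ]
    problems += [f"unsupported field: {f}" for f in _as_list(field) if f not in GEM_FIELDS]
    return problems


print(check_gem_request(["tmin", "hdd", "wind_chill"], "vals_ens"))
```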

In [None]:
ds = sk.ForecastZarr(
    location=loc_region,
    variable=["tmin", "hdd", "wind_chill"],
    field="vals_ens",
    model="gem",
    timescale="daily",
    start="2025-01-01",
    end="today",
    # If your credentials are already set in your environment, you can omit the
    # following arguments; they will be loaded automatically.
    # key_id=os.environ.get("SALIENT_KEY_ID", "your_key_id"),
    # key_secret=os.environ.get("SALIENT_KEY_SECRET", "your_key_secret"),
    # direct_url=os.environ.get("SALIENT_DIRECT_URL", "your_direct_url"),
).open_dataset()
ds

The dataset has a `num_leads` coordinate because the number of daily leads differs between the reforecast and operational datasets.

From 2020-01-01 to 2025-09-30, the forecast has 46 daily leads unless the forecast was made on a Monday, in which case it has 126 leads. From 2025-10-01 onward, each forecast date has 126 lead days.

If you are performing an analysis on timescales longer than 46 days during the reforecast period, you can use this coordinate to select only the forecast dates that have 126 lead days.

In [None]:
ds.where(ds.num_leads == 126, drop=True)
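
The lead-count rule described above can also be written as a small function, which is handy for planning a backtest before opening any data. This sketch only encodes the dates stated in this notebook; `expected_num_leads` is a hypothetical name, and the `num_leads` coordinate in the dataset remains the authoritative source.

```python
from datetime import date

REFORECAST_START = date(2020, 1, 1)   # first available forecast date
FULL_LEADS_START = date(2025, 10, 1)  # from here on, every forecast has 126 leads


def expected_num_leads(forecast_date: date) -> int:
    """Return the number of daily leads expected for a GEM forecast date."""
    if forecast_date < REFORECAST_START:
        raise ValueError(f"No GEM forecasts before {REFORECAST_START}")
    # date.weekday() returns 0 for Monday.
    if forecast_date >= FULL_LEADS_START or forecast_date.weekday() == 0:
        return 126
    return 46


for d in (date(2020, 1, 1), date(2020, 1, 6), date(2025, 10, 2)):
    print(d, expected_num_leads(d))
```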

### Accessing v9 forecasts

The following values are supported:
- `variable`: `["temp", "precip", "wspd", "tsi", "cdd", "hdd"]`
- `field`: `["anom", "vals"]`
- `model`: `["blend", "noaa_gefs", "ecmwf_ens", "ecmwf_seas5", "truth"]`
- `timescale`: `["sub-seasonal", "seasonal", "long-range"]`

Currently, only the forecasts for the `north-america` region are made available for direct access.

In [None]:
loc_region = sk.Location(region="north-america")
ds_v9 = sk.ForecastZarr(
    location=loc_region,
    variable="temp",
    field=["anom", "vals"],
    model="blend",
    timescale="sub-seasonal",
    # If your credentials are already set in your environment, you can omit the
    # following arguments; they will be loaded automatically.
    # key_id=os.environ.get("SALIENT_KEY_ID", "your_key_id"),
    # key_secret=os.environ.get("SALIENT_KEY_SECRET", "your_key_secret"),
    # direct_url=os.environ.get("SALIENT_DIRECT_URL", "your_direct_url"),
).open_dataset()
ds_v9

# Accessing location-specific values

You can use `location` files to get forecasts at specific locations. As with computing `anom`, this generates a large `dask` graph and can take a while to compute.

In [None]:
loc_file = sk.upload_location_file(
    lats=[37.7749, 33.9416, 32.7336],
    lons=[-122.4194, -118.4085, -117.1897],
    names=["SFO", "LAX", "SAN"],
    geoname="CA_Airports",
)
loc = sk.Location(location_file=loc_file)

In [None]:
ds = sk.ForecastZarr(
    location=loc,
    variable="tmax",
    field="vals_ens",
    model="gem",
    timescale="daily",
    start="2025-10-01",
    # end="2025-01-31",
    # If your credentials are already set in your environment, you can omit the
    # following arguments; they will be loaded automatically.
    # key_id=os.environ.get("SALIENT_KEY_ID", "your_key_id"),
    # key_secret=os.environ.get("SALIENT_KEY_SECRET", "your_key_secret"),
    # direct_url=os.environ.get("SALIENT_DIRECT_URL", "your_direct_url"),
).open_dataset()
ds

If you need values for a large number of locations, it helps to reduce the dataset before extracting data at specific locations.
Seasonal variables illustrate this well: you may only care about `tmax`, `cdd`, or `heat_index` in summer or other warm months.
In that case, open the data for the whole region, subselect the desired forecast dates, and then extract the data at the desired locations.

In [None]:
import pandas as pd

loc_region = sk.Location(region="global")
fz = sk.ForecastZarr(
    location=loc_region,
    variable=["tmax", "cdd", "heat_index"],
    field="vals_ens",
    model="gem",
    timescale="daily",
    start="2024-04-01",
    end="today",
    # If your credentials are already set in your environment, you can omit the
    # following arguments; they will be loaded automatically.
    # key_id=os.environ.get("SALIENT_KEY_ID", "your_key_id"),
    # key_secret=os.environ.get("SALIENT_KEY_SECRET", "your_key_secret"),
    # direct_url=os.environ.get("SALIENT_DIRECT_URL", "your_direct_url"),
)
ds = fz.open_dataset()

forecast_dates = [
    fd for fd in ds.forecast_date.values if pd.Timestamp(fd).month in [5, 6, 7, 8, 9]
]
ds_summer = ds.sel(forecast_date=forecast_dates)
ds_summer
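
The month filter above loops over every forecast date in Python. A vectorized alternative is sketched below on a synthetic date range, so it runs without credentials. It assumes the real `forecast_date` values are `datetime64`, in which case `pd.DatetimeIndex(ds.forecast_date.values)` could stand in for the synthetic range.

```python
import pandas as pd

# Synthetic stand-in for the dataset's forecast dates.
forecast_dates = pd.date_range("2024-04-01", "2024-10-31", freq="D")

# Boolean mask that is True for May through September.
summer_mask = forecast_dates.month.isin([5, 6, 7, 8, 9])
summer_dates = forecast_dates[summer_mask]

print(len(forecast_dates), "dates reduced to", len(summer_dates))
```

The resulting `summer_dates` can be passed to `ds.sel(forecast_date=...)` in the same way as the list built above.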

In [None]:
loc_df = pd.read_csv(loc_file)
lat, lon = fz.make_coords_dataarrays(
    lon=loc_df["lon"], lat=loc_df["lat"], names=loc_df["name"]
)  # This creates xarray DataArrays for lat and lon
# By default this does linear interpolation. If you only need the value of the
# nearest grid cell, pass `method="nearest"`, which is much faster.
ds_summer_locs = fz.interp_to_coords(ds_summer, lat=lat, lon=lon)
ds_summer_locs
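
To make the difference between the two methods concrete, here is a toy comparison of nearest-neighbor and bilinear lookup on a 3x3 grid. It is a pure-Python illustration of the general techniques, not the SDK's implementation; the grid, the field, and both function names are made up for the example.

```python
from bisect import bisect_right


def nearest_value(lats, lons, field, lat, lon):
    """Return the value of the grid cell closest to (lat, lon)."""
    i = min(range(len(lats)), key=lambda k: abs(lats[k] - lat))
    j = min(range(len(lons)), key=lambda k: abs(lons[k] - lon))
    return field[i][j]


def bilinear_value(lats, lons, field, lat, lon):
    """Bilinearly interpolate field at (lat, lon); lats and lons must be ascending."""
    # Index of the lower corner of the enclosing cell, clamped to the grid.
    i = min(max(bisect_right(lats, lat) - 1, 0), len(lats) - 2)
    j = min(max(bisect_right(lons, lon) - 1, 0), len(lons) - 2)
    ty = (lat - lats[i]) / (lats[i + 1] - lats[i])
    tx = (lon - lons[j]) / (lons[j + 1] - lons[j])
    low = field[i][j] * (1 - tx) + field[i][j + 1] * tx
    high = field[i + 1][j] * (1 - tx) + field[i + 1][j + 1] * tx
    return low * (1 - ty) + high * ty


lats = [32.0, 33.0, 34.0]
lons = [-119.0, -118.0, -117.0]
# Toy field: each grid value is simply lat + lon.
field = [[la + lo for lo in lons] for la in lats]

# Coordinates of the "SAN" point used earlier in this notebook.
print("nearest: ", nearest_value(lats, lons, field, 32.7336, -117.1897))
print("bilinear:", bilinear_value(lats, lons, field, 32.7336, -117.1897))
```

Nearest-neighbor lookup reads one grid value per location, while bilinear interpolation reads four and blends them, which is one reason the nearest method is cheaper on large `dask`-backed datasets.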