# Using PelicanFS via FSSpec to Access Data on the OSDF

---

## Overview
Now that you've learned about the OSDF, you may be wondering how you can easily access that data from within a notebook using Python.

You can do this using `pelicanfs`, which is an `FSSpec` implementation of the Pelican client.

### This notebook will contain:

1. A brief explanation of FSSpec and PelicanFS
1. A real-world example using FSSpec, Pelican, XArray, and Zarr
1. Other common access patterns
1. FAQs

## Prerequisites

To better understand this notebook, please familiarize yourself with the following concepts:

| Concepts | Importance | Notes |
| --- | --- | --- |
| [Intro to OSDF](http://projectpythia.org/osdf-cookbook/notebooks/osdf-intro/) | Necessary | |
| [Understanding of XArray](https://foundations.projectpythia.org/core/xarray/) | Helpful | To better understand the example workflow |
| [Overview of FSSpec](https://filesystem-spec.readthedocs.io/en/latest/) | Helpful | To better understand the FSSpec library |

- **Time to learn**: 20-30 minutes

---

## Imports

In [None]:
import xarray as xr
import numpy as np
import matplotlib.pyplot as plt
import cartopy.crs as ccrs
import cartopy.feature as cfeature
import metpy.calc as mpcalc
from metpy.units import units
import fsspec

## What are PelicanFS and FSSpec?

First, let's understand PelicanFS and how it integrates with FSSpec.

### FSSpec

FileSystem Spec (fsspec) is a Python library that provides a unified interface to many different storage backends. These include, but are not limited to, POSIX, HTTPS, and S3. It's used by various data processing libraries such as `xarray`, `pandas`, and `intake`, just to name a few.

To learn more about FSSpec, visit its [information page](https://filesystem-spec.readthedocs.io/en/latest/).

#### Protocols

FSSpec differentiates between interfaces by using a protocol. When you give a path to FSSpec, it determines which interface to use from the protocol included in the path. For example, a path which starts with `https:` tells FSSpec to use its `http` interface. This allows users to work with just the top-level FSSpec library without needing to know the specifics of the underlying interface.
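The protocol lookup described above can be sketched with backends that ship with fsspec itself. This is a minimal illustration, not part of the original workflow: the `https://example.org/...` URL is a placeholder that is only parsed, never fetched, and the `memory://protocol-demo/...` path is a made-up name in fsspec's in-memory filesystem.

```python
import fsspec
from fsspec.core import split_protocol

# split_protocol separates the protocol prefix from the rest of the path
# (placeholder URL; nothing is downloaded)
protocol, remainder = split_protocol("https://example.org/data/file.nc")
print(protocol, remainder)

# Each protocol name maps to a filesystem class; "memory" and "file" ship with fsspec
memory_cls = fsspec.get_filesystem_class("memory")
local_cls = fsspec.get_filesystem_class("file")
print(memory_cls.__name__, local_cls.__name__)

# The same top-level call works for any protocol; here it targets the in-memory backend
with fsspec.open("memory://protocol-demo/hello.txt", "w") as f:
    f.write("hello")
with fsspec.open("memory://protocol-demo/hello.txt", "r") as f:
    content = f.read()
print(content)
```

Swapping `memory://` for another protocol changes which backend handles the request, while the calling code stays the same.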

### PelicanFS

PelicanFS is an implementation of FSSpec which introduces two new protocols to FSSpec: `pelican` and `osdf`. The `pelican` protocol tells FSSpec to use the PelicanFS implementation to access data via a Pelican federation. To use it, you must specify the federation root. A `pelican` FSSpec path looks like:

```pelican://<federation-root>/<namespace-path>```

The `osdf` protocol is a specific instance of the `pelican` protocol that knows how to access the OSDF. A path using the `osdf` protocol **should not** provide the federation root. An `osdf` FSSpec path looks like:

```osdf:///<namespace-path>```


:::{note}
Notice the three '/' characters after "osdf:". They are required for a properly formed osdf path.
:::
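One way to see why the three slashes matter is to split both path styles with Python's standard `urllib.parse`. This is only an illustration of the URL structure, and it does not claim to be how PelicanFS parses paths internally. The federation root `federation.example.org` and the namespace path `/my-namespace/...` are hypothetical placeholders.

```python
from urllib.parse import urlsplit

# Hypothetical federation root and namespace path, for illustration only
pelican_path = "pelican://federation.example.org/my-namespace/data/file.zarr"
osdf_path = "osdf:///my-namespace/data/file.zarr"

pelican_parts = urlsplit(pelican_path)
osdf_parts = urlsplit(osdf_path)

# The federation root occupies the host position in a pelican path
print(pelican_parts.scheme, repr(pelican_parts.netloc), pelican_parts.path)

# In an osdf path the host position is empty, which is what produces "///"
print(osdf_parts.scheme, repr(osdf_parts.netloc), osdf_parts.path)
```

With only two slashes, the first segment of the namespace path would sit in the host position instead of being part of the path.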

If you'd like to understand more about how Pelican works, check out the documentation [here](https://docs.pelicanplatform.org/about-pelican/core-concepts).

### Putting it all together

What does this mean in practice?

In an environment where pelicanfs is installed, accessing data from the OSDF with fsspec, or with any library that uses fsspec, only requires passing a properly formed path that uses the `osdf` protocol. fsspec and pelicanfs do the work of resolving it behind the scenes.
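If you are unsure whether your environment meets that requirement, a small check like the following can help. It is a sketch that relies only on `fsspec.get_filesystem_class`; the helper name `protocol_is_usable` is made up for this example.

```python
import fsspec

def protocol_is_usable(protocol):
    """Return True if fsspec can load a filesystem class for `protocol`."""
    try:
        fsspec.get_filesystem_class(protocol)
    except (ImportError, ValueError):
        # ValueError: fsspec does not know the protocol
        # ImportError: the protocol is known but its package is not installed
        return False
    return True

# Report whether the two PelicanFS protocols can be resolved in this environment
for name in ("osdf", "pelican"):
    print(name, protocol_is_usable(name))
```

If either line prints `False`, installing `pelicanfs` into the active environment is the likely fix.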



---

## A PelicanFS Example using Real Data

The following example shows how pelicanfs works on real-world data, using FSSpec and XArray to access Zarr data from AWS.

This portion of the notebook is based on the [Project Pythia HRRR AWS Cookbook](https://projectpythia.org/HRRR-AWS-cookbook/).

### Setting the Proper Path
The data for this tutorial is part of AWS Open Data, hosted in the `us-west-1` region. The OSDF provides access to that region using the `aws-opendata/us-west-1` namespace.

Let's first create a path which uses the `osdf` protocol.

In [None]:
# Set the date, hour, variable, and level for the HRRR data
date = '20211016'
hour = '21'
var = 'TMP'
level = '2m_above_ground'

# Construct file paths for the Zarr datasets using the osdf protocol
namespace_file1 = f'osdf:///aws-opendata/us-west-1/hrrrzarr/sfc/{date}/{date}_{hour}z_anl.zarr/{level}/{var}/{level}/'
namespace_file2 = f'osdf:///aws-opendata/us-west-1/hrrrzarr/sfc/{date}/{date}_{hour}z_anl.zarr/{level}/{var}/'
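
The two paths above differ only in a trailing `{level}/` segment, so the construction can be wrapped in a small helper for reuse with other dates or variables. This is a sketch: the names `OSDF_PREFIX` and `hrrr_sfc_paths` are hypothetical and introduced only for this example, and the path layout is copied from the cell above.

```python
# Namespace prefix taken from the paths used in this notebook
OSDF_PREFIX = "osdf:///aws-opendata/us-west-1/hrrrzarr/sfc"

def hrrr_sfc_paths(date, hour, var, level):
    """Build the two osdf paths used in this example for one HRRR surface variable."""
    store = f"{OSDF_PREFIX}/{date}/{date}_{hour}z_anl.zarr"
    group = f"{store}/{level}/{var}/"
    # The first path adds one more {level}/ segment below the variable group
    return f"{group}{level}/", group

paths = hrrr_sfc_paths("20211016", "21", "TMP", "2m_above_ground")
for p in paths:
    print(p)
```

The helper only builds strings. It does not check that the requested date, hour, or variable exists in the namespace.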

### Using FSSpec to access the data

Now we can access the data using XArray like normal. The two files will be accessed using fsspec's `get_mapper` function, which knows to use PelicanFS because we created the path using the `osdf` protocol.

In [None]:
# Get mappers for the Zarr datasets

file1 = fsspec.get_mapper(namespace_file1)
file2 = fsspec.get_mapper(namespace_file2)

# Open the datasets
ds = xr.open_mfdataset([file1, file2], engine='zarr', decode_timedelta=True)

# Display the dataset
ds
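
For intuition about what `get_mapper` returns, the sketch below builds a mapper on fsspec's built-in in-memory filesystem rather than on the OSDF. A mapper behaves like a dictionary whose keys are paths relative to the store root and whose values are raw bytes, which is the form a Zarr reader consumes. The store name `mapper-demo-store` and the keys are made up for this example.

```python
import fsspec

# A mapper over the in-memory filesystem; no network access is involved
mapper = fsspec.get_mapper("memory://mapper-demo-store")

# Writing through the mapper stores raw bytes under path-like keys
mapper["group/.zattrs"] = b'{"note": "demo"}'
mapper["group/0.0"] = b"\x00\x01\x02"

# Iterating yields keys relative to the root of the store
keys = sorted(mapper)
print(keys)

# Reading returns the stored bytes
print(mapper["group/.zattrs"])
```

The mappers created from the `osdf` paths expose the same dictionary-style interface, with PelicanFS fetching the bytes behind each key.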

### Continue the workflow
As you can see, xarray streamed the data correctly into the datasets. To show that the workflow works end to end, the next cell continues the computation and generates two plots. This tutorial does not explain that code in depth.

If you'd like to know more about the following workflow, please refer to the [Project Pythia HRRR AWS Cookbook](https://projectpythia.org/HRRR-AWS-cookbook/).

In [None]:
# Define coordinates for projection
lon1 = -97.5
lat1 = 38.5
slat = 38.5

# Define the Lambert Conformal projection
projData = ccrs.LambertConformal(
    central_longitude=lon1,
    central_latitude=lat1,
    standard_parallels=[slat, slat],
    globe=ccrs.Globe(
        semimajor_axis=6371229,
        semiminor_axis=6371229
    )
)

# Display dataset coordinates
ds.coords

# Extract temperature data
airTemp = ds.TMP

# Display the temperature data
airTemp

# Convert temperature units to Celsius
airTemp = airTemp.metpy.convert_units('degC')

# Display the converted temperature data
airTemp

# Extract projection coordinates
x = airTemp.projection_x_coordinate
y = airTemp.projection_y_coordinate

# Plot temperature data
airTemp.plot(figsize=(11, 8.5))

# Compute minimum and maximum temperatures
minTemp = airTemp.min().compute()
maxTemp = airTemp.max().compute()

# Display minimum and maximum temperature values
minTemp.values, maxTemp.values

# Define contour levels
fint = np.arange(np.floor(minTemp.values), np.ceil(maxTemp.values) + 2, 2)

# Define plot bounds and resolution
latN = 50.4
latS = 24.25
lonW = -123.8
lonE = -71.2
res = '50m'

# Create a figure and axis with projection
fig = plt.figure(figsize=(18, 12))
ax = plt.subplot(1, 1, 1, projection=projData)
ax.set_extent([lonW, lonE, latS, latN], crs=ccrs.PlateCarree())
ax.add_feature(cfeature.COASTLINE.with_scale(res))
ax.add_feature(cfeature.STATES.with_scale(res))

# Add the title
tl1 = 'HRRR 2m temperature ($^\\circ$C)'
tl2 = f'Analysis valid at: {hour}00 UTC {date}'
plt.title(f'{tl1}\n{tl2}', fontsize=16)

# Contour fill
CF = ax.contourf(x, y, airTemp, levels=fint, cmap=plt.get_cmap('coolwarm'))

# Make a colorbar for the ContourSet returned by the contourf call
cbar = fig.colorbar(CF, shrink=0.5)
cbar.set_label(r'2m Temperature ($^\circ$C)', size='large')

# Show the plot
plt.show()