<img src="images/osdf_logo.png" width=250 alt="OSDF Logo"></img>

# OSDF Foundations

---

## Overview
This notebook introduces the [Open Science Data Federation](https://osg-htc.org/services/osdf.html) (OSDF), the [Pelican Platform](https://pelicanplatform.org), and [PelicanFS](https://github.com/PelicanPlatform/pelicanfs), a file system interface (fsspec) for the Pelican Platform. It also illustrates how to access data from an OSDF origin.

1. What the Open Science Data Federation, the Pelican Platform, and PelicanFS are
1. How to set up an `osdf` URL for use with PelicanFS
1. How to specify a cache
1. How to access data directly from an origin

## Prerequisites

| Concepts | Importance | Notes |
| --- | --- | --- |
| [Intro to Cartopy](https://foundations.projectpythia.org/core/cartopy/cartopy.html) | Necessary | |
| [Understanding of NetCDF](https://foundations.projectpythia.org/core/data-formats/netcdf-cf.html) | Helpful | Familiarity with metadata structure |
| Project management | Helpful | |


---

## Imports

import sys
import intake
import numpy as np
import pandas as pd
import xarray as xr
import seaborn as sns
import re
import xesmf as xe
import matplotlib.pyplot as plt
import fsspec.implementations.http as fshttp
from pelicanfs.core import PelicanFileSystem, PelicanMap, OSDFFileSystem 

import cf_units as cf
import dask 
from dask_jobqueue import PBSCluster
from dask.distributed import Client
from dask.distributed import performance_report

## Introduction

[Open Science Data Federation](https://osg-htc.org/services/osdf.html)
- The Open Science Data Federation (OSDF) is an Open Science Grid service for sharing files staged in autonomous “origins”. It provides efficient access to those files from anywhere in the world through a global namespace and a network of caches. The OSDF allows data to be downloaded via HTTPS.

[Pelican Platform](https://pelicanplatform.org)
- Pelican provides an open-source software platform for federating dataset repositories together and delivering the objects to computing capacity such as the OSPool or your favourite HPC resource.

<img src="images/pelican_osdf.png" width=600 alt="OSDF info"></img>

Image courtesy: Pelican Platform

[PelicanFS](https://github.com/PelicanPlatform/pelicanfs)
- PelicanFS is a file system interface (fsspec) for the Pelican Platform. The rest of this notebook uses it to illustrate how to access data from an OSDF origin.

## Use OSDF and PelicanFS protocol to access data

### Set up an OSDF URL to use with PelicanFS
- We should use one of the two PelicanFS fsspec protocols (`osdf` or `pelican`) instead of the `https` protocol.
- We will use the `osdf` protocol and modify the existing CESM2-LENS catalog.
- The URLs follow the pattern `osdf_discovery_url/namespace_prefix/aws-region/bucket_name/path to file or object`.
- In this case, the URLs look like `osdf:///aws-opendata/us-west-2/ncar-cesm2-lens` followed by the path to an individual Zarr store.
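The URL pattern above can be sketched with a small helper. This is only an illustration: the function names `build_osdf_url` and `s3_to_osdf` are hypothetical, the assumption that the existing catalog stores `s3://bucket/key` paths is not confirmed by this notebook, and the store path used in the example is a placeholder rather than a real object.

```python
from urllib.parse import urlsplit

# Values taken from the URL pattern described in the text.
NAMESPACE_PREFIX = "aws-opendata"
AWS_REGION = "us-west-2"


def build_osdf_url(namespace_prefix, region, bucket, object_path=""):
    """Join the URL components into an osdf:/// URL (hypothetical helper)."""
    parts = [namespace_prefix, region, bucket, object_path]
    # Drop empty components and stray slashes so the pieces join cleanly.
    cleaned = [part.strip("/") for part in parts if part.strip("/")]
    return "osdf:///" + "/".join(cleaned)


def s3_to_osdf(s3_url, namespace_prefix=NAMESPACE_PREFIX, region=AWS_REGION):
    """Rewrite an s3://bucket/key URL as an osdf:/// URL.

    Assumes the catalog stores s3:// paths; adjust if yours uses https.
    """
    split = urlsplit(s3_url)
    if split.scheme != "s3":
        raise ValueError(f"expected an s3:// URL, got {s3_url!r}")
    return build_osdf_url(namespace_prefix, region, split.netloc, split.path)


base_url = build_osdf_url(NAMESPACE_PREFIX, AWS_REGION, "ncar-cesm2-lens")
print(base_url)

# Placeholder store path, for illustration only.
example = s3_to_osdf("s3://ncar-cesm2-lens/path/to/store.zarr")
print(example)
```

Applying a function like `s3_to_osdf` to the path column of a catalog would be one way to switch every entry to the `osdf` protocol in a single step.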

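#### Specifying a cache

To read through a particular cache, pass a list of cache URLs in `preferred_caches`. The cache URL below is a placeholder.

pelfs = PelicanFileSystem("pelican://osg-htc.org", preferred_caches=["https://cache.example.com"])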

#### Accessing data directly from an origin

To bypass the caches and read straight from the origin, set `direct_reads=True`.

pelfs = PelicanFileSystem("pelican://osg-htc.org", direct_reads=True)
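The access modes differ only in the keyword arguments passed to `PelicanFileSystem`, so one way to keep them side by side is a small helper that returns those arguments. This is a minimal sketch: `pelican_options` and `make_filesystem` are hypothetical names, the mode labels are made up for illustration, and the cache URL is the same placeholder used above. The only library parameters used are `preferred_caches` and `direct_reads`, as shown in the cells above.

```python
def pelican_options(mode="default", cache_url=None):
    """Return PelicanFileSystem keyword arguments for an access mode.

    Modes (labels are illustrative, not part of PelicanFS):
      "default": no cache specified
      "cache":   read through a specific cache
      "direct":  read directly from the origin
    """
    if mode == "default":
        # No extra arguments: cache selection is left to the federation.
        return {}
    if mode == "cache":
        if not cache_url:
            raise ValueError("mode='cache' needs a cache_url")
        return {"preferred_caches": [cache_url]}
    if mode == "direct":
        return {"direct_reads": True}
    raise ValueError(f"unknown mode: {mode!r}")


def make_filesystem(mode="default", cache_url=None,
                    federation="pelican://osg-htc.org"):
    """Create a PelicanFileSystem for the chosen access mode."""
    # Imported lazily so pelican_options works without pelicanfs installed.
    from pelicanfs.core import PelicanFileSystem

    return PelicanFileSystem(federation, **pelican_options(mode, cache_url))


print(pelican_options("cache", "https://cache.example.com"))
print(pelican_options("direct"))
```

Calling `make_filesystem("direct")` would then return a file system object equivalent to the one created in the cell above; reading from it requires network access to the federation.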

## Summary
In this notebook, we gave a brief introduction to the Open Science Data Federation, the Pelican Platform and learnt how to access data using PelicanFS. In particular, we discussed how to use PelicanFS to

- Access data without specifying a cache
- Access data from a specific cache
- Access data directly from the data origin

all via the OSDF protocol.

### What's next?
In the next notebooks, we will learn how to load data from various climate datasets into a notebook using PelicanFS.

## Resources and references
- [Open Science Data Federation](https://osg-htc.org/services/osdf.html)
- [Pelican Platform](https://pelicanplatform.org)
- [PelicanFS](https://github.com/PelicanPlatform/pelicanfs) 