In [2]:
import pandas as pd

from pangeo_forge_recipes.patterns import pattern_from_file_sequence
from pangeo_forge_recipes.recipes import XarrayZarrRecipe

In [3]:
input_url_pattern = (
    'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/'
    'v2.1/access/avhrr/{yyyymm}/oisst-avhrr-v02r01.{yyyymmdd}.nc'
    )

dates = pd.date_range('1982-01-01', '1982-02-01', freq='D')
input_urls = [
    input_url_pattern.format(
        yyyymm=day.strftime('%Y%m'), yyyymmdd=day.strftime('%Y%m%d')
    )
    for day in dates
]

pattern = pattern_from_file_sequence(input_urls, 'time', nitems_per_file=1)
recipe = XarrayZarrRecipe(pattern, inputs_per_chunk=10)

# Introduction Tutorial (Part 2 - Running a Recipe Locally)

Welcome back to the Pangeo Forge introduction tutorial!

This tutorial is split into three parts:
1. Defining a recipe
2. Running a recipe locally
3. Setting up a recipe to run in the cloud

Throughout this tutorial we are going to convert NOAA OISST stored in netCDF to Zarr. OISST is a global, gridded ocean sea surface temperature dataset at 1/4 degree resolution. By the end of this tutorial sequence you will have converted some OISST data to zarr, be able to access a sample on your computer, and see how to propose the recipe for cloud deployment!

Here we tackle **Part 2 - Running a recipe locally**. We will assume that you already have `pangeo-forge-recipes` installed.


## Part 2 Outline
In Part 2 of this tutorial we will run the recipe we defined in Part 1 to create some cloud-optimized data on our own computer.

The steps to do this are:
1. Define storage targets
1. Set up logging
1. Run and check output


## Define storage targets

"Targets" are locations on a file system where `pangeo-forge-recipes` is going to write and read data. Put another way, a target is a folder somewhere on the machine where the recipe is being run. 

To run a recipe, there are 3 types of targets that need to be set:

1. **Input cache** - the location of the source files. In the case of this tutorial, the netCDF files from NOAA will be downloaded here.
2. **Target** - the location of the converted, cloud-optimized data. In the case of this tutorial, the Zarr store will be saved here.
3. **Metadata cache** - the location of the metadata files that are created during the conversion process.
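
To make the three roles concrete, here is a minimal sketch that uses only the Python standard library, not the Pangeo Forge API. The `LocalTargets` class, the `make_local_targets` helper, and the folder names are hypothetical and exist only for illustration; the sketch simply creates one folder per role under a shared base directory.

```python
import tempfile
from dataclasses import dataclass
from pathlib import Path


@dataclass
class LocalTargets:
    """Hypothetical container: one folder per storage role."""
    input_cache: Path     # downloaded source files (netCDF in this tutorial)
    target: Path          # converted output (Zarr in this tutorial)
    metadata_cache: Path  # metadata written during the conversion


def make_local_targets(base_dir: str) -> LocalTargets:
    """Create the three folders under base_dir and return their paths."""
    base = Path(base_dir)
    targets = LocalTargets(
        input_cache=base / "input_cache",
        target=base / "target",
        metadata_cache=base / "metadata_cache",
    )
    for folder in (targets.input_cache, targets.target, targets.metadata_cache):
        folder.mkdir(parents=True, exist_ok=True)
    return targets


base_dir = tempfile.mkdtemp()  # any writable path would do
local_targets = make_local_targets(base_dir)
print(local_targets)
```

In the real recipe these folders are wrapped in the target classes described in the next section rather than passed around as bare paths.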

### Creating a filesystem and target objects

Targets are their own type of object in Pangeo Forge. The classes are `FSSpecTarget` for the **target** (where the converted data is written) and `CacheFSSpecTarget` for the **input cache** and **metadata cache**.


In [17]:
# fs_local is the LocalFileSystem() instance created in the next cell.
# With no path given, the target defaulted to the current directory (root_path='').
FSSpecTarget(fs_local)

FSSpecTarget(fs=<fsspec.implementations.local.LocalFileSystem object at 0x105f8f760>, root_path='')

In [15]:
import tempfile
from fsspec.implementations.local import LocalFileSystem
from pangeo_forge_recipes.storage import FSSpecTarget, CacheFSSpecTarget

fs_local = LocalFileSystem()

cache_dir = tempfile.TemporaryDirectory()
meta_cache_dir = tempfile.TemporaryDirectory()
cache_target = CacheFSSpecTarget(fs_local, cache_dir.name)  # cache_dir.name could be any path
meta_cache_target = CacheFSSpecTarget(fs_local, meta_cache_dir.name)  # meta_cache_dir.name could be any path

target_dir = tempfile.TemporaryDirectory()
target = FSSpecTarget(fs_local, target_dir.name)

recipe.input_cache = cache_target
recipe.metadata_cache = meta_cache_target
recipe.target = target

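
One thing to keep in mind with `tempfile.TemporaryDirectory`: the folder is deleted when the object is cleaned up, so anything written there does not outlive it. The sketch below uses only the standard library to illustrate that lifetime; the file name `example.txt` and its contents are placeholders. For output you want to keep, a regular path can be passed to the target instead.

```python
import os
import tempfile

tmp = tempfile.TemporaryDirectory()
example_path = os.path.join(tmp.name, "example.txt")  # placeholder file name

with open(example_path, "w") as f:
    f.write("some converted data")

# The file is present while the TemporaryDirectory object is alive
exists_before = os.path.exists(example_path)

# cleanup() removes the folder and everything inside it
tmp.cleanup()

exists_after = os.path.exists(example_path)
print(exists_before, exists_after)
```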

## Set up logging

Logging lets us watch what the recipe is doing while it runs. The function below sends log messages from `pangeo_forge_recipes` to standard output at the `DEBUG` level.

In [6]:
# Send pangeo_forge_recipes log messages to stdout, formatted as "name - level - message"
def setup_logging():
    import logging
    import sys
    formatter = logging.Formatter('%(name)s - %(levelname)s - %(message)s')
    logger = logging.getLogger("pangeo_forge_recipes")
    logger.setLevel(logging.DEBUG)
    sh = logging.StreamHandler(stream=sys.stdout)
    sh.setFormatter(formatter)
    logger.addHandler(sh)

setup_logging()
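
To see what a handler configured this way produces, the following sketch attaches a handler with the same format string but writes to an in-memory buffer instead of `sys.stdout`. The child logger name `pangeo_forge_recipes.demo` and the message are made up for illustration; the point is that messages from any logger under the `pangeo_forge_recipes` namespace propagate to the handler on the parent logger.

```python
import io
import logging

buffer = io.StringIO()
handler = logging.StreamHandler(stream=buffer)
handler.setFormatter(logging.Formatter("%(name)s - %(levelname)s - %(message)s"))

parent = logging.getLogger("pangeo_forge_recipes")
parent.setLevel(logging.DEBUG)
parent.addHandler(handler)

# Hypothetical child logger: it has no handler of its own, so its
# records travel up to the handler attached to the parent logger.
child = logging.getLogger("pangeo_forge_recipes.demo")
child.debug("caching input")

parent.removeHandler(handler)  # leave the logger without the demo handler
captured = buffer.getvalue()
print(captured)
```

Note that each call to `addHandler` adds another handler, so running a setup function like the one above more than once in the same session would print every message more than once.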

## Run and check output

It is time to run the recipe! There are [multiple ways](https://pangeo-forge.readthedocs.io/en/latest/recipe_user_guide/execution.html) to run a recipe. Here we use the `.to_function()` method to convert our recipe object into a Python function, and then we call that function.
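
The two-step pattern (first build a function, then call it) can be illustrated with a toy example. This is not how `pangeo-forge-recipes` implements `.to_function()`; the `compile_to_function` helper and the stage names are hypothetical. The sketch only shows that building the function does no work and that the work happens when the function is called.

```python
from typing import Callable, List


def compile_to_function(stages: List[Callable[[], None]]) -> Callable[[], None]:
    """Bundle a list of zero-argument stages into one zero-argument function."""
    def run_all() -> None:
        for stage in stages:
            stage()
    return run_all


completed: List[str] = []

# Hypothetical stage names, for illustration only
toy_stages = [
    lambda: completed.append("cache inputs"),
    lambda: completed.append("prepare target"),
    lambda: completed.append("store chunks"),
]

toy_flow = compile_to_function(toy_stages)  # nothing has run yet
ran_before_call = list(completed)

toy_flow()  # now every stage runs, in order
print(completed)
```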

In [16]:
flow = recipe.to_function()



In [None]:
flow()  # runs the recipe; xarray may still emit a consolidated-metadata warning here

### Check output

Now that the process has run we can use `xarray` to inspect the output data.

In [18]:
import xarray as xr

In [19]:
ds_target = xr.open_zarr(recipe.target.get_mapper(), consolidated=True)
ds_target

Each array in the rendered dataset summary has the same layout:

| | Array | Chunk |
|---|---|---|
| Bytes | 126.56 MiB | 39.55 MiB |
| Shape | (32, 1, 720, 1440) | (10, 1, 720, 1440) |
| Count | 5 Tasks | 4 Chunks |
| Type | float32 | numpy.ndarray |

We have converted the netCDF OISST data to zarr and opened it up in xarray! We have a working local recipe.
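
As a sanity check, the numbers in the dataset summary can be reproduced by hand from the recipe definition: the date range spans 1982-01-01 through 1982-02-01 inclusive, one file per day, with `inputs_per_chunk=10` and `float32` values (4 bytes each). The sketch below is plain arithmetic with the standard library and does not read the Zarr store; the grid dimensions are taken from the shape reported above.

```python
import math
from datetime import date

# Both endpoints of the date range are included, hence the + 1
n_days = (date(1982, 2, 1) - date(1982, 1, 1)).days + 1
inputs_per_chunk = 10
shape = (n_days, 1, 720, 1440)  # trailing dimensions as reported in the summary
bytes_per_value = 4             # float32

n_chunks = math.ceil(n_days / inputs_per_chunk)
array_mib = math.prod(shape) * bytes_per_value / 2**20
chunk_mib = math.prod((inputs_per_chunk,) + shape[1:]) * bytes_per_value / 2**20

print(n_days, n_chunks, round(array_mib, 2), round(chunk_mib, 2))
# → 32 4 126.56 39.55
```

The last of the four chunks holds only the two remaining time steps, which is why 32 inputs at 10 per chunk yield 4 chunks rather than 3.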

## End of Part 2

In this part of the tutorial we took the recipe Python class defined in Part 1 and ran it on our local machine. We defined our targets, set up logging, and ran the recipe with the `.to_function()` method.

In the next part of the tutorial we will look at how to take our local recipe and set it up for cloud deployment.