# Setting up a recipe to run in the cloud (Intro Tutorial Part 3)

Welcome back to the Pangeo Forge introduction tutorial!

This tutorial is split into three parts:

1. Defining a `FilePattern`
2. Defining a recipe and running it locally
3. Setting up a recipe to run in the cloud

Throughout this tutorial we are going to convert NOAA OISST stored in netCDF to Zarr. OISST is a global, gridded ocean sea surface temperature dataset at daily 1/4 degree resolution. By the end of this tutorial sequence you will have converted some OISST data to Zarr, be able to access a sample on your computer, and see how to propose the recipe for cloud deployment!

Here we tackle **Part 3 - Setting up a recipe to run in the cloud**. We will assume that you already have `pangeo-forge-recipes` installed.

## Steps to Running in the Cloud

We are at an exciting point - transitioning to Pangeo Forge Cloud. In order to take our local recipe and set it up for the cloud we will need to:

1. Fork the `staged-recipes` repo
2. Add the recipe files: a `.py` file and a `meta.yml` file
3. Make a PR to the `staged-recipes` repo


## Fork the `staged-recipes` repo

[`staged-recipes`](https://github.com/pangeo-forge/staged-recipes) is a repository that serves as a staging ground for recipes. It is where recipes are reviewed before they are run. Once a recipe has run, its code is moved to a dedicated repository for that recipe, called a **feedstock**.

You can fork a repo through the web browser or the GitHub CLI. Check out the [GitHub docs](https://docs.github.com/en/get-started/quickstart/fork-a-repo) for the steps to do this.

## Add the recipe files

Within `staged-recipes`, recipe files should go in a new folder in the `recipes` subdirectory. In the `recipes` folder, add a new folder for your dataset. In the example below we call the folder `oisst`. The final file structure we are creating is this:

```
staged-recipes/recipes/
                └──oisst/
                   ├──recipe.py
                   └──meta.yml
```
where the name of the folder `oisst` would vary based on the name of the dataset.
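
If you prefer to script this step, the layout above can be sketched with the standard library. This is only an illustration: the helper name `scaffold_recipe_folder` is hypothetical, and the temporary directory stands in for a local clone of your fork.

```python
import tempfile
from pathlib import Path


def scaffold_recipe_folder(repo_root: Path, dataset_name: str) -> Path:
    """Create recipes/<dataset_name>/ with empty recipe.py and meta.yml files."""
    folder = repo_root / "recipes" / dataset_name
    folder.mkdir(parents=True, exist_ok=True)
    for filename in ("recipe.py", "meta.yml"):
        (folder / filename).touch()
    return folder


# A throwaway directory stands in for the root of your staged-recipes clone.
with tempfile.TemporaryDirectory() as tmp:
    folder = scaffold_recipe_folder(Path(tmp), "oisst")
    created = sorted(p.name for p in folder.iterdir())
    print(created)
```

In practice you would pass the path of your own clone instead of a temporary directory, then fill in the two files as described in the following sections.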

Question: does the name of the file hold any significance?

### Copy the recipe code into a single `.py` file

Within the `oisst` folder create a file called `recipe.py` and copy the recipe creation code from the first two parts of this tutorial. We don't have to copy any of the code we used for local testing - the cloud automation will take care of testing and scaling the processing on the cloud infrastructure. The `recipe.py` file should look like:

```{code-block} python
import pandas as pd

from pangeo_forge_recipes.patterns import ConcatDim, FilePattern
from pangeo_forge_recipes.recipes import XarrayZarrRecipe

# One input file per day over the full date range
dates = pd.date_range('1981-09-01', '2022-02-01', freq='D')

def make_url(time):
    """Build the download URL for the file holding a given day."""
    yyyymm = time.strftime('%Y%m')
    yyyymmdd = time.strftime('%Y%m%d')
    return (
        'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/'
        f'v2.1/access/avhrr/{yyyymm}/oisst-avhrr-v02r01.{yyyymmdd}.nc'
    )

# Files are concatenated along the time dimension, one time step per file
time_concat_dim = ConcatDim("time", dates, nitems_per_file=1)
pattern = FilePattern(make_url, time_concat_dim)

recipe = XarrayZarrRecipe(pattern, inputs_per_chunk=2)
```
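
Before committing `recipe.py`, it can be reassuring to confirm that `make_url` builds the URL you expect for a single day. The sketch below is a local check only and does not belong in `recipe.py`. It repeats `make_url` so that it runs on its own, and it uses a standard-library `date` as a stand-in for the pandas timestamps in `dates` (both provide `strftime`).

```python
from datetime import date


def make_url(time):
    # Same URL template as in recipe.py
    yyyymm = time.strftime('%Y%m')
    yyyymmdd = time.strftime('%Y%m%d')
    return (
        'https://www.ncei.noaa.gov/data/sea-surface-temperature-optimum-interpolation/'
        f'v2.1/access/avhrr/{yyyymm}/oisst-avhrr-v02r01.{yyyymmdd}.nc'
    )


# First day of the date range used in the recipe
first_url = make_url(date(1981, 9, 1))
print(first_url)
```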

Another step, complete!

## Create a `meta.yml` file

The `meta.yml` contains two important things:
1. metadata about the recipe 
2. the **bakery**, designating the cloud infrastructure where the recipe will be run and stored.

Here we will walk through each field of the `meta.yml`. A template of `meta.yml` is also available [here](https://github.com/pangeo-forge/sandbox/blob/main/recipe/meta.yaml). 

Question/Note -- I don't see a place in our docs where someone could read more about `meta.yml`, perhaps an idea for the future. Address: fields that are required/optional, for "roles" what are the options, perhaps more detail about how important it is that we are pinning the PF version, what happens with the metadata (catalog?)


### `title` and `description`

These fields describe the dataset. They are not highly restricted.

```{code-block} yaml
:lineno-start: 1
title: "NOAA Optimum Interpolated SST"
description: "1/4 degree daily gap filled sea surface temperature (SST)"
```

````{admonition} Full File Preview
:class: dropdown
```{code-block} yaml
:lineno-start: 1
:emphasize-lines: 1, 2

title: "NOAA Optimum Interpolated SST"
description: "1/4 degree daily gap filled sea surface temperature (SST)"
```
````

### `pangeo_forge_version`

This is the version of the `pangeo-forge-recipes` library that you used to create the recipe. It's important to track in case someone wants to run your recipe in the future. Conda users can find this information with `conda list`.

```{code-block} yaml
:lineno-start: 3
pangeo_forge_version: "0.6.2"
```

````{admonition} Full File Preview
:class: dropdown
```{code-block} yaml
:lineno-start: 1
:emphasize-lines: 3

title: "NOAA Optimum Interpolated SST"
description: "1/4 degree daily gap filled sea surface temperature (SST)"
pangeo_forge_version: "0.6.2"
```
````
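
If you are not using conda, the installed version can also be looked up from Python. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+). It assumes the package was installed under the distribution name `pangeo-forge-recipes`, and the helper name `installed_version` is ours.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution: str = "pangeo-forge-recipes"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


found = installed_version()
if found is None:
    print("pangeo-forge-recipes is not installed in this environment")
else:
    # This is the value to copy into `pangeo_forge_version`
    print(found)
```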

### `recipes` section

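Each entry in `recipes` pairs an `id` with an `object`. The `object` string takes the form `module:variable`, so `"recipe:recipe"` refers to the variable named `recipe` defined in `recipe.py`.

Question: How is the `id` chosen?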

```{code-block} yaml
:lineno-start: 4
recipes:
  - id: identifier-for-your-recipe
    object: "recipe:recipe"
```

````{admonition} Full File Preview
:class: dropdown
```{code-block} yaml
:lineno-start: 1
:emphasize-lines: 4-6

title: "NOAA Optimum Interpolated SST"
description: "1/4 degree daily gap filled sea surface temperature (SST)"
pangeo_forge_version: "0.6.2"
recipes:
  - id: identifier-for-your-recipe
    object: "recipe:recipe"
```
````
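
To make the `object` field more concrete, here is a rough sketch of how a string such as `"recipe:recipe"` could be resolved to a Python object. It assumes the string means `module:variable`, and it is not the code Pangeo Forge Cloud actually runs. The helper `resolve_object` and the stand-in module are hypothetical, used only so the example is self-contained.

```python
import importlib
import sys
import types


def resolve_object(spec: str):
    """Split 'module:variable' and return the named variable from that module."""
    module_name, _, attr = spec.partition(":")
    if not module_name or not attr:
        raise ValueError(f"expected 'module:variable', got {spec!r}")
    module = importlib.import_module(module_name)
    return getattr(module, attr)


# Stand-in for recipe.py: a module named `recipe` exposing a variable `recipe`.
fake_module = types.ModuleType("recipe")
fake_module.recipe = "placeholder recipe object"
sys.modules["recipe"] = fake_module

resolved = resolve_object("recipe:recipe")
print(resolved)
```

Under this reading, renaming the file or the variable in `recipe.py` would require a matching change to `object` in `meta.yml`.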

### `provenance` section

Provenance explains the origin of the dataset. Here we are listing information about the data provider.

```{code-block} yaml
:lineno-start: 7
provenance:
  providers:
    - name: "NOAA NCEI"
      description: "National Oceanic & Atmospheric Administration National Centers for Environmental Information"
      roles:
        - producer
        - licensor
      url: https://www.ncdc.noaa.gov/oisst
  license: "CC-BY-4.0"
```

`name` and `description` refer to the data provider. `roles` are ...?  (link to the options?). Docs: where you can have multiple inputs. `license` is set by NOAA. This information is typically available ____.

````{admonition} Full File Preview
:class: dropdown
```{code-block} yaml
:lineno-start: 1
:emphasize-lines: 7-15

title: "NOAA Optimum Interpolated SST"
description: "1/4 degree daily gap filled sea surface temperature (SST)"
pangeo_forge_version: "0.6.2"
recipes:
  - id: identifier-for-your-recipe
    object: "recipe:recipe"
provenance:
  providers:
    - name: "NOAA NCEI"
      description: "National Oceanic & Atmospheric Administration National Centers for Environmental Information"
      roles:
        - producer
        - licensor
      url: https://www.ncdc.noaa.gov/oisst
  license: "CC-BY-4.0"
```
````

### `maintainers` section

This is information about you, the recipe creator! Multiple maintainers can be listed and the required fields are `name`, ___.

```{code-block} yaml
:lineno-start: 16
maintainers:
  - name: "Dorothy Vaughan"
    orcid: "9999-9999-9999-9998"
    github: dvaughan0987
```

````{admonition} Full File Preview
:class: dropdown
```{code-block} yaml
:lineno-start: 1
:emphasize-lines: 16-19

title: "NOAA Optimum Interpolated SST"
description: "1/4 degree daily gap filled sea surface temperature (SST)"
pangeo_forge_version: "0.6.2"
recipes:
  - id: identifier-for-your-recipe
    object: "recipe:recipe"
provenance:
  providers:
    - name: "NOAA NCEI"
      description: "National Oceanic & Atmospheric Administration National Centers for Environmental Information"
      roles:
        - producer
        - licensor
      url: https://www.ncdc.noaa.gov/oisst
  license: "CC-BY-4.0"
maintainers:
  - name: "Dorothy Vaughan"
    orcid: "9999-9999-9999-9998"
    github: dvaughan0987
```
````

### `bakery` section

**Bakeries** are where the work gets done on Pangeo Forge Cloud. A single bakery is a set of cloud infrastructure hosted by a particular institution or group.

Choosing the `bakery` is how you choose the cloud infrastructure on which the recipe will be run and hosted. Question: How do you know which bakeries you are allowed to use? Is the [bakery database](https://github.com/pangeo-forge/bakery-database/blob/main/bakeries.yaml) something we are recommending?

```{code-block} yaml
:lineno-start: 20
bakery:
  id: "columbia-staging"
```

````{admonition} Full File Preview
:class: dropdown
```{code-block} yaml
:lineno-start: 1
:emphasize-lines: 20, 21

title: "NOAA Optimum Interpolated SST"
description: "1/4 degree daily gap filled sea surface temperature (SST)"
pangeo_forge_version: "0.6.2"
recipes:
  - id: identifier-for-your-recipe
    object: "recipe:recipe"
provenance:
  providers:
    - name: "NOAA NCEI"
      description: "National Oceanic & Atmospheric Administration National Centers for Environmental Information"
      roles:
        - producer
        - licensor
      url: https://www.ncdc.noaa.gov/oisst
  license: "CC-BY-4.0"
maintainers:
  - name: "Dorothy Vaughan"
    orcid: "9999-9999-9999-9998"
    github: dvaughan0987
bakery:
  id: "columbia-staging"
```
````

And that is the `meta.yml`! Between the `meta.yml` and `recipe.py` we have now put together all the files we need for cloud processing.
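
Before opening a PR, you may want a quick local check that none of the sections covered above was left out. The sketch below is only an illustration under stated assumptions: the list of expected keys comes from this tutorial rather than from an official schema, the helper `missing_keys` is hypothetical, and the dictionary stands in for a parsed `meta.yml` (in practice you could load the file with a YAML parser such as PyYAML's `yaml.safe_load`).

```python
# Top-level sections walked through in this tutorial (not an official schema).
EXPECTED_KEYS = (
    "title",
    "description",
    "pangeo_forge_version",
    "recipes",
    "provenance",
    "maintainers",
    "bakery",
)


def missing_keys(meta: dict) -> list:
    """Return the expected top-level keys that are absent from `meta`."""
    return [key for key in EXPECTED_KEYS if key not in meta]


# Stand-in for the parsed contents of meta.yml, mirroring the example above.
meta = {
    "title": "NOAA Optimum Interpolated SST",
    "description": "1/4 degree daily gap filled sea surface temperature (SST)",
    "pangeo_forge_version": "0.6.2",
    "recipes": [{"id": "identifier-for-your-recipe", "object": "recipe:recipe"}],
    "provenance": {
        "providers": [{"name": "NOAA NCEI", "roles": ["producer", "licensor"]}],
        "license": "CC-BY-4.0",
    },
    "maintainers": [{"name": "Dorothy Vaughan", "github": "dvaughan0987"}],
    "bakery": {"id": "columbia-staging"},
}

print(missing_keys(meta))

# Dropping a section shows how a gap would be reported.
incomplete = {key: value for key, value in meta.items() if key != "bakery"}
print(missing_keys(incomplete))
```

This only checks for the presence of top-level sections. It says nothing about whether the values inside them are valid.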

## Make a PR to the `staged-recipes` repo

It's time to submit the changes as a Pull Request. If you have opened an issue for your dataset, you can reference it in the PR. Otherwise, provide a few notes about the dataset and hit submit!

## After the PR

With the PR in, all the steps to stage the recipe are complete! At this point a Pangeo Forge Bot will perform some automated steps, such as checking syntax and required fields. [This recipe](https://github.com/pangeo-forge/staged-recipes/pull/66#issuecomment-1048578240) is an example of the Bot in action. The bot, and possibly a Pangeo Forge maintainer, will guide you through any changes your recipe needs before merge.

Merging the PR will kick off a series of automated steps to begin the processing. These include:

- creating a feedstock repository
- setting up the necessary bakery infrastructure
- deploying the recipe

The relevant information about the recipe run will be communicated directly in the PR. If you are interested in learning more about how your recipe is processed, check out the {doc}`cloud_automation_user_guide/index`.

## End of the Introduction Tutorial

Congratulations, you've completed the introduction tutorial!

From here, we hope you are excited to try writing your own recipe. As you write, you may find additional documentation helpful, such as the {doc}`recipe_user_guide/index` or the more advanced {doc}`tutorials/index`. You can also open issues in [`pangeo-forge-recipes`](https://github.com/pangeo-forge/pangeo-forge-recipes).

Happy ARCO building!