# Sharing code along with the data using CliMetLab


**Objective**:

The objective of this notebook is to illustrate how to create a CliMetLab dataset plugin for a dataset made from these two files: `forecast_error.csv` and `soil_temperature.csv` (these files are located next to this notebook).

There are three main steps:

- Step 1: Create the plugin boilerplate structure using climetlab-plugin-tools.

- Step 2: Add your code to the plugin.

- Step 3: Push to GitHub and publish to PyPI (not covered here; see the general-purpose tutorial about GitHub and PyPI).

## How to run this exercise

This exercise is in the form of a [Jupyter notebook](https://jupyter.org/). It can be run in a number of free cloud-based environments (see the two options below), which require no installation. When you click on one of the links below ([`Open in Colab`](https://colab.research.google.com/github/ecmwf-projects/mooc-machine-learning-weather-climate/blob/main/tier_2/data_handling/04-dataset-plugin.ipynb) or [`Launch in Deepnote`](https://deepnote.com/launch?url=https://github.com/ecmwf-projects/mooc-machine-learning-weather-climate/blob/main/tier_2/data_handling/04-dataset-plugin.ipynb)), you will be prompted to create a free account, after which you will see the same page you see here. You can run each block of code by pressing Shift+Enter repeatedly, or by selecting the "play" icon.

Advanced users may wish to run this exercise on their own computers by first installing [Python](https://www.python.org/downloads/), [Jupyter](https://jupyter.org/install) and [CliMetLab](https://climetlab.readthedocs.io/en/latest/installing.html).

<style>
td, th {
   border: 1px solid white;
   border-collapse: collapse;
}
</style>
<table align="left">
  <tr>
    <th>Run the tutorial via free cloud platforms: </th>
    <th><a href="https://colab.research.google.com/github/ecmwf-projects/mooc-machine-learning-weather-climate/blob/main/tier_2/data_handling/04-dataset-plugin.ipynb">
        <img src = "https://colab.research.google.com/assets/colab-badge.svg" alt = "Colab"></a></th>
    <th><a href="https://deepnote.com/launch?url=https://github.com/ecmwf-projects/mooc-machine-learning-weather-climate/blob/main/tier_2/data_handling/04-dataset-plugin.ipynb">
        <img src = "https://deepnote.com/buttons/launch-in-deepnote-small.svg" alt = "Deepnote"></a></th>
  </tr>
</table>

## Let's begin the exercise...

In [1]:
pip install climetlab --quiet

In [11]:
# On Colab, you may need to download the files located next to this notebook:
!wget 'https://raw.githubusercontent.com/ecmwf-projects/mooc-machine-learning-weather-climate/main/tier_2/data_handling/soil_temperature.csv'
!wget 'https://raw.githubusercontent.com/ecmwf-projects/mooc-machine-learning-weather-climate/main/tier_2/data_handling/forecast_error.csv'

--2023-04-28 12:35:37--  https://raw.githubusercontent.com/ecmwf-projects/mooc-machine-learning-weather-climate/main/tier_2/data_handling/soil_temperature.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.109.133, 185.199.108.133, 185.199.111.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.109.133|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 16933 (17K) [text/plain]
Saving to: ‘soil_temperature.csv’


2023-04-28 12:35:37 (27.1 MB/s) - ‘soil_temperature.csv’ saved [16933/16933]

--2023-04-28 12:35:38--  https://raw.githubusercontent.com/ecmwf-projects/mooc-machine-learning-weather-climate/main/tier_2/data_handling/forecast_error.csv
Resolving raw.githubusercontent.com (raw.githubusercontent.com)... 185.199.108.133, 185.199.109.133, 185.199.110.133, ...
Connecting to raw.githubusercontent.com (raw.githubusercontent.com)|185.199.108.133|:443... connected.
HTTP request sent, awaiting response... 

In [12]:
!ls *.csv

forecast_error.csv  soil_temperature.csv
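If `wget` is not available (for example outside Colab), the same two files can be fetched with the Python standard library. This is a minimal sketch: `download_missing` is a hypothetical helper, not part of CliMetLab. It skips files that already exist, so the cell can be re-run safely, and its `fetch` argument defaults to `urllib.request.urlretrieve` but can be replaced, for instance in tests.

```python
import os
import urllib.request

BASE_URL = ("https://raw.githubusercontent.com/ecmwf-projects/"
            "mooc-machine-learning-weather-climate/main/tier_2/data_handling/")
FILENAMES = ["soil_temperature.csv", "forecast_error.csv"]


def download_missing(filenames, base_url=BASE_URL, dest_dir=".",
                     fetch=urllib.request.urlretrieve):
    """Download each file from base_url unless it already exists in dest_dir.

    Returns the list of local paths that were actually downloaded.
    """
    downloaded = []
    for name in filenames:
        path = os.path.join(dest_dir, name)
        if os.path.exists(path):
            continue  # already present, e.g. from a previous run
        fetch(base_url + name, path)  # urlretrieve(url, filename)
        downloaded.append(path)
    return downloaded


# Usage (requires network access):
# download_missing(FILENAMES)
```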


#### Step 1: Create the plugin boilerplate structure using climetlab-plugin-tools.

In [13]:
!climetlab help


Documented commands (type help <topic>):
availability  completion    help             plugins         versions
benchmark     decache       index_directory  quit          
cache         export_cache  index_url        settings      
check         grib_info     index_urls       settings_reset

Undocumented commands:
create  df  dump_index  libraries  plugin_create_dataset  plugin_create_source



The CliMetLab plugin tools are packaged separately. We need to install them to make the shell command `climetlab plugin_create_dataset` available.

In [14]:
!pip install climetlab-plugin-tools --quiet

In [15]:
!climetlab help


Documented commands (type help <topic>):
availability  completion    help             plugins         versions
benchmark     decache       index_directory  quit          
cache         export_cache  index_url        settings      
check         grib_info     index_urls       settings_reset

Undocumented commands:
create  df  dump_index  libraries  plugin_create_dataset  plugin_create_source



Run the following from a shell terminal:

$ climetlab

(climetlab) plugin_create_dataset

Then answer the questions asked by the tool.

A new folder has now been created, containing all the code required for a proper pip-installable Python package that provides a CliMetLab dataset plugin.

#### Step 2: Add your code to the plugin.

The previous step created the boilerplate code. Let us now write some actual code to link it to the data.

Here is the file to edit (its path depends on the plugin name you chose):

In [16]:
!ls climetlab-*/climetlab_*/*.py

ls: cannot access 'climetlab-*/climetlab_*/*.py': No such file or directory


In [17]:
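# Excerpt of the __init__ method generated in the dataset class
# (URL and PATTERN are constants defined elsewhere in the generated module)
import climetlab as cml
from climetlab.decorators import normalize

@normalize("parameter", ["tp", "t2m"])
def __init__(self, year, parameter):
    request = dict(parameter=parameter, url=URL, year=year)
    self.source = cml.load_source("url-pattern", PATTERN, request)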
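The boilerplate passes a pattern and a request dictionary to the `url-pattern` source, which builds the URL(s) to download by substituting request values into `{...}` placeholders. The standard-library sketch below only illustrates that idea: `expand_pattern`, the example pattern and the example URL are hypothetical, and CliMetLab's own `url-pattern` source has its own implementation and formatting rules. In this sketch, list values produce one URL per combination.

```python
import itertools


def expand_pattern(pattern, request):
    """Return one URL per combination of request values.

    Scalar values are treated as one-element lists, so a request with only
    scalars yields a single URL.
    """
    keys = list(request)
    values = [v if isinstance(v, (list, tuple)) else [v] for v in request.values()]
    return [pattern.format(**dict(zip(keys, combo)))
            for combo in itertools.product(*values)]


# Hypothetical pattern and request, mirroring the boilerplate's arguments
PATTERN = "{url}/{parameter}/{year}.nc"
request = dict(parameter=["tp", "t2m"], url="https://example.com/data", year=2020)
for url in expand_pattern(PATTERN, request):
    print(url)
```

With these inputs the loop prints `https://example.com/data/tp/2020.nc` followed by `https://example.com/data/t2m/2020.nc`.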

Let us edit this part and change it to:

In [18]:
import climetlab as cml

def __init__(self, parameter):
    # Load the local CSV file matching the requested parameter
    self.source = cml.load_source("file", parameter + ".csv")
    # For a real plugin, use the "url" or "url-pattern" sources instead:
    # self.source = cml.load_source("url", URL_PREFIX + parameter + ".csv")
    # self.source = cml.load_source("url-pattern", PATTERN, {"parameter": parameter})
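The edited `__init__` simply maps the `parameter` argument to a file name. The pure-Python stand-in below (`LocalCsvDataset` is hypothetical and not part of CliMetLab) mimics that mapping with the `csv` module to make two consequences visible: the file name is looked up in a directory (in this sketch an explicit `directory`, defaulting to the current working directory), and a misspelt parameter surfaces as a `FileNotFoundError`, the kind of error this notebook improves later on.

```python
import csv
import os


class LocalCsvDataset:
    """Hypothetical stand-in for the plugin's dataset class (not CliMetLab).

    Like the edited __init__ above, it maps `parameter` to '<parameter>.csv'.
    """

    def __init__(self, parameter, directory="."):
        self.path = os.path.join(directory, parameter + ".csv")
        if not os.path.exists(self.path):
            raise FileNotFoundError(self.path)

    def to_records(self):
        # Read the CSV into a list of dicts, one per data row
        with open(self.path, newline="") as f:
            return list(csv.DictReader(f))


# Usage, next to the downloaded files:
# rows = LocalCsvDataset("soil_temperature").to_records()
```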

The plugin package now needs to be installed with `pip`.

As an example, a solution plugin is available next to this notebook in `./climetlab-my-plugin-solution`. Adapt the path to match the plugin name you chose.

In [19]:
# !pip install -e ./climetlab-my-plugin        # installing your own plugin
# !pip install -e ./climetlab-your-plugin-name # installing your own plugin

!pip install -e ./climetlab-my-plugin-solution # installing the solution plugin

ERROR: ./climetlab-my-plugin-solution is not a valid editable requirement. It should either be a path to a local project or a VCS URL (beginning with bzr+http, bzr+https, bzr+ssh, bzr+sftp, bzr+ftp, bzr+lp, bzr+file, git+http, git+https, git+ssh, git+git, git+file, hg+file, hg+http, hg+https, hg+ssh, hg+static-http, svn+ssh, svn+http, svn+https, svn+svn, svn+file).


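Note: with `pip install -e` (editable mode), the installed package points to your source folder, so later edits to the plugin code are used without reinstalling.
Warning for Jupyter users: you may need to restart the notebook kernel after installing.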
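Installing the package is what makes `load_dataset('my-plugin', ...)` work: the generated `setup.py` declares a Python entry point that CliMetLab looks up by name. The sketch below lists the installed entry points of a group using only the standard library. The group name `climetlab.datasets` is an assumption based on the CliMetLab demo plugins, so check the `entry_points` section of your generated `setup.py`.

```python
import sys
from importlib import metadata


def installed_plugins(group="climetlab.datasets"):
    """Return sorted (name, target) pairs for the entry points in `group`."""
    if sys.version_info >= (3, 10):
        eps = metadata.entry_points(group=group)
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name
        eps = metadata.entry_points().get(group, [])
    return sorted((ep.name, ep.value) for ep in eps)


for name, target in installed_plugins():
    print(name, "->", target)
```

After a successful install, you would expect your plugin's name (for example `my-plugin`) to appear in this list.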


Let us test this, from a notebook, IPython or a Python script:

In [20]:
import climetlab as cml
ds = cml.load_dataset('my-plugin', parameter='soil_temperature')
ds.to_pandas()


my-plugin.yaml: 0.00B [00:00, ?B/s]



NameError: ignored

In [21]:
ds = cml.load_dataset('my-plugin', parameter='forecast_error')
ds.to_pandas()

my-plugin.yaml: 0.00B [00:00, ?B/s]



NameError: ignored

## Improving data usability
The data can be accessed as a pandas DataFrame. Can we do better to help end users handle the data?

What about helping them fix a typo?

In [22]:
import climetlab as cml
cml.load_dataset('my-plugin', parameter='soiltemperature')
# For Github actions: skip

my-plugin.yaml: 0.00B [00:00, ?B/s]



NameError: ignored

Let us replace this error message:

	`FileNotFoundError: [Errno 2] No such file or directory: 'soiltemperature.csv'`

with a more helpful one, such as:

	`Invalid value 'soiltemperature', possible values are ['soil_temperature', 'forecast_error'] (EnumSingleOrListType)`

In [23]:
# Add the CliMetLab decorator `@normalize` to the dataset's __init__
import climetlab as cml
from climetlab.decorators import normalize

@normalize("parameter", ['soil_temperature', 'forecast_error'])
def __init__(self, parameter):
    self.source = cml.load_source("file", parameter + '.csv')

# Then re-run the previous cell (you may need to restart the kernel after running pip install).

# If you installed the plugin package with -e, you do not need to reinstall it.
# If you did not use -e, you need to reinstall the plugin package to update it.
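To make the behaviour of `@normalize` with a list of values more concrete, here is a simplified pure-Python stand-in. `normalize_choice` is hypothetical and only mimics the two behaviours described in this notebook: case-insensitive matching against the allowed values, and an error that lists the possible values. CliMetLab's real decorator does more (the `EnumSingleOrListType` name in its message suggests it also accepts lists of values), and its exception type may differ.

```python
import functools
import inspect


def normalize_choice(name, values):
    """Hypothetical decorator: map argument `name` onto one of `values`.

    Matching is case-insensitive; an unknown value raises ValueError.
    """
    lookup = {v.lower(): v for v in values}

    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            given = bound.arguments[name]
            try:
                bound.arguments[name] = lookup[str(given).lower()]
            except KeyError:
                raise ValueError(
                    f"Invalid value {given!r}, possible values are {list(values)}"
                ) from None
            return func(*bound.args, **bound.kwargs)

        return wrapper

    return decorator


class Demo:
    @normalize_choice("parameter", ["soil_temperature", "forecast_error"])
    def __init__(self, parameter):
        self.filename = parameter + ".csv"


print(Demo("Soil_Temperature").filename)  # soil_temperature.csv
```

Normalizing the argument before the function body runs keeps `__init__` itself free of validation code, which is the main appeal of the decorator approach.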

In [24]:
import climetlab as cml
cml.load_dataset('my-plugin', parameter='soiltemperature')
# For Github actions: skip

my-plugin.yaml: 0.00B [00:00, ?B/s]



NameError: ignored

This also handles values written with lowercase or uppercase letters:

In [25]:
import climetlab as cml
ds = cml.load_dataset('my-plugin', parameter='SOIL_TEMPERATURE')  # ok
ds = cml.load_dataset('my-plugin', parameter='Soil_Temperature')  # ok

my-plugin.yaml: 0.00B [00:00, ?B/s]



NameError: ignored

## Date and time parameters
Date and time are so ubiquitous in the climate and meteorology domains that we have developed specific tools to handle these input arguments.

Similarly to `@normalize("parameter", ['soil_temperature', 'forecast_error'])`, adding `@normalize("argument", "date(%Y-%m-%d)")` converts the input into a string with the requested date format.

Relevant CliMetLab documentation: https://climetlab.readthedocs.io/en/latest/contributing/normalize.html
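The idea behind `"date(%Y-%m-%d)"` can be sketched with the standard library: whatever form the caller uses, the function body receives a string in one fixed format. `format_date` below is a hypothetical helper, not CliMetLab's implementation; it only accepts `datetime.date`/`datetime.datetime` objects, ISO 8601 strings and `YYYYMMDD` integers, whereas the inputs accepted by the real decorator are described in the documentation linked above.

```python
import datetime


def format_date(value, fmt="%Y-%m-%d"):
    """Hypothetical helper: convert a date-like value to a string in `fmt`."""
    if isinstance(value, datetime.date):  # also covers datetime.datetime
        d = value
    elif isinstance(value, int):  # e.g. 20230428 (YYYYMMDD)
        d = datetime.datetime.strptime(str(value), "%Y%m%d")
    elif isinstance(value, str):
        d = datetime.datetime.fromisoformat(value)  # e.g. "2023-04-28"
    else:
        raise TypeError(f"Cannot interpret {value!r} as a date")
    return d.strftime(fmt)


for v in [datetime.date(2023, 4, 28), 20230428, "2023-04-28T12:00"]:
    print(format_date(v))  # 2023-04-28 each time
```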

## CliMetLab dataset plugin blueprint features

Here is a minimal example: https://github.com/ecmwf/climetlab-demo-dataset

Here is a real-life example: https://github.com/mchantry/climetlab-mltc-surface-observation-postprocessing

- Python pip package structure:
	- setup.py + MANIFEST
	- version file
- README
	- Links to notebooks in Colab/Binder/etc.
- Examples in notebooks:
	- Used in README links
	- Tested in GitHub Actions.
- Tests in tests/*
	- Using pytest.
	- Used in GitHub Actions.
- GitHub Actions: YAML files in .github/workflows/*.yml
	- Check code quality
	- Run tests (from tests/*.py) on various platforms and Python versions
- Automated release of the pip package from GitHub (requires an account on pypi.org)
	- Make sure the tests pass.
	- Update the */version file
	- Trigger a release: https://docs.github.com/en/repositories/releasing-projects-on-github/managing-releases-in-a-repository
- Legal stuff: LICENCE/AUTHOR/CONTRIBUTORS
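The "version file" item can be illustrated with a small helper of the kind a `setup.py` might use to read the package version from a plain-text file, so that a release only requires updating that one file. This is a sketch: `read_version` and the `climetlab_my_plugin` package directory are hypothetical, so check the `setup.py` generated by `plugin_create_dataset` for what it actually does.

```python
import os


def read_version(package_dir):
    """Return the version string stored in <package_dir>/version."""
    path = os.path.join(package_dir, "version")
    with open(path, encoding="utf-8") as f:
        version = f.read().strip()  # drop the trailing newline
    if not version:
        raise ValueError(f"Empty version file: {path}")
    return version


# In a setup.py, something like (hypothetical package name):
# setuptools.setup(name="climetlab-my-plugin",
#                  version=read_version("climetlab_my_plugin"), ...)
```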


For comparison, see the `__init__.py` files of two CliMetLab source plugins:

- https://github.com/ecmwf/climetlab-demo-source/blob/master/climetlab_demo_source/__init__.py
- https://github.com/ecmwf-lab/climetlab-google-drive-source/blob/main/climetlab_google_drive_source/__init__.py