## Mapping data from ICOADS deck 704 to the Common Data Model (CDM)

Here we extract supplemental metadata from [ICOADSv3.0](https://icoads.noaa.gov/r3.html) stored in the [IMMA version 1](https://icoads.noaa.gov/e-doc/imma/R3.0-imma1.pdf) format.
We then map this data (including the supplemental data) to the Common Data Model (CDM) format defined in the [CDM Documentation](https://github.com/glamod/common_data_model/blob/master/cdm_latest.pdf).

The supplementary data are mapped to the CDM using the [tables](https://github.com/glamod/cdm_reader_mapper/tree/main/cdm_reader_mapper/cdm_mapper/tables/icoads/r300/d704) and [codes](https://github.com/glamod/cdm_reader_mapper/tree/main/cdm_reader_mapper/cdm_mapper/codes/icoads/r300/d704) specific to deck 704. The generic ICOADS [tables](https://github.com/glamod/cdm_reader_mapper/tree/main/cdm_reader_mapper/cdm_mapper/tables/icoads) are used to map the common ICOADS data components.

We are analysing deck `704`, the [US Marine Meteorological Journals Collection](https://icoads.noaa.gov/usmmj.html).

In [None]:
from __future__ import annotations

import pandas as pd

from cdm_reader_mapper import read_mdf, test_data

We first read the supplemental data from the `c99` attachment of the IMMA format for a subset of the data (e.g. 1878/10). For this we need the `"icoads_r300_d704"` schema. Schema names follow the convention `"format_version_deck"`:

* format/data model: "icoads"
* version/release: "r300" (release 3.0.0)
* deck: "d704"
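
The naming convention can be illustrated by splitting a schema name into its three parts. This is a minimal sketch: `parse_schema_name` and `SchemaName` are hypothetical names written for this illustration, not part of `cdm_reader_mapper`, and the helper assumes a name made of exactly three underscore-separated parts.

```python
from __future__ import annotations

from typing import NamedTuple


class SchemaName(NamedTuple):
    """Components of a schema name following the "format_version_deck" convention."""

    data_model: str
    release: str
    deck: str


def parse_schema_name(schema: str) -> SchemaName:
    """Split a name such as "icoads_r300_d704" into its components.

    Hypothetical helper for illustration only; not part of cdm_reader_mapper.
    """
    parts = schema.split("_")
    if len(parts) != 3:
        raise ValueError(f"expected 'format_version_deck', got {schema!r}")
    return SchemaName(*parts)


print(parse_schema_name("icoads_r300_d704"))
# SchemaName(data_model='icoads', release='r300', deck='d704')
```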

In this notebook we load the ICOADS r3.0.0 deck 704 test file as an example.

In [None]:
schema = "icoads_r300_d704"

# Load the example file from the cdm_reader_mapper test data
data_file_path = test_data.test_icoads_r300_d704["source"]
data_bundle = read_mdf(data_file_path, imodel=schema)
data_raw = data_bundle.data

For this deck, the data from the c99 attachment are split into the following sub-sections:
- c99_sentinel
- c99_journal
- c99_voyage
- c99_daily
- c99_data4
- c99_data5
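
The attribute access used below (`data_raw.c99_sentinel`) suggests that `data_raw` has a two-level column index of (section, variable) pairs; under that assumption, pandas can list the section names with `data_raw.columns.get_level_values(0).unique()`. The stdlib sketch below shows the same grouping on a plain list of pairs. `group_by_section` is a hypothetical helper, and the column list is deliberately partial, using only variable names that appear later in this notebook.

```python
from __future__ import annotations

from collections import defaultdict


def group_by_section(
    columns: list[tuple[str, str]], prefix: str = "c99_"
) -> dict[str, list[str]]:
    """Group (section, variable) pairs by section, keeping only sections
    whose name starts with `prefix`. Hypothetical helper for illustration."""
    sections: dict[str, list[str]] = defaultdict(list)
    for section, variable in columns:
        if section.startswith(prefix):
            sections[section].append(variable)
    return dict(sections)


# Partial column list; the variable names are the ones used later in this notebook.
columns = [
    ("c99_journal", "baro_type"),
    ("c99_journal", "baro_height"),
    ("c99_journal", "baro_units"),
    ("c99_daily", "lat_deg_on"),
    ("c99_daily", "lat_min_on"),
    ("c99_daily", "lat_hemis_on"),
]
print(group_by_section(columns))
```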

In [None]:
data_raw.c99_sentinel.head()

In [None]:
pd.options.display.max_columns = None
data_raw.c99_journal.head()

In [None]:
data_raw.c99_voyage.head()

In [None]:
data_raw.c99_daily.head()

In [None]:
data_raw.c99_data4.head()

In [None]:
data_raw.c99_data5.head()

Now that we have split the c99 data into its sections, we can see that this deck contains two sections with the same layout:

- c99_data4
- c99_data5

Both sections use the same variable names. To map the correct section to the CDM, we need to filter out the section that contains only NaN values.
The difficulty is that we do not know in advance which years of the time series will contain a c99_data4 section and which will contain a c99_data5 section.

> Note that excluding one section only works for decks whose sections are mutually exclusive: of the sections listed in the block, exactly one appears in each report.
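
The exclusivity rule can be sketched per report: keep the candidate section that holds at least one non-missing value, and fail loudly if more than one does. This is a stdlib illustration, not the mapper's own code; `select_section` and `is_all_nan` are hypothetical helpers, and `var1`/`var2` are placeholder variable names. With pandas, the equivalent row-wise test on a section would typically be `section.isna().all(axis=1)`.

```python
from __future__ import annotations

import math


def is_all_nan(values: dict[str, object]) -> bool:
    """True if every value is None or a float NaN."""
    return all(
        v is None or (isinstance(v, float) and math.isnan(v)) for v in values.values()
    )


def select_section(
    report: dict[str, dict[str, object]],
    candidates: tuple[str, ...] = ("c99_data4", "c99_data5"),
) -> str | None:
    """Return the single candidate section that holds data in this report.

    Hypothetical helper; assumes the candidate sections are mutually exclusive.
    """
    present = [
        name for name in candidates if name in report and not is_all_nan(report[name])
    ]
    if len(present) > 1:
        raise ValueError(f"sections are not exclusive: {present}")
    return present[0] if present else None


nan = float("nan")
# Placeholder reports: one filled in c99_data4, the other in c99_data5.
report_a = {"c99_data4": {"var1": 12.5, "var2": nan}, "c99_data5": {"var1": nan, "var2": nan}}
report_b = {"c99_data4": {"var1": nan, "var2": nan}, "c99_data5": {"var1": 3.0, "var2": 1.0}}
print(select_section(report_a), select_section(report_b))  # c99_data4 c99_data5
```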


We can now use the `"icoads_r300_d704"` model to map the raw data to the Common Data Model [glamod/common_data_model](https://www.github.com/glamod/common_data_model). The `map_model` method applies all the functions the model needs to convert variables to the correct units and/or specification, following the [CDM Documentation](https://github.com/glamod/common_data_model/blob/master/cdm_latest.pdf).

To run the data model we need three things:

- raw data (the data we just read above)
- attributes of the raw data (sections and column names)
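- the name of the model

All three are already held by `data_bundle`, which is why `map_model()` is called without arguments.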

In [None]:
cdm_tables = data_bundle.map_model()

Have we succeeded in writing the data to the CDM format?

We aimed to write the following data:

### Header section

 - Platform type and sub-type
 - Primary station ID: original ship names
 - Latitude and longitude: converted from degrees, minutes and hemisphere to decimal degrees
 - Location accuracy
 
 
### Observations tables

- `Observations-at`: latitude, longitude and location precision
- `Observations-dpt`: latitude, longitude and location precision
- `Observations-slp`: latitude, longitude and location precision
     - z_coordinate_type: Barometer height in feet converted to m.
     - original units: written in the CDM code format

- `Observations-sst`: latitude, longitude and location precision
- `Observations-wbt`: latitude, longitude and location precision
- `Observations-wd`: latitude, longitude and location precision
- `Observations-ws`: latitude, longitude and location precision
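
The barometer height conversion listed for `Observations-slp` is a single multiplication, since one international foot is exactly 0.3048 m. A minimal sketch follows; `feet_to_metres` is a hypothetical helper, not the mapper's own function, and it simply passes missing values through as `None`.

```python
from __future__ import annotations

FEET_TO_METRES = 0.3048  # exact, by definition of the international foot


def feet_to_metres(height_ft: float | None) -> float | None:
    """Convert a barometer height from feet to metres; None stays None.

    Hypothetical helper for illustration only.
    """
    if height_ft is None:
        return None
    return height_ft * FEET_TO_METRES


print(round(feet_to_metres(20), 3))  # 6.096
```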


In [None]:
data = cdm_tables["header"]
data.head()

Here is an example of the converted latitude and longitude, followed by the raw values they were derived from:

In [None]:
data.latitude.head(), data.longitude.head()

In [None]:
data_raw.c99_daily[
    [
        "lat_deg_on",
        "lat_min_on",
        "lat_hemis_on",
        "lon_deg_of",
        "lon_min_of",
        "lon_hemis_of",
    ]
].head()

The positions have been converted to decimal degrees, with a negative sign for the southern and western hemispheres.
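
The conversion can be sketched as degrees plus minutes divided by 60, negated for the southern and western hemispheres. This is an illustration, not the mapper's implementation: `to_decimal_degrees` is hypothetical, and it assumes the hemisphere is given as a letter (`N`, `S`, `E`, `W`). The raw `lat_hemis_on` and `lon_hemis_of` columns may use a different encoding that would need decoding first.

```python
def to_decimal_degrees(degrees: float, minutes: float, hemisphere: str) -> float:
    """Convert degrees, minutes and a hemisphere letter to signed decimal degrees.

    Hypothetical helper; assumes hemisphere is one of N, S, E, W (any case).
    """
    value = abs(degrees) + minutes / 60.0
    hemisphere = hemisphere.upper()
    if hemisphere in ("S", "W"):
        return -value  # southern latitudes and western longitudes are negative
    if hemisphere in ("N", "E"):
        return value
    raise ValueError(f"unknown hemisphere: {hemisphere!r}")


print(to_decimal_degrees(40, 30, "N"))  # 40.5
print(to_decimal_degrees(73, 45, "W"))  # -73.75
```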


For sea level pressure (SLP), the journal section provides additional barometer metadata:

In [None]:
data_raw.c99_journal[["baro_type", "baro_height", "baro_units"]].head()

Original code table for the barometer type (`baro_type`):

```
{
	"1":"aneroid",
	"2":"mercurial"
}
```
Original code table for the barometer units (`baro_units`), kept as in the source:

```
{
	"1":"inches",
	"2":"millimeters",
	"3":"millibars",
	"4":"unable to determine",
	"5":"Paris inches"
}
```

These codes are mapped to the following CDM codes:
```
{
  "1":1001,
  "2":1002,
  "3":1003,
  "4":9999,
  "5":1005
}
```

The code 9999 matches the `"fill_value": 9999` setting, which tells the CDM mapper to treat these entries as missing (NaN) values.
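
The lookup and fill-value handling can be sketched with the two code tables above. This is an illustration under assumptions, not the mapper's implementation: `map_units` is hypothetical, raw codes are normalised to strings, and codes absent from the table are treated like code 4 (mapped to the fill value and then to `None`).

```python
from __future__ import annotations

# Original baro_units code -> CDM code, copied from the table above.
BARO_UNITS_TO_CDM = {"1": 1001, "2": 1002, "3": 1003, "4": 9999, "5": 1005}
FILL_VALUE = 9999  # marks entries the CDM mapper should treat as missing


def map_units(raw_code: str | int) -> int | None:
    """Translate a raw baro_units code to its CDM code, or None if missing.

    Hypothetical helper; unknown codes fall back to the fill value (an assumption).
    """
    cdm_code = BARO_UNITS_TO_CDM.get(str(raw_code), FILL_VALUE)
    return None if cdm_code == FILL_VALUE else cdm_code


print([map_units(c) for c in ["1", "3", "4", 5]])  # [1001, 1003, None, 1005]
```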


In [None]:
data_obs = cdm_tables["observations-slp"]
data_obs.head()