## How to read meteorological data with the `read_mdf` function

In [None]:
from __future__ import annotations

import pandas as pd

from cdm_reader_mapper import properties, read_mdf, test_data

The `cdm_reader_mapper.read_mdf` function is a tool designed to read data files compliant with a user-specified [data
model](https://cds.climate.copernicus.eu/toolbox/doc/how-to/15_how_to_understand_the_common_data_model/15_how_to_understand_the_common_data_model.html).

It was initially developed to read the [IMMA](https://icoads.noaa.gov/e-doc/imma/R3.0-imma1.pdf) data format, and was later extended to handle other meteorological data formats.

Let's look at an example of a typical file from [ICOADSv3.0](https://icoads.noaa.gov/r3.html). We pick a specific monthly output for a Source/Deck, in this case data from the Marine Meteorological Journals data set, SID/DCK: **125-704 for Oct 1878.**

The `.imma` file looks like this:

In [None]:
data_path = test_data.test_icoads_r300_d704["source"]

data_ori = pd.read_table(data_path)
data_ori.head()

Very messy to just read into Python!

This is why we need the `mdf_reader` tool, to help us put those imma files in a [pandas.DataFrame](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html) format. For that we need a **schema**.

A **schema** file gathers a collection of descriptors that enable the `mdf_reader` tool to access the content
of a data model/schema and extract the sections of the raw data file that contain meaningful information. These **schema files** are the "bones" of the data model: basically `.json` files outlining the structure of the incoming raw data.

The `mdf_reader` takes this information and translates the characteristics of the data into a pandas DataFrame.

The tool has several **schema** templates built in.

In [None]:
properties.supported_data_models

**Schemas** can be designed to be deck-specific, like the example below:

In [None]:
schema = "icoads_r300_d704"

data = read_mdf(data_path, imodel=schema)

A new **schema** can be built for a particular deck and source, as shown in this notebook. The `imma1_d704` schema was built upon the `imma1` schema/data model, but extra sections have been added to the `.json` files to include supplemental data from ICOADS documentation. This is a snapshot of the data inside `imma1_d704.json`.

```
"c99_journal": {
            "header": {"sentinal": "1", "field_layout":"fixed_width","length": 117},
            "elements": {
              "sentinal":{
                  "description": "Journal header record identifier",
                  "field_length": 1,
                  "column_type": "str"
              },
              "reel_no":{
                  "description": "Microfilm reel number. See if we want the zero padding or not...",
                  "field_length": 3,
                  "column_type": "str",
                  "LMR6": true
              }
            ...
```
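
To make the role of `field_length` concrete, here is a minimal sketch of how a fixed-width section could be cut into named fields. It is not the `mdf_reader` implementation: the `slice_fixed_width` helper, the trimmed `section` dictionary, and the sample record are all hypothetical, and the sentinel check reflects an assumption about how the header's `sentinal` value is used.

```python
# Hypothetical, trimmed-down section definition that mirrors only the two
# elements shown in the snapshot above (the real section has more elements).
section = {
    "header": {"sentinal": "1", "field_layout": "fixed_width", "length": 117},
    "elements": {
        "sentinal": {"field_length": 1, "column_type": "str"},
        "reel_no": {"field_length": 3, "column_type": "str"},
    },
}


def slice_fixed_width(record: str, section: dict) -> dict:
    """Cut a fixed-width record into named string fields, in schema order."""
    fields = {}
    start = 0
    for name, spec in section["elements"].items():
        end = start + spec["field_length"]
        fields[name] = record[start:end]
        start = end
    return fields


# Made-up record for illustration, not taken from an ICOADS file:
# sentinel "1" followed by a zero-padded reel number "042".
record = "1042"
fields = slice_fixed_width(record, section)

# Assumption: a section is recognised when its first field equals the
# sentinel value declared in the section header.
matches = fields["sentinal"] == section["header"]["sentinal"]

print(fields, matches)
```

Because every element is declared with a `field_length`, the position of each field follows from the order of the elements alone. No delimiter is needed in the raw record.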

Now metadata information can be extracted as a component of the pandas DataFrame.

In [None]:
data.data.c99_journal

To learn how to construct a schema or data model for a particular deck/source, see this other [tutorial notebook](https://github.com/glamod/cdm_reader_mapper/blob/main/docs/example_notebooks/CLIWOC_datamodel.ipynb).