## Generating a data model for CLIWOC

### Creating a data model

In [None]:
from __future__ import annotations

import json
import os
import shutil
from tempfile import TemporaryDirectory

import pandas as pd

from cdm_reader_mapper import mdf_reader, test_data

# importlib.resources.files needs Python 3.9+; fall back to the backport otherwise
try:
    from importlib.resources import files as get_files
except ImportError:
    from importlib_resources import files as get_files

In [None]:
schema = "imma1"
data_file_path = test_data.test_063_714["source"]
data_raw = mdf_reader.read(data_file_path, data_model=schema)

In [None]:
data_raw.data["c99"].head()

In [None]:
data_raw.data["c99"].iloc[3]

### Custom Schema

To read data with a custom schema, pass the model directory to `mdf_reader.read` through the `data_model_path` argument. The directory is structured as follows:

```
name_of_model/
    name_of_model.json
    code_tables/
        ...
```

The `code_tables` sub-directory contains the code tables that map the key columns in the data to their values.
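The layout above can be checked before handing a directory to `mdf_reader.read`. The following is a minimal sketch using only the standard library. The helper name `missing_model_parts` is hypothetical, and the rule that the JSON file shares its name with the directory is inferred from the tree shown above rather than from the library documentation.

```python
from pathlib import Path
from tempfile import TemporaryDirectory


def missing_model_parts(model_path):
    """Return the expected entries that are absent from a model directory."""
    model_path = Path(model_path)
    name = model_path.name
    checks = {
        f"{name}.json": (model_path / f"{name}.json").is_file(),
        "code_tables": (model_path / "code_tables").is_dir(),
    }
    return [entry for entry, present in checks.items() if not present]


# Build a throwaway directory to exercise the check.
with TemporaryDirectory() as tmp:
    model = Path(tmp) / "name_of_model"
    model.mkdir()
    before = missing_model_parts(model)  # nothing created yet

    (model / "name_of_model.json").write_text("{}")
    (model / "code_tables").mkdir()
    after = missing_model_parts(model)  # both entries now present

print(before)
print(after)
```

An empty list means both expected entries were found. The check does not look inside the schema file or the code tables.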

In this demonstration we will create a new model in a temporary directory. It will be a copy of the built-in `imma1_d730` schema plus the packaged `imma1` code tables.

In [None]:
tmp_dir = TemporaryDirectory()
model_name = "imma1_d730"
my_model_path = os.path.join(tmp_dir.name, model_name)
os.mkdir(my_model_path)

# Load schema and save to json file
schema = mdf_reader.schemas.read_schema(model_name)
json_object = json.dumps(schema, indent=2)

with open(os.path.join(my_model_path, model_name + ".json"), "w") as outfile:
    outfile.write(json_object)

# Locate the packaged `imma1` code tables and copy them into the model directory
code_tables_path = get_files(
    ".".join([mdf_reader.properties._base, "code_tables", "imma1"])
)
shutil.copytree(code_tables_path, os.path.join(my_model_path, "code_tables"))
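The `TemporaryDirectory()` object created above keeps its directory until `tmp_dir.cleanup()` is called or the object is finalised. Where the model directory is only needed for a short block of work, the context-manager form removes it automatically. The sketch below shows that pattern with the standard library only. The helper `write_model_skeleton` and the placeholder schema are hypothetical and do not represent a real `mdf_reader` schema.

```python
import json
import os
from tempfile import TemporaryDirectory


def write_model_skeleton(base_dir, model_name, schema):
    """Create <base_dir>/<model_name>/ with a schema file and an empty code_tables/."""
    model_path = os.path.join(base_dir, model_name)
    os.makedirs(os.path.join(model_path, "code_tables"))
    with open(os.path.join(model_path, model_name + ".json"), "w") as outfile:
        json.dump(schema, outfile, indent=2)
    return model_path


# Placeholder content only; a real schema would come from the library.
toy_schema = {"placeholder": True}

with TemporaryDirectory() as base_dir:
    model_path = write_model_skeleton(base_dir, "my_model", toy_schema)
    created = sorted(os.listdir(model_path))
    # Any reading that depends on the directory would go here.

# The directory and its contents are gone once the block exits.
exists_after = os.path.exists(base_dir)
print(created, exists_after)
```

In a notebook, where later cells still need the directory, keeping the `tmp_dir` object and calling `tmp_dir.cleanup()` at the end is the equivalent choice.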

In [None]:
data_file_path = test_data.test_133_730["source"]
data = mdf_reader.read(data_file_path, data_model_path=my_model_path)

In [None]:
data.data[["c99_sentinal"]].head()

In [None]:
data.data[["c99_logbook"]].c99_logbook.describe(include="all")

In [None]:
pd.options.display.max_columns = None
data.data[["c99_voyage"]].c99_voyage.describe(include="all")

In [None]:
data.data[["c99_voyage"]].c99_voyage.ZeroMeridian.head()

For example, the ship types on this deck are given in many different languages. There is no code table for this variable on the CLIWOC website.

In [None]:
data.data[["c99_voyage"]].c99_voyage.Ship_type.dropna().head()
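To get a feel for how many spellings occur in such a free-text column, one option is to tally the distinct values after light normalisation. The sketch below uses a small invented Series, not the CLIWOC data. The names `toy_ship_types` and `tally_spellings` are hypothetical.

```python
import pandas as pd


def tally_spellings(values: pd.Series) -> pd.Series:
    """Count distinct entries after dropping missing values, trimming and lower-casing."""
    cleaned = values.dropna().astype(str).str.strip().str.lower()
    return cleaned.value_counts()


# Invented example values, not taken from the CLIWOC data.
toy_ship_types = pd.Series(["Fregat", "fregat ", "Frigate", "Fragata", None])
counts = tally_spellings(toy_ship_types)
print(counts)
```

Trimming and lower-casing only merge trivially different spellings. Entries in different languages still appear as separate values, which is exactly what a code table would have to resolve.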

In [None]:
data.data[["c99_data"]].c99_data.describe(include="all")

What about the wind force, which is recorded on different scales in different languages?

In [None]:
data.data[["c99_data"]].c99_data.wind_force.head()
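One possible way to handle terms in several languages is a lookup that maps each source term to a common label and leaves unknown terms missing so they can be reviewed. This is only a sketch under stated assumptions: the lookup `TOY_WIND_TERMS`, the helper `harmonise_terms` and the sample values are all invented for illustration and are not a CLIWOC dictionary.

```python
from __future__ import annotations

import pandas as pd

# Hypothetical lookup: lower-cased source term -> common English label.
# The entries are illustrative only and are not a CLIWOC dictionary.
TOY_WIND_TERMS = {
    "calm": "calm",
    "stilte": "calm",
    "calma": "calm",
    "storm": "storm",
    "tempestad": "storm",
}


def harmonise_terms(values: pd.Series, lookup: dict[str, str]) -> pd.Series:
    """Map free-text terms to common labels; terms not in the lookup become missing."""
    return values.str.strip().str.lower().map(lookup)


# Invented sample values, not taken from the CLIWOC data.
toy_wind_force = pd.Series(["Stilte", "calma ", "Storm", "unknown term"])
harmonised = harmonise_terms(toy_wind_force, TOY_WIND_TERMS)

# Terms that the lookup could not resolve, kept for manual review.
unmapped = toy_wind_force[harmonised.isna()]
print(harmonised)
print(unmapped)
```

Keeping the unmapped terms visible, rather than guessing a label, makes it clear how much of the column the lookup actually covers.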