# Data reading example 1 - minimal test dataset #
To run this example, the file *test_csv_data_sec_cat.csv* must be placed in the same folder as this notebook. Both the notebook and the csv file are located in the `docs` folder of the PRIMAP2 repository.

In [None]:
# imports
import primap2 as pm2

## Dataset Specifications ##
Here we define which columns of the csv file contain the metadata. The dict `coords_cols` maps csv columns to PRIMAP2 metadata.
Default values are set using `coords_defaults`. The terminologies (e.g. IPCC2006 for categories or the ISO3 country codes for area) are set in the `coords_terminologies` dict. `meta_mapping` defines the conversion of metadata values, e.g. category codes. For each metadata column you can specify either a dict which directly defines the mapping or a string which selects the function used to create the mapping. The available functions are currently hard-coded for security reasons. `filter_keep` and `filter_remove` filter the input data: each entry in `filter_keep` specifies a subset of the input data which is kept, while the subsets defined by `filter_remove` are removed from the input data.
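
The filter logic described above can be illustrated with a small standalone sketch. It does not use PRIMAP2 itself: `matches` and `apply_filters` are hypothetical helpers, and both the entry layout (a label such as `"f1"` pointing to a dict of column conditions) and the way entries are combined are assumptions made for illustration. The documentation of `read_wide_csv_file_if` is the authoritative description.

```python
# Hypothetical helpers, NOT part of PRIMAP2: they only illustrate the
# assumed semantics of filter_keep and filter_remove on plain dict rows.


def matches(row, spec):
    """Return True if the row satisfies every column condition in spec."""
    for column, allowed in spec.items():
        # a condition may be a single value or a list of accepted values
        allowed_values = allowed if isinstance(allowed, list) else [allowed]
        if row[column] not in allowed_values:
            return False
    return True


def apply_filters(rows, filter_keep, filter_remove):
    """Keep rows matching any keep entry, then drop rows matching any remove entry."""
    if filter_keep:
        rows = [r for r in rows if any(matches(r, s) for s in filter_keep.values())]
    if filter_remove:
        rows = [
            r for r in rows if not any(matches(r, s) for s in filter_remove.values())
        ]
    return rows


# made-up example rows using the csv column names of this notebook
rows = [
    {"country": "AUS", "category": "IPC1", "gas": "CO2"},
    {"country": "AUS", "category": "IPC2", "gas": "CH4"},
    {"country": "FRA", "category": "IPC1", "gas": "CO2"},
]
example_keep = {"f1": {"category": ["IPC1"]}}
example_remove = {"f1": {"country": "FRA"}}

filtered = apply_filters(rows, example_keep, example_remove)
print(filtered)
```

In this sketch empty filter dicts leave the data unchanged, which corresponds to the empty `filter_keep` and `filter_remove` used in this example.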

For details we refer to the documentation of `read_wide_csv_file_if`, which is located in the `pm2io` module of PRIMAP2.

Here the metadata for `unit`, `entity`, `area`, `category`, and the secondary category `Class` are read from the csv file. As secondary categories have free names, they need a prefix in the interchange format to identify them, which is `sec_cats__`. Metadata for `source`, `citation`, the secondary category `Type`, and `scenario` is not available in the csv file and is thus added using the default values defined in `coords_defaults`. Terminologies are given for `area`, `category`, `scenario`, and the secondary categories. Providing these terminologies is mandatory to obtain a valid PRIMAP2 dataset.
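
As a quick sanity check before reading the file, the requirement above can be sketched in plain Python. The helper `missing_terminologies` is hypothetical and not part of PRIMAP2. It assumes, following the text, that `area`, `category`, `scenario`, and every dimension with the `sec_cats__` prefix need an entry in `coords_terminologies`.

```python
SEC_CAT_PREFIX = "sec_cats__"
# assumption taken from the text: these dimensions always need a terminology
ALWAYS_REQUIRED = {"area", "category", "scenario"}


def missing_terminologies(coords_cols, coords_defaults, coords_terminologies):
    """Hypothetical helper: list dimensions that lack a terminology entry."""
    dims = set(coords_cols) | set(coords_defaults)
    required = {
        d for d in dims if d in ALWAYS_REQUIRED or d.startswith(SEC_CAT_PREFIX)
    }
    return sorted(required - set(coords_terminologies))


# the specifications used in this notebook
cols = {
    "unit": "unit",
    "entity": "gas",
    "area": "country",
    "category": "category",
    "sec_cats__Class": "class",
}
defaults = {
    "source": "TESTcsv2021",
    "citation": "Test",
    "sec_cats__Type": "fugitive",
    "scenario": "HISTORY",
}
terminologies = {
    "area": "ISO3",
    "category": "IPCC2006",
    "sec_cats__Type": "type",
    "sec_cats__Class": "class",
    "scenario": "general",
}

print(missing_terminologies(cols, defaults, terminologies))
```

An empty list means that every dimension covered by the check has a terminology.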

Metadata mapping is necessary for `category` and `entity`. Both use the PRIMAP1 specifications in the csv file. For `category` this means that e.g. `IPC1A2` is converted to `1.A.2`. For `entity` the conversion affects the way GWP information is stored in the entity name: e.g. `KYOTOGHGAR4` is mapped to `KYOTOGHG (AR4GWP100)`.
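
Instead of the string `"PRIMAP1"`, a mapping can also be given as a dict. The sketch below shows what such a dict could look like for the two conversions mentioned above and what applying it to one row means. `apply_mapping` is a hypothetical helper and not the PRIMAP2 implementation. Leaving unmapped values unchanged is an assumption of this sketch, and the example row, including the unit `Gg`, is made up.

```python
# explicit mappings for the two conversions named in the text
meta_mapping_explicit = {
    "category": {"IPC1A2": "1.A.2"},
    "entity": {"KYOTOGHGAR4": "KYOTOGHG (AR4GWP100)"},
}


def apply_mapping(row, mapping):
    """Hypothetical helper: translate the values of one row column by column.

    Columns without a mapping and values without an entry stay unchanged
    (an assumption of this sketch).
    """
    return {
        column: mapping.get(column, {}).get(value, value)
        for column, value in row.items()
    }


# made-up example row in PRIMAP1 notation
row = {"category": "IPC1A2", "entity": "KYOTOGHGAR4", "unit": "Gg"}
converted = apply_mapping(row, meta_mapping_explicit)
print(converted)
```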

For examples on the use of filters we refer to the second example which reads the PRIMAP-hist data.

In [None]:
file = "test_csv_data_sec_cat.csv"
folder = "."
coords_cols = {
    "unit": "unit",
    "entity": "gas",
    "area": "country",
    "category": "category",
    "sec_cats__Class": "class",
}
coords_defaults = {
    "source": "TESTcsv2021",
    "citation": "Test",
    "sec_cats__Type": "fugitive",
    "scenario": "HISTORY",
}
coords_terminologies = {
    "area": "ISO3",
    "category": "IPCC2006",
    "sec_cats__Type": "type",
    "sec_cats__Class": "class",
    "scenario": "general",
}
meta_mapping = {"category": "PRIMAP1", "entity": "PRIMAP1"}
# no filtering in this minimal example: empty dicts keep all input data
filter_keep = {}
filter_remove = {}
data_if = pm2.pm2io.read_wide_csv_file_if(
    file,
    folder,
    coords_cols=coords_cols,
    coords_defaults=coords_defaults,
    coords_terminologies=coords_terminologies,
    meta_mapping=meta_mapping,
    filter_keep=filter_keep,
    filter_remove=filter_remove,
)
data_if.head()

## Transformation to PRIMAP2 xarray format ##
The transformation to the PRIMAP2 xarray format is done using the function `from_interchange_format`, which takes an interchange format DataFrame, the `attrs` dict, and an optional parameter specifying the data format as inputs. The resulting xarray Dataset is already quantified, i.e. the variables are pint arrays which include a unit.

In [None]:
data_pm2 = pm2.pm2io.from_interchange_format(data_if, data_if.attrs)
data_pm2