## Tag column values

This notebook demonstrates how to take an event file or a map and create a
two-column spreadsheet that makes it easy to assign HED tags to the unique
values in its columns. The example here uses a map as the input.

**Table 1:** Excerpt of a map designed to map the `type` column into
a combination of columns `event_type`, `task_role`, and `letter`.

| type    | event_type  | task_role        | letter |
| ------- | ----------- | ---------------- | ------ |
| A       | show_letter | target           | A      |
| gD      | show_letter | non_target       | D      |
| Y       | show_letter | target           | E      |
| rK      | show_letter | probe            | K      |
| nonWM   | show_cross  | fixate           | +      |
| correct | sound_beep  | correct_feedback | n/a    |
| 1       | right_click | in_group         | n/a    |

In order to annotate the data using HED tags, you must associate the
meaning of each term in the relevant columns of the event file with a HED string.

To make this easier, we flatten the information into a two-column spreadsheet
that can be easily edited. Table 2 shows a *flattened* version of Table 1.

**Table 2:** A flattened version of Table 1. The categorical columns are
`event_type` and `task_role`. The `letter` column is a value column.

| column                 | HED                    |
| ---------------------- | ---------------------- |
| \_\*\_event_type\_\*\_ | n/a                    |
| right_click            | Label/right_click      |
| show_cross             | Label/show_cross       |
| show_letter            | Label/show_letter      |
| sound_beep             | Label/sound_beep       |
| \_\*\_task_role\_\*\_  | n/a                    |
| correct_feedback       | Label/correct_feedback |
| fixate                 | Label/fixate           |
| in_group               | Label/in_group         |
| non_target             | Label/non_target       |
| probe                  | Label/probe            |
| target                 | Label/target           |
| \_\*\_letter\_\*\_     | Label/letter, Label/#  |

The `event_type` and `task_role` columns are categorical, so each column
name appears in a separator row whose HED value is `n/a`.

Each unique value in the categorical columns appears in a separate row
following its column name. Dummy HED tags using *Label* are filled in
as placeholders.

The `letter` value column is represented by a single row. The HED
annotation for this column must contain a `#` placeholder. HED tools substitute
the actual value from the event file for `#` when the annotation is assembled.
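
The flattening rules above can be sketched in plain Python, independent of the HED tools. This is a minimal illustration, not the library's implementation: the function name `flatten_columns` and its arguments are hypothetical, and the input is assumed to be a dictionary of unique values per categorical column plus a list of value column names.

```python
def flatten_columns(categorical, value_columns):
    """Return (column, HED) rows in the two-column layout of Table 2.

    categorical: dict mapping a column name to its unique values (assumed input).
    value_columns: list of value column names (assumed input).
    """
    rows = []
    for name, values in categorical.items():
        # Separator row for a categorical column: marker name, HED is n/a.
        rows.append((f"_*_{name}_*_", "n/a"))
        for value in sorted(values):
            # Dummy Label tag as a placeholder to be edited later.
            rows.append((value, f"Label/{value}"))
    for name in value_columns:
        # A value column gets one row whose HED string contains '#'.
        rows.append((f"_*_{name}_*_", f"Label/{name}, Label/#"))
    return rows


categorical = {
    "event_type": {"show_letter", "show_cross", "sound_beep", "right_click"},
    "task_role": {"target", "non_target", "probe", "fixate",
                  "correct_feedback", "in_group"},
}
rows = flatten_columns(categorical, ["letter"])
for column, hed in rows:
    print(f"{column}\t{hed}")
```

Sorting the values within each column keeps the spreadsheet stable between runs, which makes hand edits easier to compare.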

After you have updated the spreadsheet of Table 2 with the appropriate HED
tags, you will need to convert it to a JSON sidecar to be used with BIDS
datasets.  Table 3 shows the *unflattened* conversion of Table 2 into the
JSON form.

**Table 3:** The unflattened JSON sidecar.

```json
{
  "event_type": {
    "HED": {
      "right_click": "Label/right_click",
      "show_cross": "Label/show_cross",
      "show_letter": "Label/show_letter",
      "sound_beep": "Label/sound_beep"
    }
  },
  "task_role": {
    "HED": {
      "correct_feedback": "Label/correct_feedback",
      "fixate": "Label/fixate",
      "in_group": "Label/in_group",
      "non_target": "Label/non_target",
      "probe": "Label/probe",
      "target": "Label/target"
    }
  },
  "letter": {
    "HED": "Label/letter, Label/#"
  }
}
```
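
To make the role of such a sidecar concrete, the following sketch shows roughly how an annotation for one event row could be assembled from it. It is only a string-level illustration under simplifying assumptions: the helper `assemble_hed` is hypothetical, it performs no validation, and the real HED tools do considerably more.

```python
def assemble_hed(event_row, sidecar):
    """Build one HED string for an event row (hypothetical helper)."""
    parts = []
    for column, entry in sidecar.items():
        value = event_row.get(column, "n/a")
        if value == "n/a":
            continue  # nothing to annotate for this column
        hed = entry["HED"]
        if isinstance(hed, dict):
            # Categorical column: look up the annotation for this value.
            if value in hed:
                parts.append(hed[value])
        else:
            # Value column: substitute the actual value for '#'.
            parts.append(hed.replace("#", value))
    return ", ".join(parts)


sidecar = {
    "event_type": {"HED": {"show_letter": "Label/show_letter"}},
    "task_role": {"HED": {"target": "Label/target"}},
    "letter": {"HED": "Label/letter, Label/#"},
}
row = {"event_type": "show_letter", "task_role": "target", "letter": "A"}
annotation = assemble_hed(row, sidecar)
print(annotation)
```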

#### Read the completed template

In [1]:
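import os
from hed.tools import ColumnDict

# Location of the example map (relative to the notebook directory)
data_path = "../data/sternberg"
remap_file = os.path.join(data_path, "sternberg_map.tsv")

# Treat `letter` as a value column and skip the `type` key column
col_dict = ColumnDict(value_cols=['letter'], skip_cols=['type'], name='SternbergFlat')
col_dict.update(remap_file)  # accumulate the unique values found in the map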

#### Output the template for viewing

In [2]:
col_dict.print()

Summary for column dictionary SternbergFlat:
  Categorical columns (2):
    event_type (9 distinct values):
      fixation: 1
      left_click: 1
      response: 1
      right_click: 1
      show_cross: 1
      show_dash: 1
      show_letter: 78
      sound_beep: 1
      sound_buzz: 2
    task_role (11 distinct values):
      correct_feedback: 1
      fixate: 1
      in_group: 1
      incorrect_feedback: 2
      non_target: 26
      out_group: 1
      probe: 26
      ready: 1
      target: 26
      unknown: 1
      work_memory: 1
  Value columns (1):
    letter: 87
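
A summary of this kind can be approximated with the standard library alone. The sketch below is not how `ColumnDict` is implemented; `count_values` is a hypothetical helper, and the inline TSV text is a small stand-in for the map file, so its counts differ from the output above.

```python
import csv
import io
from collections import Counter

# A few rows in the style of Table 1, used as stand-in input.
MAP_TSV = (
    "type\tevent_type\ttask_role\tletter\n"
    "A\tshow_letter\ttarget\tA\n"
    "gD\tshow_letter\tnon_target\tD\n"
    "rK\tshow_letter\tprobe\tK\n"
    "nonWM\tshow_cross\tfixate\t+\n"
    "correct\tsound_beep\tcorrect_feedback\tn/a\n"
)


def count_values(tsv_text, skip_cols=(), value_cols=()):
    """Count unique values per categorical column and rows per value column."""
    categorical = {}
    value_counts = Counter()
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    for row in reader:
        for column, value in row.items():
            if column in skip_cols:
                continue
            if column in value_cols:
                value_counts[column] += 1  # only the number of rows is kept
            else:
                categorical.setdefault(column, Counter())[value] += 1
    return categorical, value_counts


categorical, value_counts = count_values(
    MAP_TSV, skip_cols=("type",), value_cols=("letter",))
for column, counts in categorical.items():
    print(f"{column} ({len(counts)} distinct values):")
    for value in sorted(counts):
        print(f"  {value}: {counts[value]}")
print(f"letter: {value_counts['letter']}")
```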


#### Create a flattened version of the template to facilitate tagging

In [3]:
df = col_dict.get_flattened()
print(df.to_string())

                column                       HED
0     _*_event_type_*_                       n/a
1             fixation            Label/fixation
2           left_click          Label/left_click
3             response            Label/response
4          right_click         Label/right_click
5           show_cross          Label/show_cross
6            show_dash           Label/show_dash
7          show_letter         Label/show_letter
8           sound_beep          Label/sound_beep
9           sound_buzz          Label/sound_buzz
10     _*_task_role_*_                       n/a
11    correct_feedback    Label/correct_feedback
12              fixate              Label/fixate
13            in_group            Label/in_group
14  incorrect_feedback  Label/incorrect_feedback
15          non_target          Label/non_target
16           out_group           Label/out_group
17               probe               Label/probe
18               ready               Label/ready
19              targ

#### Output the flattened dictionary

The flattened file should be edited to replace the dummy *Label* HED tags with
the desired annotations.
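
While editing, it can help to list the rows that still carry the generated dummy tags. The following is a small sketch of such a check; `find_unedited` and `placeholder_for` are hypothetical helpers, the rows are assumed to be `(column, HED)` pairs, and the replacement annotation shown for `show_letter` is purely illustrative.

```python
def placeholder_for(key):
    """Return the dummy HED string that flattening would generate for a row."""
    if key.startswith("_*_") and key.endswith("_*_"):
        name = key[3:-3]  # strip the separator markers
        return f"Label/{name}, Label/#"
    return f"Label/{key}"


def find_unedited(rows):
    """List the row keys whose HED entry is still the generated placeholder."""
    return [key for key, hed in rows if hed.strip() == placeholder_for(key)]


rows = [
    ("_*_event_type_*_", "n/a"),
    ("show_letter", "Sensory-event, Visual-presentation"),  # illustrative edit
    ("sound_beep", "Label/sound_beep"),
    ("_*_letter_*_", "Label/letter, Label/#"),
]
unedited = find_unedited(rows)
print(unedited)
```

Separator rows of categorical columns hold `n/a`, so they are never reported.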

In [4]:
file_path = os.path.join(data_path, 'sternberg_flattened.tsv')
df.to_csv(file_path, sep='\t', index=False)

#### Create a JSON sidecar from the flattened dictionary

Once the flattened spreadsheet is completed, you can convert
directly to a JSON sidecar by unflattening.
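
The idea behind unflattening can be sketched without the HED tools as follows. This is an assumption-laden illustration, not the `SidecarMap` implementation: the function `unflatten` is hypothetical and expects `(column, HED)` pairs in the layout of Table 2.

```python
import json


def unflatten(rows):
    """Rebuild a sidecar dictionary from flattened (column, HED) rows."""
    sidecar = {}
    current = None  # name of the categorical column being filled
    for key, hed in rows:
        if key.startswith("_*_") and key.endswith("_*_"):
            name = key[3:-3]
            if hed == "n/a":
                # Categorical column: the following rows are its values.
                sidecar[name] = {"HED": {}}
                current = name
            else:
                # Value column: a single HED string containing '#'.
                sidecar[name] = {"HED": hed}
                current = None
        elif current is not None:
            sidecar[current]["HED"][key] = hed
        else:
            raise ValueError(f"Row '{key}' does not follow a categorical separator")
    return sidecar


rows = [
    ("_*_event_type_*_", "n/a"),
    ("show_cross", "Label/show_cross"),
    ("show_letter", "Label/show_letter"),
    ("_*_letter_*_", "Label/letter, Label/#"),
]
sidecar = unflatten(rows)
print(json.dumps(sidecar, indent=2))
```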


In [5]:
from json import dumps
from hed.tools import SidecarMap, get_new_dataframe

s_map = SidecarMap()
file_path = os.path.join(data_path, 'sternberg_flattened.tsv')
df = get_new_dataframe(file_path)
sidecar = s_map.unflatten_hed(df)
sidecar_string = dumps(sidecar, indent=2)
print(sidecar_string)

{
  "event_type": {
    "HED": {
      "fixation": "Label/fixation",
      "left_click": "Label/left_click",
      "response": "Label/response",
      "right_click": "Label/right_click",
      "show_cross": "Label/show_cross",
      "show_dash": "Label/show_dash",
      "show_letter": "Label/show_letter",
      "sound_beep": "Label/sound_beep",
      "sound_buzz": "Label/sound_buzz"
    }
  },
  "task_role": {
    "HED": {
      "correct_feedback": "Label/correct_feedback",
      "fixate": "Label/fixate",
      "in_group": "Label/in_group",
      "incorrect_feedback": "Label/incorrect_feedback",
      "non_target": "Label/non_target",
      "out_group": "Label/out_group",
      "probe": "Label/probe",
      "ready": "Label/ready",
      "target": "Label/target",
      "unknown": "Label/unknown",
      "work_memory": "Label/work_memory"
    }
  },
  "letter": {
    "HED": "Label/letter, Label/#"
  }
}


#### Save the JSON sidecar to a file

The general strategy is to produce a single JSON events sidecar that
is placed at the top level of the dataset, where it annotates the entire dataset.

In [6]:
json_path = os.path.join(data_path, 'sternberg_events.json')
with open(json_path, "w") as json_file:  # closes the file even if the write fails
    json_file.write(sidecar_string)

#### Enhance the JSON sidecar

You may wish to add other information to the JSON sidecar, such as
the `Levels` fields for categorical columns. The `SidecarMap` has
facilities for flattening and unflattening JSON files of arbitrary
depth. You can use these facilities to edit a sidecar and convert it back.
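
As a rough illustration of such an enhancement, the sketch below adds `Levels` descriptions to a sidecar dictionary using plain Python rather than `SidecarMap`. The helper `add_levels` and the description texts are hypothetical.

```python
import json


def add_levels(sidecar, descriptions):
    """Merge per-value descriptions into the Levels field of each column."""
    for column, levels in descriptions.items():
        entry = sidecar.setdefault(column, {})
        # Keep any existing Levels entries and add or overwrite the new ones.
        entry.setdefault("Levels", {}).update(levels)
    return sidecar


sidecar = {
    "event_type": {
        "HED": {
            "show_cross": "Label/show_cross",
            "sound_beep": "Label/sound_beep",
        }
    }
}
descriptions = {  # illustrative wording only
    "event_type": {
        "show_cross": "A fixation cross is displayed.",
        "sound_beep": "A beep is played.",
    }
}
enhanced = add_levels(sidecar, descriptions)
print(json.dumps(enhanced, indent=2))
```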