# Tutorial: Data Management User Guide

In this tutorial, we use the Traffic Sign Recognition (TSR) dataset as an example to show how to import a dataset's metadata into ProSafeAI Evaluation Tool. The imported metadata can then be used for [Data distribution](2.2.data_distribution.ipynb) and [Data Verification](2.3.data_verification.ipynb) in Data V, and for [basic metrics](3.1.basic_metrics.ipynb) and [Robustness test](3.2.robustness_test.ipynb) in Model V.

## 1.Prerequisites

* You need a Python environment to run this code.

* You need to align with the ProSafeAI CN team (contacts listed below) to create the schema for your dataset's metadata.
  - Xia, Xi (C|TN-12) <xi.xia@cariad-technology.cn> 
  - Yi, Wukun (C|TN-12) <Wukun.Yi@cariad-technology.cn> 

* You need a script, in Python or another language, that converts the original metadata into the format of the template exported from ProSafeAI Evaluation Tool. Write it after you have aligned with the ProSafeAI CN team and the schema has been created.

## 2.Prepare metadata

The script converts the dataset's original metadata into the format of the template exported from ProSafeAI Evaluation Tool. The template reflects the schema created after alignment with the ProSafeAI CN team.

### 2.1.ETL script for Traffic Sign Recognition(TSR)
For example, in the Traffic Sign Recognition (TSR) use case, we use the following script to convert the original metadata into the required format:

In [1]:
"""
original metadata: ./data/TSR_data/label_train.txt
the expected format of the metadata looks like the following json:
[
    {
            "image_name": "",
            "image_format": "",
            "class": "",
            "Snowfall_intensity": "",
            "Fog_intensity": "",
            "Rain_quantity": "",
            "dataset": "",
            "augmentation": "",
            "Illuminance": ""       
    }
]
"""
import os
import json

def scan_data(data_path, format_data_path):

    print("input data:", data_path)

    with open(data_path, "r", encoding="utf-8") as f:
        data = [line.strip().split("\t") for line in f.readlines()]

    results = []

    for pic_name, value in data:
        sub_dict = json.loads(value)

        info = {
            "image_name": pic_name,
            # Take the text after the last dot, so names containing extra dots still work.
            "image_format": os.path.splitext(pic_name)[1].lstrip("."),
            "class": sub_dict.get("class"),
            "Snowfall_intensity": sub_dict.get("Snowfall_intensity"),
            "Fog_intensity": sub_dict.get("Fog_intensity"),
            "Rain_quantity": sub_dict.get("Rain_quantity"),
            "dataset": sub_dict.get("dataset"),
            "augmentation": sub_dict.get("augmentation"),
            "Illuminance": sub_dict.get("Illuminance"),
        }

        results.append(info)

    with open(os.path.join(format_data_path, "TSR_format_metadata.json"), "w", encoding="utf-8") as fw:
        json.dump(results, fw)

    print('format_data: ', os.path.join(format_data_path, "TSR_format_metadata.json"))

scan_data("./data/TSR_data/label_train.txt", "./data/TSR_data")

input data: ./data/TSR_data/label_train.txt
format_data:  ./data/TSR_data/TSR_format_metadata.json
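
The script above assumes that each line of `label_train.txt` holds an image name and a JSON object separated by a tab. The following minimal sketch shows how one such line is parsed; the sample line, its attribute values and the helper name `parse_label_line` are hypothetical and only serve as an illustration.

```python
import json
import os


def parse_label_line(line):
    """Split one tab-separated label line into (image name, attribute dict)."""
    pic_name, value = line.strip().split("\t", 1)
    return pic_name, json.loads(value)


# Hypothetical sample line; real attribute values depend on your dataset.
sample = 'sign_0001.png\t{"class": "stop", "dataset": "train"}'
name, attrs = parse_label_line(sample)

record = {
    "image_name": name,
    "image_format": os.path.splitext(name)[1].lstrip("."),
    "class": attrs.get("class"),
    # Keys missing from the source line become None (null in the JSON output).
    "Fog_intensity": attrs.get("Fog_intensity"),
}
print(record)
```

Because `dict.get` is used, an attribute that is absent from the original metadata does not raise an error; it is written as `null`, which is worth checking for before importing.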


### 2.2.ETL script for Pedestrian and Vehicle Detection(PVD)

For example, in the Pedestrian and Vehicle Detection (PVD) use case, we use the following script to convert the original metadata into the required format:

In [3]:

"""
original data: ./data/bdd100k
the expected format of the metadata looks like the following json:
[
    {
        "image_path": "",
        "image_name": "",
        "image_size": {
            "height": 720,
            "width": 1280
        },
        "weather": "",
        "scene": "",
        "timeofday": "",
        "objects": [
            {
                "object_class": 0,
                "tag": {
                    "bbox": [
                        450.643725,
                        261.277479,
                        491.393423,
                        328.39463
                    ]
                },
                "object_code": 0
            }
        ]
    }
]
"""

import os
import json
import cv2


def scan_data(data_path):

    results = []

    category2id = {
        "car": 0,
        "bus": 1,
        "person": 2,
        "bike": 3,
        "truck": 4,
        "motor": 5,
        "train": 6,
        "rider": 7,
    }

    # os.path.join works whether or not data_path ends with a path separator.
    for root, dirs, files in os.walk(os.path.join(data_path, "labels")):
        for file_name in files:
            tmp = dict()

            file_path = os.path.join(root, file_name)

            with open(file_path, encoding="utf-8") as f:
                data = json.load(f)

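            # Swap only the file extension, so a "json" substring elsewhere in the name is untouched.
            img_name = os.path.splitext(file_name)[0] + ".jpg"

            img = cv2.imread(os.path.join(data_path, "images", img_name))
            if img is None:
                # cv2.imread returns None when the image is missing or unreadable.
                print("skip, image not readable:", img_name)
                continue

            height, width = img.shape[0:2]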

            tmp["image_path"] = os.path.join("/data/bdd100k/images", img_name)
            tmp["image_name"] = img_name
            tmp["image_size"] = {"height": height, "width": width}

            tmp["weather"] = data["attributes"]["weather"]
            tmp["scene"] = data["attributes"]["scene"]
            tmp["timeofday"] = data["attributes"]["timeofday"]

            for sub in data["frames"][0]["objects"]:
                if sub["category"] in category2id:
                    tmp.setdefault("objects", []).append(
                        {
                            "object_class": category2id[sub["category"]],
                            "tag": {
                                "bbox": [
                                    sub["box2d"]["x1"],
                                    sub["box2d"]["y1"],
                                    sub["box2d"]["x2"],
                                    sub["box2d"]["y2"],
                                ],
                                "occluded": sub["attributes"]["occluded"],
                                "truncated": sub["attributes"]["truncated"],
                                "trafficLightColor": sub["attributes"][
                                    "trafficLightColor"
                                ],
                            },
                            "object_code": category2id[sub["category"]],
                        }
                    )

            results.append(tmp)

    with open("./data/bdd100k/bdd100k_mini.json", "w", encoding="utf-8") as fw:
        json.dump(results, fw, ensure_ascii=False, indent=4)


scan_data(data_path="./data/bdd100k/")

Running the script above generates the formatted metadata JSON file `./data/bdd100k/bdd100k_mini.json` on your local disk.
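
Before importing, it can help to sanity-check the generated records. The sketch below assumes that each `bbox` is `[x1, y1, x2, y2]` in pixel coordinates, as written by the script above. The helper name `find_bad_boxes` and the sample records are hypothetical and are not part of ProSafeAI Evaluation Tool.

```python
def find_bad_boxes(records):
    """Return (image_name, bbox) pairs whose box is empty or outside the image."""
    bad = []
    for rec in records:
        height = rec["image_size"]["height"]
        width = rec["image_size"]["width"]
        # "objects" may be absent when no object matched a known category.
        for obj in rec.get("objects", []):
            x1, y1, x2, y2 = obj["tag"]["bbox"]
            inside = 0 <= x1 < x2 <= width and 0 <= y1 < y2 <= height
            if not inside:
                bad.append((rec["image_name"], obj["tag"]["bbox"]))
    return bad


# Hypothetical records for illustration only.
records = [
    {"image_name": "a.jpg", "image_size": {"height": 720, "width": 1280},
     "objects": [{"object_class": 0, "object_code": 0,
                  "tag": {"bbox": [450.6, 261.3, 491.4, 328.4]}}]},
    {"image_name": "b.jpg", "image_size": {"height": 720, "width": 1280},
     "objects": [{"object_class": 2, "object_code": 2,
                  "tag": {"bbox": [1200.0, 10.0, 1300.0, 50.0]}}]},
    {"image_name": "c.jpg", "image_size": {"height": 720, "width": 1280}},
]
bad_boxes = find_bad_boxes(records)
print(bad_boxes)
```

To run the same check on the real output, load the records with `json.load` from the generated file instead of using the inline sample list.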

## 3.Import data
After converting the original metadata to match the template, we can import the generated JSON file (`TSR_format_metadata.json` for TSR) into ProSafeAI Evaluation Tool.

Open [http://10.38.49.30:8080/](http://10.38.49.30:8080/), then enter your username, password and verification code.
After logging in, click <span style="color:red;font-weight:bold">Data Management</span> in the left menu or in Quick navigation.


![table list](./media/table_list.png)

The table list shows attributes such as project name, use case, table name, description, field summary, task type and latest version. It only displays the data that your account has permission to access. Hovering the mouse over the field summary displays the column names of that table. These columns follow the schema created after alignment with the ProSafeAI CN team. If you can't find your dataset's schema, please contact the ProSafeAI CN team as soon as possible.

We select the row for TSR and click the `view details` button to enter the details page.

![table detail](./media/table_detail.png)

This page shows the metadata in detail. To display another data version, click `View Other Version`, choose a version and click `commit`.

We click `import New Data` to import new data.

![import data](./media/import_data.png)

First, click `preview and download json template` to download a template generated from the schema pre-aligned with the ProSafeAI CN team. For example, here is the JSON template for a specific dataset:

In [None]:
[
    {
        "image_name": "string",
        "image_format": "string",
        "augmentation": "string",
        "dataset": "string",
        "class": "string",
        "Fog_intensity": "string",
        "Snowfall_intensity": "string",
        "Illuminance": "string",
        "Rain_quantity": "string"
    }
]


Next, check whether the prepared metadata (for example, `TSR_format_metadata.json`, generated in section 2 of this tutorial) matches the format of the template exported from ProSafeAI Evaluation Tool. If it does not, modify the conversion script and regenerate the metadata until the format is correct. Once it matches, check the box `I have previewd and downloaded the template`; this enables the `Browse` button.
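
One possible way to do this check in code is to compare the keys of each prepared record with the keys of the downloaded template. The following is a minimal sketch: the helper name `find_format_mismatches` and the sample data are hypothetical, and it only compares top-level key names, not value types. Whether the tool validates more than this is not covered here.

```python
def find_format_mismatches(records, template):
    """Compare each record's keys with the keys of the first template entry."""
    expected = set(template[0])
    problems = []
    for index, rec in enumerate(records):
        missing = sorted(expected - set(rec))
        extra = sorted(set(rec) - expected)
        if missing or extra:
            problems.append({"index": index, "missing": missing, "extra": extra})
    return problems


# Hypothetical, shortened template and records for illustration only.
template = [{"image_name": "string", "image_format": "string", "class": "string"}]
records = [
    {"image_name": "a.png", "image_format": "png", "class": "stop"},
    {"image_name": "b.png", "image_format": "png", "label": "yield"},
]
problems = find_format_mismatches(records, template)
print(problems)
```

In practice, both `template` and `records` would be loaded with `json.load` from the downloaded template file and from the prepared metadata file. An empty result means every record uses exactly the template's keys.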


Second, click the `Browse` button and upload the prepared metadata JSON file (e.g. `TSR_format_metadata.json`). Then enter a description in `version comments` so that this version can be told apart from other versions. Finally, click the `commit` button to commit this data version.

We can see the latest uploaded data on the page and filter the data through the filtering box above.

By now, you should have successfully imported your dataset's metadata into ProSafeAI Evaluation Tool. This metadata is needed for the following tasks: [Data distribution](2.2.data_distribution.ipynb) and [Data Verification](2.3.data_verification.ipynb) in `Data V`, and [Basic Metrics](3.1.basic_metrics.ipynb) and [Robustness Test](3.2.robustness_test.ipynb) in `Model V`.

Now you can jump to [Data distribution](2.2.data_distribution.ipynb) to see what the data distribution of your dataset looks like. Enjoy ^_^