<div align="left">
    <img width="640" src="https://highlighter-public.s3.ap-southeast-2.amazonaws.com/web/assets/Highlighter_Logo_Primary_Horizontal_RGB.png" alt="Highlighter logo">
</div>

# Create Assessments

An `assessment` is a collection of *attributes* associated with a `data_file`. In the following example we work with a very small object detection dataset in the common `Coco` format. Each `data_file` is an image in the dataset, and the collection of bounding boxes associated with that image is its `assessment`.
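
To make that relationship concrete, here is a minimal sketch that groups the annotations of a Coco-style dictionary by image, so each image ends up with its own collection of boxes. The tiny `coco` dictionary and the `group_annotations_by_image` helper are made up for illustration and are not part of the `highlighter-sdk`.

```python
from collections import defaultdict

# A tiny hand-written dictionary in Coco layout (illustrative data only)
coco = {
    "images": [
        {"id": 1, "file_name": "a.jpg", "width": 64, "height": 64},
        {"id": 2, "file_name": "b.jpg", "width": 64, "height": 64},
    ],
    "categories": [{"id": 0, "name": "circle"}, {"id": 1, "name": "square"}],
    "annotations": [
        {"id": 10, "image_id": 1, "category_id": 0, "bbox": [4, 4, 10, 10]},
        {"id": 11, "image_id": 1, "category_id": 1, "bbox": [30, 30, 8, 8]},
        {"id": 12, "image_id": 2, "category_id": 1, "bbox": [0, 0, 16, 16]},
    ],
}


def group_annotations_by_image(coco_dict):
    """Return {image_id: [(category_name, bbox), ...]} (hypothetical helper)."""
    names = {c["id"]: c["name"] for c in coco_dict["categories"]}
    grouped = defaultdict(list)
    for ann in coco_dict["annotations"]:
        grouped[ann["image_id"]].append((names[ann["category_id"]], ann["bbox"]))
    return dict(grouped)


# Each value is roughly what this notebook calls the "assessment" of one image
per_image = group_annotations_by_image(coco)
for image_id, boxes in per_image.items():
    print(image_id, boxes)
```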

In the real world, `assessment`s can be created in several different ways. This notebook demos a few common use cases:

  - You have an existing dataset in Coco format and wish to upload the images and the annotations.
  - You have an existing dataset in some custom format that is not currently supported by the `highlighter-sdk`.
  - You have a local process, such as a deep learning model you run locally, whose outputs you wish to upload as `assessment`s in Highlighter.

---

First we need to install the `highlighter-sdk`

In [None]:
!pip install --quiet highlighter-sdk

# Imports and Download Sample Dataset

In [None]:
import json
from pathlib import Path
from uuid import uuid4
import urllib.request
import tarfile
import tempfile
import os


import highlighter as hl
from highlighter.datasets import ImageRecord, AttributeRecord, Dataset

from IPython.display import display_html
from itertools import chain,cycle

def show_dataset(ds):
    """Helper to display Datasets nicely in the Notebook
    """
    html_str=''
    for df,title in zip([ds.annotations_df, ds.data_files_df], chain(["Annotations", "Data Files"],cycle(['</br>'])) ):
        html_str+='<th style="text-align:center"><td style="vertical-align:top">'
        html_str+=f'<h2 style="text-align: center;">{title}</h2>'
        html_str+=df.head(5).to_html().replace('table','table style="display:inline"')
        html_str+=f'<br> shape: {df.shape}</td></th>'
    display_html(html_str,raw=True)


SAMPLE_DATASET_URL = "https://highlighter-public.s3.ap-southeast-2.amazonaws.com/simple-shapes-coco/simple_shapes_dataset.tar"

TEMP_DIR = Path(tempfile.mkdtemp())


In [None]:

def get_sample_data(temp_dir=TEMP_DIR):
    dataset_path = Path(temp_dir) / "sample_dataset"
    coco_json = dataset_path / "data.json"
    data_files_dir = dataset_path / "images"

    if coco_json.exists():
        print(f"Existing data found at: {dataset_path}")
        return coco_json, data_files_dir

    try:
        # Download the tar file
        filename = SAMPLE_DATASET_URL.split('/')[-1]
        filepath = Path(temp_dir) / filename
        urllib.request.urlretrieve(SAMPLE_DATASET_URL, filepath)

        # Extract the tar file
        with tarfile.open(filepath, 'r') as tar:
            tar.extractall(temp_dir)

        (Path(temp_dir) / filepath.stem).rename(dataset_path)
        print(f"File downloaded and extracted to: {dataset_path}")

    except Exception as e:
        print("Error:", e)

    return coco_json, data_files_dir


# Create a Dataset From A Supported Format

Some common dataset formats can be read out of the box, and we plan to add more over time.



In [None]:
COCO_JSON, DATA_FILES_DIR = get_sample_data()

In [None]:
ds = Dataset.read_coco(COCO_JSON)

In [None]:
show_dataset(ds)

# Initialize A Highlighter Client

In [None]:
api_token = "ADD API TOKEN HERE"  # See https://highlighter-docs.netlify.app/docs/how-to-guides/highlighter-credentials/ for more info
endpoint_url = "https://<YOUR ACCOUNT SUB DOMAIN>.highlighter.ai/graphql"


client = hl.HLClient.from_credential(api_token=api_token, endpoint_url=endpoint_url)
print(client)

# Upload The Images To A Data Source

**First create a Data Source in the Highlighter Web UI, note the id and come back**

You can find the ID in the URL

```
https://compuglobalhypermeganet.highlighter.ai/data_sources/#####
                                                            ^^^^^
                                                              |
                                Data Source ID -----------------

```
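
If you prefer to copy the whole URL rather than pick the ID out by hand, a small helper can extract it. This is a sketch only: `data_source_id_from_url` is a hypothetical function, not part of the `highlighter-sdk`. It assumes the ID is the numeric last path segment, as in the diagram above, and the `123` in the example URL is a made-up ID.

```python
from urllib.parse import urlparse


def data_source_id_from_url(url: str) -> int:
    """Return the trailing numeric id of a .../data_sources/<id> URL.

    Hypothetical helper; assumes the id is an integer path segment.
    """
    parts = [p for p in urlparse(url).path.split("/") if p]
    if len(parts) < 2 or parts[-2] != "data_sources" or not parts[-1].isdigit():
        raise ValueError(f"Not a data source URL: {url}")
    return int(parts[-1])


# Example with a made-up id
print(data_source_id_from_url("https://compuglobalhypermeganet.highlighter.ai/data_sources/123"))
```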


In [None]:
data_source_id = ToDo

_ = ds.upload_data_files(client, data_source_id, data_file_dir=DATA_FILES_DIR)

In [None]:
show_dataset(ds)

# Create Object Classes

Here we map the class names in the source dataset to Highlighter ObjectClass UUIDs. If a class with
the same name does not already exist in Highlighter, it will be created.
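
The lookup-or-create idea can be pictured with a small, purely local sketch. It is not the SDK's implementation: `map_names_to_uuids` is a hypothetical helper, the `existing` dictionary stands in for the classes already in your account, and `uuid4()` stands in for creating a class remotely.

```python
from uuid import UUID, uuid4


def map_names_to_uuids(names, existing):
    """Map each name to a UUID, reusing `existing` entries case-insensitively.

    `existing` is {name: UUID}, a stand-in for classes that already exist.
    """
    by_lower = {name.lower(): uuid for name, uuid in existing.items()}
    mapping = {}
    for name in names:
        key = name.lower()
        if key not in by_lower:
            by_lower[key] = uuid4()  # stand-in for "create the class remotely"
        mapping[name] = by_lower[key]
    return mapping


existing = {"Circle": UUID("00000000-0000-0000-0000-000000000001")}
mapping = map_names_to_uuids(["circle", "square"], existing)
print(mapping)
```

Here `"circle"` reuses the UUID already registered under `"Circle"`, while `"square"` receives a new one.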

In [None]:

# Get the unique object class names
adf = ds.annotations_df
object_class_names = adf[adf.attribute_id == str(hl.OBJECT_CLASS_ATTRIBUTE_UUID)].value.unique()

# This function checks (case-insensitively) whether object classes with the
# same names already exist and creates any that are missing. It then returns
# a dict mapping each original name to the Highlighter ObjectClass.uuid
object_class_name_to_highlighter_uuid = hl.object_classes.create_object_classes(client, object_class_names)
print(object_class_name_to_highlighter_uuid)

# Create Workflow
The Workflow is where we store the annotations for a set of `data_files`. *In our case these data_files are images*

In [None]:
# If you already have a workflow_id, set it here; if not, leave it as None
workflow_id = None

if workflow_id is None:

    # Create a Workflow
    # Note: Workflow names must be unique
    workflow_name = "My Toy workflow 000"

    workflow = hl.create_workflow(client, name=workflow_name,
                             object_class_uuids=[str(i) for i in object_class_name_to_highlighter_uuid.values()])
    workflow_id = workflow.id
    print(workflow)


In [None]:
from highlighter.datasets.formats.highlighter.writer import HighlighterAssessmentsWriter

# Define the Dataset Writer
writer = HighlighterAssessmentsWriter(client,
                                      workflow_id,
                                      object_class_uuid_lookup=object_class_name_to_highlighter_uuid
                                      )

writer.write(ds)

In [None]:
ds.annotations_df

**Your data should now be visible in the Workflow you defined**

Below are some extra credit tutorials

---

---

# Create Dataset From A Custom Format

Often you will be uploading data from a non-standard format. The dataset we're working with is in
the popular Coco format, which **is** supported by Highlighter. However, for the purpose of the exercise we'll
do this manually.

The code block below loops through each image and creates a list of `ImageRecord`s, then loops through each annotation and creates a list of `AttributeRecord`s. The `ImageRecord`s are pretty straightforward, so let us focus on the `AttributeRecord`s.

In its simplest form each `AttributeRecord` requires:
  - `data_file_id`: This indicates the image the attribute belongs to
  - `value`: This is the value of the attribute, and
  - `entity_id`: This uniquely identifies an individual object or "thing" in an image, or even across time or data sources. For example, in the block below we deliberately use the same `entity_id` for both the `PixelLocationAttributeValue` and the `ObjectClassAttributeValue`. This tells Highlighter that both attributes refer to the same "thing".
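
The role of a shared `entity_id` can be illustrated without the SDK. In the sketch below, `ToyAttributeRecord` and `records_for_box` are hypothetical stand-ins that hold only the three fields listed above; the real `AttributeRecord` may carry more fields and validation.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Any
from uuid import UUID, uuid4


@dataclass
class ToyAttributeRecord:
    """Hypothetical stand-in holding only the three fields described above."""
    data_file_id: int
    value: Any
    entity_id: UUID


def records_for_box(data_file_id, class_name, bbox):
    """Build two records that share one entity_id, so they describe one object."""
    entity_id = uuid4()
    return [
        ToyAttributeRecord(data_file_id, class_name, entity_id),
        ToyAttributeRecord(data_file_id, bbox, entity_id),
    ]


# Two objects in the same image: four records, two entity_ids
records = records_for_box(1, "circle", [4, 4, 10, 10]) + records_for_box(1, "square", [30, 30, 8, 8])

# Group by entity_id: each group is one "thing" with its class and its location
entities = defaultdict(list)
for record in records:
    entities[record.entity_id].append(record.value)

for entity_id, values in entities.items():
    print(entity_id, values)
```

Grouping by `entity_id` recovers one class and one box per object, even though all four records point at the same `data_file_id`.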

In [None]:
COCO_JSON, DATA_FILES_DIR = get_sample_data()

api_token = os.environ["HL_WEB_GRAPHQL_API_TOKEN"]
endpoint_url = os.environ["HL_WEB_GRAPHQL_ENDPOINT"]
client = hl.HLClient.from_credential(api_token=api_token, endpoint_url=endpoint_url)
print(client)

workflow_id = ToDo

In [None]:
from highlighter import read_object_classes, LabeledUUID
from highlighter.datasets.base_models import (
    ObjectClassAttributeValue,
    PixelLocationAttributeValue,
    AttributeRecord,
    ImageRecord
)

with open(COCO_JSON, 'r') as f:
    data = json.load(f)

# Get a lookup to map class names to object class uuids
object_class_uuid_lookup = {o.name: o.uuid for o in read_object_classes(client, process_id=workflow_id)}
cat_id_to_name = {c["id"]: c["name"] for c in data["categories"]}

# We use the ImageRecord BaseModel to validate the fields
# before adding them to the Dataset.
data_file_records = [ImageRecord(data_file_id=i["id"],
                             width=i["width"],
                             height=i["height"],
                             filename=i["file_name"],
                            ) for i in data["images"]]

attribute_records = []
for a in data["annotations"]:
    entity_id = str(uuid4())

    # Create an AttributeRecord with an ObjectClassAttributeValue by:
    #   - looking up the object_class_uuid from a dict
    #   - creating a LabeledUUID for the object class value. You can use LabeledUUID
    #     or UUID interchangeably. LabeledUUID is simply used to make things readable
    #   - Append the AttributeRecord to attribute_records
    object_class_name = cat_id_to_name[a["category_id"]]
    object_class_uuid = object_class_uuid_lookup[object_class_name]
    object_class_value = LabeledUUID(object_class_uuid, label=object_class_name)
    object_class_attribute_value = ObjectClassAttributeValue(value=object_class_value)

    attribute_records.append(
        AttributeRecord.from_attribute_value(
            a["image_id"],
            object_class_attribute_value,
            entity_id=entity_id,
        )
    )

    # Create an AttributeRecord with a PixelLocationAttributeValue by:
    #   - using the PixelLocationAttributeValue.from_left_top_width_height_coords
    #     helper to build the value from the Coco bbox
    #   - appending the AttributeRecord to attribute_records
    pixel_location_attribute_value = PixelLocationAttributeValue.from_left_top_width_height_coords(a["bbox"])

    # Append the PixelLocation AttributeRecord
    attribute_records.append(
        AttributeRecord.from_attribute_value(
            a["image_id"],
            pixel_location_attribute_value,
            entity_id=entity_id,
        )
    )

ds = Dataset(attribute_records=attribute_records, data_file_records=data_file_records)

# Upload files as needed and update data_file_ids
_ = ds.upload_data_files(client, data_source_id, data_file_dir=DATA_FILES_DIR)

show_dataset(ds)

In [None]:
from highlighter.datasets.formats.highlighter.writer import HighlighterAssessmentsWriter

# Define the Dataset Writer
writer = HighlighterAssessmentsWriter(client,
                                      workflow_id)
writer.write(ds)

---
---

# Create Submissions By Performing Inference On Images In Highlighter

Finally, if you have images already stored in Highlighter and want to run predictions on them and upload the results to Highlighter, you can follow a similar process. The difference is that you do not need to create `ImageRecord`s, because the images are already in Highlighter.

We assume we're looping over a directory of image files with their filename matching their Highlighter `data_file_id`. To set this up we're going to create a directory containing symlinks `<data_file_id>.jpg` that refer to the original image paths.

In [None]:
symlink_dir: Path = DATA_FILES_DIR.parent / "hl_id_symlinks"
symlink_dir.mkdir(exist_ok=True)

for data_file_id, filename in ds.data_files_df.loc[:, ["data_file_id", "filename"]].values:
    original_file_path = Path(filename)
    link_path = symlink_dir / f"{data_file_id}{original_file_path.suffix}"
    assert original_file_path.absolute().exists()
    # Create the symlink only once so the cell can be re-run safely
    if not link_path.exists():
        link_path.symlink_to(original_file_path.absolute())

!ls {symlink_dir}

In [None]:
COCO_JSON, DATA_FILES_DIR = get_sample_data()

api_token = os.environ["HL_WEB_GRAPHQL_API_TOKEN"]
endpoint_url = os.environ["HL_WEB_GRAPHQL_ENDPOINT"]
client = hl.HLClient.from_credential(api_token=api_token, endpoint_url=endpoint_url)
print(client)

workflow_id = ToDo

In [None]:
import numpy as np
from typing import List
from uuid import uuid4
from highlighter import read_object_classes, HLClient
from highlighter.datasets.base_models import (
    ObjectClassAttributeValue,
    PixelLocationAttributeValue,
    AttributeRecord,
    ImageRecord
)
from highlighter import io


class MyAwesomeShapePredictor():

    def __init__(self, object_class_uuids):
        self.object_class_uuids = object_class_uuids


    def convert_output_to_attribute_records(self, raw_model_output: tuple, image_id: int) -> List[AttributeRecord]:

        bbox, class_id, conf = raw_model_output

        object_class_uuid = self.object_class_uuids[class_id]
        object_class_attribute_value = ObjectClassAttributeValue(value=object_class_uuid,
                                                                 confidence=conf)

        bbox_attribute_value = PixelLocationAttributeValue.from_left_top_width_height_coords(bbox,
                                                                                             confidence=conf)

        # Both records share one entity_id so they describe the same object
        entity_id = uuid4()
        attribute_records: List[AttributeRecord] = [

            AttributeRecord.from_attribute_value(
                image_id,
                object_class_attribute_value,
                entity_id=entity_id,
            ),

            AttributeRecord.from_attribute_value(
                image_id,
                bbox_attribute_value,
                entity_id=entity_id,
            ),

        ]

        return attribute_records



    def get_mock_predictions(self, image):

        # Use integer division so randint receives integer bounds
        random_x = np.random.randint(0, image.shape[1] // 2)
        random_y = np.random.randint(0, image.shape[0] // 2)
        random_w = np.random.randint(0, image.shape[1] // 2)
        random_h = np.random.randint(0, image.shape[0] // 2)
        random_bbox = (random_x, random_y, random_w, random_h)

        random_class_id = np.random.randint(low=0, high=len(self.object_class_uuids))
        random_conf = np.random.uniform()
        return (random_bbox, random_class_id, random_conf)

    def predict(self, image_path: Path):
        image_id = image_path.stem
        image_np = io.read_image(image_path)
        raw_model_output = self.get_mock_predictions(image_np)
        attribute_records = self.convert_output_to_attribute_records(raw_model_output, image_id)
        return attribute_records

client = HLClient.from_env()
object_class_uuids = [o.uuid for o in read_object_classes(client, process_id=workflow_id)]
predictor = MyAwesomeShapePredictor(object_class_uuids)

In [None]:

attribute_records: List[AttributeRecord] = []
for image_path in symlink_dir.glob("*.jpg"):

    records = predictor.predict(image_path)
    attribute_records.extend(records)

In [None]:
ds = Dataset(attribute_records=attribute_records)
show_dataset(ds)

In [None]:
from highlighter.datasets.formats.highlighter.writer import HighlighterAssessmentsWriter

# Define the Dataset Writer
writer = HighlighterAssessmentsWriter(client,
                                      workflow_id,
                                      )

writer.write(ds)