# Scenario 7: Extracting food entities out of food recall announcements
This notebook is complementary material for the walkthrough scenario **"Extracting food entities out of food recall announcements"** for the STELAR KLMS.
It is not intended to be run as a standalone notebook. It **requires access to a deployment of STELAR KLMS** and an **account** on the respective instance. 

Some of the instances used during the evaluation period of the STELAR Project are:

Internal Pilot Instance: https://klms.stelar.gr

Public Sandbox Instance: https://sandbox.stelar.gr


*If you don't have an account on the STELAR KLMS, you can create one on the respective instance. 
Note that the internal pilot instance is only accessible to STELAR project members, while the public sandbox instance is open to anyone who registers.*

---
# Overview

This notebook runs the Generic NER tool to showcase it on a task where a user needs to identify the foods mentioned in lengthy food recall announcements.

### Prerequisites

- Fill in your account credentials in the client cell below.
- Select datasets according to the walkthrough directions.
- Ensure you have a recent Python version installed (3.9 or later).
- Install the STELAR Python SDK and any other required libraries (`pip install stelar_client --upgrade`).
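
A quick way to confirm that the interpreter meets the version requirement is a check like the one below, run before anything else. It is a small sketch that uses only the standard library; the `python_version_ok` helper is hypothetical and not part of the STELAR SDK.

```python
import sys

# Minimum Python version stated in the prerequisites above.
MIN_PYTHON = (3, 9)


def python_version_ok(minimum=MIN_PYTHON):
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info[:2] >= minimum


if not python_version_ok():
    raise RuntimeError(
        f"Python {MIN_PYTHON[0]}.{MIN_PYTHON[1]}+ is required, "
        f"found {sys.version_info.major}.{sys.version_info.minor}"
    )
print(f"Python {sys.version_info.major}.{sys.version_info.minor} OK")
```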

### Instantiate a STELAR Client object
**Modify credentials and base URL as needed.**

In [None]:
from stelar.client import Client, Dataset, TaskSpec, Process
from datetime import datetime

# Base URL
# Sandbox: https://sandbox.stelar.gr
# Internal Pilots: https://klms.stelar.gr

BASE_URL = "https://sandbox.stelar.gr"
USERNAME = "your_username"  # Replace with your username
PASSWORD = "your_password"  # Replace with your password

c = Client(base_url=BASE_URL, username=USERNAME, password=PASSWORD)
print(f"Connected to STELAR KLMS @ {c._base_url} as {c._username}")

### Select the generic-ner-test-sample dataset

In [None]:
ner_test_dataset = c.datasets["generic-ner-test-sample"]
print(f"Selected Dataset: {ner_test_dataset.id} | {ner_test_dataset.title}")
print(f"Browse the dataset at: {c._base_url}/console/v1/catalog/{ner_test_dataset.id}")
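
Later in this notebook, the task is told where to find the input texts and the gold annotations through the `texts_index="text"` and `gold_standard_index="golden_annotations"` parameters. If you have a local copy of the test sample, you can check that it exposes those columns before submitting anything. This is a minimal sketch that assumes the resource is a CSV file and that both parameters name header columns; downloading the file from the KLMS is not shown, and the two-row `sample` is hypothetical (the real annotation format may differ).

```python
import csv
import io

# Column names the Generic NER task is configured to read later in this notebook
# (assumed to be CSV header names).
REQUIRED_COLUMNS = {"text", "golden_annotations"}


def missing_columns(csv_text, required=REQUIRED_COLUMNS):
    """Return, sorted, the required column names absent from the CSV header."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader, [])  # an empty file yields an empty header
    return sorted(set(required) - {h.strip() for h in header})


# Hypothetical sample standing in for a downloaded copy of the resource.
sample = (
    "text,golden_annotations\n"
    '"Company X recalls frozen shrimp","[\'shrimp\']"\n'
)
print(missing_columns(sample))  # → []
```

An empty result means both columns are present; any names returned are the columns the task would not find.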

### Create/Select a Workflow Process to run the NER task

In [None]:
ORGANIZATION = "stelar-klms"

try:
    # Create a per-user process that will group the tasks of this walkthrough
    proc = c.processes.create(**{
        "title": "Evaluation Workflow for " + c._username,
        "name": "evaluation-workflow-" + c._username,
        "organization": c.organizations[ORGANIZATION]
    })
    print(f"Created new process for evaluation: {proc.id} | {proc.title}")
except Exception:
    # Creation typically fails because the process already exists (e.g. on a re-run); reuse it
    proc = c.processes["evaluation-workflow-" + c._username]
    print(f"Using existing process for evaluation: {proc.id} | {proc.title}")
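
The cell above (and the dataset cell that follows) uses a create-then-fetch pattern: try to create the entity, and if that raises, look it up by name. A broad `except Exception` can hide the real cause, for example an authentication or permission error, and surface only a confusing lookup error instead. The sketch below wraps the pattern in a hypothetical `get_or_create` helper that keeps both errors; it makes no assumption about the STELAR SDK and is demonstrated with stand-in callables.

```python
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def get_or_create(create: Callable[[], T], fetch: Callable[[], T]) -> Tuple[T, bool]:
    """Call `create()`; if it raises, fall back to `fetch()`.

    Returns (entity, created), where `created` is True only if `create()` succeeded.
    If both calls fail, a RuntimeError reports both errors instead of only the second.
    """
    try:
        return create(), True
    except Exception as create_error:
        try:
            return fetch(), False
        except Exception as fetch_error:
            raise RuntimeError(
                f"create failed ({create_error!r}); fetch failed ({fetch_error!r})"
            ) from fetch_error


# Demonstration with stand-in callables (no KLMS connection needed).
existing = {"evaluation-workflow-demo": "proc-1"}


def create_duplicate():
    raise ValueError("name already in use")  # simulates a duplicate-name error


entity, created = get_or_create(create_duplicate, lambda: existing["evaluation-workflow-demo"])
print(entity, created)  # → proc-1 False
```

In the notebook, the two callables would wrap the existing calls, e.g. `lambda: c.processes.create(**{...})` and `lambda: c.processes["evaluation-workflow-" + c._username]`.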

### Create a dataset to store the results of the NER task

In [None]:
ORGANIZATION = "stelar-klms"

try:
    # Create a per-user dataset that will receive the annotated output of the NER task
    res_dset = c.datasets.create(**{
        "title": "Annotated NER Test Dataset for " + c._username,
        "name": "annotated-ner-test-dataset-" + c._username,
        "organization": c.organizations[ORGANIZATION],
        "notes": "Annotated NER Test Dataset for " + c._username,
    })
    print(f"Created new dataset for annotated NER test dataset: {res_dset.id} | {res_dset.title}")
except Exception:
    # Creation typically fails because the dataset already exists (e.g. on a re-run); reuse it
    res_dset = c.datasets["annotated-ner-test-dataset-" + c._username]
    print(f"Using existing dataset for annotated NER test dataset: {res_dset.id} | {res_dset.title}")
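
Both the process and the dataset names are built by appending `c._username` to a fixed prefix. If the KLMS catalog enforces CKAN-style naming rules (an assumption here: 2 to 100 characters drawn from lowercase ASCII letters, digits, `-` and `_`), a username containing dots or uppercase letters could make `create` fail, and the cell would then fall into the `except` branch and try to fetch an entity that does not exist. The hypothetical `safe_name` helper below normalises the username before it is used in a name.

```python
import re

# Assumed naming rule (CKAN-style): 2-100 chars of lowercase a-z, 0-9, '-' or '_'.
VALID_NAME = re.compile(r"^[a-z0-9_-]{2,100}$")


def safe_name(prefix, username, max_len=100):
    """Build `prefix + username`, normalised to the assumed naming rule."""
    # Lowercase, replace every run of disallowed characters with '-', trim edge dashes.
    slug = re.sub(r"[^a-z0-9_-]+", "-", username.lower()).strip("-")
    name = (prefix + slug)[:max_len]
    if not VALID_NAME.match(name):
        raise ValueError(f"cannot build a valid name from {username!r}")
    return name


print(safe_name("annotated-ner-test-dataset-", "Maria.Papadopoulou"))
# → annotated-ner-test-dataset-maria-papadopoulou
```

Truncation to `max_len` keeps names valid but could, in principle, make two very long usernames collide; for typical usernames this does not arise.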

### Prepare the Generic NER task

Define a simple ontology of food categories to which the identified foods will be linked.

In [None]:
food_entity_ontology = ["meat", "fish", "dairy", "vegetable", "fruit", "grain", "herb", "spice", "other"]
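
To make "linking" concrete: each food entity extracted from an announcement is assigned one of the ontology labels above. The Generic NER tool performs this step with the entity-linking method configured in the next cell. The sketch below is only a toy keyword lookup that illustrates the shape of the result, with `other` as the fallback; the `KEYWORDS` table and the `link_entity` function are hypothetical and do not reflect how the tool links entities.

```python
ONTOLOGY = ["meat", "fish", "dairy", "vegetable", "fruit", "grain", "herb", "spice", "other"]

# Hypothetical keyword table; not part of the STELAR tool.
KEYWORDS = {
    "meat": {"beef", "pork", "chicken", "sausage"},
    "fish": {"salmon", "tuna", "cod"},
    "dairy": {"milk", "cheese", "yogurt"},
    "vegetable": {"spinach", "lettuce", "carrot"},
    "fruit": {"apple", "mango", "berries"},
    "grain": {"wheat", "rice", "flour"},
    "herb": {"basil", "parsley", "oregano"},
    "spice": {"cinnamon", "cumin", "paprika"},
}


def link_entity(entity, ontology=ONTOLOGY, keywords=KEYWORDS):
    """Return the first ontology label whose keywords occur in `entity`, else 'other'."""
    tokens = set(entity.lower().split())
    for label in ontology:
        if tokens & keywords.get(label, set()):
            return label
    return "other"


print([link_entity(e) for e in ["Canned Tuna", "ground beef", "sesame seeds"]])
# → ['fish', 'meat', 'other']
```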

In [None]:
# Start building the TaskSpec for Generic NER
t = TaskSpec(tool="generic-ner", image="0.0.6", name="Generic NER for " + c._username)

# Define the local dataset aliases
t.d(alias='d0', dset=res_dset)

# Define the inputs
t.i(context_files=str(ner_test_dataset.resources[0].id))

# Set the parameters for Generic NER
t.p(
    translation_option=False,
    translation_method="deep-translator",
    class_of_interest="food",
    summary_option=True,
    summary_model="groq:llama-3.1-8b-instant",
    summary_base_url='null',
    summary_model_instance='null',
    main_entity_selection_option=False,
    main_entity_selection_type="single",
    main_entity_model="groq:llama-3.1-8b-instant",
    main_entity_base_url='null',
    main_entity_model_instance='null',
    ner_method="llm",
    ner_model="groq:llama-3.1-8b-instant",
    ner_base_url='null',
    ner_custom_prompt='null',
    ner_model_instance='null',
    entity_linking_method="chromadb_aug",
    entity_linking_model="groq:llama-3.1-8b-instant",
    entity_linking_base_url='null',
    entity_linking_model_instance='null',
    ontology=food_entity_ontology,
    entity_linking_k=3,
    entity_linking_augm=False,
    entity_linking_model_augm='null',
    evaluation_option=True,
    gold_standard_index="golden_annotations",
    texts_index="text",
    output_columns={
        "annotations": "annotations", 
        "linked_annotations": "linked_annotations"
    }
)

# Set the outputs
timestamp = datetime.now().strftime("%Y%m%d%H%M%S")
t.o(annotations_file={
    "url": f"s3://klms-bucket/evaluation/experiments/proc-{proc.id}/annotation_{timestamp}_{c._username}.csv",
    "resource": {
        "name": "Annotations for NER Test Dataset",
        "relation": "annotations",
        "format": "CSV",
    },
    "dataset": "d0",
})
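
With `evaluation_option=True`, the task compares its annotations against the gold-standard column. The exact metrics the tool reports are not described in this notebook. As an illustration only, a common way to score entity extraction is set-based precision, recall and F1 over normalised entity strings, sketched below with a hypothetical `entity_prf` function and made-up example lists.

```python
def entity_prf(predicted, gold):
    """Set-based precision, recall and F1 over entity strings (case-insensitive)."""
    pred = {p.strip().lower() for p in predicted}
    ref = {g.strip().lower() for g in gold}
    tp = len(pred & ref)  # entities found in both sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


scores = entity_prf(["shrimp", "Rice", "soy sauce"], ["rice", "shrimp", "sesame oil", "garlic"])
print([round(x, 3) for x in scores])  # → [0.667, 0.5, 0.571]
```

Exact string matching is strict: "frozen shrimp" against "shrimp" counts as a miss, which is one reason tools often use more lenient matching.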


The default LLM provider in this notebook is the Groq API platform (every `*_model` parameter above is set to `groq:llama-3.1-8b-instant`).
To use Groq models, you can take advantage of the free tier by signing up on the platform and generating a new API key.
Be aware that the free tier has usage limits; these should not affect this showcase, but they may affect larger runs.

Issue your token here: https://console.groq.com/keys

In [None]:
# Set the API key for Groq (replace the placeholder with your own key)
test_json = t.spec()
test_json['secrets'] = {'api_key': 'gsk_api_key'}
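
Hard-coding the key in the notebook makes it easy to leak when the notebook is shared. A minimal alternative, assuming you export the key as an environment variable before starting Jupyter, is the hypothetical `load_api_key` helper below. The variable name `GROQ_API_KEY` is the one Groq's Python client reads by default, and the `gsk_` prefix check is only inferred from the placeholder above, so it prints a warning rather than failing.

```python
import os


def load_api_key(var_name="GROQ_API_KEY", expected_prefix="gsk_"):
    """Read the API key from an environment variable instead of hard-coding it."""
    key = os.environ.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set the {var_name} environment variable first")
    if expected_prefix and not key.startswith(expected_prefix):
        # Warning only: the prefix is an inference, not a documented guarantee.
        print(f"Warning: {var_name} does not start with {expected_prefix!r}")
    return key
```

The secret would then be set with `test_json['secrets'] = {'api_key': load_api_key()}`.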

In [None]:
test_json['process_id'] = str(proc.id)
 
resp = c.POST("v2/task", **test_json).json()['result']
print(f"Task {resp['id']} is running. Check the status at: {c._base_url}/console/v1/task/{str(proc.id)}/{str(resp['id'])}")
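
The cell above only submits the task and prints a console link where you can follow its progress. If you prefer to wait inside the notebook, a generic polling loop like the one below can wrap whatever status lookup you use. The `wait_for` function and the state names in `TERMINAL_STATES` are hypothetical: this sketch does not assume any particular STELAR client method, so you must supply `get_state` yourself.

```python
import time

# Hypothetical terminal state names; adjust to what your status lookup returns.
TERMINAL_STATES = {"succeeded", "failed"}


def wait_for(get_state, terminal=TERMINAL_STATES, interval=5.0, timeout=600.0,
             sleep=time.sleep, clock=time.monotonic):
    """Call get_state() every `interval` seconds until it returns a terminal state."""
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state in terminal:
            return state
        if clock() >= deadline:
            raise TimeoutError(f"still {state!r} after {timeout} s")
        sleep(interval)


# Demonstration with a stand-in status sequence.
states = iter(["created", "running", "running", "succeeded"])
print(wait_for(lambda: next(states), interval=0))  # → succeeded
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; a monotonic clock is used so that wall-clock adjustments do not affect the timeout.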