# Experiment overview

**Overview**
Second step on the Easy set: it enriches the data pipeline and updates the engine to exploit additional metadata, while still answering only the Easy questions.

**Data pipeline**
- Ingestion is identical to the baseline, with an added classification step (`classify_menu`) that assigns context metadata.
- Extraction runs in two steps: normalization and tagging via classification/aggregation first, then `extract_info_from_menus` builds structured records.
- A unified mapping step, `create_mappings`, centralizes ingredients, techniques, and textual references.
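
The mapping step can be pictured as building inverted indexes from the extracted records. The sketch below is only an illustration under stated assumptions: the record keys (`dish`, `ingredients`, `techniques`), the dish names, the ID table, and the helper `build_inverted_index` are all hypothetical and are not taken from the project code.

```python
from collections import defaultdict

# Hypothetical records: the real output of extract_info_from_menus may use different keys.
records = [
    {"dish": "Dish A", "ingredients": ["Chocobo Wings", "Salt"], "techniques": ["Grilling"]},
    {"dish": "Dish B", "ingredients": ["Chocobo Wings"], "techniques": ["Boiling", "Grilling"]},
]
# Hypothetical dish-name -> dish-ID table, standing in for dish_mapping.json.
dish_ids = {"Dish A": 1, "Dish B": 2}


def build_inverted_index(records, dish_ids, field):
    """Map each value found under `field` to the sorted IDs of the dishes that list it."""
    index = defaultdict(set)
    for record in records:
        dish_id = dish_ids.get(record["dish"])
        if dish_id is None:
            continue  # skip dishes that are missing from the ID table
        for value in record.get(field, []):
            index[value].add(dish_id)
    return {value: sorted(ids) for value, ids in index.items()}


ingredient_to_dishes = build_inverted_index(records, dish_ids, "ingredients")
technique_to_dishes = build_inverted_index(records, dish_ids, "techniques")
print(ingredient_to_dishes)
print(technique_to_dishes)
```

With an index of this shape, a lookup tool only needs a dictionary access to return the candidate dish IDs for an ingredient or technique.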

**Engine evolution**
- Moves to `engine_medium`, inheriting the Easy tools and adding planets (`get_planet_dish_ids`), restaurants (`get_restaurant_dish_ids`), and licences (`get_chef_licence_dish_ids`).
- Introduces `union_dish_ids` to handle additive queries alongside intersect/subtract.
- Evaluates with `evaluate_easy_questions` to compare the new toolset directly on the same questions.
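
The union/intersect/subtract combination described above amounts to plain set algebra over dish IDs. The following is a minimal sketch, assuming each tool returns a list of integer dish IDs; the helper names `union_ids`, `intersect_ids`, and `subtract_ids` and the sample IDs are hypothetical stand-ins, not the project's actual tool signatures.

```python
def union_ids(*id_lists):
    """Dishes that appear in at least one list (additive, OR-style queries)."""
    result = set()
    for ids in id_lists:
        result.update(ids)
    return sorted(result)


def intersect_ids(first, *others):
    """Dishes that appear in every list (AND-style queries)."""
    result = set(first)
    for ids in others:
        result &= set(ids)
    return sorted(result)


def subtract_ids(base, *excluded):
    """Dishes in `base` that appear in none of the excluded lists (NOT-style queries)."""
    result = set(base)
    for ids in excluded:
        result -= set(ids)
    return sorted(result)


# Hypothetical tool outputs.
with_ingredient = [1, 2, 3]
on_planet = [2, 3, 4]
with_technique = [3]

either = union_ids(with_ingredient, on_planet)
both = intersect_ids(with_ingredient, on_planet)
both_without_technique = subtract_ids(both, with_technique)
print(either, both, both_without_technique)
```

Returning sorted lists keeps the results deterministic, which makes answers easier to compare against a ground truth.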

**Role in the roadmap**
It quantifies how much the extra information and the medium toolset improve the Easy answers before moving on to the Medium questions.

**Performance (Jaccard)**: 100%
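
For reference, a Jaccard score compares the predicted and expected sets of dish IDs as the size of their intersection divided by the size of their union. The sketch below assumes the reported figure is the mean of per-question Jaccard scores; the `jaccard` helper, the empty-set convention, and the sample IDs are illustrative assumptions, and the project's evaluation code may compute the score differently.

```python
def jaccard(predicted, expected):
    """Jaccard index between two collections of dish IDs."""
    predicted_set, expected_set = set(predicted), set(expected)
    union = predicted_set | expected_set
    if not union:
        return 1.0  # convention chosen here: two empty answers count as a perfect match
    return len(predicted_set & expected_set) / len(union)


# Hypothetical per-question results: exact match, partial overlap, no overlap.
scores = [
    jaccard([1, 2, 3], [1, 2, 3]),
    jaccard([1, 2], [2, 3]),
    jaccard([4], [5]),
]
mean_score = sum(scores) / len(scores)
print(f"Mean Jaccard: {mean_score:.2%}")
```

A mean of 100% therefore requires every predicted set to match its ground-truth set exactly.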



# Setup

In [None]:
import sys
from pathlib import Path

# Resolve the project root (two levels above this notebook) so that `src` is importable.
cwd = Path.cwd().resolve()
project_dir = cwd.parent.parent

if str(project_dir) not in sys.path:
    sys.path.insert(0, str(project_dir))

dataset_file_path = project_dir / "Dataset"
artifacts_file_path = cwd / "artifacts"

# Create the artifacts directory if it does not exist yet.
artifacts_file_path.mkdir(parents=True, exist_ok=True)

# Preprocessing

## Parsing and aggregation

In [None]:
from src.preprocessing.menu_ingestion import group_and_concatenate_documents, parse_documents_in_directory
from src.utils import write_json


menus_path = dataset_file_path / "Knowledge_base" / "menu"
documents_pages = parse_documents_in_directory(document_path=menus_path)
documents = group_and_concatenate_documents(documents=documents_pages)
write_json(documents, artifacts_file_path / "parsed_menus.json")

for doc_name, doc_text in list(documents.items())[:5]:
    print(f"Document: {doc_name}\nContent Preview: {doc_text[:100]}...\n")

## Classification

In [None]:
from src.preprocessing.menu_classification import classify_menu

classifications = classify_menu(text_extracted=documents, model_name="gpt-4.1")
write_json(classifications, artifacts_file_path / "menu_classifications.json")

for doc_name, classification in list(classifications.items())[:5]:
    print(f"Document: {doc_name}\nClassification: {classification}\n")

## Structured extraction

In [None]:
from src.preprocessing.menu_extraction import extract_info_from_menus

extracted_info = extract_info_from_menus(documents=documents, classifications=classifications, model_name="gpt-4.1")
write_json(extracted_info, artifacts_file_path / "extracted_menu_info.json")

for info in extracted_info[:5]:
    restaurant_name = info.get("restaurant_name", "Unknown")
    print(f"Restaurant: {restaurant_name}\nExtracted Info: {info}\n")

## Mapping creation

In [None]:
from src.preprocessing.menu_mapping import create_mappings_technique_ingredient
from src.utils import read_json

dish_mapping = read_json(dataset_file_path / "ground_truth" / "dish_mapping.json")
ingredient_to_dishes, technique_to_dishes = create_mappings_technique_ingredient(extracted_info=extracted_info, dish_mapping=dish_mapping)

write_json(technique_to_dishes, artifacts_file_path / "technique_to_dishes.json")
write_json(ingredient_to_dishes, artifacts_file_path / "ingredient_to_dishes.json")

# Engine

## Conversational agent / queries

In [None]:
from src.ai.agents.engine_easy import get_agent, query_dish_ids

agent = get_agent(model_name="grok-4-1-fast-reasoning")
response = query_dish_ids(question="Quali sono i piatti che includono le Chocobo Wings come ingrediente?", agent=agent)
print(response)

## Evaluation

In [None]:
from src.evaluation.questions_evaluation import evaluate_questions


question_path = dataset_file_path / "domande.csv"
ground_truth_path = dataset_file_path / "ground_truth" / "ground_truth_mapped.csv"
eval_df = evaluate_questions(agent=agent, question_path=question_path, ground_truth_path=ground_truth_path, level="easy")
eval_df.to_csv(artifacts_file_path / "easy_questions_evaluation_results.csv", index=False)

In [None]:
score = round(eval_df["score"].mean() * 100, 2)
print(f"Mean score (Jaccard): {score}%")