# Building and Handling Textual MOCs

The notebook is associated with the submitted paper **Encapsulating Textual Contents into a MOC data structure for Advanced Applications**.
The notebook outlines the basic functionalities of a new approach that integrates textual descriptions directly into the JSON representation of MOC, enabling simultaneous semantic and spatial operations. After demonstrating some basic applications and its potential use for educational gamification, we will later showcase its applicative capabilities in generative AI (GenAI).

The tutorials are organized in the following folders:
1. [tuto1_TextualMOC](https://github.com/ggreco77/TextualMOC/tree/main/tuto1_TextualMOC): basic application for building a Textual MOC
2. [AladinGame](https://github.com/ggreco77/TextualMOC/tree/main/AladinGame): using a Textual MOC for an educational game in Aladin Lite
3. [tuto2_TextualMOC](https://github.com/ggreco77/TextualMOC/tree/main/tuto2_SemanticMOC): creating Semantic MOCs for applications in generative AI systems

#### Version 0.0.7 - September 2025

This notebook is divided into the following sections.

1. [**Textual MOC Powered by GenAI**](#Textual-MOC-Powered-by-GenAI)

   - [Semantic MOCs Generation and Management](#Semantic-MOCs-Generation-and-Management)  
   - [RAG with textual MOC and Vision Models](#RAG-with-textual-MOC-and-Vision-Models)

# Basic Methods for Handling Textual MOCs

Here are some basic applications of the **Textual MOC**, which enhances ordinary MOCs by encapsulating textual content. The `TextualMOC` class is designed to interact with a Multi-Order Coverage (MOC) object, enabling serialization, modification, and extension of MOC data with additional textual descriptions and images. The `__init__` method initializes the `TextualMOC` class with an optional MOC object. If a MOC object is provided, it is serialized into JSON format. Additionally, an `ipyaladin` widget is initialized for later use in visualizing the MOC.

To use the methods that transform textual content into semantic embeddings, we recommend installing and running Ollama - https://ollama.com/.

**The complete list of methods is provided below**.

In [None]:
# Importing the necessary libraries

import json
import os
from datetime import datetime
import requests
#from copy import deepcopy

import numpy as np

from IPython.display import display
import ipywidgets as widgets

from ipyaladin import Aladin

from mocpy import MOC
import healpy as hp

import matplotlib.pyplot as plt

import astropy.units as u

from bs4 import BeautifulSoup

from langchain.embeddings import OllamaEmbeddings

from sklearn.metrics.pairwise import cosine_similarity
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE

**While we wait for an official library for textual MOCs, we import some of the main methods required for the tutorial to work.**

In [None]:
#While we wait for an official library for textual MOCs, 
# we import some of the main methods required for the tutorial to work.


import requests
from pathlib import Path

url = "https://raw.githubusercontent.com/ggreco77/TextualMOC/refs/heads/main/textualmoc/textual_moc.py"
dest = Path("textual_moc.py")  # Change the path/name if you want

if dest.exists():
    print(f"{dest} already exists; skipping download.")
else:
    with requests.get(url, stream=True, timeout=30) as r:
        r.raise_for_status()
        with open(dest, "wb") as f:
            for chunk in r.iter_content(chunk_size=8192):
                if chunk:
                    f.write(chunk)
    print("Saved to:", dest.resolve())

# Import the TextualMOC class from the downloaded module
from textual_moc import TextualMOC

In [None]:
# List of Methods in TextualMOC Class

import pandas as pd
from IPython.display import display, Markdown

methods = [
    {
        "Method": "add_text_media_image",
        "Description": "Adds text, media and image to `TextualMOC` by reading from a file or a URL."
    },
    {
        "Method": "annotate_cell",
        "Description": "Assigns a textual annotation to a specific MOC cell within the JSON data structure."
    },
    {
        "Method": "embedding_from_custom_text",
        "Description": "Generates an embedding of the text using a specified service and model."
    },
    {
        "Method": "load_textual_moc",
        "Description": "Loads an instance of `TextualMOC` from a JSON file."
    },
    {
        "Method": "plot_moc_area",
        "Description": "Visualizes the MOC area using matplotlib."
    },
    {
        "Method": "show_image_value",
        "Description": "Prints the image URL stored in the MOC data as a clickable link."
    },
    {
        "Method": "show_media_value",
        "Description": "Prints the multimedia URL stored in the MOC data."
    },
    {
        "Method": "show_metadata_value",
        "Description": "Prints metadata information such as author, date, and last text update."
    },
    {
        "Method": "show_text_value",
        "Description": "Prints the custom text stored in the MOC data."
    },
    {
        "Method": "render",
        "Description": "Loads the MOC from a JSON file and displays text, media, MOC area, metadata, image and embedding if present."
    },
    {
        "Method": "render_ipyaladin",
        "Description": "Displays the MOC in an Aladin viewer with defined colors, transparency, and HiPS."
    },
    {
        "Method": "save",
        "Description": "Saves the current state of `TextualMOC` in JSON format."
    },
    {
        "Method": "update_metadata",
        "Description": "Updates metadata such as author and date in the MOC's JSON data."
    },
    {
        "Method": "update_text_inline",
        "Description": "Appends new text to the custom text stored in the MOC's JSON data."
    },
#    {
#        "Method": "union",
#        "Description": "Merges the current MOC instance with another instance of `TextualMOC`."
#    }
]

# Create the DataFrame
df_methods = pd.DataFrame(methods)

# Sort the DataFrame alphabetically by the method name
df_methods = df_methods.sort_values(by="Method").reset_index(drop=True)

# Adjust the index to start at 1 instead of 0
df_methods.index = df_methods.index + 1
df_methods.index.name = 'No.'

# Prevent pandas from truncating the descriptions
pd.set_option('display.max_colwidth', None)

# Define the title
title = "# List of Methods in `TextualMOC`"

# Display the title and the table
display(Markdown(title))
display(df_methods)

# Textual MOC Powered by GenAI

## Semantic MOCs Generation and Management

In the previous sections, we introduced Textual MOCs and developed some examples of how they can be applied. Now we proceed to semantic MOCs, where the textual content is transformed into embeddings. This involves converting the associated text of each MOC into numerical vectors using machine learning models, enabling advanced analysis and comparison.

The `embedding_from_custom_text` method of the `TextualMOC` class generates a numerical representation (embedding) of the custom text stored in a textual MOC, capturing its semantic meaning in a multidimensional space for easier analysis and comparison. The method first checks whether the MOC contains the `custom_text` key: if present, it extracts the text for processing; otherwise, it prints an informative message. Next, an embedding model is instantiated using [OllamaEmbeddings](https://python.langchain.com/docs/integrations/text_embedding/ollama/) from [LangChain](https://www.langchain.com/); the default model is `nomic-embed-text`, which the user can change via a parameter. The model then generates a numerical vector representation of the text, which is stored in the MOC under the `embedding` key. The name of the model used is saved in the `model` key to ensure traceability.

To use this method, we recommend installing and running Ollama - https://ollama.com/.
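
The steps described above can be sketched without a running Ollama server. The snippet below is a minimal illustration and not the actual `TextualMOC` implementation: `embed_custom_text` and `toy_embedder` are hypothetical names, and the toy embedder merely stands in for a real model such as `nomic-embed-text`. Only the key names (`custom_text`, `embedding`, `model`) are taken from the description above.

```python
from typing import Callable, Dict, List


def toy_embedder(text: str) -> List[float]:
    """Hypothetical stand-in for a real embedding model (NOT semantic)."""
    # Three crude features, only to return a fixed-size numeric vector.
    words = text.split()
    return [float(len(text)), float(len(words)), float(len(set(words)))]


def embed_custom_text(
    moc_data: Dict,
    embed_fn: Callable[[str], List[float]],
    model_name: str,
) -> Dict:
    """Store an embedding of `custom_text` in a MOC-like dictionary."""
    if "custom_text" not in moc_data:
        # As described in the text: print a message and leave the data untouched.
        print("No 'custom_text' key found; nothing to embed.")
        return moc_data
    moc_data["embedding"] = embed_fn(moc_data["custom_text"])
    moc_data["model"] = model_name  # kept for traceability
    return moc_data


# The coverage entry ("6": [0, 1]) is a placeholder, not a real footprint.
moc_data = {"6": [0, 1], "custom_text": "M51 is an interacting spiral galaxy"}
embed_custom_text(moc_data, toy_embedder, "toy-embedder")
print(moc_data["model"], len(moc_data["embedding"]))
```

Passing the embedder as a function keeps the bookkeeping (which keys are read and written) separate from the choice of model, so the same logic can be exercised with or without Ollama.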


#### From Textual MOC to Semantic MOC

In [None]:
# Textual MOC path
file_path = "textual_moc_example.json"

# Instantiate TextualMOC
textual_moc = TextualMOC()

# Load the text content from the Textual MOC
textual_moc.load_textual_moc(file_path)

# Generate an embedding with the specified model; here "nomic-embed-text-v2-moe" is used
textual_moc.embedding_from_custom_text(embeddings_model="nomic-embed-text-v2-moe")

# Check if the embedding was added correctly
print("Embedding present?", "embedding" in textual_moc.moc_data)
print("Model used:", textual_moc.moc_data.get("model", "No model"))
print("Embedding size:", len(textual_moc.moc_data.get("embedding", [])))

# Save the Semantic MOC in a file
textual_moc.save("moc_data_with_embedding.json")

#### Managing a Semantic MOC

In [None]:
# Semantic MOC path - textual MOC with semantic embedding
file_path = "moc_data_with_embedding.json"

# Create a TextualMOC instance
textual_moc = TextualMOC()

# Load the local MOC 
textual_moc.load_textual_moc(file_path)

# Check if the embedding is present
if "embedding" in textual_moc.moc_data:
    embedding = textual_moc.moc_data["embedding"]
    embedding_model = textual_moc.moc_data.get("model", "Model not specified")

    print("Embedding is present in the MOC.")
    print(f"Embedding model used: {embedding_model}")
    print(f"Embedding size: {len(embedding)}")

    # If you want to print a portion of the embedding (e.g., the first 5 values):
    print("First 5 dimensions of the embedding:", embedding[:5])
else:
    print("No embedding found in the MOC.")

# Print other metadata for verification
textual_moc.show_metadata_value()  
textual_moc.show_text_value()

# Vector Databases, RAG and Visual Models

### Data generation for GenAI applications

Here, we generate a set of spatial MOCs corresponding to the coordinates of astronomical objects of interest. As an illustrative example, we consider well-known spiral, elliptical, and irregular galaxies, some of which exhibit clear evidence of tidal interactions. SIMBAD provides the coordinates of each object; a circular MOC with a predefined radius is then generated around each position.

In addition, we generate a set of textual MOCs that describe each astronomical object and include an "image" key. The corresponding images are retrieved from the hips2fits service, an HTTP-based web API in which the request parameters are passed directly in the URL as query-string parameters.

These textual/semantic MOCs, each containing a reference to a HiPS image, are stored in a directory for subsequent processing by generative AI models, both textual and visual, as previously described.
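
The query-string mechanism mentioned above can be illustrated by assembling a hips2fits URL from a dictionary of parameters. This is a minimal sketch: `hips2fits_url` is a hypothetical helper, not part of any library, and the parameter names and default values simply mirror those in the URLs used in this notebook.

```python
from urllib.parse import quote, urlencode

HIPS2FITS_ENDPOINT = "https://alasky.cds.unistra.fr/hips-image-services/hips2fits"


def hips2fits_url(object_name: str, fov: float, hips: str = "CDS/P/DSS2/color",
                  width: int = 1200, height: int = 900) -> str:
    """Build a hips2fits request URL (hypothetical helper)."""
    params = {
        "hips": hips,              # HiPS survey identifier
        "width": width,            # output image width in pixels
        "height": height,          # output image height in pixels
        "fov": fov,                # field of view in degrees
        "projection": "SIN",
        "coordsys": "icrs",
        "rotation_angle": 0.0,
        "object": object_name,     # name resolved by the service
        "format": "jpg",
    }
    # quote_via=quote encodes spaces as %20 and "/" as %2F
    return f"{HIPS2FITS_ENDPOINT}?{urlencode(params, quote_via=quote)}"


url = hips2fits_url("Arp 273", fov=0.1)
print(url)
```

Building the URL from a dictionary avoids hand-encoding characters such as spaces and slashes, and makes it easy to vary only the object name and field of view from one galaxy to the next.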


In [None]:
import os
import json
import matplotlib.pyplot as plt
from astroquery.simbad import Simbad
from astropy.coordinates import SkyCoord, Longitude, Latitude, Angle
import astropy.units as u
from mocpy import MOC

# Object galaxy list
galaxies = ["Arp273", "M59", "NGC4676", "M101", "M60", "NGC4993", 
            "M104", "M82", "NGC4038", "M51", "M87"]

# Building Space MOC at the galaxy position
for galaxy in galaxies:
    result = Simbad.query_object(galaxy)

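    # An empty or missing result means SIMBAD did not resolve the name
    if result is not None and len(result) > 0:
        # --- Step 1: Extracting RA/DEC ---
        # (column names and units may differ between astroquery versions; adjust if needed)
        ra = result["RA"][0]
        dec = result["DEC"][0]
        coords = SkyCoord(ra, dec, unit=(u.hourangle, u.deg))

        print(f"Coordinates {galaxy}")
        print(f"RA  (Right Ascension) : {coords.ra.deg}°")
        print(f"DEC (Declination)      : {coords.dec.deg}°")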

        # --- Step 2: Creating circle MOC with radius = 0.1° ---
        radius = Angle(0.1, u.deg)  # fixed radius
        moc = MOC.from_cone(
            lon=Longitude(coords.ra),
            lat=Latitude(coords.dec),
            radius=radius,
            max_depth=14  # MOC resolution
        )

        # --- Step 3: Creating moc_gals dir ---
        save_dir = "moc_gals"
        os.makedirs(save_dir, exist_ok=True)

        # --- Step 4: Saving MOC in json format ---
        moc_json = moc.serialize(format='json')
        save_path = os.path.join(save_dir, galaxy+".json")

        with open(save_path, "w") as json_file:
            json.dump(moc_json, json_file, indent=4)

        print(f"Space MOC saved in  {save_path}")

    else:
        print(f"No object found in SIMBAD for {galaxy}.")

In [None]:
# Getting images from hips2fits - https://alasky.cds.unistra.fr/hips-image-services/hips2fits
Arp273_ima = "https://alasky.cds.unistra.fr/hips-image-services/hips2fits?hips=CDS%2FP%2FDSS2%2Fcolor&width=1200&height=900&fov=0.1&projection=SIN&coordsys=icrs&rotation_angle=0.0&object=Arp%20273&format=jpg"
NGC4676_ima= "https://alasky.cds.unistra.fr/hips-image-services/hips2fits?hips=CDS%2FP%2FDSS2%2Fcolor&width=1200&height=900&fov=0.1&projection=SIN&coordsys=icrs&rotation_angle=0.0&object=NGC%204676&format=jpg"
NGC4038_ima ="https://alasky.cds.unistra.fr/hips-image-services/hips2fits?hips=CDS%2FP%2FHST%2FEPO&width=1000&height=1000&fov=0.1&projection=SIN&coordsys=icrs&rotation_angle=0.0&ra=180.4790760656&dec=-18.884864677&format=jpg"
M51_ima = "https://alasky.cds.unistra.fr/hips-image-services/hips2fits?hips=CDS%2FP%2FDSS2%2Fcolor&width=1200&height=900&fov=0.3&projection=SIN&coordsys=icrs&rotation_angle=0.0&object=M51&format=jpg"
M101_ima = "https://alasky.cds.unistra.fr/hips-image-services/hips2fits?hips=CDS%2FP%2FDSS2%2Fcolor&width=1200&height=900&fov=0.35&projection=SIN&coordsys=icrs&rotation_angle=0.0&object=M101&format=jpg"
M104_ima = "https://alasky.cds.unistra.fr/hips-image-services/hips2fits?hips=CDS%2FP%2FDSS2%2Fcolor&width=1200&height=900&fov=0.2&projection=SIN&coordsys=icrs&rotation_angle=0.0&object=M104&format=jpg"
M87_ima = "https://alasky.cds.unistra.fr/hips-image-services/hips2fits?hips=CDS%2FP%2FDSS2%2Fcolor&width=1200&height=900&fov=0.15&projection=SIN&coordsys=icrs&rotation_angle=0.0&object=M87&format=jpg"
M59_ima = "https://alasky.cds.unistra.fr/hips-image-services/hips2fits?hips=CDS%2FP%2FDSS2%2Fcolor&width=1200&height=900&fov=0.1&projection=SIN&coordsys=icrs&rotation_angle=0.0&object=M59&format=jpg"
M60_ima = "https://alasky.cds.unistra.fr/hips-image-services/hips2fits?hips=CDS%2FP%2FDSS2%2Fcolor&width=1200&height=900&fov=0.15&projection=SIN&coordsys=icrs&rotation_angle=0.0&object=M60&format=jpg"
NGC4993_ima = "https://alasky.cds.unistra.fr/hips-image-services/hips2fits?hips=CDS%2FP%2FDSS2%2Fcolor&width=1200&height=900&fov=0.04&projection=SIN&coordsys=icrs&rotation_angle=0.0&object=NGC%204993&format=jpg"
M82_ima = "https://alasky.cds.unistra.fr/hips-image-services/hips2fits?hips=CDS%2FP%2FDSS2%2Fcolor&width=1200&height=900&fov=0.2&projection=SIN&coordsys=icrs&rotation_angle=0.0&object=M82&format=jpg"
#NGC4449_ima = "https://alasky.cds.unistra.fr/hips-image-services/hips2fits?hips=CDS%2FP%2FDSS2%2Fcolor&width=1200&height=900&fov=0.15&projection=SIN&coordsys=icrs&rotation_angle=0.0&object=NGC%204449&format=jpg"

# Short description text
Arp273_text = "The galaxies' twisted and distorted appearance is due to mutual gravitational tides as the pair engage in \
close encounters. Cataloged as Arp 273 (also as UGC 1810), these galaxies do look peculiar, but interacting galaxies are now \
understood to be common in the universe. Closer to home, the large spiral Andromeda Galaxy is known to be some 2 million light-years away \
and inexorably approaching the Milky Way. In fact the far away peculiar galaxies of Arp 273 may offer an analog of the far future \
encounter of Andromeda and Milky Way. Repeated galaxy encounters on a cosmic timescale ultimately result in a merger into a \
single galaxy of stars. From our perspective, the bright cores of the Arp 273 galaxies are separated by only a little over 100,000 light-years. "

NGC4676_text= "This colliding pair of spiral galaxies is known as 'The Mice' because of the long tails of stars and gas emanating \
from each galaxy. Otherwise known as NGC 4676, they will eventually merge into a single giant galaxy. In the galaxy at left, \
the bright blue patch can be identified as a cascade of clusters and associations of young, hot blue stars, whose \
formation has been triggered by the tidal forces of the gravitational interaction. Streams of material can also be seen \
flowing between the two galaxies in this Hubble Space Telescope image. \
The clumps of young stars in the long, straight tidal tail (upper right) are separated by fainter regions \
of material. These dim regions suggest that the clumps of stars have formed from the gravitational \
collapse of the gas and dust that once occupied those areas. Some of the clumps have luminous masses \
comparable to dwarf galaxies that orbit in the halo of our own Milky Way."

NGC4038_text = "This new NASA Hubble Space Telescope image of the Antennae galaxies is the sharpest yet \
of this merging pair of galaxies. During the course of the collision, billions of stars will be formed. The brightest \
and most compact of these star birth regions are called super star clusters. The two spiral galaxies started to interact \
a few hundred million years ago, making the Antennae galaxies one of the nearest and youngest examples of a pair of colliding galaxies. \
Nearly half of the faint objects in the Antennae image are young clusters containing tens of thousands of stars. \
The orange blobs to the left and right of image center are the two cores of the original galaxies and consist mainly \
of old stars criss-crossed by filaments of dust, which appears brown in the image. The two galaxies are dotted with brilliant \
blue star-forming regions surrounded by glowing hydrogen gas, appearing in the image in pink."

M51_text = "M 51 is an interacting spiral galaxy, also known as the Whirlpool Galaxy. It is located about 25 million light \
years away from Earth, but can still be easily observed with a small telescope by amateur astronomers. M 51 is also a popular object among \
professional astronomers as it shows an ongoing enhanced star formation rate, which is probably caused by the interaction with its \
companion galaxy. The galaxy was also the location of two supernovae within the last couple of years: \
The first one appeared in 2005, the second one in 2011."

M101_text = "Messier 101 is a classic, face-on, pinwheel spiral galaxy. The giant spiral disk of stars, dust and gas is 170,000 light-years across — nearly \
twice the diameter of our galaxy, the Milky Way. M101 is estimated to contain at least one trillion stars. The galaxy’s spiral arms are \
sprinkled with large regions of star-forming nebulas. These nebulas are areas of intense star formation within giant molecular \
hydrogen clouds. Brilliant, young clusters of hot, blue, newborn stars trace out the spiral arms. Pierre Méchain, one of \
Charles Messier’s colleagues, discovered the Pinwheel galaxy in 1781. Located 25 million light-years away from Earth in \
the constellation Ursa Major, M101 has an apparent magnitude of 7.9. It can be spotted through a small telescope and is \
most easily observed during June."

M104_text = "One of the most famous spiral galaxies is Messier 104, widely known as the 'Sombrero' (the Mexican hat) because of \
its particular shape. It is located towards the constellation Virgo, at a distance of about 30 million light-years \
and is the 104th object in the famous catalogue of deep-sky objects by French astronomer Charles Messier (1730 - 1817). \
This luminous and massive galaxy has a total mass of about 800 billion suns, and is notable for its dominant nuclear bulge, \
composed mainly of mature stars, and its nearly edge-on disc composed of stars, gas, and dust. The complexity of this dust \
is apparent directly in front of the bright nucleus, but is also evident in the dark absorbing lanes throughout the disc. \
A large number of small, diffuse objects can be seen as a swarm in the halo of Messier 104. Most of these are globular clusters, \
similar to those found in our own Milky Way, but Messier 104 has a much larger number of them."

M87_text = "The elliptical galaxy M87 is the home of several trillion stars, a supermassive black hole and a family of roughly \
15,000 globular star clusters. For comparison, our Milky Way galaxy contains only a few hundred billion stars and about 150 globular \
clusters. The monstrous M87 is the dominant member of the neighboring Virgo cluster of galaxies, which contains some 2,000 galaxies. \
Discovered in 1781 by Charles Messier, this galaxy is located 54 million light-years away from Earth in the constellation Virgo. \
It has an apparent magnitude of 9.6 and can be observed using a small telescope most easily in May."

M59_text = "M59 is one of the largest elliptical galaxies in the Virgo galaxy cluster. However, it is still considerably less massive, \
and at a magnitude of 9.8, less luminous than other elliptical galaxies in the cluster. A supermassive black hole around 270 million times \
as massive as the Sun resides at the center of M59. The galaxy also has an inner disk of stars and around 2,200 globular clusters, \
an exceptionally high number of such clusters. The central region of the galaxy, the inner 200 light-years, rotates in the opposite \
direction than the rest of the galaxy and is the smallest region in a galaxy known to exhibit this behavior. \
Approximately 60 million light-years from Earth, M59 can be found near M58 and M60 in the constellation Virgo. It is best seen in May. \
Small telescopes might reveal an ellipsoidal shape with a bright center, but even larger scopes do not reveal much detail. \
German astronomer Johann Gottfried Koehler discovered M59 and the nearby galaxy M60 in the spring of 1779 when observing the comet \
of that year (Comet Bode)."

M60_text = "The Virgo cluster is a collection of more than 1,300 galaxies, including the elliptical galaxy M60. Unlike spiral galaxies, \
elliptical galaxies lack an organized structure and are nearly featureless, resembling the core of a spiral galaxy. \
The Virgo cluster’s third brightest member, M60 has a diameter of 120,000 light-years and is as massive as one trillion suns. \
At its center lies a huge black hole, 4.5 billion times as massive as the sun — one of the most massive black holes ever found. \
NGC 4647 is about two-thirds the size of M60 — or roughly the size of the Milky Way galaxy — and is much less massive. \
The two galaxies form a pair known as Arp 116. Astronomers have long tried to determine whether these two galaxies are actually interacting. \
Although from Earth they appear to overlap, there is no evidence of new star formation, which would be one of the clearest signs that \
the two galaxies are indeed interacting. However, recent studies of very detailed Hubble images suggest the onset of some tidal \
interaction between the two."

NGC4993_text = "The elliptical galaxy NGC 4993, about 130 million light-years from Earth, viewed with the VIMOS instrument on the European Southern Observatory's Very Large Telescope \
in Chile. After the almost simultaneous detection of gravitational waves by the LIGO/Virgo collaboration, GW170817, and of a gamma-ray burst \
by ESA's INTEGRAL and NASA's Fermi satellites, GRB170817, a large number of ground and space telescopes started searching for the source in the sky. \
About half a day later, scientists at various optical observatories spotted something new near the core of galaxy NGC 4993: \
this was the visible light counterpart to the gravitational waves and the gamma-ray burst, confirming that they originated \
from the collision of two neutron stars. The result of such a cosmic clash is a kilonova: the neutron-rich material released \
in the merger is impacting its surroundings, forging a wealth of heavy elements in the process. The kilonova can be seen just above \
and slightly to the left of the centre of the galaxy, AT2017gfo."

M82_text = "Located 12 million light-years away, M82 appears high in the northern spring sky in the direction of the constellation \
Ursa Major, the Great Bear. It is also called the 'Cigar Galaxy' because of the elongated elliptical shape produced by \
the tilt of its starry disk relative to our line of sight. As shown in this mosaic image, M82 is a magnificent starburst galaxy. \
Throughout its central region young stars are being born ten times faster than they are inside our Milky Way Galaxy. \
These numerous hot new stars not only emit radiation but also charged particles that form the so-called stellar wind. \
Stellar winds streaming from these stars combine to form a galactic 'superwind'."

#NGC4449_text = ""

# Multimedia URL of the source for each text - parts of the text have been adapted.
Arp273_media = "https://apod.nasa.gov/apod/ap250109.html"
NGC4676_media= "https://science.nasa.gov/image-detail/idl-tiff-file-40/"
NGC4038_media ="https://hubblesite.org/contents/media/images/2006/46/1995-Image"
M51_media = "https://esahubble.org/images/opo0521b/"
M101_media = "https://science.nasa.gov/mission/hubble/science/explore-the-night-sky/hubble-messier-catalog/messier-101/"
M104_media = "https://www.eso.org/public/images/sombrero/"
M87_media = "https://science.nasa.gov/mission/hubble/science/explore-the-night-sky/hubble-messier-catalog/messier-87/#:~:text=The%20elliptical%20galaxy%20M87%20is,and%20about%20150%20globular%20clusters."
M59_media = "https://science.nasa.gov/mission/hubble/science/explore-the-night-sky/hubble-messier-catalog/messier-59/"
M60_media = "https://science.nasa.gov/mission/hubble/science/explore-the-night-sky/hubble-messier-catalog/messier-60/"
NGC4993_media = "https://sci.esa.int/web/integral/-/59671-new-source-in-galaxy-ngc-4993"
M82_media = "https://www.esa.int/Science_Exploration/Space_Science/Hubble_s_view_of_Cigar_Galaxy_on_sixteenth_mission_anniversary"
#NGC4449_media = ""

# Text description list
text_files = [Arp273_text, M59_text, NGC4676_text, M101_text, M60_text, NGC4993_text, 
              M104_text,M82_text, NGC4038_text, M51_text, M87_text]

# Multimedia link list
multimedia_urls = [Arp273_media, M59_media, NGC4676_media, M101_media, M60_media, NGC4993_media, 
                   M104_media,M82_media, NGC4038_media, M51_media, M87_media]

# Images from hips2fits list
images = [Arp273_ima, M59_ima, NGC4676_ima, M101_ima, M60_ima, NGC4993_ima, 
          M104_ima, M82_ima, NGC4038_ima, M51_ima, M87_ima]

# Space MOCs list 
text_moc_gals = ["Arp273.json", "M59.json", "NGC4676.json", "M101.json", "M60.json", "NGC4993.json", 
                 "M104.json","M82.json", "NGC4038.json", "M51.json", "M87.json"]

In [None]:
import os
from pathlib import Path

# Textual MOCs generation

mocs_directory = Path("moc_gals")  # Directory in which the Textual MOCs are loaded

for text_moc_gal, text_file, multimedia_url, image in zip(text_moc_gals, 
                                                          text_files, multimedia_urls, images):
    file_path = os.path.join(mocs_directory, text_moc_gal)  # path
    
    # 🔹 1. Loading MOCs
    textual_moc.load_textual_moc(file_path)
    
    # 🔹 2. Adding text, media, image URL
    textual_moc.add_text_media_image(text_file, multimedia_url, 
                                     image)
    
    # 🔹 3. Saving
    textual_moc.save(file_path)

## RAG with textual MOC and Vision Models

This section summarizes the main components of the RAG + MOC pipeline implementation:

1. **Reading and Preparing MOCs**  
   - `read_mocs_from_directory(directory: str) → List[str]`  
     Scans a folder and returns all `.json` files containing Textual MOC data.  
   - `process_mocs(moc_files: List[str], model: SentenceTransformer) → (metadata, embeddings, filenames)`  
     Loads each JSON, extracts the `custom_text` field, and computes a normalized embedding via a Sentence-BERT model.  

2. **Building the Vector Index**  
   - `build_faiss_index(embeddings: np.ndarray) → faiss.IndexFlatIP`  
     Creates a FAISS inner-product index over the normalized embeddings for efficient cosine-similarity search.  

3. **Querying the Index**  
   - `query_mocs(query_text: str, index, model, metadata, filenames, top_k=5) → List[(moc, score, filename)]`  
     Encodes the user’s query, searches the FAISS index for the top K nearest embeddings, and returns matching MOC metadata.  

4. **Generating a RAG Response**  
   - `generate_response_with_mistral(query: str, contexts: List[(int, str)]) → str`  
     Builds a system+user prompt containing the top-k text fragments and invokes a Mistral LLM (via LangChain/Ollama) to produce a grounded answer.  

5. **Spatial Visualization**  
   - `visualize_moc_in_aladin(moc, similarity: float, filename: str, survey: str, fov: int)`  
     Launches an Aladin Lite widget, overlays the selected MOC footprint on an all-sky image, and displays metadata.  

6. **Vision-LLM Analysis**  
   - `analyze_image_with_vision_llm(image_url: str, vision_llm, prompt: str) → str`  
     Fetches and base64-encodes the image referenced by the MOC (a JPEG produced by hips2fits), then calls a vision-enabled LLM to extract morphological insights.  

7. **Main Pipeline (`main()`)**  
   Orchestrates the end-to-end flow:  
   - Loads MOCs and builds the index  
   - Runs the RAG retrieval and text generation  
   - Visualizes the best match in Aladin  
   - Invokes the vision LLM for additional image analysis  
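
The indexing and querying steps (items 2 and 3) rest on one observation: once the embeddings are normalized to unit length, their inner product equals their cosine similarity. The sketch below illustrates this with plain NumPy as a stand-in for a FAISS inner-product index. The function names are hypothetical, and the three-dimensional vectors are made-up values chosen only for readability.

```python
import numpy as np


def normalize_rows(vectors: np.ndarray) -> np.ndarray:
    """Scale each row to unit length so dot products equal cosine similarities."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    return vectors / np.clip(norms, 1e-12, None)  # guard against zero vectors


def top_k_by_inner_product(query: np.ndarray, index_vectors: np.ndarray, top_k: int = 5):
    """Return (row, score) pairs for the top_k largest inner products."""
    scores = index_vectors @ query
    order = np.argsort(-scores)[:top_k]  # indices sorted by descending score
    return [(int(i), float(scores[i])) for i in order]


# Toy 3-D "embeddings" (made-up values; real ones have hundreds of dimensions)
filenames = ["M51.json", "M87.json", "NGC4676.json"]
embeddings = normalize_rows(np.array([[1.0, 0.0, 0.0],
                                      [0.0, 1.0, 0.0],
                                      [0.8, 0.0, 0.6]]))
query = normalize_rows(np.array([[0.9, 0.0, 0.1]]))[0]

results = top_k_by_inner_product(query, embeddings, top_k=2)
for row, score in results:
    print(filenames[row], round(score, 3))
```

An exhaustive matrix product like this one is adequate for a handful of MOCs; a dedicated index becomes worthwhile as the collection grows.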






In [None]:
"""
RAG + MOC visualizer — dual-mode with spatial debug
===================================================

- Positional mode: test (RA, DEC) ∈ MOC using Angle-based contains_lonlat(), with clear debug prints.
- Semantic mode: FAISS retrieval on custom_text + strict USED: line.
- ipyaladin overlay of USED MOCs.
- Vision LLM on each USED MOC image (download → base64 data-URI).

ENV toggles (optional):
  TEXT_MODEL, TEXT_TEMPERATURE
  VISION_MODEL, VISION_TEMPERATURE
  VISION_TASK, VISION_CONTEXT
  EMBEDDING_MODEL
  ALADIN_SURVEY, ALADIN_FOV
  SIMILARITY_THRESHOLD
  MOC_DIR
  DEBUG=1                     # extra verbose prints
  ADD_SYNTHETIC_MOC=1         # append a synthetic cone MOC around the chosen RA/DEC (positional tests)

*** Your TextualMOC class must be defined/imported in this session. ***
"""

from __future__ import annotations

import os
import re
from typing import List, Tuple, Optional

import numpy as np
from sentence_transformers import SentenceTransformer
import faiss  # type: ignore
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage, SystemMessage

# -----------------------------
# ⚙️ Config / Constants
# -----------------------------
TEXT_MODEL_NAME: str = os.environ.get("TEXT_MODEL", "mistral")
TEXT_TEMPERATURE: float = float(os.environ.get("TEXT_TEMPERATURE", "0"))
VISION_MODEL_NAME: str = os.environ.get("VISION_MODEL", "gemma3:4b")
VISION_TEMPERATURE: float = float(os.environ.get("VISION_TEMPERATURE", "0"))

EMBEDDING_MODEL_NAME: str = os.environ.get("EMBEDDING_MODEL", "paraphrase-MiniLM-L3-v2")

DEFAULT_SURVEY: str = os.environ.get("ALADIN_SURVEY", "CDS/P/DSS2/color")
DEFAULT_FOV: int = int(os.environ.get("ALADIN_FOV", "180"))

SIMILARITY_THRESHOLD: float = float(os.environ.get("SIMILARITY_THRESHOLD", "0.4"))

DEBUG: bool = os.environ.get("DEBUG", "0") == "1"
ADD_SYNTHETIC_MOC: bool = os.environ.get("ADD_SYNTHETIC_MOC", "0") == "1"

# Default visual task/context
VISION_TASK_DEFAULT: str = os.environ.get(
    "VISION_TASK",
    "Classify the galaxy as one of: spiral | elliptical | irregular | I don't know. "
    "Also add a brief note about any visible interactions with other galaxies "
    "(e.g., tidal tails, bridges, distortions, close companions)."
)
VISION_CONTEXT_DEFAULT: str = os.environ.get(
    "VISION_CONTEXT",
    "If the image is ambiguous, too small, or low quality, answer exactly: I don't know."
)

#IMAGE_ANALYZER_PROMPT: str = (
#    "Task: {task}\n"
#    "Context: {context}\n"
#    "Instructions:\n"
#    "- If you are uncertain, answer exactly: I don't know.\n"
#    "- Choose <class> from: spiral | elliptical | irregular | I don't know.\n"
#    "- Always include a brief morphology note (<= 7 words) based only on visible cues.\n"
#    "- If interactions are visible, briefly describe them (<= 1 short sentence). "
#    "If not assessable, write: Interactions: I don't know.\n"
#    "Output format: <class>. Morphology: <brief note>. Interactions: <brief note>."
#)

IMAGE_ANALYZER_PROMPT: str = (
    "Task: {task}\n"
    "Context: {context}\n"
    "Instructions:\n"
    "- Predict <class> from: spiral | elliptical | irregular | I don't know.\n"
    "- Provide a confidence score in percent (0–100).\n"
    "- Give a brief morphology note (<= 20 words), based only on visible cues.\n"
    "- Describe tidal/interactions if visible; otherwise state what prevents assessment.\n"
    "- Add one short caption (<= 40 words) summarizing the scene.\n"
    "Output format:\n"
    "<class> (confidence=<0-100>%).\n"
    "Morphology: <note>.\n"
    "Interactions: <note>.\n"
    "Caption: <one sentence>."
)

TEXTUAL_SYSTEM_PROMPT = """
You are an astrophysics assistant.
Use ONLY the provided excerpts to answer, clearly and concisely.
Cite evidence with inline tags like [Doc n].
If the information is insufficient, answer exactly: I don't know.
""".strip()

OUTPUT_RULES_SUFFIX = """
Output rules:
- Use only the necessary documents; if they are not enough, answer exactly: I don't know.
- Add inline citations such as [Doc 1], [Doc 2] next to claims.
- If you cannot cite at least one document, DO NOT provide any explanation.
  In that case, write ONLY the final line: USED:
- Always END with EXACTLY one line in this format:
  USED: 1,2   (numbers separated by commas; if none, leave it blank after the colon)

Answer:
""".strip()


# Types
ResultTuple = Tuple[object, float, str]  # (moc_obj, similarity, filename)


def print_runtime_config(directory: str) -> None:
    print("=== RUNTIME CONFIG ===")
    print(f"Embedding model       : {EMBEDDING_MODEL_NAME}")
    print(f"Text model            : {TEXT_MODEL_NAME} (T={TEXT_TEMPERATURE})")
    print(f"Vision model          : {VISION_MODEL_NAME} (T={VISION_TEMPERATURE})")
    print(f"Aladin                : survey={DEFAULT_SURVEY}, fov={DEFAULT_FOV}")
    print(f"MOC directory         : {directory}")
    print(f"Similarity threshold  : {SIMILARITY_THRESHOLD}")
    print(f"DEBUG                 : {DEBUG}")
    print(f"ADD_SYNTHETIC_MOC     : {ADD_SYNTHETIC_MOC}")


# -----------------------------
# Helpers (no changes to TextualMOC)
# -----------------------------
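def read_mocs_from_directory(directory: str) -> List[str]:
    """Return the paths of all .json files in `directory`, sorted by name."""
    try:
        # os.listdir order is arbitrary; sorting keeps the [Doc n] numbering stable across runs.
        return sorted(os.path.join(directory, f) for f in os.listdir(directory) if f.endswith(".json"))
    except FileNotFoundError:
        print(f"Directory not found: {directory}")
        return []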


def _ensure_textual_moc_available() -> None:
    if "TextualMOC" not in globals() or not isinstance(TextualMOC, type):
        raise ImportError(
            "TextualMOC is not available. Define/import it in this session before running.\n"
            "It must expose: .load_textual_moc(path), .moc_data (dict), and .moc (mocpy.MOC)."
        )


def load_all_mocs(moc_files: List[str]) -> Tuple[List[object], List[str]]:
    _ensure_textual_moc_available()
    TM = TextualMOC

    metadata: List[object] = []
    filenames: List[str] = []

    for file_path in moc_files:
        m = TM()
        try:
            m.load_textual_moc(file_path)
            if getattr(m, "moc", None) is None:
                print(f"[WARN] {os.path.basename(file_path)} has no spatial MOC → skipped.")
                continue
            metadata.append(m)
            filenames.append(os.path.basename(file_path))
        except Exception as e:
            print(f"[ERROR] loading {file_path}: {e}")

    if not metadata:
        raise ValueError("No usable MOCs found (all missing or invalid).")
    return metadata, filenames


def get_doc_title(moc: object, filename: str) -> str:
    moc_data = getattr(moc, "moc_data", {}) or {}
    title = (
        moc_data.get("title")
        or moc_data.get("name")
        or moc_data.get("target")
        or os.path.splitext(filename)[0]
    )
    return str(title)


def build_contexts_with_meta(results: List[ResultTuple], max_chars: int = 1500) -> List[Tuple[int, str]]:
    contexts: List[Tuple[int, str]] = []
    for i, (moc, sim, fn) in enumerate(results, start=1):
        moc_data = getattr(moc, "moc_data", {}) or {}
        txt = (moc_data.get("text", "") or "")[:max_chars]
        title = get_doc_title(moc, fn)
        contexts.append((i, f"(file={fn}, title={title}, sim={sim:.3f})\n{txt}"))
    return contexts


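def parse_used_doc_indices(answer_text: str) -> List[int]:
    """Parse the 'USED: 1,2' line into zero-based indices (empty list if absent or blank)."""
    # Take the last USED: occurrence and tolerate trailing text after the numbers,
    # since the model may echo part of the format hint or add a closing remark.
    matches = re.findall(r"USED:[ \t]*([0-9, \t]*)", answer_text)
    if not matches:
        return []
    nums: List[int] = []
    for tok in matches[-1].split(","):
        tok = tok.strip()
        # Labels are 1-based: ignore "0" and repeated numbers.
        if tok.isdigit() and int(tok) >= 1 and int(tok) - 1 not in nums:
            nums.append(int(tok) - 1)
    return nums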
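

# Illustrative sketch (not part of the original pipeline): the prompts above ask the
# model for inline tags such as [Doc 2] in addition to the final "USED:" line. If the
# USED line is missing or garbled, the inline tags could serve as a cross-check.
# The helper name `_inline_cited_indices` is hypothetical and main() never calls it;
# it assumes the model follows the [Doc n] convention literally.
def _inline_cited_indices(answer_text: str) -> list:
    import re

    seen = []
    for tok in re.findall(r"\[Doc\s*(\d+)\]", answer_text):
        idx = int(tok) - 1  # 1-based label -> 0-based index, as in parse_used_doc_indices
        if idx >= 0 and idx not in seen:
            seen.append(idx)  # keep first-citation order, drop repeats
    return seen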


# -----------------------------
# Semantic retrieval
# -----------------------------
def process_mocs_for_embeddings(
    moc_list: List[object],
    model: SentenceTransformer,
) -> Tuple[np.ndarray, List[object]]:
    texts, kept = [], []
    for m in moc_list:
        txt = (getattr(m, "moc_data", {}) or {}).get("text", "")
        if txt:
            texts.append(txt)
            kept.append(m)
    if not texts:
        raise ValueError("No non-empty 'text' field found in any MOC for semantic retrieval.")
    emb = model.encode(texts, convert_to_numpy=True).astype("float32")
    faiss.normalize_L2(emb)
    return emb, kept


def build_faiss_index(embeddings: np.ndarray) -> faiss.IndexFlatIP:
    faiss.normalize_L2(embeddings)
    idx = faiss.IndexFlatIP(embeddings.shape[1])
    idx.add(embeddings)
    return idx
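

# Illustrative sketch (not part of the original pipeline): why the embeddings are
# L2-normalized before being added to an inner-product index. For unit-length vectors
# the inner product equals the cosine similarity, so the scores returned by the search
# can be compared directly against SIMILARITY_THRESHOLD. The helper name and the toy
# vectors are assumptions made for this example; it uses NumPy only (no FAISS) and is
# never called by main().
def _demo_cosine_via_inner_product():
    import numpy as np

    docs = np.array([[3.0, 4.0], [1.0, 0.0], [0.0, 2.0]], dtype="float32")
    query = np.array([[6.0, 8.0]], dtype="float32")

    # Scale each row to unit length (the same effect faiss.normalize_L2 has in place).
    docs_n = docs / np.linalg.norm(docs, axis=1, keepdims=True)
    query_n = query / np.linalg.norm(query, axis=1, keepdims=True)

    # Inner product of unit vectors = cosine similarity between query and each document.
    return (query_n @ docs_n.T).ravel()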


def query_mocs_semantic(
    query_text: str,
    index: faiss.IndexFlatIP,
    model: SentenceTransformer,
    mocs_with_text: List[object],
    filenames_with_text: List[str],
    top_k: int,
    similarity_threshold: float = SIMILARITY_THRESHOLD,
) -> List[ResultTuple]:
    if not query_text:
        raise ValueError("Error: empty query.")
    if index.ntotal == 0:
        raise ValueError("Error: FAISS index is empty.")
    q = model.encode([query_text], convert_to_numpy=True).astype("float32")
    faiss.normalize_L2(q)
    k = min(top_k, len(mocs_with_text))
    sims, idxs = index.search(q, k)
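    # FAISS pads missing hits with index -1, which would silently select the last item; skip them.
    raw = [(mocs_with_text[i], float(s), filenames_with_text[i]) for s, i in zip(sims[0], idxs[0]) if i >= 0]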
    if similarity_threshold <= 0:
        return raw
    return [(m, s, fn) for (m, s, fn) in raw if s >= similarity_threshold]


# -----------------------------
# Spatial containment (Angle-based) with detailed debug
# -----------------------------
def check_contains_debug(moc_obj, ra_deg: float, dec_deg: float) -> bool:
    """
    Use mocpy.MOC.contains_lonlat with astropy Angle arrays (as in mocpy docs).
    Prints type/shape/value of the returned mask when DEBUG=True.
    """
    from astropy.coordinates import Angle

    m = getattr(moc_obj, "moc", None)
    if m is None:
        if DEBUG:
            print("  [debug] object has no .moc → skip")
        return False

    # Normalize RA to [0, 360) for robustness
    lon = Angle([ra_deg % 360.0], unit="deg")
    lat = Angle([dec_deg], unit="deg")

    try:
        mask = m.contains_lonlat(lon=lon, lat=lat)  # numpy array([True/False])
        arr = np.asarray(mask)
        if DEBUG:
            print(f"  [debug] contains_lonlat type={type(mask).__name__} shape={arr.shape} value={arr}")
        return bool(arr.ravel()[0])
    except Exception as e:
        if DEBUG:
            print("  [debug] contains_lonlat raised:", e)
        return False


def query_mocs_positional(
    ra_deg: float,
    dec_deg: float,
    moc_list: List[object],
    filenames: List[str],
    top_k: Optional[int] = None,
) -> List[ResultTuple]:
    results: List[ResultTuple] = []
    for moc, fn in zip(moc_list, filenames):
        title = get_doc_title(moc, fn)
        print(f"- Testing {fn} | title='{title}'")
        inside = check_contains_debug(moc, ra_deg, dec_deg)
        print(f"  ⇒ contains? {inside}")
        if inside:
            results.append((moc, 1.0, fn))
    if top_k and len(results) > top_k:
        results = results[:top_k]
    return results


# -----------------------------
# Optional synthetic MOC (for sanity checks)
# -----------------------------
def build_synthetic_cone_moc(ra_deg: float, dec_deg: float, radius_arcmin: float = 15.0, max_norder: int = 10):
    """
    Build a synthetic cone MOC centered on (ra, dec) with given radius, then wrap into a TextualMOC.
    Requires mocpy + astropy.
    """
    from mocpy import MOC
    import astropy.units as u
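    from astropy.coordinates import Angle, SkyCoord

    center = SkyCoord(ra_deg * u.deg, dec_deg * u.deg, frame="icrs")
    # MOC.from_cone takes lon, lat, radius and max_depth as separate arguments.
    radius = Angle(radius_arcmin, unit="arcmin")
    moc = MOC.from_cone(lon=center.ra, lat=center.dec, radius=radius, max_depth=max_norder)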
    tm = TextualMOC(moc)
    try:
        md = tm.moc_data if isinstance(tm.moc_data, dict) else {}
        md.update({
            "title": f"SYNTHETIC_CONE_{radius_arcmin:.1f}arcmin",
            "text": f"Synthetic cone MOC centered at RA={ra_deg}, DEC={dec_deg}, R={radius_arcmin} arcmin.",
            "image": ""
        })
        tm.moc_data = md
    except Exception:
        pass
    return tm, f"SYNTHETIC_CONE_{radius_arcmin:.1f}arcmin.json"


# -----------------------------
# LLMs
# -----------------------------
def generate_response_with_mistral(query: str, contexts: List[Tuple[int, str]]) -> str:
    docs_block = "\n\n".join([f"[Doc {i}]\n{text}" for i, text in contexts])
    system_msg = SystemMessage(content=TEXTUAL_SYSTEM_PROMPT)
    human_text = f"Question:\n{query}\n\nProvided documents:\n{docs_block}\n\n{OUTPUT_RULES_SUFFIX}"
    human_msg = HumanMessage(content=human_text)
    llm = ChatOllama(model=TEXT_MODEL_NAME, temperature=TEXT_TEMPERATURE)
    result = llm.invoke([system_msg, human_msg])
    return str(result.content).strip()


def analyze_image_with_vision_llm(image_url: str, vision_llm: ChatOllama, task: str, context: str = "") -> str:
    import requests, base64
    if not image_url:
        return "No image URL provided."
    try:
        resp = requests.get(image_url, timeout=20)
        resp.raise_for_status()
        data = resp.content
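        # Drop parameters such as "; charset=binary" so the data URI carries a bare media type.
        ctype = resp.headers.get("Content-Type", "").split(";")[0].strip().lower()
        if not ctype.startswith("image/"):
            ctype = "image/jpeg"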
        encoded = base64.b64encode(data).decode("utf-8")
        data_uri = f"data:{ctype};base64,{encoded}"
    except Exception as e:
        return f"Failed to fetch image: {e}"

    prompt_text = IMAGE_ANALYZER_PROMPT.format(task=task, context=context or "N/A")
    system_msg = SystemMessage(
        content="You are an astrophysics assistant analyzing images. "
                "Be concise and cautious: if unsure, answer 'I don't know'."
    )
    human_msg = HumanMessage(
        content=[{"type": "text", "text": prompt_text},
                 {"type": "image_url", "image_url": {"url": data_uri}}]
    )
    try:
        res = vision_llm.invoke([system_msg, human_msg])
        return str(res.content).strip()
    except Exception as e:
        return f"Image analysis error: {e}"
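

# Illustrative sketch (not part of the original pipeline): the data-URI step of the
# function above, isolated so it can be checked without a network call or a vision
# model. The helper name `_to_data_uri` is hypothetical and main() never calls it.
def _to_data_uri(data: bytes, content_type: str = "image/jpeg") -> str:
    import base64

    # Keep only the media type, dropping parameters such as "; charset=binary".
    media_type = content_type.split(";")[0].strip().lower()
    if not media_type.startswith("image/"):
        media_type = "image/jpeg"  # same fallback as analyze_image_with_vision_llm
    encoded = base64.b64encode(data).decode("utf-8")
    return f"data:{media_type};base64,{encoded}"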


# -----------------------------
# Visualization
# -----------------------------
def visualize_many_in_one_aladin(selected: List[ResultTuple], survey: str = DEFAULT_SURVEY, fov: int = DEFAULT_FOV):
    try:
        from ipyaladin import Aladin
        from ipywidgets import VBox, HTML
        from IPython.display import display
    except Exception as exc:
        print("ipyaladin/ipywidgets unavailable (skipping visualization):", exc)
        return None

    aladin = Aladin(target="Sgr A*", fov=fov, survey=survey)
    display(aladin)

    rows = []
    for (moc, sim, fn) in selected:
        title = get_doc_title(moc, fn)
        moc_data = getattr(moc, "moc_data", {}) or {}
        image_url = moc_data.get("image", "") or ""
        try:
            aladin.add_moc(getattr(moc, "moc"), name=f"{title} | {fn} | Sim: {sim:.4f}")
        except Exception as exc:
            print(f"Could not add MOC for {fn}: {exc}")
        if image_url:
            rows.append(HTML(
                value=f"File: <b>{fn}</b> | Title: <b>{title}</b> | Similarity: <b>{sim:.4f}</b> | "
                      f"Image: <a href='{image_url}' target='_blank'>{image_url}</a>"
            ))
        else:
            rows.append(HTML(
                value=f"File: <b>{fn}</b> | Title: <b>{title}</b> | Similarity: <b>{sim:.4f}</b> | Image: N/A"
            ))
    if rows:
        display(VBox(rows))
    return aladin


def analyze_and_visualize_used(results: List[ResultTuple], used_indices: List[int],
                               vision_llm: ChatOllama, task: str, context: str = "",
                               survey: str = DEFAULT_SURVEY, fov: int = DEFAULT_FOV):
    if not used_indices:
        print("No relevant documents: the selection is empty.")
        return
    selected = []
    for idx in used_indices:
        if 0 <= idx < len(results):
            selected.append(results[idx])
    if not selected:
        print("No relevant documents: the selection is empty.")
        return

    visualize_many_in_one_aladin(selected, survey=survey, fov=fov)

    BOLD_CYAN = "\033[1;36m"
    BOLD_MAGENTA = "\033[1;35m"
    RESET = "\033[0m"
    for (moc, sim, fn) in selected:
        title = get_doc_title(moc, fn)
        moc_data = getattr(moc, "moc_data", {}) or {}
        image_url = moc_data.get("image", "") or ""
        print(f"{BOLD_CYAN}=== VISION MODEL INPUT ==={RESET}")
        print(
            f"File: {fn}\nTitle: {title}\nSimilarity: {sim:.4f}\nTask: {task}\n"
            f"Context: {context or 'N/A'}\nImage URL: {image_url or 'N/A'}"
        )
        obs = analyze_image_with_vision_llm(image_url, vision_llm, task, context) if image_url else "No image URL available."
        print(f"{BOLD_MAGENTA}=== VISION MODEL OBSERVATIONS ==={RESET}")
        print(obs)


# -----------------------------
# CLI: choose mode
# -----------------------------
def ask_user_mode_and_inputs() -> Tuple[str, Optional[Tuple[float, float]], Optional[str]]:
    print("\nChoose mode:")
    print("  [pos] Positional — enter RA/DEC (deg) to select MOCs that contain the position")
    print("  [sem] Semantic   — enter a natural-language query")
    mode = input("Mode [pos/sem] (default=pos): ").strip().lower() or "pos"
    if mode == "sem":
        q = input("Enter your textual query:\n> ").strip() or "Summarize known properties of the field."
        return "sem", None, q

    print("\nEnter coordinates in decimal degrees.")
    ra_s = input("RA (deg) [default=187.70593041666663]: ").strip()
    dec_s = input("DEC (deg) [default=12.39]: ").strip()
    try:
        ra = float(ra_s) if ra_s else 187.70593041666663
        dec = float(dec_s) if dec_s else 12.39
    except ValueError:
        print("Invalid numbers; using defaults.")
        ra = 187.70593041666663
        dec = 12.39
    return "pos", (ra, dec), None


# -----------------------------
# Main
# -----------------------------
def main():
    directory = os.environ.get("MOC_DIR", "moc_gals")
    top_k = 5  # positional mode: cap on returned hits; semantic mode: k for the FAISS search

    print_runtime_config(directory)

    mode, ra_dec, query_text = ask_user_mode_and_inputs()

    files = read_mocs_from_directory(directory)
    if not files:
        print("No MOC files found.")
        return
    print(f"Found {len(files)} MOC files in: {directory}")

    metadata, filenames = load_all_mocs(files)

    # Optionally add a synthetic MOC centered at the chosen coords (for sanity test)
    if ADD_SYNTHETIC_MOC and mode == "pos" and ra_dec is not None:
        try:
            syn_moc, syn_name = build_synthetic_cone_moc(ra_dec[0], ra_dec[1], radius_arcmin=20.0, max_norder=10)
            metadata.append(syn_moc)
            filenames.append(syn_name)
            print(f"[Synthetic] appended {syn_name} (20 arcmin cone) for debug.")
        except Exception as e:
            print(f"[Synthetic] could not build synthetic MOC: {e}")

    vision_llm = ChatOllama(model=VISION_MODEL_NAME, temperature=VISION_TEMPERATURE)

    if mode == "pos":
        ra_deg, dec_deg = ra_dec if ra_dec else (187.70593041666663, 12.39)
        print(f"\n[Positional] Using RA={ra_deg} deg, DEC={dec_deg} deg")

        print("\n[Positional] Running containment checks per MOC (Angle-based)...")
        results = query_mocs_positional(ra_deg, dec_deg, metadata, filenames, top_k=top_k)

        if not results:
            print("\n[DIAGNOSTIC] No MOCs reported containment. Dumping raw checks again for ALL MOCs:")
            for moc, fn in zip(metadata, filenames):
                title = get_doc_title(moc, fn)
                print(f"* {fn} | title='{title}'")
                _ = check_contains_debug(moc, ra_deg, dec_deg)  # prints internal return types/values
            print("\nConclusion: none returned True. Please verify that each JSON really contains a valid IVOA MOC "
                  "(TextualMOC.load_textual_moc must populate .moc) and that coordinates are ICRS in degrees.")
            return

        print(f"\nPositional retrieval: k={len(results)}; (sim=1.000 for all)")

        contexts = build_contexts_with_meta(results, max_chars=1500)
        text_query = (f"Summarize what these documents state about RA={ra_deg:.6f} deg, "
                      f"DEC={dec_deg:.6f} deg. Include inline citations like [Doc n].")

        rag_text = generate_response_with_mistral(text_query, contexts)
        used_zero_based = parse_used_doc_indices(rag_text)
        if not used_zero_based:
            print("LLM did not cite any source (empty USED) — falling back to all selected docs.")
            used_zero_based = list(range(len(results)))

        print("\n=== RAG-BASED TEXTUAL RESPONSE (Positional) ===")
        print(rag_text)
        print("USED indices    : " + ", ".join(str(i + 1) for i in used_zero_based))

        analyze_and_visualize_used(
            results=results,
            used_indices=used_zero_based,
            vision_llm=vision_llm,
            task=VISION_TASK_DEFAULT,
            context=VISION_CONTEXT_DEFAULT,
            survey=DEFAULT_SURVEY,
            fov=DEFAULT_FOV,
        )
        return

    # SEMANTIC mode
    assert query_text is not None
    print(f"\n[Semantic] Query: {query_text}")

    model = SentenceTransformer(EMBEDDING_MODEL_NAME)
    embeddings, mocs_with_text = process_mocs_for_embeddings(metadata, model)

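    # Map objects to filenames by identity: avoids repeated list.index scans and any custom __eq__.
    name_by_id = {id(m): fn for m, fn in zip(metadata, filenames)}
    filenames_with_text = [name_by_id[id(m)] for m in mocs_with_text]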

    index = build_faiss_index(embeddings)

    results = query_mocs_semantic(
        query_text,
        index,
        model,
        mocs_with_text,
        filenames_with_text,
        top_k=top_k,
        similarity_threshold=SIMILARITY_THRESHOLD,
    )

    if not results:
        print(f"No retrievals passed the similarity threshold (>= {SIMILARITY_THRESHOLD}).")
        return

    print("Top-k (filtered) : k={}; sims=[{}]".format(len(results), ", ".join(f"{s:.3f}" for _, s, _ in results)))

    contexts = build_contexts_with_meta(results, max_chars=1500)
    rag_text = generate_response_with_mistral(query_text, contexts)
    used_zero_based = parse_used_doc_indices(rag_text)
    if not used_zero_based:
        print("No relevant documents: the LLM did not cite any source (empty USED).")
        return

    print("\n=== RAG-BASED TEXTUAL RESPONSE ===")
    print(rag_text)
    print("USED indices    : " + ", ".join(str(i + 1) for i in used_zero_based))

    analyze_and_visualize_used(
        results=results,
        used_indices=used_zero_based,
        vision_llm=vision_llm,
        task=VISION_TASK_DEFAULT,
        context=VISION_CONTEXT_DEFAULT,
        survey=DEFAULT_SURVEY,
        fov=DEFAULT_FOV,
    )


if __name__ == "__main__":
    main()
