# © Artur Czarnecki. All rights reserved.
# Intergrax framework – proprietary and confidential.
# Use, modification, or distribution without written permission is prohibited.

# 02 – Attachments & Ingestion Demo (Drop-In Knowledge Mode)

This notebook demonstrates how the **Drop-In Knowledge Mode** runtime:

1. Works with **sessions** and basic conversational memory.
2. Accepts **attachments** via `AttachmentRef`.
3. Uses the `AttachmentIngestionService` to:
   - resolve attachment URIs to files,
   - load and split documents with Intergrax RAG components,
   - embed them,
   - store them in the vector store.
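
The four ingestion steps above form a small pipeline. The sketch below is not the `AttachmentIngestionService` implementation. It is a self-contained illustration that starts from an already-resolved local path and makes three substitutions: a fixed-size character splitter with overlap, a toy hash-based "embedding" in place of Ollama, and a dict in place of Chroma. All names (`split_text`, `toy_embed`, `InMemoryVectorStore`, `ingest_file`) and the chunk sizes are hypothetical.

```python
import hashlib
import math
from dataclasses import dataclass, field
from pathlib import Path


def split_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character chunks; neighbours share `overlap` chars."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    if not text:
        return []
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text: str, dim: int = 8) -> list[float]:
    """Deterministic, unit-length stand-in for an embedding model (NOT semantic)."""
    if not 1 <= dim <= 32:
        raise ValueError("toy_embed supports 1..32 dimensions (one SHA-256 digest)")
    digest = hashlib.sha256(text.encode("utf-8")).digest()  # 32 bytes
    vec = [b / 255.0 for b in digest[:dim]]
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


@dataclass
class InMemoryVectorStore:
    """Hypothetical stand-in for a vector store collection with a fixed dimension."""
    dim: int
    items: dict[str, dict] = field(default_factory=dict)

    def upsert(self, ids: list[str], vectors: list[list[float]], metadatas: list[dict]) -> None:
        for item_id, vec, meta in zip(ids, vectors, metadatas):
            if len(vec) != self.dim:
                raise ValueError(f"expected dim={self.dim}, got {len(vec)}")
            # Re-using an id overwrites the previous entry instead of duplicating it.
            self.items[item_id] = {"vector": vec, "metadata": meta}

    def count(self) -> int:
        return len(self.items)


def ingest_file(path: Path, attachment_id: str, session_id: str,
                store: InMemoryVectorStore) -> dict:
    text = path.read_text(encoding="utf-8")                      # load
    chunks = split_text(text)                                     # split
    vectors = [toy_embed(chunk, store.dim) for chunk in chunks]   # embed
    ids = [f"{attachment_id}:{i}" for i in range(len(chunks))]
    metadatas = [
        {"attachment_id": attachment_id, "session_id": session_id,
         "source_path": str(path), "chunk_index": i}
        for i in range(len(chunks))
    ]
    store.upsert(ids, vectors, metadatas)                         # store
    return {"attachment_id": attachment_id, "num_chunks": len(chunks)}
```

Deterministic chunk ids make re-ingesting the same attachment idempotent in this sketch. Whether the real service behaves the same way is not shown in this notebook.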

At this stage:

- We **do not** yet perform RAG retrieval when answering.
- The goal is to verify that:
  - attachments are correctly stored in the session,
  - ingestion runs without errors,
  - chunks are stored in the vector store,
  - ingestion details are visible in `debug_trace`.

In [None]:
import sys, os
sys.path.insert(0, os.path.abspath(os.path.join(os.getcwd(), "..", "..")))

from datetime import datetime
from pprint import pprint
from pathlib import Path

from langchain_ollama import ChatOllama

from intergrax.llm_adapters import LangChainOllamaAdapter
from intergrax.llm.messages import ChatMessage, AttachmentRef

from intergrax.rag.embedding_manager import EmbeddingManager
from intergrax.rag.vectorstore_manager import VectorstoreManager, VSConfig

from intergrax.runtime.drop_in_knowledge_mode.config import RuntimeConfig
from intergrax.runtime.drop_in_knowledge_mode.engine.runtime import DropInKnowledgeRuntime
from intergrax.runtime.drop_in_knowledge_mode.responses.response_schema import RuntimeRequest
from intergrax.runtime.drop_in_knowledge_mode.session.session_manager import SessionManager




## 1. Initialize the Drop-In Knowledge Mode runtime

In this section we:

1. Create an **in-memory session store** – suitable for notebooks and tests.
2. Instantiate an **LLM adapter** using Ollama + LangChain.
3. Configure:
   - `EmbeddingManager` (Ollama embeddings),
   - `VectorstoreManager` (Chroma as the vector store).
4. Create a `RuntimeConfig` and a `DropInKnowledgeRuntime` instance.

We intentionally keep RAG turned off for now – we only care about ingestion,
not retrieval or context building.


In [None]:
# 1) In-memory session store (no external DB, good for experiments)
from intergrax.globals.settings import GlobalSettings
from intergrax.runtime.drop_in_knowledge_mode.session.in_memory_session_storage import InMemorySessionStorage


session_manager = SessionManager(
    storage=InMemorySessionStorage()
)

# 2) LLM adapter based on Ollama + LangChain
llm_adapter = LangChainOllamaAdapter(
    chat=ChatOllama(
        model=GlobalSettings.default_ollama_model
    )
)

# 3) Embedding manager using Ollama (same setup as in other Intergrax RAG examples)
embedding_manager = EmbeddingManager(
    provider="ollama",
    model_name=GlobalSettings.default_ollama_embed_model,
    assume_ollama_dim=1536,
)

# 4) Vector store manager (Chroma collection dedicated to this demo)
vectorstore_cfg = VSConfig(
    provider="chroma",
    collection_name="intergrax_docs_drop_in_demo",
)

vectorstore_manager = VectorstoreManager(
    config=vectorstore_cfg,
    verbose=True,
)

# 5) High-level runtime configuration
config = RuntimeConfig(
    llm_adapter=llm_adapter,
    embedding_manager=embedding_manager,
    vectorstore_manager=vectorstore_manager,
    llm_label="llm-adapter-ollama",
    embedding_label="ollama-gte-qwen2-1.5b",
    vectorstore_label="chroma-intergrax-docs",
    enable_rag=False,         # RAG will be plugged in a later notebook
    enable_websearch=False,
    enable_tools=False,
    enable_long_term_memory=False,
    enable_user_profile_memory=True,
)

# 6) Drop-In Knowledge Mode runtime
runtime = DropInKnowledgeRuntime(
    config=config,
    session_manager=session_manager,
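    # ingestion_service is not passed here; this relies on the engine building a
    # default AttachmentIngestionService internally (assumed optional parameter).
)

# Timezone-aware UTC timestamp; datetime.utcnow() is deprecated since Python 3.12.
from datetime import timezone
print("Runtime initialized at", datetime.now(timezone.utc).isoformat())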


[intergraxVectorstoreManager] Initialized provider=chroma, collection=intergrax_docs_drop_in_demo
[intergraxVectorstoreManager] Existing count: 0
Runtime initialized at 2025-11-24T12:09:53.632708


## 2. Prepare an AttachmentRef for a local project document

Now we want to simulate what happens when a user **attaches a file** in a chat UI.

In this example, we will:

- point to a local file (for example `PROJECT_STRUCTURE.md` at the repo root),
- wrap it in an `AttachmentRef`,
- let the runtime handle ingestion of this attachment.

Important notes:

- We use `Path(...).resolve().as_uri()` to generate a proper `file://` URI that
  the `FileSystemAttachmentResolver` can understand.
- In a real application, URIs could also be:
  - `s3://...`,
  - `db://attachments/<id>`,
  - or any other scheme supported by a custom `AttachmentResolver`.
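
Resolution is keyed on the URI scheme, so a resolver can be pictured as a small dispatch table. The sketch below is hypothetical and does not reproduce Intergrax's `AttachmentResolver` interface. `register_resolver`, `resolve_file_uri`, and `resolve_attachment_uri` are illustrative names. It turns `file://` URIs produced by `Path.as_uri()` back into paths using only the standard library, and it rejects any scheme that has no registered handler. An application would register its own handlers for `s3://` or `db://`.

```python
from pathlib import Path
from typing import Callable
from urllib.parse import urlparse
from urllib.request import url2pathname

# Hypothetical registry: URI scheme -> function returning a local Path.
Resolver = Callable[[str], Path]
_RESOLVERS: dict[str, Resolver] = {}


def register_resolver(scheme: str, resolver: Resolver) -> None:
    _RESOLVERS[scheme] = resolver


def resolve_file_uri(uri: str) -> Path:
    """Convert a file:// URI (e.g. from Path.as_uri()) back into a Path.

    Percent-escapes are decoded; the host part is ignored (UNC shares not handled).
    """
    return Path(url2pathname(urlparse(uri).path))


def resolve_attachment_uri(uri: str) -> Path:
    scheme = urlparse(uri).scheme
    try:
        resolver = _RESOLVERS[scheme]
    except KeyError:
        raise ValueError(f"No resolver registered for scheme {scheme!r}") from None
    return resolver(uri)


register_resolver("file", resolve_file_uri)
```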


In [3]:
# Adjust this path so that it points to a real file in your Intergrax repository.
# For this demo, we assume PROJECT_STRUCTURE.md is located at the repo root.
project_root_file = Path("../../PROJECT_STRUCTURE.md").resolve()

if not project_root_file.exists():
    raise FileNotFoundError(f"Expected demo file does not exist: {project_root_file}")

attachment_uri = project_root_file.as_uri()

attachment = AttachmentRef(
    id="project-structure-md",
    type="markdown",
    uri=attachment_uri,
    metadata={
        "description": "Intergrax project structure document",
        "scope": "intergrax-docs-demo",
    },
)

print("AttachmentRef prepared:")
print("  id       :", attachment.id)
print("  type     :", attachment.type)
print("  uri      :", attachment.uri)
print("  metadata :", attachment.metadata)


AttachmentRef prepared:
  id       : project-structure-md
  type     : markdown
  uri      : file:///D:/Projekty/intergrax/PROJECT_STRUCTURE.md
  metadata : {'description': 'Intergrax project structure document', 'scope': 'intergrax-docs-demo'}


## 3. First runtime call with an attachment

We now:

1. Create a `RuntimeRequest` with:
   - `user_id`,
   - `session_id`,
   - a user message,
   - a list containing our `AttachmentRef`.
2. Await `runtime.run(request_1)` (top-level `await` works inside Jupyter).
3. Inspect:
   - the model's answer,
   - the `RouteInfo`,
   - the `debug_trace["ingestion"]` field.

`debug_trace["ingestion"]` is expected to contain:

- one entry per ingested attachment,
- the number of generated chunks,
- the number of vectors stored,
- metadata such as the source path and session identifiers.
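
These checks can also be automated. `check_ingestion_trace` is a hypothetical helper, not part of Intergrax. It relies only on the keys that appear in this notebook's ingestion output (`attachment_id`, `attachment_type`, and `metadata["session_id"]`) and makes no assumption about the key names used for chunk or vector counts.

```python
from typing import Any, Iterable


def check_ingestion_trace(trace: Any, expected_ids: Iterable[str], session_id: str) -> dict[str, dict]:
    """Validate a debug_trace["ingestion"] value and index its entries by attachment_id."""
    if not isinstance(trace, list):
        raise AssertionError(f"expected a list of ingestion entries, got {type(trace).__name__}")
    by_id = {entry["attachment_id"]: entry for entry in trace}
    missing = set(expected_ids) - by_id.keys()
    if missing:
        raise AssertionError(f"attachments not ingested: {sorted(missing)}")
    for att_id, entry in by_id.items():
        entry_session = (entry.get("metadata") or {}).get("session_id")
        if entry_session != session_id:
            raise AssertionError(f"{att_id}: session_id {entry_session!r} != {session_id!r}")
    return by_id
```

In this notebook it could be called as `check_ingestion_trace(answer_1.debug_trace.get("ingestion"), [attachment.id], session_id)`. For a request without attachments, where the trace is `None`, it raises by design.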


In [None]:
user_id = "demo-user-attachments"
session_id = "demo-session-attachments-001"

request_1 = RuntimeRequest(
    user_id=user_id,
    session_id=session_id,
    message=(
        "I am attaching a document that describes the Intergrax project "
        "structure. Please ingest it and confirm when you're ready."
    ),
    attachments=[attachment],
    metadata={"source": "02_attachments_ingestion_demo"},
)

answer_1 = await runtime.run(request_1)

print("=== ANSWER 1 ===")
print(answer_1.answer)

print("\n=== ROUTE INFO ===")
pprint(answer_1.route)

print("\n=== DEBUG TRACE: INGESTION ===")
pprint(answer_1.debug_trace.get("ingestion"))


[intergraxVectorstoreManager] Upserting 80 items (dim=1536) to provider=chroma...
[intergraxVectorstoreManager] Upsert complete. New count: 80
=== ANSWER 1 ===
However, I don't see any document attached to our conversation. As a text-based AI assistant, I don't have the ability to receive or access external files.

If you'd like to share the information about the Intergrax project structure with me, you can copy and paste the contents of the document into this chat window, and I'll be happy to help you review it!

=== ROUTE INFO ===
RouteInfo(used_rag=False,
          used_websearch=False,
          used_tools=False,
          used_long_term_memory=False,
          used_user_profile=True,
          strategy='llm_only_with_ingestion',
          extra={})

=== DEBUG TRACE: INGESTION ===
[{'attachment_id': 'project-structure-md',
  'attachment_type': 'markdown',
  'metadata': {'session_id': 'demo-session-attachments-001',
               'source_path': 'D:\\Projekty\\intergrax\\PROJECT_STR

## 4. Inspect the session state (messages and attachments)

Next, we want to verify that the **session store** correctly persisted:

- the user message,
- the assistant response,
- and the attachment references.

We will:

1. Load the session by `session_id`.
2. Inspect:
   - basic session metadata,
   - all `SessionMessage` objects,
   - their `attachments` and `metadata` fields.
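
To make the inspected fields concrete, here is a minimal sketch of what an in-memory session store could look like. It is an assumption for illustration only. `StoredMessage`, `Session`, and `InMemorySessions` are hypothetical and do not reproduce Intergrax's `SessionManager` or `InMemorySessionStorage`. Only the field names are taken from this notebook. The sketch is synchronous for brevity, whereas the notebook awaits `session_manager.get_session`.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Optional


def _utc_now() -> datetime:
    return datetime.now(timezone.utc)


@dataclass
class StoredMessage:
    role: str
    content: str
    attachments: list[Any] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)
    created_at: datetime = field(default_factory=_utc_now)


@dataclass
class Session:
    id: str
    user_id: str
    messages: list[StoredMessage] = field(default_factory=list)
    created_at: datetime = field(default_factory=_utc_now)
    updated_at: datetime = field(default_factory=_utc_now)


class InMemorySessions:
    """Sessions live in a dict keyed by session id and vanish with the process."""

    def __init__(self) -> None:
        self._sessions: dict[str, Session] = {}

    def get(self, session_id: str) -> Optional[Session]:
        return self._sessions.get(session_id)

    def append(self, session_id: str, user_id: str, message: StoredMessage) -> Session:
        session = self._sessions.get(session_id)
        if session is None:
            # Create the session lazily on its first message.
            session = Session(id=session_id, user_id=user_id)
            self._sessions[session_id] = session
        session.messages.append(message)
        # updated_at tracks the latest message, mirroring this notebook's output.
        session.updated_at = message.created_at
        return session
```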


In [None]:
session = await session_manager.get_session(session_id)

print("=== SESSION AFTER FIRST REQUEST ===")
print("Session ID :", session.id)
print("User ID    :", session.user_id)
print("Messages   :", len(session.messages))
print("Created at :", session.created_at)
print("Updated at :", session.updated_at)

print("\n=== MESSAGES ===")
for idx, msg in enumerate(session.messages, start=1):
    print(f"[{idx}] role={msg.role!r}, created_at={msg.created_at.isoformat()}")
    print(f"     content={msg.content!r}")
    if msg.attachments:
        print(f"     attachments ({len(msg.attachments)}):")
        for a in msg.attachments:
            print(f"       - id={a.id!r}, type={a.type!r}, uri={a.uri!r}")
    if msg.metadata:
        print(f"     metadata keys: {list(msg.metadata.keys())}")


=== SESSION AFTER FIRST REQUEST ===
Session ID : demo-session-attachments-001
User ID    : demo-user-attachments
Messages   : 2
Created at : 2025-11-24 12:09:53.648304+00:00
Updated at : 2025-11-24 12:10:00.342054+00:00

=== MESSAGES ===
[1] role='user', created_at=2025-11-24T12:09:56.320037+00:00
     content="I am attaching a document that describes the Intergrax project structure. Please ingest it and confirm when you're ready."
     attachments (1):
       - id='project-structure-md', type='markdown', uri='file:///D:/Projekty/intergrax/PROJECT_STRUCTURE.md'
     metadata keys: ['runtime_request_metadata']
[2] role='assistant', created_at=2025-11-24T12:10:00.342054+00:00
     content="However, I don't see any document attached to our conversation. As a text-based AI assistant, I don't have the ability to receive or access external files.\n\nIf you'd like to share the information about the Intergrax project structure with me, you can copy and paste the contents of the document into t

## 5. Inspect the LLM-facing chat history

The runtime internally converts `ChatSession.messages` into a list of
`ChatMessage` objects that are passed to the LLM adapter.

Here we directly call the internal helper `_build_chat_history(session)` to:

- verify how the conversation history looks before the next model call,
- confirm that messages are preserved in order,
- see how roles and content are exposed to the LLM.

Note:
- Attachments are not included directly in `ChatMessage` at this stage.
  They are used by the ingestion pipeline and RAG, not embedded in raw text.
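
The conversion described above can be sketched as an order-preserving mapping. This is not the body of `_build_chat_history`. `SessionMsg`, `LlmMsg`, and `build_chat_history` are hypothetical stand-ins for `SessionMessage` and `ChatMessage`. The sketch assumes the conversion keeps role, content, and timestamp, and drops attachments and metadata, as the note says.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class SessionMsg:
    role: str
    content: str
    created_at: datetime
    attachments: list[Any] = field(default_factory=list)
    metadata: dict[str, Any] = field(default_factory=dict)


@dataclass(frozen=True)
class LlmMsg:
    role: str
    content: str
    created_at: str  # ISO 8601 string


def build_chat_history(messages: list[SessionMsg]) -> list[LlmMsg]:
    """Map stored messages to LLM-facing messages, preserving their order.

    Attachments and metadata are intentionally dropped: in this sketch they feed
    ingestion/RAG rather than the raw prompt text.
    """
    return [
        LlmMsg(role=m.role, content=m.content, created_at=m.created_at.isoformat())
        for m in messages
    ]
```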


In [6]:
history = runtime._build_chat_history(session)

print("=== BUILT CHAT HISTORY ===")
print("History length:", len(history))
for idx, msg in enumerate(history, start=1):
    assert isinstance(msg, ChatMessage)
    print(f"[{idx}] role={msg.role!r}, created_at={msg.created_at}")
    print(f"     content={msg.content!r}")


=== BUILT CHAT HISTORY ===
History length: 2
[1] role='user', created_at=2025-11-24T12:09:56.320037+00:00
     content="I am attaching a document that describes the Intergrax project structure. Please ingest it and confirm when you're ready."
[2] role='assistant', created_at=2025-11-24T12:10:00.342054+00:00
     content="However, I don't see any document attached to our conversation. As a text-based AI assistant, I don't have the ability to receive or access external files.\n\nIf you'd like to share the information about the Intergrax project structure with me, you can copy and paste the contents of the document into this chat window, and I'll be happy to help you review it!"


## 6. Second runtime call in the same session (no new attachments)

To mirror a typical ChatGPT-like workflow:

1. First message: user uploads a document and asks the system to ingest it.
2. Second message: user refers to that document and asks for a summary.

Here we:

- reuse the same `user_id` and `session_id`,
- send a second message without additional attachments,
- inspect the answer, route info, and debug trace.

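At this stage, the answer does not use RAG retrieval. Because `enable_rag=False`, nothing from the vector store is added to the prompt, so the model may reply that it cannot see any attached document. However:

- the ingestion step has already populated the vector store,
- the session has accumulated multiple messages,
- the runtime is ready for the next step: **context builder + RAG**.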
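
As a preview of the missing retrieval step, the sketch below ranks stored chunks by cosine similarity to a query vector and keeps the top `k`. It shows generic top-k retrieval, not the Intergrax context builder. `cosine`, `top_k_chunks`, and the small hand-written vectors are hypothetical. A real setup would embed the query with the same `EmbeddingManager` used during ingestion so that the dimensions match.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 0.0 if norm_a == 0 or norm_b == 0 else dot / (norm_a * norm_b)


def top_k_chunks(
    query_vec: Sequence[float],
    chunks: list[tuple[str, Sequence[float]]],
    k: int = 3,
) -> list[tuple[float, str]]:
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine(query_vec, vec), text) for text, vec in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    demo_chunks = [
        ("runtime engine", [1.0, 0.0, 0.0]),
        ("vector store", [0.0, 1.0, 0.0]),
        ("session manager", [0.7, 0.7, 0.0]),
    ]
    for score, text in top_k_chunks([1.0, 0.1, 0.0], demo_chunks, k=2):
        print(f"{score:.3f}  {text}")
```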


In [None]:
request_2 = RuntimeRequest(
    user_id=user_id,
    session_id=session_id,
    message=(
        "Great. In the previous message I attached the project structure file. "
        "Please summarize the main components of the Intergrax framework as "
        "described in that document."
    ),
    attachments=[],  # no new attachments; we rely on the existing session context
    metadata={"source": "02_attachments_ingestion_demo_second_call"},
)

answer_2 = await runtime.run(request_2)

print("=== ANSWER 2 ===")
print(answer_2.answer)

print("\n=== ROUTE INFO 2 ===")
pprint(answer_2.route)

print("\n=== DEBUG TRACE 2: INGESTION ===")
pprint(answer_2.debug_trace.get("ingestion"))


=== ANSWER 2 ===
As I mentioned earlier, there is no attachment or document from you that I can access. Our conversation started with your request to confirm when I'm ready after attaching a document, but since I couldn't receive any attachment, I didn't have anything to review.

If you'd like to share the project structure information by copying and pasting it into this chat window, I'll be happy to help summarize the main components of the Intergrax framework for you.

=== ROUTE INFO 2 ===
RouteInfo(used_rag=False,
          used_websearch=False,
          used_tools=False,
          used_long_term_memory=False,
          used_user_profile=True,
          strategy='llm_only',
          extra={})

=== DEBUG TRACE 2: INGESTION ===
None
