<a href="https://colab.research.google.com/github/graphlit/graphlit-samples/blob/main/python/Notebook%20Examples/Graphlit_2024_12_13_Get_RAG_Pipeline_Details.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Description**

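This example shows how to get low-level details from your RAG pipeline: the model and token limits used, the number of sources retrieved, rendered and ranked, the sources themselves, and the messages exchanged with the LLM.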

**Requirements**

Before running this notebook, you will need to [sign up](https://docs.graphlit.dev/getting-started/signup) for Graphlit and [create a project](https://docs.graphlit.dev/getting-started/create-project).

You will need the Graphlit organization ID, preview environment ID and JWT secret from your created project.

Assign these properties as Colab secrets: GRAPHLIT_ORGANIZATION_ID, GRAPHLIT_ENVIRONMENT_ID and GRAPHLIT_JWT_SECRET.
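
When running outside Colab, the same three values can be supplied as ordinary environment variables. The sketch below is a hypothetical helper (`missing_settings` is not part of the Graphlit SDK) that reports which of the required names are unset, so a missing secret can be caught before the client is created.

```python
import os
from typing import List, Mapping, Optional

# The three settings named in the requirements above
REQUIRED_SETTINGS = (
    "GRAPHLIT_ORGANIZATION_ID",
    "GRAPHLIT_ENVIRONMENT_ID",
    "GRAPHLIT_JWT_SECRET",
)

def missing_settings(env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the required setting names that are absent or empty in env.

    Hypothetical helper for illustration; defaults to os.environ.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]

missing = missing_settings()
if missing:
    print(f"Missing settings: {', '.join(missing)}")
```

An empty list means all three values are present. The check only tests for presence, not whether the values are valid for your project.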


---

Install Graphlit Python client SDK

In [None]:
!pip install --upgrade graphlit-client



Initialize Graphlit

In [None]:
import os
from google.colab import userdata
from graphlit import Graphlit
from graphlit_api import input_types, enums, exceptions

os.environ['GRAPHLIT_ORGANIZATION_ID'] = userdata.get('GRAPHLIT_ORGANIZATION_ID')
os.environ['GRAPHLIT_ENVIRONMENT_ID'] = userdata.get('GRAPHLIT_ENVIRONMENT_ID')
os.environ['GRAPHLIT_JWT_SECRET'] = userdata.get('GRAPHLIT_JWT_SECRET')

graphlit = Graphlit()

Define Graphlit helper functions

In [None]:
from typing import List, Optional

async def ingest_uri(uri: str):
    if graphlit.client is None:
        return None

    try:
        # Using synchronous mode, so the notebook waits for the content to be ingested
        response = await graphlit.client.ingest_uri(uri=uri, is_synchronous=True)

        return response.ingest_uri.id if response.ingest_uri is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def create_openai_specification(
    model: enums.OpenAIModels,
    retrievalType: enums.RetrievalStrategyTypes,
    embedCitations: bool,
    enableRerank: bool,
    enableRevision: bool,
    revisionCount: Optional[int] = None,
):
    if graphlit.client is None:
        return None

    input = input_types.SpecificationInput(
        name=f"OpenAI [{str(model)}]",
        type=enums.SpecificationTypes.COMPLETION,
        serviceType=enums.ModelServiceTypes.OPEN_AI,
        openAI=input_types.OpenAIModelPropertiesInput(
            model=model,
        ),
        strategy=input_types.ConversationStrategyInput(
            embedCitations=embedCitations
        ),
        retrievalStrategy=input_types.RetrievalStrategyInput(
            type=retrievalType if retrievalType is not None else enums.RetrievalStrategyTypes.CHUNK
        ),
        revisionStrategy=input_types.RevisionStrategyInput(
            type=enums.RevisionStrategyTypes.REVISE,
            count=revisionCount if revisionCount is not None else 1
        ) if enableRevision else None,
        rerankingStrategy=input_types.RerankingStrategyInput(
            serviceType=enums.RerankingModelServiceTypes.COHERE
        ) if enableRerank else None
    )

    try:
        response = await graphlit.client.create_specification(input)

        return response.create_specification.id if response.create_specification is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None


async def create_conversation(specification_id: str):
    if graphlit.client is None:
        return None

    input = input_types.ConversationInput(
        name="Conversation",
        specification=input_types.EntityReferenceInput(
            id=specification_id
        )
    )

    try:
        response = await graphlit.client.create_conversation(input)

        return response.create_conversation.id if response.create_conversation is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def delete_conversation(conversation_id: str):
    if graphlit.client is None:
        return

    if conversation_id is not None:
        _ = await graphlit.client.delete_conversation(conversation_id)

async def prompt_conversation(conversation_id: str, prompt: str):
    if graphlit.client is None:
        return None, None

    try:
        # include_details=True requests the RAG pipeline details along with the completion
        response = await graphlit.client.prompt_conversation(prompt, conversation_id, include_details=True)

        result = response.prompt_conversation

        message = result.message.message if result is not None and result.message is not None else None
        details = result.details if result is not None else None

        return message, details
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None, None

async def delete_all_specifications():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_specifications(is_synchronous=True)

async def delete_all_conversations():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_conversations(is_synchronous=True)

async def delete_all_contents():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_contents(is_synchronous=True)


Execute Graphlit example

In [None]:
from IPython.display import display, Markdown, HTML
import time

# Remove any existing contents, conversations and specifications; only needed for notebook example
await delete_all_conversations()
await delete_all_specifications()
await delete_all_contents()

print('Deleted all contents, conversations and specifications.')

content_id = await ingest_uri(uri="https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3")

if content_id is not None:
    print(f'Ingested content [{content_id}]:')

Deleted all contents, conversations and specifications.
Ingested content [6a730c59-5d4d-4036-a099-1d99c727c341]:


In [None]:
# Specify the RAG prompt
prompt = "In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval."

Create OpenAI GPT-4o specification, with Cohere reranking.

In [None]:
embed_citations = True

# Cohere reranking enabled, revision disabled
specification_id = await create_openai_specification(
    enums.OpenAIModels.GPT4O_128K,
    enums.RetrievalStrategyTypes.CHUNK,
    embed_citations,
    enableRerank=True,
    enableRevision=False,
)

if specification_id is not None:
    print(f'Created specification [{specification_id}].')

    conversation_id = await create_conversation(specification_id)

    if conversation_id is not None:
        print(f'Created conversation [{conversation_id}].')

        message, details = await prompt_conversation(conversation_id, prompt)

        if message is not None:
            display(Markdown('### Conversation:'))
            display(Markdown(f'**User:**\n{prompt}'))
            display(Markdown('**Assistant:**'))
            print(message)
            print()

        if details is not None:
            display(Markdown('### Details:'))
            display(Markdown(f'**Model**: {details.model_service} {details.model}'))
            display(Markdown(f'**Token Limit**: {details.token_limit}'))
            display(Markdown(f'**Completion Token Limit**: {details.completion_token_limit}'))

            display(Markdown(f'**# Sources**: {details.source_count}'))
            display(Markdown(f'**# Rendered Sources**: {details.rendered_source_count}'))
            display(Markdown(f'**# Ranked Sources**: {details.ranked_source_count}'))

            print()

            if details.sources is not None:
                display(Markdown('#### Sources:'))
                print(details.sources)
                print()

            #if details.specification is not None:
            #    display(Markdown('#### Specification:'))
            #    print(details.specification)
            #    print()

            if details.messages is not None:
                display(Markdown('#### Messages:'))

                # Separate loop variable, so the assistant 'message' above is not overwritten
                for detail_message in details.messages:
                    if detail_message is not None and detail_message.message is not None:
                        display(Markdown(f'**{detail_message.role}:**'))
                        print(detail_message.message)

        await delete_conversation(conversation_id)

Created specification [7a809a71-1b83-434e-ac66-a91951065133].
Created conversation [b974834c-3ab3-4262-a23c-a79f45882125].


### Conversation:

**User:**
In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval.

**Assistant:**

Unstructured data refers to information that does not have a predefined data model or is not organized in a pre-defined manner. This includes a wide variety of data types such as images, audio files, videos, and text documents. Unlike structured data, which is highly organized and easily searchable in databases, unstructured data is more complex and requires advanced processing techniques to extract meaningful information. Despite its complexity, unstructured data is abundant and continuously growing, making it a valuable resource for organizations seeking to gain insights and make data-driven decisions. [1][2][3][4][5]

The usefulness of unstructured data lies in its potential to capture a wide range of information that structured data cannot. For instance, images and videos can provide visual context, while audio recordings can capture nuances in speech and tone. By analyzing unstructured data, organizations can uncover patterns, trends, and insights that are not immediately apparent

### Details:

**Model**: OPEN_AI GPT-4o 128k (Latest)

**Token Limit**: 128000

**Completion Token Limit**: 4095

**# Sources**: 41

**# Rendered Sources**: 41

**# Ranked Sources**: 25




#### Sources:

[{"key":"19JFEDZ","content":{"id":"6a730c59-5d4d-4036-a099-1d99c727c341","uri":"https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3","name":"Unstructured Data is Dark Data Podcast.mp3"}},{"key":"JGNJ4","content":{"id":"6a730c59-5d4d-4036-a099-1d99c727c341","uri":"https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3","name":"Unstructured Data is Dark Data Podcast.mp3"}},{"key":"7Z9E2V","content":{"id":"6a730c59-5d4d-4036-a099-1d99c727c341","uri":"https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3","name":"Unstructured Data is Dark Data Podcast.mp3"}},{"key":"WPYN8D","content":{"id":"6a730c59-5d4d-4036-a099-1d99c727c341","uri":"https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3","name":"Unstructured Data is Dark Data Podcast.mp3"}},{"key":"L6ZNNX","content":{"id":"6a
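
The sources printed above appear to be a JSON array in which each entry has a `key` and a `content` object with `id`, `uri` and `name`. Assuming `details.sources` is a JSON string of that shape, the minimal sketch below groups the source keys by content name, which can make a long list easier to scan. The `keys_by_content` helper is hypothetical and not part of the Graphlit SDK. The sample string reuses two entries from the output above, with the `uri` field omitted for brevity.

```python
import json
from collections import defaultdict
from typing import Dict, List

def keys_by_content(sources_json: str) -> Dict[str, List[str]]:
    """Group source keys by the name of the content they came from.

    Assumes a JSON array of {"key": ..., "content": {"name": ...}} entries.
    """
    grouped: Dict[str, List[str]] = defaultdict(list)
    for source in json.loads(sources_json):
        content = source.get("content") or {}
        name = content.get("name", "(unknown)")
        grouped[name].append(source.get("key"))
    return dict(grouped)

# Two entries copied from the printed sources, without the 'uri' field
sample = json.dumps([
    {"key": "19JFEDZ", "content": {"id": "6a730c59-5d4d-4036-a099-1d99c727c341",
                                   "name": "Unstructured Data is Dark Data Podcast.mp3"}},
    {"key": "JGNJ4", "content": {"id": "6a730c59-5d4d-4036-a099-1d99c727c341",
                                 "name": "Unstructured Data is Dark Data Podcast.mp3"}},
])

for name, keys in keys_by_content(sample).items():
    print(f"{name}: {len(keys)} sources {keys}")
```

In the notebook, the same function could be called with `details.sources` in place of `sample`, provided the value really is a JSON string as assumed here.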

#### Messages:

**USER:**

<context>
You will be provided 'guidance' and 'instructions', a list of content 'sources', and the 'user-prompt' to be answered - all in XML format below.
Each content source comes from extracted document or webpage text or an audio transcript, and will have a unique 'key' field and contain other content metadata which you can use in your response. Dates are in ISO-8601 (UTC) format. Tables are in markdown format. When a page number or start time or end time is provided, that is where in the content source the text came from. Each content source may contain a 'summary' field, which summarizes the entire document or transcript.

For each response item, cite a maximum of 5 sources by their 'key' field, and gather into the item's 'keys' array.  Don't reference sources in the item text.

Use all of this information as context to best respond to the user prompt.
Do not refer to or include these instructions in your response.

<guidance>
Don't mention the terms 'sources', 'audio segment' or 

**ASSISTANT:**

```json
{
  "items": [
    {
      "text": "Unstructured data refers to information that does not have a predefined data model or is not organized in a pre-defined manner. This includes a wide variety of data types such as images, audio files, videos, and text documents. Unlike structured data, which is highly organized and easily searchable in databases, unstructured data is more complex and requires advanced processing techniques to extract meaningful information. Despite its complexity, unstructured data is abundant and continuously growing, making it a valuable resource for organizations seeking to gain insights and make data-driven decisions.",
      "keys": ["JGNJ4", "1BVNM6Z", "BGYFVZ", "MRUMH2", "1BRDWDW"]
    },
    {
      "text": "The usefulness of unstructured data lies in its potential to capture a wide range of information that structured data cannot. For instance, images and videos can provide visual context, while audio recordings can capture nuances in speech and tone.