<a href="https://colab.research.google.com/github/graphlit/graphlit-samples/blob/main/python/Notebook%20Examples/Graphlit_2024_12_01_OpenAI_LLM_Streaming.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Description**

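This example shows how to stream LLM completions from OpenAI while using Graphlit for RAG: Graphlit formats the retrieval-augmented prompt, the OpenAI SDK streams the completion, and the result is stored back in the Graphlit conversation.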

**Requirements**

Prior to running this notebook, you will need to [sign up](https://docs.graphlit.dev/getting-started/signup) for Graphlit and [create a project](https://docs.graphlit.dev/getting-started/create-project).

You will need the Graphlit organization ID, preview environment ID and JWT secret from your created project.

Assign these properties as Colab secrets: GRAPHLIT_ORGANIZATION_ID, GRAPHLIT_ENVIRONMENT_ID and GRAPHLIT_JWT_SECRET.

Also assign your OpenAI API key as the Colab secret OPENAI_API_KEY.
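
A missing secret is easiest to catch before any client is created. The following is a minimal sketch, not part of the Graphlit or OpenAI SDKs: `REQUIRED_SETTINGS` and `missing_settings` are hypothetical names introduced here, and the check assumes the values have already been copied into `os.environ` (the initialization cells below do this).

```python
import os
from typing import Iterable, List, Mapping

# Names taken from the requirements above.
REQUIRED_SETTINGS = (
    "GRAPHLIT_ORGANIZATION_ID",
    "GRAPHLIT_ENVIRONMENT_ID",
    "GRAPHLIT_JWT_SECRET",
    "OPENAI_API_KEY",
)

def missing_settings(values: Mapping[str, str],
                     required: Iterable[str] = REQUIRED_SETTINGS) -> List[str]:
    """Return the names in `required` that are absent or empty in `values`."""
    return [name for name in required if not values.get(name)]

# Example: report anything that is not set in the process environment.
missing = missing_settings(os.environ)
if missing:
    print(f"Missing settings: {', '.join(missing)}")
```

Taking a plain mapping instead of reading `os.environ` directly keeps the helper usable outside Colab and easy to test.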


---

Install Graphlit Python client SDK

In [13]:
!pip install --upgrade graphlit-client



Install OpenAI Python SDK

In [14]:
!pip install --upgrade openai



Initialize Graphlit

In [15]:
import os
from google.colab import userdata
from graphlit import Graphlit
from graphlit_api import input_types, enums, exceptions

os.environ['GRAPHLIT_ORGANIZATION_ID'] = userdata.get('GRAPHLIT_ORGANIZATION_ID')
os.environ['GRAPHLIT_ENVIRONMENT_ID'] = userdata.get('GRAPHLIT_ENVIRONMENT_ID')
os.environ['GRAPHLIT_JWT_SECRET'] = userdata.get('GRAPHLIT_JWT_SECRET')

graphlit = Graphlit()

Initialize OpenAI

In [16]:
from openai import OpenAI

os.environ['OPENAI_API_KEY'] = userdata.get('OPENAI_API_KEY')

client = OpenAI(api_key=os.environ.get("OPENAI_API_KEY"))


Define Graphlit helper functions

In [17]:
from typing import List, Optional

async def ingest_uri(uri: str):
    if graphlit.client is None:
        return None

    try:
        # Using synchronous mode, so the notebook waits for the content to be ingested
        response = await graphlit.client.ingest_uri(uri=uri, is_synchronous=True)

        return response.ingest_uri.id if response.ingest_uri is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def create_openai_specification(model: enums.OpenAIModels):
    if graphlit.client is None:
        return None

    input = input_types.SpecificationInput(
        name=f"OpenAI [{str(model)}]",
        type=enums.SpecificationTypes.COMPLETION,
        serviceType=enums.ModelServiceTypes.OPEN_AI,
        openAI=input_types.OpenAIModelPropertiesInput(
            model=model,
        )
    )

    try:
        response = await graphlit.client.create_specification(input)

        return response.create_specification.id if response.create_specification is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None


async def create_conversation(specification_id: str):
    if graphlit.client is None:
        return None

    input = input_types.ConversationInput(
        name="Conversation",
        specification=input_types.EntityReferenceInput(
            id=specification_id
        )
    )

    try:
        response = await graphlit.client.create_conversation(input)

        return response.create_conversation.id if response.create_conversation is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def delete_conversation(conversation_id: str):
    if graphlit.client is None:
        return

    if conversation_id is not None:
        _ = await graphlit.client.delete_conversation(conversation_id)

async def format_conversation(conversation_id: str, prompt: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.format_conversation(prompt, conversation_id)

        return response.format_conversation.message.message if response.format_conversation is not None and response.format_conversation.message is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def complete_conversation(conversation_id: str, completion: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.complete_conversation(completion, conversation_id)

        return response.complete_conversation.message.message if response.complete_conversation is not None and response.complete_conversation.message is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def get_conversation(conversation_id: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.get_conversation(conversation_id)

        return response.conversation
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def delete_all_specifications():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_specifications(is_synchronous=True)

async def delete_all_conversations():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_conversations(is_synchronous=True)

async def delete_all_contents():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_contents(is_synchronous=True)

Define OpenAI streaming helper function


In [18]:
from typing import Callable

def stream_completion(prompt: str, model: str, callback: Callable[[str], None]) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        temperature=0.1,
        top_p=0.2,
        stream=True
    )

    completion = ""

    for chunk in response:
        # Skip chunks that carry no choices (some streams emit them)
        if not chunk.choices:
            continue

        delta = chunk.choices[0].delta.content

        if delta is not None:
            callback(delta)
            completion += delta

    return completion
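
The helper above is synchronous, so it blocks the notebook's event loop while tokens arrive. A non-blocking variant can be sketched as follows, assuming `client` is an `openai.AsyncOpenAI` instance (or any object with the same `chat.completions.create` coroutine); `stream_completion_async` is a hypothetical name, not part of either SDK, and the sampling parameters are omitted for brevity.

```python
from typing import Any, Callable

async def stream_completion_async(client: Any, prompt: str, model: str,
                                  callback: Callable[[str], None]) -> str:
    # Assumption: `client` behaves like openai.AsyncOpenAI, so awaiting
    # create(..., stream=True) yields an async iterator of chunks.
    stream = await client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )

    parts = []

    async for chunk in stream:
        # Skip chunks that carry no choices
        if not chunk.choices:
            continue

        delta = chunk.choices[0].delta.content

        # Skip chunks without text (None or empty string)
        if delta:
            callback(delta)
            parts.append(delta)

    # Join once at the end instead of concatenating on every chunk
    return "".join(parts)
```

In the notebook this could be called as `await stream_completion_async(AsyncOpenAI(), message, model, lambda delta: print(delta, end=''))` after `from openai import AsyncOpenAI`.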


Execute Graphlit example

In [19]:
from IPython.display import display, Markdown

# Remove any existing contents, conversations and specifications; only needed for notebook example
await delete_all_conversations()
await delete_all_specifications()
await delete_all_contents()

print('Deleted all contents, conversations and specifications.')

content_id = await ingest_uri(uri="https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3")

if content_id is not None:
    print(f'Ingested content [{content_id}]:')

Deleted all contents, conversations and specifications.
Ingested content [993af4d7-dbb4-4d49-a3ff-546a871e33f2]:


In [20]:
# Specify the RAG prompt
prompt = "In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval."

Create OpenAI GPT-4o specification.

In [21]:
model = "gpt-4o"

specification_id = await create_openai_specification(enums.OpenAIModels.GPT4O_128K)

if specification_id is not None:
    print(f'Created specification [{specification_id}].')

    conversation_id = await create_conversation(specification_id)

    if conversation_id is not None:
        print(f'Created conversation [{conversation_id}].')

        # NOTE: returns LLM-ready formatted prompt from RAG pipeline
        message = await format_conversation(conversation_id, prompt)

        if message is not None:
            # NOTE: uncomment to see formatted LLM prompt

            #print(f'Formatted LLM prompt:')
            #print(message)
            print()

            print('Streaming completion:')

            completion = stream_completion(message, model, lambda delta: print(delta, end=''))

            if completion is not None:
                # NOTE: stores completion back into conversation
                await complete_conversation(conversation_id, completion)

        conversation = await get_conversation(conversation_id)

        if conversation is not None:
            display(Markdown(f'### Conversation [{conversation.id}]:'))

            if conversation.messages is not None:
                for message in conversation.messages:
                    if message is not None:
                        display(Markdown(f'**{message.role}:**\n{message.message}'))

            print()

        await delete_conversation(conversation_id)

Created specification [be72b1a4-e30c-45d7-aa95-1db7275a8e38].
Created conversation [b680c022-2835-463c-a52b-9587d20db043].

Streaming completion:
Unstructured data refers to information that does not have a predefined data model or is not organized in a pre-defined manner. This type of data is typically text-heavy and can include a variety of formats such as images, videos, audio files, and documents. Unlike structured data, which is highly organized and easily searchable in databases, unstructured data is more complex and requires advanced processing techniques to extract meaningful insights. Kirk Marple, the founder of Unstruct Data, describes unstructured data as encompassing everything from imagery and audio to 3D geometry point clouds, documents, and emails. This broad set of data types highlights the diverse nature of unstructured data and the challenges associated with managing and analyzing it.

The usefulness of unstructured data lies in its potential to capture rich, detailed

### Conversation [b680c022-2835-463c-a52b-9587d20db043]:

**USER:**
In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval.

**ASSISTANT:**
Unstructured data refers to information that does not have a predefined data model or is not organized in a pre-defined manner. This type of data is typically text-heavy and can include a variety of formats such as images, videos, audio files, and documents. Unlike structured data, which is highly organized and easily searchable in databases, unstructured data is more complex and requires advanced processing techniques to extract meaningful insights. Kirk Marple, the founder of Unstruct Data, describes unstructured data as encompassing everything from imagery and audio to 3D geometry point clouds, documents, and emails. This broad set of data types highlights the diverse nature of unstructured data and the challenges associated with managing and analyzing it.

The usefulness of unstructured data lies in its potential to capture rich, detailed information that structured data might miss. For instance, images and videos can provide visual context that is invaluable for certain applications, such as geospatial analysis or property inspections. By processing unstructured data, organizations can extract metadata and insights that help in understanding real-world assets and conditions. This is particularly important in fields like geospatial analysis, where data from drones, robots, and mobile devices can be used to create detailed maps and models of physical environments. The ability to analyze and interpret unstructured data allows businesses to gain a deeper understanding of their operations and make more informed decisions.

Unstructured data is also crucial for knowledge capture and retrieval because it can be used to build comprehensive knowledge graphs. These graphs represent relationships between different data points, allowing for more dynamic and flexible data analysis. By creating connections between various pieces of unstructured data, organizations can uncover patterns and trends that might not be immediately apparent. This capability is essential for industries that rely on large volumes of data, such as oil and gas, where semantic search and contextualization of data can lead to significant operational efficiencies and insights.

Moreover, the integration of unstructured data into knowledge systems enhances the ability to perform sentiment analysis and contextual understanding. For example, analyzing the sentiment of documents or audio transcripts can provide insights into customer opinions or employee feedback, which can be critical for improving products and services. The contextualization of unstructured data, such as linking images to specific locations or events, further enriches the knowledge base and supports more accurate and timely decision-making.

In conclusion, unstructured data plays a vital role in knowledge capture and retrieval by providing a rich source of information that complements structured data. Its ability to capture complex, real-world scenarios and relationships makes it an invaluable asset for organizations looking to enhance their data-driven strategies. By leveraging advanced technologies like machine learning and knowledge graphs, businesses can unlock the full potential of unstructured data, leading to improved insights and competitive advantages.
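
The format, stream, store sequence used in this example can be pulled into one reusable function. This is a rough sketch under stated assumptions: `answer_with_rag` and its three callable parameters are hypothetical names, chosen so the notebook's `format_conversation`, `stream_completion` and `complete_conversation` helpers can be plugged in without changes.

```python
from typing import Awaitable, Callable, Optional

async def answer_with_rag(
    prompt: str,
    format_prompt: Callable[[str], Awaitable[Optional[str]]],
    stream: Callable[[str], str],
    store_completion: Callable[[str], Awaitable[object]],
) -> Optional[str]:
    # 1. Ask the RAG pipeline for an LLM-ready prompt
    formatted = await format_prompt(prompt)

    if formatted is None:
        return None

    # 2. Stream the completion from the LLM
    completion = stream(formatted)

    # 3. Store a non-empty completion back into the conversation
    if completion:
        await store_completion(completion)

    return completion

# Possible wiring with the helpers defined in this notebook:
# completion = await answer_with_rag(
#     prompt,
#     lambda p: format_conversation(conversation_id, p),
#     lambda m: stream_completion(m, model, lambda d: print(d, end='')),
#     lambda c: complete_conversation(conversation_id, c),
# )
```

Passing the three steps in as callables keeps the control flow separate from the Graphlit and OpenAI clients, so the sequence can be exercised with stand-in functions.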


