<a href="https://colab.research.google.com/github/graphlit/graphlit-samples/blob/main/python/Notebook%20Examples/Graphlit_2024_09_13_Compare_RAG_strategies.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Description**

This example shows how to compare RAG strategies when prompting a conversation about an ingested podcast.

**Requirements**

Before running this notebook, you will need to [sign up](https://docs.graphlit.dev/getting-started/signup) for Graphlit and [create a project](https://docs.graphlit.dev/getting-started/create-project).

You will need the Graphlit organization ID, preview environment ID and JWT secret from your created project.

Assign these properties as Colab secrets: GRAPHLIT_ORGANIZATION_ID, GRAPHLIT_ENVIRONMENT_ID and GRAPHLIT_JWT_SECRET.
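
Because the notebook copies these secrets into environment variables before constructing the client, it can help to confirm that all three are present first. The following is a minimal sketch, assuming a hypothetical helper `missing_settings` (not part of the Graphlit SDK) that works on any mapping of names to values:

```python
import os
from typing import List, Mapping

# The three settings named in the requirements above
REQUIRED_SETTINGS = (
    "GRAPHLIT_ORGANIZATION_ID",
    "GRAPHLIT_ENVIRONMENT_ID",
    "GRAPHLIT_JWT_SECRET",
)

def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the names of required settings that are absent or empty in env."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]

# Hypothetical usage: report anything missing from the process environment
missing = missing_settings(os.environ)
if missing:
    print(f"Missing settings: {', '.join(missing)}")
```

Passing the mapping in as an argument keeps the check independent of where the values come from, so the same function could be applied to Colab secrets or a local environment.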


---

Install Graphlit Python client SDK

In [42]:
!pip install --upgrade graphlit-client



Initialize Graphlit

In [43]:
import os
from google.colab import userdata
from graphlit import Graphlit
from graphlit_api import input_types, enums, exceptions

os.environ['GRAPHLIT_ORGANIZATION_ID'] = userdata.get('GRAPHLIT_ORGANIZATION_ID')
os.environ['GRAPHLIT_ENVIRONMENT_ID'] = userdata.get('GRAPHLIT_ENVIRONMENT_ID')
os.environ['GRAPHLIT_JWT_SECRET'] = userdata.get('GRAPHLIT_JWT_SECRET')

graphlit = Graphlit()

Define Graphlit helper functions

In [44]:
from typing import List, Optional

async def ingest_uri(uri: str):
    if graphlit.client is None:
        return None

    try:
        # Using synchronous mode, so the notebook waits for the content to be ingested
        response = await graphlit.client.ingest_uri(uri=uri, is_synchronous=True)

        return response.ingest_uri.id if response.ingest_uri is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def create_anthropic_specification(model: enums.AnthropicModels, retrievalType: enums.RetrievalStrategyTypes, enableRerank: bool, enableRevision: bool, revisionCount: Optional[int] = None,
                                         enablePromptStrategy: Optional[bool] = None, promptType: Optional[enums.PromptStrategyTypes] = None):
    if graphlit.client is None:
        return None

    input = input_types.SpecificationInput(
        name=f"Anthropic [{str(model)}]",
        type=enums.SpecificationTypes.COMPLETION,
        serviceType=enums.ModelServiceTypes.ANTHROPIC,
        anthropic=input_types.AnthropicModelPropertiesInput(
            model=model,
        ),
        promptStrategy=input_types.PromptStrategyInput(
            type=promptType if promptType is not None else enums.PromptStrategyTypes.OPTIMIZE_SEARCH
        ) if enablePromptStrategy else None,
        retrievalStrategy=input_types.RetrievalStrategyInput(
            type=retrievalType if retrievalType is not None else enums.RetrievalStrategyTypes.CHUNK
        ),
        revisionStrategy=input_types.RevisionStrategyInput(
            type=enums.RevisionStrategyTypes.REVISE,
            count=revisionCount if revisionCount is not None else 1
        ) if enableRevision else None,
        rerankingStrategy=input_types.RerankingStrategyInput(
            serviceType=enums.RerankingModelServiceTypes.COHERE
        ) if enableRerank else None
    )

    try:
        response = await graphlit.client.create_specification(input)

        return response.create_specification.id if response.create_specification is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def create_openai_specification(model: enums.OpenAIModels, retrievalType: enums.RetrievalStrategyTypes, enableRerank: bool, enableRevision: bool, revisionCount: Optional[int] = None):
    if graphlit.client is None:
        return None

    input = input_types.SpecificationInput(
        name=f"OpenAI [{str(model)}]",
        type=enums.SpecificationTypes.COMPLETION,
        serviceType=enums.ModelServiceTypes.OPEN_AI,
        openAI=input_types.OpenAIModelPropertiesInput(
            model=model,
        ),
        retrievalStrategy=input_types.RetrievalStrategyInput(
            type=retrievalType if retrievalType is not None else enums.RetrievalStrategyTypes.CHUNK
        ),
        revisionStrategy=input_types.RevisionStrategyInput(
            type=enums.RevisionStrategyTypes.REVISE,
            count=revisionCount if revisionCount is not None else 1
        ) if enableRevision else None,
        rerankingStrategy=input_types.RerankingStrategyInput(
            serviceType=enums.RerankingModelServiceTypes.COHERE
        ) if enableRerank else None
    )

    try:
        response = await graphlit.client.create_specification(input)

        return response.create_specification.id if response.create_specification is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def create_deepseek_specification(model: enums.DeepseekModels, retrievalType: enums.RetrievalStrategyTypes, enableRerank: bool, enableRevision: bool, revisionCount: Optional[int] = None):
    if graphlit.client is None:
        return None

    input = input_types.SpecificationInput(
        name=f"Deepseek [{str(model)}]",
        type=enums.SpecificationTypes.COMPLETION,
        serviceType=enums.ModelServiceTypes.DEEPSEEK,
        deepseek=input_types.DeepseekModelPropertiesInput(
            model=model,
        ),
        retrievalStrategy=input_types.RetrievalStrategyInput(
            type=retrievalType if retrievalType is not None else enums.RetrievalStrategyTypes.CHUNK
        ),
        revisionStrategy=input_types.RevisionStrategyInput(
            type=enums.RevisionStrategyTypes.REVISE,
            count=revisionCount if revisionCount is not None else 1
        ) if enableRevision else None,
        rerankingStrategy=input_types.RerankingStrategyInput(
            serviceType=enums.RerankingModelServiceTypes.COHERE
        ) if enableRerank else None
    )

    try:
        response = await graphlit.client.create_specification(input)

        return response.create_specification.id if response.create_specification is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def create_conversation(specification_id: str):
    if graphlit.client is None:
        return None

    input = input_types.ConversationInput(
        name="Conversation",
        specification=input_types.EntityReferenceInput(
            id=specification_id
        )
    )

    try:
        response = await graphlit.client.create_conversation(input)

        return response.create_conversation.id if response.create_conversation is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def delete_conversation(conversation_id: str):
    if graphlit.client is None:
        return

    if conversation_id is not None:
        _ = await graphlit.client.delete_conversation(conversation_id)

async def prompt_conversation(conversation_id: str, prompt: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.prompt_conversation(prompt, conversation_id)

        return response.prompt_conversation.message.message if response.prompt_conversation is not None and response.prompt_conversation.message is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def delete_all_specifications():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_specifications(is_synchronous=True)

async def delete_all_conversations():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_conversations(is_synchronous=True)

async def delete_all_contents():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_contents(is_synchronous=True)
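
Each comparison cell in this notebook repeats the same sequence: create a specification, create a conversation, prompt it, then delete the conversation. That sequence can be sketched as a single helper. This is a minimal illustration, not part of the Graphlit SDK: `run_strategy` and the `fake_*` coroutines are hypothetical names, and the fakes exist only so the sketch runs without a Graphlit project.

```python
import asyncio
from typing import Awaitable, Callable, List, Optional

async def run_strategy(
    create_spec: Callable[[], Awaitable[Optional[str]]],
    create_conv: Callable[[str], Awaitable[Optional[str]]],
    ask: Callable[[str, str], Awaitable[Optional[str]]],
    delete_conv: Callable[[str], Awaitable[None]],
    prompt: str,
) -> Optional[str]:
    """Create a specification and conversation, prompt once, then clean up."""
    specification_id = await create_spec()
    if specification_id is None:
        return None

    conversation_id = await create_conv(specification_id)
    if conversation_id is None:
        return None

    try:
        return await ask(conversation_id, prompt)
    finally:
        # Always delete the conversation, even if prompting raised
        await delete_conv(conversation_id)

# Hypothetical stand-ins for the Graphlit helper functions
deleted: List[str] = []

async def fake_create_spec() -> Optional[str]:
    return "spec-1"

async def fake_create_conv(specification_id: str) -> Optional[str]:
    return f"conv-for-{specification_id}"

async def fake_ask(conversation_id: str, prompt: str) -> Optional[str]:
    return f"echo: {prompt}"

async def fake_delete_conv(conversation_id: str) -> None:
    deleted.append(conversation_id)
```

In this notebook, the call might look like `await run_strategy(lambda: create_anthropic_specification(enums.AnthropicModels.CLAUDE_3_5_SONNET, enums.RetrievalStrategyTypes.CHUNK, False, False), create_conversation, prompt_conversation, delete_conversation, prompt)`. Wrapping the specification call in a lambda lets each strategy supply its own arguments while the rest of the flow stays shared.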


Execute Graphlit example

In [45]:
from IPython.display import display, Markdown, HTML
import time

# Remove any existing contents, conversations and specifications; only needed for notebook example
await delete_all_conversations()
await delete_all_specifications()
await delete_all_contents()

print('Deleted all contents, conversations and specifications.')

content_id = await ingest_uri(uri="https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3")

if content_id is not None:
    print(f'Ingested content [{content_id}]:')

Deleted all contents, conversations and specifications.
Ingested content [1af47cc7-4f4a-4d52-ade3-bc4e5de61ab7]:


In [46]:
    # Specify the RAG prompt
    prompt = "In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval."

Create Anthropic Claude 3.5 Sonnet specification, using chunk-level retrieval.

In [47]:
    specification_id = await create_anthropic_specification(enums.AnthropicModels.CLAUDE_3_5_SONNET, enums.RetrievalStrategyTypes.CHUNK, False, False)

    if specification_id is not None:
        print(f'Created specification [{specification_id}].')

        conversation_id = await create_conversation(specification_id)

        if conversation_id is not None:
            print(f'Created conversation [{conversation_id}].')

            message = await prompt_conversation(conversation_id, prompt)

            if message is not None:
                display(Markdown('### Conversation:'))
                display(Markdown(f'**User:**\n{prompt}'))
                display(Markdown(f'**Assistant:**\n{message}'))
                print()

            await delete_conversation(conversation_id)

Created specification [e7996f18-3467-428f-a854-31586b810459].
Created conversation [202f3900-a149-4679-8271-f99372123a16].


### Conversation:

**User:**
In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval.

**Assistant:**
Unstructured data refers to information that doesn't fit neatly into traditional database structures. It encompasses a wide range of file-based content such as documents, images, audio, and video. While these files have their own internal structure, the term 'unstructured' is used to differentiate them from more rigidly organized data types. Unstructured data often contains rich, contextual information that can be extremely valuable when properly analyzed and linked to other data sources.

The process of extracting insights from unstructured data involves multiple layers of analysis. First-order metadata, such as file headers and EXIF data, provides basic information about the file itself. Second-order metadata is derived through techniques like object detection in images or term extraction from documents. Third-order metadata involves making inferences and connections between different pieces of information, often utilizing machine learning and knowledge graph technologies to contextualize the data and link it to real-world entities.

The true power of unstructured data lies in its ability to provide deep, contextual insights when properly analyzed and connected. By using advanced techniques like knowledge graphs, natural language processing, and computer vision, organizations can uncover hidden relationships, trends, and patterns within their data. This allows for more sophisticated search and discovery capabilities, moving beyond simple keyword searches to semantic understanding and relationship-based queries. The result is a rich, interconnected web of information that can drive better decision-making, improve operational efficiency, and unlock new value from previously underutilized data assets.




Create Anthropic Claude 3.5 Sonnet specification, using section-level retrieval.

In [48]:
    specification_id = await create_anthropic_specification(enums.AnthropicModels.CLAUDE_3_5_SONNET, enums.RetrievalStrategyTypes.SECTION, False, False)

    if specification_id is not None:
        print(f'Created specification [{specification_id}].')

        conversation_id = await create_conversation(specification_id)

        if conversation_id is not None:
            print(f'Created conversation [{conversation_id}].')

            message = await prompt_conversation(conversation_id, prompt)

            if message is not None:
                display(Markdown('### Conversation:'))
                display(Markdown(f'**User:**\n{prompt}'))
                display(Markdown(f'**Assistant:**\n{message}'))
                print()

            await delete_conversation(conversation_id)

Created specification [e400e3ae-7a69-4be2-914a-f780d599aaff].
Created conversation [7222738c-ce72-4061-a703-7fd4bb15adda].


### Conversation:

**User:**
In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval.

**Assistant:**
Unstructured data refers to a broad set of file-based information that doesn't fit neatly into traditional structured databases. This includes images, audio, video, documents, emails, 3D geometry, point clouds, and more. While these files do have some inherent structure, they are considered 'unstructured' because their content is not easily queryable or analyzable without additional processing. The value of unstructured data lies in extracting insights and creating searchable metadata from this rich, but often underutilized, information.

To make unstructured data useful, it must be processed to extract metadata and insights. This can be done in stages, starting with 'first-order' metadata inherent in file headers, then moving to 'second-order' metadata derived from content analysis (e.g., object detection in images or transcription of audio), and finally to 'third-order' metadata that involves complex inferences and connections to other data sources. By applying techniques like computer vision, natural language processing, and machine learning, organizations can transform raw unstructured data into valuable, searchable knowledge.

The power of unstructured data lies in its ability to provide context and connections that may not be apparent from structured data alone. By creating knowledge graphs that link various pieces of information across different file types and sources, organizations can uncover trends, relationships, and insights that were previously hidden. This approach allows for more dynamic and flexible data models compared to traditional databases, enabling users to pivot on any entity or relationship and discover new connections. Furthermore, by analyzing historical unstructured data alongside current information, companies can identify long-term trends and make more informed decisions.




Create Anthropic Claude 3.5 Sonnet specification, using content-level retrieval.

In [49]:
    specification_id = await create_anthropic_specification(enums.AnthropicModels.CLAUDE_3_5_SONNET, enums.RetrievalStrategyTypes.CONTENT, False, False)

    if specification_id is not None:
        print(f'Created specification [{specification_id}].')

        conversation_id = await create_conversation(specification_id)

        if conversation_id is not None:
            print(f'Created conversation [{conversation_id}].')

            message = await prompt_conversation(conversation_id, prompt)

            if message is not None:
                display(Markdown('### Conversation:'))
                display(Markdown(f'**User:**\n{prompt}'))
                display(Markdown(f'**Assistant:**\n{message}'))
                print()

            await delete_conversation(conversation_id)

Created specification [32748877-fe7a-4f2c-a9da-e5995f2f7f6c].
Created conversation [647489cd-0543-4e8c-bc9a-42688503680d].


### Conversation:

**User:**
In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval.

**Assistant:**
Unstructured data refers to a broad set of file-based information that doesn't fit neatly into traditional structured databases. This includes images, audio, video, documents, emails, 3D geometry, point clouds, and more. While these files do have some inherent structure, the term 'unstructured' is used to differentiate them from highly organized, tabular data. The value in unstructured data lies in extracting insights and context from the content itself, rather than relying solely on predefined fields or schemas.

To make unstructured data useful, it needs to be processed and enriched with metadata. This can be done in stages, starting with 'first-order' metadata inherent to the file (like EXIF data in images), then applying 'second-order' metadata through techniques like object detection or text analysis, and finally deriving 'third-order' metadata by making inferences and connections to other data sources. This enrichment process allows for more effective searching, categorization, and analysis of the data.

Knowledge graphs are a powerful tool for capturing and retrieving insights from unstructured data. By representing entities and their relationships as nodes and edges in a graph, it becomes possible to create dynamic, flexible data models that can evolve over time. This approach allows for complex queries and the discovery of non-obvious connections between different pieces of information. Additionally, knowledge graphs can be continually enriched as new data is processed, creating a growing web of interconnected knowledge.

The true power of unstructured data analysis lies in its ability to uncover hidden patterns and insights across large volumes of information. By applying techniques like computer vision, natural language processing, and machine learning, it's possible to extract valuable context from seemingly disparate sources. This can lead to improved decision-making, trend analysis, and the ability to surface relevant information that might otherwise remain hidden in the 'dark data' of an organization's archives.




Create Anthropic Claude 3.5 Sonnet specification, optimizing the prompt for semantic search.

In [50]:
    specification_id = await create_anthropic_specification(enums.AnthropicModels.CLAUDE_3_5_SONNET, enums.RetrievalStrategyTypes.SECTION,
                                                            False, False, None, True, enums.PromptStrategyTypes.OPTIMIZE_SEARCH)

    if specification_id is not None:
        print(f'Created specification [{specification_id}].')

        conversation_id = await create_conversation(specification_id)

        if conversation_id is not None:
            print(f'Created conversation [{conversation_id}].')

            message = await prompt_conversation(conversation_id, prompt)

            if message is not None:
                display(Markdown('### Conversation:'))
                display(Markdown(f'**User:**\n{prompt}'))
                display(Markdown(f'**Assistant:**\n{message}'))
                print()

            await delete_conversation(conversation_id)

Created specification [3e59ef0b-b792-4f9a-b30f-c938fe5a3a06].
Created conversation [31949888-41cc-4065-a0c3-3b9a91b9957a].


### Conversation:

**User:**
In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval.

**Assistant:**
Unstructured data refers to a broad set of file-based information that doesn't fit neatly into traditional database structures. This includes images, audio, video, 3D geometry, point clouds, documents, and emails. While these files have internal structures, they're considered 'unstructured' because their content isn't easily queryable or analyzable without additional processing. The value in unstructured data lies in extracting insights and context from this rich, but often underutilized, information.

To make unstructured data useful, it's processed through multiple layers of analysis. First-order metadata comes from file headers and embedded information. Second-order metadata is derived from content analysis, like object detection in images or term extraction from documents. Third-order metadata involves making inferences and connections between pieces of data, often using machine learning techniques. This layered approach allows for increasingly sophisticated understanding and contextualization of the information.

The power of unstructured data for knowledge capture and retrieval comes from its ability to form rich, interconnected knowledge graphs. By analyzing content, extracting entities (like people, places, and things), and establishing relationships between them, organizations can create powerful systems for data discovery and insight generation. This approach allows for dynamic, flexible querying of information across diverse data types and sources, enabling users to uncover non-obvious connections and trends that might be missed in more traditional, siloed data approaches.




Create Anthropic Claude 3.5 Sonnet specification, rewriting the prompt for clarity.

In [51]:
    specification_id = await create_anthropic_specification(enums.AnthropicModels.CLAUDE_3_5_SONNET, enums.RetrievalStrategyTypes.SECTION,
                                                            False, False, None, True, enums.PromptStrategyTypes.REWRITE)

    if specification_id is not None:
        print(f'Created specification [{specification_id}].')

        conversation_id = await create_conversation(specification_id)

        if conversation_id is not None:
            print(f'Created conversation [{conversation_id}].')

            message = await prompt_conversation(conversation_id, prompt)

            if message is not None:
                display(Markdown('### Conversation:'))
                display(Markdown(f'**User:**\n{prompt}'))
                display(Markdown(f'**Assistant:**\n{message}'))
                print()

            await delete_conversation(conversation_id)

Created specification [a6931a32-596f-4804-86a0-e0ec9c273205].
Created conversation [b40c5418-e1b8-412d-9136-25194bb0470b].


### Conversation:

**User:**
In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval.

**Assistant:**
Unstructured data encompasses a wide range of file-based information, including images, audio, 3D geometry, point clouds, documents, and emails. While these files have defined structures and formats, they are considered 'unstructured' because their content is not easily parsed or analyzed without additional processing. This data often contains valuable information that is not immediately apparent or searchable, making it challenging to extract insights and knowledge. Unstructured data can be categorized into first-order metadata (basic file information), second-order metadata (content-derived information), and third-order metadata (inferred connections and contextualization).

The significance of unstructured data lies in its potential for knowledge capture and retrieval. By applying advanced processing techniques such as computer vision, natural language processing, and machine learning, organizations can extract meaningful insights from their unstructured data. This process involves creating knowledge graphs that connect various pieces of information, allowing for more comprehensive analysis and discovery. For example, in the geospatial industry, unstructured data from drone imagery, maintenance reports, and sensor readings can be combined to create a rich contextual understanding of physical assets and their conditions over time.

Effective utilization of unstructured data can bring substantial benefits to various industries. In facility management, analyzing images and documents related to equipment maintenance can help predict failures and optimize operations. For urban planning, processing citizen-submitted photos of infrastructure issues can improve resource allocation. In the oil and gas sector, combining unstructured data from various sources can enhance exploration and production decisions. The key to unlocking the value of unstructured data lies in advanced indexing, contextualizing, and searching capabilities that go beyond traditional methods, enabling organizations to discover patterns, trends, and insights that were previously hidden in their vast data repositories.




Create OpenAI GPT-4o specification, with Cohere reranking.

In [52]:
    specification_id = await create_openai_specification(enums.OpenAIModels.GPT4O_128K, enums.RetrievalStrategyTypes.CHUNK, True, False)

    if specification_id is not None:
        print(f'Created specification [{specification_id}].')

        conversation_id = await create_conversation(specification_id)

        if conversation_id is not None:
            print(f'Created conversation [{conversation_id}].')

            message = await prompt_conversation(conversation_id, prompt)

            if message is not None:
                display(Markdown('### Conversation:'))
                display(Markdown(f'**User:**\n{prompt}'))
                display(Markdown(f'**Assistant:**\n{message}'))
                print()

            await delete_conversation(conversation_id)

Created specification [e97bfd20-87c9-4ec4-a204-50cc5402c79c].
Created conversation [34614c82-750e-44c4-96e2-ae961f248bd2].


### Conversation:

**User:**
In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval.

**Assistant:**
Unstructured data refers to information that doesn't have a predefined data model or isn't organized in a pre-defined manner. This includes a wide variety of data types such as images, audio files, videos, documents, emails, and 3D models. Unlike structured data, which is highly organized and easily searchable in databases, unstructured data is more complex and requires advanced methods to parse and analyze. Despite its complexity, unstructured data is incredibly valuable because it encompasses a vast amount of information that can provide deep insights when properly analyzed.

One of the primary uses of unstructured data is in the field of knowledge capture and retrieval. By extracting insights from unstructured data, organizations can create a comprehensive knowledge base that includes various forms of information about real-world assets. For example, images and videos captured by drones, robots, or mobile phones can be analyzed to provide detailed information about physical environments. This data can then be used for tasks such as property inspections, maintenance reports, and even historical trend analysis.

The process of making unstructured data useful often involves the extraction of metadata, which provides context and makes the data searchable. Metadata can include information such as the date and time a file was created, the location where an image was taken, or the author of a document. This first-order metadata is crucial for organizing and retrieving unstructured data. Advanced techniques like sentiment analysis can also be applied to documents to understand the underlying emotions or opinions, adding another layer of context to the data.

Unstructured data is particularly useful for creating knowledge graphs, which are networks of interconnected data points that represent real-world entities like people, places, and things. These graphs can be used to contextualize data, making it easier to understand relationships and draw meaningful conclusions. For instance, a knowledge graph could link various documents, images, and videos related to a specific project, providing a holistic view that aids in decision-making and strategic planning.

Despite its potential, managing unstructured data comes with challenges. Organizations often struggle with data management, especially when dealing with large volumes of data stored in disparate systems. However, by leveraging cloud-native solutions and advanced data management platforms, companies can overcome these challenges. These platforms not only help in storing and organizing unstructured data but also provide tools for advanced analytics, making it easier to extract valuable insights and improve operational efficiency.




Create OpenAI GPT-4o specification, *without* Cohere reranking.

In [53]:
    specification_id = await create_openai_specification(enums.OpenAIModels.GPT4O_128K, enums.RetrievalStrategyTypes.CHUNK, False, False)

    if specification_id is not None:
        print(f'Created specification [{specification_id}].')

        conversation_id = await create_conversation(specification_id)

        if conversation_id is not None:
            print(f'Created conversation [{conversation_id}].')

            message = await prompt_conversation(conversation_id, prompt)

            if message is not None:
                display(Markdown('### Conversation:'))
                display(Markdown(f'**User:**\n{prompt}'))
                display(Markdown(f'**Assistant:**\n{message}'))
                print()

            await delete_conversation(conversation_id)

Created specification [7304bd37-e9f6-4c7b-9cbd-834f08d2c36f].
Created conversation [a9bc2ebb-e11e-4c51-a4bd-7221b8ed837c].


### Conversation:

**User:**
In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval.

**Assistant:**
Unstructured data refers to information that doesn't have a predefined data model or isn't organized in a predefined manner. This includes a wide variety of data types such as text documents, images, videos, audio files, and even 3D models. Unlike structured data, which is highly organized and easily searchable in databases, unstructured data is more complex and requires advanced methods to parse and analyze. Despite its complexity, unstructured data is incredibly valuable because it represents the vast majority of data generated today.

The usefulness of unstructured data lies in its ability to capture a wide range of information that structured data cannot. For example, images and videos can provide visual context that text alone cannot convey. Audio recordings can capture nuances in speech and tone, while 3D models can represent physical spaces and objects in a way that 2D data cannot. This richness of information makes unstructured data invaluable for applications that require a deep understanding of context, such as geospatial analysis, sentiment analysis, and real-time monitoring of physical assets.

One of the key challenges with unstructured data is making it searchable and retrievable. This is where metadata comes into play. Metadata is data about data, and it helps in organizing and categorizing unstructured data. For instance, EXIF metadata in images can provide information about the camera settings and location where the photo was taken. By extracting and analyzing metadata, it's possible to create a more structured representation of unstructured data, making it easier to search and retrieve relevant information.

Advanced techniques such as machine learning and knowledge graphs are often employed to extract insights from unstructured data. Machine learning algorithms can be used to perform tasks like object detection in images, speech recognition in audio files, and entity recognition in text documents. Knowledge graphs, on the other hand, help in creating relationships between different pieces of data, enabling a more holistic understanding of the information. These techniques allow for the creation of a 'network effect' where the value of data increases as more connections are made.

In practical applications, unstructured data can be used for a variety of purposes. For example, in the field of geospatial analysis, unstructured data from drones, satellites, and mobile devices can be used to create detailed maps and models of physical spaces. In the business world, unstructured data can be used for sentiment analysis to gauge customer opinions and feedback. By making unstructured data searchable and retrievable, organizations can unlock valuable insights that were previously hidden, leading to better decision-making and more efficient operations.
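
The prompt asks for 3-5 paragraphs, so one quick way to compare the responses above is to tally paragraphs and words per strategy. The following is a minimal sketch, assuming the responses are collected by hand into a hypothetical `responses` dict keyed by strategy label, and that paragraphs are separated by blank lines:

```python
import re
from typing import Dict, Tuple

def summarize_response(message: str) -> Tuple[int, int]:
    """Return (paragraph_count, word_count), splitting paragraphs on blank lines."""
    paragraphs = [p for p in re.split(r"\n\s*\n", message) if p.strip()]
    return len(paragraphs), len(message.split())

def within_requested_length(message: str, minimum: int = 3, maximum: int = 5) -> bool:
    """Check whether the paragraph count falls inside the range the prompt asked for."""
    paragraph_count, _ = summarize_response(message)
    return minimum <= paragraph_count <= maximum

# Hypothetical placeholder text; substitute the messages returned by prompt_conversation
responses: Dict[str, str] = {
    "chunk": "First paragraph.\n\nSecond paragraph.\n\nThird paragraph.",
    "section": "Only one paragraph.",
}

for label, message in responses.items():
    paragraphs, words = summarize_response(message)
    in_range = within_requested_length(message)
    print(f"{label}: {paragraphs} paragraphs, {words} words, in range: {in_range}")
```

Counts like these only describe length and format compliance. They say nothing about accuracy or relevance, which still require reading the responses side by side.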




Create DeepSeek Chat specification, with *no* revisions.

In [54]:
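    # Positional flags (assumed to follow create_anthropic_specification): enableRerank=False, enableRevision=False
    specification_id = await create_deepseek_specification(enums.DeepseekModels.CHAT, enums.RetrievalStrategyTypes.SECTION, False, False)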

    if specification_id is not None:
        print(f'Created specification [{specification_id}].')

        conversation_id = await create_conversation(specification_id)

        if conversation_id is not None:
            print(f'Created conversation [{conversation_id}].')

            message = await prompt_conversation(conversation_id, prompt)

            if message is not None:
                display(Markdown('### Conversation:'))
                display(Markdown(f'**User:**\n{prompt}'))
                display(Markdown(f'**Assistant:**\n{message}'))
                print()

            await delete_conversation(conversation_id)

Created specification [ca47e463-de6c-4fac-a565-4c6738d34244].
Created conversation [9632b580-68ef-4335-a262-c5b4b6e4c971].


### Conversation:

**User:**
In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval.

**Assistant:**
Unstructured data refers to a broad category of data that includes various formats such as images, audio, 3D models, point clouds, documents, and emails. Unlike structured data, which is organized in a predefined manner, unstructured data lacks a specific schema or format, making it challenging to process and analyze directly. However, this diversity of formats also means that unstructured data can capture a wealth of information from different sources, including real-world assets like drones, robots, and mobile phones.

The usefulness of unstructured data lies in its ability to provide context and insights that structured data often misses. For instance, images and videos can capture visual information that is crucial for understanding spatial relationships and physical conditions. Audio recordings can capture conversations and sentiments that are valuable for sentiment analysis and knowledge graphs. Documents and emails can provide textual data that can be analyzed for trends, patterns, and relationships.

Knowledge capture and retrieval from unstructured data can be enhanced through the use of metadata, which provides additional context and structure. First-order metadata includes basic file information, while second-order metadata involves extracting data from the content itself, such as object detection in images. Third-order metadata involves inferring relationships and context, such as linking an image of a conveyor belt to its maintenance records in a database. This hierarchical approach allows for a more comprehensive understanding of the data.

Unstructured data is particularly valuable in creating knowledge graphs, which can represent complex relationships and entities in a dynamic and flexible manner. By connecting different data sources and formats, knowledge graphs can provide a semantic search capability that goes beyond simple keyword searches. This allows for more accurate and relevant information retrieval, especially in domains like geospatial data, where context and relationships are critical.

In summary, unstructured data, despite its inherent challenges, offers significant potential for knowledge capture and retrieval. Through the use of metadata and advanced techniques like knowledge graphs, unstructured data can be transformed into actionable insights, enabling better decision-making and problem-solving across various industries.




Create DeepSeek Chat specification, with one revision.

In [55]:
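    # Positional flags (assumed to follow create_anthropic_specification): enableRerank=False, enableRevision=True, revisionCount=1
    specification_id = await create_deepseek_specification(enums.DeepseekModels.CHAT, enums.RetrievalStrategyTypes.SECTION, False, True, 1)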

    if specification_id is not None:
        print(f'Created specification [{specification_id}].')

        conversation_id = await create_conversation(specification_id)

        if conversation_id is not None:
            print(f'Created conversation [{conversation_id}].')

            message = await prompt_conversation(conversation_id, prompt)

            if message is not None:
                display(Markdown('### Conversation:'))
                display(Markdown(f'**User:**\n{prompt}'))
                display(Markdown(f'**Assistant:**\n{message}'))
                print()

            await delete_conversation(conversation_id)

Created specification [efc6dc20-2b77-46bd-9a4f-67918f96a4d0].
Created conversation [0f641047-f207-4ddd-ae39-56195ee5e1cb].


### Conversation:

**User:**
In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval.

**Assistant:**
Unstructured data encompasses a vast array of formats, including images, audio, 3D models, point clouds, documents, and emails, each lacking a predefined schema. This diversity makes it challenging to process directly but also captures a wealth of information from various sources such as drones, robots, and mobile phones. For instance, Kirk Marple, founder of Unstruct Data, has been dealing with geospatial data since his first job, highlighting the long-term value of such data in the media and software development sectors.

The usefulness of unstructured data is profound, particularly in providing context and insights that structured data often misses. Metadata, such as GPS location or timestamps, can be crucial for geospatial analysis. Additionally, second and third-order metadata, such as object detection in images or linking data to external databases, significantly enhance its value. For example, an aerial survey company might use photogrammetry to capture detailed images but struggle with data management, storing it on SharePoint without a robust search capability.

Knowledge capture from unstructured data involves extracting meaningful information through techniques like computer vision, natural language processing, and machine learning. This process identifies entities, relationships, and trends within the data, which can then be organized into knowledge graphs. These graphs enable semantic search and discovery, allowing users to find relevant information quickly. For instance, a knowledge graph can link a photo of a conveyor belt to its maintenance history, providing a comprehensive view of the asset.

Unstructured data is invaluable in industries relying on real-world assets and operations, such as geospatial, healthcare, and manufacturing. By leveraging unstructured data, organizations can gain insights into their operations, improve decision-making, and enhance predictive analytics. In healthcare, unstructured data from X-rays and patient records can be analyzed to detect patterns and improve diagnoses. In manufacturing, data from IoT devices and maintenance reports can optimize production processes.

The potential of unstructured data lies in its ability to contextualize and enrich information, transforming it into actionable knowledge. For example, Unstruct Data aims to create a knowledge hub for real-world assets in enterprise settings, linking as-built documentation with current data to provide a gold mine of insights. This approach not only enhances data discovery but also creates a semantic search capability, moving beyond simple file name or full-text searches.

In summary, unstructured data, despite its complexity, offers significant potential for knowledge capture and retrieval. By integrating various data sources and applying advanced analytics, organizations can unlock valuable insights that drive innovation and efficiency. The key lies in the ability to contextualize and enrich unstructured data, transforming it into actionable knowledge that supports strategic decision-making and operational improvements.




Create DeepSeek Chat specification, with two revisions.

In [56]:
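    # Positional flags (assumed to follow create_anthropic_specification): enableRerank=False, enableRevision=True, revisionCount=2
    specification_id = await create_deepseek_specification(enums.DeepseekModels.CHAT, enums.RetrievalStrategyTypes.SECTION, False, True, 2)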

    if specification_id is not None:
        print(f'Created specification [{specification_id}].')

        conversation_id = await create_conversation(specification_id)

        if conversation_id is not None:
            print(f'Created conversation [{conversation_id}].')

            message = await prompt_conversation(conversation_id, prompt)

            if message is not None:
                display(Markdown('### Conversation:'))
                display(Markdown(f'**User:**\n{prompt}'))
                display(Markdown(f'**Assistant:**\n{message}'))
                print()

            await delete_conversation(conversation_id)

Created specification [8b1b12ad-e89c-41dc-95d8-656cd63ca7d3].
Created conversation [88036326-3fcb-4919-a0f1-caf38642ae6d].


### Conversation:

**User:**
In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval.

**Assistant:**
Unstructured data, a term often used to describe a broad array of data formats including images, audio, 3D models, point clouds, documents, and emails, lacks a predefined schema. This diversity of formats originates from various sources such as drones, robots, and mobile phones, which capture real-world assets and activities. For instance, an aerial survey company might use drones to capture high-resolution images for photogrammetry, yet face challenges in managing this data effectively, often storing files on SharePoint without leveraging cloud-native capabilities or robust search functionalities.

The utility of unstructured data lies in its rich context and potential insights. Metadata, such as EXIF data in images or timestamps in audio files, provides essential context. Advanced techniques like computer vision and natural language processing can further extract insights, such as identifying objects in images or sentiment in documents, enriching the data with higher-order metadata. For example, a system can analyze a photo of a conveyor belt to identify its components and link this information to maintenance reports and sensor data, offering a comprehensive view of the asset's condition.

Knowledge graphs play a pivotal role in harnessing unstructured data by creating dynamic networks of interconnected data points. These graphs link entities and relationships, facilitating the discovery of trends, patterns, and correlations. For instance, a knowledge graph can connect a recorded Zoom meeting about an inspection to related maintenance reports and sensor data, providing a holistic view of the asset's history and current state. This approach is particularly valuable in geospatial analysis, where data from drones, satellites, and IoT devices can be integrated to monitor real-world assets and environments.

Unstructured data is also invaluable for historical analysis and trend detection. By indexing and analyzing large volumes of unstructured data, organizations can uncover hidden insights and make informed decisions. For example, an oil and gas company might use this data to monitor equipment conditions over time, identifying patterns that could indicate impending failures. This proactive approach can lead to significant cost savings and operational efficiencies.

In summary, unstructured data, though complex, offers immense potential for knowledge capture and retrieval. Through advanced techniques like metadata extraction, machine learning, and knowledge graphs, unstructured data can be transformed into actionable insights. This enables better decision-making and innovation across various industries, from geospatial analysis to predictive maintenance in industrial settings.


