<a href="https://colab.research.google.com/github/graphlit/graphlit-samples/blob/main/python/Notebook%20Examples/Graphlit_2025_02_10_Retrieve_Content_Sources.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Description**

This example shows how to retrieve content sources for use within your own RAG pipeline.

**Requirements**

Prior to running this notebook, you will need to [sign up](https://docs.graphlit.dev/getting-started/signup) for Graphlit and [create a project](https://docs.graphlit.dev/getting-started/create-project).

You will need the Graphlit organization ID, preview environment ID and JWT secret from your created project.

Assign these properties as Colab secrets: GRAPHLIT_ORGANIZATION_ID, GRAPHLIT_ENVIRONMENT_ID and GRAPHLIT_JWT_SECRET.
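
If you run this code outside Colab, a quick check that all three settings are present can save a confusing authentication error later. The sketch below is not part of the Graphlit SDK; `missing_settings` and `REQUIRED_SETTINGS` are hypothetical names, and it assumes the values are exposed as environment variables, as the initialization cell in this notebook does.

```python
import os
from typing import List, Mapping

# The three settings named in the requirements above
REQUIRED_SETTINGS = (
    "GRAPHLIT_ORGANIZATION_ID",
    "GRAPHLIT_ENVIRONMENT_ID",
    "GRAPHLIT_JWT_SECRET",
)


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the names of required settings that are absent or empty."""
    return [name for name in REQUIRED_SETTINGS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_settings(os.environ)
    if missing:
        print("Missing settings: " + ", ".join(missing))
    else:
        print("All Graphlit settings are present.")
```

Passing the mapping in as an argument, rather than reading `os.environ` inside the function, keeps the check easy to exercise with a plain dictionary.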


---

Install Graphlit Python client SDK

In [None]:
!pip install --upgrade graphlit-client

Initialize Graphlit

In [None]:
import os
from google.colab import userdata
from graphlit import Graphlit
from graphlit_api import input_types, enums, exceptions

os.environ['GRAPHLIT_ORGANIZATION_ID'] = userdata.get('GRAPHLIT_ORGANIZATION_ID')
os.environ['GRAPHLIT_ENVIRONMENT_ID'] = userdata.get('GRAPHLIT_ENVIRONMENT_ID')
os.environ['GRAPHLIT_JWT_SECRET'] = userdata.get('GRAPHLIT_JWT_SECRET')

graphlit = Graphlit()

Define Graphlit helper functions

In [None]:
from typing import List, Optional

async def ingest_uri(uri: str):
    if graphlit.client is None:
        return None

    try:
        # Using synchronous mode, so the notebook waits for the content to be ingested
        response = await graphlit.client.ingest_uri(uri=uri, is_synchronous=True)

        return response.ingest_uri.id if response.ingest_uri is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def retrieve_sources(prompt: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.retrieve_sources(
            prompt=prompt,
            retrieval_strategy=input_types.RetrievalStrategyInput(
                type=enums.RetrievalStrategyTypes.SECTION
            ),
            reranking_strategy=input_types.RerankingStrategyInput(
                serviceType=enums.RerankingModelServiceTypes.COHERE
            )
        )

        return response.retrieve_sources.results if response.retrieve_sources is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def delete_all_contents():
    if graphlit.client is None:
        return None

    _ = await graphlit.client.delete_all_contents(is_synchronous=True)
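
The helpers above are coroutines, and the cells in this notebook call them with top-level `await`, which works in Colab and IPython but not in a plain Python script. The following is a minimal sketch of how such helpers could be driven from a script with `asyncio.run`. `stub_retrieve_sources` is a hypothetical stand-in that returns canned data, so the example runs without a Graphlit project; it only mimics the "return `None` on failure" convention of the real helper.

```python
import asyncio
from typing import List, Optional


async def stub_retrieve_sources(prompt: str) -> Optional[List[str]]:
    """Hypothetical stand-in for the notebook's retrieve_sources helper."""
    await asyncio.sleep(0)  # yield to the event loop, as a network call would
    if not prompt:
        return None  # mimic the helper's "return None on failure" convention
    return [f"stub source for: {prompt}"]


async def main(prompt: str) -> List[str]:
    sources = await stub_retrieve_sources(prompt)
    # Normalize None to an empty list so callers can iterate unconditionally
    return sources or []


if __name__ == "__main__":
    # asyncio.run starts its own event loop: use it in scripts, not in a
    # notebook cell, where a loop is already running
    print(asyncio.run(main("What is unstructured data?")))
```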


Execute Graphlit example

In [None]:
from IPython.display import display, Markdown

# Remove any existing contents; only needed for notebook example
await delete_all_contents()

print('Deleted all contents.')

content_id = await ingest_uri(uri="https://graphlitplatform.blob.core.windows.net/samples/Unstructured%20Data%20is%20Dark%20Data%20Podcast.mp3")

if content_id is not None:
    print(f'Ingested content [{content_id}]:')

Retrieve content sources.

In [None]:
# Only retrieve sources if ingestion succeeded in the previous cell
if content_id is not None:
    # Specify the RAG prompt
    prompt = "In 3-5 detailed paragraphs, explain unstructured data and its usefulness for knowledge capture and retrieval."

    sources = await retrieve_sources(prompt)

    if sources is not None:
        print(f'Found [{len(sources)}] content sources.')

        for source in sources:
            if source is not None and source.content is not None:
                display(Markdown(f'## Content [{source.content.id}]: {source.relevance}'))

                print(f'Source type [{source.type}]')

                if source.type == enums.ContentSourceTypes.TRANSCRIPT:
                    print(f'Start [{source.start_time}], end [{source.end_time}]')

                print(source.metadata)
                print()
                print(source.text)
                print()
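
Once the sources are retrieved, a typical next step in your own RAG pipeline is to assemble them into the context of an LLM prompt. The sketch below shows one way this might look. It is not part of the Graphlit SDK: `StubContent` and `StubSource` are hypothetical stand-ins that mirror only the `content.id`, `relevance` and `text` fields used in the cell above, `relevance` is assumed to be numeric, and the character budget is an arbitrary illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class StubContent:
    id: str


@dataclass
class StubSource:
    """Hypothetical stand-in for a retrieved content source."""
    content: Optional[StubContent]
    relevance: Optional[float]
    text: Optional[str]


def build_context(sources: List[Optional[StubSource]], max_chars: int = 4000) -> str:
    """Join source texts, most relevant first, within a character budget."""
    usable = [
        s for s in sources
        if s is not None and s.content is not None and s.text
    ]
    ranked = sorted(usable, key=lambda s: s.relevance or 0.0, reverse=True)

    blocks: List[str] = []
    used = 0
    for source in ranked:
        # Tag each block with its content ID so answers can cite their source
        block = f"[{source.content.id}]\n{source.text.strip()}"
        if used + len(block) > max_chars:
            break
        blocks.append(block)
        used += len(block)
    return "\n\n".join(blocks)


def build_prompt(question: str, sources: List[Optional[StubSource]]) -> str:
    """Combine the retrieved context and the user question into one prompt."""
    context = build_context(sources)
    return (
        "Answer the question using only the sources below.\n\n"
        f"Sources:\n{context}\n\n"
        f"Question: {question}"
    )
```

The resulting string can be passed to whichever LLM your pipeline uses. Sorting by relevance before truncating means the budget is spent on the highest-ranked sources first.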
