<a href="https://colab.research.google.com/github/graphlit/graphlit-samples/blob/main/python/Notebook%20Examples/Graphlit_2024_09_06_Summarize_Podcast.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Description**

This example shows how to ingest a podcast MP3 by URL and summarize it in several formats, including generated chapters.

**Requirements**

Prior to running this notebook, you will need to [sign up](https://docs.graphlit.dev/getting-started/signup) for Graphlit and [create a project](https://docs.graphlit.dev/getting-started/create-project).

You will need the Graphlit organization ID, preview environment ID and JWT secret from your created project.

Assign these properties as Colab secrets: GRAPHLIT_ORGANIZATION_ID, GRAPHLIT_ENVIRONMENT_ID and GRAPHLIT_JWT_SECRET.
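
A blank or misspelled secret tends to surface later as an authentication error, so it can help to check the three names up front. The sketch below is not part of the Graphlit SDK: `missing_settings` and `REQUIRED_SETTINGS` are hypothetical names introduced here for illustration, and the function only inspects whatever mapping it is given (for example `os.environ` once the secrets have been copied into it).

```python
from typing import List, Mapping

# Setting names listed in the Requirements section of this notebook.
REQUIRED_SETTINGS = (
    "GRAPHLIT_ORGANIZATION_ID",
    "GRAPHLIT_ENVIRONMENT_ID",
    "GRAPHLIT_JWT_SECRET",
)


def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required setting names that are absent or blank in env.

    Hypothetical helper for illustration; it is not a Graphlit API.
    """
    return [
        name
        for name in REQUIRED_SETTINGS
        if not (env.get(name) or "").strip()
    ]
```

Calling `missing_settings(os.environ)` after the environment variables are assigned returns an empty list when all three values are present.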


---

Install Graphlit Python client SDK

In [1]:
!pip install --upgrade graphlit-client

Collecting graphlit-client
  Downloading graphlit_client-1.0.20240903001-py3-none-any.whl.metadata (2.7 kB)
Collecting httpx (from graphlit-client)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting websockets (from graphlit-client)
  Downloading websockets-13.0.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting httpcore==1.* (from httpx->graphlit-client)
  Downloading httpcore-1.0.5-py3-none-any.whl.metadata (20 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx->graphlit-client)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Downloading graphlit_client-1.0.20240903001-py3-none-any.whl (197 kB)
Downloading httpx-0.27.2-py3-none-any.whl (76 kB)

Initialize Graphlit

In [2]:
import os
from google.colab import userdata
from graphlit import Graphlit
from graphlit_api import input_types, enums, exceptions

os.environ['GRAPHLIT_ORGANIZATION_ID'] = userdata.get('GRAPHLIT_ORGANIZATION_ID')
os.environ['GRAPHLIT_ENVIRONMENT_ID'] = userdata.get('GRAPHLIT_ENVIRONMENT_ID')
os.environ['GRAPHLIT_JWT_SECRET'] = userdata.get('GRAPHLIT_JWT_SECRET')

graphlit = Graphlit()

Define Graphlit helper functions

In [3]:
from typing import List, Optional

async def ingest_uri(uri: str) -> Optional[str]:
    # Without a configured client there is nothing to ingest
    if graphlit.client is None:
        return None

    try:
        # Using synchronous mode, so the notebook waits for the content to be ingested
        response = await graphlit.client.ingest_uri(uri=uri, is_synchronous=True)

        return response.ingest_uri.id if response.ingest_uri is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def summarize_audio(prompt: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.summarize_contents(
            filter=input_types.ContentFilter(
                # Filter just on audio content
                fileTypes=[enums.FileTypes.AUDIO],
            ),
            summarizations=[
                input_types.SummarizationStrategyInput(
                    type=enums.SummarizationTypes.CHAPTERS,
                ),
                input_types.SummarizationStrategyInput(
                    type=enums.SummarizationTypes.CUSTOM,
                    prompt=prompt
                ),
            ]
        )

        return response.summarize_contents
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def delete_all_contents():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_contents(is_synchronous=True)


Execute Graphlit example

In [5]:
from IPython.display import display, Markdown

# Remove any existing contents; only needed for notebook example
await delete_all_contents()

print('Deleted all contents.')

content_id = await ingest_uri(uri="https://graphlitplatform.blob.core.windows.net/samples/Podcasts/GraphRAG%20Knowledge%20Graphs%20for%20AI%20Applications%20with%20Kirk%20Marple.mp3")

print(f'Ingested content [{content_id}].')

prompt = "What is one key takeaway from this podcast? Explain thoroughly."

summarizations = await summarize_audio(prompt)

if summarizations is not None:
    for summarization in summarizations:
        if summarization is not None and summarization.items is not None:
            print(f'Summarization [{summarization.type}]:')

            for item in summarization.items:
                display(Markdown(item.text))

            print()

Deleted all contents.
Ingested content [a5590f5f-57d1-413c-833c-aa099ca734d5].
Summarization [CHAPTERS]:


[00:00:00] Introduction and Guest Welcome

[00:00:35] Overview of Graphlet and GraphRAG

[00:01:19] Early Days and Evolution of Graphlet

[00:02:03] Data Ingestion and Metadata

[00:03:08] Building a Knowledge Graph

[00:04:48] Entity Extraction Challenges

[00:06:14] Using Different Models for Extraction

[00:08:03] Data Enrichment and Chaining Models

[00:09:02] Ingestion Pipeline and Data Storage

[00:10:52] Schema.org and Entity Relationships

[00:12:22] Data Retrieval and Vector Databases

[00:15:04] Future of Graph and Vector Databases

[00:19:38] Potential of GraphRAG

[00:20:23] Query Processing and Re-ranking

[00:25:00] Evaluation and Relevance

[00:26:46] Dynamic Prompting and Compilation

[00:31:00] Model Selection and Orchestration

[00:33:02] Developer Experience and SDKs

[00:36:08] Use Cases Beyond Chatbots

[00:39:04] Dynamic Content Generation

[00:41:10] Future Directions and Agents

[00:44:15] Platform as a Service and Cloud Integration

[00:45:52] Conclusion and Future Outlook


Summarization [CUSTOM]:


One key takeaway from this podcast is the innovative approach Graphlet is taking to integrate knowledge graphs with Retrieval-Augmented Generation (RAG) for AI applications. Kirk Marple, CEO and founder of Graphlet, explains how their platform focuses on building a comprehensive unstructured data platform that incorporates multimodal data (documents, audio, video) into a knowledge graph. This integration allows for more efficient data retrieval and exploration, which is crucial for AI and machine learning models. Graphlet's approach includes advanced techniques like entity extraction, metadata filtering, and the use of vector databases, making it easier to manage and utilize large volumes of unstructured data. This method not only enhances the retrieval process but also opens up new possibilities for content repurposing and dynamic content generation, moving beyond traditional chatbot applications to more complex and valuable use cases.
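
The `CHAPTERS` output above uses a `[HH:MM:SS] Title` pattern, which makes it straightforward to turn into structured data, for example to build seek links for an audio player. The following is a minimal sketch under that assumption: `Chapter` and `parse_chapters` are hypothetical names introduced here, not part of the Graphlit SDK, and the parser accepts plain text so it works whether the chapters arrive as one block or one line per item.

```python
import re
from typing import List, NamedTuple


class Chapter(NamedTuple):
    start_seconds: int  # offset from the start of the audio
    title: str


# Matches lines such as "[00:02:03] Data Ingestion and Metadata".
_CHAPTER_LINE = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s+(.+)$")


def parse_chapters(text: str) -> List[Chapter]:
    """Parse '[HH:MM:SS] Title' lines; lines that do not match are skipped."""
    chapters: List[Chapter] = []
    for line in text.splitlines():
        match = _CHAPTER_LINE.match(line.strip())
        if match is None:
            continue
        hours, minutes, seconds, title = match.groups()
        start = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        chapters.append(Chapter(start, title.strip()))
    return chapters


# Sample lines copied from the output above.
sample = """[00:00:00] Introduction and Guest Welcome

[00:00:35] Overview of Graphlet and GraphRAG

[00:45:52] Conclusion and Future Outlook"""

for chapter in parse_chapters(sample):
    print(chapter.start_seconds, chapter.title)
```

With the sample lines, the last chapter starts at 2752 seconds (45 minutes and 52 seconds). In the notebook, the same function could be applied to each `item.text` from the `CHAPTERS` summarization.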


