<a href="https://colab.research.google.com/github/graphlit/graphlit-samples/blob/main/python/Notebook%20Examples/Graphlit_2024_09_30_Publish_Podcast_Guest_Backgrounder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Description**

This example shows how to ingest podcast recordings from Azure Blob Storage and publish a summarized background bio of the podcast guest.

**Requirements**

Prior to running this notebook, you will need to [signup](https://docs.graphlit.dev/getting-started/signup) for Graphlit, and [create a project](https://docs.graphlit.dev/getting-started/create-project).

You will need the Graphlit organization ID, preview environment ID, and JWT secret from the project you created.

Assign these properties as Colab secrets: GRAPHLIT_ORGANIZATION_ID, GRAPHLIT_ENVIRONMENT_ID and GRAPHLIT_JWT_SECRET.

Upload MP3 recordings of the podcasts to a container in Azure Blob Storage.

Assign these properties as Colab secrets: AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCESS_KEY.
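Before creating the feed, it can help to confirm that the container actually holds MP3 files under the prefix you plan to use. The sketch below assumes the `azure-storage-blob` package is installed (`pip install azure-storage-blob`); `list_podcast_mp3s` and `mp3_blobs_under_prefix` are hypothetical helpers, not part of Graphlit. The non-recursive filter is an assumption about how a feed with `isRecursive=False` treats nested virtual folders, not documented Graphlit behavior.

```python
from typing import Iterable, List


def mp3_blobs_under_prefix(blob_names: Iterable[str], prefix: str, recursive: bool = False) -> List[str]:
    """Filter blob names down to MP3 files under `prefix`.

    With recursive=False, blobs inside nested "folders" below the prefix are skipped.
    ASSUMPTION: this mirrors how a non-recursive feed treats virtual directories.
    """
    matches = []
    for name in blob_names:
        if not name.startswith(prefix) or not name.lower().endswith(".mp3"):
            continue
        remainder = name[len(prefix):]
        if not recursive and "/" in remainder:
            continue  # lives in a nested virtual folder
        matches.append(name)
    return matches


def list_podcast_mp3s(account_name: str, container_name: str, access_key: str, prefix: str) -> List[str]:
    """List MP3 blobs in the container with the azure-storage-blob SDK (hypothetical helper)."""
    # Imported lazily so the pure filter above works without the SDK installed
    from azure.storage.blob import ContainerClient

    container = ContainerClient(
        account_url=f"https://{account_name}.blob.core.windows.net",
        container_name=container_name,
        credential=access_key,  # storage account access key
    )
    names = (blob.name for blob in container.list_blobs(name_starts_with=prefix))
    return mp3_blobs_under_prefix(names, prefix)
```

For example, `list_podcast_mp3s(account_name, 'samples', access_key, 'Podcasts/')` lists the MP3 blob names that the notebook's container and prefix settings point at; an empty list suggests the files are missing or sit in a subfolder.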


---

Install the Graphlit Python client SDK

In [1]:
!pip install --upgrade graphlit-client

Collecting graphlit-client
  Downloading graphlit_client-1.0.20240929002-py3-none-any.whl.metadata (2.7 kB)
Collecting httpx (from graphlit-client)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting websockets (from graphlit-client)
  Downloading websockets-13.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.8 kB)
Collecting httpcore==1.* (from httpx->graphlit-client)
  Downloading httpcore-1.0.5-py3-none-any.whl.metadata (20 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx->graphlit-client)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Downloading graphlit_client-1.0.20240929002-py3-none-any.whl (198 kB)
Downloading httpx-0.27.2-py3-none-any.whl (76 kB)

Initialize Graphlit

In [2]:
import os
from google.colab import userdata
from graphlit import Graphlit
from graphlit_api import input_types, enums, exceptions

os.environ['GRAPHLIT_ORGANIZATION_ID'] = userdata.get('GRAPHLIT_ORGANIZATION_ID')
os.environ['GRAPHLIT_ENVIRONMENT_ID'] = userdata.get('GRAPHLIT_ENVIRONMENT_ID')
os.environ['GRAPHLIT_JWT_SECRET'] = userdata.get('GRAPHLIT_JWT_SECRET')

graphlit = Graphlit()
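The cell above reads credentials with `google.colab.userdata`, which only exists inside a Colab runtime. If you run the notebook elsewhere (for example, in a local Jupyter server), a minimal sketch of a hypothetical `get_secret` helper that tries Colab secrets first and falls back to environment variables with the same names:

```python
import os


def get_secret(name: str) -> str:
    """Hypothetical helper: read a secret from Colab, falling back to os.environ.

    Raises KeyError if the secret is not found in either place.
    """
    try:
        # google.colab is only importable inside a Colab runtime
        from google.colab import userdata
    except ImportError:
        userdata = None

    if userdata is not None:
        try:
            value = userdata.get(name)
        except Exception:
            # e.g. the secret is missing or notebook access was not granted
            value = None
        if value:
            return value

    # Outside Colab, expect an environment variable with the same name
    value = os.environ.get(name)
    if not value:
        raise KeyError(f"Secret {name!r} is not set")
    return value
```

Each `userdata.get('...')` call can then become `get_secret('...')`, e.g. `os.environ['GRAPHLIT_JWT_SECRET'] = get_secret('GRAPHLIT_JWT_SECRET')`, so the same cell works in Colab and locally.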

Define Graphlit helper functions

In [3]:
from typing import List, Optional

async def create_specification(model: enums.OpenAIModels):
    if graphlit.client is None:
        return None

    input = input_types.SpecificationInput(
        name=f"OpenAI [{model}]",
        type=enums.SpecificationTypes.EXTRACTION,
        serviceType=enums.ModelServiceTypes.OPEN_AI,
        openAI=input_types.OpenAIModelPropertiesInput(
            model=model
        )
    )

    try:
        response = await graphlit.client.create_specification(input)

        return response.create_specification.id if response.create_specification is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def create_feed(account_name: str, container_name: str, storage_key: str, prefix: str, workflow_id: Optional[str] = None, read_limit: Optional[int] = None):
    if graphlit.client is None:
        return None

    input = input_types.FeedInput(
        name='Azure blob storage',
        type=enums.FeedTypes.SITE,
        site=input_types.SiteFeedPropertiesInput(
            type=enums.FeedServiceTypes.AZURE_BLOB,
            isRecursive=False,
            azureBlob=input_types.AzureBlobFeedPropertiesInput(
                accountName=account_name,
                containerName=container_name,
                storageAccessKey=storage_key,
                prefix=prefix
            ),
            readLimit=read_limit
        ),
        workflow=input_types.EntityReferenceInput(
            id=workflow_id
        ) if workflow_id is not None else None
    )

    try:
        response = await graphlit.client.create_feed(input)

        return response.create_feed.id if response.create_feed is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def is_feed_done(feed_id: str):
    if graphlit.client is None:
        return None

    response = await graphlit.client.is_feed_done(feed_id)

    return response.is_feed_done.result if response.is_feed_done is not None else None

async def get_content(content_id: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.get_content(content_id)

        return response.content
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def publish_content(summary_specification_id: str, publish_specification_id: str, summary_prompt: str, publish_prompt: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.publish_contents(
            name="Published Summary",
            connector=input_types.ContentPublishingConnectorInput(
                type=enums.ContentPublishingServiceTypes.TEXT,
                format=enums.ContentPublishingFormats.MARKDOWN,
            ),
            summary_prompt=summary_prompt,
            summary_specification=input_types.EntityReferenceInput(
                id=summary_specification_id
            ),
            publish_prompt=publish_prompt,
            publish_specification=input_types.EntityReferenceInput(
                id=publish_specification_id
            ),
            is_synchronous=True
        )

        return response.publish_contents
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def delete_all_specifications():
    if graphlit.client is None:
        return None

    _ = await graphlit.client.delete_all_specifications(is_synchronous=True)

async def delete_all_contents():
    if graphlit.client is None:
        return None

    _ = await graphlit.client.delete_all_contents(is_synchronous=True)

async def delete_all_feeds():
    if graphlit.client is None:
        return None

    _ = await graphlit.client.delete_all_feeds(is_synchronous=True)


Execute Graphlit example

In [4]:
from IPython.display import display, Markdown
import asyncio

# Remove any existing feeds and contents; only needed for notebook example
await delete_all_feeds()
await delete_all_contents()

print('Deleted all feeds and contents.')

# NOTE: point to an Azure blob container with MP3 recordings of podcasts
container_name = 'samples'
prefix = 'Podcasts/'

account_name = userdata.get('AZURE_STORAGE_ACCOUNT_NAME')
storage_access_key = userdata.get('AZURE_STORAGE_ACCESS_KEY')

feed_id = await create_feed(account_name, container_name, storage_access_key, prefix)

if feed_id is not None:
    print(f'Created feed [{feed_id}].')

    # Wait for the feed to complete, since ingestion happens asynchronously;
    # asyncio.sleep yields to the event loop instead of blocking it like time.sleep
    done = False
    await asyncio.sleep(5)
    while not done:
        done = await is_feed_done(feed_id)

        if not done:
            await asyncio.sleep(2)

    print(f'Completed feed [{feed_id}].')


Deleted all feeds and contents.
Created feed [e3bdbe9f-1926-4071-81d4-85352c391d5d].
Completed feed [e3bdbe9f-1926-4071-81d4-85352c391d5d].
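The polling loop above keeps checking until `is_feed_done` returns a truthy value, with no upper bound, so a feed that never completes (or an `is_feed_done` call that keeps returning `None`) would hang the cell. A minimal sketch of a hypothetical `wait_until_done` helper that adds a timeout and uses `asyncio.sleep` between checks:

```python
import asyncio
from typing import Awaitable, Callable, Optional


async def wait_until_done(
    check: Callable[[], Awaitable[Optional[bool]]],
    initial_delay: float = 5.0,
    interval: float = 2.0,
    timeout: float = 600.0,
) -> bool:
    """Poll `check` until it returns a truthy value.

    Returns True once `check` reports completion, or False if `timeout`
    seconds pass first. asyncio.sleep keeps the event loop responsive.
    """
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    await asyncio.sleep(initial_delay)
    while True:
        if await check():
            return True
        if loop.time() + interval > deadline:
            return False  # give up rather than poll forever
        await asyncio.sleep(interval)
```

In the notebook, `done = await wait_until_done(lambda: is_feed_done(feed_id))` could stand in for the `while` loop. The 5-second initial delay and 2-second interval mirror the notebook's values; the 600-second default timeout is an arbitrary choice to adjust for larger podcast libraries.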


In [5]:
# Remove any existing specifications; only needed for notebook example

await delete_all_specifications()

print('Deleted all specifications.')

# Configure details about podcast guest
guest_name = 'Kirk Marple'
guest_first_name = 'Kirk'
company_name = 'Graphlit'
guest_pronoun = 'his'

# Configure the summary prompt to extract key details about the podcast guest you're writing about
summary_prompt = f"""You are being provided the transcript of a podcast, where {guest_name} was a guest.
Focus on any details that {guest_first_name} talks about, especially {guest_pronoun} professional background, and {guest_pronoun} vision for starting the company named {company_name}.
This information will be used to compile a detailed backgrounder about {guest_first_name} and {company_name}.
Respond with 25 verbose bullet points covering all relevant details. Be specific about any named entities like persons, companies or places.
"""

# Configure the publish prompt to compile the final backgrounder report from the details captured from the podcasts
publish_prompt = f"""
You are responding to a request to write a backgrounder about {guest_name} and {company_name}.
Write a detailed backgrounder report, describing {guest_first_name} and {company_name} in the third-person.
Make sure to cover {guest_first_name}'s early career background, previous companies that were started, and the vision for starting {company_name}.
"""

# Select the model to use for summarization; using GPT-4o Mini because of speed
summary_model = enums.OpenAIModels.GPT4O_MINI_128K
# Select the model to use for publishing; using o1-preview because it produces more detailed, better-reasoned responses
publish_model = enums.OpenAIModels.O1_PREVIEW_128K

summary_specification_id = await create_specification(summary_model)

if summary_specification_id is not None:
    print(f'Created summary specification [{summary_specification_id}].')

    publish_specification_id = await create_specification(publish_model)

    if publish_specification_id is not None:
        print(f'Created publish specification [{publish_specification_id}].')

        published_content = await publish_content(summary_specification_id, publish_specification_id, summary_prompt, publish_prompt)

        if published_content is not None:
            display(Markdown('### Published summary'))
            display(Markdown(published_content.markdown))

Deleted all specifications.
Created summary specification [84b2d07e-a787-4673-bf6e-733956b7d457].
Created publish specification [8ebafb6e-e699-4b29-bf92-4684cefbbdd8].


### Published summary

# Backgrounder on Kirk Marple and Graphlit

Kirk Marple is a seasoned technology leader with over 25 years of experience in software development and data management. His career spans across various domains, including multimedia systems, unstructured data management, and knowledge graph technologies. As the founder and CEO of Graphlit, Kirk has been at the forefront of innovation in handling unstructured data, aiming to make it more accessible and actionable for businesses across industries.

## Early Career Background

Kirk began his professional journey after earning a degree in computer science from the University of Pennsylvania and a master's degree from the University of British Columbia, where he focused on image processing and real-time video technologies. In 1994, he joined Microsoft, where he worked for six years, including a significant tenure in Microsoft Research.

At Microsoft, Kirk was involved in pioneering projects that laid the groundwork for future technological advancements. He contributed to the development of Blackbird, a multimedia platform for MSN, and worked on 3D virtual worlds, which were precursors to today's metaverse concepts. His work encompassed multimedia technologies, 3D graphics, and the early iterations of Windows Media Player. This period provided Kirk with deep insights into multimedia data, file formats, and the potential of technology to transform how people interact with digital content.

## Previous Companies Founded

After his impactful stint at Microsoft, Kirk channeled his expertise into entrepreneurial ventures. He founded Radiant Grid, a video transcoding and media management company. Radiant Grid specialized in providing advanced transcoding solutions for broadcast and media companies, focusing on both web video and traditional broadcast video. Under his leadership, the company developed software that was adopted by major broadcasters such as ESPN, NBC, Fox, and every PBS station across the United States. Radiant Grid's technology played a crucial role in processing and managing large volumes of video content, aiding in the seamless delivery of media across various platforms.

Kirk successfully bootstrapped Radiant Grid, running it for over a decade. His hands-on approach and deep technical knowledge allowed the company to innovate rapidly and respond to the evolving needs of the media industry. In recognition of its value and impact, Radiant Grid was eventually acquired, marking a significant milestone in Kirk's entrepreneurial journey.

Following the sale of Radiant Grid, Kirk took on executive roles at several technology companies, including positions as Chief Technology Officer (CTO) and Vice President (VP). He also worked at General Motors (GM), where he delved into the automotive industry's data challenges. At GM, Kirk developed data pipelines for autonomous vehicles, specifically for Cruise Automation. His work involved processing vast amounts of data from lidar and video systems, reinforcing the parallels between media data management and the data requirements of autonomous technologies.

## Vision for Starting Graphlit

Kirk's experiences across media, broadcasting, and the automotive industry illuminated a significant gap in the technology landscape: the lack of robust tools and platforms for managing unstructured data. He observed that while structured data had advanced platforms like Fivetran and Snowflake, there was no equivalent for handling the diverse and complex nature of unstructured data, which constitutes a substantial portion of all data generated globally.

Recognizing this unmet need, Kirk founded Graphlit approximately three years ago. His vision for Graphlit was to build an unstructured data platform that could ingest, process, and make sense of various data types—including documents, images, audio, video, and 3D geometry. He aimed to create a solution that would not only store unstructured data but also enrich it through metadata extraction, machine learning, and knowledge graphs.

Graphlit focuses on making unstructured data explorable and actionable. By leveraging knowledge graphs, the platform connects disparate data points, revealing relationships and insights that would otherwise remain hidden. Kirk integrated advanced technologies such as Retrieval-Augmented Generation (RAG) to enhance the platform's capabilities, enabling more accurate and context-aware information retrieval.

His vision extends to providing developers and businesses with tools that simplify the integration of unstructured data into applications, particularly those utilizing large language models (LLMs) and artificial intelligence (AI). Graphlit's platform abstracts the complexity of data management, allowing users to focus on building innovative solutions without worrying about the underlying infrastructure.

Kirk's dedication to solving the unstructured data challenge is driven by his belief that unlocking the value hidden within this data can transform industries. By making unstructured data more accessible, Graphlit empowers organizations to derive meaningful insights, automate processes, and make informed decisions based on a comprehensive understanding of their data assets.

## Conclusion

Kirk Marple's journey is a testament to his commitment to innovation and his ability to identify and address critical gaps in the technology sector. From his early days at Microsoft to his entrepreneurial successes with Radiant Grid and now Graphlit, he has consistently pushed the boundaries of what's possible in software and data management. His vision for Graphlit is poised to revolutionize how organizations handle unstructured data, making it an indispensable asset in the era of big data and AI-driven applications.
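The published backgrounder is only rendered in the notebook output. To keep a copy, a minimal sketch of a hypothetical `save_markdown` helper that writes the Markdown text to a UTF-8 file:

```python
from pathlib import Path


def save_markdown(markdown: str, path: str = "backgrounder.md") -> Path:
    """Write the published Markdown to `path` (creating parent folders) and return the path."""
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)
    out.write_text(markdown, encoding="utf-8")
    return out
```

For example, `path = save_markdown(published_content.markdown, 'backgrounder.md')`; in Colab, `files.download(str(path))` from `google.colab` then downloads the file to your machine.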