<a href="https://colab.research.google.com/github/graphlit/graphlit-samples/blob/main/python/Notebook%20Examples/Graphlit_2024_09_30_Publish_Podcast_Guest_Backgrounder.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Description**

This example shows how to ingest podcasts from Azure blob storage and publish a summarized background bio of the podcast guest.

**Requirements**

Prior to running this notebook, you will need to [sign up](https://docs.graphlit.dev/getting-started/signup) for Graphlit, and [create a project](https://docs.graphlit.dev/getting-started/create-project).

You will need the Graphlit organization ID, preview environment ID and JWT secret from your created project.

Assign these properties as Colab secrets: GRAPHLIT_ORGANIZATION_ID, GRAPHLIT_ENVIRONMENT_ID and GRAPHLIT_JWT_SECRET.

Place MP3 recordings of podcasts on Azure blob storage.

Assign these properties as Colab secrets: AZURE_STORAGE_ACCOUNT_NAME and AZURE_STORAGE_ACCESS_KEY.
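
Before running the cells below, it can help to confirm that all five values are present. The following is a minimal sketch, not part of the Graphlit SDK: `missing_settings` and `REQUIRED_SETTINGS` are hypothetical names, and the check assumes the secret values have already been copied into a mapping such as `os.environ`.

```python
import os
from typing import Iterable, List, Mapping

# Names taken from the requirements above.
REQUIRED_SETTINGS = (
    "GRAPHLIT_ORGANIZATION_ID",
    "GRAPHLIT_ENVIRONMENT_ID",
    "GRAPHLIT_JWT_SECRET",
    "AZURE_STORAGE_ACCOUNT_NAME",
    "AZURE_STORAGE_ACCESS_KEY",
)

def missing_settings(
    settings: Mapping[str, str],
    required: Iterable[str] = REQUIRED_SETTINGS,
) -> List[str]:
    """Return the names in `required` that are absent or blank in `settings`."""
    return [name for name in required if not (settings.get(name) or "").strip()]

# Hypothetical usage: report anything that still needs to be configured.
missing = missing_settings(os.environ)
if missing:
    print("Missing settings: " + ", ".join(missing))
```

Reporting every missing name at once avoids fixing the secrets one failed cell at a time.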


---

Install Graphlit Python client SDK

In [1]:
!pip install --upgrade graphlit-client

Collecting graphlit-client
  Downloading graphlit_client-1.0.20241228002-py3-none-any.whl.metadata (3.2 kB)
Downloading graphlit_client-1.0.20241228002-py3-none-any.whl (236 kB)
Installing collected packages: graphlit-client
Successfully installed graphlit-client-1.0.20241228002


Initialize Graphlit

In [2]:
import os
from google.colab import userdata
from graphlit import Graphlit
from graphlit_api import input_types, enums, exceptions

os.environ['GRAPHLIT_ORGANIZATION_ID'] = userdata.get('GRAPHLIT_ORGANIZATION_ID')
os.environ['GRAPHLIT_ENVIRONMENT_ID'] = userdata.get('GRAPHLIT_ENVIRONMENT_ID')
os.environ['GRAPHLIT_JWT_SECRET'] = userdata.get('GRAPHLIT_JWT_SECRET')

graphlit = Graphlit()

Define Graphlit helper functions

In [3]:
from typing import List, Optional

async def create_specification(model: enums.OpenAIModels):
    if graphlit.client is None:
        return None

    input = input_types.SpecificationInput(
        name=f"OpenAI [{model}]",
        type=enums.SpecificationTypes.EXTRACTION,
        serviceType=enums.ModelServiceTypes.OPEN_AI,
        openAI=input_types.OpenAIModelPropertiesInput(
            model=model
        )
    )

    try:
        response = await graphlit.client.create_specification(input)

        return response.create_specification.id if response.create_specification is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def create_feed(account_name: str, container_name: str, storage_key: str, prefix: str, workflow_id: Optional[str] = None, read_limit: Optional[int] = None):
    if graphlit.client is None:
        return None

    input = input_types.FeedInput(
        name='Azure blob storage',
        type=enums.FeedTypes.SITE,
        site=input_types.SiteFeedPropertiesInput(
            type=enums.FeedServiceTypes.AZURE_BLOB,
            isRecursive=False,
            azureBlob=input_types.AzureBlobFeedPropertiesInput(
                accountName=account_name,
                containerName=container_name,
                storageAccessKey=storage_key,
                prefix=prefix
            ),
            readLimit=read_limit
        ),
        workflow=input_types.EntityReferenceInput(
            id=workflow_id
        ) if workflow_id is not None else None
    )

    try:
        response = await graphlit.client.create_feed(input)

        return response.create_feed.id if response.create_feed is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def is_feed_done(feed_id: str):
    if graphlit.client is None:
        return None

    response = await graphlit.client.is_feed_done(feed_id)

    return response.is_feed_done.result if response.is_feed_done is not None else None

async def get_content(content_id: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.get_content(content_id)

        return response.content
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def publish_content(summary_specification_id: str, publish_specification_id: str, summary_prompt: str, publish_prompt: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.publish_contents(
            name="Published Summary",
            connector=input_types.ContentPublishingConnectorInput(
               type=enums.ContentPublishingServiceTypes.TEXT,
               format=enums.ContentPublishingFormats.MARKDOWN,
            ),
            summary_prompt=summary_prompt,
            summary_specification=input_types.EntityReferenceInput(
                id=summary_specification_id
            ),
            publish_prompt=publish_prompt,
            publish_specification=input_types.EntityReferenceInput(
                id=publish_specification_id
            ),
            is_synchronous=True
        )

        return response.publish_contents.content if response.publish_contents is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def delete_all_specifications():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_specifications(is_synchronous=True)

async def delete_all_contents():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_contents(is_synchronous=True)

async def delete_all_feeds():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_feeds(is_synchronous=True)


Execute Graphlit example

In [4]:
from IPython.display import display, Markdown, HTML
import time

# Remove any existing feeds and contents; only needed for notebook example
await delete_all_feeds()
await delete_all_contents()

print('Deleted all feeds and contents.')

# NOTE: point to an Azure blob container with MP3 recordings of podcasts
container_name = 'samples'
prefix = 'Podcasts/'

account_name = userdata.get('AZURE_STORAGE_ACCOUNT_NAME')
storage_access_key = userdata.get('AZURE_STORAGE_ACCESS_KEY')

feed_id = await create_feed(account_name, container_name, storage_access_key, prefix)

if feed_id is not None:
    print(f'Created feed [{feed_id}].')

    # Wait for feed to complete, since ingestion happens asynchronously
    done = False
    time.sleep(5)
    while not done:
        done = await is_feed_done(feed_id)

        if not done:
            time.sleep(2)

    print(f'Completed feed [{feed_id}].')


Deleted all feeds and contents.
Created feed [35d2267e-ed38-452a-b61d-16c3d15539ce].
Completed feed [35d2267e-ed38-452a-b61d-16c3d15539ce].
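
The polling loop above waits indefinitely and uses `time.sleep`, which blocks the notebook's event loop while it waits. As a hedged alternative, the sketch below polls with `asyncio.sleep` and gives up after a timeout. `wait_until_done` is a hypothetical helper, not part of the Graphlit SDK, and the default interval and timeout are assumptions chosen for illustration.

```python
import asyncio
import time
from typing import Awaitable, Callable, Optional

async def wait_until_done(
    check: Callable[[], Awaitable[Optional[bool]]],
    interval: float = 2.0,
    timeout: float = 600.0,
) -> bool:
    """Poll `check` until it returns True; return False if `timeout` seconds elapse first."""
    deadline = time.monotonic() + timeout
    while True:
        # A falsy result (False or None) means "not done yet".
        if await check():
            return True
        if time.monotonic() >= deadline:
            return False
        # Yield to the event loop instead of blocking it.
        await asyncio.sleep(interval)

# Hypothetical usage in a notebook cell, with the helper defined earlier:
# done = await wait_until_done(lambda: is_feed_done(feed_id))
```

Returning `False` on timeout lets the caller decide whether to retry, report the stalled feed, or continue with whatever content has been ingested so far.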


In [5]:
# Remove any existing specifications; only needed for notebook example

await delete_all_specifications()

print('Deleted all specifications.')

# Configure details about podcast guest
guest_name = 'Kirk Marple'
guest_first_name = 'Kirk'
company_name = 'Graphlit'
guest_pronoun = 'his'

# Configure the summary prompt to extract key details about the podcast guest you're writing about
summary_prompt = f"""You are being provided the transcript of a podcast, where {guest_name} was a guest.
Focus on any details that {guest_first_name} talks about, especially {guest_pronoun} professional background, and {guest_pronoun} vision for starting the company named {company_name}.
This information will be used to compile a detailed backgrounder about {guest_first_name} and {company_name}.
Respond with 25 verbose bullet points covering all relevant details. Be specific about any named entities like persons, companies or places.
"""

# Configure the publish prompt to compile the final backgrounder report from the details captured from the podcasts
publish_prompt = f"""
You are responding to a request to write a backgrounder about {guest_name} and {company_name}.
Write a detailed backgrounder report, describing {guest_first_name} and {company_name} in the third-person.
Make sure to cover {guest_first_name}'s early career background, previous companies that were started, and the vision for starting {company_name}.
"""

# Select the model to use for summarization; using GPT-4o Mini because of speed
summary_model = enums.OpenAIModels.GPT4O_MINI_128K
# Select the model to use for publishing; using o1-preview because of the more detailed responses and thought put into them
publish_model = enums.OpenAIModels.O1_PREVIEW_128K

summary_specification_id = await create_specification(summary_model)

if summary_specification_id is not None:
    print(f'Created summary specification [{summary_specification_id}].')

    publish_specification_id = await create_specification(publish_model)

    if publish_specification_id is not None:
        print(f'Created publish specification [{publish_specification_id}].')

        published_content = await publish_content(summary_specification_id, publish_specification_id, summary_prompt, publish_prompt)

        if published_content is not None:
            display(Markdown('### Published summary'))
            display(Markdown(published_content.markdown))

Deleted all specifications.
Created summary specification [940fec79-3fec-4682-a9d0-c9899406f6d7].
Created publish specification [9682b8bf-4272-44d7-a68b-e5b7eb95e1bc].


### Published summary

# Backgrounder: Kirk Marple and Graphlit

Kirk Marple is a seasoned technology leader and entrepreneur with over 25 years of experience in software development, media management, and data analytics. His career spans significant contributions to pioneering technologies and the founding of innovative companies aimed at solving complex data challenges.

## Early Career at Microsoft

After completing his master's degree in computer science at the University of British Columbia, Kirk began his professional journey at Microsoft in the mid-1990s.[^1] During his six-year tenure, he worked on multimedia projects within Microsoft Research, contributing to early advancements in 3D virtual worlds—a precursor to what is now known as the metaverse.[^2] His work at Microsoft laid the foundation for his expertise in image processing, real-time video, and multimedia technologies.

## Founding Radiant Grid and Media Innovations

Following his time at Microsoft, Kirk identified an opportunity in the burgeoning field of digital media. He founded Radiant Grid, a video transcoding and media management company that specialized in the broadcast industry.[^3] Under his leadership, Radiant Grid developed software solutions that were adopted by major broadcasters and studios, including ESPN, PBS, and NBC.[^4] The company's technology played a crucial role in the early days of web and broadcast video, facilitating the transition to digital formats and cloud services.

Radiant Grid's success was marked by its widespread adoption across every PBS station in the United States and its use in high-profile events like the Olympics.[^5] Kirk successfully bootstrapped the company over a decade before it was acquired, demonstrating his ability to lead and grow a tech enterprise from the ground up.

## Transition to Data Analytics at General Motors

After the sale of Radiant Grid, Kirk sought new challenges and joined General Motors (GM), where he worked on data pipelines for Cruise, GM's autonomous vehicle division.[^6] At GM, he applied his media management expertise to automotive data, specifically handling LIDAR and video data for data science applications.[^7] His work involved building architectures and prototypes using cutting-edge technologies like Kafka and Cassandra, which were instrumental in processing the massive amounts of data generated by autonomous vehicles.

## Identifying the Need for Unstructured Data Solutions

Throughout his career, Kirk recognized a recurring gap in the market: the lack of effective tools for managing unstructured data across various industries.[^8] Unstructured data—which includes images, videos, audio files, 3D models, and documents—constitutes a significant portion of data generated by enterprises but often remains underutilized due to the challenges in organizing and analyzing it.

Kirk observed that traditional data management solutions were insufficient for handling unstructured data, which requires different approaches for metadata enrichment, searchability, and integration with machine learning models.[^9] This insight became the catalyst for his next entrepreneurial venture.

## Founding Graphlit: Vision and Mission

In response to the unmet needs he identified, Kirk founded Graphlit (initially known as Unstruk Data) in 2020.[^10] Graphlit is an unstructured data platform designed to help organizations transform their unstructured data into actionable intelligence.[^11] The platform focuses on:

- **Automated Data Preparation**: Enriching metadata through machine learning and artificial intelligence to make data more accessible and meaningful.[^12]
- **Integrated Compute and Graph-Based Search**: Utilizing knowledge graphs to dynamically organize and correlate data, enabling advanced search capabilities across various data types.[^13]
- **Unstructured Data Warehouse**: Providing a scalable, serverless architecture that allows organizations to store, manage, and analyze large volumes of unstructured data efficiently.[^14]

Kirk's vision for Graphlit is to bridge the gap between raw unstructured data and valuable insights. By employing techniques such as metadata enrichment and knowledge graphs, Graphlit enables enterprises to uncover relationships within their data that were previously hidden or too complex to analyze.[^15]

## Impact and Future Endeavors

Under Kirk's leadership, Graphlit is positioning itself as a key player in the evolving landscape of data management. The company's solutions cater to a wide range of industries, including media and entertainment, manufacturing, oil and gas, and autonomous vehicles.[^16] By addressing the challenges of unstructured data, Graphlit empowers organizations to make data-driven decisions, improve operational efficiencies, and unlock new opportunities for innovation.

Kirk continues to lead Graphlit with a focus on customer-centric solutions, leveraging his extensive experience in software development and data analytics to drive the company's mission forward.[^17] His commitment to building a design-led company that prioritizes user experience and technical excellence remains central to Graphlit's ongoing success.

[^1]: "Data Leadership for Everyone" podcast - "Unstructured Data, Metadata, and Graph Search, Oh My! with Kirk Marple"
[^2]: "The Analytic Mind" podcast - "The Importance of Utilizing Unstructured Data with Kirk Marple"
[^3]: "The Founder Pack Podcast With Brendon Rod" - "A Conversation With Kirk Marple, CEO & Founder @ Unstruk Data"
[^4]: "IT Career Energizer" podcast - "Look For The Opportunities To Grow and Don’t Doubt Yourself with Kirk Marple"
[^5]: "Earley AI Podcast" - "It’s All About the Data - Kirk Marple"
[^6]: "Data Engineering Podcast" - "Bring Order To The Chaos Of Your Unstructured Data Assets With Unstruk"
[^7]: "The Modern .NET Show" - "Unstructured Data With Kirk Marple"
[^8]: "Data Science Conversations" podcast - "Enhancing GenAI with Knowledge Graphs: A Deep Dive with Kirk Marple"
[^9]: "Tech Entrepreneur on a Mission" podcast - "On making data actionable - Kirk Marple"
[^10]: "The TWIML AI Podcast" - "GraphRAG: Knowledge Graphs for AI Applications with Kirk Marple"
[^11]: "How AI is Built" podcast - "Knowledge Graphs for Better RAG, Virtual Entities, Hybrid Data Models"
[^12]: "Code Story" podcast - "Kirk Marple, Unstruk Data"
[^13]: "The Thoughtful Entrepreneur" podcast - "Data Management with Unstructured Data"
[^14]: "Silicon Alley" podcast - "Building a Remote First Company & Unlocking Unstructured Data in Your Business"
[^15]: "Discoposse Podcast" - "Kirk Marple of Unstruk Data on the Unstructured Data Challenge and Lessons of a Technical Founder"
[^16]: "Mapscaping Podcast" - "Unstructured Data is Dark Data"
[^17]: "Syntio Podcast" - "Data Platform - Unstructured Data"