<a href="https://colab.research.google.com/github/graphlit/graphlit-samples/blob/main/python/Notebook%20Examples/Graphlit_2025_02_19_Transcribe_Podcast_using_Assembly_AI.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Description**

This example shows how to ingest a podcast MP3 by URL and configure a workflow that uses Assembly.AI for audio transcription.

**Requirements**

Prior to running this notebook, you will need to [sign up](https://docs.graphlit.dev/getting-started/signup) for Graphlit, and [create a project](https://docs.graphlit.dev/getting-started/create-project).

You will need the Graphlit organization ID, preview environment ID and JWT secret from your created project.

Assign these properties as Colab secrets: GRAPHLIT_ORGANIZATION_ID, GRAPHLIT_ENVIRONMENT_ID and GRAPHLIT_JWT_SECRET.
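
A quick sanity check can catch a missing value before the client is initialized. The sketch below is a hypothetical helper, not part of the Graphlit SDK: the names `REQUIRED_SETTINGS` and `missing_settings` are illustrative, and it assumes the three values are available in a dictionary-like mapping under the names listed above.

```python
from typing import List, Mapping

# Setting names taken from the requirements above.
REQUIRED_SETTINGS = (
    "GRAPHLIT_ORGANIZATION_ID",
    "GRAPHLIT_ENVIRONMENT_ID",
    "GRAPHLIT_JWT_SECRET",
)

def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required setting names that are absent or blank in env."""
    return [
        name
        for name in REQUIRED_SETTINGS
        if not (env.get(name) or "").strip()
    ]
```

For instance, with `os` imported, `missing_settings(os.environ)` returns an empty list once all three values have been copied into the environment.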


---

Install Graphlit Python client SDK

In [3]:
!pip install --upgrade graphlit-client



Initialize Graphlit

In [4]:
import os
from google.colab import userdata
from graphlit import Graphlit
from graphlit_api import input_types, enums, exceptions

os.environ['GRAPHLIT_ORGANIZATION_ID'] = userdata.get('GRAPHLIT_ORGANIZATION_ID')
os.environ['GRAPHLIT_ENVIRONMENT_ID'] = userdata.get('GRAPHLIT_ENVIRONMENT_ID')
os.environ['GRAPHLIT_JWT_SECRET'] = userdata.get('GRAPHLIT_JWT_SECRET')

graphlit = Graphlit()

Define Graphlit helper functions

In [7]:
from typing import List, Optional

async def create_workflow():
    if graphlit.client is None:
        return None

    # Preparation workflow that sends audio to Assembly.AI for transcription
    workflow_input = input_types.WorkflowInput(
        name="Audio Preparation",
        preparation=input_types.PreparationWorkflowStageInput(
            jobs=[
                input_types.PreparationWorkflowJobInput(
                    connector=input_types.FilePreparationConnectorInput(
                        type=enums.FilePreparationServiceTypes.ASSEMBLY_AI,
                        assemblyAI=input_types.AssemblyAIAudioPreparationPropertiesInput(
                            model=enums.AssemblyAIModels.BEST,
                            detectLanguage=True
                        )
                    )
                )
            ]
        )
    )

    try:
        response = await graphlit.client.create_workflow(workflow_input)

        return response.create_workflow.id if response.create_workflow is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def ingest_uri(uri: str, workflow_id: Optional[str] = None):
    if graphlit.client is None:
        return None

    # Attach the workflow only when an ID was provided
    workflow = input_types.EntityReferenceInput(id=workflow_id) if workflow_id is not None else None

    try:
        # Using synchronous mode, so the notebook waits for the content to be ingested
        response = await graphlit.client.ingest_uri(uri=uri, workflow=workflow, is_synchronous=True)

        return response.ingest_uri.id if response.ingest_uri is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def get_content(content_id: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.get_content(content_id)

        return response.content
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def delete_content(content_id: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.delete_content(content_id)

        return response.delete_content.id if response.delete_content is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def delete_all_workflows():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_workflows(is_synchronous=True)

async def delete_all_contents():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_contents(is_synchronous=True)

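The transcript returned in `content.markdown` is split into paragraphs that each start with an `[HH:MM:SS]` offset, as the output of this notebook shows. A minimal sketch of a helper that turns that text into `(offset_seconds, text)` pairs follows. The function name `parse_transcript` is hypothetical and not part of the Graphlit SDK, and the parsing assumes the marker format seen in this example's output.

```python
import re
from typing import List, Tuple

# Matches a leading "[HH:MM:SS]" marker followed by the paragraph text.
_SEGMENT = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*(.*)$", re.DOTALL)

def parse_transcript(markdown: str) -> List[Tuple[int, str]]:
    """Split a timestamped transcript into (offset in seconds, text) pairs."""
    segments: List[Tuple[int, str]] = []
    # Paragraphs are assumed to be separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", markdown):
        match = _SEGMENT.match(paragraph.strip())
        if match is None:
            continue  # Skip paragraphs without a timestamp marker
        hours, minutes, seconds, text = match.groups()
        offset = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
        segments.append((offset, text.strip()))
    return segments
```

Passing `content.markdown` to this helper would give a list that can be searched or sliced by time offset.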

Execute Graphlit example

In [8]:
from IPython.display import display, Markdown

# Remove any existing contents and workflows; only needed for notebook example
await delete_all_contents()
await delete_all_workflows()

print('Deleted all contents and workflows.')

uri = "https://graphlitplatform.blob.core.windows.net/samples/Podcasts/GraphRAG%20Knowledge%20Graphs%20for%20AI%20Applications%20with%20Kirk%20Marple.mp3"

workflow_id = await create_workflow()

if workflow_id is not None:
    print(f'Created workflow [{workflow_id}]:')

    content_id = await ingest_uri(uri=uri, workflow_id=workflow_id)

    if content_id is not None:
        print(f'Ingested content [{content_id}] with Assembly.AI:')

        content = await get_content(content_id)

        if content is not None and content.markdown is not None:
            display(Markdown(f'## View [Extracted JSON]({content.text_uri})'))
            print()

            print('-------------------------------------------------------------------')
            print(content.markdown)
            print('-------------------------------------------------------------------')

            await delete_content(content_id)


Deleted all contents and workflows.
Created workflow [4c185f27-807c-406c-b3b3-27e447fdf473]:
Ingested content [570413d5-511f-40e2-941d-367ff4ffb06e] with Assembly.AI:


## View [Extracted JSON](None)


-------------------------------------------------------------------
[00:00:00] Foreign welcome to another episode of the TWIML AI podcast. I am your host Sam Charrington and today I'm joined by Kirk Marple. Kirk is CEO and founder of graphlit. Before we get going, be sure to take a moment to hit that subscribe button wherever you're listening to today's show. Kirk, welcome to the podcast.

[00:00:25] Yeah, thanks so much. I've been a longtime listener and glad to finally be part of this. I'm excited to have you on the show and I'm really looking forward to our topic. We're going to be digging into what you're doing at graphlit, but in particular the broad space of Graph rag. Tell us a little bit about graphlit and kind of how you're approaching RAG as a space.

[00:00:48] Yeah, for sure. I mean we've been around for about three years, had started really building an unstructured data platform for I mean everything, I mean multimodal data, documents, audio, video and really started gett