<a href="https://colab.research.google.com/github/graphlit/graphlit-samples/blob/main/python/Notebook%20Examples/Graphlit_2024_09_12_Publish_Audio_Review_of_Paper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Description**

This example shows how to ingest a PDF of an academic paper, use Anthropic's Claude 3.5 Sonnet to write a comprehensive review of the paper, and listen to an audio rendition published using an [ElevenLabs](https://elevenlabs.io/) voice.

**Requirements**

Prior to running this notebook, you will need to [sign up](https://docs.graphlit.dev/getting-started/signup) for Graphlit and [create a project](https://docs.graphlit.dev/getting-started/create-project).

You will need the Graphlit organization ID, preview environment ID and JWT secret from your created project.

Assign these properties as Colab secrets: GRAPHLIT_ORGANIZATION_ID, GRAPHLIT_ENVIRONMENT_ID and GRAPHLIT_JWT_SECRET.
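
Before running the rest of the notebook, it can help to confirm that all three values are present. The following is a minimal sketch, not part of the Graphlit SDK: `missing_settings` is a hypothetical helper that checks any mapping (such as `os.environ`) for the three names listed above.

```python
from typing import List, Mapping

# The three setting names come from the requirements above.
REQUIRED_SETTINGS = (
    "GRAPHLIT_ORGANIZATION_ID",
    "GRAPHLIT_ENVIRONMENT_ID",
    "GRAPHLIT_JWT_SECRET",
)

def missing_settings(env: Mapping[str, str]) -> List[str]:
    """Return the required setting names that are absent or blank in env."""
    return [name for name in REQUIRED_SETTINGS if not (env.get(name) or "").strip()]
```

For example, calling `missing_settings(os.environ)` after the secrets have been copied into the environment should return an empty list; any names it returns still need to be assigned.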


---

Install Graphlit Python client SDK

In [1]:
!pip install --upgrade graphlit-client

Collecting graphlit-client
  Downloading graphlit_client-1.0.20240910001-py3-none-any.whl.metadata (2.7 kB)
Collecting httpx (from graphlit-client)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting websockets (from graphlit-client)
  Downloading websockets-13.0.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting httpcore==1.* (from httpx->graphlit-client)
  Downloading httpcore-1.0.5-py3-none-any.whl.metadata (20 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx->graphlit-client)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Downloading graphlit_client-1.0.20240910001-py3-none-any.whl (197 kB)
Downloading httpx-0.27.2-py3-none-any.whl (76 kB)

In [2]:
import os
from google.colab import userdata
from graphlit import Graphlit
from graphlit_api import input_types, enums, exceptions

os.environ['GRAPHLIT_ORGANIZATION_ID'] = userdata.get('GRAPHLIT_ORGANIZATION_ID')
os.environ['GRAPHLIT_ENVIRONMENT_ID'] = userdata.get('GRAPHLIT_ENVIRONMENT_ID')
os.environ['GRAPHLIT_JWT_SECRET'] = userdata.get('GRAPHLIT_JWT_SECRET')

graphlit = Graphlit()

Define Graphlit helper functions

In [3]:
from typing import List, Optional

# Create specification for Anthropic Sonnet 3.5
async def create_specification():
    if graphlit.client is None:
        return None

    input = input_types.SpecificationInput(
        name="Anthropic Claude Sonnet 3.5",
        type=enums.SpecificationTypes.EXTRACTION,
        serviceType=enums.ModelServiceTypes.ANTHROPIC,
        anthropic=input_types.AnthropicModelPropertiesInput(
            model=enums.AnthropicModels.CLAUDE_3_5_SONNET,
        ),
        # NOTE: Optionally, ask the LLM to revise its response, which guarantees a full-length and more detailed response
#        revisionStrategy=input_types.RevisionStrategyInput(
#            type=enums.RevisionStrategyTypes.CUSTOM,
#            customRevision="OK, that's not bad, but it needs more technical depth for this audience. You can do better than this. Reread all the context provided, and revise this into a longer, more thorough and compelling version. Don't mention anything about the revision.",
#            count=1
#        )
    )

    try:
        response = await graphlit.client.create_specification(input)

        return response.create_specification.id if response.create_specification is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def ingest_uri(uri: str):
    if graphlit.client is None:
        return None

    try:
        # Using synchronous mode, so the notebook waits for the content to be ingested
        response = await graphlit.client.ingest_uri(uri=uri, is_synchronous=True)

        return response.ingest_uri.id if response.ingest_uri is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def get_content(content_id: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.get_content(content_id)

        return response.content
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def publish_content(content_id: str, specification_id: str, prompt: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.publish_contents(
            name="Published Summary",
            connector=input_types.ContentPublishingConnectorInput(
               type=enums.ContentPublishingServiceTypes.ELEVEN_LABS_AUDIO,
               format=enums.ContentPublishingFormats.MP3,
               elevenLabs=input_types.ElevenLabsPublishingPropertiesInput(
                   model=enums.ElevenLabsModels.TURBO_V2_5,
                   voice="ZF6FPAbjXT4488VcRRnw" # ElevenLabs Amelia voice
               )
            ),
            summary_specification=input_types.EntityReferenceInput(
                id=specification_id
            ),
            publish_prompt=prompt,
            publish_specification=input_types.EntityReferenceInput(
                id=specification_id
            ),
            filter=input_types.ContentFilter(
                id=content_id
            ),
            is_synchronous=True
        )

        return response.publish_contents if response.publish_contents is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def delete_all_contents():
    if graphlit.client is None:
        return

    _ = await graphlit.client.delete_all_contents(is_synchronous=True)
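
Most of the helpers above share one error-handling pattern: call the client, print any error, and return `None`. One way to factor that out is a small decorator. This is a sketch under assumptions: `none_on_error` is a hypothetical name, the exception type is supplied by the caller, and `parse_positive` is only a stand-in for an API call, with `ValueError` playing the role of a client error.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, Optional, Type

def none_on_error(error_type: Type[BaseException]):
    """Build a decorator that prints error_type failures and returns None."""
    def decorator(func: Callable[..., Awaitable[Any]]):
        @functools.wraps(func)
        async def wrapper(*args: Any, **kwargs: Any) -> Optional[Any]:
            try:
                return await func(*args, **kwargs)
            except error_type as e:
                print(str(e))
                return None
        return wrapper
    return decorator

# Stand-in for an API call (hypothetical, for illustration only).
@none_on_error(ValueError)
async def parse_positive(text: str) -> int:
    value = int(text)  # raises ValueError for non-numeric text
    if value <= 0:
        raise ValueError(f"expected a positive number, got {value}")
    return value

def run_demo() -> list:
    """Run one successful and one failing call outside a notebook."""
    async def main():
        return [await parse_positive("3"), await parse_positive("-1")]
    return asyncio.run(main())
```

Applied to the helpers above, the decorator would take `exceptions.GraphQLClientError`. Inside a notebook, where an event loop is already running, the decorated functions would be awaited directly rather than wrapped in `asyncio.run`.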

In [4]:
from IPython.display import display, Markdown, HTML
import time

# Remove any existing contents; only needed for notebook example
await delete_all_contents()

print('Deleted all contents.')

uri = "https://graphlitplatform.blob.core.windows.net/samples/Attention%20Is%20All%20You%20Need.1706.03762.pdf"
title = "Attention Is All You Need"
prompt = f"""
Speak as if you are a Ph.D. candidate who is reviewing a paper, and talking to your peers.

Follow these steps.

Step 1: Think about a structure for a 10-minute-long, engaging AI-generated paper review, with a welcome and introduction, an in-depth discussion of 4-6 interesting topics from the paper, and a wrap-up.
Step 2: For each topic, write 4-6 detailed paragraphs discussing it in-depth. Touch on key points for each topic which would be interesting to listeners. Mention the content metadata, entities and details from the provided summaries, as appropriate in the discussion. Remove any topic or section headings. Remove any references to podcast background music.  Remove any timestamps.
Step 3: Combine all topics into a lengthy, single-person script which can be used to record this audio review. Use friendly and compelling conversation to write the scripts.  You can be witty, but don't be cheesy.
Step 4: Remove any unnecessary formatting or final notes about being AI generated.

Refer to the content as the '{title}' paper.

Be specific when referencing persons, organizations, or any other named entities.
"""

specification_id = await create_specification()

if specification_id is not None:
    print(f'Created specification [{specification_id}]:')

    content_id = await ingest_uri(uri=uri)

    if content_id is not None:
        content = await get_content(content_id)

        if content is not None:
            display(Markdown(f'### Publishing Content [{content.id}]: {content.name}...'))

            published_content = await publish_content(content_id, specification_id, prompt)

            if published_content is not None:
                # Need to reload content to get presigned URL to MP3
                published_content = await get_content(published_content.id)

                if published_content is not None:
                    display(Markdown(f'### Published [{published_content.name}]({published_content.audio_uri})'))

                    display(HTML(f"""
                    <audio controls>
                    <source src="{published_content.audio_uri}" type="audio/mpeg">
                    Your browser does not support the audio element.
                    </audio>
                    """))

                    display(Markdown('### Transcript'))
                    display(Markdown(published_content.markdown))
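
The presigned MP3 URL carries a query string with `&` separators, so it is safer to escape it before placing it in an HTML attribute. The following is a minimal sketch, assuming a hypothetical helper named `audio_player_html` and a made-up example URL; `audio/mpeg` is the registered MIME type for MP3.

```python
import html

def audio_player_html(audio_uri: str, mime_type: str = "audio/mpeg") -> str:
    """Return an <audio> element with the URI escaped for use in an attribute."""
    safe_uri = html.escape(audio_uri, quote=True)
    return (
        "<audio controls>\n"
        f'  <source src="{safe_uri}" type="{mime_type}">\n'
        "  Your browser does not support the audio element.\n"
        "</audio>"
    )

# Hypothetical URL, for illustration only.
example = audio_player_html("https://example.com/review.mp3?sv=1&sig=abc")
print(example)
```

In the cell above, the result of `audio_player_html(published_content.audio_uri)` could be passed to `display(HTML(...))` in place of the inline f-string.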


Deleted all contents.
Created specification [728d008c-0e93-40a1-8411-eae2b23070c3]:


### Publishing Content [daae0fa9-1131-4983-bcb6-379b2c071e47]: Attention Is All You Need.1706.03762.pdf...

### Published [Published Summary.mp3](https://graphlit202409019591444c.blob.core.windows.net/files/e68a12d8-edc1-4ba1-a91c-1e93e2ca8181/Mezzanine/Published%20Summary.mp3?sv=2024-08-04&se=2024-09-13T07%3A14%3A34Z&sr=c&sp=rl&sig=xoTBEDV58x7GXhOdntF9jvRK5507ZuT45VGwqminqPw%3D)

### Transcript

[00:00:00] Hello, everyone, and welcome to our review of the groundbreaking

[00:00:04] paper, Attention Is All You Need.

[00:00:08] I'm thrilled to discuss this work with you today as it has truly revolutionized the field of natural language processing

[00:00:16] and machine translation.

[00:00:18] Let's dive into some of the most fascinating aspects of this research.

[00:00:22] To begin, I'd like to highlight the novel architecture introduced in this paper, the transformer.

[00:00:28] What sets the transformer apart is its complete reliance on attention mechanisms,

[00:00:34] eschewing the traditional recurrent and convolutional

[00:00:37] approaches

[00:00:38] we've seen in previous models.

[00:00:40] This innovative design allows for unprecedented

[00:00:44] parallelization,

[00:00:46] significantly

[00:00:47] reducing training time while achieving state of the art results.

[00:00:51] One of the most impressive outcomes of this research

[00:00:54] is the transformer's performance on machine translation tasks.

[00:00:58] On the WMT 2014 English to German translation benchmark,

[00:01:03] it achieved a BLEU score of 28.4,

[00:01:06] surpassing previous best results by over 2 BLEU points.

[00:01:10] Even more remarkably,

[00:01:12] on the English to French translation task,

[00:01:15] it reached a BLEU score of 41.8,

[00:01:18] setting a new single model state of the art.

[00:01:21] These results are not just incremental improvements.

[00:01:24] They represent a significant leap forward in translation quality. The key to the transformer's success

[00:01:30] lies in its innovative use of attention mechanisms,

[00:01:33] particularly the introduction

[00:01:35] of multi head attention.

[00:01:37] This clever approach allows the model to simultaneously

[00:01:40] attend to information

[00:01:42] from different representation

[00:01:43] subspaces

[00:01:45] at various positions.

[00:01:47] In essence, it's as if the model can focus on multiple aspects of the input at once,

[00:01:52] much like how we humans can process various elements of language simultaneously.

[00:01:58] This multi head attention

[00:02:00] is implemented through parallel attention layers or heads with the base model using 8 such heads.

[00:02:07] Another fascinating aspect of the transformer

[00:02:10] is how it handles sequence order

[00:02:12] without relying on recurrence

[00:02:15] or convolution.

[00:02:17] The authors introduced

[00:02:18] positional encodings,

[00:02:20] which are added to the input embeddings.

[00:02:23] Interestingly,

[00:02:24] they opted for sinusoidal

[00:02:26] functions for these encodings

[00:02:28] rather than learned positional embeddings.

[00:02:31] This choice allows the model to potentially extrapolate to sequence lengths

[00:02:36] longer than those seen during training,

[00:02:39] a valuable property for handling diverse inputs.

[00:02:42] The paper also sheds light on the scalability

[00:02:45] of the transformer architecture.

[00:02:48] The authors present results for both a base model with 65,000,000 parameters

[00:02:53] and a larger model with

[00:02:55] 213,000,000 parameters.

[00:02:58] Consistently,

[00:02:59] the larger model demonstrated improved performance,

[00:03:02] suggesting that the architecture can effectively

[00:03:04] leverage increased model capacity.

[00:03:08] This scalability

[00:03:09] is crucial

[00:03:10] for pushing the boundaries of what's possible in natural language processing tasks.

[00:03:16] One of the most intriguing aspects of the transformer

[00:03:19] is its generalization

[00:03:21] capabilities.

[00:03:23] Beyond machine translation,

[00:03:25] the authors demonstrated its effectiveness on English constituency parsing without significant task-specific modifications.

[00:03:33] This versatility

[00:03:35] suggests that the transformer could serve as a general purpose sequence modeling architecture,

[00:03:41] potentially

[00:03:42] revolutionizing

[00:03:42] a wide range of NLP tasks.

[00:03:45] The visualizations

[00:03:47] of the transformer's attention mechanisms

[00:03:49] provide fascinating insights into how the model operates.

[00:03:53] Some attention heads appear to specialize in specific

[00:03:57] linguistic

[00:03:58] tasks,

[00:03:58] such as anaphora resolution.

[00:04:01] This specialization emerges naturally during training without explicit programming,

[00:04:06] showcasing the model's ability to learn complex language patterns

[00:04:10] autonomously.

[00:04:12] From a practical standpoint, the transformer's efficiency is noteworthy.

[00:04:16] The base model was trained in just 12 hours using 8 NVIDIA P100 GPUs,

[00:04:22] while the larger model took 3.5 days on the same hardware.

[00:04:26] This relatively quick training time, combined with the model's strong performance,

[00:04:31] makes it an attractive option for both research and industry applications.

[00:04:36] Looking to the future,

[00:04:37] the authors suggest several exciting research directions

[00:04:41] stemming from their work.

[00:04:43] They propose extending the transformer to other modalities,

[00:04:47] such as images, audio, and video.

[00:04:50] They also mention investigating local restricted attention mechanisms

[00:04:55] to handle extremely large inputs and outputs efficiently.

[00:05:00] Additionally, they aim to make sequence generation less sequential,

[00:05:04] which could lead to even faster processing times.

[00:05:08] In conclusion,

[00:05:09] Attention Is All You Need is a landmark paper that has significantly advanced the field of sequence transduction

[00:05:16] and natural language processing.

[00:05:19] The transformer architecture it introduces

[00:05:22] has not only achieved state of the art results,

[00:05:25] but has also opened up new avenues for research and application.

[00:05:30] As we continue to explore and build upon this work, I'm excited to see how it will shape the future of AI and language understanding.

[00:05:38] Thank you for joining me in this review,

[00:05:41] and I look forward to discussing any questions or thoughts you may have.
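
The transcript above is split into short timestamped segments. If a continuous script is easier to read, the segments can be joined programmatically. The following is a sketch, assuming each non-empty line has the form `[HH:MM:SS] text` as shown above; `parse_transcript` and `to_script` are hypothetical helper names.

```python
import re
from typing import List, Tuple

# Matches lines such as "[00:05:38] Thank you for joining me in this review,"
SEGMENT = re.compile(r"^\[(\d{2}):(\d{2}):(\d{2})\]\s*(.*)$")

def parse_transcript(markdown: str) -> List[Tuple[int, str]]:
    """Parse '[HH:MM:SS] text' lines into (offset_seconds, text) pairs."""
    segments = []
    for line in markdown.splitlines():
        match = SEGMENT.match(line.strip())
        if match:
            hours, minutes, seconds, text = match.groups()
            offset = int(hours) * 3600 + int(minutes) * 60 + int(seconds)
            segments.append((offset, text.strip()))
    return segments

def to_script(segments: List[Tuple[int, str]]) -> str:
    """Join segment texts into one continuous script, dropping timestamps."""
    return " ".join(text for _, text in segments if text)

sample = """
[00:05:38] Thank you for joining me in this review,

[00:05:41] and I look forward to discussing any questions or thoughts you may have.
"""
segments = parse_transcript(sample)
print(to_script(segments))
```

In the notebook, `published_content.markdown` would be passed in place of `sample`.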

