<a href="https://colab.research.google.com/github/graphlit/graphlit-samples/blob/main/python/Notebook%20Examples/Graphlit_2024_09_12_Publish_Audio_Review_of_Paper.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

**Description**

This example shows how to ingest a PDF of an academic paper, use Anthropic's Claude 3.5 Sonnet to write a comprehensive review of the paper, and listen to an audio rendition published with an [ElevenLabs](https://elevenlabs.io/) voice.

**Requirements**

Before running this notebook, you will need to [sign up](https://docs.graphlit.dev/getting-started/signup) for Graphlit and [create a project](https://docs.graphlit.dev/getting-started/create-project).

You will need the Graphlit organization ID, preview environment ID and JWT secret from your created project.

Assign these properties as Colab secrets: GRAPHLIT_ORGANIZATION_ID, GRAPHLIT_ENVIRONMENT_ID and GRAPHLIT_JWT_SECRET.
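
The three settings above can also be collected and checked up front, so that a missing secret fails with a clear message before any request is made. The following is a minimal sketch, not part of the Graphlit SDK: `load_graphlit_settings` and its `get_secret` parameter are hypothetical names introduced here, and the fallback to environment variables is an assumption for running outside Colab.

```python
import os
from typing import Callable, Dict, Optional

REQUIRED_KEYS = (
    "GRAPHLIT_ORGANIZATION_ID",
    "GRAPHLIT_ENVIRONMENT_ID",
    "GRAPHLIT_JWT_SECRET",
)


def load_graphlit_settings(
    get_secret: Optional[Callable[[str], Optional[str]]] = None,
) -> Dict[str, str]:
    """Collect the three settings, trying get_secret first and os.environ second.

    get_secret is a hypothetical hook: any callable that maps a key to a value
    (or None). Raises KeyError naming every setting that could not be found.
    """
    settings: Dict[str, str] = {}
    missing = []
    for key in REQUIRED_KEYS:
        value = get_secret(key) if get_secret is not None else None
        if not value:
            # Fall back to a plain environment variable (assumption for non-Colab use).
            value = os.environ.get(key)
        if value:
            settings[key] = value
        else:
            missing.append(key)
    if missing:
        raise KeyError("Missing Graphlit settings: " + ", ".join(missing))
    return settings
```

In Colab, `userdata.get` could be passed as `get_secret`; elsewhere, calling the function with no argument reads only the environment.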


---

Install Graphlit Python client SDK

In [1]:
!pip install --upgrade graphlit-client

Collecting graphlit-client
  Downloading graphlit_client-1.0.20240910001-py3-none-any.whl.metadata (2.7 kB)
Collecting httpx (from graphlit-client)
  Downloading httpx-0.27.2-py3-none-any.whl.metadata (7.1 kB)
Collecting websockets (from graphlit-client)
  Downloading websockets-13.0.1-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.metadata (6.7 kB)
Collecting httpcore==1.* (from httpx->graphlit-client)
  Downloading httpcore-1.0.5-py3-none-any.whl.metadata (20 kB)
Collecting h11<0.15,>=0.13 (from httpcore==1.*->httpx->graphlit-client)
  Downloading h11-0.14.0-py3-none-any.whl.metadata (8.2 kB)
Downloading graphlit_client-1.0.20240910001-py3-none-any.whl (197 kB)
Downloading httpx-0.27.2-py3-none-any.whl (76 kB)

In [2]:
import os
from google.colab import userdata
from graphlit import Graphlit
from graphlit_api import input_types, enums, exceptions

os.environ['GRAPHLIT_ORGANIZATION_ID'] = userdata.get('GRAPHLIT_ORGANIZATION_ID')
os.environ['GRAPHLIT_ENVIRONMENT_ID'] = userdata.get('GRAPHLIT_ENVIRONMENT_ID')
os.environ['GRAPHLIT_JWT_SECRET'] = userdata.get('GRAPHLIT_JWT_SECRET')

graphlit = Graphlit()

Define Graphlit helper functions

In [6]:
from typing import List, Optional

# Create specification for Anthropic Sonnet 3.5
async def create_specification():
    if graphlit.client is None:
        return None

    input = input_types.SpecificationInput(
        name="Anthropic Claude Sonnet 3.5",
        type=enums.SpecificationTypes.EXTRACTION,
        serviceType=enums.ModelServiceTypes.ANTHROPIC,
        anthropic=input_types.AnthropicModelPropertiesInput(
            model=enums.AnthropicModels.CLAUDE_3_5_SONNET,
        ),
        # NOTE: Optionally, ask the LLM to revise its response, which tends to produce a longer and more detailed response
        revisionStrategy=input_types.RevisionStrategyInput(
            type=enums.RevisionStrategyTypes.CUSTOM,
            customRevision="OK, that's not bad, but it needs more technical depth for this audience. You can do better than this. Reread all the context provided, and revise this into a longer, more thorough and compelling version. Don't mention anything about the revision.",
            count=1
        )
    )

    try:
        response = await graphlit.client.create_specification(input)

        return response.create_specification.id if response.create_specification is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None


async def ingest_uri(uri: str):
    if graphlit.client is None:
        return None

    try:
        # Using synchronous mode, so the notebook waits for the content to be ingested
        response = await graphlit.client.ingest_uri(uri=uri, is_synchronous=True)

        return response.ingest_uri.id if response.ingest_uri is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def get_content(content_id: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.get_content(content_id)

        return response.content
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def publish_content(content_id: str, specification_id: str, prompt: str):
    if graphlit.client is None:
        return None

    try:
        response = await graphlit.client.publish_contents(
            name="Published Summary",
            connector=input_types.ContentPublishingConnectorInput(
               type=enums.ContentPublishingServiceTypes.ELEVEN_LABS_AUDIO,
               format=enums.ContentPublishingFormats.MP3,
               elevenLabs=input_types.ElevenLabsPublishingPropertiesInput(
                   model=enums.ElevenLabsModels.TURBO_V2_5,
                   voice="ZF6FPAbjXT4488VcRRnw" # ElevenLabs Amelia voice
               )
            ),
            summary_specification=input_types.EntityReferenceInput(
                id=specification_id
            ),
            publish_prompt = prompt,
            publish_specification=input_types.EntityReferenceInput(
                id=specification_id
            ),
            filter=input_types.ContentFilter(
                id=content_id
            ),
            is_synchronous=True
        )

        return response.publish_contents if response.publish_contents is not None else None
    except exceptions.GraphQLClientError as e:
        print(str(e))
        return None

async def delete_all_contents():
    if graphlit.client is None:
        return None

    _ = await graphlit.client.delete_all_contents(is_synchronous=True)
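
Most of the helpers above repeat the same pattern: await a client call, print any client error, and return `None`. One way to factor that out is a small decorator. This is only a sketch of the design idea: `none_on_error`, `FakeClientError`, and `flaky` are hypothetical names, and `FakeClientError` merely stands in for `exceptions.GraphQLClientError` so the example runs without the SDK.

```python
import asyncio
import functools
from typing import Any, Awaitable, Callable, Optional, Type


def none_on_error(error_type: Type[Exception]):
    """Decorator factory: print errors of error_type and return None instead of raising."""

    def decorator(func: Callable[..., Awaitable[Any]]):
        @functools.wraps(func)
        async def wrapper(*args, **kwargs) -> Optional[Any]:
            try:
                return await func(*args, **kwargs)
            except error_type as e:
                print(str(e))
                return None

        return wrapper

    return decorator


class FakeClientError(Exception):
    """Hypothetical stand-in for exceptions.GraphQLClientError."""


@none_on_error(FakeClientError)
async def flaky(should_fail: bool) -> str:
    """Toy coroutine that either fails like a client call or returns an ID."""
    if should_fail:
        raise FakeClientError("request failed")
    return "content-id"
```

With this in place, `await flaky(False)` returns the ID and `await flaky(True)` prints the error and returns `None`, which matches how the notebook's helpers behave.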

In [7]:
from IPython.display import display, Markdown, HTML

# Remove any existing contents; only needed for notebook example
await delete_all_contents()

print('Deleted all contents.')

uri = "https://graphlitplatform.blob.core.windows.net/samples/Attention%20Is%20All%20You%20Need.1706.03762.pdf"
title = "Attention Is All You Need"
prompt = f"""
Speak as if you are a Ph.D. candidate who is reviewing a paper, and talking to your peers.

Follow these steps.

Step 1: Think about a structure for a 10-minute-long, engaging AI-generated paper review, with a welcome and introduction, an in-depth discussion of 4-6 interesting topics from the paper, and a wrap-up.
Step 2: For each topic, write 4-6 detailed paragraphs discussing it in-depth. Touch on key points for each topic which would be interesting to listeners. Mention the content metadata, entities and details from the provided summaries, as appropriate in the discussion. Remove any topic or section headings. Remove any references to podcast background music.  Remove any timestamps.
Step 3: Combine all topics into a lengthy, single-person script which can be used to record this audio review. Use friendly and compelling conversation to write the scripts.  You can be witty, but don't be cheesy.
Step 4: Remove any unnecessary formatting or final notes about being AI generated.

Refer to the content as the '{title}' paper.

Be specific when referencing persons, organizations, or any other named entities.
"""

specification_id = await create_specification()

if specification_id is not None:
    print(f'Created specification [{specification_id}]:')

    content_id = await ingest_uri(uri=uri)

    if content_id is not None:
        content = await get_content(content_id)

        if content is not None:
            display(Markdown(f'### Publishing Content [{content.id}]: {content.name}...'))

            published_content = await publish_content(content_id, specification_id, prompt)

            if published_content is not None:
                # Need to reload content to get presigned URL to MP3
                published_content = await get_content(published_content.id)

                if published_content is not None:
                    display(Markdown(f'### Published [{published_content.name}]({published_content.audio_uri})'))

                    display(HTML(f"""
                    <audio controls>
                    <source src="{published_content.audio_uri}" type="audio/mp3">
                    Your browser does not support the audio element.
                    </audio>
                    """))

                    display(Markdown('### Transcript'))
                    display(Markdown(published_content.markdown))


Deleted all contents.
Created specification [72075ed5-7b82-4aa1-9d37-aee9bc2f1ed4]:


### Publishing Content [b755253b-7e41-40f7-a3cf-f45cfbc64844]: Attention Is All You Need.1706.03762.pdf...

### Published [Published Summary.mp3](https://graphlit202409019591444c.blob.core.windows.net/files/aa3b1c6e-fca9-4326-b6cd-134d327fc196/Mezzanine/Published%20Summary.mp3?sv=2024-08-04&se=2024-09-11T11%3A23%3A37Z&sr=c&sp=rl&sig=P4B7BRtpeRVs0XXm1WzevhJO4UHN4rc3nM0uWNi9zBg%3D)
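
The published link above is a presigned URL, so it stops working after some time. Assuming it follows Azure Blob Storage shared access signature conventions (which the `blob.core.windows.net` host and the `sv`, `se`, and `sig` parameters suggest), the `se` query parameter carries the expiry time. A minimal sketch for reading it follows; `presigned_url_expiry` and `is_expired` are hypothetical helper names, and the timestamp format is assumed to match the one seen in the link above.

```python
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import parse_qs, urlparse


def presigned_url_expiry(url: str) -> Optional[datetime]:
    """Return the UTC expiry from the URL's `se` parameter, or None if it is absent."""
    params = parse_qs(urlparse(url).query)
    values = params.get("se")
    if not values:
        return None
    # parse_qs percent-decodes the value, giving e.g. 2024-09-11T11:23:37Z
    # (assumed format; other ISO 8601 variants would need extra handling).
    parsed = datetime.strptime(values[0], "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def is_expired(url: str, now: Optional[datetime] = None) -> bool:
    """True if the URL has an `se` expiry at or before `now` (defaults to current UTC time)."""
    expiry = presigned_url_expiry(url)
    if expiry is None:
        return False
    current = now if now is not None else datetime.now(timezone.utc)
    return current >= expiry
```

If `is_expired(published_content.audio_uri)` returns true, reloading the content with `get_content` should yield a fresh link, as the notebook already does after publishing.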

### Transcript

[00:00:00] Greetings, colleagues.

[00:00:01] Today, we're delving into the seminal paper, Attention Is All You Need,

[00:00:07] which has fundamentally

[00:00:09] reshaped our understanding of sequence transduction models in natural language processing.

[00:00:15] As we explore this groundbreaking work, I'll be highlighting its technical innovations,

[00:00:21] architectural

[00:00:22] nuances,

[00:00:23] and far reaching implications for our field.

[00:00:26] Let's begin with the core innovation,

[00:00:29] the transformer architecture.

[00:00:32] This model represents a paradigm shift in sequence transduction,

[00:00:36] relying solely on attention mechanisms

[00:00:38] and completely eschewing recurrence and convolutions.

[00:00:42] The implications of this approach are profound,

[00:00:45] both in terms of performance and computational efficiency.

[00:00:49] The transformer's architecture is elegantly simple

[00:00:52] yet remarkably powerful.

[00:00:55] It consists of stacked self attention

[00:00:58] and point wise fully connected layers for both the encoder and decoder.

[00:01:04] The base model comprises 6 identical layers in each of these components.

[00:01:10] Each layer contains 2 sublayers,

[00:01:13] a multi head self attention mechanism,

[00:01:16] and a position wise feed forward network.

[00:01:19] The authors employ residual connections around each sublayer

[00:01:22] followed by layer normalization,

[00:01:25] a design choice that facilitates

[00:01:27] gradient flow through the network.

[00:01:30] Now let's dissect the multi head attention mechanism,

[00:01:34] which is arguably the paper's most significant contribution.

[00:01:38] This mechanism allows the model to jointly attend to information from different representation subspaces

[00:01:44] at different positions.

[00:01:46] In practice,

[00:01:47] this means the model can capture various aspects of the input sequence simultaneously,

[00:01:53] be it syntactic, semantic,

[00:01:56] or other linguistic features.

[00:01:58] The multi head attention

[00:02:00] operates by first projecting the queries,

[00:02:03] keys, and values h times with different learned linear projections.

[00:02:08] These projections are then fed into h parallel attention layers or heads.

[00:02:14] The outputs of these heads are concatenated

[00:02:16] and once again projected,

[00:02:18] resulting in the final output.

[00:02:21] This approach allows the model to capture different types of relationships

[00:02:25] within the same attention mechanism.

[00:02:28] The authors use scaled dot product attention,

[00:02:32] defined as

[00:02:33] $\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^T}{\sqrt{d_k}}\right)V$,

[00:02:42] where Q, K, and V are the queries, keys, and values respectively, and $d_k$ is the dimension of the keys. The scaling factor of $\frac{1}{\sqrt{d_k}}$

[00:02:52] is crucial here

[00:02:53] as it counteracts the effect of large magnitude dot products pushing the softmax function into regions with extremely small gradients.

[00:03:01] One of the transformer's key strengths is its ability to handle long range dependencies in sequences.

[00:03:08] Unlike RNNs, which process sequences step by step, the self attention mechanism creates direct connections between all positions in a sequence.

[00:03:17] This results in a constant

[00:03:19] $O(1)$

[00:03:20] maximum path length between any two positions

[00:03:23] regardless

[00:03:24] of sequence length.

[00:03:26] Comparatively,

[00:03:28] RNNs have a path length of $O(n)$.

[00:03:31] And even advanced convolutional

[00:03:33] architectures

[00:03:34] like ByteNet have

[00:03:38] $O(\log n)$ path lengths.

[00:03:41] This constant path length is a significant factor

[00:03:44] in the transformer's superior performance on tasks requiring long range understanding.

[00:03:50] An intriguing aspect of the transformer

[00:03:52] is its approach to encoding positional information.

[00:03:56] Since the model contains no recurrence or convolution,

[00:04:00] it needs a way to inject sequence order information.

[00:04:03] The authors' solution is to use sinusoidal

[00:04:06] positional encodings

[00:04:08] defined by sine and cosine functions of different frequencies.

[00:04:12] This approach allows the model to extrapolate to sequence lengths longer than those encountered during training,

[00:04:18] a valuable property for generalization.

[00:04:21] Let's talk about the model's computational characteristics.

[00:04:25] The self attention layer has a complexity of $O(n^2 \cdot d)$,

[00:04:32] where n is the sequence length and d is the representation

[00:04:36] dimension.

[00:04:38] While this quadratic dependency on sequence length could potentially be problematic

[00:04:43] for very long sequences,

[00:04:45] for the lengths typically encountered in translation tasks,

[00:04:49] n 100,

[00:04:50] this is actually more efficient than the $O(n \cdot d^2)$ complexity

[00:04:55] of recurrent layers typically used in sequence transduction models.

[00:04:59] The transformer's performance scales impressively with model size and computational resources.

[00:05:05] The base model with 65,000,000

[00:05:08] parameters achieves state of the art results,

[00:05:11] but the larger model, boasting 213,000,000

[00:05:14] parameters,

[00:05:15] significantly

[00:05:16] outperforms it.

[00:05:17] On the WMT

[00:05:19] 2014

[00:05:20] English to German translation task,

[00:05:22] the big model achieves a BLEU score of 28.4,

[00:05:26] improving over the previous best results

[00:05:29] by more than 2.0 BLEU points.

[00:05:32] For English to French, it reaches an impressive 41.8,

[00:05:36] setting

[00:05:37] a new single model state of the art.

[00:05:39] These models were trained on 8 NVIDIA P100 GPUs,

[00:05:43] with the base model taking about 12 hours and the big model 3.5 days.

[00:05:49] The authors used the Adam optimizer with a custom learning rate schedule,

[00:05:53] including a warm up period.

[00:05:56] They also employed regularization techniques such as residual dropout with a rate of 0.1

[00:06:02] and label smoothing.

[00:06:04] One of the most fascinating aspects of this work is the model's generalizability.

[00:06:09] The authors demonstrated this by adapting the transformer for English constituency

[00:06:14] parsing

[00:06:14] with minimal changes.

[00:06:16] Using a 4 layer model with $d_{model} = 1024$, trained on approximately

[00:06:22] 40 k WSJ sentences,

[00:06:24] they achieved an F1 score of 92.7

[00:06:26] in semi supervised parsing,

[00:06:29] outperforming previous state of the art models.

[00:06:32] This success suggests that the transformer is capturing fundamental aspects of sequential data that are applicable across a wide range of tasks.

[00:06:42] The paper

[00:06:43] also provides valuable insights into the model's inner workings through attention visualizations.

[00:06:50] These reveal that different attention heads

[00:06:53] learn to perform different tasks with

[00:06:55] some specializing in syntactic relationships and others in semantic relationships.

[00:07:00] For instance,

[00:07:01] some heads appear to focus on the relationship between verbs and their direct objects,

[00:07:06] while others capture coreference relationships.

[00:07:10] As we consider the implications of this work,

[00:07:13] it's clear that the transformer architecture opens up new avenues for research in attention based models

[00:07:19] and nonrecurrent

[00:07:21] sequence modeling.

[00:07:22] It challenges the long held assumption that recurrence or convolution is necessary for effective sequence modeling,

[00:07:29] potentially

[00:07:30] leading to a paradigm shift in how we approach sequential data problems.

[00:07:36] Looking forward, the authors suggest several promising directions for future work.

[00:07:41] They plan to extend the model to other modalities,

[00:07:44] like images and audio,

[00:07:46] which could lead to exciting developments in multimodal learning.

[00:07:50] They also aim to investigate local restricted attention mechanisms

[00:07:54] to handle very large inputs and outputs more efficiently,

[00:07:58] addressing the quadratic complexity issue

[00:08:01] of the current self attention mechanism.

[00:08:04] In conclusion,

[00:08:06] Attention Is All You Need

[00:08:08] is a landmark paper that has reshaped the landscape of natural language processing.

[00:08:14] Its novel architecture,

[00:08:15] impressive performance,

[00:08:17] and broad applicability

[00:08:19] make it a cornerstone of modern NLP research.

[00:08:23] As we continue to explore the possibilities

[00:08:25] of attention based models,

[00:08:27] we can expect to see its influence extend beyond NLP to other domains,

[00:08:32] like computer vision and reinforcement learning.

[00:08:35] The transformer's success has already spawned a new generation of models, including BERT,

[00:08:40] GPT, and their successors,

[00:08:43] which have pushed the boundaries of what's possible in NLP.

[00:08:46] As we stand on the shoulders of this giant, we're poised to make even more exciting discoveries

[00:08:52] in the realm of artificial intelligence

[00:08:54] and machine

[00:08:56] learning.

[00:08:57] Thank you for joining me in this deep dive into the transformer architecture.

[00:09:03] The future of NLP is bright, and papers like this are lighting the way forward.

[00:09:08] I look forward to seeing how our community builds upon this groundbreaking work in the years to come.
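
The scaled dot-product attention that the review describes, softmax of the query-key dot products divided by the square root of the key dimension and then applied to the values, can be sketched in a few lines. This is a plain-Python illustration for small inputs, not the paper's implementation; the function names and the list-of-rows representation are assumptions made here for readability.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Compute softmax(Q K^T / sqrt(d_k)) V, with each matrix given as a list of row vectors."""
    d_k = len(keys[0])
    scale = 1.0 / math.sqrt(d_k)
    outputs: Matrix = []
    for q in queries:
        # One score per key: the dot product of the query with that key, scaled by 1/sqrt(d_k).
        scores = [scale * sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(scores)
        # Each output row is the attention-weighted sum of the value vectors.
        outputs.append(
            [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        )
    return outputs
```

When all keys are identical, every key receives the same weight and the output is simply the average of the value vectors; when one key matches the query far better than the others, the output approaches that key's value vector.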

