# Building an always-up-to-date searchable index of video text transcripts

In this demo, we build a pipeline that turns videos into searchable transcripts, using Pixeltable primitives and OpenAI Whisper.
Along the way, we demonstrate how to use Pixeltable to:
1) Ingest video files.
2) Extract the corresponding audio track.
3) Transcribe the audio to text through a call to OpenAI Whisper.
4) Build a semantic index on the text at sentence granularity, based on sentence_transformers models.
5) Search this index.
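
Steps 4 and 5 come down to embedding each sentence as a vector and ranking sentences by their similarity to an embedded query. The sketch below shows only that ranking step, in plain Python. `toy_embed` is a hypothetical bag-of-words stand-in for a sentence_transformers model, chosen so the example runs without any downloads; it is not how this pipeline computes embeddings.

```python
import math
import re
from collections import Counter

def toy_embed(text: str) -> Counter:
    """Hypothetical stand-in for a real embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def search(query: str, sentences: list[str], k: int = 2) -> list[tuple[float, str]]:
    """Return the k sentences most similar to the query, best first."""
    q = toy_embed(query)
    scored = [(cosine(q, toy_embed(s)), s) for s in sentences]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Sentences taken from the transcript produced later in this notebook.
sentences = [
    "Our product revenue was up 34%.",
    "Our AI products are now generally available.",
    "Arctic was done on $2 million of GPU compute.",
]
results = search("product revenue growth", sentences, k=1)
```

A real embedding model would also match paraphrases that share no words with the query, which is the main reason to prefer it over word overlap.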

We also highlight the following Pixeltable features:

1) We express the pipeline as simple operations on tables. Important intermediates, i.e. the data flowing between pipeline steps, are easy to inspect for both existing and new data.
2) Pixeltable preserves views of the same data at different granularities, depending on what is meaningful for a given operation. We can view transcripts at the video level, which is useful for calls to OpenAI, but also split them into sentences for more meaningful search results. This relationship is maintained automatically as new data is added.
3) The searchable database is kept up to date when new videos are added, so new videos become searchable as soon as they are processed, at no extra development or operational effort.
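
To make the second point concrete, here is a minimal sketch of what a sentence-level view over video-level transcripts contains: one row per sentence, each keeping a link back to its source video and segment. This is plain Python, not the Pixeltable view mechanism. The `video_id` key and the `sentence_rows` helper are hypothetical names, the regex splitter is deliberately naive, and the segment layout mirrors the transcription output that appears later in this notebook.

```python
import re

def sentence_rows(video_id: str, transcription: dict) -> list[dict]:
    """Split each transcript segment into sentences, keeping a link to its source."""
    rows = []
    for seg_idx, seg in enumerate(transcription["segments"]):
        # Naive split after ., ! or ? followed by whitespace; a real splitter
        # would also handle abbreviations such as "Mr.".
        for sentence in re.split(r"(?<=[.!?])\s+", seg["text"].strip()):
            if sentence:
                rows.append({
                    "video_id": video_id,   # hypothetical parent key
                    "segment": seg_idx,
                    "start": seg["start"],
                    "end": seg["end"],
                    "text": sentence,
                })
    return rows

transcription = {
    "language": "en",
    "segments": [
        {"start": 57.654, "end": 70.93,
         "text": " Great to be chatting with you, Jim. OK, so this was a very impressive set of numbers."},
        {"start": 436.135, "end": 444.053,
         "text": " So how's the communication?"},
    ],
}
rows = sentence_rows("video-0", transcription)
```

Each row carries the timestamps of its parent segment, so a search hit can be traced back to a position in the original video.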

### We will first download a few example videos from YouTube.

In [1]:
%pip install git+https://github.com/ytdl-org/youtube-dl

Collecting git+https://github.com/ytdl-org/youtube-dl
  Cloning https://github.com/ytdl-org/youtube-dl to /private/var/folders/8v/d886z5j13dsctyjpw29t7y480000gn/T/pip-req-build-ljze0xi6
  Running command git clone --filter=blob:none --quiet https://github.com/ytdl-org/youtube-dl /private/var/folders/8v/d886z5j13dsctyjpw29t7y480000gn/T/pip-req-build-ljze0xi6


  Resolved https://github.com/ytdl-org/youtube-dl to commit a08f2b7e4567cdc50c0614ee0a4ffdff49b8b6e6


  Preparing metadata (setup.py) ... done

Note: you may need to restart the kernel to use updated packages.


In [2]:
%%bash
mkdir -p sample_videos
cd sample_videos
youtube-dl 'https://www.youtube.com/watch?v=YwWtDSponlc&ab_channel=CNBCTelevision'
youtube-dl 'https://www.youtube.com/watch?v=L9Tyb_ycRfU&ab_channel=CNBCTelevision'
youtube-dl 'https://www.youtube.com/watch?v=0wJqgHSfYi0&ab_channel=CNBCTelevision'

/Users/orm/mambaforge/envs/pixeltable_39/bin/python


/Users/orm/mambaforge/envs/pixeltable_39/bin/youtube-dl


/Users/orm/mambaforge/envs/pixeltable_39/bin/youtube-dl


[youtube] YwWtDSponlc: Downloading webpage


[download] Right now you want to be invested in companies that don't cater to the consumer, says Jim Cramer-YwWtDSponlc.mp4 has already been downloaded and merged


[youtube] L9Tyb_ycRfU: Downloading webpage


[download] Jim Cramer looks at how the Fed minutes spooked the markets today-L9Tyb_ycRfU.mp4 has already been downloaded and merged


[youtube] 0wJqgHSfYi0: Downloading webpage


[download] Snowflake CEO joins Jim Cramer after earnings report drives stock higher-0wJqgHSfYi0.mp4 has already been downloaded and merged


In [3]:
import pathlib

import pixeltable as pxt
from pixeltable.functions.video import get_metadata, extract_audio
from pixeltable.functions import openai
from pixeltable.ext.functions import whisperx
from embeddings import TextSplitter, e5_embed



In [4]:
pxt.create_dir('transcription_demo', ignore_errors=True)

Connected to Pixeltable database at: postgresql://postgres:@/pixeltable?host=/Users/orm/.pixeltable/pgdata


In [5]:
pxt.drop_table('transcription_demo.sentence_view', ignore_errors=True)
pxt.drop_table('transcription_demo.video_table', ignore_errors=True)
video_table = pxt.create_table('transcription_demo.video_table', {'video': pxt.VideoType()})

Created table `video_table`.


In [6]:
paths = [str(pathlib.Path(p).absolute()) for p in pathlib.Path('./sample_videos/').iterdir()]
# Start with a single video.
video_table.insert([{'video': video_path} for video_path in paths[:1]])

Inserting rows into `video_table`: 1 rows [00:00, 470.58 rows/s]
Inserted 1 row with 0 errors.


UpdateStatus(num_rows=1, num_computed_values=0, num_excs=0, updated_cols=[], cols_with_excs=[])
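
The path listing above picks up every entry in `./sample_videos/`, including partial downloads or hidden files if any are present. A slightly more defensive variant, sketched below, keeps only regular files with a video extension and sorts them so the insertion order is stable. The `VIDEO_SUFFIXES` set and the `list_videos` helper are assumptions for illustration, not part of Pixeltable.

```python
import pathlib
import tempfile

# Assumed set of extensions; adjust to whatever your downloader produces.
VIDEO_SUFFIXES = {".mp4", ".mkv", ".webm"}

def list_videos(directory: pathlib.Path) -> list[str]:
    """Return sorted absolute paths of video files directly inside `directory`."""
    return sorted(
        str(p.absolute())
        for p in directory.iterdir()
        if p.is_file() and p.suffix.lower() in VIDEO_SUFFIXES
    )

# Demonstration on a throwaway directory rather than ./sample_videos/.
with tempfile.TemporaryDirectory() as tmp:
    root = pathlib.Path(tmp)
    for name in ["b.mp4", "a.mp4", "clip.mp4.part", ".DS_Store"]:
        (root / name).touch()
    found = [pathlib.Path(p).name for p in list_videos(root)]
```

In the notebook, `list_videos(pathlib.Path('./sample_videos/'))` could then replace the list comprehension that builds `paths`.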

In [7]:
video_table.add_column(audio=extract_audio(video_table.video, format='mp3'))

Computing cells: 100%|████████████████████████████████████████████| 1/1 [00:03<00:00,  3.84s/ cells]
Added 1 column value with 0 errors.


UpdateStatus(num_rows=1, num_computed_values=1, num_excs=0, updated_cols=[], cols_with_excs=[])

In [8]:
video_table.show()

video,audio
,


In [9]:
video_table.add_column(audio_meta=get_metadata(video_table.audio))

Computing cells: 100%|███████████████████████████████████████████| 1/1 [00:00<00:00, 229.20 cells/s]
Added 1 column value with 0 errors.


UpdateStatus(num_rows=1, num_computed_values=1, num_excs=0, updated_cols=[], cols_with_excs=[])

In [10]:
video_table.show()

video,audio,audio_meta
,,"{'size': 8266796, 'streams': [{'type': 'audio', 'frames': 0, 'duration': 7290936576, 'metadata': {'encoder': 'Lavf'}, 'time_base': '1/14112000', 'codec_context': {'name': 'mp3float', 'profile': None, 'channels': 2, 'codec_tag': '\\x00\\x00\\x00\\x00'}, 'duration_seconds': 516.648}], 'bit_rate': 128006, 'metadata': {'encoder': 'Lavf60.3.100'}, 'bit_exact': False}"
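
The stream's `duration` field is an integer count of `time_base` units, which is the usual convention in FFmpeg-style metadata. As a sanity check, the sketch below converts it to seconds and compares the result with the `duration_seconds` value in the output above. The `stream_duration_seconds` helper is a hypothetical name, and the dictionary is a trimmed copy of the metadata shown.

```python
from fractions import Fraction

def stream_duration_seconds(stream: dict) -> float:
    """Convert a stream's integer duration to seconds using its time base."""
    # Fraction parses strings of the form "numerator/denominator".
    return float(stream["duration"] * Fraction(stream["time_base"]))

# Trimmed copy of the audio_meta value shown above.
audio_meta = {
    "size": 8266796,
    "bit_rate": 128006,
    "streams": [
        {"type": "audio", "duration": 7290936576, "time_base": "1/14112000"},
    ],
}

seconds = stream_duration_seconds(audio_meta["streams"][0])

# Average bit rate implied by file size and duration, in bits per second.
approx_bit_rate = audio_meta["size"] * 8 / seconds
```

Here `seconds` agrees with the reported `duration_seconds` of 516.648, and `approx_bit_rate` lands within a few bits per second of the reported `bit_rate`.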


In [11]:
video_table.add_column(transcription_whisperx=whisperx.transcribe(video_table.audio, model='large-v2'))

Computing cells:   0%|                                                    | 0/1 [00:00<?, ? cells/s]

Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.2.5. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../../../../.cache/torch/whisperx-vad-segmentation.bin`


No language specified, language will be first be detected for each audio file (increases inference time).
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.2.2. Bad things might happen unless you revert torch to 1.x.
Detected language: en (1.00) in first 30s of audio...
Computing cells: 100%|███████████████████████████████████████████| 1/1 [04:16<00:00, 256.36s/ cells]
Added 1 column value with 0 errors.


UpdateStatus(num_rows=1, num_computed_values=1, num_excs=0, updated_cols=[], cols_with_excs=[])
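
The new column holds one dictionary per video, with a `language` key and a list of `segments`, each carrying `start`, `end`, and `text`. Before building a semantic index, it can help to inspect that structure with ordinary Python. The sketch below joins the segments into a single transcript and does a plain keyword lookup that returns timestamps. Both helpers are hypothetical names, and the sample is a shortened excerpt of the transcription this notebook produces.

```python
def full_text(transcription: dict) -> str:
    """Join all segment texts into a single transcript string."""
    return " ".join(seg["text"].strip() for seg in transcription["segments"])

def find_mentions(transcription: dict, keyword: str) -> list[tuple[float, float]]:
    """Return (start, end) times of every segment whose text contains the keyword."""
    needle = keyword.lower()
    return [
        (seg["start"], seg["end"])
        for seg in transcription["segments"]
        if needle in seg["text"].lower()
    ]

# Shortened excerpt of the transcription output.
transcription = {
    "language": "en",
    "segments": [
        {"start": 118.387, "end": 136.596,
         "text": " Why does Kraft Heinz need snowflake."},
        {"start": 138.063, "end": 163.524,
         "text": " Well, you know, Kraft Heinz is an iconic brand, but they have lots and lots of data."},
        {"start": 272.159, "end": 299.002,
         "text": " We collaborate with NVIDIA on a number of fronts."},
    ],
}

mentions = find_mentions(transcription, "kraft heinz")
text = full_text(transcription)
```

Keyword lookup like this only finds exact wording, which is the gap the sentence-level semantic index is meant to close.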

In [16]:
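# Drop the column left over from an earlier run (skip on a fresh table, where it does not exist yet).
video_table.drop_column('transcription_whisperx_small')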

In [17]:
video_table.add_column(transcription_whisperx_small=whisperx.transcribe(video_table.audio, model='small'))

Computing cells:   0%|                                                    | 0/1 [00:00<?, ? cells/s]

config.json: 100%|██████████| 2.37k/2.37k [00:00<00:00, 983kB/s]

vocabulary.txt: 100%|██████████| 460k/460k [00:00<00:00, 4.61MB/s]

tokenizer.json: 100%|██████████| 2.20M/2.20M [00:00<00:00, 8.24MB/s]

model.bin: 100%|██████████| 484M/484M [00:21<00:00, 22.9MB/s]
Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.2.5. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../../../../../.cache/torch/whisperx-vad-segmentation.bin`


No language specified, language will be first be detected for each audio file (increases inference time).
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.2.2. Bad things might happen unless you revert torch to 1.x.
Detected language: en (1.00) in first 30s of audio...
Computing cells: 100%|████████████████████████████████████████████| 1/1 [01:39<00:00, 99.02s/ cells]
Added 1 column value with 0 errors.


UpdateStatus(num_rows=1, num_computed_values=1, num_excs=0, updated_cols=[], cols_with_excs=[])

In [18]:
video_table.show()

video,audio,audio_meta,transcription_whisperx,transcription_whisperx_small
,,"{'size': 8266796, 'streams': [{'type': 'audio', 'frames': 0, 'duration': 7290936576, 'metadata': {'encoder': 'Lavf'}, 'time_base': '1/14112000', 'codec_context': {'name': 'mp3float', 'profile': None, 'channels': 2, 'codec_tag': '\\x00\\x00\\x00\\x00'}, 'duration_seconds': 516.648}], 'bit_rate': 128006, 'metadata': {'encoder': 'Lavf60.3.100'}, 'bit_exact': False}","{'language': 'en', 'segments': [{'end': 30.776, 'text': ' The Snowflake back on track after a couple of months in the wilderness. The last time we heard from this enterprise software and data analytics companies back in February they put a strong quarter with a tepid four year forecast stock plunge from two hundred thirty down to the mid 100s. Since then while many other tech names have rebounded like crazy Snowflakes only traded back up to 163 as of today's close. But tonight these guys report a tremendous quarter. So like beat expectations on every key line item for the quarter revenue product revenue operating income free cash flow.', 'start': 6.084}, {'end': 55.691, 'text': ' You name it. Take time as we gave a strong product revenue guidance for the current quarter and raise their full year product revenue forecast. They gave you a little less a lower margin number but we'll find out about that. So with the stock coming into the quarter cold these numbers were enough to send it higher and after hours. This is the beginning. Let's check in with Sridhar Ramaswamy. He is the new CEO of Stoke Bank. We interviewed him months before we had a GTC. Find out more about the quarter where it's going. Mr. Ramaswamy welcome back to Bad Money.', 'start': 30.776}, {'end': 70.93, 'text': ' Great to be chatting with you, Jim. OK, so this was a very impressive set of numbers. The one that really stood out was this 46 percent growth in what's known as remaining performance obligation. I regard that as the key indicator of the future. 
What's driving it?', 'start': 57.654}, {'end': 99.275, 'text': ' Jim, I think overall, there are two broad strokes to the quarter. One is that our financial performance was really, really good. Our product revenue was up 34%. Remaining performance obligations, as you talked about, was up 46%. Some very huge deals. It's really an indication of how much our customers believe in us, our free cash flow margins, but also amazing.', 'start': 73.183}, {'end': 118.387, 'text': ' The other part of Q1 is really how our product pipeline, especially in AI, has been in overdrive. Our AI products are now generally available. Over 750 customers are developing on it, sending applications to production. And I would say the era of enterprise AI is here, right here at Snowflake.', 'start': 99.275}, {'end': 136.596, 'text': ' Well let's talk about enterprise AI because you gave a number of use cases and some real some customers everybody knows. I'm going to pick one. People know because it's on their dining room table. Kraft Heinz. Why does why does Kraft Heinz need snowflake. Can you repeat the question. Why does Kraft Heinz need snowflake.', 'start': 118.387}, {'end': 163.524, 'text': ' Well, you know, Kraft Heinz is an iconic brand, but they have lots and lots of data. And so part of the magic that Snowflake brings to the table with its AI offerings is that you can analyze customer feedback data very easily using language models and figure out which questions, for example, have automated responses that you can send, which ones you should send to like an actual human.', 'start': 138.063}, {'end': 186.237, 'text': ' These are the kinds of applications that people are thinking and implementing with with Snowflake. And the beauty is we make it real easy out of the box and super efficient to get these done. Now you also made an acquisition. Some people said to me you know what I can use Snowflake but I have to observe. I have to interrogate my own data. I don't know. 
I mean I rent these guys. I have to bring it back.', 'start': 163.524}, {'end': 200.623, 'text': ' Tell me about what it will mean that you have a true era AI observability now that you've bought this new company that I think is going to make it so that you guys are. I don't know how much you need Amazon Web Services once you do that. I don't know. You tell me.', 'start': 186.237}, {'end': 221.596, 'text': ' Well one small clarification. We signed a definitive agreement to acquire them. The actual acquisition we expect to happen soon enough. But as people are racing to develop applications you know things like observability becomes important because let's say you change the prime.', 'start': 202.261}, {'end': 242.022, 'text': ' you still want to make sure that the application's working well, or you want to try out a new model. It's all part of our mission to make AI reliable. And change management, which observability closely ties into, is an important part of making AI reliable. That's why we acquired this great team.', 'start': 221.596}, {'end': 270.401, 'text': ' But the general theme, again, is we make end-to-end AI easy to implement, easy to maintain, dramatically lower total cost of ownership. You don't have to run GPUs if you want to use AI with Snowflake. That's the stuff our customers love. Let's talk about GPU, because you've got your June 3rd through 6th Data Cloud Summit. I remember watching a video of Jensen Wang with your predecessor, Mr. Sloobin. Mr. Sloobin was famously tough on price when it came to Jensen. What will the powwow be like this time?', 'start': 242.022}, {'end': 299.002, 'text': ' Well, I've gotten to know Jensen really well over the past few months. We are super excited by the promise of accelerated computing. Language models are just the beginning. I think it's a powerful way to scale things. We collaborate with NVIDIA on a number of fronts. Our foundation model, Arctic, was unsurprisingly done on top of NVIDIA chips. 
We collaborate with them on models.', 'start': 272.159}, {'end': 324.104, 'text': ' There's a lot to come, and Jensen's, of course, a visionary when it comes to AI. We're going to be talking about all of this and many other new product announcements at our user conference as well. It's going to be exciting, and I'm looking forward to seeing you there. Oh, well, let's see what we can do. I do want to ask you about the margins. You know, your revenue is going very well. The margins are a bit of a decline, something I should be worried about. You know how much we care about margins in this business.', 'start': 299.002}, {'end': 353.592, 'text': ' Margins are really, really important. Of course, I work with Mike, who is amazing at this. We are leaning ahead into investing with AI. Now, these are modest size investments, and I don't expect these numbers to dramatically go up. And what Arctic clearly showed is that you can get a lot done with a small motivated team and a small amount of compute. Arctic was done on $2 million of GPU compute.', 'start': 325.725}, {'end': 383.166, 'text': ' And of course, the products are out in GA, and we are driving it. We are taking it to market. We want customers to use it for us to make dollars. I think we are very much in the mode of driving revenue for our AI products, and definitely hope to share more of that in the coming quarters. Got it. Now, Mike, your CFO did mention at one point that growth moderated in April, but he said that was a normal component of the way that things are in your business. Why is that?', 'start': 353.592}, {'end': 412.978, 'text': ' Well, the snowflake is a consumption model, which means that we make money only when our customers consume. Now, when there are holidays, for example, people don't run certain kinds of jobs, as you know, like Easter is usually in April. So there are seasonal variations like that. 
But the overall trend that we are seeing in the business, just the conversations, the vibe that I have with the customers that I talk to is hugely positive.', 'start': 385.162}, {'end': 436.135, 'text': ' People are truly excited by Snowflake as their data platform for data, for collaboration, and now AI applications. And you have customer after customer take multi-million, you know, multi-year contracts with Snowflake. It points to a bright future where the code is strong and you're pressing the gas really hard on new things like AI.', 'start': 412.978}, {'end': 444.053, 'text': ' So do you still speak to Mr. Slootman? I only mention him because he's one of the few friends of the show where I just respect him greatly. So how's the communication?', 'start': 436.135}, {'end': 470.981, 'text': ' It's actually, he is incredibly kind. I talk to him every other week. We also chitchat on WhatsApp pretty often. Obviously, he's the chairman of the board, and I spent 10 quality hours with him yesterday. I kind of tapped into his wisdom for how to create a great business. And he is going to stay my friend and Snowflake's friend for the foreseeable future.', 'start': 445.759}, {'end': 492.073, 'text': ' and very much a part of it. Will you tell him we said hi and congratulations on a great quarter. That's Sridhar Ramaswamy Snowflake CEO. Thank you sir. Great to see you. Great to see you. Thank you. Everybody's back. Coming up hit us with your best shot and electrified fast fire lightning round is next.', 'start': 470.981}, {'end': 514.428, 'text': ' Don't miss a second of Mad Money. Follow at Jim Cramer on X. Have a question? Tweet Cramer. Hashtag Mad Mentions. Send Jim an email to madmoneyatcnbc.com or give us a call at 1-800-743-CNBC. Miss something? Head to madmoney.cnbc.com.', 'start': 494.531}]}","{'language': 'en', 'segments': [{'end': 30.776, 'text': ' The snowflake back on track after a couple months in the wilderness. 
The last time we heard from this enterprise software and data analytics companies back in February they put a strong quarter with a tepid four year forecast stock plunge from 230 down to the mid 100s. Since then while many other tech names have rebounded like crazy snowflakes only traded back up to one sixty three as of today's close. But tonight these guys report tremendous court stuff like beat expectations on every key line item for the court of revenue product revenue operating income free cash flow.', 'start': 6.084}, {'end': 55.691, 'text': ' You name it. Take Time Management gave a strong product revenue guidance for the current quarter and raised the full-year product revenue forecast. They gave you a little less lower margin number, but we'll find out about that. So with the stock coming into the quarter cold, these numbers were enough to send it higher in every hour. This is the beginning. Let's check in with Shridhar Ramaswamy. He is the new CEO of Stove Lake. We interviewed him much before. We're out of GCC. Find out more about the quarter where it's going. Mr. Ramaswamy, welcome back to Bad Bunny.', 'start': 30.776}, {'end': 70.93, 'text': ' Great to be chatting with you, Jim. OK, so this was a very impressive set of numbers. The one that really stood out was this 46% growth in what's known as remaining performance obligation. I regard that as the key indicator of the future of what's driving it.', 'start': 57.654}, {'end': 99.275, 'text': ' Jim, I think overall, there are two broad strokes to the quarter. One is that our financial performance was really, really good. Our product revenue was up 34%. The remaining performance obligations, as you talked about, was up 46%. Some very huge deals. It's really an indication of how much our customers believe in us, our free cash flow margins, but also amazing.', 'start': 73.183}, {'end': 118.387, 'text': ' The other part of Q1 is really how our product pipeline, especially in AI, has been an overdrive. 
Our AI products are now generally available. Over 750 customers are developing on it, sending applications to production. And I would say the ERA of Enterprise AI is here, right here at Snowflake.', 'start': 99.275}, {'end': 136.596, 'text': ' Well, let's talk about enterprise AI, because you gave a number of use cases and some customers everybody knows. I'm going to pick what people know, because it's on their dining room table. Craft Heinz. Why does Craft Heinz need snowflake? Can you repeat the question? Why does Craft Heinz need snowflake?', 'start': 118.387}, {'end': 163.524, 'text': ' Well, you know, Kraft Heinz is an iconic brand, but they have lots and lots of data. And so part of the magic that Snowflake brings to the table with its AI offerings is that you can analyze customer feedback data very easily using language models and figure out which questions, for example, have automated responses that you can send, which ones you should send to an actual human.', 'start': 138.063}, {'end': 186.237, 'text': ' These are the kinds of applications that people are thinking and implementing with snowflake. And the beauty is we make it real easy out of the box and super efficient to get these done. Now you also made an acquisition. Some people said to me you know what I can use snowflake but I have to observe. I have to interrogate my own data. I don't know. I mean I rent these guys. I have to bring it back.', 'start': 163.524}, {'end': 200.623, 'text': ' Tell me about what it will mean that you have a true era AI observability now that you've bought this new company that I think is going to make it so that you guys are. I don't know how much you need Amazon Web Services once you do that. I don't know you tell me.', 'start': 186.237}, {'end': 221.596, 'text': ' Well one small clarification. We signed a definitive agreement to acquire them. The actual acquisition we expect to happen soon enough. 
But as people are racing to develop applications you know things like observability becomes important because let's say you change the prime.', 'start': 202.261}, {'end': 242.022, 'text': ' You still want to make sure that the application is working well, or you want to try out a new model. It's all part of our mission to make AI reliable and change management, which observability closely ties into, is an important part of making AI reliable. That's why we acquired this great team.', 'start': 221.596}, {'end': 270.401, 'text': ' But the general theme, again, is we make end-to-end AI easy to implement, easy to maintain, dramatically lower total cost of ownership. You don't have to rent GPUs if you want to use AI with snowflake. That's the stuff our customers love. Let's talk about GPU because you've got your June 3rd through 6th data cloud summit. I remember watching a video of Jensen Wong with your predecessor, Mr. Slubin. Mr. Slubin was famously tough on price when it came to Jensen. What will the power be like this time?', 'start': 242.022}, {'end': 299.002, 'text': ' Well, I've gotten to know Jensen really well over the past few months. We are super excited by the promise of accelerated computing. Language models are just the beginning. I think it's a powerful way to scale things. We collaborate with NVIDIA on a number of fronts. Our foundation model Arctic was unsurprisingly done on top of NVIDIA chips. We collaborate with them on models.', 'start': 272.159}, {'end': 324.104, 'text': ' There's a lot to come. And Jensen's, of course, a visionary when it comes to AI. We're going to be talking about all of this and many other new product announcements at our user conference. It's going to be exciting. And I'm looking forward to seeing you. Oh, well, let's see what we can do. I do want to ask you about the margins. You know, your revenue is going very well. The margins a little a bit of decline. Something I should be worried about. 
You know how much we care about margins in this business.', 'start': 299.002}, {'end': 353.592, 'text': ' Yeah, margins are really, really important. You know, of course I work with Mike, who is amazing at this. We are leaning ahead into investing with AI. Now these are modest size investments, and I don't expect these numbers to like dramatically go up. And what Artic clearly showed is that you can get a lot done with a small motivated team and a small amount of compute. Artic was done on $2 million of GPU compute.', 'start': 325.725}, {'end': 383.166, 'text': ' And of course, the products are out in GA, and we are driving it. We are taking it to market. We want customers to use it for us to make dollars. I think we are very much in the mode of driving revenue for our AI products, and definitely hope to share more of that in the coming quarters. Got it. Now, Mike, your CFO did mention at one point that growth moderated in April, but he said that was a normal component of the way that things are in your business. Why is that?', 'start': 353.592}, {'end': 412.978, 'text': ' Well, the snowflake is a consumption model, which means that we make money only when our customers consume. Now, when there are holidays, for example, people don't run certain kinds of jobs, as you know, like Easter is usually in April. So there are seasonal variations like that. But the overall trend that we are seeing in the business be just, you know, the conversations, the vibe that I have with the customers that I talk to is hugely positive.', 'start': 385.162}, {'end': 436.135, 'text': ' People are truly excited by snowflake as their data platform for data for collaboration and now I applications and you have customer after customer take multi-million multi-year contracts with snowflake. It points to a bright future where the court is strong and you're pressing the gas really hard on new things like all right.', 'start': 412.978}, {'end': 444.053, 'text': ' So do you still speak to Mr. 
Slutman? I only met you because he's one of the few friends of the show where I just respect him greatly. So how's the communication?', 'start': 436.135}, {'end': 470.981, 'text': ' It's actually he is incredibly kind. I talk to him every other week. We also chit chat on what's that pretty often. Obviously he's the chairman of the board and spent 10 quality hours with him yesterday. I kind of tap into his wisdom for how to create a great business. And he is going to stay. You know my friend and snowflakes friend for the foreseeable future.', 'start': 445.759}, {'end': 492.073, 'text': ' and very much a part of. Will you tell him we said hi and congratulations to the great quarter that Shridhar Ramaswamy, Snowflake CEO. Thank you sir. Great to see you. Thank you. Everybody's back after the break. Coming up, hit us with your best shot. An electrified fast-fire lightning round is next.', 'start': 470.981}, {'end': 514.428, 'text': ' Don't miss a second of Mad Money. Follow at Jim Cramer on X. Have a question? Tweet Cramer. Hashtag Mad Mentions. Send Jim an email to madmoneyatcnbc.com or give us a call at 1-800-743-CNBC. Miss something? Head to madmoney.cnbc.com.', 'start': 494.531}]}"
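
With both columns side by side, the two models can be compared segment by segment. The sketch below scores word-level agreement with `difflib.SequenceMatcher` from the standard library. It pairs segments by position, which assumes both models produced the same segment boundaries; that holds for the output above but is not guaranteed in general. The helper names are hypothetical, and the sample texts are shortened excerpts of the two transcriptions.

```python
import difflib

def word_similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] of matching words; 1.0 means identical word sequences."""
    matcher = difflib.SequenceMatcher(
        None, a.lower().split(), b.lower().split(), autojunk=False
    )
    return matcher.ratio()

def compare_segments(large: dict, small: dict) -> list[tuple[float, float]]:
    """Pair segments by position and return (start_time, similarity) per pair."""
    return [
        (seg_l["start"], word_similarity(seg_l["text"], seg_s["text"]))
        for seg_l, seg_s in zip(large["segments"], small["segments"])
    ]

# Shortened excerpts from the two transcription columns.
large = {"segments": [
    {"start": 436.135, "end": 444.053, "text": " So do you still speak to Mr. Slootman?"},
    {"start": 494.531, "end": 514.428, "text": " Don't miss a second of Mad Money."},
]}
small = {"segments": [
    {"start": 436.135, "end": 444.053, "text": " So do you still speak to Mr. Slutman?"},
    {"start": 494.531, "end": 514.428, "text": " Don't miss a second of Mad Money."},
]}

scores = compare_segments(large, small)
# Segments where the models disagree are the ones worth reviewing by hand.
disagreements = [start for start, score in scores if score < 1.0]
```

Low-scoring segments tend to be the ones with proper nouns, such as company and speaker names, where the smaller model makes more mistakes.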


In [12]:
video_table.show()

video,audio,audio_meta,transcription_whisperx
,,"{'size': 8266796, 'streams': [{'type': 'audio', 'frames': 0, 'duration': 7290936576, 'metadata': {'encoder': 'Lavf'}, 'time_base': '1/14112000', 'codec_context': {'name': 'mp3float', 'profile': None, 'channels': 2, 'codec_tag': '\\x00\\x00\\x00\\x00'}, 'duration_seconds': 516.648}], 'bit_rate': 128006, 'metadata': {'encoder': 'Lavf60.3.100'}, 'bit_exact': False}","{'language': 'en', 'segments': [{'end': 30.776, 'text': ' The Snowflake back on track after a couple of months in the wilderness. The last time we heard from this enterprise software and data analytics companies back in February they put a strong quarter with a tepid four year forecast stock plunge from two hundred thirty down to the mid 100s. Since then while many other tech names have rebounded like crazy Snowflakes only traded back up to 163 as of today's close. But tonight these guys report a tremendous quarter. So like beat expectations on every key line item for the quarter revenue product revenue operating income free cash flow.', 'start': 6.084}, {'end': 55.691, 'text': ' You name it. Take time as we gave a strong product revenue guidance for the current quarter and raise their full year product revenue forecast. They gave you a little less a lower margin number but we'll find out about that. So with the stock coming into the quarter cold these numbers were enough to send it higher and after hours. This is the beginning. Let's check in with Sridhar Ramaswamy. He is the new CEO of Stoke Bank. We interviewed him months before we had a GTC. Find out more about the quarter where it's going. Mr. Ramaswamy welcome back to Bad Money.', 'start': 30.776}, {'end': 70.93, 'text': ' Great to be chatting with you, Jim. OK, so this was a very impressive set of numbers. The one that really stood out was this 46 percent growth in what's known as remaining performance obligation. I regard that as the key indicator of the future. 
What's driving it?', 'start': 57.654}, {'end': 99.275, 'text': ' Jim, I think overall, there are two broad strokes to the quarter. One is that our financial performance was really, really good. Our product revenue was up 34%. Remaining performance obligations, as you talked about, was up 46%. Some very huge deals. It's really an indication of how much our customers believe in us, our free cash flow margins, but also amazing.', 'start': 73.183}, {'end': 118.387, 'text': ' The other part of Q1 is really how our product pipeline, especially in AI, has been in overdrive. Our AI products are now generally available. Over 750 customers are developing on it, sending applications to production. And I would say the era of enterprise AI is here, right here at Snowflake.', 'start': 99.275}, {'end': 136.596, 'text': ' Well let's talk about enterprise AI because you gave a number of use cases and some real some customers everybody knows. I'm going to pick one. People know because it's on their dining room table. Kraft Heinz. Why does why does Kraft Heinz need snowflake. Can you repeat the question. Why does Kraft Heinz need snowflake.', 'start': 118.387}, {'end': 163.524, 'text': ' Well, you know, Kraft Heinz is an iconic brand, but they have lots and lots of data. And so part of the magic that Snowflake brings to the table with its AI offerings is that you can analyze customer feedback data very easily using language models and figure out which questions, for example, have automated responses that you can send, which ones you should send to like an actual human.', 'start': 138.063}, {'end': 186.237, 'text': ' These are the kinds of applications that people are thinking and implementing with with Snowflake. And the beauty is we make it real easy out of the box and super efficient to get these done. Now you also made an acquisition. Some people said to me you know what I can use Snowflake but I have to observe. I have to interrogate my own data. I don't know. 
I mean I rent these guys. I have to bring it back.', 'start': 163.524}, {'end': 200.623, 'text': ' Tell me about what it will mean that you have a true era AI observability now that you've bought this new company that I think is going to make it so that you guys are. I don't know how much you need Amazon Web Services once you do that. I don't know. You tell me.', 'start': 186.237}, {'end': 221.596, 'text': ' Well one small clarification. We signed a definitive agreement to acquire them. The actual acquisition we expect to happen soon enough. But as people are racing to develop applications you know things like observability becomes important because let's say you change the prime.', 'start': 202.261}, {'end': 242.022, 'text': ' you still want to make sure that the application's working well, or you want to try out a new model. It's all part of our mission to make AI reliable. And change management, which observability closely ties into, is an important part of making AI reliable. That's why we acquired this great team.', 'start': 221.596}, {'end': 270.401, 'text': ' But the general theme, again, is we make end-to-end AI easy to implement, easy to maintain, dramatically lower total cost of ownership. You don't have to run GPUs if you want to use AI with Snowflake. That's the stuff our customers love. Let's talk about GPU, because you've got your June 3rd through 6th Data Cloud Summit. I remember watching a video of Jensen Wang with your predecessor, Mr. Sloobin. Mr. Sloobin was famously tough on price when it came to Jensen. What will the powwow be like this time?', 'start': 242.022}, {'end': 299.002, 'text': ' Well, I've gotten to know Jensen really well over the past few months. We are super excited by the promise of accelerated computing. Language models are just the beginning. I think it's a powerful way to scale things. We collaborate with NVIDIA on a number of fronts. Our foundation model, Arctic, was unsurprisingly done on top of NVIDIA chips. 
We collaborate with them on models.', 'start': 272.159}, {'end': 324.104, 'text': ' There's a lot to come, and Jensen's, of course, a visionary when it comes to AI. We're going to be talking about all of this and many other new product announcements at our user conference as well. It's going to be exciting, and I'm looking forward to seeing you there. Oh, well, let's see what we can do. I do want to ask you about the margins. You know, your revenue is going very well. The margins are a bit of a decline, something I should be worried about. You know how much we care about margins in this business.', 'start': 299.002}, {'end': 353.592, 'text': ' Margins are really, really important. Of course, I work with Mike, who is amazing at this. We are leaning ahead into investing with AI. Now, these are modest size investments, and I don't expect these numbers to dramatically go up. And what Arctic clearly showed is that you can get a lot done with a small motivated team and a small amount of compute. Arctic was done on $2 million of GPU compute.', 'start': 325.725}, {'end': 383.166, 'text': ' And of course, the products are out in GA, and we are driving it. We are taking it to market. We want customers to use it for us to make dollars. I think we are very much in the mode of driving revenue for our AI products, and definitely hope to share more of that in the coming quarters. Got it. Now, Mike, your CFO did mention at one point that growth moderated in April, but he said that was a normal component of the way that things are in your business. Why is that?', 'start': 353.592}, {'end': 412.978, 'text': ' Well, the snowflake is a consumption model, which means that we make money only when our customers consume. Now, when there are holidays, for example, people don't run certain kinds of jobs, as you know, like Easter is usually in April. So there are seasonal variations like that. 
But the overall trend that we are seeing in the business, just the conversations, the vibe that I have with the customers that I talk to is hugely positive.', 'start': 385.162}, {'end': 436.135, 'text': ' People are truly excited by Snowflake as their data platform for data, for collaboration, and now AI applications. And you have customer after customer take multi-million, you know, multi-year contracts with Snowflake. It points to a bright future where the code is strong and you're pressing the gas really hard on new things like AI.', 'start': 412.978}, {'end': 444.053, 'text': ' So do you still speak to Mr. Slootman? I only mention him because he's one of the few friends of the show where I just respect him greatly. So how's the communication?', 'start': 436.135}, {'end': 470.981, 'text': ' It's actually, he is incredibly kind. I talk to him every other week. We also chitchat on WhatsApp pretty often. Obviously, he's the chairman of the board, and I spent 10 quality hours with him yesterday. I kind of tapped into his wisdom for how to create a great business. And he is going to stay my friend and Snowflake's friend for the foreseeable future.', 'start': 445.759}, {'end': 492.073, 'text': ' and very much a part of it. Will you tell him we said hi and congratulations on a great quarter. That's Sridhar Ramaswamy Snowflake CEO. Thank you sir. Great to see you. Great to see you. Thank you. Everybody's back. Coming up hit us with your best shot and electrified fast fire lightning round is next.', 'start': 470.981}, {'end': 514.428, 'text': ' Don't miss a second of Mad Money. Follow at Jim Cramer on X. Have a question? Tweet Cramer. Hashtag Mad Mentions. Send Jim an email to madmoneyatcnbc.com or give us a call at 1-800-743-CNBC. Miss something? Head to madmoney.cnbc.com.', 'start': 494.531}]}"
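
The segment-level output above carries a `start`, `end` and `text` for every segment, which makes it possible to map a timestamp in the video back to what was being said. Below is a minimal sketch of that lookup in plain Python. The `segment_at` helper is hypothetical (it is not part of pixeltable or the OpenAI client), and the three segments are copied from the output above with their text shortened.

```python
from bisect import bisect_right

# Three segments copied from the output above (text shortened for readability).
segments = [
    {'start': 73.183, 'end': 99.275,
     'text': ' Jim, I think overall, there are two broad strokes to the quarter.'},
    {'start': 99.275, 'end': 118.387,
     'text': ' The other part of Q1 is really how our product pipeline, especially in AI, has been in overdrive.'},
    {'start': 118.387, 'end': 136.596,
     'text': " Well let's talk about enterprise AI because you gave a number of use cases."},
]


def segment_at(segments, t):
    """Return the segment whose [start, end) interval contains time t (seconds), or None.

    Hypothetical helper; assumes the segments are sorted by 'start'.
    """
    starts = [s['start'] for s in segments]
    i = bisect_right(starts, t) - 1  # last segment starting at or before t
    if i >= 0 and t < segments[i]['end']:
        return segments[i]
    return None


hit = segment_at(segments, 100.0)
print(hit['text'] if hit else 'no speech at this timestamp')
```

Returning `None` matters here: the segments in the output are not always contiguous (one ends at 136.596 and the next starts at 138.063), so some timestamps fall into a gap.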


In [53]:
# Transcribe each row's audio with OpenAI Whisper and store the response as a computed column.
video_table.add_column(transcription=openai.transcriptions(audio=video_table.audio, model='whisper-1'))

Computing cells: 100%|████████████████████████████████████████████| 1/1 [00:26<00:00, 26.29s/ cells]
Added 1 column value with 0 errors.


UpdateStatus(num_rows=1, num_computed_values=1, num_excs=0, updated_cols=[], cols_with_excs=[])

In [54]:
video_table.show()

video,audio,audio_meta,transcription
,http://127.0.0.1:50473/Users/orm/.pixeltable/media/53ff836a5ad5418eaf9af98501068106/1c/1cfe/53ff836a5ad5418eaf9af98501068106_1_1_1cfedacae0294928b71277fd9ffcc63f.mp3,"{'size': 8266796, 'streams': [{'type': 'audio', 'frames': 0, 'duration': 7290936576, 'metadata': {'encoder': 'Lavf'}, 'time_base': '1/14112000', 'codec_context': {'name': 'mp3float', 'profile': None, 'channels': 2, 'codec_tag': '\\x00\\x00\\x00\\x00'}, 'duration_seconds': 516.648}], 'bit_rate': 128006, 'metadata': {'encoder': 'Lavf60.3.100'}, 'bit_exact': False}",{'text': 'The Snowflake back on track after a couple of months in the wilderness. The last time we heard from this enterprise software data analytics companies back in February they put a strong quarter with a tepid four year forecast stock plunge from two hundred thirty down to the mid 100s. Since then while many other tech names have rebounded like crazy stuff is only traded back up to 163 as of today's close. But tonight these guys report tremendous core stuff like big expectations on every key line item for the quarter revenue product revenue operating income free cash flow. You name it. Take time as we gave a strong product revenue guidance for the current quarter and raise their full year product revenue forecast. They gave you a little less a lower margin number but we'll find out about that. So with the stock coming into the quarter cold these numbers were enough to send it higher. And if you are just the beginning let's check in with Sridhar Ramaswamy. He is the new CEO of Snowflake. We interviewed him months before we had a GCC. Find out more about the quarter where it's going. Mr. Ramaswamy welcome back to Bad Bunny. Great to be chatting with you Jim. OK so here this was a very impressive set of numbers. 
The one that really stood out was this 46 percent growth in what's known as remaining performance obligation. I regard that as the key indicator of the future. What's driving it. Jim I think all at all. There are two broad strokes to the quarter. One is that our financial performance was really really good. Our product revenue was up 34 percent. Remaining performance obligations as you talked about was up 46 percent. Some very huge deals. It's really an indication of how much our customers believe in us. Our free cash flow margins but also amazing. The other part of Q1 is really how our product pipeline especially in A.I. has been in overdrive. Our A.I. products are now generally available. Over 750 customers are developing on it sending applications to production. And I would say the enterprise is here right here at Snowflake. Well let's talk about enterprise A.I. because you gave a number of use cases and some real some customers. Everybody knows I'm going to pick one. People know because it's on their dining room table. Kraft Heinz. Why is it. Why does Kraft Heinz need Snowflake. Can you repeat the question. Why does Kraft Heinz need Snowflake. Well you know Kraft Heinz is and is an iconic brand but they have lots and lots of data. And so part of the magic that Snowflake brings to the table with its A.I. offerings is that you can analyze customer feedback data very easily using using language models and figure out which questions for example have automated responses as you can send which ones you should send to like an actual human. These are the kinds of applications that people are thinking and implementing with with Snowflake. And the beauty is we make it real easy out of the box and super efficient to get these done. Now you also made an acquisition. Some people said to me you know what I can use Snowflake but I have to observe. I have to interrogate my own data. I don't know. I mean I rent these guys. I have to bring it back. 
Tell me about what it will mean that you have true era A.I. observability now that you've bought this new company that I think is going to make it so that you guys are. I don't know how much you need Amazon Web Services once you do that. I don't know. You tell me. Well one small clarification. We signed a definitive agreement to acquire them. The actual acquisition we expect to happen soon enough. But as people are racing to develop applications you know things like observability becomes important because let's say you change the product. You still want to make sure that the applications working well or you want to try out a new model. It's all part of our mission to make A.I. reliable and change management which observability closely ties into is an important part of making A.I. reliable. That's why we acquired this this great team. But the general theme again is we make end to end A.I. easy to implement. Easy to maintain. Dramatically lower total cost of ownership. You don't have to run GPUs if you want to use A.I. with Snowflake. That's the stuff our customers. Let's talk about GPU because you've got your June 3rd to 6th data cloud summit. I remember watching a video of Jason Wong with your predecessor Mr. Slootman. Mr. Slootman was famously tough on price when it came to Jensen. What will the power be like this time. Well you know I've gotten to know Jensen really well over the past few months. We are super excited by the promise of accelerated computing. Language models are just the beginning. I think it's a powerful way to scale things. We collaborate with India on a number of fronts. Our foundation model Arctic was unsurprisingly done on top of Nvidia chips. We collaborate with them on on models. There's a lot to come. And Jensen's of course a visionary when it comes to A.I. We're going to be talking about all of this and many other new product announcements at our user conference. We're going to be exciting. I'm looking forward to seeing you. 
Well let's see what we can do. I do want to ask you about the margins. You know your revenues going very well. The margins a little bit of decline something I should be worried about. You know how much we care about margins in this business. Margins are really really important. You know I of course I work with Mike who is amazing at this. We are leaning ahead into investing with with A.I. Now these are modest size investments and I don't expect these numbers to like dramatically go up. And what already clearly showed is that you can get a lot done with a small motivated team and a small amount of compute. Arctic was done on two million dollars off of GPU compute. And of course the products are out in G.A. and we are driving it. We are taking it to market. We want customers to use it for us to make dollars. I think we are very much in the mode of driving revenue for our A.I. A.I. products and definitely hope to share more of that in the coming. Got it. Now Mike your CFO did mention at one point that growth moderated in April. But he said that was a normal component of the way that things are in your business. Why is that. Well the snowflake is a consumption model which means that we make money only when our customers consume. Now when there are holidays for example people don't run certain kinds of jobs as you know like Easter is usually in April. So there are seasonal variations like that. But the overall trend that we are seeing in the business be just you know the conversations the vibe that I have with the customers that I talk to is hugely positive. People are truly excited by snowflake as their data platform for data for collaboration and now applications. And you have customer after customer take multimillion you know multi year contracts with snowflake. It points to a bright future where the court is strong and you're pressing the gas really hard on new things like. All right. So do you still speak to Mr. Slipman. 
I only mentioned it was one of the few friends of the show where I just respect him greatly. So how's the communication. It's it's actually he is incredibly kind. I talk to him every other week. So we also chitchat on WhatsApp pretty often. Obviously he's the chairman of the board and spent 10 quality hours with him yesterday. I kind of tapped into his wisdom for how to create a great business. And he is going to stay you know my friend and snowflakes friend for the foreseeable future. And very much a part. Will you tell him we said hi and congratulations on a great quarter. That's Shridhar Ramaswamy snowflake CEO. Thank you sir. Great to see you. Great to see you. Thank you. Everybody's back. Coming up hit us with your best shot. An electrified fast fire lightning round is next. Don't miss a second of mad money. Follow at Jim Cramer on X. Have a question. Tweet Kramer hashtag mad mentions. Send Jim an email to mad money at CNBC dot com or give us a call at 1 800 7 4 3 CNBC. Miss something. Head to mad money dot CNBC dot com.'}


In [55]:
# Pull the plain 'text' field out of the transcription JSON into its own column.
video_table.add_column(transcription_text=video_table.transcription.text)

Computing cells: 100%|███████████████████████████████████████████| 1/1 [00:00<00:00, 295.25 cells/s]
Added 1 column value with 0 errors.


UpdateStatus(num_rows=1, num_computed_values=1, num_excs=0, updated_cols=[], cols_with_excs=[])

In [57]:
# Create a view over video_table with one row per sentence of each transcript.
sentence_view = pxt.create_view('transcription_demo.sentence_view',
                                video_table,
                                iterator=TextSplitter.create(text=video_table.transcription_text))

Inserting rows into `sentence_view`: 131 rows [00:00, 12487.87 rows/s]
Created view `sentence_view` with 131 rows, 0 exceptions.


In [58]:
# Inspect the first few sentences; pos is the sentence's position within its transcript.
sentence_view.select(sentence_view.pos, sentence_view.text).where(sentence_view.pos <= 10).show()

pos,text
0,The Snowflake back on track after a couple of months in the wilderness.
1,The last time we heard from this enterprise software data analytics companies back in February they put a strong quarter with a tepid four year forecast stock plunge from two hundred thirty down to the mid 100s.
2,Since then while many other tech names have rebounded like crazy stuff is only traded back up to 163 as of today's close.
3,But tonight these guys report tremendous core stuff like big expectations on every key line item for the quarter revenue product revenue operating income free cash flow.
4,You name it.
5,Take time as we gave a strong product revenue guidance for the current quarter and raise their full year product revenue forecast.
6,They gave you a little less a lower margin number but we'll find out about that.
7,So with the stock coming into the quarter cold these numbers were enough to send it higher.
8,And if you are just the beginning let's check in with Sridhar Ramaswamy.
9,He is the new CEO of Snowflake.
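
To make the idea of a sentence-level view concrete, here is a deliberately naive sketch of what an iterator like this produces: one `pos`/`text` pair per sentence. It is not the `TextSplitter` used above, whose implementation is not shown in this section. `split_sentences` is a hypothetical helper that simply splits after `.`, `?` or `!` followed by whitespace, so it will break on abbreviations such as "A.I." that the real splitter evidently keeps together.

```python
import re


def split_sentences(text):
    """Naively split text into sentences and number them.

    Hypothetical helper: splits after '.', '?' or '!' when followed by whitespace.
    Returns a list of {'pos': int, 'text': str} dicts, mirroring the view's columns.
    """
    parts = re.split(r'(?<=[.?!])\s+', text.strip())
    return [{'pos': i, 'text': p} for i, p in enumerate(parts) if p]


sample = "You name it. He is the new CEO of Snowflake. What's driving it?"
for row in split_sentences(sample):
    print(row['pos'], row['text'])
```

A production splitter would typically rely on an NLP library for sentence boundaries; the point here is only the shape of the rows that end up in the view.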


In [59]:
# Build an embedding index on the sentence text so it can be searched by similarity.
sentence_view.add_embedding_index(col_name='text', text_embed=e5_embed)

Computing cells: 100%|████████████████████████████████████████| 131/131 [00:01<00:00, 87.36 cells/s]


In [60]:
# Score every sentence against the query string and list the closest matches first.
similarity = sentence_view.text.similarity('you should buy NVIDIA')
sentence_view.select(sentence_view.text, similarity).order_by(similarity, asc=False).limit(20).collect()

text,col_1
You still want to make sure that the applications working well or you want to try out a new model.,0.817813
Our foundation model Arctic was unsurprisingly done on top of Nvidia chips.,0.814672
Follow at Jim Cramer on X. Have a question.,0.812509
Let's talk about GPU because you've got your June 3rd to 6th data cloud summit.,0.812478
Head to mad money dot CNBC dot com.,0.810483
We are super excited by the promise of accelerated computing.,0.806162
You don't have to run GPUs if you want to use A.I. with Snowflake.,0.805502
Margins are really really important.,0.804435
I have to interrogate my own data.,0.804117
I don't know how much you need Amazon Web Services once you do that.,0.799093
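
The scores above sit in a narrow band (roughly 0.80 to 0.82), so the ordering is more informative than the absolute values. The sketch below shows the kind of ranking such a query performs, assuming the index compares embeddings by cosine similarity (an assumption about the index configuration, not something this notebook states). `cosine_similarity` and `top_k` are hypothetical helpers, and the two-dimensional vectors are made up purely for illustration.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query_vec, rows, k):
    """rows is a list of (text, vector) pairs; return the k best (text, score) pairs."""
    scored = [(text, cosine_similarity(query_vec, vec)) for text, vec in rows]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Made-up toy embeddings; real sentence embeddings have hundreds of dimensions.
rows = [
    ('sentence a', [1.0, 0.0]),
    ('sentence b', [0.0, 1.0]),
    ('sentence c', [1.0, 1.0]),
]
for text, score in top_k([1.0, 0.0], rows, k=2):
    print(f'{score:.3f}  {text}')
```

An embedding index exists to avoid this brute-force scan over every row, but the result it approximates is the same sorted list.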


In [61]:
# Insert more videos; the computed columns and sentence_view are updated as part of the insert.
video_table.insert([{'video': video_path} for video_path in paths[2:]])

Inserting rows into `video_table`: 1 rows [00:00, 182.07 rows/s]
Computing cells: 100%|████████████████████████████████████████████| 4/4 [00:45<00:00, 11.25s/ cells]
Inserting rows into `sentence_view`: 240 rows [00:00, 398.74 rows/s]
Inserted 241 rows with 0 errors.


UpdateStatus(num_rows=241, num_computed_values=4, num_excs=0, updated_cols=[], cols_with_excs=[])
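
Note that the single `insert` call above also reports rows being inserted into `sentence_view`: the derived data is maintained as part of the write. The following is a toy model of that idea in plain Python, not a description of how pixeltable is implemented. `ToyTable` and `ToyView` are hypothetical classes; the view registers itself with its base table and expands every newly inserted row.

```python
class ToyTable:
    """A base table that notifies its dependent views on every insert."""

    def __init__(self):
        self.rows = []
        self.views = []

    def insert(self, new_rows):
        self.rows.extend(new_rows)
        for view in self.views:
            view.on_insert(new_rows)  # propagate only the new rows


class ToyView:
    """A derived view that expands each base row into zero or more view rows."""

    def __init__(self, base, expand):
        self.expand = expand
        self.rows = []
        base.views.append(self)
        self.on_insert(base.rows)  # backfill rows that already exist

    def on_insert(self, new_rows):
        for row in new_rows:
            self.rows.extend(self.expand(row))


base = ToyTable()
base.insert([{'text': 'first transcript'}])
words = ToyView(base, lambda row: [{'word': w} for w in row['text'].split()])
base.insert([{'text': 'a second transcript'}])
print(len(words.rows))  # rows from both inserts are present in the view
```

Because only the new rows are passed to `on_insert`, existing view rows are never recomputed, which is the property that keeps incremental updates cheap.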

In [62]:
# Check that audio and its metadata were extracted for the newly inserted video as well.
video_table.select(video_table.video, video_table.audio, video_table.audio_meta).show()

video,audio,audio_meta
,http://127.0.0.1:50473/Users/orm/.pixeltable/media/53ff836a5ad5418eaf9af98501068106/1c/1cfe/53ff836a5ad5418eaf9af98501068106_1_1_1cfedacae0294928b71277fd9ffcc63f.mp3,"{'size': 8266796, 'streams': [{'type': 'audio', 'frames': 0, 'duration': 7290936576, 'metadata': {'encoder': 'Lavf'}, 'time_base': '1/14112000', 'codec_context': {'name': 'mp3float', 'profile': None, 'channels': 2, 'codec_tag': '\\x00\\x00\\x00\\x00'}, 'duration_seconds': 516.648}], 'bit_rate': 128006, 'metadata': {'encoder': 'Lavf60.3.100'}, 'bit_exact': False}"
,http://127.0.0.1:50473/Users/orm/.pixeltable/media/53ff836a5ad5418eaf9af98501068106/e6/e6ca/53ff836a5ad5418eaf9af98501068106_1_6_e6ca606aefae40cea8fce8029da4eb72.mp3,"{'size': 10607276, 'streams': [{'type': 'audio', 'frames': 0, 'duration': 9355239936, 'metadata': {'encoder': 'Lavf'}, 'time_base': '1/14112000', 'codec_context': {'name': 'mp3float', 'profile': None, 'channels': 2, 'codec_tag': '\\x00\\x00\\x00\\x00'}, 'duration_seconds': 662.928}], 'bit_rate': 128005, 'metadata': {'encoder': 'Lavf60.3.100'}, 'bit_exact': False}"
