<a href="https://colab.research.google.com/github/jhuarancca/fullstack-course1-module2/blob/master/Demo_2_WhisperX.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# **WhisperX Demo** 🎙️✨


---

This notebook plays with WhisperX, an impressive speech-transcription tool. If you have an audio file and want to convert it to text, follow the instructions below and you will see how easy it is.

## Getting started 🚀

1. **Enable the GPU**: For a fast transcription, make sure the GPU is enabled. Go to "Runtime" > "Change runtime type" and select "GPU" under "Hardware accelerator".
2. **Upload your audio file**: Use the upload tool below to upload your audio file.
3. **Run the cells**: Simply run the cells in order and watch the magic happen!

**Note**: If you are new to Google Colab, each code cell is run by clicking the play button (▶️) to the left of the cell, or by pressing `Shift + Enter`.
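Before going further, it can help to confirm that the GPU runtime is actually active. The snippet below is a minimal sketch, not part of the original notebook: the helper name `gpu_available` is hypothetical, and it assumes PyTorch is importable (it usually is on Colab), falling back to looking for the `nvidia-smi` tool otherwise.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if a CUDA GPU appears to be usable from this runtime."""
    try:
        import torch  # assumption: PyTorch is present, as it usually is on Colab

        return bool(torch.cuda.is_available())
    except ImportError:
        # Fallback when PyTorch is not installed: look for the NVIDIA driver tool.
        if shutil.which("nvidia-smi") is None:
            return False
        return subprocess.run(["nvidia-smi"], capture_output=True).returncode == 0


print("GPU available:", gpu_available())
```

If this prints `False`, revisit step 1 and change the runtime type before running the transcription cell.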

## Installing the package 📦

First, we need to install WhisperX. If you already have it installed, make sure you have the latest version. Running the cell below does both.

In [5]:
!pip install git+https://github.com/m-bain/whisperx.git --upgrade

Collecting git+https://github.com/m-bain/whisperx.git
  Cloning https://github.com/m-bain/whisperx.git to /tmp/pip-req-build-_r8rrhi0
  Running command git clone --filter=blob:none --quiet https://github.com/m-bain/whisperx.git /tmp/pip-req-build-_r8rrhi0
  Resolved https://github.com/m-bain/whisperx.git to commit f2da2f858e99e4211fe4f64b5f2938b007827e17
  Preparing metadata (setup.py) ... done


## Upload your audio file 🎵

Click the button below to upload your audio file. Make sure it is an MP3 file.

In [6]:
from google.colab import files

uploaded = files.upload()
audio_file = list(uploaded.keys())[0]

Saving Doksblog.com-Disturbed-The-Sound-Of-Silence-CYRIL-Remix-Official-Music.mp3 to Doksblog.com-Disturbed-The-Sound-Of-Silence-CYRIL-Remix-Official-Music.mp3
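Since the notebook asks for an MP3, a quick extension check right after the upload can catch a wrong file early. This is only a sketch: `has_allowed_extension` and `ALLOWED_EXTENSIONS` are hypothetical names introduced here, and the check looks at the file name only, not at the actual audio content.

```python
from pathlib import Path

# Assumption: only MP3 is accepted, as the text above asks; extend the tuple if needed.
ALLOWED_EXTENSIONS = (".mp3",)


def has_allowed_extension(filename: str, allowed=ALLOWED_EXTENSIONS) -> bool:
    """Check the file extension case-insensitively."""
    return Path(filename).suffix.lower() in allowed


print(has_allowed_extension("interview.MP3"))  # True
print(has_allowed_extension("interview.wav"))  # False
```

In the notebook this could be used as `assert has_allowed_extension(audio_file)` before loading the audio.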


## Speech transcription 🗣️

Now let's transcribe your audio file. You can adjust `batch_size` and `compute_type` to suit your hardware: a smaller batch size or a lower-precision compute type uses less GPU memory.
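One possible way to choose these values is sketched below. The GPU values are the ones this notebook uses; the CPU fallback (`int8`, batch size 4) is an assumption for illustration, and `pick_settings` is a hypothetical helper, not part of WhisperX.

```python
def pick_settings(has_gpu: bool) -> dict:
    """Return illustrative WhisperX settings for the available hardware."""
    if has_gpu:
        # Values used in this notebook.
        return {"device": "cuda", "batch_size": 16, "compute_type": "float16"}
    # Assumed conservative fallback: int8 lowers memory use and may reduce accuracy.
    return {"device": "cpu", "batch_size": 4, "compute_type": "int8"}


settings = pick_settings(has_gpu=True)
print(settings)
```

The resulting dictionary could then feed the `device`, `batch_size` and `compute_type` variables of the transcription cell.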

In [11]:
import whisperx
import gc

device = "cuda"
batch_size = 16
compute_type = "float16"

# Transcription
model = whisperx.load_model("large-v2", device, compute_type=compute_type)
audio = whisperx.load_audio(audio_file)
result = model.transcribe(audio, batch_size=batch_size)
print(result["segments"])

# Alignment
# The aligned result has no "language" key, so re-running this step would raise
# KeyError: 'language'. Keep the detected language and restore it afterwards.
language = result["language"]
model_a, metadata = whisperx.load_align_model(language_code=language, device=device)
result = whisperx.align(result["segments"], model_a, metadata, audio, device, return_char_alignments=False)
result["language"] = language
print(result["segments"])

INFO:pytorch_lightning.utilities.migration.utils:Lightning automatically upgraded your loaded checkpoint from v1.5.4 to v2.2.1. To apply the upgrade to your files permanently, run `python -m pytorch_lightning.utilities.upgrade_checkpoint ../root/.cache/torch/whisperx-vad-segmentation.bin`


No language specified, language will be first be detected for each audio file (increases inference time).
Model was trained with pyannote.audio 0.0.1, yours is 3.1.1. Bad things might happen unless you revert pyannote.audio to 0.x.
Model was trained with torch 1.10.0+cu102, yours is 2.2.1+cu121. Bad things might happen unless you revert torch to 1.x.
Detected language: en (0.77) in first 30s of audio...



⬇️ If you want to **download the transcription** as a text file, run the following cell.

In [None]:
# Concatenating the transcribed segments
transcription_text = "\n".join([segment['text'] for segment in result["segments"]])

# Writing to a .txt file
with open('transcription.txt', 'w') as file:
    file.write(transcription_text)

# Download the .txt file
files.download('transcription.txt')


⬇️ And if you want the **complete transcription as .json**, including its timestamps, run the following cell.

In [None]:
import json

# Saving the entire result as a JSON file
with open('transcription.json', 'w') as file:
    json.dump(result, file, indent=4)

# Download the JSON file
files.download('transcription.json')
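
Because each segment carries `start` and `end` times in seconds, the same data can also be turned into subtitles. The sketch below converts a list of segments into SubRip (`.srt`) text. It is an illustration only: `format_timestamp` and `segments_to_srt` are hypothetical helpers, and the sample segments are made up to mimic the shape of `result["segments"]`.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments) -> str:
    """Build SRT text from dicts with 'start', 'end' and 'text' keys."""
    blocks = []
    for index, segment in enumerate(segments, start=1):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    return "\n".join(blocks)


# Made-up sample in the same shape as result["segments"].
sample_segments = [
    {"start": 0.982, "end": 15.532, "text": " First sentence."},
    {"start": 15.572, "end": 20.136, "text": " Second sentence."},
]
print(segments_to_srt(sample_segments))
```

In the notebook, `segments_to_srt(result["segments"])` could be written to a file and downloaded the same way as the `.txt` and `.json` files.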



In [None]:
import pandas as pd

⬇️ If you want to **load the transcription into a DataFrame** so you can work with it however you need, this is the code.

In [None]:
pd.DataFrame(result['segments'])[['start', 'end', 'text']]

|   | start | end | text |
|---|-------|-----|------|
| 0 | 0.982 | 15.532 | Hola, este es un resumen de la pagina online ... |
| 1 | 15.572 | 20.136 | Permite grabar la voz a través del micrófono y... |
| 2 | 20.176 | 21.897 | Dictáfono online gratuito. |
| 3 | 21.917 | 25.039 | Nuestro dictáfono online es completamente grat... |
| 4 | 25.059 | 27.201 | No tiene pagos ocultos en absoluto. |
| 5 | 27.241 | 30.543 | No hay que pagar por activar licencias o por f... |
| 6 | 31.34 | 40.626 | Pouvez-vous changer les paramètres de votre m... |
| 7 | 41.246 | 42.807 | Nous gardons la confidéncialité. |
| 8 | 42.847 | 46.109 | Nous garantissons la sécurité de notre applica... |
| 9 | 46.129 | 51.572 | Les enregistrements ne sont pas envoyés à nos ... |
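
To keep this table for later, the three columns can also be written out as CSV. The sketch below uses only the standard library so that it runs anywhere. `segments_to_csv` is a hypothetical helper, and the two sample rows are copied from the table above with an assumed extra `words` key to show that other fields are ignored.

```python
import csv
import io


def segments_to_csv(segments) -> str:
    """Serialize the start, end and text fields of each segment as CSV text."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["start", "end", "text"], extrasaction="ignore"
    )
    writer.writeheader()
    for segment in segments:
        writer.writerow(segment)  # keys other than start/end/text are skipped
    return buffer.getvalue()


sample_segments = [
    {"start": 20.176, "end": 21.897, "text": "Dictáfono online gratuito.", "words": []},
    {"start": 25.059, "end": 27.201, "text": "No tiene pagos ocultos en absoluto.", "words": []},
]
print(segments_to_csv(sample_segments))
```

With pandas available, `pd.DataFrame(result['segments'])[['start', 'end', 'text']].to_csv('transcription.csv')` would be the more direct route.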


In [None]:
# openai.ChatCompletion (used further down) only exists in the pre-1.0 SDK, so pin that version
!pip install openai==0.28



In [None]:
import os
import openai

# Never hard-code the secret in the notebook; read it from the environment instead
openai.api_key = os.environ["OPENAI_API_KEY"]

In [None]:
transcripcion = '''foreign
0:09
thanks for being here tonight guys I
0:11
hope we're going to have a good ride
0:12
together I want to talk about generative
0:15
AI gender of tech as we call it
0:18
okay and uh so let me walk through how
0:21
we see it
0:22
so it feels to me like we are beginning
0:24
what we're calling the transition
0:28
the transition is essentially where
0:30
we're moving from carbon-based life to
0:32
silicon-based life and we're all
0:34
privileged to be on this planet
0:37
when this is happening some of you uh
0:40
might be old enough to have been
0:41
privileged to be here when the sort of
0:43
connecting took place
0:45
starting in 1994.
0:48
and we've now entered the transition
0:51
because so much of what we as people
0:54
think of human is going to be
0:56
increasingly located and enabled by our
1:00
software and our silicon and perhaps
1:03
even our DNA experiments over the next
1:05
40 or 50 years it's a big deal and we're
1:07
here to see it
1:10
Steve Jobs famously said that the
1:12
computer was a bicycle for the mind
1:14
well this is a mind guys
1:17
it's not a very smart mind yet but it
1:19
can certainly take a Wharton class as
1:20
we've seen you've seen that article
1:23
where the AI took the Wharton class and
1:25
got an A or B or something
1:28
so this is a mine and it's not just one
1:30
mind it's many Minds
1:32
it's not just one bicycle helping one
1:34
human it's many Minds helping each human
1:36
it's a big transition
1:38
all right
1:39
so if you can increase someone's output
1:41
by 30 we call that progress
1:44
but if you can increase somebody's
1:46
output by 20x
1:48
that's a revolution and we've now
1:50
entered that
1:52
okay
1:53
now the other thing that's happening is
1:55
that the internet as we know it the
1:56
topology is changing
2:00
in the old days
2:02
we had our stored data bases our
2:05
relational databases in the center of
2:07
the cloud
2:08
and we would just take our computers and
2:10
look at it now we're going to be able to
2:12
generate new data
2:14
on the edges of the cloud and then send
2:16
it back in if we want or keep it on the
2:18
edges if we want
2:20
and that just changes a lot of the way
2:21
the applications are going to work going
2:23
forward so what that means for you as
2:24
Founders is that there's now an opening
2:28
there's a wedge
2:29
that you can now jump in and start doing
2:32
things with different topologies that
2:33
were never done before
2:35
and you're going to be looking at 20x
2:37
productivity improvements to juice
2:40
the interests of your customers or users
2:43
right we haven't had something like this
2:45
happen since 2008.
2:48
when we got the smartphone
2:50
traditionally this happens about every
2:52
14 years if you go back looking at the
2:54
browser in 94.
2:57
2008 the mobile and now we've got the
2:59
generative AI okay
3:01
so this is that moment now I can tell
3:03
you as a Founder who started four
3:05
companies
3:07
and now I've been investing in hundreds
3:09
of companies over the last decade
3:11
last decade has been pretty damn boring
3:14
foreign
3:16
it's been the same old thing over and
3:18
over again once we got doordash and you
3:21
know a couple things in 2013 everything
3:23
kind of went quiet
3:25
from a seed perspective from a startup
3:27
perspective it was the same old stuff
3:28
and I can tell you sitting there we
3:29
review 8 000 companies a year
3:32
and most of the ideas are crap
3:35
like seven thousand nine hundred of them
3:38
because it's the same stuff over and
3:40
over again
3:41
and we've got a map of over 500
3:44
companies in the general of AI space and
3:47
most of them are all the same thing
3:49
again people aren't thinking inventively
3:52
because for the last 14 years we've all
3:53
been lulled we're all numb
3:55
we've got to start thinking bigger again
3:57
because we have that moment but that
3:59
moment's going to last for 24 months
4:01
typically it lasts for 24 30 months
4:04
that's about it
4:05
and it closes up
4:07
all right that's the moment you're in
4:08
right now
4:09
and so you should be paying attention
4:11
right now what's it going to affect it's
4:13
going to affect everything anything
4:14
you're interested in anything you're
4:16
working on is going to be impacted by
4:18
gender of tech and gender of AI
4:20
and it's not just that you're going to
4:22
be able to take what you're doing and
4:24
get some productivity gains
4:26
it's that the things that are being done
4:28
are going to change radically
4:31
and if you think big enough and you see
4:33
how this can happen you're going to be
4:34
able to really drive into that wedge and
4:36
create some great things
4:38
all right now's that moment
4:40
one thing I want to point out to you
4:42
that most people are missing in terms of
4:44
how they're thinking about this perhaps
4:45
because of the name generative Tech or
4:47
generate
4:48
is that in fact what's really amazing
4:50
about this stuff is that it reads well
4:54
it reads it summarizes it simplifies
4:57
it ranks it grades
5:00
it comments on things
5:03
and most people are just focusing on
5:04
what they can make as they make those
5:05
cool pictures of you being a troll king
5:07
or an Elven queen or whatever we can do
5:10
with this stuff that's great
5:12
but really what's amazing about it is
5:14
that it reads
5:16
now what I'll tell you is that what is
5:18
my job what do I do all day long
5:21
I read
5:23
I summarize I simplify I rank I collate
5:28
my job is going to change more in the
5:30
next 10 or 15 years than most other jobs
5:34
investment banking
5:35
lawyers these are the jobs that are
5:38
going to really change over the next 10
5:39
15 years doctors teachers people who
5:42
work at manufacturing facilities those
5:44
jobs will look pretty much the same 15
5:45
years from now as my prediction
5:48
but what all you big brains do
5:51
is going to change a lot
5:53
and everyone's complaining that all the
5:55
students can use that GPT Chat thing to
5:57
to you know write their essays like yeah
5:59
but the teachers can use it to grade the
6:00
essays and if you've asked it to grade
6:02
one of your essays you will see that you
6:05
get better comments from chat GPT than
6:07
you do from your professors
6:10
because they're working six days a week
6:13
they are way overworked
6:15
they can't give you the feedback they
6:17
want to give you they're doing their job
6:18
because they want to give you the
6:19
feedback
6:20
but they don't have time
6:23
but these things do it's going to
6:24
accelerate your learning it's going to
6:26
really change your job okay here's a way
6:28
to think about it
6:30
there's five layers in this generative
6:31
stack at the bottom is everything
6:32
everybody's talking about open AI right
6:36
then you're gonna have specific AI
6:37
models
6:38
things that just write tweets better
6:40
things that just do e-commerce photos
6:43
better
6:44
all right
6:46
but the general models might end up
6:48
supplanting those or you're going to end
6:49
up with Hyper local AI models which will
6:52
look just at your Nike photos in your
6:54
database because you work at Nike and
6:55
it's going to just make those photos
6:56
look just like Nike consistently with
6:58
the brand and your AI model will train
7:00
on your proprietary data and a closed
7:03
loop system
7:05
and whoever manages that AI for them
7:07
will have a pretty good more defensible
7:08
business because they can't just be
7:10
replaced because they have unique data
7:12
sets that's the third layer the
7:14
hyperlocal models
7:15
or if you have a magazine and you have a
7:18
certain style of writing you know you
7:20
train it on their 12 writers and how
7:22
they write and then the the AI will help
7:24
your writers right in that style right
7:26
from the get-go okay you're going to see
7:28
a lot of these hyper local models for
7:30
everything going forward
7:32
above that you've got your operating
7:33
system your API layer that's where the
7:35
network effects are
7:36
okay we're called nfx stands for Network
7:39
effects because we believe that the
7:40
biggest companies come from Network
7:41
effects
7:43
that layer that platform layer that's
7:45
where the network effects are
7:47
and then on top of that you're going to
7:48
have these applications you're going to
7:49
have these killer apps
7:51
like in 2004 there was a college photo
7:53
sharing killer app that became a big
7:55
platform
7:56
of course called Facebook then call Meta
7:58
and who knows what it's going to become
8:00
a good starting point
8:01
to find a killer app
8:03
and these killer apps are out there
8:05
trying to discover who they are like a
8:06
Jasper
8:07
and if they can get the right to install
8:09
in an application layer like an OS or an
8:12
API then that's probably a potentially a
8:14
good model
8:17
but beware
8:19
in the last
8:20
in the last big change in 2008
8:23
with the mobile phones who got most of
8:26
the billions
8:27
the incumbents
8:29
Apple
8:31
and Google they got the Android and they
8:33
got the iOS they got most of the
8:35
billions and there was some doordashing
8:37
and some ubering and whatnot to feed the
8:39
hungry miles of all the Venture
8:40
capitalists and feed the hungry mouths
8:42
of all the founders they left us some
8:44
crumbs but they got the most of it the
8:45
same thing's gonna happen here
8:47
except it's not going to be apple this
8:49
time it's going to be Microsoft because
8:50
they have the distribution
8:52
I mean Google and Microsoft are going to
8:54
get the majority of the value created by
8:57
this transformation
8:58
but there's still going to be many deck
9:00
of corns built still many unicorns built
9:03
in this wedge in sectors
9:05
that those guys won't want to pay
9:07
attention to or be too small for them
9:09
but for you a 40 or 60 billion dollar
9:11
company would be just fine
9:15
so obviously the consumers and the
9:17
workers are all going to win because
9:18
we're all going to be able to do much
9:19
more
9:20
everyone's going to get access to it
9:21
like water you can come it's like Google
9:23
and Microsoft are going to get most of
9:25
it
9:25
incumbents with distribution
9:28
already are going to implement this and
9:30
add this
9:31
right so nfx will add it Sequoia will
9:34
add it
9:36
and then startups that's the open Wedge
9:38
that's what's left for you guys and
9:39
that's what's less for us to invest in
9:44
again we've got this big map that you
9:46
should go check out
9:48
because it'll convince you that the
9:49
great idea you have
9:51
everybody else has it too
9:53
and I know you've built a prototype and
9:54
I know you've already sold ten thousand
9:56
dollars of it maybe fifty thousand
9:57
dollars of it
9:58
I've heard that story 60 times in the
10:01
last two months
10:02
the problem is everyone has the same
10:04
idea everyone's got fifty thousand
10:06
hundred thousand of Revenue per month
10:08
I get it
10:11
but the idea needs to be unique at this
10:13
point this map will show you that
10:15
there's already plenty of people doing
10:16
your idea
10:17
so you need to think harder
10:18
take the next step what's it look like
10:20
next
10:23
there's gonna be three phases of this
10:24
one is what we're seeing mostly most of
10:27
that map is this phase which is putting
10:29
a wrapper around the AI we'll see what
10:31
happens the second thing is that people
10:32
are going to be using generative AI to
10:34
do some business which has nothing to do
10:35
with AI
10:36
like renovate apartment buildings
10:40
we will give you a renovated apartment
10:41
building faster and cheaper than any
10:44
place else great I'm in
10:46
I didn't have to say anything about AI
10:48
I'm not selling the AI I'm using the AI
10:50
to make it faster and cheaper more
10:52
accurate
10:54
but I'm not selling the AI to you you're
10:55
not buying AI you're buying renovating
10:57
your apartment building in a Marketplace
10:59
okay
11:01
okay and then it's going to be the stuff
11:03
the Visionary stuff the stuff that we
11:04
can't imagine right when you first saw
11:06
the smartphone you didn't think oh man
11:07
that's really going to change taxi
11:08
industry
11:10
didn't occur to you right we didn't get
11:12
sidecar for about 18 months we didn't
11:14
get Lyft for another another six months
11:16
after that and Uber came a month after
11:17
that so it took about two years before
11:19
some of them were revolutionary stuff
11:21
started happening around the last
11:22
platform shift
11:24
those things are really exciting and I
11:26
encourage you all to push forward with
11:28
your thinking about how Everything
11:29
Changes Everything Changes
11:31
we are running a program inside our
11:34
company I encourage you to run it with
11:35
all of your friends what can be improved
11:37
and how you do your work as a student
11:39
and how any anybody you're talking to
11:41
how can AI play into it it's simple but
11:44
it's actually very few people are
11:45
actually spending the hours to do it
11:46
right now everyone's a little scared
11:48
I encourage you to get over that fear
11:50
and just do it okay uh here's the
11:53
problem when the internet first came out
11:55
in 94 a lot of people thought it was
11:57
stupid
11:58
Bob Metcalf very famous guy said it was
12:00
like CB radio and it was just a fad
12:02
what that did was it allowed the 4 000
12:04
people who believed in the internet
12:06
to have an open space to build companies
12:08
without too much competition
12:11
okay
12:12
uh when crypto came out I mean still
12:14
everyone's skeptical right how many
12:15
people have Bitcoins like 120 million
12:17
people out of 4 billion people on the
12:19
Internet it's still a lot of skepticism
12:20
about it
12:21
we can understand why but
12:24
there's a lot of skepticism which means
12:26
that if you get into it and believe in
12:27
it you can make hay that's not the case
12:30
here
12:31
what's happening here is a little bit
12:32
more like when Facebook opened up their
12:33
platform in 2008 hundreds of thousands
12:36
of people started building apps it was a
12:38
gold rush
12:39
same thing's happening here so you have
12:41
to move fast there's no Skeptics
12:44
everybody gets it
12:46
everybody gets it
12:48
so
12:50
in order to win in this you're going to
12:51
have to have speed you're going to have
12:53
to move fast and be aggressive
12:56
more so than other sectors that we've
12:58
seen in the last 20 years except for
13:00
that Facebook platform thing which was
13:01
just literally hour by hour day by day
13:04
whoever got ahead it was hourly guys I
13:07
can tell your stories I won't today but
13:09
it was literally hourly and so who would
13:11
get ahead and then their Network affect
13:12
their viral growth would experience so
13:15
uh that's coming that's that's happening'''

In [None]:
instrucciones = '''Act as an expert summarizer. I need to extract all the key ideas from a video as a bulleted list. Respond in Spanish.

The video transcript is the following:
'''
prompt = instrucciones + transcripcion
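
The pasted transcript interleaves every line of speech with a timestamp-only line such as `0:09`. These add tokens without adding meaning, so it may be worth removing them before building the prompt. This is a sketch under that assumption: `strip_timestamps` is a hypothetical helper, and the pattern only targets lines made up entirely of `M:SS` or `H:MM:SS`.

```python
import re

# Matches lines that contain nothing but a timestamp, e.g. 0:09, 12:31 or 1:02:03.
TIMESTAMP_LINE = re.compile(r"^\d{1,2}:\d{2}(?::\d{2})?$")


def strip_timestamps(text: str) -> str:
    """Drop timestamp-only and empty lines, then join the rest with spaces."""
    kept = []
    for line in text.splitlines():
        line = line.strip()
        if line and not TIMESTAMP_LINE.match(line):
            kept.append(line)
    return " ".join(kept)


sample = """foreign
0:09
thanks for being here tonight guys I
0:11
hope we're going to have a good ride"""
print(strip_timestamps(sample))
```

Applied to the notebook, the prompt could be built as `instrucciones + strip_timestamps(transcripcion)`.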

In [None]:
response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "user", "content": prompt}
    ],
)
# Show the generated summary
print(response["choices"][0]["message"]["content"])

RateLimitError: You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.

In [None]:
# The client-style API in the next cell (from openai import OpenAI) requires openai>=1.0.
# Restart the runtime after upgrading if the older version was already imported.
!pip install --upgrade openai

In [None]:
import os
from openai import OpenAI

# The key is read from the OPENAI_API_KEY environment variable; never hard-code it here
client = OpenAI(
    # This is the default and can be omitted
    api_key=os.environ.get("OPENAI_API_KEY"),
)

chat_completion = client.chat.completions.create(
    messages=[
        {
            "role": "user",
            "content": "Say this is a test",
        }
    ],
    model="gpt-3.5-turbo",
)
# Show the model's reply
print(chat_completion.choices[0].message.content)