# Speech-to-Text from YouTube and ChatGPT Text Generation

To convert an audio file to text, we use the [OpenAI Whisper speech-to-text model](https://openai.com/research/whisper). For text generation, we call GPT-3.5 or GPT-4 through the [OpenAI API](https://platform.openai.com/docs/guides/gpt).

__We recommend running this Jupyter notebook on Google Colab rather than on a local Python installation.__

### Download audio track from YouTube video

In [1]:
# Detect whether the notebook is running on Google Colab
try:
  import google.colab
  IN_COLAB = True
except ImportError:
  IN_COLAB = False

In [2]:
if IN_COLAB:
    !pip install git+https://github.com/openai/whisper.git
    !nvidia-smi -L
    #!pip install pytube -q
    !pip install pytube@git+https://github.com/OlekRomanko/pytube.git@master -q
    import whisper
    from pytube import YouTube
else:
    # Install each package only if it is not already available
    try:
        import whisper
    except ImportError:
        !pip install openai-whisper
        import whisper
    try:
        from pytube import YouTube
    except ImportError:
        !pip install pytube -q
        from pytube import YouTube

Collecting git+https://github.com/openai/whisper.git
  Cloning https://github.com/openai/whisper.git to /tmp/pip-req-build-ce_wpk89
  Running command git clone --filter=blob:none --quiet https://github.com/openai/whisper.git /tmp/pip-req-build-ce_wpk89
  Resolved https://github.com/openai/whisper.git to commit e8622f9afc4eba139bf796c210f5c01081000472
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
GPU 0: Tesla T4 (UUID: GPU-13392fb6-3963-fc07-c22b-135c9f266882)
  Preparing metadata (setup.py) ... done


In [3]:
model = whisper.load_model('base')

In [4]:
filename = "VisualPolitik_Silicon_Valley"

In [5]:
youtube_video_url = "https://www.youtube.com/watch?v=WNTtWYFhWus"

youtube_video = YouTube(youtube_video_url)

If the next command raises `RegexMatchError: __init__: could not find match for ^\w+\W`, apply the fix suggested in https://github.com/pytube/pytube/issues/1199: open pytube's `cipher.py` file and replace line 30, `var_regex = re.compile(r"^\w+\W")`, with `var_regex = re.compile(r"^\$*\w+\W")`.
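
To see why the patched pattern helps, the short sketch below runs both regular expressions on a made-up string that starts with `$`. The sample string is hypothetical and only illustrates the difference; it is not taken from YouTube's player code.

```python
import re

# Pattern used in the affected pytube version (cipher.py, line 30)
old_regex = re.compile(r"^\w+\W")
# Patched pattern from the GitHub issue: also allows leading "$" characters
new_regex = re.compile(r"^\$*\w+\W")

# Hypothetical sample: an identifier that begins with "$"
sample = "$ab=function(a){}"

print(old_regex.match(sample))           # None: "$" is not a word character
print(new_regex.match(sample).group(0))  # $ab=
```

Strings without a leading `$` match under both patterns, so the patch only widens what is accepted.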

In [6]:
streams = youtube_video.streams.filter(only_audio=True)
streams

[<Stream: itag="139" mime_type="audio/mp4" abr="48kbps" acodec="mp4a.40.5" progressive="False" type="audio">, <Stream: itag="140" mime_type="audio/mp4" abr="128kbps" acodec="mp4a.40.2" progressive="False" type="audio">, <Stream: itag="249" mime_type="audio/webm" abr="50kbps" acodec="opus" progressive="False" type="audio">, <Stream: itag="250" mime_type="audio/webm" abr="70kbps" acodec="opus" progressive="False" type="audio">, <Stream: itag="251" mime_type="audio/webm" abr="160kbps" acodec="opus" progressive="False" type="audio">]

In [7]:
stream = streams.first()
stream

<Stream: itag="139" mime_type="audio/mp4" abr="48kbps" acodec="mp4a.40.5" progressive="False" type="audio">
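
The `first()` call picks the first stream in the list, which here is the 48 kbps track. If a higher-bitrate track is preferred, one option is to compare the `abr` values. The sketch below is an illustration only: the helper names are hypothetical, and it uses simple stand-in objects with the `itag`, `mime_type`, and `abr` values from the listing above rather than real pytube streams. It keeps to `audio/mp4` because the notebook saves the file with an `.mp4` extension.

```python
from types import SimpleNamespace

def bitrate_kbps(stream):
    """Parse an 'abr' string such as '128kbps' into an integer."""
    return int(stream.abr.replace("kbps", ""))

def highest_bitrate(streams, mime_type="audio/mp4"):
    """Return the stream with the largest bitrate among the given MIME type."""
    candidates = [s for s in streams if s.mime_type == mime_type]
    return max(candidates, key=bitrate_kbps)

# Stand-ins for pytube Stream objects, with values from the listing above
demo_streams = [
    SimpleNamespace(itag=139, mime_type="audio/mp4", abr="48kbps"),
    SimpleNamespace(itag=140, mime_type="audio/mp4", abr="128kbps"),
    SimpleNamespace(itag=249, mime_type="audio/webm", abr="50kbps"),
    SimpleNamespace(itag=250, mime_type="audio/webm", abr="70kbps"),
    SimpleNamespace(itag=251, mime_type="audio/webm", abr="160kbps"),
]

best = highest_bitrate(demo_streams)
print(best.itag, best.abr)  # 140 128kbps
```

In the notebook, `stream = highest_bitrate(streams)` could replace `streams.first()`, assuming the stream objects expose the same attributes.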

In [8]:
stream.download(filename=filename+'_audio.mp4')

'/content/VisualPolitik_Silicon_Valley_audio.mp4'

### Convert audio to text (speech-to-text)

If the next command raises an error similar to `FileNotFoundError: [WinError 2] The system cannot find the file specified`, Whisper cannot find `ffmpeg`. On a local Anaconda distribution, install the newest version with `conda install conda-forge::ffmpeg`.
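
A quick way to check for this before transcribing is to look for the `ffmpeg` executable on the PATH. This is a minimal sketch using only the standard library; the function name `ffmpeg_available` is a hypothetical helper, not part of Whisper.

```python
import shutil

def ffmpeg_available():
    """Return True if an `ffmpeg` executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None

if ffmpeg_available():
    print("ffmpeg found at", shutil.which("ffmpeg"))
else:
    print("ffmpeg not found; install it before calling model.transcribe()")
```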

In [9]:
#if not IN_COLAB:
#    try:
#        import ffmpeg
#    except:
#        !pip install ffmpeg-python
#        import ffmpeg

In [10]:
output = model.transcribe(filename+"_audio.mp4")

In [11]:
output

{'text': " This video has been made possible by Brilliant, a problem-solving based website and app with hands-on approach. Improve your STEM skills while having a great time learning at brilliant.org for a slasher visual politic EN. More on that in a bit. Here is the million dollar question in Europe. Why is there no European Google? Why does Europe seem to be an elephant's graveyard when it comes to business? To give you an idea of the five largest companies in the world, four are American and were launched after 1975. In Europe, however, it is common for the largest companies to have been started much earlier. Some, such as Nestle, date back to the 19th century, and even the pharmaceutical companies that created a giant like Novartis were started in the 17th and 18th centuries, no less. Let's see, credit where credit's due, these companies deserve a round of applause for having managed to survive in the market. But it's significant that no young European company is in this top five w

Print transcription

In [12]:
# Print the transcription
print(output["text"])

 This video has been made possible by Brilliant, a problem-solving based website and app with hands-on approach. Improve your STEM skills while having a great time learning at brilliant.org for a slasher visual politic EN. More on that in a bit. Here is the million dollar question in Europe. Why is there no European Google? Why does Europe seem to be an elephant's graveyard when it comes to business? To give you an idea of the five largest companies in the world, four are American and were launched after 1975. In Europe, however, it is common for the largest companies to have been started much earlier. Some, such as Nestle, date back to the 19th century, and even the pharmaceutical companies that created a giant like Novartis were started in the 17th and 18th centuries, no less. Let's see, credit where credit's due, these companies deserve a round of applause for having managed to survive in the market. But it's significant that no young European company is in this top five while in th

In [13]:
# Save the transcription to a file
transcript_filename = "transcript_{}.txt".format(filename)
with open(transcript_filename, "w") as transcript_file:
    transcript_file.write(output["text"])

In [14]:
with open(transcript_filename, "r") as f:
    text = f.read()

### Use text extracted from YouTube audio track as ChatGPT context

In [15]:
# Import the OpenAI module
# (the ChatCompletion interface used below requires openai < 1.0)
try:
  import openai
except ImportError:
  !pip install "openai<1"
  import openai

Collecting openai
  Downloading openai-0.27.8-py3-none-any.whl (73 kB)
Installing collected packages: openai
Successfully installed openai-0.27.8


You can find your secret API key in your [OpenAI User Settings](https://platform.openai.com/account/api-keys).

Please check the [pricing for OpenAI language models](https://openai.com/pricing). At the time of writing, the GPT-3.5 (ChatGPT) model costs around $0.003 per 1K tokens (about 750 words).
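
As a rough illustration of that arithmetic, the sketch below estimates the token count and cost of a prompt from its word count. It assumes the figures quoted above (about 750 words per 1K tokens and $0.003 per 1K tokens), which may change; the function names are hypothetical, and a word count is only an approximation of the real token count.

```python
# Figures quoted above; they may change, so check the pricing page
PRICE_PER_1K_TOKENS = 0.003   # USD
WORDS_PER_1K_TOKENS = 750

def estimate_tokens(text):
    """Rough token estimate from a whitespace word count."""
    words = len(text.split())
    return round(words * 1000 / WORDS_PER_1K_TOKENS)

def estimate_cost(text):
    """Rough cost in USD for sending `text` as a prompt."""
    return estimate_tokens(text) / 1000 * PRICE_PER_1K_TOKENS

sample = "word " * 1500                  # hypothetical 1,500-word transcript
print(estimate_tokens(sample))           # 2000
print(f"${estimate_cost(sample):.4f}")   # $0.0060
```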

In [16]:
## API Key
## Store your key in the file api_key.txt and read it from there.
## Alternatively, you can paste your key directly (avoid this in shared notebooks):
## API_KEY = "sk-..."
if IN_COLAB:
    from google.colab import drive
    drive.mount('/content/drive')
    key_path = "drive/MyDrive/files/api_key.txt"
else:
    key_path = "api_key.txt"

# strip() removes the trailing newline that readline() would keep
with open(key_path, "r") as f:
    API_KEY = f.readline().strip()

# OpenAI Key
import os
os.environ['OPENAI_API_KEY'] = API_KEY
openai.api_key = os.getenv("OPENAI_API_KEY")

Mounted at /content/drive


In [17]:
# OpenAI API parameters
# Note: this rebinds `model`, which held the Whisper model until now
model = "gpt-3.5-turbo-16k"
# model = "gpt-3.5-turbo" # 4K tokens
# model = "gpt-4"
max_tokens = 1024
n = 1
stop = None
temperature = 0.5
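
The choice between the 4K and 16K models depends on how long the transcript is, since the prompt and the reply must both fit in the model's context window. The sketch below is a rough check under stated assumptions: the context sizes are rounded values taken from the model names and the comment above, the ratio of 750 words per 1K tokens is an approximation, and `fits_context` is a hypothetical helper.

```python
# Approximate context sizes, taken from the model names and comment above
CONTEXT_TOKENS = {"gpt-3.5-turbo": 4_000, "gpt-3.5-turbo-16k": 16_000}

def fits_context(text, model_name, reply_tokens=1024):
    """Rough check that prompt plus reply stay within the context window."""
    prompt_tokens = len(text.split()) * 1000 / 750  # ~750 words per 1K tokens
    return prompt_tokens + reply_tokens <= CONTEXT_TOKENS[model_name]

transcript = "word " * 3000  # hypothetical 3,000-word transcript
print(fits_context(transcript, "gpt-3.5-turbo"))      # False
print(fits_context(transcript, "gpt-3.5-turbo-16k"))  # True
```

The `reply_tokens` default mirrors the `max_tokens = 1024` setting used in this notebook.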

### Prompt 1 - ChatGPT generating text for a Facebook post

In [18]:
prompt_1 = 'Please write a three-sentence Facebook post in French about how Canada can innovate based on the following relevant transcript: "{input}"'

In [19]:
prompt1   = prompt_1.format(input=text)
response1 = openai.ChatCompletion.create(
    model=model,
    messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt1},
    ],
    max_tokens=max_tokens,
    n=n,
    stop=stop,
    temperature=temperature,
)

In [20]:
coutput1 = response1['choices'][0]['message']['content']
print(coutput1)

"Le Canada a la possibilité d'innover en s'inspirant de l'expérience de la Silicon Valley aux États-Unis. Comme le montre cette vidéo, la clé du succès de la Silicon Valley réside dans la flexibilité, la collaboration entre l'université et l'industrie, ainsi que le financement par le capital-risque. Le Canada peut adopter ces facteurs clés pour encourager l'innovation et stimuler la croissance économique dans le pays. #Innovation #SiliconValley #Canada"


### Prompt 2 - ChatGPT generating text about innovation in Canada

In [21]:
prompt_2 = "Innovations in different countries are based on a number of factors. Here is the relevant transcript. Please list five steps Canada can take to innovate: {input}"

In [22]:
prompt2   = prompt_2.format(input=text)
response2 = openai.ChatCompletion.create(
    model=model,
    messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt2},
    ],
    max_tokens=max_tokens,
    n=n,
    stop=stop,
    temperature=temperature,
)

In [23]:
coutput2 = response2['choices'][0]['message']['content']
print(coutput2)

To foster innovation in Canada, here are five steps that can be taken:

1. Foster collaboration between universities and private enterprises: Like Silicon Valley, Canada can promote strong partnerships between universities and private companies. This can be achieved by encouraging universities to collaborate with industry and provide resources for research and development.

2. Increase investment in research and development: The Canadian government can provide more funding for research and development initiatives across various sectors. This can be done through grants, tax incentives, and partnerships with private companies.

3. Create a favorable regulatory environment: Canada can streamline regulations and create policies that support innovation and entrepreneurship. This includes reducing bureaucratic processes, promoting intellectual property protection, and providing support for startups and small businesses.

4. Encourage venture capital investment: Canada can attract more ventur

### Prompt 3 - ChatGPT summarizing text

In [24]:
prompt_3 = "Summarize the following text: {input}"

In [25]:
prompt3   = prompt_3.format(input=text)
response3 = openai.ChatCompletion.create(
    model=model,
    messages=[
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": prompt3},
    ],
    max_tokens=max_tokens,
    n=n,
    stop=stop,
    temperature=temperature,
)

In [26]:
coutput3 = response3['choices'][0]['message']['content']
print(coutput3)

The text discusses the lack of a European Google and the innovation gap between Europe and the United States. It highlights the historical success of companies in Europe but points out that no young European company is in the top five largest companies in the world. The text suggests that the USA is ahead of Europe in terms of innovation and discusses the energy crisis and its impact on European businesses. It compares the situation in Europe to that of Detroit and Silicon Valley in the 1950s, highlighting the importance of innovation in the success of Silicon Valley. The text also discusses the factors that contribute to Silicon Valley's success, such as flexibility, university-industry collaboration, and venture capital. It highlights the challenges Europe faces in terms of regulation, lack of collaboration between universities and private companies, and limited investment in military technology. The text suggests that the war in Ukraine and increased military spending in Europe coul

### Text-to-speech for the ChatGPT-generated summary

In [27]:
# Import gTTS, installing it first if needed
try:
  from gtts import gTTS
except ImportError:
  !pip install gTTS
  from gtts import gTTS

Collecting gTTS
  Downloading gTTS-2.3.2-py3-none-any.whl (28 kB)
Installing collected packages: gTTS
Successfully installed gTTS-2.3.2


In [28]:
# gTTS produces MP3 audio, so save with an .mp3 extension
tts = gTTS(coutput3)
tts.save('chatgpt_summary.mp3')

In [30]:
from IPython.display import Audio
sound_file = 'chatgpt_summary.mp3'
Audio(sound_file, autoplay=True)