<a href="https://colab.research.google.com/github/jcv444/FAQ/blob/master/judy_s_openai_whisper_notebook.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# OpenAI Whisper Notebook

## Purpose:
Create a Google Colab notebook that demonstrates how to use OpenAI's Whisper model for audio transcription and translation, including environment setup, high-level API usage for English and French (Chinese) audio, and low-level model access for language detection and decoding, with detailed explanations for each step.


## Section 1 - Whisper Notebook Setup

### 1.1 Set Hardware Accelerator to GPU:

Start by setting Colab's hardware accelerator to 'GPU'. You can do this in either of two ways:
1. Open 'View resources' (in the drop-down list next to the RAM/Disk bars), then select 'Change runtime type'.
2. From the menu bar at the top left, choose Runtime > Change runtime type.
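
Once the runtime type is changed, it can help to confirm that a GPU is actually attached before downloading a large model. The following is a minimal sketch using only the standard library. The helper name `gpu_visible` is hypothetical, and it assumes an NVIDIA GPU runtime where the `nvidia-smi` tool is on the PATH.

```python
import shutil
import subprocess


def gpu_visible() -> bool:
    """Return True if nvidia-smi is present and lists at least one GPU."""
    exe = shutil.which("nvidia-smi")  # None when the tool is not on the PATH
    if exe is None:
        return False
    try:
        # `nvidia-smi -L` prints one line per attached GPU
        result = subprocess.run([exe, "-L"], capture_output=True, text=True, timeout=30)
    except (OSError, subprocess.SubprocessError):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print("GPU visible:", gpu_visible())
```

If this prints `False`, the runtime is most likely still set to CPU.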



# ⬇️ Run the code cells below (Ctrl + Enter) to execute the commands ⬇️

### 1.2 Install Whisper Library (Run this cell first to set up the environment)
The OpenAI Whisper library is essential for this notebook as it provides the core functionality for both audio transcription and translation. This installation step sets up the necessary tools to process audio files and convert spoken language into text, or translate it into another language, using Whisper's powerful models.

The following code cell pulls and installs the latest commit from OpenAI's Whisper repository, along with its Python dependencies.


In [None]:
# Installs OpenAI Whisper library from GitHub
!pip install git+https://github.com/openai/whisper.git

### 1.3 Install FFmpeg & Then Its Python Wrapper

Whisper relies on FFmpeg to load and decode audio in various formats, so the FFmpeg executable must be installed on the system. Whisper calls that executable directly, so the `ffmpeg-python` wrapper is not required for transcription itself. The wrapper is recommended for any Python code of your own that drives FFmpeg, and it is installed after FFmpeg. Here's why:
- FFmpeg itself is a command-line tool: It's a powerful standalone program for handling multimedia files. You can use it directly from the terminal (or a Colab code cell with `!`).
- ffmpeg-python is a Python wrapper: This library provides a more convenient and Pythonic way to build and run FFmpeg commands from within your Python code. Instead of constructing complex shell commands as strings, you can use Python objects and methods to represent FFmpeg operations, making your code cleaner, more readable, and less error-prone.

So, while you could use `!` and manually construct FFmpeg commands, ffmpeg-python makes it much easier to integrate FFmpeg's capabilities into your Python-based workflows, like those often found in Colab notebooks.

#### 1.3.1 Install FFmpeg CLI Tool First

The following code cell installs the FFmpeg command-line tool. The actual FFmpeg program must be present on the system before anything else can use it. This is done via `!sudo apt install ffmpeg` (or `!apt-get install ffmpeg`, **or** `!apt-get update && apt-get install -y ffmpeg`). Without the underlying FFmpeg executable, the Python wrapper has nothing to control (wrap around).



In [None]:
# Update the package lists, then install FFmpeg (alternative: `apt-get update && apt-get install -y ffmpeg`; see the Q&A below)
!sudo apt update && sudo apt install ffmpeg
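
After the install cell finishes, a quick check can confirm that the `ffmpeg` executable is discoverable on the PATH, which is what Whisper needs at run time. This is a minimal sketch using the standard library only; the helper name `ffmpeg_version` is hypothetical.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_version() -> Optional[str]:
    """Return the first line of `ffmpeg -version`, or None if ffmpeg is not on the PATH."""
    exe = shutil.which("ffmpeg")
    if exe is None:
        return None
    result = subprocess.run([exe, "-version"], capture_output=True, text=True)
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


version = ffmpeg_version()
print(version or "ffmpeg not found; run the install cell above first")
```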

#### 1.3.2 Install Python Wrapper Second
The following code cell installs ffmpeg-python, a Python wrapper for FFmpeg. Whisper itself does not require it, but it is recommended for Python code that builds its own FFmpeg commands.

In [None]:
# Installs the ffmpeg-python wrapper (a Python library, not the FFmpeg executable itself)
!pip install ffmpeg-python
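
To illustrate what the wrapper offers, the sketch below builds an FFmpeg command from Python objects instead of a hand-written string. It only compiles the argument list and does not run FFmpeg. The helper name `build_convert_command` and the file names are hypothetical, and the 16 kHz mono settings are chosen here only because they match the `-ac 1 -ar 16000` options Whisper passes to FFmpeg. The function returns `None` when ffmpeg-python is not installed.

```python
from typing import List, Optional


def build_convert_command(src: str, dst: str) -> Optional[List[str]]:
    """Build (but do not run) an FFmpeg command converting src to 16 kHz mono audio."""
    try:
        import ffmpeg  # module provided by the ffmpeg-python package
    except ImportError:
        return None  # wrapper not installed
    if not hasattr(ffmpeg, "input"):
        return None  # a different package that is also named "ffmpeg" is installed
    # Keyword arguments become FFmpeg options: ac=1 -> "-ac 1", ar=16000 -> "-ar 16000"
    stream = ffmpeg.input(src).output(dst, ac=1, ar=16000)
    return stream.compile()  # the argument list that .run() would execute


print(build_convert_command("in.m4a", "out.wav"))
```

Calling `.run()` on the same stream instead of `.compile()` would execute the command, which requires the FFmpeg executable from step 1.3.1.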

## Section 2 - Upload File & Select Whisper Model

2.1 ⬆️ Upload a file: .aac, .flac, .m4a, .mp3, .mp4, .ogg, .wav, or .wma, but **NOT** Audacity's native .aup project files.  
2.2 🔂 Replace the file path in the command with the one copied from the uploaded file's right-click context menu ("Copy path").   

⛔ Do NOT simply enter its file name:
```
!whisper "ENTER FILE NAME HERE" --model base.en
```
as some YouTubers have erroneously suggested. A bare file name resolves only when the file sits in the current working directory.  
2.3 ⛏ Pick one of the 5 model sizes: tiny, base, small, medium, or large.  
2.4 🦜 Specify the language.  
2.5 ✅ Run it.

In [None]:
# Replace the path with your file's copied path, then pick the model, language, and task
!whisper /content/audio2022-11-2823.10.13.wav --model large --language German --task translate


100%|█████████████████████████████████████| 2.88G/2.88G [00:43<00:00, 71.8MiB/s]
Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/whisper/audio.py", line 58, in load_audio
    out = run(cmd, capture_output=True, check=True).stdout
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ffmpeg', '-nostdin', '-threads', '0', '-i', '/content/audio2022-11-2823.10.13.wav', '-f', 's16le', '-ac', '1', '-acodec', 'pcm_s16le', '-ar', '16000', '-']' returned non-zero exit status 1.

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.12/dist-packages/whisper/transcribe.py", line 615, in cli
    result = transcribe(model, audio_path, temperature=temperature, *
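
The traceback above shows FFmpeg exiting with status 1 while reading the input file, but it does not show FFmpeg's own error message. A minimal sketch for narrowing down the cause follows; the helper name `diagnose_audio` is hypothetical. It checks that the path exists, then asks FFmpeg to decode the file to a null output so that any decode error text is captured.

```python
import shutil
import subprocess
from pathlib import Path


def diagnose_audio(path: str) -> str:
    """Return 'ok' or a short description of why FFmpeg cannot read the file."""
    p = Path(path)
    if not p.is_file():
        return f"missing file: {p}"
    if p.stat().st_size == 0:
        return f"empty file: {p}"
    exe = shutil.which("ffmpeg")
    if exe is None:
        return "ffmpeg not found on PATH"
    # Decode the whole input and discard the result; only errors are printed
    proc = subprocess.run(
        [exe, "-nostdin", "-v", "error", "-i", str(p), "-f", "null", "-"],
        capture_output=True,
        text=True,
    )
    if proc.returncode != 0:
        return "ffmpeg could not decode the file: " + proc.stderr.strip()
    return "ok"


print(diagnose_audio("/content/audio2022-11-2823.10.13.wav"))
```

A "missing file" result usually means the path in the `!whisper` command no longer matches an uploaded file, for example after the Colab runtime was reset.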

## Section 3 - Mount Google Drive for Persistent Storage

Mounting Google Drive is highly beneficial for this notebook as it provides persistent storage for audio files and model outputs. Instead of re-uploading files every session, you can store them directly on your Google Drive and access them from Colab. This also allows you to save any transcribed text or translated results directly to your Drive. When referencing files from Google Drive, use paths like `/content/drive/My Drive/your_audio.m4a`.

The following code cell mounts Google Drive.

In [None]:
from google.colab import drive
drive.mount('/content/drive')

MessageError: Error: credential propagation was unsuccessful

# Q&A: Clarifications on the Code Above

# 1. .ipynb File Extension
**Q:** What does the I in .ipynb stand for?  
**A:** The 'I' in .ipynb stands for IPython. This is because Jupyter notebooks evolved from the IPython project, which was originally focused on providing an interactive Python shell. While Jupyter now supports many kernels (Julia, Python, R, etc.), the file extension ipynb remains as a legacy of its IPython origins.

# 2. Difference Between These 2 Commands:
```
!sudo apt update && sudo apt install ffmpeg
vs
get_ipython().system('apt-get update && apt-get install -y ffmpeg')
```

**A:** Both code snippets aim to install FFmpeg in your Colab environment. Here's a breakdown of the differences:

1. `!sudo apt update && sudo apt install ffmpeg`
    - `!`: An IPython shell escape (often loosely called a magic command), commonly used in Colab, that executes the rest of the line as a shell command.
    - `sudo`: Stands for "superuser do." It grants administrative privileges to execute the command. While often used out of habit, it is frequently not strictly necessary for package installations in Colab, as you often have sufficient permissions.
    - `apt`: The newer, more user-friendly command-line utility for package management in Debian-based systems (like the one Colab uses).
    - `update`: Refreshes the list of available packages from the repositories.
    - `&&`: A shell operator that means the second command (`sudo apt install ffmpeg`) executes only if the first command (`sudo apt update`) completes successfully.
    - `install ffmpeg`: Installs the FFmpeg package.
2. `get_ipython().system('apt-get update && apt-get install -y ffmpeg')`
    - `get_ipython().system()`: The programmatic way to execute shell commands within an IPython environment (like Colab). It achieves the same outcome as the `!` prefix.
    - `apt-get`: An older, more verbose command-line tool for package management, still widely used in scripts. Functionally, for `update` and `install`, `apt-get` is very similar to `apt`.
    - `-y`: Automatically answers "yes" to any prompts during installation, making the operation non-interactive. Colab often handles prompts without this flag, but including it removes the need to respond.

In summary:
Both commands will successfully install FFmpeg. The first one, using `!` and `apt`, is more concise and generally preferred in interactive environments like Colab. The second one, using `get_ipython().system()` and `apt-get -y`, is a more explicit, script-like way of achieving the same result, often seen in older documentation or scripts where non-interactive execution was critical.
For practical purposes in Colab, `!apt update && apt install ffmpeg` (a single `!` at the start of the line, with the two commands joined by `&&`) is often the simplest and most readable approach.

Building on the previous explanation, the core differences are:
- `!sudo apt update && sudo apt install ffmpeg`
    - Uses the `!` IPython prefix for shell execution.
    - Employs the newer `apt` package manager.
    - Often includes `sudo`, granting superuser privileges.
- `get_ipython().system('apt-get update && apt-get install -y ffmpeg')`
    - Uses the more explicit `get_ipython().system()` function for shell execution.
    - Employs the slightly older `apt-get` package manager.
    - Includes the `-y` flag for non-interactive installation (automatically answers 'yes' to prompts).

Both achieve the same goal of installing FFmpeg in a Colab environment. For general use in Colab, the first syntax (`!sudo apt...`) is more common due to its brevity.
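
The behavior of the `&&` operator described above can be sketched in plain Python. This is an illustration only, not how Colab or the shell implements it; the helper name `run_all_or_stop` is hypothetical, and the commands are harmless Python one-liners standing in for `apt update` and `apt install`.

```python
import subprocess
import sys


def run_all_or_stop(commands):
    """Run commands in order and stop at the first non-zero exit, like `a && b`."""
    exit_codes = []
    for cmd in commands:
        result = subprocess.run(cmd)
        exit_codes.append(result.returncode)
        if result.returncode != 0:
            break  # later commands are skipped, just as with `&&`
    return exit_codes


ok = [sys.executable, "-c", "pass"]                      # exits with status 0
fail = [sys.executable, "-c", "import sys; sys.exit(3)"]  # exits with status 3
print(run_all_or_stop([ok, fail, ok]))  # [0, 3]: the third command never runs
```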

# 3. Does FFmpeg Need a Python Wrapper?

**Q:** What is the difference between these 2 commands?
```
pip install ffmpeg-python vs
!sudo apt update && sudo apt install ffmpeg
```
Which code is to install the wrapper? Does the wrapper need to be installed 1st?  

**A:** That's a great set of clarifying questions!  
1. Does FFmpeg need a Python wrapper?  
   Not strictly, but a wrapper is highly recommended for Python applications that build FFmpeg commands themselves. (Whisper does not depend on it; it calls the `ffmpeg` executable directly.) Here's why:
    - FFmpeg itself is a command-line tool: It's a powerful standalone program for handling multimedia files. You can use it directly from the terminal (or a Colab code cell with `!`).
    - ffmpeg-python is a Python wrapper: This library provides a more convenient and Pythonic way to build and run FFmpeg commands from within your Python code. Instead of constructing complex shell commands as strings, you can use Python objects and methods to represent FFmpeg operations, making your code cleaner, more readable, and less error-prone.

   So, while you could use `!` and manually construct FFmpeg commands, ffmpeg-python makes it much easier to integrate FFmpeg's capabilities into your Python-based workflows, like those often found in Colab notebooks.
2. What is the difference between these 2 commands?
    - `pip install ffmpeg-python`
        - This command uses pip, the Python package installer. Its purpose is to install Python libraries into your Python environment.
        - It installs the ffmpeg-python library, which is the Python wrapper mentioned above. This library does not install the FFmpeg command-line tool itself.
    - `!sudo apt update && sudo apt install ffmpeg`
        - This command uses apt (or apt-get), a system-level package manager for Debian-based Linux distributions (like the one Colab uses).
        - Its purpose is to install system applications and libraries, including command-line tools like FFmpeg.
        - It installs the actual FFmpeg executable (the command-line tool) that ffmpeg-python then interacts with.
3. Which command installs the wrapper?  
   `pip install ffmpeg-python` installs the Python wrapper.
4. Does the wrapper need to be installed first?  
   No. In fact, it's typically the other way around:
    1. Install the FFmpeg command-line tool: The actual FFmpeg program must be installed on the system first. This is done via `!sudo apt install ffmpeg` (or `!apt-get install ffmpeg`). Without the underlying FFmpeg executable, the Python wrapper has nothing to control.
    2. Install the Python wrapper (ffmpeg-python): Once FFmpeg is available on the system, install the wrapper via `!pip install ffmpeg-python` to interact with it from your Python code.

   So, the recommended order is to install the system-level FFmpeg, then the ffmpeg-python wrapper if you plan to use it in your Python code.

**Q:** What would happen if the Python wrapper is installed before FFmpeg?  
**A:** If you install the ffmpeg-python wrapper before installing the actual FFmpeg command-line tool, the wrapper won't be able to function until FFmpeg is present. The ffmpeg-python library is essentially a translator between your Python code and the FFmpeg executable.  
When you try to use ffmpeg-python (e.g., to process an audio file), it looks for the `ffmpeg` executable on your system's PATH. If `ffmpeg` isn't installed, the wrapper won't find it, and you'll typically encounter errors like `FileNotFoundError` or similar messages indicating that the `ffmpeg` command cannot be found. The wrapper needs the underlying tool to do the heavy lifting of audio/video processing.  
So, even though the ffmpeg-python library itself would be installed in your Python environment, it would be effectively useless until the `ffmpeg` command-line tool is also installed on the system.  

**Q:** But if I installed FFmpeg after the Python wrapper, would they find each other?  
**A:** Yes. If you install the ffmpeg-python wrapper first and then install the FFmpeg command-line tool, they will find each other as long as FFmpeg is installed correctly and its executable is discoverable via your system's PATH.  

Here's why:

- When you `pip install ffmpeg-python`, you're installing the Python library itself. This doesn't involve checking for the FFmpeg executable at installation time.
- When you later run `!sudo apt install ffmpeg`, the FFmpeg executable is installed in a standard system location (e.g., `/usr/bin/ffmpeg`), which is typically included in the system's PATH environment variable.
- When your Python code then uses ffmpeg-python to interact with FFmpeg, the library looks for the `ffmpeg` executable by searching the directories listed in the system's PATH. If it finds it, it uses it.

So, the installation order of the Python wrapper versus the system-level FFmpeg executable matters less than ensuring that both are installed and that the FFmpeg executable is accessible via the system's PATH when ffmpeg-python tries to invoke it.

# 4. Enter the File's Directory Path & NOT the File Name
**Q:** I am confused about what I should enter between the double quotes " " in the code below. Should I enter the file name, or the path copied from the audio file's right-click context menu?   
```
# Replace the path with your file's copied path, then pick the model, language, and task
!whisper /content/audio2022-11-2823.10.13.wav --model large --language German --task translate
vs
!whisper "ENTER FILE NAME HERE" --model base.en
```

**A:** That's a very common point of confusion! For the `!whisper` command, enter the full path to your audio file inside the double quotes.  
Step 2.2 in Section 2 (copy the path from the uploaded file's right-click context menu) guides you to do exactly that.  
So, after you've uploaded your audio file to Colab (e.g., to `/content/`), right-click it in the Colab file browser, select "Copy path", and paste that full path between the double quotes. For example, if your file is named `my_audio.wav` and sits directly in `/content/`, the path is `/content/my_audio.wav`.  
Therefore, your command would look like this:  
`!whisper "/content/my_audio.wav" --model base.en`  
Or, if you've mounted Google Drive and your file is there, it might look like:  
`!whisper "/content/drive/My Drive/my_audio_folder/my_audio.m4a" --model base.en`  
Always use the path obtained from "Copy path" to ensure it's correct. The quotes matter whenever the path contains spaces.
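
Paths that contain spaces, such as Google Drive paths, are a common reason for quoting mistakes. The sketch below shows one way to build the command line from Python with the path quoted safely, using the standard library's `shlex.quote`. The helper name `whisper_command` is hypothetical, and the paths are the example paths from the answer above.

```python
import shlex


def whisper_command(audio_path: str, model: str = "base.en") -> str:
    """Return a whisper command line with the audio path safely quoted for the shell."""
    return f"whisper {shlex.quote(audio_path)} --model {shlex.quote(model)}"


print(whisper_command("/content/my_audio.wav"))
# whisper /content/my_audio.wav --model base.en
print(whisper_command("/content/drive/My Drive/my_audio_folder/my_audio.m4a"))
# whisper '/content/drive/My Drive/my_audio_folder/my_audio.m4a' --model base.en
```

A path without spaces is left unquoted, while the Drive path is wrapped in single quotes so the shell treats it as one argument.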

# 5. Large vs large for Model Size Code

**Q:** Does it matter whether I enter the model size in lowercase or in mixed case?  
```
--model large vs
--model Large
```

**A:** Yes, it matters. The Whisper command-line tool checks the `--model` value against its list of model names, which are all lowercase (`tiny`, `base`, `small`, `medium`, `large`, plus variants such as `base.en`), and the comparison is case-sensitive.
So `--model large` loads the 'large' model, while `--model Large` or `--model LARGE` is rejected with an error. Stick to the lowercase names listed by `!whisper -h`.  

# 6. Case-Sensitive Code Word(s)
**Q:** Which, if any, command-line word(s) are case-sensitive?  
**A:** Generally, command-line arguments are case-sensitive unless the tool's developers have specifically designed them to be case-insensitive for certain arguments. For example, `whisper` and `Whisper` would be treated as different commands if both existed, but usually the primary command is defined in one specific case.  
Regarding the arguments after the `whisper` command:
- The `--model` values are lowercase and case-sensitive (e.g., `large` works; `Large` and `LARGE` do not).
- Flags like `-h` (for help) are typically case-sensitive in their short form; you wouldn't use `-H` for the same purpose unless explicitly defined.
- Other arguments, such as `--language` or `--task`, and their respective values (`German`, `transcribe`, `translate`) are also case-sensitive. For instance, `german` might not be recognized if the tool expects `German`.

To be absolutely sure about any specific argument, the best approach is always to consult the tool's help documentation by running `!whisper -h`. The help output usually provides the exact spelling and expected case for each argument and its possible values.
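
When a model name comes from user input, normalizing it before building the command avoids case-related surprises. The following is a minimal sketch under stated assumptions: the helper name `normalize_model_name` is hypothetical, and `MODEL_SIZES` contains only the five sizes named in this notebook, not the full list printed by `!whisper -h` (variants such as `base.en` are not covered).

```python
MODEL_SIZES = ("tiny", "base", "small", "medium", "large")  # sizes named in this notebook


def normalize_model_name(name: str) -> str:
    """Lowercase and validate a model size before passing it to --model."""
    cleaned = name.strip().lower()
    if cleaned not in MODEL_SIZES:
        raise ValueError(f"unknown model size {name!r}; expected one of {MODEL_SIZES}")
    return cleaned


print(normalize_model_name("Large"))  # large
```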

# 7. Run ALL Codes & Turn Off the Last Landmark Code

**Q:** Is there a way to run all the code cells in this notebook, or in any .ipynb notebook?
If yes, is there a way to turn off one code cell that sits at the very end, specifically this cell, which is kept at the end of this notebook for instructional purposes:
```
# Additional arguments: -h is a standard flag, short for --help. It's a common convention in command-line tools to display help documentation.
!whisper -h
```
**A:** Yes, there are ways to run all the code cells in a notebook, and also to prevent a specific cell from running.
1. How to run all code cells in a notebook:  
   In Google Colab, you can run all the code cells in your notebook using the menu options:
    - Go to Runtime in the top menu bar.
    - Select Run all.

   This executes every code cell in your notebook sequentially, from top to bottom.
2. How to turn off a specific code cell:  
   To prevent the instructional `!whisper -h` cell from being executed when you run all cells, the most effective way is to change its cell type from 'Code' to 'Text' (Markdown). This way, its content remains visible for explanation but it will not be run as code.  
   I will modify the cell DHy2d-ysS4iL to be a text cell.

# 8. F5 = Refresh & Wipe Out Gemini Chat History.
**Q:** I pressed F5 and lost our chat history. I confirm that all indicators still show it to be a code cell, based on what you had said prior to the nuclear F5 bomb:

"Here's how to confirm if the cell has indeed been converted to a text (Markdown) cell:
1. Look for the [ ] or [number] next to the cell: Code cells typically have [ ] (empty brackets) or [number] (execution count) to their left. Text cells do not have these. If these are gone, it's a text cell.
2. Double-click the cell: If you double-click a text cell, it will usually switch into an editing mode where you can see the raw Markdown syntax. If you double-click a code cell, it remains in code-editing mode.
3. No 'Run cell' button: Text cells do not have a 'Play' button to run them; only code cells do.

Could you please check these indicators? If it genuinely still appears as a code cell after a brief moment, you might try a quick browser refresh (F5 or Cmd/Ctrl + R) to ensure the Colab interface is fully updated. The command to change its type was sent successfully from my end."

# 9. Feynman This Final Code (Always Keep It as the Final Cell, as a Landmark)
!whisper -h  

**A:** Let's break down the `!whisper -h` command using the Feynman technique, as if you were explaining it to someone else to truly understand it:  

What is it? (Explain it simply)  
Imagine you have a powerful tool called whisper that can listen to audio and turn it into text, or even translate it. But this tool has many buttons and settings. The `!whisper -h` command is like pressing a big "Help" button on that tool. When you press it, the tool tells you all the different things it can do and all the settings (or 'arguments') you can use with it.
- `!`: This simply tells Colab (or any Jupyter-like environment) to run the command that follows as if you were typing it directly into your computer's terminal.
- `whisper`: This is the name of the program we're running: the OpenAI Whisper command-line interface.
- `-h`: This is a standard flag, short for `--help`. It's a common convention in command-line tools to display help documentation.

When to use it? (When would you need this?)  
You'd use `!whisper -h` whenever you:
1. Forget how to do something specific: "How do I specify the language again?" or "What are the different model sizes I can use?"
2. Want to explore features: "What other cool things can Whisper do that I don't know about?"
3. Are debugging an error: Sometimes an error message might mention an argument, and `!whisper -h` can help you understand what that argument is supposed to do or how to format it correctly.
4. Are new to the tool: It's a great first step to get an overview of all available options before diving into actual tasks.

Think of it like checking the instruction manual for your new gadget when you're trying to figure out a function.  

How to use it? (How do you operate this?)  
To use it, simply type it into a code cell in Colab and run the cell:  
`!whisper -h`  
When you run this, it won't actually process any audio. Instead, it prints a long message in the output area below the cell. This message lists all the available arguments for the `whisper` command, along with a brief description of what each argument does, what kind of value it expects (e.g., a filename, a language code, a model name), and sometimes examples of how to use it.  
For example, you'd see information about arguments like:
- `audio`: The path to your audio file.
- `--model`: Which Whisper model to use (tiny, base, small, medium, large).
- `--language`: To specify the audio's language.
- `--task`: To choose between transcribe or translate.
- `--output_dir`: Where to save the output files.

By reading this output, you can then construct your actual `!whisper` commands with confidence, knowing all the options at your disposal.  
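
The command-line options listed above have counterparts in Whisper's Python API (`whisper.load_model` and `model.transcribe`). The following is a minimal sketch, not a definitive implementation: the function name `translate_to_english` and its defaults are hypothetical, and it assumes the Whisper library and FFmpeg from Section 1 are installed. The file check runs first so that a wrong path fails early with a clear message.

```python
from pathlib import Path
from typing import Optional


def translate_to_english(audio_path: str, model_name: str = "base",
                         language: Optional[str] = None) -> str:
    """Translate speech in an audio file to English text using Whisper."""
    path = Path(audio_path)
    if not path.is_file():
        raise FileNotFoundError(f"audio file not found: {path}")
    import whisper  # imported lazily so the path check works without the library

    model = whisper.load_model(model_name)  # counterpart of --model
    # language=None lets Whisper detect the language; task mirrors --task
    result = model.transcribe(str(path), language=language, task="translate")
    return result["text"]
```

For example, `translate_to_english("/content/my_audio.wav", model_name="small", language="de")` would roughly correspond to `!whisper "/content/my_audio.wav" --model small --language de --task translate`, except that it returns the text instead of writing output files.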

In [None]:
# Additional arguments: -h is a standard flag, short for --help. It's a common convention in command-line tools to display help documentation.
!whisper -h