<div style="  background: linear-gradient(145deg, #0f172a, #1e293b);  border: 4px solid transparent;  border-radius: 14px;  padding: 18px 22px;  margin: 12px 0;  font-size: 26px;  font-weight: 600;  color: #f8fafc;  box-shadow: 0 6px 14px rgba(0,0,0,0.25);  background-clip: padding-box;  position: relative;">  <div style="    position: absolute;    inset: 0;    padding: 4px;    border-radius: 14px;    background: linear-gradient(90deg, #06b6d4, #3b82f6, #8b5cf6);    -webkit-mask:       linear-gradient(#fff 0 0) content-box,       linear-gradient(#fff 0 0);    -webkit-mask-composite: xor;    mask-composite: exclude;    pointer-events: none;  "></div>    <b>Introduction to PyDub</b>    <br/>  <span style="color:#9ca3af; font-size: 18px; font-weight: 400;">(Spoken Language Processing in Python)</span></div>

## Table of Contents

1. [Introduction](#section-1)
2. [Installing PyDub](#section-2)
3. [PyDub's Main Class: AudioSegment](#section-3)
4. [Playing an Audio File](#section-4)
5. [Inspecting Audio Parameters](#section-5)
6. [Changing Audio Parameters](#section-6)
7. [Manipulating Audio: Volume Control](#section-7)
8. [Normalization: Making Audio Sound Consistent](#section-8)
9. [Remixing: Slicing and Combining](#section-9)
10. [Splitting Stereo to Mono](#section-10)
11. [Exporting Audio Files](#section-11)
12. [Bulk Processing: Reformatting Multiple Files](#section-12)
13. [Advanced Bulk Manipulation Pipeline](#section-13)
14. [Conclusion](#section-14)

---

<a id="section-1"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 1. Introduction</span><br>

PyDub is a powerful and simple-to-use Python library for audio manipulation. It allows developers to perform spoken language processing tasks such as slicing, concatenating, exporting, and processing audio files with minimal code. This notebook covers the fundamentals of PyDub, from installation to building automated processing pipelines.

---

<a id="section-2"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 2. Installing PyDub</span><br>

To get started, install the `pydub` library. Two extras are optional: `ffmpeg` is needed to read or write non-WAV formats (like MP3), and `simpleaudio` lets you play audio directly from Python.

<div style="background: #e0f2fe; border-left: 16px solid #0284c7; padding: 14px 18px; border-radius: 8px; font-size: 18px; color: #075985;"> 💡 <b>Tip:</b> PyDub relies on <b>ffmpeg</b> to handle various audio formats (mp3, flac, etc.). Ensure ffmpeg is installed on your system and added to your PATH.</div>
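
A quick way to confirm that ffmpeg is visible on your PATH is to ask Python's standard library where it is. This is a minimal sketch; `find_tool` is a hypothetical helper name, not part of PyDub.

```python
import shutil


def find_tool(name="ffmpeg"):
    """Return the full path to `name` if it is on PATH, otherwise None."""
    return shutil.which(name)


ffmpeg_path = find_tool("ffmpeg")
if ffmpeg_path is None:
    print("ffmpeg was not found on PATH; non-WAV formats will likely fail to load.")
else:
    print(f"ffmpeg found at: {ffmpeg_path}")
```

If the first message appears, install ffmpeg with your system's package manager and restart the notebook kernel so the updated PATH is picked up.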

### Installation Commands



In [None]:
# Install PyDub via pip
!pip install pydub

# Install simpleaudio for playback capabilities
!pip install simpleaudio



---

<a id="section-3"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 3. PyDub's Main Class: AudioSegment</span><br>

The core of PyDub is the `AudioSegment` class. It acts as a container for your audio data. You can load audio files into an `AudioSegment` object to begin manipulating them.

### Loading an Audio File



In [None]:
from pydub import AudioSegment

# Import an audio file explicitly specifying the format
# Note: Replace 'wav_file.wav' with your actual file path
wav_file = AudioSegment.from_file(file="wav_file.wav", format="wav")

# You can also omit the format if the file extension is clear
wav_file = AudioSegment.from_file(file="wav_file.wav")

# Check the type of the object
print(type(wav_file))



**Expected Output:**


```text
<class 'pydub.audio_segment.AudioSegment'>
```



---

<a id="section-4"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 4. Playing an Audio File</span><br>

To verify that your audio has loaded correctly, you can play it back directly within your script using the `play` function. `play` uses the `simpleaudio` package installed earlier when it is available; without it, PyDub falls back to `pyaudio` or ffmpeg's `ffplay` tool.



In [None]:
from pydub import AudioSegment
from pydub.playback import play

# Load the audio file
wav_file = AudioSegment.from_file(file="wav_file.wav")

# Play the audio file
play(wav_file)



---

<a id="section-5"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 5. Inspecting Audio Parameters</span><br>

Audio files have several key properties that determine their quality and structure. PyDub allows you to easily inspect these attributes.

### Key Attributes:
1.  **Channels**: 1 for Mono, 2 for Stereo.
2.  **Frame Rate (Sample Rate)**: How many samples per second (e.g., 44100 Hz).
3.  **Sample Width**: Number of bytes per sample (e.g., 2 bytes = 16-bit).
4.  **Max Amplitude**: The largest absolute sample value in the audio (its loudness peak).
5.  **Duration**: Length of the audio in milliseconds.
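
To see how these attributes relate to each other without needing an audio file, the sketch below writes one second of silence to an in-memory WAV using Python's standard `wave` module and reads the header back. The chosen values (44100 Hz, 16-bit, mono) are assumptions for illustration only.

```python
import io
import wave

FRAME_RATE = 44100   # samples per second
SAMPLE_WIDTH = 2     # bytes per sample (16-bit)
CHANNELS = 1         # mono

# Write one second of silence into an in-memory WAV container.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(CHANNELS)
    writer.setsampwidth(SAMPLE_WIDTH)
    writer.setframerate(FRAME_RATE)
    writer.writeframes(b"\x00" * SAMPLE_WIDTH * CHANNELS * FRAME_RATE)

# Read the header back and derive duration and raw size from it.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    channels = reader.getnchannels()
    sample_width = reader.getsampwidth()
    frame_rate = reader.getframerate()
    n_frames = reader.getnframes()

duration_ms = round(1000 * n_frames / frame_rate)
raw_bytes = n_frames * sample_width * channels

print(f"{channels} channel(s), {sample_width} bytes/sample, {frame_rate} Hz")
print(f"{duration_ms} ms of audio in {raw_bytes} bytes of raw samples")
```

Duration follows from frames divided by frame rate, and the raw size is frames times sample width times channels, which is why halving the frame rate or switching to mono shrinks a file.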



In [None]:
# Load two different audio files for comparison
wav_file = AudioSegment.from_file(file="wav_file.wav")
two_speakers = AudioSegment.from_file(file="two_speakers.wav")

# 1. Check number of channels
print(f"Channels (wav_file): {wav_file.channels}")
print(f"Channels (two_speakers): {two_speakers.channels}")

# 2. Check frame rate (Sample rate)
print(f"Frame Rate: {wav_file.frame_rate} Hz")

# 3. Find number of bytes per sample
print(f"Sample Width: {wav_file.sample_width} bytes")

# 4. Find the max amplitude
print(f"Max Amplitude: {wav_file.max}")

# 5. Duration of audio file in milliseconds
print(f"Duration: {len(wav_file)} ms")



---

<a id="section-6"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 6. Changing Audio Parameters</span><br>

You can modify the attributes of an `AudioSegment` using `set_` methods. This is useful for standardizing audio files before processing them in a machine learning pipeline.

### Modifying Sample Width, Frame Rate, and Channels



In [None]:
# Generic syntax: changed_audio = audio_segment.set_ATTRIBUTENAME(x)

# 1. Change sample width to 1 byte (8-bit audio)
wav_file_width_1 = wav_file.set_sample_width(1)
print(f"New Sample Width: {wav_file_width_1.sample_width}")

# 2. Change sample rate to 16000 Hz (Common for speech recognition)
wav_file_16k = wav_file.set_frame_rate(16000)
print(f"New Frame Rate: {wav_file_16k.frame_rate}")

# 3. Change number of channels to 1 (Mono)
wav_file_1_channel = wav_file.set_channels(1)
print(f"New Channel Count: {wav_file_1_channel.channels}")



---

<a id="section-7"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 7. Manipulating Audio: Volume Control</span><br>

PyDub makes changing volume intuitive by using standard math operators. You can add or subtract decibels (dB) directly from the `AudioSegment` object.
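
Decibels are logarithmic, so it helps to see what a dB change means for the raw sample values. The sketch below uses the standard amplitude relation, ratio = 10^(dB / 20); the helper names are hypothetical and only the standard library is used.

```python
import math


def db_to_amplitude_ratio(gain_db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (gain_db / 20)


def amplitude_ratio_to_db(ratio):
    """Convert a linear amplitude multiplier back to decibels."""
    return 20 * math.log10(ratio)


for gain_db in (-60, -6, 10):
    ratio = db_to_amplitude_ratio(gain_db)
    print(f"{gain_db:+} dB -> amplitude x {ratio:.4f}")
```

A 60 dB cut scales every sample to one thousandth of its value, which is why such a clip is nearly inaudible, while a 10 dB boost multiplies amplitudes by roughly 3.16.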

### Decreasing Volume
If an audio file is too loud or you want to create a "whisper" effect, subtract dB.



In [None]:
# Import audio file
wav_file = AudioSegment.from_file("wav_file.wav")

# Decrease volume by 60 dB (Turning it down)
quiet_wav_file = wav_file - 60

# Note: If audio is too quiet, speech recognition APIs might fail
# Example pseudo-code result:
# recognizer.recognize_google(quiet_wav_file) -> UnknownValueError



### Increasing Volume
If an audio file is too quiet, add dB.



In [None]:
# Increase the volume by 10 dB
louder_wav_file = wav_file + 10

# Louder audio is often easier for APIs to transcribe
# Example pseudo-code result:
# recognizer.recognize_google(louder_wav_file) -> "this is a wav file"



---

<a id="section-8"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 8. Normalization: Making Audio Sound Consistent</span><br>

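When dealing with multiple audio files, volume levels can vary wildly. Normalization applies a single gain to each clip so that its loudest peak lands just below full scale, which brings different recordings to a consistent peak volume. It does not rebalance quiet and loud passages within one clip.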



In [None]:
from pydub import AudioSegment
from pydub.effects import normalize
from pydub.playback import play

# Import an audio file with uneven sound levels
loud_quiet = AudioSegment.from_file("loud_quiet.wav")

# Normalize: apply one gain to the whole clip so its peak sits just below full scale
normalized_loud_quiet = normalize(loud_quiet)

# Play to check the difference
play(normalized_loud_quiet)
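
To make the idea concrete, here is a minimal pure-Python sketch of peak normalization on a short list of 16-bit samples. It illustrates the general technique and is not PyDub's actual code; the `headroom_db` default of 0.1 is assumed to mirror the `headroom` parameter of `pydub.effects.normalize`, and the sample values are made up.

```python
def peak_normalize(samples, full_scale=32767, headroom_db=0.1):
    """Scale every sample by one shared gain so the peak sits headroom_db below full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    target_peak = full_scale * 10 ** (-headroom_db / 20)
    gain = target_peak / peak
    return [int(s * gain) for s in samples]


# A quiet passage followed by a louder one (illustrative values only).
quiet_then_loud = [100, -200, 150, 8000, -16000, 12000]
normalized = peak_normalize(quiet_then_loud)

print("before:", quiet_then_loud)
print("after: ", normalized)
```

Every sample is multiplied by the same factor, so the quiet passage stays about as quiet relative to the loud one as it was before; only the overall level changes.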



---

<a id="section-9"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 9. Remixing: Slicing and Combining</span><br>

PyDub allows you to slice audio using Python's slice syntax, with indices in milliseconds, and to combine audio using the addition operator.

### Slicing (Removing Static)
If a file has static in the first 5 seconds (5000 ms), you can slice it out.



In [None]:
# Import audio with static at the start
static_at_start = AudioSegment.from_file("static_at_start.wav")

# Remove the static via slicing (Skip the first 5000ms)
no_static_at_start = static_at_start[5000:]

# Check the new sound
play(no_static_at_start)



### Combining Audio Files
You can concatenate two audio files simply by adding them.



In [None]:
# Import two audio files
wav_file_1 = AudioSegment.from_file("wav_file_1.wav")
wav_file_2 = AudioSegment.from_file("wav_file_2.wav")

# Combine the two audio files
wav_file_3 = wav_file_1 + wav_file_2

# Play the combined file
play(wav_file_3)

# Advanced: Combine and adjust volume simultaneously
# Combine files and make the result 10dB louder
louder_wav_file_3 = wav_file_1 + wav_file_2 + 10



---

<a id="section-10"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 10. Splitting Stereo to Mono</span><br>

Stereo files contain two channels (Left and Right). Sometimes you need to isolate one channel for processing.



In [None]:
# Import phone call audio (Stereo)
phone_call = AudioSegment.from_file("phone_call.wav")

# Check channels
print(f"Original Channels: {phone_call.channels}")

# Split stereo to mono
# This returns a list of AudioSegments: [Left_Channel, Right_Channel]
phone_call_channels = phone_call.split_to_mono()

# Verify the split
print(f"Split Result: {phone_call_channels}")

# Access the first channel (Left)
phone_call_channel_1 = phone_call_channels[0]
print(f"Channel 1 Channels: {phone_call_channel_1.channels}")

# Now you can process just this channel
# recognizer.recognize_google(phone_call_channel_1)



---

<a id="section-11"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 11. Exporting Audio Files</span><br>

After manipulating audio, you need to save the result. The `.export()` method handles this.



In [None]:
from pydub import AudioSegment

# Import audio file
wav_file = AudioSegment.from_file("wav_file.wav")

# Increase by 10 decibels
louder_wav_file = wav_file + 10

# Export louder audio file
# out_f: The output filename
# format: The file extension/format (wav, mp3, flac)
louder_wav_file.export(out_f="louder_wav_file.wav", format="wav")



---

<a id="section-12"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 12. Bulk Processing: Reformatting Multiple Files</span><br>

A common task is converting a dataset of mixed audio formats (mp3, flac) into a standard format (wav) for machine learning.



In [None]:
import os
from pydub import AudioSegment

def make_wav(wrong_folder_path, right_folder_path):
    """
    Converts all .mp3 and .flac files in a folder to .wav format
    and saves them to a new directory.
    """
    # Make sure the destination directory exists before exporting
    os.makedirs(right_folder_path, exist_ok=True)

    # Loop through files in the source directory
    for file in os.scandir(wrong_folder_path):

        # Only work with files with audio extensions we're fixing
        # (lower() makes the check case-insensitive, e.g. "TALK.MP3")
        if file.is_file() and file.name.lower().endswith((".mp3", ".flac")):

            # Create the new .wav filename
            # os.path.splitext removes the old extension
            # os.path.join adds the path separator if it is missing
            filename_no_ext = os.path.splitext(file.name)[0]
            out_file = os.path.join(right_folder_path, filename_no_ext + ".wav")
            
            # Read in the audio file and export it in wav format
            AudioSegment.from_file(file.path).export(out_file, format="wav")
            
            print(f"Creating {out_file}")

# Example Usage:
# make_wav("data/wrong_formats/", "data/right_format/")
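
Before converting a large folder, it can be useful to preview which files would be converted and where they would go. The sketch below is a dry run that only builds path pairs and never opens any audio; `plan_conversions` and `AUDIO_SUFFIXES` are hypothetical names, and the placeholder files are empty.

```python
import tempfile
from pathlib import Path

AUDIO_SUFFIXES = {".mp3", ".flac"}  # formats assumed to need conversion


def plan_conversions(source_dir, target_dir):
    """Return (source, destination) path pairs without reading any audio."""
    pairs = []
    for entry in sorted(Path(source_dir).iterdir()):
        # Compare suffixes case-insensitively so "B.FLAC" is not skipped.
        if entry.is_file() and entry.suffix.lower() in AUDIO_SUFFIXES:
            pairs.append((entry, Path(target_dir) / (entry.stem + ".wav")))
    return pairs


# Demonstrate on empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp_dir:
    source = Path(tmp_dir, "wrong_formats")
    source.mkdir()
    for name in ("a.mp3", "B.FLAC", "notes.txt"):
        (source / name).touch()

    target = Path(tmp_dir, "right_format")
    planned = [(src.name, dst.name) for src, dst in plan_conversions(source, target)]

print(planned)
```

The text file is ignored, and both audio files are mapped to `.wav` names regardless of the case of their extension.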



---

<a id="section-13"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 13. Advanced Bulk Manipulation Pipeline</span><br>

We can combine multiple PyDub features into a single function to clean up a dataset. The following function iterates through a folder, removes the first 3.1 seconds (static), increases the volume by 10 dB, and exports the result as WAV.



In [None]:
import os
from pydub import AudioSegment

def make_no_static_louder(static_quiet_folder_path, louder_no_static_folder_path):
    """
    Loops through a folder, removes the first 3.1 seconds, 
    increases volume by 10dB, and exports.
    """
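    # Loop through files
    for file in os.scandir(static_quiet_folder_path):

        # Skip subdirectories; only regular files can be loaded as audio
        if not file.is_file():
            continue

        # Create new file path (os.path.join adds the separator if it is missing)
        filename_no_ext = os.path.splitext(file.name)[0]
        out_file = os.path.join(louder_no_static_folder_path, filename_no_ext + ".wav")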
        
        # Read the audio file
        audio_file = AudioSegment.from_file(file.path)
        
        # Remove first 3100ms (3.1 seconds) and add 10 decibels
        processed_audio = audio_file[3100:] + 10
        
        # Export
        processed_audio.export(out_file, format="wav")
        
        print(f"Creating {out_file}")

# Example Usage:
# make_no_static_louder("data/static_quiet/", "data/louder_no_static/")



**Expected Output:**


```text
Creating data/louder_no_static/speech-recognition-services.wav
Creating data/louder_no_static/order-issue.wav
Creating data/louder_no_static/help-with-acount.wav
```



---

<a id="section-14"></a>
<br><span style="  display: inline-block;  color: #fff;  background: linear-gradient(135deg, #a31616ff, #02b7ffff);  padding: 12px 20px;  border-radius: 12px;  font-size: 28px;  font-weight: 700;  box-shadow: 0 4px 12px rgba(0,0,0,0.2);  transition: transform 0.2s ease, box-shadow 0.2s ease;">  🧾 14. Conclusion</span><br>

In this notebook, we explored **PyDub**, a versatile library for Spoken Language Processing in Python. 

**Key Takeaways:**
1.  **Ease of Use**: PyDub wraps lower-level tools such as ffmpeg behind simple Python objects (`AudioSegment`).
2.  **Inspection**: We can easily read properties like channels, frame rate, and duration.
3.  **Manipulation**: We can slice audio using list syntax, change volume with math operators, and normalize audio levels.
4.  **Automation**: PyDub integrates seamlessly with Python loops to process thousands of files, converting formats and cleaning audio for downstream tasks like Machine Learning.

**Next Steps:**
*   Try recording your own voice and using PyDub to clean it.
*   Integrate PyDub with a Speech-to-Text library (like `SpeechRecognition`) to build a full transcription pipeline.
*   Experiment with other PyDub effects like `fade_in` and `fade_out`.
