# Spoken Language Processing in Python
>  Not all audio files come in the same shape, size or format. Luckily, the PyDub library by James Robert provides tools which you can use to programmatically alter and change different audio file attributes such as frame rate, number of channels, file format and more. In this chapter, you'll learn how to use this helpful library to ensure all of your audio files are in the right shape for transcription.

- toc: true 
- badges: true
- comments: true
- author: Lucas Nunes
- categories: [Datacamp]
- image: images/datacamp/___

> Note: This is a summary of the exercises from chapter 3 of the DataCamp course "Spoken Language Processing in Python". <br>[Github repo](https://github.com/lnunesAI/Datacamp/) / [Course link](https://www.datacamp.com/tracks/machine-learning-scientist-with-python)

In [6]:
%%capture
!pip install pydub

## Introduction to PyDub

### Import an audio file with PyDub

<div class=""><p><code>PyDub</code>'s <code>AudioSegment</code> class makes it easy to import and manipulate audio files with Python.</p>
<p>In this exercise, we'll import an audio file of interest by creating an instance of <code>AudioSegment</code>.</p>
<p>To import an audio file, you can use the <code>from_file()</code> method on <code>AudioSegment</code> and pass it your target audio file's pathname as a string. The <code>format</code> parameter lets you specify the format of your audio file, but it is optional because <code>PyDub</code> will try to infer it automatically.</p>
<p><code>PyDub</code> works with <code>.wav</code> files without any extra dependencies but for other file types like <code>.mp3</code>, you'll need to install <a href="http://www.ffmpeg.org/" target="_blank" rel="noopener noreferrer">ffmpeg</a>.</p>
<p>A sample audio file has been set up as <code>wav_file.wav</code>; you can listen to it <a href="https://assets.datacamp.com/production/repositories/4637/datasets/6238f8088db33efb5d103dfac1e42eb9fe3e6f2b/wav_file.wav" target="_blank" rel="noopener noreferrer">here</a>.</p></div>

In [1]:
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/wav_file.wav

Instructions
<ul>
<li>Import <code>AudioSegment</code> from <code>pydub</code>.</li>
<li>Call the <code>from_file</code> method and pass it the audio file pathname.</li>
</ul>

In [2]:
# Import AudioSegment from Pydub
from pydub import AudioSegment

# Create an AudioSegment instance
wav_file = AudioSegment.from_file(file="wav_file.wav", format="wav")
# Check the type
print(type(wav_file))

<class 'pydub.audio_segment.AudioSegment'>


In [3]:
wav_file

**You've just imported your first audio file using PyDub. Over the next few lessons, you'll start to see how many helpful functions PyDub has built-in for working with audio.**
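Since formats other than `.wav` depend on ffmpeg, it can help to check for it before importing. The snippet below is a minimal sketch using only the standard library; it assumes the executable is named `ffmpeg` and is discoverable on your `PATH`, and `ffmpeg_available` is a hypothetical helper, not part of PyDub.

```python
import shutil


def ffmpeg_available(executable="ffmpeg"):
    """Return True if the given executable can be found on the PATH."""
    # shutil.which returns the full path to the executable, or None if missing
    return shutil.which(executable) is not None


if ffmpeg_available():
    print("ffmpeg found: non-wav formats should be importable")
else:
    print("ffmpeg not found: stick to .wav files or install ffmpeg first")
```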

### Play an audio file with PyDub

<div class=""><p>If you're working with audio files, chances are you want to listen to them.</p>
<p><code>PyDub</code>'s <code>playback</code> module provides a function called <code>play()</code> which can be passed an AudioSegment. Running the <code>play()</code> function with an AudioSegment passed in will play the AudioSegment out loud.</p>
<p>This can be helpful to check the quality of your audio files and assess any changes you need to make.</p>
<p>In this exercise you'll see how simple it is to use the <code>play()</code> function.</p>
<p>Remember: to use the <code>play()</code> function, you'll need <code>simpleaudio</code> or <code>pyaudio</code> installed for <code>.wav</code> files and <code>ffmpeg</code> for other kinds of files.</p></div>

Instructions
<ul>
<li>Import <code>play</code> from the <code>pydub.playback</code> module.</li>
<li>Call <code>play()</code> whilst passing it the <code>wav_file</code> AudioSegment.</li>
</ul>

In [None]:
# Import AudioSegment and play
from pydub import AudioSegment
from pydub.playback import play
# Create an AudioSegment instance
wav_file = AudioSegment.from_file(file="wav_file.wav", format="wav")
# Play the audio file
play(wav_file)

### Audio parameters with PyDub

<div class=""><p>Every audio file you work with has a number of characteristics associated with it, such as channels, frame rate (or sample rate), sample width and more.</p>
<p>Knowing these parameters is useful to ensure your audio files are compatible with various API requirements for speech transcription.</p>
<p>For example, many APIs recommend a minimum frame rate (<code>wav_file.frame_rate</code>) of 16,000 Hz.</p>
<p>When you create an instance of <code>AudioSegment</code>, <code>PyDub</code> automatically infers these parameters from your audio files and saves them as attributes.</p>
<p>In this exercise, we'll explore these attributes.</p></div>

Instructions 1/4
<li>Find the <code>frame_rate</code> of <code>wav_file</code>.</li>

In [None]:
# Import audio file
wav_file = AudioSegment.from_file(file="wav_file.wav")

# Find the frame rate
print(wav_file.frame_rate)

48000


Instructions 2/4
<li>Find the number of channels of <code>wav_file</code>.</li>

In [None]:
# Find the number of channels
print(wav_file.channels)

2


Instructions 3/4
<li>Find the max amplitude of <code>wav_file</code>.</li>

In [None]:
# Find the max amplitude
print(wav_file.max)

8484


Instructions 4/4
<li>Find the length of <code>wav_file</code> in milliseconds.</li>

In [None]:
# Find the length
print(len(wav_file))

3284


**There are many more characteristics you can find out about your audio files once you've imported them as an AudioSegment. Try to find some more by adding a dot after your audio file (wav_file.) and pressing tab.**
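To see where these values come from, here is a rough sketch that reads the same kind of parameters with Python's built-in `wave` module instead of PyDub. The in-memory test file and the `describe_wav` helper are made up for illustration; the point is only that frame rate, channels and sample width are stored in the file header, and the length in milliseconds follows from the number of frames.

```python
import io
import wave


def describe_wav(wav_bytes):
    """Return basic parameters of WAV data held in memory."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        frame_rate = wav.getframerate()
        n_frames = wav.getnframes()
        return {
            "frame_rate": frame_rate,
            "channels": wav.getnchannels(),
            "sample_width": wav.getsampwidth(),
            # Duration in milliseconds = frames / frames-per-second * 1000
            "length_ms": round(1000 * n_frames / frame_rate),
        }


# Build one second of 16-bit stereo silence at 48 kHz (hypothetical test data)
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(2)
    wav.setsampwidth(2)
    wav.setframerate(48000)
    wav.writeframes(b"\x00" * (48000 * 2 * 2))

params = describe_wav(buffer.getvalue())
print(params)
```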

### Adjusting audio parameters

<div class=""><p>During your exploratory data analysis, you may find some of the parameters of your audio files differ or are incompatible with speech recognition APIs.</p>
<p>Don't worry, <code>PyDub</code> has built-in functionality which allows you to change various attributes.</p>
<p>For example, you can set the frame rate of your audio file calling <code>set_frame_rate()</code> on your <code>AudioSegment</code> instance and passing it an integer of the desired frame rate measured in Hertz.</p>
<p>In this exercise, we'll practice altering some audio attributes.</p></div>

Instructions 1/3
<li>Create a new <code>wav_file</code> with a frame rate of 16,000 Hz and then check its frame rate.</li>

In [8]:
# Import audio file
wav_file = AudioSegment.from_file(file="wav_file.wav")

# Create a new wav file with adjusted frame rate
wav_file_16k = wav_file.set_frame_rate(16000)

# Check the frame rate of the new wav file
print(wav_file_16k.frame_rate)

16000


In [9]:
wav_file_16k

Instructions 2/3
<li>Set the <code>wav_file</code> number of channels to 1 and then check the number of channels.</li>

In [6]:
# Set number of channels to 1
wav_file_1_ch = wav_file.set_channels(1)

# Check the number of channels
print(wav_file_1_ch.channels)

1


In [7]:
wav_file_1_ch

Instructions 3/3
<li>Print the sample width of <code>wav_file</code> and then set it to 1 and print it again.</li>

In [4]:
# Print sample_width
print(f"Old sample width: {wav_file.sample_width}")

# Set sample_width to 1
wav_file_sw_1 = wav_file.set_sample_width(1)

# Check new sample_width
print(f"New sample width: {wav_file_sw_1.sample_width}")

Old sample width: 2
New sample width: 1


In [5]:
wav_file_sw_1

**Once again, there are other methods you can call on your AudioSegment instances to adjust their attributes. As further practice, you should try to find some more. But remember, lowering the values generally leads to lower audio quality and worse transcriptions, while increasing them may increase the file size without improving the quality of the transcription. It's best to experiment with different values and find the ideal tradeoff.**
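The file-size side of this tradeoff is simple arithmetic for uncompressed audio: each second takes frame rate × channels × sample width bytes. The sketch below uses the values printed above (48,000 Hz, 2 channels, 2 bytes per sample); `bytes_per_second` is a hypothetical helper and the calculation ignores the small file header.

```python
def bytes_per_second(frame_rate, channels, sample_width):
    """Raw (uncompressed) audio data rate in bytes per second."""
    return frame_rate * channels * sample_width


# Original file: 48 kHz, stereo, 2 bytes per sample
original = bytes_per_second(48000, 2, 2)

# After set_frame_rate(16000) and set_channels(1), keeping 2 bytes per sample
adjusted = bytes_per_second(16000, 1, 2)

print(f"Original: {original} bytes/s")
print(f"Adjusted: {adjusted} bytes/s")
print(f"Reduction factor: {original / adjusted:.0f}x")
```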

## Manipulating audio files with PyDub

### Turning it down... then up

<div class=""><p>Speech recognition works best on clean, audible speech. If your audio files are too quiet or too loud, it can hinder transcription.</p>
<p>In this exercise, you'll see how to make an <code>AudioSegment</code> quieter or louder.</p>
<p>Note that the <code>play()</code> function won't play your changes in the DataCamp classroom.</p>
<p>The baseline audio file, <code>volume_adjusted.wav</code> can be heard <a href="https://assets.datacamp.com/production/repositories/4637/datasets/520b312f96433535f93656d9e6d61fdb10f5c517/volume_adjusted.wav" target="_blank" rel="noopener noreferrer">here</a>.</p></div>

In [10]:
%%capture
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/volume_adjusted.wav

Instructions 1/2
<li>Import <code>volume_adjusted.wav</code> and lower its volume by 60 dB and save it to a new variable <code>quiet_volume_adjusted</code>.</li>

In [11]:
from pydub import AudioSegment

# Import audio file
volume_adjusted = AudioSegment.from_file("volume_adjusted.wav")

# Lower the volume by 60 dB
quiet_volume_adjusted = volume_adjusted - 60

In [12]:
volume_adjusted

In [14]:
quiet_volume_adjusted

Instructions 2/2
<li>Import the target audio file, increase its volume by 15 dB and save it to the variable <code>louder_volume_adjusted</code>.</li>

In [15]:
# Increase the volume by 15 dB
louder_volume_adjusted = volume_adjusted + 15

In [16]:
louder_volume_adjusted

**Here's the louder audio file you created and the quieter one (no sound). Nice work! Adjusting the volume with operators can be useful but doesn't help when you want to increase the loudness of only the quiet sections.**
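Decibels are a logarithmic unit, which is why subtracting 60 dB leaves practically no sound. As a quick sketch of the standard conversion (amplitude ratio = 10^(dB / 20)), with `db_to_amplitude_ratio` as a hypothetical helper name:

```python
def db_to_amplitude_ratio(db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)


quieter = db_to_amplitude_ratio(-60)  # the "- 60" adjustment
louder = db_to_amplitude_ratio(15)    # the "+ 15" adjustment

print(f"-60 dB scales amplitude by {quieter:.4f}")
print(f"+15 dB scales amplitude by {louder:.2f}")
```

So lowering by 60 dB scales every sample to roughly a thousandth of its original amplitude, while raising by 15 dB makes samples between five and six times larger.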

### Normalizing an audio file with PyDub

<div class=""><p>Sometimes you'll have audio files where the speech is loud in some portions and quiet in others. Having this variance in volume can hinder transcription.</p>
<p>Luckily, <code>PyDub</code>'s effects module has a function called <code>normalize()</code> which finds the maximum volume of an <code>AudioSegment</code>, then adjusts the rest of the <code>AudioSegment</code> to be in proportion. This means the quiet parts will get a volume boost.</p>
<p>You can listen to an example of an audio file which starts as loud then goes quiet, <code>loud_then_quiet.wav</code>, <a href="https://assets.datacamp.com/production/repositories/4637/datasets/9251c751d3efccf781f3e189d68b37c8d22be9ca/ex3_datacamp_loud_then_quiet.wav" target="_blank" rel="noopener noreferrer">here</a>.</p>
<p>In this exercise, you'll use <code>normalize()</code> to normalize the volume of our file, making it sound <a href="https://assets.datacamp.com/production/repositories/4637/datasets/f0c1ba35ff99f07df8cfeee810c7b12118d9cd0f/ex3_datamcamp_normalized_loud_quiet.wav" target="_blank" rel="noopener noreferrer">more like this</a>.</p></div>

In [17]:
%%capture
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/ex3_datacamp_loud_then_quiet.wav

Instructions
<ul>
<li>Import <code>AudioSegment</code> from <code>PyDub</code> and <code>normalize</code> from <code>PyDub</code>'s effects module.</li>
<li>Import the target audio file, <code>loud_then_quiet.wav</code> and save it to <code>loud_then_quiet</code>.</li>
<li>Normalize the imported audio file using the <code>normalize()</code> function and save it to <code>normalized_loud_then_quiet</code>.</li>
</ul>

In [18]:
# Import AudioSegment and normalize
from pydub import AudioSegment
from pydub.effects import normalize

# Import target audio file
loud_then_quiet = AudioSegment.from_file("ex3_datacamp_loud_then_quiet.wav") #loud_then_quiet.wav

# Normalize target audio file
normalized_loud_then_quiet = normalize(loud_then_quiet)

In [None]:
loud_then_quiet

In [None]:
normalized_loud_then_quiet

**Remember, speech recognition works best on clear speech files, so the more you can do to improve the quality of your audio files, including their volume, the better.**
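To make the idea of peak normalization concrete, here is a simplified sketch on a plain list of integer samples. It is not PyDub's implementation: `peak_normalize` and the sample values are made up, and the function simply finds the loudest sample and scales everything by the same factor so that the peak reaches a chosen target.

```python
def peak_normalize(samples, target_peak):
    """Scale all samples by one factor so the loudest reaches target_peak."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        # Pure silence: nothing to scale
        return list(samples)
    gain = target_peak / peak
    return [round(s * gain) for s in samples]


# Hypothetical samples: a loud start followed by a quiet tail
samples = [8000, -4000, 200, -100]
normalized = peak_normalize(samples, target_peak=32000)

print(normalized)
```

Note that in this sketch the gain is the same for every sample, so the quiet tail gets louder but stays quiet relative to the loud start.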

### Chopping and changing audio files

<div class=""><p>Some of your audio files may have sections of redundancy. For example, you might find at the beginning of each file, there's a few seconds of static.</p>
<p>Instead of wasting compute trying to transcribe static, you can remove it. </p>
<p>Since an <code>AudioSegment</code> is iterable, and measured in milliseconds, you can use slicing to alter the length.</p>
<p>To get the first 3-seconds of <code>wav_file</code>, you'd use <code>wav_file[:3000]</code>.</p>
<p>You can also add two <code>AudioSegment</code>'s together using the addition operator. This is helpful if you need to combine several audio files.</p>
<p>To practice both of these, we're going to remove the first four seconds of <a href="https://assets.datacamp.com/production/repositories/4637/datasets/6ef2e43497070fd23c6ce4c0fe1d9d0e46469750/ex3_slicing_part_1.wav" target="_blank" rel="noopener noreferrer">part_1.wav</a>, and add the remainder to <a href="https://assets.datacamp.com/production/repositories/4637/datasets/3b47eb5ca2c696e816af04053150d96fd95b4c7f/ex3_slicing_part_2.wav" target="_blank" rel="noopener noreferrer">part_2.wav</a>, leaving the end result sounding like <a href="https://assets.datacamp.com/production/repositories/4637/datasets/3803042506ed07d707fe264d0bc6ec6ffa891e63/ex3_slicing_part_3.wav" target="_blank" rel="noopener noreferrer">part_3.wav</a>.</p></div>

In [19]:
%%capture
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/ex3_slicing_part_1.wav
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/ex3_slicing_part_2.wav
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/ex3_slicing_part_3.wav

Instructions
<ul>
<li>Import <code>part_1.wav</code> and <code>part_2.wav</code> and save them to <code>part_1</code> and <code>part_2</code> respectively.</li>
<li>Remove the first 4-seconds of <code>part_1</code> using slicing and save the new audio to <code>part_1_removed</code>.</li>
<li>Add <code>part_1_removed</code> to <code>part_2</code> and save it to <code>part_3</code>.</li>
</ul>

In [20]:
from pydub import AudioSegment

# Import part 1 and part 2 audio files
part_1 = AudioSegment.from_file("ex3_slicing_part_1.wav") #part_1.wav
part_2 = AudioSegment.from_file("ex3_slicing_part_2.wav") #part_2.wav

# Remove the first four seconds of part 1
part_1_removed = part_1[4000:]

# Add the remainder of part 1 and part 2 together
part_3 = part_1_removed + part_2

In [21]:
part_1

In [22]:
part_1_removed

In [23]:
part_2

In [24]:
part_3

**You're becoming an audio manipulation master! But we're not done yet, there's still a few more tricks in the PyDub library you should know about.**
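Slicing by milliseconds boils down to converting a time into a frame index. The following is a rough sketch of that idea on plain Python lists; `ms_to_frame`, `slice_ms` and the tiny frame rate are assumptions chosen to keep the numbers easy to follow, not how PyDub stores audio internally.

```python
def ms_to_frame(ms, frame_rate):
    """Convert a time in milliseconds to a frame index."""
    return int(ms * frame_rate / 1000)


def slice_ms(frames, frame_rate, start_ms=None, end_ms=None):
    """Slice a list of frames using millisecond boundaries."""
    start = 0 if start_ms is None else ms_to_frame(start_ms, frame_rate)
    end = len(frames) if end_ms is None else ms_to_frame(end_ms, frame_rate)
    return frames[start:end]


frame_rate = 1000            # hypothetical rate: one frame per millisecond
part_1 = list(range(6000))   # stands in for 6 seconds of audio
part_2 = list(range(2000))   # stands in for 2 seconds of audio

# Drop the first four seconds, then append part 2
part_1_removed = slice_ms(part_1, frame_rate, start_ms=4000)
part_3 = part_1_removed + part_2

print(len(part_1_removed), len(part_3))
```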

### Splitting stereo audio to mono with PyDub

<div class=""><p>If you're trying to transcribe phone calls, there's a chance they've been recorded in stereo format, with one speaker on each channel.</p>
<p>As you've seen, it's hard to transcribe an audio file with more than one speaker. One solution is to split the audio file with multiple speakers into single files with individual speakers.</p>
<p><code>PyDub</code>'s <code>split_to_mono()</code> function can help with this. When called on an <code>AudioSegment</code> recorded in stereo, it returns a list of two separate <code>AudioSegment</code>'s in mono format, one for each channel.</p>
<p>In this exercise, you'll practice this by splitting this <a href="https://assets.datacamp.com/production/repositories/4637/datasets/810bb65e2e681e086e90bc2c6c2372bc4bd2cb52/ex3_stereo_call.wav" target="_blank" rel="noopener noreferrer">stereo phone call</a> (<code>stereo_phone_call.wav</code>) recording into <a href="https://assets.datacamp.com/production/repositories/4637/datasets/0aa876f5cb924035481d7b786a3701624e86d1e7/ex3_stereo_call_channel_1.wav" target="_blank" rel="noopener noreferrer">channel 1</a> and <a href="https://assets.datacamp.com/production/repositories/4637/datasets/2a16db969efc35186fe25ca45a4dbd506318a1cd/ex3_stereo_call_channel_2.wav" target="_blank" rel="noopener noreferrer">channel 2</a>. This separates the two speakers, allowing for easier transcription.</p></div>

In [25]:
%%capture
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/ex3_stereo_call.wav

Instructions
<ul>
<li>Import <code>AudioSegment</code> from <code>pydub</code>.</li>
<li>Create an <code>AudioSegment</code> instance <code>stereo_phone_call</code> with <code>stereo_phone_call.wav</code>.</li>
<li>Split <code>stereo_phone_call</code> into <code>channels</code> using <code>split_to_mono()</code> and check the channels of the resulting output.</li>
<li>Save each channel to new variables, <code>phone_call_channel_1</code> and <code>phone_call_channel_2</code>.</li>
</ul>

In [27]:
# Import AudioSegment
from pydub import AudioSegment

# Import stereo audio file and check channels
stereo_phone_call = AudioSegment.from_file("ex3_stereo_call.wav") #stereo_phone_call.wav
print(f"Stereo number channels: {stereo_phone_call.channels}")

# Split stereo phone call and check channels
channels = stereo_phone_call.split_to_mono()
print(f"Split number channels: {channels[0].channels}, {channels[1].channels}")

# Save new channels separately
phone_call_channel_1 = channels[0]
phone_call_channel_2 = channels[1]

Stereo number channels: 2
Split number channels: 1, 1


In [28]:
stereo_phone_call

In [31]:
phone_call_channel_1

In [33]:
phone_call_channel_2

**Having audio files with only one speaker usually results in better quality transcriptions. Now you've done all this audio processing, how do you save your altered audio files to use later?**
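Uncompressed stereo audio is commonly stored interleaved, with left and right samples alternating. Under that assumption, splitting to mono amounts to taking every second sample, as in this minimal sketch (`split_interleaved` and the sample values are hypothetical, and this is not PyDub's own code):

```python
def split_interleaved(samples, channels=2):
    """Split interleaved samples into one list per channel."""
    # Channel i owns samples i, i + channels, i + 2 * channels, ...
    return [samples[i::channels] for i in range(channels)]


# Hypothetical interleaved stereo data: L, R, L, R, L, R
interleaved = [1, -1, 2, -2, 3, -3]
left, right = split_interleaved(interleaved)

print(left)
print(right)
```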

### Exporting and reformatting audio files

<div class=""><p>If you've made some changes to your audio files, or if they've got the wrong file extension, you can use <code>PyDub</code> to export and save them as new audio files.</p>
<p>You can do this by using the <code>.export()</code> function on any instance of an <code>AudioSegment</code> you've created. The <code>export()</code> function takes two parameters, <code>out_f</code>, or the destination file path of your audio file and <code>format</code>, the format you'd like your new audio file to be. Both of these are strings. <code>format</code> is <code>"mp3"</code> by default so be sure to change it if you need. </p>
<p>In this exercise, you'll import <a href="https://assets.datacamp.com/production/repositories/4637/datasets/b035eadbae1544450868436a7179fa70158eb5de/mp3_file.mp3" target="_blank" rel="noopener noreferrer">this</a> <code>.mp3</code> file (<code>mp3_file.mp3</code>) and then export it with the <code>.wav</code> extension using <code>.export()</code>.</p>
<p>Remember, to work with files other than <code>.wav</code>, you'll need <a href="http://www.ffmpeg.org/" target="_blank" rel="noopener noreferrer">ffmpeg</a>.</p></div>

In [35]:
%%capture
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/mp3_file.mp3

Instructions
<ul>
<li>Import <code>mp3_file.mp3</code> and save it to <code>mp3_file</code>.</li>
<li>Export <code>mp3_file</code> with the file name <code>mp3_file.wav</code> with <code>"wav"</code> <code>format</code>.</li>
</ul>

In [36]:
from pydub import AudioSegment

# Import the .mp3 file
mp3_file = AudioSegment.from_file("mp3_file.mp3")

# Export the .mp3 file as wav
mp3_file.export(out_f='mp3_file.wav',
                format="wav")

<_io.BufferedRandom name='mp3_file.wav'>

In [37]:
mp3_file

**Now that our .mp3 file is in the .wav format, it should be compatible with most speech transcription APIs. Let's see this at scale.**

### Manipulating multiple audio files with PyDub

<div class=""><p>You've seen how to convert a single file using <code>PyDub</code> but what if you had a folder with multiple different file types?</p>
<p>For this exercise, we've set up a <code>folder</code> which has <code>.mp3</code>, <code>.m4a</code> and <code>.aac</code> versions of the <code>good-afternoon</code> audio file.</p>
<p>We'll use <code>PyDub</code> to open each of the files and export them as <code>.wav</code> format so they're compatible with speech recognition APIs.</p></div>

In [1]:
%%capture
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/folder.zip
!unzip folder.zip

In [20]:
import os
from pydub import AudioSegment

# Build the list of file paths inside the unzipped folder
folder = ['folder/' + f for f in ['good_afternoon_mp3.mp3', 'good_afternoon_m4a.m4a', 'good_afternoon_aac.aac']]

Instructions
<ul>
<li>Pass <code>audio_file</code> to the <code>from_file()</code> function.</li>
<li>Use <code>export()</code> to export <code>wav_filename</code> with the format <code>"wav"</code>.</li>
</ul>

In [22]:
# Loop through the files in the folder
for audio_file in folder:

    # Create the new .wav filename
    wav_filename = os.path.splitext(os.path.basename(audio_file))[0] + ".wav"

    # Read audio_file and export it in wav format
    AudioSegment.from_file(audio_file).export(out_f=wav_filename, format="wav")

    print(f"Creating {wav_filename}...")

Creating good_afternoon_mp3.wav...
Creating good_afternoon_m4a.wav...
Creating good_afternoon_aac.wav...


**You've successfully converted the folder of audio files from being incompatible with speech_recognition to being compatible!**
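The filename step in the loop can be tried on its own, without any audio files. This small sketch wraps it in a hypothetical helper, `wav_name_for`, to show what `os.path.basename` and `os.path.splitext` each contribute:

```python
import os


def wav_name_for(audio_path):
    """Return the .wav filename that corresponds to an audio file path."""
    base = os.path.basename(audio_path)   # drop the directory part
    stem, _ext = os.path.splitext(base)   # split off the old extension
    return stem + ".wav"


for path in ["folder/good_afternoon_mp3.mp3", "folder/good_afternoon_m4a.m4a"]:
    print(path, "->", wav_name_for(path))
```

Because the directory is dropped, the converted files land in the current working directory rather than back in `folder/`.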

### An audio processing workflow

<div class=""><p>You've seen how to import and manipulate a single audio file using <code>PyDub</code>. But what if you had a folder with multiple audio files you needed to convert?</p>
<p>In this exercise we'll use <code>PyDub</code> to format a folder of files to be ready to use with <code>speech_recognition</code>.</p>
<p>You've found your customer call files all have 3-seconds of static at the start and are quieter than they could be.</p>
<p>To fix this, we'll use <code>PyDub</code> to cut the static, increase the sound level and convert them to the <code>.wav</code> extension.</p>
<p>You can listen to an unformatted example <a href="https://assets.datacamp.com/production/repositories/4637/datasets/c53557fea60087064aec7e9d99e889b9be79e75a/ex3-static-help-with-account.mp3" target="_blank" rel="noopener noreferrer">here</a>.</p></div>

In [23]:
%%capture
!wget https://github.com/lnunesAI/Datacamp/raw/main/3-skill-tracks/spoken-language-processing-in-python/data/ex3-static-help-with-account.mp3

Instructions 1/3
<p>Let's start with one file. Import <code>account_help.mp3</code> and cut off the first 3-seconds (3000 milliseconds) of static.</p>

In [33]:
file_with_static = AudioSegment.from_file("ex3-static-help-with-account.mp3") #account_help.mp3

# Cut the first 3-seconds of static off
file_without_static = file_with_static[3000:]

In [25]:
file_with_static

In [28]:
file_without_static

Instructions 2/3
<p>Now the static has been removed, increase the volume by 10dB.</p>

In [34]:
# Increase the volume by 10dB
louder_file_without_static = file_without_static + 10

In [35]:
louder_file_without_static

Instructions 3/3
<p>Now for multiple files. Use <code>from_file()</code> to import each <code>audio_file</code> and export the louder files without static with the <code>"wav"</code> format.</p>

In [36]:
for audio_file in folder:
    file_with_static = AudioSegment.from_file(audio_file)

    # Cut the 3-seconds of static off
    file_without_static = file_with_static[3000:]

    # Increase the volume by 10dB
    louder_file_without_static = file_without_static + 10
    
    # Create the .wav filename for export
    wav_filename = os.path.splitext(os.path.basename(audio_file))[0] + ".wav"
    
    # Export the louder file without static as .wav
    louder_file_without_static.export(wav_filename, format="wav")
    print(f"Creating {wav_filename}...")

Creating good_afternoon_mp3.wav...
Creating good_afternoon_m4a.wav...
Creating good_afternoon_aac.wav...


**You've successfully processed and converted the folder of audio files from being incompatible with speech_recognition to being compatible! Here's what your files sound like without static, and here's without the static and 10 decibels louder.**
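As a rough illustration of what the two processing steps do to the underlying numbers, the sketch below trims the start of a list of mono samples and applies a gain in decibels. Everything here is an assumption for illustration (`trim_and_boost`, the sample values, the 16-bit limit); it also clips values that exceed the 16-bit range, which is one reason a large boost can distort audio that is already loud.

```python
def trim_and_boost(samples, frame_rate, trim_ms, gain_db, max_amplitude=32767):
    """Drop the first trim_ms of mono samples, then apply gain_db with clipping."""
    start = int(trim_ms * frame_rate / 1000)
    ratio = 10 ** (gain_db / 20)  # decibels to linear amplitude multiplier
    boosted = []
    for sample in samples[start:]:
        value = round(sample * ratio)
        # Keep the result inside the signed 16-bit range
        boosted.append(max(-max_amplitude - 1, min(max_amplitude, value)))
    return boosted


# Hypothetical data at one frame per millisecond: 3 ms of "static", then speech
samples = [5, 5, 5, 1000, -1000, 20000]
result = trim_and_boost(samples, frame_rate=1000, trim_ms=3, gain_db=10)

print(result)
```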