In [1]:
%matplotlib inline

# Importing the required libraries
import ffmpeg
from pydub import AudioSegment
from pydub.effects import normalize
from pydub.playback import play

import speech_recognition as sr

import os

from pprint import pprint

# Own libraries
from Transcribe_speech_to_text import get_transcription

# 03. Manipulating Audio Files with PyDub

Not all audio files come in the same shape, size or format. Luckily, the PyDub library by James Robert provides tools which you can use to programmatically alter and change different audio file attributes such as frame rate, number of channels, file format and more. In this chapter, you'll learn how to use this helpful library to ensure all of your audio files are in the right shape for transcription.

## 03.01 Introduction to PyDub

See the video.

**Examples from the video - PyDub's main class**

In [2]:
# Import an audio file
wav_file = AudioSegment.from_file(file="good_morning.wav", format="wav")

type(wav_file)

pydub.audio_segment.AudioSegment

In [3]:
# Play audio file
play(wav_file)

In [4]:
# Show the number of channels: 1 for mono, 2 for stereo audio
wav_file.channels

2

In [5]:
# Get the frame rate (sample rate) of our AudioSegment in Hertz
wav_file.frame_rate

48000

In [6]:
# Find the number of bytes per sample, 1 means 8-bit, 2 means 16-bit
wav_file.sample_width

2

In [7]:
# Find the max amplitude of your audio file, which can be considered loudness and is useful for normalizing sound levels
wav_file.max

14257

In [8]:
# Duration of audio file in milliseconds
print(len(wav_file), 'ms')
print(len(wav_file)*0.001, 's')

2604 ms
2.604 s


**Changing audio parameters**

In [9]:
print(wav_file.sample_width)

# Change sample width to 1
wav_file_width_1 = wav_file.set_sample_width(1)
print(wav_file_width_1.sample_width)

2
1


In [10]:
print(wav_file.frame_rate)

# Change sample rate
wav_file_16k = wav_file.set_frame_rate(16000)
print(wav_file_16k.frame_rate)

48000
16000


In [11]:
print(wav_file.channels)

# Change number of channels
wav_file_1_channel = wav_file.set_channels(1)
print(wav_file_1_channel.channels)

2
1


> A rule of thumb is the higher the values, excluding channels, the better. You should aim for a minimum of 16,000 Hertz as the frame rate and to have your audio files in wav format.

## 03.02 Import an audio file with PyDub

<code>PyDub</code>'s <code>AudioSegment</code> class makes it easy to import and manipulate audio files with Python.

In this exercise, we'll import an audio file of interest by creating an instance of <code>AudioSegment</code>.

To import an audio file, you can use the <code>from_file()</code> function on <code>AudioSegment</code> and pass it your target audio file's pathname as a string. The <code>format</code> parameter lets you specify the format of your audio file. It is optional, as <code>PyDub</code> will otherwise infer the format automatically.

<code>PyDub</code> works with <code>.wav</code> files without any extra dependencies but for other file types like <code>.mp3</code>, you'll need to install ffmpeg (http://www.ffmpeg.org/).

A sample audio file has been set up as __wav_file.wav__. You can listen to it here: https://assets.datacamp.com/production/repositories/4637/datasets/6238f8088db33efb5d103dfac1e42eb9fe3e6f2b/wav_file.wav.

**Instructions**<br>
1. Import AudioSegment from pydub.
2. Call the from_file method and pass it the audio file pathname.

**Results**<br>
<font color=darkgreen>Woohoo! You've just imported your first audio file using PyDub. Over the next few lessons, you'll start to see how many helpful functions PyDub has built-in for working with audio.</font>

In [12]:
# Create an AudioSegment instance
wav_file = AudioSegment.from_file(file='wav_file.wav', 
                                  format="wav")

# Check the type
print(type(wav_file))

<class 'pydub.audio_segment.AudioSegment'>


## 03.03 Play an audio file with PyDub

If you're working with audio files, chances are you want to listen to them.

<code>PyDub</code>'s <code>playback</code> module provides a function called <code>play()</code> which can be passed an AudioSegment. Running the <code>play()</code> function with an AudioSegment passed in will play the AudioSegment out loud.

This can be helpful to check the quality of your audio files and assess any changes you need to make.

In this exercise you'll see how simple it is to use the play() function.

Remember: to use the <code>play()</code> function, you'll need <code>simpleaudio</code> or <code>pyaudio</code> installed for <code>.wav</code> files and <code>ffmpeg</code> for other kinds of files.

**Instructions**<br>
1. Import play from the pydub.playback module.
2. Call play() whilst passing it the wav_file AudioSegment.

**Results**<br>
<font color=darkgreen>Wow, did you hear that? Or was your volume turned down? Don't worry, no sound was played because audio functionality isn't yet available at DataCamp. But you can try this code locally and see how play() works!</font>

In [13]:
# Play the audio file
play(wav_file)

## 03.04 Audio parameters with PyDub

Every audio file you work with has a number of characteristics associated with it, such as channels, frame rate (or sample rate), sample width and more.

Knowing these parameters is useful to ensure your audio files are compatible with various API requirements for speech transcription.

For example, many APIs recommend a minimum frame rate (<code>wav_file.frame_rate</code>) of 16,000 Hz.

When you create an instance of <code>AudioSegment</code>, <code>PyDub</code> automatically infers these parameters from your audio files and saves them as attributes.

In this exercise, we'll explore these attributes.

**Instructions**<br>
1. Find the frame_rate of wav_file.
2. Find the number of channels of wav_file.
3. Find the max amplitude of wav_file.
4. Find the length of wav_file in milliseconds.

**Results**<br>
<font color=darkgreen>Massive effort! There are many more characteristics you can find out about your audio files once you've imported them as an AudioSegment. Try to find some more by adding a dot after your audio file (wav_file.) and pressing tab.</font>

In [14]:
# Find the frame rate
print(wav_file.frame_rate)

# Find the number of channels
print(wav_file.channels)

# Find the max amplitude
print(wav_file.max)

# Find the length
# Duration of audio file in milliseconds
print(len(wav_file), 'ms')
print(len(wav_file)*0.001, 's')

48000
2
8484
3284 ms
3.2840000000000003 s
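
The attributes above mirror what is stored in a WAV header. As a rough, standard-library-only sketch (it uses Python's `wave` module rather than PyDub, and the half second of in-memory silence is a made-up stand-in for a real file), the same parameters can be read back and the duration derived from the frame count:

```python
import io
import wave

# Hypothetical test signal: 0.5 s of 16-bit mono silence at 16 kHz.
FRAME_RATE = 16000
CHANNELS = 1
SAMPLE_WIDTH = 2  # bytes per sample, i.e. 16-bit
N_FRAMES = FRAME_RATE // 2

# Write the WAV data into memory instead of onto disk.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as writer:
    writer.setnchannels(CHANNELS)
    writer.setsampwidth(SAMPLE_WIDTH)
    writer.setframerate(FRAME_RATE)
    writer.writeframes(b"\x00" * (N_FRAMES * CHANNELS * SAMPLE_WIDTH))

# Read the header back, much like inspecting the AudioSegment attributes.
buffer.seek(0)
with wave.open(buffer, "rb") as reader:
    params = {
        "channels": reader.getnchannels(),
        "sample_width": reader.getsampwidth(),
        "frame_rate": reader.getframerate(),
        "frame_count": reader.getnframes(),
    }

# Duration in milliseconds = frames / frames-per-second * 1000
duration_ms = params["frame_count"] * 1000 // params["frame_rate"]
print(params, duration_ms, "ms")
```

A small aside on the output above: dividing by 1000 (`len(wav_file) / 1000`) prints `3.284` rather than `3.2840000000000003`, because multiplying by `0.001` introduces a floating-point rounding error.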


In [15]:
# Finding properties and methods in the wav_file
dir(wav_file)

['DEFAULT_CODECS',
 '__add__',
 '__class__',
 '__delattr__',
 '__dict__',
 '__dir__',
 '__doc__',
 '__eq__',
 '__format__',
 '__ge__',
 '__getattribute__',
 '__getitem__',
 '__gt__',
 '__hash__',
 '__init__',
 '__init_subclass__',
 '__iter__',
 '__le__',
 '__len__',
 '__lt__',
 '__module__',
 '__mul__',
 '__ne__',
 '__new__',
 '__radd__',
 '__reduce__',
 '__reduce_ex__',
 '__repr__',
 '__setattr__',
 '__sizeof__',
 '__str__',
 '__sub__',
 '__subclasshook__',
 '__weakref__',
 '_data',
 '_from_safe_wav',
 '_parse_position',
 '_repr_html_',
 '_spawn',
 '_sync',
 'append',
 'apply_gain',
 'apply_gain_stereo',
 'apply_mono_filter_to_each_channel',
 'array_type',
 'channels',
 'compress_dynamic_range',
 'converter',
 'dBFS',
 'duration_seconds',
 'empty',
 'export',
 'fade',
 'fade_in',
 'fade_out',
 'ffmpeg',
 'frame_count',
 'frame_rate',
 'frame_width',
 'from_file',
 'from_file_using_temporary_files',
 'from_flv',
 'from_mono_audiosegments',
 'from_mp3',
 'from_ogg',
 'from_raw',
 'from_

## 03.05 Adjusting audio parameters
During your __exploratory data analysis__, you may find some of the parameters of your audio files differ or are incompatible with speech recognition APIs.

Don't worry, <code>PyDub</code> has built-in functionality which allows you to change various attributes.

For example, you can set the frame rate of your audio file by calling <code>set_frame_rate()</code> on your <code>AudioSegment</code> instance and passing it an integer of the desired frame rate measured in Hertz.

In this exercise, we'll practice altering some audio attributes.

**Instructions**<br>
1. Create a new wav_file with a frame rate of 16,000 Hz and then check its frame rate.
2. Set the wav_file number of channels to 1 and then check the number of channels.
3. Print the sample width of wav_file and then set it to 1 and print it again.

**Results**<br>
<font color=darkgreen>Nice one! There are other methods you can call on your AudioSegment instances to adjust their attributes; as further practice, try to find some more. But remember, lowering the values generally leads to lower audio quality and worse transcriptions, while increasing them may increase the file size but not the quality of the transcription. It's best to experiment with different values and find the ideal tradeoff.</font>

In [16]:
# Adjusted frame rate
print("Old frame rate: ", wav_file.frame_rate)
wav_file_16k = wav_file.set_frame_rate(16000)
print("New frame rate: ", wav_file_16k.frame_rate)

# Set number of channels to 1
print("Old number of channels: ", wav_file.channels)
wav_file_1_ch = wav_file.set_channels(1)
print("New number of channels: ", wav_file_1_ch.channels)

# Set sample_width to 1
print(f"Old sample width: {wav_file.sample_width}")
wav_file_sw_1 = wav_file.set_sample_width(1)
print(f"New sample width: {wav_file_sw_1.sample_width}")

Old frame rate:  48000
New frame rate:  16000
Old number of channels:  2
New number of channels:  1
Old sample width: 2
New sample width: 1
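
To make the file-size side of this tradeoff concrete, here is a minimal sketch of the arithmetic for uncompressed PCM audio. The helper `pcm_size_bytes` is a hypothetical name, not part of PyDub, and the estimate ignores the small WAV header:

```python
def pcm_size_bytes(frame_rate, channels, sample_width, duration_ms):
    """Approximate size of raw (uncompressed) PCM audio, ignoring the file header."""
    frames = frame_rate * duration_ms // 1000
    return frames * channels * sample_width

DURATION_MS = 3284  # length of wav_file reported earlier

# (frame_rate, channels, sample_width) for each variant created above
settings = {
    "original (48 kHz, stereo, 16-bit)": (48000, 2, 2),
    "frame rate 16 kHz": (16000, 2, 2),
    "mono": (48000, 1, 2),
    "sample width 1": (48000, 2, 1),
}

for label, (rate, n_channels, width) in settings.items():
    size = pcm_size_bytes(rate, n_channels, width, DURATION_MS)
    print(f"{label}: {size} bytes")
```

Each parameter scales the size linearly, so dropping the frame rate from 48,000 Hz to 16,000 Hz cuts the raw data to a third, while halving the channels or the sample width cuts it in half.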


## 03.06 Manipulating audio files with PyDub

See the video.

**Examples from the video - Turning it down to 11**

In [17]:
# Create an AudioSegment instance
file, lang = 'wav_file.wav', 'en-US'
wav_file = AudioSegment.from_file(file=file, format="wav")
            
# Listening to the unchanged audio file.
play(wav_file)
#_ = wav_file.export(file, format='wav')

# Converting to text
recognizer = sr.Recognizer()

with sr.AudioFile(file) as source:
    # Record the audio
    audio = recognizer.listen(source)
pprint(recognizer.recognize_google(audio, language=lang, show_all=True))

{'alternative': [{'confidence': 0.83799374,
                  'transcript': 'this is a word file'},
                 {'transcript': 'this is a web file'},
                 {'transcript': 'this is why file'},
                 {'transcript': 'this is a wife file'},
                 {'transcript': "this is a wife I'll"}],
 'final': True}


In [18]:
# Using our own function
pprint(get_transcription(file, lang, show_all=True))

{'alternative': [{'confidence': 0.89323336, 'transcript': 'this is why file'},
                 {'transcript': 'this is a waffle'},
                 {'transcript': 'this is a wife file'},
                 {'transcript': 'this is a word file'},
                 {'transcript': "this is a wife I'll"}],
 'final': True}


In [19]:
# Minus 60 dB
quiet_wav_file = wav_file - 60

# Listen to the changes
play(quiet_wav_file)

# Try to recognize quiet audio
file, lang = 'quiet_wav_file.wav', 'en-US'
_ = quiet_wav_file.export(file, format='wav')

pprint(get_transcription(file, lang, show_all=True))

{'alternative': [{'confidence': 0.80881679, 'transcript': 'this is a Wi-Fi'},
                 {'transcript': 'this is why file'},
                 {'transcript': 'this is a web file'},
                 {'transcript': "this is a wife I'll"},
                 {'transcript': "this isn't why file"}],
 'final': True}


**Examples from the video - Increasing the volume**

In [20]:
# Increase the volume by 10 dB
louder_wav_file = wav_file + 10

# Listen to the changes
play(louder_wav_file)

# Try to recognize the louder audio
file, lang = 'louder_wav_file.wav', 'en-US'
_ = louder_wav_file.export(file, format='wav')

pprint(get_transcription(file, lang, show_all=True))

{'alternative': [{'confidence': 0.8931073, 'transcript': 'this is why file'},
                 {'transcript': 'this is a waffle'},
                 {'transcript': 'this is a word file'},
                 {'transcript': 'this is a web file'},
                 {'transcript': 'this is a wife file'}],
 'final': True}


In [21]:
# Reading the file
file, lang = 'speaker_0_original.wav', 'en-US'
quiet_file = AudioSegment.from_file(file=file, format="wav")

# Listen to the file
play(quiet_file)

pprint(get_transcription(file, lang, noise=0.75, show_all=True))

[]


In [22]:
# Increase the volume by 10 dB
louder_wav_file = quiet_file + 10

# Listen to the changes
play(louder_wav_file)

# Try to recognize the louder audio
file, lang = 'louder_speaker_0_original.wav', 'en-US'
_ = louder_wav_file.export(file, format='wav')

pprint(get_transcription(file, lang, noise=0.75, show_all=True))

{'alternative': [{'confidence': 0.65736002,
                  'transcript': 'the limitations of the speech recognition '
                                'Lottery'},
                 {'transcript': 'limitations of the speech recognition '
                                'Lottery'},
                 {'transcript': 'are the limitations of the speech recognition '
                                'Lottery'},
                 {'transcript': 'on the limitations of the speech recognition '
                                'Lottery'}],
 'final': True}


**Examples from the video - This all sounds the same**

In [23]:
# Import uneven sound audio file
loud_quiet = AudioSegment.from_file("ex3_datacamp_loud_then_quiet.wav")

# Listening to the original audio file.
play(loud_quiet)

In [24]:
# Normalize the sound levels
normalized_loud_quiet = normalize(loud_quiet)

# Listen to the changes
play(normalized_loud_quiet)

**Examples from the video - No static**

In [25]:
# Import audio with static at start
static_at_start = AudioSegment.from_file("static-out-of-warranty.wav")

# Listen to the file
play(static_at_start)

In [26]:
# Remove the static via slicing
no_static_at_start = static_at_start[3000:]

# Listen to the new file
play(no_static_at_start)

**Examples from the video - Remixing your audio**

In [27]:
# Import two audio files
wav_file_1 = AudioSegment.from_file("speaker_1.wav")
wav_file_2 = AudioSegment.from_file("speaker_2.wav")

# Combine the two audio files
wav_file_3 = wav_file_1 + wav_file_2

# Check the sound
play(wav_file_3)

In [28]:
# Combine two wav files and make the combination louder
louder_wav_file_3 = wav_file_1 + wav_file_2 + 10

# Check the sound
play(louder_wav_file_3)

**Examples from the video - Splitting your audio**

In [29]:
# Import phone call audio
phone_call = AudioSegment.from_file("ex3_stereo_call.wav")

# Find number of channels
phone_call.channels

2

In [30]:
# Split stereo to mono
phone_call_channels = phone_call.split_to_mono()
phone_call_channels

[<pydub.audio_segment.AudioSegment at 0x2572e9b9b20>,
 <pydub.audio_segment.AudioSegment at 0x2572e9b9880>]

In [31]:
# Find number of channels of first list item
print(phone_call_channels[0].channels)

# Listen to the file
play(phone_call_channels[0])

# Try to recognize the audio
file, lang = 'ex3_stereo_call_c0.wav', 'en-US'
_ = phone_call_channels[0].export(file, format='wav')

pprint(get_transcription(file, lang))

1
('hello this is Daniel from Acme Studios how can I best help you yeah sure '
 "thing what's your name and what's wrong with the device okay nice to meet "
 "you Josh what's a CR number of your device so I can track it down")


In [32]:
# Find number of channels of second list item
print(phone_call_channels[1].channels)

# Listen to the file
play(phone_call_channels[1])

# Try to recognize the audio
file, lang = 'ex3_stereo_call_c1.wav', 'en-US'
_ = phone_call_channels[1].export(file, format='wav')

pprint(get_transcription(file, lang))

1
('I was just wondering if I could get some support my name is Josh and my '
 'device seems to not want to learn some of my my my phone number is 176-4588')


## 03.07 Turning it down... then up

Speech recognition works best on clean, audible speech. If your audio files are too quiet or too loud, it can hinder transcription.

In this exercise, you'll see how to make an <code>AudioSegment</code> quieter or louder.

The <code>play()</code> function won't play your changes in the DataCamp classroom, so listen to the files locally instead. The baseline audio file, volume_adjusted.wav, can be heard here: https://assets.datacamp.com/production/repositories/4637/datasets/520b312f96433535f93656d9e6d61fdb10f5c517/volume_adjusted.wav.

**Instructions**<br>
1. Import volume_adjusted.wav and lower its volume by 60 dB and save it to a new variable quiet_volume_adjusted.
2. Import the target audio file, increase its volume by 15 dB and save it to the variable louder_volume_adjusted.

**Results**<br>
<font color=darkgreen>That sounds like progress! Here's the louder audio file you created and the quieter one (no sound). Nice work! Adjusting the volume with operators can be useful but doesn't help when you want to increase the loudness of only the quiet sections. Let's take a look at a function which can help!</font>

In [33]:
# Import audio file
volume_adjusted = AudioSegment.from_file('volume_adjusted.wav')
play(volume_adjusted)

In [34]:
# Lower the volume by 60 dB
quiet_volume_adjusted = volume_adjusted - 60
play(quiet_volume_adjusted)

In [35]:
# Increase the volume by 15 dB
louder_volume_adjusted = volume_adjusted + 15
play(louder_volume_adjusted)
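
To get a feel for what these decibel numbers mean, the sketch below converts a gain in dB into the factor by which sample amplitudes are multiplied, using the standard relation ratio = 10^(dB/20). The function names are hypothetical, and the peak of 8484 is simply the `wav_file.max` value printed earlier, reused here as an illustration for 16-bit audio:

```python
import math

def db_to_amplitude_ratio(db):
    """Amplitude multiplier corresponding to a gain change in decibels."""
    return 10 ** (db / 20)

def headroom_db(peak, full_scale=32767):
    """Gain in dB that would bring `peak` up to full scale (16-bit by default)."""
    return 20 * math.log10(full_scale / peak)

for change in (-60, 10, 15):
    print(f"{change:+d} dB -> amplitude x {db_to_amplitude_ratio(change):.4f}")

# How much louder can a file with this peak get before hitting full scale?
print(f"Headroom for a peak of 8484: {headroom_db(8484):.2f} dB")
```

A change of -60 dB scales the amplitude by 0.001, which is why the quiet file is effectively silent. Going the other way, a boost larger than the available headroom pushes peaks past full scale, so very large increases may distort the audio.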

## 03.08 Normalizing an audio file with PyDub

Sometimes you'll have audio files where the speech is loud in some portions and quiet in others. Having this variance in volume can hinder transcription.

Luckily, <code>PyDub</code>'s effects module has a function called <code>normalize()</code> which finds the maximum volume of an <code>AudioSegment</code>, then adjusts the rest of the <code>AudioSegment</code> to be in proportion. This means the quiet parts will get a volume boost.

You can listen to an example of an audio file which starts as loud then goes quiet, <code>loud_then_quiet.wav</code>, here: https://assets.datacamp.com/production/repositories/4637/datasets/9251c751d3efccf781f3e189d68b37c8d22be9ca/ex3_datacamp_loud_then_quiet.wav.

In this exercise, you'll use <code>normalize()</code> to normalize the volume of our file, making it sound more like this: https://assets.datacamp.com/production/repositories/4637/datasets/f0c1ba35ff99f07df8cfeee810c7b12118d9cd0f/ex3_datamcamp_normalized_loud_quiet.wav.

**Instructions**<br>
1. Import AudioSegment from PyDub and normalize from PyDub's effects module.
2. Import the target audio file, loud_then_quiet.wav and save it to loud_then_quiet.
3. Normalize the imported audio file using the normalize() function and save it to normalized_loud_then_quiet.

**Results**<br>
<font color=darkgreen>That sounds much better! Remember, speech recognition works best on clear speech files, so the more you can do to improve the quality of your audio files, including their volume, the better.</font>

In [36]:
# Import target audio file
loud_then_quiet = AudioSegment.from_file('ex3_datacamp_loud_then_quiet.wav')
play(loud_then_quiet)

In [37]:
# Normalize target audio file
normalized_loud_then_quiet = normalize(loud_then_quiet)
play(normalized_loud_then_quiet)
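
As an illustration of the idea, here is a minimal sketch of peak normalization on a plain list of samples. It is not PyDub's code: the function `peak_normalize`, the sample values and the 0.1 dB headroom are assumptions made for the example, although my understanding is that `normalize()` works along similar lines.

```python
def peak_normalize(samples, full_scale=32767, headroom_db=0.1):
    """Scale all samples by one gain so the loudest peak sits just below full scale."""
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # pure silence: nothing to scale
    target_peak = full_scale * 10 ** (-headroom_db / 20)
    gain = target_peak / peak
    # int() truncates toward zero, so no sample can exceed the target peak
    return [int(s * gain) for s in samples]

# Hypothetical 16-bit samples: a loud stretch followed by a quiet one
loud_then_quiet_samples = [12000, -9000, 11000, 600, -450, 500]
normalized = peak_normalize(loud_then_quiet_samples)
print(normalized)
```

Note that a single gain is applied to the whole clip in this sketch: the quiet part gets louder, but it stays quieter than the loud part by the same ratio as before.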

## 03.09 Chopping and changing audio files

Some of your audio files may have sections of redundancy. For example, you might find that at the beginning of each file there are a few seconds of static.

Instead of wasting compute trying to transcribe static, you can remove it.

Since an <code>AudioSegment</code> is iterable, and measured in milliseconds, you can use slicing to alter the length.

To get the first 3 seconds of <code>wav_file</code>, you'd use <code>wav_file[:3000]</code>.

You can also add two <code>AudioSegment</code> instances together using the addition operator. This is helpful if you need to combine several audio files.

To practice both of these, we're going to remove the first four seconds of __part_1.wav__ (https://assets.datacamp.com/production/repositories/4637/datasets/6ef2e43497070fd23c6ce4c0fe1d9d0e46469750/ex3_slicing_part_1.wav), and add the remainder to __part_2.wav__ (https://assets.datacamp.com/production/repositories/4637/datasets/6ef2e43497070fd23c6ce4c0fe1d9d0e46469750/ex3_slicing_part_1.wav). The end result should sound like __part_3.wav__ (https://assets.datacamp.com/production/repositories/4637/datasets/3803042506ed07d707fe264d0bc6ec6ffa891e63/ex3_slicing_part_3.wav).

**Instructions**<br>
1. Import part_1.wav and part_2.wav and save them to part_1 and part_2 respectively.
2. Remove the first 4-seconds of part_1 using slicing and save the new audio to part_1_removed.
3. Add part_1_removed to part_2 and save it to part_3.

**Results**<br>
<font color=darkgreen>Well done! You're becoming an audio manipulation master! But we're not done yet, there's still a few more tricks in the PyDub library you should know about.</font>

In [38]:
# Import part 1 and part 2 audio files
part_1 = AudioSegment.from_wav('ex3_slicing_part_1.wav')
part_2 = AudioSegment.from_file('ex3_slicing_part_2.wav')

In [39]:
# Remove the first four seconds of part 1
part_1_removed = part_1[4000:]

In [40]:
# Add the remainder of part 1 and part 2 together
part_3 = part_1_removed + part_2
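
Under the hood, a millisecond position has to be mapped onto a position in the sample data. The sketch below shows that mapping on plain Python lists; the helper `ms_to_frame` and the deliberately tiny frame rate of 8 frames per second are assumptions chosen so the result stays readable, not PyDub internals:

```python
def ms_to_frame(ms, frame_rate):
    """Index of the first frame at or after `ms` milliseconds."""
    return ms * frame_rate // 1000

FRAME_RATE = 8  # hypothetical, far lower than any real audio file

part_1 = list(range(16))        # 2 seconds of "audio" at 8 frames per second
part_2 = list(range(100, 108))  # 1 second of "audio"

# Drop the first second of part 1, then append part 2
part_1_removed = part_1[ms_to_frame(1000, FRAME_RATE):]
part_3 = part_1_removed + part_2
print(part_3)
```

At a realistic 48,000 Hz, the four seconds removed in the exercise correspond to `ms_to_frame(4000, 48000)`, that is 192,000 frames.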

## 03.10 Splitting stereo audio to mono with PyDub

If you're trying to transcribe phone calls, there's a chance they've been recorded in stereo format, with one speaker on each channel.

As you've seen, it's hard to transcribe an audio file with more than one speaker. One solution is to split the audio file with multiple speakers into single files with individual speakers.

<code>PyDub</code>'s <code>split_to_mono()</code> function can help with this. When called on an <code>AudioSegment</code> recorded in stereo, it returns a list of two separate <code>AudioSegment</code> instances in mono format, one for each channel.

In this exercise, you'll practice this by splitting this __stereo phone call__ (<code>stereo_phone_call.wav</code> https://assets.datacamp.com/production/repositories/4637/datasets/810bb65e2e681e086e90bc2c6c2372bc4bd2cb52/ex3_stereo_call.wav) recording into __channel 1__ (https://assets.datacamp.com/production/repositories/4637/datasets/0aa876f5cb924035481d7b786a3701624e86d1e7/ex3_stereo_call_channel_1.wav) and __channel 2__ (https://assets.datacamp.com/production/repositories/4637/datasets/2a16db969efc35186fe25ca45a4dbd506318a1cd/ex3_stereo_call_channel_2.wav). This separates the two speakers, allowing for easier transcription.

**Instructions**

1. Import AudioSegment from pydub.
2. Create an AudioSegment instance stereo_phone_call with stereo_phone_call.wav.
3. Split stereo_phone_call into channels using split_to_mono() and check the channels of the resulting output.
4. Save each channel to new variables, phone_call_channel_1 and phone_call_channel_2.

**Results**<br>
<font color=darkgreen>Nice one! Having audio files with only one speaker usually results in better quality transcriptions. Now you've done all this audio processing, how do you save your altered audio files to use later? Let's find out.</font>

In [41]:
# Import stereo audio file and check channels
stereo_phone_call = AudioSegment.from_file('ex3_stereo_call.wav')
print(f"Stereo number channels: {stereo_phone_call.channels}")

# Split stereo phone call and check channels
channels = stereo_phone_call.split_to_mono()
print(f"Split number channels: {channels[0].channels}, {channels[1].channels}")

# Save new channels separately
phone_call_channel_1 = channels[0]
phone_call_channel_2 = channels[1]

Stereo number channels: 2
Split number channels: 1, 1
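
Stereo PCM data is usually stored interleaved, with left and right samples alternating. As a rough sketch of what splitting into mono involves (using the standard `array` module and made-up sample values, not PyDub's own implementation), the two channels can be separated with stride slicing:

```python
from array import array

# Hypothetical interleaved 16-bit stereo samples: L, R, L, R, ...
interleaved = array("h", [100, -1, 200, -2, 300, -3])

# Every second sample, starting at 0 for the left and 1 for the right channel
left = interleaved[0::2]
right = interleaved[1::2]

print("left: ", left.tolist())
print("right:", right.tolist())
```

Each resulting channel has half as many samples as the interleaved data but the same number of frames, so both mono segments keep the original duration.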


## 03.11 Converting and saving audio files with PyDub

See the video.

**Examples from the video - Reformatting and exporting multiple audio files**

In [42]:
def make_wav(wrong_folder_path, right_folder_path):
    print("Let's begin!")
    # Loop through wrongly formatted files
    for file in os.scandir(wrong_folder_path):
        
        # Only work with files with audio extensions we're fixing
        if file.path.endswith(".mp3") or file.path.endswith(".flac"):
        
            # Create the new .wav filename
            out_file = right_folder_path + os.path.splitext(os.path.basename(file.path))[0] + ".wav"

            # Read in the audio file and export it in wav format
            AudioSegment.from_file(file.path).export(out_file, format="wav")
            print(f"Creating {out_file}")
    print('End.')

In [43]:
# Call our new function
make_wav(".", "audio_mp3_to_wav/")

Let's begin!
Creating audio_mp3_to_wav/AUD-20180918-WA0000.wav
Creating audio_mp3_to_wav/CocaCola.wav
Creating audio_mp3_to_wav/ex3-static-help-with-account.wav
Creating audio_mp3_to_wav/mp3_file.wav
End.
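
The filename handling in `make_wav` can also be written with `pathlib`. The sketch below only plans the output names and does not touch any audio; the helper `plan_wav_names`, the example file list and the case-insensitive suffix check are assumptions added for illustration (the original `endswith` check is case-sensitive):

```python
from pathlib import Path

# Extensions we want to convert, compared in lower case
AUDIO_SUFFIXES = {".mp3", ".flac"}

def plan_wav_names(filenames, out_dir):
    """Map each convertible input filename to its .wav path inside out_dir."""
    out_dir = Path(out_dir)
    plan = {}
    for name in filenames:
        src = Path(name)
        if src.suffix.lower() in AUDIO_SUFFIXES:
            # Keep the base name, swap the extension, place it in out_dir
            plan[name] = out_dir / src.with_suffix(".wav").name
    return plan

# Hypothetical folder listing
plan = plan_wav_names(
    ["CocaCola.mp3", "mp3_file.mp3", "notes.txt", "talk.FLAC"],
    "audio_mp3_to_wav",
)
for src, dst in plan.items():
    print(f"{src} -> {dst.as_posix()}")
```

Each planned destination could then be passed to `export()` with `format="wav"`, as in `make_wav` above.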


## 03.12 Exporting and reformatting audio files

If you've made some changes to your audio files, or if they've got the wrong file extension, you can use **PyDub** to export and save them as new audio files.

You can do this by using the **.export()** function on any instance of an **AudioSegment** you've created. The **export()** function takes two main parameters: **out_f**, the destination file path of your audio file, and **format**, the format you'd like your new audio file to be in. Both of these are strings. **format** is **"mp3"** by default, so be sure to change it if you need a different format.

In this exercise, you'll import this **.mp3** file (**mp3_file.mp3** https://assets.datacamp.com/production/repositories/4637/datasets/b035eadbae1544450868436a7179fa70158eb5de/mp3_file.mp3) and then export it with the **.wav** extension using **.export()**.

Remember, to work with files other than **.wav**, you'll need **ffmpeg**.

**Instructions**

1. Import mp3_file.mp3 and save it to mp3_file.
2. Export mp3_file with the file name mp3_file.wav with "wav" format.

**Results**<br>
<font color=darkgreen>There we go! Now our .mp3 file is in the .wav format, it'll definitely be compatible with all kinds of speech transcription APIs. Let's see this at scale.</font>

In [44]:
# Import the .mp3 file
mp3_file = AudioSegment.from_file('mp3_file.mp3')

In [45]:
# Export the .mp3 file as wav
mp3_file.export(out_f='audio_mp3_to_wav/mp3_file.wav', format='wav')

<_io.BufferedRandom name='audio_mp3_to_wav/mp3_file.wav'>

## 03.13 Manipulating multiple audio files with PyDub

You've seen how to convert a single file using **PyDub** but what if you had a folder with multiple different file types?

For this exercise, we've set up a **folder** which has **.mp3**, **.m4a** and **.aac** versions of the **good-afternoon** audio file.

We'll use **PyDub** to open each of the files and export them as **.wav** format so they're compatible with speech recognition APIs.

**Instructions**

1. Pass audio_file to the from_file() function.
2. Use export() to export wav_filename with the format "wav".

**Results**<br>
<font color=darkgreen>Woohoo! You've successfully converted the folder of audio files from being incompatible with speech_recognition to being compatible!</font>

In [46]:
folder = ['AUD-20180918-WA0000.mp3', 'mp3_file.mp3', 'CocaCola.mp3', 'AUD-20190504-WA0000.m4a']

# Loop through the files in the folder
for audio_file in folder:
    # Create the new .wav filename
    wav_filename = "audio_mp3_to_wav/" + os.path.splitext(os.path.basename(audio_file))[0] + ".wav"
        
    # Read audio_file and export it in wav format
    AudioSegment.from_file(audio_file).export(out_f=wav_filename, format='wav')
        
    print(f"Creating {wav_filename}...")

Creating audio_mp3_to_wav/AUD-20180918-WA0000.wav...
Creating audio_mp3_to_wav/mp3_file.wav...
Creating audio_mp3_to_wav/CocaCola.wav...
Creating audio_mp3_to_wav/AUD-20190504-WA0000.wav...


## 03.14 An audio processing workflow

You've seen how to import and manipulate a single audio file using **PyDub**. But what if you had a folder with multiple audio files you needed to convert?

In this exercise we'll use **PyDub** to format a folder of files to be ready to use with **speech_recognition**.

You've found your customer call files all have 3-seconds of static at the start and are quieter than they could be.

To fix this, we'll use **PyDub** to cut the static, increase the sound level and convert them to the **.wav** extension.

You can listen to an unformatted example **here** (https://assets.datacamp.com/production/repositories/4637/datasets/c53557fea60087064aec7e9d99e889b9be79e75a/ex3-static-help-with-account.mp3).

**Instructions**

1. Let's start with one file. Import account_help.mp3 and cut off the first 3-seconds (3000 milliseconds) of static.
2. Now the static has been removed, increase the volume by 10dB.
3. Now for multiple files. Use from_file() to import each audio_file and export the louder files without static with the "wav" format.

**Results**<br>
<font color=darkgreen>Woohoo! You've successfully processed and converted the folder of audio files from being incompatible with speech_recognition to being compatible! Here's what your files sound like without static, and here's without the static and 10 decibels louder. Let's start putting all you've learned about audio processing to work in the next chapter.</font>

In [47]:
file_with_static = AudioSegment.from_file('ex3-static-help-with-account.mp3')

# Cut the first 3-seconds of static off
file_without_static = file_with_static[3000:]

In [48]:
# Increase the volume by 10dB
louder_file_without_static = file_without_static + 10

In [49]:
# Multiple files
folder = ['ex3-static-help-with-account.mp3']

for audio_file in folder:
    file_with_static = AudioSegment.from_file(audio_file)

    # Cut the 3-seconds of static off
    file_without_static = file_with_static[3000:]

    # Increase the volume by 10dB
    louder_file_without_static = file_without_static + 10
    
    # Create the .wav filename for export
    wav_filename = "audio_mp3_to_wav/" + os.path.splitext(os.path.basename(audio_file))[0] + ".wav"
    
    # Export the louder file without static as .wav
    louder_file_without_static.export(wav_filename, format='wav')
    print(f"Creating {wav_filename}...")

Creating audio_mp3_to_wav/ex3-static-help-with-account.wav...


# Additional material

- **Online Voice Recorder & Audio Cutter**: https://voice-recorder-online.com/
- **For more details on available language models in "speech_recognition" python module**: https://cloud.google.com/speech-to-text/docs/languages
- **Datacamp course**: https://learn.datacamp.com/courses/spoken-language-processing-in-python
- To work with file formats other than .wav: http://ffmpeg.org/, e.g. (in the shell): <code>ffmpeg -i test.mp3 test.wav</code>
- Problem with __ffmpeg__ solved: https://www.programmersought.com/article/76562906865/