### From AudioFile to AudioData
- We read the audio file using the AudioFile class, but the `recognize_google()` method requires an input of type `AudioData`.
- To convert our AudioFile to AudioData, we'll use the Recognizer class's `record()` method inside a context manager (a `with` block), which opens the file for reading. `record()` takes the opened AudioFile as its source and returns AudioData, ready to be used with `recognize_google()`.
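Under the hood, an `AudioData` object holds three things. The first is the raw PCM bytes. The other two are the sample rate and the sample width (bytes per sample). `speech_recognition` exposes them as the `frame_data`, `sample_rate` and `sample_width` attributes.

The sketch below uses only the standard-library `wave` module. It builds a short synthetic WAV in memory and pulls out the same three values, as a rough picture of what `record()` hands to `recognize_google()`. The helper names `make_test_wav` and `wav_to_pcm_triple` are hypothetical.

```python
import io
import math
import struct
import wave

def make_test_wav(seconds=1.0, sample_rate=16000, freq=440.0):
    """Build a mono, 16-bit WAV file in memory containing a sine tone."""
    n_frames = int(seconds * sample_rate)
    samples = (
        int(0.5 * 32767 * math.sin(2 * math.pi * freq * i / sample_rate))
        for i in range(n_frames)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)              # 2 bytes per sample = 16-bit audio
        w.setframerate(sample_rate)
        w.writeframes(b"".join(struct.pack("<h", s) for s in samples))
    buf.seek(0)                        # rewind so the file can be read back
    return buf

def wav_to_pcm_triple(wav_file):
    """Return (frame_data, sample_rate, sample_width), the pieces AudioData stores."""
    with wave.open(wav_file, "rb") as w:
        return w.readframes(w.getnframes()), w.getframerate(), w.getsampwidth()

frame_data, sample_rate, sample_width = wav_to_pcm_triple(make_test_wav())
print(len(frame_data), sample_rate, sample_width)  # 32000 16000 2
```

With `speech_recognition` installed, the same three values could be wrapped as `sr.AudioData(frame_data, sample_rate, sample_width)`. That is the type `record()` returns.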

In [1]:
import speech_recognition as sr

In [2]:
# Instantiate Recognizer
recognizer = sr.Recognizer()

# Convert audio to AudioFile
clean_support_call = sr.AudioFile('../data/clean-support-call.wav')

# Convert AudioFile to AudioData
with clean_support_call as source:
    clean_support_call_audio = recognizer.record(source)

# Transcribe AudioData to text
text = recognizer.recognize_google(clean_support_call_audio,
                                   language="en-US")
print(text)

hello I want to get some help setting up my time please


- We've imported an audio file, converted it to the right data type and transcribed it using Google's free web API!

### Recording the audio we need
- Sometimes we may not want the entire audio file we're working with. The `duration` and `offset` parameters of the `record()` method can help with this.
- After exploring our dataset, we find one file that has 30 seconds of silence at the end and a support call file that has 3 seconds of static at the start.
- Setting `duration` and `offset` means the `record()` method will record up to `duration` seconds of audio, starting `offset` seconds into the file.

- Let's get the first 10 seconds of nothing_at_end_audio. To do this, we can set `duration` to 10.

In [3]:
# Convert AudioFile to AudioData
nothing_at_end = sr.AudioFile('../data/30-seconds-of-nothing-16k.wav')

with nothing_at_end as source:
    nothing_at_end_audio = recognizer.record(source,
                                             duration=10,
                                             offset=None)

# Transcribe AudioData to text
text = recognizer.recognize_google(nothing_at_end_audio,
                                   language="en-US")

print(text)

this Odia call has 30 seconds of nothing at the end


- Let's remove the first 3 seconds of static from static_at_start by setting `offset` to 3.

In [4]:
# Convert AudioFile to AudioData
static_at_start = sr.AudioFile('../data/static-out-of-warranty.wav')

with static_at_start as source:
    static_at_start_audio = recognizer.record(source,
                                              duration=None,
                                              offset=3)

# Transcribe AudioData to text
text = recognizer.recognize_google(static_at_start_audio,
                                   language="en-US")

print(text)

hello I like to get some help with my device please I think it's a warranty I want to do back to you
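Because `duration` and `offset` are measured in seconds, they translate directly into frame counts. The first `offset * sample_rate` frames are skipped, and at most `duration * sample_rate` frames are kept.

The hypothetical helper below applies that arithmetic to raw PCM bytes. It is a simplified model, not `speech_recognition`'s own code. The library reads the file in fixed-size chunks, so its real cut points are only accurate to within one chunk.

```python
def trim_pcm(frame_data, sample_rate, sample_width, channels=1,
             offset=None, duration=None):
    """Keep up to `duration` seconds of audio starting `offset` seconds in.

    None means "from the start" for offset and "until the end" for duration,
    mirroring the defaults of record().
    """
    frame_width = sample_width * channels                    # bytes per frame
    start = int((offset or 0) * sample_rate) * frame_width
    if duration is None:
        return frame_data[start:]
    stop = start + int(duration * sample_rate) * frame_width
    return frame_data[start:stop]

# 40 seconds of 16 kHz, 16-bit mono silence (all-zero bytes) as stand-in audio
rate, width = 16000, 2
audio = bytes(40 * rate * width)

first_10s = trim_pcm(audio, rate, width, duration=10)
after_3s = trim_pcm(audio, rate, width, offset=3)
print(len(first_10s) / (rate * width), len(after_3s) / (rate * width))  # 10.0 37.0
```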


- Speech recognition can be resource intensive, so in practice we'll want to explore our audio files first to make sure we're not wasting compute trying to transcribe static or silence.
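One cheap way to explore files before sending them to an API is to measure loudness over short windows and see where silence begins. The sketch below is an assumption, not something from `speech_recognition` or PyDub. It computes the RMS level of 100 ms windows of 16-bit little-endian mono PCM and reports roughly how many seconds at the end fall below a threshold.

The threshold of 100 is arbitrary and would need tuning per dataset. Static is not silent, so a pure loudness check like this will not flag it.

```python
import math
import struct

def window_rms(frame_data, sample_rate, window_s=0.1):
    """RMS level of each window of 16-bit little-endian mono PCM."""
    n = len(frame_data) // 2
    samples = struct.unpack(f"<{n}h", frame_data[:2 * n])
    size = max(1, int(sample_rate * window_s))
    levels = []
    for i in range(0, len(samples), size):
        window = samples[i:i + size]
        levels.append(math.sqrt(sum(s * s for s in window) / len(window)))
    return levels

def trailing_silence_seconds(frame_data, sample_rate, threshold=100, window_s=0.1):
    """Approximate seconds at the end of the clip whose windows stay below `threshold`."""
    quiet_windows = 0
    for level in reversed(window_rms(frame_data, sample_rate, window_s)):
        if level >= threshold:
            break
        quiet_windows += 1
    return quiet_windows * window_s

# Synthetic clip: 1 s of a 440 Hz tone followed by 2 s of digital silence
rate = 8000
tone = [int(8000 * math.sin(2 * math.pi * 440 * i / rate)) for i in range(rate)]
clip = tone + [0] * (2 * rate)
pcm = struct.pack(f"<{len(clip)}h", *clip)
print(round(trailing_silence_seconds(pcm, rate), 2))  # 2.0
```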

### Different kinds of audio


- Pass the Japanese version of "good morning" (japanese_audio) to recognize_google() using "en-US" as the language.
- Pass the same Japanese audio (japanese_audio) using "ja" as the language parameter.
- What about non-speech audio? Pass leopard_audio to recognize_google() with show_all set to True.
- What if our speech files contain non-speech human sounds? Pass charlie_audio to recognize_google() to find out.
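With `show_all=True`, `recognize_google()` returns Google's raw response instead of a single string. When nothing was recognized (as with the leopard roar), the response is an empty list. Otherwise it is a dict whose `'alternative'` list holds candidate transcripts; in the outputs shown in this notebook, only the first candidate carries a `'confidence'` score. Without `show_all`, an unrecognizable clip raises `sr.UnknownValueError` instead.

The hypothetical helper below handles both response shapes. Its sample dict is trimmed from the noisy support call output in this notebook.

```python
def best_transcript(response):
    """Pick the top transcript from a recognize_google(..., show_all=True) result.

    Returns (transcript, confidence), or (None, None) when nothing was
    recognized (the response was an empty list).
    """
    if not isinstance(response, dict) or not response.get("alternative"):
        return None, None
    top = response["alternative"][0]
    return top.get("transcript"), top.get("confidence")

print(best_transcript([]))  # (None, None)
print(best_transcript({'alternative': [{'transcript': 'my family', 'confidence': 0.69139749},
                                       {'transcript': 'family'}], 'final': True}))
# ('my family', 0.69139749)
```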

In [5]:
# Create a Recognizer instance
recognizer = sr.Recognizer()

# Convert AudioFile to AudioData
japanese_file = sr.AudioFile('../data/good-morning-japanense.wav')

with japanese_file as source:
    japanese_audio = recognizer.record(source)

# Pass the Japanese audio to recognize_google, using English as the language
text = recognizer.recognize_google(japanese_audio, language='en-US')

# Print the text
print(text)

mass


In [6]:
# Create a Recognizer instance
recognizer = sr.Recognizer()

# Convert AudioFile to AudioData
japanese_file = sr.AudioFile('../data/good-morning-japanense.wav')

with japanese_file as source:
    japanese_audio = recognizer.record(source)

# Pass the Japanese audio to recognize_google, this time with Japanese as the language
text = recognizer.recognize_google(japanese_audio, language='ja')

# Print the text
print(text)

おはようございます


In [7]:
# Create a Recognizer instance
recognizer = sr.Recognizer()

# Convert AudioFile to AudioData
leopard_file = sr.AudioFile('../data/leopard.wav')

with leopard_file as source:
    leopard_audio = recognizer.record(source)

# Pass the leopard roar audio to recognize_google and return the raw response
text = recognizer.recognize_google(leopard_audio,
                                   language="en-US",
                                   show_all=True)

# Print the text
print(text)

[]


In [8]:
# Create a Recognizer instance
recognizer = sr.Recognizer()

# Convert AudioFile to AudioData
charlie_file = sr.AudioFile('../data/charlie-bit-me-5.wav')

with charlie_file as source:
    charlie_audio = recognizer.record(source)

# Pass charlie_audio to recognize_google
text = recognizer.recognize_google(charlie_audio,
                                   language="en-US")

# Print the text
print(text)

choli


### Working with noisy audio
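- We'll start by transcribing a clean speech sample to text and then see what happens when we add some background noise.
- To try to counter the background noise, we'll use the Recognizer's `adjust_for_ambient_noise()` method. It reads the first `duration` seconds of the source (1 second by default) to calibrate the recognizer's energy threshold. That audio is consumed, so a following `record()` call starts after it.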
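The energy threshold is what `listen()` uses to decide where speech starts and stops. `record()` does not consult it.

The sketch below is a model of the calibration update rule as we read it in the `speech_recognition` source. For each chunk of ambient audio, the threshold is pulled toward a multiple of that chunk's RMS energy. The following are assumptions, not verified against every version:

- the default values `energy_threshold=300`, `dynamic_energy_adjustment_damping=0.15` and `dynamic_energy_ratio=1.5`;
- the 4096-frame chunk size;
- the function name `calibrate_threshold`.

```python
def calibrate_threshold(chunk_energies, seconds_per_chunk,
                        threshold=300.0, damping=0.15, ratio=1.5):
    """Model of how energy_threshold moves while ambient noise is sampled.

    For each chunk, the threshold is pulled toward `ratio` times that chunk's
    RMS energy; `damping ** seconds_per_chunk` is the share of the old value kept.
    """
    keep = damping ** seconds_per_chunk
    for energy in chunk_energies:
        target = energy * ratio
        threshold = threshold * keep + target * (1 - keep)
    return threshold

# Assumed 4096-frame chunks at 16 kHz last 0.256 s, so duration=1 covers about 3 chunks
seconds_per_chunk = 4096 / 16000
steady_noise = [1000.0] * 3          # hypothetical RMS energy of each noise chunk
print(calibrate_threshold(steady_noise, seconds_per_chunk))
```

With steady noise, the threshold climbs from 300 toward 1.5 times the noise energy. A longer `duration` gets it closer, at the cost of consuming more of the file.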

In [9]:
# Read in clean_support_call as the source and call recognize_google() on the file.
clean_support_call = sr.AudioFile('../data/clean-support-call.wav')

recognizer = sr.Recognizer()

# Record the audio from the clean support call
with clean_support_call as source:
    clean_support_call_audio = recognizer.record(source)

# Transcribe the speech from the clean support call
text = recognizer.recognize_google(clean_support_call_audio,
                                   language="en-US")

print(text)

hello I want to get some help setting up my time please


In [10]:
# Let's do the same as before but with a noisy audio file and show_all parameter as True.
noisy_support_call = sr.AudioFile('../data/2-noisy-support-call.wav')

recognizer = sr.Recognizer()

# Record the audio from the noisy support call
with noisy_support_call as source:
    noisy_support_call_audio = recognizer.record(source)

# Transcribe the speech from the noisy support call
text = recognizer.recognize_google(noisy_support_call_audio,
                                   language="en-US",
                                   show_all=True)

print(text)

{'alternative': [{'transcript': 'my family', 'confidence': 0.69139749}, {'transcript': 'family'}, {'transcript': 'about family'}, {'transcript': 'MP3'}, {'transcript': 'Koi family'}], 'final': True}


In [11]:
# Set the duration parameter of adjust_for_ambient_noise() to 1 (second) so recognizer adjusts for background noise.
noisy_support_call = sr.AudioFile('../data/2-noisy-support-call.wav')

recognizer = sr.Recognizer()

# Record the audio from the noisy support call
with noisy_support_call as source:
    # Adjust the recognizer energy threshold for ambient noise
    recognizer.adjust_for_ambient_noise(source, duration=1)
    noisy_support_call_audio = recognizer.record(source)

# Transcribe the speech from the noisy support call
text = recognizer.recognize_google(noisy_support_call_audio,
                                   language="en-US",
                                   show_all=True)

print(text)

[]


In [12]:
# A duration of 1 was too long and it cut off some of the audio. Try setting duration to 0.5.
noisy_support_call = sr.AudioFile('../data/2-noisy-support-call.wav')

recognizer = sr.Recognizer()

# Record the audio from the noisy support call
with noisy_support_call as source:
    # Adjust the recognizer energy threshold for ambient noise
    recognizer.adjust_for_ambient_noise(source, duration=0.5)
    noisy_support_call_audio = recognizer.record(source)

# Transcribe the speech from the noisy support call
text = recognizer.recognize_google(noisy_support_call_audio,
                                   language="en-US",
                                   show_all=True)

print(text)

[]


### Import an audio file with PyDub

In [13]:
# Import AudioSegment from Pydub
from pydub import AudioSegment

# Create an AudioSegment instance
wav_file = AudioSegment.from_file(file='../data/wav_file.wav', 
                                  format="wav")

# Check the type
print(type(wav_file))



<class 'pydub.audio_segment.AudioSegment'>


### Play an audio file with PyDub

In [14]:
# Import AudioSegment and play
from pydub import AudioSegment
from pydub.playback import play

# Create an AudioSegment instance
wav_file = AudioSegment.from_file(file="../data/wav_file.wav", 
                                  format="wav")

# Play the audio file
play(wav_file)



### Audio parameters with PyDub

In [15]:
# Import audio file
wav_file = AudioSegment.from_file(file="../data/wav_file.wav")

# Find the frame rate
print(wav_file.frame_rate)

# Find the number of channels
print(wav_file.channels)

# Find the max amplitude (largest absolute sample value)
print(wav_file.max)

# Find the length in milliseconds
print(len(wav_file))

48000
2
8484
3284
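These numbers are related:

- A frame holds one sample per channel, so `frame_width = sample_width * channels`.
- `len()` is the frame count divided by `frame_rate`, expressed in milliseconds.
- `max` can be expressed in dBFS relative to the largest value the sample width allows; PyDub exposes that as `max_dBFS`.

The hypothetical helper below does this arithmetic with the values printed above. It assumes a sample width of 2 bytes (16-bit), which is what this file reports in the sample-width cell.

```python
import math

def derived_stats(frame_rate, channels, sample_width, max_amplitude, length_ms):
    """Derive frame width, approximate frame count and peak level in dBFS."""
    frame_width = sample_width * channels                  # bytes per frame
    frame_count = round(length_ms * frame_rate / 1000)     # len() is rounded, so approximate
    max_possible = 2 ** (8 * sample_width - 1)             # 32768 for 16-bit audio
    peak_dbfs = 20 * math.log10(max_amplitude / max_possible)
    return frame_width, frame_count, round(peak_dbfs, 2)

print(derived_stats(48000, 2, 2, 8484, 3284))  # (4, 157632, -11.74)
```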


### Adjusting audio parameters
- During our exploratory data analysis, we may find some of the parameters of our audio files differ or are incompatible with speech recognition APIs.
- PyDub has built-in functionality which allows us to change various attributes.
- For example, we can set the frame rate of our audio file by calling `set_frame_rate()` on our AudioSegment instance and passing it an integer of the desired frame rate, measured in Hertz.
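Changing the frame rate means resampling: producing a new sequence of samples that covers the same duration at a different rate. For example, 1 second at 48 kHz (48,000 samples) becomes 16,000 samples at 16 kHz.

PyDub uses `audioop.ratecv` for this. The pure-Python linear interpolation below is a simpler, swapped-in technique meant only to show the idea. It applies no low-pass filtering, so real-world downsampling with it can alias. The function name is hypothetical.

```python
def resample_linear(samples, src_rate, dst_rate):
    """Resample a mono list of samples with linear interpolation (no anti-aliasing)."""
    if src_rate == dst_rate or not samples:
        return list(samples)
    n_out = int(len(samples) * dst_rate / src_rate)
    step = src_rate / dst_rate                 # input samples per output sample
    out = []
    for i in range(n_out):
        pos = i * step
        left = int(pos)
        right = min(left + 1, len(samples) - 1)
        frac = pos - left
        out.append(round(samples[left] * (1 - frac) + samples[right] * frac))
    return out

one_second_48k = list(range(48000))            # stand-in signal: a ramp
one_second_16k = resample_linear(one_second_48k, 48000, 16000)
print(len(one_second_16k), one_second_16k[:3])  # 16000 [0, 3, 6]
```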

In [16]:
# Import audio file
wav_file = AudioSegment.from_file(file="../data/wav_file.wav")

# Create a new wav file with adjusted frame rate
wav_file_16k = wav_file.set_frame_rate(16000)

# Check the frame rate of the new wav file
print(wav_file_16k.frame_rate)

16000


In [17]:
# Import audio file
wav_file = AudioSegment.from_file(file="../data/wav_file.wav")

# Set number of channels to 1
wav_file_1_ch = wav_file.set_channels(1)

# Check the number of channels
print(wav_file_1_ch.channels)

1
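`set_channels(1)` mixes the channels down rather than dropping one: as far as we can tell from the PyDub source, it passes the data to `audioop.tomono` with a factor of 0.5 per channel, which averages left and right.

The pure-Python sketch below does the same averaging for 16-bit little-endian stereo, frame by frame. It uses floor division, so its rounding may differ slightly from `audioop`. The function name is hypothetical.

```python
import struct

def stereo_to_mono(frame_data):
    """Average the left and right samples of interleaved 16-bit little-endian stereo PCM."""
    n = len(frame_data) // 2
    samples = struct.unpack(f"<{n}h", frame_data[:2 * n])
    mono = [(left + right) // 2 for left, right in zip(samples[0::2], samples[1::2])]
    return struct.pack(f"<{len(mono)}h", *mono)

stereo = struct.pack("<6h", 100, 300, -200, -400, 0, 1000)   # L, R, L, R, L, R
print(struct.unpack("<3h", stereo_to_mono(stereo)))        # (200, -300, 500)
```

The mono output is half the size of the input, since each frame now holds one sample instead of two.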


In [18]:
# Import audio file
wav_file = AudioSegment.from_file(file="../data/wav_file.wav")

# Print sample_width
print(f"Old sample width: {wav_file.sample_width}")

# Set sample_width to 1
wav_file_sw_1 = wav_file.set_sample_width(1)

# Check new sample_width
print(f"New sample width: {wav_file_sw_1.sample_width}")

Old sample width: 2
New sample width: 1


### Making an AudioSegment quieter or louder with PyDub

In [19]:
from pydub import AudioSegment

# Import audio file
volume_adjusted = AudioSegment.from_file('../data/volume_adjusted.wav')

# Lower the volume by 60 dB
quiet_volume_adjusted = volume_adjusted - 60

In [21]:
# Import audio file
volume_adjusted = AudioSegment.from_file('../data/volume_adjusted.wav')

# Increase the volume by 15 dB
louder_volume_adjusted = volume_adjusted + 15
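Adding or subtracting a number from an AudioSegment applies a gain in decibels. Decibels are logarithmic: a gain of `dB` corresponds to an amplitude ratio of `10 ** (dB / 20)`. So -60 dB scales every sample by 0.001 (close to silent) and +15 dB by about 5.6.

The sketch below shows that arithmetic on a plain list of 16-bit samples. The clamping step is our own assumption about handling overflow: loud samples saturate at the 16-bit limits instead of wrapping around. The function names are hypothetical.

```python
def db_to_amplitude_ratio(db):
    """Convert a gain in decibels to a linear amplitude multiplier."""
    return 10 ** (db / 20)

def apply_gain(samples, db, sample_width=2):
    """Scale integer samples by a dB gain, clamping to the sample width's range."""
    ratio = db_to_amplitude_ratio(db)
    hi = 2 ** (8 * sample_width - 1) - 1     # 32767 for 16-bit audio
    lo = -hi - 1                             # -32768 for 16-bit audio
    return [max(lo, min(hi, round(s * ratio))) for s in samples]

print(round(db_to_amplitude_ratio(-60), 6))   # 0.001
print(round(db_to_amplitude_ratio(15), 2))    # 5.62
print(apply_gain([1000, -1000, 10000], 15))   # [5623, -5623, 32767]
```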

### Normalizing an audio file with PyDub
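- Sometimes we'll have audio files where the speech is loud in some portions and quiet in others. Having this variance in volume can hinder transcription.
- PyDub's effects module has a function called `normalize()`. It finds the peak (maximum) amplitude of an AudioSegment, then applies the same gain to the whole AudioSegment so that the peak sits just below the maximum possible level (0.1 dB of headroom by default).
- Quiet parts get a volume boost, but so do loud parts, so the ratio between them is unchanged. To narrow the gap itself, a dynamic range compressor such as `pydub.effects.compress_dynamic_range()` is a better fit.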
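Peak normalization can be sketched on a plain list of 16-bit samples. Measure the loudest sample, compute one gain that brings it to a small headroom below full scale, and multiply every sample by that gain.

The function name `peak_normalize` is hypothetical, and the 0.1 dB default headroom mirrors what we see in the PyDub source. In the example, the quiet half starts 10 times quieter than the loud half and is still about 10 times quieter afterwards.

```python
def peak_normalize(samples, headroom_db=0.1, sample_width=2):
    """Scale all samples by one gain so the peak lands `headroom_db` below full scale."""
    peak = max((abs(s) for s in samples), default=0)
    if peak == 0:
        return list(samples)                       # pure silence: nothing to scale
    max_possible = 2 ** (8 * sample_width - 1)     # 32768 for 16-bit audio
    target_peak = max_possible * 10 ** (-headroom_db / 20)
    gain = target_peak / peak                      # one factor for every sample
    return [round(s * gain) for s in samples]

loud_then_quiet = [8000, -8000, 800, -800]         # quiet half is 10x smaller
normalized = peak_normalize(loud_then_quiet)
print(normalized)
print(normalized[0] / normalized[2])               # still roughly 10
```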

In [11]:
# Import AudioSegment and normalize
from pydub import AudioSegment
from pydub.effects import normalize

# Import target audio file
loud_then_quiet = AudioSegment.from_file('../data/volume_adjusted.wav')

# Normalize target audio file
normalized_loud_then_quiet = normalize(loud_then_quiet)

### Chopping and changing audio files

In [12]:
from pydub import AudioSegment

# Import part 1 and part 2 audio files
part_1 = AudioSegment.from_file('../data/slicing_part_1.wav')
part_2 = AudioSegment.from_file('../data/slicing_part_2.wav')

# Remove the first four seconds of part 1
part_1_removed = part_1[4000:]

# Add the remainder of part 1 and part 2 together
part_3 = part_1_removed + part_2
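AudioSegment slices are indexed in milliseconds, so `part_1[4000:]` drops the first 4 seconds. `+` between two AudioSegments concatenates them. In terms of raw PCM, a millisecond position corresponds to frame `ms * frame_rate / 1000`.

The hypothetical helpers below model that on plain bytes. PyDub also brings two segments to a common frame rate, channel count and sample width before joining them. This byte-level sketch simply assumes they already match.

```python
def ms_to_byte_offset(ms, frame_rate, frame_width):
    """Byte offset of the frame `ms` milliseconds into raw PCM data."""
    return int(ms * frame_rate / 1000) * frame_width

def slice_ms(frame_data, frame_rate, frame_width, start_ms=0, end_ms=None):
    """Cut raw PCM bytes the way AudioSegment[start_ms:end_ms] cuts audio."""
    start = ms_to_byte_offset(start_ms, frame_rate, frame_width)
    end = None if end_ms is None else ms_to_byte_offset(end_ms, frame_rate, frame_width)
    return frame_data[start:end]

rate, fw = 16000, 2                        # 16 kHz, 16-bit mono
part_1_pcm = bytes(6 * rate * fw)          # 6 s of silence as stand-in audio
part_2_pcm = bytes(2 * rate * fw)          # 2 s of silence
part_3_pcm = slice_ms(part_1_pcm, rate, fw, 4000) + part_2_pcm   # drop 4 s, then join
print(len(part_3_pcm) / (rate * fw))       # 4.0
```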

### Splitting stereo audio to mono with PyDub
- If we're trying to transcribe phone calls, there's a chance they've been recorded in stereo format, with one speaker on each channel.
- As we've seen, it's hard to transcribe an audio file with more than one speaker. One solution is to split the audio file with multiple speakers into single files with individual speakers.
- PyDub's `split_to_mono()` function can help with this. When called on an AudioSegment recorded in stereo, it returns a list of mono AudioSegments, one for each channel (two for stereo).
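In a stereo WAV, the two channels are interleaved frame by frame: left sample, right sample, left, right, and so on. Splitting to mono just de-interleaves them.

The standard-library sketch below does that for 16-bit little-endian PCM. It assumes, as in the phone-call example, one speaker per channel. The function name and the sample values are hypothetical.

```python
import struct

def split_stereo(frame_data):
    """Split interleaved 16-bit little-endian stereo PCM into (left, right) mono PCM."""
    n = len(frame_data) // 2
    samples = struct.unpack(f"<{n}h", frame_data[:2 * n])
    left, right = samples[0::2], samples[1::2]     # even indices = left, odd = right
    return (struct.pack(f"<{len(left)}h", *left),
            struct.pack(f"<{len(right)}h", *right))

# Frames of (caller, agent) samples: the caller speaks, then the agent answers
call = struct.pack("<6h", 500, 0, 700, 0, 0, -300)
caller, agent = split_stereo(call)
print(struct.unpack("<3h", caller), struct.unpack("<3h", agent))  # (500, 700, 0) (0, 0, -300)
```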

In [14]:
# Import AudioSegment
from pydub import AudioSegment

# Import stereo audio file and check channels
stereo_phone_call = AudioSegment.from_file('../data/stereo_call.wav')
print(f"Stereo number channels: {stereo_phone_call.channels}")

# Split stereo phone call and check channels
channels = stereo_phone_call.split_to_mono()
print(f"Split number channels: {channels[0].channels}, {channels[1].channels}")

# Save new channels separately
phone_call_channel_1 = channels[0]
phone_call_channel_2 = channels[1]

Stereo number channels: 2
Split number channels: 1, 1


### Exporting and reformatting audio files
- If we've made some changes to our audio files, or if they have the wrong file extension, we can use PyDub to export and save them as new audio files.
- We can do this by calling `.export()` on any AudioSegment instance we've created. Its two main parameters are `out_f` and `format`, both strings:
  - `out_f` is the destination, usually a file path.
  - `format` is the format we'd like the new file to have, such as "wav". It defaults to "mp3", so be sure to change it if we need another format.
- `export()` returns the opened output file object, which is why the cell below displays an `_io.BufferedRandom` object.
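A WAV file is raw PCM frames behind a RIFF/WAVE header. Checking the first bytes of a file is a quick way to tell whether a file named `.wav` really is one.

The standard-library sketch below wraps PCM in a WAV container in memory with `wave` and checks those magic bytes. It is not how PyDub exports other formats; those go through ffmpeg. The helper names are hypothetical.

```python
import io
import wave

def pcm_to_wav_bytes(frame_data, frame_rate, sample_width, channels):
    """Wrap raw PCM frames in a WAV (RIFF/WAVE) container, entirely in memory."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(channels)
        w.setsampwidth(sample_width)
        w.setframerate(frame_rate)
        w.writeframes(frame_data)
    return buf.getvalue()

def looks_like_wav(data):
    """True if the bytes start with a RIFF header whose form type is WAVE."""
    return data[:4] == b"RIFF" and data[8:12] == b"WAVE"

wav_bytes = pcm_to_wav_bytes(bytes(48000 * 4), 48000, 2, 2)   # 1 s of stereo silence
print(looks_like_wav(wav_bytes), len(wav_bytes) - 48000 * 4)  # True 44
```

The 44 extra bytes are the standard PCM WAV header that `wave` writes in front of the frames.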

In [10]:
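from pydub import AudioSegment

# Only needed when ffmpeg is not on the PATH: tell PyDub where the ffmpeg binary is.
# PyDub also looks for ffprobe on the PATH when reading non-WAV files.
AudioSegment.converter = "ffmpeg.exe"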
# Import the .mp3 file
mp3_file = AudioSegment.from_file('mp3_file_.mp3')

# Export the .mp3 file as wav
mp3_file.export(out_f="mp3_file.wav",
                format="wav")

<_io.BufferedRandom name='mp3_file.wav'>

### Finding PyDub stats

In [11]:
def show_pydub_stats(filename):
    """Returns different audio attributes related to an audio file."""
    # Create AudioSegment instance
    audio_segment = AudioSegment.from_file(filename)

    # Print audio attributes and return AudioSegment instance
    print(f"Channels: {audio_segment.channels}")
    print(f"Sample width: {audio_segment.sample_width}")
    print(f"Frame rate (sample rate): {audio_segment.frame_rate}")
    print(f"Frame width: {audio_segment.frame_width}")
    print(f"Length (ms): {len(audio_segment)}")
    return audio_segment

# Try the function
call_1_audio_segment = show_pydub_stats('mp3_file.wav')

Channels: 2
Sample width: 2
Frame rate (sample rate): 48000
Frame width: 4
Length (ms): 3519
