# **Audio Visualization**

Machine learning has found applications across many domains that involve mimicking human senses and abilities. Computer vision and speech synthesis have been around since the late 1960s and have improved rapidly over time, especially in the last few years.

In this practice session, we will focus on speech synthesis, which is a growing research area with a number of real-world applications.

To read more about it, please refer to [this](https://analyticsindiamag.com/step-by-step-guide-to-audio-visualization-in-python/) article.

# **Code Implementation**

We will use the IPython display module to play the audio file and a popular library called Librosa to load and visualize it.

### Installing Librosa

LibROSA is a Python package that helps us analyse audio files and provides the building blocks necessary to create audio information retrieval systems. We will install librosa, together with the other libraries used in these sessions, using the following command:

In [None]:

!python -m pip install pip --upgrade --user -q --no-warn-script-location
!python -m pip install numpy pandas seaborn matplotlib scipy statsmodels scikit-learn nltk gensim tensorflow keras torch torchvision \
    tqdm scikit-image pillow librosa --user -q --no-warn-script-location

# import IPython
# IPython.Application.instance().kernel.do_shutdown(True)


### Importing The Libraries

In [None]:
import IPython.display as ipd
import librosa
import librosa.display
import matplotlib.pyplot as plt

### Loading And Playing Audio Files In Jupyter

We can now point to our audio file with a single line of code. Type and execute the following code.

In [None]:
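# Note: librosa.util.example_audio_file() was removed in newer librosa releases;
# there, librosa.example('nutcracker') returns the path to a bundled example instead.
audio_path = librosa.util.example_audio_file()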

In [None]:
ipd.Audio(audio_path)

Executing the above code displays an inline audio player that you can use to play the audio.

### Spectrograms

First, we will initialize the plot with a figure size.

In [None]:
plt.figure(figsize=(15,4))

We will then load the audio file using librosa and will collect the data array and sampling rate for the audio file.

In [None]:
data,sample_rate1 = librosa.load(audio_path, sr=22050, mono=True, offset=0.0, duration=50, res_type='kaiser_best')

### Sampling

Sound is a continuous wave. We can digitise sound by breaking the continuous wave into discrete signals. This process is called sampling. Sampling converts a sound wave into a sequence of samples or a discrete-time signal.

The load function reads the audio file and converts it into an array of values, each of which represents the amplitude of a sample at a given point in time.
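To make the idea of sampling concrete, here is a minimal NumPy sketch that samples a sine wave at fixed time steps. The helper name `sample_sine` and the 440 Hz tone are assumptions chosen for illustration; they are not part of librosa or of the audio files used in this session.

```python
import numpy as np

def sample_sine(freq_hz, sample_rate, duration_s):
    """Sample a sine wave of freq_hz at sample_rate for duration_s seconds.

    Hypothetical helper for illustration only.
    """
    n_samples = int(round(sample_rate * duration_s))
    # Sample instants in seconds: 0, 1/sr, 2/sr, ...
    t = np.arange(n_samples) / sample_rate
    # Amplitude of the continuous wave evaluated at each instant
    return t, np.sin(2 * np.pi * freq_hz * t)

t, samples = sample_sine(freq_hz=440.0, sample_rate=22050, duration_s=0.02)
print(len(samples))   # how many discrete samples represent 0.02 s of sound
print(samples[:5])    # the first few amplitude values
```

The resulting array has the same form as the `data` array returned by `librosa.load`: one amplitude value per sample instant.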

### Sampling Rate

The sampling rate is the number of samples taken per second, measured in hertz (Hz). Human hearing spans roughly 20 Hz to 20 kHz.
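The relationship between sampling rate, duration and array length can be sketched with a little arithmetic. The function names below are hypothetical helpers, and the values mirror the `sr=22050` and `duration=50` arguments used in this session. The Nyquist limit (half the sampling rate) is the highest frequency a given sampling rate can represent.

```python
def n_samples(duration_s, sample_rate):
    """Number of samples needed to cover duration_s seconds."""
    return int(duration_s * sample_rate)

def duration_seconds(num_samples, sample_rate):
    """Duration in seconds covered by num_samples samples."""
    return num_samples / sample_rate

def nyquist_hz(sample_rate):
    """Highest frequency representable at this sampling rate."""
    return sample_rate / 2

length = n_samples(50, 22050)
print(length)                           # samples in 50 s at 22050 Hz
print(duration_seconds(length, 22050))  # back to seconds
print(nyquist_hz(22050))                # upper frequency limit at 22050 Hz
print(nyquist_hz(44100))                # a rate of 44100 Hz covers the audible range
```

In the notebook, the same check is `len(data) / sample_rate1`, which should come out close to the requested duration when the file is at least that long.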

We can now plot the waveform (amplitude over time) using the waveplot method as shown below. Note that newer librosa releases replace waveplot with waveshow.

In [None]:
librosa.display.waveplot(data,sr=sample_rate1, max_points=50000.0, x_axis='time', offset=0.0, max_sr=1000)
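The plot above shows amplitude over time. A spectrogram instead shows how energy is spread across frequencies as time passes. The following is a minimal NumPy sketch of a magnitude spectrogram computed on a synthetic 1 kHz tone. The function name, the Hann window, and the frame and hop sizes are assumptions chosen for illustration; librosa provides its own transform and display helpers, which are not used here.

```python
import numpy as np

def magnitude_spectrogram(signal, frame_length=1024, hop_length=256):
    """Return the magnitude of a short-time Fourier transform.

    Hypothetical helper; output shape is (frame_length // 2 + 1, n_frames).
    """
    window = np.hanning(frame_length)
    n_frames = 1 + (len(signal) - frame_length) // hop_length
    # Slice the signal into overlapping frames and taper each one
    frames = np.stack([
        signal[i * hop_length : i * hop_length + frame_length] * window
        for i in range(n_frames)
    ])
    # One FFT per frame; transpose so rows are frequency bins
    return np.abs(np.fft.rfft(frames, axis=1)).T

sample_rate = 22050
t = np.arange(sample_rate) / sample_rate       # one second of sample instants
tone = np.sin(2 * np.pi * 1000.0 * t)          # synthetic 1 kHz tone

spec = magnitude_spectrogram(tone)
freqs = np.fft.rfftfreq(1024, d=1.0 / sample_rate)
peak_hz = freqs[spec.mean(axis=1).argmax()]    # strongest frequency bin on average
print(spec.shape, peak_hz)
```

The strongest bin should sit within one bin width of 1 kHz. Passing the `data` array from `librosa.load` instead of `tone` would give the spectrogram of the loaded audio.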

### Linkin Park Vs Michael Jackson Vs Blue

In [None]:
# Load imports
import IPython.display as ipd
import librosa
import librosa.display
import matplotlib.pyplot as plt

In [None]:
# !gdown https://drive.google.com/uc?id=1uL_stM6uhcSSlvDiqQbD1S3-BBi2oHlr

In [None]:
ipd.Audio('Numb+Official+Video+Linkin+Park.mp3')

In [None]:
# Numb - Linkin Park 

filename1 = 'Numb+Official+Video+Linkin+Park.mp3'
plt.figure(figsize=(15,4))
data1,sample_rate1 = librosa.load(filename1, sr=22050, mono=True, offset=0.0, duration=50, res_type='kaiser_best')
librosa.display.waveplot(data1,sr=sample_rate1, max_points=50000.0, x_axis='time', offset=0.0, max_sr=1000)

In [None]:
print(data1)
print(len(data1))
print(sample_rate1)

In [None]:
# !wget https://download.mp3oops.fun/e/Michael-Jackson-Dangerous.mp3

In [None]:
ipd.Audio('Michael-Jackson-Dangerous.mp3')

In [None]:
# Dangerous - Michael Jackson

filename2 = 'Michael-Jackson-Dangerous.mp3'
plt.figure(figsize=(15,4))
data2,sample_rate2 = librosa.load(filename2, sr=22050, mono=True, offset=0.0, duration=180, res_type='kaiser_best')
librosa.display.waveplot(data2,sr=sample_rate2, max_points=50000.0, x_axis='time', offset=0.0, max_sr=1000)

In [None]:
print(data2)
print(len(data2))
print(sample_rate2)

In [None]:
# !gdown https://drive.google.com/uc?id=1X_73Q2wFiTEWJ5zUISvK8lbJvko-JUWJ

In [None]:
ipd.Audio('Blue+One+Love.mp3')

In [None]:
# One Love - Blue

filename3 = 'Blue+One+Love.mp3'
plt.figure(figsize=(15,4))
data3,sample_rate3 = librosa.load(filename3, sr=22050, mono=True, offset=0.0, duration=180, res_type='kaiser_best')
librosa.display.waveplot(data3,sr=sample_rate3, max_points=50000.0, x_axis='time', offset=0.0, max_sr=1000)

In [None]:
print(data3)
print(len(data3))
print(sample_rate3)
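Printing the raw arrays makes the three tracks hard to compare. One possible way to put them side by side is to reduce each array to a few summary numbers. The sketch below uses a hypothetical `summarize` helper and a synthetic tone as a stand-in for the real tracks, so it runs without the MP3 files.

```python
import numpy as np

def summarize(data, sample_rate):
    """Return duration, peak amplitude and RMS level of an audio array.

    Hypothetical helper for illustration only.
    """
    data = np.asarray(data, dtype=float)
    return {
        "duration_s": len(data) / sample_rate,
        "peak": float(np.max(np.abs(data))),
        "rms": float(np.sqrt(np.mean(data ** 2))),
    }

# Synthetic stand-in for a loaded track: one second of a 440 Hz tone
sr = 22050
demo = 0.5 * np.sin(2 * np.pi * 440.0 * np.arange(sr) / sr)
stats = summarize(demo, sr)
print(stats)
```

In the notebook, calling `summarize(data1, sample_rate1)`, `summarize(data2, sample_rate2)` and `summarize(data3, sample_rate3)` would give comparable figures for the three songs.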

# **Related Articles:**

> * [Audio Visualization](https://analyticsindiamag.com/step-by-step-guide-to-audio-visualization-in-python/)

> * [VGG Sound Datasets](https://analyticsindiamag.com/guide-to-vgg-sound-datasets-for-visual-audio-recognition/)

> * [Voxceleb Datasets](https://analyticsindiamag.com/guide-to-voxceleb-datasets-for-visual-audio-of-human-speech/)

> * [FreeSound Datasets](https://analyticsindiamag.com/datasets-freesound-pytorch-research/)

> * [LibriSpeech Datasets](https://analyticsindiamag.com/librispeech-datasets/)