<a href="https://colab.research.google.com/github/hfwittmann/sound/blob/master/Compress_audio_via_images.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Audio compression via images

This is the beginning of the post on arthought.com:

In this notebook we will use an image compression technique to compress audio signals.

The audio files in this notebook are from the ESC-50 dataset (Ref 1).

The technique used for image compression is called singular value decomposition (SVD). A good introduction to SVD, with a demo for images, is given in Ref 2.
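As a sketch of what SVD compression means: a matrix is factored as U Σ Vᵀ, and only the k largest singular values (with their singular vectors) are kept. By the Eckart–Young theorem this gives the best rank-k approximation in the Frobenius norm, and the error equals the norm of the discarded singular values. The notebook itself uses scikit-learn's `TruncatedSVD`; this NumPy version, with a random matrix standing in for one image layer and a hypothetical helper `low_rank_approx`, is for illustration only.

```python
import numpy as np


def low_rank_approx(A, k):
    """Rank-k approximation of a real matrix A from its truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # keep only the k largest singular values and their singular vectors
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


rng = np.random.default_rng(0)
A = rng.standard_normal((60, 40))  # stand-in for one image layer
A5 = low_rank_approx(A, 5)

# Eckart-Young: the Frobenius error equals the norm of the discarded singular values
s = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A5)
print(f"error {err:.4f} vs discarded singular values {np.sqrt(np.sum(s[5:] ** 2)):.4f}")
```

Keeping all 40 singular values reproduces `A` exactly (up to rounding); smaller k trades accuracy for storage.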

This notebook demonstrates an image-based compression of audio files. To convert sound into images we use the short-time Fourier transform (STFT) and its inverse, both from the Python audio package librosa.

The STFT yields a complex-valued two-dimensional matrix (frequency bins × time frames), which can be inverted to recover the original audio signal.
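To make the invertibility claim concrete, here is a minimal NumPy sketch of an STFT and its inverse. It is not librosa's implementation: librosa pads the signal by default (`center=True`) and its `istft` applies a synthesis window with normalization. This sketch instead assumes a periodic Hann window with 50% overlap, whose shifted copies sum to one, so plain overlap-add recovers the signal wherever two frames overlap. `stft_sketch` and `istft_sketch` are hypothetical names.

```python
import numpy as np


def stft_sketch(x, n_fft=1024, hop=512):
    """Complex STFT: one column of rfft coefficients per windowed frame."""
    # periodic Hann window; with hop = n_fft // 2 its shifted copies sum to 1
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop:i * hop + n_fft] * window for i in range(n_frames)], axis=1)
    return np.fft.rfft(frames, axis=0)  # shape (1 + n_fft // 2, n_frames)


def istft_sketch(S, n_fft=1024, hop=512):
    """Inverse STFT by overlap-adding the inverse FFT of each column."""
    frames = np.fft.irfft(S, n=n_fft, axis=0)
    y = np.zeros(n_fft + hop * (S.shape[1] - 1))
    for i in range(S.shape[1]):
        y[i * hop:i * hop + n_fft] += frames[:, i]
    return y


rng = np.random.default_rng(0)
x = rng.standard_normal(8192)
S = stft_sketch(x)
y = istft_sketch(S)
# samples covered by two overlapping frames are recovered up to rounding
print(S.shape, np.allclose(y[512:-512], x[512:-512]))  # → (513, 15) True
```

The first and last half-frames are only covered by one window, which is why librosa pads the signal before framing.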

This complex STFT “image” can be separated into two real-valued “images”: its real and imaginary parts. Each part can be compressed using singular value decomposition (SVD); the compressed parts are then recombined into a complex matrix and inverted to yield the compressed audio signal.
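A NumPy-only sketch of this split, compress, recombine step, using a random complex matrix of STFT size (513 × 216) as a stand-in for a real spectrogram. `rank_k` and `compress_complex_sketch` are illustrative names, not the notebook's own functions (which use scikit-learn's `TruncatedSVD`).

```python
import numpy as np


def rank_k(A, k):
    # truncated-SVD reconstruction of one real layer
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def compress_complex_sketch(S, k):
    """Compress a complex matrix by treating its real and imaginary parts as two layers."""
    layers = np.stack([S.real, S.imag], axis=2)  # shape (m, n, 2), like a 2-layer image
    compressed = np.stack([rank_k(layers[:, :, i], k) for i in range(2)], axis=2)
    # recombine the two compressed layers into one complex matrix
    return compressed[:, :, 0] + 1j * compressed[:, :, 1]


rng = np.random.default_rng(1)
S = rng.standard_normal((513, 216)) + 1j * rng.standard_normal((513, 216))  # STFT-sized stand-in
S20 = compress_complex_sketch(S, 20)
```

With k equal to the smaller dimension (216) nothing is discarded and the original matrix comes back unchanged.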

The bird’s eye view of the presented material is as follows:

- Image compression demo (intro): compress a colored photo by splitting it into RGB (red, green and blue) images
- Audio compression demo: compress audio by converting the sound into an RI (real and imaginary) image, then convert the compressed RI image back into sound

Check it out on GitHub. Last updated: 01/03/2020 12:18:53

1. Piczak KJ. ESC-50: Dataset for Environmental Sound Classification. https://github.com/karoldvl/ESC-50/. Published March 1, 2020. Accessed March 1, 2020.

2. Baumann T. Image Compression with Singular Value Decomposition. https://timbaumann.info/. Published March 1, 2020. Accessed March 1, 2020.

# Settings

```python
%matplotlib inline
import matplotlib
import matplotlib.pyplot as plt
import numpy as np

matplotlib.rcParams['figure.figsize'] = (20.0, 10.0)
```

# Define compression helper functions

```python
import numpy as np
from sklearn.decomposition import TruncatedSVD

random_seed = 0
np.random.seed(random_seed)


def compress(imageIn, n_components=100, random_seed=0):
    """Compress each layer of a 3-D image array with a truncated SVD.

    The last axis holds the layers: 3 for an RGB image, 2 for a
    real/imaginary ("RI") image built from a complex STFT.
    """
    image = imageIn

    if image.ndim != 3:
        raise ValueError(f'Expected a 3-D image array, got shape {image.shape}')

    if image.shape[2] == 2:
        image_type = 'real_imaginary'
    elif image.shape[2] == 3:
        image_type = 'RGB'
    else:
        raise ValueError(f'Unsupported number of layers: {image.shape[2]}')

    print(f'Found {image_type} image')

    n_of_layers = image.shape[2]
    compressed_list = []

    for layer in range(n_of_layers):
        image_layer = image[:, :, layer]  # i.e. r, g or b (or real / imaginary)

        # low-rank approximation: project onto the top n_components
        # singular vectors, then map back to the original space
        clf = TruncatedSVD(n_components=n_components, random_state=random_seed)
        clf.fit(image_layer)
        compressed_layer = clf.inverse_transform(clf.transform(image_layer))

        compressed_list.append(compressed_layer)

    compressed = np.stack(compressed_list, axis=2)

    if image_type == 'RGB':
        # clip to the valid pixel range, then cast back to the original dtype
        compressed = np.clip(compressed, a_min=0, a_max=255)
        compressed = compressed.astype(image.dtype)

    # reshape to original image size
    compressed = compressed.reshape(imageIn.shape)

    SHAPE = compressed.shape[:2]
    print('shape:', SHAPE)

    # per layer, the SVD stores U (m x k), the k singular values and V (n x k)
    Original_memory = np.prod(SHAPE)
    Compressed_memory = (1 + np.sum(SHAPE)) * n_components
    print(f'The compressed memory is roughly '
          f'{100 * Compressed_memory / Original_memory:0.0f}% of the original')

    return compressed
```

```python
SHAPE = (427, 640)
n_components = 5

print('shape:', SHAPE)

Original_memory = np.prod(SHAPE)
Compressed_memory = (1 + np.sum(SHAPE) ) * n_components
print(f'The compressed memory is roughly \
  {100 * Compressed_memory / Original_memory:0.0f}% of the original')
```

```
shape: (427, 640)
The compressed memory is roughly   2% of the original
```
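The estimate printed above follows from counting stored numbers per layer. For an m × n layer, a rank-k truncated SVD keeps U_k (m × k), the k singular values and V_k (n × k), so with the 427 × 640 shape and k = 5:

$$
\frac{k\,(m+n+1)}{m\,n} = \frac{5\,(427+640+1)}{427 \cdot 640} = \frac{5340}{273280} \approx 1.95\,\%
$$

This counts numbers rather than bytes: the photo stores 8-bit integers while the SVD factors are floating point, so the saving measured in bytes is smaller.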


```python
import numpy as np


def compress_complex(complex_image, n_components=100, random_seed=0, doplot=False):
    '''
    Compress a complex-valued matrix such as an STFT. Complex in the sense
    of complex numbers, https://en.wikipedia.org/wiki/Complex_number,
    i.e. having a real and an imaginary part.
    '''
    real = np.real(complex_image)
    imaginary = np.imag(complex_image)

    # image_ri: "ri" stands for real and imaginary.
    # Similarly to an RGB image (with three r, g, b layers),
    # it has two layers (r, i).
    image_ri = np.stack([real, imaginary], axis=2)

    compressed_image_ri = compress(image_ri, n_components, random_seed)

    compressed_real = compressed_image_ri[:, :, 0]
    compressed_imaginary = compressed_image_ri[:, :, 1]

    if doplot:
        plot_spectrum(real, 'Real Part')
        plot_spectrum(compressed_real, 'Real Part Compressed')

        plot_spectrum(imaginary, 'Imaginary Part')
        plot_spectrum(compressed_imaginary, 'Imaginary Part Compressed')

    # recombine the two compressed layers into one complex matrix
    compressed = compressed_real + 1j * compressed_imaginary
    return compressed
```
    

# Define plotting helper functions

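```python
import matplotlib.pyplot as plt
from librosa import display


def plot_spectrum(data, name):
    # new figure per call, so successive plots do not draw over each other
    plt.figure()
    display.specshow(data, y_axis='log', x_axis='time')
    plt.title(f'Spectrogram of {name}')
    # values are whatever the caller passes (e.g. log magnitude), not necessarily dB
    plt.colorbar(format='%+2.0f')
    plt.tight_layout()
```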
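The audio cells below pass natural-log magnitudes, `np.log(np.abs(...))`, to `plot_spectrum`, which is not a decibel scale. If a decibel axis is wanted, here is a minimal NumPy sketch of the conversion; `amplitude_to_db_sketch` is a hypothetical helper, and librosa's own `librosa.amplitude_to_db` adds `ref`, `amin` and `top_db` options.

```python
import numpy as np


def amplitude_to_db_sketch(S, amin=1e-10):
    """Magnitude in decibels, 20 * log10(|S|), with |S| floored at amin to avoid log(0)."""
    magnitude = np.maximum(np.abs(S), amin)
    return 20.0 * np.log10(magnitude)


print(amplitude_to_db_sketch(np.array([1.0, 10.0, 0.0, 0.1j])))
```

It would be used as, e.g., `plot_spectrum(amplitude_to_db_sketch(mysftf), 'Uncompressed (dB)')`. The floor of 1e-10 maps silent bins to -200 dB, which is why librosa also clips the range with `top_db`.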

# Compress Images

```python
from sklearn.datasets import load_sample_images

dataset = load_sample_images()
print(dataset.DESCR)
```

```
Image: china.jpg
Released under a creative commons license. [1]
Attribution: Some rights reserved by danielbuechele [2]
Retrieved 21st August, 2011 from [3] by Robert Layton

[1] https://creativecommons.org/licenses/by/2.0/
[2] https://www.flickr.com/photos/danielbuechele/
[3] https://www.flickr.com/photos/danielbuechele/6061409035/sizes/z/in/photostream/


Image: flower.jpg
Released under a creative commons license. [1]
Attribution: Some rights reserved by danielbuechele [2]
Retrieved 21st August, 2011 from [3] by Robert Layton

[1] https://creativecommons.org/licenses/by/2.0/
[2] https://www.flickr.com/photos/vultilion/
[3] https://www.flickr.com/photos/vultilion/6056698931/sizes/z/in/photostream/
```
```python
image_dict = {'China': dataset.images[0], 'Flower': dataset.images[1]}
```

```python
import matplotlib.pyplot as plt
from ipywidgets import interact


@interact(name=list(image_dict.keys()),
          n_components=[1, 2, 5, 10, 20, 50, 200])
def plot_compressed(name, n_components=5):
    image = image_dict[name]
    print(name)
    compressed_image = compress(image, n_components=n_components)

    plt.imshow(compressed_image)
    return None
```


# Sounds
The audio files in this notebook are from the ESC-50 dataset:

- ESC-50: Dataset for Environmental Sound Classification
- https://github.com/karoldvl/ESC-50/
- https://dx.doi.org/10.7910/DVN/YDEPUT

# Load Audio Helper Function

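```python
import librosa as lr
import numpy as np


def load_audio_file(filepath):
    # mono=False keeps all channels: multichannel audio comes back with
    # shape (n_channels, n_samples); audio is resampled to librosa's default sr=22050
    y_multichannel, sr = lr.load(filepath, mono=False)
    print(y_multichannel.shape)

    if y_multichannel.ndim > 1:
        # keep only channel 0 (the mean over a one-channel selection)
        channels = [0]
        y_channel_selection = y_multichannel[channels]
        y = np.mean(y_channel_selection, axis=0)
    else:
        y = np.array(y_multichannel)

    return y, sr
```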
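`librosa.load(..., mono=False)` returns multichannel audio as an array of shape (n_channels, n_samples), so `load_audio_file` keeps channel 0 only. A hypothetical `to_mono` helper makes the channel choice an explicit parameter; passing all channels gives a plain downmix:

```python
import numpy as np


def to_mono(y, channels=(0,)):
    """Average the selected channels of a (n_channels, n_samples) array."""
    y = np.asarray(y)
    if y.ndim == 1:  # already mono
        return y
    return y[list(channels)].mean(axis=0)


stereo = np.array([[1.0, 2.0, 3.0],
                   [3.0, 4.0, 5.0]])
print(to_mono(stereo))          # channel 0 only, as in load_audio_file → [1. 2. 3.]
print(to_mono(stereo, (0, 1)))  # downmix of both channels → [2. 3. 4.]
```

The ESC-50 clips used here are mono, so for them both versions simply return the signal unchanged.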

# Download sound files from GitHub

## Helper function to download

```python
import requests


def download_soundfile(url, name):
    print(f'downloading {url} to file {name}')

    # download the file contents in binary format
    r = requests.get(url, allow_redirects=True)
    r.raise_for_status()  # fail loudly instead of saving an HTML error page

    # write the raw bytes to a local file
    with open(name, "wb") as f:
        f.write(r.content)
    return None
```

# Define which sounds to use

```python
base_url = 'https://github.com/hfwittmann/sound/raw/master/sounds/'
sounds_dict = {'Cat 1': base_url + 'cats/2-82274-A-5.wav',
               'Cat 2': base_url + 'cats/2-82274-B-5.wav',
               'Can opening': base_url + 'can_opening/3-155659-A-34.wav'}
```
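The ESC-50 file names follow the naming scheme described in the ESC-50 repository, `{FOLD}-{CLIP_ID}-{TAKE}-{TARGET}.wav`; here target 5 appears under `cats/` and target 34 under `can_opening/`. A small hypothetical parser, `parse_esc50_name`, to check which class a downloaded file belongs to:

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class ESC50Name(NamedTuple):
    fold: int
    clip_id: str
    take: str
    target: int


def parse_esc50_name(path):
    """Split an ESC-50 file name such as '2-82274-A-5.wav' into its fields."""
    stem = PurePosixPath(path).stem  # drop directories and the .wav suffix
    fold, clip_id, take, target = stem.split('-')
    return ESC50Name(int(fold), clip_id, take, int(target))


print(parse_esc50_name('cats/2-82274-A-5.wav'))
# → ESC50Name(fold=2, clip_id='82274', take='A', target=5)
```

'Cat 1' and 'Cat 2' share fold, clip ID and class and differ only in the take (A and B).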

# Do the actual download

```python
# do downloads
for name, url in sounds_dict.items():
    download_soundfile(url, f'{name}.wav')
```

```
downloading https://github.com/hfwittmann/sound/raw/master/sounds/cats/2-82274-A-5.wav to file Cat 1.wav
downloading https://github.com/hfwittmann/sound/raw/master/sounds/cats/2-82274-B-5.wav to file Cat 2.wav
downloading https://github.com/hfwittmann/sound/raw/master/sounds/can_opening/3-155659-A-34.wav to file Can opening.wav
```


# Analyse Sound files

```python
import IPython.display
import librosa as lr
import matplotlib.pyplot as plt
import numpy as np
from ipywidgets import interact


@interact(name=list(sounds_dict.keys()))
def myprint(name):
    print(f'{name}.wav')
    y, sr = load_audio_file(f'{name}.wav')
    IPython.display.display(IPython.display.Audio(y, rate=sr))

    # complex STFT: 1 + n_fft // 2 = 513 frequency rows, one column per hop
    mysftf = lr.stft(y, n_fft=1024, hop_length=512)

    # small offset avoids log(0) = -inf in silent bins
    plot_spectrum(np.log(np.abs(mysftf) + 1e-10), 'Log of Absolute of Uncompressed')

    return None
```




# Compress sound files

```python
@interact(name=list(sounds_dict.keys()),
          n_components=[1, 2, 5, 10, 20, 50])
def myprint(name, n_components=20):
    print(name)
    y, sr = load_audio_file(f'{name}.wav')
    mysftf = lr.stft(y, n_fft=1024, hop_length=512)

    # compress the real and imaginary layers separately, then recombine
    mysftf_compressed = compress_complex(mysftf, n_components=n_components, doplot=False)

    # small offset avoids log(0) = -inf in silent bins
    plot_spectrum(np.log(np.abs(mysftf_compressed) + 1e-10), 'Log of Absolute of Compressed')

    # invert the compressed STFT back to a waveform and play it at the loaded sample rate
    y_inverted_sftf = lr.istft(mysftf_compressed, hop_length=512)
    IPython.display.display(IPython.display.Audio(y_inverted_sftf, rate=sr))
    print('\n')
```
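To see how much the rank-k factors save for the audio, it helps to know the STFT's shape. Here is a back-of-the-envelope sketch using two hypothetical helpers, `stft_shape` and `svd_storage_ratio`. It assumes librosa's default `center=True` framing (1 + len(y) // hop_length frames) and a 5-second ESC-50 clip loaded at librosa's default 22050 Hz.

```python
def stft_shape(n_samples, n_fft=1024, hop_length=512):
    """(frequency rows, time frames), assuming a centred (padded) STFT."""
    return 1 + n_fft // 2, 1 + n_samples // hop_length


def svd_storage_ratio(shape, k):
    """Numbers stored by rank-k SVD factors relative to the full m x n layer."""
    m, n = shape
    return k * (m + n + 1) / (m * n)


# assumption: a 5 s clip resampled to librosa's default 22050 Hz
shape = stft_shape(5 * 22050)
print(shape)                                         # → (513, 216)
print(f"{100 * svd_storage_ratio(shape, 20):.1f}%")  # → 13.2%
```

Under these assumptions, `n_components=20` keeps about 13% of the numbers per layer (real and imaginary alike). Beyond about 151 components, the factors would need more storage than the STFT itself.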
