# Multichannel Audio Capabilities

In this notebook, we explore the multi-channel capabilities of fastai_audio.

The data used is from the following challenge: http://c4dm.eecs.qmul.ac.uk/sceneseventschallenge/

In [None]:
%reload_ext autoreload
%autoreload 2
%matplotlib inline

In [None]:
import sys
sys.path.append("..")
from audio import *

In [None]:
data_folder = datapath4file("scenes_stereo")
data_url = "https://c4dm.eecs.qmul.ac.uk/rdr/bitstream/handle/123456789/29/scenes_stereo"
if not os.path.exists(data_folder): 
    filename = download_data(data_url, ext = ".zip")
    !unzip -q -j {str(filename)} -d {str(data_folder)} 
# The label is the file name with digits and extension removed (letters only).
# There is probably a cleaner way to do this.
label_pattern = lambda x: ''.join([letter for letter in str(x).split("/")[-1].split(".")[0] if letter.isalpha()])
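
To make the labelling rule concrete, here is a small standalone check of what `label_pattern` returns. The file paths are hypothetical examples, not confirmed names from the dataset. Note that splitting on `"/"` assumes POSIX-style path separators.

```python
# Same expression as in the cell above, repeated so this snippet runs on its own.
label_pattern = lambda x: ''.join(
    [letter for letter in str(x).split("/")[-1].split(".")[0] if letter.isalpha()]
)

# Hypothetical paths: the trailing digits and the extension are dropped.
print(label_pattern("data/scenes_stereo/bus01.wav"))          # bus
print(label_pattern("data/scenes_stereo/tubestation10.wav"))  # tubestation
```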

In [None]:
audios = AudioList.from_folder(data_folder)

### Defaults

By default, `AudioConfig` keeps all of its previous defaults, with one exception: `downmix` now defaults to `False`. As a result, multi-channel audio is loaded with each channel kept separate.
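
As a rough picture of what "each channel kept separate" means, the sketch below writes a tiny stereo WAV with the standard-library `wave` module and reads it back as one list per channel. It is only an illustration and does not show how fastai_audio loads files. The helpers `write_wav` and `read_channels` are hypothetical names, and 16-bit PCM is assumed.

```python
import io
import struct
import wave

def write_wav(channels, sample_rate=16000):
    """Encode equal-length lists of 16-bit ints (one list per channel) as WAV bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(len(channels))
        wf.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        wf.setframerate(sample_rate)
        # WAV stores samples interleaved: L0, R0, L1, R1, ...
        interleaved = [s for frame in zip(*channels) for s in frame]
        wf.writeframes(struct.pack(f"<{len(interleaved)}h", *interleaved))
    return buf.getvalue()

def read_channels(wav_bytes):
    """Decode 16-bit PCM WAV bytes into one list of samples per channel."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wf:
        n_channels = wf.getnchannels()
        raw = wf.readframes(wf.getnframes())
    samples = struct.unpack(f"<{len(raw) // 2}h", raw)
    # De-interleave: every n_channels-th sample belongs to the same channel.
    return [list(samples[c::n_channels]) for c in range(n_channels)]

left = [0, 100, -100, 200]
right = [1, 2, 3, 4]
loaded = read_channels(write_wav([left, right]))
print(len(loaded))  # 2
```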

In [None]:
config_default = AudioConfig()
audios = AudioList.from_folder(data_folder, config=config_default)
audios[1].show()

### Downmixing

Previously, fastai_audio handled multi-channel audio by consolidating all channels into a single channel. This is known as downmixing, and it is still available by setting the `downmix` flag to `True`.

### Mixed Channel Folder

If the files in a folder have different numbers of channels, `downmix=True` should be used so that every item ends up with a single channel.
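
As a minimal sketch of downmixing, and of why it gives files with different channel counts a common shape, the snippet below averages the channels sample by sample. Averaging is one common way to downmix and is an assumption here. The exact method fastai_audio uses is not shown in this notebook, and `downmix` below is a hypothetical helper, not the library function.

```python
def downmix(channels):
    """Average equal-length channels sample by sample into a single channel."""
    n = len(channels)
    return [sum(frame) / n for frame in zip(*channels)]

# A stereo clip and a six-channel clip (toy values).
stereo = [[0.5, 1.0, -0.5],
          [0.0, 0.5, 0.5]]
six_channel = [[0.25, 0.5, -0.25] for _ in range(6)]

# After downmixing, both clips have exactly one channel.
mono_from_stereo = [downmix(stereo)]
mono_from_six = [downmix(six_channel)]
print(mono_from_stereo)  # [[0.25, 0.75, 0.0]]
print(mono_from_six)     # [[0.25, 0.5, -0.25]]
```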

In [None]:
# downmix=False (the default): every channel is kept separate
config_no_downmix = AudioConfig(downmix=False)
audios = AudioList.from_folder("../data/misc/6-channel-multichannel/", config=config_no_downmix).split_none().label_from_func(label_pattern)

In [None]:
audios

In [None]:
audios.config._nchannels

In [None]:
config_downmix = AudioConfig(downmix=True)
audios = AudioList.from_folder("../data/misc/6-channel-multichannel/", config=config_downmix).split_none().label_from_func(label_pattern)

In [None]:
audios.config._nchannels

### Remove Silence

Silence removal works much as it did before. With multi-channel audio, silence is detected across all channels together, so every channel is cut at the same points and the channels stay aligned in time. `remove_silence` accepts `all`, `trim`, and `split`.
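
The idea of cutting every channel at the same points can be sketched in plain Python. This is not the library's implementation. It assumes that `trim` strips leading and trailing silence, `all` drops every silent stretch, and `split` cuts the clip into separate non-silent pieces, and that a sample position counts as silent only when every channel is below a threshold. The function names and the threshold value are hypothetical.

```python
def loud_frames(channels, threshold=0.01):
    """One flag per sample position: True if ANY channel reaches the threshold."""
    return [any(abs(s) >= threshold for s in frame) for frame in zip(*channels)]

def trim_silence(channels, threshold=0.01):
    """Drop leading and trailing silence, using the same cut points for all channels."""
    loud = loud_frames(channels, threshold)
    if True not in loud:
        return [[] for _ in channels]
    start = loud.index(True)
    end = len(loud) - loud[::-1].index(True)
    return [ch[start:end] for ch in channels]

def remove_all_silence(channels, threshold=0.01):
    """Drop every silent sample position from all channels at once."""
    loud = loud_frames(channels, threshold)
    return [[s for s, keep in zip(ch, loud) if keep] for ch in channels]

def split_on_silence(channels, threshold=0.01):
    """Return a list of multi-channel segments separated by silence."""
    loud = loud_frames(channels, threshold)
    segments, start = [], None
    for i, is_loud in enumerate(loud + [False]):  # sentinel closes a trailing segment
        if is_loud and start is None:
            start = i
        elif not is_loud and start is not None:
            segments.append([ch[start:i] for ch in channels])
            start = None
    return segments

left = [0.0, 0.0, 0.5, 0.0, 0.4, 0.0]
right = [0.0, 0.3, 0.0, 0.0, 0.0, 0.0]
clip = [left, right]

trimmed = trim_silence(clip)
compacted = remove_all_silence(clip)
pieces = split_on_silence(clip)
```

In every mode the two channels keep equal length, which is what keeps them aligned in time. A real implementation would typically decide on silence per window rather than per sample.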

##### All

In [None]:
config_rs = AudioConfig(remove_silence="all")
audios = AudioList.from_folder(data_folder, config=config_rs).split_none().label_from_func(label_pattern)
audios.train[1][0].show()

##### Split

In [None]:
config_rs = AudioConfig(remove_silence="split")
audios = AudioList.from_folder(data_folder, config=config_rs).split_none().label_from_func(label_pattern)
audios.train[2][0].show()

##### Trim

In [None]:
config_rs = AudioConfig(remove_silence="trim")
audios = AudioList.from_folder(data_folder, config=config_rs).split_none().label_from_func(label_pattern)
audios.train[1][0].show()

In [None]:
config_mfcc_stack = AudioConfig(mfcc=True, delta=True, duration=4000)
audios_mfcc_stack = AudioList.from_folder(data_folder, config=config_mfcc_stack).split_by_rand_pct(.2, seed=4).label_from_func(label_pattern)
db_mfcc_stack = audios_mfcc_stack.databunch(bs=2)
db_mfcc_stack.show_batch(5)

In [None]:
audios_mfcc_stack

In [None]:
learn = audio_learner(db_mfcc_stack)
learn.lr_find()
learn.recorder.plot()

In [None]:
config_sg = AudioConfig(use_spectro=True, duration=4000)
db_sg = (AudioList.from_folder(data_folder, config=config_sg)
         .split_by_rand_pct(.2, seed=4)
         .label_from_func(label_pattern)
         .databunch(bs=2))
learn = audio_learner(db_sg).mixup()
learn.lr_find();learn.recorder.plot()

### Transforms

##### Size Transform

In [None]:
tfms = get_spectro_transforms(); tfms

In [None]:
tfms = get_spectro_transforms(size=(256,250), mask_time=False, mask_freq=False, roll=False);tfms

In [None]:
config_sg = AudioConfig(use_spectro=True, duration=4000)
db_sg = (AudioList.from_folder(data_folder, config=config_sg)
         .split_by_rand_pct(.2, seed=4)
         .label_from_func(label_pattern)
         .transform(tfms)
         .databunch(bs=2))
db_sg.show_batch()

##### Frequency Masking

All of the other transforms work as usual and are applied identically to every channel.
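
To illustrate what "applied identically to every channel" means for frequency masking, here is a minimal pure-Python sketch. It is not the transform returned by `get_spectro_transforms`. The function `mask_frequency` and its parameter names are hypothetical, and the spectrogram is assumed to be a nested list indexed as `[channel][frequency_row][time_column]`.

```python
import random

def mask_frequency(spec, num_masks=1, num_rows=5, mask_value=0.0, rng=random):
    """Return a copy of spec with the same bands of frequency rows masked in every channel."""
    n_rows = len(spec[0])
    out = [[row[:] for row in channel] for channel in spec]
    for _ in range(num_masks):
        # Pick the band once, then apply it to all channels.
        start = rng.randrange(n_rows - num_rows + 1)
        for channel in out:
            for r in range(start, start + num_rows):
                channel[r] = [mask_value] * len(channel[r])
    return out

# Toy 2-channel spectrogram: 32 frequency rows by 10 time columns, all ones.
spec = [[[1.0] * 10 for _ in range(32)] for _ in range(2)]

# Four masks of five rows each with a mask value of 42, seeded for repeatability.
masked = mask_frequency(spec, num_masks=4, num_rows=5, mask_value=42, rng=random.Random(0))

masked_rows_per_channel = [
    [r for r, row in enumerate(channel) if all(v == 42 for v in row)]
    for channel in masked
]
```

Because the band is chosen once per mask and then written to every channel, both channels end up with the same masked rows. Bands may overlap, so the total number of masked rows can be less than `num_masks * num_rows`.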

In [None]:
tfms = get_spectro_transforms(mask_time=False, mask_freq=True, roll=False);tfms

In [None]:
config_sg = AudioConfig(duration=5000)
db_sg = (AudioList.from_folder(data_folder, config=config_sg)
         .split_by_rand_pct(.2, seed=4)
         .label_from_func(label_pattern)
         .transform(tfms)
         .databunch(bs=2))
db_sg.show_batch()

In [None]:
# use 4 masks of 5 rows each and set the mask_value to be 42
tfms = get_spectro_transforms(mask_time=False, mask_freq=True, roll=False, fmasks=4, num_rows=5, fmask_value=42)
db_sg = (AudioList.from_folder(data_folder, config=config_sg)
         .split_by_rand_pct(.2, seed=4)
         .label_from_func(label_pattern)
         .transform(tfms)
         .databunch(bs=2))
db_sg.show_batch()

##### Time Masking

In [None]:
# now let's check out with time and frequency masking, but let's tone down the size a bit
config_sg = AudioConfig(use_spectro=True, duration=4000)
tfms = get_spectro_transforms(mask_time=True, mask_freq=True, roll=False, num_rows=12, num_cols=8);tfms
db_sg = (AudioList.from_folder(data_folder, config=config_sg)
         .split_by_rand_pct(.2, seed=4)
         .label_from_func(label_pattern)
         .transform(tfms)
         .databunch(bs=2))
db_sg.show_batch()

##### Rolling 

In [None]:
config_sg = AudioConfig(segment_size=5000, downmix=True)
tfms = get_spectro_transforms(mask_time=True, mask_freq=True, roll=True, num_rows=14, num_cols=10);tfms
db_sg = (AudioList.from_folder(data_folder, config=config_sg)
         .split_by_rand_pct(.2, seed=4)
         .label_from_func(label_pattern)
         .transform(tfms)
         .databunch(bs=2))
db_sg.show_batch()

### Creating a Learner

In [None]:
learn = audio_learner(db_sg)
learn.lr_find();learn.recorder.plot()

In [None]:
learn.fit_one_cycle(10, slice(4e-4,4e-3))