# Coswara data augmentation notebook

This notebook augments the extracted Coswara data (see the [Coswara 1 notebook](./coswara_1_data_extraction.ipynb)). The original paper does not discuss the details of the augmentation configuration, nor are they documented in the authors' [GitHub repo](https://github.com/Saranga7/covid19--cough-diagnosis). The repo does contain the final augmentation configuration, which is used in this notebook alongside a slightly modified version that uses the default parameters for the `TimeStretch` augmentation function.

Overall, each `Covid-19 positive` instance is augmented with both augmentation configurations and the results are saved, pushing the class balance close to 1:1. Any empty recording is dropped at this stage.
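
The effect on class balance can be sketched with a small calculation. The counts below are hypothetical placeholders, not the actual Coswara class sizes, and `balance_after_augmentation` is an illustrative helper rather than part of this notebook. It also assumes that the original positive recordings are kept alongside their augmented copies.

```python
def balance_after_augmentation(n_pos, n_neg, n_aug_per_positive=2, keep_originals=True):
    """Return (positives, negatives, pos/neg ratio) after augmenting the positives."""
    # Each positive recording yields n_aug_per_positive new files
    n_pos_total = n_pos * n_aug_per_positive
    if keep_originals:
        # Assumption: the untouched originals stay in the training set
        n_pos_total += n_pos
    return n_pos_total, n_neg, n_pos_total / n_neg


# Hypothetical counts, chosen only to illustrate the arithmetic
pos, neg, ratio = balance_after_augmentation(n_pos=100, n_neg=300)
print(f"{pos} positive vs {neg} negative (ratio {ratio:.2f})")
```

With two augmented copies per positive recording, the positive class grows to three times its original size under these assumptions, so a roughly 1:3 imbalance would end up close to 1:1.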

In [None]:
import os
import librosa
import librosa.display

import IPython.display as ipd

from tqdm import tqdm
from scipy.io.wavfile import write

import matplotlib.pyplot as plt
import numpy as np

from audiomentations import Compose, TimeStretch, PitchShift, Shift, Trim, Gain, PolarityInversion

In [None]:
# Original data augmentation configuration from the Brogrammer's git repo
augment1 = Compose([
    TimeStretch(min_rate=0.7, max_rate=1.4, p=0.9),
    PitchShift(min_semitones=-2, max_semitones=4, p=1),
    Shift(min_fraction=-0.5, max_fraction=0.5, p=0.8),
    Trim(p=1),
    Gain(p=1),
    PolarityInversion(p=0.8),
])

# Same augmentation configuration with TimeStretch parameters set to default
augment2 = Compose([
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-2, max_semitones=4, p=1),
    Shift(min_fraction=-0.5, max_fraction=0.5, p=0.8),
    Trim(p=1),
    Gain(p=1),
    PolarityInversion(p=0.8),
])
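
Each transform in a `Compose` pipeline is applied in order, and each one fires independently with its own probability `p`. The pure-Python sketch below mimics that behaviour on a plain list of samples. `apply_pipeline`, `invert_polarity` and `shift` are hypothetical stand-ins written for illustration, not the audiomentations implementations, and the circular shift is a simplifying assumption.

```python
import random


def invert_polarity(samples):
    """Flip the sign of every sample."""
    return [-s for s in samples]


def shift(samples, fraction):
    """Circularly shift the samples by a fraction of the total length."""
    if not samples:
        return samples
    k = int(round(fraction * len(samples))) % len(samples)
    return samples[-k:] + samples[:-k] if k else list(samples)


def apply_pipeline(samples, steps, rng):
    """Apply each (transform, p) pair in order, each with probability p."""
    for transform, p in steps:
        if rng.random() < p:
            samples = transform(samples)
    return samples


rng = random.Random(0)
signal = [0.0, 0.1, 0.2, 0.3]
steps = [
    (lambda s: shift(s, 0.25), 1.0),  # p=1: always applied
    (invert_polarity, 0.0),           # p=0: never applied
]
result = apply_pipeline(signal, steps, rng)
print(result)
```

Because every step draws its own random number, two passes over the same recording with the same configuration will generally produce different outputs unless the random state is fixed.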

In [None]:
dataset_path = '../../../Coswara-Data/data/shallow/p'
write_path = '../../../Coswara-Data/data/shallow/augmented_p_data/'

In [None]:
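sr = 22050

# Make sure the output directory exists before writing to it
os.makedirs(write_path, exist_ok=True)

# Running index for output file names; kept across directories so that
# files written for one directory are not overwritten by the next
j = 0

for dirpath, dirnames, filenames in os.walk(dataset_path):
    print('Processing ', os.path.basename(dirpath))
    for f in tqdm(filenames):
        try:
            fpath = os.path.join(dirpath, f)
            data, _ = librosa.load(fpath, sr=sr)

            # First augmentation, applied to the original recording
            augmented = augment1(samples=data, sample_rate=sr)
            write(os.path.join(write_path, str(j) + '.wav'), sr, augmented)
            j += 1

            # Second augmentation, also applied to the original recording
            augmented = augment2(samples=data, sample_rate=sr)
            write(os.path.join(write_path, str(j) + '.wav'), sr, augmented)
            j += 1

        except Exception as e:
            # Skip recordings that cannot be loaded or augmented (e.g. empty files)
            tqdm.write(f'Skipping {f}: {e}')
            continue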
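
The loop above relies on an exception being raised to skip problematic recordings. A more explicit check for empty recordings might look like the following sketch. `is_empty_recording` and its `threshold` are hypothetical and not part of the original notebook, and treating near-silent signals as empty is an assumption.

```python
def is_empty_recording(samples, threshold=1e-6):
    """Return True if the signal has no samples or is silent throughout."""
    if len(samples) == 0:
        return True
    # Peak absolute amplitude below the threshold counts as silence
    return max(abs(float(s)) for s in samples) < threshold


print(is_empty_recording([]))
print(is_empty_recording([0.0, 0.0]))
print(is_empty_recording([0.0, 0.2, -0.1]))
```

Inside the loop, such a check could run right after `librosa.load` and `continue` when it returns `True`. It accepts any one-dimensional sequence of numbers, including the array returned by `librosa.load` for mono audio.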