[Flowchart](https://drive.google.com/file/d/1r0IVntROxR92vBDzbHn1aRgH0HFYLlu5/view?usp=sharing)



![Here is the flowchart of the steps that I have taken in my notebook](https://drive.google.com/uc?export=view&id=1DnCGt0QxE1W7JDyiS35ayDv3GQGCSM8Q)

# Contents

- [Preprocessing](#Preprocessing-Stage)  
  - [Importing packages](#Importing-the-necessary-packages)  
  - [Standardize](#Standardize-the-tsv-files)  
  - [Extract positive and negative calls from audio using tsv's](#Extract)  
  - [Plot Spectrograms](#Spec-generate)
  - [Same steps for test data](#test-data)  
- [Training Stage](#Training-Stage)  
   - [Basic CNN model](#Basic-CNN-model)
   - [VGG16-model](#VGG-16-model)

#**Preprocessing Stage**



###Downloading and extracting the PodcastR2, PodcastR3 and Podcast_Test files
####Only PodcastR2 and PodcastR3 are used for training, because these files contain the SRKW calls

In [None]:
!apt-get -qq install awscli
!aws --no-sign-request s3 cp s3://acoustic-sandbox/labeled-data/detection/train/OrcasoundLab07052019_PodCastRound2.tar.gz ./ 
!aws --no-sign-request s3 cp s3://acoustic-sandbox/labeled-data/detection/train/OrcasoundLab09272017_PodCastRound3.tar.gz ./
!aws --no-sign-request s3 cp s3://acoustic-sandbox/labeled-data/detection/test/OrcasoundLab09272017_Test.tar.gz ./
!tar -xzf OrcasoundLab09272017_PodCastRound3.tar.gz
!tar -xzf OrcasoundLab07052019_PodCastRound2.tar.gz
!tar -xzf OrcasoundLab09272017_Test.tar.gz
!pip -q install ketos==2.0.0b4
!pip -q install pysoundfile
!pip install pydub



##Preprocessing on positive train dataset

###Importing the necessary packages 

In [None]:
import pandas as pd
from ketos.data_handling import selection_table as sl
import ketos.data_handling.database_interface as dbi
from ketos.data_handling.parsing import load_audio_representation
from ketos.audio.spectrogram import MagSpectrogram
import numpy as np
from os import listdir
from os.path import isfile, join
from scipy import signal
import soundfile as sf
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
from pydub import AudioSegment




In [None]:
#Load a tsv annotation file and return it together with the mean call duration (in seconds)
def duration_mean(filename):
    annot_train = pd.read_csv(filename, sep='\t')
    mean=annot_train['duration_s'].mean()
    return annot_train,mean
  

In [None]:
#Function to add the end time
def add_end(filename):
    filename["end"]=filename["start"]+filename["duration_s"]


In [None]:
#Function to extract the filename and start time
def fname_stime(filename):
    file_name=filename.iloc[:,0].values
    start_time=filename.iloc[:,1].values
    return file_name,start_time


In [None]:
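#Function to extract fixed-length clips from the .wav files; used for both the positive and the negative calls
def extract_audio(label,filename,path,position,clip_ms=2000):
    #Column 0 holds the .wav file name, column `position` holds the start time in seconds
    file_name=filename.iloc[:,0].values
    start_time=filename.iloc[:,position].values
    #The .wav files are opened relative to the current working directory
    for i,(x,start_s) in enumerate(zip(file_name,start_time),start=1):
        sound = AudioSegment.from_file(x)
        #pydub slices audio in milliseconds
        p=start_s*1000
        print(p)
        call=sound[p:p+clip_ms]
        call.export(path+label+ "calls{0}.wav".format(i),format="wav")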


In [None]:
#Plotting the spectrogram
def plot_spectrogram(base,plot,calls):
    #plot: output folder for the .png files, calls: folder holding the .wav clips (both relative to base)
    plotPath = join(base,plot)
    folderpath = join(base, calls)
    onlyfiles = [f for f in listdir(folderpath) if isfile(join(folderpath, f))]

    for file in onlyfiles:
        data, samplerate = sf.read(join(folderpath, file))
        filename = file.split(sep=".")[0]

        fig, ax = plt.subplots(1, 1)
        ax.specgram(data, Fs=samplerate, NFFT=1024)
        #Hide both axes so that the saved image contains only the spectrogram
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)

        fig.savefig(join(plotPath, filename + ".png"))
        plt.close(fig)
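
To make it clearer what a spectrogram plot contains, here is a rough, dependency-free sketch of a magnitude spectrogram: the signal is cut into overlapping frames, each frame is windowed, and the magnitude of its discrete Fourier transform is kept. The frame size (64), hop (32), Hann window, sample rate and test tone are illustrative assumptions, not the settings used in this notebook (which passes `NFFT=1024` to `ax.specgram`), and the naive DFT is only suitable for tiny inputs.

```python
import cmath
import math


def magnitude_spectrogram(samples, nfft=64, hop=32):
    """Return a list of frames; each frame holds nfft // 2 + 1 magnitudes."""
    # Hann window to reduce spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (nfft - 1)) for n in range(nfft)]
    frames = []
    for start in range(0, len(samples) - nfft + 1, hop):
        chunk = [samples[start + n] * window[n] for n in range(nfft)]
        spectrum = []
        for k in range(nfft // 2 + 1):
            # Naive DFT of one frame for frequency bin k.
            acc = sum(chunk[n] * cmath.exp(-2j * math.pi * k * n / nfft)
                      for n in range(nfft))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


rate = 8000          # assumed sample rate in Hz
tone_hz = 1000.0     # assumed test tone
tone = [math.sin(2 * math.pi * tone_hz * n / rate) for n in range(512)]

spec = magnitude_spectrogram(tone)
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * rate / 64
print(len(spec), "frames, strongest frequency in first frame:", peak_hz, "Hz")
```

Each column of the saved image corresponds to one such frame, and each row to one frequency bin.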

####The tsv files contain parameters such as the start time and duration_s, but they are not in the format Ketos accepts, so some labels need to be changed. The files were therefore uploaded from the local machine


In [None]:
annot_train2,mean2=duration_mean('/content/podcast2.tsv')
annot_train3,mean3 = duration_mean('/content/podcast3.tsv')
annot_test,mean_test = duration_mean('/content/v10_test.tsv')

print(mean2)
annot_train2.head()


2.1110548004254963


Unnamed: 0,wav_filename,start,duration_s,location,date,data_source,data_source_id,label
0,1562337136_0004.wav,49.765625,2.45,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs
1,1562337136_0004.wav,41.046007,1.658854,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs
2,1562337136_0004.wav,37.345486,1.743924,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs
3,1562337136_0004.wav,42.917535,2.594618,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs
4,1562337136_0004.wav,45.980035,2.041667,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs


####Here is what the .tsv files and their labels look like in a format that Ketos can accept

In [None]:
add_end(annot_train2)
add_end(annot_train3)
add_end(annot_test)
annot_train2.head()

Unnamed: 0,wav_filename,start,duration_s,location,date,data_source,data_source_id,label,end
0,1562337136_0004.wav,49.765625,2.45,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs,52.215625
1,1562337136_0004.wav,41.046007,1.658854,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs,42.704861
2,1562337136_0004.wav,37.345486,1.743924,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs,39.08941
3,1562337136_0004.wav,42.917535,2.594618,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs,45.512153
4,1562337136_0004.wav,45.980035,2.041667,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs,48.021701
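
As a small worked example of the `end` column, the sketch below recomputes `end = start + duration_s` and the mean duration using only the standard library. The two rows are copied from the table above; the helper name `add_end_column` is hypothetical and is not part of the notebook.

```python
import csv
import io

# Two rows copied from the annotation table above (tab-separated).
TSV = (
    "wav_filename\tstart\tduration_s\tlabel\n"
    "1562337136_0004.wav\t49.765625\t2.45\tSRKWs\n"
    "1562337136_0004.wav\t41.046007\t1.658854\tSRKWs\n"
)


def add_end_column(rows):
    """Return copies of the rows with end = start + duration_s (seconds)."""
    out = []
    for row in rows:
        start = float(row["start"])
        duration = float(row["duration_s"])
        out.append({**row, "start": start, "duration_s": duration,
                    "end": start + duration})
    return out


rows = add_end_column(csv.DictReader(io.StringIO(TSV), delimiter="\t"))
mean_duration = sum(r["duration_s"] for r in rows) / len(rows)

for r in rows:
    print(r["wav_filename"], round(r["end"], 6))
print("mean duration of these two rows:", round(mean_duration, 6))
```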


###Standardizing the tsv files

In [None]:
map_to_ketos_annot_std ={'wav_filename': 'filename'} 
std_annot_train2 = sl.standardize(table=annot_train2, signal_labels=["SRKWs"], mapper=map_to_ketos_annot_std, trim_table=True)
std_annot_train3 = sl.standardize(table=annot_train3, signal_labels=["SRKWs"], mapper=map_to_ketos_annot_std, trim_table=True)

std_annot_test = sl.standardize(table=annot_test, signal_labels=["SRKWs"], mapper=map_to_ketos_annot_std, trim_table=True)
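
The effect of standardizing can be pictured with a plain-Python sketch: the column is renamed through the mapper, the text label is replaced by an integer, and only the filename, start, end and label fields are kept. This illustrates the visible result only and is not the Ketos implementation; the function `standardize_rows` is hypothetical, and it only handles labels listed in `signal_labels`.

```python
def standardize_rows(rows, signal_labels, mapper):
    """Rename columns via `mapper`, encode labels as integers, keep core fields."""
    # Assumption of this sketch: signal labels are numbered from 1.
    label_ids = {name: i + 1 for i, name in enumerate(signal_labels)}
    keep = ("filename", "start", "end", "label")
    out = []
    for row in rows:
        renamed = {mapper.get(key, key): value for key, value in row.items()}
        renamed["label"] = label_ids[renamed["label"]]
        out.append({key: renamed[key] for key in keep})
    return out


# One row copied from the annotation table shown earlier.
raw = [{
    "wav_filename": "1562337136_0004.wav",
    "start": 49.765625,
    "duration_s": 2.45,
    "label": "SRKWs",
    "end": 52.215625,
}]

standardized = standardize_rows(raw, ["SRKWs"], {"wav_filename": "filename"})
print(standardized)
```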


###Here we can see what each of these tsv files looks like after standardizing

In [None]:
std_annot_train2.head()


Unnamed: 0_level_0,Unnamed: 1_level_0,start,label,end
filename,annot_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1562337136_0004.wav,0,49.765625,1,52.215625
1562337136_0004.wav,1,41.046007,1,42.704861
1562337136_0004.wav,2,37.345486,1,39.08941
1562337136_0004.wav,3,42.917535,1,45.512153
1562337136_0004.wav,4,45.980035,1,48.021701


In [None]:
std_annot_train3.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,start,label,end
filename,annot_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
OS_9_27_2017_08_14_00__0002.wav,0,6.110451,1,7.856295
OS_9_27_2017_08_14_00__0004.wav,0,12.717882,1,15.167882
OS_9_27_2017_08_14_00__0004.wav,1,29.825347,1,31.637326
OS_9_27_2017_08_14_00__0004.wav,2,43.504514,1,45.103819
OS_9_27_2017_08_14_00__0004.wav,3,48.404514,1,50.344097


In [None]:
std_annot_test.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,start,label,end
filename,annot_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
OS_9_27_2017_08_14_00__0001.wav,0,11.643564,1,14.093564
OS_9_27_2017_08_14_00__0001.wav,1,15.594059,1,17.759901
OS_9_27_2017_08_14_00__0001.wav,2,53.9,1,56.35
OS_9_27_2017_08_14_00__0001.wav,3,59.781486,1,61.25
OS_9_27_2017_08_19_00__0002.wav,0,6.592882,1,7.826389


###Saving these standardized tsv files

In [None]:
std_annot_train2.to_csv('standardized_train2.tsv', mode='a', sep='\t',header=False)
std_annot_train3.to_csv('standardized_train3.tsv', mode='a', sep='\t',header=False)
std_annot_test.to_csv('standardized_test.tsv', mode='a', sep='\t',header=False)

In [None]:
annot_id2 = pd.read_csv('/content/standardized_train2.tsv', sep='\t')
annot_id3 = pd.read_csv('/content/standardized_train3.tsv', sep='\t')
annot_idtest = pd.read_csv('/content/standardized_test.tsv', sep='\t')


####Extracting the .wav file names and start times from these .tsv files, which Pydub uses to extract short segments of sound (one set containing the calls and the other not)

In [None]:
std_annot_train2.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,start,label,end
filename,annot_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1562337136_0004.wav,0,49.765625,1,52.215625
1562337136_0004.wav,1,41.046007,1,42.704861
1562337136_0004.wav,2,37.345486,1,39.08941
1562337136_0004.wav,3,42.917535,1,45.512153
1562337136_0004.wav,4,45.980035,1,48.021701


In [None]:
filename,start_time=fname_stime(std_annot_train2)
print(filename[0])
print(start_time[0])
annot_train2.head()

49.765625
1


Unnamed: 0,wav_filename,start,duration_s,location,date,data_source,data_source_id,label,end
0,1562337136_0004.wav,49.765625,2.45,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs,52.215625
1,1562337136_0004.wav,41.046007,1.658854,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs,42.704861
2,1562337136_0004.wav,37.345486,1.743924,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs,39.08941
3,1562337136_0004.wav,42.917535,2.594618,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs,45.512153
4,1562337136_0004.wav,45.980035,2.041667,orcasound_lab,2019-07-05,Orcasound_PodCast_Round2,1562337136,SRKWs,48.021701


####The first printed value (49.765625) matches the start column of the first row above; the second value (1) is the label, because filename is part of the index in the standardized table

####Change to the directory that holds the .wav files from which pydub will extract the calls

In [None]:
!pwd

/content


In [None]:
%cd /content/Round2_OS_07_05/wav

/content/Round2_OS_07_05/wav


In [None]:
!mkdir pod_calls

In [None]:
extract_audio('round2_calls',annot_train2,"/content/Round2_OS_07_05/wav/pod_calls/",1)

49765.625
41046.0069444444
37345.4861111111
42917.534722222204
45980.034722222204
52700.5208333333
55295.1388888889
1147.64052741152
26115.197779319897
29995.0728660652
34725.0520471894
52485.426786953496
36554.4760582929
13883.906030855502
17708.3800841515
21964.095371669
19672.159887798
27846.966527196604
29329.9511854951
34229.951185495105
56773.709902370996
37544.4560669456
46172.41980474201
6233.23983169705
42204.9438990182
56264.09537166899
58972.44491458839
13916.549789621302
7261.1576011157595
8286.26220362622
10550.0348675035
11318.8633193863
34554.5676429568
52237.62203626221
53647.1408647141
14451.171875
47755.859375
56895.5078125
31508.298465829805
37607.6708507671
1250.0
58064.3398354815
13398.4375
27945.3125
17035.15625
38233.3984375
46798.828125
57363.1450488145
11881.54296875
20336.9140625
34300.0
41412.04351204351
46430.0699300699
51488.28125
53785.15625
57134.765625
129.12860154603
5552.52986647927
8522.487702037952
21091.0049191848
24534.434293745606
30216.0927617709

####Each clip runs from the annotated start time to two seconds later; two seconds was chosen because the mean call duration is about 2.1 seconds
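
The clip extraction can also be sketched without pydub. The example below uses the standard-library `wave` module instead (a swapped-in technique, not what this notebook runs) to copy a fixed-length window out of a WAV file. The synthetic tone, the 8000 Hz sample rate and the helper names `make_tone`, `extract_clip` and `duration_s` are all assumptions made for illustration.

```python
import io
import math
import struct
import wave


def make_tone(seconds, rate=8000, freq=440.0):
    """Build an in-memory mono 16-bit WAV holding a sine tone (test data only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(b"".join(
            struct.pack("<h", int(12000 * math.sin(2 * math.pi * freq * n / rate)))
            for n in range(int(seconds * rate))
        ))
    buf.seek(0)
    return buf


def extract_clip(src, start_s, length_s=2.0):
    """Copy up to length_s seconds starting at start_s into a new in-memory WAV."""
    src.seek(0)
    with wave.open(src, "rb") as r:
        rate = r.getframerate()
        r.setpos(min(int(start_s * rate), r.getnframes()))
        frames = r.readframes(int(length_s * rate))
        params = r.getparams()
    out = io.BytesIO()
    with wave.open(out, "wb") as w:
        w.setparams(params)
        w.writeframes(frames)
    out.seek(0)
    return out


def duration_s(wav_file):
    """Length of a WAV file in seconds: number of frames / sample rate."""
    wav_file.seek(0)
    with wave.open(wav_file, "rb") as r:
        return r.getnframes() / r.getframerate()


source = make_tone(seconds=5.0)
clip_seconds = duration_s(extract_clip(source, start_s=1.25))
tail_seconds = duration_s(extract_clip(source, start_s=4.0))
print(clip_seconds, tail_seconds)
```

In this sketch a window that runs past the end of the recording simply yields a shorter clip, which is worth keeping in mind for annotations that start close to the end of a file.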

In [None]:
%cd /content/Round3_OS_09_27_2017/wav

/content/Round3_OS_09_27_2017/wav


####Similarly, we extract the calls for podcast3

In [None]:
extract_audio('round3_calls',annot_train3,"/content/Round2_OS_07_05/wav/pod_calls/",1)

6110.451306413301
12717.881944444402
29825.347222222197
43504.5138888889
48404.5138888889
3530.3819444444403
18842.8819444444
21692.7083333333
38281.25
45980.034722222204
54104.1666666667
9311.163895486938
19058.7885985748
22259.501187648504
32977.0387965162
30880.2083333333
11994.791666666699
36111.9791666667
37898.4375
45894.965277777796
49042.534722222204
11532.2265625
16365.234375
20145.5078125
39094.7265625
5790.0390625
8326.171875
2631.8359375
24500.0
38519.444444444394
53729.8611111111
56605.208333333394
5869.79166666667
2807.2916666666697
0.0
33087.6088677751
36750.0
50115.3998416469
23132.4228028504
25557.2050673001
29776.326207442606
40833.3333333333
45440.41963578779
47671.2193190816
0.0
19600.0
26093.3566433566
30351.8259518259
37368.6868686869
44195.1825951826
6139.27738927739
12135.7808857809
47591.297591297596
50351.592851592904
53545.8984375
57134.765625
59240.234375
17728.740157480304
20564.5669291339
24500.0
26950.0
50061.0236220473
56784.05511811029
59468.52494475739

####Verify that the extracted calls correspond to the Round 3 annotations

In [None]:
annot_train3.head()

Unnamed: 0,wav_filename,start,duration_s,location,date,data_source,data_source_id,label,end
0,OS_9_27_2017_08_14_00__0002.wav,6.110451,1.745843,orcasound_lab,2017-09-27,Orcasound_PodCast_Round3,OS_9_27_2017_08_14,SRKWs,7.856295
1,OS_9_27_2017_08_14_00__0004.wav,12.717882,2.45,orcasound_lab,2017-09-27,Orcasound_PodCast_Round3,OS_9_27_2017_08_14,SRKWs,15.167882
2,OS_9_27_2017_08_14_00__0004.wav,29.825347,1.811979,orcasound_lab,2017-09-27,Orcasound_PodCast_Round3,OS_9_27_2017_08_14,SRKWs,31.637326
3,OS_9_27_2017_08_14_00__0004.wav,43.504514,1.599306,orcasound_lab,2017-09-27,Orcasound_PodCast_Round3,OS_9_27_2017_08_14,SRKWs,45.103819
4,OS_9_27_2017_08_14_00__0004.wav,48.404514,1.939583,orcasound_lab,2017-09-27,Orcasound_PodCast_Round3,OS_9_27_2017_08_14,SRKWs,50.344097


In [None]:
%cd /content/Round2_OS_07_05

/content/Round2_OS_07_05


###In the Round2 folder we create two folders, train and test, and inside each of them a calls folder and a nocalls folder

In [None]:
!mkdir train
!mkdir test
%cd train
!mkdir calls
!mkdir nocalls
%cd /content/Round2_OS_07_05/test

/content/Round2_OS_07_05/train
/content/Round2_OS_07_05/test
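
The same folder layout can be created in one step from Python. This is a minimal sketch that uses a temporary directory in place of `/content/Round2_OS_07_05`; the helper name `make_split_folders` is hypothetical.

```python
import os
import tempfile


def make_split_folders(root, splits=("train", "test"), classes=("calls", "nocalls")):
    """Create root/<split>/<class> for every combination and return the paths."""
    created = []
    for split in splits:
        for cls in classes:
            path = os.path.join(root, split, cls)
            # exist_ok=True makes the call safe to repeat when a cell is re-run.
            os.makedirs(path, exist_ok=True)
            created.append(path)
    return created


with tempfile.TemporaryDirectory() as root:
    folders = make_split_folders(root)
    all_exist = all(os.path.isdir(p) for p in folders)
    relative = [os.path.relpath(p, root) for p in folders]

print(relative)
```

Unlike `!mkdir`, this does not depend on the current working directory and does not fail when the folders already exist.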


In [None]:
#Check the length in seconds of one extracted clip (number of frames divided by the sample rate)
#filename='/content/Round2_OS_07_05/wav/pod_calls/MMMcalls102.wav'
#info = sf.info(filename)
#print(info.frames/info.samplerate)

In [None]:
!pwd
!mkdir calls
!mkdir nocalls

/content/Round2_OS_07_05/test


###Now we plot the spectrograms, without x and y axis labels, into the calls folder

In [None]:
plot_spectrogram('/content/Round2_OS_07_05/','train/calls','wav/pod_calls')

##Generation of spectrograms from the background sounds

####Now that the positive calls have been generated, it is time to generate the negative ones.

###The table below lists time intervals in podcast-2 that do not overlap any annotated call; the same is done for podcast-3 further down
###Each row gives the start time and the end time of one such interval

In [None]:
positives_train2 = sl.select(annotations=std_annot_train2, length=2.0)
file_durations_train2 = sl.file_duration_table('/content/Round2_OS_07_05/wav')
negatives_train2=sl.create_rndm_backgr_selections(annotations=std_annot_train2, files=file_durations_train2, length=2.0, num=len(positives_train2), trim_table=True)
negatives_train2.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,start,end,label
filename,sel_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
1562337136_0004.wav,0,0.316021,2.316021,0
1562337136_0004.wav,1,11.787732,13.787732,0
1562337136_0004.wav,2,13.420318,15.420318,0
1562337136_0004.wav,3,21.158559,23.158559,0
1562337136_0004.wav,4,23.745995,25.745995,0
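
The idea behind the background selections can be sketched as rejection sampling: draw a random 2-second window inside the file and keep it only if it does not overlap an annotated call. This is an illustration of the idea, not the Ketos algorithm. The 60-second file duration, the seed and the helper name `random_background` are assumptions; the call intervals are the five rows shown earlier for this file.

```python
import random


def random_background(annotations, file_duration, length, num, seed=0, max_tries=10000):
    """Pick up to `num` windows of `length` seconds that avoid every annotation."""
    rng = random.Random(seed)
    picks = []
    tries = 0
    while len(picks) < num and tries < max_tries:
        tries += 1
        start = rng.uniform(0.0, file_duration - length)
        end = start + length
        # Reject the window if it overlaps any annotated call.
        if any(start < a_end and end > a_start for a_start, a_end in annotations):
            continue
        picks.append((start, end))
    return sorted(picks)


# (start, end) of the annotated calls in seconds.
calls = [(37.345486, 39.08941), (41.046007, 42.704861), (42.917535, 45.512153),
         (45.980035, 48.021701), (49.765625, 52.215625)]

negatives = random_background(calls, file_duration=60.0, length=2.0, num=5)
for start, end in negatives:
    print(round(start, 3), round(end, 3))
```

As in the table above, two background windows may overlap each other; only overlap with annotated calls is excluded.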


####Extracting the intervals in which the tsv file lists no occurrences of calls

In [None]:
positives_train3 = sl.select(annotations=std_annot_train3, length=2.0)
file_durations_train33 = sl.file_duration_table('/content/Round3_OS_09_27_2017/wav')
negatives_train33=sl.create_rndm_backgr_selections(annotations=std_annot_train3, files=file_durations_train33, length=2.0, num=len(positives_train3), trim_table=True)
negatives_train33.head()

Unnamed: 0_level_0,Unnamed: 1_level_0,start,end,label
filename,sel_id,Unnamed: 2_level_1,Unnamed: 3_level_1,Unnamed: 4_level_1
OS_9_27_2017_08_03_00__0002.wav,0,31.586918,33.586918,0
OS_9_27_2017_08_03_00__0003.wav,0,9.968256,11.968256,0
OS_9_27_2017_08_03_00__0003.wav,1,15.586086,17.586086,0
OS_9_27_2017_08_03_00__0003.wav,2,17.013745,19.013745,0
OS_9_27_2017_08_03_00__0003.wav,3,50.955001,52.955001,0


###The steps for generating the audio are then the same as for the positive calls above

In [None]:
!pwd

/content/Round2_OS_07_05/test


In [None]:
%cd '/content/Round2_OS_07_05/wav'

/content/Round2_OS_07_05/wav


In [None]:
!mkdir neg_pod_calls

In [None]:
negatives_train2.to_csv('negative2.tsv', mode='a', sep='\t',header=False)
negatives_train33.to_csv('negative3.tsv', mode='a', sep='\t',header=False)

In [None]:
negatives_train2save=pd.read_csv('/content/Round2_OS_07_05/test/negative2.tsv',sep='\t')
negatives_train33save=pd.read_csv('/content/Round2_OS_07_05/test/negative3.tsv',sep='\t')

In [None]:
negatives_train2save.head()

Unnamed: 0,1562337136_0004.wav,0,4.965906564760641,6.965906564760641,0.1
0,1562337136_0004.wav,1,8.060243,10.060243,0
1,1562337136_0004.wav,2,13.91605,15.91605,0
2,1562337136_0004.wav,3,19.756478,21.756478,0
3,1562337136_0004.wav,4,23.582382,25.582382,0
4,1562337136_0004.wav,5,28.850221,30.850221,0
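
Note that the first selection (`sel_id` 0) appears in the header row of the output above: the file was written with `header=False`, and a reader that expects a header consumes the first data row as column names. The sketch below reproduces the effect with the standard-library `csv` module; `pd.read_csv` behaves the same way by default, and passing `header=None` (optionally with `names=[...]`) is the pandas equivalent of supplying the column names. The column names used here are taken from the selection table and are otherwise an assumption.

```python
import csv
import io

# Header-less TSV, as written with header=False (values from the table above).
TSV = (
    "1562337136_0004.wav\t0\t4.965906564760641\t6.965906564760641\t0\n"
    "1562337136_0004.wav\t1\t8.060243\t10.060243\t0\n"
)
COLUMNS = ["filename", "sel_id", "start", "end", "label"]

# Default: the first data row is consumed as the header, so one selection is lost.
lossy = list(csv.DictReader(io.StringIO(TSV), delimiter="\t"))

# Supplying the column names keeps every row.
complete = list(csv.DictReader(io.StringIO(TSV), delimiter="\t", fieldnames=COLUMNS))

print(len(lossy), len(complete))
print(complete[0]["start"])
```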


In [None]:
extract_audio('r2negcalls_calls',negatives_train2save,"/content/Round2_OS_07_05/wav/neg_pod_calls/",2)

8060.242860774112
13916.049871977484
19756.47799325933
23582.381967614536
28850.221074458168
4450.438852986403
8721.565156466779
19903.782607036534
24865.959367919695
31279.37695459839
31590.178880640025
38387.03301592648
43525.611093960004
4203.365528259823
6492.822269707175
13331.841795564005
14795.226278909637
15324.373176324712
16459.15938413802
39638.8366499709
43615.40267622466
45953.89461684218
47102.58741655531
54740.55712800214
57562.58142655264
10243.830291406057
24444.547205939467
30954.892491859424
39212.908083004746
39294.65089676936
919.116200918154
50476.43758115322
53918.01241291847
59083.773021529785
2311.556104978706
9897.909544262404
24044.439252452667
25353.27758584339
39587.232437726765
46935.4182485676
5563.82449450166
19462.85740636063
24649.283585126737
25152.001590987245
27219.772184052264
32024.47306803873
37147.460409379
38933.55991514358
48244.93759730501
54160.10907529824
20105.06060543554
22425.40063509233
25683.264273836132
41324.366790936714
48331.911392

In [None]:
%cd '/content/Round3_OS_09_27_2017/wav'

/content/Round3_OS_09_27_2017/wav


In [None]:
extract_audio('r3neg_calls',negatives_train33save,"/content/Round2_OS_07_05/wav/neg_pod_calls/",2)

9968.255894815584
15586.086354547491
17013.7452387451
50955.000969534005
236.795371139948
4889.121273055351
12899.49713036998
43090.87087185986
3398.3199963797115
18275.700627149265
20926.47940087346
32290.098447052373
36999.340318160175
48545.70722265319
52665.34677943852
952.1494280022296
23641.28720832218
36232.737795665
28394.32956807292
42871.68919035031
46309.509198351116
55699.514593876636
29618.822831239588
36419.73761661428
41236.77608763006
43092.612391478324
56812.10910459464
5037.622402412809
15238.875751206137
25215.263696149465
36044.78354579584
40220.49698587678
51333.097944277935
54949.58169882989
31723.346465319253
31747.15332107207
51651.429477693855
485.1842624623259
13018.6112339436
27956.526788848125
41857.84051550468
45194.05945113681
55768.66188883582
4407.810218302871
7673.408208656951
29135.81418075261
41379.834232640744
13013.61013297742
14454.944490900516
26967.69273327629
11482.717617203662
13890.913637395555
32150.154583338575
34670.58762810711
47854.963785

In [None]:
negatives_train33save.head()

Unnamed: 0,OS_9_27_2017_08_03_00__0002.wav,0,31.586917768911015,33.586917768911015,0.1
0,OS_9_27_2017_08_03_00__0003.wav,0,9.968256,11.968256,0
1,OS_9_27_2017_08_03_00__0003.wav,1,15.586086,17.586086,0
2,OS_9_27_2017_08_03_00__0003.wav,2,17.013745,19.013745,0
3,OS_9_27_2017_08_03_00__0003.wav,3,50.955001,52.955001,0
4,OS_9_27_2017_08_09_00__0000.wav,0,0.236795,2.236795,0


In [None]:
!pwd

/content/Round3_OS_09_27_2017/wav


In [None]:
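import os
from os import listdir
from os.path import isfile, join
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import soundfile as sf
import pandas as pd
from scipy import signal
from pydub import AudioSegment
from ketos.data_handling import selection_table as sl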


# Generate mean and annot-train
def duration_mean(filename):
    annot_train = pd.read_csv(filename, sep='\t')
    mean = annot_train['duration_s'].mean()
    return annot_train, mean


# Function to extract the filename and start time
def fname_stime(filename):
    file_name = filename.iloc[:, 0].values
    start_time = filename.iloc[:, 1].values
    return file_name, start_time


# Function to extract audio from the .wav files to generate
# complete positive and negative calls
def extract_audio(label, filename, path, position, file_location):
    file_name = filename.iloc[:, 0].values
    start_time = filename.iloc[:, position].values
    i = 0
    o = 0
    os.chdir(file_location)

    for x in file_name:
        AUDIO_FILE = x
        sound = AudioSegment.from_file(AUDIO_FILE)
        p = start_time[i]
        p = p * 1000
        print(p)
        i = i + 1
        o = p + 3000
        call = sound[p:o]
        call.export(path + label + "MMMcalls{0}.wav".format(i), format="wav")


#Plotting the spectrogram
def plot_spectrogram(base,plot,calls):
    #plot: output folder for the .png files, calls: folder holding the .wav clips (both relative to base)
    plotPath = join(base,plot)
    folderpath = join(base, calls)
    onlyfiles = [f for f in listdir(folderpath) if isfile(join(folderpath, f))]
    for file in onlyfiles:
        data, samplerate = sf.read(join(folderpath, file))
        filename = file.split(sep=".")[0]

        fig, ax = plt.subplots(1, 1)
        ax.specgram(data, Fs=samplerate, NFFT=1024)
        #Hide both axes so that the saved image contains only the spectrogram
        ax.get_xaxis().set_visible(False)
        ax.get_yaxis().set_visible(False)

        #The full output path is given, so there is no need to change directory
        fig.savefig(join(plotPath, filename + ".png"))
        plt.close(fig)


# Enter the path of the standardized tsv's
annot_train2, mean2 = duration_mean('/content/podcast2.tsv')
annot_train3, mean3 = duration_mean('/content/podcast3.tsv')
annot_test, mean_test = duration_mean('/content/v10_test.tsv')


# Extract the audio of the calls
extract_audio(
    "round2_calls", annot_train2,
    "/content/pod_calls/", 1,
    "/content/Round2_OS_07_05/wav/")

extract_audio(
    'round3_calls', annot_train3,
    "/content/pod_calls/", 1,
    "/content/Round3_OS_09_27_2017/wav/")


#Standardizing the tsv files
map_to_ketos_annot_std ={'wav_filename': 'filename'} 
std_annot_train2 = sl.standardize(table=annot_train2, signal_labels=["SRKWs"], 
                       mapper=map_to_ketos_annot_std, trim_table=True)
std_annot_train3 = sl.standardize(table=annot_train3, signal_labels=["SRKWs"], 
                       mapper=map_to_ketos_annot_std, trim_table=True)
std_annot_test = sl.standardize(table=annot_test, signal_labels=["SRKWs"], 
                       mapper=map_to_ketos_annot_std, trim_table=True)



# We also need negative audio, so we extract background sound from the parts of
# each recording that fall outside the annotated calls. First we build a table
# of time intervals that do not overlap any interval between an annotated
# start time and end time
positives_train2 = sl.select(annotations=std_annot_train2, length=3.0)
file_durations_train2 = sl.file_duration_table('/content/Round2_OS_07_05/wav')
negatives_train2=sl.create_rndm_backgr_selections(
                    annotations=std_annot_train2, 
                    files=file_durations_train2, length=3.0, 
                    num=len(positives_train2), 
                    trim_table=True)


# Same steps for podcast3 tsv file
positives_train3 = sl.select(annotations=std_annot_train3, length=3.0)
file_durations_train33 = sl.file_duration_table('/content/Round3_OS_09_27_2017/wav')
negatives_train33=sl.create_rndm_backgr_selections(
                    annotations=std_annot_train3, 
                    files=file_durations_train33, 
                    length=3.0, num=len(positives_train3), 
                    trim_table=True)


#Saving these tsv files for future use
negatives_train2.to_csv('negative2.tsv', mode='a', sep='\t',header=False)
negatives_train33.to_csv('negative3.tsv', mode='a', sep='\t',header=False)
negatives_train2save=pd.read_csv('/content/Round2_OS_07_05/test/negative2.tsv',sep='\t')
negatives_train33save=pd.read_csv('/content/Round2_OS_07_05/test/negative3.tsv',sep='\t')



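# Extract the background clips into their own folder and with their own labels,
# so that they do not overwrite the positive clips
extract_audio(
    'r2negcalls_calls', negatives_train2save,
    "/content/Round2_OS_07_05/wav/neg_pod_calls/", 2,
    "/content/Round2_OS_07_05/wav/")


extract_audio(
    'r3neg_calls', negatives_train33save,
    "/content/Round2_OS_07_05/wav/neg_pod_calls/", 2,
    "/content/Round3_OS_09_27_2017/wav/")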


plot_spectrogram('/content/Round2_OS_07_05/','train/calls','wav/pod_calls')
plot_spectrogram('/content/Round2_OS_07_05/','train/nocalls','wav/neg_pod_calls')


49765.625
41046.0069444444
37345.4861111111
42917.534722222204
45980.034722222204
52700.5208333333
55295.1388888889
1147.64052741152
26115.197779319897
29995.0728660652
34725.0520471894
52485.426786953496
36554.4760582929
13883.906030855502
17708.3800841515
21964.095371669
19672.159887798
27846.966527196604
29329.9511854951
34229.951185495105
56773.709902370996
37544.4560669456
46172.41980474201
6233.23983169705
42204.9438990182
56264.09537166899
58972.44491458839
13916.549789621302
7261.1576011157595
8286.26220362622
10550.0348675035
11318.8633193863
34554.5676429568
52237.62203626221
53647.1408647141
14451.171875
47755.859375
56895.5078125
31508.298465829805
37607.6708507671
1250.0
58064.3398354815
13398.4375
27945.3125
17035.15625
38233.3984375
46798.828125
57363.1450488145
11881.54296875
20336.9140625
34300.0
41412.04351204351
46430.0699300699
51488.28125
53785.15625
57134.765625
129.12860154603
5552.52986647927
8522.487702037952
21091.0049191848
24534.434293745606
30216.0927617709

AssertionError: ignored

In [None]:
plot_spectrogram('/content/Round2_OS_07_05/','train/nocalls','wav/neg_pod_calls')


###The spectrogram images of the negative calls have now been saved

##We perform similar steps with the test data as well:

*   Extract the intervals containing calls from the Orcasound_test tsv file
*   Standardize that tsv file
*   Generate 2 second clips that contain the calls
*   Generate 2 second background clips from the tsv file
*   Generate spectrograms from these sounds



---







In [None]:
annot_test.head()

Unnamed: 0,wav_filename,start,duration_s,location,date,data_source,data_source_id,label,end
0,OS_9_27_2017_08_14_00__0001.wav,11.643564,2.45,orcasound_lab,9/27/2017,Orcasound_PodCast_Round3,OS_9_27_2017_08_14,SRKWs,14.093564
1,OS_9_27_2017_08_14_00__0001.wav,15.594059,2.165842,orcasound_lab,9/27/2017,Orcasound_PodCast_Round3,OS_9_27_2017_08_14,SRKWs,17.759901
2,OS_9_27_2017_08_14_00__0001.wav,53.9,2.45,orcasound_lab,9/27/2017,Orcasound_PodCast_Round3,OS_9_27_2017_08_14,SRKWs,56.35
3,OS_9_27_2017_08_14_00__0001.wav,59.781486,1.468514,orcasound_lab,9/27/2017,Orcasound_PodCast_Round3,OS_9_27_2017_08_14,SRKWs,61.25
4,OS_9_27_2017_08_19_00__0002.wav,6.592882,1.233507,orcasound_lab,9/27/2017,Orcasound_PodCast_Round3,OS_9_27_2017_08_19,SRKWs,7.826389


In [None]:
extract_audio('test_calls',annot_test,"/content/Round2_OS_07_05/wav/pod_calls_test/",1)

11643.564359999998
15594.05941
53900.0
59781.486399999994
6592.881944
23011.28472
29519.09722
50769.44444
20709.25197
22725.19685
41650.0
43280.118109999996
52125.19685
54671.65354
56687.59843
60295.078740000004
0.0
2666.724257
4688.355218
7815.618521
16091.77609
18457.118179999998
20991.77609
23187.802349999998
28045.473390000003
30011.23013
31704.38839
47311.921220000004
51957.94748
60069.86869
4432.118056
7945.486111
10506.07639
12760.41667
16299.30556
17685.9375
20331.59722
23351.5625
33253.64583
34538.19444
38221.70139
46550.0
49042.534719999996
54027.60417
578.472222
5282.8125
12888.02083
19217.1875
25010.416670000002
34300.0
36750.0
49000.0
1745.84323
6401.4251779999995
9800.0
18727.07838
22971.41726
28307.878070000002
29496.99129
32577.434680000002
35595.80364
49000.0
51207.52177
52930.08709
935.7638890000001
3360.2430560000003
5486.979167
7350.0
10165.79861
12250.0
14631.94444
16767.1875
18800.34722
20842.013890000002
22245.65972
24500.0
25988.71528
27052.083329999998
29697.74

In [None]:
%cd /content/Round3_OS_09_27_2017/wav/OrcasoundLab09272017_Test/wav

/content/Round3_OS_09_27_2017/wav/OrcasoundLab09272017_Test/wav


In [None]:
plot_spectrogram('/content/Round2_OS_07_05/','test/calls','wav/pod_calls_test')
#plot_spectrogram('/content/Round2_OS_07_05/','train/nocalls','wav/neg_pod_calls')

In [None]:
positives_test = sl.select(annotations=std_annot_test, length=2.0)
file_durations_test = sl.file_duration_table('/content/Round3_OS_09_27_2017/wav/OrcasoundLab09272017_Test/wav')
negatives_test=sl.create_rndm_backgr_selections(annotations=std_annot_test, files=file_durations_test, length=2.0, num=len(positives_test), trim_table=True)
negatives_test.to_csv('negg.tsv', mode='a', sep='\t',header=False)

In [None]:
%cd '/content/Round3_OS_09_27_2017/wav'

/content/Round3_OS_09_27_2017/wav


In [None]:
neg=pd.read_csv('/content/negg.tsv',sep='\t')
extract_audio('r3neg_calls_test',neg,"/content/Round2_OS_07_05/wav/neg_pod_calls_test/",2)

32092.359804222888
8783.855372268717
10833.785480487664
17738.476274292596
33540.715613618784
35044.90456504459
37113.680731379616
42793.43872899449
43384.58670523683
59125.63500008719
2060.9395920459265
8176.149226596921
34834.248259125256
47345.26678858501
34709.85964935287
57500.134296240125
31204.83729579973
31278.398563865892
39190.97375702137
44157.16247710742
51121.90331613828
56989.280750245656
471.8055366599856
18274.182818784648
22737.004808849408
54212.39864312781
55131.49852909686
3814.976375445269
15357.494164372554
25978.88827580829
539.6555075449214
2367.4188285905298
36720.852559893385
38216.30601213553
38402.17698800904
39696.55870925943
43875.851886507924
45262.38341813553
58304.54516292468
16795.222874192972
19196.00311286649
248.1962345352713
3595.0412613838125
22389.78804169051
24473.454693967935
28732.94690043167
30422.586576337035
35321.75520091903
35512.06545763591
39341.4811932812
7430.3162061734165
32241.99889693
35147.26495362288
40554.45806876003
42726.01082

In [None]:
neg.head()

Unnamed: 0,OS_9_27_2017_08_14_00__0001.wav,0,1.9261489230579651,3.926148923057965,0.1
0,OS_9_27_2017_08_14_00__0001.wav,1,32.09236,34.09236,0
1,OS_9_27_2017_08_19_00__0002.wav,0,8.783855,10.783855,0
2,OS_9_27_2017_08_19_00__0002.wav,1,10.833785,12.833785,0
3,OS_9_27_2017_08_19_00__0002.wav,2,17.738476,19.738476,0
4,OS_9_27_2017_08_19_00__0002.wav,3,33.540716,35.540716,0


In [None]:
plot_spectrogram('/content/Round2_OS_07_05/','test/nocalls','wav/neg_pod_calls_test')

In [None]:
!zip -r /content/train_nopreprocess.zip /content/Round2_OS_07_05/train

  adding: content/Round2_OS_07_05/train/ (stored 0%)
  adding: content/Round2_OS_07_05/train/nocalls/ (stored 0%)
  adding: content/Round2_OS_07_05/train/calls/ (stored 0%)
  adding: content/Round2_OS_07_05/train/calls/round2_callsMMMcalls205.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/calls/round2_callsMMMcalls201.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/calls/round2_callsMMMcalls343.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/calls/round2_callsMMMcalls322.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/calls/round2_callsMMMcalls344.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/calls/round2_callsMMMcalls217.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/calls/round2_callsMMMcalls333.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/calls/round2_callsMMMcalls308.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/calls/round2_callsMMMcalls75.png (deflated 1%)
  adding: content/Round2_OS_07_05/tr

In [None]:
!zip -r /content/negtrainsave.zip /content/Round2_OS_07_05/train/nocalls

  adding: content/Round2_OS_07_05/train/nocalls/ (stored 0%)
  adding: content/Round2_OS_07_05/train/nocalls/neg_calls_pod3141.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/nocalls/neg_calls_pod373.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/nocalls/neg_calls_pod386.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/nocalls/neg_calls_pod3184.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/nocalls/neg_calls_pod3136.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/nocalls/neg_calls_pod3399.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/nocalls/neg_calls_pod3472.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/nocalls/neg_calls_pod3357.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/nocalls/neg_calls_pod3393.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/nocalls/neg_calls_pod3296.png (deflated 1%)
  adding: content/Round2_OS_07_05/train/nocalls/neg_calls_pod3172.png (deflated 1%)
  adding: content

In [None]:
!zip -r /content/no_preprocess_test.zip /content/Round2_OS_07_05/test

  adding: content/Round2_OS_07_05/test/ (stored 0%)
  adding: content/Round2_OS_07_05/test/nocalls/ (stored 0%)
  adding: content/Round2_OS_07_05/test/nocalls/r3neg_calls_testMMMcalls52.png (deflated 1%)
  adding: content/Round2_OS_07_05/test/nocalls/r3neg_calls_testMMMcalls68.png (deflated 1%)
  adding: content/Round2_OS_07_05/test/nocalls/r3neg_calls_testMMMcalls63.png (deflated 1%)
  adding: content/Round2_OS_07_05/test/nocalls/r3neg_calls_testMMMcalls72.png (deflated 1%)
  adding: content/Round2_OS_07_05/test/nocalls/r3neg_calls_testMMMcalls100.png (deflated 1%)
  adding: content/Round2_OS_07_05/test/nocalls/r3neg_calls_testMMMcalls61.png (deflated 1%)
  adding: content/Round2_OS_07_05/test/nocalls/r3neg_calls_testMMMcalls60.png (deflated 1%)
  adding: content/Round2_OS_07_05/test/nocalls/r3neg_calls_testMMMcalls7.png (deflated 1%)
  adding: content/Round2_OS_07_05/test/nocalls/r3neg_calls_testMMMcalls23.png (deflated 1%)
  adding: content/Round2_OS_07_05/test/nocalls/r3neg_calls_t