## *NBC Overseas Correspondent Classifier*

In [None]:
import os
os.chdir('/home/sharedfolder')

In [None]:
import os
import shutil
import random
import numpy as np
from pyAudioAnalysis import audioTrainTest as aT
from pyAudioAnalysis import audioSegmentation as aS

In [None]:
# Creating new directories to keep our files organized

!mkdir /home/sharedfolder/NBC_DDay_News
!mkdir /home/sharedfolder/NBC_DDay_News_Clips
!mkdir /home/sharedfolder/NBC_DDay_News_Clips_Subset
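The `!mkdir` commands above print an error if a directory already exists, for example when the cell is re-run. Below is a minimal Python 3 sketch of an alternative that is safe to re-run. `ensure_dirs` is a hypothetical helper name and is not part of the original notebook.

```python
import os

def ensure_dirs(paths):
    """Create each directory in `paths` (plus missing parents) unless it already exists."""
    for path in paths:
        # exist_ok=True (Python 3.2+) makes this safe to re-run, unlike plain `mkdir`
        os.makedirs(path, exist_ok=True)
    # Report whether every requested path is now a directory
    return all(os.path.isdir(path) for path in paths)
```

For example, `ensure_dirs(['/home/sharedfolder/NBC_DDay_News', '/home/sharedfolder/NBC_DDay_News_Clips', '/home/sharedfolder/NBC_DDay_News_Clips_Subset'])`. In the shell, `mkdir -p` behaves the same way.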

In [None]:
# Let's change our working directory to "NBC_DDay_Complete_Broadcast" and create a list of filenames.

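os.chdir('/home/sharedfolder/NBC_DDay_Complete_Broadcast')
filenames = os.listdir('./')
# Keep only files whose names contain "news" (case-insensitive)
filenames = [item for item in filenames if 'news' in item.lower()]
print(len(filenames))
# Limits the list to the first five files (e.g. for a quick test); remove this line to process every News file
filenames = filenames[0:5]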

### *Task:*

Move all files whose names contain "News" into the directory **`/home/sharedfolder/NBC_DDay_News`**.

**Hint:** You can either do this in the terminal or in Python. **`shutil.move("filename.mp3", "/path/to/directory/")`** will move a file to another directory, just like **`mv`** does in Bash.

If you choose to use Python, one option is to use a list comprehension. You may want to create and view your list of "News" filenames before you actually move them.
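One possible Python answer to the task is sketched below. It assumes the recordings sit directly in `NBC_DDay_Complete_Broadcast` and that a case-insensitive match on "news" is what's wanted, as in the filtering cell earlier. `move_news_files` is a hypothetical helper. It builds the list with a list comprehension first so you can inspect it, then calls `shutil.move`. It uses `exist_ok`, so it assumes Python 3.

```python
import os
import shutil

def move_news_files(src_dir, dest_dir):
    """Move every file in src_dir whose name contains 'news' (any case) into dest_dir."""
    # Build (and optionally inspect) the list before moving anything
    news_files = [name for name in os.listdir(src_dir) if 'news' in name.lower()]
    os.makedirs(dest_dir, exist_ok=True)
    for name in news_files:
        # Like `mv`, shutil.move removes the file from src_dir
        shutil.move(os.path.join(src_dir, name), os.path.join(dest_dir, name))
    return sorted(news_files)
```

Calling `move_news_files('/home/sharedfolder/NBC_DDay_Complete_Broadcast', '/home/sharedfolder/NBC_DDay_News')` returns the sorted list of moved names. Because `shutil.move` removes the originals, use `shutil.copy` instead if later cells still expect the files in the source directory.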

In [None]:
filenames

In [None]:
for filename in filenames:
    # Copy rather than move, so the originals stay in NBC_DDay_Complete_Broadcast
    shutil.copy(filename, "/home/sharedfolder/NBC_DDay_News")

In [None]:
os.chdir("/home/sharedfolder/NBC_DDay_News")

!for f in *.mp3; do ffmpeg -i "$f" -f segment -segment_time 30 "$f.%04d_clip.wav" ; done



### *Splitting recordings into 30-second segments*

Next we'll use **`ffmpeg`** to split each file into 30-second WAV chunks (which are quicker and more convenient to work with than full recordings). 

In the terminal, **`cd`** to the directory **`NBC_DDay_News_Clips`** and run the following command, which will create numbered 30-second segments whose filenames end with **`_clip.wav`**. The process may take a few minutes to complete.

```
for f in *.mp3; do
  ffmpeg -i "$f" -f segment -segment_time 30 "$f.%04d_clip.wav"
done
```
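The same split can also be driven from Python. This Python 3 sketch uses two hypothetical helpers, `segment_command` and `split_all`. It passes ffmpeg its arguments as a list, which avoids shell quoting problems with filenames that contain spaces. It assumes `ffmpeg` is on your `PATH`. ffmpeg itself replaces `%04d` in the output name with a zero-padded segment number.

```python
import glob
import os
import subprocess

def segment_command(mp3_path, seconds=30):
    """Return the ffmpeg argument list equivalent to the shell loop for one file."""
    # A list of arguments (no shell) means spaces in filenames need no quoting
    return ['ffmpeg', '-i', mp3_path,
            '-f', 'segment', '-segment_time', str(seconds),
            mp3_path + '.%04d_clip.wav']

def split_all(directory, seconds=30):
    """Run the segment command for every .mp3 in directory (requires ffmpeg on PATH)."""
    for mp3_path in sorted(glob.glob(os.path.join(directory, '*.mp3'))):
        # check=True raises CalledProcessError if ffmpeg fails on a file
        subprocess.run(segment_command(mp3_path, seconds), check=True)
```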

In [None]:
!for f in *.mp3; do ffmpeg -i "$f" -f segment -segment_time 30 "$f.%04d.wav" ; done
!for f in *.wav; do sox "$f" -n spectrogram -x 1600 -y 513 -r -o "$f.png"; done

### *Task:*

Use Python to randomly choose around 300 of the files you just created and move them to the directory **`NBC_DDay_News_Clips_Subset`**.

First you'll need to change your current directory to **`/home/sharedfolder/NBC_DDay_News_Clips`** with **`os.chdir(...)`**, then use **`os.listdir('./')`** to create a list of filenames, as we did earlier for **`NBC_DDay_Complete_Broadcast`**.

**Hint:** Only include filenames that end with **`_clip.wav`**. We only want to work with our 30-second clips in the next step, not the original recordings.

Try running the cell below a few times to see how **`random.sample(...)`** chooses items randomly from a list. 

Once you've created a list of randomly chosen filenames, you can use **`shutil.move(..., ...)`** to move them to the directory **`NBC_DDay_News_Clips_Subset`**, just like we did a few cells back.
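Here is a minimal sketch of the subset task. It assumes the clips live directly in `NBC_DDay_News_Clips` and follow the `_clip.wav` naming from the ffmpeg step. `move_random_subset` and its `seed` parameter are hypothetical. Seeding a private `random.Random` makes the selection reproducible without affecting other uses of `random`.

```python
import os
import random
import shutil

def move_random_subset(src_dir, dest_dir, k=300, seed=None):
    """Move up to k randomly chosen *_clip.wav files from src_dir to dest_dir."""
    # endswith() keeps the 30-second clips and skips originals and .png spectrograms
    clips = sorted(name for name in os.listdir(src_dir) if name.endswith('_clip.wav'))
    rng = random.Random(seed)  # a fixed seed makes the selection repeatable
    # random.sample raises ValueError if asked for more items than exist, so cap k
    chosen = rng.sample(clips, min(k, len(clips)))
    for name in chosen:
        shutil.move(os.path.join(src_dir, name), os.path.join(dest_dir, name))
    return chosen
```

For example: `move_random_subset('/home/sharedfolder/NBC_DDay_News_Clips', '/home/sharedfolder/NBC_DDay_News_Clips_Subset', k=300)`.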


In [None]:
random.sample([1,2,3,4,5,6,7,8,9,10], 4)

In [None]:
os.chdir('/home/sharedfolder/NBC_DDay_News_Clips')

In [None]:
os.chdir('/home/sharedfolder/NBC_DDay_Complete_Broadcast')

In [None]:
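# endswith() excludes spectrogram PNGs such as x_clip.wav.png, which also contain '_clip.wav'
clip_filenames = [item for item in os.listdir('./') if item.endswith('_clip.wav')]

clip_filenames[:10]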

In [None]:
for filename in clip_filenames:
    shutil.move(filename,'/home/sharedfolder/NBC_DDay_News_Clips')

The program **`sox`**, known as "the Swiss Army knife of sound processing programs," can create high-quality spectrograms and save them as PNG image files.

Back in the terminal, **`cd`** to the directory **`NBC_DDay_News_Clips_Subset`** and run the following command to create spectrograms for every file in the set.

```
for f in *_clip.wav; do sox "$f" -n spectrogram -x 1600 -y 513 -r -o "$f.png"; done
```
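If you'd rather stay in Python, this Python 3 sketch issues the same `sox` call for each clip. The helpers `spectrogram_command` and `make_spectrograms` are hypothetical. `-x` and `-y` set the image width and per-channel height in pixels, `-r` asks for a raw image without axes or legend, and `-o` names the output PNG. The sketch assumes `sox` is installed. It skips clips that already have a PNG, so re-running it only processes new files.

```python
import glob
import os
import subprocess

def spectrogram_command(wav_path):
    """Build the sox call from the shell loop: 1600 px wide, 513 px high, raw, saved as <wav>.png."""
    return ['sox', wav_path, '-n', 'spectrogram',
            '-x', '1600', '-y', '513', '-r', '-o', wav_path + '.png']

def make_spectrograms(directory):
    """Run sox on every *_clip.wav without a PNG yet (requires sox on PATH)."""
    made = []
    for wav_path in sorted(glob.glob(os.path.join(directory, '*_clip.wav'))):
        if os.path.exists(wav_path + '.png'):
            continue  # already processed, so re-running the cell is cheap
        subprocess.run(spectrogram_command(wav_path), check=True)
        made.append(wav_path + '.png')
    return made
```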

In [None]:
os.chdir('/home/sharedfolder/NBC_DDay_News_Clips')

In [None]:
!for f in *_clip.wav; do sox "$f" -n spectrogram -x 1600 -y 513 -r -o "$f.png"; done

In [None]:

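clip_dir = '/home/sharedfolder/NBC_DDay_News_Clips'
# Sample WAV clips only, so each chosen clip can be copied together with its spectrogram
clip_wavs = [item for item in os.listdir(clip_dir) if item.endswith('_clip.wav')]
temp_filenames = random.sample(clip_wavs, min(50, len(clip_wavs)))

In [None]:
for filename in temp_filenames:
    # Copy each WAV and its matching PNG (sox saved it as <wav>.png)
    shutil.copy(os.path.join(clip_dir, filename), '/home/sharedfolder/NBC_DDay_News_Clips_Subset')
    shutil.copy(os.path.join(clip_dir, filename + '.png'), '/home/sharedfolder/NBC_DDay_News_Clips_Subset')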

## *Sorting things out*

We should now have a few hundred 30-second WAVs and corresponding PNG spectrograms in the same directory, **`NBC_DDay_News_Clips_Subset`**. Next we'll sort through the PNG files to find the really noisy ones from foreign correspondents, but first we need to create the directories where we'll put them.
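Before sorting, it can help to confirm that every WAV really has its spectrogram. Below is a minimal sketch using a hypothetical `unpaired_files` helper. It assumes sox named each image `<clip>.wav.png`, as in the commands above.

```python
import os

def unpaired_files(directory):
    """Return (WAVs with no PNG, PNGs with no WAV) for the *_clip.wav naming scheme."""
    names = set(os.listdir(directory))
    wavs = {n for n in names if n.endswith('_clip.wav')}
    pngs = {n for n in names if n.endswith('_clip.wav.png')}
    missing_png = sorted(w for w in wavs if w + '.png' not in pngs)
    # Strip the '.png' suffix to recover the WAV name each image should belong to
    missing_wav = sorted(p[:-len('.png')] for p in pngs if p[:-len('.png')] not in wavs)
    return missing_png, missing_wav
```

Calling `unpaired_files('/home/sharedfolder/NBC_DDay_News_Clips_Subset')` should return two empty lists when everything is paired.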

In [None]:
# Directories where we'll put our two training classes

!mkdir /home/sharedfolder/noisy
!mkdir /home/sharedfolder/not_noisy

Open **`sharedfolder`** on your desktop and start opening spectrogram files with whatever program is most convenient. If a file looks really noisy, drag the corresponding WAV file to the **`noisy`** directory. If not, put the WAV file in **`not_noisy`**.

If you can't decide where to put a file, take a quick listen and make a decision. This is your classifier; you call the shots. If you come across a file that contains music, just ignore it and move on.
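If dragging files around is awkward, you can record the same decision from Python. This sketch assumes the PNG name is the WAV name plus `.png` and that the `noisy` and `not_noisy` directories from the cell above exist. `file_clip` is a hypothetical helper and is not part of pyAudioAnalysis.

```python
import os
import shutil

def file_clip(png_name, label, subset_dir, root='/home/sharedfolder'):
    """Move the WAV behind a spectrogram into root/noisy or root/not_noisy."""
    if label not in ('noisy', 'not_noisy'):
        raise ValueError("label must be 'noisy' or 'not_noisy'")
    if not png_name.endswith('.png'):
        raise ValueError('expected a .png spectrogram name')
    wav_name = png_name[:-len('.png')]  # 'x_clip.wav.png' -> 'x_clip.wav'
    dest = os.path.join(root, label, wav_name)
    shutil.move(os.path.join(subset_dir, wav_name), dest)
    return dest
```

For example, `file_clip('<clip>.wav.png', 'noisy', '/home/sharedfolder/NBC_DDay_News_Clips_Subset')`.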

Once you have 20 or 30 WAVs in each class, you're ready to train your model.
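Here is a quick way to check whether each class has reached that size. The helpers `class_counts` and `ready_to_train` are hypothetical. They count only `.wav` files, since those are what you sort into the class directories.

```python
import os

def class_counts(root='/home/sharedfolder', classes=('noisy', 'not_noisy')):
    """Count the .wav files in each class directory under root."""
    return {c: sum(1 for n in os.listdir(os.path.join(root, c)) if n.endswith('.wav'))
            for c in classes}

def ready_to_train(counts, minimum=20):
    """True once every class has at least `minimum` WAVs."""
    return all(n >= minimum for n in counts.values())
```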

## *Training your classifier*

Set your working directory to **`/home/sharedfolder`** and create a directory for your new model files.

In [None]:
os.chdir('/home/sharedfolder')

!mkdir models

In [None]:
noisy_list = [item.replace('.png','') for item in os.listdir('/home/sharedfolder/all_noises/noisy') if '.wav' in item]


In [None]:

not_noisy_list = [item.replace('.png','') for item in os.listdir('/home/sharedfolder/all_noises/not_noisy') if '.wav' in item]

In [None]:
!mkdir /home/sharedfolder/noisy_all
!mkdir /home/sharedfolder/not_noisy_all

In [None]:
os.chdir('/home/sharedfolder/NBC_DDay_Complete_Broadcast')

for item in noisy_list:
    try:
        shutil.copy(item.replace('.wav.wav','').replace('.wav.wav','.wav'), '/home/sharedfolder/noisy_all')
    except (IOError, OSError):
        # Report files that could not be found or copied
        print(item)

for item in not_noisy_list:
    try:
        shutil.copy(item.replace('.wav.wav','').replace('.wav.wav','.wav'), '/home/sharedfolder/not_noisy_all')
    except (IOError, OSError):
        print(item)



In [None]:
not_noisy_list = os.listdir('/home/sharedfolder/all_noises/not_noisy')


os.listdir('/home/sharedfolder/all_noises/not_noisy')

In [None]:
!pwd