# Set up the StarGAN baseline model for training
Purpose: Clone the repository and begin training the StarGAN-EVC model.

Assumptions: 

- Raw data (in the form of .wav files) has already been pre-processed into WORLD features by run_preprocessing.py
- The pre-processed data is stored on Google Drive (available here: https://drive.google.com/drive/folders/1UNdFseXfTQrGf_eT2E2GVVN28geNzxBG?usp=sharing)

# Clone the Repository

Eric's repository is here: https://github.com/eric-zhizu/EmotionalConversionStarGAN

It's a good idea to fork the repository so that you have your own copy to push to. You can later open a pull request to merge your changes into Eric's repository.

If you don't want to fork, you can instead create a branch in Eric's repository, but you'll need to ask Eric for push permission.

In [1]:
# A command that clears the output of a code cell
from IPython.display import clear_output

In [27]:
from google.colab import drive
drive.mount("/content/gdrive", force_remount=True)

Mounted at /content/gdrive


In [4]:
# You can replace this GitHub link with your forked repository if you want
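!git clone https://github.com/eric-zhizu/EmotionalConversionStarGAN.git
# Each `!` command runs in its own subshell, so point git at the cloned repository explicitly
!git -C EmotionalConversionStarGAN checkout eric-implementation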
clear_output()

In [None]:
!pip install pyworld librosa

In [None]:
%cd /content/
!git clone https://github.com/speechbrain/speechbrain.git
%cd speechbrain
%pip install -r requirements.txt
%pip install --editable .

# Download the Pre-processed Data

Choose one out of two pre-processed datasets:

- IEMOCAP
- Emotional Speech Dataset

Find the pre-processed data here: https://drive.google.com/drive/folders/1UNdFseXfTQrGf_eT2E2GVVN28geNzxBG?usp=sharing

Add the desired pre-processed dataset to your Google Drive by right-clicking the file and adding a shortcut to it in the root directory of your Google Drive.

Run the code below to extract the archive into your repository, adjusting the path if your copy is stored elsewhere.

In [None]:
%cd /content/EmotionalConversionStarGAN/
!unzip /content/gdrive/MyDrive/final_project/data/esd_ser_processed_data.zip
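
If the `!unzip` call fails, the usual cause is a wrong Drive path or an incomplete upload. The sketch below is one way to do the same extraction from Python with explicit checks. The helper name `extract_archive` is hypothetical and not part of the repository.

```python
import zipfile
from pathlib import Path


def extract_archive(zip_path, dest_dir):
    """Extract zip_path into dest_dir and return the sorted top-level entry names.

    Hypothetical helper for illustration; it is not part of the repository.
    """
    zip_path = Path(zip_path)
    if not zip_path.is_file():
        raise FileNotFoundError(f"Archive not found: {zip_path}")
    if not zipfile.is_zipfile(zip_path):
        raise ValueError(f"Not a valid zip archive: {zip_path}")

    dest_dir = Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)

    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest_dir)
        # Collect the first path component of every entry in the archive
        top_level = sorted({name.split("/")[0] for name in archive.namelist()})
    return top_level


# Example call in Colab, using the path from the cell above:
# extract_archive(
#     "/content/gdrive/MyDrive/final_project/data/esd_ser_processed_data.zip",
#     "/content/EmotionalConversionStarGAN",
# )
```

The returned list makes it easy to confirm which directory the archive created before starting training.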

# Run the Training
full_training_script.sh has three steps:

1) Run classifier_train.py for 100 epochs

2) Run train_main.py for 200,000 iterations (one iteration = one batch, not one epoch) just for reconstructing audio

3) Run train_main.py for 100,000 iterations for emotion conversion
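
Because the iteration counts above are per batch, it can help to translate them into approximate epochs. The sketch below does that arithmetic. The sample count is a made-up placeholder, the batch size of 4 matches the batch dimension visible in the training log further down, and the code assumes the final partial batch of each epoch is kept.

```python
import math


def iterations_per_epoch(num_samples, batch_size):
    """Number of batches needed to see every sample once (last partial batch kept)."""
    return math.ceil(num_samples / batch_size)


def iterations_to_epochs(num_iterations, num_samples, batch_size):
    """Approximate number of passes over the data for a given iteration count."""
    return num_iterations / iterations_per_epoch(num_samples, batch_size)


# Hypothetical training-set size; replace with the real number of training files.
NUM_TRAIN_SAMPLES = 20_000
BATCH_SIZE = 4

print(iterations_per_epoch(NUM_TRAIN_SAMPLES, BATCH_SIZE))           # 5000
print(iterations_to_epochs(200_000, NUM_TRAIN_SAMPLES, BATCH_SIZE))  # 40.0
print(iterations_to_epochs(100_000, NUM_TRAIN_SAMPLES, BATCH_SIZE))  # 20.0
```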

Adjust the hyperparameters in the .yaml files in configs/

Here are some important arguments to train_main.py:

```
python train_main.py \
    --checkpoint <path/to/encoder-decoder-checkpoint> \
    --load_emo <path/to/classifier-checkpoint> \
    --config <path/to/config-file> \
    --alter \
    --recon_only
```

```
Description of arguments
checkpoint: load a generator/discriminator model
load_emo: load a speech emotion recognition (SER) model
config: load a .yaml file containing hyperparameters
alter: if specified, will make the config file override the hyperparameters saved in checkpoint
recon_only: if specified, will ignore the classifier and train the model to reconstruct the audio in the same emotion
```
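
To illustrate how the value arguments differ from the on/off switches, here is a minimal `argparse` sketch that mirrors the flags described above. It is an illustrative stand-in and not the actual parser in `train_main.py`. The defaults, the help strings and the file name `configs/example.yaml` are assumptions.

```python
import argparse


def build_parser():
    """Illustrative stand-in for the options listed above (not the repo's parser)."""
    parser = argparse.ArgumentParser(description="Sketch of train_main.py options")
    # Arguments that take a path
    parser.add_argument("--checkpoint", type=str, default=None,
                        help="generator/discriminator checkpoint to load")
    parser.add_argument("--load_emo", type=str, default=None,
                        help="SER (classifier) checkpoint to load")
    parser.add_argument("--config", type=str, default=None,
                        help=".yaml file containing hyperparameters")
    # Boolean switches: False unless the flag is present on the command line
    parser.add_argument("--alter", action="store_true",
                        help="let the config file override checkpoint hyperparameters")
    parser.add_argument("--recon_only", action="store_true",
                        help="train for same-emotion reconstruction only")
    return parser


# Parse a hypothetical reconstruction-only command line
args = build_parser().parse_args(["--config", "configs/example.yaml", "--recon_only"])
print(args.recon_only, args.alter, args.checkpoint)  # True False None
```

In this sketch, omitting `--alter` or `--recon_only` leaves the corresponding value at `False`, which matches the "if specified" wording in the descriptions.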

In [None]:
%cd /content/EmotionalConversionStarGAN

/content/EmotionalConversionStarGAN


In [None]:
!pip install librosa pyworld tensorflow==1.15
clear_output()

In [None]:
!bash full_training_script.sh

Streaming output truncated to the last 5000 lines.
Training Generator...
D/total_loss         = -0.9153
G/total_loss         = 0.5071
D/gradient_penalty   = 0.0048
G/loss_cycle         = 0.0074
G/loss_id            = 0.0074
D/preds_real         = 0.4695
D/preds_fake         = -0.4700
No model saved this iteration.
12:16:32.335783 elapsed. Iteration 199625 complete
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Iteration 199626/200000 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Iteration 199626 lr = 0.000000
Device is  cuda
Classifier device is  cuda
Getting mini-batch.
solver.train: x_real size = torch.Size([4, 1, 512, 36])
No classifier training this run.
Training Discriminator...
No Generator update this iteration.
No log output this iteration.
No model saved this iteration.
12:16:32.394973 elapsed. Iteration 199626 complete
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Iteration 199627/200000 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Iteration 199627 lr = 0.000000
Device is  cuda
Classifier device is  cuda
Getting 
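
The log above is easier to inspect once the loss values are pulled into a structured form. The sketch below parses lines in the format shown above (`name = value`, followed by an `Iteration N complete` line). It assumes the losses are printed as plain decimals, and the function name `parse_training_log` is hypothetical.

```python
import re

# Matches lines such as "D/total_loss         = -0.9153"
LOSS_LINE = re.compile(r"^\s*([DG]/\w+)\s*=\s*(-?\d+(?:\.\d+)?)\s*$")
# Matches lines such as "12:16:32.335783 elapsed. Iteration 199625 complete"
ITER_LINE = re.compile(r"Iteration (\d+) complete")


def parse_training_log(lines):
    """Return one dict per iteration that logged at least one loss value."""
    records = []
    current = {}
    for line in lines:
        loss = LOSS_LINE.match(line)
        if loss:
            current[loss.group(1)] = float(loss.group(2))
            continue
        done = ITER_LINE.search(line)
        if done:
            # Iterations without log output are skipped
            if current:
                records.append({"iteration": int(done.group(1)), **current})
            current = {}
    return records


sample = [
    "D/total_loss         = -0.9153",
    "G/total_loss         = 0.5071",
    "No model saved this iteration.",
    "12:16:32.335783 elapsed. Iteration 199625 complete",
    "Iteration 199626 lr = 0.000000",
    "No log output this iteration.",
    "12:16:32.394973 elapsed. Iteration 199626 complete",
]
print(parse_training_log(sample))
# [{'iteration': 199625, 'D/total_loss': -0.9153, 'G/total_loss': 0.5071}]
```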

# Evaluate Model

In [22]:
!git pull origin eric-implementation

remote: Enumerating objects: 5, done.
remote: Counting objects: 100% (5/5), done.
remote: Compressing objects: 100% (1/1), done.
remote: Total 3 (delta 2), reused 3 (delta 2), pack-reused 0
Unpacking objects: 100% (3/3), done.
From https://github.com/eric-zhizu/EmotionalConversionStarGAN
 * branch            eric-implementation -> FETCH_HEAD
   f99024d..4178032  eric-implementation -> origin/eric-implementation
Updating f99024d..4178032
Fast-forward
 convert.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


In [25]:
# Steps in this cell:
#   1. Convert .wav ==> input features
#   2. Pass the input features into the model to get output features
#   3. Convert output features ==> .wav
### TO ADD: compare output features with input features ==> evaluate the model
%cd /content/EmotionalConversionStarGAN
!python convert.py --checkpoint /content/gdrive/MyDrive/final_project/checkpoints/esd_trial_checkpoints_02/model_step2/300000.ckpt -o processed_data/converted

/content/EmotionalConversionStarGAN
Loading model at  /content/gdrive/MyDrive/final_project/checkpoints/esd_trial_checkpoints_02/model_step2/300000.ckpt
Building components
Building optimizers
/content/gdrive/MyDrive/final_project/checkpoints/esd_trial_checkpoints_02/model_step2/300000.ckpt
Model and optimizers loaded.
Number of emotions = 4
Converting train and test samples in ./processed_data/audio
27988  files used.
  wav = wavfile.read(path)[1]
Converting 0015_000693 to 0.
Converting 0015_000693 to 1.
Converting 0015_000693 to 2.
Converting 0015_000693 to 3.
Converting 0013_001351 to 0.
Converting 0013_001351 to 1.
Converting 0013_001351 to 2.
Converting 0013_001351 to 3.
Converting 0013_000900 to 0.
Converting 0013_000900 to 1.
Converting 0013_000900 to 2.
Converting 0013_000900 to 3.
Converting 0013_000631 to 0.
Converting 0013_000631 to 1.
Converting 0013_000631 to 2.
Converting 0013_000631 to 3.
Converting 0015_001308 to 0.
Converting 0015_001308 to 1.
Converting 0015_001308 to

In [26]:
!cp -r processed_data/converted /content/gdrive/MyDrive/final_project/data/esd_trial_conversion_02
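
The conversion cell above leaves comparing output features with input features as a to-do. One common objective measure in voice conversion is mel-cepstral distortion (MCD). The sketch below is a plain-Python version of it, offered as one possible starting point and not as the notebook's chosen metric. It assumes both feature sequences contain mel-cepstral coefficients, have the same number of frames, and are already aligned frame by frame. The function name is hypothetical.

```python
import math


def mel_cepstral_distortion(ref_frames, conv_frames, skip_c0=True):
    """Mean mel-cepstral distortion in dB between two aligned frame sequences.

    Each argument is a sequence of frames, and each frame is a sequence of
    cepstral coefficients. The 0th (energy) coefficient is skipped by default.
    """
    if len(ref_frames) != len(conv_frames):
        raise ValueError("Both sequences must have the same number of frames")
    if not ref_frames:
        raise ValueError("At least one frame is required")

    start = 1 if skip_c0 else 0
    scale = 10.0 / math.log(10.0) * math.sqrt(2.0)

    total = 0.0
    for ref, conv in zip(ref_frames, conv_frames):
        squared_diff = sum((r - c) ** 2 for r, c in zip(ref[start:], conv[start:]))
        total += scale * math.sqrt(squared_diff)
    return total / len(ref_frames)


# Toy frames (made-up numbers, three coefficients per frame)
reference = [[1.0, 0.5, 0.2], [0.9, 0.4, 0.1]]
print(mel_cepstral_distortion(reference, reference))  # 0.0
```

A lower value means the two sequences are closer. For emotion conversion, the value compared against the input mainly shows how far the model moved the spectral features, so it is best read alongside listening tests or classifier accuracy.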