<a href="https://colab.research.google.com/github/jcalandra/audiosynthesis_dl/blob/master/src/Pict2Audio_color.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Pict2Audio: A Neural Network that Associates Pictures with Audio Descriptors

## Pict2Audio_color
This is the independent code for the color neural network. It aims to test the efficiency of every possible combination of datasets and architectures for this network.

**The multilabel version is available at the following link:**
[https://colab.research.google.com/drive/1_ZTdR2CG_eekUUtqAG9Bqa7RixHL8v93](https://colab.research.google.com/drive/1_ZTdR2CG_eekUUtqAG9Bqa7RixHL8v93)

## Importing the libraries

First, we need to import all the packages and libraries needed to run the code.

Keras is used with the TensorFlow backend to implement the neural network.

In [0]:
from __future__ import print_function
import os
from PIL import Image

import numpy as np
import random
import colorsys

import tensorflow as tf
import keras
from keras.models import Sequential
from keras.layers import Dense, Activation, BatchNormalization, Dropout, Conv2D, MaxPooling2D, Flatten
from keras import optimizers
from keras.optimizers import RMSprop
from keras.preprocessing import image
from sklearn.preprocessing import MultiLabelBinarizer
from keras import callbacks

from keras.models import load_model
import pickle


print('tensorflow:', tf.__version__)
print('keras:', keras.__version__)


## Importing the dataset

The dataset is imported from GitHub, from the audiosynthesis_dl repository. This repository also contains documentation about sound synthesis using neural networks.

In [0]:
# First, import git repository
! git clone https://github.com/jcalandra/audiosynthesis_dl.git


Then, run the following script in the same environment:

https://colab.research.google.com/drive/1lJELWVC4DmQSNOw0fat4KrzTo_GmVbpZ

To avoid downloading a large dataset to the local computer, I chose to generate the pictures directly in Google Colab and load them from there.

In [0]:
import cv2
from google.colab.patches import cv2_imshow
from google.colab import files

In [0]:
## GLOBAL VALUES

NB_PITCH = 12
NB_COLOR = 8
NB_THICK = 3
NB_CLASS = NB_COLOR
PICT_WIDTH = 400
NB_CHANNEL = 3
LINE_WIDTH = PICT_WIDTH//NB_PITCH #33

NB_CARACTERISTICS = 1

# number of versions generated for each (pitch, color, thickness) combination
NB_VERSION_TRAIN = 2
NB_VERSION_VALIDATION = 1

# number of training and validation pictures
NB_TRAIN = NB_VERSION_TRAIN*NB_COLOR*NB_PITCH*NB_THICK
NB_VALIDATION = NB_VERSION_VALIDATION*NB_COLOR*NB_PITCH*NB_THICK
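As a quick check of these global values, the snippet below recomputes the dataset sizes they imply, plus a rough memory estimate for the training tensor. The memory estimate assumes float32 storage and is not part of the original notebook.

```python
# Recompute the sizes implied by the global values above
NB_PITCH, NB_COLOR, NB_THICK = 12, 8, 3
PICT_WIDTH, NB_CHANNEL = 400, 3
NB_VERSION_TRAIN, NB_VERSION_VALIDATION = 2, 1

# Each pitch band is PICT_WIDTH // NB_PITCH pixels high
LINE_WIDTH = PICT_WIDTH // NB_PITCH

# One picture per (version, color, pitch, thickness) combination
NB_TRAIN = NB_VERSION_TRAIN * NB_COLOR * NB_PITCH * NB_THICK
NB_VALIDATION = NB_VERSION_VALIDATION * NB_COLOR * NB_PITCH * NB_THICK

# Size of the full training tensor if stored as float32 (4 bytes per value)
bytes_train = NB_TRAIN * PICT_WIDTH * PICT_WIDTH * NB_CHANNEL * 4
print(LINE_WIDTH, NB_TRAIN, NB_VALIDATION, round(bytes_train / 2**30, 2), "GiB")
```

Note that `np.empty((0,400,400,NB_CHANNEL))`, used later to collect the pictures, defaults to float64, which doubles this figure.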

In [0]:
## LOADING THE BASELINE

baseline = cv2.imread('audiosynthesis_dl/data/base_quadrillage.png')
#cv2_imshow(baseline)

In [0]:
## PICTURE GENERATION

# generating exactly the same path line for every color, pitch and thickness

def generate_pict(nb_version_pict, folder, outline_value, pic_type): # pic_type = [PITCH, THICK, COLOR], 1 if the feature varies, 0 otherwise
  """ Creates nb_version_pict versions of the pitch-line pictures (one per pitch, color and thickness)
      and saves them permanently in the folder img_'folder'. folder must be a string and nb_version_pict an integer."""
  
  # default values if pitches, thicknesses and/or colors don't change
  height = (3%NB_PITCH)*LINE_WIDTH + (LINE_WIDTH + 5)//2
  thickness = 6
  sat = hue = val = 0
  # default indices
  pitch_ind = 0
  thick_ind = 0
  color_ind = 0
  
  # beginning value for thickness if there is a variation of thickness
  if (pic_type[1] == 1):
    thickness = 12
  # tab of color values if there is a variation of colors
  hueval_tab = [[(i*180)/(NB_COLOR//2), 90] for i in range(NB_COLOR//2)] + [[(i*180)/(NB_COLOR//2), 210] for i in range(NB_COLOR//2)]
  # table of available thicknesses if there is a variation of thicknesses
  thickness_tab = [2,7,12]
  global_line_path = np.empty((PICT_WIDTH),dtype=int)
  
  for i in range (nb_version_pict) :
    
    # generation of lines path :   
    outline = outline_value 
    delta = (LINE_WIDTH - (thickness*2))//2 + outline    
    variation = np.random.randint(0,delta - outline)   # the line begins at a random offset within delta
    intervalle_max = np.random.randint(2,50)           # interval allowed to keep the same height between each variation
    
    for l in range(PICT_WIDTH):
      intervalle = np.random.randint(1,intervalle_max) # to avoid a sharp variation, we keep the same variation height for each 'intervalle'
      if (l%intervalle == 0) :                         # if we want to change the height of the line
        tmp_var = np.random.randint(-1, 2)             # each variation is an increase or a decrease of 1 (or same height)
        if abs(variation + tmp_var) < delta :
          variation = variation + tmp_var
        else :
          variation = variation             
      global_line_path[l] = PICT_WIDTH + variation
    
    # pitch affiliation :    
    for p in range(NB_PITCH):
      if (pic_type[0] == 1 ):
        pitch_ind = 69 + p #69 is pitch for la440
        height = p*LINE_WIDTH + (LINE_WIDTH + 5)//2       
      line_path = global_line_path - height - 1

      
      # generation of the pictures. There are nb_version_pict*NB_PITCH*NB_COLOR*NB_THICK pictures:
      for m in range(NB_COLOR):  
        
        # creation of the baseline, quadrilled picture :       
        line_image_rgb = baseline.copy()

                
        # color affiliation :
        line_image_hsv = cv2.cvtColor(line_image_rgb, cv2.COLOR_RGB2HSV)
        h, s, v = cv2.split(line_image_hsv)
      
        if( pic_type[2] == 1):
          color_ind = m
          color = hueval_tab[color_ind]
          sat = 150
          hue = color[0]
          val = color[1]
          
        #thickness affiliation :
        for t in range(NB_THICK): 
          thick_ind = t
          thickness = thickness_tab[thick_ind]

          for j in range(PICT_WIDTH):

            # creation of the line :
            if( line_path[j] > 0 and line_path[j] < 400 ) :
              h[line_path[j]][PICT_WIDTH - j - 1] = hue      
              s[line_path[j]][PICT_WIDTH - j - 1] = sat 
              v[line_path[j]][PICT_WIDTH - j - 1] = val 

            # and its thickness :
            for k in range (thickness) :
              if( line_path[j] - k > 0 and line_path[j] - k < 400 ) :
                h[line_path[j] - k][PICT_WIDTH - j - 1] = hue 
                s[line_path[j] - k][PICT_WIDTH - j - 1] = sat
                v[line_path[j] - k][PICT_WIDTH - j - 1] = val

              if( line_path[j] + k > 0 and line_path[j] + k < 400 ) :
                h[line_path[j] + k][PICT_WIDTH - j - 1] = hue
                s[line_path[j] + k][PICT_WIDTH - j - 1] = sat
                v[line_path[j] + k][PICT_WIDTH - j - 1] = val

          line_image_hsv = cv2.merge((h,s,v))
          line_image = cv2.cvtColor(line_image_hsv, cv2.COLOR_HSV2RGB)


          cv2_imshow(line_image) #showing the images for tests
          name = 'pitch'+str(pitch_ind)+ '_thick' + str(thick_ind) + '_color' + str(color_ind) +'_'+str(i) + '_'+folder+'.png'
          print(name)

          # save the picture in google colab :
          cv2.imwrite('./audiosynthesis_dl/data/pitch_img/img_'+folder+'/'+ name, line_image)
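The line path above is a bounded random walk: at irregular intervals the line moves up or down by at most one pixel, and a step is rejected if it would leave the band of half-width `delta`. Below is a minimal standalone sketch of that walk, written as a hypothetical helper `random_line_path` (not part of the repository). It uses NumPy's `default_rng` instead of the global `np.random` state so that runs are reproducible. With `pic_type[1] == 1` the thickness used for `delta` is 12, so `delta = (33 - 24)//2 = 4`; `generate_pict` then shifts these offsets by `PICT_WIDTH - height - 1` to place the line in its pitch band.

```python
import numpy as np

def random_line_path(pict_width=400, line_width=33, thickness=12, outline=0, rng=None):
    """Hypothetical sketch of the bounded random walk used in generate_pict.
    Returns the vertical offset (in pixels) of the line for each column, and delta."""
    rng = np.random.default_rng() if rng is None else rng
    # Maximum deviation allowed so that the thick line stays inside its pitch band
    delta = (line_width - thickness * 2) // 2 + outline
    variation = int(rng.integers(0, delta - outline))  # random starting offset
    interval_max = int(rng.integers(2, 50))            # controls how smooth the line is
    offsets = np.empty(pict_width, dtype=int)
    for col in range(pict_width):
        interval = int(rng.integers(1, interval_max))
        if col % interval == 0:
            step = int(rng.integers(-1, 2))  # -1, 0 or +1 pixel
            # accept the step only if the line stays strictly within +/- delta
            if abs(variation + step) < delta:
                variation += step
        offsets[col] = variation
    return offsets, delta

offsets, delta = random_line_path(rng=np.random.default_rng(0))
print(delta, offsets.min(), offsets.max())
```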

In [0]:
print('[INFO] generating training dataset...')
generate_pict(NB_VERSION_TRAIN, 'train', 0, [1,1,1])
print('[INFO] generating validation dataset...')
generate_pict(NB_VERSION_VALIDATION, 'validation', 0, [1,1,1])
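The eight color classes come from `hueval_tab` in `generate_pict`: four hues spaced 45 apart on OpenCV's 8-bit hue scale (0 to 179), each drawn at a dark value (V = 90) and a bright value (V = 210), always with saturation 150. The sketch below rebuilds that table and converts each entry to RGB with the standard-library `colorsys` module; the helper name `opencv_hsv_to_rgb` is hypothetical.

```python
import colorsys

NB_COLOR = 8
SAT = 150  # saturation used by generate_pict for every colored line

# Same construction as hueval_tab: 4 hues, each at a dark (V=90) and a bright (V=210) value
hueval_tab = ([[(i * 180) / (NB_COLOR // 2), 90] for i in range(NB_COLOR // 2)]
              + [[(i * 180) / (NB_COLOR // 2), 210] for i in range(NB_COLOR // 2)])

def opencv_hsv_to_rgb(h, s, v):
    """Convert an 8-bit OpenCV HSV triple (H in [0, 180), S and V in [0, 255]) to 8-bit RGB."""
    r, g, b = colorsys.hsv_to_rgb(h / 180.0, s / 255.0, v / 255.0)
    return tuple(round(c * 255) for c in (r, g, b))

palette = [opencv_hsv_to_rgb(h, SAT, v) for h, v in hueval_tab]
for label, ((h, v), rgb) in enumerate(zip(hueval_tab, palette)):
    print(f"color{label}: H={h:g} V={v} -> RGB {rgb}")
```

Note that `cv2.imread` returns pixels in BGR order while `generate_pict` converts with `cv2.COLOR_RGB2HSV`, so the colors written to disk may have red and blue swapped relative to this table. The labels stay consistent either way, so this should not affect training.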

## Loading the data


In [0]:
## LOADING THE PICTURES

#TODO: find a way to optimize this

imagePath_train = sorted(os.listdir( "./audiosynthesis_dl/data/pitch_img/img_train")[:])
imagePath_validation = sorted(os.listdir( "./audiosynthesis_dl/data/pitch_img/img_validation")[:])

print('[INFO] loading training dataset...')

label_train = []
label_validation = []


img_train = np.empty((0,400,400,NB_CHANNEL))
for imgP in imagePath_train :
  if imgP.split(".")[-1] != "git" and imgP.split(".")[-1] != "gitignore":
    img = image.load_img( "./audiosynthesis_dl/data/pitch_img/img_train/"+imgP, 
                             target_size=(400, 400),
                             color_mode='rgb')
    #img_flip = np.fliplr(img)
    img_train = np.concatenate((img_train,np.reshape(img,(1,400,400,NB_CHANNEL))),axis=0)
    #img_train = np.concatenate((img_train,np.reshape(img_flip,(1,400,400,NB_CHANNEL))),axis=0)
    l = imgP.split("_")
    labels = l[2:2+NB_CARACTERISTICS]
    label_train.append(labels)
    #label_train.append(labels)
    

print('[INFO] loading validation dataset...')
img_validation = np.empty((0,400,400,NB_CHANNEL))
for imgP in imagePath_validation :
  if imgP.split(".")[-1] != "git" and imgP.split(".")[-1] != "gitignore":
    img = image.load_img( "./audiosynthesis_dl/data/pitch_img/img_validation/"+imgP, 
                             target_size=(400, 400),
                             color_mode='rgb')
    #img_flip = np.fliplr(img)
    img_validation = np.concatenate((img_validation,np.reshape(img,(1,400,400,NB_CHANNEL))),axis=0)
    #img_validation = np.concatenate((img_validation,np.reshape(img_flip,(1,400,400,NB_CHANNEL))),axis=0)
    l = imgP.split("_")
    labels = l[2:2+NB_CARACTERISTICS]
    label_validation.append(labels)
    #label_validation.append(labels)


print('img_train.shape=', img_train.shape)
print('img_validation.shape=', img_validation.shape)


print(label_validation)
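One likely cause of the slowness flagged in the TODO is that `np.concatenate` inside the loop copies the whole array at every iteration, so loading time grows quadratically with the number of pictures. A minimal sketch of an alternative, assuming the file-naming scheme used by `generate_pict`: collect the pictures in a Python list, stack them once, and keep them as `uint8` until normalization. The helper `load_folder` is hypothetical; it reads files with Pillow directly instead of `keras.preprocessing.image.load_img`, and keeps only `.png` files instead of excluding `.git`/`.gitignore`.

```python
import os
import tempfile
import numpy as np
from PIL import Image

def load_folder(folder, size=(400, 400)):
    """Load every .png in `folder` (sorted by name) into one uint8 array and
    parse the 'colorN' token of each file name into an integer label."""
    names = sorted(n for n in os.listdir(folder) if n.endswith(".png"))
    images, labels = [], []
    for name in names:
        with Image.open(os.path.join(folder, name)) as img:
            images.append(np.asarray(img.convert("RGB").resize(size), dtype=np.uint8))
        # e.g. 'pitch69_thick0_color3_0_train.png' -> 'color3' -> 3
        labels.append(int(name.split("_")[2].replace("color", "")))
    # a single np.stack avoids re-copying the growing array at every step
    return np.stack(images), np.array(labels)

# Small demo on two synthetic 400x400 pictures
with tempfile.TemporaryDirectory() as tmp:
    for c in (3, 5):
        Image.new("RGB", (400, 400), (c * 20, 0, 0)).save(
            os.path.join(tmp, f"pitch69_thick0_color{c}_0_train.png"))
    x, y = load_folder(tmp)
print(x.shape, x.dtype, y)
```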

In [0]:
## CONVERTING THE DATA

def converting_data(img_tab) :
  """ convert the data in float between 0 et 1"""
  # Convert to float
  img_tab = img_tab.astype('float32')
  # Normalize inputs from [0; 255] to [0; 1]
  imgnorm_tab = img_tab / 255

  return imgnorm_tab

In [0]:
## MIXING THE DATA FOR A GOOD LEARNING
def shuffle_data(img_tab, label_tab, nb_pict) :
  """ create shuffled tabs of data and corresponding labels """
  imgnorm_tab = converting_data(img_tab)
    
  xy_tab = []
  for i in range(nb_pict):
    label_tab_i = []
    color = int(label_tab[i][0].split('color')[1]) # labels are converted into integers between 0 and NB_CLASS - 1
    label_tab_i.append(color)
    
    xy = [imgnorm_tab[i],label_tab_i]
    xy_tab.append(xy)
    
  random.shuffle(xy_tab)
  
  x_tab = np.empty((nb_pict,400,400,NB_CHANNEL))
  y_tab = np.empty((nb_pict,NB_CARACTERISTICS))
  
  for i in range(nb_pict):
    x_tab[i] = xy_tab[i][0]
    y_tab[i] = xy_tab[i][1]
  
  
  del imgnorm_tab
  del xy_tab
  return [x_tab,y_tab]
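`shuffle_data` pairs each image with its label in a Python list, shuffles the list, then copies everything into new float64 arrays. A lighter sketch, assuming the images and labels are already NumPy arrays of the same length: draw one random permutation and index both arrays with it, which keeps `x[i]` aligned with `y[i]`. The helper `shuffle_together` is hypothetical and not part of the notebook.

```python
import numpy as np

def shuffle_together(x, y, seed=None):
    """Shuffle images and labels with one shared random permutation,
    so that x[i] still corresponds to y[i] afterwards."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    perm = np.random.default_rng(seed).permutation(len(x))
    return x[perm], y[perm]

# Toy data: every 2x2x3 "image" is filled with its own label value
labels = np.arange(10)
images = labels.reshape(-1, 1, 1, 1) * np.ones((1, 2, 2, 3), dtype=np.float32)

images_s, labels_s = shuffle_together(images, labels, seed=0)
print(labels_s)
```

In the notebook this could be applied to the output of `converting_data`, which already produces float32 values, instead of going through the intermediate `xy_tab` list.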


In [0]:
# shuffle_data() builds the normalized x arrays and the integer y labels directly,
# so no intermediate copies of the images or labels are needed here.

print('[INFO] shuffle datasets...')
[x_train, y_train] = shuffle_data(img_train, label_train, NB_TRAIN)
[x_validation, y_validation] = shuffle_data(img_validation, label_validation, NB_VALIDATION)

print('x_train.shape=', x_train.shape)
print('x_validation.shape=', x_validation.shape)

del img_train
del img_validation
del label_train
del label_validation


In [0]:
## GENERATION OF THE COLOR FEATURES TABS

def gen_features_tabs(y_tab, y_feature, i_feature) :
  ''' generate a tab containing the labels for one specific feature '''
  for i in range(len(y_tab)) :
    y_feature[i] = y_tab[i][i_feature]



In [0]:
y_train_color = np.empty(NB_TRAIN)
y_validation_color = np.empty(NB_VALIDATION)
gen_features_tabs(y_train, y_train_color, 0)
gen_features_tabs(y_validation, y_validation_color, 0)

print('[INFO] generating color label tabs...')
print('y_train_color.shape=', y_train_color.shape)
print('y_validation_color.shape=', y_validation_color.shape)

print(y_validation_color)

In [0]:
# Convert class vectors to binary class matrices ("one hot encoding")

print('[INFO] converting class vectors...')
y_train_color = keras.utils.to_categorical(y_train_color, NB_COLOR)
y_validation_color = keras.utils.to_categorical(y_validation_color, NB_COLOR)


print('y_train_color.shape =', y_train_color.shape)
print('y_validation_color.shape = ', y_validation_color.shape)
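One-hot encoding turns each integer class into a vector with a single 1 at the class index. The sketch below shows the same encoding with plain NumPy, indexing rows of an identity matrix; `keras.utils.to_categorical` performs this conversion in the cell above. Since `y_train_color` is created with `np.empty` (float64), the labels are cast to integers before indexing.

```python
import numpy as np

NB_COLOR = 8
y = np.array([3., 0., 7., 3.])  # float color labels, as stored in y_train_color

# Row k of the identity matrix is the one-hot vector for class k
y_onehot = np.eye(NB_COLOR, dtype=np.float32)[y.astype(int)]

print(y_onehot.shape)
print(np.argmax(y_onehot, axis=1))  # recovers the original class indices
```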

## The Convolutional Neural Network

Now we need to create and compile the CNN that will classify our data.

In [0]:
## CREATION OF THE COLOR NEURAL NETWORK

print('[INFO] training COLOR NETWORK...')

model_color = Sequential()

model_color.add(Conv2D(filters=32, kernel_size=(3,3), strides=1, padding='valid', activation='relu', input_shape=(PICT_WIDTH, PICT_WIDTH, NB_CHANNEL)))
model_color.add(BatchNormalization(axis=-1))
model_color.add(MaxPooling2D(pool_size=(3,3), strides=None, padding='valid', data_format=None))

model_color.add(Conv2D(filters=64, kernel_size=(3,3), strides=1, padding='valid', activation='relu'))
model_color.add(BatchNormalization(axis=-1))
model_color.add(MaxPooling2D(pool_size=(3,3), strides=None, padding='valid', data_format=None))

model_color.add(Conv2D(filters=128, kernel_size=(3,3), strides=1, padding='valid', activation='relu'))
model_color.add(BatchNormalization(axis=-1))
model_color.add(MaxPooling2D(pool_size=(3,3), strides=None, padding='valid', data_format=None))
model_color.add(Dropout(0.75))

# first (and only) set of FC => RELU layers
model_color.add(Flatten())
model_color.add(Dense(64))
model_color.add(Activation("relu"))
model_color.add(BatchNormalization())
 
# use a *softmax* activation for single-label, multi-class classification (one color per picture)
model_color.add(Dense(NB_CLASS))
model_color.add(Activation('softmax'))  
  
  
opt_color = keras.optimizers.Adam(lr=0.00001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)
model_color.compile(loss='categorical_crossentropy', optimizer= opt_color, metrics=['accuracy'])
cb = callbacks.ModelCheckpoint('model_color.h5', save_best_only=True, period=1)

hist = model_color.fit(x_train, y_train_color, validation_data=(x_validation, y_validation_color), epochs= 200, batch_size=32, callbacks = [cb])
loss_and_metrics = model_color.evaluate(x_validation, y_validation_color, batch_size=32)
print('loss =', loss_and_metrics[0], 'accuracy =', loss_and_metrics[1])

model_color.summary()
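The size of each feature map can be worked out by hand, which shows where most of the weights sit. With `padding='valid'`, a 3x3 convolution with stride 1 removes 2 pixels per dimension, and `MaxPooling2D(pool_size=(3,3), strides=None)` uses a stride equal to the pool size. This sketch only does the arithmetic (it does not build a Keras model) and counts only convolution and Dense weights, not BatchNormalization parameters.

```python
def conv_valid(size, kernel=3, stride=1):
    # 'valid' padding: the kernel must fit entirely inside the input
    return (size - kernel) // stride + 1

def pool_valid(size, pool=3):
    # MaxPooling2D with strides=None uses stride = pool_size
    return (size - pool) // pool + 1

size, channels_in = 400, 3
params = 0
for filters in (32, 64, 128):
    params += 3 * 3 * channels_in * filters + filters  # conv kernel weights + biases
    size = pool_valid(conv_valid(size))
    channels_in = filters
    print(f"after Conv2D({filters}) + MaxPool: {size}x{size}x{filters}")

flat = size * size * channels_in   # length of the Flatten output
dense_params = flat * 64 + 64      # weights + biases of Dense(64)
print("flatten:", flat, "| conv params:", params, "| Dense(64) params:", dense_params)
```

The 13x13x128 output flattens to 21,632 values, so the first Dense layer holds far more weights than all three convolutions combined; `model_color.summary()` should report the same shapes.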

## Results

Only color (8):

* data: 30x12 train + 10x12 validation
* 2D(34)+MaxPool+2D(64)+MaxPool+2D(128)+MaxPool+Dropout(0.75)+Dense(64)+Dense(8)
* epoch = 100, batch size = 32
* loss: 0.0014 - acc: 1.0000 - val_loss: 1.2460e-04 - val_acc: 1.0000


color(8) + thick(6):
* data: 60x12 (train) + 15x12 (validation)
* 2D(34)+2D(34)+MaxPool + 2D(64)+2D(64)+MaxPool + 2D(128)+2D(128)+MaxPool+Dropout(0.75)+Dense(64)+Dense(8)
* epoch = 100, batch size = 32
* loss: 5.9750e-04 - acc: 1.0000 - val_loss: 4.0895e-05 - val_acc: 1.0000


color(8) + pitch(12):
* data: 3x8x12x2 train + 3x8x12x1 validation
* 2D(34)+MaxPool+2D(64)+MaxPool+2D(128)+MaxPool+Dropout(0.75)+Dense(64)+Dense(8)
* opt = adam with lr=0.0001
* epoch = 50, batch size = 32
* loss: 0.2398 - acc: 0.9219 - val_loss: 0.1871 - val_acc: 0.9444

color(8) + thick(3) + pitch(12):
* data: 3x8x12x2 train + 3x8x12x1 validation
* 2D(34)+MaxPool+2D(64)+MaxPool+2D(128)+MaxPool+Dropout(0.75)+Dense(64)+Dense(8)
* opt = adam with lr=0.0001
* epoch = 100, batch size = 32
* loss: 0.2398 - acc: 0.9219 - val_loss: 0.1871 - val_acc: 0.9444

color(8) + thick(3) + pitch(12):
* data: 3x8x12x2 train + 3x8x12x1 validation
* 2D(34)+MaxPool+2D(64)+MaxPool+2D(128)+MaxPool+Dropout(0.75)+Dense(64)+Dense(8)
* opt = adam with lr=0.00001
* version 1:
  * lr=0.00001, epoch = 500, batch size = 32
  * loss: 0.1058 - acc: 0.9688 - val_loss: 0.1105 - val_acc: 0.9583
* version 2:
  * lr=0.00001, epoch = 1000, batch size = 32
  * loss: 0.0639 - acc: 0.9774 - val_loss: 0.0858 - val_acc: 0.9653
* version 3:
  * lr=0.00005, epoch = 700, batch size = 32
  * train loss: 0.0767 - train accuracy: 0.9705 - validation loss: 0.0698 - validation acc: 0.9722 at epoch 640
* version 4:
  * lr=0.000035, epoch = 700, batch size = 32
  * train loss: 0.0958 - train accuracy: 0.9653 - validation loss: 0.0731 - validation acc: 0.9792 at epoch 330




## Graphics


In [0]:
import matplotlib
import matplotlib.pyplot as plt

In [0]:
# load the best model (lowest val_loss) saved by the ModelCheckpoint callback
print("[INFO] loading network...")
model = load_model('model_color.h5')

In [0]:
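# summarize history for accuracy (the key is 'acc' in older Keras versions and 'accuracy' in newer ones)
acc_key = 'acc' if 'acc' in hist.history else 'accuracy'
plt.plot(hist.history[acc_key])
plt.plot(hist.history['val_' + acc_key])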
plt.title('model accuracy')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()

# summarize history for loss
plt.plot(hist.history['loss'])
plt.plot(hist.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'validation'], loc='upper left')
plt.show()


## Test

In [0]:
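def test_accuracy(y_tab, y_guessed):
  """ prints and returns the proportion of predictions in y_guessed that match the labels in y_tab """
  accuracy_sum = 0
  for i in range(len(y_tab)):
    if y_tab[i] == y_guessed[i]:
      accuracy_sum += 1
  accuracy = accuracy_sum / len(y_tab)
  print('accuracy = ', accuracy)
  return accuracy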
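`test_accuracy` compares labels one by one in a Python loop. A vectorized alternative, sketched below with a hypothetical `accuracy_and_confusion` helper, also returns a confusion matrix (rows are true colors, columns are predicted colors), which can show whether errors concentrate on particular pairs of colors, for instance the dark and bright versions of the same hue. It accepts labels of shape `(N,)` or `(N, 1)`, such as `y_testgenstat`.

```python
import numpy as np

def accuracy_and_confusion(y_true, y_pred, nb_class=8):
    """Vectorized accuracy plus a confusion matrix (rows = true class, columns = predicted class)."""
    y_true = np.asarray(y_true).reshape(-1).astype(int)  # accepts shape (N,) or (N, 1)
    y_pred = np.asarray(y_pred).reshape(-1).astype(int)
    accuracy = float(np.mean(y_true == y_pred))
    confusion = np.zeros((nb_class, nb_class), dtype=int)
    np.add.at(confusion, (y_true, y_pred), 1)  # count each (true, predicted) pair
    return accuracy, confusion

acc, cm = accuracy_and_confusion([[0.], [1.], [2.], [2.]], [0, 1, 2, 5])
print(acc)
print(cm[2])
```

In the notebook it could be called as `accuracy_and_confusion(y_testgenstat, y_testgen, NB_COLOR)`.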
  

In [0]:
NB_VERSION_TESTGEN = 1
NB_TESTGEN = NB_THICK*NB_COLOR*NB_PITCH*NB_VERSION_TESTGEN

print("[INFO] generating testgen dataset...")
generate_pict(NB_VERSION_TESTGEN, 'testgen',0, [1,1,1])

label_testgen = []

print("[INFO] loading the testgen dataset...")
img_testgen = np.empty((0,400,400,NB_CHANNEL))
for imgP in sorted(os.listdir( "./audiosynthesis_dl/data/pitch_img/img_testgen")[:]) :
  if imgP.split(".")[-1] != "git" and imgP.split(".")[-1] != "gitignore":
    img = image.load_img( "./audiosynthesis_dl/data/pitch_img/img_testgen/"+imgP, 
                             target_size=(400, 400),
                             color_mode='rgb')
    img_testgen = np.concatenate((img_testgen,np.reshape(img,(1,400,400,NB_CHANNEL))),axis=0)
    l = imgP.split("_")
    labels = l[2:2+NB_CARACTERISTICS]
    label_testgen.append(labels)





In [0]:
print("[INFO] shuffle testgen dataset...")
[x_testgen, y_testgenstat] = shuffle_data(img_testgen, label_testgen, NB_TESTGEN)  

# load the trained convolutional neural network
print("[INFO] loading network...")
model = load_model('model_color.h5')

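# predicted color index for each picture (predict_classes was removed from recent Keras versions)
y_testgen = np.argmax(model.predict(x_testgen), axis=-1)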

print('x_testgen.shape=', x_testgen.shape)
print('y_testgenstat.shape=', y_testgenstat.shape)

# show the inputs and predicted outputs
for i in range(len(y_testgen)):
  print("label y_testgen[%s] = %s" % (i, y_testgen[i]))
  
test_accuracy(y_testgenstat, y_testgen)
  

In [0]:
i = 3
print('x_testgen.shape', x_testgen.shape, 'dtype', x_testgen.dtype)
print('y_testgen[{}]={}'.format(i, y_testgen[i]))
print('y_testgenstat[{}]={}'.format(i, y_testgenstat[i]))
plt.imshow(x_testgen[i,:].reshape(400,400,NB_CHANNEL), cmap = matplotlib.cm.binary)
plt.axis("off")
plt.show()
plt.gcf().clear()

## Sound synthesis according to the predicted label

The full independent code is available here: https://colab.research.google.com/drive/1KzM-NMSlj87XU_--cifmipAuYd57zZc4


In [0]:
import math as m
import IPython

In [0]:
volume = 0.5     # range [0.0, 1.0]
fs = 44100       # sampling rate, Hz, must be integer
duration = 4.0   # in seconds, may be float
f0 = 440.0        # sine frequency, Hz, may be float
label = 0         # default value; replaced below by the label predicted by the neural network
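`generate_pict` names its pitches `69 + p`, where 69 is A4 (la 440) in MIDI numbering, and the synthesis cell below computes `fm = (2**(0/12))*f0`, i.e. a shift of 0 semitones from 440 Hz. A hypothetical `midi_to_hz` helper shows the equal-temperament formula behind both, f = 440 * 2^((n - 69)/12). This notebook only predicts color, so this mapping is not wired into the synthesis here.

```python
def midi_to_hz(note, a4=440.0):
    """Equal temperament: each semitone multiplies the frequency by 2**(1/12);
    MIDI note 69 is A4 (la 440)."""
    return a4 * 2 ** ((note - 69) / 12)

# The 12 pitches used by generate_pict span MIDI notes 69..80
for note in (69, 72, 80):
    print(note, round(midi_to_hz(note), 2))
```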

In [0]:
def sinusoid(label, duration) :
  """ generate an audio of a sinusoid """
  samples = []
  fm  =  (2**(0/12))*f0
  print(fm)
  # generate samples
  for i in range(int(fs*duration)) :
    t = i/fs # seconds
    a = 2*m.pi*fm*t # radians
    v = volume*m.sin(a)
    for j in range(1,label+1):
      b = 2*m.pi*j*fm*t
      v += (volume/j)*(m.sin(b))
    samples.append(v)
  return samples
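`sinusoid` evaluates every sample in a Python loop, which is slow for 4 s at 44.1 kHz. Below is a vectorized sketch of the same formula with NumPy, written as a hypothetical `harmonic_tone` helper: a sine at `f0` plus, for j = 1..label, a partial at j*f0 with amplitude volume/j. As in the loop version, j = 1 adds a second copy of the fundamental. The demo checks one sample against a direct per-sample evaluation.

```python
import math
import numpy as np

def harmonic_tone(label, duration, f0=440.0, fs=44100, volume=0.5):
    """Vectorized equivalent of sinusoid(): a sine at f0 plus `label` partials j*f0
    with amplitude volume/j (j = 1 doubles the fundamental, as in the loop version)."""
    t = np.arange(int(fs * duration)) / fs  # sample times in seconds
    signal = volume * np.sin(2 * np.pi * f0 * t)
    for j in range(1, int(label) + 1):
        signal += (volume / j) * np.sin(2 * np.pi * j * f0 * t)
    return signal

# Compare one sample with a direct evaluation of the same formula
demo_fs, demo_label = 8000, 3
tone = harmonic_tone(demo_label, 0.5, fs=demo_fs)
n = 17
t_n = n / demo_fs
ref = 0.5 * math.sin(2 * math.pi * 440 * t_n) + sum(
    (0.5 / j) * math.sin(2 * math.pi * j * 440 * t_n) for j in range(1, demo_label + 1))
print(len(tone), abs(tone[n] - ref) < 1e-9)
```

The summed amplitude can exceed 1.0; `IPython.display.Audio` rescales the signal by default (`normalize=True`), so playback still works.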


In [0]:
i = 4
label_i = y_testgen[i]

sin = sinusoid(label_i, duration)
IPython.display.Audio(sin, rate=fs)