
# Gender Voice Recognition 

Please see the [README](https://github.com/hannahier94/ITC_FinalProject_Gender_Voice_Recognition/blob/master/README.md) for more information

In [1]:
import warnings
warnings.filterwarnings('ignore')

In [18]:
! pip install googletrans


In [19]:
import pandas as pd
import numpy as np
import os
import tqdm
from googletrans import Translator
import speech_recognition as sr
from text_to_speech import get_large_audio_transcription
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, LSTM 
from tensorflow.keras.callbacks import ModelCheckpoint, TensorBoard, EarlyStopping
from tensorflow.keras.optimizers import Adam
from sklearn.model_selection import train_test_split


In [None]:
rawdata_path = 'gender-recognition-by-voice/'
cat_file = 'balanced-all.csv'

df = pd.read_csv(rawdata_path + cat_file)
df.head()

Unnamed: 0,filename,gender
0,data/cv-other-train/sample-069205.npy,female
1,data/cv-valid-train/sample-063134.npy,female
2,data/cv-other-train/sample-080873.npy,female
3,data/cv-other-train/sample-105595.npy,female
4,data/cv-valid-train/sample-144613.npy,female


The dataset is exactly balanced, with a 50/50 split of audio files per gender:

In [None]:
df['gender'].value_counts()

female    33469
male      33469
Name: gender, dtype: int64

In [None]:
LABEL2INT = {'female':0,
            'male':1}

INT2LABEL = {0: 'female',
             1: 'male'}

result_path = 'results'
feature_path = '/features.npy'
label_path = '/labels.npy'


def load_data(path=rawdata_path, 
              df = df, 
              result_path = result_path,
              vector_length=128,
              feature_path = feature_path,
              label_path = label_path,
              file_col = 'filename',
              gender_col = 'gender'):
    
    """Load the gender recognition dataset from the `data` folder.
    The first run builds the arrays from the individual .npy files and
    caches them; every later run loads results/features.npy and
    results/labels.npy directly, which is much faster.
    param path: path where your data is located
    param df: dataframe to use
    param result_path : path to store results
    param vector_length : size of each sample 
    param feature_path : path to load/store features
    param label_path : path to load/store labels
    param file_col: df column to reference files
    param gender_col : df column to reference gender labels
    returns : X, y data """
    
    feature_path = result_path + feature_path
    label_path = result_path + label_path
    
    if not os.path.isdir(result_path):
        os.mkdir(result_path)
    
    # if features & labels already loaded/bundled, load them 
    if os.path.isfile(feature_path) and os.path.isfile(label_path):
        X = np.load(feature_path)
        y = np.load(label_path)
        return X, y

    n_samples = len(df)
    print("Total samples:", n_samples)
    
    # use the gender_col argument instead of a hard-coded column name
    dist = df[gender_col].value_counts()
    
    for gender, count in dist.items():
        print("Total {} samples: {}".format(gender,count))

    X = np.zeros((n_samples, vector_length))
    y = np.zeros((n_samples, 1))
    
    for i, (filename, gender) in tqdm.tqdm(enumerate(zip(df[file_col], 
                                                         df[gender_col])), 
                                            "Loading data", 
                                           total = n_samples):
        features = np.load(path + filename)
        X[i] = features
        y[i] = LABEL2INT[gender]
    
    np.save(feature_path, X)
    np.save(label_path, y)
    
    return X, y
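
The caching step in `load_data` can be seen in isolation in the following minimal sketch. The helper `load_or_compute` and the toy `expensive_features` function are hypothetical names used only for illustration; they are not part of the project.

```python
import os
import tempfile

import numpy as np


def load_or_compute(cache_file, compute):
    """Return the cached array if it exists, otherwise compute and cache it."""
    if os.path.isfile(cache_file):
        return np.load(cache_file)
    result = compute()
    np.save(cache_file, result)
    return result


calls = []  # records how often the expensive step actually runs


def expensive_features():
    # stand-in for the slow loop over the individual .npy files
    calls.append(1)
    return np.arange(6, dtype=float).reshape(2, 3)


with tempfile.TemporaryDirectory() as tmp:
    cache_file = os.path.join(tmp, "features.npy")
    first = load_or_compute(cache_file, expensive_features)   # computes and saves
    second = load_or_compute(cache_file, expensive_features)  # loads from disk

print(len(calls), np.array_equal(first, second))
```

One consequence of this pattern is that a stale cache is silently reused: if the CSV or the raw files change, the cached arrays in `results` have to be deleted by hand.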

In [None]:
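def split_data(X, y, indices, test_size=0.1, valid_size=0.1):
    
    """ A function to split X,y data
    param X: X data
    param y: y data
    param indices: row indices of the dataframe, split alongside X and y
    param test_size: test fraction 
    param valid_size: validation fraction (of the data left after the test split)
    return: a dictionary of X/y for train/val/test, plus the train/test indices
    (indices_train covers both the training and the validation rows) """

    X_train, X_test, y_train, y_test, indices_train, indices_test = train_test_split(X, y, 
                                                                            indices,
                                                                            test_size=test_size, 
                                                                            random_state=7)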

    X_train, X_valid, y_train, y_valid = train_test_split(X_train, y_train, 
                                                          test_size=valid_size, 
                                                          random_state=7)
    return {
        "X_train": X_train,
        "X_valid": X_valid,
        "X_test": X_test,
        "y_train": y_train,
        "y_valid": y_valid,
        "y_test": y_test,
        "indices_train" : indices_train, 
        "indices_test" : indices_test
    }

In [None]:
X, y = load_data()

data = split_data(X, y, 
                  df.index.to_numpy(), 
                  test_size=0.1, 
                  valid_size=0.1)
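
Because the validation split is taken from what remains after the test split, the two 0.1 fractions do not give equal set sizes. The sketch below works out the arithmetic, assuming that scikit-learn rounds the held-out size up when it is given as a fraction; `split_sizes` is a hypothetical helper, not part of the notebook.

```python
import math


def split_sizes(n_samples, test_size=0.1, valid_size=0.1):
    """Sizes of the train/validation/test sets for two successive splits."""
    n_test = math.ceil(test_size * n_samples)   # held out first
    n_rest = n_samples - n_test
    n_valid = math.ceil(valid_size * n_rest)    # taken from the remainder
    n_train = n_rest - n_valid
    return n_train, n_valid, n_test


# 33,469 files per gender, as counted above
n_train, n_valid, n_test = split_sizes(33469 * 2)
print(n_train, n_valid, n_test)
```

Under that assumption the split is roughly 81% train, 9% validation and 10% test.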

### Model Set Up

For the model setup, we tried various dropout rates (0.1 and 0.3) and L2 regularization applied to different combinations of layers, but overall the regularization had no significant effect on the model. We achieved the best results with the following architecture:

In [None]:
## We only need one hidden layer 
## fundamental frequency - to determine male/female; represents how high / low the voice sounds
## autocorrelation of the signal - first peak = fundamental frequency

## mfcc_f0

## another possibility : original CNN design

## data augmentation to add some noise


def create_model(vector_length = 128, 
                 lr = 0.001, 
                 loss = "binary_crossentropy",
                 metric = ['accuracy']) :
    
    """
    5 hidden dense layers from 256 units to 64.
    param vector_length : size of each sample 
    param lr: learning rate
    param loss: loss eval
    param metric: metric to train
    returns: model
    """
    
    RELU = "relu"
    SIGMOID = "sigmoid"
    n_labels = 1

    model = Sequential()
    
    # first layer: no activation is set, so it is a plain linear projection
    model.add(Dense(256, input_shape=(vector_length,)))
    
    model.add(Dense(256, activation = RELU ))
    model.add(Dense(128, activation = RELU))
    model.add(Dense(128, activation = RELU))
    model.add(Dense(64, activation = RELU))
    
    model.add(Dense(n_labels, activation= SIGMOID))
    
    
    optimizer = Adam(learning_rate = lr)
    
    model.compile(loss = loss, 
                  metrics = metric, 
                  optimizer = optimizer)
    
    model.summary()
    
    return model
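
The notes at the top of the cell above mention estimating the fundamental frequency from the first peak of the signal's autocorrelation. A minimal sketch of that idea follows; it is not used anywhere in this notebook, and the function name `estimate_f0`, the 50-400 Hz search range and the synthetic sine tones are all assumptions made for illustration.

```python
import numpy as np


def estimate_f0(signal, sample_rate, fmin=50.0, fmax=400.0):
    """Estimate the fundamental frequency (Hz) from the autocorrelation peak."""
    signal = np.asarray(signal, dtype=float)
    signal = signal - signal.mean()
    # autocorrelation for non-negative lags only
    acf = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    # only search lags that correspond to plausible voice pitches
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(acf) - 1)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag


sample_rate = 8000
t = np.arange(4000) / sample_rate            # half a second of audio
low_tone = np.sin(2 * np.pi * 120 * t)       # synthetic low-pitched "voice"
high_tone = np.sin(2 * np.pi * 210 * t)      # synthetic high-pitched "voice"

print(estimate_f0(low_tone, sample_rate), estimate_f0(high_tone, sample_rate))
```

The estimate is quantized to whole-sample lags, so it is only approximate; real speech would also need framing and voiced/unvoiced detection before a value like this could serve as a feature.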

In [None]:
model = create_model()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
dense_12 (Dense)             (None, 256)               33024     
_________________________________________________________________
dense_13 (Dense)             (None, 256)               65792     
_________________________________________________________________
dense_14 (Dense)             (None, 128)               32896     
_________________________________________________________________
dense_15 (Dense)             (None, 128)               16512     
_________________________________________________________________
dense_16 (Dense)             (None, 64)                8256      
_________________________________________________________________
dense_17 (Dense)             (None, 1)                 65        
Total params: 156,545
Trainable params: 156,545
Non-trainable params: 0
________________________________________________
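
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The short sketch below reproduces the numbers; `dense_params` is a hypothetical helper.

```python
def dense_params(n_inputs, n_units):
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


# input vector length followed by the width of each Dense layer
layer_sizes = [128, 256, 256, 128, 128, 64, 1]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts, sum(counts))
```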

### Fitting the Model

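We tried early stopping with the settings "val_loss", "val_accuracy", and "min". We achieved the best accuracy with `mode = "min"`, which is the configuration used below (Keras's `EarlyStopping` monitors `val_loss` unless told otherwise).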

In [None]:
# use tensorboard to view metrics
tensorboard = TensorBoard(log_dir="logs")

early_stopping = EarlyStopping(mode = "min", 
                               patience = 5, 
                               restore_best_weights = True)

batch_size = 64
epochs = 300

history = model.fit(data["X_train"], 
                  data["y_train"], 
                  epochs=epochs, 
                  batch_size=batch_size, 
                  validation_data=(data["X_valid"], 
                                   data["y_valid"]),
                  callbacks=[tensorboard, early_stopping])

Epoch 1/300
Epoch 2/300
Epoch 3/300
Epoch 4/300
Epoch 5/300
Epoch 6/300
Epoch 7/300
Epoch 8/300
Epoch 9/300
Epoch 10/300
Epoch 11/300
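
The patience rule behind the early stop can be sketched in plain Python. This is a simplified model of the callback, not Keras code, and the validation losses are made-up values chosen only to illustrate the rule; the log above does not show the real ones.

```python
def early_stopping_epoch(val_losses, patience=5):
    """Return (last epoch run, epoch with the best loss) under a patience rule."""
    best_loss = float("inf")
    best_epoch = None
    wait = 0  # epochs since the last improvement
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best_loss:
            best_loss, best_epoch, wait = loss, epoch, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch, best_epoch
    return len(val_losses), best_epoch


# hypothetical validation losses: the best value is reached at epoch 6
losses = [0.60, 0.45, 0.38, 0.33, 0.30, 0.27, 0.28, 0.29, 0.28, 0.30, 0.31]
stopped_at, best_epoch = early_stopping_epoch(losses, patience=5)
print(stopped_at, best_epoch)
```

With `restore_best_weights = True`, the model that is evaluated afterwards is the one from the best epoch, not from the last epoch that ran.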


In [None]:
print("\nEvaluating on {} samples...".format(len(data['X_test'])))
print("_____________________________\n")

loss, accuracy = model.evaluate(data["X_test"],
                                data["y_test"], 
                                verbose=0)

print("Loss: {}".format(round(loss,5)))
print("Accuracy: {}% ".format(round(accuracy * 100,2)))


Evaluating on 6694 samples...
_____________________________

Loss: 0.23114
Accuracy: 92.37% 


In [None]:
full_test_path = '/home/hanna/Downloads/final_project/gender-recognition-by-voice/data/cv-other-train/sample-006859.npy'

In [None]:

test_ind = data['indices_test'][0]

test_file , test_gender = df.iloc[test_ind]['filename'], df.iloc[test_ind]['gender']

test_vals = data['X_test'][0].reshape(1,-1)

pred = model.predict(test_vals)
pred = pred[0][0]

print('Prediction : ', INT2LABEL[round(pred)])
print('True : ', test_gender)
print('__________________')
#print('Audio : \n',get_large_audio_transcription(test_file))
print('\nAudio : \n\n',get_large_audio_transcription('gettysburg10.wav'))

Prediction :  male
True :  male
__________________

Audio : 

 Four score and seven years ago our fathers brought forth on this continent a new nation. Conceived in liberty and dedicated to the proposition that all men are created equal. 


In [20]:
# MFCC f0 (fundamental frequency) detection 
# cnn 
# data augmentation 
# spacy dependency parsing 
# anaphora resolution 
# opus project 
# tatoeba english-hebrew 
# http://www.manythings.org/anki/ 
# opus — opensubtitles 

# anaphora resolution python

#https://hebrew-nlp.co.il/ 


heb_I =  'אני'
heb_she = 'היא'
heb_he = 'הוא'

test_corpus = ['I am testing this.',
              'I am walking.',
              'I appreciate that she helped because I am confused.']

test_gender = ['F', 'M', 'F']

translator = Translator()

for sentence, gender in list(zip(test_corpus, test_gender)):
    original = sentence
    res = translator.translate(sentence, dest='he').text
    
    
    ind = [i for i,x in enumerate(res.split()) if x == heb_I]
    
    if gender == 'F':
        replacer = 'she '
        heb_pron = heb_she
    else:
        replacer = 'he '
        heb_pron = heb_he
    sentence = sentence.replace('I ', replacer)
    
    gender_res = translator.translate(sentence, dest='he').text
    
    temp = gender_res.split()
    for i in ind:
        temp[i] = heb_I

    print("""\n\nGender Change : {} : {} => {}""".format(
                                                gender,
                                                original,
                                                ' '.join(temp)
                                                ))

    print('______________________________________________________')




Gender Change : F : I am testing this. => אני בודקת את זה.
______________________________________________________


Gender Change : M : I am walking. => אני הולך.
______________________________________________________


Gender Change : F : I appreciate that she helped because I am confused. => אני מעריכה שהיא עזרה בגלל אני מבולבלת.
______________________________________________________
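
One fragile spot in the loop above is `sentence.replace('I ', replacer)`, which also matches the letter I at the end of other words and misses an "I" followed by punctuation. A possible alternative, sketched below, uses a word-boundary regular expression; `swap_first_person` is a hypothetical helper, not part of the notebook.

```python
import re


def swap_first_person(sentence, pronoun):
    """Replace the standalone word 'I' with the given pronoun."""
    return re.sub(r"\bI\b", pronoun, sentence)


text = "I think AI is fun and I agree."
naive = text.replace("I ", "she ")       # also rewrites the end of "AI"
safe = swap_first_person(text, "she")    # only touches the pronoun

print(naive)
print(safe)
print(swap_first_person("He left and so did I.", "she"))
```

Contractions such as "I'm" still need separate handling, since the pattern would turn them into "she'm".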


In [21]:
import spacy
# note: the 'en' shortcut was removed in spaCy 3; there, load 'en_core_web_sm' instead
nlp = spacy.load('en')

present_tenses = ['VBP', 'VBZ','VBG']


In [22]:
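def determine_chunks(pronouns, vals, text, 
                     present_tenses = present_tenses):
    
    """ Splits the parsed text into chunks, one per subject, and flags
    each chunk that has 'I' as its subject and contains a present-tense verb
    param pronouns: subject tokens as strings
    param vals: token index of each subject in text
    param text: the spaCy-parsed sentence
    param present_tenses: POS tags treated as present tense
    returns: list of tuples (chunk, int) where 1 means the chunk
    requires further manipulation and 0 means it can be left as is """
    
    chunks = []
    
    iterations = len(pronouns)
    
    for i in range(iterations):
        
        # each chunk runs from one subject up to the next one;
        # the first chunk also keeps any words before the first subject
        start = vals[i] if i > 0 else 0
        stop = vals[i+1] if i != (iterations - 1) else None
        span = text[start:stop]

        needs_change = (pronouns[i] == 'I' and
                        any(word.tag_ in present_tenses for word in span))

        chunks.append((str(span), int(needs_change)))
        
    return chunks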

In [23]:
from nltk import word_tokenize, pos_tag


def determine_tense_input(sentence, present_tenses = present_tenses): 
    
    """ Separates words into chunks based on the subjects of the sentence
    and sends them to the determine_chunks function to determine
    if each chunk requires further manipulation or not 
    param sentence: the English text to analyse
    param present_tenses: POS tags treated as present tense
    returns: list of tuples (portion, int) where 1 means the chunk
    requires further manipulation and 0 means it can be left as is """

    text = nlp(sentence)
    
    present = len([word for word in text 
                    if word.tag_ in present_tenses])
    
    # no present-tense verb: nothing to change, keep the sentence whole
    if present == 0 :
        return [(sentence, 0)]
    
    sub_toks = [(i, tok) for i, tok 
                in enumerate(text)
                if tok.dep_ == "nsubj"]


    vals = [x[0] for x in sub_toks]
    pronouns = [str(x[1]) for x in sub_toks]
    
    # no first-person subject: nothing to change either
    if 'I' not in pronouns:
        return [(sentence, 0)]
            
    return determine_chunks(pronouns, vals, text, present_tenses)
            
    
determine_tense_input("He was angry, so I left. I am walking away because he told me he is happy. I'm glad this happened")

[('He was angry, so', 0),
 ('I left.', 0),
 ('I am walking away because', 1),
 ('he told me', 0),
 ('he is happy.', 0),
 ("I'm glad", 1),
 ('this happened', 0)]
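
The chunking rule can be illustrated without spaCy. In the sketch below the part-of-speech tags and dependency labels are written by hand for one fragment of the sentence above; they are assumptions for illustration, not actual parser output, and `chunk_by_subject` is a hypothetical stand-in for the two functions.

```python
PRESENT_TAGS = {"VBP", "VBZ", "VBG"}


def chunk_by_subject(tokens):
    """Split (word, tag, dep) triples at each subject and flag 'I' + present tense."""
    starts = [i for i, (_, _, dep) in enumerate(tokens) if dep == "nsubj"]
    chunks = []
    for n, start in enumerate(starts):
        stop = starts[n + 1] if n + 1 < len(starts) else len(tokens)
        span = tokens[start:stop]
        first_person = span[0][0] == "I"
        has_present = any(tag in PRESENT_TAGS for _, tag, _ in span)
        text = " ".join(word for word, _, _ in span)
        chunks.append((text, int(first_person and has_present)))
    return chunks


# hand-annotated tokens for "I am walking away because he told me"
tokens = [
    ("I", "PRP", "nsubj"), ("am", "VBP", "aux"), ("walking", "VBG", "ROOT"),
    ("away", "RB", "advmod"), ("because", "IN", "mark"),
    ("he", "PRP", "nsubj"), ("told", "VBD", "advcl"), ("me", "PRP", "dobj"),
]

print(chunk_by_subject(tokens))
```

Only the first chunk is flagged: its subject is "I" and it contains present-tense verbs, while the second chunk has a different subject and a past-tense verb.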

In [24]:
import re

TOKEN_PATTERN = r"[\w]+|['.,!?;]"  # words, or single punctuation marks


def get_translation(output, gender):
    """ Translates the chunks to Hebrew, gendering first-person present-tense verbs
    param output: list of (chunk, int) tuples from determine_tense_input
    param gender: 'F' or 'M'
    returns: the translated sentence as a string """
    heb_I =  'אני'
    heb_she = 'היא'
    heb_he = 'הוא'
    if gender == 'F':
        pronoun, heb_replacer = 'she', heb_she
    elif gender == 'M':
        pronoun, heb_replacer = 'he', heb_he
    else:
        raise ValueError("gender must be 'F' or 'M'")
    ind = []      # expected token positions of heb_I in the final translation
    translate = ''
    length = 0    # running token count of the chunk-by-chunk translations
    translator = Translator()
    for sentence, val in output:
        res = translator.translate(sentence, dest='he').text
        res_list = re.findall(TOKEN_PATTERN, res)
        # swap 'I' for a gendered pronoun only in chunks flagged with 1
        replacer = pronoun if val > 0 else 'I'
        if val > 0 and heb_I in res_list:
            ind.append(res_list.index(heb_I) + length)
        length += len(res_list)
        sentence_list = re.findall(TOKEN_PATTERN, sentence)
        sentence = ' '.join([replacer if word == 'I' else word for word in sentence_list])
        translate += " " + sentence
    gender_res = translator.translate(translate, dest='he').text
    gender_res_list = re.findall(TOKEN_PATTERN, gender_res)
    for i in ind:
        # the pronoun may shift by one token between the two translations
        for j in (i, i - 1, i + 1):
            if 0 <= j < len(gender_res_list) and gender_res_list[j] == heb_replacer:
                gender_res_list[j] = heb_I
                break
    print("""\n\nGender Change : {} : {} => {}""".format(
                                                    gender,
                                                    ' '.join([x[0] for x in output]),
                                                    ' '.join(gender_res_list))
                                                    )
    return ' '.join(gender_res_list)

In [25]:
sentence = "He was angry, so I left. I am walking away because he told me he is happy. I'm glad this happened"
gender = 'F'
split_sentence = determine_tense_input(sentence)
print(get_translation(split_sentence, gender))



Gender Change : F : He was angry, so I left. I am walking away because he told me he is happy. I'm glad this happened => הוא כעס , אז עזבתי . אני הולכת כי הוא אמר לי שהוא מאושר . אני שמחה שזה קרה
הוא כעס , אז עזבתי . אני הולכת כי הוא אמר לי שהוא מאושר . אני שמחה שזה קרה


In [27]:
chunks = "I, like my other fellows, am happy to be here"

split_sentence = determine_tense_input(chunks)
print(get_translation(split_sentence, gender))



Gender Change : F : I, like my other fellows, am happy to be here => אני , כמו חברי האחרים , שמחה להיות כאן
אני , כמו חברי האחרים , שמחה להיות כאן


In [None]:
test_past_tense = """  
1. I watched TV last week.

2. We ate meat with my best friend yesterday.

3. The bus stopped a few minutes ago.

4. I met my wife 9 years ago.

5. She left the school in 2010.

6. He bought a new house last month.

7. Did she clean her home?

8. I read an interesting book last month.

9. We did a lot of shopping at the shopping mall.

10. He cut his finger and went to hospital.

11. She finished her work at six o’clock.

12. The rain stopped an hour ago.

13. It discovered a new land.

14. We watched a movie last weekend.

15. We were good friends.

16. You were at station.

17. I went to bed early yesterday.

18. George came home very late last night.

19. I forgot my wallet.

20. He had a dog last year.

21. Last year I traveled to Germany.

22. Two boys played with a ball.

23. An old lady walked with her cat.

24. A nurse brought a little girl baby to the park.

25. An old man sat down and read his book.

26. A large trunk came around the corner.

27. She finished all the exercices.

28. I enrolled to the pilates course.

29. Dr Smith healed the patient.

30. They bought 2 tickets for the U2 concert.

31. Michael studied hard all year.

32. Did you play football last day?

33. I missed the class last week.

34. My brother drank a glass of milk 2 hours ago.

35. They had a meeting with her colleagues.

36. They were students last year.

37. He smoked a cigarrette.

38. They lived in the Spain.

39. Alex changed his place.

40. I liked the film.

41. Did they lose the match?

42. A gardener swept up dead leaves.

43. We listened to music.

44. Where was she at 7 o’clock last night?

45. Amelia chose to stay with her father.

46. Mary forgot to turn off the light.

47. I cancelled my meeting for tomorrow.

48. I went to school yesterday.

49. We played basketball last Sunday.

50. We saw the Eiffel Tower.

I didn't eat.

"""

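for sentence in test_past_tense.replace('\n','').split('.')[1:-1]:
    if len(sentence) < 3:
        continue
    result = determine_tense_input(sentence)
    # a past-tense sentence should never produce a chunk flagged with 1
    flagged = isinstance(result, list) and any(flag == 1 for _, flag in result)
    if flagged:
        print(sentence, result, '\n')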