# Kevinho lyrics generator

## This program uses a dataset built by following the post below
Link: https://aneisse.com/post/2019-02-10-music-data-scraping/2019-02-10-music-data-scraping/, which uses R to extract songs from www.vagalume.com.

The songs were saved in txt format. I then followed the example at https://towardsdatascience.com/ai-generates-taylor-swifts-song-lyrics-6fd92a03ef7e, which uses an LSTM to generate Taylor Swift lyrics, and adapted it to generate Kevinho lyrics.


In [1]:
# Import the dependencies
import numpy as np
import pandas as pd
import sys 
from keras.models import Sequential
from keras.layers import LSTM, Activation, Flatten, Dropout, Dense, Embedding, TimeDistributed, CuDNNLSTM
from keras.callbacks import ModelCheckpoint
from keras.utils import np_utils

Using TensorFlow backend.


In [6]:
# Load the dataset and convert it to lowercase:
textFileName = 'kevinho.txt'
with open(textFileName, encoding='UTF-8') as f:
    raw_text = f.read()
raw_text = raw_text.lower()

In [7]:
# Map ints to chars and chars to ints:
chars = sorted(set(raw_text))
int_chars = {i: c for i, c in enumerate(chars)}  # int -> char
chars_int = {c: i for i, c in enumerate(chars)}  # char -> int
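To make the two lookup tables concrete, here is a minimal sketch on a toy string. The string `toy_text` and the helpers `encode` and `decode` are made up for illustration and are not part of the notebook. It only shows that encoding with `chars_int` and decoding with `int_chars` round-trips the text.

```python
# Toy stand-in for raw_text (hypothetical example string)
toy_text = "vai senta"

# Same mapping scheme as above: sorted unique characters, indexed from 0
chars = sorted(set(toy_text))
int_chars = {i: c for i, c in enumerate(chars)}   # int -> char
chars_int = {c: i for i, c in enumerate(chars)}   # char -> int

def encode(text):
    """Turn a string into a list of integer ids."""
    return [chars_int[ch] for ch in text]

def decode(ids):
    """Turn a list of integer ids back into a string."""
    return ''.join(int_chars[i] for i in ids)

encoded = encode(toy_text)
print(chars)
print(encoded)
print(decode(encoded))
```

Because the characters are sorted before numbering, the same text always produces the same ids, which matters when reloading saved weights later.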

In [21]:
n_chars = len(raw_text)
n_vocab = len(chars)
print('Total Characters : ' , n_chars) # number of all the characters in kevinho.txt
print('Total Vocab : ', n_vocab) # number of unique characters

Total Characters :  59498
Total Vocab :  64


In [19]:
# process the dataset:
seq_len = 100
data_X = []
data_y = []
for i in range(0, n_chars - seq_len, 1):
    # Input sequence (will be used as samples)
    seq_in  = raw_text[i:i+seq_len]
    # Output sequence (will be used as target)
    seq_out = raw_text[i + seq_len]
    # Store samples in data_X
    data_X.append([chars_int[char] for char in seq_in])
    # Store targets in data_y
    data_y.append(chars_int[seq_out])
n_patterns = len(data_X)
print( 'Total Patterns : ', n_patterns)

Total Patterns :  59398
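The sliding window is easier to see on a tiny input. The sketch below uses a made-up string and `seq_len = 3` instead of 100. Each window of `seq_len` characters is paired with the character that follows it, so the number of patterns is `n_chars - seq_len`.

```python
toy_text = "abcdefgh"   # hypothetical stand-in for raw_text
seq_len = 3             # the notebook uses 100

samples, targets = [], []
for i in range(len(toy_text) - seq_len):
    samples.append(toy_text[i:i + seq_len])   # input window
    targets.append(toy_text[i + seq_len])     # character right after the window

for s, t in zip(samples, targets):
    print(repr(s), '->', repr(t))

# One pattern per valid start position: n_chars - seq_len
n_toy_patterns = len(samples)
print(n_toy_patterns)

# Same arithmetic with the numbers printed by the notebook
expected_patterns = 59498 - 100
print(expected_patterns)
```

The last line reproduces the 59398 patterns reported above.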


In [22]:
# Reshape X to be suitable to go into LSTM RNN :
X = np.reshape(data_X , (n_patterns, seq_len, 1))
# Normalizing input data :
X = X/ float(n_vocab)
# One hot encode the output targets :
y = np_utils.to_categorical(data_y)
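As a rough illustration of the shapes involved, the sketch below repeats the three steps on toy data. To avoid importing Keras, it builds the one-hot targets from rows of an identity matrix. That is an assumed stand-in for `np_utils.to_categorical`, not the call used in the notebook, and the toy values are invented.

```python
import numpy as np

n_vocab = 4                      # toy vocabulary size (the notebook has 64)
data_X = [[0, 1, 2], [1, 2, 3]]  # two toy windows of length 3
data_y = [3, 0]                  # next-character id for each window

# Shape (samples, timesteps, features): one feature per timestep,
# scaled into [0, 1) by dividing by the vocabulary size
X = np.reshape(data_X, (len(data_X), 3, 1)) / float(n_vocab)

# One-hot targets: row k of the identity matrix encodes class k
# (assumed stand-in for np_utils.to_categorical)
y = np.eye(n_vocab)[data_y]

print(X.shape, y.shape)
print(y)
```

One difference worth keeping in mind: the identity-matrix version fixes the number of classes at `n_vocab`, while `to_categorical` without a `num_classes` argument infers it from the largest id in `data_y`.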

In [26]:
LSTM_layer_num = 4 # number of LSTM layers
layer_size = [256,256,256,256] # number of nodes in each layer

In [27]:
model = Sequential()

In [29]:
model.add(LSTM(layer_size[0], input_shape =(X.shape[1], X.shape[2]), return_sequences = True))

In [30]:
for i in range(1,LSTM_layer_num) :
    model.add(LSTM(layer_size[i], return_sequences=True))

In [31]:
model.add(Flatten())

In [32]:
model.add(Dense(y.shape[1]))
model.add(Activation('softmax'))
model.compile(loss = 'categorical_crossentropy', optimizer = 'adam')


In [33]:
model.summary()

Model: "sequential_2"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
lstm_1 (LSTM)                (None, 100, 256)          264192    
_________________________________________________________________
lstm_2 (LSTM)                (None, 100, 256)          525312    
_________________________________________________________________
lstm_3 (LSTM)                (None, 100, 256)          525312    
_________________________________________________________________
lstm_4 (LSTM)                (None, 100, 256)          525312    
_________________________________________________________________
flatten_1 (Flatten)          (None, 25600)             0         
_________________________________________________________________
dense_1 (Dense)              (None, 64)                1638464   
_________________________________________________________________
activation_1 (Activation)    (None, 64)                0         
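The parameter counts in the summary can be checked by hand. The sketch below uses the standard formulas for an LSTM layer (four gates, each with input weights, recurrent weights, and a bias) and for a dense layer. The helper names `lstm_params` and `dense_params` are made up for this check.

```python
def lstm_params(input_dim, units):
    """4 gates, each with input weights, recurrent weights and a bias."""
    return 4 * ((input_dim + units + 1) * units)

def dense_params(input_dim, units):
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units

seq_len, units, n_vocab = 100, 256, 64

first_lstm = lstm_params(1, units)              # input is one feature per timestep
stacked_lstm = lstm_params(units, units)        # fed by the previous LSTM's 256 outputs
dense = dense_params(seq_len * units, n_vocab)  # Flatten yields 100 * 256 inputs

print(first_lstm, stacked_lstm, dense)
```

Note that the dense layer alone holds about as many weights as the three stacked LSTM layers together, because `Flatten` feeds it all 100 timesteps.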

In [34]:
# Configure the checkpoint :
checkpoint_name = 'Weights-LSTM-improvement-{epoch:03d}-{loss:.5f}-bigger.hdf5'
checkpoint = ModelCheckpoint(checkpoint_name, monitor='loss', verbose = 1, save_best_only = True, mode ='min')
callbacks_list = [checkpoint]
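The placeholders in `checkpoint_name` are ordinary Python format fields. As far as I understand, Keras fills them in with the epoch number and the logged metrics when it saves a file. The sketch below applies `str.format` by hand with the values from the first epoch of the run in this notebook.

```python
checkpoint_name = 'Weights-LSTM-improvement-{epoch:03d}-{loss:.5f}-bigger.hdf5'

# {epoch:03d} -> zero-padded to 3 digits, {loss:.5f} -> 5 decimal places
filename = checkpoint_name.format(epoch=1, loss=3.08076)
print(filename)
```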

In [35]:
# Fit the model :
model_params = {'epochs':30,
                'batch_size':128,
                'callbacks':callbacks_list,
                'verbose':1,
                'validation_split':0.2,
                'validation_data':None,
                'shuffle': True,
                'initial_epoch':0,
                'steps_per_epoch':None,
                'validation_steps':None}
model.fit(X,
          y,
          epochs = model_params['epochs'],
           batch_size = model_params['batch_size'],
           callbacks= model_params['callbacks'],
           verbose = model_params['verbose'],
           validation_split = model_params['validation_split'],
           validation_data = model_params['validation_data'],
           shuffle = model_params['shuffle'],
           initial_epoch = model_params['initial_epoch'],
           steps_per_epoch = model_params['steps_per_epoch'],
           validation_steps = model_params['validation_steps'])

Train on 47518 samples, validate on 11880 samples
Epoch 1/30

Epoch 00001: loss improved from inf to 3.08076, saving model to Weights-LSTM-improvement-001-3.08076-bigger.hdf5
Epoch 2/30

Epoch 00002: loss improved from 3.08076 to 3.05434, saving model to Weights-LSTM-improvement-002-3.05434-bigger.hdf5
Epoch 3/30

Epoch 00003: loss improved from 3.05434 to 3.05185, saving model to Weights-LSTM-improvement-003-3.05185-bigger.hdf5
Epoch 4/30

Epoch 00004: loss improved from 3.05185 to 3.05142, saving model to Weights-LSTM-improvement-004-3.05142-bigger.hdf5
Epoch 5/30

Epoch 00005: loss improved from 3.05142 to 3.03107, saving model to Weights-LSTM-improvement-005-3.03107-bigger.hdf5
Epoch 6/30

Epoch 00006: loss improved from 3.03107 to 2.85542, saving model to Weights-LSTM-improvement-006-2.85542-bigger.hdf5
Epoch 7/30

Epoch 00007: loss improved from 2.85542 to 2.51045, saving model to Weights-LSTM-improvement-007-2.51045-bigger.hdf5
Epoch 8/30

Epoch 00008: loss improved from 2.51045

<keras.callbacks.callbacks.History at 0x7f4079c11190>
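The "Train on 47518 samples, validate on 11880 samples" line follows from `validation_split = 0.2`. The sketch below reproduces the arithmetic, assuming the split index is the integer part of `n_patterns * (1 - validation_split)`, which is how I understand Keras computes it.

```python
n_patterns = 59398
validation_split = 0.2

# Assumption: the split index is truncated to an integer
split_at = int(n_patterns * (1.0 - validation_split))

n_train = split_at              # samples before the split index
n_val = n_patterns - split_at   # samples after it are held out for validation
print(n_train, n_val)
```

Since the checkpoint monitors `loss` rather than `val_loss`, the held-out 11880 samples do not influence which weights get saved.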

In [38]:
# Load weights file:
weights_file = 'Weights-LSTM-improvement-030-0.04599-bigger.hdf5' # weights file path
model.load_weights(weights_file)
model.compile(loss = 'categorical_crossentropy', optimizer = 'adam')
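Instead of typing the file name by hand, the lowest-loss checkpoint could be picked from the names themselves, since the loss is encoded in each one. This is only a sketch: the list `saved` holds three names copied from the training log, and the helper `loss_of` is hypothetical.

```python
import re

# Filenames copied from the training log
saved = [
    'Weights-LSTM-improvement-005-3.03107-bigger.hdf5',
    'Weights-LSTM-improvement-006-2.85542-bigger.hdf5',
    'Weights-LSTM-improvement-007-2.51045-bigger.hdf5',
]

# Groups: (epoch, loss), matching the checkpoint_name template
name_re = re.compile(r'Weights-LSTM-improvement-(\d{3})-(\d+\.\d{5})-bigger\.hdf5')

def loss_of(name):
    """Extract the loss encoded in a checkpoint filename."""
    match = name_re.fullmatch(name)
    if match is None:
        raise ValueError('unexpected checkpoint name: ' + name)
    return float(match.group(2))

best = min(saved, key=loss_of)
print(best)
```

In a real run the list could come from `glob.glob('Weights-LSTM-improvement-*.hdf5')` rather than being written out.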

In [46]:
# Pick a random seed pattern (copied so the generation loop does not modify data_X):
start = np.random.randint(0, len(data_X)-1)
pattern = list(data_X[start])
print('Seed : ')
print("\"",''.join([int_chars[value] for value in pattern]), "\"\n")
# How many characters you want to generate
generated_characters = 320
# Generate characters:
for i in range(generated_characters):
    x = np.reshape(pattern, ( 1, len(pattern), 1))
    x = x / float(n_vocab)
    prediction = model.predict(x,verbose = 0)
    index = np.argmax(prediction)
    result = int_chars[index]
    #seq_in = [int_chars[value] for value in pattern]
    sys.stdout.write(result)
    pattern.append(index)
    pattern = pattern[1:len(pattern)]
print('\nDone')

Seed : 
"  vai. vai senta, vai senta, oi senta pra matar saudade. vai senta, vai senta, oi do jeitinho que tu  "

sabe. vai senta, vai senta, oi senta pra matar saudade. jittapê, kekel e kevinho, kevinho, jottapê e kekel. não preciso nem falar né?. tá bom eu vou falar, isso é hit maker fiu. vem sentando, vem sentando, vem sentando, vem. se acabou amor. que seja eterna sacanagem. explodiu bebê"
"3" "alô. por que você não me atendeu
Done
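The generation loop keeps a fixed-length window: each predicted id is appended and the oldest id is dropped. The sketch below isolates that mechanic. `fake_predict` is a hypothetical stand-in for `model.predict` followed by `np.argmax`, so it runs without the trained model.

```python
n_vocab = 5

def fake_predict(window):
    """Hypothetical stand-in for the model: returns (last id + 1) mod n_vocab."""
    return (window[-1] + 1) % n_vocab

pattern = [0, 1, 2]      # toy seed window
generated = []

for _ in range(4):
    index = fake_predict(pattern)
    generated.append(index)
    # Slide the window: append the new id, drop the oldest one
    pattern = pattern[1:] + [index]

print(generated)
print(pattern)
```

Because the real loop always takes the `argmax`, the same window always yields the same next character, which may be part of why the generated text repeats phrases such as "vai senta".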
