# Recurrent Neural Network

## Insights dari RNN
Recurrent Neural Network (RNN) adalah salah satu tipe artificial neural network yang menggunakan data sekuensial atau time series.

RNN biasa digunakan untuk masalah yang berkaitan dengan urutan data, seperti language translation, natural language processing (NLP), dan speech recognition. Teknologi terkenal yang menggunakan RNN adalah Siri, Google Translate, Google Assistant, dan lain-lain.

Pada RNN, output dari step sebelumnya dijadikan input untuk step sekarang. Fitur terpenting dari RNN adalah **hidden state**, yaitu keadaan saat RNN mengingat informasi tentang urutan yang akan diteliti.

RNN dapat dibayangkan seperti serial network yang terhubung. Hubungan mereka bermacam-macam, ada **one-to-one, one-to-many, many-to-one, dan many-to-many.**


In [1]:
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Conv2D, LSTM, Embedding
from tensorflow.keras.callbacks import LambdaCallback

import numpy as np
import random
import sys
import pickle

In [2]:
maxlen = 10
step = 3
embed_size = 128
hidden_size = 128
batch_zise = 64
epochs = 10

In [4]:
def sample(preds, diversity=1.0):
      preds = np.asarray(preds).astype('float64')
      preds = np.log(preds + 1e-10) / diversity
      exp_preds = np.exp(preds)
      preds = exp_preds / np.sum(exp_preds)
      probas = np.random.multinomial(1, preds, 1)
      return np.argmax(probas)

def preprocess(source_file):
      sentences = []
      with open(source_file, 'r', encoding='utf-8') as fr:
            lines = fr.readlines()
            for line in lines:
                  line = line.strip()
                  count = 0
                  for c in line:
                        if (c >= 'a' and c <= 'z') or (c >= 'A' and c <= 'Z'):
                              count += 1
                        if count/len(line) < 0.1:
                              sentences.append(line)
      return sentences

sentences = preprocess('./lyrics.txt')
print(len(sentences))




82886


## Dictionary

In [10]:
def bi_map():
      chars = {}
      for sentence in sentences:
            for c in sentence:
                  chars[c] = chars.get(c, 0) + 1
      chars = sorted(chars.items(), key=lambda x: x[1], reverse=True)

      chars = [char[0] for char in chars]
      vocab_size = len(chars)

      char2id = {c: i for i, c in enumerate(chars)}
      id2char = {i: c for i, c in enumerate(chars)}

      with open('char2id.pkl', 'wb') as fw:
            pickle.dump(char2id, fw)
            return char2id, id2char, vocab_size
char2id, id2char, vocab_size = bi_map()

## Split dataset

In [11]:
def on_epoch_end(epoch, logs):
      index = random.randint(0, len(sentences))
      for diversity in [0.2, 0.5, 1.0]:
            print('----- diversity:', diversity)
            sentence = sentences[index][:maxlen]
            print('----- Generating with seed: ' + sentence)
            sys.stdout.write(sentence)
            for i in range(400):
                  x_pred = np.zeros((1, maxlen))
                  for t, char in enumerate(sentence):
                        x_pred[0, t] = char2id[char]
                  preds = model.predict(x_pred, verbose=0)[0]
                  next_index = sample(preds, diversity)
                  next_char = id2char[next_index]
                  sentence = sentence[1:] + next_char
def training_data_labels():
      X_data = []
      Y_data = []
      for sentence in sentences:
            for i in range(0, len(sentence) - maxlen, step):
                  X_data.append([char2id[c] for c in sentence[i: i + maxlen]])
                  y = np.zeros(vocab_size, dtype=np.bool)
                  y[char2id[sentence[i + maxlen]]] = 1
                  Y_data.append(y)
      X_data = np.array(X_data)
      Y_data = np.array(Y_data)
      X_data = X_data[:2000]
      Y_data = Y_data[:2000]
      return X_data, Y_data


## Membuat model

In [12]:
model = Sequential()
model.add(Embedding(vocab_size, embed_size, input_length=maxlen))
model.add(LSTM(hidden_size, input_shape=(maxlen, embed_size)))

model.add(Dense(vocab_size, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam')
model.summary()

X_data, Y_data = training_data_labels()
model.fit(X_data, Y_data, batch_size=batch_zise, epochs=epochs, callbacks=[LambdaCallback(on_epoch_end=on_epoch_end)])

model.save('./model/song_tf.h5')




Model: "sequential_1"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
 embedding (Embedding)       (None, 10, 128)           13184     
                                                                 
 lstm (LSTM)                 (None, 128)               131584    
                                                                 
 dense (Dense)               (None, 103)               13287     
                                                                 
Total params: 158,055
Trainable params: 158,055
Non-trainable params: 0
_________________________________________________________________


Deprecated in NumPy 1.20; for more details and guidance: https://numpy.org/devdocs/release/1.20.0-notes.html#deprecations
  y = np.zeros(vocab_size, dtype=np.bool)


Epoch 1/10
----- Generating with seed: Yeah, I st
Yeah, I st----- diversity: 0.5
----- Generating with seed: Yeah, I st
Yeah, I st----- diversity: 1.0
----- Generating with seed: Yeah, I st
Epoch 2/10
----- Generating with seed: Oh, oh, oh
Oh, oh, oh----- diversity: 0.5
----- Generating with seed: Oh, oh, oh
Oh, oh, oh----- diversity: 1.0
----- Generating with seed: Oh, oh, oh
Epoch 3/10
----- Generating with seed: You got a 
You got a ----- diversity: 0.5
----- Generating with seed: You got a 
You got a ----- diversity: 1.0
----- Generating with seed: You got a 
Epoch 4/10
----- Generating with seed: We them bo
We them bo----- diversity: 0.5
----- Generating with seed: We them bo
We them bo----- diversity: 1.0
----- Generating with seed: We them bo
Epoch 5/10
----- Generating with seed: I really h
I really h----- diversity: 0.5
----- Generating with seed: I really h
I really h----- diversity: 1.0
----- Generating with seed: I really h
Epoch 6/10
----- Generating with seed: She said s


## Load model & membuat lirik baru

In [14]:
from tensorflow.keras.models import load_model

import numpy as np
import pickle
import sys

maxlen = 10
model = load_model('./model/song_tf.h5')

with open('./char2id.pkl', 'rb') as fr:
      [char2id, id2char] = pickle.load(fr)

def sample(preds, diversity = 1.0):
      preds = np.asarray(preds).astype('float64')
      preds = np.log(preds + 1e-10) / diversity
      exp_preds = np.exp(preds)
      preds = exp_preds / np.sum(exp_preds)
      probas = np.random.multinomial(1, preds, 1)
      return np.argmax(probas)

sentence = "Enter new lyrics: "
sentence = sentence[:maxlen]

diversity = 1.0
print('----- generating with seed: ' + sentence)
print('----- diversity:', diversity)
sys.stdout.write(sentence)

for i in range(40):
      x_pred = np.zeros((1, maxlen))
      for t, char in enumerate(sentence):
            x_pred[0, t] = char2id[char]
      
      preds = model.predict(x_pred, verbose=0)[0]
      next_index = sample(preds, diversity)
      next_char = id2char[next_index]

      sentence = sentence[1:] + next_char
      sys.stdout.write(next_char)
      sys.stdout.flush()

print(sentence)

ValueError: too many values to unpack (expected 2)