# Task
Sajikan ringkasan hasil pelatihan dan evaluasi model OCR CRNN, beserta insight tentang kinerjanya dalam mengenali urutan digit MNIST.

## Persiapan Data MNIST

### Subtask:
Muat dataset MNIST dan lakukan preprocessing dasar seperti normalisasi pixel serta pembagian menjadi set pelatihan dan validasi.


**Reasoning**:
First, I'll import TensorFlow and Keras to load the MNIST dataset. Then I'll load the MNIST dataset using `tf.keras.datasets.mnist.load_data()`.



In [2]:
import tensorflow as tf

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()

print(f"Original training data shape: {x_train.shape}")
print(f"Original testing data shape: {x_test.shape}")

Downloading data from https://storage.googleapis.com/tensorflow/tf-keras-datasets/mnist.npz
[1m11490434/11490434[0m [32m━━━━━━━━━━━━━━━━━━━━[0m[37m[0m [1m2s[0m 0us/step
Original training data shape: (60000, 28, 28)
Original testing data shape: (10000, 28, 28)


**Reasoning**:
Now I will normalize the pixel values of the MNIST images to a range of [0, 1] and reshape them to include a channel dimension, as specified in the subtask instructions.



In [3]:
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0

x_train = x_train.reshape(x_train.shape[0], 28, 28, 1)
x_test = x_test.reshape(x_test.shape[0], 28, 28, 1)

print(f"Normalized and reshaped training data shape: {x_train.shape}")
print(f"Normalized and reshaped testing data shape: {x_test.shape}")

Normalized and reshaped training data shape: (60000, 28, 28, 1)
Normalized and reshaped testing data shape: (10000, 28, 28, 1)


## Pembuatan Data Sintetis untuk CRNN

### Subtask:
Buat data gambar sintetik dengan menggabungkan beberapa gambar digit MNIST menjadi satu gambar panjang, dan buat label urutan yang sesuai untuk melatih model CRNN.


**Reasoning**:
The subtask requires defining a `max_sequence_length` and then creating a function `create_synthetic_data` to generate synthetic images and their corresponding sequence labels from the MNIST dataset. This function will select random digits, combine their images horizontally, and create padded labels. Finally, the function will be called for both training and test sets and the shapes of the generated data will be printed.



In [4]:
import numpy as np

# 1. Definisikan max_sequence_length
max_sequence_length = 5

# 2. Definisikan fungsi, misalnya create_synthetic_data
def create_synthetic_data(x_data, y_data, num_samples, max_sequence_length, original_image_width=28):
    synthetic_images = []
    synthetic_labels = []
    blank_char = 10  # Represent 'blank' digit, since MNIST digits are 0-9

    for _ in range(num_samples):
        # i. Secara acak pilih jumlah digit untuk digabungkan
        current_sequence_length = np.random.randint(1, max_sequence_length + 1)

        # ii. Secara acak pilih sejumlah gambar digit MNIST dan labelnya yang sesuai
        # Ensure we have enough unique indices to pick digits
        indices = np.random.choice(len(x_data), current_sequence_length, replace=False)
        selected_images = x_data[indices]
        selected_labels = y_data[indices]

        # iii. Gabungkan gambar-gambar digit yang dipilih secara horizontal
        # Initialize an empty canvas for the synthetic image
        synthetic_image_width = max_sequence_length * original_image_width
        combined_image = np.zeros((original_image_width, synthetic_image_width, 1), dtype=np.float32)

        # Place selected images onto the canvas
        current_x_offset = 0
        for img in selected_images:
            # Remove channel dimension temporarily to concatenate
            img_2d = img.squeeze()
            combined_image[:, current_x_offset:current_x_offset + original_image_width, 0] = img_2d
            current_x_offset += original_image_width

        synthetic_images.append(combined_image)

        # iv. Gabungkan label digit yang dipilih dan tambahkan karakter 'kosong'
        sequence_label = list(selected_labels)
        # Pad with blank_char to reach max_sequence_length
        sequence_label.extend([blank_char] * (max_sequence_length - current_sequence_length))
        synthetic_labels.append(sequence_label)

    # c. Konversi daftar menjadi array NumPy
    synthetic_images = np.array(synthetic_images)
    synthetic_labels = np.array(synthetic_labels)

    # d. Kembalikan array gambar sintetik dan label urutan
    return synthetic_images, synthetic_labels

# 4. Panggil fungsi create_synthetic_data untuk set pelatihan dan validasi/pengujian
num_train_samples = 50000
num_test_samples = 10000

print(f"Generating {num_train_samples} training samples...")
synth_x_train, synth_y_train = create_synthetic_data(x_train, y_train, num_train_samples, max_sequence_length)
print(f"Generating {num_test_samples} testing samples...")
synth_x_test, synth_y_test = create_synthetic_data(x_test, y_test, num_test_samples, max_sequence_length)

# 5. Cetak bentuk (shape) dari synth_x_train, synth_y_train, synth_x_test, dan synth_y_test
print(f"Shape of synthetic training images: {synth_x_train.shape}")
print(f"Shape of synthetic training labels: {synth_y_train.shape}")
print(f"Shape of synthetic testing images: {synth_x_test.shape}")
print(f"Shape of synthetic testing labels: {synth_y_test.shape}")

Generating 50000 training samples...
Generating 10000 testing samples...
Shape of synthetic training images: (50000, 28, 140, 1)
Shape of synthetic training labels: (50000, 5)
Shape of synthetic testing images: (10000, 28, 140, 1)
Shape of synthetic testing labels: (10000, 5)


**Reasoning**:
The previous code failed because `Model` was incorrectly imported from `tensorflow.keras.layers`. It should be imported directly from `tensorflow.keras`.



In [6]:
import tensorflow as tf
from tensorflow.keras.layers import Input, Conv2D, MaxPooling2D, Reshape, Bidirectional, LSTM, Dense, Activation
from tensorflow.keras import Model # Corrected import for Model
from tensorflow.keras import backend as K

# 2. Definisikan dimensi input
# input_shape: (height, width, channels)
input_shape = synth_x_train.shape[1:]
img_height, img_width, img_channels = input_shape

# Number of classes (0-9 digits + 1 for blank character)
num_classes = 10 + 1

# Input layer
input_img = Input(shape=input_shape, name='image_input')

# 3. Bangun bagian CNN dari model
# Layer 1
conv_1 = Conv2D(32, (3, 3), activation='relu', padding='same')(input_img)
pool_1 = MaxPooling2D(pool_size=(2, 2))(conv_1)

# Layer 2
conv_2 = Conv2D(64, (3, 3), activation='relu', padding='same')(pool_1)
pool_2 = MaxPooling2D(pool_size=(2, 2))(conv_2)

# Further convolutional layers to extract more features
conv_3 = Conv2D(128, (3, 3), activation='relu', padding='same')(pool_2)
pool_3 = MaxPooling2D(pool_size=(2, 1))(conv_3) # Pool only along height to preserve width for timesteps

conv_4 = Conv2D(256, (3, 3), activation='relu', padding='same')(pool_3)
pool_4 = MaxPooling2D(pool_size=(2, 1))(conv_4)

# Calculate dimensions after CNN layers
# Height will be reduced by 2*2*2*2 = 16 times in theory, but with (2,1) it's different
# Let's check output shape dynamically or calculate carefully
# Input height: 28
# pool_1 height: 28/2 = 14
# pool_2 height: 14/2 = 7
# pool_3 height: 7/2 = 3 (integer division, might need adjustment for exact CRNN alignment, but Keras handles this)
# pool_4 height: 3/2 = 1

# Input width: 140
# pool_1 width: 140/2 = 70
# pool_2 width: 70/2 = 35
# pool_3 width: 35/1 = 35
# pool_4 width: 35/1 = 35

# Get output shape from pool_4 to prepare for Reshape
cnn_output_shape = K.int_shape(pool_4)
# cnn_output_shape will be (None, height, width, channels)

# Reshape for RNN: (batch, timesteps, features)
# Timesteps will be the width of the CNN output
# Features will be height * channels of the CNN output
reshaped = Reshape(target_shape=(cnn_output_shape[2], cnn_output_shape[1] * cnn_output_shape[3]), name='reshape')(pool_4)

# 4. Bangun bagian RNN dari model
# Bidirectional LSTM layers
blstm_1 = Bidirectional(LSTM(128, return_sequences=True, dropout=0.25))(reshaped)
blstm_2 = Bidirectional(LSTM(64, return_sequences=True, dropout=0.25))(blstm_1)

# Output layer
output = Dense(num_classes, name='dense_output')(blstm_2)
output = Activation('softmax', name='softmax_output')(output)

# 5. Buat model Keras
model = Model(inputs=input_img, outputs=output)

# 6. Cetak ringkasan model
print(model.summary())

None


## Kompilasi dan Pelatihan Model CRNN

### Subtask:
Kompilasi model dengan optimizer yang sesuai dan fungsi loss Connectionist Temporal Classification (CTC), yang umum digunakan untuk pengenalan urutan. Kemudian, latih model menggunakan data sintetik yang telah dibuat.


**Reasoning**:
The subtask requires defining a custom CTC loss function, compiling the CRNN model with it, and then training the model. I will first define the `ctc_loss_func` that takes `y_true` and `y_pred` and internally uses `tf.keras.backend.ctc_batch_cost` with dynamically created `input_length` and `label_length` tensors based on the fixed output dimensions of the CNN and the `max_sequence_length`. Then, I will compile the `model` with this loss and an Adam optimizer. Finally, I will train the model using the synthetic training and testing data, specifying epochs and batch size.



In [None]:
import tensorflow as tf
from tensorflow.keras.optimizers import Adam

# 1. Definisikan fungsi ctc_loss
def ctc_loss_func(y_true, y_pred):
    # y_true: (batch_size, max_sequence_length)
    # y_pred: (batch_size, timesteps, num_classes)

    # The input_length for CTC needs to be a tensor of shape (batch_size, 1)
    # and contains the number of timesteps in y_pred for each sample.
    # In our case, this is fixed by the CNN's output width (cnn_output_shape[2]).
    batch_size = tf.shape(y_pred)[0]
    input_length = tf.fill(dims=(batch_size, 1), value=cnn_output_shape[2]) # cnn_output_shape[2] is 35

    # The label_length for CTC needs to be a tensor of shape (batch_size, 1)
    # and contains the length of the true labels (max_sequence_length).
    # In our case, this is fixed by max_sequence_length.
    label_length = tf.fill(dims=(batch_size, 1), value=max_sequence_length) # max_sequence_length is 5

    loss = tf.keras.backend.ctc_batch_cost(y_true, y_pred, input_length, label_length)
    return loss

# 2. Compile model with Adam optimizer and ctc_loss_func
optimizer = Adam()
model.compile(optimizer=optimizer, loss=ctc_loss_func)

# 3. Latih model
epochs = 10
batch_size = 64

print("Training the CRNN model...")
history = model.fit(
    x=synth_x_train,
    y=synth_y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_data=(synth_x_test, synth_y_test),
    verbose=1
)

print("Model training complete.")

Training the CRNN model...
Epoch 1/10
[1m743/782[0m [32m━━━━━━━━━━━━━━━━━━━[0m[37m━[0m [1m2s[0m 52ms/step - loss: 8.3229

# Task
Sajikan ringkasan hasil pelatihan dan evaluasi model OCR CRNN, beserta insight tentang kinerjanya dalam mengenali urutan digit MNIST.

## Evaluasi Model dan Prediksi

### Subtask:
Evaluasi kinerja model CRNN yang telah terlatih pada set data pengujian sintetik. Tampilkan contoh prediksi untuk melihat bagaimana model mengenali urutan digit.


**Reasoning**:
First, I'll define the `decode_batch_predictions` function. This function will take the model's predictions (which are in CTC output format) and use `tf.keras.backend.ctc_decode` to convert them into a readable sequence of digits. It will also convert the sparse tensor output of `ctc_decode` into a dense NumPy array, which is easier to work with for display purposes.



In [None]:
import tensorflow as tf
import numpy as np

# 1. Definisikan fungsi decode_batch_predictions
def decode_batch_predictions(pred):
    # Use ctc_decode to get the decoded sequences. `greedy=False` for beam search if needed,
    # but for simplicity, `greedy=True` is often used first.
    # The second return value is the log-probability of the sequence, which we don't need for this task.
    input_len = np.ones(pred.shape[0]) * pred.shape[1]
    results = tf.keras.backend.ctc_decode(pred, input_length=input_len, greedy=True)[0][0]

    # Convert sparse tensor to dense numpy array. Pad with `blank_char` (10) if the decoded sequence is shorter than max_sequence_length
    decoded_sequences = tf.sparse.to_dense(results, default_value=blank_char).numpy()
    return decoded_sequences

# Note: `blank_char` (10) and `max_sequence_length` (5) are already defined from previous steps.

**Reasoning**:
Now that the `decode_batch_predictions` function is defined, I will perform predictions on the test set, decode these predictions, create a mapping for digit representation, and then visualize 10 random examples from the test set, showing the synthetic image, actual label, and predicted label.



In [None]:
import matplotlib.pyplot as plt

# 2. Lakukan prediksi pada synth_x_test
predictions = model.predict(synth_x_test)

# 3. Gunakan fungsi decode_batch_predictions untuk mendekode predictions
decoded_predictions = decode_batch_predictions(predictions)

# 4. Definisikan pemetaan dari indeks digit ke karakter yang sesuai
digit_to_char = {i: str(i) for i in range(10)}
digit_to_char[blank_char] = '_'

def labels_to_string(labels):
    return ''.join([digit_to_char[digit] for digit in labels if digit != blank_char])

# 5. Ambil 10 contoh acak dari set pengujian dan tampilkan
print("Displaying 10 random examples from the test set:\n")

num_examples_to_display = 10
random_indices = np.random.choice(len(synth_x_test), num_examples_to_display, replace=False)

plt.figure(figsize=(15, 10))
for i, idx in enumerate(random_indices):
    # Get original and predicted labels
    original_label = synth_y_test[idx]
    predicted_label = decoded_predictions[idx]

    # Convert labels to human-readable strings
    original_str = labels_to_string(original_label)
    predicted_str = labels_to_string(predicted_label)

    plt.subplot(2, 5, i + 1)
    plt.imshow(synth_x_test[idx].squeeze(), cmap='gray')
    plt.title(f"True: {original_str}\nPred: {predicted_str}")
    plt.axis('off')

plt.tight_layout()
plt.show()

**Reasoning**:
The previous code failed because the `blank_char` variable was not globally defined, causing a `NameError` within the `decode_batch_predictions` function. I will re-define `blank_char` globally at the beginning of the cell where it is first used to ensure its availability.



In [None]:
import tensorflow as tf
import numpy as np

# Define blank_char globally as it's used in decode_batch_predictions
blank_char = 10  # Represent 'blank' digit, since MNIST digits are 0-9

# 1. Definisikan fungsi decode_batch_predictions
def decode_batch_predictions(pred):
    # Use ctc_decode to get the decoded sequences. `greedy=False` for beam search if needed,
    # but for simplicity, `greedy=True` is often used first.
    # The second return value is the log-probability of the sequence, which we don't need for this task.
    input_len = np.ones(pred.shape[0]) * pred.shape[1]
    results = tf.keras.backend.ctc_decode(pred, input_length=input_len, greedy=True)[0][0]

    # Convert sparse tensor to dense numpy array. Pad with `blank_char` (10) if the decoded sequence is shorter than max_sequence_length
    decoded_sequences = tf.sparse.to_dense(results, default_value=blank_char).numpy()
    return decoded_sequences

# Note: `max_sequence_length` (5) is already defined from previous steps.

**Reasoning**:
With `blank_char` now correctly defined and the `decode_batch_predictions` function ready, I will proceed to make predictions on the test set, decode them, map the decoded digits to characters, and finally visualize 10 random examples to demonstrate the model's performance.



In [None]:
import matplotlib.pyplot as plt

# 2. Lakukan prediksi pada synth_x_test
predictions = model.predict(synth_x_test)

# 3. Gunakan fungsi decode_batch_predictions untuk mendekode predictions
decoded_predictions = decode_batch_predictions(predictions)

# 4. Definisikan pemetaan dari indeks digit ke karakter yang sesuai
digit_to_char = {i: str(i) for i in range(10)}
digit_to_char[blank_char] = '_'

def labels_to_string(labels):
    return ''.join([digit_to_char[digit] for digit in labels if digit != blank_char])

# 5. Ambil 10 contoh acak dari set pengujian dan tampilkan
print("Displaying 10 random examples from the test set:\n")

num_examples_to_display = 10
random_indices = np.random.choice(len(synth_x_test), num_examples_to_display, replace=False)

plt.figure(figsize=(15, 10))
for i, idx in enumerate(random_indices):
    # Get original and predicted labels
    original_label = synth_y_test[idx]
    predicted_label = decoded_predictions[idx]

    # Convert labels to human-readable strings
    original_str = labels_to_string(original_label)
    predicted_str = labels_to_string(predicted_label)

    plt.subplot(2, 5, i + 1)
    plt.imshow(synth_x_test[idx].squeeze(), cmap='gray')
    plt.title(f"True: {original_str}\nPred: {predicted_str}")
    plt.axis('off')

plt.tight_layout()
plt.show()