<a href="https://colab.research.google.com/github/raven-gith/machinelearning1/blob/main/13.%20Chapter%2013/chapter_13_tfdata_preprocessing.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Chapter 13: Loading and Preprocessing Data with TensorFlow

This notebook reproduces and explains the contents of Chapter 13 of the book _Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow_ by Aurélien Géron.

## 📘 Summary:

This chapter covers how to use the **`tf.data` API** to load and preprocess data efficiently, especially for large-scale training.

### Main topics:
1. Creating a dataset with `tf.data.Dataset`
2. Chaining operations: `map()`, `shuffle()`, `batch()`, `prefetch()`
3. Image preprocessing (image-loading pipeline)
4. Integrating the pipeline into model training


In [1]:

import tensorflow as tf
import numpy as np

# Dummy dataset: each label is twice its feature value
X = np.arange(10)
y = X * 2

dataset = tf.data.Dataset.from_tensor_slices((X, y))
for x_val, y_val in dataset:
    print(f"x: {x_val.numpy()}, y: {y_val.numpy()}")


x: 0, y: 0
x: 1, y: 2
x: 2, y: 4
x: 3, y: 6
x: 4, y: 8
x: 5, y: 10
x: 6, y: 12
x: 7, y: 14
x: 8, y: 16
x: 9, y: 18
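
Before chaining further operations, it can help to check what a dataset yields without iterating over it. The sketch below rebuilds the same dummy dataset and reads its `element_spec` and `cardinality()`. The variable names `x_spec`, `y_spec`, and `num_elements` are introduced here for illustration only.

```python
import numpy as np
import tensorflow as tf

# Same dummy data as in the cell above
X = np.arange(10)
y = X * 2
dataset = tf.data.Dataset.from_tensor_slices((X, y))

# element_spec describes the shape and dtype of each (x, y) element
x_spec, y_spec = dataset.element_spec
print("x spec:", x_spec)
print("y spec:", y_spec)

# cardinality() reports the number of elements when it is statically known
num_elements = int(dataset.cardinality().numpy())
print("number of elements:", num_elements)
```

Each element is a pair of scalar tensors, so both specs have an empty shape. The dtype follows NumPy's default integer type on the platform.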


In [2]:

# Pipeline: shuffle the elements, group them into batches, then prefetch one batch ahead
BATCH_SIZE = 4
ds = dataset.shuffle(buffer_size=10).batch(BATCH_SIZE).prefetch(1)

for batch in ds:
    print(batch)


(<tf.Tensor: shape=(4,), dtype=int64, numpy=array([7, 6, 3, 4])>, <tf.Tensor: shape=(4,), dtype=int64, numpy=array([14, 12,  6,  8])>)
(<tf.Tensor: shape=(4,), dtype=int64, numpy=array([8, 0, 1, 2])>, <tf.Tensor: shape=(4,), dtype=int64, numpy=array([16,  0,  2,  4])>)
(<tf.Tensor: shape=(2,), dtype=int64, numpy=array([5, 9])>, <tf.Tensor: shape=(2,), dtype=int64, numpy=array([10, 18])>)
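
The pipeline above chains `shuffle()`, `batch()`, and `prefetch()`, but not `map()`. As a minimal sketch of how `map()` could slot into the same chain, the example below scales the features before batching. The function name `scale_features` and the divisor `10.0` are assumptions made for illustration, and `shuffle()` is left out so the output order stays deterministic.

```python
import numpy as np
import tensorflow as tf

X = np.arange(10)
y = X * 2
dataset = tf.data.Dataset.from_tensor_slices((X, y))

def scale_features(x, y):
    # Hypothetical preprocessing step: cast the feature to float32 and scale it into [0, 1)
    return tf.cast(x, tf.float32) / 10.0, y

scaled_ds = (
    dataset
    .map(scale_features, num_parallel_calls=tf.data.AUTOTUNE)  # element-wise transform
    .batch(4)                                                  # group into batches of 4
    .prefetch(tf.data.AUTOTUNE)                                # overlap preparation and consumption
)

# Collect the batches as NumPy arrays so they are easy to inspect
batches = [(xb.numpy(), yb.numpy()) for xb, yb in scaled_ds]
for xb, yb in batches:
    print(xb, yb)
```

Passing `tf.data.AUTOTUNE` lets TensorFlow choose the parallelism and the prefetch buffer size at runtime, instead of relying on a fixed value such as `prefetch(1)`.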


In [3]:

# Example image-loading pipeline
import os
from tensorflow.keras.preprocessing.image import save_img

# Simulate 3 small 8x8 image files
img_dir = "example_images"
os.makedirs(img_dir, exist_ok=True)
for i in range(3):
    img_array = np.random.rand(8, 8, 3) * 255
    save_img(f"{img_dir}/img_{i}.png", img_array)

# Build a dataset of file paths from the folder
image_ds = tf.data.Dataset.list_files(f"{img_dir}/*.png")

# Parsing function
def process_img(filepath):
    img = tf.io.read_file(filepath)
    img = tf.image.decode_png(img, channels=3)
    img = tf.image.resize(img, [32, 32]) / 255.0
    return img

# Apply pipeline
image_ds = image_ds.map(process_img).batch(2)

for batch in image_ds:
    print("Batch shape:", batch.shape)


Batch shape: (2, 32, 32, 3)
Batch shape: (1, 32, 32, 3)
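
The summary lists integrating the pipeline into model training as a topic. As a minimal sketch, and not the book's own example, a batched `tf.data.Dataset` of `(features, labels)` pairs can be passed directly to `model.fit()`. The one-layer model, the feature scaling, and the number of epochs below are assumptions chosen to keep the example small.

```python
import numpy as np
import tensorflow as tf

# Dummy regression data: y = 2 * x, with features scaled into [0, 1)
X = (np.arange(10, dtype=np.float32) / 10.0).reshape(-1, 1)
y = X * 2

train_ds = (
    tf.data.Dataset.from_tensor_slices((X, y))
    .shuffle(buffer_size=10)      # reshuffled at the start of every epoch by default
    .batch(4)
    .prefetch(tf.data.AUTOTUNE)
)

# Hypothetical minimal model: a single linear unit
model = tf.keras.Sequential([
    tf.keras.Input(shape=(1,)),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="sgd", loss="mse")

# The dataset already yields batches, so batch_size is not passed to fit()
history = model.fit(train_ds, epochs=5, verbose=0)
print("loss per epoch:", history.history["loss"])
```

Because batching happens inside the pipeline, `fit()` consumes one batch from the dataset per training step and one full pass over the dataset per epoch.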
