# Dental Disease Classification - Training on Kaggle

This notebook trains a CNN model to detect oral diseases directly on Kaggle.

**Dataset**: [Oral Diseases by salmansajid05](https://www.kaggle.com/datasets/salmansajid05/oral-diseases)

**Classes**:
- Normal (healthy teeth)
- Cavity (tooth decay)
- Gingivitis (gum inflammation)
- Plaque (dental plaque)

---

## ⚠️ IMPORTANT:
**Run the cells in order, from top to bottom!**
1. Do not skip any cell.
2. Wait for each cell to finish before running the next one.
3. In particular, the "Train Model" cell must finish before you run the visualization cells.

**Recommended:** Click "Run All" to run all cells in order automatically.

In [None]:
import os
import gc

# Suppress TensorFlow C++ info/warning logs; set this before TensorFlow is imported so it takes effect
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '2'

import numpy as np
import pandas as pd
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
from tensorflow.keras.preprocessing.image import ImageDataGenerator
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.metrics import confusion_matrix, classification_report
import shutil

# Clear memory
gc.collect()
tf.keras.backend.clear_session()

print("✅ Single GPU mode (most stable)")
print("✅ This avoids Multi-GPU synchronization issues")

# Check GPUs
gpus = tf.config.list_physical_devices('GPU')
print(f"✅ Available GPUs: {len(gpus)}")

# Enable memory growth so GPU memory is allocated on demand (helps prevent OOM)
if gpus:
    try:
        for gpu in gpus:
            tf.config.experimental.set_memory_growth(gpu, True)
        print("✅ GPU memory growth enabled")
    except RuntimeError as e:
        # Raised when the GPUs were already initialized before this call
        print(f"⚠️ {e}")

print(f"\n🚀 TensorFlow version: {tf.__version__}")
print("🚀 Training config: Single GPU + float32 (stable training)")


In [None]:
# Dataset path on Kaggle
dataset_path = '/kaggle/input/oral-diseases'

# Inspect the dataset structure
print("📁 Dataset structure:")
for root, dirs, files in os.walk(dataset_path):
    level = root.replace(dataset_path, '').count(os.sep)
    indent = ' ' * 2 * level
    print(f'{indent}{os.path.basename(root)}/')
    if level < 2:  # List files only for the top 2 levels
        subindent = ' ' * 2 * (level + 1)
        for file in files[:3]:  # Show the first 3 files
            print(f'{subindent}{file}')
        if len(files) > 3:
            print(f'{subindent}... and {len(files)-3} more files')

In [None]:
IMG_SIZE = 224
BATCH_SIZE = 32  # Single GPU: not multiplied by num_replicas
EPOCHS = 30

print(f"📊 Batch size: {BATCH_SIZE}")
print("📊 Single GPU training")

# Data augmentation - lightweight
train_datagen = ImageDataGenerator(
    rescale=1./255,
    rotation_range=15,
    width_shift_range=0.15,
    height_shift_range=0.15,
    horizontal_flip=True,
    validation_split=0.2
)

val_datagen = ImageDataGenerator(rescale=1./255)

# Prefer an explicit train/val folder structure when the dataset provides one
train_dir = os.path.join(dataset_path, 'train')
val_dir = os.path.join(dataset_path, 'val')

if os.path.isdir(train_dir) and os.path.isdir(val_dir):
    print("✅ Detected train/val folder structure")

    train_generator = train_datagen.flow_from_directory(
        train_dir,
        target_size=(IMG_SIZE, IMG_SIZE),
        batch_size=BATCH_SIZE,
        class_mode='categorical',
        shuffle=True
    )

    val_generator = val_datagen.flow_from_directory(
        val_dir,
        target_size=(IMG_SIZE, IMG_SIZE),
        batch_size=BATCH_SIZE,
        class_mode='categorical',
        shuffle=False
    )
else:
    # Fallback: single folder with validation_split
    print("✅ Using single folder structure with validation_split=0.2")

    train_generator = train_datagen.flow_from_directory(
        dataset_path,
        target_size=(IMG_SIZE, IMG_SIZE),
        batch_size=BATCH_SIZE,
        class_mode='categorical',
        subset='training',
        shuffle=True
    )

    # Validation images should not be augmented, so use a rescale-only generator.
    # It needs the same validation_split so both subsets come from the same partition.
    val_split_datagen = ImageDataGenerator(rescale=1./255, validation_split=0.2)
    val_generator = val_split_datagen.flow_from_directory(
        dataset_path,
        target_size=(IMG_SIZE, IMG_SIZE),
        batch_size=BATCH_SIZE,
        class_mode='categorical',
        subset='validation',
        shuffle=False
    )

print(f"\n📊 Training samples: {train_generator.samples}")
print(f"📊 Validation samples: {val_generator.samples}")
print(f"📂 Classes: {train_generator.class_indices}")
print(f"💾 Memory optimization: Batch size {BATCH_SIZE}, no caching")

In [None]:
num_classes = len(train_generator.class_indices)

# Build model - NO strategy scope (single GPU)
model = keras.Sequential([
    # Block 1
    layers.Conv2D(32, (3, 3), activation='relu', input_shape=(IMG_SIZE, IMG_SIZE, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    
    # Block 2
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    
    # Block 3
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    
    # Block 4
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Dropout(0.25),
    
    # Classifier
    layers.Flatten(),
    layers.Dense(256, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(128, activation='relu'),
    layers.Dropout(0.5),
    layers.Dense(num_classes, activation='softmax')
])

# Optimizer with a conservative learning rate
optimizer = keras.optimizers.Adam(
    learning_rate=0.001,
    clipvalue=1.0  # Gradient clipping: each gradient element is clipped to [-1, 1]
)

model.compile(
    optimizer=optimizer,
    loss='categorical_crossentropy',
    metrics=['accuracy']
)

print("✅ Model built (single GPU, no strategy)")
print("✅ Gradient clipping enabled")
model.summary()
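
The layer stack above can be checked by hand. The sketch below traces the spatial size through the four Conv2D + MaxPooling2D blocks and counts trainable parameters. It assumes the Keras defaults (`padding='valid'` and stride 1 for convolutions, pool stride equal to pool size) and `num_classes = 4`, which is taken from the class list at the top of this notebook rather than read from the dataset folders. The helper names are illustrative and not part of the notebook.

```python
def conv_out(size, kernel=3):
    """Output size of a 'valid' convolution with stride 1."""
    return size - kernel + 1


def pool_out(size, pool=2):
    """Output size of max pooling with stride equal to the pool size."""
    return size // pool


def flatten_units(img_size=224, filters=(32, 64, 128, 128)):
    """Trace the spatial size through each Conv2D + MaxPooling2D block."""
    size = img_size
    for _ in filters:
        size = pool_out(conv_out(size))
    # Flatten produces side * side * channels of the last conv block
    return size, size * size * filters[-1]


def count_params(num_classes, img_size=224, filters=(32, 64, 128, 128), dense=(256, 128)):
    """Count weights and biases of the conv and dense layers (Dropout has none)."""
    total, in_channels = 0, 3
    for f in filters:
        total += 3 * 3 * in_channels * f + f  # 3x3 kernels plus one bias per filter
        in_channels = f
    _, units = flatten_units(img_size, filters)
    for d in dense + (num_classes,):
        total += units * d + d  # weight matrix plus biases
        units = d
    return total


side, units = flatten_units()
print(side, units)      # 12 18432
print(count_params(4))  # 4993092 (num_classes=4 is an assumption)
```

Most of the parameters sit in the first Dense layer (18432 × 256 + 256 = 4,718,848), which is fed directly by the Flatten layer.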

In [None]:
# Callbacks
checkpoint = keras.callbacks.ModelCheckpoint(
    'best_dental_model.h5',
    monitor='val_accuracy',
    save_best_only=True,
    mode='max',
    verbose=1
)

early_stopping = keras.callbacks.EarlyStopping(
    monitor='val_loss',
    patience=10,
    restore_best_weights=True,
    verbose=1
)

reduce_lr = keras.callbacks.ReduceLROnPlateau(
    monitor='val_loss',
    factor=0.5,
    patience=3,
    min_lr=0.00001,
    verbose=1
)

# NaN check callback: stop training if the loss or accuracy becomes NaN
class NaNCheckCallback(keras.callbacks.Callback):
    def on_epoch_end(self, epoch, logs=None):
        logs = logs or {}
        # Missing keys default to 0.0 so np.isnan never receives None
        if np.isnan(logs.get('loss', 0.0)) or np.isnan(logs.get('accuracy', 0.0)):
            print("\n❌ NaN detected! Stopping training...")
            self.model.stop_training = True

nan_check = NaNCheckCallback()

print("🚀 Training Configuration:")
print("  ├─ GPU: Single T4 GPU (no Multi-GPU overhead)")
print("  ├─ Precision: float32 (stable)")
print(f"  ├─ Batch size: {BATCH_SIZE}")
print("  ├─ Learning rate: 0.001")
print("  ├─ Gradient clipping: 1.0")
print("  ├─ Model: Simplified (no BatchNorm)")
print(f"  └─ Total epochs: {EPOCHS}")
print("\n⏳ Starting training...")
print("💡 Single GPU = simpler, more stable, still fast!")

# Steps per epoch: len() of a Keras directory iterator is ceil(samples / batch_size),
# so the final partial batch is included and the value is never 0 for a non-empty dataset
steps_per_epoch = len(train_generator)
validation_steps = len(val_generator)

# Training
history = model.fit(
    train_generator,
    steps_per_epoch=steps_per_epoch,
    epochs=EPOCHS,
    validation_data=val_generator,
    validation_steps=validation_steps,
    callbacks=[checkpoint, early_stopping, reduce_lr, nan_check],
    verbose=2
)

print("\n✅ Training completed!")
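
The number of steps per epoch can be derived from the sample count and batch size in two ways: floor division, which skips the final partial batch, or ceiling division, which keeps it. A minimal sketch of the difference follows; `steps_for` is a hypothetical helper and the sample count of 1000 is made up for illustration.

```python
import math


def steps_for(samples, batch_size, keep_partial_batch=True):
    """Number of batches needed to go through `samples` items once."""
    if samples <= 0 or batch_size <= 0:
        raise ValueError("samples and batch_size must be positive")
    if keep_partial_batch:
        return math.ceil(samples / batch_size)
    return samples // batch_size


samples, batch_size = 1000, 32  # hypothetical dataset size
floor_steps = steps_for(samples, batch_size, keep_partial_batch=False)
ceil_steps = steps_for(samples, batch_size)

print(floor_steps, samples - floor_steps * batch_size)  # 31 8  (8 samples skipped per epoch)
print(ceil_steps)                                       # 32
```

With floor division, a dataset smaller than one batch yields 0 steps, which is worth guarding against when the validation set is small.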

In [None]:
# Check if training was completed
try:
    history.history
except NameError:
    print("❌ Error: Training has not been run yet!")
    print("👉 Run the 'Step 5: Train Model' cell before running this cell")
    raise NameError("Variable 'history' is not defined. Run the training cell first!")

# Plot accuracy and loss
fig, axes = plt.subplots(1, 2, figsize=(15, 5))

# Accuracy
axes[0].plot(history.history['accuracy'], label='Training Accuracy', marker='o')
axes[0].plot(history.history['val_accuracy'], label='Validation Accuracy', marker='s')
axes[0].set_title('Model Accuracy', fontsize=14, fontweight='bold')
axes[0].set_xlabel('Epoch')
axes[0].set_ylabel('Accuracy')
axes[0].legend()
axes[0].grid(True, alpha=0.3)

# Loss
axes[1].plot(history.history['loss'], label='Training Loss', marker='o')
axes[1].plot(history.history['val_loss'], label='Validation Loss', marker='s')
axes[1].set_title('Model Loss', fontsize=14, fontweight='bold')
axes[1].set_xlabel('Epoch')
axes[1].set_ylabel('Loss')
axes[1].legend()
axes[1].grid(True, alpha=0.3)

plt.tight_layout()
plt.show()

# Print final results
print(f"\n📈 Final Training Accuracy: {history.history['accuracy'][-1]:.4f}")
print(f"📈 Final Validation Accuracy: {history.history['val_accuracy'][-1]:.4f}")
print(f"📉 Final Training Loss: {history.history['loss'][-1]:.4f}")
print(f"📉 Final Validation Loss: {history.history['val_loss'][-1]:.4f}")

In [None]:
# Check if model and training data exist
try:
    model
    val_generator
except NameError:
    print("❌ Error: Model or validation data is not available yet!")
    print("👉 Run the previous cells in order, from Step 1 to Step 5")
    raise NameError("Run the training cells first!")

# Get predictions
val_generator.reset()
predictions = model.predict(val_generator, verbose=1)
y_pred = np.argmax(predictions, axis=1)
y_true = val_generator.classes

# Confusion Matrix
# class_indices maps name -> index in ascending index order, so the keys line up with the matrix rows
class_names = list(train_generator.class_indices.keys())
cm = confusion_matrix(y_true, y_pred)
plt.figure(figsize=(10, 8))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues',
            xticklabels=class_names,
            yticklabels=class_names)
plt.title('Confusion Matrix', fontsize=16, fontweight='bold')
plt.ylabel('True Label')
plt.xlabel('Predicted Label')
plt.tight_layout()
plt.show()

# Classification Report
print("\n📊 Classification Report:")
print("="*60)
print(classification_report(y_true, y_pred, target_names=class_names))
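
To make the link between the confusion matrix and the classification report explicit, the sketch below derives per-class precision, recall and F1 from a matrix laid out the same way as the one above (rows are true labels, columns are predicted labels). `per_class_metrics` is a hypothetical helper, and the 2×2 matrix is made-up data, not a result from this notebook.

```python
def per_class_metrics(cm, class_names):
    """Compute precision, recall and F1 per class from a confusion matrix.

    Rows are true labels and columns are predicted labels.
    """
    metrics = {}
    for i, name in enumerate(class_names):
        true_positives = cm[i][i]
        predicted_as_class = sum(row[i] for row in cm)  # column sum
        actually_class = sum(cm[i])                     # row sum
        precision = true_positives / predicted_as_class if predicted_as_class else 0.0
        recall = true_positives / actually_class if actually_class else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        metrics[name] = {"precision": precision, "recall": recall, "f1": f1}
    return metrics


# Hypothetical 2-class matrix for illustration only
example_cm = [[8, 2],
              [1, 9]]
for name, m in per_class_metrics(example_cm, ["Cavity", "Normal"]).items():
    print(f"{name}: precision={m['precision']:.2f} recall={m['recall']:.2f} f1={m['f1']:.2f}")
```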

In [None]:
# Check if model exists
try:
    model
    history
    train_generator
except NameError:
    print("❌ Error: The model has not been trained yet!")
    print("👉 Run all cells from the beginning, in order")
    raise NameError("Train the model before saving!")

# Save final model
final_model_path = '/kaggle/working/dental_model_final.h5'
model.save(final_model_path)
print(f"✅ Model saved to: {final_model_path}")

# Also save as backup
model.save('/kaggle/working/dental_model_backup.h5')
print("✅ Backup model saved!")

# Save training history
import json
history_path = '/kaggle/working/training_history.json'
# Convert NumPy scalars (e.g. the learning rate logged by ReduceLROnPlateau) to plain floats,
# because the json module cannot serialize NumPy float types
serializable_history = {
    key: [float(value) for value in values]
    for key, values in history.history.items()
}
with open(history_path, 'w') as f:
    json.dump(serializable_history, f)
print(f"✅ Training history saved to: {history_path}")

# Save class indices
class_indices_path = '/kaggle/working/class_indices.json'
with open(class_indices_path, 'w') as f:
    json.dump(train_generator.class_indices, f)
print(f"✅ Class indices saved to: {class_indices_path}")

print("\n" + "="*60)
print("🎉 TRAINING COMPLETE!")
print("="*60)
print("\n📥 Download these files from the notebook's Output:")
print("  1. dental_model_final.h5 - Main model")
print("  2. dental_model_backup.h5 - Backup model")
print("  3. training_history.json - Training history")
print("  4. class_indices.json - Class mapping")
print("\n💡 Put dental_model_final.h5 into the project's models/ folder")

## 📝 How to Use on Kaggle

### 1. Create a New Notebook on Kaggle
- Go to https://www.kaggle.com/
- Click "Create" → "New Notebook"

### 2. Add Dataset
- Click "Add Data" in the right-hand panel
- Search for "oral-diseases" by salmansajid05
- Click "Add" to attach it to the notebook

### 3. Enable GPU (**REQUIRED**)
- Click "Session Options" (the gear icon)
- Select "Accelerator" → **"GPU T4 x2"**. The training code in this notebook runs in single-GPU mode, so only one of the two T4 GPUs is used.
- Click "Save"

### 4. Run All Cells
- Click "Run All" or run the cells one at a time
- ⚡ The estimate of **about 5-10 minutes** (instead of 30-60 minutes) applies to an optimized run on 2 T4 GPUs; the single-GPU configuration in this notebook may take longer

### 5. Download Model
- When the run finishes, open the "Output" tab in the right-hand panel
- Download the file `dental_model_final.h5`
- Copy it into the `models/` folder of your local project

### 6. Test the Model Locally
```python
from tensorflow import keras
model = keras.models.load_model('models/dental_model_final.h5')
```
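
Loading the model is only the first step; a prediction also needs the same preprocessing as training (resize to 224×224, rescale by 1/255) and the class mapping saved in `class_indices.json`. The sketch below is one possible way to wire this together. The helper names are hypothetical, and `predict_image` assumes the `tensorflow.keras.preprocessing.image` module used by the training notebook is available locally.

```python
import json


def load_class_indices(path):
    """Read the {'class_name': index} mapping saved by the notebook."""
    with open(path) as f:
        return json.load(f)


def invert_class_indices(class_indices):
    """Turn {'class_name': index} into {index: 'class_name'}."""
    return {int(index): name for name, index in class_indices.items()}


def decode_prediction(probabilities, index_to_class):
    """Return the most likely class name and its probability."""
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return index_to_class[best], float(probabilities[best])


def predict_image(model, image_path, index_to_class, img_size=224):
    """Classify one image file with a loaded Keras model."""
    # Imported lazily so the helpers above also work without TensorFlow installed
    import numpy as np
    from tensorflow.keras.preprocessing.image import load_img, img_to_array

    image = load_img(image_path, target_size=(img_size, img_size))
    # Same 1/255 rescale as the training generators, plus a batch dimension
    batch = np.expand_dims(img_to_array(image) / 255.0, axis=0)
    probabilities = model.predict(batch, verbose=0)[0]
    return decode_prediction(probabilities, index_to_class)


# Pure-Python check with a hypothetical two-class mapping
index_to_class = invert_class_indices({"Cavity": 0, "Normal": 1})
print(decode_prediction([0.2, 0.8], index_to_class))  # ('Normal', 0.8)
```

With the model loaded as shown above, a call could look like `predict_image(model, 'sample.jpg', invert_class_indices(load_class_indices('class_indices.json')))`, where `sample.jpg` is a placeholder image path and the location of `class_indices.json` depends on where you saved it.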

---

## 🚀 Optimizations for Faster Training:

The code cells in this notebook use a single GPU with float32 and no caching for stability, so the optimizations below are not enabled in them. They describe a multi-GPU configuration and its expected gains:

1. **Multi-GPU Training** (2x T4):
   - Automatically distributes training across 2 GPUs
   - Up to **2x** speedup from parallel processing
   
2. **Mixed Precision (float16)**:
   - Reduces memory usage by about 50%
   - **2-3x** speedup on the T4's Tensor Cores
   
3. **XLA (Accelerated Linear Algebra)**:
   - Just-in-time compilation
   - An additional **10-30%** speedup
   
4. **Batch Size Scaling**:
   - 64 per GPU = 128 total batch size
   - Larger batches can speed up convergence in wall-clock time
   
5. **Data Pipeline Optimization**:
   - `cache()`: caches data in memory
   - `prefetch()`: loads the next batch while the current one is training
   - Removes the I/O bottleneck
   
6. **Learning Rate Scaling**:
   - Scaled automatically with the number of GPUs
   - Keeps training stable with large batches

**Result:** Training is expected to be **4-6 times** faster than an unoptimized single GPU! ⚡
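
The batch-size and learning-rate scaling items above come down to simple arithmetic. The sketch below assumes the linear scaling rule (learning rate multiplied by the number of GPUs); the list says the rate is scaled with the GPU count but does not name the rule. `scale_for_replicas` is a hypothetical helper, and the base learning rate of 0.001 is the one used in the model cell.

```python
def scale_for_replicas(per_replica_batch, base_lr, num_replicas):
    """Return (global batch size, scaled learning rate) for data-parallel training.

    Assumes linear learning-rate scaling, which is one common choice.
    """
    if num_replicas < 1:
        raise ValueError("num_replicas must be at least 1")
    global_batch = per_replica_batch * num_replicas
    scaled_lr = base_lr * num_replicas
    return global_batch, scaled_lr


# 64 images per GPU on 2 GPUs, starting from the notebook's learning rate
global_batch, scaled_lr = scale_for_replicas(64, 0.001, 2)
print(global_batch, scaled_lr)  # 128 0.002

# With a single replica nothing changes
print(scale_for_replicas(32, 0.001, 1))  # (32, 0.001)
```

In a TensorFlow distribution strategy, the replica count would typically come from `strategy.num_replicas_in_sync` rather than being hard-coded.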

---

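**Notes:**
- The dataset is automatically available at `/kaggle/input/oral-diseases/`
- The model is saved automatically to `/kaggle/working/`
- Select **GPU T4 x2** to have 2 GPUs available; the training code in this notebook uses a single GPU
- Training time: ~5-10 minutes for the optimized 2-GPU setup, versus 30-60 minutes without those optimizations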

## Step 1: Import Libraries

## Step 2: Setup Dataset Path

**Note**: On Kaggle, add the dataset to the notebook:
1. Click "Add Data" on the right
2. Search for "oral-diseases" by salmansajid05
3. Add it to the notebook
4. The dataset will be available at `/kaggle/input/oral-diseases/`

## Step 3: Auto-detect Dataset Structure & Create Data Generators

## Step 4: Build CNN Model

## Step 5: Train Model

## Step 6: Visualize Training Results

## Step 7: Evaluate on Validation Set

## Step 8: Save Model

The model is saved to `/kaggle/working/` and automatically appears in the notebook's Output.