# Voice Cloning & Augmentation for Training Dataset

This notebook uses **all audio files** from each category (buka, "open", and tutup, "close") in the `dataset` folder and produces **exactly `TARGET_FILES_PER_CATEGORY` files** per category (200 in the run shown here) with consistent naming (buka1, buka2, ..., buka200 and tutup1, tutup2, ..., tutup200).

**Source Files:**
- Buka category: 20 files (`Recording.mp3`, `Recording (2).mp3`, ..., `Recording (20).mp3`)
- Tutup category: 20 files

Augmentation techniques used:
- Pitch Shifting
- Time Stretching
- Adding Noise
- Speed Change
- Volume Change
- Random combinations of techniques

## 1. Install Dependencies

In [1]:
# Install required libraries
%pip install librosa soundfile audiomentations pydub numpy scipy

Note: you may need to restart the kernel to use updated packages.
Collecting librosa
  Using cached librosa-0.11.0-py3-none-any.whl.metadata (8.7 kB)
Collecting soundfile
  Using cached soundfile-0.13.1-py2.py3-none-win_amd64.whl.metadata (16 kB)
Collecting audiomentations
  Downloading audiomentations-0.42.0-py3-none-any.whl.metadata (11 kB)
Collecting pydub
  Using cached pydub-0.25.1-py2.py3-none-any.whl.metadata (1.4 kB)
Collecting numpy
  Downloading numpy-2.0.2-cp39-cp39-win_amd64.whl.metadata (59 kB)
Collecting scipy
  Downloading scipy-1.13.1-cp39-cp39-win_amd64.whl.metadata (60 kB)
Collecting audioread>=2.1.9 (from librosa)
  Using cached audioread-3.1.0-py3-none-any.whl.metadata (9.0 kB)
Collecting numba>=0.51.0 (from librosa)
  Downloading numba-0.60.0-cp39-cp39-win_amd64.whl.metadata (2.8 kB)
Collecting scikit-learn>=1.1.0 (from librosa)
  Downloading scikit_learn-1.6.1-cp39-cp39-win_amd64.whl.metadata (15 kB)
Collecting joblib>=1.0 (from librosa)
  Downloading joblib-1.5.2

In [2]:
!pip install tqdm

Collecting tqdm
  Using cached tqdm-4.67.1-py3-none-any.whl.metadata (57 kB)
Using cached tqdm-4.67.1-py3-none-any.whl (78 kB)
Installing collected packages: tqdm
Successfully installed tqdm-4.67.1


## 2. Import Libraries

In [3]:
import os
import librosa
import soundfile as sf
import numpy as np
from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift
from pydub import AudioSegment
import random
from tqdm import tqdm



## 3. Setup Paths

In [4]:
# Define paths
base_path = "dataset"
output_path = "dataset_augmented"

# Target number of files per category (200 in this run)
TARGET_FILES_PER_CATEGORY = 200

# Create output directories
os.makedirs(os.path.join(output_path, "buka"), exist_ok=True)
os.makedirs(os.path.join(output_path, "tutup"), exist_ok=True)

# Scan all audio files in each category
def get_audio_files(category_path):
    """Get all audio files (mp3, wav, m4a, flac) from a directory"""
    audio_extensions = ['.mp3', '.wav', '.m4a', '.flac']
    files = []
    if os.path.exists(category_path):
        for file in os.listdir(category_path):
            if any(file.lower().endswith(ext) for ext in audio_extensions):
                files.append(file)
    return sorted(files)

# Get all source files for each category
SOURCE_FILES = {
    'buka': get_audio_files(os.path.join(base_path, 'buka')),
    'tutup': get_audio_files(os.path.join(base_path, 'tutup'))
}

print(f"Base path: {base_path}")
print(f"Output path: {output_path}")
print(f"Target: {TARGET_FILES_PER_CATEGORY} files per category")
print(f"\nSource files found:")
for category, files in SOURCE_FILES.items():
    print(f"  {category}: {len(files)} files")
    for file in files:
        print(f"    - {file}")

Base path: dataset
Output path: dataset_augmented
Target: 200 files per category

Source files found:
  buka: 20 files
    - Recording (10).mp3
    - Recording (11).mp3
    - Recording (12).mp3
    - Recording (13).mp3
    - Recording (14).mp3
    - Recording (15).mp3
    - Recording (16).mp3
    - Recording (17).mp3
    - Recording (18).mp3
    - Recording (19).mp3
    - Recording (2).mp3
    - Recording (20).mp3
    - Recording (3).mp3
    - Recording (4).mp3
    - Recording (5).mp3
    - Recording (6).mp3
    - Recording (7).mp3
    - Recording (8).mp3
    - Recording (9).mp3
    - Recording.mp3
  tutup: 20 files
    - Recording (10).mp3
    - Recording (11).mp3
    - Recording (12).mp3
    - Recording (13).mp3
    - Recording (14).mp3
    - Recording (15).mp3
    - Recording (16).mp3
    - Recording (17).mp3
    - Recording (18).mp3
    - Recording (19).mp3
    - Recording (2).mp3
    - Recording (20).mp3
    - Recording (3).mp3
    - Recording (4).mp3
    - Recording (5).mp3
    -

## 4. Audio Augmentation Functions

In [5]:
def load_audio(file_path):
    try:
        audio, sr = librosa.load(file_path, sr=None)
        return audio, sr
    except Exception as e:
        print(f"Error loading {file_path}: {e}")
        return None, None

def save_audio(audio, sr, output_path):
    try:
        sf.write(output_path, audio, sr)
    except Exception as e:
        print(f"Error saving {output_path}: {e}")

In [6]:
def pitch_shift_augmentation(audio, sr, n_steps_list=(2, -2, 3, -3)):
    """Shift pitch by each number of semitones in n_steps_list; duration is unchanged."""
    augmented_audios = []
    for n_steps in n_steps_list:
        shifted = librosa.effects.pitch_shift(audio, sr=sr, n_steps=n_steps)
        augmented_audios.append((shifted, f"pitch_{n_steps}"))
    return augmented_audios

def time_stretch_augmentation(audio, rates=(0.9, 1.1, 0.85, 1.15)):
    """Stretch or compress duration by each rate (rate > 1 is faster); pitch is unchanged."""
    augmented_audios = []
    for rate in rates:
        stretched = librosa.effects.time_stretch(audio, rate=rate)
        augmented_audios.append((stretched, f"timestretch_{rate}"))
    return augmented_audios

def add_noise_augmentation(audio, sr, noise_levels=(0.005, 0.01, 0.015)):
    """Add Gaussian noise at each level, then peak-normalize. `sr` is unused."""
    augmented_audios = []
    for noise_level in noise_levels:
        noise = np.random.randn(len(audio)) * noise_level
        noisy_audio = audio + noise
        # Peak-normalize; the small epsilon guards against division by zero
        noisy_audio = noisy_audio / (np.max(np.abs(noisy_audio)) + 1e-8)
        augmented_audios.append((noisy_audio, f"noise_{noise_level}"))
    return augmented_audios

def speed_change_augmentation(audio, sr, speed_factors=(1.1, 0.9, 1.2, 0.8)):
    """Change speed by nearest-index resampling (tempo and pitch shift together). `sr` is unused."""
    augmented_audios = []
    for speed in speed_factors:
        # Step through the signal with stride `speed` and keep the nearest sample
        indices = np.round(np.arange(0, len(audio), speed)).astype(int)
        indices = indices[indices < len(audio)]
        changed = audio[indices]
        augmented_audios.append((changed, f"speed_{speed}"))
    return augmented_audios

def volume_change_augmentation(audio, volume_factors=(1.2, 0.8, 1.3, 0.7)):
    """Scale amplitude by each factor, clipping the result to [-1.0, 1.0]."""
    augmented_audios = []
    for volume in volume_factors:
        changed = audio * volume
        # Clip to keep samples within [-1.0, 1.0]
        changed = np.clip(changed, -1.0, 1.0)
        augmented_audios.append((changed, f"volume_{volume}"))
    return augmented_audios
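
The speed change above works by index resampling: it steps through the signal with a stride of `speed` and keeps the nearest sample, so the output has roughly `len(audio) / speed` samples. The dependency-free sketch below mirrors that logic with plain Python instead of NumPy, to show which indices survive. `resample_indices` is a hypothetical helper name used only for illustration, not part of the notebook.

```python
def resample_indices(n_samples, speed):
    """Return the sample indices kept when stepping through a signal at `speed`.

    Mirrors np.round(np.arange(0, n_samples, speed)) followed by dropping
    any index that falls outside the signal.
    """
    indices = []
    step = 0
    while step * speed < n_samples:
        idx = round(step * speed)   # nearest sample (ties round to even, like np.round)
        if idx < n_samples:         # drop indices rounded past the end
            indices.append(idx)
        step += 1
    return indices


print(resample_indices(10, 2.0))          # [0, 2, 4, 6, 8]
print(len(resample_indices(1000, 1.25)))  # 800
```

A factor above 1 drops samples (shorter, faster audio) and a factor below 1 repeats samples (longer, slower audio). Because no interpolation or filtering is applied, this is a rough approximation of true resampling.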

In [7]:
def generate_diverse_augmentation(audio, sr):
    augmented_audios = []
    
    # 1. Pitch shift variations (various levels)
    pitch_steps = [-4, -3, -2, -1, 1, 2, 3, 4, -2.5, 2.5]
    for step in pitch_steps:
        shifted = librosa.effects.pitch_shift(audio, sr=sr, n_steps=step)
        augmented_audios.append(shifted)
    
    # 2. Time stretch variations
    time_rates = [0.8, 0.85, 0.9, 0.95, 1.05, 1.1, 1.15, 1.2, 0.92, 1.08]
    for rate in time_rates:
        stretched = librosa.effects.time_stretch(audio, rate=rate)
        augmented_audios.append(stretched)
    
    # 3. Noise variations
    noise_levels = [0.003, 0.005, 0.007, 0.01, 0.012, 0.015, 0.02, 0.004, 0.008, 0.018]
    for noise_level in noise_levels:
        noise = np.random.randn(len(audio)) * noise_level
        noisy = audio + noise
        noisy = noisy / (np.max(np.abs(noisy)) + 1e-8)
        augmented_audios.append(noisy)
    
    # 4. Speed variations
    speed_factors = [0.75, 0.85, 0.9, 0.95, 1.05, 1.1, 1.15, 1.25, 0.88, 1.12]
    for speed in speed_factors:
        indices = np.round(np.arange(0, len(audio), speed)).astype(int)
        indices = indices[indices < len(audio)]
        changed = audio[indices]
        augmented_audios.append(changed)
    
    # 5. Volume variations
    volume_factors = [0.6, 0.7, 0.8, 0.9, 1.1, 1.2, 1.3, 1.4, 0.75, 1.25]
    for volume in volume_factors:
        changed = audio * volume
        changed = np.clip(changed, -1.0, 1.0)
        augmented_audios.append(changed)
    
    # 6. Combined augmentations (random combinations)
    num_combinations = 50
    for i in range(num_combinations):
        aug_audio = audio.copy()
        
        # Random pitch (50% chance)
        if random.random() > 0.5:
            n_steps = random.uniform(-3, 3)
            aug_audio = librosa.effects.pitch_shift(aug_audio, sr=sr, n_steps=n_steps)
        
        # Random time stretch (50% chance)
        if random.random() > 0.5:
            rate = random.uniform(0.85, 1.15)
            aug_audio = librosa.effects.time_stretch(aug_audio, rate=rate)
        
        # Random noise (60% chance)
        if random.random() > 0.4:
            noise_level = random.uniform(0.003, 0.015)
            noise = np.random.randn(len(aug_audio)) * noise_level
            aug_audio = aug_audio + noise
            aug_audio = aug_audio / (np.max(np.abs(aug_audio)) + 1e-8)
        
        # Random volume (50% chance)
        if random.random() > 0.5:
            volume = random.uniform(0.7, 1.3)
            aug_audio = aug_audio * volume
            aug_audio = np.clip(aug_audio, -1.0, 1.0)
        
        augmented_audios.append(aug_audio)
    
    return augmented_audios
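
As a quick sanity check on the numbers: `generate_diverse_augmentation` returns 10 variations for each of the five single-technique groups plus 50 random combinations, so 100 per source file, and the pipeline in the next section also keeps each original. The sketch below works out the resulting candidate pool. `candidate_pool_size` is a hypothetical helper, and the figures of 20 source files and a target of 200 are taken from the run shown in this notebook.

```python
def candidate_pool_size(num_sources, per_group=10, num_groups=5, num_combinations=50):
    """Number of candidate samples: each source yields its variations plus the original."""
    variations_per_source = per_group * num_groups + num_combinations
    return num_sources * (variations_per_source + 1)


pool = candidate_pool_size(20)
print(pool)          # 2020
print(pool >= 200)   # True
```

With 20 sources the pool is about ten times the target, so most generated samples are discarded at selection time.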

## 5. Augmentation Function for Generating the Target Number of Files from Multiple Source Files

In [8]:
def augment_multiple_files_to_target(source_files, category, base_path, output_path, target_count=100):
    
    print(f"\n{'='*60}")
    print(f"Processing {category.upper()}")
    print(f"Source files: {len(source_files)} files")
    print(f"Target: {target_count} files")
    print(f"{'='*60}")
    
    if not source_files:
        print(f"Error: No source files found for {category}")
        return 0
    
    output_dir = os.path.join(output_path, category)
    files_created = 0
    all_augmented = []
    
    # Process each source file
    for idx, filename in enumerate(source_files, 1):
        source_file = os.path.join(base_path, category, filename)
        
        print(f"\n[{idx}/{len(source_files)}] Loading: {filename}")
        
        # Load audio
        audio, sr = load_audio(source_file)
        if audio is None:
            print(f"  Error: Failed to load {source_file}")
            continue
        
        # Add original
        all_augmented.append((audio, sr, f"original_{filename}"))
        
        # Generate augmented versions
        print(f"  Generating augmentations...")
        augmented_audios = generate_diverse_augmentation(audio, sr)
        
        # Add all augmented with metadata
        for aug_audio in augmented_audios:
            all_augmented.append((aug_audio, sr, f"aug_{filename}"))
        
        print(f"  Generated: {len(augmented_audios)} variations")
    
    # Shuffle all augmented files for variety
    print(f"\nTotal generated: {len(all_augmented)} audio samples")
    print(f"Selecting {target_count} samples...")
    random.shuffle(all_augmented)
    
    # Select exactly target_count files
    selected = all_augmented[:target_count]
    
    # Save with sequential numbering
    # Note: save_audio logs errors instead of raising, so files_created counts attempted saves
    print(f"\nSaving {len(selected)} files...")
    for i, (aug_audio, sr, source_info) in enumerate(selected, 1):
        output_file = os.path.join(output_dir, f"{category}{i}.wav")
        save_audio(aug_audio, sr, output_file)
        files_created += 1
        
        # Progress indicator
        if i % 10 == 0:
            print(f"  Progress: {i}/{len(selected)} files saved")
    
    print(f"✓ Completed: {files_created} files created for '{category}'")
    return files_created


def augment_dataset_100_per_category(base_path, output_path, source_files, target_count=100):
    stats = {}
    
    for category, filenames in source_files.items():
        if not filenames:
            print(f"Warning: No source files found for category: {category}")
            stats[category] = 0
            continue
        
        # Augment multiple files to target count
        files_created = augment_multiple_files_to_target(
            filenames,
            category, 
            base_path,
            output_path, 
            target_count
        )
        stats[category] = files_created
    
    return stats
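
Because selection is a shuffle followed by a slice, the number of selected samples per source recording is left to chance. The sketch below shows one way to inspect that balance using only the standard library. `select_and_count`, the `(payload, source_name)` tuple layout, and the `source_N` names are illustrative assumptions, not the notebook's own data structures.

```python
import random
from collections import Counter


def select_and_count(candidates, target_count, seed=42):
    """Shuffle candidates, keep the first target_count, and count them per source."""
    rng = random.Random(seed)      # local RNG, so the global random state is untouched
    shuffled = list(candidates)    # copy, so the caller's list keeps its order
    rng.shuffle(shuffled)
    selected = shuffled[:target_count]
    counts = Counter(source for _payload, source in selected)
    return selected, counts


# Toy pool: 3 hypothetical sources with 101 candidates each (1 original + 100 variations)
candidates = [(i, f"source_{s}") for s in range(1, 4) for i in range(101)]
selected, counts = select_and_count(candidates, target_count=30)

print(len(selected))
for source, n in sorted(counts.items()):
    print(source, n)
```

The notebook itself seeds the global `random` module instead. A local `random.Random` instance is used here only so the example does not disturb other random draws.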

## 6. Run Augmentation (Multiple Source Files → Target Number of Files per Category)

In [9]:
# Set random seeds for reproducibility (optional)
random.seed(42)
np.random.seed(42)

print("="*60)
print("VOICE AUGMENTATION: Multiple Files -> 100 Files per Category")
print("="*60)

# Run augmentation
stats = augment_dataset_100_per_category(
    base_path, 
    output_path, 
    SOURCE_FILES, 
    TARGET_FILES_PER_CATEGORY
)

VOICE AUGMENTATION: Multiple Files -> 100 Files per Category

Processing BUKA
Source files: 20 files
Target: 200 files

[1/20] Loading: Recording (10).mp3
  Generating augmentations...
  Generated: 100 variations

[2/20] Loading: Recording (11).mp3
  Generating augmentations...
  Generated: 100 variations

[3/20] Loading: Recording (12).mp3
  Generating augmentations...
  Generated: 100 variations

[4/20] Loading: Recording (13).mp3
  Generating augmentations...
  Generated: 100 variations

[5/20] Loading: Recording (14).mp3
  Generating augmentations...
  Generated: 100 variations

[6/20] Loading: Recording (15).mp3
  Generating augmentations...
  Generated: 100 variations

[7/20] Loading: Recording (16).mp3
  Generating augmentations...
  Generated: 100 variations

[8/20] Loading: Recording (17).mp3
  Generating augmentations...
  Generated: 100 variations

[9/20] Loading: Recording (18).mp3
  Generating augmentations...
  Generated: 100 variations

[10/20] Loading: Recording (19).mp

## 7. Verify Results & Statistics

In [10]:
print("\n" + "="*60)
print("AUGMENTATION RESULTS")
print("="*60)

for category, count in stats.items():
    print(f"\n{category.upper()}: {count} files")
    
    # Verify files
    category_path = os.path.join(output_path, category)
    actual_files = [f for f in os.listdir(category_path) if f.endswith('.wav')]
    print(f"  Verified: {len(actual_files)} files in folder")
    
    # Show sample filenames
    print(f"  Sample filenames:")
    for i in [1, 2, 3, 50, 99, 100]:
        expected_file = f"{category}{i}.wav"
        if expected_file in actual_files:
            print(f"    ✓ {expected_file}")
        else:
            print(f"    ✗ {expected_file} (missing)")

total_files = sum(stats.values())
print(f"\n{'='*60}")
print(f"TOTAL FILES CREATED: {total_files}")
print(f"{'='*60}")

# Additional verification
print("\n" + "="*60)
print("FOLDER STRUCTURE")
print("="*60)
for category in ['buka', 'tutup']:
    category_path = os.path.join(output_path, category)
    if os.path.exists(category_path):
        files = sorted([f for f in os.listdir(category_path) if f.endswith('.wav')])
        print(f"\n{category_path}")
        print(f"  Total: {len(files)} files")
        if files:
            print(f"  First: {files[0]}")
            print(f"  Last: {files[-1]}")


AUGMENTATION RESULTS

BUKA: 200 files
  Verified: 200 files in folder
  Sample filenames:
    ✓ buka1.wav
    ✓ buka2.wav
    ✓ buka3.wav
    ✓ buka50.wav
    ✓ buka99.wav
    ✓ buka100.wav

TUTUP: 200 files
  Verified: 200 files in folder
  Sample filenames:
    ✓ tutup1.wav
    ✓ tutup2.wav
    ✓ tutup3.wav
    ✓ tutup50.wav
    ✓ tutup99.wav
    ✓ tutup100.wav

TOTAL FILES CREATED: 400

FOLDER STRUCTURE

dataset_augmented\buka
  Total: 200 files
  First: buka1.wav
  Last: buka99.wav

dataset_augmented\tutup
  Total: 200 files
  First: tutup1.wav
  Last: tutup99.wav
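
The `Last: buka99.wav` and `Last: tutup99.wav` lines above follow from plain string sorting, in which `buka99.wav` sorts after `buka200.wav`. The same effect places `Recording (10).mp3` before `Recording (2).mp3` in the source listing. If numeric order is wanted, a natural sort key is one option. The helper name `natural_key` below is an assumption for illustration.

```python
import re


def natural_key(name):
    """Split a filename into text and integer chunks so numbers compare numerically."""
    return [int(chunk) if chunk.isdigit() else chunk.lower()
            for chunk in re.split(r"(\d+)", name)]


files = ["buka99.wav", "buka100.wav", "buka2.wav", "buka200.wav", "buka1.wav"]

print(sorted(files)[-1])                   # buka99.wav
print(sorted(files, key=natural_key)[-1])  # buka200.wav
```

Passing `key=natural_key` to the `sorted(...)` calls in the verification cell would then report the highest-numbered file as the last one.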


## 8. Visualize Sample Audio

In [13]:
import matplotlib.pyplot as plt
import librosa.display

def visualize_augmented_samples(category='buka', sample_indices=[1, 25, 50, 75, 100]):
   
    output_dir = os.path.join(output_path, category)
    
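    # squeeze=False keeps axes 2-D, so axes[idx, 0] also works with a single sample index
    fig, axes = plt.subplots(len(sample_indices), 2, figsize=(15, 3*len(sample_indices)), squeeze=False)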
    
    for idx, file_num in enumerate(sample_indices):
        file_name = f"{category}{file_num}.wav"
        file_path = os.path.join(output_dir, file_name)
        
        if os.path.exists(file_path):
            audio, sr = load_audio(file_path)
            
            # Waveform
            axes[idx, 0].plot(audio)
            axes[idx, 0].set_title(f"Waveform: {file_name}")
            axes[idx, 0].set_xlabel("Sample")
            axes[idx, 0].set_ylabel("Amplitude")
            axes[idx, 0].grid(True, alpha=0.3)
            
            # Spectrogram
            D = librosa.amplitude_to_db(np.abs(librosa.stft(audio)), ref=np.max)
            img = librosa.display.specshow(D, sr=sr, x_axis='time', y_axis='hz', ax=axes[idx, 1])
            axes[idx, 1].set_title(f"Spectrogram: {file_name}")
            fig.colorbar(img, ax=axes[idx, 1], format="%+2.0f dB")
        else:
            axes[idx, 0].text(0.5, 0.5, f"File not found: {file_name}", 
                            ha='center', va='center', transform=axes[idx, 0].transAxes)
            axes[idx, 1].text(0.5, 0.5, f"File not found: {file_name}", 
                            ha='center', va='center', transform=axes[idx, 1].transAxes)
    
    plt.tight_layout()
    plt.savefig(f'augmentation_samples_{category}.png', dpi=150, bbox_inches='tight')
    plt.show()
    
    print(f"Visualization saved as 'augmentation_samples_{category}.png'")

## 9. Utility Function for Loading the Dataset

In [14]:
def load_augmented_dataset(augmented_path):
    
    dataset = []
    
    for category in ['buka', 'tutup']:
        category_path = os.path.join(augmented_path, category)
        audio_files = [f for f in os.listdir(category_path) if f.endswith('.wav')]
        
        print(f"Loading {category}: {len(audio_files)} files")
        
        for audio_file in tqdm(audio_files, desc=f"Loading {category}"):
            file_path = os.path.join(category_path, audio_file)
            audio, sr = load_audio(file_path)
            if audio is not None:
                dataset.append({
                    'audio': audio,
                    'sr': sr,
                    'label': category,
                    'filename': audio_file
                })
    
    print(f"\nTotal dataset size: {len(dataset)} samples")
    return dataset

In [15]:
# Load augmented dataset
print("Loading augmented dataset...")
augmented_dataset = load_augmented_dataset(output_path)

# Show distribution
from collections import Counter
label_counts = Counter([item['label'] for item in augmented_dataset])
print("\nDataset distribution:")
for label, count in label_counts.items():
    print(f"  {label}: {count} samples")

Loading augmented dataset...
Loading buka: 200 files


Loading buka: 100%|██████████| 200/200 [00:04<00:00, 48.76it/s]


Loading tutup: 200 files


Loading tutup: 100%|██████████| 200/200 [00:04<00:00, 48.35it/s]


Total dataset size: 400 samples

Dataset distribution:
  buka: 200 samples
  tutup: 200 samples
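
Most training code expects integer class labels, not the strings stored under `'label'`. A minimal sketch of that conversion follows, assuming dataset items shaped like the dictionaries built by `load_augmented_dataset`. The mapping `LABEL_TO_ID`, the helper `encode_labels`, and the toy data are hypothetical and not part of the notebook.

```python
# Assumed mapping; any fixed, consistent assignment would work
LABEL_TO_ID = {"buka": 0, "tutup": 1}


def encode_labels(dataset, label_to_id=LABEL_TO_ID):
    """Return one integer class id per dataset item, in dataset order."""
    return [label_to_id[item["label"]] for item in dataset]


# Toy stand-in for the loaded dataset (audio arrays omitted for brevity)
toy_dataset = [
    {"label": "buka", "filename": "buka1.wav"},
    {"label": "tutup", "filename": "tutup1.wav"},
    {"label": "buka", "filename": "buka2.wav"},
]

print(encode_labels(toy_dataset))  # [0, 1, 0]
```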





## Summary

This notebook has:
1. ✅ Used **all files** from each category in the `dataset` folder
   - Buka: 20 files
   - Tutup: 20 files
2. ✅ Produced **exactly 200 files** per category through augmentation
3. ✅ Applied consistent naming: **buka1, buka2, ..., buka200** and **tutup1, tutup2, ..., tutup200**
4. ✅ Used 6 augmentation approaches (five single techniques plus random combinations) for more variety
5. ✅ Shuffled the samples from all source files before selection for a more diverse result
6. ✅ Total dataset: **400 files** (200 buka + 200 tutup)

The augmented dataset is ready to use for training a voice recognition model!

**Output Location:** `dataset_augmented/`
- `dataset_augmented/buka/` : 200 files (buka1.wav - buka200.wav)
- `dataset_augmented/tutup/` : 200 files (tutup1.wav - tutup200.wav)