# üß¨ Algoritma Genetika untuk Penentuan Kelompok KKM Reguler
## UIN Malang - Production Version

---

**Platform**: Kaggle Notebook / Local Environment  
**Dataset**: Upload `master_data.csv` ke Kaggle Dataset

**Tujuan**: Mengelompokkan mahasiswa ke dalam kelompok-kelompok KKM Reguler yang optimal dengan mempertimbangkan:
- ‚úÖ Keberadaan anggota HTQ
- ‚úÖ Heterogenitas jurusan
- ‚úÖ Proporsi jenis kelamin
- ‚úÖ Jumlah anggota per kelompok

**Metode**: Genetic Algorithm dengan PMX Crossover dan Reciprocal Exchange Mutation

---

### üìã Langkah Setup di Kaggle:
1. Upload dataset `master_data.csv` ke Kaggle Dataset
2. Add dataset ke notebook ini
3. Set parameter GA di Cell 2
4. Run all cells
5. Download hasil dari Output section

---

### üéØ Mode: PRODUCTION
- Single run dengan parameter yang ditentukan
- Output: Summary (fitness + runtime) dan CSV hasil pembagian kelompok

## 1. Import Libraries & Setup Environment

In [1]:
import pandas as pd
import numpy as np
import random
from datetime import datetime
import os
import glob
import time

print("‚úÖ Libraries imported successfully!")
print(f"   Pandas: {pd.__version__}")
print(f"   Numpy: {np.__version__}")

# Detect environment (Kaggle vs Local)
if os.path.exists('/kaggle/input'):
    print("üåê Running on KAGGLE environment")
    KAGGLE_MODE = True
    INPUT_DIR = '/kaggle/input'
    OUTPUT_DIR = '/kaggle/working'
else:
    print("üíª Running on LOCAL environment")
    KAGGLE_MODE = False
    INPUT_DIR = '../data'
    OUTPUT_DIR = '../pengujian/output'

print(f"   Input directory: {INPUT_DIR}")
print(f"   Output directory: {OUTPUT_DIR}")

# Create output directory
os.makedirs(OUTPUT_DIR, exist_ok=True)

print(f"   üìÅ Output directory created")

‚úÖ Libraries imported successfully!
   Pandas: 2.3.3
   Numpy: 2.3.4
üíª Running on LOCAL environment
   Input directory: ../data
   Output directory: ../pengujian/output
   üìÅ Output directory created


## 2. ‚öôÔ∏è SET PARAMETER ALGORITMA GENETIKA

In [2]:
# ========================================
# üîß PARAMETER ALGORITMA GENETIKA
# ========================================
# Ubah nilai parameter di bawah ini sesuai kebutuhan

# 1. Jumlah Kelompok KKM
JUMLAH_KELOMPOK = 190

# 2. Population Size (ukuran populasi)
# Rekomendasi: 50-100 untuk hasil optimal
POPSIZE = 2

# 3. Maximum Generation (jumlah generasi maksimal)
# Rekomendasi: 300-500 untuk konvergensi yang baik
MAX_GENERATION = 2

# 4. Crossover Rate (tingkat persilangan, 0.0 - 1.0)
# Rekomendasi: 0.7-0.9 untuk eksplorasi yang baik
CROSSOVER_RATE = 0.8

# 5. Mutation Rate (tingkat mutasi, 0.0 - 1.0)
# Rekomendasi: 0.1-0.3 untuk diversity
MUTATION_RATE = 0.2

# 6. Target Fitness (kriteria penghentian, 0.0 - 1.0)
# GA akan berhenti jika fitness mencapai target ini
TARGET_FITNESS = 0.95  # 95% dari fitness maksimal

# 7. Random Seed (untuk reprodusibilitas, None = random)
# None = Benar-benar acak (hasil berbeda setiap run)
RANDOM_SEED = None

# ========================================

print("="*80)
print("‚öôÔ∏è  PARAMETER ALGORITMA GENETIKA")
print("="*80)
print(f"  Jumlah Kelompok       : {JUMLAH_KELOMPOK}")
print(f"  Population Size       : {POPSIZE}")
print(f"  Max Generation        : {MAX_GENERATION}")
print(f"  Crossover Rate        : {CROSSOVER_RATE}")
print(f"  Mutation Rate         : {MUTATION_RATE}")
print(f"  Target Fitness        : {TARGET_FITNESS * 100}%")
print(f"  Random Seed           : {RANDOM_SEED if RANDOM_SEED else 'Random (time-based)'}")
print("="*80)

‚öôÔ∏è  PARAMETER ALGORITMA GENETIKA
  Jumlah Kelompok       : 190
  Population Size       : 2
  Max Generation        : 2
  Crossover Rate        : 0.8
  Mutation Rate         : 0.2
  Target Fitness        : 95.0%
  Random Seed           : Random (time-based)


## 3. Load and Validate Data

In [3]:
# Auto-detect CSV file in Kaggle or local
if KAGGLE_MODE:
    # Find CSV file in Kaggle input
    csv_files = glob.glob(f'{INPUT_DIR}/**/*.csv', recursive=True)
    if csv_files:
        csv_path = csv_files[0]
        print(f"üìÅ Found dataset: {csv_path}")
    else:
        raise FileNotFoundError("No CSV file found in Kaggle input. Please add dataset!")
else:
    csv_path = f'{INPUT_DIR}/master_data.csv'

# Load data
df = pd.read_csv(csv_path)

# Validate required columns
required_cols = ['ID', 'Jenis Kelamin', 'Jurusan', 'HTQ']
assert all(col in df.columns for col in required_cols), f"Missing columns! Required: {required_cols}"

# Check missing values
missing_count = df[required_cols].isnull().sum().sum()
assert missing_count == 0, f"Found {missing_count} missing values!"

# Check duplicate IDs
dup_count = df['ID'].duplicated().sum()
assert dup_count == 0, f"Found {dup_count} duplicate IDs!"

print("="*80)
print("‚úÖ DATA VALIDATION PASSED")
print("="*80)
print(f"Total Mahasiswa: {len(df)}")
print(f"Jumlah Jurusan: {df['Jurusan'].nunique()}")
print(f"\nDistribusi Jenis Kelamin:")
print(df['Jenis Kelamin'].value_counts())
print(f"\nDistribusi HTQ:")
print(df['HTQ'].value_counts())
print(f"\nTop 5 Jurusan:")
print(df['Jurusan'].value_counts().head())
print("\nSample Data:")
df.head(10)

‚úÖ DATA VALIDATION PASSED
Total Mahasiswa: 2338
Jumlah Jurusan: 24

Distribusi Jenis Kelamin:
Jenis Kelamin
PR    1391
LK     947
Name: count, dtype: int64

Distribusi HTQ:
HTQ
Tidak    2112
Ya        226
Name: count, dtype: int64

Top 5 Jurusan:
Jurusan
MANAJEMEN                    248
PSIKOLOGI                    218
BAHASA DAN SASTRA ARAB       184
BAHASA DAN SASTRA INGGRIS    183
HUKUM BISNIS SYARI'AH        177
Name: count, dtype: int64

Sample Data:


Unnamed: 0,ID,Jenis Kelamin,Jurusan,HTQ
0,1,PR,BAHASA DAN SASTRA INGGRIS,Tidak
1,2,PR,BIOLOGI,Tidak
2,3,PR,TEKNIK INFORMATIKA,Tidak
3,4,LK,BAHASA DAN SASTRA ARAB,Tidak
4,5,LK,ILMU AL-QUR`AN DAN TAFSIR,Tidak
5,6,PR,AL-AHWAL AL-SYAKHSHIYYAH,Tidak
6,7,PR,PSIKOLOGI,Tidak
7,8,PR,PERBANKAN SYARI`AH,Tidak
8,9,LK,HUKUM BISNIS SYARI'AH,Tidak
9,10,LK,MANAJEMEN,Tidak


## 4. Data Preprocessing

In [4]:
def preprocess_data(df, jumlah_kelompok):
    """Preprocess data dan hitung semua statistik yang diperlukan"""
    df_clean = df.copy()
    
    # Normalize HTQ to binary
    df_clean['HTQ'] = df_clean['HTQ'].apply(lambda x: 1 if str(x).lower() in ['ya', 'lulus', '1'] else 0)
    
    # Calculate aggregate statistics
    N = len(df_clean)
    L = (df_clean['Jenis Kelamin'] == 'LK').sum()
    P = (df_clean['Jenis Kelamin'] == 'PR').sum()
    K = jumlah_kelompok
    
    # Calculate expected proportions
    PL = L / N
    PP = P / N
    
    # Calculate expected sizes per group
    A = N // K
    sisa = N % K
    
    expected_sizes = [A + 1 if i < sisa else A for i in range(K)]
    
    # Max fitness
    max_fitness = K * 4
    
    return {
        'df_clean': df_clean, 'N': N, 'L': L, 'P': P, 'K': K,
        'PL': PL, 'PP': PP, 'A': A, 'sisa': sisa,
        'expected_sizes': expected_sizes, 'max_fitness': max_fitness
    }

# Preprocess with parameter from Cell 2
preprocessed = preprocess_data(df, JUMLAH_KELOMPOK)
df_clean = preprocessed['df_clean']

print("="*80)
print("‚úÖ PREPROCESSING COMPLETE")
print("="*80)
print(f"Total Mahasiswa (N): {preprocessed['N']}")
print(f"Laki-laki (L): {preprocessed['L']} ({preprocessed['PL']:.2%})")
print(f"Perempuan (P): {preprocessed['P']} ({preprocessed['PP']:.2%})")
print(f"Jumlah Kelompok (K): {preprocessed['K']}")
print(f"Base size: {preprocessed['A']}, Sisa: {preprocessed['sisa']}")
print(f"Expected sizes: {preprocessed['expected_sizes'][:5]}... (first 5)")
print(f"Max Fitness: {preprocessed['max_fitness']}")

print("‚úÖ Constraint functions defined (C1, C2, C3, C4)")

‚úÖ PREPROCESSING COMPLETE
Total Mahasiswa (N): 2338
Laki-laki (L): 947 (40.50%)
Perempuan (P): 1391 (59.50%)
Jumlah Kelompok (K): 190
Base size: 12, Sisa: 58
Expected sizes: [13, 13, 13, 13, 13]... (first 5)
Max Fitness: 760
‚úÖ Constraint functions defined (C1, C2, C3, C4)


## 5. Constraint Evaluation Functions

In [5]:
def evaluate_C1(group_df):
    """C1: Minimal ada 1 anggota HTQ di kelompok"""
    htq_count = group_df['HTQ'].sum()
    return 1 if htq_count >= 1 else 0

def evaluate_C2(group_df):
    """C2: Jumlah jurusan berbeda > 50% dari ukuran kelompok"""
    unique_majors = group_df['Jurusan'].nunique()
    threshold = len(group_df) * 0.5
    return 1 if unique_majors > threshold else 0

def evaluate_C3(group_df, PL, PP):
    """C3: Proporsi gender menyimpang ¬±10% dari proporsi ideal"""
    n_group = len(group_df)
    lk_count = (group_df['Jenis Kelamin'] == 'LK').sum()
    pr_count = (group_df['Jenis Kelamin'] == 'PR').sum()
    
    lk_prop = lk_count / n_group
    pr_prop = pr_count / n_group
    
    lk_dev = abs(lk_prop - PL)
    pr_dev = abs(pr_prop - PP)
    
    return 1 if (lk_dev <= 0.1 and pr_dev <= 0.1) else 0

def evaluate_C4(group_df, expected_size):
    """C4: Ukuran kelompok sesuai expected size"""
    return 1 if len(group_df) == expected_size else 0

print("‚úÖ Constraint functions defined (C1, C2, C3, C4)")

‚úÖ Constraint functions defined (C1, C2, C3, C4)


## 6. Fitness Calculation

In [6]:
def decode_kromosom(kromosom, df_clean, expected_sizes):
    """Decode permutation kromosom into groups"""
    groups = []
    start_idx = 0
    
    for i, size in enumerate(expected_sizes):
        end_idx = start_idx + size
        group_ids = kromosom[start_idx:end_idx]
        group_df = df_clean[df_clean['ID'].isin(group_ids)].copy()
        groups.append(group_df)
        start_idx = end_idx
    
    return groups

def calculate_fitness(kromosom, df_clean, expected_sizes, PL, PP):
    """Calculate total fitness of a kromosom"""
    groups = decode_kromosom(kromosom, df_clean, expected_sizes)
    total_fitness = 0
    
    for i, group_df in enumerate(groups):
        c1 = evaluate_C1(group_df)
        c2 = evaluate_C2(group_df)
        c3 = evaluate_C3(group_df, PL, PP)
        c4 = evaluate_C4(group_df, expected_sizes[i])
        
        total_fitness += (c1 + c2 + c3 + c4)
    
    return total_fitness

print("‚úÖ Fitness calculation functions defined")

‚úÖ Fitness calculation functions defined


## 7. Population Initialization

In [7]:
def initialize_population(df_clean, popsize):
    """Initialize population with random permutations"""
    student_ids = df_clean['ID'].values
    population = []
    
    for _ in range(popsize):
        kromosom = np.random.permutation(student_ids)
        population.append(kromosom)
    
    return population

print("‚úÖ Population initialization function defined")

‚úÖ Population initialization function defined


## 8. Parent Selection

In [8]:
def select_parents_for_crossover(population, cr):
    """Select parent pairs for crossover based on CR"""
    num_crossover = int(len(population) * cr)
    if num_crossover % 2 != 0:
        num_crossover += 1
    
    # Need at least 2 individuals for crossover
    if num_crossover < 2 or len(population) < 2:
        return []
    
    # Can't select more than population size
    num_crossover = min(num_crossover, len(population))
    
    indices = np.random.choice(len(population), num_crossover, replace=False)
    parent_pairs = [(population[indices[i]], population[indices[i+1]]) 
                    for i in range(0, num_crossover, 2)]
    return parent_pairs

def select_parents_for_mutation(population, mr):
    """Select parents for mutation based on MR"""
    num_mutation = int(len(population) * mr)
    
    # Handle edge cases
    if num_mutation == 0 or len(population) == 0:
        return []
    
    num_mutation = min(num_mutation, len(population))
    indices = np.random.choice(len(population), num_mutation, replace=False)
    return [population[i] for i in indices]

print("‚úÖ Parent selection functions defined")

‚úÖ Parent selection functions defined


## 9. PMX Crossover

In [9]:
def pmx_crossover(parent1, parent2):
    """
    Partially Mapped Crossover (PMX) - Fixed version
    Prevents infinite loops by following the mapping chain properly
    """
    size = len(parent1)
    
    # Choose two random cut points
    cx_point1 = np.random.randint(0, size)
    cx_point2 = np.random.randint(0, size)
    if cx_point1 > cx_point2:
        cx_point1, cx_point2 = cx_point2, cx_point1
    
    # Ensure we have at least some segment to swap
    if cx_point1 == cx_point2:
        cx_point2 = min(cx_point1 + 1, size)
    
    # Initialize offspring as copies
    child1 = parent1.copy()
    child2 = parent2.copy()
    
    # Swap middle segments
    child1[cx_point1:cx_point2] = parent2[cx_point1:cx_point2]
    child2[cx_point1:cx_point2] = parent1[cx_point1:cx_point2]
    
    # Fix conflicts using proper PMX algorithm
    def fix_conflicts_pmx(child, p1, p2, start, end):
        """
        Fix conflicts by following the mapping relationship.
        For each position outside the crossover segment,
        if there's a conflict, follow the mapping chain until finding a valid value.
        """
        # Create a set of values in the middle segment for fast lookup
        middle_values = set(child[start:end])
        
        for i in range(size):
            # Only fix positions outside the crossover segment
            if i < start or i >= end:
                # If current value is already in the middle segment (conflict)
                if child[i] in middle_values:
                    # Follow the mapping chain to find a valid replacement
                    value = child[i]
                    visited = set()  # Prevent infinite loops in case of cycles
                    
                    # Keep following the mapping until we find a value not in middle segment
                    while value in middle_values and value not in visited:
                        visited.add(value)
                        
                        # Find where this value appears in p2's middle segment
                        try:
                            idx_in_p2 = np.where(p2[start:end] == value)[0][0] + start
                            # Get the corresponding value from p1
                            value = p1[idx_in_p2]
                        except (IndexError, TypeError):
                            # If not found, break to avoid error
                            break
                    
                    # If we found a valid value (not in middle), use it
                    if value not in middle_values:
                        child[i] = value
                    # else: keep original value (shouldn't happen in valid permutation)
    
    fix_conflicts_pmx(child1, parent1, parent2, cx_point1, cx_point2)
    fix_conflicts_pmx(child2, parent2, parent1, cx_point1, cx_point2)
    
    return child1, child2

print("‚úÖ PMX Crossover function defined (fixed infinite loop bug)")

‚úÖ PMX Crossover function defined (fixed infinite loop bug)


## 10. Reciprocal Exchange Mutation

In [10]:
def reciprocal_exchange_mutation(parent):
    """Swap two random genes"""
    child = parent.copy()
    idx1, idx2 = np.random.choice(len(child), 2, replace=False)
    child[idx1], child[idx2] = child[idx2], child[idx1]
    return child

print("‚úÖ Reciprocal Exchange Mutation function defined")

‚úÖ Reciprocal Exchange Mutation function defined


## 11. Elitism Replacement Strategy

In [11]:
def elitism_replacement_optimized(population, population_fitness, offspring, 
                                   df_clean, expected_sizes, PL, PP, popsize):
    """
    Optimized elitism with fitness caching.
    Only calculates fitness for NEW offspring, reuses existing population fitness.
    This dramatically speeds up the algorithm (6√ó faster per generation).
    """
    # Handle empty offspring case (when CR=0 and MR=0)
    if len(offspring) == 0:
        # No offspring, just return the current population sorted by fitness
        sorted_indices = sorted(range(len(population)), 
                              key=lambda i: population_fitness[i], 
                              reverse=True)
        new_population = [population[i] for i in sorted_indices[:popsize]]
        new_fitness = [population_fitness[i] for i in sorted_indices[:popsize]]
        return new_population, new_fitness
    
    # Calculate fitness ONLY for new offspring
    offspring_fitness = [calculate_fitness(ind, df_clean, expected_sizes, PL, PP) 
                        for ind in offspring]
    
    # Combine populations and fitness scores
    combined = population + offspring
    combined_fitness = population_fitness + offspring_fitness
    
    # Sort by fitness (descending)
    sorted_indices = sorted(range(len(combined)), 
                          key=lambda i: combined_fitness[i], 
                          reverse=True)
    
    # Select top PopSize individuals
    new_population = [combined[i] for i in sorted_indices[:popsize]]
    new_fitness = [combined_fitness[i] for i in sorted_indices[:popsize]]
    
    return new_population, new_fitness

print("‚úÖ Elitism Replacement function defined (optimized with safety check)")

‚úÖ Elitism Replacement function defined (optimized with safety check)


## 12. üöÄ RUN ALGORITMA GENETIKA

In [12]:
# Set random seed
if RANDOM_SEED is not None:
    np.random.seed(RANDOM_SEED)
    random.seed(RANDOM_SEED)
    print(f"üé≤ Random seed set to: {RANDOM_SEED}")
else:
    seed = int(time.time() * 1000000) % (2**31)
    np.random.seed(seed)
    random.seed(seed)
    print(f"üé≤ Random seed (time-based): {seed}")

# Extract preprocessed data
N = preprocessed['N']
K = preprocessed['K']
PL = preprocessed['PL']
PP = preprocessed['PP']
expected_sizes = preprocessed['expected_sizes']
max_fitness = preprocessed['max_fitness']

print("\n" + "="*80)
print("üöÄ STARTING GENETIC ALGORITHM")
print("="*80)
print(f"Population Size: {POPSIZE}")
print(f"Max Generation: {MAX_GENERATION}")
print(f"Crossover Rate: {CROSSOVER_RATE}")
print(f"Mutation Rate: {MUTATION_RATE}")
print(f"Target Fitness: {TARGET_FITNESS * 100}% ({int(TARGET_FITNESS * max_fitness)}/{max_fitness})")
print("="*80)

# Initialize
start_time = time.time()
population = initialize_population(df_clean, POPSIZE)

# Calculate initial fitness
print("\n‚è≥ Calculating initial fitness...")
population_fitness = []
for kromosom in population:
    fitness = calculate_fitness(kromosom, df_clean, expected_sizes, PL, PP)
    population_fitness.append(fitness)

# Track best solution
best_fitness_history = []
avg_fitness_history = []
best_overall_fitness = max(population_fitness)
best_overall_solution = population[population_fitness.index(best_overall_fitness)].copy()

print(f"‚úÖ Initial best fitness: {best_overall_fitness}/{max_fitness} ({best_overall_fitness/max_fitness:.2%})")
print(f"\n{'='*80}")
print("üîÑ EVOLUTION PROGRESS (DETAILED DEBUG MODE)")
print("="*80)

# Main GA Loop
generation = 0
for generation in range(1, MAX_GENERATION + 1):
    gen_start_time = time.time()
    
    # ========== CROSSOVER ==========
    crossover_start = time.time()
    parent_pairs = select_parents_for_crossover(population, CROSSOVER_RATE)
    offspring_cx = []
    for p1, p2 in parent_pairs:
        c1, c2 = pmx_crossover(p1, p2)
        offspring_cx.extend([c1, c2])
    crossover_time = time.time() - crossover_start
    
    # ========== MUTATION ==========
    mutation_start = time.time()
    parents_mut = select_parents_for_mutation(population, MUTATION_RATE)
    offspring_mut = [reciprocal_exchange_mutation(p) for p in parents_mut]
    mutation_time = time.time() - mutation_start
    
    # ========== COMBINE OFFSPRING ==========
    offspring = offspring_cx + offspring_mut
    
    # ========== REPLACEMENT ==========
    replacement_start = time.time()
    population, population_fitness = elitism_replacement_optimized(
        population, population_fitness, offspring, 
        df_clean, expected_sizes, PL, PP, POPSIZE
    )
    replacement_time = time.time() - replacement_start
    
    # ========== TRACK STATISTICS ==========
    best_fitness = population_fitness[0]
    avg_fitness = np.mean(population_fitness)
    min_fitness = min(population_fitness)
    std_fitness = np.std(population_fitness)
    best_fitness_history.append(best_fitness)
    avg_fitness_history.append(avg_fitness)
    
    # Total generation time
    gen_time = time.time() - gen_start_time
    
    # ========== DEBUG LOG ==========
    print(f"\nüìç Generation {generation}/{MAX_GENERATION}")
    print(f"   ‚îú‚îÄ Crossover: {len(parent_pairs)} pairs ‚Üí {len(offspring_cx)} offspring ({crossover_time*1000:.2f}ms)")
    print(f"   ‚îú‚îÄ Mutation:  {len(parents_mut)} parents ‚Üí {len(offspring_mut)} offspring ({mutation_time*1000:.2f}ms)")
    print(f"   ‚îú‚îÄ Total Offspring: {len(offspring)}")
    print(f"   ‚îú‚îÄ Replacement: {len(population)+len(offspring)} ‚Üí {len(population)} individuals ({replacement_time*1000:.2f}ms)")
    print(f"   ‚îú‚îÄ Fitness Stats:")
    print(f"   ‚îÇ  ‚îú‚îÄ Best:  {best_fitness}/{max_fitness} ({best_fitness/max_fitness:.2%})")
    print(f"   ‚îÇ  ‚îú‚îÄ Avg:   {avg_fitness:.2f} ({avg_fitness/max_fitness:.2%})")
    print(f"   ‚îÇ  ‚îú‚îÄ Min:   {min_fitness}/{max_fitness} ({min_fitness/max_fitness:.2%})")
    print(f"   ‚îÇ  ‚îî‚îÄ Std:   {std_fitness:.2f}")
    print(f"   ‚îî‚îÄ Generation Time: {gen_time*1000:.2f}ms")
    
    # ========== UPDATE BEST OVERALL ==========
    if best_fitness > best_overall_fitness:
        improvement = best_fitness - best_overall_fitness
        best_overall_fitness = best_fitness
        best_overall_solution = population[0].copy()
        print(f"   üéâ NEW BEST FOUND! Improvement: +{improvement} fitness points")
    
    # ========== CHECK TERMINATION ==========
    if best_fitness >= TARGET_FITNESS * max_fitness:
        print(f"\nüéØ Target fitness {TARGET_FITNESS*100}% reached at generation {generation}!")
        break

# Final results
total_time = time.time() - start_time

print(f"\n{'='*80}")
print("‚úÖ GENETIC ALGORITHM COMPLETED")
print("="*80)
print(f"Final Generation: {generation}/{MAX_GENERATION}")
print(f"Total Runtime: {total_time:.2f} seconds ({total_time/60:.2f} minutes)")
print(f"Average Time per Generation: {total_time/generation:.3f} seconds")
print(f"Best Fitness: {best_overall_fitness}/{max_fitness} ({best_overall_fitness/max_fitness:.2%})")
print(f"Fitness Improvement: {best_overall_fitness - best_fitness_history[0] if best_fitness_history else 0:.0f}")
print("="*80)

üé≤ Random seed (time-based): 1983673679

üöÄ STARTING GENETIC ALGORITHM
Population Size: 2
Max Generation: 2
Crossover Rate: 0.8
Mutation Rate: 0.2
Target Fitness: 95.0% (722/760)

‚è≥ Calculating initial fitness...


‚úÖ Initial best fitness: 620/760 (81.58%)

üîÑ EVOLUTION PROGRESS (DETAILED DEBUG MODE)

üìç Generation 1/2
   ‚îú‚îÄ Crossover: 1 pairs ‚Üí 2 offspring (6.29ms)
   ‚îú‚îÄ Mutation:  0 parents ‚Üí 0 offspring (0.00ms)
   ‚îú‚îÄ Total Offspring: 2
   ‚îú‚îÄ Replacement: 4 ‚Üí 2 individuals (201.74ms)
   ‚îú‚îÄ Fitness Stats:
   ‚îÇ  ‚îú‚îÄ Best:  635/760 (83.55%)
   ‚îÇ  ‚îú‚îÄ Avg:   627.50 (82.57%)
   ‚îÇ  ‚îú‚îÄ Min:   620/760 (81.58%)
   ‚îÇ  ‚îî‚îÄ Std:   7.50
   ‚îî‚îÄ Generation Time: 208.16ms
   üéâ NEW BEST FOUND! Improvement: +15 fitness points

üìç Generation 2/2
   ‚îú‚îÄ Crossover: 1 pairs ‚Üí 2 offspring (4.16ms)
   ‚îú‚îÄ Mutation:  0 parents ‚Üí 0 offspring (0.00ms)
   ‚îú‚îÄ Total Offspring: 2
   ‚îú‚îÄ Replacement: 4 ‚Üí 2 individuals (176.44ms)
   ‚îú‚îÄ Fitness Stats:
   ‚îÇ  ‚îú‚îÄ Best:  635/760 (83.55%)
   ‚îÇ  ‚îú‚îÄ Avg:   635.00 (83.55%)
   ‚îÇ  ‚îú‚îÄ Min:   635/760 (83.55%)
   ‚îÇ  ‚îî‚îÄ Std:   0.00
   ‚îî‚îÄ Generation Time: 180.68ms

‚úÖ GENETIC ALGOR

## 13. üìä SUMMARY & CONSTRAINT ANALYSIS

In [13]:
# Decode best solution into groups
best_groups = decode_kromosom(best_overall_solution, df_clean, expected_sizes)

# Calculate constraint satisfaction
constraint_stats = {
    'C1_satisfied': 0, 'C2_satisfied': 0, 
    'C3_satisfied': 0, 'C4_satisfied': 0, 
    'perfect_groups': 0
}

for i, group_df in enumerate(best_groups):
    c1 = evaluate_C1(group_df)
    c2 = evaluate_C2(group_df)
    c3 = evaluate_C3(group_df, PL, PP)
    c4 = evaluate_C4(group_df, expected_sizes[i])
    
    constraint_stats['C1_satisfied'] += c1
    constraint_stats['C2_satisfied'] += c2
    constraint_stats['C3_satisfied'] += c3
    constraint_stats['C4_satisfied'] += c4
    
    if c1 + c2 + c3 + c4 == 4:
        constraint_stats['perfect_groups'] += 1

# Display summary
print("\n" + "="*80)
print("üìä SUMMARY HASIL ALGORITMA GENETIKA")
print("="*80)
print(f"\nüéØ FITNESS INFORMATION:")
print(f"  Final Fitness       : {best_overall_fitness}/{max_fitness} ({best_overall_fitness/max_fitness:.2%})")
print(f"  Initial Fitness     : {best_fitness_history[0] if best_fitness_history else 0}")
print(f"  Fitness Improvement : {best_overall_fitness - (best_fitness_history[0] if best_fitness_history else 0):.0f}")
print(f"  Target Reached      : {'Yes ‚úÖ' if best_overall_fitness >= TARGET_FITNESS * max_fitness else 'No ‚ùå'}")

print(f"\n‚è±Ô∏è  RUNTIME INFORMATION:")
print(f"  Total Runtime       : {total_time:.2f} seconds ({total_time/60:.2f} minutes)")
print(f"  Final Generation    : {generation}/{MAX_GENERATION}")
print(f"  Avg Time per Gen    : {total_time/generation if generation > 0 else 0:.3f} seconds")

print(f"\n‚úÖ CONSTRAINT SATISFACTION:")
print(f"  C1 (HTQ Present)    : {constraint_stats['C1_satisfied']}/{K} ({constraint_stats['C1_satisfied']/K:.2%})")
print(f"  C2 (Major Diversity): {constraint_stats['C2_satisfied']}/{K} ({constraint_stats['C2_satisfied']/K:.2%})")
print(f"  C3 (Gender Balance) : {constraint_stats['C3_satisfied']}/{K} ({constraint_stats['C3_satisfied']/K:.2%})")
print(f"  C4 (Group Size)     : {constraint_stats['C4_satisfied']}/{K} ({constraint_stats['C4_satisfied']/K:.2%})")
print(f"  Perfect Groups      : {constraint_stats['perfect_groups']}/{K} ({constraint_stats['perfect_groups']/K:.2%})")

print(f"\n‚öôÔ∏è  PARAMETER USED:")
print(f"  Jumlah Kelompok     : {JUMLAH_KELOMPOK}")
print(f"  Population Size     : {POPSIZE}")
print(f"  Max Generation      : {MAX_GENERATION}")
print(f"  Crossover Rate      : {CROSSOVER_RATE}")
print(f"  Mutation Rate       : {MUTATION_RATE}")
print(f"  Target Fitness      : {TARGET_FITNESS * 100}%")
print(f"  Random Seed         : {RANDOM_SEED if RANDOM_SEED else 'Time-based'}")

print("="*80)


üìä SUMMARY HASIL ALGORITMA GENETIKA

üéØ FITNESS INFORMATION:
  Final Fitness       : 635/760 (83.55%)
  Initial Fitness     : 635
  Fitness Improvement : 0
  Target Reached      : No ‚ùå

‚è±Ô∏è  RUNTIME INFORMATION:
  Total Runtime       : 0.57 seconds (0.01 minutes)
  Final Generation    : 2/2
  Avg Time per Gen    : 0.283 seconds

‚úÖ CONSTRAINT SATISFACTION:
  C1 (HTQ Present)    : 140/190 (73.68%)
  C2 (Major Diversity): 184/190 (96.84%)
  C3 (Gender Balance) : 121/190 (63.68%)
  C4 (Group Size)     : 190/190 (100.00%)
  Perfect Groups      : 91/190 (47.89%)

‚öôÔ∏è  PARAMETER USED:
  Jumlah Kelompok     : 190
  Population Size     : 2
  Max Generation      : 2
  Crossover Rate      : 0.8
  Mutation Rate       : 0.2
  Target Fitness      : 95.0%
  Random Seed         : Time-based


## 14. üíæ EXPORT HASIL PEMBAGIAN KELOMPOK

In [14]:
# Create timestamp for filename
timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')

# Prepare data for export
export_data = []

for group_idx, group_df in enumerate(best_groups, start=1):
    # Add each student to export data
    for _, student in group_df.iterrows():
        export_data.append({
            'ID': student['ID'],
            'Nama': student.get('Nama', ''),
            'Jenis_Kelamin': student['Jenis Kelamin'],
            'Jurusan': student['Jurusan'],
            'HTQ': 'Ya' if student['HTQ'] == 1 else 'Tidak',
            'Kelompok': group_idx
        })

# Create DataFrame
df_result = pd.DataFrame(export_data)

# Export to CSV
output_filename = f'{OUTPUT_DIR}/hasil_kelompok_kkm_{timestamp}.csv'
df_result.to_csv(output_filename, index=False)

print("\n" + "="*80)
print("EXPORT HASIL PEMBAGIAN KELOMPOK")
print("="*80)
print(f"File created: hasil_kelompok_kkm_{timestamp}.csv")
print(f"Location: {OUTPUT_DIR}/")
print(f"Total rows: {len(df_result)}")
print(f"Total students: {len(df_result)}")
print(f"Total groups: {K}")
print(f"\nColumns: ID, Nama, Jenis_Kelamin, Jurusan, HTQ, Kelompok")

print("\nSample data (first 10 rows):")
print(df_result.head(10).to_string(index=False))

print("\nGroups distribution:")
group_counts = df_result['Kelompok'].value_counts().sort_index()
print(f"  Total groups: {len(group_counts)}")
print(f"  Min size: {group_counts.min()}")
print(f"  Max size: {group_counts.max()}")
print(f"  Avg size: {group_counts.mean():.1f}")

print("\nFirst 10 groups size:")
for idx in range(1, min(11, len(group_counts) + 1)):
    count = group_counts.get(idx, 0)
    print(f"  Kelompok {idx:3d}: {count} students")

print("="*80)

# Export Summary to CSV
summary_data = {
    'Metric': [
        'Final Fitness',
        'Max Possible Fitness',
        'Fitness Percentage',
        'Initial Fitness',
        'Fitness Improvement',
        'Target Fitness',
        'Target Reached',
        'Total Runtime (seconds)',
        'Total Runtime (minutes)',
        'Final Generation',
        'Max Generation',
        'Avg Time per Generation (seconds)',
        'C1 HTQ Present (count)',
        'C1 HTQ Present (percentage)',
        'C2 Major Diversity (count)',
        'C2 Major Diversity (percentage)',
        'C3 Gender Balance (count)',
        'C3 Gender Balance (percentage)',
        'C4 Group Size (count)',
        'C4 Group Size (percentage)',
        'Perfect Groups (count)',
        'Perfect Groups (percentage)',
        'Jumlah Kelompok',
        'Population Size',
        'Crossover Rate',
        'Mutation Rate',
        'Random Seed'
    ],
    'Value': [
        best_overall_fitness,
        max_fitness,
        f"{best_overall_fitness/max_fitness:.2%}",
        best_fitness_history[0] if best_fitness_history else 0,
        best_overall_fitness - (best_fitness_history[0] if best_fitness_history else 0),
        f"{TARGET_FITNESS * 100}%",
        'Yes' if best_overall_fitness >= TARGET_FITNESS * max_fitness else 'No',
        f"{total_time:.2f}",
        f"{total_time/60:.2f}",
        generation,
        MAX_GENERATION,
        f"{total_time/generation if generation > 0 else 0:.3f}",
        constraint_stats['C1_satisfied'],
        f"{constraint_stats['C1_satisfied']/K:.2%}",
        constraint_stats['C2_satisfied'],
        f"{constraint_stats['C2_satisfied']/K:.2%}",
        constraint_stats['C3_satisfied'],
        f"{constraint_stats['C3_satisfied']/K:.2%}",
        constraint_stats['C4_satisfied'],
        f"{constraint_stats['C4_satisfied']/K:.2%}",
        constraint_stats['perfect_groups'],
        f"{constraint_stats['perfect_groups']/K:.2%}",
        JUMLAH_KELOMPOK,
        POPSIZE,
        CROSSOVER_RATE,
        MUTATION_RATE,
        RANDOM_SEED if RANDOM_SEED else 'Time-based'
    ]
}

df_summary = pd.DataFrame(summary_data)

# Export summary to CSV
summary_filename = f'{OUTPUT_DIR}/summary_hasil_kkm_{timestamp}.csv'
df_summary.to_csv(summary_filename, index=False)

print("\nEXPORT SUMMARY")
print("="*80)
print(f"File created: summary_hasil_kkm_{timestamp}.csv")
print(f"Location: {OUTPUT_DIR}/")
print(f"Total metrics: {len(df_summary)}")
print("\nSummary preview:")
print(df_summary.to_string(index=False))

print("\n" + "="*80)
print("EXPORT COMPLETED SUCCESSFULLY")
print("="*80)
print(f"\nAlgoritma Genetika selesai!")
print(f"Summary: Fitness {best_overall_fitness/max_fitness:.2%}, Runtime {total_time/60:.2f} menit")
print(f"Hasil Kelompok: {output_filename}")
print(f"Summary Report: {summary_filename}")
print("="*80)


EXPORT HASIL PEMBAGIAN KELOMPOK
File created: hasil_kelompok_kkm_20251112_223145.csv
Location: ../pengujian/output/
Total rows: 2338
Total students: 2338
Total groups: 190

Columns: ID, Nama, Jenis_Kelamin, Jurusan, HTQ, Kelompok

Sample data (first 10 rows):
  ID Nama Jenis_Kelamin                         Jurusan   HTQ  Kelompok
  93                 LK               TEKNIK ARSITEKTUR Tidak         1
 174                 LK                       PSIKOLOGI Tidak         1
 249                 PR               HUKUM TATA NEGARA Tidak         1
 590                 LK PERPUSTAKAAN DAN ILMU INFORMASI Tidak         1
 861                 PR               HUKUM TATA NEGARA Tidak         1
 974                 PR              PERBANKAN SYARI`AH Tidak         1
1068                 PR              TEKNIK INFORMATIKA Tidak         1
1361                 LK        AL-AHWAL AL-SYAKHSHIYYAH Tidak         1
1659                 PR           HUKUM BISNIS SYARI'AH Tidak         1
1998               