# 3.1.2 ALGORITHM COMPARISON ANALYSIS

## Algorithm Performance Comparison

This notebook analyzes and compares the performance of three algorithms:
- **GBFS (Greedy Best-First Search)**: a purely greedy algorithm
- **BPSO (Binary Particle Swarm Optimization)**: a metaheuristic algorithm
- **DP (Dynamic Programming)**: the exact method, used as the optimal baseline

**Evaluation criteria:**
- Solution quality (% of the optimal value)
- Execution time
- Stability (standard deviation)
- Trade-off between quality and speed
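
As a rough illustration of how these criteria could be derived from repeated runs, here is a minimal sketch. The function name `summarize_runs` and the sample numbers are hypothetical; the dictionary keys mirror the CSV columns used later in this notebook, but whether the original results used the population or the sample standard deviation is an assumption.

```python
import statistics

def summarize_runs(values, times, optimal_value):
    """Aggregate repeated runs of one algorithm on one test case.

    values: total knapsack value obtained in each run
    times: wall-clock time of each run, in seconds
    optimal_value: reference optimum (for example, from DP)
    """
    value_mean = statistics.mean(values)
    return {
        "value_mean": value_mean,
        # Assumption: population standard deviation as the stability metric
        "value_std": statistics.pstdev(values),
        "time_mean": statistics.mean(times),
        "pct_optimal": 100.0 * value_mean / optimal_value,
    }

# Hypothetical numbers, for illustration only
summary = summarize_runs(values=[700, 720, 680],
                         times=[0.015, 0.016, 0.014],
                         optimal_value=1000)
print(summary)
```

A deterministic algorithm yields identical values in every run, so its `value_std` is 0.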

In [None]:
# Import required libraries
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from matplotlib.patches import Rectangle

# Set visualization style
plt.style.use('seaborn-v0_8-darkgrid')
sns.set_palette("husl")
%matplotlib inline

print("✅ Libraries imported successfully")

---
## PART 1: COMPARISON ON A SINGLE TEST CASE

### 1.1. Load Data - Size Medium 50

In [None]:
# Load single test case comparison data
df_single = pd.read_csv('../../results/chapter3/3_1_2_comparison_Size_Medium_50.csv')

print("Algorithm Comparison - Size Medium 50")
print("="*80)
print(df_single.to_string(index=False))
print("\n")

# Calculate additional metrics
df_single['relative_time'] = df_single['time_mean'] / df_single['time_mean'].min()
df_single['efficiency_score'] = df_single['pct_optimal'] / (df_single['relative_time'] * 100)

print("\n📊 Key Metrics:")
print("-" * 80)
for _, row in df_single.iterrows():
    print(f"{row['algorithm']:6s} | Value: {row['value_mean']:>10.2f} ({row['pct_optimal']:>6.2f}% optimal) | "
          f"Time: {row['time_mean']:.6f}s ({row['relative_time']:.1f}x)")

### 1.2. Visualization - Detailed Comparison

In [None]:
# Create comprehensive comparison visualization
fig = plt.figure(figsize=(20, 5))

# Plot 1: Solution Quality Comparison
ax1 = plt.subplot(141)
bars1 = ax1.bar(df_single['algorithm'], df_single['value_mean'], 
                color=['#2ecc71', '#3498db', '#e74c3c'], alpha=0.7, edgecolor='black', linewidth=2)
ax1.errorbar(df_single['algorithm'], df_single['value_mean'], 
             yerr=df_single['value_std'], fmt='none', ecolor='black', capsize=5, capthick=2)
ax1.set_ylabel('Total Value', fontsize=12, fontweight='bold')
ax1.set_title('Solution Quality\n(Higher is Better)', fontsize=13, fontweight='bold')
ax1.grid(True, alpha=0.3, axis='y')

# Annotate values
for bar, val in zip(bars1, df_single['value_mean']):
    height = bar.get_height()
    ax1.text(bar.get_x() + bar.get_width()/2., height,
             f'{val:.0f}', ha='center', va='bottom', fontsize=10, fontweight='bold')

# Plot 2: Execution Time Comparison
ax2 = plt.subplot(142)
bars2 = ax2.bar(df_single['algorithm'], df_single['time_mean'] * 1000,  # Convert to ms
                color=['#2ecc71', '#3498db', '#e74c3c'], alpha=0.7, edgecolor='black', linewidth=2)
ax2.set_ylabel('Execution Time (ms)', fontsize=12, fontweight='bold')
ax2.set_title('Computational Cost\n(Lower is Better)', fontsize=13, fontweight='bold')
ax2.set_yscale('log')
ax2.grid(True, alpha=0.3, axis='y')

# Annotate times
for bar, val in zip(bars2, df_single['time_mean']):
    height = bar.get_height()
    ax2.text(bar.get_x() + bar.get_width()/2., height,
             f'{val*1000:.3f}ms', ha='center', va='bottom', fontsize=9, fontweight='bold')

# Plot 3: Percentage of Optimal
ax3 = plt.subplot(143)
bars3 = ax3.barh(df_single['algorithm'], df_single['pct_optimal'],
                 color=['#2ecc71', '#3498db', '#e74c3c'], alpha=0.7, edgecolor='black', linewidth=2)
ax3.axvline(x=100, color='red', linestyle='--', linewidth=2, label='Optimal (100%)')
ax3.set_xlabel('% of Optimal Solution', fontsize=12, fontweight='bold')
ax3.set_title('Solution Optimality\n(Closer to 100% is Better)', fontsize=13, fontweight='bold')
ax3.set_xlim([0, 105])
ax3.grid(True, alpha=0.3, axis='x')
ax3.legend()

# Annotate percentages
for bar, val in zip(bars3, df_single['pct_optimal']):
    width = bar.get_width()
    ax3.text(width + 1, bar.get_y() + bar.get_height()/2.,
             f'{val:.2f}%', ha='left', va='center', fontsize=10, fontweight='bold')

# Plot 4: Quality-Speed Trade-off Scatter
ax4 = plt.subplot(144)
colors_map = {'GBFS': '#2ecc71', 'BPSO': '#3498db', 'DP': '#e74c3c'}
for _, row in df_single.iterrows():
    ax4.scatter(row['time_mean'] * 1000, row['pct_optimal'], 
                s=300, alpha=0.7, edgecolors='black', linewidth=2,
                color=colors_map[row['algorithm']], label=row['algorithm'])
    ax4.annotate(row['algorithm'], 
                 xy=(row['time_mean'] * 1000, row['pct_optimal']),
                 xytext=(10, 10), textcoords='offset points',
                 fontsize=10, fontweight='bold')

ax4.set_xlabel('Execution Time (ms, log scale)', fontsize=12, fontweight='bold')
ax4.set_ylabel('% of Optimal', fontsize=12, fontweight='bold')
ax4.set_title('Quality vs Speed Trade-off\n(Top-Left is Best)', fontsize=13, fontweight='bold')
ax4.set_xscale('log')
ax4.grid(True, alpha=0.3)
ax4.legend(loc='lower right')

# Add "sweet spot" region
ax4.axhline(y=95, color='green', linestyle='--', alpha=0.3, linewidth=1)
ax4.text(0.02, 95.5, '95% threshold', fontsize=8, color='green', alpha=0.7)

plt.tight_layout()
plt.show()

print("\n✅ Single test case comparison visualization complete")

### 1.3. Observations - Single Test Case (Size Medium 50)

**Observations from the data:**
- **GBFS**: reaches 100% of optimal with an extremely short runtime (0.01ms)
- **BPSO**: reaches ~70% of optimal and is ~1500x slower than GBFS
- **DP**: optimal (100%) but ~450x slower than GBFS

**Analysis:**
1. **GBFS dominates** on this test case:
   - Finds the optimal solution
   - Fastest (deterministic, no overhead)
   - No variance (std = 0)

2. **BPSO performs poorly**:
   - Low quality (70% of optimal)
   - Slowest
   - High variance (unstable)

3. **DP is the baseline**:
   - Guaranteed optimal
   - Slower than GBFS but acceptable for n=50

**Conclusion:**
- On this test case the value/weight ratio is a good heuristic → GBFS is the best choice
- BPSO should only be used when there are complex constraints that greedy cannot handle
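
To make the value/weight heuristic concrete, here is a minimal sketch of a ratio-based greedy for 0/1 knapsack. It is an assumption about how GBFS behaves, not the implementation benchmarked in this notebook; the function name `greedy_knapsack` and the three-item instance are hypothetical.

```python
def greedy_knapsack(values, weights, capacity):
    """Pick items in decreasing value/weight order while they still fit.

    Assumes every weight is strictly positive.
    Returns (total_value, sorted list of chosen item indices).
    """
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i],
                   reverse=True)
    chosen, total_value, total_weight = [], 0, 0
    for i in order:
        if total_weight + weights[i] <= capacity:
            chosen.append(i)
            total_value += values[i]
            total_weight += weights[i]
    return total_value, sorted(chosen)

# Hypothetical instance where the greedy choice is also optimal
print(greedy_knapsack([60, 100, 120], [10, 20, 30], 30))  # (160, [0, 1])
```

The sort dominates the cost, which is why the approach runs in O(n log n) and produces the same answer on every run.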

---
## PART 2: COMPARISON ACROSS 13 TEST CASES

### 2.1. Load All Test Cases Data

In [None]:
# Load all test cases comparison
df_all = pd.read_csv('../../results/chapter3/3_1_2_comparison_all_testcases.csv')

print("Algorithm Comparison - All 13 Test Cases")
print("="*100)
print(df_all[['test_case', 'n_items', 'gbfs_pct_optimal', 'bpso_pct_optimal', 
              'gbfs_time', 'bpso_time', 'dp_time']].to_string(index=False))
print("\n")

# Summary statistics
print("\n📊 Summary Statistics:")
print("-" * 100)
print(f"GBFS - % Optimal: Mean={df_all['gbfs_pct_optimal'].mean():.2f}%, "
      f"Min={df_all['gbfs_pct_optimal'].min():.2f}%, Max={df_all['gbfs_pct_optimal'].max():.2f}%")
print(f"BPSO - % Optimal: Mean={df_all['bpso_pct_optimal'].mean():.2f}%, "
      f"Min={df_all['bpso_pct_optimal'].min():.2f}%, Max={df_all['bpso_pct_optimal'].max():.2f}%")
print(f"\nGBFS Time: Mean={df_all['gbfs_time'].mean():.6f}s, Max={df_all['gbfs_time'].max():.6f}s")
print(f"BPSO Time: Mean={df_all['bpso_time'].mean():.6f}s, Max={df_all['bpso_time'].max():.6f}s")
print(f"DP Time:   Mean={df_all['dp_time'].mean():.6f}s, Max={df_all['dp_time'].max():.6f}s")
print(f"\nGBFS faster than BPSO: {(df_all['bpso_time'].mean() / df_all['gbfs_time'].mean()):.0f}x")
print(f"GBFS faster than DP:   {(df_all['dp_time'].mean() / df_all['gbfs_time'].mean()):.0f}x")

### 2.2. Visualization - Cross Test Cases Performance

In [None]:
# Figure 1: Quality Comparison Across All Test Cases
fig, axes = plt.subplots(2, 1, figsize=(16, 10))

# Plot 1: Percentage of Optimal for Each Test Case
x = np.arange(len(df_all))
width = 0.35

bars1 = axes[0].bar(x - width/2, df_all['gbfs_pct_optimal'], width, 
                    label='GBFS', color='#2ecc71', alpha=0.8, edgecolor='black')
bars2 = axes[0].bar(x + width/2, df_all['bpso_pct_optimal'], width,
                    label='BPSO', color='#3498db', alpha=0.8, edgecolor='black')

axes[0].axhline(y=100, color='red', linestyle='--', linewidth=2, label='Optimal (100%)', alpha=0.7)
axes[0].axhline(y=95, color='orange', linestyle=':', linewidth=1.5, label='95% threshold', alpha=0.5)
axes[0].set_ylabel('% of Optimal Solution', fontsize=13, fontweight='bold')
axes[0].set_title('Algorithm Quality Comparison Across 13 Test Cases', fontsize=14, fontweight='bold')
axes[0].set_xticks(x)
axes[0].set_xticklabels(df_all['test_case'], rotation=45, ha='right', fontsize=9)
axes[0].legend(loc='lower right', fontsize=11)
axes[0].grid(True, alpha=0.3, axis='y')
axes[0].set_ylim([50, 105])

# Add annotations for low performers
for i, (g, b) in enumerate(zip(df_all['gbfs_pct_optimal'], df_all['bpso_pct_optimal'])):
    if b < 65:
        axes[0].annotate(f'{b:.1f}%', xy=(i + width/2, b), xytext=(0, -15),
                        textcoords='offset points', ha='center', fontsize=8,
                        bbox=dict(boxstyle='round,pad=0.3', fc='yellow', alpha=0.7))

# Plot 2: Execution Time Comparison (Log Scale)
# Three bars per group: use a narrower width so adjacent groups do not overlap
width3 = 0.27
bars3 = axes[1].bar(x - width3, df_all['gbfs_time']*1000, width3,
                    label='GBFS', color='#2ecc71', alpha=0.8, edgecolor='black')
bars4 = axes[1].bar(x, df_all['bpso_time']*1000, width3,
                    label='BPSO', color='#3498db', alpha=0.8, edgecolor='black')
bars5 = axes[1].bar(x + width3, df_all['dp_time']*1000, width3,
                    label='DP', color='#e74c3c', alpha=0.8, edgecolor='black')

axes[1].set_ylabel('Execution Time (ms, log scale)', fontsize=13, fontweight='bold')
axes[1].set_title('Computational Cost Comparison Across 13 Test Cases', fontsize=14, fontweight='bold')
axes[1].set_xticks(x)
axes[1].set_xticklabels(df_all['test_case'], rotation=45, ha='right', fontsize=9)
axes[1].set_yscale('log')
axes[1].legend(loc='upper left', fontsize=11)
axes[1].grid(True, alpha=0.3, axis='y')

plt.tight_layout()
plt.show()

print("\n✅ Cross test cases comparison visualization complete")

In [None]:
# Figure 2: Quality vs Speed Trade-off Scatter
fig, ax = plt.subplots(figsize=(14, 8))

# Plot GBFS
gbfs_scatter = ax.scatter(df_all['gbfs_time']*1000, df_all['gbfs_pct_optimal'],
                          s=200, alpha=0.7, c='#2ecc71', edgecolors='black', linewidth=2,
                          marker='o', label='GBFS')

# Plot BPSO
bpso_scatter = ax.scatter(df_all['bpso_time']*1000, df_all['bpso_pct_optimal'],
                          s=200, alpha=0.7, c='#3498db', edgecolors='black', linewidth=2,
                          marker='s', label='BPSO')

# Plot DP
dp_scatter = ax.scatter(df_all['dp_time']*1000, [100]*len(df_all),
                        s=200, alpha=0.7, c='#e74c3c', edgecolors='black', linewidth=2,
                        marker='^', label='DP (Optimal)')

# Add connecting lines for same test case
for i in range(len(df_all)):
    ax.plot([df_all.iloc[i]['gbfs_time']*1000, df_all.iloc[i]['bpso_time']*1000],
            [df_all.iloc[i]['gbfs_pct_optimal'], df_all.iloc[i]['bpso_pct_optimal']],
            'k--', alpha=0.2, linewidth=1)

# Add reference regions
ax.axhline(y=95, color='orange', linestyle=':', linewidth=2, alpha=0.5, label='95% quality threshold')
ax.axvline(x=1, color='purple', linestyle=':', linewidth=2, alpha=0.5, label='1ms time threshold')

# Annotate interesting points
for i, row in df_all.iterrows():
    # Annotate test cases with particularly low BPSO performance
    if row['bpso_pct_optimal'] < 60:
        ax.annotate(row['test_case'].replace(' Medium', '').replace('Data ', '').replace('Region ', 'R'),
                   xy=(row['bpso_time']*1000, row['bpso_pct_optimal']),
                   xytext=(10, -10), textcoords='offset points', fontsize=8,
                   bbox=dict(boxstyle='round,pad=0.3', fc='yellow', alpha=0.7),
                   arrowprops=dict(arrowstyle='->', connectionstyle='arc3,rad=0.2', lw=1))

ax.set_xlabel('Execution Time (ms, log scale)', fontsize=13, fontweight='bold')
ax.set_ylabel('% of Optimal Solution', fontsize=13, fontweight='bold')
ax.set_title('Quality vs Speed Trade-off: All 13 Test Cases\n(Top-Left Corner = Best Performance)',
             fontsize=14, fontweight='bold')
ax.set_xscale('log')
ax.set_xlim([0.005, 50])
ax.set_ylim([50, 105])
ax.legend(loc='lower right', fontsize=11)
ax.grid(True, alpha=0.3)

# Add "ideal region" shading
from matplotlib.patches import Polygon
ideal_region = Polygon([[0.005, 95], [1, 95], [1, 105], [0.005, 105]],
                       facecolor='green', alpha=0.1, edgecolor='green', linewidth=2, linestyle='--')
ax.add_patch(ideal_region)
ax.text(0.02, 100, 'IDEAL\nREGION', fontsize=10, fontweight='bold', 
        color='green', alpha=0.7, ha='center', va='center')

plt.tight_layout()
plt.show()

print("\n✅ Quality vs Speed scatter plot complete")

### 2.3. Observations - Across 13 Test Cases

**General observations:**

1. **GBFS Performance:**
   - % Optimal: Mean = 99.1%, Range = [97.0% - 100%]
   - **Extremely stable**: every test case is >= 97%
   - Time: Mean = 0.013ms (fastest)
   - **Consistent excellence** on every type of test case

2. **BPSO Performance:**
   - % Optimal: Mean = 65.8%, Range = [55.1% - 93.7%]
   - **Very unstable**: high variance
   - Time: Mean = 15.5ms (~1200x slower than GBFS)
   - Performance drops sharply on some test cases

3. **DP Performance:**
   - % Optimal: Always 100% (guaranteed)
   - Time: Mean = 9.6ms (~740x slower than GBFS)
   - Acceptable for n=70

**Detailed analysis:**

**Test cases where GBFS is perfect (100%):**
- Size Small 30, Size Medium 50 → greedy is optimal on the small test cases

**Test cases where GBFS is very good (>99%):**
- Most of the remaining test cases → the value/weight heuristic is very effective

**Test cases where BPSO is weak (<60%):**
- Data High/Low Correlation, Region 2/3 Regions
- Possibly due to premature convergence or a complex search space

**Test case where BPSO is best (>93%):**
- Size Small 30 → small search space, easy to explore

**Key conclusions:**

1. **GBFS is the best choice** for the basic knapsack problem:
   - Near-optimal quality (>97%) on every test case
   - Extremely fast and stable
   - No parameter tuning needed

2. **BPSO is not suitable** for the basic knapsack problem:
   - Low and unstable quality
   - Much slower
   - Requires careful tuning

3. **DP is the baseline** for evaluation:
   - Guaranteed optimal
   - Slower than GBFS but acceptable
   - Not scalable to large n

4. **Why does GBFS dominate?**
   - The value/weight ratio is an EXTREMELY GOOD heuristic for knapsack
   - The test cases have a structure that favors greedy
   - Deterministic → no randomness overhead

---
## PART 3: RANKINGS & RECOMMENDATIONS

### 3.1. Algorithm Rankings

In [None]:
# Calculate rankings for each test case from the data (1 = best)
df_rankings = df_all.copy()
algos = ['gbfs', 'bpso', 'dp']

# Quality rank: higher % optimal is better; DP is optimal by construction (100%)
quality = pd.DataFrame({
    'gbfs': df_all['gbfs_pct_optimal'],
    'bpso': df_all['bpso_pct_optimal'],
    'dp': 100.0,
})
quality_rank = quality.rank(axis=1, ascending=False, method='min')

# Speed rank: lower execution time is better
speed = df_all[['gbfs_time', 'bpso_time', 'dp_time']].copy()
speed.columns = algos
speed_rank = speed.rank(axis=1, ascending=True, method='min')

for a in algos:
    df_rankings[f'{a}_quality_rank'] = quality_rank[a]
    df_rankings[f'{a}_speed_rank'] = speed_rank[a]
    # Overall score (lower is better): quality_rank + speed_rank
    df_rankings[f'{a}_score'] = quality_rank[a] + speed_rank[a]

# Print the summary table (scores are averaged over all test cases)
print("\n🏆 Algorithm Rankings Summary:")
print("="*80)
print(f"\n{'Metric':<30} {'GBFS':>12} {'BPSO':>12} {'DP':>12}")
print("-"*80)
print(f"{'Quality (% optimal)':.<30} {df_all['gbfs_pct_optimal'].mean():>11.2f}% {df_all['bpso_pct_optimal'].mean():>11.2f}% {'100.00%':>12}")
print(f"{'Speed (ms)':.<30} {df_all['gbfs_time'].mean()*1000:>11.4f} {df_all['bpso_time'].mean()*1000:>11.4f} {df_all['dp_time'].mean()*1000:>11.4f}")
print(f"{'Stability (consistent?)':.<30} {'✅ Yes':>12} {'❌ No':>12} {'✅ Yes':>12}")
print(f"{'Scalability':.<30} {'✅ O(n log n)':>12} {'⚠️ O(n·iter)':>12} {'❌ O(n·W)':>12}")
print("-"*80)
print(f"{'Overall Score (lower = better)':.<30} {df_rankings['gbfs_score'].mean():>12.1f} {df_rankings['bpso_score'].mean():>12.1f} {df_rankings['dp_score'].mean():>12.1f}")
print("\n🥇 Winner: GBFS")
print("🥈 Runner-up: DP")
print("🥉 Third: BPSO")

### 3.2. Recommendations - When to Use Which Algorithm?

#### 🎯 GBFS (Greedy Best-First Search)

**✅ Use when:**
- The problem is a basic knapsack (0/1 knapsack, fractional knapsack)
- High speed and stability are required
- The dataset has clearly separated value/weight ratios
- Deterministic (reproducible) results are required
- There are no complex constraints (regional, categorical, conflicts)

**⚠️ Reconsider when:**
- Many items have nearly equal ratios
- An optimal solution must be guaranteed (use DP)

**Advantages:**
- ⚡ Extremely fast (~0.01ms)
- 🎯 High quality (>97% of optimal)
- 📊 Fully stable (no variance)
- 🔧 No parameter tuning needed

**Disadvantages:**
- Not always optimal (but >97% on these test cases)
- Depends on the quality of the heuristic
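
The dependence on the heuristic can be seen on a small textbook-style instance where the ratio order misleads the greedy choice. This is only a sketch: the helper names `greedy_value` and `brute_force_value` and the three items are hypothetical and are not taken from the 13 test cases.

```python
from itertools import combinations

def greedy_value(values, weights, capacity):
    """Value reached by taking items in decreasing value/weight order."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i],
                   reverse=True)
    total_value = total_weight = 0
    for i in order:
        if total_weight + weights[i] <= capacity:
            total_value += values[i]
            total_weight += weights[i]
    return total_value

def brute_force_value(values, weights, capacity):
    """Exact optimum by enumerating every subset (only viable for tiny n)."""
    n = len(values)
    best = 0
    for r in range(n + 1):
        for subset in combinations(range(n), r):
            if sum(weights[i] for i in subset) <= capacity:
                best = max(best, sum(values[i] for i in subset))
    return best

values, weights, capacity = [60, 100, 120], [10, 20, 30], 50
g = greedy_value(values, weights, capacity)       # takes items 0 and 1
b = brute_force_value(values, weights, capacity)  # items 1 and 2 are better
print(f"greedy={g}, optimal={b}, pct_optimal={100 * g / b:.1f}%")
```

Here the highest-ratio item uses capacity that the two heavier items would have filled more profitably, so the greedy result lands well below the optimum.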

---

#### 🎯 BPSO (Binary Particle Swarm Optimization)

**✅ Use when:**
- The problem has complex constraints (multi-constraint knapsack)
- No good heuristic is available
- The solution space needs to be explored broadly
- The trade-off is acceptable: slower but more flexible

**❌ AVOID when:**
- The problem is a basic knapsack (GBFS is better)
- Fast results are required
- Stable results are required
- There is no time or budget for parameter tuning

**Advantages:**
- 🔍 No heuristic needed
- 🌐 Broad exploration
- 🔧 Flexible with constraints

**Disadvantages:**
- 🐌 Slow (~1200x slower than GBFS)
- 📉 Low and unstable quality (55-94%)
- ⚙️ Requires careful tuning
- 🎲 Stochastic (results differ from run to run)
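
For readers unfamiliar with the method, the following is a minimal sketch of a binary PSO with a sigmoid transfer function. It is not the implementation benchmarked here: the function name `bpso_knapsack`, the parameter defaults (`n_particles`, `n_iter`, `w`, `c1`, `c2`), the velocity clamp, and the choice to score infeasible solutions as 0 are all assumptions made for illustration.

```python
import math
import random

def bpso_knapsack(values, weights, capacity, n_particles=20, n_iter=50,
                  w=0.7, c1=1.5, c2=1.5, seed=0):
    """Binary PSO sketch for 0/1 knapsack. Returns (best_value, best_bits)."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(values)

    def fitness(bits):
        total_w = sum(wt for wt, b in zip(weights, bits) if b)
        if total_w > capacity:
            return 0  # assumption: infeasible solutions score 0
        return sum(v for v, b in zip(values, bits) if b)

    pos = [[rng.randint(0, 1) for _ in range(n)] for _ in range(n_particles)]
    vel = [[0.0] * n for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_fit = [fitness(p) for p in pos]
    g = max(range(n_particles), key=lambda k: pbest_fit[k])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(n):
                r1, r2 = rng.random(), rng.random()
                vel[k][d] = (w * vel[k][d]
                             + c1 * r1 * (pbest[k][d] - pos[k][d])
                             + c2 * r2 * (gbest[d] - pos[k][d]))
                vel[k][d] = max(-6.0, min(6.0, vel[k][d]))  # clamp velocity
                # Sigmoid maps the velocity to the probability of bit = 1
                prob = 1.0 / (1.0 + math.exp(-vel[k][d]))
                pos[k][d] = 1 if rng.random() < prob else 0
            fit = fitness(pos[k])
            if fit > pbest_fit[k]:
                pbest[k], pbest_fit[k] = pos[k][:], fit
                if fit > gbest_fit:
                    gbest, gbest_fit = pos[k][:], fit
    return gbest_fit, gbest

values, weights, capacity = [60, 100, 120], [10, 20, 30], 50
best_value, best_bits = bpso_knapsack(values, weights, capacity)
print(best_value, best_bits)
```

Each iteration touches every bit of every particle, and the result depends on the random seed, which is consistent with the slower and less stable behavior listed above.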

---

#### 🎯 DP (Dynamic Programming)

**✅ Use when:**
- A **guaranteed optimal solution** is required
- The dataset is small or medium (n < 1000, W < 10^6)
- Enough memory is available
- Being slower than greedy is acceptable

**❌ AVOID when:**
- The dataset is large (n > 10^4 or W > 10^7)
- Memory is limited
- Real-time speed is required

**Advantages:**
- ✅ Always optimal
- 📊 Deterministic
- 🎯 Exact results

**Disadvantages:**
- ⏱️ O(n·W): not scalable
- 💾 Memory intensive
- 🐢 Slower than GBFS (~740x)
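
The O(n·W) cost comes directly from the table that DP fills. The sketch below shows the standard value-only recurrence for 0/1 knapsack with integer weights; the function name `dp_knapsack` is hypothetical and the code is not the notebook's own DP implementation.

```python
def dp_knapsack(values, weights, capacity):
    """Optimal 0/1 knapsack value via dynamic programming.

    Assumes non-negative integer weights and an integer capacity.
    best[c] holds the best value achievable with total weight <= c.
    """
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        # Iterate capacity downwards so each item is used at most once
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

print(dp_knapsack([60, 100, 120], [10, 20, 30], 50))  # 220
```

The two nested loops run n times W steps, so doubling the capacity doubles the work even if the number of items is unchanged. This variant keeps a single row because it returns only the value; recovering the chosen items typically requires extra bookkeeping, which is where the memory cost grows.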

---

### 🏆 Best Practice Recommendation

**Hybrid strategy:**

1. **Step 1**: Run GBFS first (fast)
   - If capacity utilization > 95% → accept the result
   - If < 95% → continue to step 2

2. **Step 2**: Run DP if n is small (n < 100)
   - Guarantees the optimum
   - Compare with GBFS to assess its quality

3. **Step 3**: Use BPSO only if:
   - There are complex constraints that GBFS cannot handle
   - Parameters have been tuned carefully
   - It is run several times and the best result is kept
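
The first two steps above can be sketched as a small dispatcher. The function names, the `min_utilization` and `dp_max_n` parameters, and the fallback to the greedy result for large n are assumptions for illustration; step 3 (BPSO) is deliberately left out because it only applies to constrained variants.

```python
def greedy(values, weights, capacity):
    """Ratio-based greedy. Returns (total_value, used_weight)."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / weights[i],
                   reverse=True)
    total_value = total_weight = 0
    for i in order:
        if total_weight + weights[i] <= capacity:
            total_value += values[i]
            total_weight += weights[i]
    return total_value, total_weight

def dp_value(values, weights, capacity):
    """Exact 0/1 knapsack value (integer weights assumed)."""
    best = [0] * (capacity + 1)
    for v, w in zip(values, weights):
        for c in range(capacity, w - 1, -1):
            best[c] = max(best[c], best[c - w] + v)
    return best[capacity]

def hybrid_solve(values, weights, capacity, min_utilization=0.95, dp_max_n=100):
    """Return (method_used, value) following steps 1 and 2."""
    value, used = greedy(values, weights, capacity)
    # Step 1: accept the greedy result if the knapsack is nearly full
    if used / capacity > min_utilization:
        return "GBFS", value
    # Step 2: fall back to the exact method when n is small
    if len(values) < dp_max_n:
        return "DP", dp_value(values, weights, capacity)
    # Assumption: keep the greedy result when DP would be too expensive
    return "GBFS (fallback)", value

print(hybrid_solve([60, 100, 120], [10, 20, 30], 50))  # ('DP', 220)
print(hybrid_solve([60, 100, 120], [10, 20, 30], 30))  # ('GBFS', 160)
```

Capacity utilization is only a proxy for quality: a full knapsack is not necessarily an optimal one, so the threshold trades some accuracy for speed.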

**Final conclusion:**
- **GBFS is the default choice** for the knapsack problem
- **DP is used to validate** results and as the baseline
- **BPSO is used only** for complex problems with special constraints

---
## 📝 CONCLUSION

### Summary of the algorithm comparison:

1. **GBFS clearly dominates** on the basic knapsack problem:
   - Quality: >97% of optimal on every test case
   - Speed: fastest (~1200x faster than BPSO)
   - Stability: perfect (no variance)

2. **BPSO is not suitable** for the basic knapsack problem:
   - Low and unstable quality (55-94%)
   - Slowest
   - Requires complex tuning

3. **DP is the gold standard** for evaluation:
   - Always optimal
   - But not scalable

4. **The value/weight ratio** is an EXTREMELY GOOD heuristic:
   - GBFS with this heuristic is nearly optimal
   - This explains why GBFS dominates

5. **Final recommendation:**
   - **Use GBFS** in most cases
   - **Use DP** if a guaranteed optimum is needed and n is small
   - **Avoid BPSO** unless there are special complex constraints

### Next Steps:
- Analyze the influence of data characteristics (3. Data.ipynb)
- Jointly optimize parameters and data (4. Optimization.ipynb)
- Investigate algorithm improvements (5. EnhancedAlgorithm.ipynb)