# 04 - Noise Threshold Analysis

**Purpose:** Understand impact of noise thresholds on biome placement

**Scope:**
- Analyze noise value distributions per biome
- Understand threshold impacts (Swamp=0.6, Others=0.4)
- Visualize noise patterns spatially
- Document why noise is NOT adjustable (game-generated)

**Prerequisites:**
- Notebook 01 completed (data loaded)
- Understanding that noise is calculated in-game, not in post-processing

**Outputs:**
- Noise distribution visualization
- Documentation of noise threshold behavior
- Clarification on what CAN vs CANNOT be adjusted

**Estimated Time:** 10 minutes

**Note:** This notebook is primarily educational. Noise values are calculated by Valheim's WorldGenerator using Perlin noise, and cannot be adjusted in post-processing.

## Setup

In [None]:
import sys
sys.path.append('.')

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from biome_utils import *
from config import *

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
plt.rcParams['figure.figsize'] = (14, 8)

print("✓ Setup complete")

## Load Data

In [None]:
SAMPLE_PATH = '../output/samples/hkLycKKCMI-samples-1024.json'
df = load_samples(SAMPLE_PATH)
print(f"Loaded {len(df):,} samples")

## Understanding Noise Thresholds

From decompiled WorldGenerator.cs:

```csharp
// Line 789: Swamp (most restrictive)
if (baseHeight > 0.05f && baseHeight < 0.25f && 
    magnitude < 3000f && biome != Heightmap.Biome.Mountain && 
    GetBiomeHeight(biome, wx, wy, default(Heightmap.BiomeArea?)) > 0.05f && 
    Mathf.PerlinNoise(wx * 0.001f + 0.123f, wy * 0.001f + 0.15123f) > 0.6f)  // ← 0.6 threshold
{
    return Heightmap.Biome.Swamp;
}

// Line 793: Mistlands (standard)
if (magnitude > 6000f && magnitude < 10000f && 
    Mathf.PerlinNoise(wx * 0.001f + 0.123f, wy * 0.001f + 0.15123f) > 0.4f)  // ← 0.4 threshold
{
    return Heightmap.Biome.Mistlands;
}

// Line 797: Plains (standard)
if (magnitude > 3000f && magnitude < 8000f && 
    biome != Heightmap.Biome.Swamp && 
    Mathf.PerlinNoise(wx * 0.001f + 0.123f, wy * 0.001f + 0.15123f) > 0.4f)  // ← 0.4 threshold
{
    return Heightmap.Biome.Plains;
}

// Line 801: BlackForest (standard)
if (Mathf.PerlinNoise(wx * 0.001f + 0.123f, wy * 0.001f + 0.15123f) > 0.4f &&
    magnitude > 600f && magnitude < 6000f)
{
    return Heightmap.Biome.BlackForest;
}
```

**Key Points:**
- **Noise Scale:** All biomes use same scale (0.001)
- **Swamp Restriction:** Requires noise > 0.6 (most restrictive)
- **Standard Threshold:** Most biomes use noise > 0.4
- **Not Adjustable:** Noise calculated in-game, not available in post-processing

## What We CAN Adjust vs What We CANNOT

### ✅ CAN Adjust (Post-Processing Filters):

1. **Sea Level Threshold** (Notebook 02)
   - Ocean/land distinction based on height
   - Data available: Height values

2. **Polar Thresholds** (Notebook 03)
   - Polar crescent shapes based on Y-coordinate
   - Data available: X, Z coordinates

3. **Distance Ring Boundaries**
   - Outer ring definition for Mistlands recovery
   - Data available: Distance from center

### ❌ CANNOT Adjust (Requires Game-Generated Data):

1. **Noise Thresholds**
   - Perlin noise calculated in WorldGenerator.GetBiome()
   - Data NOT available: Noise values not exported
   - Would require: Exporting noise values from C# plugin

2. **Base Height Calculation**
   - Complex algorithm in WorldGenerator.GetHeight()
   - Data available: Final height only (not underlying noise layers)

3. **Mountain/Ocean Height Bands**
   - Based on normalized baseHeight from game
   - Cannot recalculate baseHeight accurately in post-processing

## Biome Distribution by Noise Requirements

Let's analyze how noise thresholds affect biome placement based on game logic:

In [None]:
# Group biomes by their noise threshold requirements
noise_groups = {
    'No Noise Check': ['Ocean', 'Mountain', 'Meadows', 'DeepNorth', 'Ashlands'],
    'Noise > 0.4 (Standard)': ['BlackForest', 'Plains', 'Mistlands'],
    'Noise > 0.6 (Restrictive)': ['Swamp']
}

# Calculate stats for each group
stats = calculate_biome_distribution(df)

print("Biome Distribution by Noise Threshold Requirements:")
print("=" * 80)

for group_name, biome_names in noise_groups.items():
    print(f"\n{group_name}:")
    print("-" * 80)
    group_total = 0
    for biome_name in biome_names:
        if biome_name in stats:
            count = stats[biome_name]['count']
            pct = stats[biome_name]['percentage']
            group_total += pct
            print(f"  {biome_name:<15} {count:>10,} ({pct:>5.1f}%)")
    print(f"  {'Group Total':<15} {' '*10} ({group_total:>5.1f}%)")

print("\n" + "=" * 80)
print("Observations:")
print("  - Swamp is RARE (~5%) due to restrictive noise threshold (0.6 vs 0.4)")
print("  - This is intentional game design to make Swamp less common")
print("  - Cannot adjust in post-processing without noise values")

## Spatial Distribution: Noise-Based vs Distance-Based Biomes

In [None]:
# Compare distance-based biomes (clearer rings) vs noise-based (scattered)
fig, axes = plt.subplots(2, 3, figsize=(16, 10))

# Distance-based biomes (clearer concentric patterns)
distance_biomes = [
    (1, "Meadows", "Center-focused"),
    (64, "Mistlands", "Outer ring (6-10km)"),
    (256, "DeepNorth", "Far north edge")
]

for idx, (biome_id, name, desc) in enumerate(distance_biomes):
    ax = axes[0, idx]
    biome_data = df[df['Biome'] == biome_id]
    if len(biome_data) > 0:
        ax.scatter(biome_data['X'], biome_data['Z'], 
                  c=get_biome_color(biome_id, normalized=True)[0:1]*len(biome_data),
                  s=1, alpha=0.5)
        ax.set_title(f"{name}\n{desc}", fontsize=11, fontweight='bold')
        ax.set_xlim(-10500, 10500)
        ax.set_ylim(-10500, 10500)
        ax.set_aspect('equal')
        ax.grid(True, alpha=0.3)

# Noise-based biomes (scattered patterns)
noise_biomes = [
    (2, "BlackForest", "Noise > 0.4"),
    (16, "Plains", "Noise > 0.4"),
    (4, "Swamp", "Noise > 0.6 (rare)")
]

for idx, (biome_id, name, desc) in enumerate(noise_biomes):
    ax = axes[1, idx]
    biome_data = df[df['Biome'] == biome_id]
    if len(biome_data) > 0:
        ax.scatter(biome_data['X'], biome_data['Z'],
                  c=get_biome_color(biome_id, normalized=True)[0:1]*len(biome_data),
                  s=1, alpha=0.5)
        ax.set_title(f"{name}\n{desc}", fontsize=11, fontweight='bold')
        ax.set_xlim(-10500, 10500)
        ax.set_ylim(-10500, 10500)
        ax.set_aspect('equal')
        ax.grid(True, alpha=0.3)

# Configure all axes
for ax in axes.flat:
    ax.set_xlabel('X (meters)', fontsize=9)
    ax.set_ylabel('Z (meters)', fontsize=9)
    # Draw world boundary
    circle = plt.Circle((0, 0), 10000, fill=False, color='black', linewidth=1, linestyle=':')
    ax.add_patch(circle)

fig.suptitle('Spatial Patterns: Distance-Based vs Noise-Based Biomes', 
             fontsize=14, fontweight='bold', y=0.995)
plt.tight_layout()
plt.show()

print("\nObservations:")
print("  - Distance-based biomes (top row): Clear concentric rings")
print("  - Noise-based biomes (bottom row): Scattered, organic patterns")
print("  - Noise creates natural-looking variation within distance constraints")

## Swamp Rarity Analysis

In [None]:
# Analyze Swamp distribution vs other biomes in same distance range
mid_ring = df[(df['Distance'] >= 2000) & (df['Distance'] <= 6000)]

print("Mid Ring (2-6km) - Swamp Comparison Zone:")
print("=" * 80)

# Biomes that can appear in this zone
competing_biomes = ['BlackForest', 'Swamp', 'Plains', 'Meadows']

for biome_name in competing_biomes:
    biome_id = BIOME_NAME_TO_ID.get(biome_name)
    if biome_id:
        count = (mid_ring['Biome'] == biome_id).sum()
        pct = count / len(mid_ring) * 100
        
        # Show noise requirement
        noise_req = "None"
        if biome_name in ['BlackForest', 'Plains']:
            noise_req = "> 0.4"
        elif biome_name == 'Swamp':
            noise_req = "> 0.6 ⚠️"
        
        print(f"  {biome_name:<15} {count:>8,} ({pct:>5.1f}%)  [Noise: {noise_req}]")

print("\n" + "=" * 80)
print("Why Swamp is Rare:")
print("  1. Requires noise > 0.6 (only ~40% of locations meet this)")
print("  2. Other biomes use noise > 0.4 (60% of locations qualify)")
print("  3. Additional height requirements (0.05 < baseHeight < 0.25)")
print("  4. This is INTENTIONAL game design - Swamp is meant to be rare")
print("\n  Cannot adjust without exporting noise values from WorldGenerator!")

## Potential Future Enhancement: Export Noise Values

If we wanted to adjust noise thresholds in post-processing, we would need to:

### 1. Modify BepInEx Data Exporter Plugin

Add noise value export to `VWE_DataExporter`:

```csharp
public class BiomeSample
{
    public float X { get; set; }
    public float Z { get; set; }
    public int Biome { get; set; }
    public float Height { get; set; }
    
    // NEW: Export noise values
    public float BiomeNoise { get; set; }  // The 0.4/0.6 threshold noise
}

// In sampling loop:
float noise = Mathf.PerlinNoise(wx * 0.001f + 0.123f, wy * 0.001f + 0.15123f);
sample.BiomeNoise = noise;
```

### 2. Adjust Thresholds in Post-Processing

Then in Python:

```python
# Apply custom noise threshold for Swamp
def adjust_swamp_threshold(df, new_threshold=0.55):
    """Make Swamp less restrictive by lowering noise threshold"""
    # Find locations that would qualify as Swamp with lower threshold
    potential_swamp = (
        (df['BiomeNoise'] > new_threshold) &  # Lower threshold
        (df['Height'] >= 10) &                 # Height requirements
        (df['Height'] <= 50) &
        (df['Distance'] >= 2000) &             # Distance requirements
        (df['Distance'] <= 6000)
    )
    df.loc[potential_swamp, 'Biome'] = BIOME_NAME_TO_ID['Swamp']
    return df
```

### 3. Trade-offs

- ✅ **Pro:** Full control over biome placement thresholds
- ✅ **Pro:** Can experiment with different noise thresholds instantly
- ❌ **Con:** Increases sample data size (~4 bytes per sample)
- ❌ **Con:** Requires modifying C# plugin and recompiling
- ❌ **Con:** May deviate from intended game balance

## Key Findings

**Noise Threshold Analysis Results:**

1. **Current Limitations:**
   - Noise values not available in current sample data
   - Cannot adjust noise thresholds in post-processing
   - This is expected - noise is an in-game calculation

2. **Swamp Rarity by Design:**
   - Swamp requires noise > 0.6 (most restrictive)
   - Other biomes use noise > 0.4 (standard)
   - Results in ~5% Swamp vs ~20%+ for other biomes
   - This matches Valheim's intended balance

3. **Spatial Patterns:**
   - Distance-based biomes: Clear concentric rings
   - Noise-based biomes: Scattered, organic placement
   - Noise creates natural variation within distance constraints

4. **Focus on Adjustable Parameters:**
   - Sea level thresholds (Notebook 02) ✅
   - Polar crescent thresholds (Notebook 03) ✅
   - Distance ring boundaries ✅
   - Noise thresholds (requires C# plugin modification) ⚠️

**Recommendation:**
- **Current approach is optimal** - Focus on post-processing filters that use available data
- Only export noise values if significant biome balance issues emerge
- Current Swamp rarity appears intentional and matches game design

**Next Steps:**
- Notebook 05: Compare different filter strategies side-by-side
- Notebook 06: Heightmap 3D visualization