Skip to content

refactor: extract ground generation into dedicated module#848

Merged
louis-e merged 8 commits intomainfrom
refactor/extract-ground-generation
Mar 26, 2026
Merged

refactor: extract ground generation into dedicated module#848
louis-e merged 8 commits intomainfrom
refactor/extract-ground-generation

Conversation

@louis-e
Copy link
Owner

@louis-e louis-e commented Mar 26, 2026

Move the ~670-line ground layer generation loop (surface blocks, vegetation, shorelines, underground fill) from data_processing.rs into a new ground_generation.rs module. This keeps data_processing focused on OSM element orchestration while ground_generation owns the terrain pass.
Remove flushing since it won't prevent OOM crashes due to still existing memory spikes and massive generation time increase.

Move the ~670-line ground layer generation loop (surface blocks, vegetation,
shorelines, underground fill, region flushing) from data_processing.rs into
a new ground_generation.rs module. This keeps data_processing focused on
OSM element orchestration while ground_generation owns the terrain pass.
Copilot AI review requested due to automatic review settings March 26, 2026 18:36
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Refactors world generation by extracting the full “ground layer” terrain pass (surface/sub-surface selection, shoreline blending, vegetation placement, underground fill, and region flushing) out of data_processing.rs into a dedicated ground_generation module, leaving data_processing focused on OSM element orchestration.

Changes:

  • Added a new ground_generation module that owns the chunk-ordered ground/terrain generation loop and incremental region flushing.
  • Updated data_processing::generate_world_with_options to call ground_generation::generate_ground_layer(...) after dropping caches.
  • Registered the new module in main.rs.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

File Description
src/main.rs Registers the new ground_generation module in the crate.
src/ground_generation.rs New module containing the extracted ground generation pass and region flushing logic.
src/data_processing.rs Removes inlined ground-generation logic and delegates to ground_generation::generate_ground_layer.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

louis-e added 2 commits March 26, 2026 19:54
…rc<Ground>

Move the MIN_Y constant from data_processing into world_editor::common
(which already had a private duplicate) and re-export it. This removes
the reverse dependency from ground_generation back into data_processing.

Also narrow generate_ground_layer's ground parameter from &Arc<Ground>
to &Ground since it only needs shared read access.
…data

When flush_region() saves a region to disk and removes it from memory,
subsequent block writes (e.g. tree canopy leaves crossing a region
boundary) would silently re-create the region via entry().or_default().
The re-created region contained only the stray blocks, and save() would
then overwrite the correct .mca file with this near-empty one — causing
entire regions to appear missing in the generated world.

Fix: track flushed region coordinates in a HashSet and skip all write
operations (set_block, set_block_with_properties, set_block_if_absent,
fill_column) targeting already-flushed regions. The skipped writes are
cosmetic (tree leaves at region edges) so there is no visual impact.

Closes the "partially empty world" bug introduced by the incremental
region flushing in PR #793.
@louis-e
Copy link
Owner Author

louis-e commented Mar 26, 2026

retrigger-benchmark

@github-actions
Copy link

github-actions bot commented Mar 26, 2026

⏱️ Benchmark run finished in 0m 44s
🧠 Peak memory usage: 1063 MB

📈 Compared against baseline: 30s
🧮 Delta: 14s
🔢 Commit: d8f8d6e

⚠️ This PR worsens generation time.

📅 Last benchmark: 2026-03-26 20:19:05 UTC

You can retrigger the benchmark by commenting retrigger-benchmark.

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

…nopies

Tree canopies extend up to 3 blocks (ROUND3_PATTERN) into neighboring
regions. The previous flush logic serialized a region as soon as its
last chunk_x completed, before the next chunk could place trees whose
canopies spill back. Those spilled writes were silently dropped by the
flushed_regions guard, causing truncated trees at every region boundary.

Fix: instead of flushing at the last chunk of a region, flush when the
first chunk of the NEXT region completes. This one-chunk (16-block)
delay exceeds the maximum 3-block tree canopy radius, so all cross-
region writes land before the flush.

Also add a debug-only warning in the flushed_regions guard so unexpected
writes to flushed regions are surfaced during development.
@louis-e
Copy link
Owner Author

louis-e commented Mar 26, 2026

retrigger-benchmark

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 3 comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

The is_flushed() HashSet check on every set_block / set_block_if_absent /
fill_column / set_block_with_properties call added measurable overhead
on the hottest code path (hundreds of millions of invocations).

Instead of guarding individual writes, let the rare cross-region writes
(tree canopies) re-create stale region entries in memory — they hold
only a few leaf blocks and cost negligible RAM. Before save(), call
purge_stale_regions() once to remove these entries so the correct
.mca files written during incremental flushing are never overwritten.

This eliminates the per-write overhead entirely while preserving
correctness.
@louis-e louis-e requested a review from Copilot March 26, 2026 20:05
@louis-e
Copy link
Owner Author

louis-e commented Mar 26, 2026

retrigger-benchmark

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@louis-e
Copy link
Owner Author

louis-e commented Mar 26, 2026

retrigger-benchmark

1 similar comment
@louis-e
Copy link
Owner Author

louis-e commented Mar 26, 2026

retrigger-benchmark

The incremental flush_region approach serialized regions sequentially
during ground generation instead of in parallel at save time. Benchmarks
show this causes a consistent ~47% time regression (30s → 44s) while
actually INCREASING peak memory (~935 MB → ~1060 MB) due to
serialization buffer overhead.

The memory reduction only helps for extremely large areas where in-memory
region data alone exceeds available RAM, but for those areas the time
penalty is proportionally worse (linear scaling).

This reverts the flush_region calls and all associated bookkeeping
(flushed_regions tracking, purge_stale_regions, delayed flush). The
flush_region method itself is removed as dead code. Regions are kept
in memory during generation and saved in parallel via rayon in
save_java(), restoring the original performance characteristics.
@louis-e louis-e requested a review from Copilot March 26, 2026 21:50
@louis-e
Copy link
Owner Author

louis-e commented Mar 26, 2026

retrigger-benchmark

Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Comments suppressed due to low confidence (1)

src/world_editor/mod.rs:734

  • This PR removes WorldEditor::flush_region(), which previously enabled incremental region serialization to cap memory usage during ground generation. Since ground_generation now keeps all regions in memory until save_java(), large bboxes may regress in peak RAM (potentially OOM). If streaming/flush behavior is still required, keep flush_region (even as pub(crate)) and have ground generation call it when using JavaAnvil.
    /// Sets a block only if no modification has been recorded yet at this
    /// position (i.e. the in-memory overlay still holds AIR).
    ///
    /// This is faster than `set_block_absolute` with `None` whitelists/blacklists
    /// because it avoids the double HashMap traversal.
    #[inline]
    pub fn set_block_if_absent_absolute(&mut self, block: Block, x: i32, absolute_y: i32, z: i32) {
        if !self.xzbbox.contains(&XZPoint::new(x, z)) {
            return;
        }
        self.world.set_block_if_absent(x, absolute_y, z, block);
    }

    /// Returns true if a non-AIR block exists at the given absolute coordinates.
    #[inline]
    pub fn block_exists_absolute(&self, x: i32, absolute_y: i32, z: i32) -> bool {
        self.world.get_block(x, absolute_y, z).is_some()
    }

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@louis-e louis-e merged commit 81080b1 into main Mar 26, 2026
2 checks passed
@louis-e louis-e deleted the refactor/extract-ground-generation branch March 26, 2026 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants