@mese79
Member

@mese79 mese79 commented Jun 15, 2025

This is a major update touching several parts of the code (sorry for the many changes)!
I needed to add an option for extracting image embeddings without patching, so I decided to do a major refactoring while adding this feature.

  • Added a checkbox for the no_patching option (only works for images with height == width).
    • With this option checked, the whole image (slice) is treated as a single patch, which speeds up feature extraction and prediction (beneficial for images equal to or smaller than 512x512).
  • Made a dataset class: FFImageDataset
    • A torch IterableDataset that yields images / patches in batches.
    • The input image can be a numpy array, a (large) stack file, or a directory of images.
    • Images are lazy-loaded using pims (except for a numpy array image, which is already loaded).
    • This helps unify the GUI and the pipeline script, so that both use the same set of functions.
  • zarr is now used as the feature storage file (old storages are not compatible anymore, sorry).
    • Mainly because zarr arrays support appending features, and because zarr gives more control over compression.
  • run_pipeline.py can be used for large stack prediction on HPC without a temporary feature store (faster). It also has an option to only extract features into a zarr storage now.
  • Default overlap is now patch_size // 4, as opposed to patch_size // 2 before. This will speed up extraction and prediction (I still need to check how this change affects the result stats).
  • Added more testing and typing.
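
To give a rough feel for why the smaller overlap and the no_patching option reduce the work, here is a minimal sketch that counts patches on a sliding grid. It is not the actual FeatureForest patching code: the function names `patches_per_axis` and `count_patches`, the padding rule (the last patch is assumed to be padded to cover the image edge), and the example image and patch sizes are all assumptions made for illustration.

```python
def patches_per_axis(dim: int, patch_size: int, overlap: int) -> int:
    """Patches needed to cover one axis (hypothetical helper).

    Assumes patches start at 0 and advance by `patch_size - overlap`,
    and that the last patch is padded if it runs past the image edge.
    """
    stride = patch_size - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than patch_size")
    remaining = max(0, dim - patch_size)
    # Ceiling division for the extra steps, plus one for the first patch.
    return -(-remaining // stride) + 1


def count_patches(height, width, patch_size, overlap=None, no_patching=False):
    """Total patches for one image slice (hypothetical helper)."""
    if no_patching:
        # Per the description above, no_patching only works for square images.
        if height != width:
            raise ValueError("no_patching requires height == width")
        return 1  # the whole slice is treated as a single patch
    if overlap is None:
        overlap = patch_size // 4  # the new default described above
    rows = patches_per_axis(height, patch_size, overlap)
    cols = patches_per_axis(width, patch_size, overlap)
    return rows * cols


if __name__ == "__main__":
    # Old default overlap (patch_size // 2) on an assumed 2048x2048 slice.
    print(count_patches(2048, 2048, 512, overlap=512 // 2))  # 49
    # New default overlap (patch_size // 4) on the same slice.
    print(count_patches(2048, 2048, 512))  # 25
    # no_patching on a square 512x512 slice.
    print(count_patches(512, 512, 512, no_patching=True))  # 1
```

Under these assumptions the patch count for the example slice drops from 49 to 25, which is where the speed-up would come from; the real numbers depend on the model's patch size and on how the edges are padded.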

@mese79 mese79 requested a review from jdeschamps June 15, 2025 18:48
@codecov-commenter

codecov-commenter commented Jun 15, 2025

Codecov Report

Attention: Patch coverage is 63.68564% with 134 lines in your changes missing coverage. Please review.

Project coverage is 28.18%. Comparing base (5183db5) to head (d26ae91).

Files with missing lines                          Patch %   Lines
src/featureforest/_segmentation_widget.py         36.02%    87 Missing ⚠️
src/featureforest/_feature_extractor_widget.py    17.39%    19 Missing ⚠️
src/featureforest/models/SAM/adapter.py           12.50%    7 Missing ⚠️
src/featureforest/models/MobileSAM/model.py       40.00%    3 Missing ⚠️
src/featureforest/models/SAM2/model.py            57.14%    3 Missing ⚠️
src/featureforest/utils/data.py                   83.33%    3 Missing ⚠️
src/featureforest/utils/extract.py                93.18%    3 Missing ⚠️
src/featureforest/utils/pipeline_prediction.py    88.00%    3 Missing ⚠️
src/featureforest/utils/dataset.py                96.42%    2 Missing ⚠️
src/featureforest/models/Cellpose/adapter.py      50.00%    1 Missing ⚠️
... and 3 more
Additional details and impacted files
@@            Coverage Diff             @@
##             main      #44      +/-   ##
==========================================
+ Coverage   24.85%   28.18%   +3.33%     
==========================================
  Files          40       40              
  Lines        2394     2469      +75     
==========================================
+ Hits          595      696     +101     
+ Misses       1799     1773      -26     

@mese79
Member Author

mese79 commented Jun 15, 2025

I need to update documentation as well.

@mese79
Member Author

mese79 commented Jun 18, 2025

Using zarr for data storage seems to make RF training slow. :(
I will probably revert to HDF5.

@mese79 mese79 merged commit e1ed0dc into main Jun 22, 2025
4 checks passed