Clarify how to use different stages of VisCy #43

Closed
edyoshikun opened this issue Aug 30, 2023 · 3 comments · Fixed by #45
Labels
documentation Improvements or additions to documentation

@edyoshikun (Contributor)

I expected that calling HCSDataModule().setup('fit') would fit the data and rewrite the normalization dictionary. However, calling it twice in a row gives:

KeyError                                  Traceback (most recent call last)
/home/eduardoh/vs_data/1-test_dataloader/edhirata_dataloader.py in line 14
     34 # %%
     35 data_module = HCSDataModule(
     36     input_data_path,
     37     source_channel="Phase3D",
   (...)
     45     augment=False,  # Turn off augmentation for now.
     46 )
---> 47 data_module.setup("fit")

File ~/VisCy/viscy/light/data.py:404, in HCSDataModule.setup(self, stage)
    402 dataset_settings = dict(channels=channels, z_window_size=self.z_window_size)
    403 if stage in ("fit", "validate"):
--> 404     self._setup_fit(dataset_settings)
    405 elif stage == "test":
    406     self._setup_test(dataset_settings)

File ~/VisCy/viscy/light/data.py:429, in HCSDataModule._setup_fit(self, dataset_settings)
    428 def _setup_fit(self, dataset_settings: dict):
--> 429     plate, normalize_transform = self._setup_eval(dataset_settings)
    430     fit_transform = self._fit_transform()
    431     train_transform = Compose(
    432         [normalize_transform] + self._train_transform() + fit_transform
    433     )

File ~/VisCy/viscy/light/data.py:424, in HCSDataModule._setup_eval(self, dataset_settings)
    420 if self.normalize_source:
    421     norm_keys += self.source_channel
    422 normalize_transform = NormalizeSampled(
    423     norm_keys,
--> 424     plate.zattrs["normalization"],
    425 )
    426 return plate, normalize_transform

File ~/conda/envs/viscy/lib/python3.10/site-packages/zarr/attrs.py:73, in Attributes.__getitem__(self, item)
     72 def __getitem__(self, item):
---> 73     return self.asdict()[item]

KeyError: 'normalization'
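The traceback shows that setup('fit') only reads plate.zattrs["normalization"] and fails when that key is absent. A quick pre-flight check can catch this before building the data module. The sketch below is hypothetical (has_normalization_metadata and require_normalization_metadata are not VisCy functions) and assumes the store uses the Zarr v2 layout, where plate-level attributes live in a .zattrs JSON file at the root of the plate directory.

```python
import json
from pathlib import Path


def has_normalization_metadata(zarr_dir) -> bool:
    """Return True if the plate-level .zattrs (Zarr v2 layout assumed) has a 'normalization' key."""
    attrs_file = Path(zarr_dir) / ".zattrs"
    if not attrs_file.is_file():
        return False
    with attrs_file.open() as f:
        attrs = json.load(f)
    return "normalization" in attrs


def require_normalization_metadata(zarr_dir) -> None:
    """Raise a descriptive error instead of the bare KeyError seen in setup('fit')."""
    if not has_normalization_metadata(zarr_dir):
        raise RuntimeError(
            f"No 'normalization' metadata found in {zarr_dir}. Run "
            "'python -m viscy.cli.preprocess_script --config preprocess.yml' first."
        )
```

Calling require_normalization_metadata(input_data_path) before HCSDataModule(...).setup("fit") turns the opaque KeyError into a message that points at the missing preprocessing step.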

@edyoshikun (Contributor, Author)

I just realized that there is a convenience function, generate_normalization_metadata(), that generates the metadata required by HCSDataModule().setup('fit').
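To make the idea concrete, here is a minimal sketch of the kind of statistics such a preprocessing step might compute. It is not the implementation of generate_normalization_metadata(). It assumes that pixels are sampled on a regular YX grid (the spacing playing the role of block_size in the config below) and that each channel stores a median and an interquartile range (IQR) under a "dataset_statistics" entry. The exact layout VisCy writes may differ, and grid_sample, channel_statistics, and normalize are hypothetical names.

```python
import numpy as np


def grid_sample(volume: np.ndarray, grid_spacing: int = 32) -> np.ndarray:
    """Flatten the pixels lying on a regular YX grid of a (..., Y, X) array."""
    return volume[..., ::grid_spacing, ::grid_spacing].ravel()


def channel_statistics(volumes, grid_spacing: int = 32) -> dict:
    """Median and IQR of grid-sampled pixels pooled over volumes (e.g. all FOVs of one channel)."""
    samples = np.concatenate([grid_sample(v, grid_spacing) for v in volumes])
    q25, median, q75 = np.percentile(samples, [25, 50, 75])
    # Hypothetical layout for one channel's entry in the "normalization" metadata.
    return {"dataset_statistics": {"median": float(median), "iqr": float(q75 - q25)}}


def normalize(volume: np.ndarray, stats: dict) -> np.ndarray:
    """Center by the median and scale by the IQR."""
    s = stats["dataset_statistics"]
    return (volume - s["median"]) / s["iqr"]
```

Sampling on a grid keeps the cost low for large plates, and median/IQR are robust to bright outliers. Because these statistics are computed once over the whole dataset, it makes sense to store them in the Zarr metadata rather than recompute them on every setup('fit').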

@edyoshikun (Contributor, Author) commented Aug 30, 2023

Preprocessing has to be run first:
python -m viscy.cli.preprocess_script --config preprocess.yml

where preprocess.yml contains:

zarr_dir: /hpc/projects/comp.micro/mantis/2023_08_09_HEK_PCNA_H2B/2-phase3D/pcna_rac1_virtual_staining_b1_redo_1/phase3D.zarr
preprocessing:
  normalize:
    channel_ids: [0]
    block_size: 32
    num_workers: 16
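If the preprocessing step is scripted, the config above and the command line can be generated from Python. This is a hedged sketch: write_preprocess_config and preprocess_command are hypothetical helpers, and they only reproduce the keys and CLI invocation shown in this comment. They do not check that those keys match the current VisCy config schema.

```python
import sys
from pathlib import Path


def write_preprocess_config(path, zarr_dir, channel_ids=(0,), block_size=32, num_workers=16) -> Path:
    """Write a preprocess.yml with the same keys as the example config above."""
    ids = ", ".join(str(i) for i in channel_ids)
    # Plain-text YAML to avoid a PyYAML dependency; zarr_dir is assumed to need no quoting.
    text = (
        f"zarr_dir: {zarr_dir}\n"
        "preprocessing:\n"
        "  normalize:\n"
        f"    channel_ids: [{ids}]\n"
        f"    block_size: {block_size}\n"
        f"    num_workers: {num_workers}\n"
    )
    path = Path(path)
    path.write_text(text)
    return path


def preprocess_command(config_path) -> list:
    """Argument list equivalent to 'python -m viscy.cli.preprocess_script --config <file>'."""
    return [sys.executable, "-m", "viscy.cli.preprocess_script", "--config", str(config_path)]
```

Running subprocess.run(preprocess_command(cfg), check=True) once, before any data module is set up, writes the metadata that setup('fit') later reads.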

@mattersoflight (Member) commented Aug 30, 2023

Thanks for finding the problem and fixing it, @edyoshikun! In VisCy, preprocessing is not orchestrated by Lightning, so it has to be run as a separate step before setup('fit').
Reading this may be useful too.
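The separation of stages can be illustrated with a toy model. In the sketch below, ToyPlate, preprocess, and ToyDataModule are hypothetical stand-ins, not VisCy classes: preprocessing writes the normalization metadata once, and the data module's setup(stage), mirroring Lightning's setup hook, only reads it. Under that assumption, calling setup("fit") before preprocessing fails, and calling it repeatedly afterwards is harmless because it never rewrites anything.

```python
class ToyPlate:
    """Stand-in for an HCS plate whose attributes persist on disk."""

    def __init__(self):
        self.zattrs: dict = {}


def preprocess(plate: ToyPlate, stats: dict) -> None:
    """One-off step run outside Lightning: store normalization statistics."""
    plate.zattrs["normalization"] = stats


class ToyDataModule:
    """Mimics a data module whose setup() reads, but never computes, the metadata."""

    def __init__(self, plate: ToyPlate):
        self.plate = plate
        self.norm = None

    def setup(self, stage: str) -> None:
        if stage in ("fit", "validate"):
            # Raises KeyError if preprocess() has not been run yet.
            self.norm = self.plate.zattrs["normalization"]
```

Keeping statistics computation out of setup() means every stage (fit, validate, test) normalizes with the same stored values instead of recomputing them.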

@ziw-liu, please write a short Markdown page for each step in the pipeline. It can live in the docs folder. I am turning this into a documentation issue.

Also, the Confluence pages you have written here would be more useful in the docs of this repo.

@mattersoflight changed the title from "HCSDataModule not letting me overwrite when fitting" to "Clarify how to use different stages of VisCy" on Aug 30, 2023
@mattersoflight added the documentation label (Improvements or additions to documentation) on Aug 30, 2023
@mattersoflight linked a pull request on Aug 31, 2023 that will close this issue