Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

from_images: python-native czi support and timepoint and channel multiplexing #822

Merged
merged 8 commits into from
Nov 28, 2022

Conversation

jstriebel
Copy link
Contributor

@jstriebel jstriebel commented Nov 16, 2022

Description:

  • Dataset.from_images now adds a layer per timepoint and per channel (if the data doesn't have 1 or 3 channels).
  • Added python-native CZI support for Dataset.from_images or dataset.add_layer_from_images, without using bioformats.
  • dataset.add_layer_from_images can add a layer per timepoint and per channel when passing allow_multiple_layers=True.
  • Fixes for the pims ImageIO reader are improved.

Issues:

Todos:

  • Updated Changelog
  • Updated Documentation
  • Added / Updated Tests

@jstriebel jstriebel self-assigned this Nov 16, 2022
@jstriebel jstriebel changed the title [WIP] from_images: python-native czi support and timepoint and channel multiplexing from_images: python-native czi support and timepoint and channel multiplexing Nov 16, 2022
webknossos/webknossos/dataset/dataset.py Show resolved Hide resolved
@@ -987,7 +989,10 @@ def add_layer_from_images(
use_bioformats: bool = False,
channel: Optional[int] = None,
timepoint: Optional[int] = None,
czi_channel: Optional[int] = None,
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have to admit that this whole method is growing quite large. It might be a good idea to extract it into its own module, but not quite sure.

Copy link
Member

@philippotto philippotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a higher level this PR looks good, but it could use lots of code comments, because it's hard to grasp the domain concepts when one isn't really deep in this topic.

Dataset.from_images now adds a layer per timepoint and per channel (if the data doesn't have 1 or 3 channels).

Is there a layer limit? Will this create 100 layers when there are 100 timepoints?

webknossos/tests/dataset/test_add_layer_from_images.py Outdated Show resolved Hide resolved
webknossos/webknossos/dataset/dataset.py Show resolved Hide resolved
Comment on lines 1050 to 1082
possible_layers = pims_images.get_possible_layers()
if (
possible_layers is not None
and truncate_rgba_to_rgb
and len(possible_layers.get("channel", [])) == 4
):
if len(possible_layers) == 1:
possible_layers = None
else:
del possible_layers["channel"]
if possible_layers is not None:
if allow_multiple_layers:
suffix_with_kwargs = {
"__" + "_".join(f"{k}={v}" for k, v in sorted(pairs)): dict(pairs)
for pairs in product(
*(
[(key, value) for value in values]
for key, values in possible_layers.items()
)
)
}
print(suffix_with_kwargs)
else:
suffix_with_kwargs = {"": {}}
warnings.warn(
f"There are dimensions beyond channels and xyz which cannot be read: {possible_layers}. "
"Defaulting to the first one. "
"Please set allow_multiple_layers=True if all of them should be written to different layers, "
"or set specific values as arguments.",
RuntimeWarning,
)
else:
suffix_with_kwargs = {"": {}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please add some comments here. I don't know whats going on with suffix_with_kwargs (suffix doesn't seem to be used by the way). Also the semantics of possible_layers == None and the deleted possible_layers["channel"] are not clear to me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(suffix doesn't seem to be used by the way).

suffix is used as a layer name suffix below in self.add_layer(…), I renamed it to layer_name_suffix. Also added many more comments what's happening there and renamed some vars to make it more clear.

Comment on lines 203 to 215
elif len(images.shape) == 5:
if images.shape[2] == 1 or (
images.shape[2] == 3 and images.dtype == np.dtype("uint8")
):
self._img_dims = "cyx"
else:
self._img_dims = "yxc"
self._iter_dim = "z"
self._timepoint = 0
if images.shape[0] > 1:
self._possible_layers["timepoint"] = list(
range(0, images.shape[0])
)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

some comments would be nice. jumping from another domain into this code is quite hard 🙈

webknossos/webknossos/dataset/_utils/pims_czi_reader.py Outdated Show resolved Hide resolved
@jstriebel
Copy link
Contributor Author

@philippotto Thanks a lot for the review!

From a higher level this PR looks good, but it could use lots of code comments, because it's hard to grasp the domain concepts when one isn't really deep in this topic.

I added a couple comments, refactored some parts slightly and used better variable names. What do you think about it now? Where is still more explanation necessary?

Dataset.from_images now adds a layer per timepoint and per channel (if the data doesn't have 1 or 3 channels).

Is there a layer limit? Will this create 100 layers when there are 100 timepoints?

There's not limit currently. ds.add_layer_from_images by default only adds a single one, allow_multiple_layers must be set to True explicitly, but Dataset.from_images automatically would add any number of layers right now. Do you think we should set an upper limit? I'd guess only for Dataset.from_images then, which might also be circumvented manually?

Copy link
Member

@philippotto philippotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for improving the code readability :) There's one code location left where I had some trouble with understanding. Other than that, it's already looking pretty good :)

if timepoint is not None:
raise RuntimeError(
f"Got {len(images.shape)} axes for the images, "
+ "cannot map to 3D+channels+timepoints."
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe mention that timepoint must be None for this shape to work?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, this error message is misleading. This only is thrown if timepoint is already set, and the image is 5d after removing the time dimension. I improved the error message.

Comment on lines 1063 to 1065
# Below, we iterate over suffix_with_pims_open_kwargs_per_layer in the for-loop
# to add one layer per possible_layer if allow_multiple_layers is True.
# If just a single layer is added, we still add a default value in the dict.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this comment really refer to the immediate code below? because the code below merely creates suffix_with_pims_open_kwargs_per_layer (with a for comprehension) and doesn't loop over it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not directly, changed it to Further below, …

Comment on lines +1068 to +1076
suffix_with_pims_open_kwargs_per_layer = {
"__" + "_".join(f"{k}={v}" for k, v in sorted(pairs)): dict(pairs)
for pairs in product(
*(
[(key, value) for value in values]
for key, values in possible_layers.items()
)
)
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm a bit lost what is happening in detail here. pairs seems to be a list of key-value entries which can be passed as kwargs to pims? However, I don't understand how this is derived from possible_layers.items() and product(...). Maybe you can add a comment which lists an example of what possible_layers.items() could be and what the expected suffix_with_pims_open_kwargs_per_layer value would be then. When input and output is known, it might be easier to follow the code :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good idea, I added an example 👍

RuntimeWarning,
)
else:
suffix_with_pims_open_kwargs_per_layer = {"": {}}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this basically means "call PIMS" without any kwargs and name the new layer like the user requested it (i.e., without suffix), right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exactly 👍 Also added a comment about it.

@jstriebel
Copy link
Contributor Author

@philippotto Thanks for re-checking, I added some more comments ✍️ What do you think about the number of layers, should we limit it to some maximum?

Is there a layer limit? Will this create 100 layers when there are 100 timepoints?

There's not limit currently. ds.add_layer_from_images by default only adds a single one, allow_multiple_layers must be set to True explicitly, but Dataset.from_images automatically would add any number of layers right now. Do you think we should set an upper limit? I'd guess only for Dataset.from_images then, which might also be circumvented manually?

@philippotto
Copy link
Member

What do you think about the number of layers, should we limit it to some maximum?

I think more than 20 layers are not really feasible to render in webKnossos (at the moment). This is why I'd suggest to limit it so that only the first X timepoints are used. There could be an "ignore_layer_limit" parameter if the user wants to circumvent this. But I don't think it's necessary.

@jstriebel
Copy link
Contributor Author

I think more than 20 layers are not really feasible to render in webKnossos (at the moment). This is why I'd suggest to limit it so that only the first X timepoints are used. There could be an "ignore_layer_limit" parameter if the user wants to circumvent this. But I don't think it's necessary.

I added a max_layers keyword-only argument with 20 as the default. Is there anything else missing from your side?

Copy link
Member

@philippotto philippotto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome :)

@jstriebel jstriebel enabled auto-merge (squash) November 28, 2022 12:16
@jstriebel jstriebel merged commit 1986ec3 into master Nov 28, 2022
@jstriebel jstriebel deleted the from-image-czi-layer-multiplexing branch November 28, 2022 12:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants