Refactor `CCA` ordination classes and add new constrained ordination methods #60

grovduck · 2023-10-02T21:27:33Z

This PR partially addresses #49 once completed. At present, it only addresses CCA and will require further changes based on class designs that have changed. Here are the current changes and some comments which need to be addressed before proceeding.

Changes

Remove all unreferenced properties in OrdConstrained and renamed to ConstrainedOrdination
Change properties to variables when property is not called outside class
Change ContrainedOrdination to abstract class with two abstract methods: _transform_X and _transform_Y.
Refactored CCA to just implement _transform_X and _transform_Y
Added RDA transformer (redundancy analysis) as another estimator that inherits from ConstrainedOrdination
Added GNNRDARegressor as another GNN technique. This needs to be further refactored in order to reduce duplication with GNNRegressor.

Issues

While the design of CCA and RDA is reasonable (and two other ordination methods which follow this pattern can further be implemented), there is way too much duplication in the associated transformers (CCATransformer and RDATransformer) as well as NN estimators (GNNRegressor and GNNRDARegressor). This particular combination (RDA + kNN) is not fully supported in yaImpute and, to my knowledge, hasn't been widely used as a mapping technique. However, RDA can be used as a technique to compare two different multivariate datasets and, as such, can be used as a valid estimator. We may want to think about providing the method (i.e. "cca" vs. "rda" vs. others yet to be implemented) as a hyperparameter to a ConstrainedTransformer.
I think there may be some duplication with the new helper function called is_2d_numeric_numpy_array and constraints of the transformers themselves. We may not need to be as stringent with all checks as the transformer would likely cover most cases.
I collapsed non-referenced properties in what was OrdConstrained (now ConstrainedOrdination) into local variables and only retained the properties that are used by the enclosing transformer. This might limit the utility of these classes for other purposes, but we can always expose these local variables if needed.
I've added a __call__ method to ConstrainedOrdination as a means for setting the needed instance properties. The return value of __call__ is the instance itself which gets passed back to the enclosing transformer (e.g. CCATransformer). Not sure if this is a good pattern or not.

Update (2023.10.04)

CCATransformer has now been replaced with a generic ConstrainedTransformer class that takes a method hyperparameter argument that currently accepts [cca, rda] and defaults to cca.
Likewise GNNRegressor now takes a method hyperparameter and includes ConstrainedTransformer as its transformer.
Removed __call__ from ConstrainedOrdination in favor of placing it all into the __init__ function.

Update (2023.10.17)

Abstract methods ConstrainedOrdination._transform_X and ConstrainedOrdination._transform_Y have been combined into a single abstract method ConstrainedOrdination._set_initialization_attributes which is implemented in subclasses
ConstrainedOrdination._check_inputs now returns X and Y arrays rather than modifying the instance attributes directly
Cleaned up logic in ConstrainedTransformer to ensure that passed constrained_method is a valid option

- Remove all unreferenced properties in `OrdConstrained` and renamed to `ConstrainedOrdination` - Change properties to variables when property is not called outside class - Change `ContrainedOrdination` to abstract class with two abstract methods: `_transform_X` and `_transform_Y`. - Refactored `CCA` to just implement `_transform_X` and `_transform_Y` - Added `RDA` transformer (redundancy analysis) as another estimator that inherits from `ConstrainedOrdination` - Added `GNNRDARegressor` as another GNN technique. This needs to be further refactored in order to reduce duplication with `GNNRegressor`.

grovduck · 2023-10-02T21:33:12Z

@aazuspan, this is a bit of a mess right now, but wanted to get your eyes on it before I went too much further with it. As I mention above, we can implement RDA and a couple of other ordination techniques fairly easily from the new ConstrainedOrdination superclass, but I think we'll want to think about how that cascades up to the transformer and then to the estimator. Please let me know if my explanation isn't clear enough. I've provided very scant documentation in the code as well, so I can enhance that if they are confusing.

Mainly, I wanted to get this committed and pushed so that I can look at your work on #59.

(Looks like I'm failing a documentation check, so I'll probably need a bit of hand-holding to know what I need to do to make that check pass.)

aazuspan

I'll take another pass tomorrow and try to look a little closer at some of the details, but from what I've seen this looks great! Huge improvement! As you mentioned, there's room for reducing duplication, but that hopefully shouldn't be too hard once we figure out the best approach (inheritance vs. parameters or something else).

I've added a call method to ConstrainedOrdination as a means for setting the needed instance properties. The return value of call is the instance itself which gets passed back to the enclosing transformer (e.g. CCATransformer). Not sure if this is a good pattern or not.

This is one of the areas I need to take a closer look at, but was there an advantage to this approach over setting everything through __init__? If I understand right, you would have to re-initialize the ordination to get a different result out of __call__, but I'm not clear if there might be a case where you would want to initialize without calling or call multiple times.

I think there may be some duplication with the new helper function called is_2d_numeric_numpy_array and constraints of the transformers themselves. We may not need to be as stringent with all checks as the transformer would likely cover most cases.

Yeah, I noted something similar in one of my comments. I agree there's some redundancy between the validation checks, although I'm not sure whether it makes more sense to do the checks up front in the transformer or keep that complexity at a lower level in the ordination...

This particular combination (RDA + kNN) is not fully supported in yaImpute and, to my knowledge, hasn't been widely used as a mapping technique.

Cool, it's exciting that we'll be able to offer additional functionality beyond yaImpute! Out of curiosity, how did you generate the test data?

(Looks like I'm failing a documentation check, so I'll probably need a bit of hand-holding to know what I need to do to make that check pass.)

You can ignore that! I had to set up RTD to build docs for all PRs to make sure it was building correctly for #59, but because this branch doesn't have any config for the docs, RTD rightfully complains. I can disable that setting if it gets too annoying (it will trigger every new commit), but otherwise we can just ignore and merge with the failing check until the docs get merged in.

src/sknnr/_gnn_rda.py

src/sknnr/transformers/_cca.py

src/sknnr/transformers/_cca_transformer.py

src/sknnr/transformers/_cca.py

- New file for ConstrainedOrdination superclass - Separate RDA class into new file - Abstract methods ConstrainedOrdination._transform_X and ConstrainedOrdiation._transform_Y now modify `X` and `Y` arrays in place - Create new ConstrainedTransformer class which replaces CCATransformer and takes "method" as a hyperparameter - Modify GNNRegressor to take "method" as a hyperparameter and to use ConstrainedTransformer instead of CCATransformer - Fix tests to correctly "uncollect" parameterizations that are not logical - Rename all "gnn_moscow_*" test data to "gnn_cca_moscow_*" - Miscellaneous changes based on comments

grovduck · 2023-10-04T22:09:56Z

@aazuspan, thanks for the great review. Very helpful comments (even on a pretty messy draft) and I think I've addressed most of your comments although there are a few still left to resolve. I'm also thinking about just having this PR address the CCA/RDA side of things and then I'll start another PR on CCorA. Otherwise, I think this will be too unwieldy.

A couple of comments below, but I mostly have inline responses to your review.

I've added a call method to ConstrainedOrdination as a means for setting the needed instance properties. The return value of call is the instance itself which gets passed back to the enclosing transformer (e.g. CCATransformer). Not sure if this is a good pattern or not.

This is one of the areas I need to take a closer look at, but was there an advantage to this approach over setting everything through __init__? If I understand right, you would have to re-initialize the ordination to get a different result out of __call__, but I'm not clear if there might be a case where you would want to initialize without calling or call multiple times.

No, you're right, there is no reason to have a __call__ method in there. I've removed it with this round and put it back into __init__. It felt a bit "dirty" to do all the actual work in __init__, but I think that's preferable to an unfamiliar API.

You can ignore that! I had to set up RTD to build docs for all PRs to make sure it was building correctly for #59, but because this branch doesn't have any config for the docs, RTD rightfully complains. I can disable that setting if it gets too annoying (it will trigger every new commit), but otherwise we can just ignore and merge with the failing check until the docs get merged in.

Sounds good. My guess is that we'll get #59 in place first and then I'll merge those changes into this branch.

grovduck · 2023-10-04T23:07:38Z

Also, I'd love your review of test_port.py on this go around. We now have two reasons to "uncollect" tests:

If an estimator does not support n_components and a value for n_components was specified
If an estimator does not support method and a value for constrained_method was specified ¹

For 2 though it gets even a bit more tricky in that we want to exclude the GNNRegressor tests that don't pass a constrained_method. So we really need to uncollect if the estimator supports method and a value for constrained_method was not specified.

There is also the issue that only one function can be specified for the pytest.mark.uncollect_if marker, so now there is a new function called estimator_does_not_support_parametrization that "or"s the two other conditions.

You'll also see that I've renamed the test data files called "gnn_moscow_" to "gnn_cca_moscow_" so that the method variable finds the correct dataset when calling load_moscow_stjoes_results. There's definitely some duplicated logic across the tests that I'm not particularly happy with, so if you have suggestions there, I'd love to hear it.

You'll clearly note the inconsistency in naming here - does it make sense to change method to constrained_method in GNNRegressor and ConstrainedTransformer? ↩

aazuspan · 2023-10-04T23:45:19Z

I'm also thinking about just having this PR address the CCA/RDA side of things and then I'll start another PR on CCorA.

Sounds good!

My guess is that we'll get #59 in place first and then I'll merge those changes into this branch.

Yeah, this is a good idea since we'll need to make some changes to the API Reference for the renamed CCATransformer.

Also, I'd love your review of test_port.py on this go around.

Definitely - I'll take a close look at the testing side tomorrow!

grovduck · 2023-10-04T23:57:58Z

Out of curiosity, how did you generate the test data?

I forgot to answer this question. It's an ugly mess, but I created local copies of yaImpute functions (specifically yai and newtargets) and added a "gnn_rda" section that uses a call to vegan's rda instead of cca. If/when I finish this off, I can do the same with the other two methods: capscale and dbrda, but need to be a bit more confident that I know generally how they differ from cca/rda.

src/sknnr/transformers/_constrained_transformer.py

aazuspan · 2023-10-05T19:16:43Z

Also, I'd love your review of test_port.py on this go around.

Thanks for the thorough explanation of the updated uncollecting system! I think your solution looks great, considering all the complexity of different cases we have to cover.

It's possible there's some way we could clean up the data loading and parameterization system now that we have a better idea of what we need, but I don't have any specific ideas, and I'm thinking it's probably not worth the time or effort to do a big redesign when it's ultimately going to be replaced by #42.

There is also the issue that only one function can be specified for the pytest.mark.uncollect_if marker, so now there is a new function called estimator_does_not_support_parametrization that "or"s the two other conditions.

This seems like a good workaround. I suspect we could probably modify pytest_collection_modifyitems to work with multiple uncollect_if markers by iterating over all markers instead of using get_closest_marker, but that seems like a lot of trouble for very little benefit, especially given that our port tests are going to eventually be replaced by #42.

You'll clearly note the inconsistency in naming here - does it make sense to change method to constrained_method in GNNRegressor and ConstrainedTransformer?

I'm fine with either! I don't think that we should change it for the sake of our tests, but it is a more descriptive name, so I don't think it would hurt to switch.

To help with the confusion between the method/constrained_method in the tests, I think we could modify the test code to refer to the estimator name as name or id or something else instead of method, but I don't have a strong feeling on that.

- Combine ConstrainedOrdination._transform_X (and _Y) into new abstractmethod ConstrainedOrdination._set_initialization_attributes which sets instance-level attributes - Change parameter `method` to `constrained_method` in GNNRegressor and ConstrainedTransformer - Set selection of subclass ordination based on dictionary lookup rather than if/else logic

grovduck · 2023-10-05T23:32:15Z

It's possible there's some way we could clean up the data loading and parameterization system now that we have a better idea of what we need, but I don't have any specific ideas, and I'm thinking it's probably not worth the time or effort to do a big redesign when it's ultimately going to be replaced by #42.

I might be confused, but I think we'll still have to have the uncollect system in place to run the right combinations, but we won't have to build the correct estimator name in order to fetch the yaImpute-based files. Is that right? Or are you thinking that we won't test all combinations once we have regression testing in place?

I'm fine with either! I don't think that we should change it for the sake of our tests, but it is a more descriptive name, so I don't think it would hurt to switch.

To help with the confusion between the method/constrained_method in the tests, I think we could modify the test code to refer to the estimator name as name or id or something else instead of method, but I don't have a strong feeling on that.

I decided to change method to constrained_method in GNNRegressor and ConstrainedOrdination even though it felt a bit redundant to have ConstrainedOrdination.constrained_method. I kept the tests more or less the same.

Some other very small changes:

Changed is_2d_numeric_array to have required keyword arguments
Combined zero_sum_rows and zero_sum_columns to zero_sum_vectors with an axis parameter. It too has required keyword arguments.

That feels safer to me, but it might be too strigent?

src/sknnr/transformers/_constrained_transformer.py

aazuspan · 2023-10-06T00:10:04Z

I might be confused, but I think we'll still have to have the uncollect system in place to run the right combinations, but we won't have to build the correct estimator name in order to fetch the yaImpute-based files. Is that right?

I hadn't thought that far ahead, apparently! Of course you're right that we'll still need to parameterize and uncollect.

Changed is_2d_numeric_array to have required keyword arguments

My personal preference with required kwargs is to use them when it isn't clear what the args should be or when the order is arbitrary, so if it were just me I would probably keep arr positional but make axis a required kwarg. But that's just me! It certainly doesn't hurt to require them.

Combined zero_sum_rows and zero_sum_columns to zero_sum_vectors with an axis parameter

Good call!

- Changed first argument to be positional, rather than required keyword - Edit error message to return list rather than dict_keys

grovduck · 2023-10-17T17:22:54Z

My personal preference with required kwargs is to use them when it isn't clear what the args should be or when the order is arbitrary, so if it were just me I would probably keep arr positional but make axis a required kwarg. But that's just me! It certainly doesn't hurt to require them.

I agree with your advice on this. I've reverted these functions such that arr is positional and axis is a required kwarg.

grovduck · 2023-10-17T20:13:27Z

Aaaand it looks like we'll need a docs change already 😉. I'll address that once I decide about adding the other two methods for ConstrainedOrdination.

https://readthedocs.org/projects/sknnr/builds/22259462/

grovduck self-assigned this Oct 2, 2023

grovduck added refactor Code cleanup without changing functionality estimator Related to one or more estimators labels Oct 2, 2023

grovduck added this to the Complete estimators milestone Oct 2, 2023

grovduck changed the title ~~WIP: Refactor ordination classes to remove extraneous details~~ Refactor ordination classes and add new family of constrained ordination techniques Oct 2, 2023

grovduck changed the title ~~Refactor ordination classes and add new family of constrained ordination techniques~~ Refactor ordination classes and add new constrained ordination methods Oct 2, 2023

aazuspan reviewed Oct 2, 2023

View reviewed changes

aazuspan reviewed Oct 3, 2023

View reviewed changes

src/sknnr/transformers/_cca.py Outdated Show resolved Hide resolved

Move CCA and RDA into _constrained_ordination module.

974a2be

aazuspan reviewed Oct 5, 2023

View reviewed changes

src/sknnr/transformers/_constrained_transformer.py Outdated Show resolved Hide resolved

lemma-osu deleted a comment from aazuspan Oct 5, 2023

Decorate ConstrainedOrdination._check_inputs as @staticmethod.

1735983

aazuspan reviewed Oct 5, 2023

View reviewed changes

src/sknnr/transformers/_constrained_transformer.py Outdated Show resolved Hide resolved

grovduck changed the title ~~Refactor ordination classes and add new constrained ordination methods~~ Refactor CCA ordination classes and add new constrained ordination methods Oct 17, 2023

Change signatures of _constrained_ordination helper functions

a5d0d84

- Changed first argument to be positional, rather than required keyword - Edit error message to return list rather than dict_keys

Merge branch 'main' into refactor_ordination

12b6641

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Refactor `CCA` ordination classes and add new constrained ordination methods #60

Refactor `CCA` ordination classes and add new constrained ordination methods #60

grovduck commented Oct 2, 2023 •

edited

Loading

grovduck commented Oct 2, 2023

aazuspan left a comment

grovduck commented Oct 4, 2023

grovduck commented Oct 4, 2023

aazuspan commented Oct 4, 2023

grovduck commented Oct 4, 2023

aazuspan commented Oct 5, 2023

grovduck commented Oct 5, 2023

aazuspan commented Oct 6, 2023 •

edited

Loading

grovduck commented Oct 17, 2023

grovduck commented Oct 17, 2023

Refactor CCA ordination classes and add new constrained ordination methods #60

Are you sure you want to change the base?

Refactor CCA ordination classes and add new constrained ordination methods #60

Conversation

grovduck commented Oct 2, 2023 • edited Loading

Changes

Issues

Update (2023.10.04)

Update (2023.10.17)

grovduck commented Oct 2, 2023

aazuspan left a comment

Choose a reason for hiding this comment

grovduck commented Oct 4, 2023

grovduck commented Oct 4, 2023

Footnotes

aazuspan commented Oct 4, 2023

grovduck commented Oct 4, 2023

aazuspan commented Oct 5, 2023

grovduck commented Oct 5, 2023

aazuspan commented Oct 6, 2023 • edited Loading

grovduck commented Oct 17, 2023

grovduck commented Oct 17, 2023

Refactor `CCA` ordination classes and add new constrained ordination methods #60

Refactor `CCA` ordination classes and add new constrained ordination methods #60

grovduck commented Oct 2, 2023 •

edited

Loading

aazuspan commented Oct 6, 2023 •

edited

Loading