Refactor CCA ordination classes and add new constrained ordination methods #60
base: main
- Remove all unreferenced properties in `OrdConstrained` and rename it to `ConstrainedOrdination`
- Change properties to variables when the property is not called outside the class
- Change `ConstrainedOrdination` to an abstract class with two abstract methods: `_transform_X` and `_transform_Y`
- Refactor `CCA` to just implement `_transform_X` and `_transform_Y`
- Add `RDA` transformer (redundancy analysis) as another estimator that inherits from `ConstrainedOrdination`
- Add `GNNRDARegressor` as another GNN technique. This needs to be further refactored in order to reduce duplication with `GNNRegressor`.
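For readers following along, the abstract-class arrangement described in this commit could look roughly like the sketch below. It is only an illustration: the `center_columns` helper, the constructor signature, and the use of plain column centering are assumptions made for brevity, not the code in this PR.

```python
from abc import ABC, abstractmethod


def center_columns(rows):
    """Subtract each column's mean (hypothetical helper, pure Python for brevity)."""
    means = [sum(col) / len(rows) for col in zip(*rows)]
    return [[value - mean for value, mean in zip(row, means)] for row in rows]


class ConstrainedOrdination(ABC):
    """Shared workflow; subclasses only define how X and Y are transformed."""

    def __init__(self, X, Y):
        self.X = self._transform_X(X)
        self.Y = self._transform_Y(Y)

    @abstractmethod
    def _transform_X(self, X):
        """Return the transformed X matrix."""

    @abstractmethod
    def _transform_Y(self, Y):
        """Return the transformed Y matrix."""


class RDA(ConstrainedOrdination):
    # Assumption: plain column centering stands in for the real preprocessing.
    def _transform_X(self, X):
        return center_columns(X)

    def _transform_Y(self, Y):
        return center_columns(Y)


rda = RDA(X=[[1.0, 2.0], [3.0, 6.0]], Y=[[1.0], [3.0]])
```

Because the base class is abstract, instantiating `ConstrainedOrdination` directly raises a `TypeError`; only subclasses that implement both methods can be created.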
@aazuspan, this is a bit of a mess right now, but wanted to get your eyes on it before I went too much further with it. As I mention above, we can implement Mainly, I wanted to get this committed and pushed so that I can look at your work on #59. (Looks like I'm failing a documentation check, so I'll probably need a bit of hand-holding to know what I need to do to make that check pass.)
I'll take another pass tomorrow and try to look a little closer at some of the details, but from what I've seen this looks great! Huge improvement! As you mentioned, there's room for reducing duplication, but that hopefully shouldn't be too hard once we figure out the best approach (inheritance vs. parameters or something else).
I've added a `__call__` method to `ConstrainedOrdination` as a means for setting the needed instance properties. The return value of `__call__` is the instance itself which gets passed back to the enclosing transformer (e.g. `CCATransformer`). Not sure if this is a good pattern or not.
This is one of the areas I need to take a closer look at, but was there an advantage to this approach over setting everything through `__init__`? If I understand right, you would have to re-initialize the ordination to get a different result out of `__call__`, but I'm not clear if there might be a case where you would want to initialize without calling or call multiple times.
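To make the trade-off concrete, here is a minimal sketch of the two patterns side by side. The class names `CallStyleOrdination` and `InitStyleOrdination` and the `n_rows` attribute are made up for illustration and do not come from the PR.

```python
class CallStyleOrdination:
    """Derived attributes exist only after the instance has been called."""

    def __init__(self, X, Y):
        self.X, self.Y = X, Y

    def __call__(self):
        self.n_rows = len(self.X)  # hypothetical derived attribute
        return self  # handed back to the enclosing transformer


class InitStyleOrdination:
    """Derived attributes are ready as soon as the instance is built."""

    def __init__(self, X, Y):
        self.X, self.Y = X, Y
        self.n_rows = len(X)  # hypothetical derived attribute


X, Y = [[1.0], [2.0]], [[3.0], [4.0]]
uncalled = CallStyleOrdination(X, Y)  # no n_rows yet
called = CallStyleOrdination(X, Y)()  # n_rows set by __call__
direct = InitStyleOrdination(X, Y)    # n_rows set by __init__
```

In the call-style version there is a window where the object exists but its derived attributes do not, which is the main thing the `__init__`-only version avoids.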
I think there may be some duplication with the new helper function called is_2d_numeric_numpy_array and constraints of the transformers themselves. We may not need to be as stringent with all checks as the transformer would likely cover most cases.
Yeah, I noted something similar in one of my comments. I agree there's some redundancy between the validation checks, although I'm not sure whether it makes more sense to do the checks up front in the transformer or keep that complexity at a lower level in the ordination...
This particular combination (RDA + kNN) is not fully supported in yaImpute and, to my knowledge, hasn't been widely used as a mapping technique.
Cool, it's exciting that we'll be able to offer additional functionality beyond `yaImpute`! Out of curiosity, how did you generate the test data?
(Looks like I'm failing a documentation check, so I'll probably need a bit of hand-holding to know what I need to do to make that check pass.)
You can ignore that! I had to set up RTD to build docs for all PRs to make sure it was building correctly for #59, but because this branch doesn't have any config for the docs, RTD rightfully complains. I can disable that setting if it gets too annoying (it will trigger every new commit), but otherwise we can just ignore and merge with the failing check until the docs get merged in.
- New file for `ConstrainedOrdination` superclass
- Separate `RDA` class into new file
- Abstract methods `ConstrainedOrdination._transform_X` and `ConstrainedOrdination._transform_Y` now modify `X` and `Y` arrays in place
- Create new `ConstrainedTransformer` class which replaces `CCATransformer` and takes "method" as a hyperparameter
- Modify `GNNRegressor` to take "method" as a hyperparameter and to use `ConstrainedTransformer` instead of `CCATransformer`
- Fix tests to correctly "uncollect" parameterizations that are not logical
- Rename all "gnn_moscow_*" test data to "gnn_cca_moscow_*"
- Miscellaneous changes based on comments
@aazuspan, thanks for the great review. Very helpful comments (even on a pretty messy draft) and I think I've addressed most of your comments although there are a few still left to resolve. I'm also thinking about just having this PR address the A couple of comments below, but I mostly have inline responses to your review.
No, you're right, there is no reason to have a
Sounds good. My guess is that we'll get #59 in place first and then I'll merge those changes into this branch.
Also, I'd love your review of
For 2 though it gets even a bit more tricky in that we want to exclude the There is also the issue that only one function can be specified for the You'll also see that I've renamed the test data files called "gnn_moscow_" to "gnn_cca_moscow_" so that the Footnotes
Sounds good!
Yeah, this is a good idea since we'll need to make some changes to the API Reference for the renamed
Definitely - I'll take a close look at the testing side tomorrow!
I forgot to answer this question. It's an ugly mess, but I created local copies of
Thanks for the thorough explanation of the updated uncollecting system! I think your solution looks great, considering all the complexity of different cases we have to cover. It's possible there's some way we could clean up the data loading and parameterization system now that we have a better idea of what we need, but I don't have any specific ideas, and I'm thinking it's probably not worth the time or effort to do a big redesign when it's ultimately going to be replaced by #42.
This seems like a good workaround. I suspect we could probably modify
I'm fine with either! I don't think that we should change it for the sake of our tests, but it is a more descriptive name, so I don't think it would hurt to switch. To help with the confusion between the
- Combine `ConstrainedOrdination._transform_X` (and `_Y`) into new abstractmethod `ConstrainedOrdination._set_initialization_attributes` which sets instance-level attributes
- Change parameter `method` to `constrained_method` in `GNNRegressor` and `ConstrainedTransformer`
- Set selection of subclass ordination based on dictionary lookup rather than if/else logic
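The dictionary-lookup selection mentioned in the last item could be sketched as follows. The registry name `ORDINATION_CLASSES`, the function `get_ordination_class`, the empty stand-in classes, and the error wording are all assumptions for illustration, not the names used in the PR.

```python
class CCA:
    """Stand-in for the real CCA ordination class."""


class RDA:
    """Stand-in for the real RDA ordination class."""


# Hypothetical registry mapping the hyperparameter value to its ordination class.
ORDINATION_CLASSES = {"cca": CCA, "rda": RDA}


def get_ordination_class(constrained_method):
    """Look up the ordination class, rejecting unknown method names."""
    try:
        return ORDINATION_CLASSES[constrained_method]
    except KeyError:
        # list(...) gives a plain list in the message rather than dict_keys([...]).
        valid = list(ORDINATION_CLASSES)
        raise ValueError(
            f"`constrained_method` must be one of {valid}, "
            f"got {constrained_method!r}."
        ) from None


selected = get_ordination_class("rda")
```

Adding another ordination method then only requires a new entry in the dictionary, and the validation message stays in sync with the supported options.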
I might be confused, but I think we'll still have to have the uncollect system in place to run the right combinations, but we won't have to build the correct estimator name in order to fetch the yaImpute-based files. Is that right? Or are you thinking that we won't test all combinations once we have regression testing in place?
I decided to change Some other very small changes:
That feels safer to me, but it might be too stringent?
I hadn't thought that far ahead, apparently! Of course you're right that we'll still need to parameterize and uncollect.
My personal preference with required kwargs is to use them when it isn't clear what the args should be or when the order is arbitrary, so if it were just me I would probably keep
Good call!
- Changed first argument to be positional, rather than required keyword
- Edit error message to return list rather than `dict_keys`
I agree with your advice on this. I've reverted these functions such that
Aaaand it looks like we'll need a docs change already 😉. I'll address that once I decide about adding the other two methods for
This PR partially addresses #49 once completed. At present, it only addresses `CCA` and will require further changes based on class designs that have changed. Here are the current changes and some comments which need to be addressed before proceeding.

Changes
- Remove all unreferenced properties in `OrdConstrained` and rename it to `ConstrainedOrdination`
- Change `ConstrainedOrdination` to an abstract class with two abstract methods: `_transform_X` and `_transform_Y`.
- Refactor `CCA` to just implement `_transform_X` and `_transform_Y`
- Add `RDA` transformer (redundancy analysis) as another estimator that inherits from `ConstrainedOrdination`
- Add `GNNRDARegressor` as another GNN technique. This needs to be further refactored in order to reduce duplication with `GNNRegressor`.

Issues
- … `CCA` and `RDA` is reasonable (and two other ordination methods which follow this pattern can further be implemented), there is way too much duplication in the associated transformers (`CCATransformer` and `RDATransformer`) as well as NN estimators (`GNNRegressor` and `GNNRDARegressor`). This particular combination (RDA + kNN) is not fully supported in `yaImpute` and, to my knowledge, hasn't been widely used as a mapping technique. However, RDA can be used as a technique to compare two different multivariate datasets and, as such, can be used as a valid estimator. We may want to think about providing the method (i.e. "cca" vs. "rda" vs. others yet to be implemented) as a hyperparameter to a `ConstrainedTransformer`.
- I think there may be some duplication with the new helper function called `is_2d_numeric_numpy_array` and constraints of the transformers themselves. We may not need to be as stringent with all checks as the transformer would likely cover most cases.
- … `OrdConstrained` (now `ConstrainedOrdination`) into local variables and only retained the properties that are used by the enclosing transformer. This might limit the utility of these classes for other purposes, but we can always expose these local variables if needed.
- I've added a `__call__` method to `ConstrainedOrdination` as a means for setting the needed instance properties. The return value of `__call__` is the instance itself which gets passed back to the enclosing transformer (e.g. `CCATransformer`). Not sure if this is a good pattern or not.

Update (2023.10.04)
- `CCATransformer` has now been replaced with a generic `ConstrainedTransformer` class that takes a `method` hyperparameter argument that currently accepts [`cca`, `rda`] and defaults to `cca`.
- `GNNRegressor` now takes a `method` hyperparameter and includes `ConstrainedTransformer` as its transformer.
- … `__call__` from `ConstrainedOrdination` in favor of placing it all into the `__init__` function.

Update (2023.10.17)
- `ConstrainedOrdination._transform_X` and `ConstrainedOrdination._transform_Y` have been combined into a single abstract method `ConstrainedOrdination._set_initialization_attributes` which is implemented in subclasses
- `ConstrainedOrdination._check_inputs` now returns `X` and `Y` arrays rather than modifying the instance attributes directly
- … `ConstrainedTransformer` to ensure that passed `constrained_method` is a valid option
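As a rough picture of the 2023.10.17 design, the sketch below has `_check_inputs` return validated copies of `X` and `Y`, which `__init__` then passes to the single abstract hook. The specific checks, the `to_float_rows` helper, and the `n_samples` / `n_features` attributes are placeholders assumed for illustration; the real class works on NumPy arrays and sets ordination results.

```python
from abc import ABC, abstractmethod


def to_float_rows(rows):
    """Copy a nested sequence into lists of floats (hypothetical helper)."""
    return [[float(value) for value in row] for row in rows]


class ConstrainedOrdination(ABC):
    def __init__(self, X, Y):
        # Validation hands back clean copies instead of setting self.X / self.Y.
        X, Y = self._check_inputs(X, Y)
        self._set_initialization_attributes(X, Y)

    @staticmethod
    def _check_inputs(X, Y):
        if len(X) != len(Y):
            raise ValueError("X and Y must have the same number of rows.")
        return to_float_rows(X), to_float_rows(Y)

    @abstractmethod
    def _set_initialization_attributes(self, X, Y):
        """Set the instance attributes the enclosing transformer needs."""


class RDA(ConstrainedOrdination):
    def _set_initialization_attributes(self, X, Y):
        # Assumption: placeholder attributes only.
        self.n_samples = len(X)
        self.n_features = len(X[0])


rda = RDA([[1, 2], [3, 4]], [[5], [6]])
```

Returning the arrays keeps `_check_inputs` free of side effects, so it can be tested on its own and reused by any subclass.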