Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor CCA and CCorA classes to reduce repeated operations #49

Open
grovduck opened this issue Sep 11, 2023 · 0 comments
Open

Refactor CCA and CCorA classes to reduce repeated operations #49

grovduck opened this issue Sep 11, 2023 · 0 comments
Assignees
Labels
enhancement New feature or request estimator Related to one or more estimators refactor Code cleanup without changing functionality

Comments

@grovduck
Copy link
Member

At present, CCA and CCorA are implemented as classes that mainly use @property to access attributes. As such, properties call other properties within the class and there are often cases when properties are used repeatedly, which forces recalculation. This is particularly expensive when evaluating methods from the np.linalg module such as singular value decomposition, QR decomposition, and solving for systems of linear equations. We need to ensure that attributes are only calculated once when there is no possibility of change to attribute values.

This issue is closely tied to #47, but we are opening it in a separate issue for a few different reasons:

  1. The transform method on the enclosing estimators (CCATransformer and CCorATransformer) is the method that is called most often and the fixes in Store repeatedly used variables as estimator attributes #47 #48 address the issue of repeated calls to properties by setting estimator attributes during fit. Because the fit method on these transformers calls the classes of interest in this issue (CCA and CCorA), it still suffers from repeated calls to properties, but fit is typically called only once, so delaying the refactor shouldn't be too problematic.
  2. Initially, when we first wrote the code for these classes, we were mostly concerned with the correct porting behavior from yaImpute and wanted to verify that "checkpoints" in the porting process were lining up with output from yaImpute. As such, there are properties that correspond to multiple steps in each ordination process, very few of which we actually use. Now that the porting seems to be behaving correctly, we can treat many of these properties as local variables and only retain the attributes/properties/methods needed for clients and enclosing classes. However, if there is value in retaining some of these "internal" attributes for diagnosing model goodness-of-fit (e.g. charting capabilities for ordinations), we should perhaps retain these as (at the very least) private properties/methods. This will require a bit of research into what attributes are retained for diagnostics, using yaImpute and vegan for inspiration.
  3. For at least CCA, there are actually a number of other ordination techniques (redundancy analysis (RDA), distance-based RDA) that use common code already implemented in this class from which we may choose to create additional estimators. Before optimizing the two existing classes, we'll want to think through the design of any new estimators and reuse as much of the existing code as possible.

Note that none of these reasons really stops us from refactoring out the duplicate calls at present, but changes to the existing code and potential expansion into a family of classes makes it prudent to delay until after the Core Estimators milestone.

@grovduck grovduck added enhancement New feature or request refactor Code cleanup without changing functionality estimator Related to one or more estimators labels Sep 11, 2023
@grovduck grovduck added this to the Complete estimators milestone Sep 27, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request estimator Related to one or more estimators refactor Code cleanup without changing functionality
Projects
None yet
Development

No branches or pull requests

2 participants