
Allow for user to specify reduced number of components (axes) #33 #40

Merged: 12 commits merged into fb_add_estimators from user_reduced_axes on Jul 4, 2023

Conversation

grovduck (Member)

This feature introduces a new mixin (ComponentReducerMixin) to handle dimension reduction for transformers that create "axes" as linear combinations of their X features (presently CCATransformer and CCorATransformer). For these transformers, there is an upper limit on the number of axes that can be used, determined either by matrix rank or by tests of axis significance. The user can specify a hyperparameter (n_components) on these transformers to use fewer than the maximum number of axes, which in turn changes the dimensionality of the space used for nearest-neighbor (NN) finding.

A few relevant changes:

  • CCA and CCorA ordination classes now have a max_components property to ensure consistency, so that either can be used in conjunction with ComponentReducerMixin.
  • CCA has been refactored to combine the coefficients and axis_weights into a single matrix, which has been moved into the projector method (similar to CCorA).
  • Both CCA's and CCorA's projector methods now accept an n_components parameter.
  • Tests have been added to ensure that transformed X data has the correct number of components and that errors are raised if the user-specified value is outside the range [0, max_components].
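As an illustrative usage sketch (assuming CCATransformer and CCorATransformer are exported from sknnr.transformers, and using placeholder X_train/y_train arrays; exact signatures may differ):

from sknnr.transformers import CCATransformer, CCorATransformer

# Keep only the first 3 ordination axes ("components") for NN finding.
cca = CCATransformer(n_components=3).fit(X_train, y_train)
X_reduced = cca.transform(X_train)  # shape: (n_samples, 3)

# n_components=None keeps every available axis, up to the ordination's
# max_components (limited by matrix rank / axis significance tests).
ccora = CCorATransformer(n_components=None).fit(X_train, y_train)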

@grovduck requested a review from aazuspan on June 28, 2023
@grovduck (Member Author)

@aazuspan, I think this is still pretty rough, but I wanted to get some feedback from you in terms of design and specifically the tests. I'll add some specific comments inline with the code, but here are the general points:

  • There is a new mixin (ComponentReducerMixin) that handles the dimension reduction. I followed the naming of sklearn's PCA and use n_components to describe the number of axes to retain. Note that I'm much more used to using the term "axes" instead of "components", but did so to remain consistent. This mixin doesn't do much, but I've introduced it to reduce code duplication for both CCATransformer and CCorATransformer.

  • I've tweaked the CCA and CCorA classes to make them more consistent, specifically adding a max_components property and a projector method that can take n_components. This takes some of the logic out of their respective transformers.

  • The tests are pretty janky. I've got two tests in there to make sure that a transformed X is reduced in dimensionality correctly and that it throws an error when the user-specified value is out of range. But I don't like the parameterization that I have for n_components in either test. For the first, I've used [None, 0, 5] because all of those are valid values for this specific test data. However, because of axis significance tests in CCorA, it already reduces the number of axes from 28 to 5. Do you think we should first retrieve this value (e.g. CCorA(X, y).max_components) in the test before running the test? How would you parameterize such an expression? For the second, I'm just testing a negative value and an absurdly high value, which both fall out of range, but I'm not psyched about the constant values again.

  • I haven't yet created the data to test the estimators with reduced transformers, but I'd like to include those as part of this PR. These would be porting tests again, but I'll need to do this outside of yaImpute.

@aazuspan (Contributor) left a comment:

This is looking great, @grovduck! I left some relatively minor comments, but I think overall this is looking like a really clean design. I think the mixin approach worked great.

I followed the naming of sklearn's PCA and use n_components to describe the number of axes to retain. Note that I'm much more used to using the term "axes" instead of "components", but did so to remain consistent.

I think this was the right call!

I haven't yet created the data to test the estimators with reduced transformers, but I'd like to include those as part of this PR. These would be porting tests again, but I'll need to do this outside of yaImpute.

Great, it makes sense to me to include that in this PR too. I assume that means the ability to specify n_components when instantiating estimators would happen here, too?

I'll spend a little more time thinking about test cases tomorrow and let you know if anything else occurs to me.

(Inline review comments on: tests/test_transformers.py, src/sknnr/transformers/_base.py, src/sknnr/transformers/_cca.py, src/sknnr/transformers/_cca_transformer.py, src/sknnr/transformers/_ccora.py)
- replace estimator attributes cca_ and ccora_ with ordination_
- remove _n_features_out method and call self.n_components_ from
  get_feature_names_out
- remove estimator attribute projector_ and call ordination's projector
  from transform
- grouped CCATransformer and CCorATransformer into TEST_ORDINATION_TRANSFORMERS
- added match clause to pytest raises
@grovduck (Member Author)

As always, a great review catching lots of little errors. Still need to work on estimator tests, but I think I've addressed most of your comments.

@aazuspan (Contributor) left a comment:

The changes look great! I just noted a few things that I missed on my first pass.

(Inline review comments on: src/sknnr/transformers/_cca_transformer.py, src/sknnr/transformers/_cca.py)
@grovduck (Member Author)

Great, it makes sense to me to include that in this PR too. I assume that means the ability to specify n_components when instantiating estimators would happen here, too?

@aazuspan, I've got the bones of this for GNNRegressor ready to go - both test data and the required additions to the estimator. Before I dump a bunch of new (reduced) test files in, I thought we should discuss what makes sense from a testing perspective. I've run GNN with n=8 components (using pynnmap) and have those test files ready:

gnn_moscow_ref_distances_k5_c8.csv
gnn_moscow_ref_neighbors_k5_c8.csv

Do we want to change code to accommodate these new files? Is this a one-off for testing, or will we want to keep these data files around? I like the "reassurance" that we're getting the correct values, but I don't know how long we want to retain all the porting info. Do you have any opinions on this?

@aazuspan (Contributor) commented Jun 29, 2023

Do we want to change code to accommodate these new files? Is this a one-off for testing, or will we want to keep these data files around? I like the "reassurance" that we're getting the correct values, but I don't know how long we want to retain all the porting info. Do you have any opinions on this?

This is a great question. I originally thought the port tests would be temporary, but I agree with you that it's nice to know that everything is staying accurate, as well as functional, so I'm fully in favor of keeping them around in some form or another.

Eventually, I suppose we will reach a point where we're confident everything is working and could theoretically test against previous versions of sknnr, although I'm not sure exactly how that would happen in practice. EDIT: pytest-regressions and syrupy both look like interesting tools that might solve this problem for us in the future by testing against automatically generated results...

For now, I guess I would lean towards integrating n_components into our testing system. Maybe it could just be another argument to tests.datasets.load_moscow_stjoes_results that gets appended (if it's not None) to the file names?
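(A hypothetical sketch of that suggestion, covering only the file-name handling; _result_filename and its arguments are illustrative, not the actual tests.datasets API. It builds names matching the gnn_moscow_ref_distances_k5_c8.csv pattern above.)

from typing import Optional

def _result_filename(method: str, kind: str, k: int = 5, n_components: Optional[int] = None) -> str:
    # Append "_c{n}" only when a reduced-component result set is requested, e.g.
    # _result_filename("gnn", "ref_distances", n_components=3)
    #   -> "gnn_moscow_ref_distances_k5_c3.csv"
    suffix = f"_c{n_components}" if n_components is not None else ""
    return f"{method}_moscow_{kind}_k{k}{suffix}.csv"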

- Using 3 components as that will be testable with both GNN and MSN
- Add n_components optional keyword to load_moscow_stjoes_results
- Test for both kneighbors and predict on GNNRegressor and MSNRegressor
- Pass n_components from estimator to corresponding transformer
@grovduck (Member Author)

For now, I guess I would lean towards integrating n_components into our testing system. Maybe it could just be another argument to tests.datasets.load_moscow_stjoes_results that gets appended (if it's not None) to the file names?

OK, I've got a straw man for testing in there using this suggestion. At present, I'm only testing against a 3-component version of both GNNRegressor and MSNRegressor. The test files for GNNRegressor were created "independently" through pynnmap, so I'm fairly confident that tests are working as intended. For MSNRegressor, I "cheated" by actually running sknnr with reduced components to get the neighbor, distance, and predicted files, so there is no mystery why those tests are passing. I felt a little dirty about doing it this way, but I felt that it was going to take a while to do through yaImpute and I don't have it set up in pynnmap. And it will be good for future tests as well. I'm 99.9% sure that the test files I produced really are the correct answer (famous last words ...)

A couple of other things to note:

  1. GNNRegressor and MSNRegressor are looking way too similar to not be refactored into a common base class (at some point). We could easily abstract out the transformer in the base class and specify these in the derived classes. Probably good to tackle when I do the CCA and CCorA reworking.
  2. In test_port.py, functions test_kneighbors_reduced_components and test_predict_reduced_components, I have n_components = 3 hard-wired in there. Should we parametrize this even if we don't have other n_components to test?

Sorry this PR is turning into such a beast - I appreciate your help!

@grovduck (Member Author)

EDIT: pytest-regressions and syrupy both look like interesting tools that might solve this problem for us in the future by testing against automatically generated results...

Oooo, super cool. I'd want to play around with these packages to fully understand them, but they seem totally applicable for our use. Nice find!

@aazuspan (Contributor) left a comment:

The changes look great - no notes from me! At first glance I was hoping we could parameterize n_components as [None, 3] and run it with all estimators to avoid duplicating those tests, but I'm guessing you probably already thought of that. I poked around a little while and couldn't come up with any good solution (apparently there's no way to silently skip a test in pytest), so I'd say go ahead and merge if you're ready!

For MSNRegressor, I "cheated" by actually running sknnr with reduced components to get the neighbor, distance, and predicted files, so there is no mystery why those tests are passing. I felt a little dirty about doing it this way, but I felt that it was going to take a while to do through yaImpute and I don't have it set up in pynnmap. And it will be good for future tests as well. I'm 99.9% sure that the test files I produced really are the correct answer (famous last words ...)

Good to know! If you're confident in the answers, that's good enough for me :)

GNNRegressor and MSNRegressor are looking way too similar to not be refactored into a common base class (at some point). We could easily abstract out the transformer in the base class and specify these in the derived classes. Probably good to tackle when I do the CCA and CCorA reworking.

I had the same thought, but I'm happy to postpone that for a future PR.

In test_port.py, functions test_kneighbors_reduced_components and test_predict_reduced_components, I have n_components = 3 hard-wired in there. Should we parametrize this even if we don't have other n_components to test?

Nah, I think this is a reasonable approach. If we eventually update the tests to use dynamically generated results with one of those tools we could parameterize over a range of n_components, but this looks good for now!

Oooo, super cool. I'd want to play around with these packages to fully understand them, but they seem totally applicable for our use. Nice find!

Yeah, I'm pretty excited about those! Just glancing through the docs, it sounds like that should simplify our test dataset system a lot.

@aazuspan added the enhancement (New feature or request) and estimator (Related to one or more estimators) labels on Jun 30, 2023
@aazuspan added this to the Core Estimators milestone on Jun 30, 2023
@grovduck (Member Author) commented Jul 3, 2023

At first glance I was hoping we could parameterize n_components as [None, 3] and run it with all estimators to avoid duplicating those tests, but I'm guessing you probably already thought of that. I poked around a little while and couldn't come up with any good solution (apparently there's no way to silently skip a test in pytest), so I'd say go ahead and merge if you're ready!

Again, you give me way too much credit. I hadn't thought of removing that duplication, but it makes sense. You're way ahead of me here, but is this what you were thinking?

@pytest.mark.parametrize(
    "result", ESTIMATOR_RESULTS.items(), ids=ESTIMATOR_RESULTS.keys()
)
@pytest.mark.parametrize("n_components", [None, 3], ids=["full", "reduced"])
def test_kneighbors(result, n_components):
    """Test that the ported estimators identify the correct neighbors and distances."""
    method, estimator = result

    if method not in ORDINATION_ESTIMATOR_RESULTS.keys() and n_components is not None:
        pytest.skip("These methods do not support variable n_components.")
    dataset = load_moscow_stjoes_results(method=method, n_components=n_components)

    hyperparams = (
        {"n_neighbors": 5}
        if method not in ORDINATION_ESTIMATOR_RESULTS.keys()
        else {"n_neighbors": 5, "n_components": n_components}
    )

    est = estimator(**hyperparams).fit(
        dataset.X_train, dataset.y_train
    )

    dist, nn = est.kneighbors(return_dataframe_index=True)
    assert_array_equal(nn, dataset.ref_neighbors)
    assert_array_almost_equal(dist, dataset.ref_distances, decimal=3)

    dist, nn = est.kneighbors(dataset.X_test, return_dataframe_index=True)
    assert_array_equal(nn, dataset.trg_neighbors)
    assert_array_almost_equal(dist, dataset.trg_distances, decimal=3)

It's a little verbose and has too much logic, but I'm happy to change to this if you like this form better. It's not silently skipping (as you say), but is that OK? I read through that thread you sent and it looks like the optimal solution would be to filter out the combinations that don't make sense (e.g. ("raw", 3)) as part of collection rather than in the test itself.

@aazuspan (Contributor) commented Jul 3, 2023

Yes, that's almost exactly the solution I came up with. It feels odd to me to have tests that will always be skipped, but maybe that's not a big deal? It is nice not to duplicate the logic of those two tests.

it looks like the optimal solution would be to filter out the combinations that don't make sense (e.g. ("raw", 3)) as part of collection rather than in the test itself.

Correct me if I'm wrong, but my understanding is that this would require combining the two parameterized args (result and n_components) into a single arg that contains all valid combinations of result and n_components, which seems to me like it would hurt readability and flexibility. Or the other alternative seemed to be monkey patching some new functionality into pytest, which I also don't love.

Do you have any preference? I think I lean towards either leaving it as-is with duplicated tests or conditionally skipping the invalid tests, and I could live with either.

@grovduck (Member Author) commented Jul 3, 2023

Correct me if I'm wrong, but my understanding is that this would require combining the two parameterized args (result and n_components) into a single arg that contains all valid combinations of result and n_components, which seems to me like it would hurt readability and flexibility. Or the other alternative seemed to be monkey patching some new functionality into pytest, which I also don't love

In the thread you sent, I think this post was the closest I saw to a suggested solution. I'm playing around with it a bit - it looks like you use the pytest_collection_modifyitems hook to filter out the combinations you don't want. Would you consider that too magic?

If that's too much, I think I'll probably keep the duplicated tests in there, just because it sounds like the consensus is that skip is meant to denote a test that may pass at *some point*.

@grovduck (Member Author) commented Jul 3, 2023

Here's a worked example, copying most of what was in that post:

conftest.py
------------
import pytest


def pytest_configure(config):
    config.addinivalue_line(
        "markers",
        "uncollect_if(*, func): function to unselect tests from parametrization",
    )


def pytest_collection_modifyitems(config: pytest.Config, items: list[pytest.Item]):
    removed = []
    kept = []
    for item in items:
        m = item.get_closest_marker("uncollect_if")
        if m:
            func = m.kwargs["func"]
            if func(**item.callspec.params):
                removed.append(item)
                continue
        kept.append(item)
    if removed:
        config.hook.pytest_deselected(items=removed)
        items[:] = kept

test_port.py
------------
def uncollect_if(result, n_components):
    method, _ = result
    if n_components is not None and method not in ORDINATION_ESTIMATOR_RESULTS.keys():
        return True


@pytest.mark.uncollect_if(func=uncollect_if)
@pytest.mark.parametrize(
    "result", ESTIMATOR_RESULTS.items(), ids=ESTIMATOR_RESULTS.keys()
)
@pytest.mark.parametrize("n_components", [None, 3], ids=["full", "reduced"])
def test_kneighbors(result, n_components):
    """Test that the ported estimators identify the correct neighbors and distances."""
    method, estimator = result

    dataset = load_moscow_stjoes_results(method=method, n_components=n_components)

    hyperparams = (
        {"n_neighbors": 5}
        if method not in ORDINATION_ESTIMATOR_RESULTS.keys()
        else {"n_neighbors": 5, "n_components": n_components}
    )

    est = estimator(**hyperparams).fit(dataset.X_train, dataset.y_train)

    dist, nn = est.kneighbors(return_dataframe_index=True)
    assert_array_equal(nn, dataset.ref_neighbors)
    assert_array_almost_equal(dist, dataset.ref_distances, decimal=3)

    dist, nn = est.kneighbors(dataset.X_test, return_dataframe_index=True)
    assert_array_equal(nn, dataset.trg_neighbors)
    assert_array_almost_equal(dist, dataset.trg_distances, decimal=3)

with the corresponding output:

PS D:\code\scikit-learn-knn-regression> hatch run test:all tests/test_port.py::test_kneighbors -v
============================================================================= test session starts =============================================================================
platform win32 -- Python 3.10.1, pytest-7.3.1, pluggy-1.0.0 -- C:\Users\gregorma\AppData\Local\hatch\env\virtual\scikit-learn-knn-regression\47ek3DpI\test\Scripts\python.EXE
cachedir: .pytest_cache
rootdir: D:\code\scikit-learn-knn-regression
configfile: pyproject.toml
plugins: cov-4.0.0
collected 10 items / 3 deselected / 7 selected

tests/test_port.py::test_kneighbors[full-raw] PASSED                                                                                                                     [ 14%]
tests/test_port.py::test_kneighbors[full-euclidean] PASSED                                                                                                               [ 28%]
tests/test_port.py::test_kneighbors[full-mahalanobis] PASSED                                                                                                             [ 42%]
tests/test_port.py::test_kneighbors[full-gnn] PASSED                                                                                                                     [ 57%]
tests/test_port.py::test_kneighbors[full-msn] PASSED                                                                                                                     [ 71%]
tests/test_port.py::test_kneighbors[reduced-gnn] PASSED                                                                                                                  [ 85%]
tests/test_port.py::test_kneighbors[reduced-msn] PASSED                                                                                                                  [100%]

======================================================================= 7 passed, 3 deselected in 4.03s ======================================================================= 

@aazuspan (Contributor) commented Jul 3, 2023

Ah, I glossed over that post because I was a bit confused by it, to be honest. I'm not very familiar with the pytest hook system, but it does look like it's probably the cleanest solution, overall. It does feel slightly magic, but it also seems like it's idiomatic for pytest.

If you want to adapt that for our use case, I'm happy with that solution!

EDIT: Didn't notice your last post!

That looks great, thanks for implementing it! I don't understand exactly how the pytest_collection_modifyitems hook works, but I think I get the gist of it. Tests reporting as deselected rather than skipped makes more sense to me, and it's nice that they don't show up in the verbose output like the skipped tests did. I think this is a good strategy, and just have a few code suggestions for you to consider below:

  1. Probably add a docstring to pytest_collection_modifyitems to explain what it's doing and link to the Github post.
  2. In conftest.py, can we move the marker configuration into pyproject.toml? Defining the config programmatically feels a little odd, outside of a plugin approach where it needs to be patched in after-the-fact.
  3. In test_port.py, can we rename uncollect_if to something specific, like estimator_does_not_support_ncomponents (or something shorter)? It would be even better if it could be inlined, but I imagine that would be too long to be easily readable?
  4. In the same function above, what do you think about replacing the method not in ORDINATION_ESTIMATOR_RESULTS.keys() check with not hasattr(method, "n_components")? I think that accomplishes the same thing and would allow us to get rid of ORDINATION_ESTIMATOR_RESULTS, but I don't have a strong feeling so let me know if there's a downside I'm not considering.

- Use new pytest mark to specify how to exclude parameterization
  combinations
- Remove duplicated logic in tests
@grovduck (Member Author) commented Jul 3, 2023

Arrgh, I'm losing steam on this PR ... A couple of blockers/annoyances:

  1. I tried using the new dictionary union operator (|), but it fails on 3.8. Then I tried this syntax instead: d = dict(a=1); d.update({"b": 2}), but now sourcery complains on pre-commit that I should be using the union operator. I think I'm almost ready to ditch sourcery as a pre-commit hook because it doesn't look like you can specify options via pyproject.toml and, because I'm using Python 3.10 in development, it always wants me to incorporate the new features (same thing goes for the walrus operator in one case).

  2. We can filter out the impossible combinations of (method, n_components), but in the test we also need to change the hyperparameters based on the estimator type as well. This means we're writing the same check (i.e. hasattr(estimator(), "n_components")) both in the new estimator_does_not_support_n_components function and in the test itself.

  3. In order to use estimator_does_not_support_n_components in both test_kneighbors and test_predict, the function signature has to accommodate result and n_components from test_kneighbors, but also the weighted parameter from test_predict. For now, the function signature looks like def estimator_does_not_support_n_components(result, n_components, **kwargs), but I'm not sure that's the right approach.

I could use your feedback on these issues. The first is a blocker for actually pushing the code and it might be easier to provide feedback on the second two after I commit/push my new changes.
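(For context on point 1, a minimal illustration, not project code; the variable names are made up. The dict union operator requires Python 3.9+, so a 3.8-compatible merge needs one of the older forms.)

base = {"n_neighbors": 5}
extra = {"n_components": 3}
merged_39 = base | extra       # dict union: Python 3.9+ only
merged_38 = {**base, **extra}  # unpacking form: works on Python 3.8 too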

@aazuspan (Contributor) commented Jul 3, 2023

I tried using the new dictionary union operator (|), but it fails on 3.8. Then I tried this syntax instead: d = dict(a=1); d.update({"b": 2}), but now sourcery complains on pre-commit that I should be using the union operator. I think I'm almost ready to ditch sourcery as a pre-commit hook because it doesn't look like you can specify options via pyproject.toml and, because I'm using Python 3.10 in development, it always wants me to incorporate the new features (same thing goes for the walrus operator in one case).

Wow, dict unions are new to me! I feel like I always miss out on the cool new features by maintaining backwards compatible packages...

Back to the issue at hand, I'm open to dropping sourcery from our pre-commit. I think I've accepted a few corrections from it that I was ambivalent about, but it's also pretty slow to run and the fact that any new contributor would have to sign up for an account to run our checks is a little off-putting. With the lack of configuration and the Python version issues, I'd say it's doing more harm than good.

As far as how we remove it, I'm totally open to you just disabling it in this PR and then cleaning up the rest of the pre-commit stuff in #41. That's probably not strictly the cleanest way to do it, but I think the alternative of doing #41 first would end up blocking this for a while and probably create a headache when you have to merge in all the changes that will stem from running the pyupgrade rules. Unless it feels too janky to you to combine this PR with a change to the pre-commit, that's the lazy approach I would suggest.

As recommended, I'll take a closer look at the other questions once we get to a resolution here!

@grovduck (Member Author) commented Jul 3, 2023

As far as how we remove it, I'm totally open to you just disabling it in this PR and then cleaning up the rest of the pre-commit stuff in #41. That's probably not strictly the cleanest way to do it, but I think the alternative of doing #41 first would end up blocking this for a while and probably create a headache when you have to merge in all the changes that will stem from running the pyupgrade rules. Unless it feels too janky to you to combine this PR with a change to the pre-commit, that's the lazy approach I would suggest.

Excellent, all sourcery code is currently commented out. With #41, I'll leave it to you for the removal (we'll probably want to remove the key as well?). black did some spacing formatting on .pre-commit-config.yaml, so hopefully that jibes with how it formats on your end.

To your earlier code suggestions:

  1. Probably add a docstring to pytest_collection_modifyitems to explain what it's doing and link to the Github post.

I've done this and added a link to the specific comment in the pytest issue.

  2. In conftest.py, can we move the marker configuration into pyproject.toml? Defining the config programmatically feels a little odd, outside of a plugin approach where it needs to be patched in after-the-fact.

Cool, I didn't know you could do this here, but I think I've done it correctly.
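(For reference, a minimal sketch of that marker registration in pyproject.toml, reusing the marker text from the conftest.py example above; the actual entry in the repo may differ.)

[tool.pytest.ini_options]
markers = [
    "uncollect_if(*, func): function to unselect tests from parametrization",
]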

  3. In test_port.py, can we rename uncollect_if to something specific, like estimator_does_not_support_ncomponents (or something shorter)? It would be even better if it could be inlined, but I imagine that would be too long to be easily readable?

I actually liked your naming suggestion for the function, so I kept it. I didn't inline it because we can use it for both tests (I think you meant inlining the function within the test itself, right?).

  4. In the same function above, what do you think about replacing the method not in ORDINATION_ESTIMATOR_RESULTS.keys() check with not hasattr(method, "n_components")? I think that accomplishes the same thing and would allow us to get rid of ORDINATION_ESTIMATOR_RESULTS, but I don't have a strong feeling so let me know if there's a downside I'm not considering.

I've done this as well and now ORDINATION_ESTIMATOR_RESULTS is no longer needed.

Hopefully, the two questions/concerns I raised earlier will now make sense. Thanks for sticking with me on this one - it's turned out to be a bit epic!

@aazuspan (Contributor) left a comment:

This all looks great! I suggested one minor change, but otherwise (knock on wood) I think this should be ready to merge!

I actually liked your naming suggestion for the function, so I kept it. I didn't inline it because we can use it for both tests (I think you meant inlining the function within the test itself, right?).

That's what I meant, but wasn't thinking about the fact it would be duplicated. I agree with your choice.

We can filter out the impossible combinations of (method, n_components), but in the test we also need to change the hyperparameters based on the estimator type as well. This means we're writing the same check (i.e. hasattr(estimator(), "n_components")) both in the new estimator_does_not_support_n_components function and in the test itself.

Yeah, no way to repurpose the estimator_does_not_support_n_components since it takes the weird result argument instead of an estimator. I'm okay with this, and maybe more importantly can't think of any way to improve it!

In order to use estimator_does_not_support_n_components in both test_kneighbors and test_predict, the function signature has to accommodate result and n_components from test_kneighbors, but also the weighted parameter from test_predict. For now, the function signature looks like def estimator_does_not_support_n_components(result, n_components, **kwargs), but I'm not sure that's the right approach.

I see! That function signature is a little ugly, but I think your solution makes the most sense and it's pretty clear that it will just ignore any additional arguments.

Comment on lines 29 to 30:

    if n_components is not None and not hasattr(estimator(), "n_components"):
        return True

@aazuspan (Contributor):

Refactoring nit:

Suggested change:

-    if n_components is not None and not hasattr(estimator(), "n_components"):
-        return True
+    return n_components is not None and not hasattr(estimator(), "n_components")

@grovduck (Member Author):

I told you I was getting tired! 😉

Co-authored-by: Aaron Zuspan <50475791+aazuspan@users.noreply.github.com>
@aazuspan (Contributor) left a comment:

LGTM!

@grovduck merged commit b7eb9a7 into fb_add_estimators on Jul 4, 2023
@grovduck deleted the user_reduced_axes branch on July 4, 2023 00:29
aazuspan added a commit that referenced this pull request Jul 5, 2023
- Remove sourcery from pre-commit per discussion in #40
- Replace flake8 and isort with ruff
- Add new checks via ruff including pyupgrade

This also brings everything into compliance with the new checks:

- Replace nested context managers with single block
- Remove unused param from `test_estimators_support_dataframes`
- Move noqas in docstrings. Ruff wants them placed at the end of
the multiline string, not on the offending line.
- Remove unused noqas.
aazuspan added a commit that referenced this pull request Jul 12, 2023
- Remove sourcery from pre-commit per discussion in #40
- Replace flake8 and isort with ruff
- Add new checks via ruff including pyupgrade
- Bring all code into compliance with new checks
Labels: enhancement (New feature or request), estimator (Related to one or more estimators)
May close: Support use of reduced number of axes in CCorATransformer, CCATransformer
2 participants