Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SciKit-Learn Pipeline not patched with "_predict_risk_score" #324

Closed
Finesim97 opened this issue Nov 26, 2022 · 2 comments · Fixed by #374
Closed

SciKit-Learn Pipeline not patched with "_predict_risk_score" #324

Finesim97 opened this issue Nov 26, 2022 · 2 comments · Fixed by #374

Comments

@Finesim97
Copy link
Contributor

Describe the bug

In my own evaluation code I used the check for '_predict_risk_score' to see, whether models return their predictions on the time scale or risk scale, but this doesn't work, when the estimator is wrapped in a pipeline.

# Insert your code here that produces the bug.
from sklearn.pipeline import Pipeline
from sksurv.linear_model.aft import IPCRidge
from sksurv.datasets import load_veterans_lung_cancer
from sksurv.preprocessing import OneHotEncoder
from sksurv.base import SurvivalAnalysisMixin


data_x, data_y = load_veterans_lung_cancer()


data_x_prep = OneHotEncoder().fit_transform(data_x)
model_direct = IPCRidge().fit(data_x_prep, data_y)


pipe = Pipeline([('encode', OneHotEncoder()),
                 ('model', IPCRidge())])
pipe.fit(data_x, data_y)


# Are equal
print(model_direct.predict(data_x_prep.head()))
print(pipe.predict(data_x.head()))


# Steal super method
# This does not match, because ...
print(SurvivalAnalysisMixin.score(model_direct, data_x_prep, data_y))
print(SurvivalAnalysisMixin.score(pipe, data_x, data_y))


# ... the property is not patched through
# if this returns true, the scores are treated as being on the time scale
print(not getattr(model_direct, "_predict_risk_score", True))
print(not getattr(pipe, "_predict_risk_score", True))


# The second one should also be true!

Expected Results
A Pipeline object should also have the corresponding property set, as this might break evaluation codes.

Actual Results
The property is not available. It should be possible to just add it to the __init__.py, but I am not sure, how well it works together with the @property decorator. Currently I am finishing my master thesis, but I should be able to work out a PR on the 5th of December while testing the behaviour.

Versions
(Not running the newest version cough)

System:
    python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03)  [GCC 9.4.0]
executable: /home/jovyan/master-thesis/env/bin/python
   machine: Linux-5.10.0-15-amd64-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.1.2
          pip: 22.2.2
   setuptools: 65.4.0
        numpy: 1.23.3
        scipy: 1.9.1
       Cython: None
       pandas: 1.5.0
   matplotlib: 3.6.0
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /home/jovyan/master-thesis/env/lib/python3.9/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
    num_threads: 48

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/jovyan/master-thesis/env/lib/libopenblasp-r0.3.21.so
        version: 0.3.21
threading_layer: pthreads
   architecture: Zen
    num_threads: 48

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/jovyan/master-thesis/env/lib/python3.9/site-packages/scipy.libs/libopenblasp-r0-9f9f5dbc.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: Zen
    num_threads: 48
sksurv: 0.18.0
@sebp
Copy link
Owner

sebp commented Dec 8, 2022

I agree that this could be improved by forwarding _predict_risk_score to the final estimator in the pipeline.

@Finesim97
Copy link
Contributor Author

I agree that this could be improved by forwarding _predict_risk_score to the final estimator in the pipeline.

Perfect, should be able to submit my PR today!

sebp pushed a commit that referenced this issue Jun 10, 2023
Adds conditional property _predict_risk_score to pipeline
if the final estimator of the pipeline has such property.

Fixes #324
sebp pushed a commit that referenced this issue Jun 10, 2023
Adds conditional property _predict_risk_score to pipeline
if the final estimator of the pipeline has such property.

Fixes #324
@sebp sebp closed this as completed in #374 Jun 10, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
2 participants