SciKit-Learn Pipeline not patched with "_predict_risk_score" #324

Finesim97 · 2022-11-26T09:50:25Z

Describe the bug

In my own evaluation code I used the check for '_predict_risk_score' to see, whether models return their predictions on the time scale or risk scale, but this doesn't work, when the estimator is wrapped in a pipeline.

# Insert your code here that produces the bug.
from sklearn.pipeline import Pipeline
from sksurv.linear_model.aft import IPCRidge
from sksurv.datasets import load_veterans_lung_cancer
from sksurv.preprocessing import OneHotEncoder
from sksurv.base import SurvivalAnalysisMixin


data_x, data_y = load_veterans_lung_cancer()


data_x_prep = OneHotEncoder().fit_transform(data_x)
model_direct = IPCRidge().fit(data_x_prep, data_y)


pipe = Pipeline([('encode', OneHotEncoder()),
                 ('model', IPCRidge())])
pipe.fit(data_x, data_y)


# Are equal
print(model_direct.predict(data_x_prep.head()))
print(pipe.predict(data_x.head()))


# Steal super method
# This does not match, because ...
print(SurvivalAnalysisMixin.score(model_direct, data_x_prep, data_y))
print(SurvivalAnalysisMixin.score(pipe, data_x, data_y))


# ... the property is not patched through
# if this returns true, the scores are treated as being on the time scale
print(not getattr(model_direct, "_predict_risk_score", True))
print(not getattr(pipe, "_predict_risk_score", True))


# The second one should also be true!

Expected Results
A Pipeline object should also have the corresponding property set, as this might break evaluation codes.

Actual Results
The property is not available. It should be possible to just add it to the __init__.py, but I am not sure, how well it works together with the @property decorator. Currently I am finishing my master thesis, but I should be able to work out a PR on the 5th of December while testing the behaviour.

Versions
(Not running the newest version cough)

System:
    python: 3.9.9 | packaged by conda-forge | (main, Dec 20 2021, 02:41:03)  [GCC 9.4.0]
executable: /home/jovyan/master-thesis/env/bin/python
   machine: Linux-5.10.0-15-amd64-x86_64-with-glibc2.35

Python dependencies:
      sklearn: 1.1.2
          pip: 22.2.2
   setuptools: 65.4.0
        numpy: 1.23.3
        scipy: 1.9.1
       Cython: None
       pandas: 1.5.0
   matplotlib: 3.6.0
       joblib: 1.2.0
threadpoolctl: 3.1.0

Built with OpenMP: True

threadpoolctl info:
       user_api: openmp
   internal_api: openmp
         prefix: libgomp
       filepath: /home/jovyan/master-thesis/env/lib/python3.9/site-packages/scikit_learn.libs/libgomp-a34b3233.so.1.0.0
        version: None
    num_threads: 48

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/jovyan/master-thesis/env/lib/libopenblasp-r0.3.21.so
        version: 0.3.21
threading_layer: pthreads
   architecture: Zen
    num_threads: 48

       user_api: blas
   internal_api: openblas
         prefix: libopenblas
       filepath: /home/jovyan/master-thesis/env/lib/python3.9/site-packages/scipy.libs/libopenblasp-r0-9f9f5dbc.3.18.so
        version: 0.3.18
threading_layer: pthreads
   architecture: Zen
    num_threads: 48
sksurv: 0.18.0

The text was updated successfully, but these errors were encountered:

sebp · 2022-12-08T21:33:04Z

I agree that this could be improved by forwarding _predict_risk_score to the final estimator in the pipeline.

Finesim97 · 2022-12-09T10:12:00Z

I agree that this could be improved by forwarding _predict_risk_score to the final estimator in the pipeline.

Perfect, should be able to submit my PR today!

Adds conditional property _predict_risk_score to pipeline if the final estimator of the pipeline has such property. Fixes #324

sebp added the enhancement label Dec 8, 2022

Finesim97 mentioned this issue Dec 9, 2022

Added conditional property to expose time scale predictions #327

Closed

5 tasks

sebp pushed a commit that referenced this issue Jun 10, 2023

Expose time scale of prediction models in pipeline

6028d43

Adds conditional property _predict_risk_score to pipeline if the final estimator of the pipeline has such property. Fixes #324

sebp mentioned this issue Jun 10, 2023

Expose _predict_risk_score in Pipeline #374

Merged

5 tasks

sebp pushed a commit that referenced this issue Jun 10, 2023

Expose time scale of prediction models in pipeline

4cf0981

Adds conditional property _predict_risk_score to pipeline if the final estimator of the pipeline has such property. Fixes #324

sebp closed this as completed in #374 Jun 10, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

SciKit-Learn Pipeline not patched with "_predict_risk_score" #324

SciKit-Learn Pipeline not patched with "_predict_risk_score" #324

Finesim97 commented Nov 26, 2022

sebp commented Dec 8, 2022

Finesim97 commented Dec 9, 2022

SciKit-Learn Pipeline not patched with "_predict_risk_score" #324

SciKit-Learn Pipeline not patched with "_predict_risk_score" #324

Comments

Finesim97 commented Nov 26, 2022

sebp commented Dec 8, 2022

Finesim97 commented Dec 9, 2022