
[BUG]: Target transformation with reversible transformers leads to faulty scoring #236

Closed
samihamdan opened this issue Sep 7, 2023 · 2 comments
Labels
bug Something isn't working

Comments

@samihamdan (Collaborator)

Is there an existing issue for this?

  • I have searched the existing issues

Current Behavior

Using z-scoring on the target leads to wrong scores: most likely, we evaluate the (correctly) inverse-transformed predictions against a z-scored ground truth. You can see this because r2_corr looks fine while r2 shows a large error, as r2 is scale-sensitive.
See the screenshot under Steps To Reproduce.
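
To illustrate the symptom, here is a minimal sketch with synthetic data (not julearn code): scoring original-scale predictions against a z-scored ground truth wrecks R², while the Pearson correlation is unaffected because it is scale-invariant.

```python
# Hypothetical illustration with synthetic data (not julearn code):
# scoring original-scale predictions against a z-scored ground truth.
import numpy as np
from scipy.stats import pearsonr
from sklearn.metrics import r2_score

rng = np.random.default_rng(42)
y_true = rng.normal(loc=50.0, scale=10.0, size=200)   # original scale
y_pred = y_true + rng.normal(scale=2.0, size=200)     # good predictions

# The mismatch: ground truth z-scored, predictions inverse-transformed.
y_true_z = (y_true - y_true.mean()) / y_true.std()

print(r2_score(y_true, y_pred))       # close to 1: correct pairing
print(r2_score(y_true_z, y_pred))     # hugely negative: scale mismatch
print(pearsonr(y_true_z, y_pred)[0])  # equals pearsonr(y_true, y_pred)
```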

Expected Behavior

When the target transformer is invertible, scoring should be done against the original ground truth.
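
For comparison, a minimal sketch of how scikit-learn's own TransformedTargetRegressor behaves (synthetic data): predict() inverse-transforms the predictions, so any scorer compares them against the original-scale ground truth.

```python
# Minimal sketch: scikit-learn's TransformedTargetRegressor inverse-transforms
# predictions, so scoring happens in the original target space.
import numpy as np
from sklearn.compose import TransformedTargetRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import r2_score
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 50.0 + X @ np.array([10.0, -5.0, 2.0]) + rng.normal(size=200)

model = TransformedTargetRegressor(regressor=Ridge(),
                                   transformer=StandardScaler())
model.fit(X, y)
y_pred = model.predict(X)   # already back in the original scale of y
print(r2_score(y, y_pred))  # scored against the original ground truth
```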

Steps To Reproduce

[screenshot: see the original issue]

Environment

anyio==4.0.0
argon2-cffi==23.1.0
argon2-cffi-bindings==21.2.0
arrow==1.2.3
asttokens==2.4.0
async-lru==2.0.4
attrs==23.1.0
Babel==2.12.1
backcall==0.2.0
beautifulsoup4==4.12.2
bleach==6.0.0
certifi==2023.7.22
cffi==1.15.1
charset-normalizer==3.2.0
comm==0.1.4
contourpy==1.1.0
cycler==0.11.0
debugpy==1.6.7.post1
decorator==5.1.1
defusedxml==0.7.1
executing==1.2.0
fastjsonschema==2.18.0
fonttools==4.42.1
fqdn==1.5.1
idna==3.4
ipykernel==6.25.2
ipython==8.15.0
ipython-genutils==0.2.0
ipywidgets==8.1.0
isoduration==20.11.0
jedi==0.19.0
Jinja2==3.1.2
joblib==1.3.2
json5==0.9.14
jsonpointer==2.4
jsonschema==4.19.0
jsonschema-specifications==2023.7.1
julearn==0.3.0
jupyter==1.0.0
jupyter-console==6.6.3
jupyter-events==0.7.0
jupyter-lsp==2.2.0
jupyter_client==8.3.1
jupyter_core==5.3.1
jupyter_server==2.7.3
jupyter_server_terminals==0.4.4
jupyterlab==4.0.5
jupyterlab-pygments==0.2.2
jupyterlab-widgets==3.0.8
jupyterlab_server==2.24.0
kiwisolver==1.4.5
MarkupSafe==2.1.3
matplotlib==3.7.2
matplotlib-inline==0.1.6
mistune==3.0.1
nbclient==0.8.0
nbconvert==7.8.0
nbformat==5.9.2
nest-asyncio==1.5.7
notebook==7.0.3
notebook_shim==0.2.3
numpy==1.25.2
overrides==7.4.0
packaging==23.1
pandas==2.0.3
pandocfilters==1.5.0
parso==0.8.3
patsy==0.5.3
pexpect==4.8.0
pickleshare==0.7.5
Pillow==10.0.0
platformdirs==3.10.0
prometheus-client==0.17.1
prompt-toolkit==3.0.39
psutil==5.9.5
ptyprocess==0.7.0
pure-eval==0.2.2
pycparser==2.21
Pygments==2.16.1
pyparsing==3.0.9
python-dateutil==2.8.2
python-json-logger==2.0.7
pytz==2023.3.post1
PyYAML==6.0.1
pyzmq==25.1.1
qtconsole==5.4.4
QtPy==2.4.0
referencing==0.30.2
requests==2.31.0
rfc3339-validator==0.1.4
rfc3986-validator==0.1.1
rpds-py==0.10.2
scikit-learn==1.3.0
scipy==1.11.2
seaborn==0.12.2
Send2Trash==1.8.2
six==1.16.0
sniffio==1.3.0
soupsieve==2.5
stack-data==0.6.2
statsmodels==0.14.0
terminado==0.17.1
threadpoolctl==3.2.0
tinycss2==1.2.1
tornado==6.3.3
traitlets==5.9.0
tzdata==2023.3
uri-template==1.3.0
urllib3==2.0.4
wcwidth==0.2.6
webcolors==1.13
webencodings==0.5.1
websocket-client==1.6.2
widgetsnbextension==4.0.8

Relevant log output

No response

Anything else?

No response

samihamdan added the bug (Something isn't working) label on Sep 7, 2023
@fraimondo (Contributor)

Here's where the Extended Scorer transforms the y (always):

```python
y_true = (
    estimator
    .steps[-1][-1]  # last est
    .transform_target(X_trans, y)
)
```

This is where the scorers are "wrapped" only if the extend parameter is true:

```python
def _extend_scorer(scorer, extend):
    if extend:
        return _ExtendedScorer(scorer)
    return scorer
```

This is where check_scoring passes the wrap_score parameter as the extend argument to _extend_scorer:

```python
def check_scoring(
    estimator: EstimatorLike,
    scoring: Union[ScorerLike, str, Callable, List[str], None],
    wrap_score: bool
) -> Union[None, ScorerLike, Callable, Dict[str, ScorerLike]]:
    """Check the scoring.

    Parameters
    ----------
    estimator : EstimatorLike
        estimator to check the scoring for
    scoring : Union[ScorerLike, str, Callable]
        scoring to check
    wrap_score : bool
        Does the score need to be wrapped
        to handle non-inverse-transformable target pipelines.
    """
    if scoring is None:
        return scoring
    if isinstance(scoring, str):
        scoring = _extend_scorer(get_scorer(scoring), wrap_score)
    if callable(scoring):
        return _extend_scorer(
            sklearn_check_scoring(estimator, scoring=scoring),
            wrap_score,
        )
    if isinstance(scoring, list):
        scorer_names = typing.cast(List[str], scoring)
        scoring_dict = {
            score: _extend_scorer(get_scorer(score), wrap_score)
            for score in scorer_names
        }
        return _check_multimetric_scoring(  # type: ignore
            estimator, scoring_dict
        )
```
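
For context, a hypothetical call using only the names quoted above: with wrap_score=True, every scorer resolved here comes back wrapped in _ExtendedScorer, whether or not the target transformer is invertible.

```python
# Hypothetical usage of the quoted API: both scorers end up wrapped in
# _ExtendedScorer whenever wrap_score is True.
scorer = check_scoring(pipeline, "r2", wrap_score=True)
scorers = check_scoring(pipeline, ["r2", "r2_corr"], wrap_score=True)
```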

This is where check_scoring is called in run_cross_validation:

julearn/julearn/api.py, lines 348 to 350 at dba3071:

```python
scoring = check_scoring(pipeline, scoring,
                        wrap_score=wrap_score)
```

Here are the two lines that set wrap_score to True, based on the presence of a target transformer:

```python
wrap_score = expanded_models[-1]._added_target_transformer
```

```python
wrap_score = pipeline_creator._added_target_transformer
```

So we always use the extended scorer, even if the y transformer is reversible. And in this specific case, scikit-learn inverse-transforms y_pred back into the original space while julearn transforms y_true into the transformed space, comparing bananas with potatoes.
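
One possible direction for a fix, purely as a sketch (the helper name `can_inverse_transform` is made up for illustration and is not julearn's actual API): only move y_true into the transformed space when the target transformer is not invertible, since in the invertible case scikit-learn already brings y_pred back to the original space.

```python
# Sketch only: `can_inverse_transform` is a hypothetical helper.
last_step = estimator.steps[-1][-1]  # last est, as in the snippet above
if last_step.can_inverse_transform():
    # Reversible target transformer: scikit-learn inverse-transforms
    # y_pred, so score against the original ground truth.
    y_true = y
else:
    # One-way transformer: both sides must live in the transformed space.
    y_true = last_step.transform_target(X_trans, y)
```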

@harveybi commented Oct 4, 2023

I also want to report something I observed earlier: although I get wrongly scaled metrics when z-scoring the target, the Pearson correlation values are the same with and without z-scoring the target. Is that expected? When I z-score the target myself, the metrics always come out different. See also this example: https://chat.openai.com/share/f625997a-eb50-40af-9cbb-89d450cdb364
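
(Note: the matching correlation is expected. Pearson's r is invariant under positive affine transformations, and z-scoring is one: r(a*y + b, y_pred) = r(y, y_pred) for any a > 0. Scale-sensitive metrics such as R² or MSE do change when only one side is rescaled, which is exactly the symptom reported above.)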
