[BUG] Groups parameter is not handed over to inner cv #213

LeSasse · 2022-12-10T08:07:17Z

Describe the bug
A clear and concise description of what the bug is. Include the error message in detail.

When I try to run a GroupKFold CV in both the inner and outer cv, the inner cv does not seem to get the groups parameter.
Does this need to be handed over additionally in the search_params?

To Reproduce
Steps to reproduce the behavior:

I set up julearn on the new julearn_sk_pandas branch by running:

#!/usr/bin/bash

conda create -n example_venv python=3.9.15 seaborn 
eval "$(conda shell.bash hook)"
conda activate example_venv

# check python version as expected
if [ "$(python3 --version)" != "Python 3.9.15" ]; then
  echo "python version $(python3 --version) is not correct, should be 3.9.15"
  exit 1
else
  echo "$(python3 --version)"
fi

# install julearn
git clone https://github.com/juaml/julearn.git
cd julearn
git checkout julearn_sk_pandas
pip install -e .
cd ..

I then ran the following example in julearn:

from sklearn.model_selection import GroupKFold
from sklearn.datasets import make_regression
import pandas as pd

from julearn.api import run_cross_validation
from julearn.utils import configure_logging

configure_logging(level="INFO")

# prepare data
X, y = make_regression(n_features=10, n_samples=200)
groups = sum([[i] * 40 for i in range(1, 6)], [])

data = pd.DataFrame(X)
X_names = data.columns.astype(str).to_list()
data.columns = X_names

data["target"] = y
data["groups"] = groups


# initialise cv objects
cv_outer = GroupKFold(n_splits=10)
cv_inner = GroupKFold(n_splits=5)

# configure inner cv in search params
model_params = {
    "svm__C": [1, 2],
    "search": "grid",
    "search_params": {
        "cv": cv_inner,
        "verbose": 1,
    }
}

# run
scores, final = run_cross_validation(
    X=X_names,
    y="target",
    data=data,
    model="svm",
    model_params=model_params,
    cv=cv_inner,
    seed=200,
    groups="groups",
    problem_type="regression"
)

This results in the following error:

5 fits failed with the following error:
Traceback (most recent call last):
  File "/home/lsasse/miniconda3/envs/example_venv/lib/python3.9/site-packages/sklearn/model_selection/_validation.py", line 686, in _fit_and_score
    estimator.fit(X_train, y_train, **fit_params)
  File "/home/lsasse/miniconda3/envs/example_venv/lib/python3.9/site-packages/sklearn/model_selection/_search.py", line 875, in fit
    self._run_search(evaluate_candidates)
  File "/home/lsasse/miniconda3/envs/example_venv/lib/python3.9/site-packages/sklearn/model_selection/_search.py", line 1389, in _run_search
    evaluate_candidates(ParameterGrid(self.param_grid))
  File "/home/lsasse/miniconda3/envs/example_venv/lib/python3.9/site-packages/sklearn/model_selection/_search.py", line 834, in evaluate_candidates
    for (cand_idx, parameters), (split_idx, (train, test)) in product(
  File "/home/lsasse/miniconda3/envs/example_venv/lib/python3.9/site-packages/sklearn/model_selection/_split.py", line 352, in split
    for train, test in super().split(X, y, groups):
  File "/home/lsasse/miniconda3/envs/example_venv/lib/python3.9/site-packages/sklearn/model_selection/_split.py", line 85, in split
    for test_index in self._iter_test_masks(X, y, groups):
  File "/home/lsasse/miniconda3/envs/example_venv/lib/python3.9/site-packages/sklearn/model_selection/_split.py", line 97, in _iter_test_masks
    for test_index in self._iter_test_indices(X, y, groups):
  File "/home/lsasse/miniconda3/envs/example_venv/lib/python3.9/site-packages/sklearn/model_selection/_split.py", line 530, in _iter_test_indices
    raise ValueError("The 'groups' parameter should not be None.")
ValueError: The 'groups' parameter should not be None.

So my guess is that julearn does not automatically hand over the groups parameter (which makes sense since the inner cv need not necessarily have a groups parameter). Is there a way for me to add the groups parameter in the search_params?

Expected behavior
A clear and concise description of what you expected to happen.

Either I should be able to provide the groups parameter for the inner cv in the search_params, or it should be handed over automatically by julearn after some additional checks.

Screenshots
If applicable, add screenshots to help explain your problem.

System (please complete the following information):

OS: [e.g. macOS / Linux / Windows]
Version [e.g. 22]

Additional context
Add any other context about the problem here.

The text was updated successfully, but these errors were encountered:

LeSasse · 2022-12-13T05:39:43Z

I did previously hard code a solution into my local installation like this in api.py for this specific purpose, by adding the fit_params, but it should likely be done better, since this only works in my specific GroupKFold context

    fit_params = {"groups": df_groups.values}
    scores = cross_validate(pipeline, df_X_conf, y, cv=cv_outer,
                            scoring=scorer, groups=df_groups,
                            return_estimator=cv_return_estimator, fit_params=fit_params)

LeSasse · 2023-02-15T09:41:57Z

it was pointed out to me that #141 is quite similar in nature

@samihamdan

* MODIFY Internals initial * new api examples * MODIFY Internals initial * Make JuTransformer work * Hyper Tuning + new API working Co-authored-by: Sami Hamdan <sami.hamdan@hhu.de> * fix for deslib import * Estimators test OK + no more binary/multiclass * model selection tests ok * scoring tests OK * merge conflict * transformers + utils test ok * Fix CI * codespell * small clean up * fix do_scoring tests * Stacked classifiers * For Sami to fix * Fix rebase * Fix pipeline tests * Improve SetColumnTypes * Improve some Tests on X_types * Make Stacking work with filtering * example ready * Make stares shine as regexes * Adjust warnings and errors +testing for pipeline.py * Make models use apply_to and needed_types * fixed print * remove old returned features from julearn * rename and clean julearn.estimators to julearn.models * remove default apply_to from available transformers * Modify prepare clean up a bit * Adjust confound removal, get rid of pick_columns, modify JuModel, JuTransformer, JuBaseEstiamtor * refactor juestimators DataFrameConfoundRemover add ColumnTypes * Modify examples * Modify/Add important Docstrings * Correct: check consitency imports * Example/stratifiedkfoldcv (#181) * Adjust example stratifiedkfoldcv * Integrade ColumnTypes in pipeline * Oops left print * Refactor Structure to have a base module for ColumnTypes and JuBaseClasses * Move JuColumnTransformer and make setting default of apply to in PipelienCreator possible * Fix tests for old versions * Add preprocess inspection tool * Modify inspect * Modify inspect * get the right dependency for CI * Fix lint * Fix codespell * Fix test api, but Sami broke it :( * Trigger CI * Fix coverage * Trigger CI * Adjust dataclass for python=3.11 * Bring back return train scores * updated example to inspect preprocessing steps (#182) * Add docstrings to column_types * Fix for X_types checks in pipeline * add/update docstrings to DataFrameConfoundRemover (#184) * add/update docstrings to DataFrameConfoundRemover * Make some test green again * Adjust preprocess to remove column_types per default and improve import * Fix tests * Bringing back the prepare testing * Sk pandas examples (#185) * updated multiclass classification example * updated inspect SVM example Co-authored-by: More, Shammi <s.more@fz-juelich.de> Co-authored-by: Sami Hamdan <sami.hamdan@hhu.de> Co-authored-by: Fede Raimondo <f.raimondo@fz-juelich.de> * Adjust target transforming * Fix building doc * disable examples * fix stacked classifier example * Fix linter * update: add default apply_to to pipeline creator * LINTEEER * update: combine pandas examples running * Fix codespell * Basic regress example (#188) * fixed regression example * formatted with black * One more example * One more example * Random forest example * Stratified kfold for regression example * Make Scoring with classification work (#191) * Make Scoring with classificication work * flake * Refactor column types * Flake8 * Fix PipelineCreator and problem type issues * Linter * chore: update `pyproject.toml` (#189) * fix: use text instead of file for license to make PyPI happy * fix: correct format for authors and maintainers in pyproject.toml * chore: use lowercase for keywords in pyproject.toml * chore: make package classifier development status to 5 in pyproject.toml Co-authored-by: Fede Raimondo <federaimondo@gmail.com> * Add example transforming the target variable with zscore (#197) * Add example transforming the target variable with zscore * Correcting script with flake8 * Correct white space in title * Adding = to title * fix syntax * Removing white lines to new syntax * Removing white line for title * Remove useless plot Co-authored-by: Paas Oliveros, Lya Katarina <l.paas.oliveros@fz-juelich.de> Co-authored-by: Fede Raimondo <fraimondo@dc.uba.ar> * update confound removal example (#196) * update confound removal example * change name of file * change name of file * Fix confound removal * Spelling Co-authored-by: Fede <fraimondo@proton.me> * Sk pandas examples (#190) * added pipeline creator * updated multiclass classification example * updated inspect SVM example * updated multiclass classification example * formatted dis_plot_groupcv_inspect_svm.py * formatted plot_cm_acc_multiclass.py Co-authored-by: More, Shammi <s.more@fz-juelich.de> Co-authored-by: Fede Raimondo <federaimondo@gmail.com> * Fix hyperparameter tuning and remove "wrapping" from model and transformer name (#200) * Fix Hyperparameter tuning * Remove Wrapper naming convention Co-authored-by: Sami Hamdan <sami.hamdan@hhu.de> * Example pca featsets (#198) * [not working] example for two PCAs * Fix double PCA issue * [working] multiple PCA steps * Fix remainder in name * Add a name in PipelineCreator.add * [working] multiple PCA steps with preprocess showcase * rename * rename * Codespell * Fix doc * Fix scoring Co-authored-by: Fede <fraimondo@proton.me> * Adjust scoring to work with registered once (#203) * Adjust scoring to work with registered once * Revert back the problem type to not be overwritable * Revert back the problem type to not be overwritable * Grouped cv example (#205) * Added new file for grouped CV. * Created example of Grouped CV. * deleted the old file of Grouped CV and created the new file for that. * Target transformer (#215) * WIP: Target transfomers + pipeline * WIP * damn git ignore * The example works * A todo for @samihamdan * Type hints + docstrings in progress * Base module ready * Typing + docstrings of conftest * Tests for inspect preprocessing + more types * Fix linting * model selection with types and docstrings * Fix linter * More tests with docstrings and typing hints * linter * Models sub-package done * Some more tests + typing + docstrings for the pipeline module * tests + docstrings + types for TargetPipelineCreator * Almost done with tranformers.target, but got tired... * Test for target confound remover done * Some WIP... go sick and I don't remember where I was * Finished testing confounds * JuColumnTransformer tested * More fixes * Updated prepare and tests * Updates to the docstings * Basic API test * I think the API is tested for the moment * All test pass, no warnings * Updated examples * fix linting * Codespell * codespell doesn't like the new verb we invented * ENH: allow to set X_types using regexp * some symmetry in the log * Fix tests * Fix for leo * WIP: model comparison * add corrected t_test * Add dependencies + fix other tets * fix test for user-specified splits * fix test * chore: improve ci-docs.yml * chore: improve ci.yml * chore: improve docs-preview.yml * chore: improve docs.yml * chore: improve pypi.yml * chore: isort * chore: black * chore: flake8 fixes * Retry * fix coverage * Add deslib for testing in tox * increase coverage * test api with apply_to * Cbpm sum (#221) * Minimal version of CPM without weighting but sum as default, only old test adjusted * Flake8 * lint * Resolve set_output bug in cbpm * Fix lint * Julearn sk pandas row select (#220) * Add row_select_col * Add test row_select simple * Adjust some logic * Lint * Adjust row selection to be accessible from PipelineCreator * Adjust row selection via column type not type. * Test and fix change_column_types transformer * fix #213 and #141 * add return confound example * fix groups * fix for [ENH] Validate run_cross_validate parameters #178 * fix for #214 * Inspection (#223) * ADD inspection tools * Add get_fitted_params to PipelineInspector * Add bugfix #219 * Fix preprocess import * adjust pipe * adjust pipe * CV Inpsection WIP * [WIP] Inspection tool (needs testing) * sort the index * Fix tests and ling * Add some test and func for _cv * Add inspection and test, fix predict_log_proba remove unnecessary todo notes * Add tests inspector * Add tests * New docs (#228) * Update docs theme to furo * Add content to index * Add new chapter * modify requirements * Add copybutton extension * Add installation related info to get started * Installation was replaced by getting started * setup structure for chapter 4 * Add messy chapter 3 to push uptodate * Number files based on chapter order * Add basic index for chapter 4 and reference * Change numbering to display in real time * Add chapter 4 to toctree * Comment data.rst to make build possible * Work in progress push for Leo * initial cbpm * some formatting * add introductory paragraph for confound removal and some references * add docs on confound removal * fix typo * capitalise titles * Work in progress push Leo * Work in progress push for Leo * Clean up made changes * data section preliminary version done * available pipeline steps * clean the docs * correct layout error * structure and add references * modify indices * Update docs theme to furo * Add content to index * Add new chapter * modify requirements * Add copybutton extension * Add installation related info to get started * Installation was replaced by getting started * setup structure for chapter 4 * Add messy chapter 3 to push uptodate * Number files based on chapter order * Add basic index for chapter 4 and reference * Change numbering to display in real time * Add chapter 4 to toctree * Comment data.rst to make build possible * Work in progress push for Leo * initial cbpm * some formatting * add introductory paragraph for confound removal and some references * add docs on confound removal * fix typo * capitalise titles * Work in progress push Leo * Work in progress push for Leo * Clean up made changes * data section preliminary version done * available pipeline steps * clean the docs * correct layout error * structure and add references * modify indices * WIP: Docs fixing, rearanging API * restructure * Add description of pipeline * Update examples structure + API reference * work in progress * Some refactoring of the what really need know section * fix CI * remove comments --------- Co-authored-by: Vera Komeyer <v.komeyer@fz-juelich.de> Co-authored-by: LeSasse <l.sasse@fz-juelich.de> * refactor index * Add general CV information * Add target preprocessing and refactor * add reference * Add model evaluation illustration pictures * Add further needed scikitlearn links * Correct typo * Add cross-reference link * Add cv-splitter as potential further selcted topic * Add model evaluation content * Fix doc sectioning and order * Fix TOCTREE * One less numbered element in the TOC * Add/viz (#227) * VIZ WIP * add viz to the dependencies for docs * add viz dependencies to tox * fix linter * delete old files * add example back * delete this file * More doc on VIZ * Add stats to viz * Finish with VIZ docs * Fix linter * Fix tests * Adaptations for proper toctree * Rearrange content to solve toctree * Minor clean-ups * Add visualization information * Fix building * added initial version of stacking documentation for julearn * fix markup in cbpm * some slight improvements to stacked model documentation * Adjust tests and fix bugs for dataframe transformers * fix bug no hyperparameters * fix bug hyperparams with target * fix bug hyperparams with target * ADD tests not fitted error target transformer * ADD test returning inspection tool with differen return_estimator settings * ADD test rais errors pipeline * MODIFY insect to allow for gridsearch * initial start with model inspection * initial version of model_inspect.rst * a few small typo fixes * fix bug in which estimators were not picklable due to local function definition inside a function in base/column_types.py * whitespaces * apparently codespell does not like .sav as a file extension and thinks it should be save. Calling it .joblib instead * [CI]: Use towncrier for news (#222) * update: add towncrier configuration in pyproject.toml * docs: refactor whats_new.rst for towncrier integration * docs: improve contributing.rst and add towncrier instructions * docs: improve maintaining.rst and add towncrier instructions * update: add docs/changes/newsfragments * chore: add 151.bugfix changelog * chore: add 170.enh changelog * update: improve docs/Makefile and add towncrier integration * chore: delete Makefile * update: add towncrier to pyproject.toml * [ENH] Allow list of hyperparameter options for tunning (#47) * [WIP] Multiple hyperparam grids * fix linter * increase coverage * add more tests * More tests for merger * add news fragemtn * Add specific gallery section for doc pages that need to be run. * Docs on hyperparameter tuning * some more updates * FIX predict proba * Update dependencies * Fix CV + update first chapter of docs * Fix tests * codespell * fix test for 3.8 maybe * Fix inspector * fix inpsecto * Fix issues with doc building * Improving some titles * Update intesphinx links * Fix inspection + docs * Docs (almost) ready * Fix codespell * Update example of stratification for regression * Update dependencies * Fix the run_cross_validation docstring * Fix test for stats * MODIFY Target Transformation * MODIFY K-fold for regression * Fix for scikit-learn 1.3.0 * Fix whats'new * Maybe ready for release? * Lint * codespell * Add pearson's R scorer * ADD: folds inspector prediction returns target value + avoid regexp when no patterns in X * Set column types even if input is an array * WIP: Config module * Add tests for new config flags * Update X Warns to 5k * linter * update doc * Typo in DOC --------- Co-authored-by: Sami Hamdan <sami.hamdan@hhu.de> Co-authored-by: Leonard Sasse <l.sasse@fz-juelich.de> Co-authored-by: Shammi270787 <more.shammi@gmail.com> Co-authored-by: More, Shammi <s.more@fz-juelich.de> Co-authored-by: antogeo <antogeo@gmail.com> Co-authored-by: Synchon Mandal <synchon@protonmail.com> Co-authored-by: Lya K. Paas <lya.paas@gmail.com> Co-authored-by: Paas Oliveros, Lya Katarina <l.paas.oliveros@fz-juelich.de> Co-authored-by: Vera Komeyer <75420154+verakye@users.noreply.github.com> Co-authored-by: kaurao <kaustubh.patil@gmail.com> Co-authored-by: Kimia Nazarzadeh <60739692+KNazarzadeh@users.noreply.github.com> Co-authored-by: Vera Komeyer <v.komeyer@fz-juelich.de>

LeSasse added the bug Something isn't working label Dec 10, 2022

fraimondo added a commit that referenced this issue Apr 6, 2023

fix #213 and #141

1a750cb

fraimondo closed this as completed Apr 6, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[BUG] Groups parameter is not handed over to inner cv #213

[BUG] Groups parameter is not handed over to inner cv #213

LeSasse commented Dec 10, 2022

LeSasse commented Dec 13, 2022 •

edited

LeSasse commented Feb 15, 2023

[BUG] Groups parameter is not handed over to inner cv #213

[BUG] Groups parameter is not handed over to inner cv #213

Comments

LeSasse commented Dec 10, 2022

LeSasse commented Dec 13, 2022 • edited

LeSasse commented Feb 15, 2023

LeSasse commented Dec 13, 2022 •

edited