[ENH] Allow list of hyperparameter options for tuning #47

Merged
merged 6 commits into julearn_sk_pandas from add/hyperparam_grids on May 25, 2023

Conversation

@fraimondo (Contributor)

This is from a scikit-learn example:

from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

score = "precision"  # the original example loops over several scoring names

tuned_parameters = [{'kernel': ['rbf'], 'gamma': [1e-3, 1e-4],
                     'C': [1, 10, 100, 1000]},
                    {'kernel': ['linear'], 'C': [1, 10, 100, 1000]}]

clf = GridSearchCV(
    SVC(), tuned_parameters, scoring='%s_macro' % score
)

This example creates two grids: one for the rbf kernel, exploring gamma, and one for the linear kernel, where gamma is not tested because it has no effect.

To cover these options in julearn, we currently need to set the following model_params:

model_params = {
    'svm__C': [1, 10, 100, 1000],
    'svm__kernel': ['rbf', 'linear'],
    'svm__gamma': [1e-3, 1e-4]
}

However, this also tests the different gamma options with the linear kernel, which is useless.

We need to find a way of specifying sets of parameters to be tested.

@fraimondo fraimondo changed the title from "Allow list of hyperparameter options for tunning" to "[ENH] Allow list of hyperparameter options for tunning" on Dec 10, 2020
@fraimondo fraimondo added the enhancement label on Dec 10, 2020
@fraimondo fraimondo added the question, Priority: Low and needs thinking labels on Jul 21, 2022
@fraimondo fraimondo added this to the v0.3.0 milestone Jul 21, 2022
@fraimondo (Contributor, Author)

What about checking if model_params is a list?

@fraimondo fraimondo removed this from the v0.3.0 milestone Apr 6, 2023
@fraimondo (Contributor, Author)

I've been thinking about this issue for quite some time now. My goal is to allow "lists of dicts" as scikit-learn does, but through the current user-friendly API.

The solution I've come up with, so far, is to repeat the name parameter.

For example, the following scikit-learn hyperparameter definition

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

model = Pipeline([("svm", SVC())])
model_params = [
    {
        'svm__C': [1, 10, 100, 1000],
        'svm__kernel': ['linear'],
    },
    {
        'svm__C': [1, 10, 100, 1000],
        'svm__kernel': ['rbf'],
        'svm__gamma': [1e-3, 1e-4]
    },
]

could be created using julearn's PipelineCreator:

creator = PipelineCreator(problem_type="classification")
creator.add("svm", name="svm", C=[1, 10, 100, 1000], kernel="linear")
creator.add("svm", name="svm", C=[1, 10, 100, 1000], kernel="rbf", gamma=[1e-3, 1e-4])

This could also allow tuning different learning algorithms:

creator = PipelineCreator(problem_type="classification")
creator.add("rf", name="model", n_estimators=1000, max_depth=[4, 5, 6, 7])
creator.add("svm", name="model", C=[1, 10, 100, 1000], kernel="linear")

And it could even allow doing so at more than one step at a time:

creator = PipelineCreator(problem_type="classification")
creator.add("zscore", name="preprocess")
creator.add("robustscaler", name="preprocess")
creator.add("rf", name="model", n_estimators=1000, max_depth=[4, 5, 6, 7])
creator.add("svm", name="model", C=[1, 10, 100, 1000], kernel="linear")

This should test the combinations of the 2 different preprocessing steps with the 2 different learning algorithms.

However, one potential downside appears when more than one step has more than one hyperparameter set. For example, let's try two different preprocessing steps, one for the RF and one for the SVM:

from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler, StandardScaler
from sklearn.svm import SVC

model = Pipeline([("preprocess", StandardScaler()), ("model", SVC())])
model_params = [
    {
        'preprocess': [StandardScaler()],
        'model': [SVC()],
        'model__C': [1, 10, 100, 1000],
        'model__kernel': ['linear'],
    },
    {
        'preprocess': [RobustScaler()],
        'model': [RandomForestClassifier()],
        'model__n_estimators': [1000],
        'model__max_depth': [4, 5, 6, 7]
    },
]

Using the proposed method, this is not possible. In this special case, my solution would be (given that it is a completely deterministic approach) to allow picking which elements of the list of dicts we want to try, as part of the search_params parameter.

For example, this:

creator = PipelineCreator(problem_type="classification")
creator.add("zscore", name="preprocess")
creator.add("robustscaler", name="preprocess")
creator.add("rf", name="model", n_estimators=1000, max_depth=[4, 5, 6, 7])
creator.add("svm", name="model", C=[1, 10, 100, 1000], kernel="linear")

would generate:

model = Pipeline([("preprocess", StandardScaler()), ("model", SVC())])
model_params = [
    {
        'preprocess': [StandardScaler()],
        'model': [SVC()],
        'model__C': [1, 10, 100, 1000],
        'model__kernel': ['linear'],
    },
    {
        'preprocess': [RobustScaler()],
        'model': [SVC()],
        'model__C': [1, 10, 100, 1000],
        'model__kernel': ['linear'],
    },
    {
        'preprocess': [StandardScaler()],
        'model': [RandomForestClassifier()],
        'model__n_estimators': [1000],
        'model__max_depth': [4, 5, 6, 7]
    },
    {
        'preprocess': [RobustScaler()],
        'model': [RandomForestClassifier()],
        'model__n_estimators': [1000],
        'model__max_depth': [4, 5, 6, 7]
    },
]

So we could do something like:

creator = PipelineCreator(problem_type="classification")
creator.add("zscore", name="preprocess")
creator.add("robustscaler", name="preprocess")
creator.add("rf", name="model", n_estimators=1000, max_depth=[4, 5, 6, 7])
creator.add("svm", name="model", C=[1, 10, 100, 1000], kernel="linear")

search_params = {"picks": [0, 3], "search": "grid"}

which should create a GridSearchCV but constrain the list of dicts to only the first and last elements of the list.
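
To illustrate (a sketch of the proposal, not implemented behavior), picks=[0, 3] would keep only the following two of the four generated dicts, indexed 0-3 in generation order:

from sklearn.ensemble import RandomForestClassifier
from sklearn.preprocessing import RobustScaler, StandardScaler
from sklearn.svm import SVC

constrained_params = [
    {   # pick 0: zscore preprocessing with the linear SVM
        'preprocess': [StandardScaler()],
        'model': [SVC()],
        'model__C': [1, 10, 100, 1000],
        'model__kernel': ['linear'],
    },
    {   # pick 3: robust scaling with the random forest
        'preprocess': [RobustScaler()],
        'model': [RandomForestClassifier()],
        'model__n_estimators': [1000],
        'model__max_depth': [4, 5, 6, 7],
    },
]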

@samihamdan @LeSasse any thoughts?

@LeSasse (Collaborator) commented May 16, 2023

I quite like this idea. I think that this would cover at least my use cases well enough. Another idea I could imagine (but tell me if that makes sense) is to simply give a list of PipelineCreators to run_cross_validation. Then for each PipelineCreator a separate grid is built.

@fraimondo
Copy link
Contributor Author

> give a list of PipelineCreators to run_cross_validation. Then for each PipelineCreator a separate grid is built.

Something like this?

creator1 = PipelineCreator(problem_type="classification")
creator1.add("zscore", name="preprocess")
creator1.add("svm", name="model", C=[1, 10, 100, 1000], kernel="linear")

creator2 = PipelineCreator(problem_type="classification")
creator2.add("robustscaler", name="preprocess")
creator2.add("rf", name="model", n_estimators=1000, max_depth=[4, 5, 6, 7])

and then model=[creator1, creator2]

I thought about this option too. There is a restriction though: all the pipeline creators must have the same step names in the same order, because in scikit-learn there is only one pipeline; it's the "parameters" that change (including the model). Can you, in scikit-learn, tune whether or not to use a preprocessing step?

Also, while this solution simplifies the "hyperparameters of multiple steps in pairs" issue, it makes simpler cases more complex.

For example, the "SVM RBF with gamma issue" would look like:

creator1 = PipelineCreator(problem_type="classification")
creator1.add("zscore", name="preprocess")
creator1.add("svm", name="svm", C=[1, 10, 100, 1000], kernel="linear")

creator2 = PipelineCreator(problem_type="classification")
creator2.add("zscore", name="preprocess")
creator2.add("svm", name="svm", C=[1, 10, 100, 1000], kernel="rbf", gamma=[1e-3, 1e-4])

And this will be even more complicated if more preprocessing steps are added, as the user will be forced to copy/paste the preprocessing.

But this is just my opinion, maybe there's something I'm not seeing.

@LeSasse (Collaborator) commented May 16, 2023

> tune whether or not to use a preprocessing step?

I guess you can, using the "passthrough" string. But I can certainly see a lot of code replication as an issue when building these types of models.
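
For reference, a minimal scikit-learn sketch of tuning a step on and off via "passthrough" (the names here are illustrative):

from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

pipe = Pipeline([("preprocess", StandardScaler()), ("model", SVC())])
param_grid = {
    # Each candidate either keeps the scaler or skips the step entirely.
    "preprocess": [StandardScaler(), "passthrough"],
    "model__C": [1, 10, 100, 1000],
}
search = GridSearchCV(pipe, param_grid)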

I think your proposed solution is better though, especially since those simple cases will make up most of the use cases.

@fraimondo (Contributor, Author)

I'll try to come up with an implementation to test this.

@samihamdan (Collaborator)

I like the first option but I am not 100% sure about the picks argument.
Maybe I misunderstand it, but what I would do to create maximum flexibility is the following:
How about making it more explicit which dict each step should be added to?
Let's say we add a param called tuning_set_level (tmp name, we will find a better one) to each step. This defines which combinations belong together and makes accidental usage less probable. It can either be 'all' (the default) or a number starting from 0. A step set to 'all' must be the only step with that name; otherwise, the number is the index of the dict the step should be added to.
Your use case would still work. This should also prevent people from forgetting to change the name of a step and accidentally making that step optional.

e.g. this

creator = PipelineCreator(problem_type="classification")
creator.add("zscore", name="preprocess", tuning_set_level=0)
creator.add("robustscaler", name="preprocess",tuning_set_level=1)
creator.add("rf", name="model", n_estimators=1000, max_depth=[4, 5, 6, 7], tuning_set_level=0)
creator.add("svm", name="model", C=[1, 10, 100, 1000], kernel="linear", tuning_set_level=1)

This also means that I can add multiple preprocessing steps for one model but not the other. If a step at one level is not specified, it is 'passthrough'.

e.g. the following will add a pca to the svm but not to the rf:

creator = PipelineCreator(problem_type="classification")
creator.add("zscore", name="preprocess", tuning_set_level=0)
creator.add("robustscaler", name="preprocess",tuning_set_level=1)
creator.add("pca" ,tuning_set_level=1)
creator.add("rf", name="model", n_estimators=1000, max_depth=[4, 5, 6, 7], tuning_set_level=0)
creator.add("svm", name="model", C=[1, 10, 100, 1000], kernel="linear", tuning_set_level=1)

@fraimondo (Contributor, Author) commented May 17, 2023

I see your point. Indeed, "picks" is not the clearest idea, but it still works.

The immediate problem that arises from your approach is that, as stated before, the pipeline steps (names) must be constant. That means that all the steps must be defined for all "tuning levels".

Your first example is fine, but your second is not.

In this case, PCA is only defined for level 1. The default would be, as you say, to set it as "passthrough". For me, this complicates both the internal logic and the creation of the pipeline.

First, because the pipeline will have all of those steps added, several of them set to passthrough, so inspecting models and debugging issues will not be clear/easy for the user. I mean, there is no reason to set the name anymore; your second example could be written as:

creator = PipelineCreator(problem_type="classification")
creator.add("zscore", tuning_set_level=0)
creator.add("robustscaler", tuning_set_level=1)
creator.add("pca", tuning_set_level=1)
creator.add("rf",  n_estimators=1000, max_depth=[4, 5, 6, 7], tuning_set_level=0)
creator.add("svm", C=[1, 10, 100, 1000], kernel="linear", tuning_set_level=1)

Second, because it will not be straightforward to read these 6 lines of code and understand what the model is (think not only of 2 levels, but 3 or more, with some steps present in more than one level).

In this sense, I do prefer @LeSasse's version, in which two or more pipeline creators are passed:

creator1 = PipelineCreator(problem_type="classification")
creator1.add("zscore", name="preprocess")
creator1.add("rf", name="model", n_estimators=1000, max_depth=[4, 5, 6, 7])

creator2 = PipelineCreator(problem_type="classification")
creator2.add("robustscaler", name="preprocess")
creator2.add("pca" )
creator2.add("svm", name="model", C=[1, 10, 100, 1000], kernel="linear")

creator3 = PipelineCreator(problem_type="classification")
creator3.add("robustscaler", name="preprocess")
creator3.add("svm", name="model", C=[1, 10, 100, 1000], kernel="linear")

The internal logic is the same: build one common pipeline with a list of N dictionaries, setting the non-specified steps to "passthrough", where N is the number of levels/creators passed.
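
For the three creators above, the merged result would be something like this sketch (my assumption of the internal representation, not actual julearn output):

from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import RobustScaler, StandardScaler
from sklearn.svm import SVC

model = Pipeline([
    ("preprocess", StandardScaler()),
    ("pca", PCA()),
    ("model", RandomForestClassifier()),
])
model_params = [
    {   # creator1: zscore + RF; pca not specified -> "passthrough"
        'preprocess': [StandardScaler()],
        'pca': ['passthrough'],
        'model': [RandomForestClassifier()],
        'model__n_estimators': [1000],
        'model__max_depth': [4, 5, 6, 7],
    },
    {   # creator2: robustscaler + pca + SVM
        'preprocess': [RobustScaler()],
        'pca': [PCA()],
        'model': [SVC()],
        'model__C': [1, 10, 100, 1000],
        'model__kernel': ['linear'],
    },
    {   # creator3: robustscaler + SVM; pca not specified -> "passthrough"
        'preprocess': [RobustScaler()],
        'pca': ['passthrough'],
        'model': [SVC()],
        'model__C': [1, 10, 100, 1000],
        'model__kernel': ['linear'],
    },
]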

They both still have the "name" problem with the model inspection, as the "joint pipeline" could have as many steps as the sum of the number of steps from all the pipelines.

One alternative could be to not set "passthrough" as default, but ask the user to specify pipelines in which all the named steps match and are defined (explicit "passthrough" instead of implicit).

I'm not married to any implementation, I'm just brainstorming on this issue.

We can easily support both implementations (@LeSasse's and mine). How? Simply by splitting creators with repeated names into N creators and then following @LeSasse's idea. This way it:
A) Allows for simple definition of parameter sets, as in the RBF vs linear kernel issue (my idea).
B) Allows splitting complex definitions into explicit sets (@LeSasse's idea), while keeping code readability and model interpretation easy.

@fraimondo (Contributor, Author)

So I solved the issue:

  1. The model parameter can now be a list of pipeline creators.
  2. When a pipeline creator has the same explicit name on two or more steps, those steps are considered as "for hyperparameter tuning".
  3. A pipeline creator knows how to split itself into several pipeline creators in case one or more steps have repeated names.
  4. There is a merger component that knows how to merge scikit-learn Pipelines and (GridSearchCV or RandomizedSearchCV) objects into one searcher with a list of grids.

So now in the API, all the creators are split. Then each creator is converted into a pipeline. If there is more than one pipeline, they are merged.
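
A minimal usage sketch of the result (assuming the import paths and the X/y/data arguments of run_cross_validation; the data here is synthetic):

import numpy as np
import pandas as pd

from julearn import run_cross_validation
from julearn.pipeline import PipelineCreator

# Synthetic data, only to make the sketch self-contained.
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(50, 2)), columns=["feat1", "feat2"])
df["target"] = rng.integers(0, 2, size=50)

creator1 = PipelineCreator(problem_type="classification")
creator1.add("zscore", name="preprocess")
creator1.add("svm", name="model", C=[1, 10, 100, 1000], kernel="linear")

creator2 = PipelineCreator(problem_type="classification")
creator2.add("robustscaler", name="preprocess")
creator2.add("rf", name="model", n_estimators=1000, max_depth=[4, 5, 6, 7])

# Both creators are converted to pipelines and merged internally into
# one searcher with a list of grids.
scores = run_cross_validation(
    X=["feat1", "feat2"],
    y="target",
    data=df,
    model=[creator1, creator2],
)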

@fraimondo fraimondo removed the question and needs thinking labels on May 24, 2023
@codecov (bot) commented May 24, 2023

Codecov Report

Merging #47 (b6e93b2) into julearn_sk_pandas (19af973) will increase coverage by 0.69%.
The diff coverage is 98.21%.


@@                  Coverage Diff                  @@
##           julearn_sk_pandas      #47      +/-   ##
=====================================================
+ Coverage              92.75%   93.44%   +0.69%     
=====================================================
  Files                     53       55       +2     
  Lines                   2304     2579     +275     
  Branches                 424      474      +50     
=====================================================
+ Hits                    2137     2410     +273     
  Misses                    97       97              
- Partials                  70       72       +2     
Flag Coverage Δ
docs 100.00% <ø> (ø)
julearn 93.44% <98.21%> (+0.69%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files Coverage Δ
julearn/pipeline/merger.py 94.44% <94.44%> (ø)
julearn/pipeline/pipeline_creator.py 91.15% <96.10%> (+1.62%) ⬆️
julearn/api.py 98.42% <100.00%> (+0.19%) ⬆️
julearn/pipeline/test/test_merger.py 100.00% <100.00%> (ø)
julearn/pipeline/test/test_pipeline_creator.py 100.00% <100.00%> (ø)
julearn/prepare.py 97.10% <100.00%> (+0.20%) ⬆️
julearn/transformers/dataframe/set_column_types.py 100.00% <100.00%> (ø)

@samihamdan (Collaborator) left a comment

I like the approach and have nothing big to comment on. As long as coverage is increased and this solves the TODOs, I would approve this.

Review threads (resolved): julearn/api.py, julearn/pipeline/merger.py (×3), julearn/pipeline/pipeline_creator.py
@fraimondo fraimondo changed the title from "[ENH] Allow list of hyperparameter options for tunning" to "[ENH] Allow list of hyperparameter options for tuning" on May 25, 2023
@fraimondo fraimondo merged commit 099435b into julearn_sk_pandas May 25, 2023
20 checks passed
@fraimondo fraimondo deleted the add/hyperparam_grids branch May 25, 2023 14:35
fraimondo added a commit that referenced this pull request Jul 19, 2023
* MODIFY Internals initial

* new api examples

* MODIFY Internals initial

* Make JuTransformer work

* Hyper Tuning + new API working

Co-authored-by: Sami Hamdan <sami.hamdan@hhu.de>

* fix for deslib import

* Estimators test OK + no more binary/multiclass

* model selection tests ok

* scoring tests OK

* merge conflict

* transformers + utils test ok

* Fix CI

* codespell

* small clean up

* fix do_scoring tests

* Stacked classifiers

* For Sami to fix

* Fix rebase

* Fix pipeline tests

* Improve SetColumnTypes

* Improve some Tests on X_types

* Make Stacking work with filtering

* example ready

* Make stares shine as regexes

* Adjust warnings and errors +testing for pipeline.py

* Make models use apply_to and needed_types

* fixed print

* remove old returned features from julearn

* rename and clean julearn.estimators to julearn.models

* remove default apply_to from available transformers

* Modify prepare clean up a bit

* Adjust confound removal, get rid of pick_columns, modify JuModel, JuTransformer, JuBaseEstiamtor

* refactor juestimators DataFrameConfoundRemover add ColumnTypes

* Modify examples

* Modify/Add important Docstrings

* Correct: check consitency imports

* Example/stratifiedkfoldcv (#181)

* Adjust example stratifiedkfoldcv

* Integrade ColumnTypes in pipeline

* Oops left print

* Refactor Structure to have a base module for ColumnTypes and JuBaseClasses

* Move JuColumnTransformer and make setting default of apply to in PipelienCreator possible

* Fix tests for old versions

* Add preprocess inspection tool

* Modify inspect

* Modify inspect

* get the right dependency for CI

* Fix lint

* Fix codespell

* Fix test api, but Sami broke it :(

* Trigger CI

* Fix coverage

* Trigger CI

* Adjust dataclass for python=3.11

* Bring back return train scores

* updated example to inspect preprocessing steps (#182)

* Add docstrings to column_types

* Fix for X_types checks in pipeline

* add/update docstrings to DataFrameConfoundRemover (#184)

* add/update docstrings to DataFrameConfoundRemover

* Make some test green again

* Adjust preprocess to remove column_types per default and improve import

* Fix tests

* Bringing back the prepare testing

* Sk pandas examples (#185)

* updated multiclass classification example

* updated inspect SVM example

Co-authored-by: More, Shammi <s.more@fz-juelich.de>
Co-authored-by: Sami Hamdan <sami.hamdan@hhu.de>
Co-authored-by: Fede Raimondo <f.raimondo@fz-juelich.de>

* Adjust target transforming

* Fix building doc

* disable examples

* fix stacked classifier example

* Fix linter

* update: add default apply_to to pipeline creator

* LINTEEER

* update: combine pandas examples running

* Fix codespell

* Basic regress example (#188)

* fixed regression example

* formatted with black

* One more example

* One more example

* Random forest example

* Stratified kfold for regression example

* Make Scoring with classification work (#191)

* Make Scoring with classificication work

* flake

* Refactor column types

* Flake8

* Fix PipelineCreator and problem type issues

* Linter

* chore: update `pyproject.toml` (#189)

* fix: use text instead of file for license to make PyPI happy

* fix: correct format for authors and maintainers in pyproject.toml

* chore: use lowercase for keywords in pyproject.toml

* chore: make package classifier development status to 5 in pyproject.toml

Co-authored-by: Fede Raimondo <federaimondo@gmail.com>

* Add example transforming the target variable with zscore (#197)

* Add example transforming the target variable with zscore

* Correcting script with flake8

* Correct white space in title

* Adding = to title

* fix syntax

* Removing white lines to new syntax

* Removing white line for title

* Remove useless plot

Co-authored-by: Paas Oliveros, Lya Katarina <l.paas.oliveros@fz-juelich.de>
Co-authored-by: Fede Raimondo <fraimondo@dc.uba.ar>

* update confound removal example (#196)

* update confound removal example

* change name of file

* change name of file

* Fix confound removal

* Spelling

Co-authored-by: Fede <fraimondo@proton.me>

* Sk pandas examples (#190)

* added pipeline creator

* updated multiclass classification example

* updated inspect SVM example

* updated multiclass classification example

* formatted dis_plot_groupcv_inspect_svm.py

* formatted plot_cm_acc_multiclass.py

Co-authored-by: More, Shammi <s.more@fz-juelich.de>
Co-authored-by: Fede Raimondo <federaimondo@gmail.com>

* Fix hyperparameter tuning and remove "wrapping" from model and transformer name (#200)

* Fix Hyperparameter tuning
* Remove Wrapper naming convention

Co-authored-by: Sami Hamdan <sami.hamdan@hhu.de>

* Example pca featsets (#198)

* [not working] example for two PCAs

* Fix double PCA issue

* [working] multiple PCA steps

* Fix remainder in name

* Add a name in PipelineCreator.add

* [working] multiple PCA steps with preprocess showcase

* rename

* rename

* Codespell

* Fix doc

* Fix scoring

Co-authored-by: Fede <fraimondo@proton.me>

* Adjust scoring to work with registered once (#203)

* Adjust scoring to work with registered once

* Revert back the problem type to not be overwritable

* Revert back the problem type to not be overwritable

* Grouped cv example (#205)

* Added new file for grouped CV.

* Created example of Grouped CV.

* deleted the old file of Grouped CV and created the new file for that.

* Target transformer (#215)

* WIP: Target transfomers + pipeline

* WIP

* damn git ignore

* The example works

* A todo for @samihamdan

* Type hints + docstrings in progress

* Base module ready

* Typing + docstrings of conftest

* Tests for inspect preprocessing + more types

* Fix linting

* model selection with types and docstrings

* Fix linter

* More tests with docstrings and typing hints

* linter

* Models sub-package done

* Some more tests + typing + docstrings for the pipeline module

* tests + docstrings + types for TargetPipelineCreator

* Almost done with tranformers.target, but got tired...

* Test for target confound remover done

* Some WIP... go sick and I don't remember where I was

* Finished testing confounds

* JuColumnTransformer tested

* More fixes

* Updated prepare and tests

* Updates to the docstings

* Basic API test

* I think the API is tested for the moment

* All test pass, no warnings

* Updated examples

* fix linting

* Codespell

* codespell doesn't like the new verb we invented

* ENH: allow to set X_types using regexp

* some symmetry in the log

* Fix tests

* Fix for leo

* WIP: model comparison

* add corrected t_test

* Add dependencies + fix other tets

* fix test for user-specified splits

* fix test

* chore: improve ci-docs.yml

* chore: improve ci.yml

* chore: improve docs-preview.yml

* chore: improve docs.yml

* chore: improve pypi.yml

* chore: isort

* chore: black

* chore: flake8 fixes

* Retry

* fix coverage

* Add deslib for testing in tox

* increase coverage

* test api with apply_to

* Cbpm sum (#221)

* Minimal version of CPM without weighting but sum as default, only old test adjusted

* Flake8

* lint

* Resolve set_output bug in cbpm

* Fix lint

* Julearn sk pandas row select (#220)

* Add row_select_col

* Add test row_select simple

* Adjust some logic

* Lint

* Adjust row selection to be accessible from PipelineCreator

* Adjust row selection via column type not type.

* Test and fix change_column_types transformer

* fix #213 and #141

* add return confound example

* fix groups

* fix for [ENH] Validate run_cross_validate parameters #178

* fix for #214

* Inspection (#223)

* ADD inspection tools

* Add get_fitted_params to PipelineInspector

* Add bugfix #219

* Fix preprocess import

* adjust pipe

* adjust pipe

* CV Inpsection WIP

* [WIP] Inspection tool (needs testing)

* sort the index

* Fix tests and ling

* Add some test and func for _cv

* Add inspection and test, fix predict_log_proba remove unnecessary todo notes

* Add tests inspector

* Add tests

* New docs (#228)

* Update docs theme to furo

* Add content to index

* Add new chapter

* modify requirements

* Add copybutton extension

* Add installation related info to get started

* Installation was replaced by getting started

* setup structure for chapter 4

* Add messy chapter 3 to push uptodate

* Number files based on chapter order

* Add basic index for chapter 4 and reference

* Change numbering to display in real time

* Add chapter 4 to toctree

* Comment data.rst to make build possible

* Work in progress push for Leo

* initial cbpm

* some formatting

* add introductory paragraph for confound removal and some references

* add docs on confound removal

* fix typo

* capitalise titles

* Work in progress push Leo

* Work in progress push for Leo

* Clean up made changes

* data section preliminary version done

* available pipeline steps

* clean the docs

* correct layout error

* structure and add references

* modify indices

* WIP: Docs fixing, rearanging API

* restructure

* Add description of pipeline

* Update examples structure + API reference

* work in progress

* Some refactoring of the what really need know section

* fix CI

* remove comments

---------

Co-authored-by: Vera Komeyer <v.komeyer@fz-juelich.de>
Co-authored-by: LeSasse <l.sasse@fz-juelich.de>

* refactor index

* Add general CV information

* Add target preprocessing and refactor

* add reference

* Add model evaluation illustration pictures

* Add further needed scikitlearn links

* Correct typo

* Add cross-reference link

* Add cv-splitter as potential further selcted topic

* Add model evaluation content

* Fix doc sectioning and order

* Fix TOCTREE

* One less numbered element in the TOC

* Add/viz (#227)

* VIZ WIP

* add viz to the dependencies for docs

* add viz dependencies to tox

* fix linter

* delete old files

* add example back

* delete this file

* More doc on VIZ

* Add stats to viz

* Finish with VIZ docs

* Fix linter

* Fix tests

* Adaptations for proper toctree

* Rearrange content to solve toctree

* Minor clean-ups

* Add visualization information

* Fix building

* added initial version of stacking documentation for julearn

* fix markup in cbpm

* some slight improvements to stacked model documentation

* Adjust tests and fix bugs for dataframe transformers

* fix bug no hyperparameters

* fix bug hyperparams with target

* fix bug hyperparams with target

* ADD tests not fitted error target transformer

* ADD test returning inspection tool with differen return_estimator settings

* ADD test rais errors pipeline

* MODIFY insect to allow for gridsearch

* initial start with model inspection

* initial version of model_inspect.rst

* a few small typo fixes

* fix bug in which estimators were not picklable due to local function definition inside a function in base/column_types.py

* whitespaces

* apparently codespell does not like .sav as a file extension and thinks it should be save. Calling it .joblib instead

* [CI]: Use towncrier for news (#222)

* update: add towncrier configuration in pyproject.toml

* docs: refactor whats_new.rst for towncrier integration

* docs: improve contributing.rst and add towncrier instructions

* docs: improve maintaining.rst and add towncrier instructions

* update: add docs/changes/newsfragments

* chore: add 151.bugfix changelog

* chore: add 170.enh changelog

* update: improve docs/Makefile and add towncrier integration

* chore: delete Makefile

* update: add towncrier to pyproject.toml

* [ENH] Allow list of hyperparameter options for tunning (#47)

* [WIP] Multiple hyperparam grids

* fix linter

* increase coverage

* add more tests

* More tests for merger

* add news fragemtn

* Add specific gallery section for doc pages that need to be run.

* Docs on hyperparameter tuning

* some more updates

* FIX predict proba

* Update dependencies

* Fix CV + update first chapter of docs

* Fix tests

* codespell

* fix test for 3.8 maybe

* Fix inspector

* fix inpsecto

* Fix issues with doc building

* Improving some titles

* Update intesphinx links

* Fix inspection + docs

* Docs (almost) ready

* Fix codespell

* Update example of stratification for regression

* Update dependencies

* Fix the run_cross_validation docstring

* Fix test for stats

* MODIFY Target Transformation

* MODIFY K-fold for regression

* Fix for scikit-learn 1.3.0

* Fix whats'new

* Maybe ready for release?

* Lint

* codespell

* Add pearson's R scorer

* ADD: folds inspector prediction returns target value + avoid regexp when no patterns in X

* Set column types even if input is an array

* WIP: Config module

* Add tests for new config flags

* Update X Warns to 5k

* linter

* update doc

* Typo in DOC

---------

Co-authored-by: Sami Hamdan <sami.hamdan@hhu.de>
Co-authored-by: Leonard Sasse <l.sasse@fz-juelich.de>
Co-authored-by: Shammi270787 <more.shammi@gmail.com>
Co-authored-by: More, Shammi <s.more@fz-juelich.de>
Co-authored-by: antogeo <antogeo@gmail.com>
Co-authored-by: Synchon Mandal <synchon@protonmail.com>
Co-authored-by: Lya K. Paas <lya.paas@gmail.com>
Co-authored-by: Paas Oliveros, Lya Katarina <l.paas.oliveros@fz-juelich.de>
Co-authored-by: Vera Komeyer <75420154+verakye@users.noreply.github.com>
Co-authored-by: kaurao <kaustubh.patil@gmail.com>
Co-authored-by: Kimia Nazarzadeh <60739692+KNazarzadeh@users.noreply.github.com>
Co-authored-by: Vera Komeyer <v.komeyer@fz-juelich.de>
Labels
enhancement, Priority: Low

Projects
None yet

Development
Successfully merging this pull request may close these issues: none yet.

3 participants