Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use paradox for Filter settings #14

Closed
mb706 opened this issue Feb 12, 2019 · 13 comments
Closed

Use paradox for Filter settings #14

mb706 opened this issue Feb 12, 2019 · 13 comments

Comments

@mb706
Copy link
Contributor

mb706 commented Feb 12, 2019

If Filter objects had an associated ParamSet (and used its $values slot etc.), then the parameter values could more easily be tuned over. Having a ParamSet with good type and range information would also be informative for the user.

@mllg
Copy link
Sponsor Member

mllg commented Apr 18, 2019

I started this in #16, but there is still a lot missing:

  • most hyperparams are not encoded as parameters
  • we have three generic functions to filter via abs, perc or thresh. We could write a PipeOp which exposes suitable hyperparameters for these functions. Or is there a more concise way for tuning?

@mllg
Copy link
Sponsor Member

mllg commented Apr 25, 2019

All filters now have a ParamSet. The hyperparameters abs, perc or thresh and not included there (yet).

@pat-s
Copy link
Member

pat-s commented Jun 21, 2019

Moving this to the top in the prio list.

The ParamSet of each filter should contain the hyperpars (mutually exclusive)

  • abs
  • perc
  • thresh

and any possible arguments that can be passed down to the filter.

In the end, it should be possible to specify the ParamSet during construction so that mlr3pipelines can make use of all information with only one line (the filter construction) and call whatever method afterwards using the settings from the ParamSet.

@mb706 Is this correct or did I overlook something?

@mb706
Copy link
Contributor Author

mb706 commented Jun 24, 2019

It is probably good if the FilterResult class is useful both inside and outside of a PipeOp, but I don't think it it should contain stuff that is exclusively for pipeops: I would just put that code inside PipeOpFilter. The exception is metadata like the ParamSet of parameters specific to individual methods. The options as I see them are

  1. when a user uses FilterResult outside of a PipeOp, he would never use selection parameters (e.g. proportion of feats to filter) from a ParamSet and instead call $filter_perc(). Then mlr3pipelines should add these parameters inside the PipeOpFilter itself. This is the current behaviour and I think it makes sense.
  2. when a user uses FilterResult outside a PipeOp, he sometimes wants to have a function that works according to a selection parameter in a ParamSet. I'm not sure that is the case, because the paramsets are mostly used for tuning, which I imagine forces you to use pipelines, but maybe I am overlooking something. In this case the parameters could be added by FilterResult.
  3. some selection parameters are individual to a method; e.g. some methods offer different selection criteria than others, or some methods change their behaviour depending on which of the selection criteria are used. Then the parameters could be part of the individual FilterResult subclasses, but in that case I would probably also take a closer look and see if there are other questions this raises.

@pat-s
Copy link
Member

pat-s commented Jun 24, 2019

1 + 2:

I think we need to see this in practice. I would not say that users will never use the ParamSet(n_feat = 0.7) -> filter_perc() way and only go for filter_perc(0.7). But most likely, yes.

Maybe this is too much meta talk and we should first implement all of it and then get a feeling for everything.

I will first add a ParamSet focusing on manual execution of the filter by specifying the underlying filter options and then go ahead adding the process-based hyperpars (n.feat, n.perc).

@pat-s
Copy link
Member

pat-s commented Jun 24, 2019

Blocked until I know how to condition Param Limits on tasks and specify params without default.

mlr-org/paradox#234

@mb706
Copy link
Contributor Author

mb706 commented Jun 25, 2019

@pat-s
Copy link
Member

pat-s commented Jun 29, 2019

@mb706 I documented the first ParamSet locally just now.
I see that you are using the names "nfeat, frac and cutoff".
In mlr the names have been "abs, perc and thresh". Was there a reason you selected the new ones?
I am not attached to the old ones, we should just stay consistent across packages.

These generic hyperpars will be defined in the ParamSet next to the filter specific hyperpars.

filter$param_set
ParamSet: 
        id    class lower upper levels     default value
1:       k ParamInt     1   Inf                  3      
2: threads ParamInt     0   Inf                  0      
3:     abs ParamInt     1   Inf        <NoDefault>      
4:    perc ParamDbl     0     1        <NoDefault>      
5:  thresh ParamDbl  -Inf   Inf        <NoDefault>   

Setting them during construction via param_vals gives the opportunity of calling filter_*() without the second argument -> it will be queried from the ParamSet and the function will error if its missing.
Does that simplify things for mlr3pipelines?

@mb706
Copy link
Contributor Author

mb706 commented Jul 1, 2019

I don't like the old names. "perc" implies a value that goes from 0 to 100 instead of 0 to 1; "abs" sounds like we are talking about the absolute value (abs()) of something; I guess I'm least opinionated about "cutoff" vs. "thresh".

@mb706
Copy link
Contributor Author

mb706 commented Jul 1, 2019

Setting them during construction via param_vals gives the opportunity of calling filter_*() without the second argument

What behaviour is this giving if one sets abs to something and then calls filter_perc? (What if both abs and perc are given?)

@pat-s
Copy link
Member

pat-s commented Jul 1, 2019

What behaviour is this giving if one sets abs to something and then calls filter_perc?

filter_perc() and abs do not play together so it will just be ignored.
If you mean what will happen if abs is set during construction and then manually supplied to filter_abs() -> issue a warning and take manually supplied abs.

(What if both abs and perc are given?)

These are only used if their specific member fun is used (filter_perc() or filter_abs()), so they should not collide?

I don't like the old names. "perc" implies a value that goes from 0 to 100 instead of 0 to 1; "abs" sounds like we are talking about the absolute value (abs()) of something; I guess I'm least opinionated about "cutoff" vs. "thresh".

Ok, let's change them!

@berndbischl
Copy link
Sponsor Member

We agreed to change the names as the old ones were suboptimal

@pat-s
Copy link
Member

pat-s commented Jul 3, 2019

Implemented in 363e96a.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

4 participants