[MRG] Cleanup skopt/parameter.py with docs and some minor changes #75
Conversation
@AlexanderFabisch @fabianp Since you have used the library to an extent, we would like to hear your thoughts on this. The idea is to support function calls like this at some point.
That's a good idea. Here are some suggestions:
However, my main focus at the moment is the optimization of real vectors, so I don't really care about this use case.
By the way, is it possible to set a seed in scipy.stats distributions? It would be much better if you could simply pass a seed for the random number generator in the function call, like in sklearn.
I would vote to keep it. To me it seems worth having the prior and the transform separate. The prior represents where you think likely good values are, and the transform takes care of turning the values into something that is "nice" for the optimiser to handle (one-hot encoding, similar ranges).
Hi, thanks for the heads up. In general I'm skeptical about the need for such a framework (but I could be wrong). In the use cases I'm familiar with, you never need more than to set bounds and/or optionally transform to log space, so this might be overkill. I know RandomizedSearchCV takes a distribution argument, but I've never used it for anything besides uniform sampling (sometimes in log space).
Thanks a lot for your feedback. @fabianp By skeptical, are you skeptical about the use of priors or about the support for categorical parameters? A common use case would be the type of scaling used or the kernel in an SVM (I remember the paper on randomized search had something along those lines). I also think it would be okay to support priors via arguments later. @AlexanderFabisch Yes, we would pass a random seed to the function call, which would then be passed internally to the rvs method for sampling (as is done currently at https://github.com/scikit-optimize/scikit-optimize/blob/master/skopt/gp_opt.py#L121). Did you mean this or something else?
@betatim Please review and, if you are happy, do merge. I can address the remaining items in follow-up PRs.
That is what I meant. |
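For reference, scipy.stats does support this: the rvs method of a (frozen) distribution accepts a random_state argument, which is presumably what would be threaded through here. A minimal sketch:

import numpy as np
from scipy.stats import uniform

# A frozen distribution over [0, 1); rvs accepts random_state
# (an int seed or a numpy RandomState), much like in sklearn.
dist = uniform(loc=0.0, scale=1.0)

a = dist.rvs(size=5, random_state=0)
b = dist.rvs(size=5, random_state=np.random.RandomState(0))

# The same seed reproduces the same draws.
assert np.allclose(a, b)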
+1 for decoupling the domain/type of a variable (i.e., real, integer, categorical) from how it should be converted before being fed to the optimizer. Overall, things should be very simple by default, e.g.
Also, I agree that we should start with something very simple, i.e. supporting real, integer, and categorical types, all assumed to be uniformly distributed, with an optional log transform for real values. This should cover 80-90% of the use cases.
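A hedged sketch of what those minimal defaults might look like in practice (the parameter names and ranges below are purely illustrative, not the merged API):

import numpy as np
from scipy.stats import randint, uniform

rng = np.random.RandomState(0)

# Real in [1e-4, 1e0] with the optional log transform:
# sample uniformly in log10 space, then map back.
lo, hi = np.log10(1e-4), np.log10(1e0)
learning_rate = 10 ** uniform(lo, hi - lo).rvs(random_state=rng)

# Integer in [1, 100], uniform (randint's upper bound is exclusive).
n_estimators = randint(1, 101).rvs(random_state=rng)

# Categorical, uniform over the choices.
kernel = rng.choice(["linear", "poly", "rbf"])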
Is this ready for review? I am confused, as this PR is adding
Could you instead base your PR on master? It is otherwise difficult to compare what you are actually changing. Not sure what was wrong with Tim's proposal. (Seems fine to me.)
Force-pushed from 8eaf198 to 544ef4f.
@@ -151,6 +178,8 @@ def __init__(self, low, high, prior='uniform', transformer='identity'):

        if transformer == 'identity':
            self.transformer = Identity()
        elif transformer == 'log':
            self.transformer = Log()
Does it make sense to allow a log transform for integers? Not sure we should support it, or at least the inverse transform should cast back to integers.
Maybe not, or at least I can't think of a quick use case.
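If it were ever supported, the inverse transform would presumably have to round and cast back, roughly like this (an illustrative sketch, not the PR's code):

import numpy as np

class LogInteger:
    """Hypothetical log transform for integers: the inverse rounds
    and casts so a round trip always yields valid integers."""

    def transform(self, values):
        return np.log10(values)

    def inverse_transform(self, values):
        return np.round(10 ** np.asarray(values)).astype(int)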
There are still tests to be added.
Check out the bug in the last commit :P
Woah! Thanks for catching this.
@abc.abstractmethod
def rvs(self, n_samples=None, random_state=None):
    """
    Sample points randomly.
Change to "Randomly sample points from the original space" to explain that the samples are not in the warped space.
Great, I'll change it then.
My thinking behind splitting the prior and transformer was that there are two problems that need solving:
Personally, I find it easier to think about the prior in the original space and then let the computer transform it to the warped space for the optimiser:
Yes, which is why it makes sense to keep sampling in the warped space just for the uniform prior.
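Concretely, sampling uniformly in the warped space and mapping back is what yields a log-uniform prior in the original space; a small sketch of that equivalence:

import numpy as np

rng = np.random.RandomState(0)
low, high = 1e-3, 1e1

# Sample uniformly in the warped (log10) space, then map back.
warped = rng.uniform(np.log10(low), np.log10(high), size=100000)
original = 10 ** warped

# Log-uniform in the original space: each decade gets an equal share,
# so about a quarter of the samples land in [1e-3, 1e-2] and in [1e0, 1e1].
print(np.mean(original < 1e-2), np.mean(original > 1e0))  # both ~0.25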
Proposal:
from sklearn.preprocessing import LabelEncoder, OneHotEncoder
from sklearn.utils import check_random_state
from sklearn.utils.fixes import sp_version


class Identity(TransformerMixin):
Removed because TransformerMixin assumes the input to transform is 2-D.
I have tried to keep the API as simple as possible.
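For context, a transformer that accepts 1-D sequences without inheriting from TransformerMixin could be as small as this (a sketch, not necessarily the PR's exact code):

import numpy as np

class Identity:
    """Pass-through transform for 1-D sequences, avoiding the 2-D
    input expected by sklearn's TransformerMixin-based transformers."""

    def transform(self, values):
        return np.asarray(values)

    def inverse_transform(self, values):
        return np.asarray(values)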
I like where this is going :-)
def transform(self, values):
    return np.log(values)

@abc.abstractmethod
Do we really need abc and stuff? Wouldn't a simple raise NotImplementedError be enough? (This is an open question, I have no strong opinion one way or the other.)
I am -0 on abc
The difference is that making it an abc will prevent the instance from being created if it does not implement those methods, while raising a NotImplementedError will allow the instance to be created. (I am not a purist and I don't mind removing them.)
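A quick illustration of the difference under discussion:

import abc

class WithAbc(abc.ABC):
    @abc.abstractmethod
    def rvs(self, n_samples=None, random_state=None):
        """Subclasses must implement this."""

class WithoutAbc:
    def rvs(self, n_samples=None, random_state=None):
        raise NotImplementedError

class IncompleteAbc(WithAbc):       # forgets to implement rvs
    pass

class IncompleteNoAbc(WithoutAbc):  # also forgets
    pass

IncompleteNoAbc()       # constructs fine; fails only once rvs() is called
try:
    IncompleteAbc()     # abc refuses instantiation up front
except TypeError as exc:
    print(exc)          # "Can't instantiate abstract class IncompleteAbc ..."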
Thanks Manoj, this is looking good! I think we should add a last test showcasing how non-trivial grids can be defined. E.g., by calling
For ease of use, we should also accept triples for
-    return self._rvs.rvs(size=n_samples, random_state=random_state)
+    random_vals = self._rvs.rvs(size=n_samples, random_state=random_state)
+    if self.prior == "log-uniform":
+        return np.exp(random_vals)
np.exp(x) -> 10**x, to match the np.log10 done earlier.
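The forward and inverse transforms must share the same base for a round trip to be the identity; a quick check:

import numpy as np

values = np.array([1e-3, 1e0, 1e3])
warped = np.log10(values)

# Mismatched bases silently distort the values...
assert not np.allclose(np.exp(warped), values)
# ...while 10 ** x undoes np.log10 exactly.
assert np.allclose(10 ** warped, values)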
All right, fixed both the inverse_transform and the tests. We can introduce the tuple shorthand in another PR. Please merge if happy.
This is great, thank you! Waiting for the Travis green light, and then I'll merge.
I'll address the shorthand.
Yay! We should also refactor the existing minimize code to make use of the new API.
I just made #81 with a few ideas thrown in.
Whoop!
Added docs and some minor cosmetics.
Removed prior for now so that we can compare our results with a randomized search where we have some prior knowledge about the candidates. (I understand it might be useful but YAGNI)
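For that comparison, the randomized-search baseline would presumably look something like the following (a sketch; the estimator, the ranges, and the LogUniform helper are placeholders, not part of this PR):

import numpy as np
from scipy.stats import uniform
from sklearn.datasets import load_iris
from sklearn.model_selection import RandomizedSearchCV
from sklearn.svm import SVC

class LogUniform:
    """Hypothetical helper: a log-uniform prior expressed through the
    rvs interface that RandomizedSearchCV expects of a distribution."""
    def __init__(self, low, high):
        self._exp = uniform(np.log10(low), np.log10(high) - np.log10(low))
    def rvs(self, size=None, random_state=None):
        return 10 ** self._exp.rvs(size=size, random_state=random_state)

X, y = load_iris(return_X_y=True)
search = RandomizedSearchCV(SVC(), {"C": LogUniform(1e-3, 1e3)},
                            n_iter=20, random_state=0)
search.fit(X, y)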