Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update Suggest API #510

Closed
cthoyt opened this issue Sep 7, 2019 · 40 comments
Closed

Update Suggest API #510

cthoyt opened this issue Sep 7, 2019 · 40 comments

Comments

@cthoyt
Copy link

cthoyt commented Sep 7, 2019

There is already a function in the Trial class for suggesting a discrete uniformly distributed float (Trial.suggest_discrete_uniform). It would be nice to have the same option that returns integers instead of floats.

My use case is for network representation learning - I want to suggest the embedding dimension in intervals of 50 between 50-350 (e.g. 50, 100, 150, ... , 300, 350).\

Currently, I can accomplish this by dividing out my interval then suggesting an integer and multiplying it like:

embeddings_dim = 50 * trial.suggest_int('embeddings_dim', 1, 7)

But this loses the power of tracking the value within optuna. Alternatively I could use the following, but the the data type is tracked incorrectly.

embeddings_dim = int(trial.suggest_discrete_uniform('embeddings_dim', 50, 350, q=7))
@cthoyt cthoyt changed the title Add suggest_discrete_int Add Trial.suggest_discrete_int Sep 7, 2019
@sile
Copy link
Member

sile commented Oct 1, 2019

Sorry for the late reply.
we think your proposal is reasonable.
However, instead of adding suggest_discrete_int method, we're planning to reorganize Suggest API as follows:

  • (a) [NEW_METHOD] Trial.suggest_float(name: str, low: float, high: float, log: bool, q: Optional[float])
    • sugegst_float(name, low, high, log=False, q=None) is equivalent to suggest_uniform(name, low, high)
    • suggest_float(name, low, high, log=True, q=None) is equivalent to suggest_loguniform(name, low, high)
    • suggest_float(name, low, high, log=False, q=0.1) is equivalent to suggest_discrete_uniform(name, low, high, q=0.1)
  • (b) [MODIFIED] Trial.suggest_int(name: str, low: int, high: int, q: Optional[int]) (
    • suggest_int(name, low, high, q=10) is equivalent to Trial.suggest_discret_int(name, low, high, q=10)
  • (c) [NO_CHANGE] Trial.suggest_categorical(name: str, choices: List[Union[str, float]])
  • Other existing suggest methods will be deprecated.

Note that the details of the API design is still under consideration (e.g., q may be renamed to a more descriptive name).

If you have any thought about the above new API design (draft), please let us know.

@cthoyt
Copy link
Author

cthoyt commented Oct 1, 2019

Looks good to me! Maybe rather than giving log as a Boolean, you might use a keyword argument called “distribution” in case you’d ever like to use something more exotic

@sile
Copy link
Member

sile commented Oct 2, 2019

Thank you for your suggestion.
We will consider the idea and create a draft PR within a month (maybe).

@cthoyt
Copy link
Author

cthoyt commented Oct 2, 2019

Would you like me to rename this issue, then?

@sile
Copy link
Member

sile commented Oct 3, 2019

IMO, either is OK. Feel free to change the title if you want to do it.

@cthoyt cthoyt changed the title Add Trial.suggest_discrete_int Update Suggest API Oct 3, 2019
@hayata-yamamoto
Copy link
Contributor

@sile
Do you have any progress in this?
I'm still looking forward to implementing this feature.

@sile
Copy link
Member

sile commented Feb 4, 2020

@hayata-yamamoto

Thank you for looking forward to this feature. But, unfortunately, there has been almost no progress since last year. Though Optuna dev team will be still busy for the coming months, we will keep in mind that this is a wanted feature and try to implement it if possible.

@hayata-yamamoto
Copy link
Contributor

hayata-yamamoto commented Feb 5, 2020

@sile
Oh..., I appreciate your informing.

I wonder which solution is likely to be implemented.
If design or approarch were in agreement, contributors (including me) could make this feature instead of core developers, I afraid.

@sile
Copy link
Member

sile commented Feb 7, 2020

Hi @hayata-yamamoto, sorry for the delayed response.

To clarify the design and approach, I created a contribution-welcome issue #902 for, so-called, suggest_discrete_int() method. I'd really appreciate it if you (or other contributors) could implement the feature.

@hayata-yamamoto
Copy link
Contributor

hayata-yamamoto commented Feb 7, 2020

@sile
Thanks! I'm aiming to develop this feature.

@sile
Copy link
Member

sile commented Feb 7, 2020

Great!

@kasparthommen
Copy link
Contributor

Hi all,

I think the new suggest_int method should also have a log flag (or, as suggested in another comment, a distribution parameter accepting strings).

For example,

suggest_int(1, 16, log=True)  # or, equivalently: suggest_int(1, 16, distribution='log')

would produce values between 1 and 4 equally likely as values between 4 and 16.

Model parameters that benefit from discrete log-uniform distributions are e.g.:

  • The number of neighbors in KNN
  • The number of layers in a neural network
  • The number of components in a GMM
  • The number of clusters in K-means
  • The number of components to retain in a PCA

My current workaround is to round/truncate log-uniform float values via suggest_loguniform, but that's not great.

@cthoyt
Copy link
Author

cthoyt commented Feb 18, 2020

@kasparthommen it would be nice to specify the log scale as well.

suggest_int(1, 16, distribution='log', log_base=2)

Such that [1, 2, 4, 8, 16] are generated. Otherwise, it might be assumed the base is 10 and then only [1, 10] are generated

As a question for all: would these 5 values be generated with equal probability? Or is it weighted?

@kasparthommen
Copy link
Contributor

Such that [1, 2, 4, 8, 16] are generated.

No, all values between 1 and 16 will be generated, but the probability decreases for larger values.

@hayata-yamamoto
Copy link
Contributor

hayata-yamamoto commented Feb 19, 2020

It looks a good suggestion, but should it be implemented as another function?
Comparing other suggest_ functions, I'm afraid suggest_int could be a bit huge than other functions. This difference could lead to users' confusion, I guess.

So, suggest_int should be implemented as corresponded to suggest_discrete_uniform and suggest_uniform. It's seemed that suggest_log_int should be developed apart from suggest_int.

Would it be possible to tell your thoughts, @sile @c-bata ?

@kasparthommen
Copy link
Contributor

It looks a good suggestion, but should it be implemented as another function?
Comparing other suggest_ functions, I'm afraid suggest_int could be a bit huge than other functions. This difference could lead to users' confusion, I guess.

My suggestion to add the log parameter to suggest_int does not make it any more complicated than suggest_float IMHO. Can you elaborate why you think it would become a bit "huge"?

So, suggest_int should be implemented as corresponded to suggest_discrete_uniform

I don't really care how it is implemented underneath as long as it provides the functionality I have suggested :-)

It's seemed that suggest_log_int should be developed apart from suggest_int.

I don't agree - I think we should adopt @sile's proposal (2nd post) and extend the suggest_int method a bit.

@hayata-yamamoto
Copy link
Contributor

hayata-yamamoto commented Feb 19, 2020

Can you elaborate why you think it would become a bit "huge"?

I'm sorry. This is my wrong :(

I think we should adopt @sile's proposal (2nd post) and extend the suggest_int method a bit.

If adding log paramter to suggest_int is planned, it should be done.
I don't know whether there is a plan or not.

@sile
Copy link
Member

sile commented Feb 19, 2020

When I wrote #510 (comment), I didn't plan to add a log-scale option to suggest_int method, but it sounds good idea.

@kasparthommen
Copy link
Contributor

@sile Cool, appreciated!

@himkt
Copy link
Member

himkt commented Mar 10, 2020

Hi, I'd like to work on this issue if it is possible. 😸
(If it's possible, I'll start from implementing methods listed in (a).)

@sile
Copy link
Member

sile commented Mar 11, 2020

Sure. Please go for it! > @himkt

@github-actions
Copy link
Contributor

This issue has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label Apr 18, 2020
@himkt
Copy link
Member

himkt commented Apr 21, 2020

@sile

#510 (comment)

  • suggest_float is introduced in optuna now
  • suggest_float supports uniform/loguniform/discreteuniform

Is there something we have to work on in this PR?
The discussions about suggest_int have ended?

@sile
Copy link
Member

sile commented Apr 22, 2020

@himkt Thanks!
I think we might be able to address #510 (comment) ( adding log option to suggest_int) within this issue.
What do you think of it?

@sile sile removed the stale Exempt from stale bot labeling. label Apr 22, 2020
@himkt
Copy link
Member

himkt commented Apr 23, 2020

@sile
Thank you for your clarification.
OK, I will work on adding log to suggest_int later.

@sile
Copy link
Member

sile commented Apr 23, 2020

@himkt Cool! Thanks!

@nzw0301
Copy link
Member

nzw0301 commented May 1, 2020

@himkt Could I help you implement suggest_int?

I guess a new suggest_int looks like suggest_int(name: str, low: int, high: int, log: bool, base: int, step: int).
In addition, is it better to implement the following functions?

  • suggest_loguniform_int(name: str, low: int, high: int, base: int, step: int)
  • suggest_uniform_int(name: str, low: int, high: int)

as a counterpart of suggest_loguniform and suggest_uniform in suggest_float, respectively:

  • suggest_int(name, low, high, log=True, base, step) is equivalent to suggest_loguniform_int(name, low, high, base, step)
  • suggest_int(name, low, high, log=False, base, step=1) is equivalent to suggest_uniform_int(name, low, high)

@himkt
Copy link
Member

himkt commented May 1, 2020

Hi @nzw0301, thank you for your interest!
To be honest, your suggestion sounds helpful to me since I'm a little bit busy now.

I guess a new suggest_int looks like suggest_int(name: str, low: int, high: int, log: bool, base: int, step: int).

I basically agree with you but I think suggest_int doesn't have to support base, which is consistent with suggest_loguniform.

In addition, is it better to implement the following functions?

  • suggest_loguniform_int(name: str, low: int, high: int, base: int, step: int)
  • suggest_uniform_int(name: str, low: int, high: int)

I think what we have to do is implementing suggest_intloguniform because suggest_uniform_int is the same as suggest_int (It needs to implement the distribution IntLogUniformDistribution as well).

@nzw0301 What do you think?

@sile @cthoyt @hayata-yamamoto @kasparthommen
Please let me know if I've said something wrong. 🙇

@nzw0301
Copy link
Member

nzw0301 commented May 2, 2020

Thank you @himkt for your kind reply.

suggest_int doesn't have to support base

You are right! My understanding was wrong.

implementing suggest_intloguniform because suggest_uniform_int is the same as suggest_int (It needs to implement the distribution IntLogUniformDistribution as well).

It sounds fine.

@sile
Copy link
Member

sile commented May 2, 2020

Thanks, @nzw0301 and @himkt !

IMO, it's sufficient only to extend suggest_int and we don't have to implement suggest_intloguniform (or suggest_uniform_int).
Now, there is suggest_loguniform but it's just a historical reason. It has not been determined yet, but we might deprecate the method (and suggest_uniform) in favor of suggest_float in the future.

I think suggest_int doesn't have to support base, which is consistent with suggest_loguniform.

Agree.

@nzw0301
Copy link
Member

nzw0301 commented May 2, 2020

@sile Thank you for your reply! I understood.

I'll create a PR to support log argument of suggest_int since @himkt seems too busy.

@hayata-yamamoto
Copy link
Contributor

Wonderful!

@hayata-yamamoto
Copy link
Contributor

hayata-yamamoto commented May 6, 2020

@himkt @nzw0301

I basically agree with you but I think suggest_int doesn't have to support base, which is consistent with suggest_loguniform.

I understand this comment and it's reasonable, I guess.
Anyway, do you have any test case of suggest_int(high, low, step, log=True)?
Since https://github.com/optuna/optuna/pull/1201/files seems not to have test codes yet (I wrote this comment at 2020/05/06), I would like you to share supposed test cases.

To be honest, it's easy to understand if this function has a base argument. This is because it has already shown at #510 (comment)

On the other hand, if you only use natural logarithm to generate samples, it may be hard to understand at least for me. I'm afraid that I cannot remember whether natural logarithm can generate an integer value or not. If other developers have the same feeling, adding base can help us to use this function.

@nzw0301
Copy link
Member

nzw0301 commented May 6, 2020

@hayata-yamamoto

Add tests

I haven't created tests. Thank you for your kind comment.

base argument

I don't think so. Even if base is introduced, suggest_int doesn't return such values. suggest_int with log=True should be a counterpart of suggest_float for consistency. In addition, our implementations will be the almost same as automl/ConfigSpace.

Therefore, we may introduce power: bool and base: int arguments as another PR if optuna supports #510 (comment).

@himkt What do you think?

@hayata-yamamoto
Copy link
Contributor

I don't think so. Even if base is introduced, suggest_int doesn't return such values. suggest_int with log=True should be a counterpart of suggest_float for consistency. In addition, our implementations will be the almost same as automl/ConfigSpace.

I've got. That's cool.
I'm looking forward :)

@himkt
Copy link
Member

himkt commented May 6, 2020

@hayata-yamamoto Thank you for sharing your concerns.
And thank you @nzw0301 for clarifying, I agree with you.

Let me put a small note about the option for the logarithm scale.
As #510 (comment) says, all values between low and high will be generated in
suggest_int with log option. The probability decreases for larger values.

A histogram may look like here.

@kasparthommen
Copy link
Contributor

kasparthommen commented May 6, 2020

The implemention for a log-uniformly distributed integer is actually quite simple, it's int(round(sample_continuous/step))*step where sample_continuous comes from a continuous log-uniform distribution between low-step/2and high+step/2. Here's a script that demonstrates this:

import matplotlib.pyplot as plt
import numpy as np

# this function only implements the log-uniform case
def suggest_int_loguniform(low: int, high: int, step: int=1) -> int:
    # input validation - please replace by optuna's standard assertion/checking mechanisms
    assert low > 0
    assert high >= low
    assert step >= 0

    # adjust 'high' if necessary
    high -= (high - low) % step  # e.g. if low = 1, high = 6, step = 2 then adjust to high = 5

    # define corresponding continuous range and validate that as well
    uniform_low = low - step/2
    uniform_high = high + step/2
    assert uniform_low > 0

    # generate continuous log-uniform sample
    log_low = np.log(uniform_low)
    log_high = np.log(uniform_high)
    sample_continuous = np.exp(np.random.uniform(log_low, log_high))  # <-- the actual value between log_low and log_high needs to come from the sampler, not np.random.uniform

    # convert to int
    sample_int = int(round(sample_continuous/step))*step
    return sample_int


# histogram A: 1...16
np.random.seed(42)
values = []
for i in range(100000):
    values.append(suggest_int_loguniform(1, 16))

plt.figure()
plt.hist(values, bins=50)
plt.show()

# histogram B: 2...16, step=2
values = []
for i in range(100000):
    values.append(suggest_int_loguniform(2, 16, 2))

plt.figure()
plt.hist(values, bins=50)
plt.show()

@github-actions
Copy link
Contributor

This issue has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label May 20, 2020
@hvy hvy removed the stale Exempt from stale bot labeling. label May 22, 2020
@github-actions
Copy link
Contributor

github-actions bot commented Jun 7, 2020

This issue has not seen any recent activity.

@github-actions github-actions bot added the stale Exempt from stale bot labeling. label Jun 7, 2020
@HideakiImamura HideakiImamura removed the stale Exempt from stale bot labeling. label Jun 8, 2020
@sile
Copy link
Member

sile commented Jun 11, 2020

Through the following PRs, the update of the Suggest API has been completed. All of the features will be available by the next major release (v2.0.0) planned for July.
Many thanks to all the contributors involved in this issue! Great work!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

8 participants