-
Notifications
You must be signed in to change notification settings - Fork 52
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
User can request managerial info, like trial string identifiers, from cli #69
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can leave seed support for latter. I'll open an issue. Please fix the docstring for flake8 and for first person singular (comment here). Rest is fine. 🙂
2. Prefix ``'$'`` is given to substitutions from exposed properties. | ||
3. Prefix ``'_'`` is given to arguments which do not interact with. | ||
Suffix is a unique integer. | ||
4. ``'config'`` is given to user's script configuration file template |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There should be an example. The descriptions alone are not clear enough.
src/orion/core/io/space_builder.py
Outdated
argument, if not found elsewhere. Only one is allowed. | ||
|
||
.. note:: Currently positional arguments cannot define a parameter | ||
dimension. And I don't think they will; it wouldn't be good for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should not use first person singular in docstrings. We can't use positional arguments because they don't have a name assigned to. I think that's enough justification.
src/orion/core/io/space_builder.py
Outdated
USERARGS_NAMESPACE = r'\W*([a-zA-Z0-9_-]+)' | ||
USERARGS_CONFIG = '--config=' | ||
|
||
EXPOSED_PROPERTIES = ['trial.hash_name', 'trial.full_name', 'exp.name'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If I understand properly, if there is some $trial.hash_name
in the command line of the user, this will be replaced by the actual trial.hash_name provided by orion. Is that right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind
ret = "Trial(experiment={0}, status={1}, params.value={2})".format( | ||
repr(self.experiment), repr(self._status), [p.value for p in self.params]) | ||
ret = "Trial(experiment={0}, status={1},\n params={2})".format( | ||
repr(self.experiment), repr(self._status), self.params_repr(sep=('\n' + ' ' * 13))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about using pprint
if we want some fancy repr
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
of course, but later
def hash_name(self): | ||
"""Generate a unique name with an md5sum hash for this `Trial`. | ||
|
||
.. note:: Two trials that have the same `params` must have the same `hash_name`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree. Two trials with identical parameters could have different results. We can't assume the user's script is deterministic. We could, however, add a seed
to the attributes of the Trial and pass it through an environment variable to the user's script. The user could seed the script using it, making it deterministic for a given pair of (params, seed).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it the case that we may request from the script to evaluate a particular set of parameters twice? Then, that's a design issue that we did not take care about.
only if the params
are the same, hash_name
is going to be, not the results
.
#71 is this the one that you opened because of this observation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes I opened #71 for this.
If two trials with identical parameters give different results then they should be considered different, hence have different hash names.
For hyper-parameter optimization, it would not make sense to run twice the exact same hyper-parameters unless they are all categoricals or very small interval of integers. If any is dimension is real or large interval of integers, than the optimization algorithm is better to poke around than running twice identical sets.
However, someone could want to run multiple trials with identical params to compute the variance of the performance. This is something we could support.
To make it clear, what I propose is to keep it like you did for now, our first algorithms won't use seed information anyway. But I believe we should add support for seeds later on, and include the seed information in what we use to compute the hash name of a trial. A change like that should not break anything so it's ok to postpone it. (it's possible to keep backward compatibility for prior trials after we add support for seeds)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hash_name
is supposed to be generated and given as an input to user's script so that it is able to disambiguate local directories that it may use (i.e. for snapshots or logs) among different hyperparameter trial sets. It is supposed to be generated by trials whose results are not yet known, as is the use case above. So this hash_name corresponds only to trial.params
only. To make it able to run multiple experiments for the same set of hyperparameters:
- yes we should provide the seed to the user's script. I predict a
~trial.seed
kind of command in the likes of what I've built here. So,seed
is going to be an attribute of aTrial
as well. hash_name
will be generated as a function of(trial.params, trial.seed)
to ensure uniqueness of results and reproducibility.
@@ -6,6 +6,7 @@ max_trials: 100 | |||
algorithms: | |||
gradient_descent: | |||
learning_rate: 0.1 | |||
# dx_tolerance: 1e-7 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why comment out? Should we remove it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To show that it is the default, but not used here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, that's fine
"./dir_per_trial.py", | ||
"--dir={}".format(str(tmpdir)), | ||
"--other-name~exp.name", | ||
"--name~trial.hash_name", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use ~
for both exposed variables and distributions? Wasn't it supposed to be $
in this case?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nevermind
# (notice that an appropriate learning rate was selected in this stress | ||
# test to create underdamped behaviour), that it begins to suggest same | ||
# things from the past. This is intended to be shown with the assertions | ||
# in the for-loop below. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not used random search if gradient descent causes side-effects you need to verify to make sure the problem is not md5 of bad hash creation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the reason that even in the case where Δx_t
is the minimum possible in the finite float representation scenario, which can be reached by gradient descent for example, the demand for diff hash_names => diff parameters
is true. By random search I wouldn't be able to have that stressed behaviour.
assert spacebuilder.userargs_tmpl['/yolo'] == '-yolo=' | ||
assert spacebuilder.userargs_tmpl['/arch2'] == '--arch2=' | ||
assert list(spacebuilder.userargs_tmpl.keys()) == ['_0', '/yolo', '_1', '/arch2', | ||
'$2', '_3', '_4', '_5'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, now I understand what the $
was for.
- Also, explicitly fill `trial._id` with `Trial.NoID` during init. - Minor renaming from self._type to self.type for getters in `Trial.Value`. - Change representation of a `Trial` to better print out contained parameters. Why: I need to way to identify saved stuff in local disk from the execution of my script given a particular trial. This way I can have a reproducible deterministic string identifier that I intend to pass appropriately to a command line argument (if specified from ``orion hunt``) of my executable. This will disambiguate saved states, snapshots, or logs (i.e. from tensorboard), among different trials of the same executable. The actual implementation of all save/load is left to the executable itself, as well as configuring the absolute path where the stuff will be saved to. Most (responsible) ML programs I have personally encountered have a cli way to specify some naming for the experiment to run.
- Replace name property - Planning to match all flavours of experiment naming
Copy pasting from documentation of `SpaceBuilder.build_from`: This method is also responsible for parsing semantics which allow information from the `Experiment` or a `Trial` object to be passed to user's script. For example, a command line option like ``--name~trial.hash_name`` will be parsed to mean that a unique hash identifier of the trial that defines an execution shall be passed to the option ``--name``. Usage of these properties are not obligatory, but it helps integrating with solutions which help experiment reproducibility and resumability. See :attr:`EXPOSED_PROPERTIES` to check which properties are supported.
- `SpaceBuilder.userargs_tmpl` is now an OrderedDict. - Built command line args respect **the order** and **the way** in which they where parsed. - Less acrobatics regarding configuration file path. - Tackles the cases where ``~`` in a cli arg means home folder as in a shell path name. - Logs warning wherever implicit arguable decisions are taken. - TODO: First positional may not be a config file => it's a bad restriction. - TODO: Import parsing regex elements perhaps from a orion config file Copying from documentation of `SpaceBuilder._build_from_args`: Build templates from arguments found in the original cli. Rules for namespacing: 1. Prefix ``'/'`` is given to parameter dimensions. 2. Prefix ``'$'`` is given to substitutions from exposed properties. 3. Prefix ``'_'`` is given to arguments which do not interact with. Suffix is a unique integer. 4. ``'config'`` is given to user's script configuration file template User script configuration file argument is treated specially, trying to recognise a ``--config`` option or by checking the first positional argument, if not found elsewhere. Only one is allowed. .. note:: Currently positional arguments cannot define a parameter dimension. And I don't think they will; it wouldn't be good for your health down the line. Please be responsible and give a good descriptive name to your inputs. .. note:: Templates preserve given argument order.
Reason: MongoDB version changes, message in exception changes
I got greedy. Support it with a functional test. Remove also dead code from ``SpaceBuilder._build_from_args``. Add a test to show how it fails when it does.
Reason is that '/' means special thing in UNIX path names
Codecov Report
@@ Coverage Diff @@
## master #69 +/- ##
==========================================
+ Coverage 97.31% 97.37% +0.05%
==========================================
Files 34 34
Lines 2982 3087 +105
Branches 247 258 +11
==========================================
+ Hits 2902 3006 +104
Misses 49 49
- Partials 31 32 +1
Continue to review full report at Codecov.
|
src/orion/core/io/space_builder.py
Outdated
your health down the line. Please be responsible and give a good | ||
descriptive name to your inputs. | ||
.. note:: Positional arguments cannot define a parameter | ||
dimension, because no name, but an arbitrary (semantically) one, can |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand what you mean but it seems to me that it could be explained in more simple words. Just dropping "but an arbitrary (semantically) one" would make it much simpler. I don't think the note needs to justify why we do not assign random strings as hyper-parameter names when none are explicitly given.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could have assigned
_5
as a name. Let's keep like this, please
Well that would be terrible. It would be very hard to understand what is _5
when the user try to analyze the results latter on.
The documentation needs to be clear and concise. I'm sorry but this is not.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Positional arguments cannot define a parameter dimension, because no **meaningful** name can be assigned
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👌
We could have assigned `_5` as a name. Let's keep like this, please.
…On Wed, May 9, 2018 at 8:07 PM Xavier Bouthillier ***@***.***> wrote:
***@***.**** commented on this pull request.
------------------------------
In src/orion/core/io/space_builder.py
<#69 (comment)>:
> @@ -337,10 +337,9 @@ def _build_from_args(self, cmd_args):
recognise a ``--config`` option or by checking the first positional
argument, if not found elsewhere. Only one is allowed.
- .. note:: Currently positional arguments cannot define a parameter
- dimension. And I don't think they will; it wouldn't be good for
- your health down the line. Please be responsible and give a good
- descriptive name to your inputs.
+ .. note:: Positional arguments cannot define a parameter
+ dimension, because no name, but an arbitrary (semantically) one, can
I understand what you mean but it seems to me that it could be explained
in more simple words. Just dropping "but an arbitrary (semantically) one"
would make it more simpler. I don't think the note needs to justify why we
do not assign random strings as hyper-parameter names when none are
explicitly given.
—
You are receiving this because you authored the thread.
Reply to this email directly, view it on GitHub
<#69 (review)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AD5VAe35OJoeH99Oog-GW1vnnqbnvWO3ks5tw4TfgaJpZM4T0c9R>
.
|
4e86544
to
54d1090
Compare
Why:
I need to way to identify saved stuff in local disk from the
execution of my script given a particular trial. This way I can have
a reproducible deterministic string identifier that I intend to pass
appropriately to a command line argument (if specified from
orion hunt
) of my executable. This will disambiguate saved states,snapshots, or logs (i.e. from tensorboard), among different trials of
the same executable.
The actual implementation of all save/load is left to the executable
itself, as well as configuring the absolute path where the stuff will
be saved to. Most (responsible) ML programs I have personally encountered
have a cli way to specify some naming for the experiment to run.