Conversation
fcf6574 to
28beba7
Compare
also parse parameters when running a flow on a task fix publishing error
Codecov Report
@@ Coverage Diff @@
## develop #253 +/- ##
===========================================
+ Coverage 90.45% 90.59% +0.13%
===========================================
Files 24 24
Lines 2064 2094 +30
===========================================
+ Hits 1867 1897 +30
Misses 197 197
Continue to review full report at Codecov.
|
Codecov Report
@@ Coverage Diff @@
## develop #253 +/- ##
===========================================
+ Coverage 90.45% 90.59% +0.14%
===========================================
Files 24 24
Lines 2064 2138 +74
===========================================
+ Hits 1867 1937 +70
- Misses 197 201 +4
Continue to review full report at Codecov.
|
| return cls(**arguments) | ||
| flow = cls(**arguments) | ||
|
|
||
| if 'sklearn' in arguments['external_version']: |
There was a problem hiding this comment.
we could restrict this even further:
- 'sklearn.'
- startswith 'sklearn.' (?)
There was a problem hiding this comment.
I will replace this with startswith('sklearn'). In the long run, we probably need to build a plugin-like system in which the converters can register themselves for an 'external_version' string.
| flow = openml.flows.functions.get_flow(flow_id) | ||
| try: | ||
| _check_flow(self) | ||
| openml.flows.functions.assert_flows_equal(self, flow) |
There was a problem hiding this comment.
.. so we actually do expect an error here, in some cases
There was a problem hiding this comment.
It's still on my todo-list to test for this expected error.
| flow = OpenMLFlow._from_dict(flow_dict) | ||
|
|
||
| if 'sklearn' in flow.external_version: | ||
| flow.model = flow_to_sklearn(flow) |
There was a problem hiding this comment.
This piece of code is removed, I can't add anything here.
| # returns flow id if the flow exists on the server, False otherwise | ||
| flow_id = flow_exists(flow.name, flow.external_version) | ||
|
|
||
| if flow_id == False: |
There was a problem hiding this comment.
Now we are back to publishing a flow before knowing whether it actually valid (before we first ran the task)?
There was a problem hiding this comment.
That's actually an issue I somehow forgot. Will take care of this.
| flow = get_flow(flow_id) | ||
| setup_id = setup_exists(flow, model) | ||
| if avoid_duplicate_runs: | ||
| flow_from_server = get_flow(flow.flow_id) |
There was a problem hiding this comment.
Why don't we require that the 'run_flow_on_task' runs on a flow from the server?
There was a problem hiding this comment.
We then couldn't change the parameter values of the model in the flow as the run function would always use the parameters from the server.
| # TODO (neccessary? is this a post condition of this function) | ||
| flow = get_flow(flow_id) | ||
|
|
||
| run.flow_id = flow.flow_id |
There was a problem hiding this comment.
should be set in run constructor
| # server before parsing the parameters | ||
| stack = list() | ||
| stack.append(flow) | ||
| while len(stack) > 0: |
There was a problem hiding this comment.
can we make a separate function of this (modularity / readability)
There was a problem hiding this comment.
I actually thought the very same thing ;) It's already done.
| openml_param_settings = openml.runs.OpenMLRun._parse_parameters(sklearn_model, downloaded_flow) | ||
| description = xmltodict.unparse(_to_dict(downloaded_flow.flow_id, openml_param_settings), pretty=True) | ||
| file_elements = {'description': ('description.arff',description)} | ||
| openml_param_settings = openml.runs.OpenMLRun._parse_parameters(flow) |
There was a problem hiding this comment.
maybe explicitly state in comments that this function raises an error if the flow does not contain all flow ids?
| def _perform_run(self, task_id, num_instances, clf, | ||
| random_state_value=None, check_setup=True): | ||
|
|
||
| def _remove_random_state(flow): |
There was a problem hiding this comment.
why remove random state? This seems like part of the behaviour we want to test
There was a problem hiding this comment.
It's removed after checking the value to make sure that assert_flow_equal works.
|
Flows are published after running them (only if they haven't been published before). @janvanrijn this is ready for review again. |
There was a problem hiding this comment.
To me it seems like a good PR. Many code seems much shorter/simpler. However, some unit tests fail (4 on my system).
This error occurs 3 times:
Error
Traceback (most recent call last):
File "/home/vanrijn/projects/openml-python/tests/test_runs/test_run_functions.py", line 347, in test_get_run_trace
run = openml.runs.run_model_on_task(task, clf, avoid_duplicate_runs=True)
File "/home/vanrijn/projects/openml-python/openml/runs/functions.py", line 36, in run_model_on_task
flow_tags=flow_tags, seed=seed)
File "/home/vanrijn/projects/openml-python/openml/runs/functions.py", line 73, in run_flow_on_task
raise ValueError('Cannot check if a run exists if the '
ValueError: Cannot check if a run exists if the corresponding flow has not been published yet!
This one one time (seems similar)
Failure
Expected :"Penalty term must be positive; got \(C=u?'abc'\)"
Actual :"Cannot check if a run exists if the corresponding flow has not been published yet!"
<Click to see difference>
ValueError: Cannot check if a run exists if the corresponding flow has not been published yet!
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/home/vanrijn/projects/openml-python/tests/test_runs/test_run_functions.py", line 214, in test_check_erronous_sklearn_flow_fails
model=clf)
AssertionError: "Penalty term must be positive; got \(C=u?'abc'\)" does not match "Cannot check if a run exists if the corresponding flow has not been published yet!"
| self : OpenMLFlow | ||
|
|
||
| """ | ||
| import openml.flows.functions |
There was a problem hiding this comment.
imports at the top! (right?)
There was a problem hiding this comment.
This is not possible because of cyclic dependencies. In particular, flow.py tries to import functions.py in order to call get_flow(), while functions.py tries to import flow.py in order to instantiate an OpenMLFlow. I will add a comment.
|
Sorry for the failing tests. It seems that I was too sloppy yesterday evening. The tests are passing. |
janvanrijn
left a comment
There was a problem hiding this comment.
some minor comment requests
|
|
||
| # skips the run if it already exists and the user opts for this in the config file. | ||
| # also, if the flow is not present on the server, the check is not needed. | ||
| flow_id = flow_exists(flow.name, flow.external_version) |
There was a problem hiding this comment.
document somewhere that we need to have the 'avoid_duplicate_runs' to false if we want offline experiments
| # <openml.flows.flow.OpenMLFlow object at 0x7fed87978160> is not JSON serializable | ||
| # Python3.6 exception message: | ||
| # Object of type 'OpenMLFlow' is not JSON serializable | ||
| if 'OpenMLFlow' in e.args[0] and \ |
There was a problem hiding this comment.
document what happens in case of the catch (and why)
There was a problem hiding this comment.
in the catch, please try to define what can reasonably fall into that (because we handle it further down) and raise exception if we have something unexpected
There was a problem hiding this comment.
I will add additional checks.
| def setup_exists(flow, model=None): | ||
| ''' | ||
| Checks whether a flow / hyperparameter configuration already exists on the server | ||
| Checks whether a hyperparameter configuration already exists on the server. |
There was a problem hiding this comment.
please document why model can be none (i.e., flow.model is set)
please raise exception if flow.model is set and model is set, and these do not agree? (should never happen but still..)
There was a problem hiding this comment.
What do you mean by models do not agree? You mean names don't match? Or the parameter names don't match? Or the parameter values?
Matching of parameter names is done somewhat in _parse_parameters, but I could make that more strict (I'll think about it...)
There was a problem hiding this comment.
Okay, I'll add a more strict check.
| parameters[_flow_id][_param_name] = _param_value | ||
|
|
||
| def _reconstruct_flow(_flow, _params): | ||
| # sets the values of flow parameters (and subflows) to |
There was a problem hiding this comment.
small todo (sorry, should have been me in prev pull request): document what types _flow (flow object?) and _params (totally forgot) are? this would make this function better understandable
Side effect: during unit testing, it is possible to add a sentinel to the flow name to not work with flows which are already uploaded to the server.
Adds #193.
Further changes: