
Allow running a flow on a task #253

Merged
mfeurer merged 9 commits into develop from add/#193
May 18, 2017

Conversation

@mfeurer (Collaborator) commented May 11, 2017

Side effect: during unit testing, a sentinel can be added to the flow name so that tests do not touch flows which are already uploaded to the server.

Addresses #193.

Further changes:

  • the model of a flow is now created when downloading the flow
  • the parameters of a run are now parsed when the run is executed
  • the interface of several functions is simplified
  • downloading a flow from a flow id is simplified
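The sentinel trick mentioned above can be sketched roughly as follows; the helper name `add_test_sentinel` is hypothetical and only illustrates the pattern, not the actual test-suite code:

```python
import uuid

def add_test_sentinel(flow_name):
    """Append a unique sentinel to a flow name so that unit tests never
    match (or interfere with) flows already uploaded to the server.
    Hypothetical helper illustrating the pattern described in this PR."""
    sentinel = "TEST%s" % uuid.uuid4().hex[:12]
    return "%s_%s" % (flow_name, sentinel)

name = add_test_sentinel("sklearn.tree.DecisionTreeClassifier")
# the sentinel makes the name unique, so a server-side lookup by name
# will not find a pre-existing flow
```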

@mfeurer force-pushed the add/#193 branch 4 times, most recently from fcf6574 to 28beba7 on May 16, 2017 at 11:51
also parse parameters when running a flow on a task

fix publishing error
@mfeurer changed the title from "WIP: allow running a flow on a task" to "Allow running a flow on a task" on May 16, 2017
@mfeurer requested a review from janvanrijn on May 16, 2017 at 12:02
@codecov-io

Codecov Report

Merging #253 into develop will increase coverage by 0.13%.
The diff coverage is 97.67%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #253      +/-   ##
===========================================
+ Coverage    90.45%   90.59%   +0.13%     
===========================================
  Files           24       24              
  Lines         2064     2094      +30     
===========================================
+ Hits          1867     1897      +30     
  Misses         197      197
Impacted Files Coverage Δ
openml/__init__.py 100% <100%> (ø) ⬆️
openml/flows/functions.py 89.23% <100%> (-0.33%) ⬇️
openml/flows/__init__.py 100% <100%> (ø) ⬆️
openml/setups/functions.py 98.48% <100%> (+0.04%) ⬆️
openml/runs/__init__.py 100% <100%> (ø) ⬆️
openml/flows/flow.py 94.7% <100%> (+0.45%) ⬆️
openml/runs/functions.py 88.18% <94.73%> (+0.17%) ⬆️
openml/runs/run.py 95.3% <95.65%> (+0.41%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea4c9be...8fadddc. Read the comment docs.

@codecov-io commented May 16, 2017

Codecov Report

Merging #253 into develop will increase coverage by 0.14%.
The diff coverage is 94.11%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #253      +/-   ##
===========================================
+ Coverage    90.45%   90.59%   +0.14%     
===========================================
  Files           24       24              
  Lines         2064     2138      +74     
===========================================
+ Hits          1867     1937      +70     
- Misses         197      201       +4
Impacted Files Coverage Δ
openml/flows/__init__.py 100% <100%> (ø) ⬆️
openml/exceptions.py 100% <100%> (ø) ⬆️
openml/runs/__init__.py 100% <100%> (ø) ⬆️
openml/flows/functions.py 91.56% <100%> (+2.01%) ⬆️
openml/__init__.py 100% <100%> (ø) ⬆️
openml/runs/run.py 93.08% <84.84%> (-1.81%) ⬇️
openml/flows/flow.py 94.15% <95.45%> (-0.09%) ⬇️
openml/setups/functions.py 97.22% <96%> (-1.22%) ⬇️
openml/runs/functions.py 88.45% <96.55%> (+0.43%) ⬆️
openml/_api_calls.py 88.05% <0%> (+2.98%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ea4c9be...faf5b26. Read the comment docs.

@janvanrijn (Member) left a comment

Preliminary review :)

Comment thread openml/flows/flow.py Outdated
return cls(**arguments)
flow = cls(**arguments)

if 'sklearn' in arguments['external_version']:
janvanrijn (Member):

we could restrict this even further:

  • 'sklearn.'
  • startswith 'sklearn.' (?)

mfeurer (Collaborator, author):

I will replace this with startswith('sklearn'). In the long run, we probably need to build a plugin-like system in which the converters can register themselves for an 'external_version' string.
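For context on the difference between the two checks, here is a plain-Python sketch (no openml needed; the version strings are made up):

```python
external_versions = [
    "sklearn==0.18.1",       # genuine sklearn flow
    "my_sklearn_fork==1.0",  # contains 'sklearn' but is not sklearn
]

# The substring check ('sklearn' in v) accepts both strings;
# the prefix check proposed in the review accepts only the first.
substring_hits = [v for v in external_versions if "sklearn" in v]
prefix_hits = [v for v in external_versions if v.startswith("sklearn")]
```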

Comment thread openml/flows/flow.py Outdated
flow = openml.flows.functions.get_flow(flow_id)
try:
_check_flow(self)
openml.flows.functions.assert_flows_equal(self, flow)
janvanrijn (Member):

.. so we actually do expect an error here, in some cases

mfeurer (Collaborator, author):

It's still on my to-do list to test for this expected error.

Comment thread openml/flows/functions.py
flow = OpenMLFlow._from_dict(flow_dict)

if 'sklearn' in flow.external_version:
flow.model = flow_to_sklearn(flow)
janvanrijn (Member):

startswith, (..again) ?

mfeurer (Collaborator, author):

This piece of code has been removed, so there is nothing to change here.

Comment thread openml/runs/functions.py Outdated
# returns flow id if the flow exists on the server, False otherwise
flow_id = flow_exists(flow.name, flow.external_version)

if flow_id == False:
janvanrijn (Member):

Now we are back to publishing a flow before knowing whether it is actually valid (i.e., before we have first run the task)?

mfeurer (Collaborator, author):

That's actually an issue I somehow forgot. Will take care of this.
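A side note on the `flow_id == False` comparison in the excerpt above: since `flow_exists` returns either an id or `False`, an identity check is the safer idiom in Python, because `0 == False` evaluates to true while `0 is False` does not. A minimal sketch (function name is illustrative):

```python
def describe_flow(flow_id):
    """flow_id is either an int id or False (flow not on the server)."""
    if flow_id is False:  # identity check: never confuses id 0 with False
        return "flow not on server"
    return "flow %d on server" % flow_id
```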

Comment thread openml/runs/functions.py Outdated
flow = get_flow(flow_id)
setup_id = setup_exists(flow, model)
if avoid_duplicate_runs:
flow_from_server = get_flow(flow.flow_id)
janvanrijn (Member):

Why don't we require that the 'run_flow_on_task' runs on a flow from the server?

mfeurer (Collaborator, author):

We then couldn't change the parameter values of the model in the flow as the run function would always use the parameters from the server.

Comment thread openml/runs/functions.py Outdated
# TODO (neccessary? is this a post condition of this function)
flow = get_flow(flow_id)

run.flow_id = flow.flow_id
janvanrijn (Member):

should be set in run constructor

mfeurer (Collaborator, author):

Of course

Comment thread openml/runs/run.py Outdated
# server before parsing the parameters
stack = list()
stack.append(flow)
while len(stack) > 0:
janvanrijn (Member):

can we make a separate function of this (modularity / readability)

janvanrijn (Member):

and reusability :)

mfeurer (Collaborator, author):

I actually thought the very same thing ;) It's already done.
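The extracted helper could look roughly like this; the class and function names are illustrative stand-ins, not the actual openml API:

```python
class Flow:
    """Minimal stand-in for an OpenMLFlow: a name plus named subflows."""
    def __init__(self, name, components=None):
        self.name = name
        self.components = components or {}

def iterate_flows(flow):
    """Yield a flow and all of its subflows without recursion,
    using an explicit stack as in the excerpt above."""
    stack = [flow]
    while stack:
        current = stack.pop()
        yield current
        stack.extend(current.components.values())

pipeline = Flow("pipeline", {
    "preprocessor": Flow("scaler"),
    "estimator": Flow("tree"),
})
names = sorted(f.name for f in iterate_flows(pipeline))
```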

Comment thread openml/setups/functions.py Outdated
openml_param_settings = openml.runs.OpenMLRun._parse_parameters(sklearn_model, downloaded_flow)
description = xmltodict.unparse(_to_dict(downloaded_flow.flow_id, openml_param_settings), pretty=True)
file_elements = {'description': ('description.arff',description)}
openml_param_settings = openml.runs.OpenMLRun._parse_parameters(flow)
janvanrijn (Member):

maybe explicitly state in comments that this function raises an error if the flow does not contain all flow ids?

def _perform_run(self, task_id, num_instances, clf,
random_state_value=None, check_setup=True):

def _remove_random_state(flow):
janvanrijn (Member):

why remove random state? This seems like part of the behaviour we want to test

mfeurer (Collaborator, author):

It's removed after checking the value, to make sure that assert_flows_equal works.

@mfeurer (Collaborator, author) commented May 17, 2017

Flows are published after running them (only if they haven't been published before). @janvanrijn this is ready for review again.

@janvanrijn (Member) left a comment

To me this seems like a good PR. Much of the code is now shorter/simpler. However, some unit tests fail (4 on my system).

This error occurs 3 times:

Error

Traceback (most recent call last):
  File "/home/vanrijn/projects/openml-python/tests/test_runs/test_run_functions.py", line 347, in test_get_run_trace
    run = openml.runs.run_model_on_task(task, clf, avoid_duplicate_runs=True)
  File "/home/vanrijn/projects/openml-python/openml/runs/functions.py", line 36, in run_model_on_task
    flow_tags=flow_tags, seed=seed)
  File "/home/vanrijn/projects/openml-python/openml/runs/functions.py", line 73, in run_flow_on_task
    raise ValueError('Cannot check if a run exists if the '
ValueError: Cannot check if a run exists if the corresponding flow has not been published yet!

This one occurs once (seems similar):

Failure
Expected :"Penalty term must be positive; got \(C=u?'abc'\)"
Actual   :"Cannot check if a run exists if the corresponding flow has not been published yet!"

ValueError: Cannot check if a run exists if the corresponding flow has not been published yet!

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/vanrijn/projects/openml-python/tests/test_runs/test_run_functions.py", line 214, in test_check_erronous_sklearn_flow_fails
    model=clf)
AssertionError: "Penalty term must be positive; got \(C=u?'abc'\)" does not match "Cannot check if a run exists if the corresponding flow has not been published yet!"

Comment thread openml/flows/flow.py
self : OpenMLFlow

"""
import openml.flows.functions
janvanrijn (Member):

imports at the top! (right?)

mfeurer (Collaborator, author):

This is not possible because of cyclic dependencies. In particular, flow.py tries to import functions.py in order to call get_flow(), while functions.py tries to import flow.py in order to instantiate an OpenMLFlow. I will add a comment.
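The deferred-import pattern can be demonstrated without openml; here a stub module registered in `sys.modules` stands in for functions.py (the names and setup are purely illustrative):

```python
import sys
import types

# Register a fake module, mimicking openml.flows.functions. In the real
# package this module would in turn import the module defining publish(),
# which is why a top-level import there would be cyclic.
funcmod = types.ModuleType("funcmod")
funcmod.get_flow = lambda flow_id: "flow #%d" % flow_id
sys.modules["funcmod"] = funcmod

def publish(flow_id):
    # Deferred import: by the time publish() is called, 'funcmod' is
    # fully initialised, so the circular dependency is harmless.
    import funcmod
    return funcmod.get_flow(flow_id)

result = publish(253)
```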

@mfeurer (Collaborator, author) commented May 18, 2017

Sorry for the failing tests; it seems I was too sloppy yesterday evening. The tests are passing now.

@janvanrijn (Member) left a comment

some minor comment requests

Comment thread openml/runs/functions.py

# skips the run if it already exists and the user opts for this in the config file.
# also, if the flow is not present on the server, the check is not needed.
flow_id = flow_exists(flow.name, flow.external_version)
janvanrijn (Member):

document somewhere that we need to set 'avoid_duplicate_runs' to false if we want offline experiments

Comment thread openml/runs/run.py
# <openml.flows.flow.OpenMLFlow object at 0x7fed87978160> is not JSON serializable
# Python3.6 exception message:
# Object of type 'OpenMLFlow' is not JSON serializable
if 'OpenMLFlow' in e.args[0] and \
janvanrijn (Member):

document what happens in case of the catch (and why)

janvanrijn (Member):

In the catch, please try to define what can reasonably fall into it (because we handle it further down), and raise an exception if we get something unexpected.

mfeurer (Collaborator, author):

I will add additional checks.
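A sketch of such a narrowed catch; `json` and a dummy class stand in for the real flow object, and the substitution logic is simplified:

```python
import json

class OpenMLFlow:
    """Dummy stand-in for the real flow class."""

def safe_dumps(obj):
    """Serialize obj, tolerating only the known 'OpenMLFlow is not JSON
    serializable' failure; anything unexpected is re-raised."""
    try:
        return json.dumps(obj)
    except TypeError as e:
        # Python 3.5: "<...OpenMLFlow object at 0x...> is not JSON serializable"
        # Python 3.6+: "Object of type ...OpenMLFlow... is not JSON serializable"
        if "OpenMLFlow" in e.args[0] and "not JSON serializable" in e.args[0]:
            return None  # known case: caller substitutes the flow id later
        raise  # unexpected serialization failure: surface it

serialized = safe_dumps({"value": 1})
skipped = safe_dumps({"flow": OpenMLFlow()})
```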

def setup_exists(flow, model=None):
'''
Checks whether a flow / hyperparameter configuration already exists on the server
Checks whether a hyperparameter configuration already exists on the server.
janvanrijn (Member):

please document why model can be None (i.e., flow.model is set)
please raise an exception if both flow.model and model are set and they do not agree (should never happen, but still..)

mfeurer (Collaborator, author):

What do you mean by models do not agree? You mean names don't match? Or the parameter names don't match? Or the parameter values?

Matching of parameter names is done somewhat in _parse_parameters, but I could make that more strict (I'll think about it...)

mfeurer (Collaborator, author):

Okay, I'll add a more strict check.

parameters[_flow_id][_param_name] = _param_value

def _reconstruct_flow(_flow, _params):
# sets the values of flow parameters (and subflows) to
janvanrijn (Member):

small todo (sorry, this should have been mine in the previous pull request): document what types _flow (a flow object?) and _params are? That would make this function easier to understand.
