Feature/upload flow #167

mfeurer · 2016-09-02T15:40:11Z

No description provided.

OpenMLFlow's parameters and components attributes are now of type OrderedDict. The keys are the names of the parameters/components, and the values are either the default value of the hyperparameter or the actual component. This makes creating OpenMLFlows easier, and they can still be uploaded nicely. Also, there was a bug in the deserialization, which always returned the model of the serialized flow.
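A minimal sketch of the data layout described above. SketchFlow is a hypothetical, simplified stand-in and not the real OpenMLFlow constructor. Both attributes are OrderedDicts keyed by name, so their order is preserved and components can be looked up directly:

```python
from collections import OrderedDict


class SketchFlow(object):
    """Hypothetical, simplified stand-in for OpenMLFlow (not its real constructor)."""

    def __init__(self, name, parameters, components):
        self.name = name
        # parameter name -> default value of the hyperparameter
        self.parameters = OrderedDict(parameters)
        # component name -> the actual sub-flow object
        self.components = OrderedDict(components)


tree = SketchFlow('sklearn.tree.tree.DecisionTreeClassifier',
                  parameters=[('max_depth', 'null')], components=[])
boosting = SketchFlow('sklearn.ensemble.weight_boosting.AdaBoostClassifier',
                      parameters=[('n_estimators', '50'), ('learning_rate', '1.0')],
                      components=[('base_estimator', tree)])

# Keys keep their insertion order, and components are looked up by name.
print(list(boosting.parameters))                   # ['n_estimators', 'learning_rate']
print(boosting.components['base_estimator'].name)  # sklearn.tree.tree.DecisionTreeClassifier
```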

coveralls · 2016-09-06T15:43:33Z

Coverage increased (+0.2%) to 89.747% when pulling 5a0750a on feature/upload-flow into a407b75 on develop.

coveralls · 2016-09-08T18:00:03Z

Coverage increased (+0.3%) to 89.848% when pulling 7cd5741 on feature/upload-flow into a407b75 on develop.

mfeurer · 2016-09-09T13:02:08Z

This PR introduces the following new flow-related features:

Add get_flow
Fix the abstract flow specification. It is now enforced that, for each flow, the default values of the parameters are strings. This is necessary for deserialization, because only the library-specific code knows how to interpret the data saved here.
Add a converter from/to scikit-learn. It allows serializing scikit-learn flows to the server and deserializing them from it. Each parameter default value is encoded as a JSON object to allow uploading through the 'abstract' flow specification. This currently expects scikit-learn 0.18 (it already works with the new model selection module).
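How the string-only rule and the JSON encoding fit together can be sketched as follows (Python 3). encode_defaults, check_all_strings and decode_defaults are hypothetical helpers, not the converter's actual functions. Every default is stored as a JSON string, so the abstract flow only ever sees strings, and only the library-specific side decodes them:

```python
import json
from collections import OrderedDict


def encode_defaults(params):
    """Hypothetical helper: store every default value as a JSON string."""
    return OrderedDict((name, json.dumps(value)) for name, value in params.items())


def check_all_strings(params):
    """Enforce the abstract specification: every default must be a string."""
    for name, value in params.items():
        if not isinstance(value, str):
            raise TypeError('Default of %r is not a string: %r' % (name, value))


def decode_defaults(encoded):
    """Hypothetical helper: recover the Python values from their JSON strings."""
    decoded = OrderedDict()
    for name, text in encoded.items():
        try:
            decoded[name] = json.loads(text)
        except ValueError:
            # json.JSONDecodeError (Python 3.5+) is a subclass of ValueError,
            # and older versions raise a plain ValueError, so this covers both.
            # Falling back to the raw string is a choice made for this sketch.
            decoded[name] = text
    return decoded


params = OrderedDict([('n_estimators', 50), ('max_depth', None), ('criterion', 'gini')])
encoded = encode_defaults(params)
check_all_strings(encoded)                 # passes: everything is a string now
print(encoded['max_depth'])                # null
print(decode_defaults(encoded) == params)  # True
```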

I will now write docstrings to explain the implementation. I will write documentation once the code part of the PR is approved.

mfeurer · 2016-09-09T13:47:56Z

Okay, the unit tests are now working. @amueller @janvanrijn what do you think of this PR?

coveralls · 2016-09-09T13:52:40Z

Coverage increased (+0.3%) to 89.81% when pulling bd0175a on feature/upload-flow into a407b75 on develop.

mfeurer · 2016-09-09T21:32:05Z

Apparently, there is still a bug in creating the names of the flows. The current unit test should not produce the following name:

TEST65c1217799sklearn.model_selection._search.RandomizedSearchCV(sklearn.pipeline.Pipeline(sklearn.preprocessing.data.StandardScaler,sklearn.ensemble.weight_boosting.AdaBoostClassifier(sklearn.tree.tree.DecisionTreeClassifier)),sklearn.preprocessing.data.StandardScaler,sklearn.ensemble.weight_boosting.AdaBoostClassifier(sklearn.tree.tree.DecisionTreeClassifier))

In particular, it should also contain the CV object. Next steps: test the created component names in the unit test, and add a more complex unit test to the scikit-learn converter tests.
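One way to avoid the duplication in the name above is to build it recursively from each flow's direct sub-components only, so that the pipeline's steps appear once, nested inside the pipeline. A sketch, assuming a hypothetical Node tuple instead of the real flow class (the TEST prefix is omitted):

```python
from collections import namedtuple

# Hypothetical stand-in for a flow: a class name plus its *direct* sub-components.
Node = namedtuple('Node', ['class_name', 'components'])


def flow_name(node):
    """Build a name like Outer(Inner1,Inner2(Leaf)) from direct children only."""
    if not node.components:
        return node.class_name
    inner = ','.join(flow_name(child) for child in node.components)
    return '%s(%s)' % (node.class_name, inner)


tree = Node('sklearn.tree.tree.DecisionTreeClassifier', [])
ada = Node('sklearn.ensemble.weight_boosting.AdaBoostClassifier', [tree])
scaler = Node('sklearn.preprocessing.data.StandardScaler', [])
pipe = Node('sklearn.pipeline.Pipeline', [scaler, ada])
# The search object only has the pipeline as a direct component.
search = Node('sklearn.model_selection._search.RandomizedSearchCV', [pipe])

print(flow_name(search))
```

With this recursion, StandardScaler and AdaBoostClassifier each appear exactly once in the resulting name.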

coveralls · 2016-09-12T08:56:32Z

Coverage increased (+0.3%) to 89.81% when pulling 6bec1fa on feature/upload-flow into a407b75 on develop.

coveralls · 2016-09-12T14:06:19Z

Coverage increased (+0.3%) to 89.817% when pulling 7e6a545 on feature/upload-flow into a407b75 on develop.

coveralls · 2016-09-15T13:52:13Z

Coverage increased (+0.3%) to 89.817% when pulling 604d01e on feature/upload-flow into a407b75 on develop.

mfeurer · 2016-09-15T13:54:51Z

No idea why the PR check fails. It's at least not the fault of this PR.

coveralls · 2016-09-20T16:14:04Z

Coverage increased (+0.3%) to 89.861% when pulling 4c7673f on feature/upload-flow into a407b75 on develop.

mfeurer · 2017-01-26T19:17:29Z

Just for reference, it worked this afternoon (https://travis-ci.org/openml/openml-python/builds/195473597), so I assume the problem is on the OpenML side.

Regarding the pipeline, I can see the issue and we'd need to add the step name to the name. Will be doing this in a few minutes, okay?
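Why the step name matters: if a name only lists the sub-flows, two pipelines that use the same classes under switched step names end up with the same flow name. A sketch, assuming steps are given as (step_name, sub_flow_name) pairs; the step=SubFlow format is made up for this illustration and is not necessarily the format the PR uses:

```python
def name_without_steps(class_name, steps):
    """Only the sub-flow names: the step identifiers are lost."""
    return '%s(%s)' % (class_name, ','.join(sub for _, sub in steps))


def name_with_steps(class_name, steps):
    """Prefix each sub-flow with its step identifier."""
    return '%s(%s)' % (class_name,
                       ','.join('%s=%s' % (step, sub) for step, sub in steps))


# Same classes in the same positions, but the step names are switched.
a = [('first', 'StandardScaler'), ('second', 'DecisionTreeClassifier')]
b = [('second', 'StandardScaler'), ('first', 'DecisionTreeClassifier')]

print(name_without_steps('Pipeline', a) == name_without_steps('Pipeline', b))  # True
print(name_with_steps('Pipeline', a))  # Pipeline(first=StandardScaler,second=DecisionTreeClassifier)
print(name_with_steps('Pipeline', a) == name_with_steps('Pipeline', b))        # False
```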

amueller · 2017-01-26T19:18:15Z

No rush on my side ;) I'll be awake longer than you are, I imagine.

amueller

First batch of comments, not done ;)

amueller · 2017-01-26T17:45:06Z

openml/util.py

@@ -5,6 +5,7 @@
 else:
    from urllib.error import URLError

+import six


Why import it if it's not used here?

amueller · 2017-01-26T17:46:42Z

tests/__init__.py

@@ -0,0 +1,3 @@
+# Dummy to allow mock classes in the test files to have a version number for
+# their parent module
+__version__ = '0.1'


I would probably prefer mocking but it's ok for now.
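A possible way to do the mocking suggested here, as a sketch (Python 3, where unittest.mock is in the standard library): register a throwaway module that carries a __version__ in sys.modules for the duration of a test with mock.patch.dict. The module name fake_learn and both helpers are hypothetical:

```python
import sys
import types
from unittest import mock


def make_fake_module(name, version):
    """Create an otherwise empty module object that only carries __version__."""
    module = types.ModuleType(name)
    module.__version__ = version
    return module


def version_of(module_name):
    """Import a module by name and return its __version__."""
    __import__(module_name)
    return sys.modules[module_name].__version__


fake = make_fake_module('fake_learn', '0.1')
with mock.patch.dict(sys.modules, {'fake_learn': fake}):
    print(version_of('fake_learn'))  # 0.1

# patch.dict restores sys.modules afterwards, so the fake module is gone again.
print('fake_learn' in sys.modules)  # False
```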

amueller · 2017-01-26T17:46:50Z

tests/flows/dummy_learn/__init__.py

@@ -0,0 +1 @@
+__version__ = 1.0


Why is that here? Is dummy learn a fake learning library for testing?

When I executed the tests locally under Python 3, the module was imported as dummy_module.dummy_forest, but on Travis CI as tests.flows.dummy_learn.dummy_forest. I have changed the imports now, so this is no longer needed.

amueller · 2017-01-26T17:49:18Z

openml/testing.py

@@ -42,12 +42,15 @@ def setUp(self):
        self.cached = True
        # amueller's read/write key that he will throw away later
        openml.config.apikey = "610344db6388d9ba34f6db45a3cf71de"
-        #openml.config.server = "http://capa.win.tue.nl/api/v1/xml"
-        openml.config.server = "https://test.openml.org/api/v1/xml"
+        self.production_server = "https://www.openml.org/api/v1/xml"


Shouldn't this be openml.config.server?

Yes, thanks!

amueller · 2017-01-26T17:49:51Z

tests/flows/dummy_learn/dummy_forest.py

+        return {}
+
+    def set_params(self, params):
+        return None


I feel it would be better to return self if possible.
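For reference, the scikit-learn convention is set_params(**params) returning self, which allows chaining calls. A sketch of how a dummy estimator could follow it; the class name DummyForest and its single parameter are hypothetical:

```python
class DummyForest(object):
    """Hypothetical dummy estimator following the scikit-learn params convention."""

    def __init__(self, n_estimators=10):
        self.n_estimators = n_estimators

    def get_params(self, deep=True):
        return {'n_estimators': self.n_estimators}

    def set_params(self, **params):
        for key, value in params.items():
            if key not in self.get_params():
                raise ValueError('Invalid parameter %s' % key)
            setattr(self, key, value)
        # Returning self allows calls such as est.set_params(...).fit(...)
        return self


forest = DummyForest().set_params(n_estimators=100)
print(forest.n_estimators)  # 100
```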

amueller · 2017-01-26T19:31:25Z

openml/flows/sklearn_converter.py

+                    # Add the component to the list of components, add a
+                    # component reference as a placeholder to the list of
+                    # parameters, which will be replaced by the real component
+                    # when deserealizing the parameter


Old typo ;)
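The placeholder scheme described in that code comment can be sketched as follows. Assumptions: the marker key 'oml-python:serialized_object' is taken from a later diff in this thread, while the value 'component_reference' and the helper names are hypothetical:

```python
from collections import OrderedDict

# Marker key as it appears in the converter diff; the 'component_reference'
# value is an assumption made for this sketch.
MARKER = 'oml-python:serialized_object'


def split_components(params, is_component):
    """Move sub-components into their own dict, leaving placeholders behind."""
    parameters, components = OrderedDict(), OrderedDict()
    for name, value in params.items():
        if is_component(value):
            components[name] = value
            parameters[name] = {MARKER: 'component_reference', 'value': name}
        else:
            parameters[name] = value
    return parameters, components


def restore_components(parameters, components):
    """Replace each placeholder by the real component when deserializing."""
    restored = OrderedDict()
    for name, value in parameters.items():
        if isinstance(value, dict) and value.get(MARKER) == 'component_reference':
            restored[name] = components[value['value']]
        else:
            restored[name] = value
    return restored


class Tree(object):
    """Hypothetical sub-component."""


tree = Tree()
params = OrderedDict([('n_estimators', 50), ('base_estimator', tree)])
parameters, components = split_components(params, lambda v: isinstance(v, Tree))
print(parameters['base_estimator']['value'])                                 # base_estimator
print(restore_components(parameters, components)['base_estimator'] is tree)  # True
```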

amueller · 2017-01-26T19:34:15Z

openml/flows/sklearn_converter.py

+    else:
+        name = class_name
+
+    # Get the external versions of all sub-components


This should probably be a function. This function is too long already.

Extracted a function here, as well as for checking that a component is not used multiple times in a flow.
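A sketch of a check that a component is not used multiple times in a flow. It assumes that each flow exposes its sub-flows through a components mapping (the diffs in the following comments iterate over visitee.components.values()). The Node class, the helper name, and the choice to compare object identity rather than names are all assumptions of this sketch:

```python
from collections import OrderedDict


class Node(object):
    """Hypothetical stand-in for a flow with named sub-components."""

    def __init__(self, name, components=()):
        self.name = name
        self.components = OrderedDict(components)


def check_no_duplicate_components(flow):
    """Raise ValueError if one component object appears twice in the flow tree."""
    seen = set()
    to_visit = list(flow.components.values())
    while to_visit:
        component = to_visit.pop()
        # Identity, not equality: two separate but identical objects are allowed here.
        if id(component) in seen:
            raise ValueError('Component %s is used more than once' % component.name)
        seen.add(id(component))
        to_visit.extend(component.components.values())


scaler = Node('StandardScaler')
check_no_duplicate_components(
    Node('Pipeline', [('scaler', scaler), ('clf', Node('SVC'))]))  # passes
try:
    check_no_duplicate_components(
        Node('FeatureUnion', [('a', scaler), ('b', scaler)]))
except ValueError as exc:
    print(exc)  # Component StandardScaler is used more than once
```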

amueller · 2017-01-26T19:35:18Z

openml/flows/sklearn_converter.py

+    to_visit_stack.extend(sub_components.values())
+    while len(to_visit_stack) > 0:
+        visitee = to_visit_stack.pop()
+        for external_version in visitee.external_version.split(','):


So this is a recursion into subcomponents that have already been constructed and which have lists of external versions, right? Maybe a comment?

amueller · 2017-01-26T19:36:41Z

openml/flows/sklearn_converter.py

+        visitee = to_visit_stack.pop()
+        for external_version in visitee.external_version.split(','):
+            external_versions.add(external_version)
+        to_visit_stack.extend(visitee.components.values())


Why is this necessary if visitee already has an external_version string that we just parsed? That contains all the external versions of all subcomponents, right?

You're right. I also changed the comment I added based on your comment above.

amueller · 2017-01-26T19:36:50Z

openml/flows/sklearn_converter.py

+        for external_version in visitee.external_version.split(','):
+            external_versions.add(external_version)
+        to_visit_stack.extend(visitee.components.values())
+    external_versions = list(sorted(external_versions))


Shouldn't we make sure they are unique?

By sorting it, it becomes unique, right? Or did I miss something?
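The two review points above can be sketched in a few lines. Uniqueness comes from collecting the versions into a set (the diff's external_versions.add(...) suggests one), while sorted only makes the order deterministic. Since each direct sub-component's external_version already covers its own children, no further recursion is needed. This assumes external_version is a comma-separated string of package==version entries; the helper name is hypothetical:

```python
def merge_external_versions(own_versions, sub_component_versions):
    """Hypothetical helper: combine a flow's own versions with its sub-components'.

    Each item of sub_component_versions is the comma-separated external_version
    string of a *direct* sub-component, which already covers its own children.
    """
    versions = set(own_versions)              # the set removes duplicates
    for external_version in sub_component_versions:
        versions.update(external_version.split(','))
    return ','.join(sorted(versions))         # sorting only fixes the order


print(merge_external_versions(
    ['sklearn==0.18.1'],
    ['sklearn==0.18.1', 'dummy_learn==1.0,sklearn==0.18.1']))
# dummy_learn==1.0,sklearn==0.18.1

# Sorting alone keeps duplicates:
print(sorted(['sklearn==0.18.1', 'sklearn==0.18.1']))
# ['sklearn==0.18.1', 'sklearn==0.18.1']
```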

joaquinvanschoren · 2017-01-26T19:39:37Z

I fixed some settings

this works https://test.openml.org/api/v1/data/1?api_key=xxx
this too http://test.openml.org/api/v1/data/1?api_key=xxx
This works: http://capa.win.tue.nl/api/v1/json/data/1
This does not https://capa.win.tue.nl/api/v1/json/data/1
That’s because capa.win.tue.nl is not added to the SSL certificate yet (change has been requested)

amueller

First batch of comments.

amueller · 2017-01-26T19:41:20Z

@joaquinvanschoren great that allows us to run the tests by just changing the url of the test server.

amueller

Some coverage comments.

amueller · 2017-01-26T19:50:39Z

openml/flows/flow.py

@@ -105,7 +377,7 @@ def _ensure_flow_exists(self):
        """
        import sklearn
        flow_version = 'sklearn_' + sklearn.__version__
-        _, _, flow_id = _check_flow_exists(self._get_name(), flow_version)
+        _, _, flow_id = _check_flow_exists(self.name, flow_version)


I get an error in the tests in line 384: publish returns self, which is not iterable...

This is indeed a bug, but on trying to write a test which covers this function I uncovered a bug on OpenML...
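The failure mode can be reproduced in isolation: unpacking the return value of a method that returns self raises a TypeError unless the object is iterable. A sketch with a hypothetical Flow class (not the real OpenMLFlow) and a made-up id; the fix shown is to read an attribute instead of unpacking:

```python
class Flow(object):
    """Hypothetical stand-in: publish() stores the server id and returns self."""

    def __init__(self):
        self.flow_id = None

    def publish(self):
        self.flow_id = 42  # pretend the server assigned this id
        return self


try:
    _, _, flow_id = Flow().publish()   # self is not iterable
except TypeError as exc:
    print('TypeError:', exc)

flow_id = Flow().publish().flow_id     # read the attribute instead
print(flow_id)  # 42
```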

amueller · 2017-01-26T19:51:09Z

openml/flows/flow.py

@@ -117,8 +389,42 @@ def _ensure_flow_exists(self):

        return int(flow_id)


This line doesn't seem to be covered.

amueller · 2017-01-26T19:54:04Z

openml/flows/sklearn_converter.py

+
+    if isinstance(o, dict):
+        if 'oml:name' in o and 'oml:description' in o:
+            # TODO check if this code is actually called


It's not. At least not in the tests.

amueller · 2017-01-26T19:54:34Z

openml/flows/sklearn_converter.py

+    elif isinstance(o, (list, tuple)):
+        rval = [flow_to_sklearn(element, **kwargs) for element in o]
+        if isinstance(o, tuple):
+            rval = tuple(rval)


not covered.

amueller · 2017-01-26T19:55:04Z

openml/flows/sklearn_converter.py

+                # in the brackets) as the identifier
+                pos = identifier.find('(')
+                if pos >= 0:
+                    identifier = identifier[:pos]


not covered

This piece of code would actually have been a bug if it was triggered. Removed.

amueller · 2017-01-26T19:55:49Z

openml/flows/sklearn_converter.py

+
+        # Replace the component placeholder by the actual flow
+        if isinstance(rval, dict) and 'oml-python:serialized_object' in rval:
+            parameter_name, step = rval['value'].split('__')


Not covered?!

This is apparently already done in flow_to_sklearn. Deleting this now.

mfeurer · 2017-01-26T20:01:57Z

@joaquinvanschoren thanks, the tests work again.

@amueller I just added a fix for the name collision.

There is one test failing right now, I will now take care of it.

amueller

I think this is about as good as I can do. I didn't go through all the details, but if you address at least the main comments, I think we're good.

amueller · 2017-01-26T20:22:33Z

openml/flows/sklearn_converter.py

+        rval = None
+    elif isinstance(o, six.string_types):
+        rval = o
+    elif isinstance(o, (bool, int, float)):


Could the three cases here be handled together, as I suggested elsewhere?
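One way to fold the three scalar cases into a single branch, while keeping the list/tuple branch (reported as not covered earlier) easy to test. A sketch for Python 3, where six.string_types is just str; deserialize is a hypothetical simplification of the type dispatch, not the converter's full function:

```python
def deserialize(o):
    """Hypothetical, simplified version of the type dispatch (Python 3)."""
    # None, strings and plain scalars are returned unchanged in one branch.
    if o is None or isinstance(o, (str, bool, int, float)):
        return o
    if isinstance(o, (list, tuple)):
        rval = [deserialize(element) for element in o]
        # Keep the container type: tuples come back as tuples.
        return tuple(rval) if isinstance(o, tuple) else rval
    raise TypeError('Cannot deserialize %r' % (o,))


print(deserialize((1, [None, 'a'], 2.5)))  # (1, [None, 'a'], 2.5)
```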

amueller · 2017-01-26T20:24:24Z

openml/flows/sklearn_converter.py

+            # Steps in a pipeline or feature union
+            parameter_value = list()
+            for sub_component_tuple in rval:
+                identifier, sub_component = sub_component_tuple


Maybe put the body of this loop, or the whole loop, in a function? There are a lot of indentation levels and variables to keep track of.

I extracted the whole loop that gets the model information into its own function. This makes _serialize_model() easier to read. I also added notes on how to restructure the code further in case someone has to touch it again.

amueller · 2017-01-26T20:27:02Z

tests/flows/test_sklearn.py

+        # different value, it is still correct as it is a propagation of the
+        # subclasses' module name
+        self.assertIn(flow.external_version,
+                      ['dummy_learn==1.0,sklearn==0.18.1',


You're still hard-coding the sklearn version here... Could you import sklearn and get the version from it?

Thanks, fixed.
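A sketch of building the expected string from the installed versions instead of hard-coding them. In the real test one would pass the imported dummy_learn and sklearn modules; here types.ModuleType stand-ins keep the example runnable without scikit-learn, and the helper name is hypothetical:

```python
import types


def expected_external_version(*modules):
    """Hypothetical helper: build 'name==version,...' from imported modules."""
    return ','.join(sorted('%s==%s' % (m.__name__, m.__version__) for m in modules))


# Stand-ins so the sketch runs without scikit-learn; in the real test these
# would be the imported dummy_learn and sklearn modules.
dummy_learn = types.ModuleType('dummy_learn')
dummy_learn.__version__ = 1.0          # the dummy package in this PR used a float
fake_sklearn = types.ModuleType('sklearn')
fake_sklearn.__version__ = '0.18.1'

print(expected_external_version(fake_sklearn, dummy_learn))
# dummy_learn==1.0,sklearn==0.18.1
```

With scikit-learn installed, expected_external_version(dummy_learn, sklearn) then follows whatever version is present.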

mfeurer · 2017-01-26T20:31:06Z

Thanks a lot! I'll do my best to improve the code :)

amueller · 2017-01-26T20:46:56Z

I'm happy to look again later today (though I imagine you want to sleep at some point) or next week.
If you want to merge today, that's fine, but please open issues for the things I pointed out (like the bug in OpenML that you found).

The PR is really big, and I think doing iterative improvements after the merge is gonna be easier than trying to get everything right now.

mfeurer · 2017-01-26T21:03:14Z

Yep, I'll have to stop now. But instead of sleeping I need to prepare a presentation...

amueller · 2017-01-26T21:04:04Z

good luck with that :)

joaquinvanschoren · 2017-01-26T21:05:45Z

Good luck and thanks!

-- Thank you, Joaquin

mfeurer · 2017-01-27T15:06:28Z

@joaquinvanschoren the only test that currently fails is openml/OpenML#360.

mfeurer · 2017-01-27T16:33:40Z

Okay, I tackled all of @amueller 's issues and from my side this is ready. Waiting for a fix on OpenML.org to make sure all unittests are fine.

joaquinvanschoren · 2017-01-28T00:11:00Z

I added the run_details field and answered the issue about the '=' in the url.

mfeurer · 2017-01-30T20:39:52Z

Thanks @amueller, @joaquinvanschoren and @janvanrijn for getting this done :)

joaquinvanschoren · 2017-01-31T00:16:55Z

👍 🎉 🎉 🎉 Awesome! Beers on me the next time we see each other!

mfeurer added 7 commits September 1, 2016 15:16

MAINT split task and flow files into separate files

c40edf1

ADD WIP conversion from sklearn model to OpenMLFlow

4882cda

Merge branch 'develop' into feature/upload-flow

beabca0

Merge branch 'develop' into feature/upload-flow

6a1660b

ADD get_flow, serialization, deserialization of flow

5f78c73

MAINT use scikit-learn 0.18 (master) temporarily

5a0750a

mfeurer added 2 commits September 8, 2016 15:29

ADD sklearn flow serialization and deserialization

c4f73e1

FIX serialize each parameter as json

7cd5741

This was referenced Sep 9, 2016

can't download flow #112

Closed

fix flow specification! #77

Closed

mfeurer changed the title ~~WIP Feature/upload flow~~ Feature/upload flow Sep 9, 2016

FIX work around missing JSONDecodeError in python 2.7/3.4

bd0175a

mfeurer added 2 commits September 9, 2016 23:50

MAINT add docstrings and comments

d09fe4f

MAINT hardcode production server for unittests

6bec1fa

FIX/ENH naming problem, add more asserts to unit tests

7e6a545

mfeurer mentioned this pull request Sep 14, 2016

Can't submit BaggingClassifier with base_estimator param #122

Closed

FIX serialize list of integers which are argument to sklearn model

604d01e

FIX allow feature union to contain None as a preprocessing step

4c7673f

amueller reviewed Jan 26, 2017

FIX feature union with switched names

45c7bc8

amueller reviewed Jan 26, 2017

TEST fix unittest for python2.7

34243ac

amueller reviewed Jan 26, 2017

FIX adapt _check_flow_exists and test

3a90409

MAINT remove unused code

413aaae

mfeurer added 2 commits January 27, 2017 16:03

FIX _check_flow_exists

bbf6379

MAINT improve code upon Andreas' suggestions

c23c00b

MAINT improve code based on Andreas' suggestions

815c259

FIX _check_flow_exists

146a42a

mfeurer merged commit 31bf79e into develop Jan 30, 2017

mfeurer deleted the feature/upload-flow branch January 30, 2017 20:39

mfeurer mentioned this pull request Feb 1, 2017

OpenMLFlow._ensure_flow_exists does not work together with OpenMLFlow.publish() #173

Closed
