Smartimputer #213

janvanrijn · 2017-03-21T10:33:12Z

Added support for data features
Added utils.preprocessor.ConditionalImputer
Much more ...

mfeurer

Also, I think that the improved imputer should live in the benchmark study repository.

mfeurer · 2017-03-23T15:56:49Z

openml/datasets/data_feature.py

+       ----------
+       index : int
+            The index of this feature
+        name : string


str instead of string.

mfeurer · 2017-03-23T15:56:56Z

openml/datasets/data_feature.py

+            The index of this feature
+        name : string
+            Name of the feature
+        data_type : string


mfeurer · 2017-03-23T15:58:20Z

openml/datasets/data_feature.py

+    LEGAL_DATA_TYPES = ['nominal', 'numeric', 'string', 'date']
+
+    def __init__(self, index, name, data_type, nominal_values, number_missing_values):
+        assert type(index) is int, "Index is of wrong datatype"


You should use if statements here; assert statements can be turned off by the user.
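
A minimal sketch of what this could look like. `check_index` is a hypothetical helper for illustration, not the actual `__init__` code, and it uses `isinstance` (which also accepts `bool`) instead of the `type(index) is int` comparison from the diff:

```python
def check_index(index):
    """Validate a feature index without relying on assert.

    Hypothetical helper for illustration; not part of openml-python.
    """
    # An explicit check still runs under `python -O`, which strips asserts.
    if not isinstance(index, int):
        raise TypeError(
            "Index is of wrong datatype: %s" % type(index).__name__
        )
    return index
```

The same pattern would apply to the other arguments of `__init__`.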

mfeurer · 2017-03-23T15:59:11Z

openml/datasets/dataset.py

+        if isinstance(ignore_attribute, str):
+            self.ignore_attributes = [ignore_attribute]
+        elif isinstance(ignore_attribute, list):
+            self.ignore_attributes = ignore_attribute


There should be an else to make sure that we don't introduce any weird bugs here.

Agreed, added a value error
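
For illustration, the branch with the added error could look roughly like this. `normalize_ignore_attribute` is a hypothetical standalone function, and any handling of `None` that the real constructor may do is left out:

```python
def normalize_ignore_attribute(ignore_attribute):
    """Return ignore_attribute as a list of attribute names.

    Hypothetical helper for illustration; not part of openml-python.
    """
    if isinstance(ignore_attribute, str):
        return [ignore_attribute]
    elif isinstance(ignore_attribute, list):
        return ignore_attribute
    else:
        # Fail loudly instead of silently leaving the attribute unset.
        raise ValueError(
            "ignore_attribute must be a str or a list, got %s"
            % type(ignore_attribute).__name__
        )
```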

mfeurer · 2017-03-23T16:03:34Z

openml/datasets/dataset.py

+                                            xmlfeature['oml:data_type'],
+                                            None, #todo add nominal values (currently not in database)
+                                            int(xmlfeature['oml:number_of_missing_values']))
+                assert idx == feature.index, "Data features not provided in right order"


Should be an if + exception.

mfeurer · 2017-03-23T16:30:51Z

openml/runs/functions.py

+            the label that was predicted
+        predicted_probabilities : array (size=num_classes)
+            probabilities per class
+        class_labels : array (size=num_classes)


model_classes_mapping is not in the docstring; it's hard to tell what this function does.

mfeurer · 2017-03-23T16:36:12Z

openml/runs/functions.py

+                    model_classes = model.best_estimator_.classes_
+                else:
+                    model_classes = model.classes_
+            except AttributeError as e:


Thinking about this, I don't think we should catch anything here, especially since attribute regressors may be able to work on classification tasks.

It all depends on whether we want openml-python to upload runs with client errors or block them all.
In the former case, we should catch the error. In the latter case, we should not.
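
A sketch of the "block them all" variant, under the assumption that the lookup is pulled into a hypothetical helper `get_model_classes`. It uses `getattr` rather than the branch shown in the diff:

```python
def get_model_classes(model):
    """Return the class labels of a fitted classifier.

    Hypothetical helper for illustration; not part of openml-python.
    """
    # Search wrappers such as GridSearchCV expose the refitted model as
    # best_estimator_; plain estimators are used directly.
    estimator = getattr(model, "best_estimator_", model)
    # No try/except: a model without classes_ raises AttributeError here,
    # so the run fails on the client instead of being uploaded.
    return estimator.classes_
```

The "upload with client errors" variant would wrap the call in a `try`/`except AttributeError` block and record the message with the run.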

mfeurer · 2017-03-23T16:47:07Z

tests/test_flows/test_flow.py

@@ -153,6 +153,23 @@ def test_publish_flow(self):
        flow.publish()
        self.assertIsInstance(flow.flow_id, int)

+    def test_semi_legal_flow(self):


What exactly is tested here? OpenML should reject this flow, because it contains the bagging classifier twice.

While that might be the case, it contains two distinguishable forms of bagging.

Bagging(Bagging(J48))

Bagging(J48)

Therefore, OpenML will be able to set the parameters of the individual components correctly for any run, so there is no problem.
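
A toy illustration of why the two forms are distinguishable. The `(name, child)` tuple model and `component_names` are assumptions made for this sketch and are not how OpenML represents or validates flows:

```python
def component_names(flow):
    """List the nested name of a flow followed by those of its sub-components.

    A flow is modelled as (name, child), where child is another flow or None.
    Toy model for illustration only.
    """
    name, child = flow
    if child is None:
        return [name]
    child_names = component_names(child)
    # The first entry of child_names is the child's own nested name.
    return ["%s(%s)" % (name, child_names[0])] + child_names


flow = ("Bagging", ("Bagging", ("J48", None)))
names = component_names(flow)
# Every component has a distinct nested name, even though "Bagging" occurs twice.
all_distinct = len(set(names)) == len(names)
```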

mfeurer · 2017-03-23T16:48:04Z

tests/test_flows/test_flow.py

+
+        flow.publish()
+
+    def test_illegal_flow(self):


Could you please add a docstring explaining why exactly this is illegal? Someone not too familiar with OpenML might not know about this.

mfeurer · 2017-03-23T16:49:49Z

tests/test_runs/test_run_functions.py

+        rep_no = 0
+        # TODO use different iterator to only provide a single iterator (less
+        # methods, less maintenance, less confusion)
+        for rep in task.iterate_repeats():


I'm not sure what exactly this code is doing here. In the end, you only want to test _prediction_to_row, right?

Actually, I wanted to test this in the context of _run_task_get_arffcontent without publishing the run. Adjusted.

janvanrijn added 24 commits March 5, 2017 16:49

Added support for data features,

b8bb34b

added the conditional imputer (for our benchmark algorithms), setup for function that discriminates between feature types

finished conditional imputer with tests

7d24294

fix for cache error in unit tests

0df8c52

fixing unit tests

bb524e7

fix unit tests (by allowing features to be "None", as before)

676b560

made imputer add constant instead for removed columns

09f6ff4

added testcase

added additional testcases to conditional imputer

6138c52

fixed problem with 'empty' result sets (e.g., dataset anneal misses class '3', which can now be handled)

e5b23ed

made dataset.get_features_of_type aware of features that will be removed

d0e0638

added complicated testcase

07eab1a

adapted dataset testcases to new assumption that ignore_attribute is a list

ad36d3b

make unit tests work again

d6d87ff

fix row id attribute mask

2c07bba

added conditional imputer to init (otherwise an error is raised)

5d7324e

same as prev

bc8895b

removed typo from flow listings,

b1eaf7b

bugfix conditional imputer

added semi legal and illegal flow testcases

71ec3fc

travis fix?

dbb3b58

travis fix?

567c12b

travis fix!!

bd5fdb1

added sentinel for failing test and redirected testcase to live

72af80e

remove magical import

6ebbd13

updated conditionalimputer to same codebase as sklearn

82fb7b2

adapted test accordingly

for unit test

7236528

mfeurer requested changes Mar 23, 2017

janvanrijn added 5 commits March 24, 2017 22:15

made python unit tests work with new test server setup

f9bf4f2

requested changes for pullrequest #213

9ec141c

changed string type checking

7de99ff

fixed string check with python 2 compatible code

900676a

correct string checking

905951f

mfeurer and others added 4 commits March 27, 2017 15:01

MAINT remove Jans conditional imputer

f05bcd7

MAINT improve unit test

be63814

changed doctype of get_features_by_type

1abf093

do not propagate error messages to openml server

6853392

mfeurer approved these changes Mar 27, 2017

mfeurer added 2 commits March 27, 2017 15:58

FIX replace linear regression by logistic regression

b3063f4

FIX last commit/do not push errors to server

b11d5a5

mfeurer merged commit b3262b6 into develop Mar 27, 2017

mfeurer deleted the smartimputer branch March 27, 2017 15:11
