fix madlib-434 #123

Closed
wants to merge 4 commits into
from

Conversation

Projects
None yet
2 participants
Member

renyi533 commented Mar 7, 2012

Currently, the decision tree algorithm returns the result table of {Id, Class, Possibility}. For the column of 'Class', it contains the encoded information.

From the point of end users, the 'Class' column should contain information before encoding.

This pull request fix this issue.

renyi533 added some commits Mar 7, 2012

Member

renyi533 commented Mar 9, 2012

This pull request includes two commits.

The second commit only make it compatible with pull request 120.

If pull request 120 is rejected, please just accept the first commit.

If pull request 120 is accepted before this one, please merge both of the two commits.

@renyi533 renyi533 Revert "With pull request 120, we add schema name to meta table. So w…
…e do not need to add it manually. This new change is dependent on 120."

This reverts commit f94beda.
1a62270
Owner

renyi533 commented on 1a62270 Apr 17, 2012

Since pull request 120 should not be accepted, I revert the second commit here.

@haradh1 haradh1 commented on the diff Apr 18, 2012

methods/decision_tree/src/pg_gp/decision_tree.sql_in
- EXECUTE 'ALTER TABLE '||result_table_name||' DROP COLUMN leaf_id;';
+
+ metatable_name = MADLIB_SCHEMA.__get_metatable_name( tree_table_name );
+ -- we do not store schema name to the table, add it here.
+ metatable_name = 'MADLIB_SCHEMA.'||metatable_name;
+ -- Get the metatable storing the encoding information of class.
+ EXECUTE 'SELECT column_name,table_name FROM '||metatable_name||
+ ' WHERE column_type=''c'' LIMIT 1;'
+ INTO result_rec;
+ result_rec.table_name = 'MADLIB_SCHEMA.'||result_rec.table_name;
+
+ -- translate the encoded class information back
+ EXECUTE 'CREATE TABLE '||result_table_name||' AS SELECT n.id, m.'||result_rec.column_name
+ ||' as class,n.prob from '||temp_result_table||' n,'||result_rec.table_name
+ ||' m where n.class=m.key m4_ifdef(`__GREENPLUM__', `DISTRIBUTED BY (id)');';
+
EXECUTE 'DROP TABLE IF EXISTS ' || encoded_table_name || ';';
@haradh1

haradh1 Apr 18, 2012

Member

What is the encoded_table?? Isn't it result_table?

@renyi533

renyi533 Apr 19, 2012

Member

We preprocess/encode the original training table and get the encoded table, which is just a temporary table. It is not result table.

@haradh1

haradh1 Apr 19, 2012

Member

The returned value of "classify_internal" is just confusing. There's no description anywhere.

@renyi533

renyi533 Apr 19, 2012

Member

We can change or add comments in code refactoring. I think this issue is not closely related with this defect.

Let's resolve it in a separate thread. OK?

@haradh1

haradh1 Apr 19, 2012

Member

Yes, this is off topic.

Member

haradh1 commented Apr 18, 2012

I think it's good to go, at least for this issue.

Member

haradh1 commented Apr 21, 2012

Is the comment for the Class column in the classify function still valid?

Member

renyi533 commented Apr 22, 2012

The comment of "class INT - Predicted class." is not valid.

It should be "class (The same type as the class column in training set) - Predicted class.".

I cannot access my machine now. I will modify next Monday.

Member

renyi533 commented Apr 23, 2012

I just modified the comments. Please push in the change.

Member

haradh1 commented Apr 23, 2012

Thanks. Closing.

haradh1 closed this Apr 23, 2012

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment