[Design] Intermediate Representation by tonyyang-svail · Pull Request #785 · sql-machine-learning/sqlflow

tonyyang-svail · 2019-09-06T00:56:05Z

No description provided.

typhoonzero

I agree that we should have an IR now; we still need to look into the details of the IR struct.

typhoonzero · 2019-09-06T06:33:38Z

+	ExtraSelect map[string]string      // e.g. {"validation": "select * from iris.val;"}
+	Estimator   string                 // e.g. "DNNClassifier"
+	Attribute   map[string]interface{} // e.g. {"train.epoch": 1000, "model.hidden_units": [10 10]}
+	Feature     map[string]FieldMeta   // e.g. {"sepal_length": {"float", "", [1], false}, ...}


Feature only has information about how to parse the column data; we still need information about how to construct a feature column on it.

Thanks. I added FeatureColumn field in FieldMeta.
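To make the exchange above concrete, here is a minimal sketch of what a FieldMeta carrying both parsing information and a feature column hint might look like. Only IsSparse appears by name in the reviewed diff; DType, Delimiter, and Shape are inferred from the example comment `{"float", "", [1], false}`, and the string-typed FeatureColumn field and the `describe` helper are assumptions made for illustration, not the definitions from the design doc.

```go
package main

import "fmt"

// FieldMeta describes how to parse one column and, in this sketch, which
// feature column to build on top of it. Field names other than IsSparse
// are assumed; FeatureColumn as a plain string is a simplification.
type FieldMeta struct {
	DType         string // e.g. "float"
	Delimiter     string // e.g. "" for a dense column
	Shape         []int  // e.g. [1]
	IsSparse      bool   // e.g. false
	FeatureColumn string // assumed: e.g. "numeric_column"
}

// describe is a hypothetical helper that renders a one-line summary a code
// generator might log for a column.
func describe(name string, m FieldMeta) string {
	return fmt.Sprintf("%s: parse as %s%v, build %s", name, m.DType, m.Shape, m.FeatureColumn)
}

func main() {
	feature := map[string]FieldMeta{
		"sepal_length": {DType: "float", Shape: []int{1}, FeatureColumn: "numeric_column"},
	}
	fmt.Println(describe("sepal_length", feature["sepal_length"]))
}
```

With a field like this, a code generator would no longer need to guess the feature column from the parsing metadata alone.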

typhoonzero · 2019-09-06T06:42:45Z

+	Select      string                 // e.g. "select * from iris.train"
+	ExtraSelect map[string]string      // e.g. {"validation": "select * from iris.val;"}
+	Estimator   string                 // e.g. "DNNClassifier"
+	Attribute   map[string]interface{} // e.g. {"train.epoch": 1000, "model.hidden_units": [10 10]}


#774 has some of my thoughts about refactoring attribute resolution.

@typhoonzero Thanks for the link. I think Attribute map[string]interface{} here is fine.

I believe AttrMeta belongs to the code generation package, since different machine learning toolkits accept different attributes.

> I believe AttrMeta belongs to the code generation package, since different machine learning toolkits accept different attributes.

It does. Each submitter's code generation package defines its own AttrMeta and calls a function to get an Attribute map[string]interface{}.
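The division of labor described above could be sketched roughly as follows. The AttrMeta shape (a default plus a checker), the `resolve` function, and the `positiveInt` checker are all hypothetical names chosen for this illustration; the actual design is discussed in #774 and may differ.

```go
package main

import "fmt"

// AttrMeta is a hypothetical per-toolkit attribute description: a default
// value plus an optional checker for user-supplied values.
type AttrMeta struct {
	Default interface{}
	Check   func(v interface{}) error
}

// positiveInt is an example checker that accepts only positive ints.
func positiveInt(v interface{}) error {
	n, ok := v.(int)
	if !ok || n <= 0 {
		return fmt.Errorf("want a positive int, got %v", v)
	}
	return nil
}

// resolve starts from the defaults, then overlays the user's attributes,
// rejecting unknown names and values that fail their checker.
func resolve(metas map[string]AttrMeta, user map[string]interface{}) (map[string]interface{}, error) {
	out := make(map[string]interface{}, len(metas))
	for name, m := range metas {
		out[name] = m.Default
	}
	for name, v := range user {
		m, ok := metas[name]
		if !ok {
			return nil, fmt.Errorf("unsupported attribute %q", name)
		}
		if m.Check != nil {
			if err := m.Check(v); err != nil {
				return nil, fmt.Errorf("attribute %q: %v", name, err)
			}
		}
		out[name] = v
	}
	return out, nil
}

func main() {
	// Each code generation package would own a table like this one.
	metas := map[string]AttrMeta{
		"train.epoch":      {Default: 1, Check: positiveInt},
		"train.batch_size": {Default: 32, Check: positiveInt}, // assumed attribute name
	}
	attrs, err := resolve(metas, map[string]interface{}{"train.epoch": 1000})
	fmt.Println(attrs, err)
}
```

Under this arrangement the IR itself stays toolkit-agnostic: it only carries the resolved map, while the knowledge of which attributes are legal lives with each code generator.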

weiguoz · 2019-09-06T07:06:07Z

+	ExtraConfig string                 // Extra configuration in JSON format. e.g. OSS credential
+	Select      string                 // e.g. "select * from iris.train"
+	Estimator   string                 // e.g. "DNNClassifier"
+	Attribute   map[string]interface{} // e.g. {"predict.batch_size": 32}


Attribute(s)?

I think singular is fine. The map type already indicates there are multiple attributes. The same reasoning applies to Feature.

Yancey0623 · 2019-09-06T09:01:04Z

+	IsSparse  bool   // e.g. false
+}
+
+// TrainIR is the intermediate representation for code generation of a training job


Maybe we don't need to implement an IR for each job. How about simplifying it like this:

type FeatureMeta struct {
	DType     string
	Delimiter string
	...
}

type DBConn struct {
	Driver string
	User   string
	....
}

type ClauseIR struct {
	Estimator    string
	SelectClause string
	Attributes   map[string]interface{}
	DBConn       DBConn
	Features     map[string]FeatureMeta
	...
}

Each generator can extend the ClauseIR as needed.

@Yancey1989 Thanks for the suggestion. Combining all three IRs into a single ClauseIR does save some code. However, I still advocate using separate IRs for different job types. Here is my reasoning.

Avoid confusion. The developer of xgboost.Predict would be confused by the ValidationSelect field in ClauseIR. Also, as we add more features to SQLFlow, more fields would be added to ClauseIR, and the confusion would increase.

Less work. We distinguish the job type either in sql or in codegen. However, there are many codegens and only one sql. Distinguishing the job type in sql saves work in all codegens.
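The argument for separate IRs can be illustrated with a small sketch. The struct fields below are a subset taken from the names mentioned in this review; the `Codegen` interface, the `generate` dispatcher, the `xgboost` type, and the "xgboost.gbtree" estimator string are assumptions for illustration and not the actual SQLFlow code.

```go
package main

import "fmt"

// TrainIR and PredictIR are assumed shapes. Note that PredictIR has no
// ValidationSelect field, so a predict code generator never sees it.
type TrainIR struct {
	Select           string
	ValidationSelect string
	Estimator        string
	Attribute        map[string]interface{}
}

type PredictIR struct {
	Select    string
	Estimator string
	Attribute map[string]interface{}
}

// Codegen is a hypothetical interface each toolkit would implement.
type Codegen interface {
	Train(ir TrainIR) (string, error)
	Predict(ir PredictIR) (string, error)
}

// xgboost is a stand-in code generator that only emits a summary line.
type xgboost struct{}

func (xgboost) Train(ir TrainIR) (string, error) {
	return fmt.Sprintf("train %s on [%s], validate on [%s]", ir.Estimator, ir.Select, ir.ValidationSelect), nil
}

func (xgboost) Predict(ir PredictIR) (string, error) {
	return fmt.Sprintf("predict with %s on [%s]", ir.Estimator, ir.Select), nil
}

// generate stands in for the single place (the sql side) that tells job
// types apart, so individual code generators do not have to.
func generate(cg Codegen, ir interface{}) (string, error) {
	switch v := ir.(type) {
	case TrainIR:
		return cg.Train(v)
	case PredictIR:
		return cg.Predict(v)
	default:
		return "", fmt.Errorf("unknown IR type %T", ir)
	}
}

func main() {
	out, err := generate(xgboost{}, PredictIR{Select: "select * from iris.test", Estimator: "xgboost.gbtree"})
	fmt.Println(out, err)
}
```

Adding another toolkit in this sketch means implementing `Codegen` once more; the job-type switch is not repeated.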

typhoonzero

LGTM generally

weiguoz

LGTM

Yancey0623

LGTM

tonyyang-svail added 2 commits September 5, 2019 17:55

[Design] Intermediate Representation

3f6f824

Update design_intermediate_representation.md

9952f30

tonyyang-svail changed the title ~~[wip] Intermediate Representation~~ [Design] Intermediate Representation Sep 6, 2019

weiguoz reviewed Sep 6, 2019

Comment thread doc/design_intermediate_representation.md Outdated

typhoonzero reviewed Sep 6, 2019

weiguoz reviewed Sep 6, 2019

Comment thread doc/design_intermediate_representation.md Outdated

weiguoz reviewed Sep 6, 2019

Yancey0623 reviewed Sep 6, 2019

tonyyang-svail added 4 commits September 6, 2019 11:16

follow comments

09f3fe5

Update design_intermediate_representation.md

1f04ed6

Update design_intermediate_representation.md

ed49d4b

polish

445dc56

typhoonzero approved these changes Sep 8, 2019

weiguoz self-requested a review September 9, 2019 00:57

weiguoz approved these changes Sep 9, 2019

Yancey0623 approved these changes Sep 9, 2019

tonyyang-svail merged commit 663c9a1 into develop Sep 9, 2019

terrytangyuan deleted the design-intermediate-representation branch September 9, 2019 01:27

tonyyang-svail mentioned this pull request Sep 10, 2019

[Intermediate Representation] XGBoost codegen refactor TODO list #805

Closed

weiguoz mentioned this pull request Oct 10, 2019

[Intermediate Representation] Analysis codegen refactor TODO list #978

Closed

3 tasks
