doc/design/clustermodel.md
6 additions & 6 deletions
@@ -22,13 +22,13 @@ The figure below demonstrates the overall workflow for cluster model training, w

 In this scenario, we focus on the extraction of data patterns in unsupervised learning.

-So, the user can use `TRAIN` keyword to training a model. The user can also specify the training hyper-parameters with the keyword `WITH` and determine whether to use pre-trained model by `USING`. The training and predicting syntax looks like:
+So, the user can use `TO TRAIN` keyword to training a model. The user can also specify the training hyper-parameters with the keyword `WITH` and determine whether to use pre-trained model by `USING`. The training and predicting syntax looks like:

-TRAIN SQL:
+TO TRAIN SQL:

 ```sql
 SELECT * FROM input_table
-TRAIN clusterModel
+TO TRAIN clusterModel
 WITH
 model.encode_units = [100, 7]
 model.n_clusters = 5
@@ -38,12 +38,12 @@ USING existed_pretrain_model
 INTO my_cluster_model;
 ```

-PREDICT SQL:
+TO PREDICT SQL:

 ```sql
 SELECT *
 FROM input_table
-PREDICT output_table.group_id
+TO PREDICT output_table.group_id
 USING my_cluster_model;
 ```
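Both statements touched by this change have the same shape: an ordinary `SELECT` followed by a clause that now begins with `TO TRAIN` or `TO PREDICT`. As a rough illustration of where that boundary falls, the sketch below splits a statement at the first such keyword. The helper `split_extended` and its regular expression are hypothetical and written only for this note; they are not SQLFlow's parser, and a plain regex like this would be fooled by the keyword appearing inside a string literal or comment.

```python
import re

# Hypothetical helper for illustration only; SQLFlow's real parser is not shown here.
# \b keeps "INTO TRAIN..." from matching, because the "TO" in "INTO" is not word-initial.
_CLAUSE = re.compile(r"\bTO\s+(TRAIN|PREDICT)\b", re.IGNORECASE)


def split_extended(statement):
    """Return (select_part, extended_clause) for an extended SELECT statement.

    extended_clause is None when there is no TO TRAIN / TO PREDICT clause.
    """
    text = statement.strip().rstrip(";")
    match = _CLAUSE.search(text)
    if match is None:
        return text, None
    return text[:match.start()].strip(), text[match.start():].strip()


# The prediction statement from the diff above.
predict_sql = """
SELECT *
FROM input_table
TO PREDICT output_table.group_id
USING my_cluster_model;
"""

select_part, clause = split_extended(predict_sql)
print(select_part)
print(clause)
```

A statement without either keyword comes back unchanged with `None` as the second element, which mirrors the case of a plain `SELECT` that needs no extra handling.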
@@ -108,7 +108,7 @@ Therefore, there are four cases in total:

 - In the first stage of the clustering model on SQLFlow, we plan to achieve the `first case`. We will achieve the other cases in the later.

-- Users can use the trained cluster model in `PREDICT SQL` to predict the group of input_table to get output_table.
+- Users can use the trained cluster model in `TO PREDICT SQL` to predict the group of input_table to get output_table.

 - Finally, the user can perform a combined aggregation operation on the output_table based on the SQL statement to obtain a result_table, which can be saved to the local dataframe and then analyzed according to his own needs.
-The basic idea of SQLFlow is to extend the SELECT statement of SQL to have the TRAIN and PREDICT clauses. For more discussion, please refer to the [syntax design](syntax.md). SQLFlow translates such "extended SQL statements" into submitter programs, which forward the part from SELECT to TRAIN or PREDICT, which we call the "standard part", to the SQL engine. SQLFlow also accepts the SELECT statement without TRAIN or PREDICT clauses and would forward such "standard statements" to the engine. It is noticeable that the "standard part" or "standard statements" are not standardized. For example, various engines use different syntax for `FULL OUTER JOIN`.
+The basic idea of SQLFlow is to extend the SELECT statement of SQL to have the TRAIN and PREDICT clauses. For more discussion, please refer to the [syntax design](syntax.md). SQLFlow translates such "extended SQL statements" into submitter programs, which forward the part from SELECT to TO TRAIN or TO PREDICT, which we call the "standard part", to the SQL engine. SQLFlow also accepts the SELECT statement without TO TRAIN or TO PREDICT clauses and would forward such "standard statements" to the engine. It is noticeable that the "standard part" or "standard statements" are not standardized. For example, various engines use different syntax for `FULL OUTER JOIN`.

 - Hive supports `FULL OUTER JOIN` directly.
 - MySQL doesn't have `FULL OUTER JOIN`. However, a user can emulates `FULL OUTER JOIN` using `LEFT JOIN`, `UNION` and `RIGHT JOIN`.
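
The MySQL bullet above describes emulating `FULL OUTER JOIN` by combining two one-sided joins with `UNION`. The sketch below shows that pattern on two made-up tables, `a` and `b`, using Python's built-in `sqlite3` module so it runs without a MySQL server. One substitution is made and should be read as an assumption: the second branch is written as `b LEFT JOIN a` rather than `a RIGHT JOIN b`, which selects the same rows, because older SQLite builds may not accept `RIGHT JOIN`.

```python
import sqlite3

# Hypothetical tables for illustration; ids 1 and 3 each exist on one side only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (id INTEGER, x TEXT);
CREATE TABLE b (id INTEGER, y TEXT);
INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

# First branch: every row of a, with b's columns NULL where there is no match.
# Second branch: every row of b, with a's columns NULL where there is no match
# (in MySQL this branch would be `a RIGHT JOIN b`).
# UNION, unlike UNION ALL, removes the matched rows that both branches produce.
emulated_full_outer_join = """
SELECT a.id AS a_id, a.x, b.id AS b_id, b.y
FROM a LEFT JOIN b ON a.id = b.id
UNION
SELECT a.id AS a_id, a.x, b.id AS b_id, b.y
FROM b LEFT JOIN a ON a.id = b.id
"""

rows = set(conn.execute(emulated_full_outer_join).fetchall())
for row in sorted(rows, key=repr):
    print(row)
conn.close()
```

Because `UNION` deduplicates, this emulation also collapses rows that were genuinely identical in the inputs, so it matches a true `FULL OUTER JOIN` only when the joined rows are distinct.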