
Fix the table view issue in data transform design doc. (#1538)

* Fix the format issue for the table view

* Update the issue link
brightcoder01 committed Nov 29, 2019
1 parent 44ed1aa commit dc48009fb573922de5a9a363d3e524a579ded92e
Showing with 3 additions and 2 deletions.
  1. +3 −2 docs/designs/data_transform.md
@@ -41,7 +41,8 @@ In the Analyze step, we will parse the TRANSFORM expression and collect the stat
In the feature column generation step, we will format the feature column template with the variable name and the statistical values to get the complete feature column definition for the transform logic.
The generated feature column definitions will be passed to the next couler step: model training. We combine them with the COLUMN expression to generate the final feature column definitions and then pass them to the model. Take **NUMERIC(STANDARDIZE(age))** for example: the final definition will be **numeric_column('age', normalizer_fn=lambda x: (x - 18.0) / 6.0)**

-We plan to implement the following commonly used transform APIs as a first step, and will add more as further requirements arise.
+We plan to implement the following commonly used transform APIs as a first step, and will add more as further requirements arise.
+
| Name           | Feature Column Template                                                  | Analyzer     |
|:--------------:|:------------------------------------------------------------------------:|:------------:|
| STANDARDIZE(x) | numeric_column({var_name}, normalizer_fn=lambda x: (x - {mean}) / {std}) | MEAN, STDDEV |
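
As a sketch of the feature column generation step, the STANDARDIZE template from the table above can be filled with the analyzer results (MEAN, STDDEV) via plain string formatting. The template string, the helper name, and the example statistics (mean 18.0, std 6.0, matching the age example) are illustrative, not taken from the codebase; note the parentheses around `x - {mean}`, which the standardization formula requires.

```python
# Hypothetical sketch of the feature column generation step: format the
# STANDARDIZE template with the variable name and the analyzer's statistics.
TEMPLATE = "numeric_column('{var_name}', normalizer_fn=lambda x: (x - {mean}) / {std})"

def generate_feature_column(var_name, mean, std):
    # Fill the template placeholders with the collected statistical values.
    return TEMPLATE.format(var_name=var_name, mean=mean, std=std)

definition = generate_feature_column("age", 18.0, 6.0)
print(definition)
# The resulting normalizer maps a raw value to its z-score:
standardize = lambda x: (x - 18.0) / 6.0
print(standardize(24.0))  # (24.0 - 18.0) / 6.0 = 1.0
```

The generated string would then be combined with the COLUMN expression and handed to the model-training step, as described above.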
@@ -60,5 +61,5 @@ This solution can bring the following benefits:

We need to figure out the following points for this solution:

-1. Model Export: Upgrade the Keras API to support exporting the transform logic and the model definition together to a SavedModel for inference. [Issue](https://github.com/tensorflow/transform/issues/150)
+1. Model Export: Upgrade the Keras API to support exporting the transform logic and the model definition together to a SavedModel for inference. [Issue](https://github.com/tensorflow/tensorflow/issues/34618)
2. Transform Execution: We will transform the data records one by one using the transform logic in the SavedModel and then write them to a new table. We also need to write a JAR that packages the TensorFlow library, loads the SavedModel into memory, and processes the input data. We then register it as a UDF in Hive or MaxCompute and use it to transform the data.
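
The record-by-record execution pattern in point 2 can be sketched as follows. This is a minimal Python illustration of the UDF flow only: in production the UDF would be the Java JAR described above, loading an actual TensorFlow SavedModel, so the loader and the z-score transform here are hypothetical stand-ins.

```python
# Minimal sketch of the row-by-row transform pattern described above.
def load_transform_fn():
    # Stand-in for loading the SavedModel's transform logic into memory;
    # a real UDF would load the exported model via the TensorFlow API.
    return lambda record: {"age": (record["age"] - 18.0) / 6.0}

def transform_table(rows):
    transform = load_transform_fn()  # load the model once, reuse per record
    # Transform the data records one by one, producing rows for the new table.
    return [transform(row) for row in rows]

new_table = transform_table([{"age": 24.0}, {"age": 18.0}])
print(new_table)
```

Loading the model once and applying it per record mirrors how a registered Hive or MaxCompute UDF would be invoked over each input row.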
