Support run extendedSQL for hive #319

weiguoz · 2019-04-29T13:20:23Z

TODO:

need more tests

hungry1526 · 2019-05-04T05:19:34Z

Dockerfile

@@ -4,7 +4,7 @@ RUN apt-get update
 RUN apt-get install -y python3-pip

 RUN pip3 install --upgrade pip
-RUN pip3 install tensorflow mysql-connector-python pyhive jupyter sqlflow
+RUN pip3 install tensorflow mysql-connector-python thrift pyhive jupyter sqlflow
 # Fix jupyter server "connecting to kernel" problem
 # https://github.com/jupyter/notebook/issues/2664#issuecomment-468954423
 RUN pip3 install tornado==4.5.3


Is it also possible to incorporate this line?

typhoonzero · 2019-05-08T14:23:27Z

sql/codegen.go

+	return &columnType{n, t}
+}
+
+func translateColumnType(ct *columnType) columnType {


How about using a constant map to define the type translation, like:

var column2feature = map[string]string{
	"FLOAT":      "numeric_column",
	"FLOAT_TYPE": "numeric_column", // for hive only
}
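A minimal sketch of how such a map could be consulted during code generation. The two entries come from the suggestion above; the helper `featureColumnFor`, its case-insensitive lookup, and its error message are assumptions made for illustration, not code from this PR.

```go
package main

import (
	"fmt"
	"strings"
)

// column2feature maps a database column type to a feature column name.
// Only the two entries from the review comment are listed here.
var column2feature = map[string]string{
	"FLOAT":      "numeric_column",
	"FLOAT_TYPE": "numeric_column", // for hive only
}

// featureColumnFor is a hypothetical helper. It upper-cases the type
// name before the lookup and returns an error for unknown types, so an
// unsupported type fails early instead of producing broken output.
func featureColumnFor(dbType string) (string, error) {
	fc, ok := column2feature[strings.ToUpper(dbType)]
	if !ok {
		return "", fmt.Errorf("unsupported column type %q", dbType)
	}
	return fc, nil
}

func main() {
	for _, t := range []string{"FLOAT", "float_type", "VARCHAR"} {
		fc, err := featureColumnFor(t)
		fmt.Println(t, fc, err)
	}
}
```

Keeping the translation in one table means supporting another engine only requires new entries, not another branch in the generator.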

weiguoz · 2019-05-12T08:14:47Z

Apologies, this PR is a bit complicated. Most of the commits here add support for running extendedSQL on Hive.
In detail, it contains:

Improve database compatibility, since type names and field names differ between SQL engines. See issue 318: describeTable.
Change the Hive DB-API from dropbox/pyhive to cloudera/impyla in codegen, to avoid this executemany bug.

typhoonzero · 2019-05-13T02:06:55Z

sqlfs/writer.go

+			return fmt.Errorf("flush to %s, error:%v", w.table, e)
+		}
+		w.buf = w.buf[:0]
+		w.flushID++


Just curious: if we write multiple versions of the model parameters to the DB (with many flushIDs), how do we determine which model to use when running prediction?

Are there already checks that ensure we always write parameters of the same model? For example, what if a user trains a DNN model first, then changes the model type to LR without changing the name of the table where the model parameters are saved?

We are considering using independent storage media to save models. Let's keep on discussing in that thread.
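For readers following the diff above, here is a rough sketch of a buffered writer that flushes fixed-size chunks and increments a flush ID after each one. It assumes the flush ID numbers successive chunks of a single byte stream, which this thread does not confirm. The type `chunkWriter`, the in-memory `rows` slice standing in for the database table, and `readAll` are all hypothetical. It does not address the model-version question raised above.

```go
package main

import (
	"bytes"
	"fmt"
)

// block is one flushed chunk; id records the order in which it was flushed.
type block struct {
	id   int
	data []byte
}

// chunkWriter is a hypothetical in-memory stand-in for the writer in the diff.
type chunkWriter struct {
	buf     []byte
	bufSize int
	flushID int
	rows    []block // stands in for the database table
}

// newChunkWriter returns a writer that flushes every size bytes.
func newChunkWriter(size int) *chunkWriter {
	if size < 1 {
		size = 1 // avoid a zero-capacity buffer
	}
	return &chunkWriter{buf: make([]byte, 0, size), bufSize: size}
}

// Write copies p into the buffer and flushes each time the buffer fills.
func (w *chunkWriter) Write(p []byte) (int, error) {
	n := 0
	for len(p) > 0 {
		take := w.bufSize - len(w.buf)
		if take > len(p) {
			take = len(p)
		}
		w.buf = append(w.buf, p[:take]...)
		p = p[take:]
		n += take
		if len(w.buf) == w.bufSize {
			w.flush()
		}
	}
	return n, nil
}

// flush stores a copy of the buffer as a new row, then resets the buffer
// and increments flushID, mirroring the two lines shown in the diff.
func (w *chunkWriter) flush() {
	if len(w.buf) == 0 {
		return
	}
	w.rows = append(w.rows, block{id: w.flushID, data: append([]byte(nil), w.buf...)})
	w.buf = w.buf[:0]
	w.flushID++
}

// Close flushes any remaining partial chunk.
func (w *chunkWriter) Close() error {
	w.flush()
	return nil
}

// readAll concatenates the chunks; rows were appended in id order.
func readAll(rows []block) []byte {
	var out bytes.Buffer
	for _, b := range rows {
		out.Write(b.data)
	}
	return out.Bytes()
}

func main() {
	w := newChunkWriter(4)
	w.Write([]byte("model-params"))
	w.Close()
	fmt.Println(len(w.rows), string(readAll(w.rows)))
}
```

Under this assumption, a reader reassembles the data by concatenating chunks in ID order, so telling apart separately trained models would need some other key.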

typhoonzero · 2019-05-13T02:13:06Z

sqlfs/table.go

+	// HIVE and ODPS don't support AUTO_INCREMENT
+	// Hive and ODPS don't support BLOB, use BINARY instead
+	var stmt string
+	if driver == "mysql" || driver == "sqlite3" {


Maybe we need a DriverType type to unify these checks and avoid coding mistakes such as comparing "Hive" with "hive".

Good catch.
This will be fixed in Unify DriverType name (#374).
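One possible shape for such a type, as a sketch only. The names `DriverType`, `ParseDriverType`, and `blockColumnType` are hypothetical and are not taken from the follow-up issue. The BLOB/BINARY choice follows the comment in the diff above. ODPS is left out because its driver name does not appear in this thread.

```go
package main

import (
	"fmt"
	"strings"
)

// DriverType names a supported SQL engine. Comparing typed constants
// instead of raw strings avoids mismatches such as "Hive" vs "hive".
type DriverType string

const (
	MySQL   DriverType = "mysql"
	SQLite3 DriverType = "sqlite3"
	Hive    DriverType = "hive"
)

// ParseDriverType normalizes a user-supplied driver name once, at the
// boundary, so the rest of the code only sees the constants above.
func ParseDriverType(name string) (DriverType, error) {
	switch d := DriverType(strings.ToLower(strings.TrimSpace(name))); d {
	case MySQL, SQLite3, Hive:
		return d, nil
	default:
		return "", fmt.Errorf("unknown driver %q", name)
	}
}

// blockColumnType picks the column type for binary data: BLOB for
// MySQL and SQLite3, BINARY otherwise, as the diff comment describes.
func blockColumnType(d DriverType) string {
	if d == MySQL || d == SQLite3 {
		return "BLOB"
	}
	return "BINARY"
}

func main() {
	for _, name := range []string{"mysql", "Hive", "oracle"} {
		d, err := ParseDriverType(name)
		if err != nil {
			fmt.Println(err)
			continue
		}
		fmt.Println(d, blockColumnType(d))
	}
}
```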

tonyyang-svail

LGTM. Excellent job!

weiguoz added 3 commits April 29, 2019 14:44

fix field name in describeTable; add type(float_type in hive) mapping

55f76f8

add hive dependencies for python; fix bug

fb8bedb

change columnType struct

8236bc1

weiguoz requested review from wangkuiyi and tonyyang-svail April 29, 2019 13:20

weiguoz force-pushed the unify_dbschema_format branch from 4e115c0 to 3e9660e Compare April 29, 2019 13:26

add driver for sqlf

b889cd1

weiguoz force-pushed the unify_dbschema_format branch from 3e9660e to b889cd1 Compare April 30, 2019 09:43

hungry1526 reviewed May 4, 2019

weiguoz added 5 commits May 6, 2019 11:00

reduce exposed function

971fd78

change stmt to db operations

4e4b345

recover test

dc7c1d9

add test: write to hive

12fcfa1

write models to hive

b9de07a

typhoonzero reviewed May 8, 2019

weiguoz added 5 commits May 10, 2019 21:06

change dbapi from pyhive to impyla

37dde20

recover test

8febaef

Merge branch 'develop' into unify_dbschema_format

4034e7a

fix test if db-api exists

3948c1a

follow PR368, fix ci

30f3f0c

weiguoz changed the title ~~[WIP] Hold "go-describeTable" to codegen~~ Support run extendedSQL for hive May 12, 2019

weiguoz requested review from typhoonzero and hungry1526 May 12, 2019 08:15

weiguoz added feature gohive labels May 12, 2019

weiguoz self-assigned this May 12, 2019

weiguoz added 2 commits May 12, 2019 17:04

Merge branch 'develop' into unify_dbschema_format

7911027

dbg ci

875fb4b

typhoonzero reviewed May 13, 2019

typhoonzero mentioned this pull request May 13, 2019

Support saving model for multiple times and choose a best one when predicting #373

Open

tonyyang-svail reviewed May 13, 2019

Dockerfile.dev (outdated, resolved)

sql/codegen.go (outdated, resolved)

typhoonzero mentioned this pull request May 13, 2019

Unify DriverType name #374

Closed

tonyyang-svail reviewed May 13, 2019

sql/codegen.go (resolved)

sql/verifier.go (outdated, resolved)

sqlfs/writer.go (outdated, resolved)

weiguoz added 4 commits May 13, 2019 14:44

impyla does not rely on thrift, just remove it

6a539c2

add comments

f960027

redefine fieldTypes struct

58ff03e

Strictly type check for codegen

12e741a

tonyyang-svail previously approved these changes May 13, 2019

sql/codegen.go (outdated, resolved)

sqlfs/writer.go (outdated, resolved)

weiguoz mentioned this pull request May 13, 2019

Use independent storage media to save models #377

Closed

follow comments

0792094

weiguoz dismissed tonyyang-svail’s stale review via 0792094 May 13, 2019 09:23

weiguoz requested review from tonyyang-svail and typhoonzero May 13, 2019 09:43

tonyyang-svail approved these changes May 13, 2019

weiguoz merged commit 99cfd80 into sql-machine-learning:develop May 13, 2019

weiguoz deleted the unify_dbschema_format branch May 13, 2019 11:03

tonyyang-svail mentioned this pull request Sep 25, 2019

Add comments to make TravisCI config comprehensive #929

Merged
