Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support run extendedSQL for hive #319

Merged
merged 21 commits into from
May 13, 2019

Conversation

weiguoz
Copy link
Collaborator

@weiguoz weiguoz commented Apr 29, 2019

fix #318

TODO:

  • need more tests

Dockerfile Outdated
@@ -4,7 +4,7 @@ RUN apt-get update
RUN apt-get install -y python3-pip

RUN pip3 install --upgrade pip
RUN pip3 install tensorflow mysql-connector-python pyhive jupyter sqlflow
RUN pip3 install tensorflow mysql-connector-python thrift pyhive jupyter sqlflow
# Fix jupyter server "connecting to kernel" problem
# https://github.com/jupyter/notebook/issues/2664#issuecomment-468954423
RUN pip3 install tornado==4.5.3
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it also possible to incorporate this line?

sql/codegen.go Outdated
return &columnType{n, t}
}

func translateColumnType(ct *columnType) columnType {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

how about use a constant map to define the type transform, like:

var column2feature = map[string]string{
  "FLOAT":            "numeric_column",
  "FLOAT_TYPE" : "numeric_column"   // for hive only
}

sql/codegen.go Outdated Show resolved Hide resolved
@weiguoz
Copy link
Collaborator Author

weiguoz commented May 12, 2019

Apologize this pr is a bit complicated. Most commits here support run extendedSQL for hive.
It contains as below in details:

  1. Enhance database compatible, like type and field name are as different as SQL engines. issue 318:describeTable
  2. Change hive dbapi from dropbox/pyhive to cloudera/impyla in codegen, to avoid this executemany bug

@weiguoz weiguoz changed the title [WIP] Hold "go-describeTable" to codegen Support run extendedSQL for hive May 12, 2019
@weiguoz weiguoz self-assigned this May 12, 2019
return fmt.Errorf("flush to %s, error:%v", w.table, e)
}
w.buf = w.buf[:0]
w.flushID++
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just curious that if we are writing multiple versions of model parameters to DB ( have many flushIDs) then how to determine which model to use when running prediction?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there already checks that can enure allways writing parameters of the same model? Like if a user trained a DNN model first, then he change the model type to LR but didn't change the table name where we save the model parameters?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are considering using independent storage media to save models. Let's keep on discussing in that thread.

// HIVE and ODPS don't support AUTO_INCREMENT
// Hive and ODPS don't support BLOB, use BINARY instead
var stmt string
if driver == "mysql" || driver == "sqlite3" {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we need DriverType type to unify these checks in order to avoid things like coding mistakes like "Hive" == "hive"

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch.
This will be fixed in Unify DriverType name.

Dockerfile.dev Outdated Show resolved Hide resolved
sql/codegen.go Outdated Show resolved Hide resolved
sql/codegen.go Show resolved Hide resolved
sql/verifier.go Outdated Show resolved Hide resolved
sqlfs/writer.go Outdated Show resolved Hide resolved
tonyyang-svail
tonyyang-svail previously approved these changes May 13, 2019
sql/codegen.go Outdated Show resolved Hide resolved
sqlfs/writer.go Outdated Show resolved Hide resolved
Copy link
Collaborator

@tonyyang-svail tonyyang-svail left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Excellent job!

@weiguoz weiguoz merged commit 99cfd80 into sql-machine-learning:develop May 13, 2019
@weiguoz weiguoz deleted the unify_dbschema_format branch May 13, 2019 11:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Hold "go-describeTable" to codegen
4 participants