[Discussion] Remove some global variables in attribute.go #2493

@sneaxiy

This issue further addresses: sql-machine-learning/playground#42 (comment)

Where are the global variables

  • Exported global maps which store the parameter docs of models, together with the JSON constants they are loaded from, including:

    var PremadeModelParamsDocs map[string]map[string]string
    var extractDocStringsOnce sync.Once
    // OptimizerParamsDocs stores parameters and documents of optimizers
    var OptimizerParamsDocs map[string]map[string]string
    // XGBoostObjectiveDocs stores options for xgboost objective
    var XGBoostObjectiveDocs map[string]string

    const ModelParameterJSON = `
    {
    "DNNClassifier": {

    const OptimizerParameterJSON = `
    {
    "Adadelta": {

    const XGBoostObjectiveJSON = `
    {
    "binary:hinge": "hinge loss for binary classification. This makes predictions of 0 or 1, rather than producing probabilities.",

  • Exported parameter type definitions, including

    var (
    // Bool indicates that the corresponding attribute is a boolean
    Bool = reflect.TypeOf(true)
    // Int indicates that the corresponding attribute is an integer
    Int = reflect.TypeOf(0)
    // Float indicates that the corresponding attribute is a float32
    Float = reflect.TypeOf(float32(0.))
    // String indicates the corresponding attribute is a string
    String = reflect.TypeOf("")
    // IntList indicates the corresponding attribute is a list of integers
    IntList = reflect.TypeOf([]int{})
    // Unknown type indicates that the attribute type is dynamically determined.
    Unknown = reflect.Type(nil)
    )

How to remove the exported global map which stores the parameter docs of models

Some of these global variables share the same information. For example:

  • PremadeModelParamsDocs is deserialized from ModelParameterJSON, so we can remove ModelParameterJSON. Furthermore, ModelParameterJSON is a constant that is auto-generated by the command python extract_docstring.py > model_parameters.go. We can run this generation through go generate and have it emit PremadeModelParamsDocs directly.

  • OptimizerParamsDocs is deserialized from OptimizerParameterJSON, so we can remove OptimizerParameterJSON. OptimizerParameterJSON is also auto-generated by python extract_docstring.py > model_parameters.go, so the same go generate step can emit OptimizerParamsDocs directly.

  • XGBoostObjectiveDocs is deserialized from XGBoostObjectiveJSON, so we can remove XGBoostObjectiveJSON and keep XGBoostObjectiveDocs.

Note that PremadeModelParamsDocs, OptimizerParamsDocs and XGBoostObjectiveDocs are also used for CLI prompt suggestions (see 1, 2, 3), so we can neither remove nor hide these 3 variables.
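To illustrate what "generate the map directly" could look like, here is a minimal sketch of a generator step that turns the JSON text into Go source for a map literal. The actual extraction is done by extract_docstring.py; the function name renderDocsMap and the sample doc string are hypothetical and only serve the illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
	"strings"
)

// renderDocsMap (hypothetical helper) converts a JSON object of the form
// {"Model": {"param": "doc"}} into Go source declaring the same data as a
// map literal, so no JSON constant has to be parsed at run time.
func renderDocsMap(varName, jsonText string) (string, error) {
	var docs map[string]map[string]string
	if err := json.Unmarshal([]byte(jsonText), &docs); err != nil {
		return "", err
	}

	// Sort keys so that repeated generation produces identical output.
	models := make([]string, 0, len(docs))
	for m := range docs {
		models = append(models, m)
	}
	sort.Strings(models)

	var b strings.Builder
	fmt.Fprintf(&b, "var %s = map[string]map[string]string{\n", varName)
	for _, m := range models {
		params := make([]string, 0, len(docs[m]))
		for p := range docs[m] {
			params = append(params, p)
		}
		sort.Strings(params)

		fmt.Fprintf(&b, "\t%q: {\n", m)
		for _, p := range params {
			// %q emits a valid, escaped Go string literal.
			fmt.Fprintf(&b, "\t\t%q: %q,\n", p, docs[m][p])
		}
		b.WriteString("\t},\n")
	}
	b.WriteString("}\n")
	return b.String(), nil
}

func main() {
	// The doc string below is a placeholder, not the real extracted text.
	src, err := renderDocsMap("PremadeModelParamsDocs",
		`{"DNNClassifier": {"n_classes": "placeholder doc"}}`)
	if err != nil {
		panic(err)
	}
	fmt.Print(src)
}
```

With output of this shape, the exported map stays available for the CLI prompt suggestions while the JSON constant and the sync.Once deserialization are no longer needed.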

How to remove exported parameter type definitions

sql-machine-learning/playground#42 (comment) suggests strengthening compile-time checking in the following way:

var distributedTrainingAttributes = attribute.Dictionary{}.
	Int("train.num_ps", 0, "", nil).
	Int("train.num_workers", 1, "", nil).
	Int("train.worker_cpu", 400, "", nil)

In this way, attribute.Description, attribute.Int, attribute.Float, etc. can be hidden. The signature of Dictionary.Int would be:

func (d Dictionary) Int(name string, value int, doc string, checker func(int) error) Dictionary
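A minimal sketch of how such a builder could work, assuming Dictionary is a map from attribute names to an unexported description type. The description struct, the Validate method and the nonNegative checker are hypothetical names introduced here for illustration; they are not the existing attribute package API.

```go
package main

import (
	"errors"
	"fmt"
)

// description is a hypothetical, unexported stand-in for attribute.Description.
type description struct {
	defaultValue interface{}
	doc          string
	check        func(value interface{}) error
}

// Dictionary maps attribute names to their descriptions (assumed layout).
type Dictionary map[string]*description

// Int registers an int attribute. Both the default value and the checker are
// statically typed, so passing a string default would fail to compile.
func (d Dictionary) Int(name string, value int, doc string, checker func(int) error) Dictionary {
	d[name] = &description{
		defaultValue: value,
		doc:          doc,
		check: func(v interface{}) error {
			i, ok := v.(int)
			if !ok {
				return fmt.Errorf("attribute %q must be an int, got %T", name, v)
			}
			if checker == nil {
				return nil
			}
			return checker(i)
		},
	}
	return d
}

// Validate checks user-supplied attribute values against the dictionary.
func (d Dictionary) Validate(attrs map[string]interface{}) error {
	for name, v := range attrs {
		desc, ok := d[name]
		if !ok {
			return fmt.Errorf("unsupported attribute %q", name)
		}
		if err := desc.check(v); err != nil {
			return err
		}
	}
	return nil
}

// nonNegative is an illustrative checker.
func nonNegative(v int) error {
	if v < 0 {
		return errors.New("value must be non-negative")
	}
	return nil
}

var distributedTrainingAttributes = Dictionary{}.
	Int("train.num_ps", 0, "", nonNegative).
	Int("train.num_workers", 1, "", nil).
	Int("train.worker_cpu", 400, "", nil)

func main() {
	ok := map[string]interface{}{"train.num_ps": 2}
	bad := map[string]interface{}{"train.num_ps": "2"}
	fmt.Println(distributedTrainingAttributes.Validate(ok))
	fmt.Println(distributedTrainingAttributes.Validate(bad))
}
```

Because Int returns the Dictionary, the calls chain as in the example above, and the reflect.Type values no longer need to be exported.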

The only concern with this approach is that it cannot express a nil default value, and some models have attributes whose default value is nil. For example, the default value of num_class in the XGBoost model is nil (see here), because only multi-class (more than 2 classes) classification models need num_class, while the other models do not. Once num_class is provided in the SQL WITH statement, however, it must be an integer, so the data type of the num_class attribute should still be int. The meaning of a nil default value in SQLFlow is:

  • If the attribute is provided in the WITH statement, SQLFlow checks whether its type is correct and then calls Description.Checker() to check whether the value is valid. For example, if num_class is provided in the SQL WITH statement, SQLFlow first checks whether the value of num_class is an integer, and then calls Description.Checker() to check whether it is a positive number.
  • If the attribute is not provided in the WITH statement, nothing is checked.
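The two rules above can be sketched as follows. This is a simplified illustration, not the SQLFlow implementation: intAttribute, validate and positive are hypothetical names, and the WITH statement is modeled as a plain map.

```go
package main

import (
	"errors"
	"fmt"
)

// intAttribute is a hypothetical description of an int attribute whose
// default value is nil, such as num_class.
type intAttribute struct {
	name    string
	checker func(int) error
}

// validate applies the nil-default rule to the attributes of a WITH statement.
func (a intAttribute) validate(with map[string]interface{}) error {
	v, provided := with[a.name]
	if !provided {
		// Not provided: nothing is checked.
		return nil
	}
	// Provided: first check the type, then run the checker.
	i, ok := v.(int)
	if !ok {
		return fmt.Errorf("attribute %q must be an int, got %T", a.name, v)
	}
	if a.checker == nil {
		return nil
	}
	return a.checker(i)
}

// positive is an illustrative checker for num_class.
func positive(v int) error {
	if v <= 0 {
		return errors.New("value must be positive")
	}
	return nil
}

func main() {
	numClass := intAttribute{name: "num_class", checker: positive}
	fmt.Println(numClass.validate(map[string]interface{}{}))
	fmt.Println(numClass.validate(map[string]interface{}{"num_class": 3}))
	fmt.Println(numClass.validate(map[string]interface{}{"num_class": 0}))
	fmt.Println(numClass.validate(map[string]interface{}{"num_class": "3"}))
}
```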

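We can enhance this approach to support nil default values. One option is to set the default through a separate, chainable Default method, and to treat an attribute without a Default call as having a nil default: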

NewDictionary().Int("num_class", "Number of classes").Default(5)

Since the Default method can be called after both Int(...) and Float(...), its input parameter type must be interface{}. Therefore, the signature of the Default method would be:

func (d Dictionary) Default(interface{}) Dictionary

With this approach, we cannot check at compile time whether the default value has the right type.
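A minimal sketch of this chainable variant, to make the limitation concrete. It assumes the Dictionary remembers the most recently declared attribute; the entry struct, the declare helper and the choice to panic on a mismatch are assumptions made here for illustration.

```go
package main

import (
	"fmt"
	"reflect"
)

// entry is a hypothetical per-attribute record.
type entry struct {
	typ          reflect.Type
	doc          string
	defaultValue interface{} // nil means "no default value"
}

// Dictionary keeps the declared attributes and the name of the last one,
// so that Default knows which attribute it applies to.
type Dictionary struct {
	entries map[string]*entry
	last    string
}

// NewDictionary returns an empty dictionary.
func NewDictionary() Dictionary {
	return Dictionary{entries: map[string]*entry{}}
}

func (d Dictionary) declare(name, doc string, typ reflect.Type) Dictionary {
	d.entries[name] = &entry{typ: typ, doc: doc}
	d.last = name
	return d
}

// Int declares an int attribute with a nil default.
func (d Dictionary) Int(name, doc string) Dictionary {
	return d.declare(name, doc, reflect.TypeOf(0))
}

// Float declares a float32 attribute with a nil default.
func (d Dictionary) Float(name, doc string) Dictionary {
	return d.declare(name, doc, reflect.TypeOf(float32(0)))
}

// Default sets the default of the last declared attribute. The parameter is
// interface{}, so a type mismatch can only be detected at run time.
func (d Dictionary) Default(value interface{}) Dictionary {
	e, ok := d.entries[d.last]
	if !ok {
		panic("Default called before any attribute was declared")
	}
	if reflect.TypeOf(value) != e.typ {
		panic(fmt.Sprintf("default of %q must be %v, got %T", d.last, e.typ, value))
	}
	e.defaultValue = value
	return d
}

func main() {
	d := NewDictionary().Int("num_class", "Number of classes").Default(5)
	fmt.Println(d.entries["num_class"].defaultValue)
}
```

A call such as Int("num_class", "...").Default("5") compiles without complaint and only fails when the dictionary is built.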

  • Add both a Dictionary.Int and a Dictionary.IntOrNil method. The signature of the Dictionary.IntOrNil method would be:
func (d Dictionary) IntOrNil(name string, doc string, checker func(int) error) Dictionary

In this way, we would double the number of methods on Dictionary.

  • Use the optional package so that the Dictionary.Int method accepts both an int default value and a nil default value. The signature of Dictionary.Int would be:
func (d Dictionary) Int(name string, defaultValue optional.Int,
                        doc string, checker func(int) error) Dictionary {
  ...
}

var dict = Dictionary{}.
   Int("num_class", optional.Int{}, "doc1", checker1). // default value is nil
   Int("attr_with_default_value", optional.NewInt(0), "doc2", checker2) // default value is 0
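A minimal, self-contained sketch of this last option. OptionalInt and NewOptionalInt are local stand-ins for the optional.Int and optional.NewInt used above, written out here so the example does not depend on a particular third-party package; intEntry and the map-based Dictionary are likewise assumptions.

```go
package main

import "fmt"

// OptionalInt is a local stand-in for optional.Int: its zero value means
// "no value", which plays the role of a nil default.
type OptionalInt struct {
	value   int
	present bool
}

// NewOptionalInt wraps a concrete default value.
func NewOptionalInt(v int) OptionalInt {
	return OptionalInt{value: v, present: true}
}

// Get returns the value and whether one was set.
func (o OptionalInt) Get() (int, bool) {
	return o.value, o.present
}

// intEntry is a hypothetical record for one int attribute.
type intEntry struct {
	defaultValue OptionalInt
	doc          string
	checker      func(int) error
}

// Dictionary maps attribute names to their entries (assumed layout).
type Dictionary map[string]intEntry

// Int registers an int attribute whose default is either an int or absent.
// The default stays statically typed, so the compiler still rejects a
// string or float default.
func (d Dictionary) Int(name string, defaultValue OptionalInt,
	doc string, checker func(int) error) Dictionary {
	d[name] = intEntry{defaultValue: defaultValue, doc: doc, checker: checker}
	return d
}

func main() {
	dict := Dictionary{}.
		Int("num_class", OptionalInt{}, "doc1", nil).                   // default value is nil
		Int("attr_with_default_value", NewOptionalInt(0), "doc2", nil) // default value is 0

	for _, name := range []string{"num_class", "attr_with_default_value"} {
		v, ok := dict[name].defaultValue.Get()
		fmt.Println(name, v, ok)
	}
}
```

Compared with the IntOrNil alternative, this keeps a single method per type, and it distinguishes "default is 0" from "no default" without falling back to interface{}.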
