Working with non-hardcoded data #141

flexthink · 2018-01-16T04:52:09Z

As of now, I haven't found a way to pass parameters to the data() function, and Hyperas appears to ignore everything in the code except imports. This is because Hyperas generates a new Python file from data(), the model and everything else before it starts training. That works well if you're training on MNIST or another dataset that ships with the framework, or on random data. But what if the dataset is selected from a drop-down or retrieved from a URL? What if you want to run it from a script whose config file specifies the path to the data? What if it needs to read from a database? Is there a way to do this the way Hyperas is currently set up? If not, is anything on the roadmap?


pkairys · 2018-01-27T04:47:52Z

As you pointed out, hyperas is currently a simple wrapper that uses data() and model() as templates: it formats code from them and then executes that code. This means everything data() needs has to be defined inside it, just like in a regular script.
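To see why data() can't rely on anything defined outside it, here is a minimal standard-library sketch of that mechanism. It assumes a simplified wrapper that executes only the function's source text in a fresh namespace (real hyperas also copies the module's imports and does more bookkeeping); `CONFIG_PATH`, `DATA_SOURCE` and `run_copied_source` are hypothetical names.

```python
import textwrap

# Lives only in the original script, like a path read from a config file.
CONFIG_PATH = "train.csv"

# The source text of a data() function, as a wrapper would extract it.
DATA_SOURCE = textwrap.dedent("""
    def data():
        return CONFIG_PATH  # refers to a name defined outside data()
""")


def run_copied_source(source, func_name):
    """Execute source in a fresh namespace and call func_name from there."""
    namespace = {}
    exec(source, namespace)  # nothing from this script is visible inside
    return namespace[func_name]()


def copied_data_sees_config():
    """Return True only if the copied data() can still see CONFIG_PATH."""
    try:
        run_copied_source(DATA_SOURCE, "data")
    except NameError:
        return False  # CONFIG_PATH did not survive the copy
    return True
```

Calling the same data() directly in this script would work, because CONFIG_PATH is a global here; only the copied source loses it. A data() whose body is self-contained, such as `run_copied_source("def data():\n    return 'train.csv'\n", "data")`, runs fine.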

In all of your examples, you basically want to generate new templates that hyperas can call.

For example, let's say you have an application that runs hyperas on a dataset chosen at runtime:

```python
from hyperas import optim

# Template for the generated data() function; {pipeline} must already be
# indented so that it sits inside the function body.
data_template = "def data():\n{pipeline}\n    return x_train, y_train, x_test, y_test\n"

pipelines = {
    'mnist': '    import something\n    # some reshape\n    # some scaling',
    # ...
}


def get_data_func(dset):
    pipeline = pipelines[dset]
    return data_template.format(pipeline=pipeline)


def model(x_train, y_train, x_test, y_test):
    # define model
    return {'loss': -acc,
            # ...
            }


def do_optimize(input_dset):
    data_func_string = get_data_func(input_dset)
    best_run, best_model = optim.minimize(model=model,
                                          data=data_func_string,  # a string: not accepted by hyperas as-is
                                          # ...
                                          )
    return best_run, best_model


if __name__ == '__main__':
    input_dset = input('What dataset do you want to optimize a model for? ')
    best_run, best_model = do_optimize(input_dset)
```

get_data_func('mnist') would return a string like:

```python
def data():
    import something
    # some reshape
    # some scaling
    return x_train, y_train, x_test, y_test
```

Passing a string like this is not currently allowed, but it shouldn't take long to hack in. It is basically a matter of making sure the string's formatting is consistent with what hyperas does internally. The code you'd want to touch is around line 194 or so of the hyperas source.

Something like:

```python
if not isinstance(data, str):
    ...  # existing code around line 194 that builds data_string from the data() function
else:
    data_string = data
```
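That branch can be fleshed out into a small helper. This is a hypothetical sketch, not hyperas code: `resolve_data_source` accepts either a function or a ready-made source string, and checks with `compile()` that the result is valid Python before anything tries to run it.

```python
import inspect
import textwrap
from typing import Callable, Union


def resolve_data_source(data: Union[Callable, str]) -> str:
    """Return data() source code from a function or a pre-built string (hypothetical helper)."""
    if isinstance(data, str):
        # Generated source: use it as given.
        source = data
    elif callable(data):
        # Plain function: read its source (the function must be defined in a file).
        source = inspect.getsource(data)
    else:
        raise TypeError("data must be a function or a source string")
    source = textwrap.dedent(source)
    # Raise SyntaxError here rather than deep inside the generated script.
    compile(source, "<data>", "exec")
    return source
```

Validating early matters with generated strings: an indentation slip in a template otherwise only shows up once the generated file is executed.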

The `data_template` example above is also not how you should template strings here. If you really want to go down that path and need flexibility, I recommend something like Jinja. It may be better to just use regular hyperopt in this situation.
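If you stay with string templates, a standard-library alternative to Jinja is `string.Template` plus `textwrap.indent`, which indents each pipeline for you instead of relying on hand-written `\n` and spaces. This is a sketch under that swapped-in approach; `DATA_TEMPLATE`, `PIPELINES` and `build_data_source` are hypothetical names, and the "toy" pipeline is a placeholder that just builds small lists.

```python
import textwrap
from string import Template

# Skeleton of the generated data() function; $body is filled in per dataset.
DATA_TEMPLATE = Template(
    "def data():\n"
    "$body\n"
    "    return x_train, y_train, x_test, y_test\n"
)

# Pipelines are written flush-left; indentation is added when building.
PIPELINES = {
    "toy": textwrap.dedent("""\
        x_train = [[0.0], [1.0]]
        y_train = [0, 1]
        x_test = [[0.5]]
        y_test = [1]"""),
}


def build_data_source(dataset: str) -> str:
    """Return the source of a data() function for the given dataset name."""
    body = textwrap.indent(PIPELINES[dataset], "    ")
    source = DATA_TEMPLATE.substitute(body=body)
    compile(source, "<data>", "exec")  # fail fast on malformed output
    return source
```

For example, `ns = {}; exec(build_data_source("toy"), ns); ns["data"]()` returns the four placeholder lists. Jinja adds loops and conditionals on top of this; for plain substitution, `string.Template` is enough. Either way, hyperas itself still has to be patched to accept the resulting string.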

Does this help?

dehdari · 2018-03-22T00:02:35Z

Another way is to pickle the arguments for data() to a file and unpickle them inside data(). The file path has to be hardcoded, since data() can't receive it as a parameter. You can do the same for the model() function. Instead of pickling, you can also save the information to a plaintext file.

A simple example:

```python
import argparse
import pickle

from hyperas import optim


def data():
    import argparse  # not strictly needed: pickle imports argparse itself to rebuild the Namespace
    import pickle

    args_file = 'data_args.pkl'  # hardcoded: data() cannot take parameters
    with open(args_file, 'rb') as f:
        args = pickle.load(f)
    (X_train, y_train) = some_file_loader(args.train)  # placeholder for your own loader
    (X_valid, y_valid) = some_file_loader(args.valid)
    return X_train, y_train, X_valid, y_valid


parser = argparse.ArgumentParser()
parser.add_argument('--train', help='Training data file', type=str, required=True)
parser.add_argument('--valid', help='Validation data file', type=str, required=True)
args = parser.parse_args()

args_file = 'data_args.pkl'
with open(args_file, 'wb') as f:
    pickle.dump(args, f)
X_train, y_train, X_valid, y_valid = data()
best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      # ...
                                      )
```
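The plaintext variant mentioned above can be sketched with JSON and only the standard library. Assumptions, all hypothetical: the arguments file lives at a fixed name in the system temp directory, and `load_rows` stands in for `some_file_loader`, reading one comma-separated sample per line with the label last. Note that data() recomputes the path itself rather than reading a module-level constant, because, as discussed in this thread, module-level variables are not carried over into the code hyperas generates.

```python
import json
import os
import tempfile

ARGS_NAME = "data_args.json"  # used by the writer only; data() hardcodes its own copy


def save_data_args(train_path: str, valid_path: str) -> str:
    """Write data() arguments to the shared plaintext file and return its path."""
    args_file = os.path.join(tempfile.gettempdir(), ARGS_NAME)
    with open(args_file, "w") as f:
        json.dump({"train": train_path, "valid": valid_path}, f)
    return args_file


def data():
    # Everything is defined inside data(), since only its source is reused.
    import json
    import os
    import tempfile

    args_file = os.path.join(tempfile.gettempdir(), "data_args.json")  # hardcoded path
    with open(args_file) as f:
        args = json.load(f)

    def load_rows(path):
        # Hypothetical loader: "feature,feature,...,label" on each line.
        xs, ys = [], []
        with open(path) as f:
            for line in f:
                if not line.strip():
                    continue
                *features, label = line.strip().split(",")
                xs.append([float(v) for v in features])
                ys.append(int(label))
        return xs, ys

    x_train, y_train = load_rows(args["train"])
    x_valid, y_valid = load_rows(args["valid"])
    return x_train, y_train, x_valid, y_valid


def demo():
    """Write two tiny CSV files, record their paths, then call data()."""
    tmp = tempfile.mkdtemp()
    train = os.path.join(tmp, "train.csv")
    valid = os.path.join(tmp, "valid.csv")
    with open(train, "w") as f:
        f.write("0.1,0.2,0\n0.3,0.4,1\n")
    with open(valid, "w") as f:
        f.write("0.5,0.6,1\n")
    save_data_args(train, valid)
    return data()
```

JSON keeps the arguments human-readable and editable, which pickle does not. The trade-off of any fixed path is that two runs started at the same time will overwrite each other's arguments.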

maxpumperla · 2018-06-13T08:25:17Z

Not sure I can answer this question any better than @pkairys or @dehdari did. I'll add this to the README for future reference.

virtualdvid · 2019-04-24T06:29:32Z

For future reference, in case someone else runs into this issue, there is a simple way to do it:

We just have to write a function that returns the args:

```python
import argparse

def my_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--train', help='Training data file', type=str, required=True)
    parser.add_argument('--valid', help='Validation data file', type=str, required=True)
    args = parser.parse_args()
    return args
```

Then we can call it in minimize as follows:

```python
best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      functions=[my_args],
                                      # ...
                                      )
```

Then call it inside model:

```python
def model(x_train, x_test, y_train, y_test):
    args = my_args()
    train_file = args.train
    valid_file = args.valid
    # define model
    return {'loss': -acc,
            # ...
            }
```
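This works because `parse_args()` reads `sys.argv` each time it is called, so a function that hyperas copies into its generated code sees the same command line, assuming that code runs in the same Python process as your script. A small variation, sketched below, takes an optional `argv` list so the function can be exercised without a real command line, and uses `parse_known_args()` so that unrelated flags (for example ones a notebook kernel passes) don't cause an error. The `argv` parameter is an addition for illustration, not part of the original answer.

```python
import argparse


def my_args(argv=None):
    """Parse --train/--valid; argv=None makes argparse use sys.argv[1:]."""
    parser = argparse.ArgumentParser()
    parser.add_argument('--train', help='Training data file', type=str, required=True)
    parser.add_argument('--valid', help='Validation data file', type=str, required=True)
    # parse_known_args returns (namespace, leftovers) and ignores flags it doesn't define
    args, _unknown = parser.parse_known_args(argv)
    return args
```

For example, `my_args(['--train', 't.csv', '--valid', 'v.csv', '--extra', '1']).train` is `'t.csv'`; the unknown `--extra 1` ends up in the discarded leftovers. Inside model() you would still call `my_args()` with no arguments.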

maxpumperla closed this as completed in 9c6d2a9 Jun 13, 2018
