Working with non-hardcoded data #141

Closed
flexthink opened this issue Jan 16, 2018 · 4 comments

Comments

@flexthink

So far I haven't found a way to pass parameters to the data() function, and Hyperas appears to ignore everything in the code except imports. This is because Hyperas creates a new Python file out of the data, the model and everything else before attempting to train. That works well if you're training on MNIST or some other dataset that ships with the framework, or on random data. But what if the dataset is selected from a drop-down or retrieved from a URL? What if you want to run it from a script whose config file specifies the path to the data? What if it needs to read from a database? Is there a way to do this with Hyperas as it is currently set up? If not, is there anything on the roadmap?

@pkairys

pkairys commented Jan 27, 2018

As you pointed out, hyperas is currently a simple wrapper that uses data() and model() as templates from which it formats code that it then executes. This means that within data() you would define everything, just like in a regular script.

In all of your examples, you basically want to be able to generate new templates that hyperas can call.
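To make the limitation concrete, here is a minimal sketch of why run-time values get lost when a function is turned into source text and executed elsewhere. It is not hyperas's actual code; the names `data_source` and `dataset_path` are made up for illustration, and `exec` in a fresh namespace merely stands in for the generated file.

```python
import textwrap

# Source text of a data() function, standing in for what a code-generating
# wrapper would write into its temporary script (illustrative only).
data_source = textwrap.dedent("""
    def data():
        # `dataset_path` is not defined anywhere in this source text
        return dataset_path
""")

# A value chosen at run time in the calling process (hypothetical path).
dataset_path = "/tmp/train.csv"

# Execute the source in a fresh namespace, much like running a new file.
namespace = {}
exec(data_source, namespace)

try:
    namespace["data"]()
    visible = True
except NameError:
    # The generated code cannot see variables of the calling process.
    visible = False

print("caller's variable visible in generated code:", visible)
```

Anything data() needs therefore has to be written into its body or read from somewhere the generated script can reach on its own, such as a file.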

E.g., let's say you have an application that uses hyperas based on an input dataset:

from hyperas import optim

# Skeleton of the generated function; {pipeline} must already be indented.
data_template = "def data():\n{pipeline}\n    return x_train, y_train, x_test, y_test"

# One code snippet per dataset, indented so it sits inside data().
pipelines = {'mnist': '    import something\n    # some reshape\n    # some scaling', ...}

def get_data_func(dset):
    pipeline = pipelines[dset]
    return data_template.format(pipeline=pipeline)

def model(x_train, y_train, x_test, y_test):
    # define model
    return {'loss': -acc, ...}

def do_optimize(input_dset):
    data_func_string = get_data_func(input_dset)
    # Passing a string as `data` is the proposed extension, not current behaviour.
    best_run, best_model = optim.minimize(model=model,
                                          data=data_func_string,
                                          ...)
    return best_run, best_model

if __name__ == '__main__':
    input_dset = input('What dataset do you want to optimize a model for? ')
    best_run, best_model = do_optimize(input_dset)

get_data_func('mnist') would return a string like:

'''def data():
    import something
    # some reshape
    # some scaling
    return x_train, y_train, x_test, y_test'''

Passing a string like this is currently not allowed, but it shouldn't take too long to hack in. It is basically a matter of making sure the formatting is consistent with the internals of hyperas. The source that you'd want to touch is here, around line 194 or so.

Something like:

if not isinstance(data, str):
    pass  # keep the existing code around line 194 here
else:
    # data is already source text, so use it as-is
    data_string = data

The example above is also not how you should template strings in this situation. I recommend something like jinja if you are really going to go down that path and need flexibility. It may be better to just go with regular hyperopt in this situation.
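As a rough illustration of more careful templating, the sketch below builds the source of a data() function and checks that it compiles before anything else touches it. It uses the standard library's `textwrap` as a stand-in, not Jinja, and the names `PIPELINES`, `DATA_TEMPLATE`, `render_data_source` and the "toy" dataset are all made up for this example. It only renders and runs the source locally; it does not pass the string to hyperas, which does not accept one as noted above.

```python
import textwrap

# Hypothetical pipeline bodies, keyed by dataset name.
PIPELINES = {
    "toy": """
        x_train, y_train = [[0.0], [1.0]], [0, 1]
        x_test, y_test = [[0.5]], [1]
    """,
}

DATA_TEMPLATE = "def data():\n{body}\n    return x_train, y_train, x_test, y_test\n"


def render_data_source(dataset):
    """Return the source text of a data() function for the given dataset."""
    # Normalize the snippet's indentation, then indent it to function depth.
    body = textwrap.dedent(PIPELINES[dataset]).strip("\n")
    source = DATA_TEMPLATE.format(body=textwrap.indent(body, "    "))
    # Fail early, with a SyntaxError, if the rendered text is malformed.
    compile(source, "<data>", "exec")
    return source


source = render_data_source("toy")
print(source)

# Run the rendered function once to confirm it returns the four values.
namespace = {}
exec(source, namespace)
x_train, y_train, x_test, y_test = namespace["data"]()
```

Handling indentation in one place avoids the hand-counted spaces inside each pipeline string, which is where a plain format-string approach tends to break.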

Does this help?

@dehdari

dehdari commented Mar 22, 2018

Another way is to pickle the arguments for data() to a file, then in data() unpickle them. The file path would need to be hardcoded. You can also do this for the model() function. Instead of pickling the data you can also save the info as a plaintext file.

A simple example:

def data():
    import argparse
    import pickle

    # The path is hardcoded because data() cannot take arguments.
    args_file = 'data_args.pkl'
    with open(args_file, 'rb') as f:
        args = pickle.load(f)
    # some_file_loader is a placeholder for your own loading code.
    (X_train, y_train) = some_file_loader(args.train)
    (X_valid, y_valid) = some_file_loader(args.valid)
    return X_train, y_train, X_valid, y_valid


import argparse
import pickle

from hyperas import optim

parser = argparse.ArgumentParser()
parser.add_argument('--train', help='Training data file', type=str, required=True)
parser.add_argument('--valid', help='Validation data file', type=str, required=True)
args = parser.parse_args()

# Save the parsed arguments where data() expects to find them.
args_file = 'data_args.pkl'
with open(args_file, 'wb') as f:
    pickle.dump(args, f)

X_train, y_train, X_valid, y_valid = data()
best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      ...)
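The plaintext variant mentioned above could look roughly like this, using JSON instead of pickle. This is only a sketch: `write_data_args` and the file name `data_args.json` are made up for illustration, and data() returns just the two paths so that the example runs without any real data files.

```python
import json


def write_data_args(train, valid, path="data_args.json"):
    """Save the run-time settings before calling the optimizer."""
    with open(path, "w") as f:
        json.dump({"train": train, "valid": valid}, f)


def data():
    import json

    # Hardcoded on purpose: data() receives no arguments.
    with open("data_args.json") as f:
        args = json.load(f)
    # A real data() would load and return the arrays here; this sketch
    # returns only the paths so it runs without any data files.
    return args["train"], args["valid"]


write_data_args("train.csv", "valid.csv")
train_path, valid_path = data()
print(train_path, valid_path)
```

A JSON file has the side benefit of being readable and editable by hand, which pickle is not.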

@maxpumperla
Owner

Not sure I can answer this question any better than @pkairys or @dehdari. Will add this to the README for future reference.

@virtualdvid

virtualdvid commented Apr 24, 2019

For future reference, if someone else has this issue, there is a simple way to do it:

We just have to write a function that returns the args:

import argparse

def my_args():
    parser = argparse.ArgumentParser()
    parser.add_argument('--train', help='Training data file', type=str, required=True)
    parser.add_argument('--valid', help='Validation data file', type=str, required=True)
    args = parser.parse_args()
    return args

Then we can call it in minimize as follows:

best_run, best_model = optim.minimize(model=model,
                                      data=data,
                                      functions=[my_args],
                                            ...)

Then call it in model:

def model(x_train, x_test, y_train, y_test):
    # my_args is available here because it was listed in functions=[...]
    args = my_args()
    train_file = args.train
    valid_file = args.valid
    # define model
    return {'loss': -acc, ...}

5 participants