Support with big dataset #3

Open
htw2012 opened this issue Nov 22, 2018 · 2 comments

Comments

@htw2012

htw2012 commented Nov 22, 2018

Hi,

If we want to search for architectures on a large dataset such as ImageNet, how should we change the code to support it?

Thanks

@kirthevasank
Owner

Start with this file, which takes an architecture and evaluates it on your problem: https://github.com/kirthevasank/nasbot/blob/master/demos/cnn_function_caller.py

For specifics on converting our nn representation into TF, look at this directory: https://github.com/kirthevasank/nasbot/tree/master/cg

@htw2012
Author

htw2012 commented Nov 23, 2018

Thank you. I am trying to use a generator to load the raw data, as follows:

model.train(input_fn=lambda: input_fn("train", training=True, batch_size=params['trainBatchSize']), steps=params['trainNumStepsPerLoop'])
results = model.evaluate(input_fn=lambda: input_fn("valid", training=False, batch_size=params['valiBatchSize']), steps=params['valiNumStepsPerLoop'])
    def input_fn(partition, training, batch_size):
        """Generate an input_fn for the Estimator."""

        def _input_fn():
            if partition == "train":
                dataset = tf.data.Dataset.from_generator(generator(trfile), (tf.float32, tf.int32), ((feature_dim), ()))
            else:
                dataset = tf.data.Dataset.from_generator(generator(vafile), (tf.float32, tf.int32), ((feature_dim), ()))

            # We call repeat after shuffling, rather than before, to prevent separate
            # epochs from blending together.
            if training:
                dataset = dataset.shuffle(10 * batch_size, seed=RANDOM_SEED).repeat()

            dataset = dataset.batch(batch_size)
            dataset = dataset.map(preprocess_text)
            iterator = dataset.make_one_shot_iterator()
            features, labels = iterator.get_next()
            return features, labels

        return _input_fn
FEATURES_KEY = 'x'
    
    # Define input layer (hook in data)
    def generator(inputfile):

        def _gen():
            with open(inputfile) as fr:
                for line in fr:
                    print("line", line)
                    feature, label = line_processor(embedding, line)
                    yield feature, label
        return _gen

    def preprocess_text(image, label):
        features = {FEATURES_KEY: image}
        return features, label

But when I step through the code in the debugger at this line:
model_fn = get_model_fn(mlp, params['learningRate'], num_classes)

execution also ends up in this function:

def mlp_definition(features, nn, num_classes):
    """ Defines layers in tensorflow neural network, using info from nn python structure. """
    # Define input layer, cast data as tensor
    features = features['x']
    layers = [tf.reshape(tf.cast(features, tf.float32), features.shape)]  ### NEED TO VERIFY FLOAT32

I get the error TypeError: 'function' object has no attribute '__getitem__'.
Here features is a function rather than an IteratorGetNext tensor, and I don't know what I did wrong. Could you help me?
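
One possible reading of the error, offered as a sketch rather than a confirmed diagnosis: input_fn(...) in the post is a factory that returns the inner _input_fn, and the call site wraps it again in a lambda. If the Estimator calls the function it is given exactly once, with no arguments, that single call returns _input_fn itself instead of (features, labels), and the function object would then reach mlp_definition as features. The pure-Python stand-in below reproduces that pattern without TensorFlow. fake_estimator_train is a hypothetical helper, and its unpacking rule is an assumption about how the Estimator treats the return value of input_fn.

```python
# Pure-Python stand-in for the call pattern in the post; no TensorFlow involved.
# fake_estimator_train is a hypothetical helper. It mimics how an Estimator is
# ASSUMED to use input_fn: call it once with no arguments and hand the result
# to the model function as `features`.


def input_fn(partition, batch_size):
    """Factory in the style of the post: returns a zero-argument function."""
    def _input_fn():
        features = {"x": [[0.0] * 3 for _ in range(batch_size)]}
        labels = [0] * batch_size
        return features, labels
    return _input_fn


def model_fn(features):
    """Mimics mlp_definition: indexes into the features dict."""
    return features["x"]


def fake_estimator_train(fn):
    result = fn()  # called exactly once, with no arguments
    # A 2-tuple is unpacked into (features, labels); anything else is
    # treated as features alone (assumption, see the lead-in).
    if isinstance(result, tuple) and len(result) == 2:
        features, _labels = result
    else:
        features = result
    return model_fn(features)


# Pattern from the post: the lambda returns the inner function, not the data,
# so model_fn tries to index a function object.
try:
    fake_estimator_train(lambda: input_fn("train", batch_size=2))
    wrapped_error = None
except TypeError as err:
    wrapped_error = err

# Passing the factory's result directly: the single call now yields the data.
direct_result = fake_estimator_train(input_fn("train", batch_size=2))
```

Under that reading, passing input_fn("train", training=True, batch_size=...) directly as the input_fn argument (without the lambda), or having input_fn return the features and labels itself instead of _input_fn, would likely avoid the error. This has not been verified against the code in the repository.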
