
Save models with SavedModel #329

Closed
upalchowdhury opened this issue May 6, 2019 · 27 comments
Labels: feature (New feature or request), looking into it

Comments

@upalchowdhury

upalchowdhury commented May 6, 2019

Hello,
I know there is issue #55 about exposing the model, and I have read it. It was suggested there that a model developed with Ludwig can be used in TensorFlow Serving, so I have been trying to convert my model into a format TensorFlow Serving can use.

For example, I followed the readallcat.csv example and developed a model in Ludwig, which is saved in this format:

checkpoint
model_hyperparameters.json
model_weights.data-00000-of-00001
model_weights.index
model_weights.meta
train_set_metadata.json

For TensorFlow Serving, the model instead needs to be in the format below, as described in https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md:

assets/
variables/
    variables.data-?????-of-?????
    variables.index
saved_model.pb

I have been trying to use the TensorFlow SavedModel API, but no luck with the conversion. If you could provide some guidance, it would be a big help. Thank you.

@w4nderlust w4nderlust changed the title from "This is related to https://github.com/uber/ludwig/issues/55" to "Save models with SavedModel" May 7, 2019
@w4nderlust
Collaborator

w4nderlust commented May 7, 2019

@upalchowdhury Ludwig at the moment saves models using tf.train.Saver; what you are requesting would require using SavedModel.
Here is more info on the differences:
https://stackoverflow.com/questions/46513923/tensorflow-how-and-why-to-use-savedmodel
https://stackoverflow.com/questions/42216208/should-tensorflow-users-prefer-savedmodel-over-checkpoint-or-graphdef
It seems like SavedModel is the better solution moving forward, but I have to assess the advantages and disadvantages more carefully to figure out how to prioritize this.
Adding it to the feature requests in the meantime.
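
For readers comparing the two mechanisms, here is a minimal TF 1.x-era sketch (the toy graph and paths are illustrative, not Ludwig's actual code) of what each save path produces:

import tensorflow as tf

# Toy graph: y = x * W
x = tf.placeholder(tf.float32, shape=[None, 1], name='x')
W = tf.Variable([[2.0]], name='W')
y = tf.matmul(x, W, name='y')

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())

    # What Ludwig currently does: a tf.train.Saver checkpoint
    # (checkpoint, .index, .data-*, and .meta files; restoring needs
    # the original Python code or the .meta graph).
    tf.train.Saver().save(sess, './model_weights')

    # What TensorFlow Serving expects: a self-contained SavedModel
    # (saved_model.pb plus a variables/ directory) with named
    # input/output signatures.
    tf.saved_model.simple_save(
        sess, './saved_model', inputs={'x': x}, outputs={'y': y}
    )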

@w4nderlust w4nderlust added the feature New feature or request label May 7, 2019
@ifokeev
Contributor

ifokeev commented Jun 10, 2019

@w4nderlust mlflow, for example, uses SavedModel as the default format for saving/serving https://mlflow.org/docs/latest/python_api/mlflow.tensorflow.html#mlflow.tensorflow.save_model, and it's kind of the standard now:

WARNING: SessionBundle has been deprecated and is no longer supported. Switch to SavedModel immediately.

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/contrib/session_bundle/README.md#tensorflow-inference-model-format

https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/README.md#background

@w4nderlust
Collaborator

@ifokeev thanks for the additional info. I was already pretty convinced; now I'm even more convinced.
I would love to get help on this if someone wants to contribute; otherwise I will eventually work on it myself, bringing it up in the priorities.

@ifokeev
Contributor

ifokeev commented Jun 11, 2019

@w4nderlust I'm working on converting Saver to SavedModel now; I could put the code in this thread if that helps. But sorry, I don't have time for a full PR.

@w4nderlust
Collaborator

@ifokeev that would be appreciated. As long as you can run the tests and confirm that it works, you can still do a PR even if it's not 100% complete. Either way, any help is welcome.

@ifokeev
Contributor

ifokeev commented Jun 11, 2019

@w4nderlust I'm using this code to build the SavedModel, but I'm getting an empty variables dir. What am I doing wrong?

import os

import tensorflow as tf
from tensorflow.python.saved_model import builder as saved_model_builder

# `ludwig_model` below is an already trained LudwigModel instance
def restore_tensors(model_definition, model):
    # https://github.com/uber/ludwig/blob/master/ludwig/api.py#L524
    input_tensors = {}
    for input_feature in model_definition['input_features']:
        input_tensors[input_feature['name']] = getattr(model, input_feature['name'])

    output_tensors = {}
    for output_feature in model_definition['output_features']:
        output_tensors[output_feature['name']] = getattr(model, output_feature['name'])

    return input_tensors, output_tensors

input_tensors, output_tensors = restore_tensors(
    ludwig_model.model_definition,
    ludwig_model.model,
)

current_dir = os.path.dirname(os.path.abspath(__file__))
saved_model_path = os.path.join(current_dir, 'saved_model')

builder = saved_model_builder.SavedModelBuilder(saved_model_path)

builder.add_meta_graph_and_variables(
    ludwig_model.model.session,
    [tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        'predict': tf.saved_model.predict_signature_def(input_tensors, output_tensors)
    },
    strip_default_attrs=True
)

builder.save()

it feels like I'm getting the input/output tensors wrong =\

@ifokeev
Contributor

ifokeev commented Jun 11, 2019

I added

saver=ludwig_model.model.saver

to builder.add_meta_graph_and_variables and now it works fine:

import os
import shutil

import tensorflow as tf
from tensorflow.python.saved_model import builder as saved_model_builder

from ludwig.api import LudwigModel

# model_definition_url and preprocessed_data_url point to the model
# definition YAML and the training CSV, respectively
ludwig_model = LudwigModel(
    model_definition=None,
    model_definition_file=model_definition_url,
)

train_stats = ludwig_model.train(data_csv=preprocessed_data_url)

def get_tensors(model_definition, model):
    # https://github.com/uber/ludwig/blob/master/ludwig/api.py#L524
    input_tensors = {}
    for input_feature in model_definition['input_features']:
        input_tensors[input_feature['name']] = getattr(model, input_feature['name'])

    output_tensors = {}
    for output_feature in model_definition['output_features']:
        output_tensors[output_feature['name']] = getattr(model, output_feature['name'])

    return input_tensors, output_tensors

input_tensors, output_tensors = get_tensors(
    ludwig_model.model_definition,
    ludwig_model.model,
)

current_dir = os.path.dirname(os.path.abspath(__file__))
saved_model_path = os.path.join(current_dir, 'saved_model')
shutil.rmtree(saved_model_path)

builder = saved_model_builder.SavedModelBuilder(saved_model_path)

builder.add_meta_graph_and_variables(
    ludwig_model.model.session,
    [tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        'predict': tf.saved_model.predict_signature_def(input_tensors, output_tensors)
    },
    strip_default_attrs=True,
    saver=ludwig_model.model.saver,
)

builder.save()

but

mlflow.tensorflow.save_model(
    tf_saved_model_dir=saved_model_path,
    tf_meta_graph_tags=[tf.saved_model.tag_constants.SERVING],
    tf_signature_def_key='predict',
    path=saved_model_path
)

I'm getting

RuntimeError: The Session graph is empty.  Add operations to the graph before calling run().

=\

@w4nderlust
Collaborator

Are you able to load the model you saved with the builder?
If so, it looks like there's a problem with mlflow.
In theory, after you call train() and before calling close(), ludwig_model.model.session should still be open, but can you double-check that the session was not None when you saved it?
I'm not familiar with mlflow, but my uneducated guess is that you are trying to save after the session has been closed.
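
(For reference, a minimal TF 1.x snippet for checking that an export round-trips; 'saved_model' is the export directory used above.)

import tensorflow as tf

# Load the export into a fresh graph and session; if the variables
# were written correctly, this succeeds and populates the graph.
with tf.Session(graph=tf.Graph()) as sess:
    tf.saved_model.loader.load(
        sess, [tf.saved_model.tag_constants.SERVING], 'saved_model'
    )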

@ifokeev
Contributor

ifokeev commented Jun 12, 2019

@w4nderlust
it works this way:

with ludwig_model.model.session as sess:
    builder.add_meta_graph_and_variables(
        sess,
        [tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            'predict': tf.saved_model.predict_signature_def(input_tensors, output_tensors)
        },
        strip_default_attrs=True,
        saver=ludwig_model.model.saver,
    )

but why?
I can see that the session is not closed.

@w4nderlust
Collaborator

Not 100% sure; it probably has to do with how the builder internally uses the provided session. I guess the problem is that by doing this you close the session at the end of the with block (my understanding is that close() is called automatically), which means that in order to use the model you would have to start the session again. That may not be a concern for you, but it will be for me when moving to SavedModel.
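
(A quick way to check the context-manager behavior, assuming TF 1.x: tf.Session does close itself on exiting a with block.)

import tensorflow as tf

c = tf.constant(1)
sess = tf.Session()
with sess:
    print(sess.run(c))  # fine inside the block

sess.run(c)  # RuntimeError: Attempted to use a closed Session.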

@ifokeev
Contributor

ifokeev commented Jun 12, 2019

looks like it's OK to leave the session open:
https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/saved_model/builder_impl.py#L298

I will check with a forced close() then

@w4nderlust
Collaborator

Mmm, I thought the context manager would close it... anyway, better this way, I guess :)
If you don't mind, could you share the full code in one single post, so that I can reference all of it when working on this?
Thank you so much.

@gogasca
Contributor

gogasca commented Jun 14, 2019

@ifokeev Is this going to be part of a new PR? I would love to test it with TF Serving or AI Platform serving.

@ifokeev
Contributor

ifokeev commented Jun 14, 2019

@gogasca I'm going to return to this task in 2-3 days. The code from #329 (comment) works; I will add the full script later.

@jadjoubran

Hey @ifokeev
Would you mind posting the full script if possible? That would be a huge help.
Thanks!

@ifokeev
Contributor

ifokeev commented Jul 5, 2019

@jadjoubran my full script looks like this:

import logging
import os
import shutil
import sys

import mlflow
import mlflow.tensorflow

import tensorflow as tf
from tensorflow.python.saved_model import builder as saved_model_builder

from ludwig.api import LudwigModel

logging.basicConfig(level=logging.INFO)

_logger = logging.getLogger(__name__)

tracking_uri = os.environ.get('MLFLOW_TRACKING_URI')
mlflow.set_tracking_uri(tracking_uri)

preprocessed_data_url = str(sys.argv[1])
model_definition_url = str(sys.argv[2])
save_path = 'model'

ludwig_model = LudwigModel(
    model_definition=None,
    model_definition_file=model_definition_url,
)

_logger.info('Training...')
ludwig_model.train(data_csv=preprocessed_data_url)

def get_tensors(model_definition, model):
    # https://github.com/uber/ludwig/blob/master/ludwig/api.py#L524
    input_tensors = {}
    for input_feature in model_definition['input_features']:
        input_tensors[input_feature['name']] = getattr(model, input_feature['name'])

    output_tensors = {}
    for output_feature in model_definition['output_features']:
        output_tensors[output_feature['name']] = getattr(model, output_feature['name'])

    return input_tensors, output_tensors

input_tensors, output_tensors = get_tensors(
    ludwig_model.model_definition,
    ludwig_model.model,
)

current_dir = os.path.dirname(os.path.abspath(__file__))
saved_model_path = os.path.join(current_dir, 'saved_model')
shutil.rmtree(saved_model_path, ignore_errors=True)

builder = saved_model_builder.SavedModelBuilder(saved_model_path)

with ludwig_model.model.session as sess:
    builder.add_meta_graph_and_variables(
        sess,
        [tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            'predict': tf.saved_model.predict_signature_def(input_tensors, output_tensors)
        },
        strip_default_attrs=True,
        saver=ludwig_model.model.saver,
    )

builder.save()

shutil.rmtree(save_path, ignore_errors=True)

_logger.info('Saving SavedModel...')

mlflow.tensorflow.save_model(
    tf_saved_model_dir=saved_model_path,
    tf_meta_graph_tags=[tf.saved_model.tag_constants.SERVING],
    tf_signature_def_key='predict',
    path=save_path
)

_logger.info('Saving artifacts...')

mlflow.log_artifacts(save_path)

_logger.info('Artifacts saved!')

@jadjoubran

Thank you so much! It's working ✅

@w4nderlust
Collaborator

w4nderlust commented Jul 6, 2019

@ifokeev thanks for the code. Just to be sure: with builder.save() you are saving the model as a SavedModel, while the rest of the code packages it for MLflow, right? So your code could potentially become two different functions in Ludwig's API, save_for_serving() and save_for_mlflow() or something like that. Would that work for you?

@ifokeev
Contributor

ifokeev commented Jul 6, 2019

@w4nderlust MLflow accepts only SavedModel for TensorFlow serving. In my code I'm logging the SavedModel to S3 and then serving it with another script. MLflow saves it in its own mlflow format, so there is no need for a special save_for_mlflow() function.
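
(For anyone following along, consuming the logged model might look roughly like this, assuming MLflow 1.x; mlflow.pyfunc is MLflow's generic loader, 'model' is the save_path from the script above, and the 'text' input column is a made-up example.)

import mlflow.pyfunc
import pandas as pd

# Load the MLflow-format model back (a local path here; an s3:// or
# runs:/ URI works the same way once the artifacts are logged).
model = mlflow.pyfunc.load_model('model')

# pyfunc models take a pandas DataFrame whose column names match the
# input names in the SavedModel 'predict' signature.
predictions = model.predict(pd.DataFrame({'text': ['example input']}))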

w4nderlust added a commit that referenced this issue Jul 6, 2019
…odel.py and exposes it in the API with a save_for_serving() function
@w4nderlust
Collaborator

@ifokeev and @upalchowdhury I would be glad if you could check out this branch and tell me if it works for you: https://github.com/uber/ludwig/tree/save_for_serving
It's not the final solution (we will probably transition to using only SavedModel in the future), but for the moment it should cover your requirements. If it does and you can use it without errors, I'll add some tests and merge it.

w4nderlust added a commit that referenced this issue Jul 13, 2019
…odel.py and exposes it in the API with a save_for_serving() function (#425)
@Guufii

Guufii commented Aug 23, 2019

Dear @w4nderlust and @ifokeev, I am trying to convert the saved MNIST model to ONNX format. I have tried the new implementation and it still is not working properly. One problem seems to be the way the session is used. The current implementation:

session = self.initialize_session()
builder = saved_model_builder.SavedModelBuilder(save_path)
builder.add_meta_graph_and_variables(
    session,
    [tf.saved_model.tag_constants.SERVING],
    signature_def_map={
        'predict': tf.saved_model.predict_signature_def(
            input_tensors, output_tensors)
    },
    strip_default_attrs=True,
    saver=self.saver,
)

leads to the error "The Session graph is empty." when converting the model with

$ python -m tf2onnx.convert --opset 10 --fold_const --saved-model saved_model/ --output model.onnx

When changing the structure to the following (like in @ifokeev's example):

# session = self.initialize_session()
builder = saved_model_builder.SavedModelBuilder(save_path)
with self.initialize_session() as session:
    builder.add_meta_graph_and_variables(
        session,
        [tf.saved_model.tag_constants.SERVING],
        signature_def_map={
            'predict': tf.saved_model.predict_signature_def(
                input_tensors, output_tensors)
        },
        strip_default_attrs=True,
        saver=self.saver,
    )
builder.save()

a graph is exported. However, the inputs are not found in the graph, as the following error suggests:

$ python -m tf2onnx.convert --opset 10 --fold_const --saved-model saved_model/ --output model.onnx
2019-08-23 14:58:44,065 - WARNING - inputs [image_path/image_path:0] is not in frozen graph, delete them
2019-08-23 14:58:44.069427: E tensorflow/tools/graph_transforms/transform_graph.cc:332] fold_constants: Ignoring error You must feed a value for placeholder tensor 'label_Label/label_Label_placeholder' with dtype int64 and shape [?]
	 [[{{node label_Label/label_Label_placeholder}}]]
2019-08-23 14:58:44,072 - INFO - Using tensorflow=1.13.1, onnx=1.5.0, tf2onnx=1.5.3/7b598d
2019-08-23 14:58:44,072 - INFO - Using opset <onnx, 10>
2019-08-23 14:58:44,074 - INFO - Optimizing ONNX model
2019-08-23 14:58:44,076 - INFO - After optimization: no change
2019-08-23 14:58:44,076 - INFO - 
2019-08-23 14:58:44,076 - INFO - Successfully converted TensorFlow model saved_model/ to ONNX
2019-08-23 14:58:44,077 - INFO - ONNX model is saved at model.onnx

BTW, tf2onnx.convert works reliably with the other saved_models I tried.

@ifokeev
Contributor

ifokeev commented Oct 25, 2019

@w4nderlust your code works correctly, but only when the session gets closed:

with session as sess:
    self.ludwig_model.save_savedmodel(saved_model_path)

Without it you'll get The Session graph is empty.

Also, I'm getting the same error with an output placeholder (why?):

packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: You must feed a value for placeholder tensor 'Data80.target/Data80.target_placeholder' with dtype bool and shape

I understand this as: there is a feed_dict with both inputs and outputs that gets fed during session.run
https://github.com/uber/ludwig/blob/67fec7ab1908972c3e21c333f283d3af7d118e80/ludwig/models/model.py#L258-L265

but I don't understand how to create a SavedModel with the inputs and outputs separated

Some info: https://stackoverflow.com/a/50899659
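
(A useful diagnostic for signature problems like this one: saved_model_cli ships with TensorFlow and prints exactly which input/output tensors an export declares. The tag_constants.SERVING tag used above corresponds to the tag set `serve`.)

$ saved_model_cli show --dir saved_model/ --tag_set serve --signature_def predict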

@ifokeev
Contributor

ifokeev commented Nov 1, 2019

A quick note:

after a full day of research I found the solution: my previous function for collecting the tensors was wrong, and @w4nderlust copied it as is.

Something like this could work:

def get_tensors(model_definition, model):
    # https://github.com/uber/ludwig/blob/master/ludwig/api.py#L524
    input_tensors = {}
    for input_feature in model_definition['input_features']:
        input_tensors[input_feature['name']] = getattr(model, input_feature['name'])

    for output_feature in model_definition['output_features']:
        input_tensors[output_feature['name']] = getattr(model, output_feature['name'])

    output_tensors = {}
    # https://github.com/uber/ludwig/blob/0404cf21ec42c4802dbbc928c5df78318181298b/ludwig/models/model.py#L921
    # We need to collect these output tensors; they have a different name
    # from `getattr(model, output_feature['name'])`.
    output_nodes = model.get_output_nodes(collect_predictions=True, only_predictions=True)

    for output_feature in output_nodes:
        output_tensors[output_feature] = output_nodes[output_feature]['predictions']

    return input_tensors, output_tensors

The output tensors have different names in Ludwig (not getattr(model, output_feature['name']))

@w4nderlust
Collaborator

There's an error in that code:

input_tensors[output_feature['name']] = getattr(model, output_feature['name'])

This adds the outputs to the dict of inputs. As other people are referring to this code, it's important to correct it.

@ifokeev
Contributor

ifokeev commented Feb 7, 2020

@w4nderlust I use it this way because I want to feed all fields (inputs and outputs); the output field could be empty.

FYI: if you don't need that, just remove

for output_feature in model_definition['output_features']:
    input_tensors[output_feature['name']] = getattr(model, output_feature['name'])

@w4nderlust
Collaborator

@ifokeev I believe there's a misunderstanding about what an input feature and an output feature are. Output features cannot be inputs; if they were, they would be called inputs instead. And if for some reason you want to provide the same values as both input and output (which seems weird to me, as the best model will be the one that ignores everything else and only uses that input to predict the output, making it kind of useless, but maybe you have other reasons for doing it), then you need two identical columns in your data with different names, one to use as input and one as output (like in an autoencoder, for instance).

I'm considering introducing an optional "id" field to allow loading two features from the same column while giving them different identifiers, which would solve the issue, but I haven't implemented it yet, nor have I decided on the best way to implement it.
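
(To illustrate the duplicate-column workaround with a hypothetical model definition; the column names and feature types here are made up for the example.)

from ludwig.api import LudwigModel

# The dataset carries the same values twice under two names, so the
# model definition can use one copy as an input feature and the other
# as an output feature, autoencoder-style.
model_definition = {
    'input_features': [{'name': 'text_in', 'type': 'text'}],
    'output_features': [{'name': 'text_out', 'type': 'text'}],
}

ludwig_model = LudwigModel(model_definition=model_definition)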

@w4nderlust w4nderlust added this to In progress in Ludwig Development Feb 18, 2020
Ludwig Development automation moved this from In progress to Done Feb 27, 2020
@w4nderlust
Collaborator

The latest merged PR solves the issues with SavedModel. There's also a test now that one can use as reference example code: https://github.com/uber/ludwig/blob/master/tests/integration_tests/test_savedmodel.py
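
(A rough end-to-end sketch pieced together from this thread; save_for_serving() is the API name mentioned in the commit above, the file paths are placeholders, and the linked test is the authoritative reference.)

from ludwig.api import LudwigModel

# Train from a model definition file, then export a SavedModel that
# TensorFlow Serving can load directly.
ludwig_model = LudwigModel(model_definition_file='model_definition.yaml')
ludwig_model.train(data_csv='train.csv')
ludwig_model.save_for_serving('saved_model')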
