
get_model takes too long to initialise so the endpoint times out #78

Closed
m4nuC opened this issue May 15, 2019 · 6 comments

@m4nuC

m4nuC commented May 15, 2019

On the first get_model() call, the weights need to be loaded and the model initialized, which takes 10 minutes or so. The problem is that SageMaker endpoints time out after 60 seconds, so the model never has time to initialize.

Would you have any suggestions on ways to address this?

Thanks

m4nuC changed the title from "get_model Take more than a minute to initialise so the request is killed" to "get_model takes too long to initialise so the endpoint times out" on May 15, 2019
@pm3310
Contributor

pm3310 commented May 15, 2019

Hey @m4nuC

Great question! Please find an example solution below:

from __future__ import absolute_import

import os
# Do not remove the following line
import sys;sys.path.append("..")  # NOQA

import logging
import concurrent.futures


_MODEL_PATH = os.path.join('/opt/ml/', 'model')  # Path where all your model(s) live

log = logging.getLogger(__name__)


class ModelLoader:
    def __init__(self, load_method):
        log.info("setting up ModelLoader")
        # Invoke load method asynchronously so process does not block.
        self.model = None
        self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self.future = self.executor.submit(load_method)
        self.future.add_done_callback(self.done)

    def get_model(self):
        if self.future.done():
            return self.model

        log.error("get_model called before ready")
        raise Exception("model not loaded")

    def done(self, future):
        """Callback method invoked when model load complete. Sets us to ready status."""
        log.info("model load done")
        self.model = future.result()
        log.info("shutting down executor")
        self.executor.shutdown(wait=False)

    def get_ready(self):
        return self.future.done()


class ModelService(object):
    model = None

    @staticmethod
    def load_model():
        # Load your slow-to-load model here.
        # Note: sklearn.externals.joblib is deprecated in newer scikit-learn; use `import joblib` instead.
        from sklearn.externals import joblib

        return joblib.load(os.path.join(_MODEL_PATH, 'model.pkl'))

    @classmethod
    def get_model(cls):
        """Get the model object for this instance, loading it if it's not already loaded."""
        if cls.model is None:
            cls.model = ModelLoader(cls.load_model)
        return cls.model

    @classmethod
    def predict(cls, model_input):
        """For the given input, compute and return the predictions."""
        return cls.model.get_model().predict(model_input)


ModelService.get_model()


def predict(json_input):
    """
    Prediction given the request input
    :param json_input: [dict], request input
    :return: [dict], prediction
    """

    # TODO Transform json_input and assign the transformed value to model_input
    try:
        model_input = json_input['features']
        prediction = ModelService.predict(model_input)

        result = {'prediction': prediction.item()}

        return result
    except Exception as e:
        return {"error": str(e)}

Essentially, it spawns a background thread to load the slow-loading model, so the serving process does not block while the load runs. @ilazakis do you think we can make ModelLoader a Sagify utility?
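
To see the non-blocking behavior in isolation, here is a minimal sketch that feeds ModelLoader a dummy load function (the two-second sleep and the toy model value are hypothetical stand-ins for a real ten-minute load, and the ModelLoader class above is assumed to be in scope):

import time

def slow_load():
    time.sleep(2)  # stand-in for a slow model load
    return "toy-model"

loader = ModelLoader(slow_load)
print(loader.get_ready())  # False: the load is still running in the background
time.sleep(3)
print(loader.get_ready())  # True: the done() callback has fired
print(loader.get_model())  # "toy-model"

The constructor returns immediately, so the container can finish starting up while the load runs on the worker thread.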

Please let me know, @m4nuC, if this solves your issue.

Thanks

@ilazakis
Contributor

Adding some model-loading-specific utilities sounds sensible, @pm3310, yes.

Loading the model on a different thread keeps the synchronous API call from blocking, but if the model still needs 10 minutes to load, the end user still gets nothing back: the client, or a proxy or similar in between, will time out anyway. We could return a specific "loading model, please try again in X minutes" response to mitigate the bad experience.
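
For example, a minimal sketch of such a response using the get_ready() flag from the snippet above (the function name and the message text are illustrative, not an existing Sagify API):

def predict_with_loading_message(json_input):
    loader = ModelService.get_model()
    if not loader.get_ready():
        # Model is still warming up; tell the caller instead of failing opaquely.
        return {"error": "loading model, please try again in a few minutes"}
    prediction = loader.get_model().predict(json_input['features'])
    return {'prediction': prediction.item()}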

One thing we could do to solve the actual problem is to tie the loading of the model to the deploy command. Training and deploying take time anyway, so if we trigger the load right after the deploy command, the model will already be loaded for whoever calls the predict endpoint first.
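
A rough sketch of such a post-deploy warm-up, assuming a boto3 client and a JSON endpoint (the endpoint name, payload, and retry cadence are placeholders):

import json
import time

import boto3

def warm_up_endpoint(endpoint_name, payload, attempts=30, delay=30):
    """Poll the freshly deployed endpoint until the model has finished loading."""
    client = boto3.client('sagemaker-runtime')
    for _ in range(attempts):
        response = client.invoke_endpoint(
            EndpointName=endpoint_name,
            ContentType='application/json',
            Body=json.dumps(payload),
        )
        body = json.loads(response['Body'].read())
        if 'error' not in body:  # the model answered a real prediction
            return body
        time.sleep(delay)  # still loading; wait and retry
    raise TimeoutError('model did not load within the allotted time')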

Open to any suggestions.

@m4nuC
Author

m4nuC commented May 16, 2019

Thanks for the input, guys. I ended up using threading as well, but with a less refined solution than the one @pm3310 suggests. I will try this out and confirm.

@pm3310
Contributor

pm3310 commented May 16, 2019

@ilazakis I like the idea of returning a specific "loading model, please try again in X minutes" response to mitigate the bad experience.

@m4nuC
Author

m4nuC commented May 16, 2019

Works great indeed. Thanks @pm3310!
I have made a small modification to handle the case where the model is not yet loaded, using a custom exception. However, I am not sure it's idiomatic Python. See below.

from __future__ import absolute_import

import os
# Do not remove the following line
import sys;sys.path.append("..")  # NOQA

import logging
import concurrent.futures


_MODEL_PATH = os.path.join('/opt/ml/', 'model')  # Path where all your model(s) live

log = logging.getLogger(__name__)

class ModelNotYetLoadedException(Exception):
    """Raised when the model is requested before the background load has finished."""

class ModelLoader:
    def __init__(self, load_method):
        log.info("setting up ModelLoader")
        # Invoke load method asynchronously so process does not block.
        self.model = None
        self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self.future = self.executor.submit(load_method)
        self.future.add_done_callback(self.done)

    def get_model(self):
        if self.future.done():
            return self.model

        log.error("get_model called before ready")
        raise ModelNotYetLoadedException("model not loaded")

    def done(self, future):
        """Callback method invoked when model load complete. Sets us to ready status."""
        log.info("model load done")
        self.model = future.result()
        log.info("shutting down executor")
        self.executor.shutdown(wait=False)

    def get_ready(self):
        return self.future.done()


class ModelService(object):
    model = None

    @staticmethod
    def load_model():
        # Load your slow-to-load model here.
        # Note: sklearn.externals.joblib is deprecated in newer scikit-learn; use `import joblib` instead.
        from sklearn.externals import joblib

        return joblib.load(os.path.join(_MODEL_PATH, 'model.pkl'))

    @classmethod
    def init_model(cls):
        """Get the model object for this instance, loading it if it's not already loaded."""
        if cls.model is None:
            cls.model = ModelLoader(cls.load_model)
        return cls.model

    @classmethod
    def predict(cls, model_input):
        """For the given input, compute and return the predictions."""
        # Let ModelNotYetLoadedException propagate so the request handler
        # below can return a meaningful "still loading" response. (Returning
        # a plain string here would break the caller's prediction.item() call.)
        return cls.model.get_model().predict(model_input)

ModelService.init_model()


def predict(json_input):
    """
    Prediction given the request input
    :param json_input: [dict], request input
    :return: [dict], prediction
    """

    # TODO Transform json_input and assign the transformed value to model_input
    try:
        model_input = json_input['features']
        prediction = ModelService.predict(model_input)

        return {'prediction': prediction.item()}
    except ModelNotYetLoadedException:
        return {"error": "model not yet loaded, please try again shortly"}
    except Exception as e:
        return {"error": str(e)}

@pm3310
Contributor

pm3310 commented May 16, 2019

@m4nuC Perfect solution ;-)

@pm3310 closed this as completed May 17, 2019