get_model takes too long to initialise so the endpoint times out #78

In the first `get_model()` call, the weights need to be loaded and the model initialised, which takes 10 minutes or so. The problem is that the SageMaker endpoint times out after 60 seconds, so the model never has time to initialise. Would you have any suggestions on ways to address this?
Thanks

Comments
Hey @m4nuC, great question! Please find below an example solution.
Essentially, it spawns a thread to load a long-loading model. @ilazakis Do you think we can make the ModelLoader a Sagify utility? Pls @m4nuC let me know if it solves your issue. Thanks
Adding some model-loading specific utilities sounds sensible @pm3310, yes. Loading the model on a different thread will keep the sync call to the API from blocking, but if the model still needs 10 minutes to load, the end user will still not get anything back; the client, or a proxy or similar in between, will time out anyway. We could return a specific "loading model, please try again in X minutes" response to mitigate the bad experience. One thing we could do to solve the actual problem is tie the loading of the model to the deploy command. Training and deploying take time anyway, so if we add it right after the deploy command, the model will already be loaded for whoever calls the predict endpoint first. Open to any suggestions.
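For illustration, a minimal sketch of that "please try again" response, assuming a Flask-style serving container (SageMaker custom containers commonly expose POST /invocations); the handler and loader names here are hypothetical, not part of Sagify:

```python
# Minimal sketch of the "loading model, please try again" mitigation.
# All names are illustrative; the slow load is simulated with a sleep.
import time
import threading

from flask import Flask, jsonify

app = Flask(__name__)
_model = None


def _load_model():
    """Stand-in for a slow model load running on a background thread."""
    global _model
    time.sleep(600)    # pretend the real load takes ~10 minutes
    _model = object()  # replace with the real model object


threading.Thread(target=_load_model, daemon=True).start()


@app.route('/invocations', methods=['POST'])
def invocations():
    if _model is None:
        # Model still warming up: answer quickly instead of letting the
        # client or an intermediate proxy time out.
        resp = jsonify({'message': 'loading model, please try again in a few minutes'})
        resp.status_code = 503               # Service Unavailable
        resp.headers['Retry-After'] = '600'  # hint: retry in ~10 minutes
        return resp
    return jsonify({'prediction': 'ok'})     # real prediction goes here
```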
Thanks for the input guys, I've ended up using threading as well, but with a less refined solution than @pm3310 suggests. I will try this out and confirm.
@ilazakis I like the idea of returning a specific "loading model, please try again in X minutes" response to mitigate the bad experience.
Works great indeed. Thanks @pm3310

```python
from __future__ import absolute_import
import os
# Do not remove the following line
import sys; sys.path.append("..")  # NOQA
import logging
import concurrent.futures

_MODEL_PATH = os.path.join('/opt/ml/', 'model')  # Path where all your model(s) live

log = logging.getLogger(__name__)


class ModelNotYetLoadedException(Exception):
    def __init__(self, message):
        super().__init__(message)


class ModelLoader:
    def __init__(self, load_method):
        log.info("setting up ModelLoader")
        # Invoke the load method asynchronously so the process does not block.
        self.model = None
        self.executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
        self.future = self.executor.submit(load_method)
        self.future.add_done_callback(self.done)

    def get_model(self):
        if self.future.done():
            return self.model
        log.error("get_model called before ready")
        raise ModelNotYetLoadedException("model not loaded")

    def done(self, future):
        """Callback invoked when the model load completes. Sets us to ready status."""
        log.info("model load done")
        self.model = future.result()
        log.info("shutting down executor")
        self.executor.shutdown(wait=False)

    def get_ready(self):
        return self.future.done()


class ModelService(object):
    model = None

    @staticmethod
    def load_model():
        # Load your model, the one that takes time to load, here.
        # (On scikit-learn >= 0.23, use `import joblib` instead.)
        from sklearn.externals import joblib
        return joblib.load(os.path.join(_MODEL_PATH, 'model.pkl'))

    @classmethod
    def init_model(cls):
        """Get the model object for this instance, loading it if it's not already loaded."""
        if cls.model is None:
            cls.model = ModelLoader(cls.load_model)
        return cls.model

    @classmethod
    def predict(cls, input):
        """For the input, do the predictions and return them.

        Raises ModelNotYetLoadedException if the background load has not
        finished; the caller decides how to report that.
        """
        return cls.model.get_model().predict(input)


ModelService.init_model()


def predict(json_input):
    """
    Prediction given the request input
    :param json_input: [dict], request input
    :return: [dict], prediction
    """
    # TODO Transform json_input and assign the transformed value to model_input
    try:
        model_input = json_input['features']
        prediction = ModelService.predict(model_input)
        return {'prediction': prediction.item()}
    except ModelNotYetLoadedException:
        # The background load has not finished yet.
        return {'error': 'model not yet loaded, please try again shortly'}
    except Exception as e:
        return {"error": str(e)}
```
@m4nuC Perfect solution ;-)