Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make readiness timeout configurable #13

Closed
mohammedi-haroune opened this issue Apr 11, 2021 · 9 comments · Fixed by #15
Closed

Make readiness timeout configurable #13

mohammedi-haroune opened this issue Apr 11, 2021 · 9 comments · Fixed by #15

Comments

@mohammedi-haroune
Copy link
Contributor

mohammedi-haroune commented Apr 11, 2021

I'm having an issue integrating cdsdashboards to our ML platform http://iko.ai, my dashboard server is timing out during readiness and there is no way to configure the time out value because it's hardcoded here: https://github.com/ideonate/jhsingle-native-proxy/blob/master/jhsingle_native_proxy/proxyhandlers.py#L700

I'd like to be able to configure the readiness timeout value through command-line argument and/or environment variables?

Debug logs:

INFO:tornado.application:SuperviseAndProxyHandler http_get 55527
DEBUG:tornado.application:Allowing Hub user USER_NAME
DEBUG:tornado.application:Storing origin host SERVER_NAME
INFO:tornado.application:['streamlit', 'run', '/home/USER_NAME/./vosk_app.py', '--server.port=55527', '--server.headless=True', '--browser.             serverAddress=SERVER_NAME', '--browser.gatherUsageStats=false']
DEBUG:tornado.application:Trying to start mainprocess
DEBUG:tornado.application:Started mainprocess
DEBUG:tornado.application:Connection to http://localhost:55527/ refused
DEBUG:tornado.application:Readyness: False after 0.00495457649230957 seconds, next check in 0.01s
DEBUG:tornado.application:Connection to http://localhost:55527/ refused
DEBUG:tornado.application:Readyness: False after 0.017713308334350586 seconds, next check in 0.02s
DEBUG:tornado.application:Connection to http://localhost:55527/ refused
DEBUG:tornado.application:Readyness: False after 0.04067730903625488 seconds, next check in 0.04s
DEBUG:tornado.application:Connection to http://localhost:55527/ refused
DEBUG:tornado.application:Readyness: False after 0.08370685577392578 seconds, next check in 0.08s
DEBUG:tornado.application:Connection to http://localhost:55527/ refused
DEBUG:tornado.application:Readyness: False after 0.1665360927581787 seconds, next check in 0.16s
DEBUG:tornado.application:Connection to http://localhost:55527/ refused
DEBUG:tornado.application:Readyness: False after 0.329953670501709 seconds, next check in 0.32s
DEBUG:tornado.application:Connection to http://localhost:55527/ refused
DEBUG:tornado.application:Readyness: False after 0.6533911228179932 seconds, next check in 0.64s
DEBUG:tornado.application:Connection to http://localhost:55527/ refused
DEBUG:tornado.application:Readyness: False after 1.2969000339508057 seconds, next check in 1.28s
DEBUG:tornado.application:Connection to http://localhost:55527/ refused
DEBUG:tornado.application:Readyness: False after 2.5811028480529785 seconds, next check in 2.56s
DEBUG:tornado.application:Connection to http://localhost:55527/ refused
DEBUG:tornado.application:Readyness: False after 5.144717454910278 seconds, next check in 4.847672929763794s
DEBUG:tornado.application:Connection to http://localhost:55527/ refused
DEBUG:tornado.application:Readyness: False after 9.998777389526367 seconds, next check in -0.0035436248779296877s
DEBUG:tornado.application:Connection to http://localhost:55527/ refused
DEBUG:tornado.application:Readyness: False after 10.007751226425171 seconds, next check in -0.007087249755859375s
@mohammedi-haroune mohammedi-haroune changed the title Make ready timeout configurable Make readiness timeout configurable Apr 11, 2021
@danlester
Copy link
Member

This idea probably makes sense, and I would like to add it as an option at some point.

However, it may be more efficient for you to use the --ready-check-path=/ready-check argument instead. This allows you to specify an alternative URL (e.g. /ready-check) that will tell jhsingle-native-proxy if the web server is ready. It is inefficient to request the main Streamlit app to do this really - maybe you can choose the URL to a smaller static asset from Streamlit that at least would indicate the Streamlit web server is ready to receive requests. This should be able to respond much quicker, and ultimately means that JupyterHub can redirect users to the dashboard sooner, so their request for the main Streamlit app can begin sooner too.

Please let me know if that helps.

@mohammedi-haroune
Copy link
Contributor Author

mohammedi-haroune commented Apr 12, 2021

I'll try to set --ready-check-path=/healthz for streamlit and what this what it gives

@mohammedi-haroune
Copy link
Contributor Author

Unfortunately, I couldn't reproduce the bug again, even though --ready-check-path=/healthz is working for streamlit dashboards, we can't be sure it fixes the bug until reproducing it.

I'll get back whenever the bug is reproduced.

@danlester
Copy link
Member

Thanks for the update, but just to clarify do you mean:

Adding --ready-check-path=/healthz seems to work well for your dashboard starting OK. But that you no longer have the problem anyway, even if you drop --ready-check-path again.

?

@mohammedi-haroune
Copy link
Contributor Author

Thanks for the update, but just to clarify do you mean:

Adding --ready-check-path=/healthz seems to work well for your dashboard starting OK. But that you no longer have the problem anyway, even if you drop --ready-check-path again.

?

Yes, that's it.

@mohammedi-haroune
Copy link
Contributor Author

mohammedi-haroune commented Apr 13, 2021

The bug is still happening even with --ready-check-path=/healthz

@mohammedi-haroune
Copy link
Contributor Author

I reproduced the bug with the following flask app:

from flask import Flask
from time import sleep

app = Flask(__name__)

@app.route('/')
def hello_world():
    sleep(30)
    return 'Hello, World!'

and theses configurations:

c.CDSDashboardsConfig.presentation_types = ['flask']

c.VariableMixin.extra_presentation_launchers = {
    'flask': {
        'args': [
            'flask', 'run', '{--}port={port}'
        ],
        'env': {'FLASK_APP': 'app.py'},
    }
}

@mohammedi-haroune
Copy link
Contributor Author

A better reproduction app:

from flask import Flask
from time import sleep, time

app = Flask(__name__)

started_at = time()
ready_after = 30

@app.route('/')
def hello_world():
    now = time()
    if now - started_at > ready_after:
        return 'Appliation ready!'
    else:
        # sleep for more that 1 seconds to make the _ready_check call fail: 
        # https://github.com/yuvipanda/simpervisor/blob/master/simpervisor/process.py#L196
        # Even status code 400 is considered as ready as per https://github.com/ideonate/jhsingle-native-proxy/blob/master/jhsingle_native_proxy/proxyhandlers.py#L707
        # return 'Application not yet ready! Please try later', 400
        sleep(5)

Why better?

The first app will always fail to start even if we increase the ready_timeout, it fails because the ready_func call is only waited for 1 second in https://github.com/yuvipanda/simpervisor/blob/master/simpervisor/process.py#L196

@danlester
Copy link
Member

By the way, it would be great if you'd like to take a look at the experimental Flask framework:

https://cdsdashboards.readthedocs.io/en/stable/chapters/userguide/frameworks/flask.html

That should be more flexible for being able to specify a flask script to run, and it does it through gunicorn instead of the dev server. There may be some tweaks needed...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants