Cell timeout (presumably due to the large output) #157

Open
dsavchenko opened this issue Mar 20, 2024 · 6 comments

Comments

@dsavchenko
Member

Running model_CTA_events_from_file with default parameters in MMODA leads to the following backend exception:

Backend failed. CellTimeoutError('A cell timed out while it was being executed, after 4 seconds.
The message was: Timeout waiting for IOPub output.

which refers to the injected cell where outputs are glued.
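
The "after 4 seconds" in the message suggests a short, separate timeout applied while waiting for a cell's output messages, rather than the cell's execution timeout (nbclient, for instance, has an `iopub_timeout` setting; whether that is the setting in play here is an assumption). Below is a toy model of that mechanism using only the standard library. It is not nb2workflow or papermill code: `OutputTimeout`, `wait_for_output`, `publish_later` and `run_cell` are hypothetical names, and the delays are scaled down so that it runs quickly.

```python
import queue
import threading
import time


class OutputTimeout(Exception):
    """Hypothetical stand-in for the executor's CellTimeoutError."""


def wait_for_output(channel, timeout):
    """Block until a message arrives on `channel` or `timeout` seconds pass."""
    try:
        return channel.get(timeout=timeout)
    except queue.Empty:
        raise OutputTimeout(
            f"Timeout waiting for output after {timeout} seconds"
        ) from None


def publish_later(channel, payload, delay):
    """Simulate a kernel that needs `delay` seconds to publish `payload`."""
    def _send():
        time.sleep(delay)
        channel.put(payload)

    thread = threading.Thread(target=_send, daemon=True)
    thread.start()
    return thread


def run_cell(delay, timeout):
    """Return the payload, or the string 'timeout' if it arrives too late."""
    channel = queue.Queue()
    publish_later(channel, "glued outputs", delay)
    try:
        return wait_for_output(channel, timeout)
    except OutputTimeout:
        return "timeout"
```

In this model, a payload that takes longer to publish than the waiting side tolerates produces a timeout even though the cell itself finished, which would match a large glued output being slow to serialize and transmit.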

This affects the overall user experience: in the frontend, I first get a gateway timeout. The problem also seems to interact with the callback: if I re-request just after the timeout, the status stays at "progress". A subsequent re-request shows the actual backend exception.

I didn't manage to reproduce the issue by running nb2workflow locally (with a fresh environment installed from requirements.txt), but it is reproducible when running the container in a local Docker.

@dsavchenko
Member Author

@volodymyrss @burnout87 it seems we will also need file storage for this

@volodymyrss
Member

> @volodymyrss @burnout87 seems we will need a file storage for this also

did you try this?

@dsavchenko
Member Author

> did you try this?

I don't really see how it's related.

Our case corresponds more closely to what is discussed in nteract/papermill#426.
Reading it, I have the impression that there is no way around it.

The event file here is ~35M. Apart from the above-mentioned buffer problem, this is already quite a lot to send in an HTTP response.
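
To make the file-storage idea concrete, here is a minimal sketch in which small outputs stay inline while large ones are written to storage and only a reference is returned in the response. It is an illustration under assumptions, not how nb2workflow handles outputs: `package_output`, `INLINE_LIMIT_BYTES`, the 1 MB threshold and the local directory standing in for the storage backend are all hypothetical.

```python
import base64
import hashlib
from pathlib import Path

INLINE_LIMIT_BYTES = 1_000_000  # assumed threshold, not a project setting


def package_output(name, data, storage_dir, limit=INLINE_LIMIT_BYTES):
    """Return a JSON-serializable description of one output.

    Small payloads are inlined as base64; large ones are written to
    `storage_dir` and only a reference (path, size, checksum) is returned.
    """
    if len(data) <= limit:
        return {"name": name, "inline": base64.b64encode(data).decode("ascii")}

    storage_dir = Path(storage_dir)
    storage_dir.mkdir(parents=True, exist_ok=True)
    digest = hashlib.sha256(data).hexdigest()
    # Prefix with part of the checksum so different payloads do not collide.
    target = storage_dir / f"{digest[:16]}_{name}"
    target.write_bytes(data)
    return {
        "name": name,
        "stored_at": str(target),
        "size_bytes": len(data),
        "sha256": digest,
    }
```

With this shape, the HTTP response for a ~35M event file would carry only a few hundred bytes of metadata, and the client would fetch the file separately.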

@dsavchenko
Member Author

Reported in karabo
Screenshot 2024-08-20 110547

@anawas

anawas commented Aug 20, 2024

How to reproduce

The above error was caused by the karabo-dirty-image-sim workflow. To reproduce, use the parameters:
RA: 250
Dec: -80
survey: gleam
start_freq_mhz: 72
num_of_channels: 10
freq_inc_mhz: 8
num_of_time_steps: 1
telescope: ASKAP
field_of_view: 2
The error pops up immediately. When you try to see the notebook, you're greeted with the error:
Output notebook currently not available. Our team is notified and is working on it.

Working case

The workflow works if you set num_of_channels=1. The execution time is then significantly shorter.
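
For scripted reproduction, the two cases can be written down as plain dictionaries. This is only a convenience sketch: the parameter names and values are the ones listed above, while `changed_keys` is a hypothetical helper and nothing here calls the actual service.

```python
# Parameters from the report above (failing case).
failing_params = {
    "RA": 250,
    "Dec": -80,
    "survey": "gleam",
    "start_freq_mhz": 72,
    "num_of_channels": 10,
    "freq_inc_mhz": 8,
    "num_of_time_steps": 1,
    "telescope": "ASKAP",
    "field_of_view": 2,
}

# Working case: identical except for a single channel.
working_params = {**failing_params, "num_of_channels": 1}


def changed_keys(a, b):
    """List the parameter names whose values differ between two runs."""
    return sorted(k for k in a.keys() | b.keys() if a.get(k) != b.get(k))
```

Here `changed_keys(failing_params, working_params)` confirms that `num_of_channels` is the only difference between the failing and the working request.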

Additional info

The timeout happens in the call sky = get_sky_from_survey(...). This function gets the sky model from a third-party service and caches the downloaded file (about 450 MB) in an S3 bucket. I'm not sure which of the two steps takes too much time.
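
One way to find out which step is slow is to time the download and the caching separately. The sketch below shows the pattern only: `fetch_sky_model` and `cache_to_bucket` are hypothetical stand-ins (short sleeps) for whatever get_sky_from_survey does internally, which is not shown in this thread.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, store):
    """Record the wall-clock duration of the enclosed block in `store`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        store[label] = time.perf_counter() - start


def fetch_sky_model():
    """Hypothetical stand-in for the download from the third-party service."""
    time.sleep(0.02)
    return b"x" * 1024


def cache_to_bucket(blob):
    """Hypothetical stand-in for the upload to the S3 cache."""
    time.sleep(0.01)
    return len(blob)


def profile_stages():
    """Run both stages once and return their durations in seconds."""
    timings = {}
    with timed("download", timings):
        blob = fetch_sky_model()
    with timed("cache", timings):
        cache_to_bucket(blob)
    return timings
```

Logging the returned dictionary from inside the workflow would show whether the third-party download or the S3 caching dominates.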

@volodymyrss
Member

Could @anawas check the size of the output? Maybe it hits a problem and we need to adjust something.
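
If an executed notebook is available, the output size can be checked per cell with the standard library alone, since an .ipynb file is JSON whose code cells carry an "outputs" list. This is a rough sketch: `output_sizes` and `largest_outputs` are hypothetical helpers, and the size measured is that of the re-serialized JSON, which only approximates what travels over the wire.

```python
import json


def output_sizes(notebook):
    """Map cell index -> size in bytes of that cell's serialized outputs."""
    sizes = {}
    for index, cell in enumerate(notebook.get("cells", [])):
        outputs = cell.get("outputs", [])
        if outputs:
            sizes[index] = len(json.dumps(outputs).encode("utf-8"))
    return sizes


def largest_outputs(notebook, top=3):
    """Return the `top` (cell index, size in bytes) pairs, largest first."""
    sizes = output_sizes(notebook)
    return sorted(sizes.items(), key=lambda item: item[1], reverse=True)[:top]
```

The notebook dictionary can be obtained with `json.load` on the executed .ipynb file; the file location depends on the deployment and is not specified here.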
