
Kaleido fails on Docker on Lambda #74

Closed
carterthayer opened this issue Feb 24, 2021 · 12 comments
@carterthayer

I am trying to generate a plotly chart and save it as a PNG in memory. I have my application packaged up and running in Docker locally. I am then using AWS Lambda to run the same container, but it fails when the image-export call runs.

I get this error:

'NoneType' object has no attribute 'encode': AttributeError
--
Traceback (most recent call last):
File "/var/task/my_app/mymodule.py", line 134, in send_report
graph_img = fig.to_image(format="png")
File "/var/lang/lib/python3.6/site-packages/plotly/basedatatypes.py", line 3743, in to_image
return pio.to_image(self, *args, **kwargs)
File "/var/lang/lib/python3.6/site-packages/plotly/io/_kaleido.py", line 132, in to_image
fig_dict, format=format, width=width, height=height, scale=scale
File "/var/lang/lib/python3.6/site-packages/kaleido/scopes/plotly.py", line 117, in transform
img = response.get("result").encode("utf-8")
AttributeError: 'NoneType' object has no attribute 'encode'

It seems weird to me that this works when I run it in Docker locally but not on Lambda.
Do you have any ideas or guidance on what is happening with Kaleido, so I can figure out why it is failing?
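
For context, the line that actually raises is kaleido's response handling: when the Chromium subprocess dies before producing output, `response.get("result")` comes back as `None`, and calling `.encode` on that is what surfaces as the AttributeError. A quick sketch of the failure mode (the response dict here is made up, not kaleido's real payload):

```python
# Made-up kaleido-style response for a transform where Chromium died
# before producing any output, so there is no "result" key.
response = {"code": 525, "message": "transform failed"}

result = response.get("result")  # None, since "result" is absent
try:
    img = result.encode("utf-8")  # mirrors kaleido/scopes/plotly.py line 117
except AttributeError as exc:
    message = str(exc)

print(message)  # 'NoneType' object has no attribute 'encode'
```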

@carterthayer
Author

I also occasionally get this error when trying different memory settings.


[ERROR] ValueError: Failed to start Kaleido subprocess. Error stream:
--
[0224/183020.408878:WARNING:resource_bundle.cc(435)] locale_file_path.empty() for locale
prctl(PR_SET_NO_NEW_PRIVS) failed
prctl(PR_SET_NO_NEW_PRIVS) failed
[0224/183021.420983:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 1 time(s)
prctl(PR_SET_NO_NEW_PRIVS) failed
[0224/183021.530107:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 2 time(s)
[0224/183021.530652:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 3 time(s)
[0224/183021.531130:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 4 time(s)
[0224/183021.603838:ERROR:network_service_instance_impl.cc(262)] Network service crashed, restarting service.
[0224/183021.604498:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 5 time(s)
prctl(PR_SET_NO_NEW_PRIVS) failed
[0224/183021.628785:ERROR:network_service_instance_impl.cc(262)] Network service crashed, restarting service.
prctl(PR_SET_NO_NEW_PRIVS) failed
[0224/183021.644413:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 6 time(s)
[0224/183021.644502:FATAL:gpu_data_manager_impl_private.cc(439)] GPU process isn't usable. Goodbye.
#0 0x5651f7ce8f89 base::debug::CollectStackTrace()
#1 0x5651f7c578e3 base::debug::StackTrace::StackTrace()
#2 0x5651f7c68005 logging::LogMessage::~LogMessage()
#3 0x5651f697c437 content::(anonymous namespace)::IntentionallyCrashBrowserForUnusableGpuProcess()
#4 0x5651f697a3ce content::GpuDataManagerImplPrivate::FallBackToNextGpuMode()
#5 0x5651f69790ff content::GpuDataManagerImpl::FallBackToNextGpuMode()
#6 0x5651f69824a0 content::GpuProcessHost::RecordProcessCrash()
#7 0x5651f67d7603 content::BrowserChildProcessHostImpl::OnProcessLaunchFailed()
#8 0x5651f6832963 content::internal::ChildProcessLauncherHelper::PostLaunchOnClientThread()
#9 0x5651f6832ba5 base::internal::Invoker<>::RunOnce()
#10 0x5651f7c9802b base::TaskAnnotator::RunTask()
#11 0x5651f7ca8b3e base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl()
#12 0x5651f7ca88d0 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork()
#13 0x5651f7d03dc9 base::MessagePumpLibevent::Run()
#14 0x5651f7ca90c5 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run()
#15 0x5651f7c850ee base::RunLoop::Run()
#16 0x5651f67f18a4 content::BrowserProcessSubThread::IOThreadRun()
#17 0x5651f7cbe997 base::Thread::ThreadMain()
#18 0x5651f7cf88fe base::(anonymous namespace)::ThreadFunc()
#19 0x7f1326b2940b start_thread
#20 0x7f132572a09f __GI___clone
Task trace:
#0 0x5651f6832806 content::internal::ChildProcessLauncherHelper::PostLaunchOnLauncherThread()
#1 0x5651f683229c content::internal::ChildProcessLauncherHelper::StartLaunchOnClientThread()
#2 0x5651f6cddaac content::VizProcessTransportFactory::ConnectHostFrameSinkManager()
#3 0x5651f82c8b06 mojo::SimpleWatcher::Context::Notify()
#4 0x5651f6cddaac content::VizProcessTransportFactory::ConnectHostFrameSinkManager()
Task trace buffer limit hit, update PendingTask::kTaskBacktraceLength to increase.
Received signal 6
#0 0x5651f7ce8f89 base::debug::CollectStackTrace()
#1 0x5651f7c578e3 base::debug::StackTrace::StackTrace()
#2 0x5651f7ce8b25 base::debug::(anonymous namespace)::StackDumpSignalHandler()
#3 0x7f1326b337e0 (/usr/lib64/libpthread-2.26.so+0x117df)
#4 0x7f1325670c20 __GI_raise
#5 0x7f13256720c8 __GI_abort
#6 0x5651f7ce7a85 base::debug::BreakDebugger()
#7 0x5651f7c684a2 logging::LogMessage::~LogMessage()
#8 0x5651f697c437 content::(anonymous namespace)::IntentionallyCrashBrowserForUnusableGpuProcess()
#9 0x5651f697a3ce content::GpuDataManagerImplPrivate::FallBackToNextGpuMode()
#10 0x5651f69790ff content::GpuDataManagerImpl::FallBackToNextGpuMode()
#11 0x5651f69824a0 content::GpuProcessHost::RecordProcessCrash()
#12 0x5651f67d7603 content::BrowserChildProcessHostImpl::OnProcessLaunchFailed()
#13 0x5651f6832963 content::internal::ChildProcessLauncherHelper::PostLaunchOnClientThread()
#14 0x5651f6832ba5 base::internal::Invoker<>::RunOnce()
#15 0x5651f7c9802b base::TaskAnnotator::RunTask()
#16 0x5651f7ca8b3e base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWorkImpl()
#17 0x5651f7ca88d0 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::DoWork()
#18 0x5651f7d03dc9 base::MessagePumpLibevent::Run()
#19 0x5651f7ca90c5 base::sequence_manager::internal::ThreadControllerWithMessagePumpImpl::Run()
#20 0x5651f7c850ee base::RunLoop::Run()
#21 0x5651f67f18a4 content::BrowserProcessSubThread::IOThreadRun()
#22 0x5651f7cbe997 base::Thread::ThreadMain()
#23 0x5651f7cf88fe base::(anonymous namespace)::ThreadFunc()
#24 0x7f1326b2940b start_thread
#25 0x7f132572a09f __GI___clone
r8: 0000000000000000  r9: 00007f131ce5e3c0 r10: 0000000000000008 r11: 0000000000000246
r12: 00007f131ce5f688 r13: 00007f131ce5e660 r14: 00007f131ce5f690 r15: aaaaaaaaaaaaaaaa
di: 0000000000000002  si: 00007f131ce5e3c0  bp: 00007f131ce5e610  bx: 0000000000000006
dx: 0000000000000000  ax: 0000000000000000  cx: 00007f1325670c20  sp: 00007f131ce5e3c0
ip: 00007f1325670c20 efl: 0000000000000246 cgf: 002b000000000033 erf: 0000000000000000
trp: 0000000000000000 msk: 0000000000000000 cr2: 0000000000000000
[end of stack trace]
Calling _exit(1). Core file will not be generated.
Traceback (most recent call last):
  File "/var/task/my_app/mymodule.py", line 134, in send_report
    graph_img = fig.to_image(format="png")
  File "/var/lang/lib/python3.8/site-packages/plotly/basedatatypes.py", line 3743, in to_image
    return pio.to_image(self, *args, **kwargs)
  File "/var/lang/lib/python3.8/site-packages/plotly/io/_kaleido.py", line 131, in to_image
    img_bytes = scope.transform(
  File "/var/lang/lib/python3.8/site-packages/kaleido/scopes/plotly.py", line 103, in transform
    response = self._perform_transform(
  File "/var/lang/lib/python3.8/site-packages/kaleido/scopes/base.py", line 280, in _perform_transform
    self._ensure_kaleido()
  File "/var/lang/lib/python3.8/site-packages/kaleido/scopes/base.py", line 188, in _ensure_kaleido
    raise ValueError(message)

@jonmmease
Collaborator

Hi @carterthayer,

Huh, I don't have any idea offhand why Lambda would behave differently here.

  • Are you calling kaleido through plotly.py (using fig.to_image or pio.to_image) or directly (using scope.transform)?
  • What version of kaleido do you have?
  • Are you customizing any of the chromium flags? (https://github.com/plotly/Kaleido/wiki/Customizing-Chromium-Flags) The second error you posted mentions the GPU crashing, which is surprising because the default set of chromium flags includes --disable-gpu, which should force pure software rendering.

@carterthayer
Author

I'm just doing a regular fig.to_image(format="png") on Kaleido version 0.1.0.

I am not customizing any of the Chromium flags. Do you have any you suggest that I might try?

@jonmmease
Collaborator

I don't have any specific chromium flag recommendations. The current set of flags was chosen to make kaleido work by default inside docker containers. One user saw an improvement using the --single-process flag (#45), though I think that was more of a memory issue.

Could you try the following from the failing configuration to see if any more logging info is available?

import plotly.io as pio

try:
     fig.to_image()
except:
     print(pio.kaleido.scope._std_error.getvalue().decode("utf8"))

@carterthayer
Author

Could you try the following from the failing configuration to see if any more logging info is available?

import plotly.io as pio

try:
     fig.to_image()
except:
     print(pio.kaleido.scope._std_error.getvalue().decode("utf8"))
[0225/004752.748704:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 5 time(s)
--
prctl(PR_SET_NO_NEW_PRIVS) failed
[0225/004752.802197:ERROR:network_service_instance_impl.cc(262)] Network service crashed, restarting service.
[0225/004752.802795:WARNING:gpu_process_host.cc(1217)] The GPU process has crashed 6 time(s)
prctl(PR_SET_NO_NEW_PRIVS) failed

@jonmmease
Collaborator

Ok, it does look like this GPU crashing error is the root issue here.

We already set the --disable-gpu chromium flag by default, so I'm not sure why this would be happening. My only idea at the moment would be to try adding all of the --disable-gpu-* flags (e.g. --disable-gpu-compositing, see https://peter.sh/experiments/chromium-command-line-switches/) to see if that makes any difference.

Here's an SO post about running headless chromium on Lambda: https://stackoverflow.com/questions/65429877/aws-lambda-container-running-selenium-with-headless-chrome-works-locally-but-not. The accepted answer uses the --disable-gpu-sandbox and --single-process flags.

@carterthayer
Author

It worked! One of those flags did, anyway.

I'll do some trial and error to figure out which one it was and update the issue for others who find this later.

@jonmmease
Collaborator

Awesome! Yeah, we'd really appreciate it if you could narrow down which flag helped. If it doesn't look like it would cause any issues in other use cases, it would be great to add it to the default set.

@carterthayer
Author

It was --single-process that fixed it for me.

I added

import plotly.io as pio

pio.kaleido.scope.chromium_args += ("--single-process",)

Thanks for your help @jonmmease

@jonmmease
Collaborator

Ok! Thanks for letting us know.

Looks like the need for --single-process on Lambda is a known situation.

As we discussed a bit in #45, the --single-process flag isn't recommended for use beyond debugging (https://www.chromium.org/developers/design-documents/process-models), so we shouldn't make it the overall default. Would you be willing to do a couple more experiments to see if either of the other process flags listed on that page also fixes the issue? In particular --process-per-site and --process-per-tab? These aren't listed as unsafe, so they would be candidates for adding as defaults if they make a difference in this context.

Alternatively, I'd be open to checking environment variables to add --single-process specifically when running on Lambda (https://docs.aws.amazon.com/lambda/latest/dg/configuration-envvars.html) if we determine it's the only way to get things working.

Thanks!

@jonmmease
Collaborator

Update: I tried --process-per-site and --process-per-tab on AWS Lambda, and both fail with the GPU crash error messages described in this issue.

#76 adds the --single-process flag to the default set of chromium flags when kaleido detects that it is running on AWS lambda (based on the presence of the LAMBDA_RUNTIME_DIR environment variable).
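
For anyone curious, the detection boils down to checking that environment variable. A standalone sketch of the idea (the function name and baseline flag list are illustrative, not the actual kaleido code):

```python
import os

def chromium_args_for_environment():
    """Return Chromium flags, adding --single-process only on AWS Lambda."""
    args = ["--disable-gpu"]  # illustrative baseline; kaleido's real list is longer
    # AWS Lambda runtimes define LAMBDA_RUNTIME_DIR, so its presence is a
    # reasonable signal that we're running inside a Lambda container.
    if "LAMBDA_RUNTIME_DIR" in os.environ:
        args.append("--single-process")
    return args
```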

@jonmmease
Collaborator

Automatic AWS Lambda detection released in 0.2.0.

@jonmmease jonmmease added this to the 0.2.0 milestone Mar 4, 2021