Proposed fix to: (The app is attempting to load the component from ****, and hasn't received its "streamlit" message.) #7046

Closed
mkleinbort-ic opened this issue Jul 20, 2023 · 48 comments · Fixed by #8179
Assignees
Labels
feature:custom-components type:enhancement Requests for feature enhancements or new features

Comments

@mkleinbort-ic

Problem

Custom components often fail to load when apps are deployed to Azure Container Apps or GCP Cloud Run.

image

This has been reported many times and there is no known solution:

Solution

I think implementing some optional caching of custom component resources (e.g. style sheets) at installation time / the first time the app is run would greatly mitigate this issue.

@mkleinbort-ic mkleinbort-ic added the type:enhancement Requests for feature enhancements or new features label Jul 20, 2023
@mkleinbort-ic mkleinbort-ic changed the title Proposed fix to The app is attempting to load the component from ****, and hasn't received its "streamlit" message. Proposed fix to: (The app is attempting to load the component from ****, and hasn't received its "streamlit" message.) Jul 20, 2023
@mkleinbort-ic
Author

This is a big deal to us - if a solution is known please share it 😅

@carolinedlu
Collaborator

Hey @mkleinbort-ic,
Thanks for flagging this! Just wanted to confirm two things: (1) this is happening when your app is deployed on Azure Container Apps and GCP Cloud Run (not Community Cloud), and (2) you're using the most recent version of Streamlit?

@mkleinbort-ic
Author

Yes, this has been an issue with basically all versions of Streamlit since custom components came out - but to be clear, I'm on 1.25.0 at the moment.

I've not tested this on Community Cloud.

The issue happens regularly (about 1 in every 3 page loads) when accessing an Azure Container App or a GCP Cloud Run service (I'm sure it's also common in other deployments, but these are the ones I've experienced myself).

@carolinedlu
Collaborator

Thanks for clarifying! I'm checking with our team to see if we have any work planned in this area and will update you when I have more info

@layandreas

I'm deploying to Cloud Run and could at least mitigate the issue by using session affinity: https://cloud.google.com/run/docs/configuring/session-affinity

However it still pops up once in a while.

@carolinedlu
Collaborator

That's super helpful, @layandreas – I saw a few users mention that in the forum as well

@mkleinbort-ic
Author

My thought is that this is an issue with the plugin ecosystem in general. There may be solutions for Cloud Run or Azure Container Apps and elsewhere, but the common issue seems to cut across many types of deployments.

I think the root cause is that some of the files needed to run Streamlit custom components are fetched over the network when an app starts. Ephemeral container runtimes make the issue worse, since each instance starts with a "blank slate" and has to fetch the resources on startup.

@drmaddy736

Yea please fix this

@temibabs

Is anyone working on this?

@mkleinbort-ic
Author

Is anyone going to fix this? Or are Streamlit components not production-ready?

@mkleinbort-ic
Author

@carolinedlu 👆

@mkleinbort-ic
Author

I feel like this is being ignored by Streamlit. The issue has been raised time and time again for 3+ years, and there is still no clear communication on what the issue is or whether anyone at Streamlit plans to address it.

@IndigoJay

I experience this with Streamlit 1.28.0, deployed on Amazon AWS ECS.

After a Streamlit task container is restarted and back online, I load the Streamlit webapp and expect to see an aggrid table. I receive the message “Your app is having trouble loading the st_aggrid.agGrid component.” Then I wait 3 to 5 minutes and reload the page with CTRL-F5. Then everything works (until the next time the task is restarted).

@KeyRotate

This issue has been bothering me for quite some time and is affecting the customer experience. Hopefully the Streamlit team will be able to resolve it. My application is also deployed on GCP.

@janhenner

I'm confirming the issue with AWS ECS. It's the main blocker for Streamlit's enterprise use in my view. Happy to see this solved early in 2024 🎉!

@simonbohnen

@janhenner Where do you see that this will be solved in early 2024?

@janhenner

> @janhenner Where do you see that this will be solved in early 2024?

It's only my wishful thinking 🤔 at this point when I write "Happy to see this solved early in 2024 🎉!"

On Nov 24, 2023, I reached out to hello@streamlit.io about this, and I just pinged them again since I have received no response so far.

@v1pz3n

v1pz3n commented Jan 27, 2024

Does anyone have any insight on how to solve this issue with hosted streamlit applications?

We are facing the same problem with the streamlit-feedback plugin which renders an iframe on every rerun.

We are using it in a conversational chat and after around 5 interactions it starts showing this error.

We will likely have to give up on taking the applications into production due to this limitation and lack of guidance from the Streamlit team.

@simonbohnen

@v1pz3n I think using session affinity fixes the problem. When using session affinity, all requests of the same client are sent to the same server replica. @layandreas described how to activate session affinity for Google Cloud here.

If you are deploying using Kubernetes, you can enable session affinity like this for your service:

[...]
spec:
  [...]
  sessionAffinity: ClientIP

Read more about Kubernetes Session Affinity here.
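
As a rough illustration of what session affinity changes, the toy model below compares round-robin routing with routing keyed on the client IP. The replica names and the functions `route_round_robin`, `route_by_client_ip`, and `replicas_seen` are hypothetical and only model the behaviour described above; this is not how Kubernetes or Cloud Run actually implement it.

```python
import hashlib
from itertools import cycle

REPLICAS = ["replica-a", "replica-b", "replica-c"]  # hypothetical replica names


def route_round_robin(requests):
    """Without affinity: each request goes to the next replica in turn."""
    next_replica = cycle(REPLICAS)
    return [(client_ip, next(next_replica)) for client_ip in requests]


def route_by_client_ip(requests):
    """With ClientIP-style affinity: the client IP alone picks the replica."""
    def pick(client_ip):
        digest = hashlib.sha256(client_ip.encode()).digest()
        return REPLICAS[digest[0] % len(REPLICAS)]
    return [(client_ip, pick(client_ip)) for client_ip in requests]


def replicas_seen(routed, client_ip):
    """Set of replicas that served a given client."""
    return {replica for ip, replica in routed if ip == client_ip}


if __name__ == "__main__":
    # One browser issuing four requests (page, websocket, component assets, ...).
    requests = ["203.0.113.7"] * 4
    print("round robin:", replicas_seen(route_round_robin(requests), "203.0.113.7"))
    print("client IP:  ", replicas_seen(route_by_client_ip(requests), "203.0.113.7"))
```

In this model the same client is spread across several replicas under round robin but always lands on one replica with affinity, which is the property the `sessionAffinity: ClientIP` setting is meant to provide.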

@iuiu34

iuiu34 commented Jan 31, 2024

@simonbohnen thanks for the fix, I'll give it a try.
Do you know why components need sessionAffinity to work properly, while native st widgets do not?

@simonbohnen

I'm not sure honestly. Maybe the caching works differently?

@iuiu34

iuiu34 commented Jan 31, 2024

Yep, it's probably something like that. But I was wondering if it's possible to fix it on the component side.
I think the Streamlit team should work to provide the same experience for native and non-native widgets.

@layandreas

Session affinity seemed to reduce this issue, but it was still prevalent. In the end I just went without components, as the error was just too common.

@sfc-gh-jcarroll
Collaborator

👋 I'm on the Streamlit product team and wanted to:

  • Acknowledge all the pain this has caused and the reasonable frustration about the lack of a clear response or way to solve it
  • Let y'all know that I'm starting to actively look at this issue and ideate about some solutions.

I don't have any commitments yet on what/when a solve will be available but wanted to let you know.

@layandreas

layandreas commented Feb 2, 2024

Thinking out loud, if we had a way (such as in config.toml) to declare certain components to pre-load the static assets on web server startup, does it seem like that would help mitigate this issue? I know there's also been discussion in the past about making the timeout configurable.

We also recently introduced some loading / skeleton elements, and wondering about showing those instead of the ugly warning in cases where we do need some placeholder temporarily so it's less jarring.

Maybe relevant: At least in my case after a full page refresh the custom component did always load. Might just have been due to connecting to a different cloud run instance on reconnect - not sure about that.

@mkleinbort-ic
Author

@sfc-gh-jcarroll - thank you for taking a look, very excited to see streamlit components realize their potential.

Having components pre-load their static assets would mitigate the issue, though I worry that this will make application starts very slow.

Would it not be easier to have a way to "dry-run" the streamlit application and cache the resources required by custom components? I imagine this is effectively how streamlit avoids the issue with native components.

@sfc-gh-jcarroll
Collaborator

Makes sense. Yep we definitely will not make any changes here that have a significant effect on the server boot time. Will report back when we have an update on this.

@v1pz3n

v1pz3n commented Feb 2, 2024

> @v1pz3n I think using session affinity fixes the problem. When using session affinity, all requests of the same client are sent to the same server replica. @layandreas described how to activate session affinity for Google Cloud here.
>
> If you are deploying using Kubernetes, you can enable session affinity like this for your service:
>
> [...]
> spec:
>   [...]
>   sessionAffinity: ClientIP
>
> Read more about Kubernetes Session Affinity here.

In my case this does not solve the problem, because it always occurs once the component has been refreshed a certain number of times, independent of the number of users/sessions.

Providing more details on the environment: I'm using AWS with ECS, an ALB (with sticky sessions), and CloudFront, with a single task/container. However, we ran tests without CloudFront and the problem still persists.

One curious thing is that the problem does not occur when testing locally. Since it works locally but not externally, we made several changes in the proxy and CDN layer, including CORS and other headers that could be affecting it.

I also updated the config.toml to include the external URLs, but I believe the solution is not there.

The problem always happens after having to reload 8 to 15 iframes, so I suspect the problem lies there. We're even considering the possibility of rewriting the component so that it does not include an iframe.

@v1pz3n

v1pz3n commented Feb 2, 2024

Another thing, even with all the debugging enabled on the server and the browser console, I cannot get more information about the error.

Curiously it is only visible for half a second, making it difficult to debug the problem because it gets hidden so quickly.

@sfc-gh-jcarroll
Collaborator

Thanks @v1pz3n for the detailed information.

> Curiously it is only visible for half a second, making it difficult to debug the problem because it gets hidden so quickly.

It sounds like in your use case currently, the assets do eventually arrive in the browser and the app + components load as expected, but the delay is longer than desired which causes the warning to show up. Did I understand that correctly?

In that case, if we improve the UI to show like a subtle loading animation instead of the warning for an extended period, would it mitigate the issue for you? Or not?

I think we can take a multi-prong approach here and also optimize the latency, but it seems like a simple UI improvement would alleviate a lot of the pain for the case where the assets do arrive but just take too long.
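
The UI idea above can be sketched as a small decision function: keep showing a loading skeleton while the component has not reported ready, and only fall back to the warning after a longer wait. This is a Python model of the proposal, not Streamlit's frontend code; the name `placeholder_for` and the `warn_after_s` threshold are assumptions chosen for illustration.

```python
def placeholder_for(elapsed_s: float, ready: bool, warn_after_s: float = 30.0) -> str:
    """Return what the UI would show for a component at a given moment.

    elapsed_s:    seconds since the component iframe was created
    ready:        whether the component has sent its ready message
    warn_after_s: hypothetical cutoff after which the warning replaces the skeleton
    """
    if ready:
        return "component"
    if elapsed_s < warn_after_s:
        return "skeleton"
    return "warning"


if __name__ == "__main__":
    # A slow but successful load, followed by a load that never completes.
    for elapsed, ready in [(0.5, False), (4.0, False), (4.5, True), (45.0, False)]:
        print(elapsed, ready, placeholder_for(elapsed, ready))
```

A longer cutoff only helps when the assets do eventually arrive; a component that never sends its ready message still ends at the warning, just later.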

@IndigoJay

IndigoJay commented Feb 2, 2024

> It sounds like in your use case currently, the assets do eventually arrive in the browser and the app + components load as expected, but the delay is longer than desired which causes the warning to show up. Did I understand that correctly?

I experience this with Streamlit 1.28.1 and the Streamlit-aggrid 0.3.4 library, deployed on Amazon AWS ECS. After a Streamlit container is restarted and back online, I load the Streamlit webapp (can be 1 minute, or 1 hour later) and expect to see an aggrid table. I receive the message “Your app is having trouble loading the st_aggrid.agGrid component.” I let the message display for a few minutes then reload the page with CTRL-F5. Then everything works until the next time the container is restarted. If you run multiple containers, they all need to be touched after restart, or else end-users will see the error.

More comments on this issue at PablocFonseca/streamlit-aggrid#7
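
For anyone who wants to automate "touching" each container after a restart, here is a minimal sketch that requests a list of URLs once and reports whether each responded and how long it took. The function names `fetch_status` and `warm_up` are hypothetical, and the URLs are supplied by you on the command line. Whether a plain HTTP request triggers the same effect as a real browser visit is an untested assumption, so treat this mainly as a way to check which instances respond and how quickly.

```python
import sys
import time
import urllib.request
from typing import Callable, Dict, Iterable, Tuple


def fetch_status(url: str, timeout_s: float = 10.0) -> int:
    """Fetch a URL and return the HTTP status code (raises on network errors)."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        return response.status


def warm_up(
    urls: Iterable[str],
    fetch: Callable[[str], int] = fetch_status,
) -> Dict[str, Tuple[bool, float]]:
    """Request each URL once and record (ok, seconds) per URL."""
    results = {}
    for url in urls:
        started = time.perf_counter()
        try:
            ok = fetch(url) == 200
        except OSError:  # urllib's URLError/HTTPError and socket timeouts are OSErrors
            ok = False
        results[url] = (ok, time.perf_counter() - started)
    return results


if __name__ == "__main__":
    # URLs come from the command line, one per container instance.
    for url, (ok, seconds) in warm_up(sys.argv[1:]).items():
        print(f"{url}: {'ok' if ok else 'FAILED'} in {seconds:.2f}s")
```

A possible invocation, with placeholder addresses for two containers on Streamlit's default port: `python warm_up.py http://<container-1>:8501/ http://<container-2>:8501/`. You could also pass the component URL shown in the warning message to see whether that asset in particular is slow.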

@v1pz3n

v1pz3n commented Feb 10, 2024

> Thanks @v1pz3n for the detailed information.
>
> > Curiously it is only visible for half a second, making it difficult to debug the problem because it gets hidden so quickly.
>
> It sounds like in your use case currently, the assets do eventually arrive in the browser and the app + components load as expected, but the delay is longer than desired which causes the warning to show up. Did I understand that correctly?
>
> In that case, if we improve the UI to show like a subtle loading animation instead of the warning for an extended period, would it mitigate the issue for you? Or not?
>
> I think we can take a multi-prong approach here and also optimize the latency, but it seems like a simple UI improvement would alleviate a lot of the pain for the case where the assets do arrive but just take too long.

Exactly, in my case this would solve the problem, as I was unable to identify any resource shortage on the server or in the browser. What accelerates the appearance of the problem is the amount and size of information exchanged between the server and the user: we are using a chat, and the feedback plugin is rendered with each LLM response.

Given what you describe, it also makes sense why it works locally and not remotely.

@MohakChugh

I can confirm this is still an issue with the latest Streamlit version, and it does not work on ECS Fargate in AWS. Looking forward to a fix for this from the Streamlit team.

@sfc-gh-jcarroll
Collaborator

We're working now on a fix in the UI so that this warning will show up much less often and the user has a nice experience in the case of latency. We're hoping this will go a very long way to reducing the pain from this issue.

We also put together a flow diagram showing how the calls and network requests go when you use a custom component in an app. Hoping this will help you to understand the flow and identify when latency is showing up.

CustomComponents_Flow

Based on the flow, it wasn't obvious to me where something like "the component is slow the first time the app runs and faster in later sessions" would occur, unless there is some asset caching happening in the cloud provider or a CDN. If we are still seeing this causing a lot of pain after the UI fix, we can dig into where the latency is coming from and whether there's work in the Streamlit side to make that better.

@sfc-gh-jcarroll
Collaborator

Here's what the update would look like - aggrid takes a few seconds to load, but you see the pulsing skeleton instead of the warning. Thoughts?

component_skeleton.mov

@simonbohnen

That'd be fantastic for us!

@sfc-gh-jcarroll
Collaborator

The UI update will be in the next release which is targeted in ~2 weeks. You are welcome to try the nightly build from tomorrow as well to see how it works in your use case (we'd love to hear if you do!)

Our hypothesis is that this resolves 80-90% of the issue where latency / performance is a factor, so we can close this. But welcome further reports and/or new issues (feel free to tag me) if you still have problems beyond the next release.

@zeqi2000

@sfc-gh-jcarroll Hello, when I upgraded Streamlit from version 1.30.0 to version 1.32.1, the wait for the component to load did get much longer, but it still didn't load successfully in the end, and the warning still appeared.

@MohakChugh

The new update fixes the issue for me. Thanks to the dev team for prioritising the fix! 🙌🏻

@hakankaraoguz

This doesn't fix an Azure Web application that uses st-star-rating. It waits for minutes to load the component and then fails with the same error message.

@sfc-gh-jcarroll
Collaborator

@zeqi2000 there are some performance issues with 1.32 / 1.32.1 that we are working to address which might be contributing to that behavior.

@hakankaraoguz was it ever working for you before in Azure Web application? If not, it sounds like a proxy / network configuration issue similar to others described in this thread. There isn't a fix we can apply in Streamlit in that case.

If it was working before and stopped working, if you could provide a more detailed repro (preferably in a new issue) we can investigate. Feel free to tag me if you file one.

@hakankaraoguz

Hello @sfc-gh-jcarroll, thank you for your prompt reply. Before the fix, it worked fine after a couple of page refreshes. With the fix, it tries to load the component for a longer time, but in the end it fails with the same error message as before.

@sfc-gh-jcarroll
Collaborator

Thanks @hakankaraoguz - it's difficult for us to diagnose or assess with limited information, and we aren't able to easily set up an Azure Web environment to attempt a repro.

If you could:

  • Run a test with the latest release 1.32.2
  • Share sample app code
  • Record a video of the browser behavior, including a few app refreshes and with the developer console log view open and readable

Then we could follow up and investigate. As I mentioned, it would be great to do this in a new issue and tag me. Otherwise, we won't have enough information to help. Thanks!

@vinay235

vinay235 commented Mar 25, 2024

@sfc-gh-jcarroll I have deployed a demo web app on Azure with the latest release and the issue still persists. Would a recording of the web app, with the developer console open during the interactions, help you learn more?

@sfc-gh-jcarroll
Collaborator

That would be awesome, yep @vinay235 ! Thanks!

@edders51

edders51 commented Apr 5, 2024

I am still having this issue in both Cloud Run and App Engine, using 1.32.2.

@raethlein
Collaborator

We have detected an issue that sometimes led to a blank state in a different component. @kmcgrady has a fix here: #8434
I wonder whether it's related.
