"Channel type not available" summarization error #4414
Looking at the last 24 hours, here is how many events are in the Performance table per runtimeVersion (>= 0.28.7), and here is how many channel errors we have. So either something got much better in 0.28.9 (though the problem might still be there), or the errors are not correlated with runtime version (i.e., the cause is something external).
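Raw error counts per version are hard to compare when versions carry different amounts of traffic, so it helps to divide errors by event volume. A minimal sketch, assuming the two per-version totals have already been exported from Kusto into maps (the function name and data shapes are hypothetical):

```typescript
// Per-version totals keyed by runtimeVersion. Shapes are hypothetical.
type VersionCounts = Map<string, number>;

// Divide each version's error count by its event count so that versions with
// more traffic are not flagged just for being used more.
function errorRatePerVersion(
  events: VersionCounts,
  errors: VersionCounts,
): Map<string, number> {
  const rates = new Map<string, number>();
  events.forEach((eventCount, version) => {
    if (eventCount === 0) {
      return; // no traffic: the rate is undefined, so skip this version
    }
    const errorCount = errors.get(version) ?? 0;
    rates.set(version, errorCount / eventCount);
  });
  return rates;
}
```

If the rate for 0.28.9 stays well below the rate for 0.28.7, that points at a change between the two versions; similar rates across versions point at an external cause.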
Note: in each of the last two weeks it got much worse on Friday, which (given UTC times) very likely points to the read-out file.
FVDhiOnGL%2BoJneT5IbJWlTBzL8eU8yBOjvN4pcPfrbk%3D 182
The top file also hit a lot of "Cannot read property 'call' of undefined" errors, which may suggest the problem is related to a mismatch in code versions between the summarizer and the main client. That's actually true for all but the 4th file in this list, so there looks to be some relationship.
Looking at the git commits, the only change I am seeing specifically in 0.28.7 is 12898b1, and it doesn't look related to this. 0.28.9 contained the fix for the summarizer timeout issue that was causing documents to error out after they had reached the max number of ops prior to summary. The other changes from 0.28.7 -> 0.28.9 don't look related.
On joining with the Bohemia errors, I'm also seeing the highest number for 0.28.7 package/loader versions, although across a couple of different app versions.
For the fourth document on the list of top erroring ones above, with ID VpDT9YnurYKX7aVHhFLtzhlwsOPLQUHKKzEyIS7JFQ8%3D, I was also seeing the "Error 409 from the server, msg = , type = cors" error message immediately prior to the "Channel type not available" error:
union kind=outer database("Office Fluid").Office_Fluid_FluidRuntime_*
Found this document, Q50eQ%2FtQmJY8Re0Fc8udsPH621ini7LkRsI9ki12AJE%3D, where the error happens after a summary ack timeout instead of a server error:
union kind=outer database("Office Fluid").Office_Fluid_FluidRuntime_*
A PR with more telemetry has been submitted.
@vladsud I am getting 0 hits in the last 30 days for this query:
@jatgarg, I do not know what the "has" operator does, but "contains" is the right one, and it finds a ton of hits.
Oh, it is the same but better performance-wise, IMO. Kusto always says to use has instead of contains. Let me check.
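The two operators are not interchangeable. As a simplified model (this is not Kusto's actual tokenizer or indexing rules), `contains` is a case-insensitive substring match, while `has` only matches whole terms. The sketch below (function names and the sample event name are hypothetical) shows how a fragment such as ":Summar", which covers only part of a term like "Summarizer", matches with `contains` but never with `has`:

```typescript
// Simplified term split: any run of non-alphanumeric characters is a
// separator, and comparison is case-insensitive. Not Kusto's real tokenizer.
function terms(text: string): string[] {
  return text
    .toLowerCase()
    .split(/[^a-z0-9]+/)
    .filter((t) => t.length > 0);
}

// Rough model of `contains`: case-insensitive substring match.
function containsOp(haystack: string, needle: string): boolean {
  return haystack.toLowerCase().indexOf(needle.toLowerCase()) !== -1;
}

// Rough model of `has`: the needle's terms must appear as a contiguous run
// of whole terms in the haystack.
function hasOp(haystack: string, needle: string): boolean {
  const h = terms(haystack);
  const n = terms(needle);
  if (n.length === 0) {
    return false;
  }
  for (let i = 0; i + n.length <= h.length; i++) {
    if (n.every((term, j) => h[i + j] === term)) {
      return true;
    }
  }
  return false;
}
```

Under this model, `containsOp("fluid:telemetry:Summarizer:Running", ":Summar")` is true while `hasOp` on the same inputs is false, whereas the phrase "Channel type not available" matches under both because it is made of whole terms.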
All of these errors are for the @ms/atmentions package with ID atmentions. No other data store hits this issue.
Aha, I know what it is. It's the _search/01 blob that we output for search. This will (eventually) be addressed by @arinwt's work to move DDSs (and, one level up, data stores) into their own "channels" sub-tree. Meanwhile, we should assess whether these issues affect users in any way. If not, we should just wait for the right snapshot structure to land.
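Why an extra blob breaks loading, and why a dedicated "channels" sub-tree fixes it, can be sketched with a toy snapshot model. Everything here is an assumption for illustration: the `SnapshotTree` shape, the ".attributes" blob holding the channel type as a plain string, and the exact "channels" subtree name are hypothetical simplifications, not the real Fluid types.

```typescript
// Hypothetical, simplified snapshot tree; not the real Fluid snapshot types.
interface SnapshotTree {
  blobs: { [name: string]: string };
  trees: { [name: string]: SnapshotTree };
}

// Assumed: a real channel subtree carries an ".attributes" blob naming its type.
function channelType(id: string, tree: SnapshotTree): string {
  const type = tree.blobs[".attributes"];
  if (type === undefined) {
    throw new Error(`Channel type not available (${id})`);
  }
  return type;
}

// Flat layout: every subtree of the data store is assumed to be a channel,
// so an app-owned subtree such as "_search" is loaded as one and fails.
function loadChannelsFlat(dataStore: SnapshotTree): string[] {
  return Object.keys(dataStore.trees).map((id) => channelType(id, dataStore.trees[id]));
}

// "channels" sub-tree layout: only children of the dedicated subtree are
// treated as channels, so other subtrees of the data store are never visited.
function loadChannelsNested(dataStore: SnapshotTree): string[] {
  const channels = dataStore.trees["channels"];
  if (channels === undefined) {
    return [];
  }
  return Object.keys(channels.trees).map((id) => channelType(id, channels.trees[id]));
}
```

With a data store holding a real channel plus a `_search` subtree containing only an "01" blob, `loadChannelsFlat` throws on `_search`, while `loadChannelsNested` loads just the real channel.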
Keeping this item for tracking only (no additional work is required: the issue will go away once snapshots for DDSs are under their own path). Moving it to January, as it will not happen this month.
Should we do a hotfix to exclude "_search" in the meantime? The change we are waiting for requires multiple releases. We could then resolve this issue with the hotfix and leave the other issue to track that change.
What is not clear to me is why we are trying to load that "channel" at all. It definitely has no ops, as it is not a real channel. It is present (as a sub-tree) in the previous snapshot; the summarizer then decided to realize the data store and summarize it, but it should only go after channels that changed (had ops on top of the latest snapshot). Is this because we switch (for some reason) to a full summary and thus bypass the incremental process?
Just hit that issue in the debugger when forcing full = true for summaries, which makes sense: a full summary tries to rehydrate all DDSs, and it treats _search as a DDS. @arinwt, can you please assess whether it deserves a patch? It's a simple code change; the only reason I hesitate is that I do not want to see the "_search" term in the Fluid repo. We must not forget to delete it once we have a better structure for isolating channels.
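The difference between the incremental and full paths described above can be modeled as a per-subtree decision. This is a hypothetical sketch (the types, the `planSummary` name and the `fullTree` flag are assumptions, not the real runtime API): incremental summaries only realize subtrees that had ops and reuse the rest from the previous snapshot, while a full summary realizes everything.

```typescript
// Hypothetical model of one summary pass over a data store's subtrees;
// not the real Fluid runtime code.
interface SubtreeState {
  id: string;
  // True if the subtree had ops on top of the latest snapshot.
  changedSinceLastSummary: boolean;
}

type SummaryAction =
  | { kind: "reuse"; id: string } // point at the previous snapshot's subtree
  | { kind: "realize"; id: string }; // load the channel and summarize it

function planSummary(subtrees: SubtreeState[], fullTree: boolean): SummaryAction[] {
  return subtrees.map((s): SummaryAction =>
    fullTree || s.changedSinceLastSummary
      ? { kind: "realize", id: s.id }
      : { kind: "reuse", id: s.id },
  );
}
```

Because `_search` never receives ops, the incremental path only reuses it; with `fullTree` set, it is realized like any DDS, which is the point where loading it as a channel fails.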
This Chapter 1 file (which is very important for our planning) fails with these errors, with no tinkering in the debugger.
A fix has been submitted to ignore the _search sub-tree. This issue will track reverting that fix once it becomes unnecessary thanks to the work Arin is doing.
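The actual fix is not shown in this thread. A minimal sketch of the idea, assuming the data store can filter its subtree names before treating them as channels (the constant and function names are hypothetical):

```typescript
// Hypothetical skip list of subtrees that are not channels.
// Temporary: delete "_search" once DDSs live under their own sub-tree.
const nonChannelSubtrees: ReadonlyArray<string> = ["_search"];

// Returns the data store subtree names that should be treated as channels.
function channelSubtreeNames(subtreeNames: string[]): string[] {
  return subtreeNames.filter((name) => nonChannelSubtrees.indexOf(name) === -1);
}
```

Keeping the special case in a single named constant makes it easy to find and remove later, which addresses the concern about leaving the "_search" term in the repo.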
Forked the only remaining work item here as #4963.
Top OCE issue (assuming the other two top issues are addressed; still waiting for full confirmation).
717 hits in the last 24 hours!
Office_Fluid_FluidRuntime_Error
| where Event_Time > ago(1d)
| where Data_error contains "Channel type not available"
| where Data_eventName contains ":Summar"
| summarize count() by Data_eventName, Data_message, Data_stack
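The query above keeps rows from the last day, filters to summarizer events whose error mentions the channel message, and counts rows per (event name, message, stack) combination. A rough in-memory equivalent, assuming the rows have already been loaded into objects with hypothetical field names; Kusto's `contains` is case-insensitive, which the sketch mimics:

```typescript
// Hypothetical row shape mirroring the columns used in the query.
interface FluidRuntimeErrorRow {
  eventTime: Date; // Event_Time
  eventName: string; // Data_eventName
  error: string; // Data_error
  message: string; // Data_message
  stack: string; // Data_stack
}

// Case-insensitive substring match, like Kusto's `contains`.
function containsCi(haystack: string, needle: string): boolean {
  return haystack.toLowerCase().indexOf(needle.toLowerCase()) !== -1;
}

function countChannelErrors(rows: FluidRuntimeErrorRow[], now: Date): Map<string, number> {
  const oneDayMs = 24 * 60 * 60 * 1000;
  const counts = new Map<string, number>();
  for (const row of rows) {
    // where Event_Time > ago(1d)
    if (row.eventTime.getTime() <= now.getTime() - oneDayMs) continue;
    // where Data_error contains "Channel type not available"
    if (!containsCi(row.error, "Channel type not available")) continue;
    // where Data_eventName contains ":Summar"
    if (!containsCi(row.eventName, ":Summar")) continue;
    // summarize count() by Data_eventName, Data_message, Data_stack
    const key = JSON.stringify([row.eventName, row.message, row.stack]);
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return counts;
}
```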
Based on the data, the first hits are:
A really huge spike on Nov 17-18.
Based on versions, there is a huge jump in 0.28.7.
Typical stack:
Error: Channel type not available
at D.loadChannel (https://officefluidprodversionedcdn.azureedge.net/container/hashed/officeContainer.2f220687466f2e8891d1.js:384:1600)
at async D.summarizeInternal (https://officefluidprodversionedcdn.azureedge.net/container/hashed/officeContainer.2f220687466f2e8891d1.js:384:1208)
at async $.summarize (https://officefluidprodversionedcdn.azureedge.net/container/hashed/officeContainer.2f220687466f2e8891d1.js:162:3095)
at async https://officefluidprodversionedcdn.azureedge.net/container/hashed/officeContainer.2f220687466f2e8891d1.js:392:7506
at async Promise.all (index 0)
at async Object.summarize (https://officefluidprodversionedcdn.azureedge.net/container/hashed/officeContainer.2f220687466f2e8891d1.js:392:7307)
at async Object.t.summarize (https://officefluidprodversionedcdn.azureedge.net/container/hashed/atMentionsComponent.3e08f53224494453f793.js:1:17770)
at async J.summarizeInternal (https://officefluidprodversionedcdn.azureedge.net/container/hashed/officeContainer.2f220687466f2e8891d1.js:162:13855)
at async $.summarize (https://officefluidprodversionedcdn.azureedge.net/container/hashed/officeContainer.2f220687466f2e8891d1.js:162:3095)
at async https://officefluidprodversionedcdn.azureedge.net/container/hashed/officeContainer.2f220687466f2e8891d1.js:231:17197