Orchestrator fails with multilingual model on Azure Sites #1287

nephinj · 2021-08-25T17:26:44Z

Versions

@microsoft/botframework-cli/4.14.1
win32-x64
node-v14.13.0

Describe the bug

When building a model with pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx, Azure Sites does not seem to find the model, or thinks it is corrupt. The code works in the emulator and with ngrok when running locally. If I switch to the default English model, everything works both locally and on Azure Sites.

To Reproduce

Add data sources to the model from QnAMaker:

bf orchestrator:add -t qna --id "<your id here>" -k "<key here>" --routingName <routing name>

Create the model folder and download the base model:

md model
bf orchestrator:basemodel:get --versionId=pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx --out ./model

Create the generated output:

md generated
cd CognitiveModels
bf orchestrator:create --hierarchical --in CognitiveModels --model ../model --out ../generated

Expected behavior

The code should find the model and return the correct result with either the English or the multilingual model.

Screenshots

Error in the log file:
Hosting environment: Production
Content root path: D:\home\site\wwwroot
Now listening on: http://127.0.0.1:45432
Application started. Press Ctrl+C to shut down.
EXCEPTION THROWN - utility_onnx::OnnxUtility::InitOnnxSession(): e.what()=Load model from D:\home\site\wwwroot\model\model.onnx failed:bad allocation, FILE=D:\a\1\s\oc\utility\OnnxUtility.h, LINE=117
EXCEPTION THROWN - OC - EmbedderBase::EmbedderBase(json const& config, const string onnxVocabFileDefault, const string onnxModelFileDefault): e.what()=Load model from D:\home\site\wwwroot\model\model.onnx failed:bad allocation, FILE=D:\a\1\s\oc\EmbedderBase.cc, LINE=57
fail: Microsoft.Bot.Builder.Integration.AspNet.Core.BotFrameworkHttpAdapter[0]
[OnTurnError] unhandled error : Failed to find or load Model with path D:\home\site\wwwroot\model
System.InvalidOperationException: Failed to find or load Model with path D:\home\site\wwwroot\model
---> System.ApplicationException: Load model from D:\home\site\wwwroot\model\model.onnx failed:bad allocation
at Microsoft.BotFramework.Orchestrator.Orchestrator..ctor(String baseModelConfigOrPath)
at Microsoft.Bot.Builder.AI.Orchestrator.OrchestratorRecognizer.b__39_0(String path)
--- End of inner exception stack trace ---
at Microsoft.Bot.Builder.AI.Orchestrator.OrchestratorRecognizer.b__39_0(String path)
at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
at Microsoft.Bot.Builder.AI.Orchestrator.OrchestratorRecognizer.InitializeModel()
at Microsoft.Bot.Builder.AI.Orchestrator.OrchestratorRecognizer.RecognizeAsync(DialogContext dc, Activity activity, CancellationToken cancellationToken, Dictionary`2 telemetryProperties, Dictionary`2 telemetryMetrics)
at SSC.Chatbot.QnABot`1.OnMessageActivityAsync(ITurnContext`1 turnContext, CancellationToken cancellationToken) in D:\a\1\s\Bots\QnABot.cs:line 121
at Microsoft.Bot.Builder.ActivityHandler.OnTurnAsync(ITurnContext turnContext, CancellationToken cancellationToken)
at SSC.Chatbot.QnABot`1.OnTurnAsync(ITurnContext turnContext, CancellationToken cancellationToken) in D:\a\1\s\Bots\QnABot.cs:line 97
at Microsoft.Bot.Builder.TelemetryLoggerMiddleware.OnTurnAsync(ITurnContext context, NextDelegate nextTurn, CancellationToken cancellationToken)
at Microsoft.Bot.Builder.Integration.ApplicationInsights.Core.TelemetryInitializerMiddleware.OnTurnAsync(ITurnContext context, NextDelegate nextTurn, CancellationToken cancellationToken)
at Microsoft.Bot.Builder.BotFrameworkAdapter.TenantIdWorkaroundForTeamsMiddleware.OnTurnAsync(ITurnContext turnContext, NextDelegate next, CancellationToken cancellationToken)
at Microsoft.Bot.Builder.MiddlewareSet.ReceiveActivityWithStatusAsync(ITurnContext turnContext, BotCallbackHandler callback, CancellationToken cancellationToken)
at Microsoft.Bot.Builder.BotAdapter.RunPipelineAsync(ITurnContext turnContext, BotCallbackHandler callback, CancellationToken cancellationToken)

Bot displayed error:

File seems to exist at the location it is looking:

Additional details

Tested with the following models:
(fails) --versionId=pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx
(fails) --versionId=pretrained.20201210.microsoft.dte.00.12.unicoder_multilingual.onnx
(works) --versionId=pretrained.20210608.microsoft.dte.01.06.int.unicoder_multilingual.onnx
(works) no versionId specified.

[bug][Orchestrator]


scheyal · 2021-08-26T18:17:24Z

@daveta - can you please investigate, advise?

tsuwandy · 2021-08-26T19:45:23Z

@nephinj, just to confirm: does the multilingual model work when running locally? What is the size of the Azure VM?

daveta · 2021-08-26T19:46:15Z

@nephinj: Specifically, is it x64? And how much memory does the VM have? We recommend x64 VMs.

hcyang · 2021-08-27T04:11:16Z

Hi @nephinj, can you list the file sizes of the unzipped model folder?

In the error dump, "bad allocation" indicates that the machine does not have enough (contiguous) memory to load the Orchestrator multilingual ONNX model file. Please provision a larger Azure VM and try again. "Bad allocation" is the standard C++ exception (std::bad_alloc) thrown when a program cannot allocate enough memory; see https://www.cplusplus.com/reference/new/bad_alloc/ for details. Thanks.
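One way to answer the file-size question is a small Node.js script. This is only a sketch: the helper names `listModelFiles` and `toMiB` are made up for illustration, and the `./model` path is assumed from the `--out ./model` folder in the reproduction steps.

```javascript
const fs = require('fs');
const path = require('path');

// Return [{ name, bytes }] for every regular file directly inside modelDir.
function listModelFiles(modelDir) {
  return fs.readdirSync(modelDir, { withFileTypes: true })
    .filter((entry) => entry.isFile())
    .map((entry) => ({
      name: entry.name,
      bytes: fs.statSync(path.join(modelDir, entry.name)).size,
    }));
}

// Format a byte count as mebibytes with one decimal place.
function toMiB(bytes) {
  return (bytes / (1024 * 1024)).toFixed(1) + ' MiB';
}

// Assumed location: the folder passed to --out in the reproduction steps.
const modelDir = './model';
if (fs.existsSync(modelDir) && fs.statSync(modelDir).isDirectory()) {
  for (const file of listModelFiles(modelDir)) {
    console.log(`${file.name}\t${toMiB(file.bytes)}`);
  }
} else {
  console.log(`Model folder not found: ${modelDir}`);
}
```

Running it from the folder that contains `model` prints one line per file, which can be pasted into the issue as-is.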

nephinj · 2021-08-27T18:06:31Z

The model works when running locally. The only multilingual model I could get to work on Azure was pretrained.20210608.microsoft.dte.01.06.int.unicoder_multilingual.onnx. I was running 32-bit on the P1V2 Production SKU, which has 3.5 GB of memory. I can try switching to x64 and experimenting with the various plans to see if that helps.

hcyang · 2021-08-27T18:23:46Z

Hi @nephinj, the "int" model is quantized and is 1/4 the size of the model it was quantized from. The quantized model can perform slightly worse than the original, but in our experience the drop is only about 1% (micro-average accuracy).
Although we also build Orchestrator for the Windows 32-bit platform, it works best on x64.

nephinj · 2021-08-30T13:51:33Z

Switching to x64 and bumping the memory to 14 GB did not help. I can see 6 GB of memory free in Application Insights.

hcyang · 2021-08-30T17:44:12Z

Hi @nephinj, please give us the exact Azure VM config you are using, so we can repro. Thanks.

nephinj · 2021-08-30T18:12:37Z

It is running on Azure App Service with these settings:

hcyang · 2021-08-31T21:03:31Z

Hi @nephinj, thanks for the Azure configuration. We can repro the issue on an Azure VM. We will debug.

hcyang · 2021-09-01T01:01:15Z

Hi @nephinj, we suspect that you might be using an x86 Node.js installation on your Azure VM, in which case the bf-cli packages were also installed as their 32-bit versions. Please double-check whether your Azure VM has the x64 version installed.

hcyang · 2021-09-01T15:00:14Z

Hi @nephinj, we have tested several Azure VM scenarios, and the bigger multilingual models (800MB+) can only be loaded with the x64 build, which is installed along with an x64 Node.js installation. We were able to run Orchestrator on the even bigger 12L multilingual model (1GB+) using a VM provisioned with only 4GB of main memory (B2s SKU). We also tested x86 Orchestrator (installed by an x86 Node.js installation) on a variety of x64 Windows Azure VMs with provisioned memory from 8GB to 64GB. Just as you described, Orchestrator could not load the larger multilingual models (6L or 12L), but it was able to load the smaller ones (quantized or EN-only models).

Since you started with "32bit with the P1V2 Production Sku which has 3.5 GB of memory", I suspect the Node.js installation was still x86 even after you upgraded the VM to x64. In my experience, the 3.5GB P1V2 VM should be sufficient to load any Orchestrator model as long as its Node.js and Orchestrator packages are x64.
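The test results above can be restated as a rough pre-flight check. This is a sketch under stated assumptions, not part of Orchestrator: `checkModelLoad` and `LARGE_MODEL_BYTES` are hypothetical names, and the 800 MiB cutoff is only an approximation of the "800MB+" figure in this comment. The actual limit for a 32-bit process is not stated in this thread.

```javascript
// Hypothetical cutoff approximating the "800MB+" figure from this thread.
// The real limit for a 32-bit process is not documented here and may differ.
const LARGE_MODEL_BYTES = 800 * 1024 * 1024;

// Classify a model load attempt from the process architecture and model size.
// arch uses Node.js naming: 'x64' for 64-bit, 'ia32' for 32-bit x86.
function checkModelLoad(arch, modelBytes) {
  if (arch === 'x64') {
    return { likelyOk: true, reason: 'x64 process' };
  }
  if (modelBytes >= LARGE_MODEL_BYTES) {
    return {
      likelyOk: false,
      reason: `${arch} process with a large model: "bad allocation" is likely`,
    };
  }
  return {
    likelyOk: true,
    reason: `${arch} process, but the model is below the assumed cutoff`,
  };
}
```

A caller could pass `process.arch` and the size of `model.onnx` on disk. A `likelyOk` result on x64 still depends on the machine having enough free memory.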

nephinj · 2021-09-02T15:12:28Z

@hcyang Thanks for helping look into this. By running node -p "process.arch" in the Azure console, I verified that it was still 32-bit. It looks like I also had to set WEBSITE_NODE_DEFAULT_VERSION to ~14 in the application settings to get it to switch. I will retest the model this afternoon.
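The same check can be scripted so that a Node.js process fails fast instead of hitting "bad allocation" later. A minimal sketch: `assertX64` and `describeRuntime` are illustrative names, not part of bf-cli or Orchestrator. `process.arch`, `process.version`, and the `WEBSITE_NODE_DEFAULT_VERSION` app setting are the only real identifiers used.

```javascript
// Throw early if the Node.js process is not 64-bit.
// process.arch reports 'x64' for 64-bit and 'ia32' for 32-bit x86 builds.
function assertX64(arch = process.arch) {
  if (arch !== 'x64') {
    throw new Error(`Expected an x64 Node.js process but found "${arch}".`);
  }
  return arch;
}

// Collect the runtime values that mattered in this thread.
function describeRuntime(env = process.env) {
  return {
    arch: process.arch,
    nodeVersion: process.version,
    websiteNodeDefaultVersion: env.WEBSITE_NODE_DEFAULT_VERSION || '(not set)',
  };
}

console.log(describeRuntime());
```

Calling `assertX64()` at startup turns a 32-bit runtime into an explicit error message that names the architecture.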

hcyang · 2021-09-13T21:41:14Z

Hi @nephinj, I think this issue is well understood and can be closed. Let us know if you have more questions.

munozemilio · 2021-09-20T20:50:23Z

Hi @nephinj, I'm closing this due to inactivity. Feel free to reopen if needed.

nephinj added the bug and needs-triage labels Aug 25, 2021

dmvtech added the Bot Services and customer-reported labels Aug 26, 2021

tracyboehrer assigned scheyal Aug 26, 2021

scheyal assigned daveta Aug 26, 2021

hcyang assigned hcyang and unassigned daveta and scheyal Aug 27, 2021

hcyang added the customer-replied-to label and removed the needs-triage label Aug 27, 2021

hcyang added the Area: AI-Orchestrator label and removed the bug label Sep 1, 2021

munozemilio added this to the R15 milestone Sep 10, 2021

munozemilio closed this as completed Sep 20, 2021

