Orchestrator fails with multilingual model on Azure Sites #1287

Closed
nephinj opened this issue Aug 25, 2021 · 15 comments

nephinj commented Aug 25, 2021

Versions

@microsoft/botframework-cli/4.14.1
win32-x64
node-v14.13.0

Describe the bug

When building a model with pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx, Azure Sites does not seem to find the model or thinks it is corrupt. The code works in the emulator and with ngrok when running locally. If I switch to the default English model, it all works fine both locally and on Azure Sites.

To Reproduce

  1. Add data sources to the model from QnAMaker.
    bf orchestrator:add -t qna --id "<your id here>" -k "<key here>" --routingName <routing name>
  2. Create the model
    md model
    bf orchestrator:basemodel:get --versionId=pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx --out ./model
  3. Create the generated snapshot
    md generated
    cd CognitiveModels
    bf orchestrator:create --hierarchical --in CognitiveModels --model ../model --out ../generated

Expected behavior

The code should find and load the model and return the right result with either the English or the multilingual model.

Screenshots

Error in the log file:
Hosting environment: Production
Content root path: D:\home\site\wwwroot
Now listening on: http://127.0.0.1:45432
Application started. Press Ctrl+C to shut down.
EXCEPTION THROWN - utility_onnx::OnnxUtility::InitOnnxSession(): e.what()=Load model from D:\home\site\wwwroot\model\model.onnx failed:bad allocation, FILE=D:\a\1\s\oc\utility\OnnxUtility.h, LINE=117
EXCEPTION THROWN - OC - EmbedderBase::EmbedderBase(json const& config, const string onnxVocabFileDefault, const string onnxModelFileDefault): e.what()=Load model from D:\home\site\wwwroot\model\model.onnx failed:bad allocation, FILE=D:\a\1\s\oc\EmbedderBase.cc, LINE=57
fail: Microsoft.Bot.Builder.Integration.AspNet.Core.BotFrameworkHttpAdapter[0]
[OnTurnError] unhandled error : Failed to find or load Model with path D:\home\site\wwwroot\model
System.InvalidOperationException: Failed to find or load Model with path D:\home\site\wwwroot\model
---> System.ApplicationException: Load model from D:\home\site\wwwroot\model\model.onnx failed:bad allocation
at Microsoft.BotFramework.Orchestrator.Orchestrator..ctor(String baseModelConfigOrPath)
at Microsoft.Bot.Builder.AI.Orchestrator.OrchestratorRecognizer.b__39_0(String path)
--- End of inner exception stack trace ---
at Microsoft.Bot.Builder.AI.Orchestrator.OrchestratorRecognizer.b__39_0(String path)
at System.Collections.Concurrent.ConcurrentDictionary`2.GetOrAdd(TKey key, Func`2 valueFactory)
at Microsoft.Bot.Builder.AI.Orchestrator.OrchestratorRecognizer.InitializeModel()
at Microsoft.Bot.Builder.AI.Orchestrator.OrchestratorRecognizer.RecognizeAsync(DialogContext dc, Activity activity, CancellationToken cancellationToken, Dictionary`2 telemetryProperties, Dictionary`2 telemetryMetrics)
at SSC.Chatbot.QnABot`1.OnMessageActivityAsync(ITurnContext`1 turnContext, CancellationToken cancellationToken) in D:\a\1\s\Bots\QnABot.cs:line 121
at Microsoft.Bot.Builder.ActivityHandler.OnTurnAsync(ITurnContext turnContext, CancellationToken cancellationToken)
at SSC.Chatbot.QnABot`1.OnTurnAsync(ITurnContext turnContext, CancellationToken cancellationToken) in D:\a\1\s\Bots\QnABot.cs:line 97
at Microsoft.Bot.Builder.TelemetryLoggerMiddleware.OnTurnAsync(ITurnContext context, NextDelegate nextTurn, CancellationToken cancellationToken)
at Microsoft.Bot.Builder.Integration.ApplicationInsights.Core.TelemetryInitializerMiddleware.OnTurnAsync(ITurnContext context, NextDelegate nextTurn, CancellationToken cancellationToken)
at Microsoft.Bot.Builder.BotFrameworkAdapter.TenantIdWorkaroundForTeamsMiddleware.OnTurnAsync(ITurnContext turnContext, NextDelegate next, CancellationToken cancellationToken)
at Microsoft.Bot.Builder.MiddlewareSet.ReceiveActivityWithStatusAsync(ITurnContext turnContext, BotCallbackHandler callback, CancellationToken cancellationToken)
at Microsoft.Bot.Builder.BotAdapter.RunPipelineAsync(ITurnContext turnContext, BotCallbackHandler callback, CancellationToken cancellationToken)

Bot displayed error:
Bot error

The file seems to exist at the location where the bot is looking:
Azure Console

Additional details

Tested with the following models:
(fails) --versionId=pretrained.20210205.microsoft.dte.00.06.unicoder_multilingual.onnx
(fails) --versionId=pretrained.20201210.microsoft.dte.00.12.unicoder_multilingual.onnx
(works) --versionId=pretrained.20210608.microsoft.dte.01.06.int.unicoder_multilingual.onnx
(works) no versionId specified.

[bug][Orchestrator]

nephinj added the bug and needs-triage labels on Aug 25, 2021
dmvtech added the Bot Services and customer-reported labels on Aug 26, 2021

scheyal (Contributor) commented Aug 26, 2021

@daveta - can you please investigate, advise?

tsuwandy (Contributor) commented

@nephinj, just to confirm, does the multilingual model work when running locally? What's the size of the Azure VM?

daveta (Contributor) commented Aug 26, 2021

@nephinj: Specifically, is it x64? And how much memory does the VM have? We recommend x64 VMs.

hcyang assigned hcyang and unassigned daveta and scheyal on Aug 27, 2021

hcyang (Contributor) commented Aug 27, 2021

Hi @nephinj, can you list the file sizes of the unzipped model folder?

In the error message dump, "bad allocation" indicates that the system does not have sufficient (contiguous) memory to load the Orchestrator multilingual ONNX model file. Please provision a larger Azure VM and try again. Bad allocation is the standard C++ exception thrown when a program cannot allocate enough memory; see https://www.cplusplus.com/reference/new/bad_alloc/ for details. Thanks.
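
One way to gather the file sizes requested above is a small Node.js script. This is only a sketch: the function name `listModelFiles` and the script layout are illustrative, and it assumes the model was unzipped into a single flat folder such as `./model`.

```javascript
const fs = require('fs');
const path = require('path');

// List every regular file directly inside `dir` with its size,
// largest first. Subfolders are skipped.
function listModelFiles(dir) {
  return fs.readdirSync(dir, { withFileTypes: true })
    .filter((entry) => entry.isFile())
    .map((entry) => {
      const bytes = fs.statSync(path.join(dir, entry.name)).size;
      return { name: entry.name, bytes, mib: bytes / (1024 * 1024) };
    })
    .sort((a, b) => b.bytes - a.bytes);
}

// Print the listing only when a folder is passed on the command line,
// for example: node list-model-files.js ./model
const modelDir = process.argv[2];
if (modelDir && fs.existsSync(modelDir)) {
  for (const file of listModelFiles(modelDir)) {
    console.log(`${file.mib.toFixed(1).padStart(8)} MiB  ${file.name}`);
  }
}
```

Comparing the largest file (here `model.onnx`) between a model that loads and one that does not is usually enough to tell whether size is the distinguishing factor.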

hcyang added the customer-replied-to label and removed the needs-triage label on Aug 27, 2021

nephinj (Author) commented Aug 27, 2021

The model works when running locally. The only multilingual model I could get to work on Azure was pretrained.20210608.microsoft.dte.01.06.int.unicoder_multilingual.onnx. I was running 32-bit on the P1V2 Production SKU, which has 3.5 GB of memory. I can try changing it to x64 to see if that helps, and try the various plans.

hcyang (Contributor) commented Aug 27, 2021

Hi @nephinj, the "int" model is quantized and is 1/4 the size of the model it was quantized from. The quantized model can perform a little worse than its original, but in our experience the drop is only about 1% (micro-average accuracy).
Although we also built Orchestrator for the Windows 32-bit platform, Orchestrator works best on x64.
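
The 1/4 size ratio is what one would expect if weights stored as 32-bit floats are converted to 8-bit integers. The thread does not state the exact quantization scheme, so that is an assumption here, and the parameter count below is a made-up figure used only for the arithmetic.

```javascript
// Approximate weight storage: number of parameters times bytes per parameter.
function weightBytes(paramCount, bytesPerParam) {
  return paramCount * bytesPerParam;
}

const PARAMS = 200e6; // hypothetical parameter count, for illustration only
const fullSize = weightBytes(PARAMS, 4);      // assuming 32-bit float weights
const quantizedSize = weightBytes(PARAMS, 1); // assuming 8-bit integer weights

console.log(`size ratio: ${quantizedSize / fullSize}`); // prints "size ratio: 0.25"
```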

nephinj (Author) commented Aug 30, 2021

Switching to x64 and bumping the memory to 14GB did not help. I can see 6GB of memory free in Application Insights.
image

hcyang (Contributor) commented Aug 30, 2021

Hi @nephinj, please give us the exact Azure VM config you are using, so we can repro. Thanks.

nephinj (Author) commented Aug 30, 2021

It is running on Azure App Service with these settings:
image
image

hcyang (Contributor) commented Aug 31, 2021

Hi @nephinj, thanks for the Azure configuration. We can repro the issue on an Azure VM. We will debug.

hcyang (Contributor) commented Sep 1, 2021

Hi @nephinj, we suspect that you might be using an x86 Node.js installation in your Azure VM, in which case the bf-cli packages were also installed as their 32-bit versions. Please double-check whether the Node.js installation on your Azure VM is the x64 version.

hcyang (Contributor) commented Sep 1, 2021

Hi @nephinj, we have tested several Azure VM scenarios, and the bigger multilingual models (800MB+) can only be loaded using the x64 build, which is installed along with an x64 Node.js installation. We were able to run Orchestrator on the even bigger 12L multilingual model (1GB+) using a VM provisioned with only 4GB of main memory (B2s SKU). We also tested x86 Orchestrator (installed by an x86 Node.js installation) on a variety of x64 Windows Azure VMs with memory provisioned from 8GB to 64GB. Just as you described, it could not load the larger multilingual models (6L or 12L), but was able to load smaller ones (quantized or EN-only models).

Since you started with "32bit with the P1V2 Production Sku which has 3.5 GB of memory", I suspect the Node.js installation was still x86 even after you upgraded the VM to x64. Based on my experience, the 3.5GB P1V2 VM should be sufficient to load any Orchestrator model as long as its Node.js and Orchestrator packages are x64.
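
The findings above can be turned into a rough pre-flight check: a 32-bit process has a limited virtual address space no matter how much RAM the VM has, so the verdict depends on the process architecture and the model size rather than on the VM SKU. The sketch below is only a heuristic. The function name, the verdict strings, and the 800 MB cutoff are assumptions drawn from the sizes quoted in this thread, not a documented Orchestrator limit.

```javascript
// Rough cutoff taken from the sizes quoted in this thread (models of 800 MB
// and up failed under x86). It is an observation, not a documented limit.
const X86_MODEL_LIMIT_BYTES = 800 * 1024 * 1024;

// Returns a short verdict for a given process architecture and model size.
// `arch` uses Node's process.arch values ('x64', 'ia32', 'arm64', ...).
function assessModelLoad(arch, modelBytes) {
  const is64Bit = arch === 'x64' || arch === 'arm64';
  if (is64Bit) {
    return 'no-32-bit-limit';
  }
  return modelBytes >= X86_MODEL_LIMIT_BYTES ? 'may-fail-bad-allocation' : 'probably-fits';
}

// Example: an 850 MiB model evaluated against the current Node.js process.
console.log(assessModelLoad(process.arch, 850 * 1024 * 1024));
```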

hcyang added the Area: AI-Orchestrator label and removed the bug label on Sep 1, 2021

nephinj (Author) commented Sep 2, 2021

@hcyang Thanks for helping to look into this. Running node -p "process.arch" in the Azure console confirms that it was still 32-bit. It looks like I also had to set WEBSITE_NODE_DEFAULT_VERSION to ~14 in the application settings to get it to switch. I will retest the model this afternoon.
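
The same check can be scripted so it is easy to rerun after changing the app settings. This is a minimal sketch: `describeRuntime` is a hypothetical helper, and it reports on the Node.js process that runs the script, which may differ from the process that hosts the bot.

```javascript
// Collect the runtime facts that mattered in this thread.
// `proc` defaults to the global `process`; passing a stub makes it testable.
function describeRuntime(proc = process) {
  return {
    arch: proc.arch,
    nodeVersion: proc.version,
    is64Bit: proc.arch === 'x64' || proc.arch === 'arm64',
    // App setting mentioned above; undefined when it is not set.
    websiteNodeDefaultVersion: proc.env.WEBSITE_NODE_DEFAULT_VERSION,
  };
}

const runtimeInfo = describeRuntime();
console.log(runtimeInfo);
if (!runtimeInfo.is64Bit) {
  console.warn('32-bit Node.js detected; larger Orchestrator models may fail to load.');
}
```

Run it from the Azure console (Kudu) with the same `node` that is on the path there; `arch` should read `x64` once the switch has taken effect.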

munozemilio added this to the R15 milestone on Sep 10, 2021

hcyang (Contributor) commented Sep 13, 2021

Hi @nephinj, I think this issue is well understood and can be closed. Let us know if you have more questions.

munozemilio (Member) commented

Hi @nephinj, I'm closing this due to inactivity. Feel free to reopen if needed.

Labels: Area: AI-Orchestrator, Bot Services, customer-replied-to, customer-reported