Importing onnxruntime on AWS Lambdas with ARM64 processor causes crash #10038

glefundes · 2021-12-14T18:37:19Z

Describe the bug
I'm currently migrating a service deployed as a serverless function on AWS Lambda to the new ARM64 Graviton2 processor. Importing onnxruntime throws a cpuinfo error and crashes the code with the following messages:

Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible
--
Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present
Error in cpuinfo: failed to parse both lists of possible and present processors
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
what():  /onnxruntime_src/onnxruntime/core/common/cpuid_info.cc:62 onnxruntime::CPUIDInfo::CPUIDInfo() Failed to initialize CPU info.

The files /sys/devices/system/cpu/possible and /sys/devices/system/cpu/present don't exist and apparently this causes the crash. Is this expected behaviour? I'm not sure how to proceed. Is onnxruntime currently not supported by Graviton2 processors? The contents of /proc/cpuinfo are as follows:


processor	: 0
--
BogoMIPS	: 243.75
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x3
CPU part	: 0xd0c
CPU revision	: 1
processor	: 1
BogoMIPS	: 243.75
Features	: fp asimd evtstrm aes pmull sha1 sha2 crc32 atomics fphp asimdhp cpuid asimdrdm lrcpc dcpop asimddp ssbs
CPU implementer	: 0x41
CPU architecture: 8
CPU variant	: 0x3
CPU part	: 0xd0c
CPU revision	: 1

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux (AWS Lambda python runtime)
ONNX Runtime installed from (source or binary): binary (with pip)
ONNX Runtime version: 1.10.0
Python version: 3.8.5

The text was updated successfully, but these errors were encountered:

jcreinhold · 2022-01-04T19:38:45Z

I'm also experiencing this issue with a similar setup (see "System information" below). The error message is below as well (the same as the OP). I can add more details if needed/helpful.

Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible
Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
what(): /onnxruntime_src/onnxruntime/core/common/cpuid_info.cc:62 onnxruntime::CPUIDInfo::CPUIDInfo() Failed to initialize CPU info.

System information

OS Platform and Distribution (e.g., Linux Ubuntu 16.04): Linux (AWS Lambda Python arm64 Docker container)
ONNX Runtime installed from (source or binary): binary (with pip)
ONNX Runtime version: 1.10.0
Python version: 3.9

skottmckay · 2022-01-05T08:20:58Z

@chenfucn is this a known issue?

Should we handle cpuinfo failing more gracefully? If it's not critical to have the cpu info maybe logging and ignoring the error is an option.

chenfucn · 2022-01-05T16:50:27Z

Thanks for the info. This is a surprise. Here we are actually leveraging pytorch cpuinfo. This library is used in both pytorch and tensorflow. Do you guys have knowledge about pytorch cpuinfo library facing similar issues?

Currently we are using cpuinfo lib to detect hybrid cores and SDOT UDOT instruction support. Ignoring cpuinfo failure means we lose these functionalities and will cause performance degradation. Especially with DOT instructions, the matrix multiplication can be multiple times slower if we don't use DOT instructions and fall back to neon cores.

I can implement a very crude DOT detection logic in case of cpuinfo failure. However the best solution should be cpuinfo library authors to fix this problem.

chenfucn · 2022-01-05T17:06:46Z

@glefundes and @jcreinhold could you also file this issue to pytorch cpuinfo repo while I prepare a PR to get around this?

jcreinhold · 2022-01-05T17:51:22Z

Thanks for the fast response. I filed the issue on cpuinfo here: pytorch/cpuinfo#76

Let me know if you need me to test anything.

chenfucn · 2022-01-05T17:51:34Z

#10199

workdd · 2022-01-14T08:38:36Z

Could I know if this issue has been resolved? I'm currently having the same problem.

chenfucn · 2022-01-19T19:13:06Z

the above PR already merged, can you try it out?

workdd · 2022-02-07T06:23:06Z

Thanks for response.
Did it published 'pip' also? I installed onnxruntime packages using just below command.
pip install onnxruntime
And I'm still faced same issue.

skottmckay · 2022-02-07T07:52:46Z

It would be in the nightly package until the next official release. https://test.pypi.org/project/ort-nightly/

workdd · 2022-02-08T03:42:38Z

Thank you for the fast response.
Then I'll wait next official release.

jcreinhold · 2022-02-11T14:28:18Z

Thanks for the quick response to this issue. I'm happy to test out the implementation when there is a release candidate, but I've already deployed the model on x86 hardware and want as little downtime as possible.

Will PR #10199 fix what @chenfucn brought up in the below comment?

Currently we are using cpuinfo lib to detect hybrid cores and SDOT UDOT instruction support. Ignoring cpuinfo failure means we lose these functionalities and will cause performance degradation. Especially with DOT instructions, the matrix multiplication can be multiple times slower if we don't use DOT instructions and fall back to neon cores.

Or does pytorch/cpuinfo#76 need to be resolved to fix that problem?

yufenglee · 2022-02-11T16:24:25Z

You need to include both #10199 and #10334 .

stale · 2022-04-16T05:54:48Z

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

glefundes · 2022-05-25T13:21:52Z

Just got the chance to test release 1.11.1 on Graviton2 instances on AWS and can confirm that while the cpuinfo erro messages still show, execution is no longer halted and the lambda call finishes as expected. Thank you all :)

jcampbell05 · 2023-03-13T17:19:40Z

Good afternoon, we are suddenly getting this error for 1.14 on Graviton2. I'm not sure if there has been a regression ?

skottmckay · 2023-03-14T04:55:46Z

@jcampbell05 There has been no change that I can see to print a warning instead of failing with an exception. What error exactly are you seeing?

onnxruntime/onnxruntime/core/common/cpuid_info.cc

Line 136 in 538d648

    
           LOGS_DEFAULT(WARNING) << "Failed to init pytorch cpuinfo library, may cause CPU EP performance degradation due to undetected CPU features.";

jcampbell05 · 2023-03-14T09:34:59Z

So I'm seeing the following from Python. rolling back to 1.11.1 fixes it for us.

Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible
--
Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present
Error in cpuinfo: failed to parse both lists of possible and present processors
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
what():  /onnxruntime_src/include/onnxruntime/core/common/logging/logging.h:294 static const onnxruntime::logging::Logger& onnxruntime::logging::LoggingManager::DefaultLogger() Attempt to use DefaultLogger but none has been registered.

skottmckay · 2023-03-15T04:48:41Z

There's no exception thrown in the latest code so the failure is most likely coming from somewhere else. Problem is there's no default logger so the real error isn't clear. The Environment needs to be created prior to calling into other ORT code as that provides the default logger. However it's weird that that hasn't happened if you're calling from python as we typically create the environment internally so that it's available when needed.

Can you share the python code using ORT up to where it breaks?

jcampbell05 · 2023-04-20T11:02:15Z

@skottmckay it took a while to track it down but it appears it's simply just this, since none of our other code has executed yet.

import onnxruntime

DoctorSlimm · 2023-04-27T05:07:35Z

TLDR: I don't see any Solution to this issue to using ONNX in AWS Lambda? Docker Image builds and runs fine locally on my M1 mac but in the cloud this happens:

Error in cpuinfo: failed to parse both lists of possible and present processors
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
what():  /onnxruntime_src/include/onnxruntime/core/common/logging/logging.h:294 static const onnxruntime::logging::Logger& onnxruntime::logging::LoggingManager::DefaultLogger() Attempt to use DefaultLogger but none has been registered.

Pls Help.. Really need to run inference in AWS Lambda 🥲

tianleiwu · 2023-04-27T18:42:40Z

@DoctorSlimm, @jcampbell05

Could you try the following package (built with #15661) to see whether the issue is resolved? You can rename the .zip file to .whl file and install like the following:

mv ort_nightly-1.15.0.dev20230427003-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.zip ort_nightly-1.15.0.dev20230427003-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

pip uninstall onnxruntime

pip install ort_nightly-1.15.0.dev20230427003-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

ort_nightly-1.15.0.dev20230427003-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.zip

DoctorSlimm · 2023-04-28T22:25:57Z

@tianleiwu

Still the same Error when I run it in the cloud, works totally fine when I run the function locally, but fails when I invoke it in AWS.

NOTE: I am building it locally on a M1 Mac and then pushing it to ECR registry

Local Build command, run in the same directory as the other files

docker build --platform linux/arm64 -t FUNCTION-NAME .

Here is my dockerfile:

FROM public.ecr.aws/lambda/python:3.9-arm64 AS model


# Install the runtime interface client
RUN python3.9 -m pip install --target . awslambdaric
RUN python3.9 -m pip install python-dotenv onnxruntime "transformers[torch]"

# https://github.com/microsoft/onnxruntime/issues/10038
ADD ort_nightly-1.15.0.dev20230427003-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.zip ort_nightly-1.15.0.dev20230427003-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.zip
RUN mv ort_nightly-1.15.0.dev20230427003-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.zip ort_nightly-1.15.0.dev20230427003-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
RUN python3.9 -m pip uninstall -y onnxruntime
RUN python3.9 -m pip install ort_nightly-1.15.0.dev20230427003-cp39-cp39-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

# Set Production Environment
ENV ENV=prod

# Copy files
COPY app.py ./

# Copy onnx directory
COPY onnx onnx


# Set Up Entrypoints
COPY ./entry_script.sh /entry_script.sh
ADD aws-lambda-rie-arm64 /usr/local/bin/aws-lambda-rie-arm64
ENTRYPOINT ["/entry_script.sh"]
CMD [ "app.handler" ]

app.py

import json
import traceback
from time import time
import numpy as np
from dotenv import load_dotenv
from onnxruntime import InferenceSession
from transformers import AutoTokenizer

load_dotenv()

# Worth Investigating
# https://blog.ml6.eu/the-art-of-pooling-embeddings-c56575114cf8
# https://github.com/UKPLab/sentence-transformers/issues/46#issuecomment-1152816277

tokenizer = AutoTokenizer.from_pretrained('onnx')
session = InferenceSession("onnx/model.onnx")


def lambda_handler(event, context):
    try:
        if 'ping' in event:
            print('Pinging')
            t0 = time()
            return {
                'total_time': time() - t0,
            }
        if 'modelInputs' in event:
            print('Inference\n')
            model_inputs = event['modelInputs']
            text = model_inputs['text']
            encoded_inputs = tokenizer(text, return_tensors="np")
            model_outputs = session.run(
                None, input_feed=dict(encoded_inputs)
            )  # (1, 1, 11, 768)

            token_embeddings = model_outputs[0]  # (1, 11, 768)
            special_token_ids = [
                tokenizer.cls_token_id,
                tokenizer.unk_token_id,
                tokenizer.sep_token_id,
                tokenizer.pad_token_id,
                tokenizer.mask_token_id,
            ]

            # Mask to exclude special tokens from pooling calculation
            mask = np.ones(token_embeddings.shape[:-1], dtype=bool)

            # Max Pooling Sentence Embedding
            for special_token_id in special_token_ids:
                mask &= encoded_inputs != special_token_id
            max_pooled_embeddings = np.max(token_embeddings * mask[..., np.newaxis], axis=1)
            max_pooled_embeddings = np.mean(max_pooled_embeddings, axis=0)

            # Mean Pooling Sentence Embedding
            for special_token_id in special_token_ids:
                mask &= encoded_inputs != special_token_id  # Exclude special tokens from mask
            mean_pooled_embeddings = np.sum(token_embeddings * mask[..., np.newaxis], axis=1)  # Apply mask and take sum over sequence dimension
            mean_pooled_embeddings = np.mean(mean_pooled_embeddings, axis=0)  # Take mean over batch dimension

            return {
                'statusCode': 200,
                'body': json.dumps(
                    {
                        'modelOutputs': {
                            # 'raw': model_outputs.tolist(),
                            'token_embeddings': token_embeddings.tolist(),
                            'max_pooled_embeddings': max_pooled_embeddings.tolist(),
                            'mean_pooled_embeddings': mean_pooled_embeddings.tolist(),
                        }
                    }
                )
            }

    except Exception as e:
        return {
            'error': str(traceback.format_exc()) + str(e)
        }

Response when run in AWS

{
  "errorType": "Runtime.ExitError",
  "errorMessage": "RequestId: dd954162-257e-448e-824e-0b78342f503a Error: Runtime exited with error: signal: aborted"
}

Log output

Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible
Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present
Error in cpuinfo: failed to parse both lists of possible and present processors
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
what():  /onnxruntime_src/include/onnxruntime/core/common/logging/logging.h:294 static const onnxruntime::logging::Logger& onnxruntime::logging::LoggingManager::DefaultLogger() Attempt to use DefaultLogger but none has been registered.
Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible
Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present
Error in cpuinfo: failed to parse both lists of possible and present processors
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
what():  /onnxruntime_src/include/onnxruntime/core/common/logging/logging.h:294 static const onnxruntime::logging::Logger& onnxruntime::logging::LoggingManager::DefaultLogger() Attempt to use DefaultLogger but none has been registered.
START RequestId: dd954162-257e-448e-824e-0b78342f503a Version: $LATEST
RequestId: dd954162-257e-448e-824e-0b78342f503a Error: Runtime exited with error: signal: aborted
Runtime.ExitError
END RequestId: dd954162-257e-448e-824e-0b78342f503a
REPORT RequestId: dd954162-257e-448e-824e-0b78342f503a	Duration: 2625.36 ms	Billed Duration: 2626 ms	Memory Size: 128 MB	Max Memory Used: 39 MB

johnsonchau-bulb · 2023-06-26T11:54:15Z

@DoctorSlimm is there any update on your solution? I tried out your method of installing from nightly builds and it still leads to the same error:

Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible
Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present
Error in cpuinfo: failed to parse both lists of possible and present processors

Note: I'm also using aws lambda with ARM architecture

DoctorSlimm · 2023-06-26T12:22:14Z

@DoctorSlimm is there any update on your solution? I tried out your method of installing from nightly builds and it still leads to the same error:
Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible

Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present

Error in cpuinfo: failed to parse both lists of possible and present processors
Note: I'm also using aws lambda with ARM architecture

Hello my dude! Using X86 (or whatever the OTHER architecture is, maybe it's called AMD64) Architecture (+ maybe a few other tweaks including increasing the memory of the function to at least a few GB) I think solved it!

Will be getting back into this stuff later this week so will likely have more concrete answers then but for now I'm pretty sure that

Using X86 and increasing Memory gets you 97% of the way there - Good Luck!

johnsonchau-bulb · 2023-06-26T13:04:12Z

@DoctorSlimm I see, I was experimenting with x86 architecture but the docker buildx took incredibly long. I'm also on a M1 mac, which I saw you are also on. Will keep trying this x86 method out! Thank you +++++

MengLinMaker · 2023-09-13T09:03:19Z

@johnsonchau-bulb It's likely that AWS lambda ARM does not populate CPU info into the "/sys" folder. So essentially onnxruntime is trying to read a nonexistent file and directory.

The following test confirms this:

# Script to test the existence of folders 
import os
print(os.listdir('/'))
print(os.listdir('/sys'))
print(os.listdir('/sys/devices'))

Result - "/sys" has no content:

['bin', 'boot', 'dev', 'etc', 'home', 'lib', 'media', 'mnt', 'opt', 'proc', 'root', 'run', 'sbin', 'srv', 'sys', 'tmp', 'usr', 'var']
[]
[ERROR] FileNotFoundError: [Errno 2] No such file or directory: '/sys/devices'

johnsonchau-bulb · 2023-09-13T13:07:10Z

@MengLinMaker thanks!

As a sidenote, I would not recommend deploying hugging face models in AWS lambda as it takes a long time to download models. Furthermore, even when using EFS to connect to lambda to cache the model, the read/write speeds are not fast enough to load LLMs in lambda quickly. Leaving this here to help anyone who wants to build an AI API microservice.

MengLinMaker · 2023-09-13T14:01:02Z

@chenfucn, Referencing your PR #10199:
I located the file reader code in pytorch/cpuinfo that may be causing the file read issues for AWS lambda ARM64.

My AWS lambda directory probing tests confirm that these files do not exist, so read attempts lead to error:

/sys/devices/system/cpu/possible
/sys/devices/system/cpu/present

I also agree that the fix should be made in pytorch/cpuinfo as this is a cleaner solution.
Looking at the coding, failing should return a null pointer.

Actually, it may be this logger in python/cpuinfo that's throwing the exception.

MengLinMaker · 2023-09-13T14:22:55Z

As a sidenote, I would not recommend deploying hugging face models in AWS lambda as it takes a long time to download models. Furthermore, even when using EFS to connect to lambda to cache the model, the read/write speeds are not fast enough to load LLMs in lambda quickly. Leaving this here to help anyone who wants to build an AI API microservice.

@johnsonchau-bulb Thanks, almost dove down that rabbit hole.

Currently trying to decrease an 1.8GB docker image to 1.1GB by replacing pytorch with onnxruntime. My model is around 150mb.
Lambda cold start times are horrible through, up to 20 secs. So I'm breaking the model into sections so I can cold start them at the same time.

MengLinMaker · 2023-09-17T15:07:39Z

@DoctorSlimm I see, I was experimenting with x86 architecture but the docker buildx took incredibly long. I'm also on a M1 mac, which I saw you are also on. Will keep trying this x86 method out! Thank you +++++

Can confirm that x86_64 is compatible with onnxruntime.

@johnsonchau-bulb, I found that creating a CICD pipeline with Github Actions is a nice solution for deploying x86_64 lambdas from Apple Silicon. I'm using Serverless Framework to deploy a dev app on commits.

For smaller onnx models, it is possible to deploy without docker by quantising the onnx model to reduce size and using serverless-python-requirements zip dependency option.

satyajit-bagchi · 2023-10-13T14:39:02Z

I also ran into this error today on lambda ARM64 architecture, with onnxruntime==1.16.1.

Is there a recommended version of onnxruntime which works on AWS Arm64 devices?

chenfucn · 2023-10-13T16:28:16Z

@chenfucn, Referencing your PR #10199: I located the file reader code in pytorch/cpuinfo that may be causing the file read issues for AWS lambda ARM64.

My AWS lambda directory probing tests confirm that these files do not exist, so read attempts lead to error:

/sys/devices/system/cpu/possible

/sys/devices/system/cpu/present

I also agree that the fix should be made in pytorch/cpuinfo as this is a cleaner solution. Looking at the coding, failing should return a null pointer.

Actually, it may be this logger in python/cpuinfo that's throwing the exception.

Thank you for the investigation!

I am not sure this is on cpuinfo's shoulders anymore. CPU feature detection is already a complex matter. CPUinfo library already has lots of code handling many different platforms. Why does AMW lambda missing two files that are present in most other Linux platforms? This would make cross platform programming unnecessarily complex. Why?

These two system files provide CPU information such as instruction set, number of big little cores, cache size, etc. This information is vital for onnxruntime to provide necessary performance on ARM64 systems. Without it, onnxruntime could run an order of magnitude slower. At that point I doubt onnxruntime is useful.

jcampbell05 · 2023-10-13T16:45:54Z

Thank you for the investigation!

I am not sure this is on cpuinfo's shoulders anymore. CPU feature detection is already a complex matter. CPUinfo library already has lots of code handling many different platforms. Why does AMW lambda missing two files that are present in most other Linux platforms? This would make cross platform programming unnecessarily complex. Why?

These two system files provide CPU information such as instruction set, number of big little cores, cache size, etc. This information is vital for onnxruntime to provide necessary performance on ARM64 systems. Without it, onnxruntime could run an order of magnitude slower. At that point I doubt onnxruntime is useful.

It's because ARM Lambda uses the custom Amazon Graviton Processor and in order to get the best support also run's Amazon Linux 2. I'm not sure if this is the exact reason they don't have /sys/devices/system/cpu/possible

But I have found that Amazon Linux and in particularly the one used for AWS Lambda is very heavily restricted and many things disabled on it.

For example you can't use multi processing from python because /dev/shm isn't supported by Amazon Linux on ARM Lambda which is almost always provided by any other OS

Regardless, the main issue is just that the lates onnxruntime crashes, even running very very slow would be an upgrade.

MengLinMaker · 2023-10-13T22:46:25Z

I also ran into this error today on lambda ARM64 architecture, with onnxruntime==1.16.1.

Is there a recommended version of onnxruntime which works on AWS Arm64 devices?

satyajit-bagchi, To summarise, ARM Lambda is missing a lot of features required by Onnxruntime that is standard across other Linux machines. As such, supporting ARM Lambda is unlikely and probably not worth the effort.

On a positive note, Onnxruntime does work on x86_64 lambda.
If you have an ARM machine, one deployment solution is to configure a deployment pipeline on an x86_64 machine - eg: GitHub Actions.

Hope this saves weeks of debugging.

After the 0.2.0 version, there are even more possible cases to consider. I found it too annoying to do all the testing manually. Add some tests to make it easy to test everything automatically. The test python3Packages.invisible-watermark.tests.withOnnx-rivaGan would fail in the nix sandbox on aarch64-linux: ``` Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present Error in cpuinfo: failed to parse both lists of possible and present processors terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException' what(): /build/source/include/onnxruntime/core/common/logging/logging.h:294 static const onnxruntime::logging::Logger& onnxruntime::logging::LoggingManager::DefaultLogger() Attempt to use DefaultLogger but none has been registered. /build/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 9: 5 Aborted (core dumped) invisible-watermark --verbose --action encode --type bytes --method 'rivaGan' --watermark 'asdf' --output output.png '/nix/store/srl698a32n9d2pmyf5zqfk65gjzq3mhp-source/test_vectors/original.jpg' Exit code of invisible-watermark was 134 while 0 was expected. ``` so I have disabled that test. I believe microsoft/onnxruntime#10038 describes the same issue.

jcampbell05 · 2023-11-02T15:34:03Z

So interestingly the Intel Lambda also doesn't mount /sys and as a result cpuinfo has to implement a bunch of workarounds to detect the features. It will still emit a warning in this case.

So looks like this will be fixed as soon as cpuinfo is also fixed fro arm64.

But the fact it emits an error sounds like it might be a case of it not realising it needs to use those workarounds

pytorch/cpuinfo#14

After the 0.2.0 version, there are even more possible cases to consider. I found it too annoying to do all the testing manually. Add some tests to make it easy to test everything automatically. The test python3Packages.invisible-watermark.tests.withOnnx-rivaGan would fail in the nix sandbox on aarch64-linux: ``` Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present Error in cpuinfo: failed to parse both lists of possible and present processors terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException' what(): /build/source/include/onnxruntime/core/common/logging/logging.h:294 static const onnxruntime::logging::Logger& onnxruntime::logging::LoggingManager::DefaultLogger() Attempt to use DefaultLogger but none has been registered. /build/.attr-0l2nkwhif96f51f4amnlf414lhl4rv9vh8iffyp431v6s28gsr90: line 9: 5 Aborted (core dumped) invisible-watermark --verbose --action encode --type bytes --method 'rivaGan' --watermark 'asdf' --output output.png '/nix/store/srl698a32n9d2pmyf5zqfk65gjzq3mhp-source/test_vectors/original.jpg' Exit code of invisible-watermark was 134 while 0 was expected. ``` so I have disabled that test. I believe microsoft/onnxruntime#10038 describes the same issue.

dandiep · 2023-12-20T04:41:14Z

Would just like to add on that this is definitely a real issue for us... Migrating back to x86 introduces a whole separate set of problems that we're not really prepared to do right now.

ctippur · 2024-02-11T07:21:41Z

Yes, this is still a issue for us as well. I see this on lambda cloudwatch logs. I am running flask based api code using aws-lambda-adapter. I tried using x86_64 but the API wouldnt even get called, giving me error The requested URL was not found on the server. If you entered the URL manually please check your spelling and try again. Not sure if this is related.

Error in cpuinfo: failed to parse the list of possible processors in /sys/devices/system/cpu/possible
--
Error in cpuinfo: failed to parse the list of present processors in /sys/devices/system/cpu/present
Error in cpuinfo: failed to parse both lists of possible and present processors
terminate called after throwing an instance of 'onnxruntime::OnnxRuntimeException'
what():  /onnxruntime_src/include/onnxruntime/core/common/logging/logging.h:294 static const onnxruntime::logging::Logger& onnxruntime::logging::LoggingManager::DefaultLogger() Attempt to use DefaultLogger but none has been registered.

jcampbell05 · 2024-02-12T12:56:14Z

I was wondering if there wouldn't at least be a way of providing CPU info as a fallback. Since onnxruntime is looking at /sys folder for CPU info flags.

Wondering if workaround for us is to manually make these files in Lambda.

wbudd · 2024-02-23T23:31:12Z

Wondering if workaround for us is to manually make these files in Lambda.

@jcampbell05 Not exactly straightforward, but I feel like it should nonetheless be possible: zip up a tool like fakechroot as a Lambda layer, and generate the needed /sys paths inside some fake chroot destination within /tmp (being Lambda user-writable) such as /tmp/soveryfake (via a subprocess call from the main Lambda script): mkdir -p /tmp/soveryfake/sys/devices/system/cpu && echo "0-5" > /tmp/soveryfake/sys/devices/system/cpu/cpu_possible. Then, wrap the actual ORT-based inference subprocess within the fakechroot: /opt/bin/fakechroot --use-system-libs fakeroot chroot /tmp/soveryfake /opt/bin/my_ort_infer_prog.py.

Unfortunately, some quick and dirty testing suggests that the above method doesn't work as-is...
The reason (probably) being that /sys/* (and /proc/*) aren't ordinary files. fakechroot seems to show their wrapped process the real /sys/* file tree as-is rather than whatever is defined in the chroot origin (whereas emulating stuff inside /usr works just fine, for example).

Maybe tools like unshare or bubblewrap are better capable at faking /sys as a plain user, but I have my doubts about that.

(In my case, the better solution was to create a dedicated C++ inference mini-program statically compiled with the ONNX library parts needed. When done inside a multi-stage Dockerfile while building ONNX from source too, one can then grep and replace the /sys/* invocations with their own /tmp/* paths of choice, while also enabling a whole bunch of optimizations in terms of model size, binary/layer size, memory usage. All of this together greatly reduces the overall runtime of the Lambda.)

jcampbell05 · 2024-02-26T11:40:47Z

(In my case, the better solution was to create a dedicated C++ inference mini-program statically compiled with the ONNX library parts needed. When done inside a multi-stage Dockerfile while building ONNX from source too, one can then grep and replace the /sys/* invocations with their own /tmp/* paths of choice, while also enabling a whole bunch of optimizations in terms of model size, binary/layer size, memory usage. All of this together greatly reduces the overall runtime of the Lambda.)

Nice I didn't realise this was an option, If you have any pointer around specficially building this runtime for Lambda then I would very much appreciate links to where I can read about that

astahlman · 2024-03-21T23:38:23Z

I'm able to run inference on an arm64 Lambda by building without cpuinfo via the CMake flag onnxruntime_ENABLE_CPUINFO, i.e.,

python tools/ci_build/build.py \
    --build_dir build/Linux \
    --config RelWithDebInfo \
    --build_shared_lib \
    --parallel \
    --compile_no_warning_as_error \
    --skip_submodule_sync \
    --build_wheel \
    --allow_running_as_root \
    --cmake_extra_defines onnxruntime_ENABLE_CPUINFO=OFF

neo · 2024-04-10T22:05:27Z

I'm able to run inference on an arm64 Lambda by building without cpuinfo via the CMake flag onnxruntime_ENABLE_CPUINFO, i.e.,

python tools/ci_build/build.py \
    --build_dir build/Linux \
    --config RelWithDebInfo \
    --build_shared_lib \
    --parallel \
    --compile_no_warning_as_error \
    --skip_submodule_sync \
    --build_wheel \
    --allow_running_as_root \
    --cmake_extra_defines onnxruntime_ENABLE_CPUINFO=OFF

I was able to have it running with the above custom build – however, it seems very slow... is it because of the following @chenfucn mentioned?

I am not sure this is on cpuinfo's shoulders anymore. CPU feature detection is already a complex matter. CPUinfo library already has lots of code handling many different platforms. Why does AMW lambda missing two files that are present in most other Linux platforms? This would make cross platform programming unnecessarily complex. Why?

These two system files provide CPU information such as instruction set, number of big little cores, cache size, etc. This information is vital for onnxruntime to provide necessary performance on ARM64 systems. Without it, onnxruntime could run an order of magnitude slower. At that point I doubt onnxruntime is useful.

and I do wonder if we can do detection without cpuinfo 🤔

Thanks for the quick response to this issue. I'm happy to test out the implementation when there is a release candidate, but I've already deployed the model on x86 hardware and want as little downtime as possible.

Will PR #10199 fix what @chenfucn brought up in the below comment?

Currently we are using cpuinfo lib to detect hybrid cores and SDOT UDOT instruction support. Ignoring cpuinfo failure means we lose these functionalities and will cause performance degradation. Especially with DOT instructions, the matrix multiplication can be multiple times slower if we don't use DOT instructions and fall back to neon cores.

Or does pytorch/cpuinfo#76 need to be resolved to fix that problem?

MengLinMaker · 2024-04-16T22:52:44Z

and I do wonder if we can do detection without cpuinfo 🤔

@neo AWS Arm64 does not provide the files required for "detection":

As suggested by the maintainers, Arm64 Lambda deviates from the norm by not providing these typical linux files:

/sys/devices/system/cpu/possible
/sys/devices/system/cpu/present

So the conclusion is that this issue should be ideally fixed by AWS instead. The issue is outside the scope of ONNXruntime and pytorch/cpuinfo.

A possible work around could be replacing any references to /sys/devices in the code and add your own files (No idea how that would work). If time is priority, then getting ONNXruntime working on Arm64 Lambda is probably a waste of time.

ytaous added the ep:arm-cpu label Dec 14, 2021

jcreinhold mentioned this issue Jan 5, 2022

cpuinfo causes crash on AWS Lambda with arm64 processor pytorch/cpuinfo#76

Open

chenfucn mentioned this issue Jan 5, 2022

Tolerate cpuinfo init failure #10199

Merged

stale bot added the stale issues that have not been addressed in a while; categorized by a bot label Apr 16, 2022

glefundes closed this as completed May 25, 2022

stale bot removed the stale issues that have not been addressed in a while; categorized by a bot label May 25, 2022

tianleiwu mentioned this issue Apr 24, 2023

onnxRuntimeException and DefaultLogger issues in AWS Lambda runtime #15650

Open

tianleiwu reopened this Apr 24, 2023

tianleiwu mentioned this issue Apr 24, 2023

Fix CPUInfo logging #15661

Closed

Luflosi mentioned this issue Oct 27, 2023

python3Packages.invisible-watermark: 0.1.5 -> 0.2.0 NixOS/nixpkgs#260784

Merged

12 tasks

adieuadieu mentioned this issue Apr 29, 2024

bug/Execution speed is very slow in AWS LAMBDA environment Unstructured-IO/unstructured#2916

Open

hacker1024 mentioned this issue May 16, 2024

Build failure: python3Packages.chromadb NixOS/nixpkgs#312068

Open

Importing onnxruntime on AWS Lambdas with ARM64 processor causes crash #10038

Importing onnxruntime on AWS Lambdas with ARM64 processor causes crash #10038

Comments

glefundes commented Dec 14, 2021 • edited

jcreinhold commented Jan 4, 2022

skottmckay commented Jan 5, 2022

chenfucn commented Jan 5, 2022

chenfucn commented Jan 5, 2022

jcreinhold commented Jan 5, 2022

chenfucn commented Jan 5, 2022

workdd commented Jan 14, 2022

chenfucn commented Jan 19, 2022

workdd commented Feb 7, 2022

skottmckay commented Feb 7, 2022

workdd commented Feb 8, 2022

jcreinhold commented Feb 11, 2022

yufenglee commented Feb 11, 2022

stale bot commented Apr 16, 2022

glefundes commented May 25, 2022

jcampbell05 commented Mar 13, 2023

skottmckay commented Mar 14, 2023

jcampbell05 commented Mar 14, 2023

skottmckay commented Mar 15, 2023

jcampbell05 commented Apr 20, 2023

DoctorSlimm commented Apr 27, 2023

tianleiwu commented Apr 27, 2023

DoctorSlimm commented Apr 28, 2023 • edited

johnsonchau-bulb commented Jun 26, 2023 • edited

DoctorSlimm commented Jun 26, 2023 • edited

johnsonchau-bulb commented Jun 26, 2023

MengLinMaker commented Sep 13, 2023 • edited

johnsonchau-bulb commented Sep 13, 2023

MengLinMaker commented Sep 13, 2023 • edited

MengLinMaker commented Sep 13, 2023

MengLinMaker commented Sep 17, 2023 • edited

satyajit-bagchi commented Oct 13, 2023

chenfucn commented Oct 13, 2023 • edited

jcampbell05 commented Oct 13, 2023 • edited

MengLinMaker commented Oct 13, 2023

jcampbell05 commented Nov 2, 2023 • edited

dandiep commented Dec 20, 2023

ctippur commented Feb 11, 2024

jcampbell05 commented Feb 12, 2024 • edited

wbudd commented Feb 23, 2024

jcampbell05 commented Feb 26, 2024

astahlman commented Mar 21, 2024

neo commented Apr 10, 2024 • edited

MengLinMaker commented Apr 16, 2024

glefundes commented Dec 14, 2021 •

edited

DoctorSlimm commented Apr 28, 2023 •

edited

johnsonchau-bulb commented Jun 26, 2023 •

edited

DoctorSlimm commented Jun 26, 2023 •

edited

MengLinMaker commented Sep 13, 2023 •

edited

MengLinMaker commented Sep 13, 2023 •

edited

MengLinMaker commented Sep 17, 2023 •

edited

chenfucn commented Oct 13, 2023 •

edited

jcampbell05 commented Oct 13, 2023 •

edited

jcampbell05 commented Nov 2, 2023 •

edited

jcampbell05 commented Feb 12, 2024 •

edited

neo commented Apr 10, 2024 •

edited