error cpuinfo with pytorch on aws lambda #14

Closed
obendidi opened this issue Dec 12, 2018 · 12 comments

Comments

@obendidi

obendidi commented Dec 12, 2018

Hi, I already opened an issue on the PyTorch repo, but I think it is more appropriate to create one here instead. Some context:

I'm trying to use maskrcnn_benchmark in the AWS Lambda environment. I package PyTorch in Docker like so:

RUN git clone --recursive https://github.com/pytorch/pytorch.git --branch=v1.0.0

ENV NO_CUDA=1
ENV NO_TEST=1
ENV NO_NNPACK=1
ENV NO_QNNPACK=1
ENV NO_MKLDNN=1
ENV NO_DISTRIBUTED=1
ENV NO_CUDNN=1
ENV NO_FBGEMM=1
ENV NO_MIOPEN=1
ENV NO_CAFFE2_OPS=1
ENV BUILD_BINARY=0

RUN cd pytorch && python setup.py install

Installing it directly with pip install torch behaves the same way and gives the same error (originally I thought that maybe NNPACK or MKLDNN was trying to use multiple CPUs and generating the error).

The torch package builds successfully and I'm able to upload it to AWS Lambda, but I get the following errors when I try to run inference:
Error in cpuinfo: failed to parse the list of possible procesors in /sys/devices/system/cpu/possible
Error in cpuinfo: failed to parse the list of present procesors in /sys/devices/system/cpu/present

Thank you in advance, and if you have any additional questions I'll be happy to oblige!
Bendidi

@Maratyszcza
Collaborator

Hi @Bendidi, could you post the contents of /sys/devices/system/cpu/present and /sys/devices/system/cpu/possible files on this system?
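For anyone who wants to collect this from inside a function, here is a minimal sketch in Python. It assumes the two files, when they exist, use the usual kernel CPU list format such as "0-1" or "0-3,8". The helper names parse_cpu_list and read_cpu_list are made up for this example and are not part of cpuinfo or PyTorch.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Parse a kernel CPU list such as "0-3,8" into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or trailing comma
        if "-" in part:
            start, end = part.split("-", 1)
            cpus.update(range(int(start), int(end) + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


def read_cpu_list(name, base="/sys/devices/system/cpu"):
    """Return the parsed CPU list from base/name, or None if it cannot be read."""
    try:
        return parse_cpu_list((Path(base) / name).read_text())
    except OSError:
        # Covers a missing directory, a missing file, and permission errors.
        return None


if __name__ == "__main__":
    for name in ("present", "possible"):
        print(name, read_cpu_list(name))
```

A result of None for both names would match the "No such file or directory" situation, while a list of integers means the files are readable.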

@obendidi
Author

obendidi commented Dec 14, 2018

No such file or directory: '/sys/devices/system/cpu/'. I'm running this in an AWS Lambda function, so I guess it's normal not to have access to the CPU info (kind of the principle of Lambda functions).

And I don't think I have a way to find out whether sysfs is mounted on the system.

The weird thing is that it was working in PyTorch 0.4.1 (cf. reply pytorch/pytorch#14968 (comment)).
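One possible way to check this from inside the function is to look for a sysfs entry in /proc/mounts. This is only a sketch: it assumes /proc/mounts is readable in the sandbox, which this thread does not confirm, and the function names are invented for illustration.

```python
def sysfs_mount_points(mounts_text):
    """Return the mount points whose filesystem type is sysfs."""
    points = []
    for line in mounts_text.splitlines():
        fields = line.split()
        # Each line reads: device mount_point fs_type options dump pass
        if len(fields) >= 3 and fields[2] == "sysfs":
            points.append(fields[1])
    return points


def read_mounts(path="/proc/mounts"):
    """Return the mount table as text, or an empty string if it cannot be read."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    print(sysfs_mount_points(read_mounts()))
```

An empty list means either that sysfs is not mounted or that the mount table itself could not be read.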

@Maratyszcza
Collaborator

@Bendidi Thank you, this is useful info. Not being able to parse /sys/devices/system/cpu/present and /sys/devices/system/cpu/possible should be a non-fatal error in cpuinfo, but probably other things get messed up. Could you build cpuinfo (mkdir build && cd build && cmake .. && make) and run the cpu-info utility on AWS Lambda? It would also help to have the contents of the /proc/cpuinfo file on a Lambda instance.
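A rough sketch of how a prebuilt binary could be run from a Python Lambda handler and its output returned. The path /var/task/bin/cpu-info is a hypothetical location inside the deployment package, and run_tool and handler are illustrative names. The binary would need to be built for the same OS and architecture as the Lambda runtime.

```python
import subprocess


def run_tool(command, timeout=30):
    """Run a command and return (exit_code, stdout, stderr) as text.

    exit_code is None if the command could not be started or timed out.
    """
    try:
        result = subprocess.run(
            command, capture_output=True, text=True, timeout=timeout
        )
    except (OSError, subprocess.TimeoutExpired) as exc:
        return None, "", str(exc)
    return result.returncode, result.stdout, result.stderr


def handler(event, context):
    # Hypothetical path: adjust to wherever the binary sits in your package.
    code, out, err = run_tool(["/var/task/bin/cpu-info"])
    return {"exit_code": code, "stdout": out, "stderr": err}
```

Returning stderr alongside stdout matters here, because the "Error in cpuinfo" messages are the interesting part of the output.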

@soumith changed the title from "error cpuinfo with pytroch on aws lambda" to "error cpuinfo with pytorch on aws lambda" on Dec 15, 2018
@obendidi
Author

Hi @Maratyszcza, there is another issue about the same problem: pytorch/pytorch#15213. I will try to bundle cpuinfo and run it on Lambda (might be a bit hard, since the architecture of AWS Lambda was not made for this kind of deployment).

@Maratyszcza
Collaborator

@Bendidi I expect that this issue is fixed with a61747a, but I would need a CPUID dump to create a test case and make sure we don't regress it in the future. Could you run ./cpuid-dump from the cpuinfo build on an AWS Lambda instance?

@obendidi
Author

Dump of /proc/cpuinfo on an AWS Lambda instance:

processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model	: 63
model name	: Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz
stepping	: 2
microcode	: 0x3d
cpu MHz	: 2900.040
cache size	: 25600 KB
physical id	: 0
siblings	: 2
core id	: 0
cpu cores	: 1
apicid	: 0
initial apicid	: 0
fpu	: yes
fpu_exception	: yes
cpuid level	: 13
wp	: yes
flags	: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs	: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 5800.13
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model	: 63
model name	: Intel(R) Xeon(R) CPU E5-2666 v3 @ 2.90GHz
stepping	: 2
microcode	: 0x3d
cpu MHz	: 2900.040
cache size	: 25600 KB
physical id	: 0
siblings	: 2
core id	: 0
cpu cores	: 1
apicid	: 1
initial apicid	: 1
fpu	: yes
fpu_exception	: yes
cpuid level	: 13
wp	: yes
flags	: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology cpuid pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm cpuid_fault invpcid_single pti fsgsbase bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs	: cpu_meltdown spectre_v1 spectre_v2 spec_store_bypass l1tf
bogomips	: 5800.13
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

As for cpuid-dump, I'm still looking for ways to build cpuinfo in Lambda; I should probably get it soon (I hope).
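To compare a dump like the one above across environments, it can be split into one record per logical processor. This is a minimal sketch that assumes the "key : value" layout shown in the dump, with blank lines between processors. parse_cpuinfo is an illustrative name and is not how cpuinfo itself parses the file.

```python
def parse_cpuinfo(text):
    """Split /proc/cpuinfo-style text into one dict per logical processor."""
    processors = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current processor block.
            if current:
                processors.append(current)
                current = {}
            continue
        key, sep, value = line.partition(":")
        if sep:
            current[key.strip()] = value.strip()
    if current:
        processors.append(current)
    return processors


if __name__ == "__main__":
    with open("/proc/cpuinfo") as f:
        for cpu in parse_cpuinfo(f.read()):
            print(cpu.get("processor"), cpu.get("model name"))
```

On the dump above this would yield two records, for processors 0 and 1, and the flags field can be split on whitespace to test for features such as avx2.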

@parthi2929

@Maratyszcza In which build can we expect the fix, if it's in progress?

@Maratyszcza
Collaborator

@parthi2929 The issue was fixed in PyTorch nightly about a month ago; however, cpuinfo still reports an error when sysfs doesn't have the expected files. pytorch/pytorch#16107 downgrades the error in this case to a warning.

@jcampbell05

This appears to have reappeared for ARM64 Lambda.

@Tickets14

downgrades error in this case to a warning

I'm also experiencing this issue.

@malfet
Contributor

malfet commented Nov 7, 2023

Please open a new issue and post full error there

@harrisMLEng
Copy link

Similar issue. But does it mean that I have to install an ARM-compatible PyTorch version?
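A quick way to see which architecture a function actually runs on is to ask the Python standard library. This is a minimal sketch: describe_architecture is a hypothetical helper, and the two groups of names it normalizes are just the common spellings of arm64 and x86_64.

```python
import platform


def describe_architecture(machine=None):
    """Normalize a machine string (default: this interpreter's) to a short name."""
    machine = (machine or platform.machine()).lower()
    if machine in ("aarch64", "arm64"):
        return "arm64"
    if machine in ("x86_64", "amd64"):
        return "x86_64"
    return machine  # anything else is returned unchanged


if __name__ == "__main__":
    print(describe_architecture())
```

Packages with native code, torch included, generally need to be built for the architecture reported here.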
