This repository has been archived by the owner on Sep 25, 2023. It is now read-only.

[BUG] [Jetson Nano Conda install hangs on installing pip dependencies] #304

Closed
emeldar opened this issue Jan 11, 2021 · 18 comments · Fixed by #305
Assignees
Labels
2 - In Progress (Currently a work in progress), doc (Documentation)

Comments

@emeldar

emeldar commented Jan 11, 2021

Describe the bug
When creating the conda environment on a Jetson Nano Development kit, the installation proceeds until installing pip dependencies, where it hangs indefinitely.

Steps/Code to reproduce bug
Fresh Jetpack install on Jetson Nano board.
Follow instructions for building from source on Jetson Nano exactly.

Expected behavior
The environment was expected to install successfully.

Environment details (please complete the following information):

  • Environment location: Jetson Nano board with Jetpack SDK
  • Method of cuSignal install: conda (specifically miniforge)

I've never used conda before, so I don't know exactly what logs are needed, but this is the last output from the install before it hangs:
Installing pip dependencies: ...working...

@emeldar emeldar added ? - Needs Triage (Need team to review and classify) and bug (Something isn't working) labels Jan 11, 2021
@github-actions github-actions bot added this to Needs prioritizing in Bug Squashing Jan 11, 2021
@emeldar
Author

emeldar commented Jan 11, 2021

Upon trying to install the pip packages manually, I found that all of them are installed except for cupy>=8.0.0. When trying to install it manually using the pip binary from the environment, it hangs indefinitely while building the wheel for cupy. This might be the source of the issue, but I'm unsure what to do to build cupy for the aarch64 processor. Here is the line it hangs on:
Building wheels for collected packages: cupy
Building wheel for cupy (setup.py) ...

@znmeb

znmeb commented Jan 11, 2021

I've got scripts that install cupy and cusignal successfully on both a Nano and an AGX-Xavier. cupy takes a long time to install - it is compiling many kernels for the GPU using cicc.

On the AGX-Xavier it takes almost 47 minutes to install cupy!

Successfully installed cupy-8.3.0 fastrlock-0.5
2736.81user 43.31system 46:44.74elapsed 99%CPU (0avgtext+0avgdata 2640716maxresident)k
98096inputs+5670632outputs (22major+11542275minor)pagefaults 0swaps

It is probably working. Open another terminal on your Nano and run top; you should see the cicc compiles running. They're single-threaded; if there's some way to run four of them concurrently, it would cut the install time down.
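One quick way to check, sketched here as a suggestion rather than a verified recipe, is to look for the cicc processes from that second terminal:

top -c            # full command lines; look for cicc entries
pgrep -c cicc     # count how many cicc compiles are running right now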

@awthomp awthomp self-assigned this Jan 11, 2021
@awthomp
Member

awthomp commented Jan 11, 2021

Hi @eldaromer -- thanks for submitting an issue to cuSignal, and thanks for the quick input, @znmeb! I'd like to echo Ed's comments and say that cupy takes a very long time to compile on the Jetson platform, particularly the Nano. I'd recommend retrying the cupy pip install before you go to bed and report back the status. I'm happy to work with the cupy developers to get this working if we uncover some Jetson/aarch64 specific issue!

@leofang
Member

leofang commented Jan 11, 2021

Hi all, a cupy guru here. Could you please set this env var

export CUPY_NVCC_GENERATE_CODE="arch=compute_XX,code=sm_XX"

with XX being your device's compute capability, and then install cupy. I hope this would make the compilation a lot faster.
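For example, on a Jetson Nano (compute capability 5.3) that would presumably be:

export CUPY_NVCC_GENERATE_CODE="arch=compute_53,code=sm_53"
pip install cupy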

@awthomp
Member

awthomp commented Jan 11, 2021

Hi all, a cupy guru here. Could you please set this env var

export CUPY_NVCC_GENERATE_CODE="arch=compute_XX,code=sm_XX"

with XX being your device's compute capability, and then install cupy. I hope this would make the compilation a lot faster.

Thanks for the info, Leo! I'll update our documentation to reflect this suggestion too.

@awthomp awthomp added 2 - In Progress (Currently a work in progress) and doc (Documentation) labels and removed ? - Needs Triage (Need team to review and classify) and bug (Something isn't working) labels Jan 11, 2021
Bug Squashing automation moved this from Needs prioritizing to Closed Jan 11, 2021
@leofang
Member

leofang commented Jan 11, 2021

@znmeb @eldaromer Let us know if it helps reduce the compilation time.

@emeldar
Author

emeldar commented Jan 11, 2021

@leofang I have set the environment variable as instructed before with the compute capability set to 53 for the Jetson Nano. I am currently running the cupy install again, and I'm timing how long it takes. I will keep you updated.

Creating a pre-built wheel for the Nano, as mentioned in a new issue above, would be of great utility.

@emeldar
Author

emeldar commented Jan 11, 2021

OK, installing cupy on the Nano with the environment variable set took ~30 minutes to complete. Maybe that should be added to the build instructions so users know what to expect. Thank you all for the help.

@znmeb

znmeb commented Jan 12, 2021

My current setup compiles for compute capabilities 53 (TX1 and Nano), 62 (TX2), and 72 (AGX Xavier and, I assume, also Xavier NX): https://developer.nvidia.com/cuda-gpus. That dropped the compile time on the AGX Xavier from 47 minutes to 30. After that, cusignal only takes about two minutes.
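In env-var form, that setup would be something like:

export CUPY_NVCC_GENERATE_CODE="arch=compute_53,code=sm_53;arch=compute_62,code=sm_62;arch=compute_72,code=sm_72"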

Given how useful cupy is I think a pre-built wheel on conda-forge is a great idea. I'd like to see more of RAPIDS.AI migrated to conda-forge, even though a lot of it will only run on Volta or later.

@leofang
Member

leofang commented Jan 12, 2021

Thank you @znmeb @eldaromer for the quick feedback. Indeed, I've been wanting to build CuPy for ARM (which I assume is for Jetson devices?) on conda-forge. However, it's currently blocked by a few needed infrastructure changes, for example this one. Perhaps you could open an issue on CuPy's issue tracker to let them know your need, so they can evaluate whether PFN has the resources and bandwidth to support pip wheels for ARM? (I am not from PFN, so I can't speak for them on this.)

cc: @jakirkham Looks like we have at least two serious Jetson users in need of CuPy on ARM 🙂

@jakirkham
Member

jakirkham commented Jan 12, 2021

Well the first step would be packaging cudatoolkit. There's some initial work in PR ( conda-forge/cudatoolkit-feedstock#4 ) if someone would like to take a crack at it 😉

@znmeb

znmeb commented Jan 12, 2021

Well the first step would be packaging cudatoolkit. There's some initial work in PR ( conda-forge/cudatoolkit-feedstock#4 ) if someone would like to take a crack at it 😉

I'm trying to push out a release but I can test on a 4 GB Nano and a 16 GB AGX Xavier in my spare time (cringes as my 3090 feels unloved) :-)

arrow-cpp and pyarrow-cuda are on my conda-forge wishlist too, BTW. And POCL.

@leofang
Member

leofang commented Jan 12, 2021

This is perhaps something I can learn from you guys 🙂 I always imagined I could buy a Jetson device and have it sit and run on my desk, like a Raspberry Pi (which I don't have either). Is that the case? What's the best/cheapest/fastest way to set up a Jetson environment? What are the use cases for running cuSignal on Jetsons?

@znmeb

znmeb commented Jan 12, 2021

This is perhaps something I can learn from you guys 🙂 I always imagined I could buy a Jetson device and have it sit and run on my desk, like a Raspberry Pi (which I don't have either). Is that the case? What's the best/cheapest/fastest way to set up a Jetson environment? What are the use cases for running cuSignal on Jetsons?

For now, plunk down the $700 for an AGX Xavier, or the $400 for a Xavier NX. The Nano only has 4 GB of RAM, which I find more of a constraint than the cores or the Maxwell GPU.

My use case is digital audio, but IIRC the original motivation was software-defined radio.

@awthomp
Member

awthomp commented Jan 12, 2021

This is perhaps something I can learn from you guys 🙂 I always imagined I could buy a Jetson device and have it sit and run on my desk, like a Raspberry Pi (which I don't have either). Is that the case? What's the best/cheapest/fastest way to set up a Jetson environment? What are the use cases for running cuSignal on Jetsons?

For now, plunk down the $700 for an AGX Xavier, or the $400 for a Xavier NX. The Nano only has 4 GB of RAM, which I find more of a constraint than the cores or the Maxwell GPU.

My use case is digital audio, but IIRC the original motivation was software-defined radio.

Yes! @leofang, @znmeb is correct that SDR was the first Jetson use case. We have folks plugging in a ~$20 RTL-SDR and doing GPU-based FM demodulation, signal and modulation recognition, resampling and display, etc.

As for "how to get started" - you basically install JetPack on the Jetson and you're plopped into an Ubuntu environment.

@leofang
Member

leofang commented Jan 18, 2021

Thanks for the interesting answers, @awthomp @znmeb! $400 is very attractive -- now I don't know if I should get a PS5 or a Xavier first 😂 SDR seems to be a cool thing I'd never heard of, and I'm glad I asked!

Back to the slow compilation issue: @znmeb @eldaromer, it occurs to me that I didn't think too hard about the CPU performance difference. On a normal x86-64 system we always see thrust and cub being the two slowest components to build, but perhaps on Jetson, compiling other modules could also take non-negligible time.

If you have time, could you please try the following (see the sketch after this list):

  • Build with the verbose flag -v, e.g. pip install -v cupy, and eyeball which modules are slowest to build (sorry, I don't have a better recommendation here)
  • Try cranking up the env var CUPY_NUM_BUILD_JOBS to a higher number and see if it improves things. Its default is 4, IIRC.
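A minimal sketch combining the two suggestions, with the log file name (cupy-build.log) chosen here just for illustration:

export CUPY_NUM_BUILD_JOBS=$(nproc)
pip install -v cupy 2>&1 | tee cupy-build.log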

Let me know if it helps (or not)!

@znmeb

znmeb commented Jan 18, 2021

@leofang OK - I'm adding pyarrow with CUDA to the script. When I get that done I'll time the cupy build on both the AGX-Xavier and the Nano. It looked to me like it was only using one core.

@znmeb

znmeb commented Jan 19, 2021

OK ... here we go!

nano-cupy.log
agx-xavier-cupy.log

I ran both with CUPY_NUM_BUILD_JOBS equal to nproc, so 4 on the Nano and 8 on the AGX-Xavier. For both:

export CUPY_NVCC_GENERATE_CODE="arch=compute_53,code=sm_53;arch=compute_62,code=sm_62;arch=compute_72,code=sm_72"

The bottom line(s):

Nano: used 1.96 cores out of 4 on average (196%CPU)

3865.11user 68.66system 33:25.19elapsed 196%CPU (0avgtext+0avgdata 1973300maxresident)k
652688inputs+3901608outputs (1958major+9014625minor)pagefaults 0swaps

AGX-Xavier: used 2.30 cores out of 8 on average (230%CPU)

2056.31user 38.91system 15:10.60elapsed 230%CPU (0avgtext+0avgdata 1973048maxresident)k
16inputs+3901576outputs (0major+8841728minor)pagefaults 0swaps

5 participants