Conda Packages #65

Open
TomAugspurger opened this issue Feb 18, 2019 · 20 comments

Comments

@TomAugspurger
Contributor

commented Feb 18, 2019

It'd be nice to have conda packages for UCX and ucx-py.

I have a start at
https://github.com/TomAugspurger/ucx-py/tree/conda-recipes in https://github.com/TomAugspurger/ucx-py/blob/conda-recipes/conda-recipes/ucx/meta.yaml.

You should be able to conda install -c tomaugspurger ucx into a new environment (conda create -n ucx-dev python=3.7) and then activate. Then things like ucx_perftest will be on your PATH.

A few questions

  1. Where do we want these recipes to live? I think we should eventually submit them to conda-forge, but I've found it's helpful for projects to also maintain their own, especially for relatively complex projects. I'd be happy if these ended up in UCX, and I'm happy to help maintain them going forward.
  2. CUDA/GPU support: I've never built a conda package that uses CUDA. My first attempt at including CUDA failed with
checking for cuPointerGetAttribute in -lcuda... no
configure: error: CUDA support is requested but cuda packages can't found
Traceback (most recent call last):
  File "/home/nfs/an.taugspurger/miniconda3/bin/conda-build", line 11, in <module>
    sys.exit(main())
  File "/home/nfs/an.taugspurger/miniconda3/lib/python3.6/site-packages/conda_build/cli/main_build.py", line 456, in main
    execute(sys.argv[1:])
  File "/home/nfs/an.taugspurger/miniconda3/lib/python3.6/site-packages/conda_build/cli/main_build.py", line 447, in execute
    verify=args.verify, variants=args.variants)
  File "/home/nfs/an.taugspurger/miniconda3/lib/python3.6/site-packages/conda_build/api.py", line 208, in build
    notest=notest, need_source_download=need_source_download, variants=variants)
  File "/home/nfs/an.taugspurger/miniconda3/lib/python3.6/site-packages/conda_build/build.py", line 2314, in build_tree
    notest=notest,
  File "/home/nfs/an.taugspurger/miniconda3/lib/python3.6/site-packages/conda_build/build.py", line 1477, in build
    cwd=src_dir, stats=build_stats)
  File "/home/nfs/an.taugspurger/miniconda3/lib/python3.6/site-packages/conda_build/utils.py", line 374, in check_call_env
    return _func_defaulting_env_to_os_environ('call', *popenargs, **kwargs)
  File "/home/nfs/an.taugspurger/miniconda3/lib/python3.6/site-packages/conda_build/utils.py", line 354, in _func_defaulting_env_to_os_environ
    raise subprocess.CalledProcessError(proc.returncode, _args)
subprocess.CalledProcessError: Command '['/bin/bash', '-e', '/home/nfs/an.taugspurger/miniconda3/conda-bld/ucx_1550491030832/work/conda_build.sh']' returned non-zero exit status 1.

I'm guessing others from Anaconda or NVIDIA will be able to help here. Does UCX require just CUDA runtime, or does it need the driver API?

  3. Ensure we're handling the networking libraries properly: I'm not sure how things like libibverbs are supposed to be handled from conda's side of things.
@Akshay-Venkatesh

Contributor

commented Feb 18, 2019

@TomAugspurger I'll give this a shot now.

I'm guessing others from Anaconda or NVIDIA will be able to help here. Does UCX require just CUDA runtime, or does it need the driver API?

Is libcuda present in LD_LIBRARY_PATH? I don't know how conda environment changes this. I see that you're using the default cuda install path here:
https://github.com/TomAugspurger/ucx-py/commit/9da4a9cf9f396df0ea445d3903148ed89c248523#diff-e129b271fb0c71a01158dd83bef626f0R13
^ This is valid on the machine where you're installing through conda right?

@Akshay-Venkatesh

Contributor

commented Feb 18, 2019

  1. Where do we want these recipes to live? I think we should eventually submit them to conda-forge, but I've found it's helpful for projects to also maintain their own, especially for relatively complex projects. I'd be happy if these ended up in UCX, and I'm happy to help maintain them going forward.

That'll be great. Can we add this as part of the ongoing PR into UCX master?

  3. Ensure we're handling the networking libraries properly: I'm not sure how things like libibverbs are supposed to be handled from conda's side of things.

Not sure what handling network libraries means here. If it's of any help, libibverbs from standard OFED or from Mellanox OFED (MOFED) can be installed without an actual NIC being present. I suppose conda would check if they exist and install if not. The code should be handling API requests in the presence or absence of the NICs. Is this useful at all?

@TomAugspurger

Contributor Author

commented Feb 18, 2019

@TomAugspurger I'll give this a shot now.

Thanks, are you trying the pre-built packages I uploaded, or building a new package? If you build a package off my branch it'd be

$ git checkout conda-recipes-gpu
$ conda install conda conda-build -y  # update conda and conda-build
$ conda debug conda-recipes/ucx

conda-build is what builds the actual packages (it handles all the dependencies and environment setup; it's a nifty tool). Typical use is conda build /path/to/recipe. To debug while making a recipe, conda debug /path/to/recipe is useful. It'll print out a command to get you into the build environment, and then you can start building with ./conda_build.sh, which will run our build.sh.

Is libcuda present in LD_LIBRARY_PATH?

Before running conda_build.sh, my LD_LIBRARY_PATH is

/home/nfs/an.taugspurger/ucx-dev/lib:/home/nfs/an.taugspurger/ucx-dev/ucx/install/lib:/usr/local/cuda/lib64:

That'll be great. Can we add this as part of the ongoing PR into UCX master?

Yep, I think that makes sense if the other UCX maintainers are interested.

Not sure what handling network libraries mean here.

Probably just me not understanding these things :) I think some kind of "check if they exist, fail if not" is the best we can do. I don't really know how this gets expressed in a conda recipe, so that when a user does conda install ucx, it'll fail at the right time. I'll ask around, I'm sure we've dealt with something like this before.
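
As a rough illustration of the "check if they exist, fail if not" idea, here is a minimal Python sketch. It is a runtime check, not a conda recipe mechanism, and `require_shared_library` is a hypothetical helper name. It relies only on the standard library's `ctypes.util.find_library`, which returns None when the system loader cannot locate a library.

```python
import ctypes.util


def require_shared_library(name):
    """Return the resolved library name, or raise if it cannot be located."""
    found = ctypes.util.find_library(name)
    if found is None:
        # Hypothetical message; the install hint is only an example.
        raise RuntimeError(
            f"lib{name} was not found on this system; "
            "install it (for example from OFED or MOFED) first."
        )
    return found
```

Calling `require_shared_library("ibverbs")` at import time would make a missing libibverbs fail early with a readable message, although it would not stop the package from being installed in the first place.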

@TomAugspurger

Contributor Author

commented Feb 18, 2019

How do you do an "optional dependency" in C code? I think if we have two packages, ucx and ucx-gpu (and I guess ucx-py and ucx-py-gpu), then we'll need to ensure we can compile ucx-py without the CUDA files. So lines in buffer_ops.c like

 #include <cuda.h>
 #include <cuda_runtime.h>

will need to be conditional.

Then, in the Cython wrapper in ucp_py_buffer_helper, if a user calls alloc_cuda (or something that calls it, like recv_cuda), we'd raise a RuntimeError telling them to install the right package.
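
A minimal sketch of that guard, written in plain Python rather than Cython. The `HAS_CUDA` flag and the `_require_cuda` helper are assumptions made for illustration, and `alloc_cuda` here is a stand-in, not the real wrapper.

```python
# Hypothetical build-time flag: a real build would set this from the same
# configure option that enables CUDA in the C sources.
HAS_CUDA = False


def _require_cuda(feature):
    """Raise a helpful error when a CUDA-only feature is used in a CPU-only build."""
    if not HAS_CUDA:
        raise RuntimeError(
            f"{feature} needs a CUDA-enabled build; "
            "install the GPU package (for example ucx-py-gpu) instead."
        )


def alloc_cuda(nbytes):
    """Stand-in for the real wrapper; only the guard is the point of this sketch."""
    _require_cuda("alloc_cuda")
    # Placeholder: a host bytearray stands in for a device allocation.
    return bytearray(nbytes)
```

With the flag left at False, `alloc_cuda(1024)` raises the RuntimeError before any CUDA symbol would be touched, so the CPU-only package never needs the CUDA headers or libraries.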

@Akshay-Venkatesh

Contributor

commented Feb 18, 2019

This is usually done through configure (which dictates compiler parameters to the Makefile) plus #ifdef macros. Usually, if an option like --with-cuda=/usr/local/cuda is passed, configure sets a compiler option -D_WITH_CUDA, and inside the C code the two headers would be wrapped in macros like this:

#ifdef _WITH_CUDA
 #include <cuda.h>
 #include <cuda_runtime.h>
#endif 

Does this help?

@TomAugspurger

Contributor Author

commented Mar 1, 2019

I think I mostly have the "conditional" compilation stuff done on my branch: https://github.com/TomAugspurger/ucx-py/tree/conda-recipes

I've built CPU-only versions of UCX and ucx-py.

Those do not seem to have picked up the necessary files for InfiniBand:

(ucx-dev-test) an.taugspurger@dgx05:/tmp$ ucx_info -d | grep 'Memory domain'
# Memory domain: tcp
# Memory domain: posix
# Memory domain: sysv
# Memory domain: self

Haven't had luck with cuda / gpu packages yet. Looking at how rapids does it.

(base) an.taugspurger@dgx05:/tmp/ucx-dev-env$ ./ucx/install/bin/ucx_info -d | grep 'Memory domain'
# Memory domain: tcp
# Memory domain: ib/mlx5_3
# Memory domain: ib/mlx5_2
# Memory domain: ib/mlx5_1
# Memory domain: ib/mlx5_0
# Memory domain: rdmacm
# Memory domain: cuda_cpy
# Memory domain: cuda_ipc
# Memory domain: posix
# Memory domain: sysv
# Memory domain: self
# Memory domain: knem
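
To make the gap between the two builds explicit, the two listings above can be diffed. This is a small sketch; `memory_domains` is a hypothetical helper, and the two strings are copied from the `ucx_info -d` output shown in this comment.

```python
def memory_domains(ucx_info_output):
    """Collect the names from '# Memory domain: <name>' lines."""
    prefix = "# Memory domain:"
    return {
        line.strip()[len(prefix):].strip()
        for line in ucx_info_output.splitlines()
        if line.strip().startswith(prefix)
    }


# Output from the conda-built, CPU-only package.
conda_build = """\
# Memory domain: tcp
# Memory domain: posix
# Memory domain: sysv
# Memory domain: self
"""

# Output from the ucx_info binary under ./ucx/install.
source_build = """\
# Memory domain: tcp
# Memory domain: ib/mlx5_3
# Memory domain: ib/mlx5_2
# Memory domain: ib/mlx5_1
# Memory domain: ib/mlx5_0
# Memory domain: rdmacm
# Memory domain: cuda_cpy
# Memory domain: cuda_ipc
# Memory domain: posix
# Memory domain: sysv
# Memory domain: self
# Memory domain: knem
"""

# Domains present in the second listing but absent from the conda package.
missing = sorted(memory_domains(source_build) - memory_domains(conda_build))
print(missing)
```

The difference covers the ib/mlx5_* devices, rdmacm, the two cuda domains, and knem.
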
@TomAugspurger

Contributor Author

commented Mar 5, 2019

For reference, I'm having trouble building GPU packages against UCX 1.5:

Making install in src/ucm
make[1]: Entering directory '/home/nfs/an.taugspurger/miniconda3/conda-bld/debug_1551823583841/work/src/ucm'
  CC       event/libucm_la-event.lo
  CC       malloc/libucm_la-malloc_hook.lo
  CC       mmap/libucm_la-install.lo
  CC       util/libucm_la-replace.lo
  CC       util/libucm_la-log.lo
  CC       util/libucm_la-reloc.lo
  CC       util/libucm_la-sys.lo
  CC       bistro/libucm_la-bistro.lo
  CC       bistro/libucm_la-bistro_x86_64.lo
  CC       bistro/libucm_la-bistro_aarch64.lo
  CC       bistro/libucm_la-bistro_ppc64.lo
  CC       cuda/libucm_la-install.lo
  CC       ptmalloc286/libucm_la-malloc.lo
  CCLD     libucm.la
/home/nfs/an.taugspurger/miniconda3/conda-bld/debug_1551823583841/_build_env/bin/../lib/gcc/x86_64-conda_cos6-linux-gnu/7.3.0/../../../../x86_64-conda_cos6-linux-gnu/bin/ld: cannot find -lcuda
collect2: error: ld returned 1 exit status
make[1]: *** [Makefile:640: libucm.la] Error 1
make[1]: Leaving directory '/home/nfs/an.taugspurger/miniconda3/conda-bld/debug_1551823583841/work/src/ucm'
make: *** [Makefile:686: install-recursive] Error 1

It seems like the LDFLAGS in src/ucm/Makefile is incorrect. After changing it from

 LDFLAGS =  -lcudart -lcuda -L/usr/local/cuda/lib64/

to

 LDFLAGS =  -lcudart -lcuda -L/usr/local/cuda/lib64/  -L/usr/local/cuda/lib64/stubs

running

make clean
make install

in that directory finishes. I still need to figure out why only the first -L path is being added.

This is on my conda-recipes branch (cf9913a9fcdac6735219d29023b56389bca44b73), and I am running

$ conda debug conda-recipes/ucx-gpu --numpy=1.14 --python=3.7
$ <copy-paste command>
$ ./conda_build.sh
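
One way to see why `-lcuda` fails with the first LDFLAGS value is to check which of the explicit `-L` directories actually contain the library. This is a minimal sketch under stated assumptions: `library_search_hits` is a hypothetical helper, it only understands the `-L/path` spelling (not `-L /path`), and it ignores the linker's default search directories.

```python
import os
import shlex


def library_search_hits(ldflags, libname):
    """Return the -L directories from ldflags that contain lib<libname>.so."""
    dirs = []
    for token in shlex.split(ldflags):
        # Only the joined form "-L/some/dir" is handled in this sketch.
        if token.startswith("-L") and len(token) > 2:
            dirs.append(token[2:])
    target = f"lib{libname}.so"
    return [d for d in dirs if os.path.isfile(os.path.join(d, target))]
```

For example, `library_search_hits("-lcudart -lcuda -L/usr/local/cuda/lib64/", "cuda")` returning an empty list would match the `cannot find -lcuda` error, and a hit appearing once the stubs directory is added would match the fix described above.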
@TomAugspurger

Contributor Author

commented Mar 6, 2019

We're able to build CPU and GPU conda packages for UCX & ucx-py against ucx master. However, we see a runtime error when doing anything with a listener / client (openucx/ucx#3319).

At the moment, I'm inclined to continue debugging against ucx master. Our suspicion is that the build error in
https://github.com/Akshay-Venkatesh/ucx-py/issues/33#issuecomment-469878468 is due to a bug in UCX's autotools setup. I'm not familiar enough with autotools to know for certain. But this has all been changed in UCX master, so any work there will be short-lived.

@TomAugspurger

Contributor Author

commented Mar 7, 2019

#42 provides recipes as a starting point. We'll leave this open till things are building with ucx-master.

@jakirkham

Member

commented Mar 28, 2019

cc @quasiben (as Tom's comment appears related to our own recent build issues)

@quasiben

Contributor

commented Mar 28, 2019

Oh! This explains quite a bit!

@TomAugspurger

Contributor Author

commented Mar 28, 2019

@Akshay-Venkatesh transferred this issue from rapidsai/ucx-py Apr 2, 2019

@Akshay-Venkatesh transferred this issue from another repository Apr 2, 2019

@mrocklin

Member

commented May 28, 2019

@jakirkham can you give an update here about the current challenges to packaging UCX and UCX-Py along with any suggestions you may have about what we can do to make progress?

@jakirkham

Member

commented May 28, 2019

It's been a little bit since I've touched these recipes as other tasks have bumped them. That said, my recollection is we started to make CDTs using conda skeleton rpm. This seems to work ok if the package isn't too complicated, but can run into issues if things go slightly off the beaten path. For example, one particular issue I recall is one RPM contained a broken symlink. Diagnosing and fixing these errors in the CDTs would be the next step. The recipes are in this branch if anyone wants to take a look.

@mrocklin

Member

commented May 28, 2019

@mrocklin

Member

commented Jun 3, 2019

@jakirkham can you recommend a path forward here? Are people other than you able to do this work? How long would it take you? Is it possible to interleave this with other work that you're doing?

@jakirkham

Member

commented Jun 3, 2019

can you recommend a path forward here?

I think this is already answered in the comment above. Though please feel free to ask more questions if something remains unclear.

Are people other than you able to do this work?

No objections to other people doing the work. Again hopefully the comment above answers how this would be handled.

How long would it take you? Is it possible to interleave this with other work that you're doing?

I'm not sure as I haven't looked at this in a while. I could probably pick this up in a couple of days if needed. Please let me know.

I should add that how much time is spent on the recipes generally depends on how we handle upstreaming this work (#113).

@mrocklin

Member

commented Jun 3, 2019

No objections to other people doing the work. Again hopefully the comment above answers how this would be handled.

Do we know anyone who you think is capable of doing this?

I'm not sure as I haven't looked at this in a while. I could probably pick this up in a couple of days if needed. Please let me know.

It would be useful to have this in a week or two if that's easy to do. You're also doing other things that are valuable though. It sounds like we need a bit more information here in order to make decisions about prioritization. If you can spend a small amount of time to determine how expensive this is, that would be helpful.

@quasiben

Contributor

commented Jun 3, 2019

I might be able to resume work on conda packaging for UCX this week, but it would be good if others were interested as well.

@mrocklin

Member

commented Jun 4, 2019

@quasiben I suspect that you have your hands full with dask-cudf issues.
