Skip to content
This repository was archived by the owner on Nov 1, 2024. It is now read-only.

Conversation

@bearzx
Copy link
Contributor

@bearzx bearzx commented Jun 2, 2022

In this PR I'm attempting to introduce a temporary workaround to fix the velox dylib issue, via a post-build hack. This is to make conda install torcharrow work out of the box.

Currently, you will encounter an error of dylibs not found when velox is dlopen-ing them. This is because the location of those dylibs were determined at compile time. For example: /usr/local/opt/gflags/lib/libgflags.2.2.dylib was installed to a default location by brew install.

We can specify the dylib dependencies in the conda recipe, but they will be installed to the conda environment locations, rather than the system ones (/user/local/...). As a result, the velox dylib (_torcharrow.*.so) won't be able to find those libraries.

On macos there is a tool called install_name_tool that can change the dylib paths, my plan is to modify those dylib paths from absolute ones to @loader_path/... so that the velox .so can find the dylibs installed by conda.

@facebook-github-bot facebook-github-bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 2, 2022
- libsodium
{{ environ.get('CONDA_PYTORCH_CONSTRAINT') }}
- pytorch
# {{ environ.get('CONDA_PYTORCH_CONSTRAINT') }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shall we just remove this?

Copy link
Contributor Author

@bearzx bearzx Jun 2, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I'm experimenting with this, sometimes having the pytorch nightly build as a dependency can make conda confused when installing. Previously I thought it was solved by only leaving pytorch in the "run" section, but it actually didn't solve the problem. Just having pytorch here seems to be simpler.

- BUILD_VERSION
- CPU_TARGET

#GitHub Actions usually provide Mac without AVX2 support, for now just skip the test until Velox can be run without AVX2
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so the test is already enabled now, right?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep, that's why I'm removing the stale comment here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might need to add command pytest --no-header -v torcharrow/test back the test

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah here I'm only running an import test for now. For some reason the last time I ran the tests, there were some ones failing on some path comparison issues (don't remember the details, but something like comparing './abc' to 'abc failed). This will go into another PR to investigate.

- glog==0.4.0
- libsodium
{{ environ.get('CONDA_PYTORCH_CONSTRAINT') }}
- pytorch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious: why do we need to build with pytorch? (since we don't build text UDFs by default)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is in the run section, which specifies the dependencies needed to use torcharrow. I think torcharrow still imports pytorch right?

@bearzx
Copy link
Contributor Author

bearzx commented Jun 6, 2022

Ok after a million times of trial-and-error, I finally found a way to adjust the recipe to

  1. Run the conda build as usual.
  2. Use install_name_tool to change the dylib paths for _torcharrow.so post build.
  3. Specify the right versions of runtime dependencies that velox is happy with (notably glog==0.6.0 and arrow-cpp==8.0.0).

The most tricky part is perhaps in step 2, because conda build doesn't provide a good entry point to do a final change before it packages everything into a .tar.bz2 file. Adding stuff after python setup.py in build.sh sort of worked, but there are some extra steps to specify the compatible dylibs versions after, so it will still end up creating broken build (I still don't know all the details here, just making the observations based on my experiments).

Since the conda package is just a .tar.bz2 file, the approach I ended up taking was to unpack it, install_name_tool, pack it again using the tar command, before uploading happens. One more detail here: we'll need to run tar within the target folder so that the file list doesn't have a ./ prefix, this is critical because anaconda upload specifically hard-coded this to read the metadata. It sounds and works pretty hacky, but might be fine as a temp fix before we have static link option available on velox.

Now you should be able to run conda install -c conda-forge -c pytorch-nightly torcharrow to install TA on macos and have it works out of the box.

@bearzx
Copy link
Contributor Author

bearzx commented Jun 6, 2022

@bearzx bearzx requested review from ejguan and kgpai June 6, 2022 23:40
@facebook-github-bot
Copy link
Contributor

@bearzx has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

2 similar comments
@facebook-github-bot
Copy link
Contributor

@bearzx has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@facebook-github-bot
Copy link
Contributor

@bearzx has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

- typing
- tabulate
- pyarrow
- arrow
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

btw arrow on conda is a completely different lib ...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and arrow-cpp

- arrow-cpp==8.0.0
- cffi
- glog==0.4.0
- glog==0.6.0
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

anything changed on Velox side?

Copy link
Contributor Author

@bearzx bearzx Jun 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think velox always uses the glog installed from brew, which was recently updated from 0.5.0 to 0.6.0. Building with glog0.6.0 and running with glog0.4.0 will not work (some symbol not found error when running). However, the conda default channel hasn't updated to 0.6.0 yet, it only exists on conda-forge, hence this change.

Copy link
Contributor Author

@bearzx bearzx Jun 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is also why we need to specify -c conda-forge when installing.

Copy link
Contributor

@wenleix wenleix left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

@@ -0,0 +1,33 @@
function fix_velox_dylib_paths() {
velox_so=$1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we check there is exactly one parameter ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you elaborate?

rm ./${pkg_name}.tar.bz2
cd ./${pkg_name}
source ../../../packaging/fix_conda_dylib_paths.sh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just use ${GITHUB_WORKSPACE}/packging/...?

bearzx added a commit that referenced this pull request Jun 7, 2022
Summary:
In this PR I'm attempting to introduce a temporary workaround to fix the velox dylib issue, via a post-build hack. This is to make `conda install torcharrow` work out of the box.

Currently, you will encounter an error of dylibs not found when velox is dlopen-ing them. This is because the location of those dylibs were determined at compile time. For example: `/usr/local/opt/gflags/lib/libgflags.2.2.dylib` was installed to a default location by `brew install`.

We can specify the dylib dependencies in the conda recipe, but they will be installed to the conda environment locations, rather than the system ones (`/user/local/...`). As a result, the velox dylib (`_torcharrow.*.so`) won't be able to find those libraries.

On macos there is a tool called `install_name_tool` that can change the dylib paths, my plan is to modify those dylib paths from absolute ones to `loader_path/...` so that the velox .so can find the dylibs installed by conda.

Pull Request resolved: #355

Reviewed By: wenleix

Differential Revision: D36956804

Pulled By: bearzx

fbshipit-source-id: 2551910eaec0c52b59b769de67c52aae633bbbc6
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D36956804

Summary:
In this PR I'm attempting to introduce a temporary workaround to fix the velox dylib issue, via a post-build hack. This is to make `conda install torcharrow` work out of the box.

Currently, you will encounter an error of dylibs not found when velox is dlopen-ing them. This is because the location of those dylibs were determined at compile time. For example: `/usr/local/opt/gflags/lib/libgflags.2.2.dylib` was installed to a default location by `brew install`.

We can specify the dylib dependencies in the conda recipe, but they will be installed to the conda environment locations, rather than the system ones (`/user/local/...`). As a result, the velox dylib (`_torcharrow.*.so`) won't be able to find those libraries.

On macos there is a tool called `install_name_tool` that can change the dylib paths, my plan is to modify those dylib paths from absolute ones to `loader_path/...` so that the velox .so can find the dylibs installed by conda.

Pull Request resolved: #355

Reviewed By: wenleix

Differential Revision: D36956804

Pulled By: bearzx

fbshipit-source-id: 0bf133a8cd9b63466706e355c7ac18004b807609
@facebook-github-bot
Copy link
Contributor

This pull request was exported from Phabricator. Differential Revision: D36956804

facebook-github-bot pushed a commit that referenced this pull request Jun 29, 2022
Summary:
In this PR I'm attempting to introduce a temporary workaround to fix the velox dylib issue, via a post-build hack. This is to make `conda install torcharrow` work out of the box.

Currently, you will encounter an error of dylibs not found when velox is dlopen-ing them. This is because the location of those dylibs were determined at compile time. For example: `/usr/local/opt/gflags/lib/libgflags.2.2.dylib` was installed to a default location by `brew install`.

We can specify the dylib dependencies in the conda recipe, but they will be installed to the conda environment locations, rather than the system ones (`/user/local/...`). As a result, the velox dylib (`_torcharrow.*.so`) won't be able to find those libraries.

On macos there is a tool called `install_name_tool` that can change the dylib paths, my plan is to modify those dylib paths from absolute ones to `loader_path/...` so that the velox .so can find the dylibs installed by conda.

X-link: #355

Reviewed By: wenleix

Differential Revision: D36956804

Pulled By: bearzx

fbshipit-source-id: 720ce2fcc7815fdb0211f8b5d9d72f9d817f6080
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.

Labels

CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants