
Fix libtorch static builds that regressed #1056

Merged: 1 commit into pytorch:main on Jun 14, 2022

Conversation

powderluv (Contributor)

libtorch-static used to ship libtorch.a until 1.0.1 but has
regressed since, forcing people to build from source.

This fixes the build commands and adds BUILD_SHARED_LIBS=OFF.
It also stops building it as a debug build, and turns off
USE_TENSORPIPE until pytorch/tensorpipe#449
is fixed.

Tested on my system to verify that a libtorch.a is produced, but I can't
test the CI.

Fixes: pytorch/pytorch#70898
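The flag changes described above can be sketched as environment settings for PyTorch's build (a minimal sketch: BUILD_SHARED_LIBS, USE_TENSORPIPE, and DEBUG are standard PyTorch build variables, but the exact lines in build_libtorch.sh are an assumption, not quoted from the diff):

```shell
# Sketch of the build-flag changes this PR describes (not the literal diff).
export BUILD_SHARED_LIBS=OFF   # produce a static libtorch.a instead of libtorch.so
export USE_TENSORPIPE=0        # off until pytorch/tensorpipe#449 is fixed
export DEBUG=0                 # static builds are no longer debug builds
```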

t-vi commented Jun 10, 2022

Just today I wondered about this. Thank you!

powderluv (Contributor, Author)

@seemethere @ezyang, can you please help review or send this to the right person?

makslevental (Contributor) commented Jun 10, 2022

@malfet can you take a look at this?

malfet (Contributor) previously requested changes Jun 10, 2022

Hmm, where is the BUILD_STATIC variable referenced on line 144 defined? (IMO it should be BUILD_SHARED_VAR, shouldn't it?)

Review comment on manywheel/build_libtorch.sh (outdated, resolved)
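A hypothetical reconstruction of the kind of bug flagged above (variable names and logic are illustrative, not quoted from build_libtorch.sh): a condition tests a variable that is never defined, so the intended flag is silently dropped.

```shell
# Illustrative only: the script sets one name but tests another.
BUILD_SHARED_VAR=OFF                        # what the script actually defines
if [ "${BUILD_STATIC:-}" = "ON" ]; then     # bug: BUILD_STATIC is never set
  EXTRA_CMAKE_FLAGS="-DBUILD_SHARED_LIBS=OFF"
fi
# The static flag never reaches CMake:
echo "flags: ${EXTRA_CMAKE_FLAGS:-<none>}"  # prints "flags: <none>"
```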
malfet (Contributor) commented Jun 10, 2022

Also, we probably want to keep debug symbol information, but only for shared library builds (static builds indeed cannot be separated into a library and debug symbols).

libtorch-static used to ship libtorch.a until 1.0.1 but has
regressed since, forcing people to build from source.

This fixes the build commands and adds BUILD_SHARED_LIBS=OFF.
It turns off USE_TENSORPIPE until
pytorch/tensorpipe#449 is fixed.

Tested on my system to verify that a libtorch.a is produced, but I can't
test the CI.
powderluv (Contributor, Author)

Friendly ping to review. Thanks!

ezyang merged commit 17bfd84 into pytorch:main on Jun 14, 2022
makslevental (Contributor)

thanks @ezyang

malfet (Contributor) commented Jun 16, 2022

Reverting, as it broke all static builds in nightlies; see https://hud.pytorch.org/pytorch/pytorch/commit/a4fa62064e1863d108c655f541bce4d806cf1823

malfet added two commits that referenced this pull request on Jun 16, 2022
makslevental (Contributor)

Here's the error (string interpolation in ninja?):

   '/usr/bin/ninja-build' '-C' '/pytorch/build' '-t' 'cleandead'

  failed with:

   ninja: error: build.ninja:53433: bad $-escape (literal $ must be written as $$)

CMake Error:
  Running

   '/usr/bin/ninja-build' '-C' '/pytorch/build' '-t' 'recompact'

  failed with:

   ninja: error: build.ninja:53433: bad $-escape (literal $ must be written as $$)

atalman added a commit to atalman/builder that referenced this pull request Jun 17, 2022
powderluv (Contributor, Author)

Any ideas on what went wrong here? The same command works OK on a couple of systems I tried. Is the build.ninja file available somewhere?

powderluv (Contributor, Author)

I have isolated the breakage to USE_MKL / USE_MKLDNN generating wrong Ninja rules like $<TARGET_FILE:dnnl_graph>.

I have filed pytorch/pytorch#80012 to fix the issue in PyTorch, but for now will land a fix for static builds with MKL disabled until it is resolved.
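The failure mode can be illustrated with a synthetic one-line file (an assumption about the mechanism, consistent with the CI log above): ninja reserves '$' for its own escape syntax, so a CMake generator expression emitted verbatim into build.ninja is exactly the reported bad $-escape. The interim fix mentioned corresponds to building with USE_MKLDNN=0, a standard PyTorch build variable.

```shell
# Synthetic sample with the shape of the bad rule; in ninja syntax a literal
# '$' must be written '$$', so '$<TARGET_FILE:dnnl_graph>' cannot be parsed.
printf 'command = ld $<TARGET_FILE:dnnl_graph>\n' > sample.ninja

# Spot leaked, unescaped CMake generator expressions in a generated file:
grep -n '\$<[A-Z_]*:' sample.ninja
```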

powderluv added a commit to powderluv/builder that referenced this pull request Jun 22, 2022
Prior attempt to land was reverted due to a failure with MKLDNN
pytorch#1056

Disable MKLDNN in static builds until it is fixed. It is tracked
in pytorch/pytorch#80012

TEST: With and without MKLDNN to recreate the last failure and
test that it builds without MKLDNN.
malfet pushed a commit that referenced this pull request Jun 22, 2022
powderluv added a commit to nod-ai/libtorch that referenced this pull request Jun 28, 2022
Linked issue: libtorch.a is not in pytorch static zip archives

6 participants