Link statically against TensorFlow C library #83

Closed
lastzero opened this issue Dec 19, 2018 · 20 comments

Comments


@lastzero lastzero commented Dec 19, 2018

As a user, I want to download a single binary for PhotoPrism so that I can easily use it on any computer without complicated installation.

Key to this endeavour is that we stick with TensorFlow as our only C library dependency and link it statically into our binary (maybe dlib too, another ML library). While it is currently not available in a static version from Google, we can build it ourselves or wait until next year when Bazel has a rule to build a fully static library.

AFAIK, a static version of TensorFlow can be built with cmake (there are howtos out there for various platforms): https://github.com/tensorflow/tensorflow/tree/master/tensorflow/contrib/cmake

Acceptance Criteria:

  • On Linux, make build MUST create a photoprism binary that is statically linked against TensorFlow (no shared libraries needed anymore)
  • Support for other operating systems and processors MAY be implemented
  • Support for other operating systems and processors SHOULD be documented (how to do it)
  • Support for CPU extensions, such as SSE4.1, SSE4.2, AVX, and AVX2 SHOULD be enabled
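
A rough way to check the first criterion after a build, sketched in bash. The helper name `linkage_from_report` is made up for illustration and is not part of the PhotoPrism Makefile. It assumes that `ldd` reports "not a dynamic executable" (glibc) or "statically linked" for a static binary, and that `file` reports "statically linked"; other toolchains may word this differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a binary as "static" or "dynamic" based on
# the text printed by `ldd` or `file`. Assumes the usual glibc/file wording.
linkage_from_report() {
  local report="$1"
  case "$report" in
    *"not a dynamic executable"*|*"statically linked"*) echo "static" ;;
    *) echo "dynamic" ;;
  esac
}

# Usage (assumes the build produced ./photoprism):
#   linkage_from_report "$(ldd ./photoprism 2>&1)"
#   linkage_from_report "$(file ./photoprism)"
```

If the result is "dynamic", the `ldd` output itself lists which shared libraries are still required.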

IssueHunt Summary

graciousgrey has been rewarded.

Backers (Total: $60.00)

@lastzero lastzero added this to the MVP milestone Dec 19, 2018

@IssueHuntBot IssueHuntBot commented Dec 20, 2018

@lastzero has funded $40.00 to this issue. See it on IssueHunt

@IssueHuntBot IssueHuntBot commented Dec 21, 2018

@issuehuntfest has funded $20.00 to this issue. See it on IssueHunt

@lastzero lastzero mentioned this issue Dec 22, 2018
0 of 6 tasks complete

@lastzero lastzero commented May 29, 2019

Static TensorFlow build on Linux and OS X works, see tensorflow/tensorflow#28388 and #99

@lastzero lastzero commented Jun 19, 2019

Building the lib with env JOB_COUNT=1 ./tensorflow/contrib/makefile/build_all_linux.sh on ARM64 failed:

gcc --std=c++11 -DIS_SLIM_BUILD -fno-exceptions -DNDEBUG -O3 -march=native -fPIC -MT 
/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/gen/obj/tensorflow/core/kernels/batch_matmul_op_real.o -MMD -MP -MF 
/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/gen/dep//tensorflow/core/kernels/batch_matmul_op_real.Td 
-I. 
-I/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/downloads/ 
-I/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/downloads/eigen 
-I/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/downloads/gemmlowp 
-I/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/downloads/nsync/public 
-I/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/downloads/fft2d 
-I/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/downloads/double_conversion 
-I/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/downloads/absl 
-I/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/gen/proto/ 
-I/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/gen/proto_text/ 
-I/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/gen/protobuf-host/include 
-I/usr/local/include -c tensorflow/core/kernels/batch_matmul_op_real.cc -o /home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/gen/obj/tensorflow/core/kernels/batch_matmul_op_real.o
In file included from /home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/downloads/eigen/unsupported/Eigen/CXX11/../../../Eigen/Core:296:0,
                 from /home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/downloads/eigen/unsupported/Eigen/CXX11/Tensor:14,
                 from ./third_party/eigen3/unsupported/Eigen/CXX11/Tensor:1,
                 from ./tensorflow/core/kernels/batch_matmul_op_impl.h:24,
                 from tensorflow/core/kernels/batch_matmul_op_real.cc:16:
/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/downloads/eigen/unsupported/Eigen/CXX11/../../../Eigen/src/Core/products/GeneralBlockPanelKernel.h: 
In function ‘void Eigen::internal::gebp_kernel<LhsScalar, RhsScalar, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>::operator()(const DataMapper&, const LhsScalar*, const RhsScalar*, Index, Index, Index, Eigen::internal::gebp_kernel<LhsScalar, RhsScalar, Index, DataMapper, mr, nr, ConjugateLhs, ConjugateRhs>::ResScalar, Index, Index, Index, Index) [with LhsScalar = Eigen::half; RhsScalar = Eigen::half; Index = long int; DataMapper = Eigen::internal::blas_data_mapper<Eigen::half, long int, 0, 0>; int mr = 2; int nr = 4; bool ConjugateLhs = false; bool ConjugateRhs = false]’:
/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/downloads/eigen/unsupported/Eigen/CXX11/../../../Eigen/src/Core/products/GeneralBlockPanelKernel.h:1879:3: internal compiler error: in emit_move_insn, at expr.c:3698
   }
   ^
Please submit a full bug report,
with preprocessed source if appropriate.
See <file:///usr/share/doc/gcc-7/README.Bugs> for instructions.
tensorflow/contrib/makefile/Makefile:840: recipe for target '/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/gen/obj/tensorflow/core/kernels/batch_matmul_op_real.o' failed
make: *** [/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/gen/obj/tensorflow/core/kernels/batch_matmul_op_real.o] Error 1

@lastzero lastzero commented Jun 19, 2019

Might be related to tensorflow/tensorflow#25323

@lastzero lastzero commented Jun 19, 2019

Bug seems to be fixed in gcc 8.3.1, apparently no fix for gcc 7: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=89752

Compiling with gcc 8.3.0 fails with a different error:

g++ -M -std=c++11 -DNSYNC_USE_CPP11_TIMEPOINT -DNSYNC_ATOMIC_CPP11 -I../../platform/c++11.futex -I../../platform/c++11 -I../../platform/gcc -I../../platform/posix -pthread -I../../public -I../../internal ../../internal/*.c ../../testing/*.c ../../platform/linux/src/nsync_semaphore_futex.c ../../platform/c++11/src/per_thread_waiter.cc ../../platform/c++11/src/yield.cc ../../platform/c++11/src/time_rep_timespec.cc ../../platform/c++11/src/nsync_panic.cc \
	  ../../platform/c++11/src/start_thread.cc > dependfile
g++ -DNSYNC_USE_CPP11_TIMEPOINT -DNSYNC_ATOMIC_CPP11 -I../../platform/c++11.futex -I../../platform/c++11 -I../../platform/gcc -I../../platform/posix -pthread -I../../public -I../../internal -O -std=c++11 -Werror -Wall -Wextra -pedantic -c ../../internal/common.c
g++ -DNSYNC_USE_CPP11_TIMEPOINT -DNSYNC_ATOMIC_CPP11 -I../../platform/c++11.futex -I../../platform/c++11 -I../../platform/gcc -I../../platform/posix -pthread -I../../public -I../../internal -O -std=c++11 -Werror -Wall -Wextra -pedantic -c ../../internal/counter.c
../../internal/counter.c: In function ‘nsync::nsync_counter_s_* nsync::nsync_counter_new(uint32_t)’:
../../internal/counter.c:39:28: error: ‘void* memset(void*, int, size_t)’ clearing an object of type ‘struct nsync::nsync_counter_s_’ with no trivial copy-assignment; use value-initialization instead [-Werror=class-memaccess]
   memset (c, 0, sizeof (*c));
                            ^
../../internal/counter.c:29:8: note: ‘struct nsync::nsync_counter_s_’ declared here
 struct nsync_counter_s_ {
        ^~~~~~~~~~~~~~~~
cc1plus: all warnings being treated as errors
../../platform/posix/make.common:72: recipe for target 'counter.o' failed

Can somebody help and maybe point to an Ubuntu bionic PPA that has a working compiler for ARM64?

@lastzero lastzero commented Jun 21, 2019

Compiling with GCC 5 - as suggested on https://devtalk.nvidia.com/default/topic/1055131/jetson-agx-xavier/building-tensorflow-1-13-on-jetson-xavier/ - failed too:

/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/gen/host_obj/tensorflow/core/util/test_log.pb.o:(.data.rel.ro._ZTVN6google8protobuf8internal12MapEntryImplIN10tensorflow35BenchmarkEntry_ExtrasEntry_DoNotUseENS0_7MessageESsNS3_10EntryValueELNS1_14WireFormatLite9FieldTypeE9ELS8_11ELi0EEE[_ZTVN6google8protobuf8internal12MapEntryImplIN10tensorflow35BenchmarkEntry_ExtrasEntry_DoNotUseENS0_7MessageESsNS3_10EntryValueELNS1_14WireFormatLite9FieldTypeE9ELS8_11ELi0EEE]+0x58): 
undefined reference to `google::protobuf::Message::InitializationErrorString() const'
collect2: error: ld returned 1 exit status
tensorflow/contrib/makefile/Makefile:899: recipe for target '/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/gen/host_bin/proto_text' failed
make: *** [/home/photoprism/Development/photoprism/tensorflow/tensorflow/contrib/makefile/gen/host_bin/proto_text] Error 1

NVIDIA should really provide a working TensorFlow lib with their base software as compiling it is not only very difficult but also takes a VERY long time, especially on a Jetson Nano. We are talking about hours and days.

Our adapted build script (Bazel also failed to compile after several hours!):

#!/usr/bin/env bash
# Copyright 2016 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
# Downloads and builds all of TensorFlow's dependencies for Linux, and compiles
# the TensorFlow library itself. Supported on Ubuntu 14.04 and 16.04.

set -e

# Make sure we're in the correct directory, at the root of the source tree.
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "${SCRIPT_DIR}/../../../"

source "${SCRIPT_DIR}/build_helper.subr"
JOB_COUNT="${JOB_COUNT:-$(get_job_count)}"

# Remove any old files first.
make -f tensorflow/contrib/makefile/Makefile clean
rm -rf tensorflow/contrib/makefile/downloads

# Pull down the required versions of the frameworks we need.
tensorflow/contrib/makefile/download_dependencies.sh

# Compile nsync.
# Don't use  export var=`something` syntax; it swallows the exit status.
HOST_NSYNC_LIB=`tensorflow/contrib/makefile/compile_nsync.sh`
TARGET_NSYNC_LIB="$HOST_NSYNC_LIB"
export HOST_NSYNC_LIB TARGET_NSYNC_LIB

# Compile protobuf.
tensorflow/contrib/makefile/compile_linux_protobuf.sh

# Build TensorFlow.
make -j"${JOB_COUNT}" -f tensorflow/contrib/makefile/Makefile \
  OPTFLAGS="-O3 -D_GLIBCXX_USE_CXX11_ABI=0 -march=native" \
  CXXFLAGS="-D_GLIBCXX_USE_CXX11_ABI=0 --std=c++11 -march=native" \
  HOST_CXXFLAGS="-D_GLIBCXX_USE_CXX11_ABI=0 --std=c++11 -march=native" \
  MAKEFILE_DIR=$SCRIPT_DIR

Cross compiling would be another option... not sure how easy this is since we need native GPU support and maybe need to link against NVIDIA libs.

See also https://devtalk.nvidia.com/default/topic/1055987/jetson-nano/tensorflow-for-c-tensorflow-so-static-linking-from-go/

@lastzero lastzero commented Jun 21, 2019

And even IF we cross compile to speed it up, we still need a working compiler... otherwise it just fails faster.

@lastzero lastzero self-assigned this Jun 21, 2019

@lastzero lastzero commented Jun 22, 2019

This is how building Bazel fails:

INFO: From Compiling src/main/cpp/blaze_util_posix.cc:
src/main/cpp/blaze_util_posix.cc: In function 'uint64_t blaze::AcquireLock(const string&, bool, bool, blaze::BlazeLock*)':
src/main/cpp/blaze_util_posix.cc:650:30: warning: ignoring return value of 'int ftruncate(int, __off_t)', declared with attribute warn_unused_result [-Wunused-result]
   (void) ftruncate(lockfd, 0);
                              ^
ERROR: /root/bazel/src/main/java/com/google/devtools/build/lib/buildeventservice/BUILD:17:1: Building src/main/java/com/google/devtools/build/lib/buildeventservice/libbuildeventservice.jar (6 source files) and running annotation processors (OptionProcessor, AutoAnnotationProcessor, AutoValueProcessor) failed: Worker process quit or closed its stdin stream when we tried to send a WorkRequest:

---8<---8<--- Exception details ---8<---8<---
java.io.IOException: Stream closed
	at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433)
	at java.io.OutputStream.write(OutputStream.java:116)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
	at com.google.protobuf.CodedOutputStream$OutputStreamEncoder.doFlush(CodedOutputStream.java:3003)
	at com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeStringNoTag(CodedOutputStream.java:2872)
	at com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeString(CodedOutputStream.java:2718)
	at com.google.protobuf.GeneratedMessageV3.writeString(GeneratedMessageV3.java:2719)
	at com.google.devtools.build.lib.worker.WorkerProtocol$WorkRequest.writeTo(WorkerProtocol.java:1006)
	at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:98)
	at com.google.devtools.build.lib.worker.WorkerSpawnRunner.execInWorker(WorkerSpawnRunner.java:330)
	at com.google.devtools.build.lib.worker.WorkerSpawnRunner.actuallyExec(WorkerSpawnRunner.java:172)
	at com.google.devtools.build.lib.worker.WorkerSpawnRunner.exec(WorkerSpawnRunner.java:121)
	at com.google.devtools.build.lib.exec.SpawnRunner.execAsync(SpawnRunner.java:225)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:123)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:88)
	at com.google.devtools.build.lib.actions.SpawnActionContext.beginExecution(SpawnActionContext.java:41)
	at com.google.devtools.build.lib.exec.ProxySpawnActionContext.beginExecution(ProxySpawnActionContext.java:60)
	at com.google.devtools.build.lib.actions.SpawnContinuation$1.execute(SpawnContinuation.java:80)
	at com.google.devtools.build.lib.rules.java.JavaCompileAction$JavaActionContinuation.execute(JavaCompileAction.java:494)
	at com.google.devtools.build.lib.rules.java.JavaCompileAction.beginExecution(JavaCompileAction.java:314)
	at com.google.devtools.build.lib.actions.Action.execute(Action.java:123)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$5.execute(SkyframeActionExecutor.java:849)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.continueAction(SkyframeActionExecutor.java:983)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.run(SkyframeActionExecutor.java:955)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.runStateMachine(ActionExecutionState.java:116)
	at com.google.devtools.build.lib.skyframe.ActionExecutionState.getResultOrDependOnFuture(ActionExecutionState.java:77)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:581)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:716)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:258)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:451)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:399)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
---8<---8<--- End of exception details ---8<---8<---

---8<---8<--- Start of log, file at /tmp/bazel_UTISB4cE/out/bazel-workers/worker-2-Javac.log ---8<---8<---
(empty)
---8<---8<--- End of log ---8<---8<---
Target //src:bazel_nojdk failed to build
INFO: Elapsed time: 1431.336s, Critical Path: 717.28s
INFO: 1324 processes: 1107 local, 217 worker.
FAILED: Build did NOT complete successfully

ERROR: Could not build Bazel

@lastzero lastzero commented Jun 22, 2019

Same with Bazel 0.19.2:

ERROR: /tmp/bazel_utLOs7ae/out/external/desugar_jdk_libs/src/share/classes/java/BUILD:1:1: Building external/desugar_jdk_libs/src/share/classes/java/libjava.jar (212 source files) failed: Worker process quit or closed its stdin stream when we tried to send a WorkRequest:

---8<---8<--- Exception details ---8<---8<---
java.io.IOException: Stream closed
	at java.lang.ProcessBuilder$NullOutputStream.write(ProcessBuilder.java:433)
	at java.io.OutputStream.write(OutputStream.java:116)
	at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
	at java.io.BufferedOutputStream.write(BufferedOutputStream.java:126)
	at com.google.protobuf.CodedOutputStream$OutputStreamEncoder.doFlush(CodedOutputStream.java:3003)
	at com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeStringNoTag(CodedOutputStream.java:2872)
	at com.google.protobuf.CodedOutputStream$OutputStreamEncoder.writeString(CodedOutputStream.java:2718)
	at com.google.protobuf.GeneratedMessageV3.writeString(GeneratedMessageV3.java:2719)
	at com.google.devtools.build.lib.worker.WorkerProtocol$WorkRequest.writeTo(WorkerProtocol.java:1006)
	at com.google.protobuf.AbstractMessageLite.writeDelimitedTo(AbstractMessageLite.java:98)
	at com.google.devtools.build.lib.worker.WorkerSpawnRunner.execInWorker(WorkerSpawnRunner.java:318)
	at com.google.devtools.build.lib.worker.WorkerSpawnRunner.actuallyExec(WorkerSpawnRunner.java:160)
	at com.google.devtools.build.lib.worker.WorkerSpawnRunner.exec(WorkerSpawnRunner.java:115)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:106)
	at com.google.devtools.build.lib.exec.AbstractSpawnStrategy.exec(AbstractSpawnStrategy.java:75)
	at com.google.devtools.build.lib.exec.SpawnActionContextMaps$ProxySpawnActionContext.exec(SpawnActionContextMaps.java:362)
	at com.google.devtools.build.lib.analysis.actions.SpawnAction.internalExecute(SpawnAction.java:288)
	at com.google.devtools.build.lib.analysis.actions.SpawnAction.execute(SpawnAction.java:295)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeActionTask(SkyframeActionExecutor.java:1001)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.prepareScheduleExecuteAndCompleteAction(SkyframeActionExecutor.java:930)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.access$800(SkyframeActionExecutor.java:121)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:770)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor$ActionRunner.call(SkyframeActionExecutor.java:725)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at com.google.devtools.build.lib.skyframe.SkyframeActionExecutor.executeAction(SkyframeActionExecutor.java:478)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.checkCacheAndExecuteIfNeeded(ActionExecutionFunction.java:519)
	at com.google.devtools.build.lib.skyframe.ActionExecutionFunction.compute(ActionExecutionFunction.java:216)
	at com.google.devtools.build.skyframe.AbstractParallelEvaluator$Evaluate.run(AbstractParallelEvaluator.java:422)
	at com.google.devtools.build.lib.concurrent.AbstractQueueVisitor$WrappedRunnable.run(AbstractQueueVisitor.java:368)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
	at java.lang.Thread.run(Thread.java:748)
---8<---8<--- End of exception details ---8<---8<---

---8<---8<--- Start of log, file at /tmp/bazel_utLOs7ae/out/bazel-workers/worker-1-Javac.log ---8<---8<---
(empty)
---8<---8<--- End of log ---8<---8<---
Target //src:bazel_nojdk failed to build
INFO: Elapsed time: 1725.964s, Critical Path: 499.81s, Remote (0.00% of the time): [queue: 0.00%, setup: 0.00%, process: 0.00%]
INFO: 1478 processes: 1311 local, 167 worker.
FAILED: Build did NOT complete successfully

ERROR: Could not build Bazel

@lastzero lastzero commented Jun 22, 2019

I'll stop working on this if there is no tangible result after 5 days. Major waste of time. I hope the compiler bug gets fixed and NVIDIA offers TensorFlow for C as a download because not everyone should have to compile it. That's ridiculous, given they promote their hardware as suited for TensorFlow and machine learning.

See https://devtalk.nvidia.com/default/topic/1055131/jetson-agx-xavier/building-tensorflow-1-13-on-jetson-xavier/?offset=8#5354133

@lastzero lastzero commented Jun 22, 2019

On OS X, a static lib compiles using GCC 4.2, but I can't build a static binary with Go since, unlike on Linux, there are no static versions of the OS libs: tensorflow/tensorflow#23649 (comment)

Not sure how to only include libtensorflow.a statically and not all other system libs... again help is much appreciated, although I'll eventually figure this out if I stop having a life 😉
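
One possible direction, as an untested sketch: pass the archive to the linker by full path and force all of its objects in, while everything else stays dynamically linked. The function name `static_archive_ldflags` and the archive path are placeholders. `-force_load` (Apple ld) and `--whole-archive` (GNU ld) are existing linker options, but whether they are sufficient here is an assumption; further archives (for example protobuf and nsync) and the C++ runtime would probably have to be listed as well.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print linker flags that pull in ONE static archive in
# full, leaving system libraries dynamically linked.
#   $1 = OS name as printed by `uname -s`, $2 = path to the .a file
static_archive_ldflags() {
  local os="$1" archive="$2"
  case "$os" in
    Darwin) echo "-Wl,-force_load,${archive}" ;;
    Linux)  echo "-Wl,--whole-archive ${archive} -Wl,--no-whole-archive" ;;
    *)      echo "unsupported OS: ${os}" >&2; return 1 ;;
  esac
}

# Usage (path is a placeholder):
#   CGO_LDFLAGS="$(static_archive_ldflags "$(uname -s)" /path/to/libtensorflow.a)" go build ./...
```

This only helps if the archive actually contains the C API symbols the Go bindings call; if it doesn't, the link will still fail with undefined references.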

lastzero added a commit that referenced this issue Jun 23, 2019
lastzero added a commit that referenced this issue Jun 23, 2019

@lastzero lastzero commented Jun 23, 2019

A static lib can now also be compiled on Linux using GCC 4.8 (all later versions are doomed to fail), see https://dl.photoprism.org/tensorflow/v1.13.1/core-avx-i/

However, libtensorflow.a doesn't seem to be enough to build a static Go binary. Again, advice from C / TensorFlow specialists is much appreciated 👍

@lastzero lastzero commented Jun 23, 2019

@lastzero lastzero commented Jun 23, 2019

Managed to build Bazel 0.19.2 after adding 6 GB of swap, also managed to build static tensorflow-core.a lib on Jetson Nano (ARM64) after compiling a day and a night:

https://dl.photoprism.org/tensorflow/v1.13.1/nvidia-jetson/
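
For anyone repeating this, adding swap could look roughly like the following. It is a dry-run sketch that only prints the commands. The path `/swapfile` and the helper name `swap_setup_commands` are assumptions, not what was actually used on the Jetson; the 6 GB default is taken from the comment above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the commands that would create and enable a
# swap file. Nothing is executed; review the output, then run it as root.
#   $1 = swap file path (default /swapfile), $2 = size in GB (default 6)
swap_setup_commands() {
  local path="${1:-/swapfile}" size_gb="${2:-6}"
  printf 'fallocate -l %sG %s\n' "$size_gb" "$path"   # reserve the space
  printf 'chmod 600 %s\n' "$path"                      # restrict access to root
  printf 'mkswap %s\n' "$path"                         # write the swap signature
  printf 'swapon %s\n' "$path"                         # enable it until reboot
}

# Usage:
#   swap_setup_commands /swapfile 6
```

On filesystems where `fallocate`-created files cannot be used as swap, the file can be created with `dd` instead.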

As tensorflow-core.a doesn't seem to be enough to compile PhotoPrism with TensorFlow, I tried to build tensorflow.so using Bazel, but no config for ARM64 exists:

$ bazel build --config=opt --config=nonccl //tensorflow/tools/pip_package:build_pip_package --incompatible_remove_native_http_archive=false --verbose_failures --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0"
WARNING: ignoring LD_PRELOAD in environment.
Starting local Bazel server and connecting to it...
DEBUG: /home/photoprism/.cache/bazel/_bazel_photoprism/a82c5925e91e6d1a9494d22d9b2ee512/external/bazel_tools/tools/cpp/lib_cc_configure.bzl:115:5: 
Auto-Configuration Warning: 'TMP' environment variable is not set, using 'C:\Windows\Temp' as default
ERROR: No toolchain found for cpu 'aarch64'. Valid cpus from default_toolchain entries are: [
]. Valid toolchains are: [
  local_linux: --cpu='local' --compiler='compiler',
  local_darwin: --cpu='darwin' --compiler='compiler',
  local_windows: --cpu='x64_windows' --compiler='msvc-cl',
]
INFO: Elapsed time: 14.870s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (2 packages loaded)

@lastzero lastzero commented Jun 23, 2019

BTW: This issue is funded, so you'll get a reward if you help us 💰

@lastzero lastzero commented Jun 23, 2019

My personal highlight is that Bazel uses 'C:\Windows\Temp' as default temp path on Linux and they call this AUTO CONFIGURATION 🥳

@lastzero lastzero commented Jun 24, 2019

Think I fixed all issues and came up with a sound TensorFlow GPU configuration except that Bazel is still compiling. Jetson Nano has a load of > 7 and RAM utilization of ~200%. From the manual:

Note that the number of concurrent jobs that Bazel will run is determined not only by the --jobs setting, but also by Bazel's scheduler, which tries to avoid running concurrent jobs that will use up more resources (RAM or CPU) than are available, based on some (very crude) estimates of the resource consumption of each job. The behavior of the scheduler can be controlled by the --ram_utilization_factor option.

https://docs.bazel.build/versions/master/user-manual.html

In reality, if you have 4 cores, Bazel will run 4 jobs even if it needs all the RAM in the world and your mouse is not moving anymore. --jobs 2 should fix this.
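
Instead of hard-coding the value, the job count can be derived from cores and memory. The sketch below uses a made-up helper, `suggest_jobs`, and assumes roughly 2048 MB per compile job, which is a guess to tune and not a number from the Bazel docs.

```shell
#!/usr/bin/env bash
# Hypothetical helper: suggest a --jobs value bounded by CPU cores and RAM.
#   $1 = number of cores, $2 = total memory in MB,
#   $3 = assumed memory per job in MB (default 2048, a rough guess)
suggest_jobs() {
  local cores="$1" mem_mb="$2" mem_per_job_mb="${3:-2048}"
  local by_mem=$(( mem_mb / mem_per_job_mb ))
  local jobs=$(( by_mem < cores ? by_mem : cores ))
  if (( jobs < 1 )); then
    jobs=1   # always run at least one job
  fi
  echo "$jobs"
}

# Usage on Linux:
#   mem_mb="$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)"
#   bazel build --jobs "$(suggest_jobs "$(nproc)" "$mem_mb")" //tensorflow/tools/pip_package:build_pip_package
```

For a machine with 4 cores and 4096 MB of RAM this yields 2, which matches the value suggested above.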

@lastzero lastzero commented Jun 24, 2019

Pre-compiled Bazel binaries for the Jetson (Ubuntu 18.04 ARM64) can be downloaded here: https://dl.photoprism.org/tensorflow/bazel/nvidia-jetson/

For TensorFlow 1.14.0, you'll need bazel-0.24.1. Rename it to /usr/local/bin/bazel and set the executable bit.
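
Copying and setting the mode can be done in one step with `install`. A small sketch; the helper name `install_bazel` is made up, and it assumes the binary has already been downloaded to the current directory as bazel-0.24.1:

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy a downloaded Bazel binary into place with
# mode 0755 (readable and executable by everyone, writable by the owner).
#   $1 = downloaded file, $2 = destination (default /usr/local/bin/bazel)
install_bazel() {
  local src="$1" dest="${2:-/usr/local/bin/bazel}"
  install -m 0755 "$src" "$dest"
}

# Usage (writing to /usr/local/bin usually requires root):
#   install_bazel ./bazel-0.24.1
```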

lastzero added a commit that referenced this issue Jun 25, 2019
lastzero added a commit that referenced this issue Jun 26, 2019
lastzero added a commit that referenced this issue Jun 26, 2019
lastzero added a commit that referenced this issue Jun 26, 2019
lastzero added a commit that referenced this issue Jun 27, 2019
lastzero added a commit that referenced this issue Jun 27, 2019
@graciousgrey graciousgrey self-assigned this Jan 17, 2020
graciousgrey added a commit that referenced this issue Jan 17, 2020
graciousgrey added a commit that referenced this issue Jan 17, 2020

@issuehunt-app issuehunt-app bot commented Jan 19, 2020

@lastzero has rewarded $48.00 to @graciousgrey. See it on IssueHunt

  • 💰 Total deposit: $60.00
  • 🎉 Repository reward(10%): $6.00
  • 🔧 Service fee(10%): $6.00
lastzero added a commit that referenced this issue Jan 19, 2020
* #83 Add NewLocation() function

* #83 Add NewPlace() function

* #83 Add tests for maps/places/location

* #83 Add tests for maps/location

* #83 Add tests for internal/config

* #83 Add test for meta/exif

* #83 Add testfiles
@lastzero lastzero closed this Jan 19, 2020
3 participants