
Update libedgetpu repo for compatibility with recent versions of TensorFlow #60

Merged
40 commits merged on Feb 29, 2024

Conversation

@feranick (Contributor) commented Jan 31, 2024

This PR:

  • Updates the URL calls to pull recent versions of TF (currently v2.15.0), TF/crosstools, and libusb (1.0.26); see the sketch below.
  • Updates the versions of Bazel needed in Docker.
  • Refactors WORKSPACE to conform to the deprecation of the separate TensorFlow/Toolchains repo. More info here.
  • Uses xz compression for compatibility with older versions of Debian (bullseye).
  • Adds missing support for "stddef.h".

This relates to issues tensorflow/tensorflow#62371 and #53.
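
A minimal sketch of the http_archive pinning mentioned in the first bullet (illustrative only; the tag-style URL and placeholder checksum are assumptions, not the PR's actual diff):

http_archive(
    name = "org_tensorflow",
    urls = [
        "https://github.com/tensorflow/tensorflow/archive/refs/tags/v2.15.0.tar.gz",
    ],
    # Placeholder: use the real sha256 of the downloaded archive.
    sha256 = "<sha256 of the v2.15.0 archive>",
    strip_prefix = "tensorflow-2.15.0",
)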

Prevent compilation failure
No longer needed; deb packaging will be done natively.
@Skillnoob

@dmitriykovalev, can you or some other maintainer please review this PR? Merging it would fix many of the issues that users of the Coral Edge TPU have with newer TensorFlow and Python versions.

@feranick (Contributor, Author) commented Mar 5, 2024

The reason I am asking is that I am building libcoral against 2.16.0 (as planned), not 2.17.0 as in the current master.

@Namburger (Contributor)

@feranick I'm trying to find out; it would be nice to get this included as part of 2.16.

@feranick (Contributor, Author) commented Mar 5, 2024 via email

@feranick (Contributor, Author) commented Mar 5, 2024

@Namburger So I tested it against the current TF master on both macOS and Linux, and all seems to work. To allow the use of TF master for temporary testing, I locally changed this in libcoral/WORKSPACE:

http_archive(
    name = "org_tensorflow",
    urls = [
        "https://github.com/tensorflow/tensorflow/archive/refs/heads/master.tar.gz",
    ],
    sha256 = "21a8363e3272a19977e2f0d12dcb87d1cb61ff0a79d20cfe456d9840e45e18d6",
    strip_prefix = "tensorflow-" + "master",
)

@Namburger (Contributor) commented Mar 5, 2024

@feranick Working with the tensorflow repo is a learning experience for me... I'll do a first pass to see how to get this backported to 2.16 and will let you know if we need a plan B.

@feranick (Contributor, Author) commented Mar 5, 2024

I'd discuss it with @mihaimaruseac. My previous PR was quickly cherry-picked into 2.16.0 thanks to him.

@Namburger (Contributor)

@feranick I've also talked to him; it doesn't seem like there is a plan to pick this into 2.16 at this time :/

Regarding your suggested change,

http_archive(
    name = "org_tensorflow",
    urls = [
        "https://github.com/tensorflow/tensorflow/archive/refs/heads/master.tar.gz",
    ],
    sha256 = "21a8363e3272a19977e2f0d12dcb87d1cb61ff0a79d20cfe456d9840e45e18d6",
    strip_prefix = "tensorflow-" + "master",
)

Could it work to just set TENSORFLOW_COMMIT=79ecb3f8bb6bd73f0115fa9a97b630a6f745a426?
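
For reference, a sketch of what that pin could look like; the variable names are assumed to mirror the workspace.bzl snippet quoted later in this thread, and the checksum is a placeholder that would need to be recomputed for the new archive:

TENSORFLOW_COMMIT = "79ecb3f8bb6bd73f0115fa9a97b630a6f745a426"
# Placeholder: sha256 of https://github.com/tensorflow/tensorflow/archive/<TENSORFLOW_COMMIT>.tar.gz
TENSORFLOW_SHA256 = "<sha256 of that archive>"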

@feranick (Contributor, Author) commented Mar 5, 2024

@Namburger That is unfortunate. With regard to the specific commit, that would build against 2.17.0 at this point, correct? Given that 2.17.0 is unstable and under development, I don't think that's a good idea...

@feranick (Contributor, Author) commented Mar 5, 2024

@Namburger Compilation of libcoral works fine on macOS and Linux when building against TF commit 79ecb3f. I updated the PR to reflect that, and now everything works.

The concern remains, though, that it's building against an unstable version of 2.17.0. Also, should libedgetpu be built against the same TF? (The stable release is currently built against 2.15.0, but I am prepping the next one with TF 2.16 as soon as it reaches stable.)

@Namburger (Contributor)

@feranick I hear the concerns. It doesn't look like we can get this in for 2.16, so unless we want to sync to a 2.17 dev version, we're stuck in limbo until it comes out, which probably won't happen any time soon...
It's very unfortunate that ultimately what blocks us is some visibility rules in bazel.

At this point my suggestion for moving forward and keeping all the main coral repos updated is to sync to 79ecb3f8bb6bd73f0115fa9a97b630a6f745a426; we can monitor for breakage and apply updates as 2.17 is being developed. WDYT?

@feranick (Contributor, Author) commented Mar 5, 2024

This sounds good to me @Namburger. The repos are already locally in sync with that commit.

I already found a possible breakage due to 2.17 (see log here) when cross-compiling for ARM (there are no issues on macOS and Linux for x86). I forked 2.16.0 and added the visibility patch to test whether the issue is in 2.17 or not. Will report back.

@feranick (Contributor, Author) commented Mar 5, 2024

@Namburger Since this PR has been merged and concerns libedgetpu (not libcoral), I am going to continue reporting on this in the relevant PR:

google-coral/libcoral#36

@Namburger (Contributor)

This sounds good to me @Namburger. The repos are already locally in sync with that commit.

I already found a possible breakage due to 2.17 (see log here) when cross-compiling for ARM (there are no issues on macOS and Linux for x86). I forked 2.16.0 and added the visibility patch to test whether the issue is in 2.17 or not. Will report back.

Interesting; it doesn't look like neon_fully_connected_arm32.cc has been changed in 7 months, which seems to indicate the compiler doesn't like the asm.

@mihaimaruseac

So, unfortunately this landed too late in the 2.16 release process (it was supposed to also have an RC1 to handle bugs from RC0 but the release team decided otherwise).

If you pin to a commit from the master branch, you will always build at that commit, but you will need to re-test whenever you move to a different one. Regarding the commit to pick, I would suggest either the commit where the support got added or one of the nightly commits after it (check whether a pip wheel was built).

@feranick (Contributor, Author) commented Mar 5, 2024

Well, as unfortunate as that is, I am really grateful to you both, @mihaimaruseac and @Namburger, for trying to push this forward. I am still running a few build tests; it's pretty easy to set whichever version we want to build against.

If I were to build against the latest nightly, how do I find its corresponding commit?

@mihaimaruseac

If you build on the day immediately after the nightly release, you can use the commit at the top of the nightly branch

If you build some days later, tf.version should provide the information (I don't recall the exact API; I left the actual TF team around 2 years ago and now I'm just consulting from time to time). Alternatively, you can look at the matching GH Actions run and take the commit the action ran at. That would be 019e960 in this screenshot:
[Screenshot of the GitHub Actions nightly run showing commit 019e960]
(which corresponds to https://github.com/tensorflow/tensorflow/actions/runs/8150842615)
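
A rough sketch of that check, assuming the tf.version fields (which I haven't verified recently):

import tensorflow as tf

# VERSION is the release string (e.g. a dev/nightly version), and
# GIT_VERSION embeds the git revision the installed wheel was built from.
print(tf.version.VERSION)
print(tf.version.GIT_VERSION)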

@feranick (Contributor, Author) commented Mar 5, 2024

Interesting; it doesn't look like neon_fully_connected_arm32.cc has been changed in 7 months, which seems to indicate the compiler doesn't like the asm.

The issue is specific to armv7a. This is the commit whose edits are exactly what the compiler is complaining about. There are no specific details on what drove the commit in the first place:

tensorflow/tensorflow@8419c70

@Namburger (Contributor)

Interesting; it doesn't look like neon_fully_connected_arm32.cc has been changed in 7 months, which seems to indicate the compiler doesn't like the asm.

The issue is specific to armv7a. This is the commit whose edits are exactly what the compiler is complaining about. There are no specific details on what drove the commit in the first place:

tensorflow/tensorflow@8419c70

Hmm, that certainly sounds like the issue. As I understand it, that commit changes the constraints on the registers used for that op, which matches the error:

external/org_tensorflow/tensorflow/lite/kernels/internal/optimized/4bit/neon_fully_connected_arm32.cc: In function 'void tflite::optimized_4bit::NeonRunKernelNoSDot(const uint8_t*, const int8_t*, int32_t*, int, int, int, int, int, int) [with int RowsLeft = 4; int RowsRight = 1; int Cols = 32]':
external/org_tensorflow/tensorflow/lite/kernels/internal/optimized/4bit/neon_fully_connected_arm32.cc:192:7: error: 'asm' operand has impossible constraints
  192 |       asm volatile(KERNEL_4x1
      |       ^~~

@feranick (Contributor, Author) commented Mar 5, 2024

Indeed. Weirdly enough, the change in that commit was only applied to neon_fully_connected_arm32.cc, not to neon_fully_connected_aarch64_sdot.cc, which in fact works fine...

@feranick (Contributor, Author) commented Mar 6, 2024

@Namburger I tested it with a local fork of TF 2.17.0 with neon_fully_connected_arm32.cc reverted to be in line with the other architectures, and compilation still fails with the same error:

external/org_tensorflow/tensorflow/lite/kernels/internal/optimized/4bit/neon_fully_connected_arm32.cc: In function 'void tflite::optimized_4bit::NeonRunKernelNoSDot(const uint8_t*, const int8_t*, int32_t*, int, int, int, int, int, int) [with int RowsLeft = 4; int RowsRight = 1; int Cols = 32]':
external/org_tensorflow/tensorflow/lite/kernels/internal/optimized/4bit/neon_fully_connected_arm32.cc:192:7: error: 'asm' operand has impossible constraints
  192 |       asm volatile(KERNEL_4x1
      |       ^~~
INFO: Elapsed time: 34.950s, Critical Path: 17.60s

Note that this file (and the whole 4bit folder) was introduced with TF 2.14.

So maybe the commit was intended to address this problem, but didn't...

@feranick (Contributor, Author) commented Mar 6, 2024

@Namburger Disclaimer: I know close to nothing about asm. Still, I tried to address this error, which seems to stem from a limitation of the architecture. Following some comments from here, I replaced "r" with "g", so that in external/org_tensorflow/tensorflow/lite/kernels/internal/optimized/4bit/neon_fully_connected_arm32.cc line 192 what is currently:

:
                   : [lhs_val] "r"(lhs_val), [rhs_val] "r"(rhs_val),
                     [element_ptr] "r"(element_ptr), [bit_shift] "r"(bit_shift),
                     [run_depth] "r"(run_depth)

is changed to:

:
                   : [lhs_val] "g"(lhs_val), [rhs_val] "g"(rhs_val),
                     [element_ptr] "g"(element_ptr), [bit_shift] "g"(bit_shift),
                     [run_depth] "g"(run_depth)

Compilation proceeds beyond this point, but it then stops with a similar error in another external library:

Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
In file included from external/ruy/ruy/pack_arm.cc:16:
external/ruy/ruy/pack_arm.h:492:9: warning: multi-line comment [-Wcomment]
  492 | #endif  // (RUY_PLATFORM_NEON_64 || RUY_PLATFORM_NEON_32) && \
      |         ^
external/ruy/ruy/pack_arm.cc: In function 'void ruy::Pack8bitColMajorForNeon4Cols(const ruy::PackParams8bit&)':
external/ruy/ruy/pack_arm.cc:264:3: error: 'asm' operand has impossible constraints
  264 |   asm volatile(
      |   ^~~
INFO: Elapsed time: 32.683s, Critical Path: 20.21s

Not encouraging.

@Namburger (Contributor) commented Mar 6, 2024

Hi @feranick, with all of the changes, would you be able to send a set of commands to reproduce the very first issue?
I believe it looks something like this (but it may require you to push some commits to your fork of tensorflow and then sync your fork of libedgetpu to it)?

git clone git@github.com:google-coral/libcoral.git && cd libcoral
git fetch origin pull/36/head:pull-36
git checkout pull-36
git submodule init && git submodule update
make DOCKER_IMAGE=debian:buster DOCKER_CPUS="armv7a" DOCKER_TARGETS=tests docker-build

I'm talking to some folks, trying to get an easy process to repro the original issues and continue on. It appears that element_ptr needs to be a "r+", and there could be more issues, as you pointed out, so we'll need to work this out one step at a time, unfortunately :(

@Namburger (Contributor) commented Mar 6, 2024

A procedure for making code changes to tensorflow and testing them against this build would also be nice to have; not sure if this can easily be done with bazel, though.

@mihaimaruseac

One idea would be to fork TensorFlow, make changes in your own fork (and sync from time to time) and then change

libedgetpu/workspace.bzl

Lines 68 to 76 in b5820ad

maybe(
    http_archive,
    name = "org_tensorflow",
    urls = [
        "https://github.com/tensorflow/tensorflow/archive/" + tensorflow_commit + ".tar.gz",
    ],
    sha256 = tensorflow_sha256,
    strip_prefix = "tensorflow-" + tensorflow_commit,
)
to point to the fork and the last commit there.

Then, once it all works, you can make a PR from the fork back to TF, and once that lands you can change the workspace.bzl file to point back to upstream TF.
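
For illustration, a sketch of that workspace.bzl edit against a fork (the fork URL is a placeholder, and the commit/sha256 would be those of the fork's latest commit):

maybe(
    http_archive,
    name = "org_tensorflow",
    urls = [
        # Placeholder fork URL; tensorflow_commit would be the fork's head commit.
        "https://github.com/<your-fork>/tensorflow/archive/" + tensorflow_commit + ".tar.gz",
    ],
    sha256 = tensorflow_sha256,  # recompute for the fork's archive
    strip_prefix = "tensorflow-" + tensorflow_commit,
)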

@feranick (Contributor, Author) commented Mar 6, 2024

@mihaimaruseac Thanks. This is effectively what I did for testing. An advantage of this is that libedgetpu could still be built against a stable version of TF (2.16.0, forked and with the visibility patch). I'd note that libedgetpu can be used independently of libcoral/pycoral (in fact, that is how I use the Edge TPU). Yet it would allow testing and building of libcoral/pycoral to continue while libedgetpu stays "stable".

@feranick (Contributor, Author) commented Mar 6, 2024

Hi @feranick, with all of the changes, would you be able to send a set of commands to reproduce the very first issue? I believe it looks something like this (but it may require you to push some commits to your fork of tensorflow and then sync your fork of libedgetpu to it)?

Hi @Namburger, here's a slightly revised version of your commands (I can't clone via git@github.com:google-coral/libcoral.git, only via https). Also, you should build using debian:bookworm, as it has support for Python 3.11:

git clone https://github.com/google-coral/libcoral.git && cd libcoral
git fetch origin pull/36/head:pull-36
git checkout pull-36
git submodule init && git submodule update
make DOCKER_IMAGE=debian:bookworm DOCKER_CPUS="armv7a" DOCKER_TARGETS=tests docker-build

These commands will reliably trigger the issue (just tested). As for libedgetpu, it currently builds against TF 2.16.0-rc0, but the issue here happens regardless of the TF version libedgetpu is built against. I can change it to TF 2.17.0 + the visibility patch if needed (but see the comment above).

@Namburger (Contributor)

By way of an update: the author of tensorflow/tensorflow@8419c70 promised to take a look sometime this week!

@feranick (Contributor, Author)

By way of an update: the author of tensorflow/tensorflow@8419c70 promised to take a look sometime this week!

Hi @Namburger, just wondering whether there is an update on this... Thanks!
