add Huggingface2CoreML convert file #18

Merged
1 commit merged into mazzzystar:main on Sep 22, 2023

Conversation

jxiong22
Contributor

This file adds a Hugging Face version of the model (clip-vit-base-patch32) with separate text and image encoders. It also fixes image encoder precision errors.
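
For context, the export roughly follows this pattern; this is a minimal sketch rather than the exact notebook code, and the wrapper class and output file name are only illustrative:

```python
# Sketch of the Hugging Face -> Core ML export path (illustrative, not the exact notebook code).
import numpy as np
import torch
import coremltools as ct
from transformers import CLIPModel

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()

class TextEncoder(torch.nn.Module):
    """Wraps CLIP's text tower so it can be traced with plain tensor inputs."""
    def __init__(self, clip):
        super().__init__()
        self.clip = clip

    def forward(self, input_ids, attention_mask):
        return self.clip.get_text_features(input_ids=input_ids, attention_mask=attention_mask)

seq_len = 77  # CLIP's context length; see the max_seq_length discussion below
example_ids = torch.ones(1, seq_len, dtype=torch.int64)
example_mask = torch.ones(1, seq_len, dtype=torch.int64)

traced = torch.jit.trace(TextEncoder(model).eval(), (example_ids, example_mask))
mlmodel = ct.convert(
    traced,
    convert_to="mlprogram",
    inputs=[
        ct.TensorType(name="input_ids", shape=(1, seq_len), dtype=np.int32),
        ct.TensorType(name="attention_mask", shape=(1, seq_len), dtype=np.int32),
    ],
)
mlmodel.save("TextEncoder.mlpackage")
```

The image encoder is wrapped the same way around get_image_features, just with a pixel-tensor input instead of token ids.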

@mazzzystar
Owner

@jxiong22
Thanks for your great work! I'll check out the model & precision ASAP :)

@mazzzystar
Owner

I've run your scripts, and this is what I've found:

Text Encoder

max_seq_length=76
PyTorch TextEncoder ckpt out for "a photo of a cat":
>>> tensor([ 0.1555,  0.0733, -0.2448, -0.2212, -0.1934,  0.2052, -0.3175, -0.7824,
        -0.1816,  0.1943], grad_fn=<SliceBackward0>)

CoreML TextEncoder ckpt out for "a photo of a cat":
>>> [ 0.15597123  0.07306742 -0.24520203 -0.22147842 -0.1928216   0.20461065
 -0.31734398 -0.7811562  -0.1815224   0.19413628]
max_seq_length=77
PyTorch TextEncoder ckpt out for "a photo of a cat":
>>> tensor([ 0.1555,  0.0733, -0.2448, -0.2212, -0.1934,  0.2052, -0.3175, -0.7824,
        -0.1816,  0.1943], grad_fn=<SliceBackward0>)

CoreML TextEncoder ckpt out for "a photo of a cat":
>>> [ 0.15597123  0.07306742 -0.24520203 -0.22147842 -0.1928216   0.20461065
 -0.31734398 -0.7811562  -0.1815224   0.19413628]

The output seems to be the same whether max_seq_length is set to 77 or 76.
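
For reference, the numbers above come from a comparison along these lines (a sketch, not the exact script; the package name and the output-key handling are assumptions):

```python
# Sketch of the PyTorch vs. Core ML text-encoder comparison printed above.
import numpy as np
import torch
import coremltools as ct
from transformers import CLIPModel, CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()

batch = tokenizer("a photo of a cat", padding="max_length", max_length=77, return_tensors="pt")
with torch.no_grad():
    pt_feat = model.get_text_features(**batch)
print(pt_feat[0, :10])  # first 10 values of the PyTorch text features

mlmodel = ct.models.MLModel("TextEncoder.mlpackage")  # assumed output of the conversion step
cm_out = mlmodel.predict({
    "input_ids": batch["input_ids"].numpy().astype(np.int32),
    "attention_mask": batch["attention_mask"].numpy().astype(np.int32),
})
cm_feat = list(cm_out.values())[0]  # the actual output name depends on the converted model
print(cm_feat[0, :10])  # first 10 values of the Core ML text features
```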

Image Encoder

I've compared my original hard-coded norm against transformers.Normalize(), and the latter performs much better in terms of L1 distance precision :)

Using hard-coded norm:

PyTorch ImageEncoder ckpt out for IMG_2115.jpg:
>>> tensor([ 0.2361, -0.0980, -0.0022,  0.2364,  0.1279, -0.1041,  0.3530,  0.0853,
        -0.0293,  0.0784], grad_fn=<SliceBackward0>)

CoreML ImageEncoder ckpt out for IMG_2115.jpg:
>>> [ 0.350561    0.025289   -0.13452446  0.1267291  -0.1897895  -0.14739564
  0.05819088  0.30193368 -0.2142085   0.27992135]

The average L1 distance is 0.173860315.

Using transformers.Normalize():

PyTorch ImageEncoder ckpt out for jpg:
>>> tensor([ 0.3709,  0.0213, -0.0549,  0.1092, -0.2229, -0.2131,  0.0909,  0.2031,
        -0.2159,  0.2603], grad_fn=<SliceBackward0>)

CoreML ImageEncoder ckpt out for jpg:
>>> [ 0.34969723  0.02547197 -0.134884    0.12836841 -0.19035722 -0.14880568
  0.05857127  0.30046463 -0.21491489  0.2801294 ]

The average L1 distance is 0.037187212.
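
For reference, the preprocessing and the L1 comparison go roughly like this; a sketch assuming torchvision's transforms.Normalize with the standard CLIP mean/std, where pt_feat / cm_feat are the PyTorch and Core ML image features obtained the same way as in the text comparison above:

```python
# Sketch: CLIP-style preprocessing with transforms.Normalize, plus the average-L1 metric.
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# Standard CLIP normalization constants, instead of a hard-coded per-channel norm.
preprocess = transforms.Compose([
    transforms.Resize(224, interpolation=transforms.InterpolationMode.BICUBIC),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.48145466, 0.4578275, 0.40821073),
                         std=(0.26862954, 0.26130258, 0.27577711)),
])

pixel_values = preprocess(Image.open("IMG_2115.jpg").convert("RGB")).unsqueeze(0)

def average_l1(pt_feat: torch.Tensor, cm_feat: np.ndarray) -> float:
    """Mean absolute difference between the PyTorch and Core ML feature vectors."""
    return float(np.abs(pt_feat.detach().numpy() - cm_feat).mean())
```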

Thanks a lot for the PR! This will help me improve the current image encoder's performance. As for the text encoder, the precision is already okay, so I don't think it's a big issue.

Could you check how max_seq_length affects the precision? Once we get the same result, I'll merge this PR :)

@jxiong22
Contributor Author

That's weird. On my end, I can't figure out why I get bad results when I set max_seq_length to 77. If 77 works well for you, that's even better. What versions of Python, coremltools, and PyTorch are you using?

max_seq_length = 76:
PyTorch TextEncoder ckpt out for "a photo of a cat":
>>> tensor([ 0.1555,  0.0733, -0.2448, -0.2212, -0.1934,  0.2052, -0.3175, -0.7824,
        -0.1816,  0.1943], grad_fn=<SliceBackward0>)

CoreML TextEncoder ckpt out for "a photo of a cat":
>>> [ 0.15560171  0.0732335  -0.24512495 -0.22117633 -0.19336982  0.20523793
 -0.3182205  -0.78206545 -0.18144566  0.19457956]
max_seq_length = 77:
PyTorch TextEncoder ckpt out for "a photo of a cat":
>>> tensor([ 0.1555,  0.0733, -0.2448, -0.2212, -0.1934,  0.2052, -0.3175, -0.7824,
        -0.1816,  0.1943], grad_fn=<SliceBackward0>)

CoreML TextEncoder ckpt out for "a photo of a cat":
>>> [-0.066312    0.17878246  0.40718645 -0.08806399  0.26841    -0.22685118
  0.2679821  -1.7103907  -0.33836532  0.28941655]
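
For what it's worth, the only thing that changes between the two runs above is the padded length handed to the tokenizer, roughly:

```python
# The padded sequence length is the only knob changed between the two runs above.
from transformers import CLIPTokenizer

tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

batch = tokenizer("a photo of a cat",
                  padding="max_length",
                  max_length=77,   # 77 = CLIP context length; 76 is the value that behaves well for me
                  truncation=True,
                  return_tensors="pt")
print(batch["input_ids"].shape)  # torch.Size([1, 77])
```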

@mazzzystar
Owner

My envs are:

Python 3.9.15
accelerate 0.23.0
transformers 4.33.2
coremltools 7.0 (latest)
torch 1.13.1
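
In case it helps reproduce this, a quick way to dump the relevant versions:

```python
# Print the package versions involved in the conversion.
import platform
import accelerate, coremltools, torch, transformers

print("Python      ", platform.python_version())
print("accelerate  ", accelerate.__version__)
print("transformers", transformers.__version__)
print("coremltools ", coremltools.__version__)
print("torch       ", torch.__version__)
```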

@jxiong22
Contributor Author

Even with your environment versions there's no improvement; the results are still unsatisfactory with max_seq_length = 77 on my side.

@mazzzystar
Owner

Okay, then I'll try to verify this issue and report back here if there's any progress.

@jxiong22
Contributor Author

Thank you. If 77 is functional for you, how about setting it as the default value and including a comment for future reference in case others encounter the same issue as I did?
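
Something like this in the conversion script would do (the variable name is just whatever the script already uses):

```python
# Default to CLIP's full context length; leave a pointer for anyone who hits the precision issue.
max_seq_length = 77  # if the Core ML text encoder output looks off, try 76 (see the discussion in this PR)
```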

@mazzzystar
Owner

Is it okay if I merge this PR now and add a note to the README, so others can refer to your notebook for the export?

@jxiong22
Contributor Author

jxiong22 commented Sep 22, 2023 via email

mazzzystar merged commit 0cc1953 into mazzzystar:main on Sep 22, 2023