
feat: add three new open clip roberta base models #860

Merged: 34 commits merged into main from bump-openclip-v2.50 on Nov 29, 2022

Conversation

@OrangeSodahub (Contributor) commented on Nov 16, 2022

Goals: align with open_clip v2.7.0
Changes:

  • Add three new models: roberta-ViT-B-32::laion2b-s12b-b32k, xlm-roberta-base-ViT-B-32::laion5b-s13b-b90k, and xlm-roberta-large-ViT-H-14::frozen_laion5b_s13b_b90k;
  • Add LayerNormFp32, which performs the normalization in fp32 even for fp16 inputs (the original LayerNorm handles fp16 as-is); default precision is fp32 on CPU and fp16 on GPU (see the sketch after this list);
  • Split the original CLIP class into TextTransformer and VisionTransformer, and add _build_text_tower and _build_vision_tower so the two towers can be built separately;
  • Rearrange modules;
  • Fix bugs in flash attention (only used on CUDA);
  • Docs will be updated in docs: add three new open clip roberta base models #862.
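For context, here is a minimal sketch of the LayerNormFp32 idea, following the general open_clip pattern (an illustration, not necessarily the exact code added in this PR): the normalization is computed in fp32 even when the model's activations are fp16, then the result is cast back to the input dtype.

```python
import torch
from torch import nn
from torch.nn import functional as F


class LayerNormFp32(nn.LayerNorm):
    """LayerNorm that computes in fp32 and casts back to the input dtype.

    With the fp16-on-GPU default, normalization statistics computed in half
    precision can lose accuracy; doing the math in fp32 avoids that.
    """

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        orig_type = x.dtype
        x = F.layer_norm(
            x.to(torch.float32),
            self.normalized_shape,
            self.weight.to(torch.float32) if self.weight is not None else None,
            self.bias.to(torch.float32) if self.bias is not None else None,
            self.eps,
        )
        # Return the result in the caller's dtype (e.g. fp16 on GPU).
        return x.to(orig_type)


# fp16 input on GPU, fp32 math inside, fp16 output:
if torch.cuda.is_available():
    ln = LayerNormFp32(512).cuda()
    y = ln(torch.randn(2, 77, 512, device="cuda", dtype=torch.float16))
    assert y.dtype == torch.float16
```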

* feat: bump openclip to v2.5.0

* fix: conflicts

* fix: default fp32 on cpu and fp16 on gpu

* feat: add two new models

* fix: remove debug

* fix: add roberta models (test)

* fix: model name xlm

* fix: (wip)
@github-actions bot commented:

This PR exceeds the recommended size of 1000 lines. Please make sure you are NOT addressing multiple issues with one PR. Note this PR might be rejected due to its size.

@jina-ai jina-ai deleted a comment from github-actions bot Nov 16, 2022
@codecov bot commented on Nov 16, 2022

Codecov Report

Merging #860 (04c130a) into main (e4717a3) will increase coverage by 0.10%.
The diff coverage is 91.74%.

@@            Coverage Diff             @@
##             main     #860      +/-   ##
==========================================
+ Coverage   80.28%   80.38%   +0.10%     
==========================================
  Files          22       22              
  Lines        1633     1448     -185     
==========================================
- Hits         1311     1164     -147     
+ Misses        322      284      -38     
Flag   Coverage Δ
cas    80.38% <91.74%> (+0.10%) ⬆️

Flags with carried forward coverage won't be shown.

Impacted Files                                   Coverage Δ
server/clip_server/model/flash_attention.py      22.22% <ø> (+2.22%) ⬆️
server/clip_server/model/pretrained_models.py    98.41% <ø> (ø)
server/clip_server/model/model.py                75.09% <91.42%> (+4.98%) ⬆️
client/clip_client/__init__.py                  100.00% <100.00%> (ø)
server/clip_server/__init__.py                  100.00% <100.00%> (ø)
server/clip_server/model/openclip_model.py       93.10% <100.00%> (+3.44%) ⬆️
server/clip_server/model/trt_utils.py            56.04% <0.00%> (-27.48%) ⬇️
server/clip_server/model/clip_trt.py             69.38% <0.00%> (-16.33%) ⬇️


@github-actions github-actions bot added size/l and removed size/xl labels Nov 21, 2022
@github-actions github-actions bot added size/xl and removed size/l labels Nov 21, 2022
@github-actions github-actions bot added size/l and removed size/xl labels Nov 21, 2022
@ZiniuYu ZiniuYu marked this pull request as ready for review November 21, 2022 08:02
Comment on lines 60 to 65
assert not need_weights, "not allowed to return weights."
assert q.dtype in [
torch.float16,
torch.bfloat16,
], f"flash attention only support torch.float16 or torch.bfloat16 but got {q.dtype}."
assert q.is_cuda, "flash attention only support cuda."
Member:

I would suggest removing these asserts. They make things safer, but they degrade performance a bit.
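As a hypothetical illustration of the trade-off (placeholder names, not this repo's actual API): instead of asserting on every call, the module could check the preconditions and fall back to plain scaled-dot-product attention whenever flash attention does not apply.

```python
import torch


def flash_attention_applicable(q: torch.Tensor) -> bool:
    # Flash attention kernels require half precision and a CUDA device.
    return q.is_cuda and q.dtype in (torch.float16, torch.bfloat16)


def attention(q, k, v, flash_attn_fn=None):
    """Dispatch to flash attention when possible, else a safe fallback.

    `flash_attn_fn` is a stand-in for whatever flash-attention entry point
    the project wires up; it is a placeholder, not the PR's real signature.
    """
    if flash_attn_fn is not None and flash_attention_applicable(q):
        return flash_attn_fn(q, k, v)
    # Plain scaled-dot-product attention as the fallback path.
    scale = q.size(-1) ** -0.5
    weights = (q @ k.transpose(-2, -1)) * scale
    return weights.softmax(dim=-1) @ v
```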

Member:

What's more, judging from the function's seq_len parameter, it seems the flash-attention implementation can only be used for the text encoder. Can it be applied to a vision transformer?

Contributor (Author):

Yes, it can. Every image tensor is first converted into a sentence-like sequence of tokens before being fed into the model.
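To illustrate the reply (a generic ViT patch-embedding sketch, not this repo's exact code): the image is split into patches and flattened into a (batch, seq_len, dim) tensor, the same layout the text tower produces, so the same attention path applies to both.

```python
import torch
from torch import nn

# ViT-B/32-style patch embedding: a strided conv turns an image into patches.
patch_embed = nn.Conv2d(in_channels=3, out_channels=768, kernel_size=32, stride=32)

image = torch.randn(1, 3, 224, 224)          # (batch, channels, H, W)
patches = patch_embed(image)                 # (1, 768, 7, 7)
tokens = patches.flatten(2).transpose(1, 2)  # (1, 49, 768) = (batch, seq_len, dim)
# From here the vision tower sees the same (batch, seq_len, dim) layout as
# the text tower, so the same (flash) attention kernel can serve both.
```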

(Several further review threads on server/clip_server/model/model.py and one on server/setup.py were resolved.)
@ZiniuYu ZiniuYu changed the title feat: bump openclip to v2.5.0 feat: add three new clip roberta base models Nov 21, 2022
@ZiniuYu ZiniuYu changed the title feat: add three new clip roberta base models feat: add three new open clip roberta base models Nov 21, 2022
@numb3r3 (Member) left a comment:

LGTM

@numb3r3 numb3r3 merged commit f251539 into main Nov 29, 2022
@numb3r3 numb3r3 deleted the bump-openclip-v2.50 branch November 29, 2022 06:33