
Documentation on the differences between the different models #273

Open
eduardo4jesus opened this issue Apr 22, 2023 · 3 comments

Comments

@eduardo4jesus

Currently, there are three model types available.

I could not find any documentation on the difference between them. Is there any available? If not, could someone elaborate on that?

Many thanks.

@franchesoni

There is a paper accompanying the repository. The models are the same except for the size of the neural network: B stands for "base" and is the smallest, L for "large", and H for "huge". The paper reports that the performance difference between L and H isn't large, so I would recommend L if your machine supports it. However, B is lighter and not far behind in performance.
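
For reference, here is a minimal sketch of switching between the three backbones with the segment_anything package. The checkpoint filenames and the image path are placeholders; point them at the checkpoints you downloaded from the README.

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Each model type has its own checkpoint (filenames assumed from the README downloads;
# adjust the paths to wherever you saved them).
checkpoints = {
    "vit_b": "sam_vit_b_01ec64.pth",  # "base"  - smallest, fastest
    "vit_l": "sam_vit_l_0b3195.pth",  # "large" - middle ground
    "vit_h": "sam_vit_h_4b8939.pth",  # "huge"  - largest, most accurate
}

model_type = "vit_l"
sam = sam_model_registry[model_type](checkpoint=checkpoints[model_type])
sam.to("cuda")  # or "cpu"

mask_generator = SamAutomaticMaskGenerator(sam)

# SAM expects an RGB uint8 image (HxWx3); "example.jpg" is a placeholder.
image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)
masks = mask_generator.generate(image)
print(f"{model_type}: {len(masks)} masks")
```

Only the model_type string and the checkpoint change; the rest of the pipeline is identical across B, L, and H.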

@eduardo4jesus
Author

@franchesoni, thank you so much. I opened PR #300 on this. I would appreciate your feedback.

@pinksloyd

pinksloyd commented May 5, 2023

I've run extensive testing on the models using a wide variety of images. Here is part of the print log from one test on a sample image (run locally on my RTX 3080):

vit_h
Registering model... 12:48:03
Reading image... 12:48:08
Making masks... 12:48:08
Done at: 12:48:14 | Amount: 13
Making image from mask... 12:48:14
Done...? | 12:48:17 | Time taken: 10.918561458587646
vit_h-14-52-09-3.6000053882598877-13-masks
vit_l
Registering model... 12:48:17
Reading image... 12:48:19
Making masks... 12:48:19
Done at: 12:48:22 | Amount: 17
Making image from mask... 12:48:22
Done...? | 12:48:25 | Time taken: 5.4358086585998535
vit_l-14-55-36-3.4636905193328857-17-masks
vit_b
Registering model... 12:48:25
Reading image... 12:48:26
Making masks... 12:48:26
Done at: 12:48:28 | Amount: 10
Making image from mask... 12:48:28
Done...? | 12:48:31 | Time taken: 2.9744369983673096
vit_b-14-58-04-2.34617018699646-10-masks

I found that, on average, vit_l offers the best speed/accuracy trade-off: vit_h is the most accurate but the slowest, and vit_b is the fastest but the least accurate.
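
This is not the exact script used above, but a rough sketch of how such a comparison can be run; the checkpoint paths and "example.jpg" are placeholders, and the first iteration also absorbs GPU warm-up, so take its timing with a grain of salt.

```python
import time
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Placeholder checkpoint paths; point these at your downloaded files.
checkpoints = {
    "vit_h": "sam_vit_h_4b8939.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_b": "sam_vit_b_01ec64.pth",
}

image = cv2.cvtColor(cv2.imread("example.jpg"), cv2.COLOR_BGR2RGB)

for model_type, ckpt in checkpoints.items():
    sam = sam_model_registry[model_type](checkpoint=ckpt).to("cuda")
    mask_generator = SamAutomaticMaskGenerator(sam)
    start = time.time()
    masks = mask_generator.generate(image)
    print(f"{model_type}: {len(masks)} masks in {time.time() - start:.2f} s")
```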
