-
Notifications
You must be signed in to change notification settings - Fork 63
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Assorted improvements in embed #33
Comments
Another note; after @mheinzinger fixed the batching code, there is now no fallback for CPU like there is in SeqVec:
Ideally, also the transformer models should fallback on single processing then CPU on hard samples (for consistency). EDIT current stats (using bert): max_amino_acids:40k --> 87.65 GiB |
Should we just make this a generic part of all embedders in the |
Whenever
No. If you decide to embed using the pipeline on a CPU system, you need the CPU fallback (that will be the default, I suppose), and then it will die if it doesn't work. |
Addresses the first three bullet point of GH-33
Reading through current develop, things that I noticed:
(for general purpose users) we have to decide if we want to make it
from bio_embeddings import SeqVecEmbedder
orfrom bio_embeddings.embed import SeqVecEmbedder
; now it's inconsistent: https://github.com/sacdallago/bio_embeddings/blob/14f1de5754221452c27d2e2c5420f191bb2ecc00/bio_embeddings/__init__.pyUp to you @konstin
Once that has been decided, all notebooks in
examples
need to be revised and updated!Speaking of notebooks, this one seves as and example of what to expect when the model files are not provided (in the case of SeqVec). After the introduction of your
with_download
method, I think this will need re-writing.Finally: once you are done with decisions and improvements, please make sure all relevant notebooks run and are up to date. E.g.: this one still uses the "constrained albert" (see warning message), but that should not happen anymore ;). Not relevant notebooks:
project_visualize_custom_embedings
andproject_visualize_pipeline_embeddings
The text was updated successfully, but these errors were encountered: