Transformer Embeddings

This library simplifies using encoder transformer models supported by HuggingFace's transformers library (from the model hub or a local path) to generate embeddings for string inputs, much as sentence-transformers does.

Please note that starting with v4, we have dropped support for Python 3.7. If you need to use this library with Python 3.7, the latest compatible release is version 3.1.0.

Why use this over HuggingFace's `transformers` or `sentence-transformers`?

Under the hood, we take care of:

Supporting any model on the HF model hub, with sensible defaults for inference.
Setting the PyTorch model to eval mode.
Using no_grad() when doing the forward pass.
Batching inputs, and returning output in the format produced by HF transformers.
Padding / truncating to model defaults.
Moving to and from GPUs if available.
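
To make the batching and padding / truncating items above concrete, here is a minimal, dependency-free sketch of what those steps usually involve. The names batches and pad_batch, and the pad_id and max_length parameters, are hypothetical and only for illustration. They are not part of this library's API, and the library relies on the HF tokenizer for the real work.

```python
from __future__ import annotations

from typing import Iterator


def batches(texts: list[str], batch_size: int) -> Iterator[list[str]]:
    """Yield consecutive slices of at most batch_size inputs (hypothetical helper)."""
    for start in range(0, len(texts), batch_size):
        yield texts[start:start + batch_size]


def pad_batch(token_ids: list[list[int]], pad_id: int = 0, max_length: int | None = None):
    """Truncate to max_length if given, then right-pad to the longest sequence.

    Returns (padded_ids, attention_masks); a mask value of 0 marks padding.
    """
    if max_length is not None:
        token_ids = [ids[:max_length] for ids in token_ids]
    longest = max(len(ids) for ids in token_ids)
    padded = [ids + [pad_id] * (longest - len(ids)) for ids in token_ids]
    masks = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in token_ids]
    return padded, masks


chunks = list(batches(["a", "b", "c"], batch_size=2))
padded, masks = pad_batch([[5, 6, 7], [8]])
```

The attention mask is what lets later steps, such as pooling, ignore the padded positions.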

Installation

You can install Transformer Embeddings via pip from PyPI:

$ pip install transformer-embeddings

Usage

from transformer_embeddings import TransformerEmbeddings

transformer = TransformerEmbeddings("model_name")

If you have a previously instantiated model and / or tokenizer, you can pass them in.

transformer = TransformerEmbeddings(model=model, tokenizer=tokenizer)

or

transformer = TransformerEmbeddings(model_name="model_name", model=model)

or

transformer = TransformerEmbeddings(model_name="model_name", tokenizer=tokenizer)

Note: model_name should be included if only one of model or tokenizer is passed in.
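
The rule in the note can be written down as a small check. This is only an illustration of the rule, assuming the missing piece is loaded by name. The function missing_components is hypothetical, and it is not how the library validates its arguments.

```python
def missing_components(model_name=None, model=None, tokenizer=None):
    """Return the pieces that would still have to be loaded from model_name.

    Hypothetical helper: raises ValueError when something is missing and
    no model_name is available to load it from.
    """
    missing = []
    if model is None:
        missing.append("model")
    if tokenizer is None:
        missing.append("tokenizer")
    if missing and model_name is None:
        raise ValueError(f"model_name is needed to load: {', '.join(missing)}")
    return missing


both_given = missing_components(model="m", tokenizer="t")
model_only = missing_components(model_name="model_name", model="m")
```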

Embeddings

To get output embeddings:

embeddings = transformer.encode(["Lorem ipsum dolor sit amet",
                                 "consectetur adipiscing elit",
                                 "sed do eiusmod tempor incididunt",
                                 "ut labore et dolore magna aliqua."])
embeddings.output

Pooled Output

To get pooled outputs:

from transformer_embeddings import TransformerEmbeddings, mean_pooling

transformer = TransformerEmbeddings("model_name", return_output=False, pooling_fn=mean_pooling)

embeddings = transformer.encode(["Lorem ipsum dolor sit amet",
                                 "consectetur adipiscing elit",
                                 "sed do eiusmod tempor incididunt",
                                 "ut labore et dolore magna aliqua."])

embeddings.pooled
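
For readers unfamiliar with mean pooling, the sketch below shows what the technique commonly computes: the average of the token vectors in each sequence, skipping padded positions. It uses plain Python lists so it runs without PyTorch. The function mean_pool is a hypothetical stand-in, not the library's mean_pooling, whose exact behavior may differ.

```python
def mean_pool(token_embeddings, attention_mask):
    """Average token vectors per sequence, ignoring positions where the mask is 0.

    token_embeddings: [batch][tokens][dim] nested lists of floats
    attention_mask:   [batch][tokens] nested lists of 0/1
    """
    pooled = []
    for vectors, mask in zip(token_embeddings, attention_mask):
        total = [0.0] * len(vectors[0])
        for vector, keep in zip(vectors, mask):
            if keep:
                for i, value in enumerate(vector):
                    total[i] += value
        count = max(sum(mask), 1)  # guard against an all-padding row
        pooled.append([component / count for component in total])
    return pooled


token_embeddings = [[[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]]
attention_mask = [[1, 1, 0]]  # the last token is padding
sentence_vectors = mean_pool(token_embeddings, attention_mask)
```

Each input sequence ends up with one fixed-size vector, regardless of how many tokens it had.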

Exporting the Model

Once you are done testing and training the model, it can be exported into a single tarball:

from transformer_embeddings import TransformerEmbeddings

transformer = TransformerEmbeddings("model_name")
transformer.export(additional_files=["/path/to/other/files/to/include/in/tarball.pickle"])

The tarball can also be uploaded to S3, which requires installing the S3 extras (pip install transformer-embeddings[s3]). Then pass an s3_path:

from transformer_embeddings import TransformerEmbeddings

transformer = TransformerEmbeddings("model_name")
transformer.export(
    additional_files=["/path/to/other/files/to/include/in/tarball.pickle"],
    s3_path="s3://bucket/models/model-name/date-version/",
)
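
To illustrate what "a single tarball" means here, the sketch below bundles a model directory and some additional files into one gzipped archive using only the standard library. The function bundle, the file names, and the flat archive layout are assumptions made for illustration. This is not the library's export implementation, and uploading to S3 is not covered.

```python
import tarfile
import tempfile
from pathlib import Path


def bundle(model_dir, additional_files, output_path):
    """Write the contents of model_dir plus extra files into one .tar.gz (hypothetical helper)."""
    with tarfile.open(output_path, "w:gz") as tar:
        for path in sorted(Path(model_dir).iterdir()):
            tar.add(path, arcname=path.name)
        for extra in additional_files:
            tar.add(extra, arcname=Path(extra).name)
    return output_path


# Usage with placeholder files in a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    model_dir = Path(tmp) / "model"
    model_dir.mkdir()
    (model_dir / "config.json").write_text("{}")
    extra = Path(tmp) / "labels.pickle"
    extra.write_bytes(b"placeholder")
    archive = bundle(model_dir, [extra], Path(tmp) / "model.tar.gz")
    with tarfile.open(archive) as tar:
        names = sorted(tar.getnames())
```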

Contributing

Contributions are very welcome. To learn more, see the Contributor Guide.

License

Distributed under the terms of the Apache 2.0 license, Transformer Embeddings is free and open source software.

Issues

If you encounter any problems, please file an issue along with a detailed description.

Credits

This project was partly generated from @cjolowicz's Hypermodern Python Cookiecutter template.
