finalfusion is a Python package for reading, writing and using finalfusion embeddings. It also supports other commonly used embedding formats such as fastText, GloVe and word2vec.
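The two text-based layouts differ mainly in their first line. GloVe text files contain one word per line followed by its vector components. word2vec text files start with a header line giving the number of words and the dimensionality. Below is a minimal standard-library sketch that parses both layouts. The function `read_text_embeddings` is hypothetical and is not part of the finalfusion API, which has its own readers. The binary word2vec and fastText formats are not covered.

```python
import io
from typing import Dict, List, TextIO


def read_text_embeddings(f: TextIO, has_dims_header: bool) -> Dict[str, List[float]]:
    """Parse whitespace-separated text embeddings (hypothetical helper).

    With has_dims_header=True the first line must hold "<rows> <dims>"
    (word2vec text layout); otherwise every line is data (GloVe layout).
    """
    expected_rows = expected_dims = None
    if has_dims_header:
        expected_rows, expected_dims = (int(x) for x in f.readline().split())
    embeddings: Dict[str, List[float]] = {}
    for line in f:
        parts = line.split()  # also tolerates trailing spaces
        if not parts:
            continue  # skip blank lines
        word, values = parts[0], [float(v) for v in parts[1:]]
        if expected_dims is not None and len(values) != expected_dims:
            raise ValueError(f"{word!r}: expected {expected_dims} components, got {len(values)}")
        embeddings[word] = values
    if expected_rows is not None and len(embeddings) != expected_rows:
        raise ValueError(f"expected {expected_rows} words, got {len(embeddings)}")
    return embeddings
```

For example, `read_text_embeddings(io.StringIO("2 2\nthe 0.1 0.2\ncat 0.3 0.4\n"), has_dims_header=True)` yields a dictionary with the two words mapped to their vectors.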
The Python package supports the same types of embeddings as the finalfusion-rust crate:
- Vocabulary:
  - No subwords
  - Subwords
- Embedding matrix:
  - Array
  - Memory-mapped
  - Quantized
- Norms
- Metadata
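A bucket subword vocabulary represents words outside the vocabulary through their character n-grams. Each n-gram is hashed into one of a fixed number of buckets, and each bucket owns a row of the embedding matrix. The sketch below illustrates this under several assumptions: n-grams of length 3 to 6 with `<` and `>` boundary markers, a 64-bit FNV-1a hash, and a matrix laid out as known-word rows followed by `2**buckets_exp` bucket rows. It also assumes that an unknown word's n-gram rows are averaged. These choices are illustrative and do not reproduce finalfusion's exact hashing, indexing or normalization. All function names are hypothetical.

```python
from typing import Dict, List


def char_ngrams(word: str, min_n: int = 3, max_n: int = 6) -> List[str]:
    """Character n-grams of `word` wrapped in the boundary markers '<' and '>'."""
    bracketed = f"<{word}>"
    return [
        bracketed[start:start + n]
        for n in range(min_n, max_n + 1)
        for start in range(len(bracketed) - n + 1)
    ]


def fnv1a_64(data: bytes) -> int:
    """64-bit FNV-1a hash (illustrative choice, not necessarily finalfusion's)."""
    h = 0xCBF29CE484222325
    for byte in data:
        h ^= byte
        h = (h * 0x100000001B3) & 0xFFFFFFFFFFFFFFFF
    return h


def subword_rows(word: str, n_words: int, buckets_exp: int) -> List[int]:
    """Matrix rows of the word's n-grams; bucket rows follow the n_words word rows."""
    n_buckets = 1 << buckets_exp
    return [n_words + fnv1a_64(g.encode("utf-8")) % n_buckets for g in char_ngrams(word)]


def embedding(word: str, index: Dict[str, int], matrix: List[List[float]],
              buckets_exp: int) -> List[float]:
    """Known words use their own row; unknown words average their n-gram rows."""
    if word in index:
        return matrix[index[word]]
    rows = subword_rows(word, len(index), buckets_exp)
    if not rows:
        raise KeyError(f"no n-grams for {word!r}")
    dims = len(matrix[0])
    return [sum(matrix[r][d] for r in rows) / len(rows) for d in range(dims)]
```

Because every n-gram maps to some bucket, an embedding can be produced for any non-empty word. This is the main difference from a vocabulary without subwords.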
This package extends the (de-)serialization capabilities of finalfusion Chunks by allowing single chunks to be loaded and written. For example, a Vocab can be loaded from a file in the finalfusion format without loading the Storage. Single chunks can also be serialized to their own files through Chunk.write. This goes beyond the functionality of finalfusion-rust: loading stand-alone components is only supported by the Python package, so other tools from the finalfusion ecosystem will fail to read such single-chunk files.
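Storing each component as its own chunk makes it possible to load a single component. The following toy sketch shows that idea. It assumes a simplified layout in which every chunk starts with a little-endian u32 identifier and a u64 payload length. The identifiers, the missing file header and the payload encoding are hypothetical and do not follow the finalfusion spec. `write_chunk` and `read_chunk` are illustrative names, not the package's API.

```python
import io
import struct
from typing import BinaryIO

# Hypothetical chunk identifiers for this toy layout (not the finalfusion spec values).
VOCAB_CHUNK = 1
STORAGE_CHUNK = 2

_HEADER = "<IQ"  # little-endian u32 identifier + u64 payload length, no padding


def write_chunk(f: BinaryIO, chunk_id: int, payload: bytes) -> None:
    """Append one chunk: identifier, payload length, then the payload bytes."""
    f.write(struct.pack(_HEADER, chunk_id, len(payload)))
    f.write(payload)


def read_chunk(f: BinaryIO, wanted_id: int) -> bytes:
    """Return the first chunk with `wanted_id`, seeking past every other chunk."""
    header_size = struct.calcsize(_HEADER)
    while True:
        header = f.read(header_size)
        if len(header) < header_size:
            raise KeyError(f"no chunk with id {wanted_id}")
        chunk_id, length = struct.unpack(_HEADER, header)
        if chunk_id == wanted_id:
            return f.read(length)
        f.seek(length, io.SEEK_CUR)  # skip the payload without reading it
```

In this sketch, reading the vocabulary never touches the storage bytes. A file holding only one chunk, analogous to a stand-alone component written with Chunk.write, is read the same way.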
It integrates with numpy: its Storage types can be treated as numpy arrays.
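Because a storage can be treated as a numpy array, ordinary numpy code applies to it directly. The sketch below uses a plain `np.ndarray` as a stand-in for a storage matrix and ranks rows by cosine similarity to a query vector. The function name `most_similar` is hypothetical and is not the package's similarity API.

```python
import numpy as np


def most_similar(matrix: np.ndarray, query: np.ndarray, k: int = 3) -> np.ndarray:
    """Indices of the k rows with the highest cosine similarity to `query`.

    Assumes no row of `matrix` and not `query` is the zero vector.
    """
    norms = np.linalg.norm(matrix, axis=1)
    sims = matrix @ query / (norms * np.linalg.norm(query))
    return np.argsort(-sims)[:k]  # descending similarity
```

Since an embedding matrix is passed in as-is, the same function would work on any array-like storage that supports `@` and `np.linalg.norm`.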
finalfusion comes with some scripts (finalfusion.scripts) to convert between embedding formats, run analogy and similarity queries, and turn bucket subword embeddings into explicit subword embeddings.
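Analogy queries ask "a is to b as c is to ?". A common way to answer them is the 3CosAdd vector-offset method. It normalizes all vectors, forms the target b - a + c, and returns the most similar remaining word. The following is a minimal sketch of that technique on a plain dictionary of vectors. It is not necessarily how finalfusion's scripts compute analogies, and `analogy` is a hypothetical name.

```python
import math
from typing import Dict, List, Sequence


def _normalize(v: Sequence[float]) -> List[float]:
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def analogy(vectors: Dict[str, Sequence[float]], a: str, b: str, c: str) -> str:
    """Answer 'a is to b as c is to ?' with 3CosAdd."""
    unit = {w: _normalize(v) for w, v in vectors.items()}
    target = [vb - va + vc for va, vb, vc in zip(unit[a], unit[b], unit[c])]
    # The query words themselves are excluded from the answers.
    candidates = (w for w in unit if w not in {a, b, c})
    # Candidates are unit vectors and the target norm is fixed, so ranking by
    # dot product gives the same order as ranking by cosine similarity.
    return max(candidates, key=lambda w: sum(x * t for x, t in zip(unit[w], target)))
```

With toy vectors such as man = [1, 0, 0], woman = [0, 1, 0], king = [1, 0, 1] and queen = [0, 1, 1], the call `analogy(vectors, "man", "king", "woman")` selects "queen".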
The package is implemented in Python with some Cython extensions. It is not based on bindings to the finalfusion-rust crate.