
Encode Sentences

Tianyu Gao edited this page May 19, 2021 · 2 revisions

After loading the model as model, you can encode sentences with:

model.encode(sentences, device=None, return_numpy=False, normalize_to_unit=True, keepdim=False, batch_size=64, max_length=128)

Inputs

  • sentences: a string or a list of strings.
  • device: the device to run encoding on (cuda or cpu).
  • return_numpy: whether to return numpy arrays (True) or return PyTorch tensors (False by default).
  • normalize_to_unit: whether to normalize the output embeddings as unit vectors.
  • keepdim: if the input is a single sentence, whether to keep the batch-size dimension of the output embedding.
  • batch_size: if the input is a list of sentences, the number of sentences encoded per batch. Larger batch sizes are usually more efficient, as long as a batch fits in the memory of your device.
  • max_length: truncate the sentences if they exceed the maximum length.
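
To make the normalize_to_unit option more concrete, here is a minimal pure-Python sketch of what scaling an embedding to unit length means. The helpers normalize_to_unit_length and dot are hypothetical names used only for illustration; they are not part of the library, which operates on tensors rather than lists. The sketch also shows why unit vectors are convenient: their dot product equals their cosine similarity.

```python
import math


def normalize_to_unit_length(vector):
    """Scale a vector so its L2 norm is 1 (hypothetical helper, not library code)."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        # Assumption for this sketch: leave an all-zero vector unchanged.
        return list(vector)
    return [x / norm for x in vector]


def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


# Two toy 2-dimensional "embeddings".
u = normalize_to_unit_length([3.0, 4.0])  # approximately [0.6, 0.8]
v = normalize_to_unit_length([4.0, 3.0])  # approximately [0.8, 0.6]

# For unit vectors, the dot product is the cosine similarity.
cosine = dot(u, v)
print(round(cosine, 2))  # 0.96
```

With normalized embeddings, comparing two sentences reduces to a single dot product, with no division by the norms at comparison time.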

Outputs

  • A numpy array or a PyTorch tensor with size (n, dim), where n is the number of sentences and dim is the embedding dimension.
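
The shape conventions described above can be sketched with a toy stand-in. The function toy_encode below is hypothetical and is not the library's implementation: it returns nested lists instead of tensors, and its "embeddings" are placeholders. It only mirrors the documented contract: a list of n sentences yields n rows, the input is processed in chunks of batch_size, and a single string loses the batch dimension unless keepdim is set.

```python
def toy_encode(sentences, keepdim=False, batch_size=64, dim=4):
    """Toy stand-in for an encoder; returns nested lists, not tensors."""
    single_sentence = isinstance(sentences, str)
    if single_sentence:
        sentences = [sentences]

    embeddings = []
    # Process the input in chunks of at most batch_size sentences.
    for start in range(0, len(sentences), batch_size):
        batch = sentences[start:start + batch_size]
        for sentence in batch:
            # Placeholder "embedding": dim copies of the sentence length.
            embeddings.append([float(len(sentence))] * dim)

    # Drop the batch dimension for a single sentence unless keepdim is set.
    if single_sentence and not keepdim:
        return embeddings[0]
    return embeddings


def shape(nested):
    """Return the shape of a non-empty rectangular nested list as a tuple."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)


batch_shape = shape(toy_encode(["a b", "c d e", "f"], batch_size=2))
single_shape = shape(toy_encode("a b"))
kept_shape = shape(toy_encode("a b", keepdim=True))

print(batch_shape)   # (3, 4)
print(single_shape)  # (4,)
print(kept_shape)    # (1, 4)
```

Passing keepdim=True is useful when downstream code always expects a two-dimensional result, regardless of whether one sentence or many were encoded.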