In [1]:
%%html
<style>
table {float:left}
</style>

# Embedding


## Overview

What is it?
A **numerical representation** of:
* word
* sentence
* image
* audio

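**Before word embeddings: one-hot encoding**

* sparse: each vector holds a single 1 and is 0 everywhere else
* vector length equals the vocabulary size
* pure position index, so all information about meaning and similarity is lost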

In [1]:
import torch
import torch.nn.functional as F

# Tensor of indices for our words
# king -> 0, queen -> 1, man -> 2, woman -> 3
indices = torch.tensor([0, 1, 2, 3])

# Number of classes (unique words)
num_classes = 4

# One-hot encoding
one_hot_encoded = F.one_hot(indices, num_classes=num_classes)

print(one_hot_encoded)

tensor([[1, 0, 0, 0],
        [0, 1, 0, 0],
        [0, 0, 1, 0],
        [0, 0, 0, 1]])


|       | king | queen | man | woman |
|-------|------|-------|-----|-------|
| king  |  1   |   0   |  0  |   0   |
| queen |  0   |   1   |  0  |   0   |
| man   |  0   |   0   |  1  |   0   |
| woman |  0   |   0   |  0  |   1   |


Why not just use an integer?

* integer (label) encoding implies an ordinal relationship between values
* for categorical values such as words, no such relationship exists
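
To make the ordinal problem concrete, here is a minimal sketch in plain Python (no PyTorch needed). The helper `euclidean` and the dictionaries `label` and `one_hot` are names made up for this illustration.

```python
import math

words = ["king", "queen", "man", "woman"]

# Integer (label) encoding: each word becomes its position in the list.
label = {w: i for i, w in enumerate(words)}

# One-hot encoding: each word becomes a vector with a single 1.
one_hot = {
    w: [1.0 if j == i else 0.0 for j in range(len(words))]
    for i, w in enumerate(words)
}


def euclidean(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Label encoding: "king" looks three times farther from "woman" than from "queen",
# purely because of the order in which the words were listed.
print(abs(label["king"] - label["queen"]), abs(label["king"] - label["woman"]))

# One-hot encoding: every pair of distinct words is equally far apart.
print(euclidean(one_hot["king"], one_hot["queen"]),
      euclidean(one_hot["king"], one_hot["woman"]))
```

One-hot removes the fake ordering, but it also makes every word equally unrelated to every other word, which is the gap that learned embeddings fill.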


### what is a good embedding (words/texts)? 

we want to capture:

* context of the paragraph
* semantic property
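* syntactic property (grammar)

An embedding is a function that maps each word $w_i$ to a vector $\theta_i$:

$$f(w_i) = \theta_i$$

For example, $f(\text{cat}) = (0.3, 0, 0.4, \dots)$.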


### Why are embeddings important?

* they are a compact, compressed form of the data
* they preserve relationships within the data
* they are the output of a DL layer: a **linear view** into the complex **non-linear relationships** learned by the model



Before word2vec, there was Bengio et al.'s "A Neural Probabilistic Language Model" (2003): https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf

(benjio_nnlm.py)


## word2vec (2013)

* Mikolov et al., "Efficient Estimation of Word Representations in Vector Space"
* Continuous bag-of-words model (CBOW)


In [15]:
import torch
import torch.nn as nn

tiny_vocab="We must forever conduct our struggle on the high plane of dignity and discipline".split()

class CBOW(nn.Module):
    def __init__(self, vocab_size, embedding_dim):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.linear = nn.Linear(embedding_dim, vocab_size)
    
    def forward(self, X):
        embeddings = self.embedding(X)
        bow = embeddings.mean(dim=1)
        logits = self.linear(bow)
        return logits

torch.manual_seed(42)
dummy_cbow = CBOW(vocab_size=len(tiny_vocab), embedding_dim=3)
dummy_cbow.embedding.state_dict()

OrderedDict([('weight',
              tensor([[ 1.9269,  1.4873,  0.9007],
                      [-2.1055,  0.6784, -1.2345],
                      [-0.0431, -1.6047, -0.7521],
                      [ 1.6487, -0.3925, -1.4036],
                      [-0.7279, -0.5594, -0.7688],
                      [ 0.7624,  1.6423, -0.1596],
                      [-0.4974,  0.4396, -0.7581],
                      [ 1.0783,  0.8008,  1.6806],
                      [ 1.2791,  1.2964,  1.5736],
                      [-0.8455,  1.3123,  0.6872],
                      [-1.0892, -0.3553, -0.9138],
                      [-0.6581,  0.0499,  2.2667],
                      [ 1.1790, -0.4345, -1.3864],
                      [-1.2862, -1.4032,  0.0360]]))])

| Word      | Value 1 | Value 2 | Value 3 |
|-----------|---------|---------|---------|
| We        | 1.9269  | 1.4873  | 0.9007  |
| must      | -2.1055 | 0.6784  | -1.2345 |
| forever   | -0.0431 | -1.6047 | -0.7521 |
| conduct   | 1.6487  | -0.3925 | -1.4036 |
| our       | -0.7279 | -0.5594 | -0.7688 |
| struggle  | 0.7624  | 1.6423  | -0.1596 |
| on        | -0.4974 | 0.4396  | -0.7581 |
| the       | 1.0783  | 0.8008  | 1.6806  |
| high      | 1.2791  | 1.2964  | 1.5736  |
| plane     | -0.8455 | 1.3123  | 0.6872  |
| of        | -1.0892 | -0.3553 | -0.9138 |
| dignity   | -0.6581 | 0.0499  | 2.2667  |
| and       | 1.1790  | -0.4345 | -1.3864 |
| discipline| -1.2862 | -1.4032 | 0.0360  |


### notes on embedding layer
* `nn.Embedding` is a lookup table of shape (vocab size, embedding dimension), randomly initialized
* it is dense, not sparse anymore
* you retrieve an embedding by **token ID**; producing token IDs is one of the jobs of a tokenizer
* typical sizes: e.g. a 50,000-word vocab and 300 to 1,500 dimensions
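
One way to see why an embedding layer is "just a lookup": picking row `i` of the table gives the same result as multiplying a one-hot vector by the table. A minimal sketch in plain Python, with a made-up 4 x 3 table and hypothetical helper names `lookup` and `one_hot_matmul`:

```python
# Toy embedding table: 4 tokens, 3 dimensions (values are made up).
weight = [
    [0.1, 0.2, 0.3],
    [0.4, 0.5, 0.6],
    [0.7, 0.8, 0.9],
    [1.0, 1.1, 1.2],
]


def lookup(token_id):
    """What an embedding layer does conceptually: return one row of the table."""
    return weight[token_id]


def one_hot_matmul(token_id):
    """The same result, computed as one_hot(token_id) @ weight."""
    one_hot = [1.0 if i == token_id else 0.0 for i in range(len(weight))]
    dim = len(weight[0])
    return [
        sum(one_hot[i] * weight[i][d] for i in range(len(weight)))
        for d in range(dim)
    ]


print(lookup(2))
print(one_hot_matmul(2))
```

The lookup skips all the multiplications by zero, which is why the dense table scales to large vocabularies where one-hot inputs would not.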


In [21]:
## pytorch embedding layer
import torch
from torch import nn
torch.manual_seed(42)

test_embedding = nn.Embedding(10, 3) # 10 embedding vectors, each with dimension 3

# pass in 2-d tensor
idx = torch.as_tensor([[2,3,4,5]]).long()

test_embedding(idx)

# uncomment for error case
# idx=torch.tensor([10])
# test_embedding(idx)


tensor([[[-0.0431, -1.6047, -0.7521],
         [ 1.6487, -0.3925, -1.4036],
         [-0.7279, -0.5594, -2.3169],
         [-0.2168, -1.3847, -0.8712]]], grad_fn=<EmbeddingBackward0>)

In [24]:
# mean over dim=1 (the token dimension): one value per embedding dimension
test_embedding(idx).mean(dim=1)

tensor([[ 0.1652, -0.9853, -1.3360]], grad_fn=<MeanBackward1>)

### target and context

* the target is the word we want to predict; the context is the surrounding words used to predict it.
* the following code manually picks one target word from the vocabulary (index 5, "struggle") and uses the rest of the words as context.


In [10]:
def find_indices(keys, targets):
    # Create a dictionary mapping each key to its index
    key_to_index = {key: index for index, key in enumerate(keys)}

    # Find the index for each target string
    indices = [key_to_index.get(target, -1) for target in targets]

    return indices

context_words = tiny_vocab.copy()
target_words = 'struggle'
context_words.remove(target_words)
context_idx = find_indices(tiny_vocab, context_words)
target_idx =  find_indices(tiny_vocab, [target_words])
print(context_idx, target_idx)

[0, 1, 2, 3, 4, 6, 7, 8, 9, 10, 11, 12, 13] [5]
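
Using the whole sentence as context is a simplification. A common alternative is a sliding window that keeps only a few words on each side of the target. The sketch below assumes a window of 2; the helper name `context_target_pairs` is made up for this note.

```python
def context_target_pairs(tokens, window=2):
    """Return (context, target) pairs using up to `window` words on each side."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]
        right = tokens[i + 1:i + 1 + window]
        pairs.append((left + right, target))
    return pairs


sentence = "We must forever conduct our struggle on the high plane of dignity and discipline".split()
pairs = context_target_pairs(sentence, window=2)

# The pair whose target is "struggle" (index 5)
print(pairs[5])
```

Words near the start or end of the sentence simply get a shorter context.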


### get embeddings from context words

The following code illustrates:

* we pass a batch of context-word indices to the embedding layer
* the layer returns the embeddings of ALL context words, and we compute their mean
* since each word embedding is 3-dim, the mean is also 3-dim
* this 3-dim vector is also known as the "features"
* we will use these features to compute **logits**


In [19]:
batch_context = torch.as_tensor([context_idx]).long()
batch_target = torch.as_tensor([target_idx]).long()
dummy_cbow.embedding(batch_context)

tensor([[[ 1.9269,  1.4873,  0.9007],
         [-2.1055,  0.6784, -1.2345],
         [-0.0431, -1.6047, -0.7521],
         [ 1.6487, -0.3925, -1.4036],
         [-0.7279, -0.5594, -0.7688],
         [-0.4974,  0.4396, -0.7581],
         [ 1.0783,  0.8008,  1.6806],
         [ 1.2791,  1.2964,  1.5736],
         [-0.8455,  1.3123,  0.6872],
         [-1.0892, -0.3553, -0.9138],
         [-0.6581,  0.0499,  2.2667],
         [ 1.1790, -0.4345, -1.3864],
         [-1.2862, -1.4032,  0.0360]]], grad_fn=<EmbeddingBackward0>)

In [20]:
cbow_features = dummy_cbow.embedding(batch_context).mean(dim=1)
cbow_features

tensor([[-0.0108,  0.1012, -0.0056]], grad_fn=<MeanBackward1>)

### compute logits using embedding features

The embedding features (3-dim) are passed to the linear layer to compute logits: raw, unnormalized scores, one per vocabulary word, where the largest value indicates *the most likely outcome*.


In [26]:
dummy_cbow.linear

Linear(in_features=3, out_features=14, bias=True)

Logits: in classification tasks, "logits" are the raw, unnormalized scores output by the last linear layer, which are then passed through a softmax function to obtain probabilities. If `nn.Linear()` is the last layer of a classification model and no activation such as softmax has been applied to its output, that output can be treated as logits.

In [28]:
logits = dummy_cbow.linear(cbow_features)
logits

tensor([[-0.3772,  0.1676, -0.0930, -0.4483,  0.0243, -0.4446, -0.4631, -0.3511,
         -0.5342, -0.3302,  0.5974,  0.1433,  0.1483, -0.5540]],
       grad_fn=<AddmmBackward0>)

In [48]:
torch.argmax(logits)

tensor(10)

In [50]:
tiny_vocab[10]

'of'

In [30]:
import torch
import torch.nn.functional as F
# Apply softmax along the last dimension (dim=-1) to convert logits into probabilities.
# `dim` selects the axis that is normalized to sum to 1.
# Here logits has shape (1, 14): batch size 1 and 14 classes (vocabulary words),
# so dim=-1 normalizes across the 14 classes.
probabilities = F.softmax(logits, dim=-1)
probabilities

tensor([[0.0552, 0.0951, 0.0733, 0.0514, 0.0824, 0.0516, 0.0506, 0.0566, 0.0472,
         0.0578, 0.1462, 0.0929, 0.0933, 0.0462]], grad_fn=<SoftmaxBackward0>)
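
Softmax itself is only "exponentiate, then normalize". A minimal plain-Python sketch, using the logits printed earlier in this notebook (rounded to 4 decimals, so the results agree with the PyTorch output only approximately). The function name `softmax` here is a local helper, not the PyTorch one.

```python
import math

# Logits copied from the `dummy_cbow.linear(cbow_features)` output in this notebook.
logits = [-0.3772, 0.1676, -0.0930, -0.4483, 0.0243, -0.4446, -0.4631,
          -0.3511, -0.5342, -0.3302, 0.5974, 0.1433, 0.1483, -0.5540]


def softmax(scores):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax(logits)
best = max(range(len(probs)), key=probs.__getitem__)

# Index of the most likely class, its probability, and the sum of all probabilities
print(best, round(probs[best], 4), round(sum(probs), 4))
```

Softmax never changes the ranking, so the argmax of the probabilities is the same as the argmax of the logits.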

* "of" is the predicted word, where the target word is "struggle"
* but ... this is a randomly initialized model, and it still needs to LEARN
* The point is, given a large enough dataset of context words and targets, we could TRAIN the CBOW model using `nn.CrossEntropyLoss()` to learn the actual word embeddings.
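
To show what that training does under the hood, here is a hand-rolled sketch in plain Python. It is a stand-in for `nn.CrossEntropyLoss()` plus an optimizer, not the PyTorch implementation. The gradients are written out by hand, and the window size, learning rate and epoch count are arbitrary assumptions. With a single sentence it can only memorize, so it just demonstrates that the loss goes down.

```python
import math
import random

random.seed(0)
tokens = "We must forever conduct our struggle on the high plane of dignity and discipline".split()
V, D, WINDOW, LR = len(tokens), 3, 2, 0.1

# (context ids, target id) pairs from a sliding window. Every word is unique here,
# so a token's position doubles as its ID.
pairs = []
for i in range(V):
    ctx = [j for j in range(max(0, i - WINDOW), min(V, i + WINDOW + 1)) if j != i]
    pairs.append((ctx, i))

E = [[random.gauss(0, 1) for _ in range(D)] for _ in range(V)]    # embedding table
W = [[random.gauss(0, 0.1) for _ in range(D)] for _ in range(V)]  # linear weights
b = [0.0] * V                                                     # linear bias


def forward(ctx):
    """Mean of context embeddings -> linear layer -> softmax probabilities."""
    h = [sum(E[c][d] for c in ctx) / len(ctx) for d in range(D)]
    logits = [sum(W[k][d] * h[d] for d in range(D)) + b[k] for k in range(V)]
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return h, [e / total for e in exps]


def epoch(train=True):
    """One pass over all pairs; returns the mean cross-entropy loss."""
    total_loss = 0.0
    for ctx, target in pairs:
        h, probs = forward(ctx)
        total_loss += -math.log(probs[target])  # cross-entropy for one example
        if not train:
            continue
        # Gradient of cross-entropy w.r.t. the logits is (probs - one_hot(target)).
        dlogits = [p - (1.0 if k == target else 0.0) for k, p in enumerate(probs)]
        # Back-propagate to the mean vector before W is modified.
        dh = [sum(dlogits[k] * W[k][d] for k in range(V)) for d in range(D)]
        for k in range(V):
            for d in range(D):
                W[k][d] -= LR * dlogits[k] * h[d]
            b[k] -= LR * dlogits[k]
        # The mean spreads the gradient equally over the context embeddings.
        for c in ctx:
            for d in range(D):
                E[c][d] -= LR * dh[d] / len(ctx)
    return total_loss / len(pairs)


loss_before = epoch(train=False)
for _ in range(200):
    epoch(train=True)
loss_after = epoch(train=False)
print(f"mean loss before: {loss_before:.3f}, after: {loss_after:.3f}")
```

After training, the rows of `E` are the learned word embeddings. In PyTorch, autograd and an optimizer replace all the manual update lines.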


**How are embeddings being used?**

* similarity search
* clustering
* and many more
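
As a tiny illustration of similarity search, the sketch below ranks words by cosine similarity to a query vector. The 3-dim vectors are chosen by hand for illustration only (they are not learned embeddings), and `cosine` and `most_similar` are helper names made up here.

```python
import math


def cosine(a, b):
    """Cosine similarity: dot product divided by the product of the norms."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


# Hypothetical embeddings, hand-picked so that related words point in similar directions.
vectors = {
    "king":  [0.9, 0.8, 0.1],
    "queen": [0.9, 0.1, 0.8],
    "man":   [0.1, 0.9, 0.1],
    "woman": [0.1, 0.1, 0.9],
    "apple": [-0.7, 0.2, 0.2],
}


def most_similar(query, k=2, exclude=()):
    """Return the k (word, score) pairs closest to `query`, best first."""
    scored = [(w, cosine(query, v)) for w, v in vectors.items() if w not in exclude]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:k]


for word, score in most_similar(vectors["queen"], k=2, exclude={"queen"}):
    print(word, round(score, 3))
```

Real systems do the same ranking over far more vectors, usually with an approximate nearest-neighbor index instead of a full scan.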

## Global Vectors (GloVe)

Pennington et al., "GloVe: Global Vectors for Word Representation" (2014)

* the following notes are more about how to use it than about explaining it.


In [61]:
# download pre-trained word embeddings
from gensim import downloader
glove = downloader.load('glove-wiki-gigaword-50')

INFO:gensim.models.keyedvectors:loading projection weights from /ccsopen/home/f7b/gensim-data/glove-wiki-gigaword-50/glove-wiki-gigaword-50.gz
INFO:gensim.utils:KeyedVectors lifecycle event {'msg': 'loaded (400000, 50) matrix of type float32 from /ccsopen/home/f7b/gensim-data/glove-wiki-gigaword-50/glove-wiki-gigaword-50.gz', 'binary': False, 'encoding': 'utf8', 'datetime': '2023-12-09T11:19:37.374334', 'gensim': '4.3.2', 'python': '3.11.5 (main, Sep 11 2023, 13:54:46) [GCC 11.2.0]', 'platform': 'Linux-4.18.0-372.32.1.el8_6.x86_64-x86_64-with-glibc2.28', 'event': 'load_word2vec_format'}


In [75]:
king = glove['king']
queen = glove['queen']
man = glove['man']
woman = glove['woman']
print(f"king = {king}\n queen = {queen}\n man = {man}\n woman = {woman}")

king = [ 0.50451   0.68607  -0.59517  -0.022801  0.60046  -0.13498  -0.08813
  0.47377  -0.61798  -0.31012  -0.076666  1.493    -0.034189 -0.98173
  0.68229   0.81722  -0.51874  -0.31503  -0.55809   0.66421   0.1961
 -0.13495  -0.11476  -0.30344   0.41177  -2.223    -1.0756   -1.0783
 -0.34354   0.33505   1.9927   -0.04234  -0.64319   0.71125   0.49159
  0.16754   0.34344  -0.25663  -0.8523    0.1661    0.40102   1.1685
 -1.0137   -0.21585  -0.15155   0.78321  -0.91241  -1.6106   -0.64426
 -0.51042 ]
 queen = [ 0.37854    1.8233    -1.2648    -0.1043     0.35829    0.60029
 -0.17538    0.83767   -0.056798  -0.75795    0.22681    0.98587
  0.60587   -0.31419    0.28877    0.56013   -0.77456    0.071421
 -0.5741     0.21342    0.57674    0.3868    -0.12574    0.28012
  0.28135   -1.8053    -1.0421    -0.19255   -0.55375   -0.054526
  1.5574     0.39296   -0.2475     0.34251    0.45365    0.16237
  0.52464   -0.070272  -0.83744   -1.0326     0.45946    0.25302
 -0.17837   -0.73398   -0.20

In [76]:
import torch.nn.functional as F
synthetic_queen = king - man + woman
F.cosine_similarity(torch.from_numpy(synthetic_queen), torch.from_numpy(queen), dim=0)

tensor(0.8610)

In [77]:
from sentence_transformers import SentenceTransformer, models,util
util.cos_sim(synthetic_queen, queen)

tensor([[0.8610]])

In [78]:
util.cos_sim(queen, queen)

tensor([[1.]])

In [79]:
glove.similar_by_vector(queen, topn=5)

[('queen', 1.0000001192092896),
 ('princess', 0.8515165448188782),
 ('lady', 0.805060863494873),
 ('elizabeth', 0.7873042225837708),
 ('king', 0.7839043736457825)]

## Embeddings: Part 2



### Transformer vs. sentence transformer

* *regular* transformers work with *word/token*-level embeddings, not *sentence*-level embeddings.
* a regular transformer CAN produce sentence embeddings by applying a *pooling* operation, such as an element-wise arithmetic mean, to its token-level embeddings.
* A good pooling choice for BERT is CLS pooling: BERT has a special `[CLS]` token that is supposed to capture information about the whole sequence. It gets tuned on next-sentence prediction (NSP) during pre-training.




### The Process of Generating Embeddings

Given a sentence, how do we get its embeddings?

1. **Initial tokenization**: convert raw text into a sequence of token IDs that the model can understand.

2. **Embedding layer** (BERT/GPT start with such a layer): the tokenized input passes through this layer, which converts each token ID into an initial vector representation (embedding). In transformer models, positional embeddings are also added at this stage.

3. **Passing through model layers**: the vector representations from the embedding layer pass through the rest of the model, such as the self-attention and feed-forward layers of a transformer. Each layer processes its input and **refines** the embeddings, adding contextual information.

4. **Contextualized embeddings**: by the time the input reaches the final layer of the model, the embeddings are deeply contextualized. These final-layer token embeddings are the basis for sentence embeddings.


5. **Pooling** (optional)

* aggregate word/token embeddings into a single sentence embedding
* transform **variable-length input** into **fixed-length output**
  * mean pooling (avg across dimensions across all tokens)
  * max pooling (max across dimensions across all tokens)
  * CLS token pooling (in models such as BERT, first token is a special classification token, CLS)
  * others ... such as adaptive pooling
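
The pooling options above can be sketched on a toy "last hidden state" in plain Python. The values and the padding row are made up, and `mean_pool`, `max_pool` and `cls_pool` are illustrative helpers rather than a library API.

```python
# Toy "last hidden state" for one sentence: 4 tokens x 3 dims (made-up values).
# Row 0 plays the role of the [CLS] token; the last row is padding.
token_embeddings = [
    [0.2, 0.4, 0.6],   # [CLS]
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 1.0],
    [9.0, 9.0, 9.0],   # padding, must be ignored
]
attention_mask = [1, 1, 1, 0]


def real_tokens(embs, mask):
    """Drop padded positions using the attention mask."""
    return [row for row, keep in zip(embs, mask) if keep]


def mean_pool(embs, mask):
    """Average each dimension over the non-padded tokens."""
    rows = real_tokens(embs, mask)
    return [sum(col) / len(rows) for col in zip(*rows)]


def max_pool(embs, mask):
    """Take the maximum of each dimension over the non-padded tokens."""
    return [max(col) for col in zip(*real_tokens(embs, mask))]


def cls_pool(embs):
    """Use the first token's embedding as the sentence embedding."""
    return embs[0]


print(mean_pool(token_embeddings, attention_mask))
print(max_pool(token_embeddings, attention_mask))
print(cls_pool(token_embeddings))
```

Whatever the sentence length, each strategy returns a vector with the same 3 dimensions, which is the point of pooling.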
  


In [5]:
from transformers import BertModel, BertTokenizer
import torch

# Load pre-trained model and tokenizer
model_name = "bert-base-uncased"
model = BertModel.from_pretrained(model_name)
tokenizer = BertTokenizer.from_pretrained(model_name)

# Sample text
text = "we must forever conduct our struggle on the high plane of dignity and discipline."

# Encode text
inputs = tokenizer(text, return_tensors="pt")

# Extract embeddings
with torch.no_grad():
    outputs = model(**inputs)

# The output is a tuple, where the first item contains the hidden states
# The hidden states are the embeddings; for BERT, you typically use the last hidden state
embeddings = outputs.last_hidden_state

print("Shape of embeddings:", embeddings.shape)
# The shape of the embeddings is (batch_size, sequence_length, hidden_size)

Shape of embeddings: torch.Size([1, 17, 768])


### My (mis)perception

1. Not all models (embeddings) are created equal
   * Deep neural networks such as transformer-based LLMs are "supposedly" good at capturing contextual relationships, but ...
   * this should not be considered a "default"
   * just because LLaMA is a powerful model, it doesn't mean it is a good embedding model

2. Different downstream tasks may need different embedding models

3. We extract the last hidden state, but generating embeddings is an inference process that passes through the whole network.

4. Embeddings also need FINE-TUNING

### Using FORGE-S as an embedding model


In [3]:
from sentence_transformers import SentenceTransformer, models,util
model_path = "/proj/f7b/forge-s-instruct-base1"
word_embedding_model = models.Transformer(model_path, max_seq_length=512)
word_embedding_model.tokenizer.pad_token=word_embedding_model.tokenizer.eos_token
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension(), pooling_mode="mean" )
model = SentenceTransformer(modules=[word_embedding_model, pooling_model]).cuda()

word_embedding_model.get_word_embedding_dimension()


2064

In [None]:
e1=model.encode("I am a happy person")
e2=model.encode("the sky is falling")
e3=model.encode("I am a sad person")
e4=model.encode("I am a happy person")
e5=model.encode("I am happy person")
print(util.cos_sim(e1, e2))
print(util.cos_sim(e1, e3))
print(util.cos_sim(e1, e4))
print(util.cos_sim(e1, e5))

### Using UAE-Large-V1

In [55]:
from angle_emb import AnglE

angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()

e1=angle.encode("I am a happy person")
e2=angle.encode("the sky is falling")
e3=angle.encode("I am a sad person")
e4=angle.encode("I am a happy person")
e5=angle.encode("I am happy person")
print(util.cos_sim(e1, e2))
print(util.cos_sim(e1, e3))
print(util.cos_sim(e1, e4))
print(util.cos_sim(e1, e5))

tensor([[0.3978]])
tensor([[0.6961]])
tensor([[1.0000]])
tensor([[0.9913]])


### Cross-encoder BERT

* this is a setup or application of BERT
* the goal is to **compare** inputs such as sentence pairs
* this is typically done by concatenating the two pieces of text with a special **[SEP]** token between them. For example, the input to the model is: **[CLS] Sentence A [SEP] Sentence B [SEP]**
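
A minimal sketch of that input layout, in plain Python. It assumes whitespace-split words instead of BERT's real WordPiece tokens, and `build_pair_input` is a hypothetical helper. The segment IDs follow BERT's convention: 0 for the first segment, 1 for the second.

```python
def build_pair_input(tokens_a, tokens_b):
    """Lay out a sentence pair the way a BERT-style cross-encoder expects."""
    tokens = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    # Segment IDs: 0 for [CLS] + sentence A + first [SEP],
    #              1 for sentence B + final [SEP].
    segment_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return tokens, segment_ids


tokens, segment_ids = build_pair_input(["i", "am", "happy"], ["i", "am", "sad"])
print(tokens)
print(segment_ids)
```

Because both sentences go through the model together, every token of A can attend to every token of B. The trade-off is that nothing can be pre-computed per sentence, unlike with the embedding approach.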
  

## Evaluating Embedding Models

Massive Text Embedding Benchmark (MTEB) Leaderboard:
https://huggingface.co/spaces/mteb/leaderboard
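
The STS runs in this notebook report `pearson` and `spearman` scores: correlations between the model's similarity scores and human ratings. A minimal plain-Python sketch of both metrics follows. The data is hypothetical, and this simple `ranks` helper assumes there are no ties.

```python
import math


def pearson(x, y):
    """Linear correlation between two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """Rank from 1..n; assumes no ties, which holds for the toy data below."""
    order = sorted(range(len(values)), key=values.__getitem__)
    r = [0] * len(values)
    for rank, idx in enumerate(order, start=1):
        r[idx] = rank
    return r


def spearman(x, y):
    """Rank correlation: Pearson correlation computed on the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical data: human similarity ratings (0-5) and a model's cosine scores.
gold = [0.5, 1.8, 2.9, 4.1, 5.0]
predicted = [0.10, 0.35, 0.30, 0.80, 0.95]

print(round(pearson(gold, predicted), 4), round(spearman(gold, predicted), 4))
```

Spearman only cares whether the model orders the pairs the same way humans do, so it is less sensitive to how the raw similarity scores are scaled.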


In [57]:
from mteb import MTEB
from sentence_transformers import SentenceTransformer, models, util

model_path = "/proj/f7b/forge-s-instruct-base1"
word_embedding_model = models.Transformer(model_path, max_seq_length=512)
word_embedding_model.tokenizer.pad_token=word_embedding_model.tokenizer.eos_token
pooling_model = models.Pooling(word_embedding_model.get_word_embedding_dimension())
model = SentenceTransformer(modules=[word_embedding_model, pooling_model]).cuda()

model_name = "forge-s-instruct"

evaluation = MTEB(tasks=["STS22", "STSBenchmark"], task_langs=["en"])
results = evaluation.run(model, output_folder=f"results/{model_name}")

INFO:sentence_transformers.SentenceTransformer:Use pytorch device: cuda
INFO:mteb.evaluation.MTEB:

## Evaluating 2 tasks:


INFO:mteb.evaluation.MTEB:

********************** Evaluating STS22 **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STS22
INFO:mteb.abstasks.AbsTaskSTS:Task: STS22, split: test, language: en. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 199 sentences1...


Batches:   0%|          | 0/4 [00:00<?, ?it/s]

INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 199 sentences2...


Batches:   0%|          | 0/4 [00:00<?, ?it/s]

INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STS22 on test took 11.55 seconds
INFO:mteb.evaluation.MTEB:Scores: {'en': {'cos_sim': {'pearson': 0.485509947364836, 'spearman': 0.6161613971544603}, 'manhattan': {'pearson': 0.5348150640402276, 'spearman': 0.5936056205858613}, 'euclidean': {'pearson': 0.5152768391990737, 'spearman': 0.5910811029047897}}, 'evaluation_time': 11.55}
INFO:mteb.evaluation.MTEB:

********************** Evaluating STSBenchmark **********************
INFO:mteb.evaluation.MTEB:Loading dataset for STSBenchmark
INFO:mteb.abstasks.AbsTaskSTS:
Task: STSBenchmark, split: validation. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1500 sentences1...


Batches:   0%|          | 0/24 [00:00<?, ?it/s]

INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1500 sentences2...


Batches:   0%|          | 0/24 [00:00<?, ?it/s]

INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STSBenchmark on validation took 5.00 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.6049113878845497, 'spearman': 0.6185633984074522}, 'manhattan': {'pearson': 0.5470664545417403, 'spearman': 0.5743376557638655}, 'euclidean': {'pearson': 0.542056519608006, 'spearman': 0.5689490776912116}, 'evaluation_time': 5.0}
INFO:mteb.abstasks.AbsTaskSTS:
Task: STSBenchmark, split: test. Running...
INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1379 sentences1...


Batches:   0%|          | 0/22 [00:00<?, ?it/s]

INFO:mteb.evaluation.evaluators.STSEvaluator:Encoding 1379 sentences2...


Batches:   0%|          | 0/22 [00:00<?, ?it/s]

INFO:mteb.evaluation.evaluators.STSEvaluator:Evaluating...
INFO:mteb.evaluation.MTEB:Evaluation for STSBenchmark on test took 3.88 seconds
INFO:mteb.evaluation.MTEB:Scores: {'cos_sim': {'pearson': 0.5055691457273894, 'spearman': 0.4798726674235717}, 'manhattan': {'pearson': 0.4839596536539791, 'spearman': 0.4779540214185534}, 'euclidean': {'pearson': 0.4718242055922367, 'spearman': 0.46747634615269257}, 'evaluation_time': 3.88}


!python mteb_meta.py results/forge-s-instruct

In [None]:
---
tags:
- mteb
model-index:
- name: forge-s-instruct
  results:
  - task:
      type: STS
    dataset:
      type: mteb/sts22-crosslingual-sts
      name: MTEB STS22 (en)
      config: en
      split: test
      revision: 6d1ba47164174a496b7fa5d3569dae26a6813b80
    metrics:
    - type: cos_sim_pearson
      value: 48.5509947364836
    - type: cos_sim_spearman
      value: 61.616139715446025
    - type: euclidean_pearson
      value: 51.527683919907375
    - type: euclidean_spearman
      value: 59.10811029047897
    - type: manhattan_pearson
      value: 53.48150640402276
    - type: manhattan_spearman
      value: 59.36056205858613
  - task:
      type: STS
    dataset:
      type: mteb/stsbenchmark-sts
      name: MTEB STSBenchmark
      config: default
      split: test
      revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
    metrics:
    - type: cos_sim_pearson
      value: 50.55691457273894
    - type: cos_sim_spearman
      value: 47.98726674235717
    - type: euclidean_pearson
      value: 47.18242055922367
    - type: euclidean_spearman
      value: 46.747634615269256
    - type: manhattan_pearson
      value: 48.395965365397906
    - type: manhattan_spearman
      value: 47.79540214185534
---

## Embeddings and RAG


What is RAG? Retrieval-Augmented Generation: retrieve relevant documents first, then let the model generate an answer grounded in them.

The goals of RAG:
* reduce hallucination
* fact-checking and citation
* flexibility (no retraining)
* domain-specific knowledge

The cons of RAG:
* depends on the quality of semantic search
* requires existing data or a database
* potential latency issue (2-step process)
* context length limit:
   * input + retrieved + response < context length
   * gpt-3.5-turbo: 4096 tokens (roughly 3 pages)
   
   

Steps:

* Document retrieval
* Combine inputs (query) with retrieved documents
* Pass to decoder transformer
* Generate response
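
The first two steps, plus the context-length budget, can be sketched in plain Python. The similarity scores are hypothetical, the prompt wording is an arbitrary choice, and a whitespace word count stands in for a real tokenizer. The generation step is left out. `retrieve`, `build_prompt` and `fits_context` are illustrative helpers.

```python
def retrieve(scores, documents, k=2):
    """Return the k documents with the highest similarity scores."""
    ranked = sorted(zip(scores, documents), key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]


def build_prompt(query, retrieved):
    """Combine the query with the retrieved documents into one prompt."""
    context = "\n".join(f"- {doc}" for doc in retrieved)
    return f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}\nAnswer:"


def fits_context(prompt, max_response_tokens, context_length=4096):
    """Crude budget check: input + retrieved + response < context length."""
    return len(prompt.split()) + max_response_tokens < context_length


documents = [
    "Jobs run in first-in, first-out (FIFO) order.",
    "Frontier is an HPE Cray EX supercomputer.",
    "Each Frontier compute node has a 64-core AMD EPYC CPU.",
]
scores = [0.57, 0.70, 0.78]   # hypothetical cosine similarities to the query

query = "what is the cpu type on frontier"
top = retrieve(scores, documents, k=2)
prompt = build_prompt(query, top)
print(prompt)
print(fits_context(prompt, max_response_tokens=256))
```

The resulting `prompt` is what would be passed to the decoder. If the budget check fails, the usual options are retrieving fewer documents or shortening them.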


In [58]:
from angle_emb import AnglE

angle = AnglE.from_pretrained('WhereIsAI/UAE-Large-V1', pooling_strategy='cls').cuda()
query_embedding = angle.encode('what is the cpu type on frontier')
texts_embedding = angle.encode([
    """scheduling policy of frontier is that in a simple batch queue system, jobs run in a first-in, first-out (FIFO) order.""",
    """Frontier is a HPE Cray EX supercomputer located at the Oak Ridge Leadership Computing Facility. """,
    """Each Frontier compute node consists of [1x] 64-core AMD “Optimized 3rd Gen EPYC” CPU (with 2 hardware threads per physical core) with access to 512 GB of DDR4 memory. Each node also contains [4x] AMD MI250X, each with 2 Graphics Compute Dies (GCDs) for a total of 8 GCDs per node.""",
    """system interconnect of frontier is that the Frontier nodes are connected with [4x] HPE Slingshot 200 Gbps (25 GB/s) NICs providing a node-injection bandwidth of 800 Gbps (100 GB/s).""",
    """File systems of frontier is that Frontier is connected to Orion, a parallel filesystem based on Lustre and HPE ClusterStor, with a 679 PB usable namespace (/lustre/orion/).""",
])


print("Similarity:", util.dot_score(query_embedding, texts_embedding))
print("Similarity:", util.cos_sim(query_embedding, texts_embedding))



Similarity: tensor([[174.6098, 236.6645, 242.2771, 186.1550, 180.4875]])
Similarity: tensor([[0.5718, 0.7040, 0.7826, 0.5898, 0.5810]])
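
The two rows above rank the texts the same way here but live on very different scales, because the dot product also grows with vector length: dot = cosine x |query| x |text|. For the third text, 242.2771 / 0.7826 puts the product of the two norms at about 310, so these embeddings are not unit-normalized. A minimal plain-Python sketch with hypothetical 2-dim vectors:

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def cos_sim(a, b):
    """Cosine similarity: the dot product with vector lengths divided out."""
    return dot(a, b) / (norm(a) * norm(b))


def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]


# Hypothetical vectors: `long_doc` points the same way as `query` but is 10x longer.
query = [1.0, 2.0]
short_doc = [2.0, 1.0]
long_doc = [10.0, 20.0]

# Dot product rewards length as well as direction.
print(dot(query, short_doc), dot(query, long_doc))

# Cosine similarity only looks at direction.
print(cos_sim(query, short_doc), cos_sim(query, long_doc))

# After normalizing both vectors, the dot product equals the cosine similarity.
print(dot(normalize(query), normalize(long_doc)))
```

If the embeddings are normalized once at indexing time, the cheaper dot product can be used for search and gives the same ranking as cosine similarity.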
