# Warming Up - Computing Sentence Embeddings

---

## References
- [Computing Sentence Embeddings](https://www.sbert.net/examples/applications/computing-embeddings/README.html)

# 1. Background

This notebook walks through several techniques for working with sentence embeddings.

# 2. Usage Examples

## 2.1. Input Sequence Length

For Transformer models such as BERT, RoBERTa, and DistilBERT, runtime and memory requirements grow quadratically with the input length. This limits transformers to inputs of a certain length. A common limit for BERT and similar models is 512 word pieces, which corresponds to about 300-400 words (for English). Longer texts are truncated to the first word pieces up to that limit.
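
To make the quadratic growth concrete, the following minimal sketch counts the entries of a single self-attention score matrix, which holds one entry per pair of word pieces. The function name `attention_matrix_entries` is hypothetical, and the count ignores batch size, attention heads, and layers, so it illustrates only the scaling, not actual memory use.

```python
def attention_matrix_entries(seq_len: int) -> int:
    """Entries in one seq_len x seq_len self-attention score matrix."""
    return seq_len * seq_len


for seq_len in (128, 256, 512):
    print(seq_len, attention_matrix_entries(seq_len))

# Quadrupling the input length (128 -> 512) multiplies the entries by 16.
ratio = attention_matrix_entries(512) / attention_matrix_entries(128)
print("512 vs 128:", ratio)
```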

By default, the provided methods use a limit of 128 word pieces; longer inputs are truncated. A model can ship with its own setting, as the output below shows (256 for `all-MiniLM-L6-v2`). You can get and set the maximum sequence length like this:

In [1]:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('all-MiniLM-L6-v2')

print("Max Sequence Length:", model.max_seq_length)

#Change the length to 200
model.max_seq_length = 200

print("Max Sequence Length:", model.max_seq_length)

Max Sequence Length: 256
Max Sequence Length: 200
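
The effect of the limit can be sketched without loading a model. In this illustration, whitespace-separated words stand in for word pieces (a real tokenizer splits text differently and also adds special tokens), and `truncate_word_pieces` is a hypothetical helper, not part of sentence-transformers.

```python
def truncate_word_pieces(pieces, max_seq_length):
    """Keep only the first max_seq_length pieces; the rest is dropped."""
    return pieces[:max_seq_length]


text = " ".join(f"w{i}" for i in range(300))
pieces = text.split()  # stand-in for word-piece tokenization

kept = truncate_word_pieces(pieces, 200)
print(len(pieces), "->", len(kept))
print("last kept piece:", kept[-1])
```

Everything after the limit is discarded before the model sees the input, so it cannot influence the resulting embedding.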


## 2.2. Storing & Loading Embeddings

The easiest method is to use pickle to store pre-computed embeddings on disc and load them from disc later. This is especially useful if you need to encode a large set of sentences.

In [2]:
from sentence_transformers import SentenceTransformer
import pickle

model = SentenceTransformer('all-MiniLM-L6-v2')
sentences = ['This framework generates embeddings for each input sentence',
    'Sentences are passed as a list of string.', 
    'The quick brown fox jumps over the lazy dog.']


embeddings = model.encode(sentences)

#Store sentences & embeddings on disc
with open('embeddings.pkl', "wb") as fOut:
    pickle.dump({'sentences': sentences, 'embeddings': embeddings}, fOut, protocol=pickle.HIGHEST_PROTOCOL)

#Load sentences & embeddings from disc
with open('embeddings.pkl', "rb") as fIn:
    stored_data = pickle.load(fIn)
    stored_sentences = stored_data['sentences']
    stored_embeddings = stored_data['embeddings']
    print("stored_sentences: \n", stored_sentences)
    print("stored_embeddings shape: \n", stored_embeddings.shape)    

stored_sentences: 
 ['This framework generates embeddings for each input sentence', 'Sentences are passed as a list of string.', 'The quick brown fox jumps over the lazy dog.']
stored_embeddings shape: 
 (3, 384)
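
The same store-and-load pattern can be wrapped in a small caching helper that only recomputes when no matching file exists. This is a sketch under stated assumptions: `load_or_compute` and `fake_encode` are hypothetical names, and the stand-in encoder replaces `model.encode` so that the example runs without downloading a model.

```python
import os
import pickle
import tempfile


def load_or_compute(path, sentences, encode_fn):
    """Return cached embeddings if `path` holds them for exactly these
    sentences; otherwise compute them with `encode_fn` and store them."""
    if os.path.exists(path):
        with open(path, "rb") as f_in:
            cached = pickle.load(f_in)
        if cached["sentences"] == sentences:
            return cached["embeddings"]
    embeddings = encode_fn(sentences)
    with open(path, "wb") as f_out:
        pickle.dump({"sentences": sentences, "embeddings": embeddings},
                    f_out, protocol=pickle.HIGHEST_PROTOCOL)
    return embeddings


# Stand-in encoder (assumption) that records how often it is called.
calls = []


def fake_encode(sentences):
    calls.append(len(sentences))
    return [[float(len(s))] for s in sentences]


cache_path = os.path.join(tempfile.mkdtemp(), "embeddings.pkl")
sentences = ["first sentence", "second one"]

first = load_or_compute(cache_path, sentences, fake_encode)
second = load_or_compute(cache_path, sentences, fake_encode)  # read from disc
print(first == second, "encoder calls:", len(calls))
```

In real use, `model.encode` would take the place of `fake_encode`. Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.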


## 2.3. Multi-Process / Multi-GPU Encoding

You can encode input texts with more than one GPU (or with multiple processes on a CPU machine). For an example, see: computing_embeddings_mutli_gpu.py.

As shown below, all four GPUs are used.

![multiple_embedding.png](img/multiple_embedding.png)
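
Conceptually, multi-process encoding splits the input list into chunks, hands the chunks to worker processes, and puts the results back together in the original order. The sketch below shows only that chunking and reassembly in plain Python. `split_into_chunks`, the chunk size, and the round-robin assignment are illustrative assumptions, not the library's actual scheduling.

```python
def split_into_chunks(items, chunk_size):
    """Split a list into consecutive chunks of at most chunk_size items."""
    return [items[i:i + chunk_size] for i in range(0, len(items), chunk_size)]


sentences = ["This is sentence {}".format(i) for i in range(10)]
chunks = split_into_chunks(sentences, chunk_size=4)

num_workers = 4  # for example, one worker per GPU
for chunk_id, chunk in enumerate(chunks):
    worker = chunk_id % num_workers
    print(f"chunk {chunk_id} ({len(chunk)} sentences) -> worker {worker}")

# Joining the chunks in chunk order restores the input order,
# so result i still belongs to sentence i.
reassembled = [s for chunk in chunks for s in chunk]
print(reassembled == sentences)
```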

In [3]:
import IPython

IPython.Application.instance().kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}

In [4]:
%%time

from sentence_transformers import SentenceTransformer
#Create a large list of 50k sentences
sentences = ["This is sentence {}".format(i) for i in range(50000)]

#Define the model
model = SentenceTransformer('all-MiniLM-L6-v2')

#Start the multi-process pool on all available CUDA devices
pool = model.start_multi_process_pool()

#Compute the embeddings using the multi-process pool
emb = model.encode_multi_process(sentences, pool)
print("Embeddings computed. Shape:", emb.shape)

#Optional: Stop the processes in the pool
model.stop_multi_process_pool(pool)


huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
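
The warning names its own remedy: set the `TOKENIZERS_PARALLELISM` environment variable explicitly. A minimal sketch, assuming it runs at the top of the notebook before the tokenizer is first used; whether `false` or `true` is the better value depends on the workload.

```python
import os

# Set this before the tokenizers library is used, so that the forked
# worker processes inherit an explicit setting and the warning should
# no longer appear.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

print(os.environ["TOKENIZERS_PARALLELISM"])
```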

## 2.4. Sentence Embeddings with Transformers (English)

In [5]:
from transformers import AutoTokenizer, AutoModel
import torch


#Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
    sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return sum_embeddings / sum_mask



#Sentences we want sentence embeddings for
sentences = ['This framework generates embeddings for each input sentence',
             'Sentences are passed as a list of string.',
             'The quick brown fox jumps over the lazy dog.']

#Load AutoModel from huggingface model repository
tokenizer = AutoTokenizer.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")
model = AutoModel.from_pretrained("sentence-transformers/all-MiniLM-L6-v2")

#Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')

#Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

#Perform pooling. In this case, mean pooling
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("sentence_embeddings shape: \n", sentence_embeddings.shape)

sentence_embeddings shape: 
 torch.Size([3, 384])
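
To see why the attention mask matters, here is a tiny worked example of the same masked mean for one sentence, written with plain Python lists so the arithmetic is easy to follow. `mean_pooling_lists` is a hypothetical re-statement of `mean_pooling` above for illustration only, and the numbers are made up.

```python
def mean_pooling_lists(token_embeddings, attention_mask):
    """Masked mean over the tokens of one sentence.

    token_embeddings: list of per-token vectors (lists of floats)
    attention_mask:   list of 1 (real token) or 0 (padding)
    """
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    for vector, mask in zip(token_embeddings, attention_mask):
        for d in range(dim):
            summed[d] += vector[d] * mask  # padding rows contribute 0
    count = max(sum(attention_mask), 1e-9)  # same guard as torch.clamp above
    return [value / count for value in summed]


tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # last row is padding
mask = [1, 1, 0]

print(mean_pooling_lists(tokens, mask))  # [2.0, 3.0]

# A plain average over all rows is pulled far off by the padding row.
naive = [sum(column) / len(tokens) for column in zip(*tokens)]
print(naive)
```

Only the two real tokens enter the average, so the padded position has no effect on the sentence embedding.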


## 2.5. Sentence Embeddings with Transformers (Korean)

In [6]:
# from datasets import load_dataset
from transformers import (
    ElectraModel, 
    ElectraTokenizer, 
    ElectraForSequenceClassification, 
    Trainer, 
    TrainingArguments, 
    set_seed
)

In [7]:
from transformers import AutoTokenizer, AutoModel
import torch


#Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    sum_embeddings = torch.sum(token_embeddings * input_mask_expanded, 1)
    sum_mask = torch.clamp(input_mask_expanded.sum(1), min=1e-9)
    return sum_embeddings / sum_mask


sentences = ['엄마가 라면을 준비 해주셨는데, 너무 맛 있었다',
             '라면의 면발이 꼬들꼬들 하여서 입맛에 딱 좋았다',
             '하지만 라면이 좀 매워서, 다음 부터는 순한 맛으로 해달라고 애기 했다']

#Load AutoModel from huggingface model repository
# tokenizer = AutoTokenizer.from_pretrained('monologg/koelectra-small-v3-discriminator')
# model = AutoModel.from_pretrained("monologg/koelectra-small-v3-discriminator")

tokenizer = ElectraTokenizer.from_pretrained('monologg/koelectra-small-v3-discriminator')
model = ElectraModel.from_pretrained("monologg/koelectra-small-v3-discriminator")



#Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, max_length=128, return_tensors='pt')

#Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

#Perform pooling. In this case, mean pooling
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
print("sentence_embeddings shape: \n", sentence_embeddings.shape)

Some weights of the model checkpoint at monologg/koelectra-small-v3-discriminator were not used when initializing ElectraModel: ['discriminator_predictions.dense.weight', 'discriminator_predictions.dense_prediction.weight', 'discriminator_predictions.dense_prediction.bias', 'discriminator_predictions.dense.bias']
- This IS expected if you are initializing ElectraModel from the checkpoint of a model trained on another task or with another architecture (e.g. initializing a BertForSequenceClassification model from a BertForPreTraining model).
- This IS NOT expected if you are initializing ElectraModel from the checkpoint of a model that you expect to be exactly identical (initializing a BertForSequenceClassification model from a BertForSequenceClassification model).


sentence_embeddings shape: 
 torch.Size([3, 256])


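# 3. Kernel Restart

- Details on restarting the kernel are available at the link below. After opening it, see the section "3.커널 리스타팅" (kernel restarting) at the very bottom.
    - [Restart details](https://github.com/gonsoomoon-ml/NLP-HuggingFace-On-SageMaker/blob/main/1_NSMC-Classification/2_WarmingUp/0.1.warming_up_yelp_review.ipynb)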

In [8]:
import IPython

IPython.Application.instance().kernel.do_shutdown(True)

{'status': 'ok', 'restart': True}