In [None]:
pip install sentence-transformers


In [None]:
pip install torch


In [None]:
pip install scikit-learn

In [10]:
from sentence_transformers import SentenceTransformer
import torch

In [None]:
# Initialize the sentence Transformer model

model = SentenceTransformer('distilbert-base-uncased')
model 

In [12]:

## Code implementation for the sentence Transformer

def encode_sentences(sentences):
    """
    Encodes a list of sentences into fixed-length embeddings.
    Args:
        sentences (list of str): Sentences to encode.
    Returns:
        torch.Tensor: Embeddings tensor.
    """
    embedding = model.encode(sentences, convert_to_tensor=True)
    return embedding



### Testing with sample sentences

sample_sentences = [
    "Apple releases the new iPhone this fall.",
    "The movie received excellent reviews from critics.",
    "Climate change poses significant risks to biodiversity.",
]

embeddings = encode_sentences(sample_sentences)

for i, sentence in enumerate(sample_sentences):
    print(f'Sentence {i+1}                       :"{sentence}"')
    print(f'Embeddings of first 6 dimensions: {embeddings[i][:6].cpu().numpy()}....\n')

print(f'Length of the fixed-size embedding for each sentence: {len(embeddings[0])}')


Sentence 1                       :"Apple releases the new iPhone this fall."
Embeddings of first 6 dimensions: [-0.35644856 -0.07131387  0.40865803 -0.04822442  0.2467562  -0.19633032]....

Sentence 2                       :"The movie received excellent reviews from critics."
Embeddings of first 6 dimensions: [ 0.333448   -0.22270063  0.00373817  0.11186788 -0.15513083  0.09906107]....

Sentence 3                       :"Climate change poses significant risks to biodiversity."
Embeddings of first 6 dimensions: [ 0.16737251  0.24889603 -0.26863915  0.38667983  0.25752744 -0.44028884]....

Length of the fixed-size embedding for each sentence: 768


### Note: Only the first six dimensions of each 768-dimensional embedding are displayed for clarity.

### 3. Architectural Choices Discussion

#### **Transformer Backbone**

- **Choice:** `distilbert-base-uncased`
- **Reasoning:** DistilBERT is smaller and faster than BERT, which speeds up both training and inference.

#### **Pooling Strategy**

- **Choice:** Mean pooling (what `SentenceTransformer` applies by default when it wraps a plain Hugging Face checkpoint such as this one)
- **Reasoning:** Averages token embeddings to create a fixed-length sentence embedding, balancing simplicity and effectiveness in capturing semantic information.
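
To make the pooling step concrete, here is a minimal sketch of masked mean pooling in plain PyTorch. It is an illustration of the idea rather than the library's internal code: the `mean_pool` helper and the toy tensors are hypothetical, and the attention mask is assumed to mark real tokens with 1 and padding with 0.

```python
import torch


def mean_pool(token_embeddings: torch.Tensor, attention_mask: torch.Tensor) -> torch.Tensor:
    """Average token embeddings over real (non-padding) tokens.

    token_embeddings: (batch, seq_len, hidden)
    attention_mask:   (batch, seq_len), 1 for real tokens and 0 for padding
    """
    # Expand the mask so it broadcasts over the hidden dimension.
    mask = attention_mask.unsqueeze(-1).to(token_embeddings.dtype)  # (batch, seq_len, 1)
    summed = (token_embeddings * mask).sum(dim=1)                   # (batch, hidden)
    counts = mask.sum(dim=1).clamp(min=1e-9)                        # avoid division by zero
    return summed / counts


# Toy input (hypothetical values): 2 sentences, 3 token positions, hidden size 4.
tokens = torch.tensor([
    [[1.0, 2.0, 3.0, 4.0], [3.0, 4.0, 5.0, 6.0], [9.0, 9.0, 9.0, 9.0]],  # last token is padding
    [[2.0, 2.0, 2.0, 2.0], [4.0, 4.0, 4.0, 4.0], [6.0, 6.0, 6.0, 6.0]],
])
mask = torch.tensor([[1, 1, 0], [1, 1, 1]])

pooled = mean_pool(tokens, mask)
print(pooled.shape)  # one fixed-length vector per sentence
```

Whatever the sentence length, the result has one row per sentence and one column per hidden dimension, which is why every sentence above ends up with an embedding of the same size.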

#### **Framework**

- **Choice:** PyTorch
- **Reasoning:** Offers dynamic computation graphs and is widely supported in the NLP community, which makes integration and customization straightforward.

#### **Pre-trained Model Utilization**

- **Choice:** A pre-trained `distilbert-base-uncased` model
- **Reasoning:** Transfer learning reuses what the model learned during pre-training on large text corpora, reducing the compute and training data needed for this task.

#### **Embedding Output Configuration**

- **Choice:** Outputs embeddings as PyTorch tensors
- **Reasoning:** Ensures compatibility with downstream tasks such as Named Entity Recognition (NER) or Sentiment Analysis, where tensor operations are essential.
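
As a rough sketch of how tensor outputs feed a downstream task, the snippet below passes sentence embeddings through a small classification head. The number of classes and the random placeholder embeddings are assumptions made for illustration; in the notebook, the tensor returned by `encode_sentences` would take the placeholder's place, and the head would still need to be trained.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

embedding_dim = 768  # matches the embedding length printed above
num_classes = 3      # hypothetical, e.g. negative / neutral / positive

# Placeholder standing in for the tensor returned by encode_sentences(...).
sentence_embeddings = torch.randn(3, embedding_dim)

# An untrained linear head; its weights are random until fine-tuned.
head = nn.Linear(embedding_dim, num_classes)

logits = head(sentence_embeddings)     # (num_sentences, num_classes)
probs = torch.softmax(logits, dim=-1)  # each row sums to 1
print(probs.shape)
```

Because the embeddings are already PyTorch tensors, they can be passed to `nn.Module` layers directly, with no conversion step in between.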
