<a href="https://colab.research.google.com/github/gnoejh/ict1022/blob/main/Components/embedding.ipynb" target="_parent"><img src="https://colab.research.google.com/assets/colab-badge.svg" alt="Open In Colab"/></a>

# Components of Neural Networks

## Neural Network Embedding

## Overview

## Understanding Embeddings

Embeddings are dense vector representations of discrete variables, allowing us to represent words, sentences, or any other entities in a continuous vector space.

## Concept of Mapping to Embedding Space

```mermaid
graph LR
    subgraph Input Space
        A["cat"] 
        B["dog"]
        C["kitten"]
    end
    
    subgraph Embedding Space
        direction LR
        D[["[0.2, 0.8]"]] 
        E[["[0.3, 0.6]"]] 
        F[["[0.1, 0.9]"]] 
    end
    
    A --> D
    B --> E
    C --> F
    
    style Input Space fill:#f9f,stroke:#333,stroke-width:2px
    style Embedding Space fill:#bbf,stroke:#333,stroke-width:2px
```

The diagram shows how discrete input items (words) are mapped to continuous vector representations in the embedding space. Similar concepts (like 'cat' and 'kitten') are mapped to nearby points in the embedding space.
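
To make "nearby" concrete, the short sketch below measures Euclidean distances between the toy vectors from the diagram. The values are the illustrative ones shown above, not the output of a trained model, and the helper name `euclidean` is introduced here only for illustration.

```python
import numpy as np

# Toy 2-D vectors copied from the diagram above (illustrative, not learned).
embeddings = {
    "cat": np.array([0.2, 0.8]),
    "dog": np.array([0.3, 0.6]),
    "kitten": np.array([0.1, 0.9]),
}

def euclidean(a, b):
    """Straight-line distance between two embedding vectors."""
    return float(np.linalg.norm(a - b))

d_cat_kitten = euclidean(embeddings["cat"], embeddings["kitten"])
d_cat_dog = euclidean(embeddings["cat"], embeddings["dog"])

print(f"cat-kitten: {d_cat_kitten:.3f}")
print(f"cat-dog:    {d_cat_dog:.3f}")
```

With these values the cat-kitten distance is smaller than the cat-dog distance, which is the sense in which the two related words sit closer together.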


## Concept of Neural Network Embedding Process

The embedding process can be represented as a neural network transformation:

```mermaid
graph LR
    subgraph Input Layer
        X["One-hot vector<br/>[1,0,0,...,0]"] 
    end
    
    subgraph Hidden Layer
        H1["W·X + b"]
    end
    
    subgraph Output Layer
        Y["Embedding vector<br/>[0.2, 0.8, ..., 0.3]"]
    end
    
    X --> |"W∈ℝ^(d×v)"| H1
    H1 --> |"activation"| Y
    
    style Input Layer fill:#f9f,stroke:#333,stroke-width:2px
    style Hidden Layer fill:#bbf,stroke:#333,stroke-width:2px
    style Output Layer fill:#bfb,stroke:#333,stroke-width:2px
```

Where:
- $X \in \{0,1\}^v$ is the one-hot input vector (vocabulary size $v$)
- $W \in \mathbb{R}^{d\times v}$ is the weight matrix ($d$ is embedding dimension)
- $b \in \mathbb{R}^d$ is the bias vector
- The embedding vector $Y = f(WX + b)$ where $f$ is an activation function

This transformation learns to map discrete tokens to continuous vectors while preserving semantic relationships.
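
A useful consequence of the one-hot input is that the product $WX$ simply selects one column of $W$. The sketch below checks this numerically, assuming the simplest case of zero bias and an identity activation; the sizes `v = 5` and `d = 3` and the random weights are placeholders, not values from a trained model.

```python
import numpy as np

v, d = 5, 3                      # hypothetical vocabulary size and embedding dimension
rng = np.random.default_rng(0)
W = rng.normal(size=(d, v))      # weight matrix W in R^{d x v}, as defined above

token_id = 2                     # index of the token to embed
x = np.zeros(v)
x[token_id] = 1.0                # one-hot input vector X

via_matmul = W @ x               # W·X: full matrix-vector product
via_lookup = W[:, token_id]      # direct column lookup, no multiplication

print(np.allclose(via_matmul, via_lookup))
```

This equivalence is why embedding layers are usually implemented as a table lookup rather than an explicit matrix product.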


## Embedding Theory

An embedding is a mathematical function that maps discrete objects (words, sentences, images) into continuous vector spaces while preserving semantic relationships:

$f: \mathcal{X} \rightarrow \mathbb{R}^n$

where $\mathcal{X}$ is the input space (the set of discrete objects) and $n$ is the dimensionality of the embedding space.


# Neural Network–Based Embedding Methods

Below is an organized summary of embedding methods that use neural networks.

---

## 1. Fully Connected (FC) Network Embeddings
- **Description:**  
  Feed-forward networks with one or more hidden layers that transform input data into a lower-dimensional, dense representation.
- **Use Cases:**  
  - Structured or tabular data  
  - Intermediate representations in supervised tasks

---

## 2. Autoencoder (AE) Embeddings
- **Description:**  
  Neural networks that learn to compress data into a latent space (encoder) and then reconstruct the original input (decoder).  
- **Variations:**  
  - **Denoising Autoencoders:** Learn robust features by reconstructing clean inputs from corrupted ones.  
  - **Variational Autoencoders (VAEs):** Impose a probabilistic structure on the latent space for smooth and continuous embeddings.
- **Use Cases:**  
  - Dimensionality reduction  
  - Unsupervised feature learning  
  - Data compression

---

## 3. Transformer-Based Embeddings
- **Description:**  
  Models built on self-attention mechanisms that generate context-aware representations for sequences.  
- **Key Components:**  
  - **Token Embeddings:** Convert tokens to dense vectors.  
  - **Positional Embeddings:** Encode the order of tokens in the sequence.
- **Use Cases:**  
  - Natural language processing  
  - Context-dependent representation learning
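
The two key components can be sketched as a table lookup plus a position-dependent offset. The example assumes the fixed sinusoidal scheme and an even `d_model`; many models learn their positional embeddings instead. The token table here is random rather than trained, and all sizes are hypothetical.

```python
import numpy as np

def sinusoidal_positions(seq_len, d_model):
    """Return a (seq_len, d_model) matrix of sinusoidal position encodings."""
    positions = np.arange(seq_len)[:, None]           # (seq_len, 1)
    dims = np.arange(0, d_model, 2)[None, :]          # even dimension indices
    angles = positions / np.power(10000.0, dims / d_model)
    pe = np.zeros((seq_len, d_model))
    pe[:, 0::2] = np.sin(angles)                      # even dims: sine
    pe[:, 1::2] = np.cos(angles)                      # odd dims: cosine
    return pe

vocab_size, d_model = 10, 8                           # hypothetical sizes
rng = np.random.default_rng(0)
token_table = rng.normal(size=(vocab_size, d_model))  # stand-in for a learned table

token_ids = np.array([3, 1, 3])                       # token 3 appears at positions 0 and 2
token_emb = token_table[token_ids]                    # (3, d_model) lookup
model_input = token_emb + sinusoidal_positions(len(token_ids), d_model)

print(model_input.shape)
```

The repeated token has the same token embedding at positions 0 and 2, but its rows in `model_input` differ because the positional term depends on where the token occurs.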

---

## 4. Convolutional Neural Network (CNN) Embeddings
- **Description:**  
  Networks using convolutional layers to capture spatial hierarchies in data, especially images.
- **Use Cases:**  
  - Image classification  
  - Object detection  
  - Other vision-related tasks

---

## 5. Recurrent Neural Network (RNN) Embeddings
- **Description:**  
  Networks designed for sequential data processing, using hidden states to capture temporal dependencies.  
- **Architectures:**  
  - LSTMs (Long Short-Term Memory)  
  - GRUs (Gated Recurrent Units)
- **Use Cases:**  
  - Time series analysis  
  - Speech recognition  
  - Sequential modeling in NLP
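
As a rough illustration of how a hidden state can serve as a sequence embedding, the sketch below runs a vanilla (Elman) RNN rather than an LSTM or GRU and keeps the final hidden state. The weights are random and untrained, and the function name `rnn_embed` and all sizes are assumptions made for this example.

```python
import numpy as np

def rnn_embed(sequence, W_xh, W_hh, b_h):
    """Run a vanilla (Elman) RNN over `sequence` and return the final hidden state."""
    h = np.zeros(W_hh.shape[0])
    for x_t in sequence:                       # one update per time step
        h = np.tanh(W_xh @ x_t + W_hh @ h + b_h)
    return h

input_dim, hidden_dim = 4, 6                   # hypothetical sizes
rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.5, size=(hidden_dim, input_dim))
W_hh = rng.normal(scale=0.5, size=(hidden_dim, hidden_dim))
b_h = np.zeros(hidden_dim)

short_seq = rng.normal(size=(3, input_dim))    # 3 time steps
long_seq = rng.normal(size=(7, input_dim))     # 7 time steps

emb_short = rnn_embed(short_seq, W_xh, W_hh, b_h)
emb_long = rnn_embed(long_seq, W_xh, W_hh, b_h)
print(emb_short.shape, emb_long.shape)
```

Sequences of different lengths map to vectors of the same size, which is what makes the final hidden state usable as a fixed-length embedding.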

---

## 6. Graph Neural Network (GNN) Embeddings
- **Description:**  
  Models that learn representations for nodes, edges, or entire graphs by leveraging the relational structure of graph data.
- **Examples:**  
  - Graph Convolutional Networks (GCNs)  
  - GraphSAGE
- **Use Cases:**  
  - Social network analysis  
  - Recommendation systems  
  - Any tasks involving relational data
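
A single GCN-style propagation step can be sketched in a few lines. The example assumes the commonly used symmetric normalization with self-loops, a tiny hand-written path graph, and random untrained weights. The function name `gcn_layer` is introduced here for illustration only.

```python
import numpy as np

def gcn_layer(A, H, W):
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 H W)."""
    A_hat = A + np.eye(A.shape[0])                 # add self-loops
    deg = A_hat.sum(axis=1)                        # node degrees including self-loop
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    A_norm = D_inv_sqrt @ A_hat @ D_inv_sqrt       # symmetric normalization
    return np.maximum(A_norm @ H @ W, 0.0)         # aggregate, transform, ReLU

# Hypothetical 4-node undirected path graph: 0 - 1 - 2 - 3
A = np.array([
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
], dtype=float)

H = np.eye(4)                                      # one-hot node features
rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))                        # untrained weights, 2-D embeddings

node_embeddings = gcn_layer(A, H, W)
print(node_embeddings.shape)
```

Each row of `node_embeddings` mixes a node's own features with those of its neighbors; stacking several such layers lets information travel further across the graph.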

---

## 7. Metric Learning & Contrastive Embeddings
- **Description:**  
  Techniques that optimize embedding spaces by enforcing similarity between related instances and dissimilarity between unrelated ones.
- **Techniques:**  
  - **Siamese Networks/Triplet Loss:** Learn embeddings that bring similar items closer while pushing dissimilar items apart.  
  - **Contrastive Learning:** Methods like SimCLR compare different augmented views of the same instance.
- **Use Cases:**  
  - Face recognition  
  - Image retrieval  
  - Similarity-based tasks
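
The triplet objective mentioned above can be written out directly. This is a minimal sketch assuming Euclidean distance and a margin of 1.0; some formulations use squared distances instead. The example points are made up for illustration.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a, p) - d(a, n) + margin) with Euclidean distance d."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return float(max(0.0, d_pos - d_neg + margin))

anchor = np.array([0.0, 0.0])
positive = np.array([0.1, 0.0])        # same class as the anchor
far_negative = np.array([3.0, 0.0])    # already well separated
near_negative = np.array([0.5, 0.0])   # too close to the anchor

loss_easy = triplet_loss(anchor, positive, far_negative)
loss_hard = triplet_loss(anchor, positive, near_negative)
print(loss_easy, loss_hard)
```

The loss is zero when the negative is already farther away than the positive by at least the margin, so only the "hard" triplet contributes a gradient during training.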

---

## 8. Self-Supervised Learning Embeddings
- **Description:**  
  Models that learn useful representations from unlabeled data by solving pretext tasks or predicting parts of the input.
- **Examples:**  
  - **Vision:** BYOL, DINO  
  - **NLP:** Masked language modeling
- **Use Cases:**  
  - Pre-training for downstream tasks  
  - Enhancing robustness and generalization
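
To illustrate the masked language modeling pretext task, the sketch below builds one training example by hiding random tokens and recording them as prediction targets. It is a simplification: the `[MASK]` symbol, the masking rate, and the helper name `mask_tokens` are assumptions, and real pipelines operate on tokenizer IDs and often mix in other corruption rules.

```python
import random

MASK = "[MASK]"   # placeholder token; the actual symbol depends on the tokenizer

def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Replace a random subset of tokens with MASK and record the originals as targets."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[i] = tok        # the model is trained to predict this token
        else:
            masked.append(tok)
    return masked, targets

tokens = "the cat is sleeping on the mat".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3)
print(masked)
print(targets)
```

No labels are needed: the supervision signal comes entirely from the hidden parts of the input itself.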

---

## 9. Hybrid and Domain-Specific Embeddings
- **Description:**  
  Methods that combine multiple modalities or tailor embeddings to specific domains.
- **Examples:**  
  - **Multimodal Embeddings:** Joint embeddings of text and images, e.g., CLIP.  
  - **Adversarial Networks:** Using intermediate features (e.g., from the discriminator) as embeddings.
- **Use Cases:**  
  - Multimodal data integration  
  - Domain-specific representation learning

---


## Sentence Embeddings

In [9]:
# Import the sentence-transformers library
from sentence_transformers import SentenceTransformer

# Load the pretrained sentence embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Create embeddings for sentences
sentences = [
    'This is a sample sentence.',
    'Another different sentence.',
    'This sentence is similar to the first one.'
]

embeddings = model.encode(sentences)
print(f"Shape of embeddings: {embeddings.shape}")

Shape of embeddings: (3, 384)


In [10]:
import numpy as np
from sklearn.metrics.pairwise import cosine_similarity

def compute_similarity(sent1, sent2):
    emb1 = model.encode([sent1])
    emb2 = model.encode([sent2])
    return cosine_similarity(emb1, emb2)[0][0]

# Example similarities
pairs = [
    ("I love programming", "I enjoy coding"),
    ("The weather is nice", "It's a beautiful day"),
    ("The weather is nice", "Python is awesome")
]

for s1, s2 in pairs:
    sim = compute_similarity(s1, s2)
    print(f"Similarity between '{s1}' and '{s2}': {sim:.3f}")

Similarity between 'I love programming' and 'I enjoy coding': 0.817
Similarity between 'The weather is nice' and 'It's a beautiful day': 0.471
Similarity between 'The weather is nice' and 'Python is awesome': 0.179


In [11]:
def semantic_search(query, corpus, top_k=3):
    query_embedding = model.encode([query])
    corpus_embeddings = model.encode(corpus)
    
    similarities = cosine_similarity(query_embedding, corpus_embeddings)[0]
    top_results = np.argsort(similarities)[-top_k:][::-1]
    
    return [(corpus[i], similarities[i]) for i in top_results]

# Example corpus
corpus = [
    "Machine learning is fascinating",
    "Deep neural networks are powerful",
    "Natural language processing is amazing",
    "The cat is sleeping on the mat",
    "AI can solve complex problems"
]

query = "AI and machine learning"
results = semantic_search(query, corpus)

print(f"Query: {query}\n")
for doc, score in results:
    print(f"Score: {score:.3f} | Document: {doc}")

Query: AI and machine learning

Score: 0.651 | Document: Machine learning is fascinating
Score: 0.582 | Document: AI can solve complex problems
Score: 0.485 | Document: Deep neural networks are powerful


In [12]:
from sklearn.cluster import KMeans

# Example documents
documents = [
    "Machine learning models need data",
    "AI requires computational power",
    "Cats are wonderful pets",
    "Dogs make loyal companions",
    "Neural networks learn patterns",
    "Kittens play with yarn"
]

# Create embeddings
doc_embeddings = model.encode(documents)

# Cluster documents
n_clusters = 2
kmeans = KMeans(n_clusters=n_clusters, random_state=42)
clusters = kmeans.fit_predict(doc_embeddings)

# Print clusters
for i in range(n_clusters):
    print(f"\nCluster {i}:")
    for j, doc in enumerate(documents):
        if clusters[j] == i:
            print(f"- {doc}")


Cluster 0:
- Cats are wonderful pets
- Dogs make loyal companions
- Kittens play with yarn

Cluster 1:
- Machine learning models need data
- AI requires computational power
- Neural networks learn patterns






## Stock Embedding

In [13]:
# Install required packages if not already installed
%pip install yfinance plotly scikit-learn

import yfinance as yf
import pandas as pd
import torch
import torch.nn as nn
import torch.optim as optim
import numpy as np
from sklearn.preprocessing import MinMaxScaler
import plotly.express as px

tickers = ["AAPL", "MSFT", "GOOGL", "AMZN", "TSLA", 
           "NVDA", "AVGO", "AMD", "QCOM", "IBM", 
           "IONQ", "RGTI", "PLTR", "ACHR", "RDW",
           "BKSY", "VUZI", "NNDM", "SNDL", "TLRY"]

# Download closing prices for each ticker
data = {}
for ticker in tickers:
    stock = yf.Ticker(ticker)
    df = stock.history(period="1y")
    data[ticker] = df["Close"]

# Combine data into a single DataFrame (dates as rows, tickers as columns)
df_stocks = pd.DataFrame(data)
df_stocks = df_stocks.ffill().dropna()  # simple cleaning
print(df_stocks.head())

# Scale each column (stock) independently
scaler = MinMaxScaler()
data_scaled = scaler.fit_transform(df_stocks)

# Transpose so each row represents one stock’s time series
# df_stocks.shape: (time_steps, tickers) -> (tickers, time_steps)
data_scaled = data_scaled.T

# Convert to PyTorch tensor
data_tensor = torch.tensor(data_scaled, dtype=torch.float32)
print("Data tensor shape (stocks, time_series_length):", data_tensor.shape)



                                 AAPL        MSFT       GOOGL        AMZN  \
Date                                                                        
2024-02-27 00:00:00-05:00  181.771698  404.392639  138.378372  173.539993   
2024-02-28 00:00:00-05:00  180.567383  404.630798  135.887405  173.160004   
2024-02-29 00:00:00-05:00  179.900543  410.505951  137.959885  176.759995   
2024-03-01 00:00:00-05:00  178.815659  412.351837  136.644653  178.220001   
2024-03-04 00:00:00-05:00  174.277100  411.776245  132.868347  177.580002   

                                 TSLA       NVDA        AVGO         AMD  \
Date                                                                       
2024-02-27 00:00:00-05:00  199.729996  78.678688  127.900284  178.000000   
2024-02-28 00:00:00-05:00  202.039993  77.640991  127.228340  176.539993   
2024-02-29 00:00:00-05:00  201.880005  79.089577  128.320618  192.529999   
2024-03-01 00:00:00-05:00  202.639999  82.255676  138.057480  202.639999   
[output truncated]

In [16]:
class Autoencoder(nn.Module):
    def __init__(self, input_dim, embedding_dim=3):
        super(Autoencoder, self).__init__()
        # Encoder: compress input into embedding_dim
        self.encoder = nn.Sequential(
            nn.Linear(input_dim, 64),
            nn.ReLU(),
            nn.Linear(64, 32),
            nn.ReLU(),
            nn.Linear(32, embedding_dim)
        )
        # Decoder: reconstruct input from embedding
        self.decoder = nn.Sequential(
            nn.Linear(embedding_dim, 32),
            nn.ReLU(),
            nn.Linear(32, 64),
            nn.ReLU(),
            nn.Linear(64, input_dim)
        )
        
    def forward(self, x):
        z = self.encoder(x)
        reconstructed = self.decoder(z)
        return reconstructed, z

input_dim = data_tensor.shape[1]  # number of time steps
embedding_dim = 3
# Use a distinct name so the SentenceTransformer `model` from the earlier cells is not overwritten
autoencoder = Autoencoder(input_dim, embedding_dim)
print(autoencoder)

# Training parameters
epochs = 1000
learning_rate = 1e-3

criterion = nn.MSELoss()
optimizer = optim.Adam(autoencoder.parameters(), lr=learning_rate)

# Training loop (full batch: all stocks pass through the network at every step)
autoencoder.train()
for epoch in range(epochs):
    optimizer.zero_grad()
    reconstructed, _ = autoencoder(data_tensor)
    loss = criterion(reconstructed, data_tensor)
    loss.backward()
    optimizer.step()
    
    if (epoch+1) % 100 == 0:
        print(f"Epoch {epoch+1}/{epochs}, Loss: {loss.item():.6f}")

# Evaluation: extract the learned embeddings without tracking gradients
autoencoder.eval()
with torch.no_grad():
    _, embeddings = autoencoder(data_tensor)
embeddings = embeddings.numpy()
print("Embeddings shape:", embeddings.shape)

# Create a DataFrame for plotting
df_embeddings = pd.DataFrame(embeddings, columns=["Dim1", "Dim2", "Dim3"])
df_embeddings["Ticker"] = tickers  # Make sure the order matches

# Plot using Plotly
fig = px.scatter_3d(
    df_embeddings,
    x="Dim1", y="Dim2", z="Dim3",
    text="Ticker",
    title="3D Stock Embeddings (Autoencoder)",
    height=1000,
)
fig.show()


Autoencoder(
  (encoder): Sequential(
    (0): Linear(in_features=251, out_features=64, bias=True)
    (1): ReLU()
    (2): Linear(in_features=64, out_features=32, bias=True)
    (3): ReLU()
    (4): Linear(in_features=32, out_features=3, bias=True)
  )
  (decoder): Sequential(
    (0): Linear(in_features=3, out_features=32, bias=True)
    (1): ReLU()
    (2): Linear(in_features=32, out_features=64, bias=True)
    (3): ReLU()
    (4): Linear(in_features=64, out_features=251, bias=True)
  )
)
Epoch 100/1000, Loss: 0.022534
Epoch 200/1000, Loss: 0.010809
Epoch 300/1000, Loss: 0.008030
Epoch 400/1000, Loss: 0.007145
Epoch 500/1000, Loss: 0.005608
Epoch 600/1000, Loss: 0.004437
Epoch 700/1000, Loss: 0.003480
Epoch 800/1000, Loss: 0.002709
Epoch 900/1000, Loss: 0.002147
Epoch 1000/1000, Loss: 0.001686
Embeddings shape: (20, 3)
