Neighbor-aware embeddings for contextual disambiguation #10

@luigiagent

Problem

HashingEmbedder embeds each element independently, so two identically named elements in different page regions produce identical embeddings. That loses meaning: "Search" next to a text input is not the same thing as "Search" next to a results heading.

Proposal

Include neighboring elements as weak signals:

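// vectorizeWithContext embeds el, then blends in each non-nil neighbor's
// embedding at a small weight before re-normalizing.
func (h *HashingEmbedder) vectorizeWithContext(
    el ElementDescriptor,
    prev, next *ElementDescriptor,
) []float32 {
    const neighborWeight = 0.1 // proposed default; should become configurable

    vec := h.vectorize(el.Composite())
    for _, n := range []*ElementDescriptor{prev, next} {
        if n == nil {
            continue // a nil neighbor contributes nothing
        }
        nVec := h.vectorize(n.Composite())
        for i := range vec {
            vec[i] += nVec[i] * neighborWeight
        }
    }
    h.normalize(vec)
    return vec
}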
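
To make the proposal easier to try in isolation, here is a minimal self-contained sketch with the neighbor weight as a field instead of a hard-coded constant. Everything around vectorizeWithContext is an assumption for illustration: the ElementDescriptor fields, the Composite() format, the FNV-based token hashing, and the NeighborWeight field are hypothetical stand-ins, not the project's actual implementation.

```go
package main

import (
	"hash/fnv"
	"math"
	"strings"
)

// ElementDescriptor is a hypothetical stand-in for the project's type;
// only the fields needed for this sketch are included.
type ElementDescriptor struct {
	Role string
	Name string
}

// Composite joins the fields into the text that gets hashed (assumed format).
func (e ElementDescriptor) Composite() string {
	return e.Role + " " + e.Name
}

// HashingEmbedder is a toy feature-hashing embedder, not the real one.
type HashingEmbedder struct {
	Dim            int     // number of hash buckets; must be > 0
	NeighborWeight float32 // per-neighbor influence; 0.1 mirrors the proposed default
}

// vectorize hashes each lowercased token into one of Dim buckets
// and L2-normalizes the resulting count vector.
func (h *HashingEmbedder) vectorize(text string) []float32 {
	vec := make([]float32, h.Dim)
	for _, tok := range strings.Fields(strings.ToLower(text)) {
		hasher := fnv.New32a()
		hasher.Write([]byte(tok))
		vec[hasher.Sum32()%uint32(h.Dim)]++
	}
	normalize(vec)
	return vec
}

// normalize scales vec to unit L2 length in place; zero vectors are left as-is.
func normalize(vec []float32) {
	var sum float64
	for _, v := range vec {
		sum += float64(v) * float64(v)
	}
	if sum == 0 {
		return
	}
	inv := float32(1 / math.Sqrt(sum))
	for i := range vec {
		vec[i] *= inv
	}
}

// vectorizeWithContext adds each non-nil neighbor's vector, scaled by
// NeighborWeight, to the element's own vector and re-normalizes.
func (h *HashingEmbedder) vectorizeWithContext(el ElementDescriptor, prev, next *ElementDescriptor) []float32 {
	vec := h.vectorize(el.Composite())
	for _, n := range []*ElementDescriptor{prev, next} {
		if n == nil {
			continue
		}
		nVec := h.vectorize(n.Composite())
		for i := range vec {
			vec[i] += nVec[i] * h.NeighborWeight
		}
	}
	normalize(vec)
	return vec
}
```

With this shape, calling vectorizeWithContext(el, nil, nil) should reduce to vectorize(el.Composite()) up to floating-point rounding, and setting NeighborWeight to 0 disables the context signal without a separate code path.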

Inspired By

  • mgrep: context-aware parsing where chunks carry surrounding context
  • Word2Vec/CBOW: words defined by their context

Acceptance Criteria

  • EmbeddingMatcher.Find() accepts or computes neighbor context
  • Neighbor influence configurable (default 10%)
  • "Search" near textbox gets different embedding than "Search" near heading
  • Nil neighbors = current behavior
  • Stays sub-millisecond per element
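
The context-sensitivity criterion above could be checked with a small helper like the following sketch. EmbedFunc and checkContextSensitivity are hypothetical names introduced here; the real check would call EmbeddingMatcher or HashingEmbedder through whatever adapter fits their actual signatures, and the similarity threshold is an assumed value.

```go
package main

import (
	"errors"
	"math"
)

// EmbedFunc is a hypothetical adapter: it embeds element text el given
// optional neighbor texts ("" means no neighbor on that side).
type EmbedFunc func(el, prev, next string) []float32

// cosine returns the cosine similarity of a and b, or 0 if either has zero norm.
// It assumes len(a) == len(b).
func cosine(a, b []float32) float64 {
	var dot, na, nb float64
	for i := range a {
		dot += float64(a[i]) * float64(b[i])
		na += float64(a[i]) * float64(a[i])
		nb += float64(b[i]) * float64(b[i])
	}
	if na == 0 || nb == 0 {
		return 0
	}
	return dot / (math.Sqrt(na) * math.Sqrt(nb))
}

// checkContextSensitivity returns an error if el embeds (almost) identically
// when preceded by prevA versus prevB.
func checkContextSensitivity(embed EmbedFunc, el, prevA, prevB string) error {
	a := embed(el, prevA, "")
	b := embed(el, prevB, "")
	if len(a) != len(b) {
		return errors.New("embeddings have different dimensions")
	}
	if cosine(a, b) > 1-1e-6 { // assumed threshold for "same embedding"
		return errors.New("embedding does not depend on neighbor context")
	}
	return nil
}
```

The latency criterion is probably better covered by a standard Go benchmark (a `BenchmarkXxx(b *testing.B)` function) than by an assertion inside a unit test, since wall-clock thresholds tend to be flaky on shared CI machines.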
