# Embeddings Exploration - Java

Java equivalent of the Python embeddings notebook. It uses **LangChain4j**, which ships the same `all-MiniLM-L6-v2` model behind a one-line API: Java's answer to `sentence-transformers`.

## 1. Setup

One dependency: it brings in ONNX Runtime, the model weights, the tokenizer, and the pooling logic.

In [38]:
%dependency /add dev.langchain4j:langchain4j-embeddings-all-minilm-l6-v2:1.0.0-beta1
%dependency /resolve

Adding dependency dev.langchain4j:langchain4j-embeddings-all-minilm-l6-v2:1.0.0-beta1
Solving dependencies
Resolved artifacts count: 14
Add to classpath: /Users/sukhdeepsingh/.m2/repository/io/github/padreati/rapaio-jupyter-kernel/2.2.0/mima_cache/dev/langchain4j/langchain4j-embeddings-all-minilm-l6-v2/1.0.0-beta1/langchain4j-embeddings-all-minilm-l6-v2-1.0.0-beta1.jar
Add to classpath: /Users/sukhdeepsingh/.m2/repository/io/github/padreati/rapaio-jupyter-kernel/2.2.0/mima_cache/dev/langchain4j/langchain4j-embeddings/1.0.0-beta1/langchain4j-embeddings-1.0.0-beta1.jar
Add to classpath: /Users/sukhdeepsingh/.m2/repository/io/github/padreati/rapaio-jupyter-kernel/2.2.0/mima_cache/dev/langchain4j/langchain4j-core/1.0.0-beta1/langchain4j-core-1.0.0-beta1.jar
Add to classpath: /Users/sukhdeepsingh/.m2/repository/io/github/padreati/rapaio-jupyter-kernel/2.2.0/mima_cache/org/slf4j/slf4j-api/2.0.16/slf4j-api-2.0.16.jar

In [39]:
import dev.langchain4j.model.embedding.onnx.allminilml6v2.AllMiniLmL6V2EmbeddingModel;
import dev.langchain4j.data.embedding.Embedding;
import java.util.*;

var model = new AllMiniLmL6V2EmbeddingModel();

System.out.println("Model loaded!");
System.out.println("Compare to Python: model = SentenceTransformer('all-MiniLM-L6-v2')");

Model loaded!
Compare to Python: model = SentenceTransformer('all-MiniLM-L6-v2')


## 2. Getting Embeddings

Python: `model.encode("hello")`  
Java: `model.embed("hello").content().vector()`

In [40]:
float[] embedding = model.embed("hello").content().vector();

System.out.println("Embedding length: " + embedding.length);
System.out.print("First 10 values: [");
for (int i = 0; i < 10; i++) {
    System.out.printf("%.4f%s", embedding[i], i < 9 ? ", " : "");
}
System.out.println("]");

float min = Float.MAX_VALUE, max = -Float.MAX_VALUE;
for (float v : embedding) { min = Math.min(min, v); max = Math.max(max, v); }
System.out.printf("Range: [%.4f, %.4f]%n", min, max);

Embedding length: 384
First 10 values: [-0.0628, 0.0550, 0.0522, 0.0858, -0.0827, -0.0746, 0.0686, 0.0184, -0.0820, -0.0374]
Range: [-0.1444, 0.2981]



## 3. Cosine Similarity

Same formula: `cos(θ) = (A · B) / (||A|| × ||B||)`
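
As a quick worked example of the formula, take two small made-up vectors chosen only for easy arithmetic, A = (1, 2, 2) and B = (2, 1, 2):

$$
\cos\theta
= \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
= \frac{1 \cdot 2 + 2 \cdot 1 + 2 \cdot 2}{\sqrt{1^2 + 2^2 + 2^2}\,\sqrt{2^2 + 1^2 + 2^2}}
= \frac{8}{3 \cdot 3}
\approx 0.889
$$

The vectors point in similar but not identical directions, so the score lands close to, but below, 1.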

In [41]:
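// Cosine similarity: (A · B) / (||A|| × ||B||). Assumes equal-length, non-zero vectors.
static double cosineSimilarity(float[] a, float[] b) {
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.length; i++) {
        dot += a[i] * b[i];      // accumulate A · B
        normA += a[i] * a[i];    // accumulate ||A||^2
        normB += b[i] * b[i];    // accumulate ||B||^2
    }
    return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}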

// Quick sanity check
System.out.printf("Same direction: %.2f%n", cosineSimilarity(new float[]{1,0,0}, new float[]{1,0,0}));
System.out.printf("Perpendicular:  %.2f%n", cosineSimilarity(new float[]{1,0,0}, new float[]{0,1,0}));

Same direction: 1.00
Perpendicular:  0.00
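
A related property that may be useful: once two vectors are scaled to unit length, their cosine similarity is just their dot product. The sketch below checks this on toy vectors. The class and helper names (`NormalizedCosine`, `l2Normalize`, `dot`) are illustrative and not part of LangChain4j, and whether the model's embeddings are already unit-length is an assumption this notebook does not verify.

```java
class NormalizedCosine {

    // Scale a vector to unit L2 length. A zero vector comes back as all zeros.
    public static float[] l2Normalize(float[] v) {
        double sumSq = 0;
        for (float x : v) sumSq += (double) x * x;
        double norm = Math.sqrt(sumSq);
        float[] out = new float[v.length];
        if (norm == 0) return out; // nothing to scale
        for (int i = 0; i < v.length; i++) out[i] = (float) (v[i] / norm);
        return out;
    }

    // Plain dot product, accumulated in double precision.
    public static double dot(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) sum += (double) a[i] * b[i];
        return sum;
    }

    // Same formula as the notebook's cosineSimilarity helper.
    public static double cosineSimilarity(float[] a, float[] b) {
        return dot(a, b) / (Math.sqrt(dot(a, a)) * Math.sqrt(dot(b, b)));
    }

    public static void main(String[] args) {
        float[] a = {3, 4, 0};
        float[] b = {4, 3, 0};
        double viaCosine = cosineSimilarity(a, b);
        double viaDot = dot(l2Normalize(a), l2Normalize(b));
        // Both values should agree up to float rounding (24/25 for these inputs).
        System.out.printf("cosine: %.4f, dot of normalized: %.4f%n", viaCosine, viaDot);
    }
}
```

In the notebook, the class can be pasted into a cell and run with `NormalizedCosine.main(new String[0])`. If embeddings are normalized once up front, each comparison then needs only the dot product.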



In [42]:
// Helper to show similarity between two texts
void showSim(String text1, String text2, String note) {
    float[] e1 = model.embed(text1).content().vector();
    float[] e2 = model.embed(text2).content().vector();
    double sim = cosineSimilarity(e1, e2);
    String bar = "#".repeat(Math.max(0, (int) (sim * 30))); // clamp: repeat() throws on a negative count
    System.out.printf("'%s' vs '%s': %.3f %s (%s)%n", text1, text2, sim, bar, note);
}

## 4. Semantic Similarity

In [43]:
System.out.println("Same concept, different words:");
showSim("dog", "puppy", "same animal");
showSim("car", "automobile", "synonyms");
showSim("happy", "joyful", "synonyms");
showSim("big", "large", "synonyms");

Same concept, different words:
'dog' vs 'puppy': 0.804 ######################## (same animal)
'car' vs 'automobile': 0.865 ######################### (synonyms)
'happy' vs 'joyful': 0.684 #################### (synonyms)
'big' vs 'large': 0.807 ######################## (synonyms)


In [44]:
System.out.println("Related concepts:");
showSim("dog", "cat", "both pets");
showSim("coffee", "tea", "both beverages");
showSim("doctor", "nurse", "both medical");
showSim("king", "queen", "both royalty");

Related concepts:
'dog' vs 'cat': 0.661 ################### (both pets)
'coffee' vs 'tea': 0.616 ################## (both beverages)
'doctor' vs 'nurse': 0.608 ################## (both medical)
'king' vs 'queen': 0.681 #################### (both royalty)


In [45]:
System.out.println("Unrelated concepts:");
showSim("dog", "refrigerator", "unrelated");
showSim("banana", "democracy", "unrelated");
showSim("laptop", "elephant", "unrelated");
showSim("music", "mathematics", "unrelated?");

Unrelated concepts:
'dog' vs 'refrigerator': 0.246 ####### (unrelated)
'banana' vs 'democracy': 0.186 ##### (unrelated)
'laptop' vs 'elephant': 0.370 ########### (unrelated)
'music' vs 'mathematics': 0.381 ########### (unrelated?)


In [46]:
System.out.println("Opposites (often surprisingly similar!):");
showSim("hot", "cold", "same axis");
showSim("love", "hate", "same axis");
showSim("up", "down", "same axis");

Opposites (often surprisingly similar!):
'hot' vs 'cold': 0.519 ############### (same axis)
'love' vs 'hate': 0.488 ############## (same axis)
'up' vs 'down': 0.673 #################### (same axis)


## 5. Sentences vs Words

In [47]:
// Same word, different meaning
showSim("I need to deposit money at the bank",
        "We had a picnic by the river bank",
        "bank = financial vs river");

System.out.println();

// Paraphrases
System.out.println("Paraphrases (same meaning, different words):");
showSim("How old are you?", "What is your age?", "paraphrase");
showSim("The movie was great", "I really enjoyed the film", "paraphrase");
showSim("It's raining outside", "The weather is wet", "paraphrase");

'I need to deposit money at the bank' vs 'We had a picnic by the river bank': 0.312 ######### (bank = financial vs river)

Paraphrases (same meaning, different words):
'How old are you?' vs 'What is your age?': 0.761 ###################### (paraphrase)
'The movie was great' vs 'I really enjoyed the film': 0.805 ######################## (paraphrase)
'It's raining outside' vs 'The weather is wet': 0.791 ####################### (paraphrase)


## 6. Semantic Search

In [48]:
String[] documents = {
    "Python is a popular programming language for data science",
    "Machine learning models can predict future outcomes",
    "The weather forecast shows rain tomorrow",
    "Neural networks are inspired by the human brain",
    "I made a delicious pasta for dinner last night",
    "Deep learning requires large amounts of training data",
    "The stock market showed gains today",
    "Transformers have revolutionized natural language processing",
    "My cat loves to sleep in sunny spots",
    "GPT models generate human-like text",
};

// Pre-compute embeddings
float[][] docEmbeddings = new float[documents.length][];
for (int i = 0; i < documents.length; i++) {
    docEmbeddings[i] = model.embed(documents[i]).content().vector();
}
System.out.println("Indexed " + documents.length + " documents (" + docEmbeddings[0].length + "d vectors)");

Indexed 10 documents (384d vectors)


In [49]:
void search(String query, int topK) {
    float[] queryEmb = model.embed(query).content().vector();

    record Result(double score, int idx) implements Comparable<Result> {
        public int compareTo(Result o) { return Double.compare(o.score, this.score); }
    }

    var results = new ArrayList<Result>();
    for (int i = 0; i < documents.length; i++) {
        results.add(new Result(cosineSimilarity(queryEmb, docEmbeddings[i]), i));
    }
    Collections.sort(results);

    System.out.printf("Query: '%s'%n", query);
    for (int i = 0; i < Math.min(topK, results.size()); i++) { // never read past the result list
        var r = results.get(i);
        System.out.printf("  %d. [%.3f] %s%n", i+1, r.score(), documents[r.idx()]);
    }
    System.out.println();
}
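
The `search` method prints its results directly. A variant that returns the ranked hits can be easier to reuse or test. The following is a minimal sketch on toy 3-d vectors, so it runs without the model. `TopKSearch`, `Hit`, and `topK` are hypothetical names introduced for illustration.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

class TopKSearch {

    // One search hit: the document's index and its similarity to the query.
    record Hit(int index, double score) {}

    // Cosine similarity, accumulated in double precision.
    public static double cosineSimilarity(float[] a, float[] b) {
        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    // Score every document against the query, sort by descending score,
    // and return at most k hits (k is clamped to the valid range).
    public static List<Hit> topK(float[] query, float[][] docs, int k) {
        List<Hit> hits = new ArrayList<>();
        for (int i = 0; i < docs.length; i++) {
            hits.add(new Hit(i, cosineSimilarity(query, docs[i])));
        }
        hits.sort(Comparator.comparingDouble(Hit::score).reversed());
        int limit = Math.min(Math.max(k, 0), hits.size());
        return new ArrayList<>(hits.subList(0, limit));
    }

    public static void main(String[] args) {
        // Toy "document embeddings"; real ones would come from the model.
        float[][] docs = {
            {1, 0, 0},
            {0.9f, 0.1f, 0},
            {0, 1, 0},
            {0, 0, 1},
        };
        float[] query = {1, 0, 0};
        for (Hit h : topK(query, docs, 2)) {
            System.out.printf("doc %d  score %.3f%n", h.index(), h.score());
        }
    }
}
```

Sorting every score is fine for ten documents. For a much larger corpus, a bounded heap or a dedicated vector index would likely be a better fit.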

In [50]:
search("AI and artificial intelligence", 3);
search("coding", 3);
search("How do large language models work?", 3);
search("pets and animals", 3);
search("cooking and food", 3);

Query: 'AI and artificial intelligence'
  1. [0.454] Neural networks are inspired by the human brain
  2. [0.321] Machine learning models can predict future outcomes
  3. [0.215] Transformers have revolutionized natural language processing

Query: 'coding'
  1. [0.287] Python is a popular programming language for data science
  2. [0.277] GPT models generate human-like text
  3. [0.267] Neural networks are inspired by the human brain

Query: 'How do large language models work?'
  1. [0.403] Transformers have revolutionized natural language processing
  2. [0.393] GPT models generate human-like text
  3. [0.294] Deep learning requires large amounts of training data

Query: 'pets and animals'
  1. [0.243] My cat loves to sleep in sunny spots
  2. [0.171] Neural networks are inspired by the human brain
  3. [0.123] Python is a popular programming language for data science

Query: 'cooking and food'
  1. [0.367] I made a delicious pasta for dinner last night
  2. [0.136] My cat loves to sl

## 7. Python vs Java Comparison

| | Python | Java (LangChain4j) |
|---|---|---|
| **Dependency** | `pip install sentence-transformers` | One Maven artifact |
| **Load model** | `model = SentenceTransformer(name)` | `model = new AllMiniLmL6V2EmbeddingModel()` |
| **Get embedding** | `model.encode("text")` | `model.embed("text").content().vector()` |
| **Return type** | `numpy.ndarray` | `float[]` |
| **Model bundled?** | Downloaded on first use | Bundled in the jar |

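Same model and comparable simplicity. With the same weights, the similarity scores should closely track the Python notebook's, though small floating-point differences between runtimes are possible.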