
feat(embedding, web worker): add embedding pipeline with Web Worker and note reader#2

Open
Harsh16gupta wants to merge 2 commits into master from feat/plugin-skeleton-transformers

@Harsh16gupta
Collaborator

Sets up the plugin infrastructure for on-device note embedding.

What this does:

  • configures Webpack for Web Worker + ONNX WASM
  • adds embed worker using all-MiniLM-L6-v2 (384-dim vectors)
  • adds paginated note reader (Joplin Data API)
  • adds a test command under Tools menu to verify the pipeline
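For illustration, here is a minimal sketch of a paginated reading loop. It assumes the Joplin Data API's list response shape (an `items` array plus a `has_more` flag) and its `page`/`limit` query parameters. The names `forEachNotePage`, `NoteSummary` and `DataGet` are hypothetical and are not taken from the PR's noteReader.ts. The `get` function is injected so the loop can be tested on its own; in the plugin you would pass `(path, query) => joplin.data.get(path, query)`.

```typescript
// Hypothetical shapes: one page of a Joplin Data API list response, and the
// note fields this sketch requests.
interface Paginated<T> {
  items: T[];
  has_more: boolean;
}

interface NoteSummary {
  id: string;
  title: string;
  body: string;
}

// Same call shape as joplin.data.get(path, query); injected so the loop can be
// exercised without a running Joplin instance.
type DataGet = (path: string[], query: Record<string, unknown>) => Promise<Paginated<NoteSummary>>;

// Fetches notes page by page and hands each page to `handle`, stopping once the
// API reports has_more === false. Returns the total number of notes seen.
async function forEachNotePage(
  get: DataGet,
  handle: (notes: NoteSummary[], page: number) => void | Promise<void>,
  pageSize = 100, // Joplin's Data API caps `limit` at 100 items per page
): Promise<number> {
  let total = 0;
  for (let page = 1; ; page++) {
    const res = await get(['notes'], { fields: ['id', 'title', 'body'], limit: pageSize, page });
    await handle(res.items, page);
    total += res.items.length;
    if (!res.has_more) return total;
  }
}
```

Handling one page at a time keeps memory bounded for large notebooks, because each batch can be sent to the embedding worker before the next page is fetched.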

Testing:
[Screenshot from 2026-05-19 05-59-39]


Copilot AI left a comment


Pull request overview

Sets up initial infrastructure for generating on-device note embeddings in a Joplin plugin, including a note reader, an embedding worker, and a manual test command.

Changes:

  • Added a paginated note reader using the Joplin Data API and a Tools-menu command to exercise the pipeline.
  • Added an embedding worker using @huggingface/transformers and a build step to copy ONNX WASM assets into the plugin dist.
  • Updated plugin build configuration to compile the worker as an extra script and changed webpack target via overrides.

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

| File | Description |
| --- | --- |
| tools/copyAssets.js | Copies onnxruntime-web runtime assets into dist/onnx-dist for local loading. |
| src/worker/embedWorker.ts | Implements embedding model load + inference in a Worker message loop. |
| src/utils/logger.ts | Adds a small prefixed logger wrapper for consistent output. |
| src/pipeline/noteReader.ts | Adds paginated note fetching via joplin.data.get(['notes'], ...). |
| src/manifest.json | Updates plugin description text. |
| src/index.ts | Registers the Tools-menu test command and uses the new logger. |
| src/commands/testEmbed.ts | Implements the “Test Embedding” command and attempts to spawn the worker. |
| plugin.config.json | Adds the worker as an extra script and applies a global webpack target override. |
| package.json | Adds @huggingface/transformers and a copyAssets build step. |
| package-lock.json | Locks new dependency tree for transformers/onnxruntime/sharp/etc. |


Comment thread plugin.config.json Outdated
Comment on lines +2 to +5
"extraScripts": ["worker/embedWorker.ts"],
"webpackOverrides": {
"target": "web"
}

Why do we need "target": "web" here?

Looking at the AI summarisation plugin, it works fine without it: https://github.com/joplin/plugin-ai-summarisation/blob/main/plugin.config.json

Collaborator Author


I am using @huggingface/transformers v3, which has both a Node and a browser build, so I needed target: 'web' for the worker to resolve the WASM backend. The AI summarisation plugin doesn't need it because it uses @xenova/transformers v2, which handles module resolution differently.

Removing it caused an error and the plugin failed to compile.

However, I had put it in the wrong place. I have moved target: 'web' so that it applies only to the extra-scripts build in webpack.config.js: the main plugin stays on target: 'node' and only the worker gets target: 'web'.
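The split described above can be sketched as two webpack builds with different targets. The sketch uses a hypothetical, trimmed-down config type, and the names `pluginConfig`, `extraScriptConfigs` and `SRC_DIR` are illustrative only. The real webpack.config.js generated by the Joplin plugin template has many more fields and its own helper names. The one fact it relies on is that webpack accepts an array of configurations and compiles each with its own `target`.

```typescript
// Hypothetical, trimmed-down view of a webpack configuration; the real
// webpack.config.js in the Joplin plugin template has many more fields.
interface MiniConfig {
  name: string;
  target: 'node' | 'web';
  entry: Record<string, string>;
}

const SRC_DIR = './src';

// Main plugin bundle keeps the Node target.
function pluginConfig(): MiniConfig {
  return { name: 'main', target: 'node', entry: { index: `${SRC_DIR}/index.ts` } };
}

// One build per extra script, each with target 'web', so packages that ship
// separate Node and browser builds resolve the browser one for the worker.
function extraScriptConfigs(scripts: string[]): MiniConfig[] {
  return scripts.map((script): MiniConfig => ({
    name: script,
    target: 'web',
    entry: { [script.replace(/\.ts$/, '')]: `${SRC_DIR}/${script}` },
  }));
}

// webpack accepts an array of configurations and compiles each with its own target.
const configs: MiniConfig[] = [pluginConfig(), ...extraScriptConfigs(['worker/embedWorker.ts'])];
```

Scoping the target per build, rather than through a global override, means a future extra script that needs Node APIs would have to opt out explicitly. That trade-off seems acceptable while the worker is the only extra script.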

Comment thread src/commands/testEmbed.ts
Comment on lines +17 to +22
log(`Embedding note: "${testNote.title}" (${textToEmbed.length} chars)`);

const worker = new Worker(`${installDir}/worker/embedWorker.js`);

worker.onerror = (err) => {
logErr('Worker error:', err.message || err);
Collaborator Author


The new Worker(...) pattern with a filesystem path works in Joplin plugins because they run in Electron's renderer process, which has Web Worker support.

The AI summarisation plugin uses the same approach, new Worker(`${installDir}/workers/transformersWorker.js`), in initWorkers.ts.
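Once the worker exists, a common way to get results back is to tag each postMessage with an id and resolve a promise when the reply with the same id arrives. The sketch below shows that pattern. The message shapes (`EmbedRequest`, `EmbedResponse`) and `createEmbedClient` are hypothetical and not the PR's actual protocol. `WorkerLike` is a local stand-in for the DOM Worker interface, so the code compiles without DOM typings and can be driven by a fake. In the plugin you would pass the `Worker` created from `${installDir}/worker/embedWorker.js`.

```typescript
// Local stand-in for the parts of the DOM Worker API used here.
interface WorkerLike {
  postMessage(message: unknown): void;
  onmessage: ((ev: any) => void) | null;
}

// Hypothetical message protocol between the command and the worker.
interface EmbedRequest { id: number; text: string }
interface EmbedResponse { id: number; vector?: number[]; error?: string }

type Pending = { resolve: (vector: number[]) => void; reject: (err: Error) => void };

// Wraps a worker so each embed() call returns a promise that is settled by the
// reply carrying the same id; replies may arrive in any order.
function createEmbedClient(worker: WorkerLike) {
  let nextId = 0;
  const pending = new Map<number, Pending>();

  worker.onmessage = (ev) => {
    const msg = ev.data as EmbedResponse;
    const entry = pending.get(msg.id);
    if (!entry) return; // stale or unknown id
    pending.delete(msg.id);
    if (msg.error !== undefined || !msg.vector) entry.reject(new Error(msg.error ?? 'worker returned no vector'));
    else entry.resolve(msg.vector);
  };

  return {
    embed(text: string): Promise<number[]> {
      const id = nextId++;
      return new Promise<number[]>((resolve, reject) => {
        pending.set(id, { resolve, reject });
        const request: EmbedRequest = { id, text };
        worker.postMessage(request);
      });
    },
  };
}
```

Keeping the pending map on the main-thread side means the worker only needs to echo the id back, so it stays a simple message loop.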

Comment thread src/worker/embedWorker.ts
Comment on lines +1 to +10
// @ts-ignore
import { pipeline, env } from '@huggingface/transformers';

const MODEL_ID = 'Xenova/all-MiniLM-L6-v2';
const MODEL_DTYPE = 'fp32' as const;
const POOLING = 'mean' as const;

// Worker compiles to dist/worker/, WASM files are at dist/onnx-dist/
env.backends.onnx.wasm!.wasmPaths = '../onnx-dist/';
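The code comment in this excerpt relies on relative-URL resolution: inside a dedicated worker, a relative URL such as '../onnx-dist/' is resolved against the worker script's own URL, which is how fetch() behaves in a worker. Assuming ONNX Runtime resolves wasmPaths the same way, the small sketch below computes the absolute directory explicitly. This can be handy when logging or debugging where the WASM files are expected. `resolveWasmDir` is a hypothetical helper. Inside the worker it would be called with self.location.href.

```typescript
// Resolves a relative WASM directory against the worker script's URL, the way
// a relative URL used inside the worker would be resolved.
function resolveWasmDir(workerScriptUrl: string, relativeDir = '../onnx-dist/'): string {
  const resolved = new URL(relativeDir, workerScriptUrl).href;
  // wasmPaths is used as a URL prefix for the runtime files, so keep a trailing slash.
  return resolved.endsWith('/') ? resolved : `${resolved}/`;
}

// Possible use inside embedWorker.ts (illustrative, not the PR's code):
// env.backends.onnx.wasm!.wasmPaths = resolveWasmDir(self.location.href);
```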


Collaborator Author


Yes, it was related to it. I have addressed it.

Comment thread tools/copyAssets.js
Comment on lines +1 to +5
// Copy onnxruntime-web wasm binaries into dist/onnx-dist so the Web Worker
// can load them locally without hitting CSP/CORS issues in Electron.
const fs = require('fs-extra');
const path = require('path');

Collaborator Author


I added a filter that copies only the required ONNX runtime files (those whose names start with ort-wasm), and added cleanup of stale files in the target directory before copying. This reduces what gets copied from the entire dist/ directory to just 4 files.
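Here is a sketch of the filter-and-clean step described above, written with Node's built-in fs and path modules rather than fs-extra. `copyOrtWasmFiles` is a hypothetical name. The source directory is an assumption: it would be the onnxruntime-web dist folder, for example node_modules/onnxruntime-web/dist, but the PR's actual paths are not shown here.

```typescript
import * as fs from 'node:fs';
import * as path from 'node:path';

// Hypothetical TypeScript version of the filter-and-clean step; the PR's
// tools/copyAssets.js uses fs-extra and its own paths.
function copyOrtWasmFiles(srcDir: string, destDir: string, prefix = 'ort-wasm'): string[] {
  // Start from an empty target so files from an older onnxruntime-web version don't linger.
  fs.rmSync(destDir, { recursive: true, force: true });
  fs.mkdirSync(destDir, { recursive: true });

  // Keep only regular files whose name starts with the runtime prefix.
  const selected = fs
    .readdirSync(srcDir)
    .filter((name) => name.startsWith(prefix) && fs.statSync(path.join(srcDir, name)).isFile());

  for (const name of selected) {
    fs.copyFileSync(path.join(srcDir, name), path.join(destDir, name));
  }
  return selected;
}
```

Wiping the destination first trades a little build time for a guarantee that dist/onnx-dist only ever contains files from the currently installed runtime version.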

Comment thread package.json
Comment on lines +30 to 32
"dependencies": {
"@huggingface/transformers": "^3.8.1"
}
Collaborator Author


Those dependencies (sharp, onnxruntime-node) are optional in @huggingface/transformers v3 and are skipped if native compilation fails, so npm install won't break. Also, the plugin only uses the WASM backend, so they're never loaded at runtime. I think we can ignore this for now.

Comment thread src/commands/testEmbed.ts
const worker = new Worker(`${installDir}/worker/embedWorker.js`);

worker.onerror = (err) => {
logErr('Worker error:', err.message || err);

In the screenshot, I can see that you're getting an error. Could you try to trace it?

Have a look here: https://stackoverflow.com/questions/591857/how-can-i-get-a-javascript-stack-trace-when-i-throw-an-exception
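The following sketch shows one way to get more than a bare message out of a worker failure. The error event delivered to worker.onerror carries filename, lineno and colno fields that point into the bundled worker script. Inside the worker, you can also catch errors yourself and post error.stack back, since a rejected promise in an async message handler typically surfaces as an unhandledrejection inside the worker rather than as an error event on the Worker object. `describeWorkerError` and `serializeError` are hypothetical helpers, and the event shape is declared locally instead of relying on DOM typings.

```typescript
// Fields carried by the error event a Worker dispatches on the main thread
// (an ErrorEvent in browsers/Electron); declared locally for this sketch.
interface WorkerErrorEventLike {
  message?: string;
  filename?: string;
  lineno?: number;
  colno?: number;
  error?: unknown;
}

// Main-thread side: build one readable line including the bundle location,
// so an error such as "exports is not defined" points at a file and line.
function describeWorkerError(ev: WorkerErrorEventLike): string {
  const where = ev.filename ? ` at ${ev.filename}:${ev.lineno ?? '?'}:${ev.colno ?? '?'}` : '';
  const stack = ev.error instanceof Error && ev.error.stack ? `\n${ev.error.stack}` : '';
  return `${ev.message ?? 'Unknown worker error'}${where}${stack}`;
}

// Worker side: turn a caught error into a plain object that survives postMessage.
function serializeError(err: unknown): { message: string; stack?: string } {
  if (err instanceof Error) return { message: err.message, stack: err.stack };
  return { message: String(err) };
}
```

Possible use: worker.onerror = (ev) => logErr(describeWorkerError(ev)); and, inside the worker, wrap the handler body in try/catch and post serializeError(err) back to the main thread.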

Collaborator Author

@Harsh16gupta May 21, 2026


The error was "exports is not defined": webpack was wrapping the worker output in a CommonJS module.exports wrapper, and module/exports don't exist in Web Workers. I removed that wrapping for the worker build and the error is gone now.
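Here is a sketch of the output change described above, using a hypothetical subset of webpack's output options. The worker build drops the library wrapper (the part that emitted module.exports) and sets output.globalObject to 'self', which is the global object inside a worker. The comment does not say whether the PR also sets globalObject, so treat that part as an assumption. `workerOutput` is an illustrative name.

```typescript
// Hypothetical subset of webpack's `output` options.
interface OutputOptions {
  path: string;
  filename: string;
  library?: { type: string };
  globalObject?: string;
}

// Derives the worker's output options from a base that uses a CommonJS
// library wrapper: the wrapper is removed, since a dedicated worker has no
// `module`/`exports`, and webpack's runtime is pointed at the worker global.
function workerOutput(base: OutputOptions): OutputOptions {
  const out: OutputOptions = { ...base, globalObject: 'self' };
  delete out.library;
  return out;
}
```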

Comment thread src/worker/embedWorker.ts
import { pipeline, env } from '@huggingface/transformers';

const MODEL_ID = 'Xenova/all-MiniLM-L6-v2';
const MODEL_DTYPE = 'fp32' as const;

We discussed a concern that embedding models might be slow for 1,000+ notes. Have you tried experimenting with fp16 instead?

Collaborator Author


Noted, will test it and update you when done.
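A simple way to compare fp32 and fp16 is to time the same sample of notes with each setting. The sketch below is a hypothetical benchmark helper. `EmbedFn` stands in for one embedding call, such as a worker round trip or a pipeline created with a particular dtype, and `benchEmbed` and `BenchResult` are illustrative names.

```typescript
// Stand-in for one embedding call; hypothetical signature.
type EmbedFn = (text: string) => Promise<ArrayLike<number>>;

interface BenchResult {
  label: string;
  notes: number;
  totalMs: number;
  msPerNote: number;
}

// Times sequential embedding of `texts`. The first text is embedded once
// beforehand as a warm-up, so one-time setup triggered by the first call
// is not counted in the timing.
async function benchEmbed(label: string, embed: EmbedFn, texts: string[]): Promise<BenchResult> {
  if (texts.length > 0) await embed(texts[0]);
  const t0 = performance.now();
  for (const text of texts) {
    await embed(text);
  }
  const totalMs = performance.now() - t0;
  return { label, notes: texts.length, totalMs, msPerNote: texts.length > 0 ? totalMs / texts.length : 0 };
}
```

For example, you could run it once with an embedder built from pipeline('feature-extraction', MODEL_ID, { dtype: 'fp32' }) and once with { dtype: 'fp16' }, then compare msPerNote. It is also worth checking that the fp16 vectors stay close to the fp32 ones, since a faster but noticeably different embedding would affect search quality.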

Comment thread src/worker/embedWorker.ts
const t0 = performance.now();

embedder = await pipeline('feature-extraction', MODEL_ID, {
dtype: MODEL_DTYPE,
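For reference, here is roughly what POOLING = 'mean' computes. The model produces one vector per token. Mean pooling averages the vectors of the real (non-padding) tokens, and L2 normalization scales the result to unit length. transformers.js performs this inside the feature-extraction pipeline when it is called with options such as { pooling: 'mean', normalize: true }. The standalone `meanPool` and `l2Normalize` functions below are a hypothetical illustration of the arithmetic, not the library's code.

```typescript
// tokenEmbeddings: one row per token, each row `dim` long (384 for all-MiniLM-L6-v2).
// attentionMask: 1 for real tokens, 0 for padding.
function meanPool(tokenEmbeddings: number[][], attentionMask: number[]): number[] {
  const dim = tokenEmbeddings[0]?.length ?? 0;
  const sum = new Array<number>(dim).fill(0);
  let count = 0;
  tokenEmbeddings.forEach((row, t) => {
    if (!attentionMask[t]) return; // skip padding tokens
    count++;
    for (let i = 0; i < dim; i++) sum[i] += row[i];
  });
  return count > 0 ? sum.map((v) => v / count) : sum;
}

// Scales a vector to unit length; a zero vector is returned unchanged.
function l2Normalize(v: number[]): number[] {
  const norm = Math.sqrt(v.reduce((acc, x) => acc + x * x, 0));
  return norm > 0 ? v.map((x) => x / norm) : v;
}
```

With unit-length vectors, cosine similarity between two notes reduces to a plain dot product, which keeps later similarity search cheap.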

Another point regarding inference speed: if WebGPU is available, we could try running on it. It would be really nice to see how it changes the performance: https://huggingface.co/docs/transformers.js/en/api/env

import { pipeline, env } from "@huggingface/transformers";

const device = env.IS_WEBGPU_AVAILABLE ? "webgpu" : "wasm";

Collaborator Author


I remember trying this in the test embedding plugin, but it failed due to some limitation (I don't remember exactly what).

I’ll test this one again and update you once it’s done.
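When retesting, it may help to check for WebGPU through the standard WebGPU API and to fall back to WASM if loading fails. navigator.gpu can exist while requestAdapter() still resolves to null, so the sketch only reports WebGPU when an adapter is actually obtained. `pickDevice`, `loadWithFallback`, `GpuLike` and `NavigatorLike` are hypothetical names. The navigator shape is declared locally so the code compiles without @webgpu/types. The `load` callback is assumed to wrap pipeline('feature-extraction', MODEL_ID, { device, dtype }).

```typescript
// Local stand-in for the part of the WebGPU API used here.
interface GpuLike { requestAdapter(): Promise<object | null>; }
interface NavigatorLike { gpu?: GpuLike; }

type Device = 'webgpu' | 'wasm';

// Reports WebGPU only when an adapter is actually obtained.
async function pickDevice(nav: NavigatorLike | undefined): Promise<Device> {
  try {
    const adapter = await nav?.gpu?.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    return 'wasm';
  }
}

// Tries the preferred device first and falls back to WASM if loading fails there.
async function loadWithFallback<T>(
  load: (device: Device) => Promise<T>,
  preferred: Device,
): Promise<{ model: T; device: Device }> {
  if (preferred === 'webgpu') {
    try {
      return { model: await load('webgpu'), device: 'webgpu' };
    } catch {
      // fall through to WASM
    }
  }
  return { model: await load('wasm'), device: 'wasm' };
}
```

Inside the worker this could be used as loadWithFallback(load, await pickDevice(navigator as unknown as NavigatorLike)). Logging the returned device would also show which backend each benchmark run actually used.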

@HahaBill changed the title from "add embedding pipeline with Web Worker and note reader" to "feat: add embedding pipeline with Web Worker and note reader" on May 20, 2026
@HahaBill changed the title from "feat: add embedding pipeline with Web Worker and note reader" to "feat(embedding, web worker): add embedding pipeline with Web Worker and note reader" on May 20, 2026
@Harsh16gupta
Collaborator Author

Thank you for such a detailed review. I am working on all the comments and will re-request review when done.

@Harsh16gupta
Collaborator Author

Bill, I have addressed all the comments.
I’m still testing the fp16 and WebGPU one and will update you once I’m done with those as well.


Labels

None yet

Projects

None yet


3 participants