feat(embedding, web worker): add embedding pipeline with Web Worker and note reader by Harsh16gupta · Pull Request #2 · joplin/plugin-note-categorization

Harsh16gupta · 2026-05-19T00:53:00Z

Sets up the plugin infrastructure for on-device note embedding.

What this does:

configures Webpack for Web Worker + ONNX WASM
adds embed worker using all-MiniLM-L6-v2 (384-dim vectors)
adds paginated note reader (Joplin Data API)
adds a test command under Tools menu to verify the pipeline

Testing:

Copilot

Pull request overview

Sets up initial infrastructure for generating on-device note embeddings in a Joplin plugin, including a note reader, an embedding worker, and a manual test command.

Changes:

Added a paginated note reader using the Joplin Data API and a Tools-menu command to exercise the pipeline.
Added an embedding worker using @huggingface/transformers and a build step to copy ONNX WASM assets into the plugin dist.
Updated plugin build configuration to compile the worker as an extra script and changed webpack target via overrides.
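The paginated note reader mentioned above could look roughly like the sketch below. It is not the PR's actual `noteReader.ts`: the `readAllNotes` name, the `DataGet` type and the chosen `fields` are assumptions made for illustration. The `page`/`limit` query parameters and the `items`/`has_more` response fields follow the Joplin Data API pagination convention.

```typescript
// Shape of one page returned by the Joplin Data API.
interface Page<T> {
  items: T[];
  has_more: boolean;
}

interface NoteRecord {
  id: string;
  title: string;
  body: string;
}

// Hypothetical stand-in for joplin.data.get, injected so the loop can be
// exercised without a running Joplin instance.
type DataGet = (
  path: string[],
  query: { fields: string[]; page: number; limit: number },
) => Promise<Page<NoteRecord>>;

// Fetch pages one after another until the API reports that no more remain.
async function readAllNotes(get: DataGet, limit = 100): Promise<NoteRecord[]> {
  const notes: NoteRecord[] = [];
  let page = 1;
  let hasMore = true;
  while (hasMore) {
    const result = await get(['notes'], { fields: ['id', 'title', 'body'], page, limit });
    notes.push(...result.items);
    hasMore = result.has_more;
    page += 1;
  }
  return notes;
}
```

In the plugin, `get` would presumably be a thin wrapper around `joplin.data.get`; for a large library it may be preferable to process each page as it arrives rather than collect every note in memory.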

Reviewed changes

Copilot reviewed 9 out of 10 changed files in this pull request and generated 5 comments.

File	Description
tools/copyAssets.js	Copies `onnxruntime-web` runtime assets into `dist/onnx-dist` for local loading.
src/worker/embedWorker.ts	Implements embedding model load + inference in a Worker message loop.
src/utils/logger.ts	Adds a small prefixed logger wrapper for consistent output.
src/pipeline/noteReader.ts	Adds paginated note fetching via `joplin.data.get(['notes'], ...)`.
src/manifest.json	Updates plugin description text.
src/index.ts	Registers the Tools-menu test command and uses the new logger.
src/commands/testEmbed.ts	Implements the “Test Embedding” command and attempts to spawn the worker.
plugin.config.json	Adds the worker as an extra script and applies a global webpack target override.
package.json	Adds `@huggingface/transformers` and a `copyAssets` build step.
package-lock.json	Locks new dependency tree for transformers/onnxruntime/sharp/etc.

HahaBill · 2026-05-19T23:27:02Z

+	"extraScripts": ["worker/embedWorker.ts"],
+	"webpackOverrides": {
+		"target": "web"
+	}


Why do we need "target": "web" here?

Looking at AI summarization plugin, it works fine without it: https://github.com/joplin/plugin-ai-summarisation/blob/main/plugin.config.json

I am using @huggingface/transformers v3, which ships both a Node and a browser build, so the worker needs target: 'web' to resolve the WASM backend. The AI summarisation plugin doesn't need it because it uses @xenova/transformers v2, which handles module resolution differently.

Removing it throws an error and the plugin does not compile.

However, I was setting it in the wrong place. I have moved target: 'web' so that it applies only to the extra scripts build in webpack.config.js: the main plugin stays target: 'node' and only the worker gets target: 'web'.
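The split described in this reply can be sketched as two build configurations with different targets. This is only an illustration under assumptions: `BuildConfig` and `makeBuildConfigs` are hypothetical names, the entry paths are examples, and the real webpack.config.js in the plugin template contains many more options.

```typescript
// Minimal subset of a webpack configuration used in this sketch.
interface BuildConfig {
  name: string;
  target: 'node' | 'web';
  entry: string;
}

// Hypothetical helper: the main plugin bundle keeps target 'node', while
// every extra script (such as the embedding worker) is built with target 'web'.
function makeBuildConfigs(mainEntry: string, extraScripts: string[]): BuildConfig[] {
  const main: BuildConfig = { name: 'main', target: 'node', entry: mainEntry };
  const extras = extraScripts.map((script): BuildConfig => ({
    name: `extra:${script}`,
    target: 'web',
    entry: script,
  }));
  return [main, ...extras];
}

// Example paths only; the actual entries come from the plugin's own config.
const configs = makeBuildConfigs('./src/index.ts', ['./src/worker/embedWorker.ts']);
```

Keeping the override local to the extra scripts means the main bundle is resolved exactly as before, so the change cannot affect how the plugin's own Node-side code is built.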

Harsh16gupta · 2026-05-20T19:06:16Z

+	log(`Embedding note: "${testNote.title}" (${textToEmbed.length} chars)`);
+
+	const worker = new Worker(`${installDir}/worker/embedWorker.js`);
+
+	worker.onerror = (err) => {
+		logErr('Worker error:', err.message || err);


The new Worker(...) pattern with a filesystem path works in Joplin plugins because they run in Electron's renderer process, which has Web Worker support.

The AI summarisation plugin uses the same approach, new Worker(`${installDir}/workers/transformersWorker.js`), in initWorkers.ts.
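Once the worker is spawned this way, a request/response round trip can be wrapped in a promise. The sketch below is an assumption about how that might be done, not the PR's code: `WorkerLike`, `EmbedResponse`, `requestEmbedding` and the `{ type: 'embed', text }` message shape are all hypothetical, and the narrow `WorkerLike` interface exists only so a test double can stand in for a real Worker.

```typescript
// Narrow view of the parts of a worker this sketch relies on.
interface WorkerLike {
  postMessage(message: unknown): void;
  onmessage: ((event: { data: any }) => void) | null;
  onerror: ((event: { message?: string }) => void) | null;
}

// Hypothetical response shape; the PR's worker may use a different one.
type EmbedResponse = { ok: true; vector: number[] } | { ok: false; error: string };

// Send one text to the worker and settle when it answers or fails.
function requestEmbedding(worker: WorkerLike, text: string): Promise<number[]> {
  return new Promise((resolve, reject) => {
    worker.onmessage = (event) => {
      const response = event.data as EmbedResponse;
      if (response.ok) {
        resolve(response.vector);
      } else {
        reject(new Error(response.error));
      }
    };
    worker.onerror = (event) => {
      reject(new Error(event.message || 'Unknown worker error'));
    };
    // Handlers are attached before posting so that no reply can be missed.
    worker.postMessage({ type: 'embed', text });
  });
}
```

This version handles a single in-flight request; embedding many notes concurrently would need request IDs or a queue so that replies can be matched to their callers.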

HahaBill · 2026-05-19T23:21:59Z

+// @ts-ignore
+import { pipeline, env } from '@huggingface/transformers';
+
+const MODEL_ID = 'Xenova/all-MiniLM-L6-v2';
+const MODEL_DTYPE = 'fp32' as const;
+const POOLING = 'mean' as const;
+
+// Worker compiles to dist/worker/, WASM files are at dist/onnx-dist/
+env.backends.onnx.wasm!.wasmPaths = '../onnx-dist/';
+


Could be related to: https://github.com/joplin/plugin-note-categorization/pull/2/changes#r3270264716

Yes, it was related. I have addressed it.

Harsh16gupta · 2026-05-20T23:21:03Z

+// Copy onnxruntime-web wasm binaries into dist/onnx-dist so the Web Worker
+// can load them locally without hitting CSP/CORS issues in Electron.
+const fs = require('fs-extra');
+const path = require('path');
+


Added a filter that copies only the required ONNX runtime files (those whose names start with ort-wasm), and added cleanup of stale files in the target directory before copying. This reduces the copy from the entire dist/ directory to just 4 files.
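The filtering and cleanup logic described here can be sketched as two pure functions. This is an assumed decomposition, not the contents of tools/copyAssets.js: `selectOrtWasmFiles` and `findStaleFiles` are hypothetical names, and the real script may simply empty the target directory instead of computing a difference.

```typescript
// Keep only the ONNX Runtime files whose names start with "ort-wasm".
function selectOrtWasmFiles(fileNames: string[]): string[] {
  return fileNames.filter((name) => name.startsWith('ort-wasm'));
}

// Files already in the target directory that are not part of the new copy
// set and would therefore be deleted before copying.
function findStaleFiles(existing: string[], wanted: string[]): string[] {
  const keep = new Set(wanted);
  return existing.filter((name) => !keep.has(name));
}
```

The script would then pass a directory listing of the onnxruntime-web dist folder through `selectOrtWasmFiles`, delete whatever `findStaleFiles` reports for `dist/onnx-dist`, and copy the selected files.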

Harsh16gupta · 2026-05-20T23:26:03Z

+  "dependencies": {
+    "@huggingface/transformers": "^3.8.1"
  }


Those dependencies (sharp, onnxruntime-node) are optional in @huggingface/transformers v3 and get skipped if native compilation fails, so npm install won't break. Also, the plugin only uses the WASM backend, so they're never loaded at runtime. I think we can ignore this for now.

HahaBill · 2026-05-19T23:17:32Z

+	const worker = new Worker(`${installDir}/worker/embedWorker.js`);
+
+	worker.onerror = (err) => {
+		logErr('Worker error:', err.message || err);


In the screenshot, I see that you're getting an error. Could you try to trace it?

Take a look here: https://stackoverflow.com/questions/591857/how-can-i-get-a-javascript-stack-trace-when-i-throw-an-exception

The error was "exports is not defined": webpack was wrapping the worker output in module.exports, which doesn't exist in Web Workers. I removed that wrapping and the error is gone.
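One plausible way to remove such wrapping is to drop the library-related fields from the worker's webpack output options. The sketch below assumes the wrapper came from webpack's `output.library`, `output.libraryTarget` and `output.libraryExport` settings, which the thread does not confirm; `OutputOptions` and `withoutLibraryWrapper` are hypothetical names.

```typescript
// Subset of webpack's output options relevant to this sketch.
interface OutputOptions {
  filename: string;
  path: string;
  library?: string;
  libraryTarget?: string;
  libraryExport?: string;
}

// Return a copy of the output options without the library fields, so the
// bundle is emitted as a plain script rather than assigned to exports.
function withoutLibraryWrapper(output: OutputOptions): OutputOptions {
  return { filename: output.filename, path: output.path };
}
```

A Web Worker script runs in a global scope that has no CommonJS `exports` or `module` objects, which is why a bundle that assigns to them fails as soon as it loads.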

HahaBill · 2026-05-19T23:29:47Z

+import { pipeline, env } from '@huggingface/transformers';
+
+const MODEL_ID = 'Xenova/all-MiniLM-L6-v2';
+const MODEL_DTYPE = 'fp32' as const;


We discussed a concern that embedding models might be slow for 1000+ notes. Have you tried experimenting with fp16 instead?

Noted, will test it and update you when done.
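To compare fp32 against fp16, a small timing helper could be run once per dtype over the same set of notes. This is a sketch under assumptions: `benchmarkEmbedder`, `EmbedFn` and `BenchmarkResult` are hypothetical names, and the embedding function is injected so the helper does not depend on any particular model setup.

```typescript
// Any function that turns a text into a vector, e.g. a wrapper around the worker.
type EmbedFn = (text: string) => Promise<number[]>;

interface BenchmarkResult {
  label: string;
  notes: number;
  totalMs: number;
  msPerNote: number;
}

// Embed every text sequentially and report total and per-note wall time.
// The clock is injectable so the helper can be tested deterministically.
async function benchmarkEmbedder(
  label: string,
  embed: EmbedFn,
  texts: string[],
  now: () => number = Date.now,
): Promise<BenchmarkResult> {
  const start = now();
  for (const text of texts) {
    await embed(text);
  }
  const totalMs = now() - start;
  const msPerNote = texts.length > 0 ? totalMs / texts.length : 0;
  return { label, notes: texts.length, totalMs, msPerNote };
}
```

Running it with one embedder per dtype on identical input gives directly comparable per-note figures; model load time should be measured separately, since it is paid once rather than per note.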

HahaBill · 2026-05-19T23:36:14Z

+	const t0 = performance.now();
+
+	embedder = await pipeline('feature-extraction', MODEL_ID, {
+		dtype: MODEL_DTYPE,


Another point regarding inference speed: if WebGPU is available, we could try running on it. It'd be really nice to see how this changes performance: https://huggingface.co/docs/transformers.js/en/api/env

import { pipeline, env } from "@huggingface/transformers";
const device = env.IS_WEBGPU_AVAILABLE ? "webgpu" : "wasm";

I remember doing this for the testing embedding plugin, but it failed due to some limitation (I don't remember exactly why it failed).

I’ll test this one again and update you once it’s done.
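An alternative to a library flag is to feature-detect WebGPU through the standard `navigator.gpu.requestAdapter()` call and fall back to WASM when no adapter is returned. The sketch below swaps in that technique; it is not the reviewer's snippet, and `pickDevice` and `GpuNavigatorLike` are hypothetical names. Whether WebGPU actually works inside a Joplin plugin worker is exactly what remains to be tested.

```typescript
type Device = 'webgpu' | 'wasm';

// Minimal structural view of the parts of `navigator` used here, so the
// function can be called with a test double as well as the real object.
interface GpuNavigatorLike {
  gpu?: { requestAdapter(): Promise<unknown> };
}

// Prefer WebGPU only when an adapter can actually be obtained.
async function pickDevice(nav: GpuNavigatorLike | undefined): Promise<Device> {
  if (!nav || !nav.gpu) return 'wasm';
  try {
    const adapter = await nav.gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    // Any failure while probing is treated as "not available".
    return 'wasm';
  }
}
```

The result could then be passed as the `device` option when creating the pipeline, as the reviewer's snippet suggests. Checking for an adapter rather than only for the presence of `navigator.gpu` avoids selecting WebGPU on machines where the API exists but no usable GPU is exposed.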

Harsh16gupta · 2026-05-20T18:43:33Z

Thank you for such a detailed review. I am working on all the comments and will re-request review when done.

Harsh16gupta · 2026-05-21T01:25:24Z

Bill, I have addressed all the comments.
I'm still testing the fp16 and WebGPU options and will update you once I'm done with those as well.

feat: add embedding pipeline with Web Worker and note reader

1890040

Harsh16gupta requested a review from HahaBill May 19, 2026 02:09

HahaBill requested a review from Copilot May 19, 2026 23:06

HahaBill assigned Harsh16gupta May 19, 2026

Copilot started reviewing on behalf of HahaBill May 19, 2026 23:07

Copilot AI reviewed May 19, 2026

HahaBill reviewed May 19, 2026

HahaBill changed the title from "add embedding pipeline with Web Worker and note reader" to "feat: add embedding pipeline with Web Worker and note reader" May 20, 2026

HahaBill changed the title from "feat: add embedding pipeline with Web Worker and note reader" to "feat(embedding, web worker): add embedding pipeline with Web Worker and note reader" May 20, 2026

Addressed the review comments

5b57f8a
