1,007 changes: 1,003 additions & 4 deletions package-lock.json

Large diffs are not rendered by default.

6 changes: 5 additions & 1 deletion package.json
@@ -2,8 +2,9 @@
"name": "joplin-note-categorization-plugin",
"version": "1.0.0",
"scripts": {
"dist": "webpack --env joplin-plugin-config=buildMain && webpack --env joplin-plugin-config=buildExtraScripts && webpack --env joplin-plugin-config=createArchive",
"dist": "webpack --env joplin-plugin-config=buildMain && webpack --env joplin-plugin-config=buildExtraScripts && npm run copyAssets && webpack --env joplin-plugin-config=createArchive",
"prepare": "npm run dist",
"copyAssets": "node tools/copyAssets.js",
"updateVersion": "webpack --env joplin-plugin-config=updateVersion",
"update": "npm install -g generator-joplin && yo joplin --node-package-manager npm --update --force"
},
@@ -25,5 +26,8 @@
"typescript": "^4.8.2",
"webpack": "^5.74.0",
"webpack-cli": "^4.10.0"
},
"dependencies": {
"@huggingface/transformers": "^3.8.1"
}
Comment on lines +30 to 32
Collaborator Author

Those dependencies (sharp, onnxruntime-node) are optional in @huggingface/transformers v3 and are skipped if native compilation fails, so npm install won't break. Also, the plugin only uses the WASM backend, so they're never loaded at runtime. I think we can ignore this for now.

}
2 changes: 1 addition & 1 deletion plugin.config.json
@@ -1,3 +1,3 @@
{
"extraScripts": []
"extraScripts": ["worker/embedWorker.ts"]
}
60 changes: 60 additions & 0 deletions src/commands/testEmbed.ts
@@ -0,0 +1,60 @@
import { fetchAllNotes } from '../pipeline/noteReader';
import { log, logErr } from '../utils/logger';

export const runTestEmbed = async (installDir: string) => {
log('Test embed command triggered');

const notes = await fetchAllNotes();
log(`Fetched ${notes.length} notes`);

if (notes.length === 0) {
log('No notes found. Create some notes and try again.');
return;
}

const testNote = notes[0];
const textToEmbed = testNote.body.length > 0 ? testNote.body.slice(0, 2000) : testNote.title;
log(`Embedding note: "${testNote.title}" (${textToEmbed.length} chars)`);

const worker = new Worker(`${installDir}/worker/embedWorker.js`);

worker.onerror = (err) => {
logErr('Worker error:', err.message || err);
Comment on lines +17 to +22
Collaborator Author

The new Worker(...) pattern with a filesystem path works in Joplin plugins because they run in Electron's renderer process, which has Web Worker support.

The AI summarisation plugin uses the same approach, new Worker(`${installDir}/workers/transformersWorker.js`), in initWorkers.ts.

In the screenshot, I see that you're getting an error. Could you try to trace it?

This may help: https://stackoverflow.com/questions/591857/how-can-i-get-a-javascript-stack-trace-when-i-throw-an-exception

Collaborator Author
@Harsh16gupta, May 21, 2026

The error was "exports is not defined": webpack was wrapping the worker output in module.exports, which doesn't exist in Web Workers. I removed that wrapping and the error is gone now.
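For tracing worker failures like this one, a small formatter for the worker's error event may help. This is only a sketch: it assumes the event exposes the standard ErrorEvent fields (message, filename, lineno, colno), and describeWorkerError and WorkerErrorLike are hypothetical names that are not part of this PR.

```typescript
// Hypothetical helper, not part of the PR. It formats the fields that a
// worker `error` event normally carries so the failing script is visible.
interface WorkerErrorLike {
  message?: string;
  filename?: string;
  lineno?: number;
  colno?: number;
}

const describeWorkerError = (err: WorkerErrorLike): string => {
  const message = err.message || 'unknown worker error';
  // Without a filename there is no location to report.
  if (!err.filename) return message;
  // Fall back to 0 when the line or column is not reported.
  return `${message} (${err.filename}:${err.lineno ?? 0}:${err.colno ?? 0})`;
};
```

It could be wired in as worker.onerror = (err) => logErr('Worker error:', describeWorkerError(err)); so the log shows the script and line where the worker failed.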

};

worker.onmessage = async (event) => {
const data = event.data;

if (data.type === 'load-result') {
if (data.success) {
log(`Model loaded in ${(data.loadTime / 1000).toFixed(1)}s, warmup: ${Math.round(data.warmupTime)}ms`);

worker.postMessage({
type: 'embed',
text: textToEmbed,
noteId: testNote.id,
});
} else {
logErr('Model load failed:', data.error);
worker.terminate();
}
return;
}

if (data.type === 'embed-result') {
if (data.success) {
log(`Embedding complete for "${testNote.title}"`);
log(` Dimensions: ${data.dimensions}`);
log(` Inference time: ${Math.round(data.inferenceTime)}ms`);
log(` First 5 values: [${data.embedding.slice(0, 5).map((v: number) => v.toFixed(4)).join(', ')}]`);
} else {
logErr('Embed failed:', data.error);
}
worker.terminate();
log('Worker terminated. Test complete.');
}
};

log('Loading model...');
worker.postMessage({ type: 'load' });
};
24 changes: 21 additions & 3 deletions src/index.ts
@@ -1,8 +1,26 @@
import joplin from 'api';
import { MenuItemLocation } from 'api/types';
import { runTestEmbed } from './commands/testEmbed';
import { log } from './utils/logger';

joplin.plugins.register({
onStart: async function() {
// eslint-disable-next-line no-console
console.info('Hello world. Test plugin started!');
onStart: async function () {
log('Plugin started');

const installDir = await joplin.plugins.installationDir();

await joplin.commands.register({
name: 'aiCategorise.testEmbed',
label: 'AI Categorise: Test Embedding',
execute: async () => runTestEmbed(installDir),
});

await joplin.views.menuItems.create(
'aiCategorise.testEmbedMenuItem',
'aiCategorise.testEmbed',
MenuItemLocation.Tools,
);

log('Test command registered under Tools menu');
},
});
2 changes: 1 addition & 1 deletion src/manifest.json
@@ -4,7 +4,7 @@
"app_min_version": "3.5",
"version": "1.0.0",
"name": "Note Categorization Plugin",
"description": "",
"description": "AI-based note categorisation: clusters notes semantically, suggests tags and notebook structures, and detects stale notes.",
"author": "Harsh Gupta",
"homepage_url": "",
"repository_url": "https://github.com/joplin/plugin-note-categorization",
28 changes: 28 additions & 0 deletions src/pipeline/noteReader.ts
@@ -0,0 +1,28 @@
import joplin from 'api';

export interface NoteItem {
id: string;
title: string;
body: string;
updated_time: number;
user_updated_time: number;
parent_id: string;
}

export const fetchAllNotes = async (): Promise<NoteItem[]> => {
let page = 1;
const allNotes: NoteItem[] = [];

while (true) {
const result = await joplin.data.get(['notes'], {
fields: ['id', 'title', 'body', 'updated_time', 'user_updated_time', 'parent_id'],
page,
limit: 50,
});
allNotes.push(...result.items);
if (!result.has_more) break;
page++;
}

return allNotes;
};
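The paging loop in fetchAllNotes could be exercised without a running Joplin instance if the page fetch were passed in as a function. The sketch below illustrates that idea under stated assumptions: fetchAllPages, Page, FetchPage and the maxPages guard are hypothetical and not part of this PR; only the items and has_more fields mirror the code above.

```typescript
// Hypothetical generalisation of the loop in fetchAllNotes, not part of the PR.
interface Page<T> {
  items: T[];
  has_more: boolean;
}

type FetchPage<T> = (page: number) => Promise<Page<T>>;

const fetchAllPages = async <T>(fetchPage: FetchPage<T>, maxPages = 10000): Promise<T[]> => {
  const all: T[] = [];
  // Pages are 1-indexed, as in the joplin.data.get call above.
  for (let page = 1; page <= maxPages; page++) {
    const result = await fetchPage(page);
    all.push(...result.items);
    if (!result.has_more) return all;
  }
  // Assumed safety net: stop instead of looping forever if has_more never clears.
  throw new Error(`Pagination did not finish within ${maxPages} pages`);
};
```

With this shape, fetchAllNotes would reduce to a call that passes a function wrapping joplin.data.get(['notes'], { fields, page, limit: 50 }).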
9 changes: 9 additions & 0 deletions src/utils/logger.ts
@@ -0,0 +1,9 @@
const LOG_PREFIX = '[ai-categorise]';

export const log = (...args: unknown[]) => {
console.info(LOG_PREFIX, ...args);
};

export const logErr = (...args: unknown[]) => {
console.error(LOG_PREFIX, ...args);
};
72 changes: 72 additions & 0 deletions src/worker/embedWorker.ts
@@ -0,0 +1,72 @@
// @ts-ignore
import { pipeline, env } from '@huggingface/transformers';

const MODEL_ID = 'Xenova/all-MiniLM-L6-v2';
const MODEL_DTYPE = 'fp32' as const;

We discussed the concern that embedding models might be slow for 1000+ notes. Have you tried experimenting with fp16 instead?

Collaborator Author

Noted, will test it and update you when done.
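To compare fp32 against fp16 on equal footing, a small timing harness might be useful. The sketch below assumes the embed call is passed in as a function, so the same loop can time either dtype. benchmarkEmbed, EmbedFn, BenchmarkResult and the injectable clock are hypothetical names that are not part of this PR.

```typescript
// Hypothetical timing harness, not part of the PR.
type EmbedFn = (text: string) => Promise<unknown>;

interface BenchmarkResult {
  runs: number;
  meanMs: number;
  maxMs: number;
}

const benchmarkEmbed = async (
  embed: EmbedFn,
  texts: string[],
  // The clock is injectable so the harness can be checked deterministically.
  now: () => number = () => Date.now(),
): Promise<BenchmarkResult> => {
  const timings: number[] = [];
  // Run sequentially: the worker handles one text at a time.
  for (const text of texts) {
    const t0 = now();
    await embed(text);
    timings.push(now() - t0);
  }
  const total = timings.reduce((sum, t) => sum + t, 0);
  return {
    runs: timings.length,
    meanMs: timings.length > 0 ? total / timings.length : 0,
    maxMs: timings.length > 0 ? Math.max(...timings) : 0,
  };
};
```

Running it once per dtype over the same sample of note bodies, after the warm-up call, would give directly comparable mean and worst-case times.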

const POOLING = 'mean' as const;

// Worker compiles to dist/worker/, WASM files are at dist/onnx-dist/
env.backends.onnx.wasm!.wasmPaths = '../onnx-dist/';

Comment on lines +1 to +10

Collaborator Author

Yes, it was related to it. I have addressed it.

let embedder: any = null;

const loadModel = async () => {
const t0 = performance.now();

embedder = await pipeline('feature-extraction', MODEL_ID, {
dtype: MODEL_DTYPE,

Another point regarding inference speed: if WebGPU is available, we could try running on it. It would be really nice to see how this changes performance: https://huggingface.co/docs/transformers.js/en/api/env

import { pipeline, env } from "@huggingface/transformers";

const device = env.IS_WEBGPU_AVAILABLE ? "webgpu" : "wasm";

Collaborator Author

I remember doing this for the testing embedding plugin, but it failed due to some limitation (I don't remember exactly why it failed).

I’ll test this one again and update you once it’s done.
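When retesting, one option is to probe for a usable GPU adapter and fall back to WASM if the probe fails. The sketch below uses the standard WebGPU call navigator.gpu.requestAdapter() rather than any transformers.js flag. chooseDevice and GpuLike are hypothetical names, and whether WebGPU is actually usable inside a Joplin plugin worker is an open assumption here.

```typescript
// Hypothetical device probe, not part of the PR.
// GpuLike is the minimal shape of `navigator.gpu` that the sketch relies on.
interface GpuLike {
  requestAdapter(): Promise<unknown>;
}

type Device = 'webgpu' | 'wasm';

const chooseDevice = async (gpu?: GpuLike): Promise<Device> => {
  // No WebGPU entry point at all: stay on the WASM backend.
  if (!gpu) return 'wasm';
  try {
    // requestAdapter() resolves to null when no suitable adapter exists.
    const adapter = await gpu.requestAdapter();
    return adapter ? 'webgpu' : 'wasm';
  } catch {
    // Treat any probing failure as "not available".
    return 'wasm';
  }
};
```

In the worker this might be called as chooseDevice((self.navigator as any).gpu), with the result passed as the device option when creating the pipeline, assuming the pipeline accepts that option as the transformers.js v3 documentation describes.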

});

const loadTime = performance.now() - t0;

// Warm-up: first inference is always slower due to JIT/WASM setup.
const tw = performance.now();
await embedder('warmup text', { pooling: POOLING, normalize: true });
const warmupTime = performance.now() - tw;

return { loadTime, warmupTime };
};

const embed = async (text: string) => {
if (!embedder) throw new Error('Model not loaded');

const t0 = performance.now();
const output = await embedder(text, { pooling: POOLING, normalize: true });
const inferenceTime = performance.now() - t0;
const dimensions = output.data.length;
const embedding = Array.from(output.data as Float32Array);

return { inferenceTime, dimensions, embedding };
};

self.addEventListener('message', async (event) => {
const { type } = event.data;

if (type === 'load') {
try {
const result = await loadModel();
postMessage({ type: 'load-result', success: true, ...result });
} catch (e: any) {
postMessage({ type: 'load-result', success: false, error: String(e) });
}
}

if (type === 'embed') {
try {
const result = await embed(event.data.text);
postMessage({
type: 'embed-result',
noteId: event.data.noteId,
success: true,
...result,
});
} catch (e: any) {
postMessage({
type: 'embed-result',
noteId: event.data.noteId,
success: false,
error: String(e),
});
}
}
});
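The worker and the test command exchange untyped messages (embedder and the catch variables are typed any). A discriminated union for the replies would let the compiler check both sides. The sketch below mirrors the fields posted by the worker above; WorkerResponse and summarise are hypothetical names that are not part of this PR.

```typescript
// Hypothetical reply types, not part of the PR. Field names follow the
// postMessage calls in embedWorker.ts.
type WorkerResponse =
  | { type: 'load-result'; success: true; loadTime: number; warmupTime: number }
  | { type: 'load-result'; success: false; error: string }
  | {
      type: 'embed-result';
      noteId: string;
      success: true;
      inferenceTime: number;
      dimensions: number;
      embedding: number[];
    }
  | { type: 'embed-result'; noteId: string; success: false; error: string };

const summarise = (msg: WorkerResponse): string => {
  // Both failure variants carry `error`, so one branch covers them.
  if (!msg.success) return `${msg.type} failed: ${msg.error}`;
  // Here only the success variants remain; `type` tells them apart.
  if (msg.type === 'load-result') {
    return `model loaded in ${(msg.loadTime / 1000).toFixed(1)}s`;
  }
  return `embedded note ${msg.noteId} (${msg.dimensions} dimensions)`;
};
```

With these types, worker.onmessage in testEmbed.ts could cast event.data to WorkerResponse once and rely on narrowing instead of reading fields from an untyped object.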
43 changes: 43 additions & 0 deletions tools/copyAssets.js
@@ -0,0 +1,43 @@
// Copy onnxruntime-web wasm binaries into dist/onnx-dist so the Web Worker
// can load them locally without hitting CSP/CORS issues in Electron.
const fs = require('fs-extra');
const path = require('path');

Comment on lines +1 to +5
Collaborator Author

Added a filter that copies only the required ONNX runtime files (those whose names start with ort-wasm), and added cleanup of stale files in the target directory before copying. This brings the copy down from the entire dist/ directory to just 4 files.
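To make the filter rule concrete, here is the same predicate applied to a sample listing. This is only an illustration: isOnnxRuntimeFile is a hypothetical name, and the file names in the listing are made up for the example rather than taken from an actual onnxruntime-web release.

```typescript
// Same rule as the filter in copyAssets.js: the name must start with
// "ort-wasm" and end with ".wasm" or ".mjs".
const isOnnxRuntimeFile = (name: string): boolean =>
  name.startsWith('ort-wasm') && (name.endsWith('.wasm') || name.endsWith('.mjs'));

// Hypothetical directory listing, for illustration only.
const sampleListing = [
  'ort-wasm-simd-threaded.wasm',
  'ort-wasm-simd-threaded.mjs',
  'ort-wasm-simd-threaded.wasm.map',
  'ort.min.js',
  'ort.webgpu.mjs',
];

// Only the first two entries satisfy both conditions.
const keptFiles = sampleListing.filter(isOnnxRuntimeFile);
```

Source maps and the non-WASM bundles are skipped because they fail either the prefix or the extension check.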

const possiblePaths = [
path.join(__dirname, '..', 'node_modules', '@huggingface', 'transformers', 'node_modules', 'onnxruntime-web', 'dist'),
path.join(__dirname, '..', 'node_modules', 'onnxruntime-web', 'dist'),
];

let onnxDistDir = null;
for (const p of possiblePaths) {
if (fs.existsSync(p)) {
onnxDistDir = p;
break;
}
}

if (!onnxDistDir) {
console.error('ERROR: Could not find onnxruntime-web dist directory!');
console.error('Searched:', possiblePaths);
process.exit(1);
}

const targetDir = path.join(__dirname, '..', 'dist', 'onnx-dist');

// Clean stale files before copying
fs.removeSync(targetDir);
fs.ensureDirSync(targetDir);

// Copy only ONNX runtime files (.wasm + .mjs loaders) to reduce plugin archive size
const runtimeFiles = fs.readdirSync(onnxDistDir).filter(f =>
f.startsWith('ort-wasm') && (f.endsWith('.wasm') || f.endsWith('.mjs'))
);

console.log(`Copying ${runtimeFiles.length} ONNX runtime files from: ${onnxDistDir}`);
console.log(` to: ${targetDir}`);

for (const file of runtimeFiles) {
fs.copySync(path.join(onnxDistDir, file), path.join(targetDir, file));
}

console.log(`Done! ${runtimeFiles.length} ONNX runtime files copied to dist/onnx-dist/`);
10 changes: 8 additions & 2 deletions webpack.config.js
@@ -308,8 +308,14 @@ function buildExtraScriptConfigs(userConfig) {

for (const scriptName of userConfig.extraScripts) {
const scriptPaths = resolveExtraScriptPath(scriptName);
output.push({ ...extraScriptConfig, entry: scriptPaths.entry,
output: scriptPaths.output });
output.push({
...extraScriptConfig, entry: scriptPaths.entry,
output: {
filename: scriptPaths.output.filename,
path: scriptPaths.output.path,
},
target: 'web'
});
}

return output;