
[janhq/jan#1635] feat: decouple nitro engine into a library #367

Conversation


@InNoobWeTrust InNoobWeTrust commented Jan 22, 2024

Current progress

  • The library can be built, but with a local dependency on janhq/jan as a submodule
  • Next step: publish an updated janhq/core to npmjs and depend on the resolved package instead of the submodule
  • Update the GitHub Actions workflow to build and test nitro-node for different OSes and add the tarballs of the npm packages to the release
  • In the janhq/jan build step for extensions, simply download nitro-node from npmjs with npm pack @janhq/nitro-node --target_arch=x64 --target_platform=linux --pack-destination ../electron/pre-install/
  • Refactor nitro-node to be independent of @janhq/core (ideal) or to use only minimal interfaces and concepts from @janhq/core
  • From janhq/jan, import @janhq/nitro-node to interact with nitro.

How to build

From root directory of repo

cd nitro-node && make

Issue:

  • Currently I'm cloning https://github.com/janhq/jan as a submodule in order to build the janhq/core package. The version on npm seems to be outdated: the imports do not work even though the displayed version is the same (https://www.npmjs.com/package/@janhq/core) => can you update it on npmjs, or does the team prefer to keep the versions tightly coupled to the GitHub repo?

@InNoobWeTrust force-pushed the feat/1635/decouple-nitro-inference-engine-into-a-library branch from 6662320 to 01731cf on January 22, 2024 08:16

InNoobWeTrust commented Jan 22, 2024

Test builds on different yarn versions (platform: macOS, M1):

  • v1.22.21: build succeeds
  • v2.4.3: build fails with error ➤ YN0066: │ typescript@patch:typescript@npm%3A5.3.3#builtin<compat/typescript>::version=5.3.3&hash=8133ad: Cannot apply hunk #1 (set enableInlineHunks for details)
  • v3.7.0: build succeeds

How to reproduce:

npm i -g corepack
corepack enable  --install-directory <some_dir_in_PATH>
corepack install -g yarn@<1,2,3>
make clean all

@InNoobWeTrust force-pushed the feat/1635/decouple-nitro-inference-engine-into-a-library branch 2 times, most recently from 70c20ee to d7344dd on January 22, 2024 20:44
@InNoobWeTrust force-pushed the feat/1635/decouple-nitro-inference-engine-into-a-library branch 4 times, most recently from 13245cf to cba07be on January 24, 2024 08:47
@InNoobWeTrust force-pushed the feat/1635/decouple-nitro-inference-engine-into-a-library branch 6 times, most recently from cb57401 to d85535c on January 25, 2024 09:03
InNoobWeTrust commented:

Tests passed on Ubuntu (no CUDA) and Mac-Intel: https://github.com/InNoobWeTrust/nitro/actions/runs/7652932312

  • start/stop nitro process
  • run chatCompletion with stream option

@InNoobWeTrust force-pushed the feat/1635/decouple-nitro-inference-engine-into-a-library branch from 02d91bd to ddd1406 on January 25, 2024 14:56
@InNoobWeTrust force-pushed the feat/1635/decouple-nitro-inference-engine-into-a-library branch from ff73fbd to 8a309a3 on February 6, 2024 00:14
/**
 * Find which executable file to run based on the current platform.
 * @returns The name of the executable file to run.
 */
export const executableNitroFile = (
Contributor:

What I'm seeing here is that this function expects folderPath as a parameter, and there are a bunch of prebuilt Nitro binaries available. In my opinion, that's not a good design.
Personally I think there is a better way:

  • node-nitro exposes a util function that returns the exact binary to use (e.g. Windows with CUDA, or Mac ARM64 with Metal).
  • Then run init; in the init function, download the binary if it isn't there, or let the user prefill it with another util function.
  • If it fails, let it fail and show the user why it failed.
  • getNvidiaConfig() is very specific; we should only use it in utils, as Nitro will support many other builds (AMD with Vulkan, SYCL for Intel, Linux ARM, etc.)
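The selection the reviewer describes could be sketched as a pure mapping from system facts to a single build variant. This is a hedged illustration only: the variant names (mac-arm64, win-cuda, ...) and the SystemInfo shape are assumptions, not the actual nitro release artifact names.

```typescript
// Illustrative sketch: map platform/arch/GPU support to one binary variant.
// Variant names here are assumptions; real nitro artifact names may differ.
interface SystemInfo {
  platform: string; // value of process.platform: 'win32' | 'darwin' | 'linux'
  arch: string;     // value of process.arch: 'x64' | 'arm64' | ...
  cuda: boolean;    // whether a usable NVIDIA GPU was detected
}

function selectBinaryVariant(info: SystemInfo): string {
  switch (info.platform) {
    case "darwin":
      // Apple Silicon builds ship with Metal; Intel Macs fall back to CPU.
      return info.arch === "arm64" ? "mac-arm64" : "mac-x64";
    case "win32":
      return info.cuda ? "win-cuda" : "win-cpu";
    case "linux":
      return info.cuda ? "linux-cuda" : "linux-cpu";
    default:
      throw new Error(`Unsupported platform: ${info.platform}`);
  }
}
```

The point of the design is that the caller never sees a folder of binaries; it asks for exactly one variant and downloads only that.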

Contributor Author:

  • Can we split this logic into another PR? I don't know yet what the naming agreement for the release artifacts will be (there are with/without-CUDA versions and some inconsistencies), so defining a single generic logic for the executable file is somewhat out of my control... 😅
  • In nitro.ts, the runModel() function already tries to download the binaries automatically if they are not there yet. If we want to pick a specific binary to download, we can run the system analysis in initialize(); after that, runModel() will have all the information needed to download the correct one.
  • I don't quite understand your point about the behavior on failure. Can you share a logical flow for failure handling so I can better understand what you mean? 😅
  • I'm a little out of my depth on AMD and Intel. Following your suggestion, I will move getNvidiaConfig() out and call it only from an iterative, greedy hardware-detection loop (loop over the supported hardware and run the check for each). Once we have the supported-hardware array, we can decide which binary to load. 🤔
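The iterative, greedy detection loop proposed above could look roughly like this. The probe interface and probe names are assumptions for illustration; the real checks (e.g. a getNvidiaConfig() call) would plug in as detect functions.

```typescript
// Hypothetical sketch: probe each supported hardware kind in priority order
// and collect the ones whose check succeeds; a throwing probe means that
// hardware or its tooling is simply absent, so detection continues.
type HardwareProbe = { name: string; detect: () => boolean };

function detectSupportedHardware(probes: HardwareProbe[]): string[] {
  const supported: string[] = [];
  for (const probe of probes) {
    try {
      if (probe.detect()) supported.push(probe.name);
    } catch {
      // Absent drivers/tools often surface as errors; treat as "not supported".
    }
  }
  return supported;
}
```

The resulting array can then feed the binary-selection decision (first supported entry wins, for example).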

Contributor:

  1. Why does node-nitro need to care about the model download?
  2. I'm ok with running the analysis first and installing the correct Nitro, rather than expecting a folder with every Nitro build waiting to be initialized (with opinionated names like win-cpu/nitro, mac/nitro), and the code should reflect that idea.
  3. For the logical flow of failure: a compiled C++ program can yield arbitrary errors (core dump, illegal instruction, etc.) that a normal subprocess binding simply cannot understand. Those have to be handled by node-nitro if you intend to provide surplus value here.
  4. It's not about being dumb or not; it's engineering: we think through use cases we MIGHT need to solve, not only the ones we CURRENTLY have.

Hope you understand. Thank you

Contributor Author:

  • I don't quite understand this: nitro-node only helps get the nitro binaries onto the user's machine; there is no model downloading here... 😅
  • Ok, I'll go with this. But the logic will not be flexible in the early versions; we need to discuss the naming scheme of the nitro release archives (can we have a single generic CUDA build instead of separate ones per version?)
  • The scope of this PR is to first split the logic out of Jan into a library for reuse and easier maintenance, as it will be tightly coupled to nitro development. I'm not yet involved much in nitro's architecture/build decisions, so for now I will stick with the subprocess approach until nitro can be used as a system library, and make a Node binding later (which will be much more work). Let's take the iterative approach and create a separate issue for the Node binding; I don't want the PR to become too big to review with unplanned work... 😅
  • I believe we can't handle every unplanned thing at once or we will go nowhere. It might be a real use case, but not until it's confirmed to be worth the effort (validated by a user survey first). If something isn't planned yet and there's no agreement on the roadmap, I don't think we should spend much time over-engineering it. Let's deliver features one step at a time...

For your suggestions, I'm considering how best to adapt them now with minimal effort. For future items, please share the roadmap link for the points you mentioned and a defined plan, so I can understand the steps to deliver them iteratively. I'm not a team member at the moment, so I don't know your plans well; pardon me if I'm not understanding your points correctly... 😅

Contributor Author:


Sorry, I misunderstood the third point. I already exposed the error code and exit signal on termination of the nitro subprocess in recent commits. As for core dumps and crashes, I'm not yet sure how Node handles those cases; let me check and find an appropriate solution. My initial assumption is that if nitro crashes or core dumps, no callback fires with an error code or signal, but a disconnect callback will be called; we can detect that and retrieve the dump information from the system... 🤔

/**
* Get GPU information
*/
async function updateGpuInfo(): Promise<void> {
Contributor:

I think this one should be in the app, as it's very Jan-specific.
For the normal use case, people export CUDA_VISIBLE_DEVICES.
The logic that assigns Nitro the GPU with the highest VRAM should not be here.

Contributor Author:

I'm not sure about this. Do you think most people will use the library responsibly (exporting CUDA_VISIBLE_DEVICES), or would they prefer a foolproof default where the library picks sensible options for them? 🤔

Contributor:

What is sensible, and what is your definition of a sensible way for Nitro to choose?
I don't believe any library can understand hardware better than the user of the library who owns it.
Even PyTorch uses the default allocation across all GPUs and lets users know they can override which GPU to use.
What do you think?

Contributor Author:

Yes, I mean that if users don't provide anything, the library uses defaults based on its analysis; those who know what they want can specify it explicitly.
For now, I'll add support for an env var letting users specify their preferred devices.
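The proposed env-var override could be as small as the sketch below. The variable name NITRO_VISIBLE_DEVICES is an assumption for illustration, mirroring CUDA_VISIBLE_DEVICES semantics (a comma-separated list of GPU indices).

```typescript
// Sketch: honor a user-provided device list; otherwise fall back to the
// default the library's hardware analysis picked.
function resolveGpuDevices(
  envValue: string | undefined,
  detectedDefault: number[],
): number[] {
  if (!envValue || envValue.trim() === "") return detectedDefault;
  return envValue.split(",").map((id) => Number.parseInt(id.trim(), 10));
}

// Hypothetical usage: resolveGpuDevices(process.env.NITRO_VISIBLE_DEVICES, [0])
```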

/**
 * @returns {(NitroPromptSetting | never)} parsed prompt setting
 * @throws {Error} if cannot split promptTemplate
 */
export function promptTemplateConverter(
Contributor:

I actually think this should be in the Nitro C++ code, not here.
Also, it doesn't thoroughly cover many of the cases we have.

Contributor Author:

You're right, but for the sake of this PR, let's create another issue for the template converter. I'll check the C++ flow later and adapt this in another PR.
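For context, a converter of this kind might split a template at its placeholders, roughly as below. This is a hedged sketch of the idea only: the function name, return shape, and the {system_message}/{prompt} placeholder tokens are assumptions, not the library's actual implementation.

```typescript
// Sketch: split a template like "{system_message} USER: {prompt} ASSISTANT:"
// into the three literal pieces surrounding the placeholders, and fail
// loudly when the template cannot be split.
interface PromptParts { systemPrompt: string; userPrompt: string; aiPrompt: string }

function convertPromptTemplate(template: string): PromptParts {
  const sysToken = "{system_message}";
  const userToken = "{prompt}";
  const sysIdx = template.indexOf(sysToken);
  const userIdx = template.indexOf(userToken);
  if (sysIdx < 0 || userIdx < 0 || userIdx < sysIdx) {
    throw new Error("Cannot split promptTemplate");
  }
  return {
    systemPrompt: template.slice(0, sysIdx),
    userPrompt: template.slice(sysIdx + sysToken.length, userIdx),
    aiPrompt: template.slice(userIdx + userToken.length),
  };
}
```

As the reviewer notes, a real version has to cover far more template shapes than this two-placeholder split.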

Contributor:

ok agree

// The platform architecture
//const ARCH = process.env.npm_config_arch || process.arch;

const getReleaseInfo = async (taggedVersion: string): Promise<any> => {
Contributor:

Do you think using node-gyp to install and build nitro locally during npm install would be better?
I think that would be useful.

Contributor Author:

I prefer to skip node-gyp at this stage. As I understand it, the primary use case of node-gyp is building native Node modules (e.g. if we wanted to build nitro into *.so files and control it directly through a Node binding instead of a subprocess wrapper).
Using node-gyp just to fetch binaries is somewhat overkill, and supporting programmatic builds on every supported platform is another story. At this point I'm not confident I could support that without people raising issues (build environments involve PATH, OS versions, available tooling, etc.). For now, I don't think stepping into it is a good use of time; it would just create more issues...

Contributor:

Then basically, what is the point of wrapping the Nitro binaries in a big nitro-node package?
Nitro itself is just a ~3 MB binary with an OpenAI-compatible API on top of llama.cpp: one command and two POST requests to use it.
I understand your point, but given the current state, if nitro-node does not provide anything beyond the existing binary, I don't think it should exist; it adds a lot of overhead and a grey box.

Contributor Author:

The intention for now is simple: to ease the use of nitro in Node.js programs without boilerplate code repeated and bloated across projects. It's not about overhead but practical usage. It's just a thin wrapper for now, with no intention of adding value beyond what nitro already provides, but rather improving the developer experience of writing a client program that uses nitro. 😅

Contributor Author:

As for building nitro during installation when no binary is available for the platform, let's move that to another issue, as I believe it will be rather complex. Let's first deliver what is useful for gaining adoption of nitro among fellow developers, then improve it later with more features... 😅

Contributor Author:

FYI: I'm looking at a reference build script here:
https://github.com/withcatai/node-llama-cpp/blob/master/src/cli/commands/BuildCommand.ts

They also don't use node-gyp; they have their own build script instead. I prefer it this way too: not relying on node-gyp where unnecessary means no dependency on Python and the other native tooling node-gyp requires. That gives us a less demanding library => more usage from other projects => more visibility => more contributions back to nitro and this library. 😉

/**
* Get the system resources information
*/
export async function getResourcesInfo(): Promise<ResourcesInfo> {
Contributor:

I think this one is how Jan is USING Nitro; it shouldn't be something reusable here.

Contributor Author:

This one gets the number of CPU cores on the machine to provide a default value when none is specified for running nitro. I believe it's also useful for users of nitro-node.
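The default described here amounts to a one-liner over Node's os module. A minimal sketch, assuming an illustrative function name rather than the library's actual API:

```typescript
import * as os from "node:os";

// Sketch: use the machine's CPU core count as the default thread count when
// the caller does not specify one; never return less than 1.
function defaultThreadCount(requested?: number): number {
  return requested ?? Math.max(1, os.cpus().length);
}
```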


hiro-v commented Feb 6, 2024

Overall, thank you for the great job.
However, I have some comments:

  1. It's ok to download binaries, but not ok to run a lot of checks and pick the respective build; it bloats the core execute.ts, especially with upcoming builds.
  2. This should include installing and building from source; I think https://github.com/withcatai/node-llama-cpp did it.

@louis-jan

@InNoobWeTrust @hiro-v I would like to proceed and merge this pull request. Any further updates will be in separate PRs.

@louis-jan left a comment:

LGTM

@louis-jan left a comment:

LGTM. Let's resolve the pending ones in the next PRs.

@tikikun merged commit a33e074 into janhq:main on Feb 19, 2024
18 checks passed
@InNoobWeTrust deleted the feat/1635/decouple-nitro-inference-engine-into-a-library branch on February 21, 2024 07:32