
[janhq/jan#1635] feat: decouple nitro engine into a library #367

Conversation


@InNoobWeTrust InNoobWeTrust commented Jan 22, 2024

Current progress

  • The library can be built, but with a local dependency on janhq/jan as a submodule
  • Next step: publish an updated janhq/core to npmjs and depend on the resolved package instead of the submodule
  • Update the GitHub Actions workflow to build and test nitro-node for different OSes and add the tarballs of the npm packages to the release
  • In the janhq/jan build step for extensions, simply download nitro-node from npmjs with npm pack @janhq/nitro-node --target_arch=x64 --target_platform=linux --pack-destination ../electron/pre-install/
  • Refactor nitro-node to be independent of @janhq/core (ideal) or to use only minimal interfaces and concepts from @janhq/core
  • From janhq/jan, import @janhq/nitro-node to interact with nitro.

How to build

From root directory of repo

cd nitro-node && make

Issue:

  • Currently I'm cloning https://github.com/janhq/jan as a submodule in order to build the janhq/core package. The version on npm seems to be outdated: the imports do not work even though the displayed version is the same (https://www.npmjs.com/package/@janhq/core) => can you update it on npmjs, or does the team prefer to keep the versions tightly coupled to the GitHub repo?

@InNoobWeTrust force-pushed the feat/1635/decouple-nitro-inference-engine-into-a-library branch from 6662320 to 01731cf on January 22, 2024 08:16

InNoobWeTrust commented Jan 22, 2024

Test builds on different yarn versions (platform: macOS, M1):

  • v1.22.21: build succeeds
  • v2.4.3: build fails with error ➤ YN0066: │ typescript@patch:typescript@npm%3A5.3.3#builtin<compat/typescript>::version=5.3.3&hash=8133ad: Cannot apply hunk #1 (set enableInlineHunks for details)
  • v3.7.0: build succeeds

How to reproduce:

npm i -g corepack
corepack enable  --install-directory <some_dir_in_PATH>
corepack install -g yarn@<1,2,3>
make clean all

@InNoobWeTrust force-pushed the feat/1635/decouple-nitro-inference-engine-into-a-library branch 2 times, most recently from 70c20ee to d7344dd on January 22, 2024 20:44
@InNoobWeTrust force-pushed the feat/1635/decouple-nitro-inference-engine-into-a-library branch 4 times, most recently from 13245cf to cba07be on January 24, 2024 08:47
@InNoobWeTrust force-pushed the feat/1635/decouple-nitro-inference-engine-into-a-library branch 6 times, most recently from cb57401 to d85535c on January 25, 2024 09:03
InNoobWeTrust commented:

Tests passed on Ubuntu (no CUDA) and Mac-Intel: https://github.com/InNoobWeTrust/nitro/actions/runs/7652932312

  • start/stop nitro process
  • run chatCompletion with stream option

@InNoobWeTrust force-pushed the feat/1635/decouple-nitro-inference-engine-into-a-library branch from 02d91bd to ddd1406 on January 25, 2024 14:56
@InNoobWeTrust force-pushed the feat/1635/decouple-nitro-inference-engine-into-a-library branch from ff73fbd to 8a309a3 on February 6, 2024 00:14
/**
 * Find which executable file to run based on the current platform.
 * @returns The name of the executable file to run.
 */
export const executableNitroFile = (
Contributor:

What I'm seeing here is that this function expects folderPath as a parameter, and there are a bunch of prebuilt Nitro binaries available. In my opinion, that's not a good design.
Personally I think there is a better way:

  • node-nitro exposes a util function that returns the exact binary to use (e.g. Windows with CUDA, or Mac ARM64 with Metal).
  • Then run init; in the init function, download the binary if it isn't there, or let the user prefill it with another util function.
  • If it fails, let it fail and show the user why it failed.
  • getNvidiaConfig() is very specific; we should only use it in utils, as Nitro will support many other builds (AMD with Vulkan, SYCL for Intel, Linux ARM, etc.)
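The selection the reviewer describes could be sketched as a pure mapping from system facts to a single build variant. This is a hedged illustration only: the variant names (mac-arm64, win-cuda, ...) and the SystemInfo shape are assumptions, not the actual nitro release artifact names.

```typescript
// Illustrative sketch: map platform/arch/GPU support to one binary variant.
// Variant names here are assumptions; real nitro artifact names may differ.
interface SystemInfo {
  platform: string; // value of process.platform: 'win32' | 'darwin' | 'linux'
  arch: string;     // value of process.arch: 'x64' | 'arm64' | ...
  cuda: boolean;    // whether a usable NVIDIA GPU was detected
}

function selectBinaryVariant(info: SystemInfo): string {
  switch (info.platform) {
    case "darwin":
      // Apple Silicon builds ship with Metal; Intel Macs fall back to CPU.
      return info.arch === "arm64" ? "mac-arm64" : "mac-x64";
    case "win32":
      return info.cuda ? "win-cuda" : "win-cpu";
    case "linux":
      return info.cuda ? "linux-cuda" : "linux-cpu";
    default:
      throw new Error(`Unsupported platform: ${info.platform}`);
  }
}
```

The point of the design is that the caller never sees a folder of binaries; it asks for exactly one variant and downloads only that.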

Contributor Author:

  • Can we split this logic into another PR? I don't know yet what the naming agreement for the release artifacts will be (there are with/without-CUDA versions and some inconsistencies), so defining a single generic logic for the executable file is somewhat out of my control... 😅
  • In nitro.ts, the runModel() function already tries to download the binaries automatically if they are not there yet. If we want to pick a specific binary to download, we can run the system analysis in initialize(); after that, runModel() will have all the information needed to download the correct one.
  • I don't quite understand your point about the behavior on failure. Can you share a logical flow for failure handling so I can better understand what you mean? 😅
  • I'm a little out of my depth on AMD and Intel. Following your suggestion, I will move getNvidiaConfig() out and call it only from an iterative, greedy hardware-detection loop (loop over the supported hardware and run the check for each). Once we have the supported-hardware array, we can decide which binary to load. 🤔
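The iterative, greedy detection loop proposed above could look roughly like this. The probe interface and probe names are assumptions for illustration; the real checks (e.g. a getNvidiaConfig() call) would plug in as detect functions.

```typescript
// Hypothetical sketch: probe each supported hardware kind in priority order
// and collect the ones whose check succeeds; a throwing probe means that
// hardware or its tooling is simply absent, so detection continues.
type HardwareProbe = { name: string; detect: () => boolean };

function detectSupportedHardware(probes: HardwareProbe[]): string[] {
  const supported: string[] = [];
  for (const probe of probes) {
    try {
      if (probe.detect()) supported.push(probe.name);
    } catch {
      // Absent drivers/tools often surface as errors; treat as "not supported".
    }
  }
  return supported;
}
```

The resulting array can then feed the binary-selection decision (first supported entry wins, for example).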

Contributor:

  1. Why does node-nitro need to care about the model download?
  2. I'm ok with running the analysis first and installing the correct Nitro, rather than expecting a folder with every Nitro build waiting to be initialized (with opinionated names like win-cpu/nitro, mac/nitro), and the code should reflect that idea.
  3. For the logical flow of failure: a compiled C++ program can yield arbitrary errors (core dump, illegal instruction, etc.) that a normal subprocess binding simply cannot understand. Those have to be handled by node-nitro if you intend to provide surplus value here.
  4. It's not about being dumb or not; it's engineering: we think through use cases we MIGHT need to solve, not only the ones we CURRENTLY have.

Hope you understand. Thank you

Contributor Author:

  • I don't quite understand this: nitro-node only helps get the nitro binaries onto the user's machine; there is no model downloading here... 😅
  • Ok, I'll go with this. But the logic will not be flexible in the early versions; we need to discuss the naming scheme of the nitro release archives (can we have a single generic CUDA build instead of separate ones per version?)
  • The scope of this PR is to first split the logic out of Jan into a library for reuse and easier maintenance, as it will be tightly coupled to nitro development. I'm not yet involved much in nitro's architecture/build decisions, so for now I will stick with the subprocess approach until nitro can be used as a system library, and make a Node binding later (which will be much more work). Let's take the iterative approach and create a separate issue for the Node binding; I don't want the PR to become too big to review with unplanned work... 😅
  • I believe we can't handle every unplanned thing at once or we will go nowhere. It might be a real use case, but not until it's confirmed to be worth the effort (validated by a user survey first). If something isn't planned yet and there's no agreement on the roadmap, I don't think we should spend much time over-engineering it. Let's deliver features one step at a time...

For your suggestions, I'm considering how best to adapt them now with minimal effort. For future items, please share the roadmap link for the points you mentioned and a defined plan, so I can understand the steps to deliver them iteratively. I'm not a team member at the moment, so I don't know your plans well; pardon me if I'm not understanding your points correctly... 😅

Contributor Author:


Sorry, I misunderstood the third point. I already exposed the error code and exit signal on termination of the nitro subprocess in recent commits. As for core dumps and crashes, I'm not yet sure how Node handles those cases; let me check and find an appropriate solution. My initial assumption is that if nitro crashes or core dumps, no callback fires with an error code or signal, but a disconnect callback will be called; we can detect that and retrieve the dump information from the system... 🤔

/**
* Get GPU information
*/
async function updateGpuInfo(): Promise<void> {
Contributor:

I think this one should be in the app, as it's very Jan-specific.
For the normal use case, people export CUDA_VISIBLE_DEVICES.
The logic that assigns Nitro the GPU with the highest VRAM should not be here.

Contributor Author:

I'm not sure about this. Do you think most people will use the library responsibly (exporting CUDA_VISIBLE_DEVICES), or would they prefer a foolproof default where the library picks sensible options for them? 🤔

Contributor:

What is sensible, and what is your definition of a sensible way for Nitro to choose?
I don't believe any library can understand hardware better than the user of the library who owns it.
Even PyTorch uses the default allocation across all GPUs and lets users know they can override which GPU to use.
What do you think?

Contributor Author:

Yes, I mean that if users don't provide anything, the library uses defaults based on its analysis; those who know what they want can specify it explicitly.
For now, I'll add support for an env var letting users specify their preferred devices.
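The proposed env-var override could be as small as the sketch below. The variable name NITRO_VISIBLE_DEVICES is an assumption for illustration, mirroring CUDA_VISIBLE_DEVICES semantics (a comma-separated list of GPU indices).

```typescript
// Sketch: honor a user-provided device list; otherwise fall back to the
// default the library's hardware analysis picked.
function resolveGpuDevices(
  envValue: string | undefined,
  detectedDefault: number[],
): number[] {
  if (!envValue || envValue.trim() === "") return detectedDefault;
  return envValue.split(",").map((id) => Number.parseInt(id.trim(), 10));
}

// Hypothetical usage: resolveGpuDevices(process.env.NITRO_VISIBLE_DEVICES, [0])
```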

/**
 * @returns {(NitroPromptSetting | never)} parsed prompt setting
 * @throws {Error} if cannot split promptTemplate
 */
export function promptTemplateConverter(
Contributor:

I actually think this should be in the Nitro C++ code, not here.
Also, it doesn't thoroughly cover many of the cases we have.

Contributor Author:

You're right, but for the sake of this PR, let's create another issue for the template converter. I'll check the C++ flow later and adapt this in another PR.
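For context, a converter of this kind might split a template at its placeholders, roughly as below. This is a hedged sketch of the idea only: the function name, return shape, and the {system_message}/{prompt} placeholder tokens are assumptions, not the library's actual implementation.

```typescript
// Sketch: split a template like "{system_message} USER: {prompt} ASSISTANT:"
// into the three literal pieces surrounding the placeholders, and fail
// loudly when the template cannot be split.
interface PromptParts { systemPrompt: string; userPrompt: string; aiPrompt: string }

function convertPromptTemplate(template: string): PromptParts {
  const sysToken = "{system_message}";
  const userToken = "{prompt}";
  const sysIdx = template.indexOf(sysToken);
  const userIdx = template.indexOf(userToken);
  if (sysIdx < 0 || userIdx < 0 || userIdx < sysIdx) {
    throw new Error("Cannot split promptTemplate");
  }
  return {
    systemPrompt: template.slice(0, sysIdx),
    userPrompt: template.slice(sysIdx + sysToken.length, userIdx),
    aiPrompt: template.slice(userIdx + userToken.length),
  };
}
```

As the reviewer notes, a real version has to cover far more template shapes than this two-placeholder split.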

Contributor:

ok agree

// The platform architecture
//const ARCH = process.env.npm_config_arch || process.arch;

const getReleaseInfo = async (taggedVersion: string): Promise<any> => {
Contributor:

Do you think using node-gyp to install and build nitro locally during npm install would be better?
I think that would be useful.

Contributor Author:

I prefer to skip node-gyp at this stage. As I understand it, the primary use case of node-gyp is building native Node modules (e.g. if we wanted to build nitro into *.so files and control it directly through a Node binding instead of a subprocess wrapper).
Using node-gyp just to fetch binaries is somewhat overkill, and supporting programmatic builds on every supported platform is another story. At this point I'm not confident I could support that without people raising issues (build environments involve PATH, OS versions, available tooling, etc.). For now, I don't think stepping into it is a good use of time; it would just create more issues...

Contributor:

Then basically, what is the point of wrapping the Nitro binaries in a big nitro-node package?
Nitro itself is just a ~3 MB binary with an OpenAI-compatible API on top of llama.cpp: one command and two POST requests to use it.
I understand your point, but given the current state, if nitro-node does not provide anything beyond the existing binary, I don't think it should exist; it adds a lot of overhead and a grey box.

Contributor Author:

The intention for now is simple: to ease the use of nitro in Node.js programs without boilerplate code repeated and bloated across projects. It's not about overhead but practical usage. It's just a thin wrapper for now, with no intention of adding value beyond what nitro already provides, but rather improving the developer experience of writing a client program that uses nitro. 😅

Contributor Author:

As for building nitro during installation when no binary is available for the platform, let's move that to another issue, as I believe it will be rather complex. Let's first deliver what is useful for gaining adoption of nitro among fellow developers, then improve it later with more features... 😅

Contributor Author:

FYI: I'm looking at a reference build script here:
https://github.com/withcatai/node-llama-cpp/blob/master/src/cli/commands/BuildCommand.ts

They also don't use node-gyp; they have their own build script instead. I prefer it this way too: not relying on node-gyp where unnecessary means no dependency on Python and the other native tooling node-gyp requires. That gives us a less demanding library => more usage from other projects => more visibility => more contributions back to nitro and this library. 😉

/**
* Get the system resources information
*/
export async function getResourcesInfo(): Promise<ResourcesInfo> {
Contributor:

I think this one is how Jan is USING Nitro; it shouldn't be something reusable here.

Contributor Author:

This one gets the number of CPU cores on the machine to provide a default value when none is specified for running nitro. I believe it's also useful for users of nitro-node.
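The default described here amounts to a one-liner over Node's os module. A minimal sketch, assuming an illustrative function name rather than the library's actual API:

```typescript
import * as os from "node:os";

// Sketch: use the machine's CPU core count as the default thread count when
// the caller does not specify one; never return less than 1.
function defaultThreadCount(requested?: number): number {
  return requested ?? Math.max(1, os.cpus().length);
}
```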


hiro-v commented Feb 6, 2024

Overall, thank you for the great job.
However, I have some comments:

  1. It's ok to download binaries, but not ok to run a lot of checks and pick the respective build; it bloats the core execute.ts, especially with upcoming builds.
  2. This should include installing and building from source; I think https://github.com/withcatai/node-llama-cpp did it.

@louis-jan

@InNoobWeTrust @hiro-v I would like to proceed and merge this pull request. Any further updates will be in separate PRs.

@louis-jan left a comment:

LGTM

@louis-jan left a comment:

LGTM. Let's resolve the pending ones in the next PRs.

@tikikun merged commit a33e074 into janhq:main on Feb 19, 2024
18 checks passed
@InNoobWeTrust deleted the feat/1635/decouple-nitro-inference-engine-into-a-library branch on February 21, 2024 07:32