
Support for jupyter-ai in web assembly (jupyterlite) #822

Open
jaanli opened this issue Jun 8, 2024 · 16 comments
Labels
enhancement New feature or request

Comments


jaanli commented Jun 8, 2024

This is currently not supported but would be amazing to have:

https://github.com/jupyterlite/jupyterlite does not support some extensions, jupyter-ai among them.

I have some spare capacity to do this over the summer if needed, but would need some pointers in the right direction.

For more context, I gave a talk at @Lightning-AI last week (talks.onefact.org) around a demo for jupyter-ai: https://colab.research.google.com/github/onefact/loving-the-baseline/blob/main/nearest-neighbors.ipynb

It will take too long for VC-backed companies like Lightning to get their act together and release this, and the @google team won't be able to support this kind of thing in Colaboratory either (due to Gemini / Cloud growth requirements as a public company with fiduciary responsibility).

HTH! 🙇

@jaanli jaanli added the enhancement New feature or request label Jun 8, 2024

linlol commented Jun 8, 2024

Hmm, sorry, just curious: are you sure about using Jupyter AI in JupyterLite as a productionised solution?

Since JupyterLite is fully WASM-powered, it would be a bit slow when initialising the imports.


jaanli commented Jun 8, 2024

> Hmm, sorry, just curious: are you sure about using Jupyter AI in JupyterLite as a productionised solution?
>
> Since JupyterLite is fully WASM-powered, it would be a bit slow when initialising the imports.

Thanks for checking!! That's okay if it is a bit slow, because it is at least a reliable solution (right now there is no reliable solution, which is why I am learning Rust to try to help, and @onefact joined the @uxlfoundation and @pytorch foundations :)


jtpio commented Jun 9, 2024

Linking to #119 as related.


jtpio commented Jun 9, 2024

Jupyter AI requires some setup on the Jupyter Server side to be able to use the models. And it also uses some dependencies that might not be packaged for WebAssembly / Pyodide yet.

Maybe the Jupyter AI stack could be made more modular, so some parts can run in the browser without needing a server. And also support the Notebook 7 interface (#504).

In the meantime, another approach is to write pure frontend extensions that can talk to AI providers directly, for example https://github.com/jtpio/jupyterlab-codeium
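As a rough illustration of that "pure frontend" approach (not the actual jupyterlab-codeium code), here is a minimal TypeScript sketch of a JupyterLab plugin that calls a provider API straight from the browser. The plugin id, command id, and the OpenAI-style chat payload are assumptions made for the sake of the example.

```ts
import { JupyterFrontEnd, JupyterFrontEndPlugin } from '@jupyterlab/application';

// Hypothetical plugin: sends a prompt to an OpenAI-compatible chat completions
// endpoint (Mistral's is used here as an example) directly from the browser,
// so no Jupyter Server extension is involved.
const plugin: JupyterFrontEndPlugin<void> = {
  id: 'jupyterlite-ai-sketch:plugin',
  autoStart: true,
  activate: (app: JupyterFrontEnd) => {
    app.commands.addCommand('ai-sketch:ask', {
      label: 'Ask the model (sketch)',
      execute: async args => {
        const apiKey = (args['apiKey'] as string) ?? '';
        const prompt = (args['prompt'] as string) ?? 'Explain this cell';
        const response = await fetch('https://api.mistral.ai/v1/chat/completions', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            Authorization: `Bearer ${apiKey}`
          },
          body: JSON.stringify({
            model: 'mistral-small-latest',
            messages: [{ role: 'user', content: prompt }]
          })
        });
        const data = await response.json();
        console.log(data.choices?.[0]?.message?.content);
      }
    });
  }
};

export default plugin;
```

Because everything happens in the browser, an extension like this should work in JupyterLite unchanged; the trade-off is that the user's API key lives client-side.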


jaanli commented Jun 9, 2024

Related: endomorphosis/ipfs_transformers#1


jtpio commented Jun 11, 2024

For reference, here is another experiment for getting Jupyter AI-like features to work in JupyterLite: https://github.com/jtpio/jupyterlab-codestral. It is distributed as a regular pure frontend JupyterLab extension, which means it works in JupyterLite out of the box.

This extension talks to the MistralAI API directly to get inline completions and chat results. The same idea would apply to any other provider. However, it would likely be quite tedious to implement a new plugin for each provider, as there are many AI providers already and there will likely be many more in the future.

So maybe it could be interesting to start looking into how to reuse some bits of Jupyter AI, so they can be used in JupyterLite too. The main obstacle at the moment is the API, which goes through the Jupyter Server extension:

handlers = [
    (r"api/ai/api_keys/(?P<api_key_name>\w+)", ApiKeysHandler),
    (r"api/ai/config/?", GlobalConfigHandler),
    (r"api/ai/chats/?", RootChatHandler),
    (r"api/ai/chats/history?", ChatHistoryHandler),
    (r"api/ai/chats/slash_commands?", SlashCommandsInfoHandler),
    (r"api/ai/providers?", ModelProviderHandler),
    (r"api/ai/providers/embeddings?", EmbeddingsModelProviderHandler),
    (r"api/ai/completion/inline/?", DefaultInlineCompletionHandler),
]

However, the API calls to the model providers can also be made directly from the frontend, as demonstrated by https://github.com/jtpio/jupyterlab-codestral.

While the frontend components of Jupyter AI can be reused (for example the chat panel), it's not clear how the langchain-based logic could be used in JupyterLite, since Jupyter AI expects to be running on Jupyter Server.

Maybe the easiest would be to create a dedicated package that uses langchainjs in the browser, instead of the Python version of langchain on the server: https://github.com/langchain-ai/langchainjs.
According to their docs:

> This is built to integrate as seamlessly as possible with the LangChain Python package. Specifically, this means all objects (prompts, LLMs, chains, etc) are designed in a way where they can be serialized and shared between languages.
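Under that assumption, a browser-side equivalent of the server-side langchain call might look like the TypeScript sketch below; the @langchain/mistralai package usage and the model name are assumptions, not something pinned down in this thread.

```ts
import { ChatMistralAI } from '@langchain/mistralai';

// Minimal sketch: run a LangChain.js chat model entirely in the browser,
// so the kind of call Jupyter AI makes on the server happens client-side.
async function completeInBrowser(apiKey: string, prompt: string): Promise<string> {
  const model = new ChatMistralAI({ apiKey, model: 'mistral-small-latest' });
  const message = await model.invoke(prompt);
  // invoke() resolves to an AIMessage; content can be a string or structured parts.
  return typeof message.content === 'string'
    ? message.content
    : JSON.stringify(message.content);
}
```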


jaanli commented Jun 11, 2024

interesting - thanks @jtpio - unfortunately there are some issues with langchain for our use at @onefact :(

what about @duckdb?

specifically, this extension should support this use case: https://github.com/duckdb/duckdb_vss

I'm happy to give it a go but could use a hand. DuckDB runs in WASM, we've built most of our stack on it, and we are giving a talk about it in August: https://duckdb.org/2024/08/15/duckcon5.html
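To make the duckdb_vss idea concrete, here is a hedged TypeScript sketch of running an HNSW similarity search with @duckdb/duckdb-wasm in the browser; whether the vss extension can actually be loaded in the WASM build is an assumption to verify, and the table and column names are made up for the example.

```ts
import * as duckdb from '@duckdb/duckdb-wasm';

// Sketch: spin up DuckDB-WASM, try to load the vss extension, and run a
// k-nearest-neighbors query over a tiny in-memory table.
async function knnWithDuckDB(): Promise<void> {
  const bundle = await duckdb.selectBundle(duckdb.getJsDelivrBundles());
  const worker = new Worker(bundle.mainWorker!);
  const db = new duckdb.AsyncDuckDB(new duckdb.ConsoleLogger(), worker);
  await db.instantiate(bundle.mainModule, bundle.pthreadWorker);

  const conn = await db.connect();
  await conn.query(`INSTALL vss`); // may fail if vss is not distributed for WASM
  await conn.query(`LOAD vss`);
  await conn.query(`CREATE TABLE items (id INTEGER, vec FLOAT[3])`);
  await conn.query(`INSERT INTO items VALUES (1, [1.0, 0.0, 0.0]), (2, [0.0, 1.0, 0.0])`);
  await conn.query(`CREATE INDEX idx ON items USING HNSW (vec)`);

  const result = await conn.query(
    `SELECT id, array_distance(vec, [0.9, 0.1, 0.0]::FLOAT[3]) AS d
     FROM items ORDER BY d LIMIT 1`
  );
  console.log(result.toArray());
  await conn.close();
}
```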

@endomorphosis

Sorry for some delay, one of my 2 partners was recently hospitalized with cancer.

I am working on an IPFS/libp2p based MLOps inference system, whereby requests from libp2p can be made to perform inference, based on the reputation or a whitelist of the agent's identity, much in the way of BOINC or Hugging Face Petals, except that each node has a "model manager" that maintains a list of which models and files are located on local storage, S3, IPFS, or HTTPS, and includes model metadata, i.e. FLOPs per token, minimum VRAM, disk usage, etc.
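For what it's worth, a "model manager" record as described above might look something like this TypeScript sketch; the field names (flopsPerToken, minVramGb, diskGb) are illustrative guesses, not taken from endomorphosis's code.

```ts
// Hypothetical shape of a model-manager entry: where a model lives and the
// metadata a scheduler would need to decide whether a node can serve it.
type StorageBackend = 'local' | 's3' | 'ipfs' | 'https';

interface ModelRecord {
  id: string;                                          // e.g. a Hugging Face repo id
  locations: Partial<Record<StorageBackend, string>>;  // path, S3 URI, CID, or URL
  flopsPerToken: number;
  minVramGb: number;
  diskGb: number;
}

// Pick the first available location, in the caller's order of preference.
function resolveLocation(
  record: ModelRecord,
  preference: StorageBackend[]
): string | undefined {
  for (const backend of preference) {
    const location = record.locations[backend];
    if (location) {
      return location;
    }
  }
  return undefined;
}
```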


jaanli commented Jun 13, 2024

> Sorry for some delay, one of my 2 partners was recently hospitalized with cancer.

Oh no @endomorphosis that is really intense - sorry to hear.

Happy to chat about this in the upcoming weeks, my personal email is jaan.li@jaan.li if needed, and you can email help@payless.health in case medical financing options are needed.

We have worked with difficult conditions and datasets, and can try to help find the price of things if rarer treatment options need to be financed (last year we ran this campaign on price variation in surgery, and we can check whether we have the relevant prices in case a single case agreement exception needs to be negotiated).

More resources that can be helpful:

https://dollarfor.org/
https://www.payless.health/help/cancer-checklist (we wrote this last year)

A book a friend recommended that may help (hopefully, it is likely a very difficult time): https://mitpress.mit.edu/9780262621045/choices-in-healing/ (libgen)

I will take the time to learn more about the basics of IPFS first before responding to the technical aspects of this discussion - just happen to work in this area in open source.

If anyone has good examples of basic python notebooks or resources to learn about IPFS please let me know - don't have much of a computer science background but happy to learn and test to see how to enable more local LLMs/LMMs.

@endomorphosis

> I will take the time to learn more about the basics of IPFS first before responding to the technical aspects of this discussion - just happen to work in this area in open source.
>
> If anyone has good examples of basic python notebooks or resources to learn about IPFS please let me know - don't have much of a computer science background but happy to learn and test to see how to enable more local LLMs/LMMs.

I have done some looking for you. There are varying levels of integration, from the simplest example of getting data from an HTTPS IPFS gateway, e.g.

wget -O pretrained/AdaBins_nyu.pt https://cloudflare-ipfs.com/ipfs/Qmd2mMnDLWePKmgfS8m6ntAg4nhV5VkUyAydYBp8cWWeB7/AdaBins_nyu.pt

to deeper integrations e.g.

https://github.com/BlockScience/cats
https://github.com/pollinations/pollinations
https://github.com/AlgoveraAI/ipfspy

to implementing mesh networking systems to send PyTorch tensors between peers, e.g.

OpenMined/PyGrid-deprecated---see-PySyft-@d2aa4b4
discussion
OpenMined/PyGrid-deprecated---see-PySyft-#166


jaanli commented Jun 14, 2024

That's awesome, thank you @endomorphosis ! That is plenty to get started on to see what a nearest neighbor baseline method could look like :) 🙏

Example for a medical use case: https://colab.research.google.com/github/onefact/loving-the-baseline/blob/main/nearest-neighbors.ipynb

@endomorphosis

> That's awesome, thank you @endomorphosis ! That is plenty to get started on to see what a nearest neighbor baseline method could look like :) 🙏
>
> Example for a medical use case: https://colab.research.google.com/github/onefact/loving-the-baseline/blob/main/nearest-neighbors.ipynb

I had intended to make a fully decentralized IPFS k-nearest-neighbors implementation, e.g. https://github.com/endomorphosis/ipfs_faiss and https://github.com/endomorphosis/ipfs_datasets, but I do want you to know that I will also be supporting S3.

My intent was to make a tool to import/export a FAISS index to a Pail key/value pair (https://github.com/web3-storage/pail), then use Pail's "prefix" function to shard the large index into smaller subindexes, and then export those CAR files to the Hugging Face directory and the IPFS model manager.

The CAR file can then be used to do the k-nearest-neighbors inference either with the FAISS library in Hugging Face (https://huggingface.co/docs/datasets/en/faiss_es), or perhaps eventually with the WASM equivalent: https://www.npmjs.com/package/hnswlib-wasm
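As a rough sketch of that last step, the in-browser k-NN search could look like the TypeScript below. It assumes hnswlib-wasm mirrors the hnswlib-node API (HierarchicalNSW, initIndex, addPoint, searchKnn); the actual surface of the WASM package should be checked against its docs.

```ts
import { loadHnswlib } from 'hnswlib-wasm';

// Build a small HNSW index in the browser and query its k nearest neighbors.
async function knnInBrowser(vectors: number[][], query: number[], k: number) {
  const hnswlib = await loadHnswlib();
  const index = new hnswlib.HierarchicalNSW('l2', query.length);
  // maxElements, M, efConstruction, randomSeed -- typical hnswlib defaults.
  index.initIndex(vectors.length, 16, 200, 100);
  vectors.forEach((vec, label) => index.addPoint(vec, label, false));
  // Returns the labels (row ids) and distances of the k closest stored vectors.
  return index.searchKnn(query, k, undefined);
}
```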


jaanli commented Jun 14, 2024

> > That's awesome, thank you @endomorphosis ! That is plenty to get started on to see what a nearest neighbor baseline method could look like :) 🙏
> > Example for a medical use case: https://colab.research.google.com/github/onefact/loving-the-baseline/blob/main/nearest-neighbors.ipynb
>
> I had intended to make a fully decentralized IPFS k-nearest-neighbors implementation, e.g. https://github.com/endomorphosis/ipfs_faiss and https://github.com/endomorphosis/ipfs_datasets, but I do want you to know that I will also be supporting S3.
>
> My intent was to make a tool to import/export a FAISS index to a Pail key/value pair (https://github.com/web3-storage/pail), then use Pail's "prefix" function to shard the large index into smaller subindexes, and then export those CAR files to the Hugging Face directory and the IPFS model manager.
>
> The CAR file can then be used to do the k-nearest-neighbors inference either with the FAISS library in Hugging Face (https://huggingface.co/docs/datasets/en/faiss_es), or perhaps eventually with the WASM equivalent: https://www.npmjs.com/package/hnswlib-wasm

So glad I mentioned our road map - this is incredibly helpful for scaling our work, and I think you have just solved a key bottleneck in translating the raw data from hospitals (e.g. the price sheets we published from 4,000+ hospitals) into actionable arbitrage opportunities, like in our campaign.

That's because this translation often requires a dbt/data modeling layer, and a separate machine learning layer (often with different infrastructure).

The IPFS work you have done (and the references to the WASM libraries) means we can decouple @onefact from public cloud providers. That is desirable because of misaligned incentives that affect @jupyterlab's ability to allocate resources to this, and @duckdb's, @WasmEdge's (and your?) ability to solve for use cases at the edge in poor countries that can't negotiate with public market-driven quarterly earnings targets -- often driven by the cloud divisions of organizations like @google or @amzn.

Interesting related discussion on the @motherduckdb slack:

> There's a lot to discuss here, however generally I agree with the possibility of misaligned incentives. Cloud data companies know there is a gravitational force to data due to joins, and many of the easy paths bias toward entirely remote data, transforms, and compute. Even the asymmetry of AWS egress/ingress costs shows this bias. To be fair this is often efficient, minimizing data transport. However I think you can see the misalignment in the customer pressure to get data lakes as first-class peers to internal tables, and in the workflow of ad hoc notebooks, where roundtripping to local is a common practice for its flexibility.
> We think there's a lot of potential in making it much easier to mix and match local and remote sources in one query, and in easier roundtripping. It certainly makes it easier for my ETL jobs to INSERT INTO instead of always dropping into S3, for example.
> Let me discuss internally and come up with a more specific response with examples. It's a busy week here with our GA launch, but maybe this is worth a call with you to brainstorm a bit more.

The financial time horizons to make money from open source are long (15-20 years, rather than the typical 5-7 year horizons for VCs to turn their fund for LPs or institutional investors like family offices, private equity firms, or state pension funds), which is why we are focusing on the non-profit side now. This is roughly equivalent to duckdb.org being a non-profit and duckdblabs.com being a for-profit, also known as a "contract hybrid" business model, eligible for both SBIR/STTR federal funding and philanthropy.

Open to any input here, we are learning as we go and picking up more open source interest (and interest from folks like the @ec-europa president and their advisory teams) to support this work.

@endomorphosis

> So glad I mentioned our road map - this is incredibly helpful for scaling our work, and I think you have just solved a key bottleneck in translating the raw data from hospitals (e.g. the price sheets we published from 4,000+ hospitals) into actionable arbitrage opportunities, like in our campaign. [...]

I forgot to mention that the intent is to allow nodes to communicate over libp2p, such that nodes discover other nodes and list the resources they have available along with their identity, so that access to resources can be allowed or denied in the node configuration settings. I intend to extend the Hugging Face Agents library to define "tools" based on the resources that are available on the libp2p network.

I too have recently applied for SBIR.
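A tiny TypeScript sketch of the allow/deny behaviour described above; the configuration shape and function are hypothetical, not part of any published libp2p or Hugging Face Agents API.

```ts
// Hypothetical per-node configuration: what the node advertises to peers and
// which peer ids it will or will not serve.
interface NodeResourceConfig {
  advertise: string[];   // resources this node lists on the libp2p network
  allowPeers: string[];  // peer ids explicitly allowed (empty = allow all)
  denyPeers: string[];   // peer ids always refused
}

function canServe(config: NodeResourceConfig, peerId: string, resource: string): boolean {
  if (!config.advertise.includes(resource)) {
    return false;
  }
  if (config.denyPeers.includes(peerId)) {
    return false;
  }
  return config.allowPeers.length === 0 || config.allowPeers.includes(peerId);
}
```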


jaanli commented Jun 14, 2024

That's awesome @endomorphosis ! Happy to help if I can - it's tricky to set up the contract hybrid structures necessary (and to source the kind of patient capital required, with sufficiently long time horizons).

for the libp2p nodes, this reminds me of soulseek/p2p/torrent type models.

a user journey would be helpful here:

have you tried making one?

with diagrams.net, figma.com (happy to send you a starter?), or mermaid.live (or other tools? like reactflow.dev)

@endomorphosis

> for the libp2p nodes, this reminds me of soulseek/p2p/torrent type models.

Yes, this is correct, this is how individual nodes will be able to share their resources.

> a user journey would be helpful here:
>
> have you tried making one?

I am more focused on trying to get a good minor release out for most of the packages, and was planning to do the documentation after the fact and make the code self-documenting enough that my junior dev can port it over to another language (Node.js), as a result of being constrained by the number of developer hours available between the three of us, given @Mwni's cancer hospitalization.
