python bindings? #82

Closed
bryanhpchiang opened this issue Mar 13, 2023 · 19 comments
Labels: enhancement (New feature or request)

Comments

@bryanhpchiang

No description provided.

@MarkSchmidty

Python Bindings for llama.cpp: https://pypi.org/project/llamacpp/0.1.3/ (not mine, just found them)

@shaunabanana

As a temporary work-around before an "official" binding is available, I've written a quick script that calls the llama.cpp executable and supports streaming and interactive mode: https://github.com/shaunabanana/llama.py
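
The idea, roughly (a sketch, not the actual llama.py code; the flags follow llama.cpp's main example and the model path is a placeholder):

import subprocess

# Launch the llama.cpp executable and stream its output character by character.
proc = subprocess.Popen(
    ["./main", "-m", "./models/7B/ggml-model-q4_0.bin", "-p", "Hello"],
    stdout=subprocess.PIPE,
    text=True,
)
while True:
    ch = proc.stdout.read(1)
    if not ch:
        break
    print(ch, end="", flush=True)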

@aratic

aratic commented Mar 15, 2023

Python Bindings for llama.cpp: https://pypi.org/project/llamacpp/0.1.3/ (not mine, just found them)

Looks promising from the description; I'll try it and report back.

@bryanhpchiang (Author)

bryanhpchiang commented Mar 15, 2023 via email

@gjmulder added the enhancement label Mar 15, 2023
@seemanne

I hacked something together tonight for this. It's Python-C++ bindings for the model directly, allowing you to call model.generate() from Python and get the string returned. It doesn't support setting parameters from Python yet (working on it), but it is model agnostic, so you can load whatever ggml supports.

Merging it would, however, require splitting some parts of the code out of main.cpp, which @ggerganov has argued against, IIRC.

https://github.com/seemanne/llamacpypy

@aratic

aratic commented Mar 18, 2023

@seemanne you did what I wanted, dude. This makes it easier to expose or integrate with a web or chat front end. Appreciated, and I'll give it a try.

@seemanne

OK, I updated this and put it into a proper fork. You can now pass parameters from Python. I will need to do some refactoring to pull in upstream changes each time, but it should work; I tested it on Linux and Mac.

@LostRuins (Collaborator)

LostRuins commented Mar 18, 2023

I wrote my own ctypes bindings and wrapped them in a KoboldAI-compatible REST API.

https://github.com/LostRuins/llamacpp-for-kobold

@abetlen (Collaborator)

abetlen commented Mar 22, 2023

EDIT: I've adapted the single-file bindings into a pip-installable package called llama-cpp-python (it builds llama.cpp on install).

If anyone's just looking for Python bindings, I put together llama.py, which uses ctypes to expose the current C API.

To use it, first build llama.cpp as a shared library, then put the shared library in the same directory as the llama.py file.

On Linux, for example, to build the shared library, update the Makefile to add a new target for libllama.so:

libllama.so: llama.o ggml.o
	$(CXX) $(CXXFLAGS) -shared -fPIC -o libllama.so llama.o ggml.o $(LDFLAGS)

Then run make libllama.so to generate the library.
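
For a quick sanity check that the library loads, something like this should work (a minimal sketch, assuming libllama.so sits next to the script; llama_print_system_info() is used here only because it takes no arguments, and the full symbol set comes from llama.h):

import ctypes
from pathlib import Path

# Load the shared library built above from the script's own directory.
lib = ctypes.CDLL(str(Path(__file__).parent / "libllama.so"))

# llama_print_system_info() returns a const char* describing CPU features.
lib.llama_print_system_info.restype = ctypes.c_char_p
print(lib.llama_print_system_info().decode("utf-8"))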

@Ayushk4

Ayushk4 commented Mar 22, 2023

We are putting together a Huggingface-like library with a Python interface that auto-downloads pre-compressed models: https://github.com/NolanoOrg/cformers/#usage
Please let us know which features and models you would like us to add.

@Piezoid (Contributor)

Piezoid commented Mar 22, 2023

I also found these bindings https://github.com/PotatoSpudowski/fastLLaMa

Some feature suggestions, mostly about low-level capabilities:

  • Accessing the output classifier activations (logits) from Python, enabling sampling and quantitative evaluation from Python (see the sketch after this list),
  • Managing the k/v state with its own Python object, allowing states to be swapped in and out,
  • An array view on the embeddings, and the possibility of bypassing ggml_get_rows to feed back hand-crafted embeddings.
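
To illustrate the first point: if the logits were exposed to Python as an array, temperature sampling could be implemented entirely on the Python side (a sketch; logits stands in for whatever the binding would return):

import numpy as np

def sample_token(logits: np.ndarray, temperature: float = 0.8) -> int:
    # Softmax with temperature; subtracting the max keeps exp() numerically stable.
    probs = np.exp((logits - logits.max()) / temperature)
    probs /= probs.sum()
    return int(np.random.choice(len(probs), p=probs))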

@DrBenjamin

EDIT: I've adapted the single-file bindings into a pip-installable package (will build llama.cpp on install) called llama-cpp-python

If anyone's just looking for python bindings I put together llama.py which uses ctypes to expose the current C API.

[…]

Having issues with both variants on an M1 Mac:

from llama_cpp import Llama

produces this error:

zsh: illegal hardware instruction

The Python bindings approach (after building the shared library) produces:

libllama.so' (mach-o file, but is an incompatible architecture (have 'arm64', need 'x86_64'))
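
That second error usually means the Python interpreter and the library were built for different architectures, e.g. an x86_64 Python running under Rosetta trying to load an arm64 libllama.so. A quick way to check which architecture Python is running as:

import platform

# Prints 'arm64' for a native Apple Silicon interpreter, 'x86_64' under Rosetta.
print(platform.machine())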

@PotatoSpudowski

I also found these bindings https://github.com/PotatoSpudowski/fastLLaMa

[…]

We have added most of these suggestions in the latest fastLLaMa update 👀

@dmahurin (Contributor)

dmahurin commented May 17, 2023

As #1156 is closed as a duplicate of this issue, I am bringing the discussion about creating an official Python binding in the llama.cpp repository here (which I now assume is the objective of this issue).

The current external Python bindings seem to be:

  • llama-cpp-python
  • llamacpp
  • pyllamacpp
  • llamacpypy
  • fastllama

But none really stand out as a candidate to be merged into llama.cpp.

My proposal is to model the llama.cpp bindings after rwkv.cpp by @saharNooby (bert.cpp also follows a similar path).

  • Assume llama.cpp is built as a shared library (with BUILD_SHARED_LIBS=ON)
  • Create basic Python bindings that just expose the functions in the shared library as-is
  • (optional) Create a higher-level model that builds on the basic bindings (see the sketch after this list)
  • Change the examples to be written in Python rather than bash
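
As a concrete illustration of the second and third points (a sketch only; the wrapped symbol is just a simple example, and a real binding would mirror everything in llama.h):

import ctypes

# Basic binding: load the shared library (built with BUILD_SHARED_LIBS=ON)
# and declare prototypes exactly as they appear in llama.h.
_lib = ctypes.CDLL("libllama.so")
_lib.llama_print_system_info.restype = ctypes.c_char_p
_lib.llama_print_system_info.argtypes = []

# Higher-level layer: hide the ctypes details behind plain Python.
def system_info() -> str:
    return _lib.llama_print_system_info().decode("utf-8")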

We could keep the following in mind for the basic binding:

  • completeness - should be a complete binding, aligning with the llama.cpp interface
  • simplicity - a relatively straightforward, easy-to-understand implementation
  • maintainability - easy to maintain

Any suggestions on which of the current external Python bindings would be a good starting point for an eventual merge into llama.cpp?

@dmahurin (Contributor)

dmahurin commented May 17, 2023

If anyone's just looking for python bindings I put together llama.py which uses ctypes to expose the current C API.

@abetlen, could this single-file implementation be a starting point for the basic binding mentioned above?

@abetlen (Collaborator)

abetlen commented May 17, 2023

Hey @dmahurin, w.r.t. your proposal I should point out that what you describe is the current state of llama-cpp-python:

  • Builds llama.cpp as a shared library with support for all the llama.cpp build flags (OpenBLAS, CUDA, CLBlast)
  • Exposes the entire llama.h API as-is via ctypes
  • Exposes a higher-level API that handles type conversions, memory management, etc.
  • Includes examples for both APIs in Python
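
For reference, the higher-level API looks roughly like this in use (paths and prompt are placeholders, following the llama-cpp-python README of the time):

from llama_cpp import Llama

# Model path is a placeholder; any ggml model supported by llama.cpp works.
llm = Llama(model_path="./models/7B/ggml-model-q4_0.bin")
output = llm("Q: Name the planets in the solar system? A: ", max_tokens=32)
print(output["choices"][0]["text"])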

That being said, I don't have anything against moving these bindings into llama.cpp if that's something the maintainers think is worthwhile / the right approach. I would also be happy to transfer over the PyPI package, as long as we don't break downstream users (text-generation-webui, langchain, babyagi, etc.).

@seemanne

@dmahurin I don't see why merging Python bindings into this repo is needed when solutions like @abetlen's repo already exist.
Putting the maintenance burden of a mainly Python library on mainly C++ developers, just so bash can be removed from the README, seems unwise. The Python bindings are already linked in the README; those who want them will find them.

@dmahurin (Contributor)

dmahurin commented May 17, 2023

Hi @seemanne, the purpose is not to replace bash. The purpose is to widen the development community. Like it or not, Python is a very common language in AI development.

I do not think having supported Python code would put any burden on C++ developers. Again, see rwkv.cpp and bert.cpp. The Python support in rwkv.cpp, for example, comes in the form of two Python files.

As mentioned, there are five independent Python bindings for llama.cpp. Unifying at least the base Python binding would help focus related Python llama.cpp development.

@dmahurin (Contributor)

dmahurin commented Jun 1, 2023

@abetlen, perhaps you saw that I created pull request #1660 to add the low-level Python bindings from llama-cpp-python.

The PR puts llama_cpp.py and the low-level examples in the examples/ folder.

There was a bit of filtering and some squashing to get a clean history for the low-level commits. For now I excluded the multi-char change, mainly because it created a dependency on another file, util.py (and the change looks more complex than I would expect).

Any comments on the approach of the PR?
