Added ggml as submodule #613
My experience with submodules, to put it mildly, is "far from optimal" because they make development much more tedious, especially for rapidly evolving projects. Imagine needing to change the API; you must modify it in the submodule, then update submodules in all projects using the library, and finally adjust the API usage in all these projects. The worst aspect of this is that it obscures the changes made by splitting them across multiple commits. I suggest using a monorepo where ggml, whisper.cpp, and llama.cpp coexist. This approach makes it much easier to implement changes in a single commit (e.g., a lib API change) and greatly improves the process of writing CI scripts that check everything together in one step. If necessary, individual projects can still be exported from the monorepo to separate repositories, but this is honestly rarely needed. @ggerganov have you considered switching to monorepo? |
|
My fear with a monorepo is that it promotes ggml as just the plumbing of whisper.cpp and llama.cpp, rather than a tensor framework in its own right, with wider applicability. I also guess this is a matter of personal opinion, but I believe having this separation of repos is a good thing: it encourages classic software development principles like good API design and separation of concerns. However, I'll go along with whatever @ggerganov and the community deem best. |
Right. These are good arguments to consider too. If the goal is to decouple ggml from whisper.cpp and llama.cpp and include it in these projects via release process, this makes sense. |
|
My two cents is that when you have a closely-related dependency tree like this, it makes sense to just put them in the same repo. You can always put them in separate folders with separate readme's if you want them to "feel separate" while also getting the benefits that @prusnak explained. Dependency management is hard. |
I feel git submodules offer the ideal solution here: they're separate repos, but still embedded within a project. Now that there are two projects based on ggml, my concern would be the risk/cost associated with having two or three separate versions of it being maintained at once. Moreover, as ggml is a logical unit, my feeling is that it should be maintained separately, and changes to it should be considered with respect to it as an independent project, rather than to the immediate aims of the downstream project in question. Dependency management is hard, but traditional versioning and good practice work well, and similar projects manage just fine with such an approach. |
|
Hey all, I am still thinking about how to organize the projects to make them easier to work with. |
|
@ggerganov personally been using submodules since forever, but i recently came across this post https://diziet.dreamwidth.org/14666.html |
|
Just to put in my two cents - I'm in favour of using submodules for this.
I would argue this is a feature. It forces you to think about the changes to the API and how they affect other projects (not just whisper and llama) and to think of ggml (for example) as a separate project. In my experience this is a Good Thing even if slightly inconvenient at times. A monorepo with ggml, whisper.cpp, llama.cpp and then whatever else comes up might be easier for whisper.cpp and llama.cpp, but it's less usable by other projects. If I'm just interested in embedding ggml as a submodule in my own work, I don't want to carry around the whisper.cpp and llama.cpp stuff (+ any new things), and then have to sift through all the changes for those projects whenever there are updates. Keeping them separate also makes documentation easier to follow. Small and simple wins IMHO. |
|
Btw, these are nice write-ups about |
|
I've had the best experience with https://github.com/ingydotnet/git-subrepo. We have an opensource game project and we use git subrepo to pull in addons https://github.com/V-Sekai/v-sekai-game/tree/main/addons. We also do this with c++ godot engine. |
|
If you're open to sticking with The downside is that it causes your CMake configuration time to go up, albeit slightly in this case, and potentially only initially, since the downloaded sources are cached. |
Currently, ggml.c and ggml.h are maintained as separate files in both whisper.cpp and llama.cpp. This pull request adds the ggml project as a submodule to whisper.cpp, allowing ggml development to occur in one location only and minimising the risk of diverging branches. I propose taking the same approach with llama.cpp as well, and will follow this up separately if this pull request is approved.
Happy to tweak or discuss if needed.
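For anyone unfamiliar with the workflow proposed here, the following is a minimal sketch of what adding a library as a submodule looks like. It is not the exact set of commands used in this pull request. The `lib` and `app` repositories are hypothetical local stand-ins for ggml and whisper.cpp, the `ggml` submodule path is an assumption, and the `protocol.file.allow` override is only there because the demo uses a local path instead of a remote URL.

```shell
set -eu

# Work in a throwaway directory so nothing outside it is touched.
work="$(mktemp -d)"
cd "$work"

# Hypothetical stand-in for the upstream ggml repository.
git init -q lib
git -C lib -c user.name=demo -c user.email=demo@example.com \
    commit -q --allow-empty -m "initial library commit"

# Hypothetical stand-in for a downstream project such as whisper.cpp.
git init -q app
cd app

# Register the library as a submodule at the (assumed) path ggml/.
# protocol.file.allow is needed only because the URL is a local path.
git -c protocol.file.allow=always submodule add "$work/lib" ggml

# The superproject records the submodule URL in .gitmodules and pins
# one specific library commit; both are committed together.
git -c user.name=demo -c user.email=demo@example.com \
    commit -q -m "Add ggml as submodule"

# Show the pinned commit and the submodule path.
git submodule status
```

In a real checkout the URL would point at the ggml repository rather than a local directory, and anyone cloning the downstream project would need `git clone --recurse-submodules` or `git submodule update --init` to fetch the pinned commit.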