This repository has been archived by the owner on Jun 24, 2024. It is now read-only.

WIP: Bloom Inference #85

Closed
wants to merge 24 commits

Conversation

hhamud
Contributor

@hhamud hhamud commented Mar 29, 2023

Completes #45

Will refactor and remove most of the duplicate code before merging.

  • Adds a build flag for Apple Silicon
  • Adds ggml_alibi, ggml_compute_forward_alibi_f32, ggml_compute_forward_alibi_f16, and ggml_compute_forward_alibi to ggml.c
  • Adds ggml_view_2d, ggml_alibi, and ggml_gelu to lib.rs in the ggml-raw crate (a binding sketch follows this list)
  • Adds a BLOOM mode flag to the CLI
  • Splits the original library into a commons folder and moves the models into a models folder
  • Moves some code into functions in main.rs in llama-cli
  • Updates ggml.h
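
A minimal sketch, not taken from the PR itself, of what a raw ggml-raw binding for ggml_alibi might look like in lib.rs; the authoritative declaration is whatever ggml.h exposes, so the parameter list here (context, input tensor, n_past, n_head) and the opaque type definitions are assumptions for illustration:

```rust
// Hypothetical sketch of a raw FFI binding in ggml-raw's lib.rs (names and
// signature assumed; the authoritative declaration lives in ggml.h).
#![allow(non_camel_case_types)]

use std::os::raw::c_int;

// Opaque handles for the C-side ggml types.
#[repr(C)]
pub struct ggml_context {
    _private: [u8; 0],
}

#[repr(C)]
pub struct ggml_tensor {
    _private: [u8; 0],
}

extern "C" {
    /// Applies ALiBi (attention with linear biases) offsets to tensor `a`,
    /// returning a new tensor. Assumed parameters: the ggml context, the input
    /// tensor, the number of past tokens, and the number of attention heads.
    pub fn ggml_alibi(
        ctx: *mut ggml_context,
        a: *mut ggml_tensor,
        n_past: c_int,
        n_head: c_int,
    ) -> *mut ggml_tensor;
}
```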

@hhamud hhamud marked this pull request as draft March 29, 2023 03:06
@philpax philpax mentioned this pull request Mar 30, 2023
@hhamud hhamud marked this pull request as ready for review April 1, 2023 17:39
@hhamud
Contributor Author

hhamud commented Apr 1, 2023

Hitting a bit of an issue here when trying to read in the converted BLOOM models on HF:

[2023-04-01T16:59:13Z INFO  llama_cli] Warning: Bad token in vocab at index 0
thread 'main' panicked at 'Could not load model: ReadExactFailed { source: Error { kind: UnexpectedEof, message: "failed to fill whole buffer" }, bytes: 4 }', llama-cli/src/main.rs:267:10
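
For reference (not part of the original comment), a hypothetical illustration of how a panic like this arises: `bytes: 4` suggests a fixed 4-byte integer field being read with `read_exact`; if the converted file ends early, or the loader is out of sync with the file layout, `read_exact` fails with `UnexpectedEof` ("failed to fill whole buffer"), which the loader then surfaces as `ReadExactFailed`:

```rust
use std::io::{self, Read};

/// Hypothetical helper mirroring the kind of read that fails above: a 4-byte
/// little-endian integer field read with `read_exact`. If fewer than 4 bytes
/// remain, `read_exact` returns `UnexpectedEof` ("failed to fill whole buffer").
fn read_u32_le(reader: &mut impl Read) -> io::Result<u32> {
    let mut buf = [0u8; 4];
    reader.read_exact(&mut buf)?;
    Ok(u32::from_le_bytes(buf))
}

fn main() {
    // Only 2 bytes available, but we try to read 4: this reproduces the
    // UnexpectedEof error that the loader reports as `ReadExactFailed`.
    let mut truncated: &[u8] = &[0x01, 0x02];
    match read_u32_le(&mut truncated) {
        Ok(v) => println!("read {v}"),
        Err(e) => println!("error: {e:?}"), // e.kind() == UnexpectedEof
    }
}
```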

@philpax philpax mentioned this pull request Apr 6, 2023
@hhamud hhamud mentioned this pull request Apr 6, 2023

/// The weights for the BLOOM model. All the mutable state is split into a
/// separate struct `InferenceSession`.
pub struct BLOOM {
Collaborator

I'd go with Bloom:

In UpperCamelCase, acronyms and contractions of compound words count as one word: use Uuid rather than UUID, Usize rather than USize or Stdin rather than StdIn. In snake_case, acronyms and contractions are lower-cased: is_xid_start.

https://rust-lang.github.io/api-guidelines/naming.html
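
Applied to the snippet above, the suggested rename would look like this (a minimal sketch; the struct body is elided and the comment inside it is only a placeholder):

```rust
/// The weights for the BLOOM model. All the mutable state is split into a
/// separate struct `InferenceSession`.
pub struct Bloom {
    // hyperparameters, token embeddings, per-layer tensors, ...
}
```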

@philpax
Collaborator

philpax commented Apr 7, 2023

Sorry for taking so long to look at this - it looks good! Once @setzer22 gives his OK (just so that he's happy with the overall structure of things), I'll get it ready for the PR merge chain. Don't update the PR yet - there's some other changes we'll likely need to land first, and it'll be easier for you to do them all at once.

@hhamud
Contributor Author

hhamud commented Apr 7, 2023

Sorry for taking so long to look at this - it looks good! Once @setzer22 gives his OK (just so that he's happy with the overall structure of things), I'll get it ready for the PR merge chain. Don't update the PR yet - there's some other changes we'll likely need to land first, and it'll be easier for you to do them all at once.

Sure. I decided against further restructuring the model load function to cut down on code duplication, since there are multiple PRs either changing it or using it.

I will re-open #74 and refactor it after completing this one (#85).

@iacore
Contributor

iacore commented Apr 8, 2023

I hope this could be in another repo. Having the BLOOM model in llama-rs is strange.

@philpax
Collaborator

philpax commented Apr 8, 2023

Yes - the reason we're keeping it here for now is that the two architectures share a lot of commonalities, and the base LLaMA library keeps changing. We will probably do one or more of the following over time:

  1. Put BLOOM support under an optional feature
  2. Rename the library to indicate its support for more than one kind of GPT-like model
  3. Work out how to share the implementation details, then split them all out into their own crates

@philpax philpax added this to the 0.1 milestone Apr 10, 2023
@philpax philpax mentioned this pull request Apr 13, 2023
@danforbes
Contributor

What's the status of this? Looks like it's not very up-to-date. Is there anything I can do to help?

@hhamud
Contributor Author

hhamud commented Apr 13, 2023

What's the status of this? Looks like it's not very up-to-date. Is there anything I can do to help?

It's on pause until most of the major changes have been merged in; see the discussion.

@danforbes danforbes mentioned this pull request Apr 13, 2023
@philpax
Collaborator

philpax commented Apr 13, 2023

There's also a chance that we investigate #137 before we tackle this again, just to reduce the rework if we go down that road, but I'm not sure yet. (Would appreciate your thoughts on it, btw!)

@setzer22
Collaborator

setzer22 commented Apr 14, 2023

There's also a chance that we investigate #137 before we tackle this again, just to reduce the rework if we go down that road, but I'm not sure yet. (Would appreciate your thoughts on it, btw!)

I would not go that far. Making our own computation graphs is a significant undertaking, and support for Bloom has been here for a while now. I would prioritize merging this before making any other big refactors.

In my experience, abstractions always come out better when you don't design them around a single use case. If we build the computation graph API and only make sure it works for LLaMA, chances are we'll have to rework it later anyway when adding support for other models. It would be better to have multiple models in first, so that the abstraction we come up with is more solid.

@danforbes danforbes mentioned this pull request Apr 16, 2023
@philpax philpax mentioned this pull request Apr 20, 2023
@hhamud
Contributor Author

hhamud commented Apr 20, 2023

Closing this PR, as #141 has integrated the changes from this PR (#85).

@hhamud hhamud closed this Apr 20, 2023
@danforbes danforbes mentioned this pull request Apr 30, 2023