
Allow larger tensor sizes. #626

Merged · 1 commit · Apr 2, 2023

Conversation

Seltsamsel (Contributor)

This should solve #599.
I was able to successfully run 30B/ggml-model-q4_0.bin with -c 6000 (and an extended ctx_size, but that's a different story); I haven't tested many other cases yet. I'd like to hear feedback on whether this is a sensible approach before putting more effort into it.

@prusnak (Collaborator) commented Mar 30, 2023

#599 (comment) says to use int64_t not size_t for element counts and size_t only for memory offsets.
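
As a quick, editor-added illustration of the point above (a minimal sketch, not code from the PR or from #599): subtracting two element counts stored in an unsigned size_t wraps around to a huge value, while a signed int64_t simply yields a negative number and is still wide enough for very large tensors.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    // Element counts held in size_t: the difference wraps around.
    size_t ne_a = 3, ne_b = 5;
    printf("size_t:  %zu\n", ne_a - ne_b);   // prints 18446744073709551614 on 64-bit

    // Element counts held in int64_t: the difference is simply -2.
    int64_t na = 3, nb = 5;
    printf("int64_t: %lld\n", (long long)(na - nb));
    return 0;
}
```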

@Seltsamsel (Contributor, Author)

> #599 (comment) says to use int64_t not size_t for element counts and size_t only for memory offsets.

Yes, thanks.

@ggerganov:

> Use int64_t instead of size_t because when dealing with computations of number of elements, every now and then one has to subtract them and we don't want to get underflows.

This makes perfect sense to me (although I guess it would be a rather rare case).

> What needs to be done I think is to carefully go through the places where we multiply ne and make the cast to int64_t.

I don't exactly get this. ne itself has to be an int64_t, right? I'd expect ne itself to sometimes get extremely large (not just calculations involving ne) when dealing with tensors with more than $2^{31}$ elements. So

    int ne[GGML_MAX_DIMS]; // number of elements

already has to be an int64_t, and thus so do the function argument types, and thus the iterators. I'd claim I made the minimal replacements possible. Or am I getting something wrong?
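
For context, an editor-added sketch of the change under discussion (the struct name and helper are illustrative, not the actual ggml code): with the per-dimension counts widened to int64_t, both the dimensions themselves and products over them stay in range.

```c
#include <stdint.h>

#define GGML_MAX_DIMS 4  // as in ggml

// Illustrative stand-in for the tensor shape, not the real ggml_tensor:
// ne[] becomes int64_t instead of int.
struct tensor_shape {
    int64_t ne[GGML_MAX_DIMS];   // number of elements per dimension
};

// Total element count, computed entirely in 64-bit arithmetic, so that
// e.g. 6656 * 60 * 6000 = 2396160000 does not wrap.
int64_t nelements(const struct tensor_shape * shape) {
    int64_t n = 1;
    for (int i = 0; i < GGML_MAX_DIMS; ++i) {
        n *= shape->ne[i];
    }
    return n;
}
```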

@howard0su (Collaborator)

Why do you want to change the type of ne? An int is easily large enough to hold 6000. I suggest making only the minimal, necessary change.

@Seltsamsel (Contributor, Author)

> Why do you want to change the type of ne? An int is easily large enough to hold 6000.

Because ne is not 6000 for -c 6000, but 6000 * n_layer * n_embd = 2396160000 > $2^{31}$, and thus cannot be held by an int. See

llama.cpp/llama.cpp, lines 259–260 at commit 1d08882:

    const int n_mem = n_layer*n_ctx;
    const int n_elements = n_embd*n_mem;

> I suggest making only the minimal, necessary change.

Agreed, and that is at least what I hope I did.
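
To make the arithmetic concrete, an editor-added sketch (assuming n_layer = 60 and n_embd = 6656, values chosen so the product reproduces the 2396160000 figure quoted above):

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    // Assumed 30B hyperparameters; they reproduce the 2396160000 figure
    // from the comment above.
    const int64_t n_ctx   = 6000;   // -c 6000
    const int64_t n_layer = 60;
    const int64_t n_embd  = 6656;

    const int64_t n_mem      = n_layer * n_ctx;   //     360000
    const int64_t n_elements = n_embd  * n_mem;   // 2396160000

    // The old `int` fields cannot represent this count.
    printf("n_elements = %lld, INT_MAX = %d, fits in int: %s\n",
           (long long) n_elements, INT_MAX,
           n_elements <= INT_MAX ? "yes" : "no");
    return 0;
}
```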

@Seltsamsel changed the title from "[WIP] Allow larger tensor sizes." to "Allow larger tensor sizes." on Mar 31, 2023
@prusnak (Collaborator) commented Apr 1, 2023

Can you please rebase on top of the current master and resolve the conflicts? The PR cannot be merged as of now because of the merge conflicts.

PS: Because of the nature of the PR, it might be easier to start from scratch and force-push instead of trying to merge/rebase with master.

@Seltsamsel (Contributor, Author)

> Can you please rebase on top of the current master and resolve the conflicts? The PR cannot be merged as of now because of the merge conflicts.
>
> PS: Because of the nature of the PR, it might be easier to start from scratch and force-push instead of trying to merge/rebase with master.

It was just a minor conflict in an fprintf in llama.cpp, and only due to code deletion, so the merge was unproblematic. Rebased and tested; it seems to work.

@prusnak (Collaborator) left a review comment

I confirm that before the change I saw a segfault with the 7B model and -c 20000; this change fixes the issue, and the 7B model now runs even with -c 40000.

@ggerganov (Owner) commented Apr 2, 2023

> ne itself has to be an int64_t, right? [...] I'd claim I made the minimal replacements possible.

I guess I was wrong; it seems we really do need to change ne to int64_t as you suggest.
The alternative would be to make the KV cache tensor 2-dimensional instead of 1-dimensional, but I'm not sure that is better long term, and it would probably require a similar amount of changes in ggml.c.

I haven't tested this, but I'm merging it (yolo), so keep an eye out for any regressions and, if necessary, revert.
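
For reference, an editor-added sketch of the 2-D alternative mentioned above (illustrative only; the function and variable names are placeholders, and this is not the route the PR took): splitting the KV cache over two dimensions keeps each individual dimension below $2^{31}$, at the cost of adjusting the code that indexes into the cache.

```c
#include "ggml.h"

// Placeholder names, not llama.cpp's actual fields.
void alloc_kv_cache_sketch(struct ggml_context * ctx,
                           int64_t n_embd, int64_t n_layer, int64_t n_ctx) {
    // 1-D layout: a single dimension carries the whole element count,
    // which is why ne[] had to become int64_t.
    struct ggml_tensor * k_1d =
        ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_embd*n_layer*n_ctx);

    // 2-D alternative: each dimension individually stays below 2^31;
    // only their product is large.
    struct ggml_tensor * k_2d =
        ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_layer*n_ctx);

    (void) k_1d;
    (void) k_2d;
}
```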
