
Allow larger tensor sizes. #626

Merged · 1 commit · Apr 2, 2023

Conversation

Seltsamsel (Contributor)

This should solve #599.
I was able to successfully run 30B/ggml-model-q4_0.bin with -c 6000 (and an extended ctx_size, but that's a different story); I haven't tested many other cases yet. I'd like to hear feedback on whether this is a sensible approach before putting more effort into it.

@prusnak (Collaborator) commented Mar 30, 2023

#599 (comment) says to use int64_t not size_t for element counts and size_t only for memory offsets.
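
As a quick, editor-added illustration of the point above (a minimal sketch, not code from the PR or from #599): subtracting two element counts stored in an unsigned size_t wraps around to a huge value, while a signed int64_t simply yields a negative number and is still wide enough for very large tensors.

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    // Element counts held in size_t: the difference wraps around.
    size_t ne_a = 3, ne_b = 5;
    printf("size_t:  %zu\n", ne_a - ne_b);   // prints 18446744073709551614 on 64-bit

    // Element counts held in int64_t: the difference is simply -2.
    int64_t na = 3, nb = 5;
    printf("int64_t: %lld\n", (long long)(na - nb));
    return 0;
}
```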

@Seltsamsel (Contributor, Author)

> #599 (comment) says to use int64_t not size_t for element counts and size_t only for memory offsets.

Yes, thanks.

@ggerganov:

> Use int64_t instead of size_t because when dealing with computations of number of elements, every now and then one has to subtract them and we don't want to get underflows.

This makes perfect sense to me (although I guess it would be a rather rare case).

> What needs to be done I think is to carefully go through the places where we multiply ne and make the cast to int64_t.

I don't exactly get this. ne itself has to be an int64_t, right? I'd expect ne itself to sometimes get extremely large (not just calculations involving ne) when dealing with tensors with more than $2^{31}$ elements. So

    int ne[GGML_MAX_DIMS]; // number of elements

already has to be an int64_t, and thus so do the function argument types, and thus the iterators. I'd claim I made the minimal replacements possible. Or am I getting something wrong?
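
For context, an editor-added sketch of the change under discussion (the struct name and helper are illustrative, not the actual ggml code): with the per-dimension counts widened to int64_t, both the dimensions themselves and products over them stay in range.

```c
#include <stdint.h>

#define GGML_MAX_DIMS 4  // as in ggml

// Illustrative stand-in for the tensor shape, not the real ggml_tensor:
// ne[] becomes int64_t instead of int.
struct tensor_shape {
    int64_t ne[GGML_MAX_DIMS];   // number of elements per dimension
};

// Total element count, computed entirely in 64-bit arithmetic, so that
// e.g. 6656 * 60 * 6000 = 2396160000 does not wrap.
int64_t nelements(const struct tensor_shape * shape) {
    int64_t n = 1;
    for (int i = 0; i < GGML_MAX_DIMS; ++i) {
        n *= shape->ne[i];
    }
    return n;
}
```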

@howard0su (Collaborator)

Why do you want to change the type of ne? An int is easily large enough to hold 6000. I suggest making only the minimal, necessary change.

@Seltsamsel (Contributor, Author)

> Why do you want to change the type of ne? An int is easily large enough to hold 6000.

Because ne is not 6000 for -c 6000, but 6000 * n_layer * n_embd = 2396160000 > $2^{31}$, and thus cannot be held by an int. See

llama.cpp/llama.cpp, lines 259–260 at commit 1d08882:

    const int n_mem = n_layer*n_ctx;
    const int n_elements = n_embd*n_mem;

> I suggest making only the minimal, necessary change.

Agreed, and that is at least what I hope I did.
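
To make the arithmetic concrete, an editor-added sketch (assuming n_layer = 60 and n_embd = 6656, values chosen so the product reproduces the 2396160000 figure quoted above):

```c
#include <limits.h>
#include <stdint.h>
#include <stdio.h>

int main(void) {
    // Assumed 30B hyperparameters; they reproduce the 2396160000 figure
    // from the comment above.
    const int64_t n_ctx   = 6000;   // -c 6000
    const int64_t n_layer = 60;
    const int64_t n_embd  = 6656;

    const int64_t n_mem      = n_layer * n_ctx;   //     360000
    const int64_t n_elements = n_embd  * n_mem;   // 2396160000

    // The old `int` fields cannot represent this count.
    printf("n_elements = %lld, INT_MAX = %d, fits in int: %s\n",
           (long long) n_elements, INT_MAX,
           n_elements <= INT_MAX ? "yes" : "no");
    return 0;
}
```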

@Seltsamsel changed the title from "[WIP] Allow larger tensor sizes." to "Allow larger tensor sizes." on Mar 31, 2023
@prusnak (Collaborator) commented Apr 1, 2023

Can you please rebase on top of the current master and resolve the conflicts? The PR cannot be merged as of now because of the merge conflicts.

PS: Because of the nature of the PR, it might be easier to start from scratch and force-push instead of trying to merge/rebase with master.

@Seltsamsel (Contributor, Author)

> Can you please rebase on top of the current master and resolve the conflicts? The PR cannot be merged as of now because of the merge conflicts.
>
> PS: Because of the nature of the PR, it might be easier to start from scratch and force-push instead of trying to merge/rebase with master.

It was just a minor conflict in an fprintf in llama.cpp, and only due to code deletion, so the merge was unproblematic. Rebased and tested; it seems to work.

@prusnak (Collaborator) left a review comment

I confirm that before the change I saw a segfault with the 7B model and -c 20000; this change fixes the issue, and the 7B model now runs even with -c 40000.

@ggerganov (Owner) commented Apr 2, 2023

> ne itself has to be an int64_t, right? [...] I'd claim I made the minimal replacements possible.

I guess I was wrong; it seems we really do need to change ne to int64_t as you suggest.
The alternative would be to make the KV cache tensor 2-dimensional instead of 1-dimensional, but I'm not sure that is better long term, and it would probably require a similar amount of changes in ggml.c.

I haven't tested this, but I'm merging it (yolo), so keep an eye out for any regressions and, if necessary, revert.
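
For reference, an editor-added sketch of the 2-D alternative mentioned above (illustrative only; the function and variable names are placeholders, and this is not the route the PR took): splitting the KV cache over two dimensions keeps each individual dimension below $2^{31}$, at the cost of adjusting the code that indexes into the cache.

```c
#include "ggml.h"

// Placeholder names, not llama.cpp's actual fields.
void alloc_kv_cache_sketch(struct ggml_context * ctx,
                           int64_t n_embd, int64_t n_layer, int64_t n_ctx) {
    // 1-D layout: a single dimension carries the whole element count,
    // which is why ne[] had to become int64_t.
    struct ggml_tensor * k_1d =
        ggml_new_tensor_1d(ctx, GGML_TYPE_F32, n_embd*n_layer*n_ctx);

    // 2-D alternative: each dimension individually stays below 2^31;
    // only their product is large.
    struct ggml_tensor * k_2d =
        ggml_new_tensor_2d(ctx, GGML_TYPE_F32, n_embd, n_layer*n_ctx);

    (void) k_1d;
    (void) k_2d;
}
```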
