Commit 4870e45
Fix memory allocation issues and seg faults
ggerganov committed Mar 23, 2023
1 parent 483bab2 commit 4870e45
Showing 1 changed file with 16 additions and 18 deletions.
llama.cpp (34 changes: 16 additions & 18 deletions)

@@ -102,6 +102,9 @@ struct llama_context {
     // decode output (2-dimensional array: [n_tokens][n_vocab])
     std::vector<float> logits;
     bool logits_all = false;
+
+    // work buffer for transformer evaluation
+    std::vector<uint8_t> buf_eval;
 };

 struct llama_context_params llama_context_default_params() {
@@ -627,27 +630,19 @@ static bool llama_eval_internal(
     const int n_rot = hparams.n_embd/hparams.n_head;

     auto & mem_per_token = lctx.mem_per_token;
+    auto & buf_eval      = lctx.buf_eval;

-    // TODO: fix this hardcoded size
-    static size_t buf_size = 512u*1024*1024;
-    static void * buf = malloc(buf_size);
+    if (mem_per_token*(n_past + N + 16) > buf_eval.size()) {
+        const size_t buf_size_new = 1.618*buf_eval.size();

-    if (mem_per_token > 0 && mem_per_token*N > buf_size) {
-        const size_t buf_size_new = 1.3*(mem_per_token*N); // add 30% to account for ggml object overhead
-        //fprintf(stderr, "\n%s: reallocating buffer from %zu to %zu bytes\n", __func__, buf_size, buf_size_new);
+        //fprintf(stderr, "\n%s: reallocating buffer from %zu to %zu bytes\n", __func__, buf_eval.size(), buf_size_new);

-        // reallocate
-        buf_size = buf_size_new;
-        buf = realloc(buf, buf_size);
-        if (buf == nullptr) {
-            fprintf(stderr, "%s: failed to allocate %zu bytes\n", __func__, buf_size);
-            return false;
-        }
+        buf_eval.resize(buf_size_new);
     }

     struct ggml_init_params params = {
-        /*.mem_size =*/ buf_size,
-        /*.mem_buffer =*/ buf,
+        /*.mem_size =*/ buf_eval.size(),
+        /*.mem_buffer =*/ buf_eval.data(),
     };

     struct ggml_context * ctx0 = ggml_init(params);
Expand Down Expand Up @@ -832,10 +827,11 @@ static bool llama_eval_internal(
memcpy(logits_out.data(), (float *) ggml_get_data(inpL) + (n_vocab*(N-1)), sizeof(float)*n_vocab);
}

if (mem_per_token == 0) {
mem_per_token = ggml_used_mem(ctx0)/N;
if (N == 1) {
mem_per_token = ggml_used_mem(ctx0)/(n_past + N);
}
//fprintf(stderr, "used_mem = %zu\n", ggml_used_mem(ctx0));

//fprintf(stderr, "\nused_mem = %zu, %zu MB\n", ggml_used_mem(ctx0), ggml_used_mem(ctx0)/1024/1024);

ggml_free(ctx0);

@@ -1416,6 +1412,8 @@ struct llama_context * llama_init_from_file(
         return nullptr;
     }

+    ctx->buf_eval.resize(512u*1024u*1024u);
+
     return ctx;
 }
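In short, the commit replaces the old function-local static buffer (a 512 MiB malloc shared by every call, grown with realloc and never freed) with a per-context std::vector<uint8_t> that is reserved up front in llama_init_from_file and grown by a golden-ratio factor whenever the projected need, mem_per_token*(n_past + N + 16), outruns it. Below is a minimal standalone sketch of that policy; the eval_arena type and maybe_grow helper are illustrative names, not part of llama.cpp:

#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <vector>

// Per-context work arena mirroring the buf_eval policy in this commit:
// reserve a fixed block up front, then grow by ~1.618x when the projected
// need no longer fits. Ownership lives in the context (RAII) instead of a
// static malloc'd block shared by every llama_context.
struct eval_arena {
    std::vector<uint8_t> buf;

    explicit eval_arena(size_t initial) : buf(initial) {}

    // 'needed' plays the role of mem_per_token*(n_past + N + 16).
    // Note: like the diff, this grows by a single 1.618x step, which does
    // not guarantee buf.size() >= needed afterwards -- see the comments
    // below for cases where the pool still overflows.
    void maybe_grow(size_t needed) {
        if (needed > buf.size()) {
            buf.resize((size_t)(1.618*buf.size()));
        }
    }
};

int main() {
    eval_arena arena(512u*1024u*1024u);   // same starting size as the diff
    arena.maybe_grow(600u*1024u*1024u);   // force one golden-ratio growth
    std::printf("arena size: %zu bytes\n", arena.buf.size());
}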


6 comments on commit 4870e45

@mikewii
Not sure how to trace it correctly, but it still segfaults for me when using an Alpaca model.

Thread 1 "main" received signal SIGSEGV, Segmentation fault.
0x000055555556cc09 in ggml_element_size ()
(gdb) backtrace
#0  0x000055555556cc09 in ggml_element_size ()
#1  0x0000555555580593 in llama_eval_internal(llama_context&, int const*, int, int, int) ()
#2  0x0000555555580c09 in llama_eval ()
#3  0x000055555555b9d5 in main ()

@mikewii

OK, found out that the tensor type becomes invalid at that point:

(gdb) print tensor.type
$3 = 3206835900

@mikewii

Also found out that it always crashes here when n_past reaches 513:

#1  in llama_eval_internal (lctx=..., tokens=<optimized out>, n_tokens=1, 
    n_past=n_past@entry=513, n_threads=<optimized out>)
    at llama.cpp/llama.cpp:681
681                     struct ggml_tensor * v = ggml_view_1d(ctx0, model.memory_v, N*n_embd, (ggml_element_size(model.memory_v)*n_embd)*(il*n_ctx + n_past));
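The arithmetic is consistent with that: memory_v holds n_layer*n_ctx*n_embd elements, while the view offset on line 681 scales with il*n_ctx + n_past, so once n_past exceeds n_ctx the slice requested for the last layer starts past the end of the tensor. A back-of-the-envelope check, assuming the default n_ctx of 512 and 7B hyperparameters (n_embd = 4096, n_layer = 32 are assumptions, not from the trace):

#include <cstdio>

int main() {
    // Assumed 7B hyperparameters and the default context size.
    const long n_embd  = 4096;
    const long n_layer = 32;
    const long n_ctx   = 512;
    const long n_past  = 513;  // value at which the crash is observed

    const long cache_end  = n_layer*n_ctx*n_embd;                  // total memory_v elements
    const long view_start = ((n_layer - 1)*n_ctx + n_past)*n_embd; // last layer's view offset

    std::printf("memory_v ends at element %ld, view starts at %ld -> %s\n",
                cache_end, view_start,
                view_start >= cache_end ? "out of bounds" : "in bounds");
}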

@nazthelizard122

Still segfaults:
ggml_new_tensor_impl: not enough space in the context's memory pool (needed 537102496, available 536870912)
zsh: segmentation fault ./main -m ./models/65B/ggml-model-q4_0.bin -t 16 -n 256 --repeat_penalty 1.0
It was fixed, but I guess it reappeared?
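The numbers in that error line up with the diff: 536870912 bytes is exactly the initial 512u*1024u*1024u reserve from llama_init_from_file, and the failing graph needed roughly 226 KiB more. A quick check of the arithmetic (values copied from the message above):

#include <cstddef>
#include <cstdio>

int main() {
    const size_t available = 512u*1024u*1024u; // initial buf_eval reserve in the diff
    const size_t needed    = 537102496u;       // from the error message above

    std::printf("available = %zu bytes (512 MiB)\n", available);
    std::printf("shortfall = %zu bytes (~%.0f KiB)\n",
                needed - available, (needed - available)/1024.0);
}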

@rabidcopy (Contributor) commented on Mar 24, 2023

Never had any segfaults, but after this I do. :(

ggml_new_tensor_impl: not enough space in the context's memory pool (needed 537029792, available 536870912)

Edit: It always starts off fine for a bit, then after a handful of responses it happens.

@slaren (Collaborator) commented on Mar 24, 2023

This seems to have broken --perplexity as well; it runs out of memory on the first batch.
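That failure mode follows from the diff: mem_per_token is now only measured when N == 1, and a batch workload like --perplexity never evaluates single tokens, so the growth check stays inert and buf_eval never grows past its initial 512 MiB. A sketch of the inert check under such a workload (llama.cpp internals reduced to plain variables):

#include <cstddef>
#include <cstdio>

int main() {
    size_t buf_eval_size = 512u*1024u*1024u; // initial reserve
    size_t mem_per_token = 0;                // only assigned when N == 1

    // --perplexity style workload: repeated full-context batches.
    for (int chunk = 0; chunk < 4; ++chunk) {
        const int N = 512, n_past = 0;

        // Growth check from the diff; with mem_per_token == 0 the left-hand
        // side is always 0, so the buffer never grows...
        if (mem_per_token*(size_t)(n_past + N + 16) > buf_eval_size) {
            buf_eval_size = (size_t)(1.618*buf_eval_size);
        }
        // ...while the graph for a full 512-token batch can need well over
        // 512 MiB. mem_per_token would only be updated here if N were 1.
    }
    std::printf("buf_eval = %zu bytes, mem_per_token = %zu\n",
                buf_eval_size, mem_per_token);
}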
