Llama.cpp example for cpp backend #2904

mreso · 2024-01-25T00:26:27Z

Description

This PR replaces #2527, which adds a llama.cpp example to the cpp backend. It contains all adjustments required by the removal of the TorchScriptBackend.

Fixes #(issue)

Type of change

Please delete options that are not relevant.

New feature (non-breaking change which adds functionality)
This change requires a documentation update

Feature/Issue validation/testing

Please describe the Unit or Integration tests that you ran to verify your changes and relevant result summary. Provide instructions so it can be reproduced.
Please also list any relevant details for your test configuration.

Test A
Logs for Test A

torchserve_cpp build is complete. To run unit test:   ./_build/test/torchserve_cpp_test
Running main() from /home/ubuntu/serve/cpp/_build/_deps/googletest-src/googletest/src/gtest_main.cc
[==========] Running 46 tests from 11 test suites.
[----------] Global test environment set-up.
[----------] 1 test from BackendIntegTest
[ RUN      ] BackendIntegTest.TestOTFProtocolAndHandler
I0126 05:37:49.989585 855970 log_metric.cc:92] [METRICS]HandlerTime.Milliseconds:96.200719|#ModelName:mnist_scripted_v2,Level:Model|#hostname:ip-172-31-55-226,1706247469,reqi
I0126 05:37:49.989680 855970 log_metric.cc:92] [METRICS]PredictionTime.Milliseconds:96.200719|#ModelName:mnist_scripted_v2,Level:Model|#hostname:ip-172-31-55-226,1706247469,reqi
[       OK ] BackendIntegTest.TestOTFProtocolAndHandler (122 ms)
[----------] 1 test from BackendIntegTest (122 ms total)

The bird flew away and the little girl was left alone in the garden. She was very sad and cried all the way home.
<s>

I0126 05:37:51.754795 855970 log_metric.cc:92] [METRICS]HandlerTime.Milliseconds:1758.094646|#ModelName:babyllama,Level:Model|#hostname:ip-172-31-55-226,1706247471,llm_ts_0,llm_ts_1
I0126 05:37:51.754818 855970 log_metric.cc:92] [METRICS]PredictionTime.Milliseconds:1758.094646|#ModelName:babyllama,Level:Model|#hostname:ip-172-31-55-226,1706247471,llm_ts_0,llm_ts_1
[       OK ] ModelPredictTest.TestLoadPredictBabyLlamaHandler (1764 ms)
[ RUN      ] ModelPredictTest.TestLoadPredictLlmHandler
llama_model_loader: loaded meta data with 19 key-value pairs and 291 tensors from test/resources/examples/llamacpp/llamacpp_handler/llama-2-7b-chat.Q5_0.gguf (version GGUF V2)
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = llama
llama_model_loader: - kv   1:                               general.name str              = LLaMA v2
llama_model_loader: - kv   2:                       llama.context_length u32              = 4096
llama_model_loader: - kv   3:                     llama.embedding_length u32              = 4096
llama_model_loader: - kv   4:                          llama.block_count u32              = 32
llama_model_loader: - kv   5:                  llama.feed_forward_length u32              = 11008
llama_model_loader: - kv   6:                 llama.rope.dimension_count u32              = 128
llama_model_loader: - kv   7:                 llama.attention.head_count u32              = 32
llama_model_loader: - kv   8:              llama.attention.head_count_kv u32              = 32
llama_model_loader: - kv   9:     llama.attention.layer_norm_rms_epsilon f32              = 0.000001
llama_model_loader: - kv  10:                          general.file_type u32              = 8
llama_model_loader: - kv  11:                       tokenizer.ggml.model str              = llama
llama_model_loader: - kv  12:                      tokenizer.ggml.tokens arr[str,32000]   = ["<unk>", "<s>", "</s>", "<0x00>", "<...
llama_model_loader: - kv  13:                      tokenizer.ggml.scores arr[f32,32000]   = [0.000000, 0.000000, 0.000000, 0.0000...
llama_model_loader: - kv  14:                  tokenizer.ggml.token_type arr[i32,32000]   = [2, 3, 3, 6, 6, 6, 6, 6, 6, 6, 6, 6, ...
llama_model_loader: - kv  15:                tokenizer.ggml.bos_token_id u32              = 1
llama_model_loader: - kv  16:                tokenizer.ggml.eos_token_id u32              = 2
llama_model_loader: - kv  17:            tokenizer.ggml.unknown_token_id u32              = 0
llama_model_loader: - kv  18:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:   65 tensors
llama_model_loader: - type q5_0:  225 tensors
llama_model_loader: - type q6_K:    1 tensors
llm_load_vocab: special tokens definition check successful ( 259/32000 ).
llm_load_print_meta: format           = GGUF V2
llm_load_print_meta: arch             = llama
llm_load_print_meta: vocab type       = SPM
llm_load_print_meta: n_vocab          = 32000
llm_load_print_meta: n_merges         = 0
llm_load_print_meta: n_ctx_train      = 4096
llm_load_print_meta: n_embd           = 4096
llm_load_print_meta: n_head           = 32
llm_load_print_meta: n_head_kv        = 32
llm_load_print_meta: n_layer          = 32
llm_load_print_meta: n_rot            = 128
llm_load_print_meta: n_embd_head_k    = 128
llm_load_print_meta: n_embd_head_v    = 128
llm_load_print_meta: n_gqa            = 1
llm_load_print_meta: n_embd_k_gqa     = 4096
llm_load_print_meta: n_embd_v_gqa     = 4096
llm_load_print_meta: f_norm_eps       = 0.0e+00
llm_load_print_meta: f_norm_rms_eps   = 1.0e-06
llm_load_print_meta: f_clamp_kqv      = 0.0e+00
llm_load_print_meta: f_max_alibi_bias = 0.0e+00
llm_load_print_meta: n_ff             = 11008
llm_load_print_meta: n_expert         = 0
llm_load_print_meta: n_expert_used    = 0
llm_load_print_meta: rope scaling     = linear
llm_load_print_meta: freq_base_train  = 10000.0
llm_load_print_meta: freq_scale_train = 1
llm_load_print_meta: n_yarn_orig_ctx  = 4096
llm_load_print_meta: rope_finetuned   = unknown
llm_load_print_meta: model type       = 7B
llm_load_print_meta: model ftype      = Q5_0
llm_load_print_meta: model params     = 6.74 B
llm_load_print_meta: model size       = 4.33 GiB (5.52 BPW)
llm_load_print_meta: general.name     = LLaMA v2
llm_load_print_meta: BOS token        = 1 '<s>'
llm_load_print_meta: EOS token        = 2 '</s>'
llm_load_print_meta: UNK token        = 0 '<unk>'
llm_load_print_meta: LF token         = 13 '<0x0A>'
llm_load_tensors: ggml ctx size =    0.11 MiB
llm_load_tensors: offloading 0 repeating layers to GPU
llm_load_tensors: offloaded 0/33 layers to GPU
llm_load_tensors:        CPU buffer size =  4435.49 MiB
..................................................................................................
llama_new_context_with_model: n_ctx      = 512
llama_new_context_with_model: freq_base  = 10000.0
llama_new_context_with_model: freq_scale = 1
llama_kv_cache_init:        CPU KV buffer size =   256.00 MiB
llama_new_context_with_model: KV self size  =  256.00 MiB, K (f16):  128.00 MiB, V (f16):  128.00 MiB
llama_new_context_with_model:        CPU input buffer size   =     9.01 MiB
llama_new_context_with_model:        CPU compute buffer size =    70.50 MiB
llama_new_context_with_model: graph splits (measure): 1
Context initialized successfully
New Token:  Unterscheidung
New Token:  between
New Token:  the
New Token:  two
New Token:  types
New Token:  of
New Token:  the
New Token: ft
New Token:

New Token:  everybody
New Token:  knows
New Token:  that
New Token:  the
New Token: ft
New Token:  is
New Token:  a
New Token:  crime
New Token:  that
New Token:  involves
New Token:  taking
New Token:  something
New Token:  that
New Token:  belongs
New Token:  to
New Token:  someone
New Token:  else
New Token:  without
New Token:  their
New Token:  permission
New Token: .
New Token:  However
New Token: ,
New Token:  Unterscheidung
New Token:  between
New Token:  the
New Token:  two
New Token:  types
New Token:  of
New Token:  the
New Token: ft
New Token:

New Token:  everybody
New Token:  knows
New Token:  that
New Token:  the
New Token: ft
New Token:  is
New Token:  a
New Token:  crime
New Token:  that
New Token:  involves
New Token:  taking
New Token:  something
New Token:  that
New Token:  belongs
New Token:  to
New Token:  someone
New Token:  else
New Token:  without
New Token:  their
New Token:  permission
New Token: .
New Token:  However
New Token: ,

llama_print_timings:        load time =     404.97 ms
llama_print_timings:      sample time =       2.34 ms /    64 runs   (    0.04 ms per token, 27350.43 tokens per second)
llama_print_timings: prompt eval time =       0.00 ms /     1 tokens (    0.00 ms per token,      inf tokens per second)
llama_print_timings:        eval time =   10358.85 ms /    64 runs   (  161.86 ms per token,     6.18 tokens per second)
llama_print_timings:       total time =   10663.05 ms /    65 tokens
Generated Text Str:  Unterscheidung between the two types of theft
 everybody knows that theft is a crime that involves taking something that belongs to someone else without their permission. However,
Generated Text Str:  Unterscheidung between the two types of theft
 everybody knows that theft is a crime that involves taking something that belongs to someone else without their permission. However,
I0126 05:38:02.453844 855970 log_metric.cc:92] [METRICS]HandlerTime.Milliseconds:10512.526553|#ModelName:llamacpp,Level:Model|#hostname:ip-172-31-55-226,1706247482,llm_ts_0,llm_ts_1
I0126 05:38:02.453866 855970 log_metric.cc:92] [METRICS]PredictionTime.Milliseconds:10512.526553|#ModelName:llamacpp,Level:Model|#hostname:ip-172-31-55-226,1706247482,llm_ts_0,llm_ts_1
[       OK ] ModelPredictTest.TestLoadPredictLlmHandler (10794 ms)
[ RUN      ] ModelPredictTest.TestLoadPredictBaseHandler
I0126 05:38:02.567005 855970 log_metric.cc:92] [METRICS]HandlerTime.Milliseconds:5.536009|#ModelName:mnist_scripted_v2,Level:Model|#hostname:ip-172-31-55-226,1706247482,mnist_ts_0,mnist_ts_1
I0126 05:38:02.567039 855970 log_metric.cc:92] [METRICS]PredictionTime.Milliseconds:5.536009|#ModelName:mnist_scripted_v2,Level:Model|#hostname:ip-172-31-55-226,1706247482,mnist_ts_0,mnist_ts_1
[       OK ] ModelPredictTest.TestLoadPredictBaseHandler (15 ms)
[ RUN      ] ModelPredictTest.TestLoadPredictMnistHandler
I0126 05:38:02.582451 855970 log_metric.cc:92] [METRICS]HandlerTime.Milliseconds:4.719433|#ModelName:mnist_scripted_v2,Level:Model|#hostname:ip-172-31-55-226,1706247482,mnist_ts_0,mnist_ts_1
I0126 05:38:02.582480 855970 log_metric.cc:92] [METRICS]PredictionTime.Milliseconds:4.719433|#ModelName:mnist_scripted_v2,Level:Model|#hostname:ip-172-31-55-226,1706247482,mnist_ts_0,mnist_ts_1
[       OK ] ModelPredictTest.TestLoadPredictMnistHandler (15 ms)
[ RUN      ] ModelPredictTest.TestBackendInitWrongModelDir
E0126 05:38:02.583073 855970 model_archive.cc:53] Failed to init Manifest from: test/resources/examples/mnist/MAR-INF/MANIFEST.json
[       OK ] ModelPredictTest.TestBackendInitWrongModelDir (0 ms)
[ RUN      ] ModelPredictTest.TestBackendInitWrongHandler
[       OK ] ModelPredictTest.TestBackendInitWrongHandler (0 ms)
[ RUN      ] ModelPredictTest.TestLoadModelFailure
E0126 05:38:02.589013 855970 torch_scripted_handler.cc:22] loading the model: mnist_scripted_v2, device id: -1, error: open file failed because of errno 2 on fopen: , file path: test/resources/examples/mnist/wrong_model/mnist_script.pt
[       OK ] ModelPredictTest.TestLoadModelFailure (5 ms)
[ RUN      ] ModelPredictTest.TestLoadPredictMnistHandlerFailure
E0126 05:38:02.602114 855970 base_handler.cc:154] Failed to load tensor for request id: mnist_ts_0, c10 error: PytorchStreamReader failed reading zip archive: failed finding central directory
E0126 05:38:02.606549 855970 base_handler.cc:154] Failed to load tensor for request id: mnist_ts_1, c10 error: PytorchStreamReader failed reading zip archive: failed finding central directory
E0126 05:38:02.608585 855970 base_handler.cc:51] Failed to handle this batch after: Preprocessing
[       OK ] ModelPredictTest.TestLoadPredictMnistHandlerFailure (19 ms)
[----------] 8 tests from ModelPredictTest (12616 ms total)

[----------] 1 test from DLLoaderTest
[ RUN      ] DLLoaderTest.TestGetInstance
[       OK ] DLLoaderTest.TestGetInstance (0 ms)
[----------] 1 test from DLLoaderTest (0 ms total)

[----------] 3 tests from LoggingTest
[ RUN      ] LoggingTest.TestIncorrectLogInitialization
[       OK ] LoggingTest.TestIncorrectLogInitialization (0 ms)
[ RUN      ] LoggingTest.TestJSONConfigLogInitialization
[       OK ] LoggingTest.TestJSONConfigLogInitialization (0 ms)
[ RUN      ] LoggingTest.TestFileLogInitialization
[       OK ] LoggingTest.TestFileLogInitialization (0 ms)
[----------] 3 tests from LoggingTest (0 ms total)

[----------] 6 tests from TSLogMetricTest
[ RUN      ] TSLogMetricTest.TestCounterMetric
[       OK ] TSLogMetricTest.TestCounterMetric (1 ms)
[ RUN      ] TSLogMetricTest.TestGaugeMetric
[       OK ] TSLogMetricTest.TestGaugeMetric (1 ms)
[ RUN      ] TSLogMetricTest.TestHistogramMetric
[       OK ] TSLogMetricTest.TestHistogramMetric (1 ms)
[ RUN      ] TSLogMetricTest.TestTSLogMetricEmitWithRequestId
[       OK ] TSLogMetricTest.TestTSLogMetricEmitWithRequestId (1 ms)
[ RUN      ] TSLogMetricTest.TestTSLogMetricEmitWithoutRequestId
[       OK ] TSLogMetricTest.TestTSLogMetricEmitWithoutRequestId (1 ms)
[ RUN      ] TSLogMetricTest.TestTSLogMetricEmitWithIncorrectDimensionData
[       OK ] TSLogMetricTest.TestTSLogMetricEmitWithIncorrectDimensionData (0 ms)
[----------] 6 tests from TSLogMetricTest (7 ms total)

[----------] 2 tests from TSLogMetricsCacheTest
[ RUN      ] TSLogMetricsCacheTest.TestInitialize
[       OK ] TSLogMetricsCacheTest.TestInitialize (3 ms)
[ RUN      ] TSLogMetricsCacheTest.TestGetMetric
I0126 05:38:02.621423 855970 log_metric.cc:89] [METRICS]GaugeTsMetricExample.Count:1.5|#model_name:model_name,host_name:host_name|#hostname:ip-172-31-55-226,1706247482
[       OK ] TSLogMetricsCacheTest.TestGetMetric (1 ms)
[----------] 2 tests from TSLogMetricsCacheTest (4 ms total)

[----------] 3 tests from RegistryTest
[ RUN      ] RegistryTest.TestValidConfigFile
[       OK ] RegistryTest.TestValidConfigFile (1 ms)
[ RUN      ] RegistryTest.TestInvalidConfigFile
[       OK ] RegistryTest.TestInvalidConfigFile (0 ms)
[ RUN      ] RegistryTest.TestReInitialize
[       OK ] RegistryTest.TestReInitialize (1 ms)
[----------] 3 tests from RegistryTest (3 ms total)

[----------] 3 tests from UnitsTest
[ RUN      ] UnitsTest.TestGetExistingUnitMapping
[       OK ] UnitsTest.TestGetExistingUnitMapping (0 ms)
[ RUN      ] UnitsTest.TestGetNonExistentUnitMapping
[       OK ] UnitsTest.TestGetNonExistentUnitMapping (0 ms)
[ RUN      ] UnitsTest.TestGetEmptyUnitMapping
[       OK ] UnitsTest.TestGetEmptyUnitMapping (0 ms)
[----------] 3 tests from UnitsTest (0 ms total)

[----------] 10 tests from YAMLConfigTest
[ RUN      ] YAMLConfigTest.TestLoadValidConfigFrontendContext
[       OK ] YAMLConfigTest.TestLoadValidConfigFrontendContext (1 ms)
[ RUN      ] YAMLConfigTest.TestLoadValidConfigBackendContext
[       OK ] YAMLConfigTest.TestLoadValidConfigBackendContext (1 ms)
[ RUN      ] YAMLConfigTest.TestLoadMinimalValidConfig
[       OK ] YAMLConfigTest.TestLoadMinimalValidConfig (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateDimension
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateDimension (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithEmptyDimension
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithEmptyDimension (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithUndefinedDimension
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithUndefinedDimension (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateMetricDimension
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateMetricDimension (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithMissingMetricName
E0126 05:38:02.629822 855970 yaml_config.cc:203] Configuration for a metric must consist of "name", "unit" and "dimensions"
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithMissingMetricName (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithEmptyMetricName
E0126 05:38:02.630178 855970 yaml_config.cc:215] Configuration for a metric must consist of a non-empty "name"
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithEmptyMetricName (0 ms)
[ RUN      ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateMetricName
[       OK ] YAMLConfigTest.TestLoadInvalidConfigWithDuplicateMetricName (0 ms)
[----------] 10 tests from YAMLConfigTest (5 ms total)

[----------] 1 test from ManifestTest
[ RUN      ] ManifestTest.TestInitialize
[       OK ] ManifestTest.TestInitialize (0 ms)
[----------] 1 test from ManifestTest (0 ms total)

[----------] Global test environment tear-down
[==========] 46 tests from 11 test suites ran. (12761 ms total)
[  PASSED  ] 46 tests.

Checklist:

Did you have fun?
Have you added tests that prove your fix is effective or that this feature works?
Has code been commented, particularly in hard-to-understand areas?
Have you made corresponding changes to the documentation?

Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Updating llm handler - loadmodel, preprocess, inference methods
Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Fixed infinite lock by adding request ids to the preprocess method
Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Adding test script for finding tokens per second llama-7b-chat and ggml version
Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
GGUF Compatibility
Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Fixing unit tests
Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Fix typo
Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Using folly to read config path
Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Removing debug couts
Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Processing all the items in the batch
Signed-off-by: Shrinath Suresh <shrinath@ideas2it.com>
Adopted llama.cpp api changes

chauhang

@mreso Thanks for the updates. A few comments:

1. The directory structure for the examples is a bit complex: the example handler files are under the cpp/src/examples folder, while the usage prompts etc. are in the top-level examples/cpp/xxx folders. Also, there is only one CMake file for all the examples. This is OK for the initial code merge, but it will require some cleanup to make it easier to use. Are you planning to change that in a subsequent PR?
2. Under the cpp/test/resources/examples/ folder there are multiple .pt files. Can you please clarify what these files are for and how they are generated? It would be good to include the actual source file or script for generating them.

lxning

We probably need to redefine LSP later.

lxning · 2024-01-26T17:40:51Z

cpp/src/examples/llamacpp/llamacpp_handler.cc

+      const int max_tokens_list_size = max_context_size - 4;
+
+      if ((int)tokens_list.size() > max_tokens_list_size) {
+        std::cout << __func__ << ": error: prompt too long ("


can you change to log?
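
A minimal sketch of what the requested change could look like. It assumes a hypothetical `LogError` helper standing in for whatever logging facility the backend provides; `LogError`, `CheckPromptLength`, and the return-value convention are illustrative and not taken from the PR. Only the 4-token margin mirrors the diff above.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical stand-in for the backend's logger; writes to stderr.
void LogError(const std::string& msg) {
  std::cerr << "[ERROR] " << msg << '\n';
}

// Returns true if the prompt fits into the context. Otherwise it reports the
// problem through the logger instead of printing to std::cout.
bool CheckPromptLength(const std::vector<int>& tokens_list,
                       int max_context_size) {
  assert(max_context_size > 4);  // the margin below must leave room for tokens
  const int max_tokens_list_size = max_context_size - 4;

  if (static_cast<int>(tokens_list.size()) > max_tokens_list_size) {
    std::ostringstream msg;
    msg << __func__ << ": prompt too long (" << tokens_list.size()
        << " tokens, max " << max_tokens_list_size << ")";
    LogError(msg.str());
    return false;
  }
  return true;
}
```

With a context size of 512, a 508-token prompt would pass this check and a 509-token prompt would be rejected with a logged error.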

lxning · 2024-01-26T17:43:46Z

cpp/src/examples/llamacpp/llamacpp_handler.cc

+
+        if (llama_eval(llama_ctx, tokens_list.data(), int(tokens_list.size()),
+                       n_past)) {
+          std::cout << "Failed to eval\n" << __func__ << std::endl;


lxning · 2024-01-26T17:45:20Z

cpp/src/examples/llamacpp/llamacpp_handler.cc

+        std::cout << "New Token: "
+                  << llama_token_to_piece(llama_ctx, new_token_id) << std::endl;


Do we need to log each output?

mreso: Not necessary. It's an example, so I thought about keeping it, but I can log it at debug level instead.
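
Logging each token at debug level could be sketched roughly as follows. The `Logger` class, `LogLevel` enum, and `LogNewToken` function are hypothetical stand-ins for the backend's real logging facility and are not part of the PR. The point is only that per-token messages are dropped unless the threshold is lowered to debug.

```cpp
#include <iostream>
#include <string>

enum class LogLevel { kDebug = 0, kInfo = 1, kError = 2 };

// Hypothetical minimal logger: messages below the threshold are dropped.
class Logger {
 public:
  explicit Logger(LogLevel threshold) : threshold_(threshold) {}

  // Returns true if the message was emitted, false if it was filtered out.
  bool Log(LogLevel level, const std::string& msg) const {
    if (level < threshold_) {
      return false;
    }
    std::cerr << msg << '\n';
    return true;
  }

 private:
  LogLevel threshold_;
};

// Reports one generated token piece at debug level.
bool LogNewToken(const Logger& logger, const std::string& piece) {
  return logger.Log(LogLevel::kDebug, "New Token: " + piece);
}
```

A logger constructed with `LogLevel::kInfo` would stay silent during generation, while one constructed with `LogLevel::kDebug` would print every token piece.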

lxning · 2024-01-26T17:47:58Z

cpp/src/examples/llamacpp/llamacpp_handler.cc

+      }
+
+      std::string generated_text_str = generated_text_stream.str();
+      std::cout << "Generated Text Str: " << generated_text_str << std::endl;


mreso · 2024-01-26T19:27:02Z

Thanks @chauhang!

To 1: I was actually under the impression that source files need to be under the root CMake folder, but it turns out that you can add out-of-tree sources. I am currently moving the examples over and splitting them up into separate CMake files.

To 2: These are part of the unit tests for the TorchScript model. They probably contain this MLP; I can check and add a Python script to recreate the .pt file in a follow-up PR. The same goes for the .pt file for the data, where it would be better to show a way to upload actual image data instead. I will create an issue to track these.

mreso · 2024-01-26T19:30:46Z

Issue for artifacts recreation: #2909

mreso changed the title ~~Llama.cpp example for cpp backend~~ [WIP] Llama.cpp example for cpp backend Jan 25, 2024

shrinath-suresh and others added 15 commits January 26, 2024 05:34

Adapt to removal of TS backend

94b9c9c

Re-add test for llama.cpp example

49649d5

Add llama.cpp as a submodule

5804216

Point to correct llama.cpp installation

6add8da

Build llama.cpp in build.sh

ed4114d

Skip llama.cpp example test if model weights are not available

4946825

renamed torchscript_model folder into examples

25992e8

Adjust to new base_handler interface

0f00fed

Remove debug statement

d50b19b

Rename llamacpp class + remove dummy.pt file

7c5e242

Move llamacpp config.json

ccb9c4d

Moved and created prompt file

641386c

Reset context for mutiple batch entries

8e33517

Add doc for llamacpp example

9a9b69e

mreso force-pushed the feature/cpp_llama_cpp_rebase branch from a59e85c to 9a9b69e Compare January 26, 2024 05:39

Fix spell check

d08e4eb

mreso requested review from lxning and chauhang January 26, 2024 05:43

mreso changed the title ~~[WIP] Llama.cpp example for cpp backend~~ Llama.cpp example for cpp backend Jan 26, 2024

mreso marked this pull request as ready for review January 26, 2024 05:43

Replace output example in llamacpp example

2d070a2

chauhang suggested changes Jan 26, 2024

lxning reviewed Jan 26, 2024

mreso added 2 commits January 26, 2024 20:22

Move cpp example src into main examples folder

f9dccd3

Convert cerr/cout into logs

2cb7b03

mreso requested a review from lxning January 26, 2024 20:31

mreso requested a review from chauhang January 26, 2024 20:31

lxning approved these changes Jan 26, 2024

mreso enabled auto-merge January 26, 2024 21:19

chauhang approved these changes Jan 26, 2024

mreso added this pull request to the merge queue Jan 26, 2024

Merged via the queue into master with commit a07b7d9 Jan 26, 2024
13 checks passed

chauhang added this to the v0.10.0 milestone Feb 27, 2024
