diff --git a/examples/models/llama2/README.md b/examples/models/llama2/README.md
index f5686eccd95..5ec46eeb94c 100644
--- a/examples/models/llama2/README.md
+++ b/examples/models/llama2/README.md
@@ -205,11 +205,6 @@ If you want to deploy and run a smaller model for educational purposes. From `ex
     ```
     python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -X -kv
     ```
-4. Create tokenizer.bin.
-
-    ```
-    python -m extension.llm.tokenizer.tokenizer -t <tokenizer.model path> -o tokenizer.bin
-    ```
 
 ### Option D: Download and export Llama 2 7B model
 
@@ -224,7 +219,6 @@ You can export and run the original Llama 2 7B model.
     python -m examples.models.llama2.export_llama --checkpoint <consolidated.00.pth> --params <params.json> -kv --use_sdpa_with_kv_cache -X -qmode 8da4w --group_size 128 -d fp32
     ```
 4. Create tokenizer.bin.
-
     ```
     python -m extension.llm.tokenizer.tokenizer -t <tokenizer.model path> -o tokenizer.bin
     ```
@@ -286,7 +280,7 @@ tokenizer.path=<path_to_checkpoint_folder>/tokenizer.model
 
 Using the same arguments from above
 ```
-python -m examples.models.llama2.eval_llama -c <checkpoint.pth> -p <params.json> -t <tokenizer.bin> -d fp32 --max_seq_len <max sequence length> --limit <number of samples>
+python -m examples.models.llama2.eval_llama -c <checkpoint.pth> -p <params.json> -t <tokenizer.model> -d fp32 --max_seq_len <max sequence length> --limit <number of samples>
 ```
 
 The Wikitext results generated above used: `{max_seq_len: 2048, limit: 1000}`
@@ -332,7 +326,7 @@ Note for Mac users: There's a known linking issue with Xcode 15.1. Refer to the
 cmake-out/examples/models/llama2/llama_main --model_path=<model pte file> --tokenizer_path=<tokenizer.model> --prompt=<prompt>
 ```
 
-For Llama2 and stories models, pass the converted `tokenizer.bin` file instead of `tokenizer.model`.
+For Llama2 models, pass the converted `tokenizer.bin` file instead of `tokenizer.model`.
 
 To build for CoreML backend and validate on Mac, replace `-DEXECUTORCH_BUILD_XNNPACK=ON` with `-DEXECUTORCH_BUILD_COREML=ON`
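
A minimal sketch of the two flows as they read after this patch: the stories110M steps in the first hunk drop the tokenizer conversion, while Llama 2 models (second hunk) keep it. The output name `stories110M.pte` and the prompt text are illustrative assumptions, not taken from the README:

```
# Export stories110M with XNNPACK and KV cache (unchanged step from the first hunk).
python -m examples.models.llama2.export_llama -c stories110M.pt -p params.json -X -kv

# Stories models now pass tokenizer.model directly to the runner.
# (The .pte file name and prompt here are illustrative assumptions.)
cmake-out/examples/models/llama2/llama_main \
  --model_path=stories110M.pte \
  --tokenizer_path=tokenizer.model \
  --prompt="Once upon a time"

# Llama 2 models still require the conversion step kept in the second hunk:
python -m extension.llm.tokenizer.tokenizer -t tokenizer.model -o tokenizer.bin
```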