FAQ

1. The download.sh script doesn't work on default bash in MacOS X:

Please see the answers in these issues:

meta-llama#41 (comment)
meta-llama#53 (comment)

2. Generations are bad!

Keep in mind these models are not finetuned for question answering. As such, they should be prompted so that the expected answer is the natural continuation of the prompt.

Here are a few examples of prompts (from issue#69) geared towards finetuned models, and how to modify them to get the expected results:

Do not prompt with "What is the meaning of life? Be concise and do not repeat yourself." but with "I believe the meaning of life is"
Do not prompt with "Explain the theory of relativity." but with "Simply put, the theory of relativity states that"
Do not prompt with "Ten easy steps to build a website..." but with "Building a website can be done in 10 simple steps:\n"

To be able to directly prompt the models with questions / instructions, you can either:

Prompt it with few-shot examples so that the model understands the task you have in mind.
Finetune the models on datasets of instructions to make them more robust to input prompts.

We've updated example.py with more sample prompts. Overall, always keep in mind that models are very sensitive to prompts (particularly when they have not been finetuned).
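
As a minimal sketch of the few-shot option above, a prompt can be assembled from solved examples followed by an unfinished line, so that the answer is the natural continuation. The helper name `build_few_shot_prompt` and the ` = ` separator are illustrative assumptions, not part of example.py.

```python
def build_few_shot_prompt(examples, query, separator=" = "):
    """Join solved examples and one unsolved query into a completion-style prompt.

    Hypothetical helper for illustration; it is not part of the LLaMA codebase.
    """
    lines = [f"{source}{separator}{target}" for source, target in examples]
    # Leave the last line unfinished so the expected answer is the continuation.
    lines.append(f"{query}{separator.rstrip()}")
    return "\n".join(lines)


examples = [("J'aime le chocolat", "I like chocolate")]
prompt = build_few_shot_prompt(examples, "祝你一天过得愉快")
print(prompt)
```

Adding more solved examples in the same format generally makes the task clearer to the model, at the cost of a longer prompt.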

3. CUDA Out of memory errors

The example.py file pre-allocates a cache according to these settings:

model_args: ModelArgs = ModelArgs(max_seq_len=max_seq_len, max_batch_size=max_batch_size, **params)

Accounting for 14GB of memory for the model weights (7B model), a 30GB GPU has 16GB left for the decoding cache, which stores 2 * 2 * n_layers * max_batch_size * max_seq_len * n_heads * head_dim bytes.

With default parameters, this cache was about 17GB (2 * 2 * 32 * 32 * 1024 * 32 * 128) for the 7B model.
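
The arithmetic above can be checked with a short script. This is a sketch that assumes one factor of 2 stands for the two cached tensors (keys and values) and the other for 2 bytes per 16-bit value; the function name `kv_cache_bytes` is made up for illustration.

```python
def kv_cache_bytes(n_layers, max_batch_size, max_seq_len, n_heads, head_dim,
                   bytes_per_value=2):
    """Size of the pre-allocated decoding cache in bytes (illustrative helper)."""
    # Assumption: 2 cached tensors (keys and values) per layer, each holding
    # max_batch_size * max_seq_len * n_heads * head_dim values.
    values_per_tensor = max_batch_size * max_seq_len * n_heads * head_dim
    return 2 * bytes_per_value * n_layers * values_per_tensor


# 7B model settings from the formula above: 32 layers, 32 heads, head_dim 128.
old = kv_cache_bytes(32, 32, 1024, 32, 128)
new = kv_cache_bytes(32, 32, 512, 32, 128)
print(f"max_seq_len=1024: {old / 1e9:.1f} GB")  # 17.2 GB
print(f"max_seq_len=512:  {new / 1e9:.1f} GB")  # 8.6 GB
```

The cache grows linearly in both max_batch_size and max_seq_len, so halving either one halves the memory it needs.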

We've added command line options to example.py and changed the default max_seq_len to 512, which should allow decoding on 30GB GPUs.

Feel free to lower these settings according to your hardware.

4. Other languages

The model was trained primarily on English, but also on a few other languages with Latin or Cyrillic alphabets.

For instance, LLaMA was trained on Wikipedia for the following 20 languages: bg, ca, cs, da, de, en, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk.

LLaMA's tokenizer splits unseen characters into UTF-8 bytes. As a result, it might also be able to process other languages like Chinese or Japanese, even though they use different characters.
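
To illustrate the idea of byte fallback, here is a toy sketch, not the actual LLaMA tokenizer: characters found in a small, made-up vocabulary stay whole, and any other character is replaced by one token per UTF-8 byte. The function `toy_byte_fallback`, the one-entry vocabulary, and the `<0xNN>` token spelling are assumptions chosen for the example.

```python
def toy_byte_fallback(text, vocab):
    """Toy tokenizer: known characters stay whole, unknown ones become byte tokens.

    Illustration only; a real tokenizer works on learned subword pieces.
    """
    tokens = []
    for ch in text:
        if ch in vocab:
            tokens.append(ch)
        else:
            # Unseen character: emit one token per UTF-8 byte.
            tokens.extend(f"<0x{b:02X}>" for b in ch.encode("utf-8"))
    return tokens


print(toy_byte_fallback("a祝", vocab={"a"}))
# ['a', '<0xE7>', '<0xA5>', '<0x9D>']
```

Because every string can be written as UTF-8 bytes, no input is ever out of vocabulary, although characters handled this way cost several tokens each.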

Although the fraction of these languages in the training data was negligible, LLaMA still shows some ability in Chinese-English translation:

Prompt = "J'aime le chocolat = I like chocolate\n祝你一天过得愉快 ="
Output = "I wish you a nice day"
