
Unexpected larger perplexity on PG19 #48

Open
Yiyi-philosophy opened this issue Dec 10, 2023 · 1 comment

@Yiyi-philosophy

Hi YaRN team,

I hope this message finds you well. I've been using your code (jquesnelle/yarn) to evaluate perplexity on the PG19 dataset. While reviewing the eval.sh script, I found some definitions related to PG19, but the command that actually computes perplexity on it appears to be missing.

Settings:

  • Base Model: llama2-7b
  • Base Context Size: 4096
  • Sliding Window: 256, 4096
  • Scale to: 8192

In eval.sh, I found the following definition for the PG19 dataset:

# python eval/perplexity.py -m meta-llama/Llama-2-7b-hf --dataset pg19 --split test --feature text --save-tokenized output/pg19-test-tokenized
PG19="--tokenized emozilla/pg19-test-tokenized"

However, I did not find the actual command for computing perplexity. I therefore attempted to test with a command of my own:

python eval/perplexity.py --dataset pg19 --feature "text" --samples 5 -m meta-llama/Llama-2-7b-hf --max-tokens $max_tokens --min-tokens $max_tokens --tokens-step 4000 --tokenized emozilla/pg19-test-tokenized --yarn $((max_tokens / 4096)) --max-position-embeddings 4096 --original-max-position-embeddings 4096 --dataset-min-tokens $max_tokens --sliding-window 4096 --custom-model --aggressive-memory --flash-attention
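For reference, here is my understanding of what --sliding-window does during scoring: a minimal sketch of strided perplexity in the style of the standard Hugging Face recipe. This illustrates the technique only and is not the repo's actual eval/perplexity.py:

import torch

def strided_perplexity(model, input_ids, context_len=4096, stride=256):
    # Slide a window of `context_len` tokens across the sequence, advancing
    # by `stride`. Only tokens not already scored in a previous window
    # contribute to the loss; earlier positions serve as context only.
    seq_len = input_ids.size(1)
    nlls, prev_end = [], 0
    for begin in range(0, seq_len, stride):
        end = min(begin + context_len, seq_len)
        target_len = end - prev_end  # tokens newly scored in this window
        ids = input_ids[:, begin:end].to(model.device)
        labels = ids.clone()
        labels[:, :-target_len] = -100  # mask context-only positions
        with torch.no_grad():
            loss = model(ids, labels=labels).loss
        nlls.append(loss * target_len)
        prev_end = end
        if end == seq_len:
            break
    return torch.exp(torch.stack(nlls).sum() / prev_end)

Note that with stride equal to the context length the windows are disjoint, while with stride 256 each window is mostly previously seen context and only the last 256 tokens are scored, so the two settings stress the model quite differently.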

I observed that the results differ sharply depending on the sliding window: with YaRN scaling, perplexity is reasonable at a sliding window of 4096 but degrades badly at 256, whereas the PI and dynamic-NTK (dy-ntk) baselines are stable under both settings.

Results (YaRN, meta-llama/Llama-2-7b-hf, perplexity at 8192 tokens):

  • --sliding-window 4096: 9.89344
  • --sliding-window 256: 32.76145

In contrast, PI and dy-ntk maintain relatively stable perplexity across both sliding-window settings:

  • Sliding window 4096 / 256:
    • PI: 10.79598 / 10.65644
    • dy-ntk: 10.19125 / 10.214816

I would appreciate your insights on this phenomenon. Is this behavior expected, or does it point to a configuration issue on my side? If possible, could you provide more details on the PG19 testing script so I can better understand and adjust my evaluation configuration?

Thank you very much for your time and assistance. I look forward to your response.

Best regards,
Yiran

@ClarkChin08

Hi! I'm experiencing this issue as well. How many samples did you use, and have you solved it?
