Create h2oGPT 40B based on tiiuae/falcon-40b #216

Closed
arnocandel opened this issue Jun 1, 2023 · 13 comments

arnocandel commented Jun 1, 2023

https://huggingface.co/tiiuae/falcon-40b is an Apache 2.0 model (can't use the -instruct variant, since it was trained on Alpaca).

RWForCausalLM(
  (transformer): RWModel(
    (word_embeddings): Embedding(65024, 8192)
    (h): ModuleList(
      (0-59): 60 x DecoderLayer(
        (ln_attn): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
        (ln_mlp): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
        (self_attention): Attention(
          (maybe_rotary): RotaryEmbedding()
          (query_key_value): Linear(in_features=8192, out_features=9216, bias=False)
          (dense): Linear(in_features=8192, out_features=8192, bias=False)
          (attention_dropout): Dropout(p=0.0, inplace=False)
        )
        (mlp): MLP(
          (dense_h_to_4h): Linear(in_features=8192, out_features=32768, bias=False)
          (act): GELU(approximate='none')
          (dense_4h_to_h): Linear(in_features=32768, out_features=8192, bias=False)
        )
      )
    )
    (ln_f): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
  )
  (lm_head): Linear(in_features=8192, out_features=65024, bias=False)
)
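For reference, the module tree above can be reproduced with a few lines of transformers code; a minimal sketch (not from the issue), assuming enough GPU memory and accelerate installed for device_map:

# Hedged sketch: load tiiuae/falcon-40b and print its module tree.
# The custom RWForCausalLM class ships with the model repo, hence trust_remote_code=True.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "tiiuae/falcon-40b"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",  # shard the ~41B parameters across available GPUs (needs accelerate)
)
print(model)  # prints the RWForCausalLM(...) structure shown above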
arnocandel self-assigned this Jun 1, 2023

arnocandel commented Jun 1, 2023

(env) arno@rippa:/nfs4/llm/h2ogpt(main)$ CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 finetune.py --data_path=h2oai/openassistant_oasst1_h2ogpt_graded --drop_truncations=True --train_8bit=True --base_model=tiiuae/falcon-40b --micro_batch_size=1 --batch_size=128 --num_epochs=3 --run_id=6 &> log.6.txt

bin /nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
CUDA SETUP: CUDA runtime path found: /usr/local/cuda-11/lib64/libcudart.so.11.0
CUDA SETUP: Highest compute capability among GPUs detected: 8.9
CUDA SETUP: Detected CUDA version 117
CUDA SETUP: Loading binary /nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/bitsandbytes/libbitsandbytes_cuda117.so...
Loading checkpoint shards: 100%|██████████| 9/9 [03:35<00:00, 23.90s/it]
Loading checkpoint shards: 100%|██████████| 9/9 [03:35<00:00, 23.91s/it]
PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): RWForCausalLM(
      (transformer): RWModel(
        (word_embeddings): Embedding(65024, 8192)
        (h): ModuleList(
          (0-59): 60 x DecoderLayer(
            (ln_attn): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (ln_mlp): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (self_attention): Attention(
              (maybe_rotary): RotaryEmbedding()
              (query_key_value): Linear8bitLt(
                in_features=8192, out_features=9216, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=9216, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (dense): Linear8bitLt(in_features=8192, out_features=8192, bias=False)
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (mlp): MLP(
              (dense_h_to_4h): Linear8bitLt(in_features=8192, out_features=32768, bias=False)
              (act): GELU(approximate='none')
              (dense_4h_to_h): Linear8bitLt(in_features=32768, out_features=8192, bias=False)
            )
          )
        )
        (ln_f): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): Linear(in_features=8192, out_features=65024, bias=False)
    )
  )
)
trainable params: 8355840 || all params: 41311649792 || trainable%: 0.020226352716656956
Using Validation Metrics: []
Supported Metrics: ['bleu', 'rouge', 'sacrebleu', 'meteor']
Auto set val_set_size 1000
Found cached dataset json (/home/arno/.cache/huggingface/datasets/h2oai___json/h2oai--openassistant_oasst1_h2ogpt_graded-29f03a61004f6aef/0.0.0/e347ab1c932092252e717ff3f949105a4dd28b27e842dd53157d2f72e276c2e4)
100%|██████████| 1/1 [00:00<00:00,  5.71it/s]
Tokenizing 30368 training rows
avoid keeping truncated cases to avoid contaminating model with truncation cases.  Original size: 30368
avoid keeping truncated cases to avoid contaminating model with truncation cases.  New size: 21583
Final fine-tuning data:
Train Dataset({
    features: ['source', 'grade_deberta', 'input', 'prompt_type', 'id', 'input_ids', 'token_type_ids', 'attention_mask', 'labels'],
    num_rows: 21583
})
Valid None
Sample input: {'source': ['OpenAssistant/oasst1'], 'grade_deberta': [0.4986453354358673], 'input': ['<human>: You obviously know yourself the best, but how do you believe you learn the best? Do you prefer large datasets all at once or would you rather have numerous small episodes of learning? Finally, as humans we are able to build new knowledge by leveraging what we already know, so are you able to do that easily or would you need to be re-trained/fine-tuned?\n<bot>: I think I learned. Best from numerous small episodes of learning. It feels like the most natural way. Understand the foundation, make a number of attempts, learn from the failures of those attempts, just continue to build on that.\n\n<human>: I think you should learn more about how to use punctuation)\n\n<bot>: Sorry for my bad use of punctuation here is an improved response:\nI think I learned best from numerous small episodes of learning. It feels like the most natural way. Understand the foundation, make a number of attempts and learn from the failures of those attempts. Just continue to build on that.\n\n<human>:'], 'prompt_type': ['plain'], 'id': [20695], 'token_type_ids': [[0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]}
No neptune configured, set NEPTUNE_API_TOKEN env var.
Auto set eval_steps to 25 out of 505 total training steps
Auto step save_steps to 25
You are adding a <class 'transformers.integrations.TensorBoardCallback'> to the callbacks of this Trainer, but there is already one. The currentlist of callbacks is
:DefaultFlowCallback
TensorBoardCallback
You're using a PreTrainedTokenizerFast tokenizer. Please note that with a fast tokenizer, using the `__call__` method is faster than using a method to encode the text followed by a call to the `pad` method to get a padded encoding.
/nfs4/llm/h2o-llm/env/lib/python3.10/site-packages/bitsandbytes/autograd/_functions.py:318: UserWarning: MatMul8bitLt: inputs will be cast from torch.float32 to float16 during quantization
  warnings.warn(f"MatMul8bitLt: inputs will be cast from {A.dtype} to float16 during quantization")
  0%|          | 0/504 [00:00<?, ?it/s]

0%| | 1/504 [02:35<21:44:04, 155.56s/it]
OutOfMemoryError: CUDA out of memory. Tried to allocate 64.00 MiB (GPU 0; 47.47 GiB total capacity; 43.43 GiB already allocated; 63.81 MiB free; 45.14 GiB reserved in total by PyTorch)
i.e., OOM on the 2x A6000 Ada (48 GB each) setup.
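For context, a minimal sketch (assumptions, not the actual finetune.py code) of the 8-bit + LoRA setup that yields a PeftModelForCausalLM like the one printed above; r and dropout mirror the dump, lora_alpha is assumed:

from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Load the base model with bitsandbytes 8-bit weights (Linear8bitLt layers, as printed above).
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    load_in_8bit=True,
    trust_remote_code=True,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# LoRA on query_key_value only, matching the module dump (r=8, dropout=0.05; alpha is assumed).
lora_config = LoraConfig(
    r=8,
    lora_alpha=16,
    lora_dropout=0.05,
    target_modules=["query_key_value"],
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
# (8192*8 + 8*9216) per layer * 60 layers = 8,355,840 trainable params,
# matching "trainable params: 8355840" in the log.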


arnocandel commented Jun 1, 2023

  • confirm training works locally
  • prepare merging of the LoRA weights + foundation model into a plain HF checkpoint (see the sketch below)
  • prepare to train on 8x A100, with improved LoRA coverage (adapt more layers)
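For the merge step, folding the LoRA adapter back into the foundation model could look roughly like this (a sketch with placeholder paths, not the repo's actual export script):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Merge in 16-bit so the result is a plain HF checkpoint without bitsandbytes layers.
base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    trust_remote_code=True,
    torch_dtype=torch.float16,
)
model = PeftModel.from_pretrained(base, "path/to/lora-checkpoint")  # placeholder path
model = model.merge_and_unload()  # fold the LoRA deltas into the base weights

tokenizer = AutoTokenizer.from_pretrained("tiiuae/falcon-40b")
model.save_pretrained("h2ogpt-falcon-40b-merged")      # placeholder output dir
tokenizer.save_pretrained("h2ogpt-falcon-40b-merged")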


arnocandel commented Jun 1, 2023

Improved LoRA coverage:

CUDA_VISIBLE_DEVICES=0,1 torchrun --nproc_per_node=2 finetune.py --data_path=h2oai/openassistant_oasst1_h2ogpt_graded --drop_truncations=True --train_8bit=True --base_model=tiiuae/falcon-40b --micro_batch_size=1 --batch_size=128 --num_epochs=1 --run_id=7 --lora_target_modules='["query_key_value", "dense_h_to_4h", "dense_4h_to_h", "dense"]' &> log.7.txt

PeftModelForCausalLM(
  (base_model): LoraModel(
    (model): RWForCausalLM(
      (transformer): RWModel(
        (word_embeddings): Embedding(65024, 8192)
        (h): ModuleList(
          (0-59): 60 x DecoderLayer(
            (ln_attn): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (ln_mlp): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
            (self_attention): Attention(
              (maybe_rotary): RotaryEmbedding()
              (query_key_value): Linear8bitLt(
                in_features=8192, out_features=9216, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=9216, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (dense): Linear8bitLt(
                in_features=8192, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (attention_dropout): Dropout(p=0.0, inplace=False)
            )
            (mlp): MLP(
              (dense_h_to_4h): Linear8bitLt(
                in_features=8192, out_features=32768, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=8192, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=32768, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
              (act): GELU(approximate='none')
              (dense_4h_to_h): Linear8bitLt(
                in_features=32768, out_features=8192, bias=False
                (lora_dropout): ModuleDict(
                  (default): Dropout(p=0.05, inplace=False)
                )
                (lora_A): ModuleDict(
                  (default): Linear(in_features=32768, out_features=8, bias=False)
                )
                (lora_B): ModuleDict(
                  (default): Linear(in_features=8, out_features=8192, bias=False)
                )
                (lora_embedding_A): ParameterDict()
                (lora_embedding_B): ParameterDict()
              )
            )
          )
        )
        (ln_f): LayerNorm((8192,), eps=1e-05, elementwise_affine=True)
      )
      (lm_head): Linear(in_features=8192, out_features=65024, bias=False)
    )
  )
)
trainable params: 55541760 || all params: 41358835712 || trainable%: 0.13429236835089367

1%| | 2/168 [05:06<7:03:23, 153.03s/it]
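The only change versus run 6 is the wider --lora_target_modules list; as a LoraConfig this corresponds roughly to (alpha again assumed):

from peft import LoraConfig

lora_config = LoraConfig(
    r=8,
    lora_alpha=16,  # assumed
    lora_dropout=0.05,
    target_modules=["query_key_value", "dense_h_to_4h", "dense_4h_to_h", "dense"],
    bias="none",
    task_type="CAUSAL_LM",
)
# Per layer: qkv (8192*8 + 8*9216) + dense (8192*8 + 8*8192)
#          + dense_h_to_4h (8192*8 + 8*32768) + dense_4h_to_h (32768*8 + 8*8192) = 925,696
# * 60 layers = 55,541,760 trainable params (~0.134% of 41.36B), matching the log above.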


arnocandel commented Jun 1, 2023

8x A100 80GB tiiuae/falcon-40b + oasst1_h2ogpt_graded INSTRUCT TUNING (4-bit)

Note: failed with OOM for --train_8bit=True, maybe still the PEFT memory overuse bug? https://github.com/huggingface/peft.git@207d2908650f3f4f3ba0e21d243c1b2aee66e72d

torchrun --nproc_per_node=8 finetune.py --data_path=h2oai/openassistant_oasst1_h2ogpt_graded --drop_truncations=True --train_4bit=True --base_model=tiiuae/falcon-40b --micro_batch_size=1 --batch_size=32 --num_epochs=3 --lora_target_modules='["query_key_value", "dense_h_to_4h", "dense_4h_to_h", "dense"]' --run_id=8 &> log.8.txt
0%| | 1/504 [00:53<7:31:49, 53.90s/it]
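The switch from --train_8bit to --train_4bit corresponds roughly to a bitsandbytes 4-bit (QLoRA-style) quantization config when loading the base model; a sketch, the exact settings used in finetune.py may differ:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import prepare_model_for_kbit_training

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",          # assumed QLoRA defaults
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-40b",
    quantization_config=bnb_config,
    trust_remote_code=True,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
# ...then apply the same LoraConfig with the four target modules as above.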

(image attachment)
https://slack-files.com/T0329MHH6-F05AGKHEW85-a7ce922a1a lora weights, checkpoint and logs

https://huggingface.co/h2oai/h2ogpt-oasst1-falcon-40b


arnocandel commented Jun 1, 2023

8x A100 80GB tiiuae/falcon-40b + h2ogpt-oig-oasst1-instruct-cleaned-v3 INSTRUCT TUNING (4-bit)

torchrun --nproc_per_node=8 finetune.py --data_path=h2oai/h2ogpt-oig-oasst1-instruct-cleaned-v3 --drop_truncations=True --train_4bit=True --base_model=tiiuae/falcon-40b --micro_batch_size=2 --batch_size=64 --num_epochs=3 --lora_target_modules='["query_key_value", "dense_h_to_4h", "dense_4h_to_h", "dense"]' --run_id=10 &> log.10.txt
1%| | 128/12213 [18:10<28:15:09, 8.42s/it]
(image attachment)
https://huggingface.co/h2oai/h2ogpt-oig-oasst1-falcon-40b


arnocandel commented Jun 2, 2023

CUDA_VISIBLE_DEVICES=0 python generate.py --base_model=h2oai/h2ogpt-oasst1-falcon-40b --chat=False --stream_output=False --gradio=False --eval_sharegpt_prompts_only=500 --eval_sharegpt_as_output=False --num_beams=2 --infer_devices=False --load_4bit=True --debug

OOM
(image attachment)

16-bit (~80 GB of weights) across 2x 48 GB + 1x 24 GB cards:
CUDA_VISIBLE_DEVICES=0,1,2 python generate.py --base_model=h2oai/h2ogpt-oasst1-falcon-40b --chat=False --stream_output=False --gradio=False --eval_sharegpt_prompts_only=500 --eval_sharegpt_as_output=False --num_beams=2 --infer_devices=False --debug
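Sharding the ~80 GB of 16-bit weights over unequal cards relies on accelerate's device_map; a rough sketch (the max_memory budget is illustrative, not taken from generate.py):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "h2oai/h2ogpt-oasst1-falcon-40b"
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    torch_dtype=torch.float16,
    device_map="auto",
    max_memory={0: "46GiB", 1: "46GiB", 2: "22GiB"},  # leave headroom for activations/KV cache
)
tokenizer = AutoTokenizer.from_pretrained(model_name)

prompt = "<human>: What is H2O.ai?\n<bot>:"             # prompt format used in training
inputs = tokenizer(prompt, return_tensors="pt").to(0)   # inputs go to the first device
out = model.generate(**inputs, max_new_tokens=128, num_beams=2)
print(tokenizer.decode(out[0], skip_special_tokens=True))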


arnocandel commented Jun 6, 2023

8xA100 Eval suite

https://github.com/EleutherAI/lm-evaluation-harness
4b701e228768052cfae9043dca13e82052ca5eea

diff --git a/lm_eval/models/huggingface.py b/lm_eval/models/huggingface.py
index 4d3aa24..5e4257e 100644
--- a/lm_eval/models/huggingface.py
+++ b/lm_eval/models/huggingface.py
@@ -76,10 +76,10 @@ class HuggingFaceAutoLM(BaseLM):
         subfolder: Optional[str] = None,
         revision: Optional[str] = "main",
         batch_size: Optional[Union[int, str]] = 1,
-        max_gen_toks: Optional[int] = 256,
+        max_gen_toks: Optional[int] = 512,
         max_length: Optional[int] = None,
         add_special_tokens: Optional[bool] = None,
-        use_accelerate: Optional[bool] = False,
+        use_accelerate: Optional[bool] = True,
         device_map_option: Optional[str] = "auto",
         max_memory_per_gpu: Optional[Union[int, str]] = None,
         max_cpu_memory: Optional[Union[int, str]] = None,
@@ -89,7 +89,7 @@ class HuggingFaceAutoLM(BaseLM):
         peft: str = None,
         load_in_8bit: Optional[bool] = False,
         load_in_4bit: Optional[bool] = False,
-        trust_remote_code: Optional[bool] = False,
+        trust_remote_code: Optional[bool] = True,
         gptq_use_triton: Optional[bool] = False,
     ):
         """Initializes a HuggingFace `AutoModel` and `AutoTokenizer` for evaluation.

CUDA_VISIBLE_DEVICES=0,1 python main.py --model hf-causal-experimental --model_args pretrained=h2oai/h2ogpt-oig-oasst1-falcon-40b --tasks openbookqa,arc_easy,winogrande,hellaswag,arc_challenge,piqa,boolq --device cuda &> h2ogpt-oig-oasst1-falcon-40b.16bit.eval.log

CUDA_VISIBLE_DEVICES=2,3 python main.py --model hf-causal-experimental --model_args pretrained=h2oai/h2ogpt-oasst1-falcon-40b --tasks openbookqa,arc_easy,winogrande,hellaswag,arc_challenge,piqa,boolq --device cuda &> h2ogpt-oasst1-falcon-40b.16bit.eval.log

logs.zip


arnocandel commented Jun 7, 2023

8xA100 ShareGPT Eval 40B

CUDA_VISIBLE_DEVICES=4,5 python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-falcon-40b --chat=False --stream_output=False --gradio=False --eval_prompts_only_num=500 --eval_as_output=False --num_beams=2 --infer_devices=False --debug --max_new_tokens=512 &> h2ogpt-oig-oasst1-falcon-40b.sharegpt.log

CUDA_VISIBLE_DEVICES=6,7 python generate.py --base_model=h2oai/h2ogpt-oasst1-falcon-40b --chat=False --stream_output=False --gradio=False --eval_prompts_only_num=500 --eval_as_output=False --num_beams=2 --infer_devices=False --debug --max_new_tokens=512 &> h2ogpt-oasst1-falcon-40b.sharegpt.log

(image attachment)


arnocandel commented Jun 7, 2023

1x A6000 Ada ShareGPT Eval 40B 4bit

CUDA_VISIBLE_DEVICES=0 python generate.py --base_model=h2oai/h2ogpt-oig-oasst1-falcon-40b --chat=False --stream_output=False --gradio=False --eval_prompts_only_num=500 --eval_as_output=False --num_beams=2 --infer_devices=False --debug --max_new_tokens=512 --load_4bit=True &> h2ogpt-oasst1-falcon-40b.4bit.sharegpt.log

OOM still
h2ogpt-oasst1-falcon-40b.4bit.sharegpt.log
df_scores_500_500_1234_False_h2ogpt-oig-oasst1-falcon-40b_


arnocandel commented Jun 22, 2023

Attempt to improve h2oGPT 40B slightly, based on findings from h2ogpt-gm models

Changes:

  • 1 epoch instead of 3, but use the larger dataset again (no grading)
  • increase cutoff length to 2048, so nothing gets dropped (see the sketch after this list)
  • increase LoRA alpha/r/dropout
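What --cutoff_len and --drop_truncations do together, as a rough illustration (the real logic lives in finetune.py; names below are only for the sketch):

# Drop any training row whose tokenized length exceeds cutoff_len instead of truncating it,
# so truncated conversations never contaminate the training data.
cutoff_len = 2048

def fits(example, tokenizer):
    # keep only rows that fit entirely within the context window
    return len(tokenizer(example["input"])["input_ids"]) <= cutoff_len

# dataset = dataset.filter(lambda ex: fits(ex, tokenizer))
# With the earlier (smaller) cutoff, the graded OASST1 set shrank from 30368 to 21583 rows;
# at 2048 (almost) nothing should be dropped.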

CUDA_VISIBLE_DEVICES=4,5,6,7 torchrun --nproc_per_node=4 finetune.py --data_path=h2oai/openassistant_oasst1_h2ogpt --cutoff_len=2048 --drop_truncations=True --train_4bit=True --base_model=tiiuae/falcon-40b --micro_batch_size=1 --batch_size=32 --num_epochs=1 --lora_alpha=32 --lora_r=16 --lora_dropout=0.1 --lora_target_modules='["query_key_value", "dense_h_to_4h", "dense_4h_to_h", "dense"]' --run_id=9 &> log.9.txt
4%|▎ | 52/1483 [15:58<7:10:12, 18.04s/it]

https://huggingface.co/h2oai/h2ogpt-oasst1-2048-falcon-40b


arnocandel commented Jun 23, 2023

Eval Suite

same as #216 (comment)
CUDA_VISIBLE_DEVICES=6,7 python main.py --model hf-causal-experimental --model_args pretrained=h2oai/h2ogpt-oasst1-2048-falcon-40b --tasks openbookqa,arc_easy,winogrande,hellaswag,arc_challenge,piqa,boolq --device cuda &> h2ogpt-oasst1-2048-falcon-40b.16bit.eval.log
h2ogpt-oasst1-2048-falcon-40b.16bit.eval.log

hf-causal-experimental (pretrained=h2oai/h2ogpt-oasst1-2048-falcon-40b), limit: None, provide_description: False, num_fewshot: 0, batch_size: None
|    Task     |Version| Metric |Value |   |Stderr|
|-------------|------:|--------|-----:|---|-----:|
|arc_challenge|      0|acc     |0.4940|±  |0.0146|
|             |       |acc_norm|0.5307|±  |0.0146|
|arc_easy     |      0|acc     |0.8106|±  |0.0080|
|             |       |acc_norm|0.7748|±  |0.0086|
|boolq        |      1|acc     |0.8266|±  |0.0066|
|hellaswag    |      0|acc     |0.6464|±  |0.0048|
|             |       |acc_norm|0.8267|±  |0.0038|
|openbookqa   |      0|acc     |0.3520|±  |0.0214|
|             |       |acc_norm|0.4720|±  |0.0223|
|piqa         |      0|acc     |0.8156|±  |0.0090|
|             |       |acc_norm|0.8384|±  |0.0086|
|winogrande   |      0|acc     |0.7774|±  |0.0117|


arnocandel commented Jun 23, 2023

maybe DOA (dead on arrival), or maybe 1 epoch just isn't enough for proper personalization, as also seen for the gm models
(image attachment)
