Add BloomCausalLM #1467
Conversation
Looks great! Thanks so much for the thorough testing on this. "canary generations" like you did are a great way to check. Just one comment.
head_dim = self.backbone.hidden_dim // num_heads
shape = [batch_size, num_layers, 2, max_length, num_heads, head_dim]
attention_layer_dtype = self.backbone.transformer_layers[0].dtype
cache = ops.zeros(shape, dtype=attention_layer_dtype)
I added these two lines because of a Keras 2 error. Setting the cache dtype to the same dtype as the variable_dtype of the attention layer prevents TensorFlow from raising the following error while updating the cache:
TypeError: Input 'update' of 'XlaDynamicUpdateSlice' Op has type bfloat16 that does not match type float32 of argument 'input'.
This error is not raised in Keras 3, because Keras 3 casts all inputs to the layer's variable_dtype.
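The error above comes from XLA's dynamic update slice refusing to cast the update to the operand's dtype. A minimal sketch of that strictness, using numpy and a hypothetical `dynamic_update_slice` helper (not the real XLA op), shows why the cache must be allocated in the same dtype the layers emit:

```python
import numpy as np

def dynamic_update_slice(operand, update, start):
    # Mimic XlaDynamicUpdateSlice's strictness: the update must have
    # exactly the operand's dtype, with no implicit casting.
    if update.dtype != operand.dtype:
        raise TypeError(
            f"update dtype {update.dtype} does not match operand dtype {operand.dtype}"
        )
    out = operand.copy()
    idx = tuple(slice(s, s + n) for s, n in zip(start, update.shape))
    out[idx] = update
    return out

cache = np.zeros((2, 4), dtype="float32")   # cache allocated in variable dtype
update = np.ones((2, 1), dtype="float16")   # layer output in compute dtype

try:
    dynamic_update_slice(cache, update, (0, 0))
except TypeError as err:
    print("mismatch:", err)

# Allocating the cache in the layer's compute dtype avoids the mismatch.
cache16 = np.zeros((2, 4), dtype="float16")
updated = dynamic_update_slice(cache16, update, (0, 0))
```

This is only an illustration of the dtype rule; the actual fix in the PR is choosing the right dtype for `ops.zeros` when building the cache.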
I actually think we want this to be compute_dtype, not variable_dtype, for mixed precision cases. In that case, the variable dtype might be float32, but the output of layers will be bfloat16 or float16. This cache holds the outputs of our key/value projections, so the compute dtype is the correct thing to cache here.
Do you run into errors if you make that change?
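The mixed-precision situation described above can be sketched with plain numpy (variable and compute dtypes are just labels here, not the Keras dtype-policy objects): weights are stored in float32, but the matmul runs in float16, so its output, and therefore the key/value cache, should be float16.

```python
import numpy as np

# Hedged sketch: variables kept in full precision, compute in half precision.
variable_dtype, compute_dtype = "float32", "float16"

kernel = np.full((8, 8), 0.5, dtype=variable_dtype)  # stored weights: float32
x = np.ones((2, 8), dtype=compute_dtype)             # activations: float16

# Mixed precision casts the weights down for the computation, so the
# projection output, and any cache holding it, is in the compute dtype.
y = x @ kernel.astype(compute_dtype)
print(y.dtype)  # float16
```

A cache allocated in float32 here would not match `y.dtype`, which is exactly the mismatch the XLA error complains about.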
Thank you for that! You're right, the cache dtype should be the same as the dtype of the einsum layer output. That was actually my intention, but I thought the output of the layer would be the variable_dtype, not the compute_dtype.
But do you agree that this is not compatible with Keras 2:
cache = ops.zeros(shape, dtype=self.compute_dtype)
because BloomCausalLM doesn't have the same dtype policy as the backbone layers; it follows the global policy. So instead we should initialize the cache with the compute_dtype of the backbone attention layers' policy, like this:
attention_layer_dtype = self.backbone.transformer_layers[0].compute_dtype
cache = ops.zeros(shape, dtype=attention_layer_dtype)
We can't even do:
cache = ops.zeros(shape, dtype=self.backbone.compute_dtype)
because the backbone also has a dtype policy different from its layers' (when we pass a dtype argument different from the global policy during initialization).
Take a look at this commit:
77c576b
It shows how we can set the dtype_policy of the backbone, and how to make the CausalLM follow the backbone's dtype_policy. I will revert this commit after checking the tests, because it doesn't belong in this PR, but you can take a look. I can open a PR to apply that to all models if you think it would be useful.
Also, I forgot to mention: look at this commit and its error:
7cc8671
That should clarify what I am talking about.
Not at a computer right now, but I think you are right. We want to set the task dtype policy to follow the backbone's exactly, and set the backbone dtype accessors so they reflect the proper dtype. That's a bug we can probably fix in a separate PR.
Will land #1486 tomorrow, and pull this in with self.compute_dtype.
Looks great!
OK, #1486 is in. Shall I update this to self.compute_dtype, or do you want to?
@mattdangerw I updated to self.compute_dtype and the Keras 2 test passed. Thanks for the dtype fix.
Thanks! Version 3 of bloom_560m_multi is uploading now.
https://www.kaggle.com/models/keras/bloom/frameworks/keras/variations/bloom_560m_multi
Left a couple comments re variant naming, and cache dtype.
Force-pushed from 0953c5d to 72eec43.
Thank you so much! For the rest of the presets, would it help if we converted them ourselves? Is the conversion script ready to go? Or have you already converted these on your Kaggle? Also, a question on the preset map: what are the sizes of
@mattdangerw Thanks for your reviews and help. I have also added a validate_only flag; you can use it after copying the models into Keras from the link above. BLOOM is a 176B-parameter model: BLOOM: A 176B-Parameter Open-Access Multilingual Language Model
@mattdangerw Also
Thanks! Will copy over tomorrow. Let's rename the largest presets to include size in the name, though.
* Initial commit for BloomCausalLm
* Avoid adding a start token into token ids
* Revert "Avoid adding a start token into token ids" (this reverts commit 57ce4c2)
* Tie embeddding weights
* Export BloomCausalLM
* Add bloomz to the preset map
* Fix some presets names
* Revert "Fix some presets names" (this reverts commit 1b5949f)
* Add doc Example
* format the code
* Alibi bias small fixes
* Add tests
* Maek float16 dtype test to keras3 only
* Update model version
* Edit exampples
* Remove max_sequnce_length argument
* Try fix keras2 error
* Update hf_model download fun
* Make 1b models easier to copy
* Optimize conversion script
* Save checkpoints in float16
* Add test for mixed_float16
* Try to reproduce the keras2 Error
* Revert "Try to reproduce the keras2 Error" (this reverts commit 7cc8671)
* Revert "Make 1b models easier to copy" (this reverts commit 4d701d3)
* Show How to couple dtype_policy between backbone, causalLM, and backbone layers in dtype arg is based to bacckbone
* Revert "Show How to couple dtype_policy between backbone, causalLM, and backbone layers in dtype arg is based to bacckbone" (this reverts commit 77c576b)
* Add validate_only flag to conversion script
* Change preset version
* set cache dtype to self.compute_dtype
* Minor fix
This PR adds BloomCausalLM.
I have compared BloomCausalLM with the Hugging Face output, and both produce similar outputs:
* HF output with no start_token (screenshot)
* Keras output with no start_token (screenshot)
* Keras output with start_token, the default (screenshot)
This PR also fixes AlibiBias: when the Keras dtype is float16 in the CausalLM test, AlibiBias called the arange function with dtype=int16, which causes an error in TensorFlow (I think). This fix is the same as in the Add FalconBackbone PR (#1389).
BLOOMZ models are finetuned versions of the pretrained models; they are better and recommended for use by bigscience.
Reference: https://arxiv.org/abs/2211.01786
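The AlibiBias dtype point above can be sketched in numpy. This is a simplified ALiBi-style bias, not the keras-nlp implementation: the relevant detail is that the position range is built with a floating dtype (rather than a narrow integer dtype like int16) and then cast to the layer's compute dtype.

```python
import numpy as np

# Hedged sketch of an ALiBi-style attention bias.
num_heads, seq_len = 8, 128
compute_dtype = "float16"

# Per-head slopes: 2^(-8i/num_heads) for i = 1..num_heads (standard ALiBi form).
exponents = np.arange(1, num_heads + 1, dtype="float32")
slopes = np.power(2.0, -8.0 * exponents / num_heads)

# Float positions avoid int16 range/casting problems for long sequences.
positions = np.arange(seq_len, dtype="float32")
bias = (slopes[:, None] * -positions[None, :]).astype(compute_dtype)
print(bias.shape, bias.dtype)  # (8, 128) float16
```

The bias penalizes distant positions more strongly for heads with larger slopes, which is the usual ALiBi behavior; the dtype handling is the part relevant to the fix discussed in this PR.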