Conversation

martin-gorner
Contributor

Llama 3 uses the <|end_of_text|> special token in non-instruct model variants and the <|eot_id|> special token in "instruct" model variants. Without this fix, text generation in "instruct" variants never stops. The end token is configured in tokenizer_config.json.

For example:
Meta-Llama-3-8B-Instruct/tokenizer_config.json:
  "bos_token": "<|begin_of_text|>"
  "eos_token": "<|eot_id|>"

Meta-Llama-3-8B/tokenizer_config.json:
  "bos_token": "<|begin_of_text|>"
  "eos_token": "<|end_of_text|>"

However, this file does not exist in Keras Llama presets; the special tokens are hard-coded in the Llama3Tokenizer constructor using _add_special_token.

Another difference between the Keras and Transformers tokenizer presets is that in Transformers, tokenizer_config.json lists the special tokens: <|end_header_id|> <|start_header_id|> <|end_of_text|> <|python_tag|> <|eom_id|> <|begin_of_text|> <|eot_id|> <|finetune_right_pad_id|>. Again, this config file does not exist in Keras presets; instead, the special tokens appear directly in the vocabulary, and the list is not exactly the same: <|end_header_id|> <|start_header_id|> <|end_of_text|> <|begin_of_text|> <|eot_id|>.

In light of these discrepancies, this fix does the following:

  1. It keeps the "special tokens hard-coded in the constructor through _add_special_token" approach but expands the list to match all the special tokens in the Keras Llama3 vocabulary: <|end_header_id|> <|start_header_id|> <|end_of_text|> <|begin_of_text|> <|eot_id|>

  2. When converting from a Transformers checkpoint, it adds the special tokens to the vocabulary so that tokenization works for them.

  3. It declares both <|end_of_text|> and <|eot_id|> as stop tokens for generation in all cases (see the sketch below). It would have been possible to fix this properly on the Transformers side by reading the configured eos_token from the config, but generation from instruct variants loaded from Keras presets would have remained broken.
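
For illustration only, here is a minimal sketch of what stopping on both tokens looks like from user code, assuming `generate()` accepts `stop_token_ids` as in recent keras_hub releases (preset handle and prompt are illustrative):

```python
# Hedged sketch: stop generation on both possible end tokens explicitly.
# Assumes generate() accepts stop_token_ids; preset handle and prompt are illustrative.
import keras_hub

causal_lm = keras_hub.models.Llama3CausalLM.from_preset("llama3_instruct_8b_en")
tokenizer = causal_lm.preprocessor.tokenizer
stop_ids = [
    tokenizer.token_to_id("<|end_of_text|>"),
    tokenizer.token_to_id("<|eot_id|>"),
]
print(causal_lm.generate("Hello", stop_token_ids=stop_ids))
```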

@martin-gorner
Contributor Author

There seems to be one problem left:
model.preprocessor("A") appends the end_token_id to the tokenized output, which will not be the correct id for "instruct" Llama variants. Any guidance on how to address this properly?
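
A minimal way to observe this, assuming an instruct preset and the same preprocessor call pattern used further down in this thread (preset handle is illustrative):

```python
# Hedged illustration: the packer appends tokenizer.end_token_id, which for an
# instruct preset is still <|end_of_text|> rather than <|eot_id|>.
import keras_hub

prepro = keras_hub.models.Llama3CausalLMPreprocessor.from_preset("llama3_instruct_8b_en")
x, y, sample_weight = prepro("A")
print(x["token_ids"][:4])             # start token, "A", appended end token, padding
print(prepro.tokenizer.end_token_id)  # the id that was appended
```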

@martin-gorner
Contributor Author

More issues found: the output differs between the HF and the Keras tokenizer because special tokens are not being parsed by the Keras preprocessor.

```python
import keras_hub
import transformers

txt = "<|start_header_id|>system<|end_header_id|>hello<|end_of_text|>"

tok = transformers.AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B")
print("HF tokenizer\n", tok(txt).input_ids)

prepro = keras_hub.models.Llama3CausalLMPreprocessor.from_preset("hf://meta-llama/Llama-3.2-1B-Instruct")
print("Keras preprocessor\n", prepro(txt)[0]["token_ids"][:40])
```

Output:

```
# 128000, 128006, 128007, 128001 are the expected special tokens

HF tokenizer output:
 [128000, 128006, 9125, 128007, 15339, 128001]

Keras preprocessor output:
 [128000     27     91   2527     62   2775     62    307     91     29
   9125     27     91    408     62   2775     62    307     91     29
  15339     27     91    408     62   1073     62   1342     91     29
 128001      0      0      0      0      0      0      0      0      0]
```

This happens with or without the present PR. Investigating...

@martin-gorner
Contributor Author

Added a fix for the preprocessing of special tokens.
Added loading of the correct eos_token from the Hugging Face config. Unfortunately, this info does not exist in the Keras config, so a hack is used to make generation stop in all cases.
Last remaining problem: the packer in the preprocessor. model.preprocessor("A") appends the end_token_id to the tokenized output, which will be the wrong end token for Llama3 "instruct" variants loaded from Keras checkpoints.

…ing hack for Keras checkpoint because it does not have this info
@divyashreepathihalli
Collaborator

Thanks Martin! The presets would need to be regenerated after this, correct?

@divyashreepathihalli added the kokoro:force-run (Runs Tests on GPU) label Oct 7, 2024
@kokoro-team removed the kokoro:force-run (Runs Tests on GPU) label Oct 7, 2024
@martin-gorner
Contributor Author

My PR does not change anything in the Keras Llama presets. They still have their built-in end_token, which is wrong for instruct variants. I just added a hack that declares both possible end tokens as stop tokens for generation, so that generation at least stops.

If you have a way of storing the correct end_token in the preset, that would indeed be a good fix for the Keras version. Is that possible? How would versioning work? All Llama instruct variants need this: Llama2, Llama3.0, Llama3.1.

@martin-gorner
Contributor Author

The Deeplabv3 test failure does not seem to be related to this PR... There isn't much I can do to fix it.

@divyashreepathihalli
Collaborator

> If you have a way of storing the correct end_token in the preset, that would indeed be a good fix for the Keras version. Is that possible? How would versioning work? All Llama instruct variants need this: Llama2, Llama3.0, Llama3.1.

I think this would be a good thing to add to tokenizer.json in presets. Regarding versioning, we can upload new preset files to Kaggle, and the new version number then needs to be updated in the presets file for the model.

@divyashreepathihalli merged commit e337f7d into keras-team:master Oct 7, 2024
9 of 10 checks passed
@SamanehSaadat
Member

Hey @martin-gorner!
Why aren't we hard-coding all the special tokens in the Llama tokenizer?
We already know what all the special tokens are, and we don't expect them to change, right? So why not hard-code them all? Wouldn't that fix the issue for loading from both Keras and HF checkpoints?

@martin-gorner
Contributor Author

Whether the tokens are hard-coded or loaded from a fixed preset for all Llamas does not make much difference. The difficulty is selecting the correct "end of text" token for instruct vs. non-instruct models. This is easy in the Hugging Face approach, where each model has its own config file.

For example, the current behavior in Keras is:

# "instruct" variant
tokenizer - keras_hub.models.Llama3Tokenizer.from_preset("llama3_instruct_8b_en")
> tokenizer.end_token = '<|end_of_text|>'
> tokenizer.end_token2 = '<|eot_id|>'

# "non-instruct" variant
tokenizer - keras_hub.models.Llama3Tokenizer.from_preset("llama3_8b_en")
> tokenizer.end_token = '<|end_of_text|>'
> tokenizer.end_token2 = '<|eot_id|>'

Which is wrong: the instruct variant's end token should be '<|eot_id|>'. Thanks to my "end_token2" hack, inference will work since both end tokens stop generation, but for further fine-tuning of the "instruct" variants the end token is wrong.
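
Until the correct end token can be stored in the preset, one possible workaround for fine-tuning is to make the turn terminator explicit in the training text. A hedged sketch, assuming special tokens are now tokenized correctly by the preprocessor (preset handle and sample data are illustrative):

```python
# Hedged workaround sketch: put <|eot_id|> explicitly in the training text so the
# instruct variant sees its proper turn terminator, regardless of the preset's
# built-in end_token. Preset handle and sample data are illustrative.
import keras_hub

causal_lm = keras_hub.models.Llama3CausalLM.from_preset("llama3_instruct_8b_en")
samples = [
    "<|start_header_id|>user<|end_header_id|>\n\nHi<|eot_id|>"
    "<|start_header_id|>assistant<|end_header_id|>\n\nHello!<|eot_id|>",
]
causal_lm.fit(x=samples, batch_size=1)
```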

ushareng pushed a commit to ushareng/keras-nlp that referenced this pull request Oct 24, 2024
BytePairTokenizer must not split sequences of \n (keras-team#1910)

* fix for loading of special tokens in Llama tokenizer

* fix for Llama tokenizer which can have multiple end tokens

* bug fix

* adding some missing tokens to Llama3 tokenizer

* fixed tests and Llama3Tokenizer init.

* now loading correct eos_token config from Hugging Face checkpoint. Using hack for Keras checkpoint because it does not have this info

* fix for BytePairTokenizer to make Lllama3-instruct work in chat: \n\n sequences are significant in the chat template and must be preserved by the tokenizer

---------

Co-authored-by: Martin Görner <martin@huggingface.co>

fix for generation that never stops in Llama3-Instruct variants (keras-team#1904)

* fix for loading of special tokens in Llama tokenizer

* fix for Llama tokenizer which can have multiple end tokens

* bug fix

* adding some missing tokens to Llama3 tokenizer

* fixed tests and Llama3Tokenizer init.

* now loading correct eos_token config from Hugging Face checkpoint. Using hack for Keras checkpoint because it does not have this info

---------

Co-authored-by: Martin Görner <martin@huggingface.co>

fix failing JAX GPU test (keras-team#1911)

* fix tests

* fix test

Refactor `MMDiT`, add `ImageToImage` and `Inpaint` for SD3 (keras-team#1909)

* Refactor `MMDiT` and add `ImageToImage`

* Update model version

* Fix minor bugs.

* Add `Inpaint` for SD3.

* Fix warnings of MMDiT.

* Addcomment to Inpaint

* Simplify `MMDiT` implementation and info of `summary()`.

* Refactor `generate()` API of `TextToImage`, `ImageToImage` and `Inpaint`.

Minor bug fix (keras-team#1915)

Change to image_converter.image_size since it is a tuple and it's not a callable function.

[Mix Transformer] Add Presets for MiTB0...MiTB5 (keras-team#1893)

* add presets for mit

* add standin paths

* register presets in __init__.py

* fix op in overlapping patching and embedding, start adding conversion utils

* style

* add padding to MiT patchingandembedding

* update to support other presets

* update conversin script

* fix link for b5

* add cityscapes weights

* update presets

* update presets

* update conversion script to make directories

* use save_preset

* change name of output dir

* add preprocessor flow

* api gen and add preprocessor to mits

* conform to new image classifier style

* format

* resizing image converter -> ImageConverter

* address comments

refactoring

remove default resizing for vision backbones (keras-team#1916)

* remove defailt resizing

* fix GPU test

Update VGG model to be compatible with HF and add conversion scripts (keras-team#1914)

Deeplab presets (keras-team#1918)

* add preset configurations for deeplabv3

* fix uri

* Add training details

update presets to point to the main Keras Kaggle page (keras-team#1921)

* update presets to point to the main keras page

* update mit path

Added test for the way BytePairTokenizer handles the \n\n sequence, which is important in Lama chat templates (keras-team#1912)

* added test for the way BytePairTokenizer handles the \n\n sequence, which is important in Lama chat templates

* un commented the test lines that were commented by mistake

* fixed linter errors

Task models fix (keras-team#1922)

* added test for the way BytePairTokenizer handles the \n\n sequence, which is important in Lama chat templates

* fix for wrongly configured task models LLama, PaliGemma, Mistral and Phi3 + test

* comments

* un commented the test lines that were commented by mistake

* fixed linter errors

adding option strip_prompt to generate() (keras-team#1913)

* added test for the way BytePairTokenizer handles the \n\n sequence, which is important in Lama chat templates

* un commented the test lines that were commented by mistake

* fixed linter errors

* added options strip_prompt to generate()

* fix for tensorflow: the compiled version of generate(strip_prompt=True) now works + code refactoring to make it more understandable

* added test for generate(strip_prompt=True)

* minor edits

Layout map for Llama (keras-team#1923)

* added test for the way BytePairTokenizer handles the \n\n sequence, which is important in Lama chat templates

* un commented the test lines that were commented by mistake

* fixed linter errors

* added default layout map for Llama

* minor fixes in tests

Update deeplab_v3_presets.py (keras-team#1924)

Add paths to get SAM weights from (keras-team#1925)

Two fixes for image resizing in preprocessing (keras-team#1927)

1. Properly display when are not resizing the input image in
   `model.summary()`
2. Allow setting the `image_size` directly on a preprocessing layer.

2. is just to allow a more consistent way to set the input shape
across tasks. We now have:

```python
text_classifier = keras_hub.models.TextClassifer.from_preset(
    "bert_base_en",
)
text_classifier.preprocessor.sequence_length = 256

image_classifier = keras_hub.models.TextClassifer.from_preset(
    "bert_base_en",
)
image_classifier.preprocessor.image_size = (256, 256)

multi_modal_lm = keras_hub.models.CausalLM.from_preset(
    "some_preset",
)
multi_modal_lm.preprocessor.sequence_length = 256
multi_modal_lm.preprocessor.image_size = (256, 256)
```

add back default image resizing (keras-team#1926)

Update deeplab_v3_presets.py (keras-team#1928)

* Update deeplab_v3_presets.py

* Update deeplab_v3_presets.py

Update PaliGemma to remove `include_rescaling` arg (keras-team#1917)

* update PaliGemma

* update conversion script

* fix GPU tests

fix path (keras-team#1929)

* fix path

* nit

Fix paligemma checkpoint conversion script (keras-team#1931)

* add back default image resizing

* fix bug in image converter

* fix paligemma checkpoint conversion file

* fix preset name

* remove debug code

* revert unintended changes

update preset path to point to latest version of models (keras-team#1932)

Update sdv3 path (keras-team#1934)

update sam docstring to show correct backbone in docstring (keras-team#1936)

Convert input dict to tensors during train_on_batch (keras-team#1919)

Register VGG presets. (keras-team#1935)

* register vgg preset

* nit

* nit

* nit

Add ResNetVD presets (keras-team#1897)

* Add ResNetVD presets

* Updated Kaggle handles

* Add weight conversion script for ResNet_vd

* Add usage

rebase conflict resolved

conflict resolve

Update sam_presets.py (keras-team#1940)

Update vit_det_backbone.py (keras-team#1941)

fix gpu test (keras-team#1939)

* fix gpu test

* cast input

* update dtype

* change to resnet preset

* remove arg

Added Support for Returning Attention Scores in TransformerEncoder call (keras-team#1879)

* Added: Return attention scores argument to transformer encoder

* Added: docstring for return_attention_scores and added a test to chek the working of the argument

* Fixed: Test case by removing print stmts and using self.assertAllEqual

* Fixed: Linting

Mark preset tests as large (keras-team#1942)

* fix tests

* fix test

* Update preset_utils_test.py

version bump to 0.17.0.dev0 (keras-team#1944)

Update stable_diffusion_3_presets.py (keras-team#1946)

[Semantic Segmentation] - Add SegFormer Architecture, Weight Conversion Script and Presets (keras-team#1883)

* initial commit - tf-based, kcv

* porting to keras_hub structure - removing aliases, presets, etc.

* enable instantiation of segformer backbone with custom MiT backbone

* remove num_classes from backbone

* fix input

* add imports to __init__

* update preset

* update docstrings

* add basic tests

* remove redundant imports

* update docstrings

* remove unused import

* running api_gen.py

* undo refactor of mit

* update docstrings

* add presets for mit

* add standin paths

* add presets for segformer backbone

* register presets in __init__.py

* addressing comments

* addressing comments

* addressing comments

* update most tests

* add remaining tests

* remove copyright

* fix test

* override from_config

* fix op in overlapping patching and embedding, start adding conversion utils

* style

* add padding to MiT patchingandembedding

* update to support other presets

* update conversin script

* fix link for b5

* add cityscapes weights

* update presets

* update presets

* update conversion script to make directories

* use save_preset

* change name of output dir

* add preprocessor flow

* api gen and add preprocessor to mits

* conform to new image classifier style

* format

* resizing image converter -> ImageConverter

* merge mit branch into segformer branch

* add preprocessor and converter

* address comments

* clarify backbone usage

* add conversion script

* numerical equivalence changes

* fix numerical inaccuracies

* update conversion script

* update conversion script

* remove transpose

* add preprocessor to segformer class

* fix preset path

* update test shape

* update presets

* update test shape

* expand docstrings

* add rescaling and normalization to preprocessor

* remove backbone presets, remove copyrights, remove backbone cls from segmenter

* remove copyright and unused import

* apply same transformation to masks as input images

* fix import

* fix shape in tests

Update readme (keras-team#1949)

* Update README.md

* Update README.md

Update llama_backbone.py docstring (keras-team#1950)

Update path (keras-team#1953)

Update preset path for keras.io.

There is no LLaMA2 in keras.io https://keras.io/api/keras_hub/models/llama2

This is the actual link:
https://keras.io/api/keras_hub/models/llama2

For Vicuna it does not have it's own model direcotry, since it is also the part of Llama,, updated the path.

Update SD3 init parameters (replacing `height`, `width` with `image_shape`) (keras-team#1951)

* Replace SD3 `height` and `width` with `image_shape`

* Update URI

* Revert comment

* Update SD3 handle

* Replace `height` and `width` with `image_shape`

* Update docstrings

* Fix CI

Update docstring (keras-team#1954)

AudioConverter is registered as "keras_hub.layers.WhisperAudioConverter" and not as part of models.

 updated Mobilenet backbone to match it with torch implementation

timm script added

checkpoint conversion added

Refactoring
divyashreepathihalli pushed a commit that referenced this pull request Feb 10, 2025
* kaggle weights

* updated Mobilenet backbone to match it with torch implementation

* Deleted presets

* Mobilenet preset deleted

* code reformat

* padding changed

* downsample_padding

* typo fixed

* timm script added

* checkpoint conversion added

* preset added

* preset testcase added


* rebase done

* code formatting

* preset path updated

* WIP mobilenet fixes, subblock refactoring

* WIP refactored, classifier/task changes

* matched mobilenetv3 inference, working now

* format pass

* actual format pass

* fix import

* update test, attempting to fix format issue

* fix format back to original style

* review updates, format fixes etc.

* update fix DepthwiseConvBlock args

* implement compute output shape for squeeze_and_excite layer

* update arguments to IR Block

* explicitly build head before transfer

* updates, fixes to ensure colab workflow works

* add noqa, fix protected variable issue

* fix remaining test issues

* update expected test output/presets

* fix merge typo

---------

Co-authored-by: Usha Rengaraju <34335028+ushareng@users.noreply.github.com>
Co-authored-by: ushareng <usha.rengaraju@gmail.com>