Conversation


@AleHD commented Feb 26, 2025

Opening PR to keep track of upstream changes

AleHD pushed a commit that referenced this pull request Aug 7, 2025
* updated mistral3 model card (#1)

* updated mistral3 model card

* applying suggestions from code review

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* made all changes to mistral3.md

* adding space between paragraphs in docs/source/en/model_doc/mistral3.md

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* removing duplicate in mistral3.md

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>

* adding 4 backticks to preserve formatting

---------

Co-authored-by: Steven Liu <59462357+stevhliu@users.noreply.github.com>
@github-actions

[For maintainers] Suggested jobs to run (before merge)

run-slow: csm, d_fine, instructblipvideo, llava_onevision, metaclip_2, mm_grounding_dino, modernbert_decoder, qwen2, qwen2_5_omni, rt_detr_v2, sam_hq, siglip2, smolvlm, table_transformer, aqlm_integration, ggml

Cyrilvallez and others added 28 commits December 1, 2025 18:50
* gemma3

* qwen3 and modulars

* fix tp plans

---------

Co-authored-by: vasqu <antonprogamer@gmail.com>
* Up

* WIP

* WIP

* WIP

* Apply suggestions from code review

Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>

* Update src/transformers/models/ministral3/configuration_ministral3.py

Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>

* fix most tests

* update docstring

* fixup

* typo in the config

* make the last 3 tests pass

* fix auto

* nits

* WIP

* WIP

* WIP

* per tensor

* WIP

* WIP

* WIP

* style

* fixup

* WIP

* WIP

* WIP

* hack for now

* add todo

* fixup

* WIP

---------

Co-authored-by: Julien Denize <40604584+juliendenize@users.noreply.github.com>
Co-authored-by: Arthur <arthur.zucker@gmail.com>
Co-authored-by: medmekk <mekk.cyber@gmail.com>
* fix

* style
* Added an initial conversion script

* Added a modular where FastVLM is different from LLaVA

* Improved the conversion script

* Adjusted the conversion script

* Removed redundant labels from FastViT & improved the template

* Added docs and changed default config

* Fix default config

* Fix default config

* Fixed layer feature handling and more docs

* Fixed documentation

* Style fixed

* Some small fixes

* Improved the example script to be more inclusive

* Fixes after the rebase

* Made the code and docs more readable and consistent

* Some fixes from the review

* Reverted back to last layer only

* Typos fixed

* added initial tests - some still failing

* Style and quality fixes

* Updated modular according to the review

* Tests passing and some suggested generic improvements

* Docs updated with another usage tip and an auto model

* Reverted changes to test_can_initialize_on_meta because it's not fully compatible with one existing model

* Some tweaks

* Typo fix

* Consistency fixed

* Review comment

* Redundant config attr deleted

* Consistency fixed

* Fixed integration tests after rebase

---------

Co-authored-by: Arthur <48595927+ArthurZucker@users.noreply.github.com>
Co-authored-by: Raushan Turganbay <raushan@huggingface.co>
* fix all qwen models with prompt tuning

* forgot to rename

* fix style

* fallback better when no cache position

* just why?
* Fix missing model attribute in Glm4vMoeIntegrationTest

* Removed extra condition.

---------

Co-authored-by: Yih-Dar <2521628+ydshieh@users.noreply.github.com>
* Fix fp8 + some enhancement

* style

* Add coauthor

Co-authored-by: Yang Kai <kai.yang@intel.com>

* fix

* style

* fix tests

* style

* assertion

* style

* fix

* fix

* Apply suggestions from code review

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Yang Kai <kai.yang@intel.com>
Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
…dition` (#42562)

delete

Co-authored-by: ydshieh <ydshieh@users.noreply.github.com>
* fix

* eetq dep removed

* maybe ?

* Fix !

* everything is passing!

* Apply style fixes

* move to nn.Parameter

* grad false

* fix

* fix

---------

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>
…ryEmbeddingConfigMixin` (#42517)

* Add backward compatibility for methods which have been moved to `RotaryEmbeddingConfigMixin`

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* They're not actually no-ops

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* Fix type hint

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* If they're calling this function, they haven't standardised either

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

* No need to BC method that wasn't in any releases

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>

---------

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
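
For readers skimming this PR, a minimal sketch of the backward-compatibility pattern the commits above describe: a method that moved to `RotaryEmbeddingConfigMixin` is kept on the old class as a thin shim that warns and delegates. Class internals and method names below are illustrative, not the actual transformers API.

```python
import warnings


class RotaryEmbeddingConfigMixin:
    # New home of the logic (placeholder body for the sketch).
    def standardize_rope_params(self):
        self.rope_parameters = getattr(self, "rope_parameters", None) or {}
        return self.rope_parameters


class LegacyConfig(RotaryEmbeddingConfigMixin):
    # Old entry point kept so callers that haven't standardized yet keep working.
    def rope_config_validation(self):
        warnings.warn(
            "`rope_config_validation` has moved to `RotaryEmbeddingConfigMixin`; "
            "use `standardize_rope_params` instead.",
            FutureWarning,
        )
        return self.standardize_rope_params()


cfg = LegacyConfig()
cfg.rope_config_validation()  # still works, with a deprecation-style warning
```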
* Remove references to parse_response in the docs

* Re-add parse_response (with appropriate warnings)

* future annotations

* Guard the text generation pipeline correctly
* fix regression

* fix

* fix
* mapping error resolved with test check

* Fix undefined variable 'device' in kernel_config

* added test in test_kernels

* added test with proper format

* added test with proper format once again

* Removed mapping_test.py file

* reformatted with ruff

* removed the test
* Fix: lacking EOS token + failing AutoProcessor

* Tests

* Tests
* Compute masks once instead of per-layer, fix fa2 crash

* nit

* Change after review
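
A sketch of the "compute masks once" idea from the commit above, assuming a standard additive attention mask; illustrative only, not the PR's code:

```python
import torch


def forward(hidden_states, attention_mask, layers):
    # Build the additive mask a single time, before the layer loop.
    seq_len = attention_mask.shape[1]
    min_val = torch.finfo(hidden_states.dtype).min
    causal = torch.full((seq_len, seq_len), min_val, dtype=hidden_states.dtype).triu(1)
    padding = (1.0 - attention_mask[:, None, None, :].to(hidden_states.dtype)) * min_val
    full_mask = causal[None, None] + padding

    for layer in layers:
        # Every layer reuses the precomputed mask instead of rebuilding it.
        hidden_states = layer(hidden_states, attention_mask=full_mask)
    return hidden_states
```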
…n model_type can't be inferred from config (#42402)

* fix raise error early

* add back feature extractor saving logic
* fix

* fix circular condition
* fix

* style

* initial

* fix

* comment

* style

* fix
* make sure the FSDP plugin args are appropriately cast to bools

* handle fsdp_version properly

* include reshard_after_forward and handle correctly for fsdp1 vs fsdp2

* lint

* chore: lint
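
A hedged sketch of the bool-cast issue the first bullet above refers to: FSDP options often arrive as strings from env vars or config dicts and need to be coerced to real booleans before reaching the plugin. The helper and keys below are illustrative, not the Trainer's actual code.

```python
def str_to_bool(value) -> bool:
    # Accept real bools as-is and coerce common string spellings.
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() in {"1", "true", "yes", "y"}


fsdp_config = {"offload_params": "false", "cpu_ram_efficient_loading": "true"}
fsdp_config = {key: str_to_bool(val) for key, val in fsdp_config.items()}
assert fsdp_config["offload_params"] is False
```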
* initial commit

* passing tests

* fix replace_linear

* style

* rm list

* fix

* style
…not always present (#42593)

Signed-off-by: Harry Mellor <19981378+hmellor@users.noreply.github.com>
* [XPU] Fix fp8 UT patch

* make style

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* fix resume from epoch >= 1

* add test checking order of sampled data points

* add require_torch_non_multi_accelerator decorator to test method

* move the epoch setting of epoch_dataloader before iterating over it

* make fixup
Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
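
A small sketch of the epoch-resume fix described above (illustrative, not the Trainer's code): the sampler's epoch has to be set before the dataloader is iterated, otherwise a run resumed at epoch >= 1 replays epoch 0's shuffling order.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler

dataset = TensorDataset(torch.arange(16))
sampler = DistributedSampler(dataset, num_replicas=1, rank=0, shuffle=True)
dataloader = DataLoader(dataset, sampler=sampler, batch_size=4)

start_epoch = 2  # e.g. resuming from a checkpoint saved after epoch 1
for epoch in range(start_epoch, 4):
    sampler.set_epoch(epoch)   # must happen before iterating the dataloader
    for batch in dataloader:   # now yields the order the original run would have produced
        pass
```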
Fix three typos found in code comments:
- 'avaoid' → 'avoid' in modeling_utils.py
- 'weigth' → 'weight' in trainer_utils.py
- 'Templace' → 'Template' in convert_slow_tokenizer.py

These typos appeared in TODO comments and inline documentation.
stevhliu and others added 30 commits December 18, 2025 11:06
* Updated backbone_config docstrings

* Fix typos
* initial

* initial commit

* fix

* fix

* first fix

* second fix

* second fix

* revert

* fix
…#42802)

* fix error: 'BlockMask' object has no attribute 'dtype' for lasr model

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix error: 'BlockMask' object has no attribute 'dtype' for lasr model

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update code

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* disable flex_attn

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* add comment

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update comment

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
Co-authored-by: Anton Vlasjuk <73884904+vasqu@users.noreply.github.com>
* use meta device directly

* style

* move back non-persistent

* fix

* make helper

* fix it

* use native param dtype

* make tensors buffers

* style

* fix

* oupsi

* add a test and fix

* fix

* create timm integration to reinit non-persistent buffers...

* style

* style

* more

* better

* add doc

* more timm stuff

* more

* fix

* small change

* no actually it was fine before
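
For context on the meta-device commits above, a simplified sketch of the general idea (plain PyTorch, not the transformers loading code): parameters are created on the `meta` device so nothing is allocated at init time, and storage is only materialized when real weights are about to be loaded.

```python
import torch
import torch.nn as nn

with torch.device("meta"):
    model = nn.Linear(1024, 1024)     # no memory is allocated for the weights here

print(model.weight.device)            # meta
model = model.to_empty(device="cpu")  # allocate uninitialized storage, to be filled from a checkpoint
```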
* Handle infinity and NaNs in JSON serialization

* Docs

* Tests
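
A minimal sketch of the JSON fix above, assuming the common approach of mapping non-finite floats to `null` before dumping (the exact replacement value used by the PR may differ):

```python
import json
import math


def sanitize(obj):
    # json.dumps would otherwise emit NaN/Infinity literals, which are not valid JSON.
    if isinstance(obj, float) and not math.isfinite(obj):
        return None
    if isinstance(obj, dict):
        return {key: sanitize(val) for key, val in obj.items()}
    if isinstance(obj, list):
        return [sanitize(val) for val in obj]
    return obj


metrics = {"loss": float("nan"), "grad_norm": float("inf"), "epoch": 1.0}
print(json.dumps(sanitize(metrics)))  # {"loss": null, "grad_norm": null, "epoch": 1.0}
```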
* fix

* test

* fix 2

* should not happen but safety

* fast "integration" test
* more attention cleanup

* llama like text attention

* generates different text but cos and sin tensors are always close - 1e-8

* another round of rope fixups

* yea, gonna check tomorrow, can't cheat w/ freqs for whatever reason

* NOTE: last time where comp with old rope

* rope cleanup

* more rope

* somewhat clean 3d rope with attn - sin / cos has very small diffs to original formula (torch.allclose always True) leading to slightly different generations

* new rope type

* style

* attempt at moe, gonna need a deeper look

* cleanup gate

* more cleaning

* NOTE remove attempt at moe for now

* another round of cleanups

* whoops

* we back boys, reattempting moe start

* moe should be done with this

* cleanup

* more cleanup

* nits

* add conversion and adjust code accordingly

* fix

* make moe copyable as far as we can

* cleanup conversion a bit, next config

* cleanup config part1

* small removal of unused things

* config conversion, rope type doesn't get loaded tho...

* fix rope

* last hardcoded values

* remove unnecessary class

* starting to make copies available for vision, vision rope refactor tomorrow

* vl rope changes

* simplify variable resolution resampler

* nit

* conversion update

* more conversions, standardization, and big dtype fix!

* remove some docs (tmp), focus on code for me

* oops

* nit

* fixup embeddings, add todos

* more cleanup

* more cleanup, next caching changes

* revert fp16, internally discussed weights are supposed to be bf16

* fix rope (a bit), prepare cache logic changes

* more prep for cache

* cache class is used, fixup some flags

* modular refactor

* partially docstrings, docs, etc

* cleaner order

* nit

* fix config

* remove old artefacts/todos

* sync with remote and add some todos for orientation

* remove img process dep on modeling code

* image processor with a few diffs highlighted to copy from maybe

* fast img processor version

* modular image processors

* convert tokenizer to have dedicated video placeholder token

* before i forget

* a modular bug :/

* more processor things, some modular adjustments

* remove dependency on token type ids

* position ids ala qwen vl and modular is bugging

* fixup some inheritances + nits

* token type ids

* moe loss, docs, simplify pos ids

* align some feature getters

* docs

* rename conv -> merge aka our naming convention

* style

* fixup tokenizer class in auto

* no more nn sequential

* fix chat template, fix tokenizer conversion, modular bug

* remove this

* remove old deps (from the remote processor)

* whoops

* argh

* todo, restarting progress tomorrow

* fast image processor changes output, keeping slow for now

* NOTE rm debugging code on processor conversion

* first complete conversion script version, todo on whether to use fast processor

* config docs

* image processor tests, only kept to images as videos need different resolutions

* processor tests

* first ish version for video processor, very much WIP tho

* sync with main and all the changes that happened, fix ernie moe bug in dtype casting

* mini style fix

* vid processor is properly separated now

* make vid processor its own thing

* style

* video processing and cleanups, img processing done, processing needs one TODO, vid processing needs tests

* readd vid patch fn

* make 4D RoPE possible if manually passed

* simplify the msg on packing, allow external prep but not internal one

* nit

* revert general changes video utils, make it specific to ernie, fixup tests

* vid to auto

* left to check: pos ids (rope) + token type ids

* move token type ids to processor, fix processor to ernie logic

TODOs: tests, tests, tests

* processor fixes, conversion todo for fast img processor

TODOs: tests for vid processor and modeling

* fix

* video processor tests, torch compile does not work due to PIL drawing being needed

* fix config consistency

* style

* wip tests

* fix most tests, 2 failing ones remain

* fix last tests

* check

* docs consistency

* fix conversion script, more docs

* optional drawing on frames, style

* add error on compile x draw on frames

* fix

* fix

* change font loading to hub dep with default font

* fix config try 2

* fix diff resolution, tests (not fast processor, a100)

* fix test

* style

* torch 2.9 (fa2 untested, video from 2.6)

* raushan's review (part 1)

* Update docs/source/en/model_doc/ernie4_5_vl.md

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>

* Pablo's review

* style

* fix device/dtype stuff that is no longer needed

* revert vision property rm, necessary for composite sdpa test

* fixup few smaller things + refactor how we load the font entirely (based on font name with expected associated file at same repo)

* remove bc min max pixels --> less modular on processor parts but way cleaner code

* fix fps and add fixme to the inefficient conversion stuff

* rope

* style

* copies and last rope stuff i forgot

* revert glm4v copies

* fix

* simplify temporal slicing and add more descriptions

* that ":" 😢

* fixup init

* conversion for moe split and merge + general renamings etc -- encountering OOM (automap maybe?)

* wrong order whoops

* style

* copies

* fix init

* fix

* fix

* allow the resolved path to be passed to explicit video processor classes and refactor how we load them for ernie

* simplify

* shoot, I need it there as well

* better err handling

* style

* initial fixes after merge

* working loading version

* cleanup

* change moe order and fix vl version

* reverse op is mapping incorrectly TODO

* reverse loading somewhat works, name conversion has issues it seems 👀

* fix renaming issue, slow tests pass (except the integration ones ~ expected due to fused weights)

* conversion mapping with native features + remove conversion mapping restriction

* add test for new conversion

* style

* update conversion

* fix integration tests, remove fa tests

* fix

* update docs a bit

* style

* fix ernie moe and routing ernie series

* style

* fix rope warning

* i fucked up again pain

* update expectations

* remove EP, broken atm be it sole or in combination with TP

* update docs a bit

* first part of addressing review comments

* fixup

* fix vid processor

* fix font saving

* readd decorators oops

* add mm token type id shortcut

* always compose mm token type ids if needed

* move config to modular

* fix loading by enforcing correct order

* fix

* address first bunch of comments

* smaller comments

* let's make moe layer types, ill fix modular in a second

* modular

* style

* renamed version along a few fixes in conversion and processor tests

* fix

* style + decorator

* fix tokenizer handling of additional special tokens

* style

* fix doc refs

* test fix

* fix

* was this too breaking?

* fix conversion via workaround for now

* post merge fix

* revert a few tok things (additional_special_tokens), updated conversion

* fix video processing loading logic

add exception for auto class (reload config as we have a circular dep on finding which class we have, i.e. we need to load to find the class then load with specific logic)

remove some original ideas

* style

* processor path change

* add small dummy integration tests

* style

* fix rope modeling to follow qwen2 vl instead + change auto loading to specifically load via pretrained (overridable from pretrained for auto classes)

* seems to be skipped in other similar vlms

* small conversion updates and adjust max vram usage during the big integration test

* update test paths

* style

* style attempt 2

* docs

* trigger ci

* review

* post merge fixes

* fix

* safety

* fix test

* style

* oops

* fix

* ...

* simplify the config init for moe pattern

* gonna be fixed by #42963

---------

Co-authored-by: Pablo Montalvo <39954772+molbap@users.noreply.github.com>
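
Since many of the bullets above compare cos/sin tensors against the original formula, here is textbook 1D RoPE for reference; the model's 3D variant differs, so treat this purely as orientation:

```python
import torch


def rope_cos_sin(seq_len: int, dim: int, base: float = 10000.0):
    # Standard rotary embedding frequencies: one inverse frequency per pair of dims.
    inv_freq = 1.0 / (base ** (torch.arange(0, dim, 2).float() / dim))
    positions = torch.arange(seq_len).float()
    freqs = torch.outer(positions, inv_freq)   # (seq_len, dim // 2)
    emb = torch.cat((freqs, freqs), dim=-1)    # (seq_len, dim)
    return emb.cos(), emb.sin()


cos, sin = rope_cos_sin(seq_len=8, dim=64)
```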
* merge two attr into one

* delete tie encoder decoder

* one more

* mt5

* skip tests when tying is hardcoded

* change test value to True, so we don't have to adjust hardcoded configs

* awful decision in t5 to support two variants

* delete my comment

* not copied anymore

* skip

* they all had a shared embedding which was hardcoded, force it

* force it in umt5 also, my model won't work otherwise :(

* skip the key

* dont't force if official weights set it to True

* skip one test and fix the other
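
For readers unfamiliar with the tying being discussed, a tiny illustration of what a single "tie embeddings" flag controls (plain PyTorch, not the PR's code): the input embedding and the output projection share one Parameter.

```python
import torch.nn as nn

vocab_size, hidden_size = 100, 16
embed_tokens = nn.Embedding(vocab_size, hidden_size)
lm_head = nn.Linear(hidden_size, vocab_size, bias=False)

lm_head.weight = embed_tokens.weight  # tied: one tensor shared by two modules
assert lm_head.weight.data_ptr() == embed_tokens.weight.data_ptr()
```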
* fix device mismatch issue for pe_audio_video model parallelism

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* skip the model parallelism unit test

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* Set all padding_mask_videos to 1 to avoid NaN values in outputs

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix bug

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* keep at least 1 valid frame

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* fix format issue

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* use random.seed

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

* update code

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>

---------

Signed-off-by: Liu, Kaixuan <kaixuan.liu@intel.com>
* fix

* more careful about the items

* oops

* ...
* Fix formatting of trackio model tag

* changes

* changes

---------

Co-authored-by: Marc Sun <57196510+SunMarc@users.noreply.github.com>
* Fix DocQA max_answer_len validation error message

* Revert DocQA sanitize parameters test

* Fix QA max_answer_len validation error message

* Revert QA sanitize parameters test
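
A hedged sketch of the kind of validation-message fix above: reject a non-positive `max_answer_len` and report the value that was actually received (the helper name and exact wording are illustrative, not the pipeline's code):

```python
def check_max_answer_len(max_answer_len: int) -> int:
    if max_answer_len < 1:
        raise ValueError(f"max_answer_len parameter should be >= 1 (got {max_answer_len})")
    return max_answer_len
```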
* Fix incorrect library name in BitNet integration warning

* Fix typos in BitNet integration docstrings

* Fix docstring parameter mismatch in BitLinear
Signed-off-by: Yuanyuan Chen <cyyever@outlook.com>
…ding primitives (#43003)

* why was it so complicated

* device

* simplify

* fix

* imports

* simplify quite a bit

* fix
* fix

* style

* fix

* fix

* no need to this class

* rm

* Apply suggestions from code review

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>

---------

Co-authored-by: Mohamed Mekkouri <93391238+MekkCyber@users.noreply.github.com>
* do not return a tuple

* add a test

* fix test

* improve doc
* move `original_max_position_embeddings` to rope param dict and resolve TODOs from Joao

* bring back truncate in yarn

* move the patch to `standardize` helper, this one gets called every time we init rope compute fn

* my bad

* silly typo, I should read the code I write!

* force the tester to use specific layer types, because rope is built with these types

* revert, why/how did it get deleted?!

* factor isn't guaranteed to exist in the end

* tiny test issue, needs to standardize first
* add prefill arg in generation

* add a slow test

* fix copies

* can be like this but checking special tokens isn't good

* ig this solves the issue with assisted_gen+prefill

* update overwritten `prepare_inputs_for_generation`

* prefill is actually when we have no cache at all.. Try this for now

* first iteration is not always technically the same as prefill

* fix?

* fix now?

* update bloom

* fix smth

* make style

* fix copies and skip test

* fix copies

* tiny updates after a review

* fix other slow tests

* fix copies

* do not pass the same kwargs twice in prefill

* oops

* have to revert? prob fails only on dgx

* adjust slow test again

* address comments

* fix copies
* guard counting of bytes

* add small test

* quality

* simplify a bit
…43002)

* make fixup happy

* Empty deprecated model list.

* Add check_modeling_structure to CI
* unique device

* comment

* fix

* simplify and add mismatched

* style
…lier (#43021)

* move to correct device earlier

* fix typo

* simplify