Insights: huggingface/transformers
Overview
1 Release published by 1 person
-
v4.48.1: Patch release
published
Jan 20, 2025
65 Pull requests merged by 39 people
-
Add Rocketknight1 to self-comment-ci.yml
#35881 merged
Jan 24, 2025 -
add xpu device check in device_placement
#35865 merged
Jan 24, 2025 -
Use torch.testing.assert_close instead to get more details about errors in CIs
#35659 merged
Jan 24, 2025 -
Fix Llava-NeXT / Llava-NeXT Video / Llava-OneVision's token unpadding mismatch
#35779 merged
Jan 24, 2025 -
Fix test_pipelines_video_classification that was always failing
#35842 merged
Jan 23, 2025 -
fix apply_chat_template() padding choice
#35828 merged
Jan 23, 2025 -
Fix typo
#35854 merged
Jan 23, 2025 -
[DOC] Fix contamination and missing paragraph in translation
#35851 merged
Jan 23, 2025 -
Granite Vision Support
#35579 merged
Jan 23, 2025 -
Fix more CI tests
#35661 merged
Jan 23, 2025 -
Fix uploading processors/tokenizers to WandB on train end
#35701 merged
Jan 23, 2025 -
Fix GA loss for Deepspeed
#35808 merged
Jan 23, 2025 -
add qwen2.5vl
#35569 merged
Jan 23, 2025 -
[Backend support] Allow `num_logits_to_keep` as Tensor and change it to `logits_to_keep` + add flag
#35757 merged
Jan 23, 2025 -
[tests] remove some flash attention class tests
#35817 merged
Jan 23, 2025 -
Fix NoneType type as it requires py>=3.10
#35843 merged
Jan 22, 2025 -
Add PyTorch version check for FA backend on AMD GPUs
#35813 merged
Jan 22, 2025 -
Fix compatibility issues when using auto_gptq with these older versions
#35830 merged
Jan 22, 2025 -
[chat] docs fix
#35840 merged
Jan 22, 2025 -
Fix `head_dim` in config extracted from Gemma2 GGUF model
#35818 merged
Jan 22, 2025 -
[Chat] Add Chat from TRL 🐈
#35714 merged
Jan 22, 2025 -
Fix : Nemotron tokenizer for GGUF format
#35836 merged
Jan 22, 2025 -
[pipeline] missing import regarding assisted generation
#35752 merged
Jan 22, 2025 -
[gpt2] fix generation tests
#35822 merged
Jan 22, 2025 -
Hotfix: missing `working-directory` in self-comment-ci.yml
#35833 merged
Jan 22, 2025 -
Init cache on meta device
#35164 merged
Jan 22, 2025 -
Another security patch for self-comment-ci.yml
#35816 merged
Jan 22, 2025 -
Remove pyav pin to allow python 3.11 to be used
#35823 merged
Jan 21, 2025 -
Remove old `benchmark` code
#35730 merged
Jan 21, 2025 -
[Mimi] update test expected values for t4 runners
#35696 merged
Jan 21, 2025 -
Improve modular documentation
#35737 merged
Jan 21, 2025 -
add Qwen2-VL image processor fast
#35733 merged
Jan 21, 2025 -
move fastspeech to audio models
#35788 merged
Jan 21, 2025 -
[i18n-ar] Translated file: docs/source/ar/tasks/masked_language_modeling.md into Arabic
#35198 merged
Jan 21, 2025 -
Optimized set_initialized_submodules.
#35493 merged
Jan 21, 2025 -
Remove deprecated `get_cached_models`
#35809 merged
Jan 21, 2025 -
Fixed typo in autoawq version number in an error message for IPEX backend requirements.
#35815 merged
Jan 21, 2025 -
Fix : BLOOM tie_word_embeddings in GGUF
#35812 merged
Jan 21, 2025 -
Auto-add `timm` tag to timm-wrapper models
#35794 merged
Jan 21, 2025 -
Support adamw_torch_8bit
#34993 merged
Jan 21, 2025 -
add a new flax example for Bert model inference
#34794 merged
Jan 21, 2025 -
[Doc] Adding blog post to model doc for TimmWrapper
#35744 merged
Jan 21, 2025 -
Byebye `test_batching_equivalence`'s flakiness
#35729 merged
Jan 21, 2025 -
Add LlavaImageProcessor
#33191 merged
Jan 21, 2025 -
Update AMD Docker image
#35804 merged
Jan 21, 2025 -
Fix "test_chat_template_dict" in video LLMs
#35660 merged
Jan 21, 2025 -
Deterministic sorting in modular converter when adding new functions
#35795 merged
Jan 21, 2025 -
modular_model_converter bugfix on assignments
#35642 merged
Jan 21, 2025 -
Fixes, improvements to `timm` import behaviour
#35800 merged
Jan 20, 2025 -
Tool calling: support more types
#35776 merged
Jan 20, 2025 -
fix low-precision audio classification pipeline
#35435 merged
Jan 20, 2025 -
Fix vits low-precision dtype
#35418 merged
Jan 20, 2025 -
fix document qa bf16 pipeline
#35456 merged
Jan 20, 2025 -
Don't import torch.distributed when it's not available
#35777 merged
Jan 20, 2025 -
Patch moonshine
#35731 merged
Jan 20, 2025 -
transformers.image_transforms.normalize wrong types
#35773 merged
Jan 20, 2025 -
[fix] cannot import name 'Pop2PianoFeatureExtractor' from 'transformers'
#35604 merged
Jan 20, 2025 -
Skip Falcon 7B GGML Test
#35783 merged
Jan 20, 2025 -
remove code owners as it was generating too much noise BUT
#35784 merged
Jan 20, 2025 -
[generate] update docstring of `SequenceBiasLogitsProcessor`
#35699 merged
Jan 20, 2025 -
fix register_buffer in MimiEuclideanCodebook
#35759 merged
Jan 20, 2025 -
Add SuperGlue model
#29886 merged
Jan 20, 2025 -
[ViTPose] Convert more checkpoints
#35638 merged
Jan 20, 2025 -
Security fix for self-comment-ci.yml
#35548 merged
Jan 20, 2025 -
Fix CI for VLMs
#35690 merged
Jan 20, 2025
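One of the merged fixes above (#35843) concerns `types.NoneType`, which only exists on Python 3.10+. A minimal, stdlib-only sketch of the usual version-portable workaround (this is an illustration, not the code from the PR):

```python
import sys

# types.NoneType was only added in Python 3.10; on older versions,
# fall back to type(None), which is the exact same object.
if sys.version_info >= (3, 10):
    from types import NoneType
else:
    NoneType = type(None)

# Either way, NoneType works in isinstance checks and annotations.
assert NoneType is type(None)
assert isinstance(None, NoneType)
```

Importing `NoneType` directly therefore breaks any library that still supports Python 3.9, which is why the guard above (or using `type(None)` throughout) is the safer choice.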
44 Pull requests opened by 29 people
-
Fixed ViTMAE for non-square images
#35769 opened
Jan 19, 2025 -
Fix Audio Classification Pipeline top_k Documentation Mismatch and Bug #35736
#35771 opened
Jan 19, 2025 -
Add ColQwen2 to 🤗 transformers
#35778 opened
Jan 19, 2025 -
feat: Add gradient testing for Flash Attention 2
#35780 opened
Jan 20, 2025 -
[tests] further fix "Tester object has no attribute '_testMethodName'"
#35781 opened
Jan 20, 2025 -
Remove head mask in generative models
#35786 opened
Jan 20, 2025 -
Add StyleTTS 2
#35790 opened
Jan 20, 2025 -
Fix is_causal being a tensor
#35791 opened
Jan 20, 2025 -
Fix docstring for get_candidates return shape
#35793 opened
Jan 20, 2025 -
Fix IsADirectoryError in notebook progress bar display methods. Fixes #34766
#35796 opened
Jan 20, 2025 -
Fix StopStringCriteria to handle tokens above len(tokenizer)
#35797 opened
Jan 20, 2025 -
Fix AutoProcessor import order issue with custom classes
#35798 opened
Jan 20, 2025 -
Fix Noise Computation for NEFTune During Packed Training #34659
#35799 opened
Jan 20, 2025 -
[generate] WIP vectorized beam search
#35802 opened
Jan 20, 2025 -
Remove cache migration script
#35810 opened
Jan 21, 2025 -
Exploring use of kwargs for timm model and transforms creation
#35819 opened
Jan 21, 2025 -
[docs] uv install
#35821 opened
Jan 21, 2025 -
Add MultipleChoice & QuestionAnswering heads to ModernBERT
#35825 opened
Jan 22, 2025 -
Add support for MiniMax's MiniMax-Text-01
#35831 opened
Jan 22, 2025 -
fix(FA): QKV not being casted to target_dtype for FA with dpo lora
#35834 opened
Jan 22, 2025 -
Fix PretrainedTokenizerFast check
#35835 opened
Jan 22, 2025 -
Optimize Qwen2VL vision model by precomputing cos/sin embeds before ViT blocks
#35837 opened
Jan 22, 2025 -
Update modeling_attn_mask_utils.py
#35841 opened
Jan 22, 2025 -
Update-tp
#35844 opened
Jan 22, 2025 -
Nail down an edge case of torch dtype being overridden permanently in the case of an error
#35845 opened
Jan 22, 2025 -
Github action for auto-assigning reviewers
#35846 opened
Jan 22, 2025 -
Fix Jitter Noise Passing to Experts in Switch Transformers #33969
#35847 opened
Jan 22, 2025 -
Whisper: fix static cache CI
#35852 opened
Jan 23, 2025 -
Add utility for Reload Transformers imports cache for development workflow #35508
#35858 opened
Jan 23, 2025 -
Fix PaliGemma Pad Token Masking During Training #35855
#35859 opened
Jan 23, 2025 -
Fix TP initialization
#35860 opened
Jan 23, 2025 -
Add padding-free to bamba
#35861 opened
Jan 23, 2025 -
[doctest] Fixes
#35863 opened
Jan 23, 2025 -
Add Tensor Parallel support for Gemma
#35864 opened
Jan 23, 2025 -
Fix device mismatch error in Whisper model during feature extraction
#35866 opened
Jan 24, 2025 -
[docs] no hard coding cuda as bnb has multi-backend support
#35867 opened
Jan 24, 2025 -
[docs] fix bugs in the bitsandbytes documentation
#35868 opened
Jan 24, 2025 -
Add default TP plan for all models with backend support
#35870 opened
Jan 24, 2025 -
fix gemma that needed kwargs
#35871 opened
Jan 24, 2025 -
Fix lost loss values when using user-defined compute_loss_func in some cases
#35872 opened
Jan 24, 2025 -
Make cache traceable
#35873 opened
Jan 24, 2025 -
Fix model kwargs
#35875 opened
Jan 24, 2025 -
Fix XGLM loss computation (PyTorch and TensorFlow)
#35878 opened
Jan 24, 2025
42 Issues closed by 16 people
-
ModernBERT fails to work without FlashAttention!
#35879 closed
Jan 24, 2025 -
Some Whisper beam search output (sequences_scores, etc.) is lost in _stack_split_outputs
#32373 closed
Jan 24, 2025 -
LLaVA-OneVision image features and image tokens mismatch
#35775 closed
Jan 24, 2025 -
[Question] Why doesn't `trainer.state.epoch` fall round after training?
#35298 closed
Jan 24, 2025 -
DPO LoRA loss incorrect with Gradient Accumulation in 4.48.1
#35856 closed
Jan 23, 2025 -
set_initialized_submodules too slow when loading big model like DeepSeekV3
#35635 closed
Jan 23, 2025 -
How to load a model directly into the GPU memory?
#35853 closed
Jan 23, 2025 -
A bug that may cause device inconsistency
#31930 closed
Jan 23, 2025 -
Any plans to integrate GTE model natively into transformers
#35568 closed
Jan 23, 2025 -
Significant Increase in Training Loss after Upgrading from Transformers 4.47.1 to 4.48.0
#35787 closed
Jan 23, 2025 -
Is it expected that the same token has two different token IDs in T5TokenizerFast?
#35839 closed
Jan 22, 2025 -
Multi-GPU setup: indices should be either on cpu or on the same device as the indexed tensor (cuda:1)
#33147 closed
Jan 22, 2025 -
How to load decoder.embed_tokens.weight separately from the shared weight?
#35152 closed
Jan 22, 2025 -
Detokenization discrepancy with Llama3.1
#35175 closed
Jan 22, 2025 -
Shape mismatch in RoPE embeddings gpt_neox model when rotary_ndims is odd
#35233 closed
Jan 22, 2025 -
microsoft/Phi-3.5-mini-instruct not working with FA2 due to position_ids
#35274 closed
Jan 22, 2025 -
Adding Audio-MAE Model for Underwater Audio to Hugging Face Transformers Library
#35811 closed
Jan 22, 2025 -
Mismatch Between txt img_token and Image Count in Multimodal Processor Causes Debugging
#35254 closed
Jan 22, 2025 -
Handle python 3.11 for huggingface["video"] and dependencies
#35803 closed
Jan 21, 2025 -
image_transforms preprocess quite slow when run large image with qwen2vl
#34272 closed
Jan 21, 2025 -
tokenizers v0.20 not supported
#33528 closed
Jan 21, 2025 -
Links in release note are broken
#35480 closed
Jan 21, 2025 -
Errors when returning the tensors with PyTorch using AutoImageProcessor
#35806 closed
Jan 21, 2025 -
Mimic `adamw_torch_4bit` and have `adamw_torch_8bit`
#34893 closed
Jan 21, 2025 -
Supporting Padding in llava processor
#33175 closed
Jan 21, 2025 -
Issue with Idefics3 sample code
#35369 closed
Jan 21, 2025 -
BarkProcessor voice_preset doesn't work
#34634 closed
Jan 21, 2025 -
RuntimeError: "rshift_cuda" not implemented for 'Half'
#35256 closed
Jan 21, 2025 -
transformers.image_transforms.normalize documents and checks for the wrong type for std and mean arguments
#35772 closed
Jan 20, 2025 -
version 4.47.0 provides different generation results when using quantized awq model
#35286 closed
Jan 20, 2025 -
inconsistent execution time
#35265 closed
Jan 20, 2025 -
apply class transformers.SequenceBiasLogitsProcessor on Qwen model
#35432 closed
Jan 20, 2025 -
Implement SuperPoint / SuperGlue
#25489 closed
Jan 20, 2025 -
Aria processor does not work with images
#35768 closed
Jan 20, 2025 -
Text Only input using LlaVa Next
#35421 closed
Jan 20, 2025 -
StopStringCriteria relies on `len(tokenizer)==model.config.vocab_size`, leading to index errors
#35244 closed
Jan 20, 2025 -
Is it possible to convert transformers tokenizers into SentencePiece .model format?
#35538 closed
Jan 20, 2025 -
gradient calculation is not correct with gradient accumulation in LM training
#35203 closed
Jan 19, 2025 -
run_mlm_flax on tpu v5-pods
#35205 closed
Jan 19, 2025 -
How to convert my Mask2Former model (ResNet-50 backbone) to Hugging Face transformer
#35186 closed
Jan 18, 2025
27 Issues opened by 26 people
-
Request to add Co-DETR
#35882 opened
Jan 24, 2025 -
Mllama training via FSDP device and dtype misassignment
#35880 opened
Jan 24, 2025 -
Mask2Former _init_weights
#35877 opened
Jan 24, 2025 -
Support Shared Cache
#35876 opened
Jan 24, 2025 -
ZeroShotClassificationArgumentHandler should be explicit it has a somewhat unsafe internal behaviour.
#35874 opened
Jan 24, 2025 -
ERROR: Video features and Video Tokens do not match!!!
#35869 opened
Jan 24, 2025 -
Adding special tokens by default
#35862 opened
Jan 23, 2025 -
[DEV Testing] Issues with `test_modeling_common`
#35857 opened
Jan 23, 2025 -
PaliGemma Pad Token not Masked
#35855 opened
Jan 23, 2025 -
resume_from_checkpoint failed when using PEFT LORA
#35850 opened
Jan 23, 2025 -
change data
#35849 opened
Jan 23, 2025 -
forward() got an unexpected keyword argument 'num_items_in_batch'
#35838 opened
Jan 22, 2025 -
tokenizer_class: `LlamaTokenizerFast` becomes `LlamaTokenizer` after load + immediate save
#35832 opened
Jan 22, 2025 -
ImportError: cannot import name 'NoneType' from 'types' on main in Python 3.9
#35827 opened
Jan 22, 2025 -
model.gradient_checkpointing_enable() makes loss.requires_grad be False
#35826 opened
Jan 22, 2025 -
multi-gpu: test_model_parallel_beam_search tests fail with "IndexError: list index out of range"
#35824 opened
Jan 21, 2025 -
convert_llama_weight_to_hf.py
#35820 opened
Jan 21, 2025 -
[Feature Request] Support register customize quantization method out-of-tree
#35814 opened
Jan 21, 2025 -
How to change data
#35807 opened
Jan 21, 2025 -
RWKV CUDA error: an illegal memory access was encountered during training from scratch
#35805 opened
Jan 21, 2025 -
Ascend: Training not loaded into NPU
#35785 opened
Jan 20, 2025 -
Auto-resume from checkpoint throws error if last checkpoint is incomplete
#35782 opened
Jan 20, 2025 -
TPU Initialization Error with Transformers in Kaggle TPU VM v3-8
#35774 opened
Jan 19, 2025 -
Mamba2 doesn't support Multi-GPU training (fast path)
#35770 opened
Jan 19, 2025 -
Issue: Error with _eos_token_tensor when using Generator with GenerationMixin
#35767 opened
Jan 18, 2025 -
Defining LLM Dataset types in Trainers or during Training Workflow
#35766 opened
Jan 18, 2025 -
Inconsistent output lengths when `max_length=20` is set implicitly vs explicitly in `generate()`
#35765 opened
Jan 18, 2025
132 Unresolved conversations
Sometimes conversations happen on old items that aren’t yet closed. Here is a list of all the Issues and Pull Requests with unresolved conversations.
-
Add Zamba2
#34517 commented on
Jan 24, 2025 • 39 new comments -
Add support for Apple's Depth-Pro
#34583 commented on
Jan 24, 2025 • 20 new comments -
Adding RTDETRv2
#34773 commented on
Jan 22, 2025 • 17 new comments -
Add GOT-OCR 2.0 to Transformers
#34721 commented on
Jan 24, 2025 • 16 new comments -
Add LightGlue model
#31718 commented on
Jan 24, 2025 • 15 new comments -
support telechat2
#35415 commented on
Jan 23, 2025 • 13 new comments -
Pixtral: vectorize patch embeddings and enable tests
#35122 commented on
Jan 21, 2025 • 12 new comments -
Trainer Refactor: Part 1
#35567 commented on
Jan 23, 2025 • 11 new comments -
Fix Batch Size Mismatch When Using `crops_n_layers` in `mask-generation` Pipeline #35530
#35627 commented on
Jan 22, 2025 • 10 new comments -
Universal Speculative Decoding `CandidateGenerator`
#35029 commented on
Jan 23, 2025 • 9 new comments -
Uniformize OwlViT and Owlv2 processors
#35700 commented on
Jan 24, 2025 • 8 new comments -
Save checkpoint to temporary directory to handle partial saves during failures
#35580 commented on
Jan 24, 2025 • 7 new comments -
Mask2former & Maskformer Fast Image Processor
#35685 commented on
Jan 23, 2025 • 7 new comments -
Add Prompt Depth Anything Model
#35401 commented on
Jan 22, 2025 • 7 new comments -
Add internlm3 dense
#35694 commented on
Jan 22, 2025 • 6 new comments -
Several fixes related to rotary position embeddings
#35376 commented on
Jan 22, 2025 • 6 new comments -
switch from `training_args.bin` to `training_args.json`
#35010 commented on
Jan 21, 2025 • 6 new comments -
Integrate xlstm cleanly.
#35377 commented on
Jan 20, 2025 • 3 new comments -
Refactoring of ImageProcessorFast
#35069 commented on
Jan 22, 2025 • 3 new comments -
Bart: new cache format
#35314 commented on
Jan 23, 2025 • 3 new comments -
Remove _supports_static_cache = True for some model classes
#34975 commented on
Jan 23, 2025 • 3 new comments -
Continuous batching
#35727 commented on
Jan 23, 2025 • 2 new comments -
Make `output_dir` Optional in `TrainingArguments` #27866
#35735 commented on
Jan 20, 2025 • 2 new comments -
Add DAB-DETR Object detection/segmentation model
#30803 commented on
Jan 23, 2025 • 2 new comments -
[WiP] `GPT2Model` StaticCache support
#35761 commented on
Jan 23, 2025 • 2 new comments -
ModernBERT FlexAttention
#35423 commented on
Jan 21, 2025 • 1 new comment -
Fix hardcoded `float` dtypes in DeBERTa model, which caused multiple RuntimeErrors in `bfloat16`
#35336 commented on
Jan 24, 2025 • 1 new comment -
Adding FlexAttention Support for Qwen2 models
#35155 commented on
Jan 24, 2025 • 1 new comment -
Fix mask slicing for models with HybridCache
#35681 commented on
Jan 21, 2025 • 1 new comment -
Test: generate with `torch.compile(model.forward)` as a fast test
#34544 commented on
Jan 23, 2025 • 1 new comment -
fix immediate quantization of the first token in QuantizedCache
#35760 commented on
Jan 20, 2025 • 1 new comment -
add RAdamScheduleFree optimizer
#35313 commented on
Jan 20, 2025 • 0 new comments -
Add support for DeepSpeed sequence parallelism (Ulysses)
#35301 commented on
Jan 24, 2025 • 0 new comments -
🔴 Video processors as a separate class
#35206 commented on
Jan 24, 2025 • 0 new comments -
Create Zero-Delay_QKV_Compression.md
#35324 commented on
Jan 23, 2025 • 0 new comments -
Create zero-delay-qkv-compression.py
#35328 commented on
Jan 23, 2025 • 0 new comments -
Create zero_delay_qkv_benchmark.py
#35329 commented on
Jan 23, 2025 • 0 new comments -
Samhq model addition
#35147 commented on
Jan 24, 2025 • 0 new comments -
Comment bot CI for other jobs (`generation` / `quantization`)
#35341 commented on
Jan 22, 2025 • 0 new comments -
Output dicts support in text generation pipeline
#35092 commented on
Jan 21, 2025 • 0 new comments -
Replace all torch.FloatTensor by torch.Tensor
#35004 commented on
Jan 24, 2025 • 0 new comments -
Fix Gemma2 dtype issue when storing weights in float16 precision
#35398 commented on
Jan 21, 2025 • 0 new comments -
Add timm_wrapper support to AutoFeatureExtractor
#35764 commented on
Jan 21, 2025 • 0 new comments -
multi-gpu: fix tensor device placements for various models
#35763 commented on
Jan 22, 2025 • 0 new comments -
Pipeline: fix unnecessary warnings
#35753 commented on
Jan 20, 2025 • 0 new comments -
[Whisper] Pipeline: handle long form generation
#35750 commented on
Jan 20, 2025 • 0 new comments -
More tensor parallel
#35748 commented on
Jan 23, 2025 • 0 new comments -
Fix multi gpu loss sync condition, add doc and test
#35743 commented on
Jan 21, 2025 • 0 new comments -
Fix: loading DBRX back from saved path
#35728 commented on
Jan 24, 2025 • 0 new comments -
VLM: compile compatibility
#35724 commented on
Jan 23, 2025 • 0 new comments -
Add mean_resizing for every VLMs' resizing_token_embeddings()
#35717 commented on
Jan 20, 2025 • 0 new comments -
Fix Aria CI and testing
#35674 commented on
Jan 20, 2025 • 0 new comments -
Add more rigorous non-slow grad accum tests
#35668 commented on
Jan 21, 2025 • 0 new comments -
Decompose chat template docs
#35657 commented on
Jan 22, 2025 • 0 new comments -
Fix tests for vision models
#35654 commented on
Jan 22, 2025 • 0 new comments -
Add XPU type for work-around -inf mask causing sdpa NaN issue in modeling files
#35647 commented on
Jan 24, 2025 • 0 new comments -
Guard against unset resolved_archive_file
#35628 commented on
Jan 23, 2025 • 0 new comments -
Uniformize LlavaNextVideoProcessor kwargs
#35613 commented on
Jan 20, 2025 • 0 new comments -
Fix the config class comparison for remote code models
#35592 commented on
Jan 23, 2025 • 0 new comments -
Support QuestionAnswering Module for ModernBert based models.
#35566 commented on
Jan 23, 2025 • 0 new comments -
BLIPs clean-up
#35560 commented on
Jan 20, 2025 • 0 new comments -
Add support for nested images to LLava and VipLLava
#35558 commented on
Jan 21, 2025 • 0 new comments -
Add support for 4D custom attention masks in GPT-2
#35517 commented on
Jan 19, 2025 • 0 new comments -
[tokenizer] fix llama tokenizer (slow)
#35488 commented on
Jan 23, 2025 • 0 new comments -
Add FAST
#35476 commented on
Jan 22, 2025 • 0 new comments -
Add SDPA support for LayoutLMv3 model
#35469 commented on
Jan 22, 2025 • 0 new comments -
Fix #35447 Tokenizer does not split text according to newly added input tokens
#35455 commented on
Jan 20, 2025 • 0 new comments -
Support constant lr with cooldown
#35453 commented on
Jan 24, 2025 • 0 new comments -
[WIP] Add support for flex attention (paged attention)
#35419 commented on
Jan 21, 2025 • 0 new comments -
Add D-FINE Model into Transformers
#35400 commented on
Jan 22, 2025 • 0 new comments -
Add support for MiniMax-Text-01 and MiniMax-VL-01 from MiniMaxAI
#35710 commented on
Jan 22, 2025 • 0 new comments -
multi-gpu: test_model_parallel_beam_search tests fail with "RuntimeError: Expected all tensors to be on the same device"
#35762 commented on
Jan 22, 2025 • 0 new comments -
Incorrect Whisper long-form decoding timestamps
#31942 commented on
Jan 22, 2025 • 0 new comments -
Offline mode doesn't work with models that require `trust_remote_code=True`
#34855 commented on
Jan 22, 2025 • 0 new comments -
Unknown quantization type, got fp8
#35471 commented on
Jan 22, 2025 • 0 new comments -
Gemma2: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:7 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
#34706 commented on
Jan 22, 2025 • 0 new comments -
Multiple training runs not working with deepspeed
#35073 commented on
Jan 22, 2025 • 0 new comments -
Add CLIP-ViP
#22829 commented on
Jan 21, 2025 • 0 new comments -
LlavaNextVideoProcessor -> TypeError: LlavaNextVideoProcessor.__call__() got an unexpected keyword argument 'legacy' (I have the fix)
#35602 commented on
Jan 21, 2025 • 0 new comments -
No use `no_sync` context manager when using gradient accumulation w/ deepspeed's zero stage 2 or 3 via `accelerate`
#34984 commented on
Jan 21, 2025 • 0 new comments -
`RuntimeError: self and mat2 must have the same dtype, but got Float and BFloat16` when training with `torch_compile`
#35382 commented on
Jan 21, 2025 • 0 new comments -
`modular_model_converter` can not handle objects import via try - except
#35414 commented on
Jan 21, 2025 • 0 new comments -
TypeError: Accelerator.__init__() got an unexpected keyword argument 'dispatch_batches'
#34714 commented on
Jan 21, 2025 • 0 new comments -
redirect logging output to `stdout` instead of `stderr`
#34613 commented on
Jan 21, 2025 • 0 new comments -
When set num_beams in GenerationConfig, stop_strings parameter has no effect
#34843 commented on
Jan 20, 2025 • 0 new comments -
Neftune computation is probably wrong with packed training
#34659 commented on
Jan 20, 2025 • 0 new comments -
Default value for mean_resizing in resize_token_embeddings should be False
#35357 commented on
Jan 20, 2025 • 0 new comments -
Qwen2VL exhibits significant performance differences under different attention implementations.
#35749 commented on
Jan 20, 2025 • 0 new comments -
Help Understanding Beam Search Scores in Hugging Face (LLaMA + LoRA)
#35618 commented on
Jan 20, 2025 • 0 new comments -
tokenizer decode with timestamp fails for extended vocabulary
#35330 commented on
Jan 20, 2025 • 0 new comments -
[Whisper] TypeError: '<=' not supported between instances of 'NoneType' and 'float'
#33552 commented on
Jan 20, 2025 • 0 new comments -
Model loaded with `PretrainedModel.from_pretrained` and `with torch.device("cuda"):` decorator leads to unexpected errors compared to `.to("cuda")`
#35371 commented on
Jan 20, 2025 • 0 new comments -
MultiModalityCausalLM does not support Flash Attention 2.0 yet
#35383 commented on
Jan 20, 2025 • 0 new comments -
FA2 support for Aria
#35670 commented on
Jan 20, 2025 • 0 new comments -
Object Detection Pipeline only outputs first element when batching
#31356 commented on
Jan 19, 2025 • 0 new comments -
Audio-Classification Pipeline top_k Documentation mismatch and bug (possibly generalizes to any classification pipelines)
#35736 commented on
Jan 19, 2025 • 0 new comments -
Maybe the way SequenceClassification Model calculates the last non-pad token is not reasonable.
#35352 commented on
Jan 19, 2025 • 0 new comments -
Transformers documentation translation to Italian
#17459 commented on
Jan 18, 2025 • 0 new comments -
unable to convert llama 3.3 weights to hf.py
#35326 commented on
Jan 18, 2025 • 0 new comments -
MPI environment variables are not set.
#35331 commented on
Jan 18, 2025 • 0 new comments -
Default arguments in `DebertaConfig` disable relative attention, contrary to the docs and `deberta-base`
#35335 commented on
Jan 18, 2025 • 0 new comments -
Move `DataCollatorForMultipleChoice` from the docs to the package
#34763 commented on
Jan 24, 2025 • 0 new comments -
Support `return_tensors` in audio chat templates
#34601 commented on
Jan 20, 2025 • 0 new comments -
Handle num_items_in_batch in Mistral's forward
#34576 commented on
Jan 24, 2025 • 0 new comments -
fix: DataCollatorWithFlattening incompatible with Tensor input ids
#34267 commented on
Jan 23, 2025 • 0 new comments -
LLaVA-NeXT: add new model checkpoints
#34195 commented on
Jan 21, 2025 • 0 new comments -
feat: add support for tensor parallel using Pytorch
#34194 commented on
Jan 23, 2025 • 0 new comments -
Add TimesFM Time Series Forecasting Model
#34082 commented on
Jan 21, 2025 • 0 new comments -
fix: Updated BridgeTower Image processor
#32384 commented on
Jan 23, 2025 • 0 new comments -
Add Segment Anything 2 (SAM2)
#32317 commented on
Jan 21, 2025 • 0 new comments -
added warning to Trainer when label_names is not specified for PeftModel
#32085 commented on
Jan 22, 2025 • 0 new comments -
[docs] Redesign
#31757 commented on
Jan 23, 2025 • 0 new comments -
Support Kosmos-2.5
#31711 commented on
Jan 24, 2025 • 0 new comments -
RLE of SAM can't handle masks with no change
#35664 commented on
Jan 25, 2025 • 0 new comments -
AttributeError: 'Config' object has no attribute '_get_non_default_generation_parameters'
#35543 commented on
Jan 24, 2025 • 0 new comments -
Jitter Noise added to input being passed to experts in Switch Transformers
#33969 commented on
Jan 24, 2025 • 0 new comments -
BatchEncoding.to throws away columns silently, thus no way to pass non-tensor columns such as String in Trainer metric computation
#34983 commented on
Jan 24, 2025 • 0 new comments -
Allow passing 2D attention mask
#27640 commented on
Jan 24, 2025 • 0 new comments -
Supporting Selective Activation Checkpointing and CPU Offloading Option.
#29648 commented on
Jan 24, 2025 • 0 new comments -
Reload Transformers imports
#35508 commented on
Jan 23, 2025 • 0 new comments -
hyperparameter_search() does not consider LoRA parameters like r to be finetuned.
#29391 commented on
Jan 23, 2025 • 0 new comments -
A warning message showing that `MultiScaleDeformableAttention.so` is not found in `/root/.cache/torch_extensions` if `ninja` is installed with `transformers`
#35349 commented on
Jan 23, 2025 • 0 new comments -
Memory leak on python 3.10.*
#35434 commented on
Jan 23, 2025 • 0 new comments -
AttributeError in automatic_speech_recognition.py when return_segments and return_timestamps are both True
#35713 commented on
Jan 23, 2025 • 0 new comments -
Recomputed tensor size does not match when using activation checkpointing when using FSDP and accelerate
#34928 commented on
Jan 23, 2025 • 0 new comments -
Vision models don't work for non-square object
#35280 commented on
Jan 23, 2025 • 0 new comments -
resizing token embeddings causes output embedding to be reinitialized in `post_init` when `tie_word_embedding` is False
#35141 commented on
Jan 23, 2025 • 0 new comments -
DeBERTa's `DisentangledSelfAttention` hardcodes `float` dtype, which causes `bfloat16` overflow error
#35332 commented on
Jan 23, 2025 • 0 new comments -
torchrun breaks with load_model_at_end and with metric_for_best_model=eval_f1 on question_answering example
#30819 commented on
Jan 23, 2025 • 0 new comments -
Accelerate x Trainer issue tracker:
#33345 commented on
Jan 23, 2025 • 0 new comments -
When gradient checkpointing is enabled, flash_attn_kwargs cannot be passed into the decoder_layer
#35509 commented on
Jan 23, 2025 • 0 new comments -
AttributeError: 'SegformerFeatureExtractor' object has no attribute 'reduce_labels' still has no clear guide around
#35402 commented on
Jan 23, 2025 • 0 new comments