Conversation
When I ran the tests in my environment, the hcx tests ran normally without any failures. I think the work can proceed in this manner?

Hmm... In my environment:
…iteration and using cache
…orConditionalGeneration
…d processor classes
…template application
…ing in HyperClovaXProcessor

Yep, don't worry, we'll fix this one on main!
zucchini-nlp left a comment:
Hey @jp1924, thanks a lot for the PR!
I think we need to move all the code to a modular file first, since there is a lot of copied stuff from other models. I left comments below about where exactly you can copy from. Ping me for another review when the modular file is ready.
Resolved review threads on:
- src/transformers/models/hyperclovax_vision/configuration_hyperclovax_vision.py
- src/transformers/models/hyperclovax_vision/modeling_hyperclovax_vision.py
- tests/models/hyperclovax_vision/test_modeling_hyperclovax_vision.py
- tests/models/hyperclovax_vision/test_processing_hyperclovax_vision.py
…deo processor classes.
… to HCXVisionConfig and remove unnecessary code.
…d handling of hidden sizes and output processing
…ne input handling
…f vision and text features
Can we ask the repo owners to change the …

This is because the … I'll open a pull request and ask the repo owner to make the changes.

@zucchini-nlp The reason I'm working on this PR is that I need to use this model, and I'm submitting it to resolve this inconvenience...

I suppose the cleanest solution would be to create a conversion file like …
Okeee, I didn't scroll enough to see the multimodal config, I guess 😓 So to wrap it up, this affects only when loading … If yes, then I suggest opening a PR and adding a warning on the model doc page, saying that the model has to be loaded with …
…e image/video feature handling
… improved clarity
…ameter and add get_video_features method

I opened a PR on the Model Hub repo for the … Best case, the repo maintainer approves and merges quickly, but activity has been low, so responses may be slow. Therefore, for now I think we should proceed with the transformers PR while keeping … Loading tests with the current code show:
```python
from transformers import (
    HCXVisionForConditionalGeneration,
    HyperClovaXForCausalLM,
    HyperClovaXConfig,
    HCXVisionConfig,
)

HCXVisionForConditionalGeneration.from_pretrained("naver-hyperclovax/HyperCLOVAX-SEED-Think-32B")
HyperClovaXForCausalLM.from_pretrained("naver-hyperclovax/HyperCLOVAX-SEED-Think-32B")
HyperClovaXConfig.from_pretrained("naver-hyperclovax/HyperCLOVAX-SEED-Think-32B")
HCXVisionConfig.from_pretrained("naver-hyperclovax/HyperCLOVAX-SEED-Think-32B")
```

I recommend adding a strong warning in the model docstring / model card to reduce user confusion. Suggested wording:
When the Hub change is applied we can remove or update this warning; this keeps user confusion to a minimum in the meantime. Also, regarding issue #20: I updated … Main differences:
Consequences:
So I overrode the feature extraction and forward logic to implement HCX's specific behavior. Please review and let me know any feedback or requested changes.
zucchini-nlp left a comment:
Oke, the model type thing affects only users who want AutoModel support, and we can just add a warning in the docs. Not a big deal.
I think there are still opportunities to reuse existing code and delete copy-pasted code, e.g. no need to create a whole new LM and TextConfig. The same goes for the processor, as I didn't see a difference in the code. We can instead define the correct auto-mapping. Should work, no?
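For readers following along: the "auto-mapping" being suggested is essentially a `model_type` → class-name lookup table. A minimal stand-alone sketch of how such a registry behaves (the entries and helper name below are illustrative, mirroring the shape of transformers' `configuration_auto.py` rather than importing the library):

```python
# Minimal stand-alone sketch of an auto-mapping registry, mirroring the
# model_type -> config-class-name tables in transformers' configuration_auto.py.
# The entries below are illustrative, not the library's full table.
CONFIG_MAPPING_NAMES = dict(
    [
        ("hyperclovax", "HyperClovaXConfig"),
        ("hyperclovax_vlm", "HCXVisionConfig"),
        ("qwen2_5_vl", "Qwen2_5_VLConfig"),
    ]
)


def config_class_for(model_type: str) -> str:
    """Resolve the config class name registered for a model_type string."""
    if model_type not in CONFIG_MAPPING_NAMES:
        raise KeyError(f"Unrecognized model_type: {model_type!r}")
    return CONFIG_MAPPING_NAMES[model_type]
```

With the right entries registered, the Auto classes can resolve existing implementations directly, which is why a brand-new processor class shouldn't be necessary.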
```python
("vits", "VitsConfig"),
("vivit", "VivitConfig"),
("vjepa2", "VJEPA2Config"),
# ("vlm", "HCXVisionConfig"),
```
I guess this should be ("hyperclovax_vlm", "HCXVisionConfig")
```python
("qwen2", "Qwen2Config"),
("qwen2_5_omni", "Qwen2_5OmniConfig"),
("qwen2_5_vl", "Qwen2_5_VLConfig"),
("qwen2_5_vl_image", "Qwen2_5_VLVisionConfig"),
```
nit: can we do `qwen2_5_vl_vision`, just for consistency?
```python
("hunyuan_v1_dense", "HunYuanDenseV1Config"),
("hunyuan_v1_moe", "HunYuanMoEV1Config"),
("hyperclovax", "HyperClovaXConfig"),
("hyperclovax_vlm", "HCXVisionConfig"),
```
same here: `hyperclovax_vision`
```python
("glmasr_encoder", "glmasr"),
("hyperclovax", "hyperclovax_vision"),
("hyperclovax_vlm", "hyperclovax_vision"),
# ("vlm", "hyperclovax_vision"),
```
We need to delete the commented line. There are more, so I won't comment on each one :)
```python
_no_split_modules = ["HyperClovaXDecoderLayer"]
input_modalities = ("image", "video", "text")
_can_record_outputs = {"hidden_states": HyperClovaXDecoderLayer, "attentions": HyperClovaXAttention}


@auto_docstring
class HyperClovaXModel(GraniteModel):
    config_class = HyperClovaXConfig
    input_modalities = ("text",)


@auto_docstring
class HyperClovaXForCausalLM(GraniteForCausalLM):
    accepts_loss_kwargs = False
    config_class = HyperClovaXConfig
    input_modalities = ("text",)
```
The hyperclovax LM is the same as granite, right? I don't think we need a new LM class for it then.
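The modular pattern being suggested here is roughly: subclass the existing model and override only what differs, instead of duplicating the implementation. A toy, self-contained illustration (the class names below are stand-ins I invented, not the library's real classes):

```python
# Toy illustration of the modular pattern: reuse a base implementation and
# override only the attributes that differ, rather than copy-pasting it.
class GraniteForCausalLMBase:  # stand-in for transformers' GraniteForCausalLM
    accepts_loss_kwargs = True

    def forward(self, text: str) -> str:
        # Shared behavior lives once in the base class.
        return f"granite:{text}"


class HyperClovaXForCausalLMSketch(GraniteForCausalLMBase):
    # Only the differing attribute is overridden; forward() is inherited.
    accepts_loss_kwargs = False
```

In transformers' real modular files, the conversion tooling then expands the subclass into a full standalone modeling file, so the copied code never has to be maintained by hand.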
```python
image_processor_class = "Qwen2VLImageProcessorFast"
video_processor_class = "Qwen2VLVideoProcessor"
tokenizer_class = (
    "GPT2Tokenizer",
    "GPT2TokenizerFast",
    "PreTrainedTokenizer",
    "PreTrainedTokenizerFast",
)
```
This shouldn't be needed; instead we only add it in the auto mapping.
```python
def apply_chat_template(self, conversation, chat_template=None, **kwargs):
    conversation = copy.deepcopy(conversation)

    tokenize = kwargs.get("tokenize", False)
    if not tokenize:
        template = chat_template
        if template is None:
            template = (
                self.chat_template["default"] if isinstance(self.chat_template, dict) else self.chat_template
            )
        return self.tokenizer.apply_chat_template(conversation, chat_template=template, **kwargs)

    return super().apply_chat_template(conversation, chat_template=chat_template, **kwargs)
```
Same here, I think we can work with nested templates, no? If not, I can check, because it should be supported.
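For reference, the fallback logic in the overridden `apply_chat_template` reduces to a small resolution rule. A stand-alone sketch (the function name is mine, not the library's):

```python
def resolve_chat_template(chat_template, explicit=None):
    # An explicitly passed template always wins; otherwise a dict of named
    # templates falls back to its "default" entry, and a plain string
    # template is used as-is. This mirrors the fallback in the override.
    if explicit is not None:
        return explicit
    if isinstance(chat_template, dict):
        return chat_template["default"]
    return chat_template
```

If the base `ProcessorMixin.apply_chat_template` already handles dict-valued (named) templates, the override above becomes unnecessary, which is the reviewer's point.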
@@ -0,0 +1,225 @@
import copy
The class is the same as https://github.com/huggingface/transformers/blob/main/src/transformers/models/qwen2_vl/processing_qwen2_vl.py, right? If yes, we can delete the whole file and just map hyperclovax to load a Qwen2VLProcessor in the auto mappings.
If no, we need to move this into modular and let it copy the identical code blocks.
```python
"vlm": [
    WeightRenaming("mm_projector", "projector"),
    WeightRenaming("language_model.model", "language_model"),
    WeightRenaming("language_model.lm_head", "lm_head"),
],
```
Hmm, no, we need to delete this one. Also, we don't have a `model_type = vlm`, so it won't be a problem.
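For context, the `WeightRenaming` entries above rewrite old checkpoint keys to the new module names during conversion. A self-contained approximation of that behavior (`rename_key` is my name for it; the real mechanism lives in transformers' conversion machinery):

```python
# Self-contained approximation of what the WeightRenaming entries above do:
# rewrite old checkpoint key fragments to the new module names, in order.
RENAMINGS = [
    ("mm_projector", "projector"),
    ("language_model.model", "language_model"),
    ("language_model.lm_head", "lm_head"),
]


def rename_key(key: str) -> str:
    """Apply each renaming rule in sequence to one checkpoint key."""
    for old, new in RENAMINGS:
        if old in key:
            key = key.replace(old, new)
    return key
```

Note the rule order matters: `language_model.model` must be rewritten before the bare `language_model` prefix is reused by later rules.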
```python
VLMS = ["detr"]
VLMS = [
    "aria",
    "ayavision",
    "colpali",
    "emu3",
    "fuyu",
```
This was deleted on main, no need to revert :)
[For maintainers] Suggested jobs to run (before merge): run-slow: auto, hyperclovax_vision, qwen2_5_vl
I want to link here another very related PR. A contributor wants to add the LM backbone, and I realized that the text-only LM actually applies …
What does this PR do?
Hello, Transformers team!
I submitted a PR to add naver-hyperclovax/HyperCLOVAX-SEED-Think-32B (hereafter HCX), developed by the Korean IT company Naver as part of the government's national AI model project.
The HCX code was written against Transformers 4.52.4, leading to the following issues:
Moving to Transformers 5.0.0 significantly improved the readability and development convenience of the modeling code. We aim to leverage this to add the HCX model to transformers.
TODO list
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a GitHub issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? See the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
@zucchini-nlp @yonigozlan @molbap