Conversation

metascroy (Contributor)

As titled


pytorch-bot bot commented May 13, 2025

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/executorch/10839

Note: Links to docs will display an error until the docs builds have been completed.

❗ 1 Active SEV

There is 1 currently active SEV. If your PR is affected, please view it below:

❌ 3 New Failures

As of commit 880c99b with merge base d7201ab:

NEW FAILURES - The following jobs have failed:

This comment was automatically generated by Dr. CI and updates every 15 minutes.

facebook-github-bot added the CLA Signed label (managed by the Facebook bot; authors need to sign the CLA before a PR can be reviewed) on May 13, 2025

This PR needs a release notes: label

If your change should be included in the release notes (i.e. would users of this library care about this change?), please use a label starting with release notes:.

If not, please add the release notes: none label.

To add a label, you can comment to pytorchbot, for example:
@pytorchbot label "release notes: none"

For more information, see
https://github.com/pytorch/pytorch/wiki/PyTorch-AutoLabel-Bot#why-categorize-for-release-notes-and-how-does-it-work.

jackzhxng (Contributor) left a comment:

Have you tested the regular Qwen3 export?

converted_state_dict["output.weight"] = converted_state_dict[
    "tok_embeddings.weight"
]
# If lm_head.weight is not present, assume tied embeddings (e.g., 0.6b and 4b models)
Contributor:

Suggested change:
- # If lm_head.weight is not present, assume tied embeddings (e.g., 0.6b and 4b models)
+ # If lm_head.weight is not present, assume tied embeddings (0.6b, 1.7b, and 4b models)

"tok_embeddings.weight"
]
# If lm_head.weight is not present, assume tied embeddings (e.g., 0.6b and 4b models)
if "lm_head.weight" not in state_dict:
Contributor:

lm_head is present in the HF checkpoints even if the embeddings are tied; it will just be the same weights as tok_embeddings.

Contributor Author:

Hmmm, are you sure it's there? I thought when config.tie_word_embeddings = true, it might not be there, but gets materialized during a tie_weights() call on the HF model.

In any case, if it is there, it's covered by the regular loop through the keys and this logic is not executed. If it's not there, this sets lm_head's weight to the embeddings.
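
(For illustration, a minimal sketch of the fallback being discussed, assuming state_dict is the raw HF checkpoint dict and converted_state_dict uses the names from the snippet quoted above:)

# Fallback for tied embeddings: if the checkpoint stores no separate
# lm_head, reuse the token-embedding weights as the output projection.
if "lm_head.weight" not in state_dict:
    converted_state_dict["output.weight"] = converted_state_dict[
        "tok_embeddings.weight"
    ]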

Contributor:

Yeah, it's here: https://huggingface.co/Qwen/Qwen3-0.6B/tree/main?show_file_info=model.safetensors. Also, I remember seeing it while debugging the checkpoint. But sure, in that case could you reword the comment?

Contributor Author:

Reworded the comment a little.

I'm not convinced by https://huggingface.co/Qwen/Qwen3-0.6B/tree/main?show_file_info=model.safetensors because it's just metadata, and doesn't prove anything about what is stored in the file.

It does not look like lm_head is present in the safetensors when I unpack them locally. Perhaps you looked at the checkpoint after running your script (which copied the embedding tensors into lm_head)?
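
(For reference, one way to check which tensors the file actually stores without loading the model; the local path here is hypothetical:)

# Sketch: list the tensor names stored in a safetensors file without
# materializing any weights. The path is a hypothetical local example.
from safetensors import safe_open

with safe_open("Qwen3-0.6B/model.safetensors", framework="pt") as f:
    print("lm_head.weight" in set(f.keys()))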

Contributor:

Ok, if you double-checked, then that's fine!

)
parser.add_argument(
-    "input_dir",
+    "input_dir_or_checkpoint",
Contributor:

Not a big fan of specifying something that could be either a dir or a file. If pytorch_model.bin is the filename for the quantized checkpoint going forward, I'd rather just specify the checkpoint and search for that.

Contributor Author:

How do you usually download the directories from HF?

jackzhxng (Contributor) commented May 13, 2025:

E.g., huggingface-cli download ibm-granite/granite-3b-code-instruct-128k gets you the entire directory.
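
(The programmatic equivalent is a sketch like this, reusing the repo id from the example above:)

# Sketch: download the full model repo directory via huggingface_hub,
# equivalent to the huggingface-cli command above.
from huggingface_hub import snapshot_download

local_dir = snapshot_download("ibm-granite/granite-3b-code-instruct-128k")
print(local_dir)  # path to the downloaded checkpoint directory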

Contributor Author:

I changed it to use directories.
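
(Roughly the directory-based argument after the change; a sketch, where only the "input_dir" name comes from the diff and the help text is assumed:)

# Sketch of the directory-based CLI argument agreed on above. The
# help string is an assumption; "input_dir" appears in the diff.
parser.add_argument(
    "input_dir",
    type=str,
    help="Path to the downloaded HF checkpoint directory.",
)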

metascroy (Contributor Author):

Have you tested the regular Qwen3 export?

I can test if there is no CI for it.

jackzhxng (Contributor) commented May 13, 2025:

Yeah, if you test it and it works, feel free to merge pending comments.

metascroy (Contributor Author):

Yeah, if you test it and it works, feel free to merge pending comments.

Checked that the convert script works with the original Qwen3 on HF.

metascroy merged commit fa5048b into main on May 14, 2025 (89 of 93 checks passed).
metascroy deleted the support-qwen3-prequant branch on May 14, 2025 at 16:34.
Labels: CLA Signed, topic: not user facing