[feature request] a tool to clone existing models to make new models with small changes #14032
Comments
That's an interesting feature request, would be very useful indeed! Could provide a better starting point than the templates in many situations.
Sounds interesting indeed! I personally won't have any time to work on this before end of November however.
Thank you for validating that it'd be a useful tool, Lysandre and Sylvain. Would it be a good idea to open this to the community, in case someone would be interested in working on it?
Yes, that's a good starting point! I would advise studying the templates and how they were implemented (with cookiecutter) in order to provide something similar: it has been used quite a bit by now and should be able to handle most of it. They are available here: https://github.com/huggingface/transformers/tree/master/templates/adding_a_new_model
I'm just concerned that without defining a spec of how we think it should be done, we are likely to get a proposal that we won't be happy with. So on second thought, perhaps it'd be better to wait for Sylvain's time in November. Unless one of you has a clear idea of how you think it can/should be done and can write a rough outline to guide the contributor in their work. E.g. I have no clue how one of you would want this to work. I know how I'd do it (described in OP) and I'm sure you won't like it.
Unstale, will soon have time for this :-)
a gentle ping
I would love to have this tool, as I think the GPTMeg model will now have to be redone, since many things have changed in the repo in the 2 months since the PR was created: #14084. We are waiting for the legal team at BigScience to sort out the licensing, hence there has been no activity on this gpt2 megatron variation model for quite some time. But once it's sorted out we will want to release gpt2-13B-en and will need this new architecture.
I can work on this a bit next week once I have re-enabled the doc styler. I don't promise to have something fully finished before I go on vacation (first week of January) however.
Not expecting any promises, just appreciating you wanting to work on it, @sgugger - thank you so much!
🚀 Feature request
So we have great templates for creating a new model.
Can you think of a way to create full clones of existing models?
Practically, for BigScience needs, we will have to create something like GPTMeg, which is 99.9% identical to GPT2 with 2-3 tiny changes. And then we will need another GPT2 variant that replaces Positional Embeddings with ALiBi. And there will be more variants.
Using templates would be quite expensive when almost everything is really identical.
So ideally a user will do:
and voila it'd replicate model's files, tests and docs.
If all source files could be easily identified, this perhaps could be done in a few Perl one-liners. Here is a very rough outline:
The hard-to-automate part is the index files, as there is only one of each.
I think I can work it out, but I'm afraid that the end result would be a set of Perl one-liners only Stas will know what to do with. So perhaps long term this is not a good solution.
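To make the idea a bit more concrete without Perl, here is a minimal Python sketch of the copy-and-rename step. It is only an illustration under loud assumptions: the function name `clone_model_files`, the single-directory layout, and the "as written plus lowercase" renaming rule are all hypothetical, not an existing transformers API. It deliberately ignores the index files mentioned above, as well as tests, docs and ALL_CAPS constants.

```python
from pathlib import Path
from typing import List, Tuple


def rename(text: str, pairs: List[Tuple[str, str]]) -> str:
    """Apply each (old, new) substitution in order."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text


def clone_model_files(src_dir: Path, dst_dir: Path, old: str, new: str) -> List[Path]:
    """Copy every .py file under src_dir into dst_dir, renaming the model.

    Hypothetical helper: `old`/`new` are the class-style names
    (e.g. "GPT2" -> "GPTMeg"); the lowercase spelling is derived for
    file names and model_type strings.
    """
    pairs = [(old, new), (old.lower(), new.lower())]
    created = []
    # sorted() materializes the file list before anything is written
    for src in sorted(src_dir.rglob("*.py")):
        rel = src.relative_to(src_dir)
        dst = dst_dir / rename(str(rel), pairs)
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_text(rename(src.read_text(), pairs))
        created.append(dst)
    return created
```

Something like `clone_model_files(Path("models/gpt2"), Path("models/gptmeg"), "GPT2", "GPTMeg")` would then give a renamed copy to start editing from (the paths here are placeholders). A plain string replace is crude and will also rewrite comments and URLs, so the result would still need a manual review.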
Here is the Issue where we need to implement this: bigscience-workshop/Megatron-DeepSpeed#138
and 2 more will be coming soon.
@LysandreJik, @patrickvonplaten, @sgugger