[feature request] a tool to clone existing models to make new models with small changes #14032

Closed
stas00 opened this issue Oct 16, 2021 · 10 comments · Fixed by #14992
Labels
Feature request Request for a new feature

Comments

@stas00
Contributor

stas00 commented Oct 16, 2021

🚀 Feature request

So we have great templates for creating a new model.

Can you think of a way to create full clones of existing models?

Practically, for BigScience's needs we will have to create something like GPTMeg, which is 99.9% identical to GPT2 with 2-3 tiny changes. And then we will need another GPT2 variant that replaces Positional Embeddings with ALiBi. And there will be more variants.

Using the templates would be quite expensive when almost everything is identical.

So ideally a user will do:

transformers-clone-model GPT2 GPTMeg

and voilà, it'd replicate the model's files, tests and docs.

If all source files could be easily identified, this could perhaps be done in a few Perl one-liners. Here is a very rough outline:

  1. find the pertinent source files: grep -Irl GPT2 .
  2. rename files/dirs while copying: s/gpt2/gpt_meg/
  3. rename internals: s/GPT2/GPTMeg/g
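
The three steps above could be sketched in Python roughly as follows. This is only an illustration of the outline, not the tool that was eventually written: the function name `clone_model_files` and its parameters are hypothetical, and replacing the lowercase form inside file contents is an added assumption (the outline only mentions s/GPT2/GPTMeg/g).

```python
from pathlib import Path
from typing import List


def clone_model_files(root: Path, old_camel: str, new_camel: str,
                      old_lower: str, new_lower: str) -> List[Path]:
    """Copy every text file that mentions the old model under a renamed path.

    Hypothetical helper: all names here are illustrative assumptions.
    """
    created = []
    # sorted() materializes the listing, so files we create are not revisited.
    for src in sorted(root.rglob("*")):
        if not src.is_file():
            continue
        try:
            text = src.read_text(encoding="utf-8")
        except UnicodeDecodeError:
            continue  # skip binary files, similar to grep -I
        # Step 1: keep only files that mention the model (grep -Irl GPT2 .)
        if old_camel not in text:
            continue
        # Step 2: rename files/dirs while copying (s/gpt2/gpt_meg/)
        rel = src.relative_to(root)
        new_rel = Path(*[part.replace(old_lower, new_lower) for part in rel.parts])
        if new_rel == rel:
            continue  # shared file such as an index: cannot simply be copied
        # Step 3: rename internals (s/GPT2/GPTMeg/g); the lowercase
        # replacement is an extra assumption, not part of the outline.
        new_text = text.replace(old_camel, new_camel).replace(old_lower, new_lower)
        dst = root / new_rel
        dst.parent.mkdir(parents=True, exist_ok=True)
        dst.write_text(new_text, encoding="utf-8")
        created.append(dst)
    return created
```

A call such as `clone_model_files(Path("."), "GPT2", "GPTMeg", "gpt2", "gpt_meg")` would return the list of newly written files. Files whose path does not contain the lowercase name are left alone, which is where the shared index files come in.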

The hard-to-automate part is the index files, as there is only one of each.
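
One possible way to handle such shared files, sketched under the assumption that each model occupies whole lines in an index, is to duplicate every line that mentions the old model and rename the copy. The function `duplicate_index_entries` below is hypothetical and deliberately naive: it does not handle entries spanning several lines, and it is not idempotent.

```python
def duplicate_index_entries(index_text: str, old_camel: str, new_camel: str,
                            old_lower: str, new_lower: str) -> str:
    """Insert a renamed copy right after each line that mentions the old model.

    Hypothetical helper; assumes one self-contained line per index entry.
    """
    out = []
    for line in index_text.splitlines(keepends=True):
        out.append(line)
        if old_camel in line or old_lower in line:
            if not line.endswith("\n"):
                # Last line without a newline: terminate it before adding the copy.
                out[-1] = line + "\n"
            out.append(line.replace(old_camel, new_camel)
                           .replace(old_lower, new_lower))
    return "".join(out)
```

Lines that do not mention the old model pass through unchanged, so the rest of the index keeps its order.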

I think I can work it out, but I'm afraid that the end result would be a set of Perl one-liners only Stas will know what to do with. So perhaps long term this is not a good solution.

Here is the Issue where we need to implement this: bigscience-workshop/Megatron-DeepSpeed#138
and 2 more will be coming soon.

@LysandreJik, @patrickvonplaten, @sgugger

@LysandreJik
Member

That's an interesting feature request, would be very useful indeed! Could provide a better starting point than the templates in many situations.

@sgugger
Collaborator

sgugger commented Oct 18, 2021

Sounds interesting indeed! I personally won't have any time to work on this before end of November however.

@stas00
Contributor Author

stas00 commented Oct 18, 2021

Thank you for validating that it'd be a useful tool, Lysandre and Sylvain

Would it be a good idea to open this to the community, in case someone is interested in working on this?

@LysandreJik
Member

Yes, that's a good starting point! I would advise studying the templates and how they were implemented (with cookiecutter) in order to provide something similar: it has been used quite a bit by now and should be able to handle most of it.

They are available here: https://github.com/huggingface/transformers/tree/master/templates/adding_a_new_model

@stas00
Contributor Author

stas00 commented Oct 18, 2021

I'm just concerned that w/o defining a spec of how we think it should be done we are likely to get a proposal that we won't be happy with. So on second thought perhaps it'd be better to wait for Sylvain's time in November.

That is, unless one of you has a clear idea of how you think it can/should be done and can write a rough outline to guide the contributor in their work. E.g. I have no clue how one of you would want this to work. I know how I'd do it (described in OP) and I'm sure you won't like it.

@sgugger
Collaborator

sgugger commented Nov 15, 2021

Unstale, will soon have time for this :-)

@huggingface huggingface deleted a comment from github-actions bot Dec 10, 2021
@stas00
Contributor Author

stas00 commented Dec 17, 2021

a gentle ping

@huggingface huggingface deleted a comment from github-actions bot Dec 17, 2021
@LysandreJik LysandreJik added the Feature request Request for a new feature label Dec 23, 2021
@stas00
Contributor Author

stas00 commented Dec 23, 2021

Would love to have this tool, as I think the GPTMeg model will now have to be redone, since many things have changed in the repo in the 2 months since the PR was created: #14084
It'd be much easier to clone and add the few changes than to try to catch up with all the mods that happened around the gpt2 model.

We are waiting for the legal team at BigScience to sort out the licensing, hence there was no activity on this gpt2 megatron variation model for quite some time. But once it's sorted out we will want to release gpt2-13B-en and will need this new architecture.

@sgugger
Collaborator

sgugger commented Dec 23, 2021

I can work on this a bit next week once I have re-enabled the doc styler. I don't promise to have something fully finished before I go on vacation (first week of January) however.

@stas00
Contributor Author

stas00 commented Dec 23, 2021

Not expecting any promises, just appreciating you wanting to work on it, @sgugger - thank you so much!
