
Sequence to Sequence Problem Type #308

Merged
merged 20 commits into main from psi/seq2seq
Aug 4, 2023

Conversation

psinger
Collaborator

@psinger psinger commented Jul 24, 2023

Closes #100

Adds a new problem type: Sequence to Sequence modeling. For now, tested with different T5 models.
We can add more functionality step by step if this turns out to be frequently used.

I tried to reuse as much existing code as possible to avoid duplication.

@maxjeblick please take an initial look; it might need 1-2 iterations.

Todo:

  • Rework dataset?
  • Check padding
  • Check merging / pipeline
  • Default settings for dtype etc?
  • Model card
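As a sketch of what the "Check padding" item involves for an encoder-decoder problem type: labels and encoder inputs are typically padded separately, with padded label positions masked so the loss ignores them. The function and field names below are illustrative assumptions, not the actual h2o-llmstudio implementation.

```python
# Hypothetical collate function for a seq2seq batch. For encoder-decoder
# models such as T5, padded label positions are conventionally set to -100
# so that cross-entropy loss ignores them.

def collate_seq2seq(batch, pad_token_id, label_pad_id=-100):
    """Right-pad input ids and labels to the longest sequence in the batch."""
    max_in = max(len(x["input_ids"]) for x in batch)
    max_lab = max(len(x["labels"]) for x in batch)

    input_ids, attention_mask, labels = [], [], []
    for x in batch:
        n_in, n_lab = len(x["input_ids"]), len(x["labels"])
        # pad encoder inputs with the tokenizer's pad token
        input_ids.append(x["input_ids"] + [pad_token_id] * (max_in - n_in))
        # mask out padded positions for attention
        attention_mask.append([1] * n_in + [0] * (max_in - n_in))
        # pad labels with -100 so the loss skips them
        labels.append(x["labels"] + [label_pad_id] * (max_lab - n_lab))
    return {"input_ids": input_ids, "attention_mask": attention_mask, "labels": labels}
```

A library solution such as `transformers.DataCollatorForSeq2Seq` covers the same ground; the sketch only makes the masking convention explicit.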

@psinger psinger requested a review from maxjeblick July 25, 2023 12:49
@psinger psinger marked this pull request as ready for review July 25, 2023 12:54
Contributor

@maxjeblick maxjeblick left a comment


Thanks a lot for adding this problem type, it looks to be in a very good state!

I would move prepare_lora and generate to separate functions to avoid code duplication (they will probably be needed for other potential problem types as well).
Apart from that, the code looks fine; I will test it more over the next days.
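The suggested refactor could look roughly like the sketch below: prepare_lora becomes a standalone helper taking the config and backbone, so both causal LM and seq2seq model classes can share it. The cfg field names and the peft usage here are illustrative assumptions, not the repository's exact code.

```python
# Hypothetical shared helper: wrap a backbone with LoRA adapters when the
# config enables it. Config attribute names are assumptions for illustration.

def prepare_lora(cfg, backbone):
    """Return the backbone, LoRA-wrapped if cfg.training.lora is set."""
    if not getattr(cfg.training, "lora", False):
        return backbone

    # deferred import so the helper has no hard peft dependency when LoRA is off
    from peft import LoraConfig, get_peft_model

    lora_config = LoraConfig(
        r=cfg.training.lora_r,
        lora_alpha=cfg.training.lora_alpha,
        lora_dropout=cfg.training.lora_dropout,
        target_modules=cfg.training.lora_target_modules,
    )
    return get_peft_model(backbone, lora_config)
```

Each problem type's model class would then call `self.backbone = prepare_lora(cfg, self.backbone)` instead of duplicating the wrapping logic.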

@psinger
Collaborator Author

psinger commented Jul 28, 2023

Thanks for the comments @maxjeblick - I tried to address them.

Contributor

@maxjeblick maxjeblick left a comment


Thanks for the changes! Some observations:

  • The Summary tab fails with AttributeError: 'ConfigProblemBase' object has no attribute 'hf'.
  • Train Data Insights currently shows decoded input ids, whereas the model uses prompt input ids. Maybe it is useful in general to show prompt input ids instead of input ids? This could potentially also be changed in Max/insights table view #301.
  • Mid-term, promoting generate and prepare_lora to dedicated functions (with corresponding arguments: cfg, backbone, ...) may be useful. We can leave it here as is and I can have a look in [CODE IMPROVEMENT] Promote RLHF as a separate problem type #317.
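For context on the first bullet: the Summary tab apparently reads cfg.hf, so the new problem type's config needs that sub-config as well. A minimal sketch of the shape involved, with illustrative (not actual h2o-llmstudio) class and field names:

```python
# Hypothetical illustration of the missing attribute: the new problem type's
# ConfigProblemBase needs an `hf` sub-config for the Summary tab to read.
# Field names are assumptions for illustration only.
from dataclasses import dataclass, field


@dataclass
class ConfigHF:
    trust_remote_code: bool = True


@dataclass
class ConfigProblemBase:
    experiment_name: str = "seq2seq-demo"
    # default_factory is required for a mutable dataclass default
    hf: ConfigHF = field(default_factory=ConfigHF)


cfg = ConfigProblemBase()
assert hasattr(cfg, "hf")  # the Summary tab's cfg.hf access no longer raises
```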

@maxjeblick maxjeblick mentioned this pull request Aug 1, 2023
@maxjeblick
Contributor

The PR is probably good to merge after hf has been added to the config.

@psinger
Collaborator Author

psinger commented Aug 3, 2023

Yes, I just noticed that I also need to adjust the summary model card now. On it.

@psinger psinger requested a review from maxjeblick August 3, 2023 12:15
Contributor

@maxjeblick maxjeblick left a comment


LGTM, thanks!

One small issue:
trust_remote_code is not populated in the model card; this should be easy to fix.
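The fix amounts to threading the config value into the model card template. A minimal sketch, with a purely illustrative template (not the repository's actual card text):

```python
# Hypothetical model card snippet showing trust_remote_code being populated
# from the experiment config; template and names are illustrative assumptions.

MODEL_CARD_USAGE = """\
model = AutoModelForSeq2SeqLM.from_pretrained(
    "{repo_id}",
    trust_remote_code={trust_remote_code},
)
"""


def render_model_card(repo_id: str, trust_remote_code: bool) -> str:
    """Fill the usage snippet with values taken from the experiment config."""
    return MODEL_CARD_USAGE.format(
        repo_id=repo_id, trust_remote_code=trust_remote_code
    )
```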

@psinger psinger merged commit 454b8df into main Aug 4, 2023
5 checks passed
@psinger psinger deleted the psi/seq2seq branch August 4, 2023 13:26
Development

Successfully merging this pull request may close these issues.

[Feature] Allow fine tuning T5 models?