Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change transforms in RoBERTa into classes #2002

Closed
joecummings opened this issue Dec 7, 2022 · 1 comment · Fixed by #2102
Closed

Change transforms in RoBERTa into classes #2002

joecummings opened this issue Dec 7, 2022 · 1 comment · Fixed by #2102
Assignees

Comments

@joecummings
Copy link
Contributor

Currently, transforms in the RobertaBundle are defined as anonymous lambda functions. These are not pickleable and cannot be imported for use anywhere else.

Ex proposal:

lambda: T.Sequential(
        T.SentencePieceTokenizer(urljoin(_TEXT_BUCKET, "xlmr.sentencepiece.bpe.model")),
        T.VocabTransform(load_state_dict_from_url(urljoin(_TEXT_BUCKET, "xlmr.vocab.pt"))),
        T.Truncate(510),
        T.AddToken(token=0, begin=True),
        T.AddToken(token=2, begin=False),
    ),

-->

class RobertaTransform:
     def __init__(self, truncate_length=510):
         self.transform =  T.Sequential(
              T.SentencePieceTokenizer(urljoin(_TEXT_BUCKET, "xlmr.sentencepiece.bpe.model")),
              T.VocabTransform(load_state_dict_from_url(urljoin(_TEXT_BUCKET, "xlmr.vocab.pt"))),
              T.Truncate(truncate_length),
              T.AddToken(token=0, begin=True),
              T.AddToken(token=2, begin=False),
          ),

    def __call__(self, text):
        self.transform(text)
@Nayef211
Copy link
Contributor

Nayef211 commented Mar 8, 2023

Hey I would recommend taking a look at #1482 where we actually moved away from using classes to using scriptable sequential container to create the transform. The main benefit from this was as follows

As we added necessary primitive transforms, along with support of #1481 for text transforms, we can easily avoid overhead of maintaining a separate class for model transform. Instead in this PR, we argue that this can be accomplished in much more sustainable way by directly constructing sequential transform as part of Model Bundle object.

As such do you think we want to migrate back to using classes for our model transforms?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants