Add TFGPT2ForSequenceClassification based on DialogRPT #8714
Conversation
Thank you very much for this very nice addition!!
I left a few comments on it. Also, can you run the following piece of code and tell me if it works properly:
import tensorflow as tf
from transformers import GPT2Tokenizer, TFGPT2ForSequenceClassification
model = tf.function(TFGPT2ForSequenceClassification.from_pretrained("microsoft/dialogrpt"))
tokenizer = GPT2Tokenizer.from_pretrained("microsoft/dialogrpt")
inputs = tokenizer("Hello", return_tensors="tf")
model(inputs)
@LysandreJik I would also recommend waiting a bit for the new input processing to be merged.
""", | ||
GPT2_START_DOCSTRING, | ||
) | ||
class TFGPT2ForSequenceClassification(TFGPT2PreTrainedModel, TFSequenceClassificationLoss): |
Be careful: TFSequenceClassificationLoss only takes -100 into account as the pad token id. So either you assume this same value everywhere, or you should redefine the loss computation yourself.
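For context, the -100 convention means that label positions set to -100 are excluded from the loss. A minimal sketch of the idea (an illustration, not the library's actual implementation):

import tensorflow as tf

labels = tf.constant([2, -100, 1])    # -100 marks labels to ignore
logits = tf.random.normal((3, 4))     # 3 examples, 4 classes
active = tf.not_equal(labels, -100)   # boolean mask of real labels
loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
loss = loss_fn(tf.boolean_mask(labels, active), tf.boolean_mask(logits, active))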
Apparently you forgot this comment :)
TFGPT2ForSequenceClassification uses the last token in order to do the classification. For a given task, if a pad_token_id is defined in the configuration, it finds the last token that is not a pad token in each row; if no pad_token_id is defined, it simply takes the last value in each row of the batch.
So I think the TFSequenceClassificationLoss loss function should work, as it takes the single last token, which is not a pad token.
Let me know your views on it.
@LysandreJik has already defined pad_token_id: 50256 for this model.
#7493 (comment)
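For readers following the thread, this is roughly how that last-non-pad-token index can be computed, assuming right-padded inputs (values below are illustrative):

import tensorflow as tf

pad_token_id = 50256
input_ids = tf.constant([[15496, 995, 50256, 50256],
                         [40, 1101, 994, 2651]])
# index of the last non-pad token in each row
sequence_lengths = tf.reduce_sum(
    tf.cast(tf.math.not_equal(input_ids, pad_token_id), tf.int32), axis=-1
) - 1
# sequence_lengths is now [1, 3]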
Ok, fine for me about this then!
if input_ids is not None:
    batch_size, sequence_length = input_ids.shape[:2]
else:
    batch_size, sequence_length = inputs_embeds.shape[:2]
Use the shape_list function instead of .shape.
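shape_list is a helper in transformers.modeling_tf_utils that returns static dimensions where available and falls back to dynamic ones, which keeps the code graph-mode safe. The quoted hunk would then read roughly:

from transformers.modeling_tf_utils import shape_list

if input_ids is not None:
    batch_size, sequence_length = shape_list(input_ids)[:2]
else:
    batch_size, sequence_length = shape_list(inputs_embeds)[:2]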
Done
result = tf.map_fn(
    fn=lambda t: get_seq_element(t[0], t[1]), elems=[sequence_lengths, logits], dtype="float"
)
p_logits = tf.reshape(result, [result.shape[0], result.shape[-1]])
Same here.
Done
def call(
    self,
    inputs,
    past=None,
    attention_mask=None,
    token_type_ids=None,
    position_ids=None,
    head_mask=None,
    inputs_embeds=None,
    labels=None,
    use_cache=None,
    output_attentions=None,
    output_hidden_states=None,
    return_dict=None,
The list is not in the proper order; same thing for the input order processing below. Look at the TFGPT2LMHeadModel class to see an example.
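For reference, a sketch of the expected ordering, modeled on TFGPT2LMHeadModel's signature at the time (copy the exact order from that class; the point is that labels goes after return_dict, with training last):

def call(
    self,
    inputs,
    past=None,
    attention_mask=None,
    token_type_ids=None,
    position_ids=None,
    head_mask=None,
    inputs_embeds=None,
    use_cache=None,
    output_attentions=None,
    output_hidden_states=None,
    return_dict=None,
    labels=None,
    training=False,
):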
Done
Thanks for the review @jplu. I'll update my code with the review comments and the new input processing.
up to date with master
…mplemented review comments and added input processing
Hello @jplu and @LysandreJik, kindly review.
Much better!! Thanks for the updates.
There is still one comment to be addressed and the tests to fix.
""", | ||
GPT2_START_DOCSTRING, | ||
) | ||
class TFGPT2ForSequenceClassification(TFGPT2PreTrainedModel, TFSequenceClassificationLoss): |
Apparently you forgot this comment :)
…mplemented review comments and added input processing
@jplu tests are also fixed now.
LGTM! Tested it out locally, it works great :) Thanks @spatil6!
@@ -114,3 +114,9 @@ TFGPT2DoubleHeadsModel

.. autoclass:: transformers.TFGPT2DoubleHeadsModel
    :members: call

TFGPT2ForSequenceClassification
Could you also add the TFSequenceClassifierOutputWithPast here in the model-specific outputs?
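Following the autoclass convention already used in this docs file, the entry would look something like this (assuming the new output class lives in transformers.modeling_tf_outputs):

.. autoclass:: transformers.modeling_tf_outputs.TFSequenceClassifierOutputWithPast
    :members: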
Ok I'll do it.
Done.
@spatil6 we have merged a PR today that updates the way the booleans are processed. You can see an example in the TF BERT file; can you rebase and apply the same changes, please? It would be awesome if you could do it!
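For reference, the config-default boolean resolution in the TF BERT file follows this general pattern (illustrative; the exact code should be copied from that file):

output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
output_hidden_states = output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
return_dict = return_dict if return_dict is not None else self.config.use_return_dict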
Sure, will do that.
Fix TF T5 only encoder model with booleans (huggingface#8925)
The new input processing looks good to me. If it looks good to you @jplu, feel free to merge!
LGTM!! Good work!!
What does this PR do?
This PR implements TFGPT2ForSequenceClassification in order to support DialogRPT.
Strongly based on modifications made in #7501
Fixes #7622
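Once merged, usage would look roughly like the following sketch. The model id comes from the test snippet above; the <|endoftext|> separator and single-logit sigmoid scoring are assumptions based on how DialogRPT is described:

import tensorflow as tf
from transformers import GPT2Tokenizer, TFGPT2ForSequenceClassification

tokenizer = GPT2Tokenizer.from_pretrained("microsoft/dialogrpt")
model = TFGPT2ForSequenceClassification.from_pretrained("microsoft/dialogrpt")

# DialogRPT scores a (context, response) pair; the pair is assumed here
# to be joined with the end-of-text token
inputs = tokenizer("I love NLP!<|endoftext|>Me too!", return_tensors="tf")
outputs = model(inputs)
score = tf.sigmoid(outputs.logits)  # assumes a single-logit ranking head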
Before submitting
- Did you read the contributor guideline, Pull Request section?
- Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
- Did you make sure to update the documentation with your changes? Here are the documentation guidelines, and here are tips on formatting docstrings.
Who can review?
Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors which may be interested in your PR.
@LysandreJik Please review this PR, let me know if there is anything that should be changed =)