Why does the pooled_output use just the first token to represent the whole sentence? #196

Open
shcup opened this issue Nov 29, 2018 · 9 comments

@shcup

shcup commented Nov 29, 2018

No description provided.

@hanxiao

hanxiao commented Nov 29, 2018

Because the first token is [CLS], which is designed to be there and is later fine-tuned on the downstream task. Only after fine-tuning can [CLS], i.e. the first token, be a meaningful representation of the whole sentence.
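For reference, here is a minimal NumPy sketch, not the repo's exact code, of what the pooler in modeling.py does: it takes only the first token's hidden state and passes it through a dense layer with a tanh activation to produce pooled_output.

# Illustrative NumPy sketch only, not the actual modeling.py code:
# pooled_output is a dense + tanh transform of the first token's ([CLS]) hidden state.
import numpy as np

batch, seq_len, hidden = 2, 128, 768
sequence_output = np.random.randn(batch, seq_len, hidden)  # encoder output for every token
W = 0.02 * np.random.randn(hidden, hidden)                 # stands in for the learned pooler weights
b = np.zeros(hidden)

first_token = sequence_output[:, 0, :]        # hidden state of [CLS] only
pooled_output = np.tanh(first_token @ W + b)  # shape [batch, hidden]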

If you are interested in using (pretrained/fine-tuned) BERT for sentence encoding, please refer to my repo: https://github.com/hanxiao/bert-as-service. In particular, [CLS] isn't the only way to represent the sentence; please refer to this answer: https://github.com/hanxiao/bert-as-service#q-what-are-the-available-pooling-strategies
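As an example, here is a minimal client-side sketch using bert-as-service. It assumes a server has already been started, e.g. with bert-serving-start -model_dir /path/to/bert -pooling_strategy REDUCE_MEAN; check the repo's README for the exact flags of your version.

# bert-as-service client sketch; assumes the bert-serving-server is already running
# locally and the bert-serving-client package is installed.
from bert_serving.client import BertClient

bc = BertClient()                           # connects to localhost by default
vecs = bc.encode(['hey you', 'whats up?'])  # one fixed-size vector per sentence
print(vecs.shape)                           # e.g. (2, 768) for a base-size model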

@hanxiao

hanxiao commented Dec 7, 2018

By the way, here is a visualization that may help you understand the different BERT layers: https://github.com/hanxiao/bert-as-service#q-so-which-layer-and-which-pooling-strategy-is-the-best

@hitxujian

Why do you say that [CLS], i.e. the first token, represents the whole sentence only after fine-tuning? Why can't it represent the sentence before fine-tuning?

@Traeyee

Traeyee commented Mar 18, 2019

Because BERT is bidirectional, [CLS] is encoded with information from all tokens through the multi-layer encoding procedure, so its representation is different for different sentences.

@KavyaGujjala

Hi,
How can I get the [CLS] representation after running run_pretraining.py on domain-specific text?

I want sentence representations for my downstream tasks.

Any idea how to do this?

@Traeyee

Traeyee commented Mar 28, 2019

BERT_BASE_DIR="/home/cuiyi/repos/bert/model/chinese_L-12_H-768_A-12"

python extract_features.py \
  --input_file=./tmp.txt \
  --output_file=./tmp.jsonl \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --layers=-1,-2,-3,-4 \
  --max_seq_length=128 \
  --batch_size=8

Modify BERT_BASE_DIR to point to your new model path.
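Once tmp.jsonl is written, a short Python sketch like the one below can pull out the [CLS] vector per sentence. It assumes the JSONL layout produced by extract_features.py (one JSON object per input line, a "features" list whose first entry is the [CLS] token, and a per-token "layers" list ordered as in the --layers flag); adjust the keys if your version differs.

# Sketch: extract the last-layer [CLS] vector for each input line from tmp.jsonl.
import json
import numpy as np

cls_vectors = []
with open('tmp.jsonl') as f:
    for line in f:
        record = json.loads(line)
        cls_feature = record['features'][0]              # first token should be [CLS]
        last_layer = cls_feature['layers'][0]['values']  # layer -1 was requested first
        cls_vectors.append(last_layer)

cls_vectors = np.array(cls_vectors)  # shape: [num_sentences, hidden_size]
print(cls_vectors.shape)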

@KavyaGujjala

Thanks a lot!!

Have you trained a model and got sentence representations?
How good was the output?
I ask because I have read that the [CLS] token is better after fine-tuning the model.

@Traeyee

Traeyee commented Mar 28, 2019

Not yet, but many people have used this as a basic step in their own work.

@chikubee

chikubee commented Aug 14, 2019

Hey, can you explain a little more how [CLS] captures the entire sentence's meaning?
I fine-tuned the BERT uncased small model for a text classification task.

I wanted to use the last-layer representation of the [CLS] token to understand the false positives.
For instance, I thought that finding the most similar representations in the training set would give me some insight into the wrong results.
But the top-k similar representations I get are not really similar.

Everywhere it is mentioned that the [CLS] token representation works for the fine-tuned task.
It works for my task; the accuracy is good.
But when interpreting the similar sentences, the story is otherwise.

What do you think?
Thanks in advance.
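For reference, the kind of nearest-neighbour check described above can be sketched as a top-k cosine-similarity lookup over [CLS] vectors; train_vecs and query_vec below are hypothetical placeholders, not data from this issue.

# Top-k cosine-similarity lookup over [CLS] vectors (illustrative sketch).
import numpy as np

def top_k_similar(query_vec, train_vecs, k=5):
    # Normalise, then rank training vectors by cosine similarity to the query.
    q = query_vec / np.linalg.norm(query_vec)
    t = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    sims = t @ q
    top_idx = np.argsort(-sims)[:k]
    return top_idx, sims[top_idx]

train_vecs = np.random.randn(1000, 768)  # placeholder for training-set [CLS] vectors
query_vec = np.random.randn(768)         # placeholder for a false-positive example
idx, scores = top_k_similar(query_vec, train_vecs)
print(idx, scores)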
