Why does the pooled_output use just the first token to represent the whole sentence? #196

Open
shcup opened this issue Nov 29, 2018 · 9 comments

@shcup

shcup commented Nov 29, 2018

No description provided.

@hanxiao

hanxiao commented Nov 29, 2018

Because the first token is [CLS], which is designed to be there and is later fine-tuned on the downstream task. Only after fine-tuning can [CLS], i.e. the first token, be a meaningful representation of the whole sentence.
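For reference, here is a minimal NumPy sketch, not the repo's exact code, of what the pooler in modeling.py does: it takes only the first token's hidden state and passes it through a dense layer with a tanh activation to produce pooled_output.

# Illustrative NumPy sketch only, not the actual modeling.py code:
# pooled_output is a dense + tanh transform of the first token's ([CLS]) hidden state.
import numpy as np

batch, seq_len, hidden = 2, 128, 768
sequence_output = np.random.randn(batch, seq_len, hidden)  # encoder output for every token
W = 0.02 * np.random.randn(hidden, hidden)                 # stands in for the learned pooler weights
b = np.zeros(hidden)

first_token = sequence_output[:, 0, :]        # hidden state of [CLS] only
pooled_output = np.tanh(first_token @ W + b)  # shape [batch, hidden]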

If you are interested in using (pretrained/fine-tuned) BERT for sentence encoding, please refer to my repo: https://github.com/hanxiao/bert-as-service. In particular, [CLS] isn't the only way to represent the sentence; please refer to this answer: https://github.com/hanxiao/bert-as-service#q-what-are-the-available-pooling-strategies
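As an example, here is a minimal client-side sketch using bert-as-service. It assumes a server has already been started, e.g. with bert-serving-start -model_dir /path/to/bert -pooling_strategy REDUCE_MEAN; check the repo's README for the exact flags of your version.

# bert-as-service client sketch; assumes the bert-serving-server is already running
# locally and the bert-serving-client package is installed.
from bert_serving.client import BertClient

bc = BertClient()                           # connects to localhost by default
vecs = bc.encode(['hey you', 'whats up?'])  # one fixed-size vector per sentence
print(vecs.shape)                           # e.g. (2, 768) for a base-size model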

@hanxiao

hanxiao commented Dec 7, 2018

By the way, here is a visualization that may help you understand the different BERT layers: https://github.com/hanxiao/bert-as-service#q-so-which-layer-and-which-pooling-strategy-is-the-best

@hitxujian

Why do you say that [CLS], i.e. the first token, represents the whole sentence only after fine-tuning? Why can't it represent the sentence before fine-tuning?

@Traeyee

Traeyee commented Mar 18, 2019

Because BERT is bidirectional, [CLS] is encoded with information from all tokens through the multi-layer encoding procedure, so its representation is different for different sentences.

@KavyaGujjala

Hi,
How can I get the [CLS] representation after running run_pretraining.py on domain-specific text?

I want sentence representations for my downstream tasks.

Any idea how to do this?

@Traeyee

Traeyee commented Mar 28, 2019

BERT_BASE_DIR="/home/cuiyi/repos/bert/model/chinese_L-12_H-768_A-12"

python extract_features.py \
  --input_file=./tmp.txt \
  --output_file=./tmp.jsonl \
  --vocab_file=$BERT_BASE_DIR/vocab.txt \
  --bert_config_file=$BERT_BASE_DIR/bert_config.json \
  --init_checkpoint=$BERT_BASE_DIR/bert_model.ckpt \
  --layers=-1,-2,-3,-4 \
  --max_seq_length=128 \
  --batch_size=8

Modify BERT_BASE_DIR to point to your new model path.
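Once tmp.jsonl is written, a short Python sketch like the one below can pull out the [CLS] vector per sentence. It assumes the JSONL layout produced by extract_features.py (one JSON object per input line, a "features" list whose first entry is the [CLS] token, and a per-token "layers" list ordered as in the --layers flag); adjust the keys if your version differs.

# Sketch: extract the last-layer [CLS] vector for each input line from tmp.jsonl.
import json
import numpy as np

cls_vectors = []
with open('tmp.jsonl') as f:
    for line in f:
        record = json.loads(line)
        cls_feature = record['features'][0]              # first token should be [CLS]
        last_layer = cls_feature['layers'][0]['values']  # layer -1 was requested first
        cls_vectors.append(last_layer)

cls_vectors = np.array(cls_vectors)  # shape: [num_sentences, hidden_size]
print(cls_vectors.shape)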

@KavyaGujjala

Thanks a lot!!

Have you trained a model and got sentence representations?
How good was the output?
I ask because I have read that the [CLS] token is better after fine-tuning the model.

@Traeyee

Traeyee commented Mar 28, 2019

Not yet, but many people have used this as a basic step in their own work.

@chikubee

chikubee commented Aug 14, 2019

Hey, can you explain a little more how [CLS] captures the entire sentence's meaning?
I fine-tuned the BERT uncased small model for a text classification task.

I wanted to use the last-layer representation of the [CLS] token to understand the false positives.
For instance, I thought that finding the most similar representations in the training set would give me some insight into the wrong results.
But the top-k similar representations I get are not really similar.

Everywhere it is mentioned that the [CLS] token representation works for the fine-tuned task.
It works for my task; the accuracy is good.
But when interpreting the similar sentences, the story is otherwise.

What do you think?
Thanks in advance.
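For reference, the kind of nearest-neighbour check described above can be sketched as a top-k cosine-similarity lookup over [CLS] vectors; train_vecs and query_vec below are hypothetical placeholders, not data from this issue.

# Top-k cosine-similarity lookup over [CLS] vectors (illustrative sketch).
import numpy as np

def top_k_similar(query_vec, train_vecs, k=5):
    # Normalise, then rank training vectors by cosine similarity to the query.
    q = query_vec / np.linalg.norm(query_vec)
    t = train_vecs / np.linalg.norm(train_vecs, axis=1, keepdims=True)
    sims = t @ q
    top_idx = np.argsort(-sims)[:k]
    return top_idx, sims[top_idx]

train_vecs = np.random.randn(1000, 768)  # placeholder for training-set [CLS] vectors
query_vec = np.random.randn(768)         # placeholder for a false-positive example
idx, scores = top_k_similar(query_vec, train_vecs)
print(idx, scores)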
