What does the output of feature-extraction pipeline represent? #4613

Closed
orenpapers opened this issue May 27, 2020 · 6 comments

@orenpapers

I am using the feature-extraction pipeline:

from transformers import pipeline

nlp_fe = pipeline('feature-extraction')
nlp_fe('there is a book on the desk')

As output I get a list with one element, which is itself a list of 9 elements, each of which is a list of 768 features (floats).
What does the output represent? What does each element of these lists stand for, and what is the meaning of the 768 float values?
Thanks

@Abhishek-Rnjn

They are embeddings generated by the model (a BERT-base-sized model, I guess, since it has a hidden representation of 768 dimensions). You get 9 elements: one contextual embedding for each token in your sequence, most likely the 7 words plus the [CLS] and [SEP] special tokens. The values of these embeddings represent hidden features that are not easy to interpret.
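
A quick way to inspect that structure (a minimal sketch, assuming the pipeline's default 768-dimensional checkpoint; numpy is only used to make the nested list's shape easy to read):

from transformers import pipeline
import numpy as np

nlp_fe = pipeline('feature-extraction')
features = nlp_fe('there is a book on the desk')

arr = np.array(features)
print(arr.shape)      # (1, 9, 768): 1 sequence, 9 tokens ([CLS] + 7 words + [SEP]), 768 hidden dimensions
print(arr[0, 0, :5])  # first few values of the first token's embedding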

@orenpapers

orenpapers commented May 28, 2020

So the pipeline will just return the last-layer encoding of BERT?
So what is the difference from code like:

input_ids = torch.tensor(bert_tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)
outputs = bert_model(input_ids)  # bert_model loaded with output_hidden_states=True
hidden_states = outputs[-1][1:]  # tuple of per-layer hidden states (embedding output dropped)
layer_hidden_state = hidden_states[n_layer]  # hidden state of the chosen layer
return layer_hidden_state

Also, do BERT encodings have similar traits to word2vec? E.g., will similar words be closer, France - Paris = England - London, etc.?

@stale

stale bot commented Jul 27, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the wontfix label Jul 27, 2020
@stale stale bot closed this as completed Aug 4, 2020
@merleyc

merleyc commented Apr 30, 2021

> So the pipeline will just return the last-layer encoding of BERT?
> So what is the difference from code like:
>
>     input_ids = torch.tensor(bert_tokenizer.encode("Hello, my dog is cute")).unsqueeze(0)
>     outputs = bert_model(input_ids)  # bert_model loaded with output_hidden_states=True
>     hidden_states = outputs[-1][1:]  # tuple of per-layer hidden states (embedding output dropped)
>     layer_hidden_state = hidden_states[n_layer]  # hidden state of the chosen layer
>     return layer_hidden_state
>
> Also, do BERT encodings have similar traits to word2vec? E.g., will similar words be closer, France - Paris = England - London, etc.?

Hi @orko19,
Did you understand the difference between 'hidden_states' and the 'feature-extraction' pipeline? I'd like to understand it as well.
Thanks!

@orenpapers

@merleyc I do not! Please share if you do :)

@allmwh

allmwh commented May 25, 2021

The outputs of "last_hidden_state" and the "feature-extraction" pipeline are the same; you can try it yourself.

The "feature-extraction" pipeline just handles the steps for us, from tokenizing the words to producing the embeddings.
