
the transformer to be applied to classification #18

Closed
hongjianyuan opened this issue Jul 27, 2020 · 9 comments
@hongjianyuan

How should I change the transformer so that it can be applied to classification, e.g. seq2seq (many-to-many)? What should I change in the last layer of the model?

@maxjcohen
Owner

Hi, I believe the most straightforward solution would be to keep the original architecture and only change the output module. Currently, I have a linear transformation followed by a sigmoid activation; I would start by simply replacing the activation with a softmax and see from there.
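
A minimal sketch of that change in PyTorch, assuming the output module is a linear projection followed by an activation (the class and attribute names below are hypothetical, not the repository's exact code):

```python
import torch
import torch.nn as nn

class ClassificationHead(nn.Module):
    """Hypothetical output module: a linear projection followed by a softmax
    over the class dimension, in place of the original sigmoid."""
    def __init__(self, d_model: int, num_classes: int):
        super().__init__()
        self.linear = nn.Linear(d_model, num_classes)

    def forward(self, x):
        # x: (batch, seq_len, d_model) -> (batch, seq_len, num_classes)
        return torch.softmax(self.linear(x), dim=-1)
```

Note that if you train with `nn.CrossEntropyLoss`, you would return the raw logits instead, since that loss applies log-softmax internally.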

@hongjianyuan
Author

I currently want to input 250 features, segment them, and output a category for each of these 250 features. So I just need to change the output module to a softmax?

@maxjcohen
Owner

Yes, set d_input=250, set d_output to the number of classes, and replace the sigmoid with a softmax; you should have a functional segmentation algorithm.
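
For illustration, a sketch of what that setup might look like, assuming the repository's `Transformer` class accepts `d_input`, `d_model` and `d_output` constructor arguments (the import path, the example sizes, and the constructor call below are assumptions; check the actual signature in the repository):

```python
import torch
from tst import Transformer  # hypothetical import path; adjust to the repo's layout

num_classes = 4   # example value
d_input = 250     # number of input features, as discussed above
d_model = 64      # hidden dimension, chosen arbitrarily here

# Assumed constructor call; the real class may require additional arguments
# (number of heads, number of layers, etc.).
net = Transformer(d_input=d_input, d_model=d_model, d_output=num_classes)

x = torch.randn(8, 100, d_input)        # (batch, time, d_input)
out = net(x)                            # expected shape: (batch, time, num_classes)
probs = torch.softmax(out, dim=-1)      # softmax in place of the original sigmoid
```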

@hongjianyuan
Author

Thank you very much

@hongjianyuan
Author

> Yes, set d_input=250, set d_output to the number of classes, and replace the sigmoid with a softmax; you should have a functional segmentation algorithm.

If the output is the category of each of these 250 features, then the output shape would be something like 250*4?

@MJimitater

Hi @maxjcohen, thanks for your great repo!

Is it possible to change the transformer to perform sequence classification (many-to-one)?

@maxjcohen
Owner

Hi, nothing is stopping you from setting d_output = 1, in order for the Transformer to behave as a many-to-one model. In practice, every hidden state will be computed with a dimension d_model, and later aggregated in the last layer to output a single value. Note that this process is different from how traditional architectures, such as RNN-based networks, handle many-to-one predictions.
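
As a rough illustration of the aggregation described above (a hypothetical head, not the repository's exact layer), the per-time-step hidden states of dimension d_model can be pooled and projected to a single value:

```python
import torch
import torch.nn as nn

class ManyToOneHead(nn.Module):
    """Hypothetical many-to-one output head: mean-pool the per-time-step
    hidden states, then project them to a single value."""
    def __init__(self, d_model: int):
        super().__init__()
        self.linear = nn.Linear(d_model, 1)

    def forward(self, hidden):
        # hidden: (batch, seq_len, d_model)
        pooled = hidden.mean(dim=1)   # aggregate across time steps
        return self.linear(pooled)    # (batch, 1)
```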

@MJimitater

Thank you for your reply @maxjcohen! How exactly do you mean it's different? Different from the way an RNN model would take hidden states as further input?

@maxjcohen
Owner

RNNs carry a memory-like hidden state across time steps, while the Transformer has no notion of memory and computes all time steps in parallel instead.
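
A small contrast of the two behaviours, using standard `torch.nn` modules rather than this repository's code:

```python
import torch
import torch.nn as nn

seq = torch.randn(8, 100, 32)  # (batch, time, features)

# RNN: a hidden state is threaded from one time step to the next (sequential memory).
rnn = nn.RNN(input_size=32, hidden_size=64, batch_first=True)
rnn_out, h_n = rnn(seq)  # h_n is the memory carried across steps

# Transformer encoder layer: all time steps attend to each other in parallel,
# with no hidden state carried through time.
enc = nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True)
enc_out = enc(seq)
```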
