What to change in secondary_code argument for mode --sentiment if I want to generate negative sentiment text? #1
Comments
One more thing I want to ask: what is the difference between gen_type --gpt-2 CCLM and gen_type --gedi? Both look similar, since both are conditioned on secondary_code and on a mode such as sentiment, detoxify, or topic. Thanks!
My last question: if I want to train GeDi on my own data, do I have to train the whole network, or is training just the last layer enough to learn the embeddings of the additional tokens?
Hi! To answer your questions:
To get negative sentiment,
@benkrause Thanks for your valuable responses. I have a few follow-ups to my last question. How should I build my labelled dataset? Your default AG News dataset has four topics, with one topic assigned to each sentence. Does my dataset also need exactly four topics, or can it have more or fewer? If the number of topics can change, which Python file or script should I update? Also, what is the second column in the AG News train and test files for? Its entries are only 4 to 5 words long, and I couldn't understand why they are needed. One last question: how can I label each sentence with a specific topic when all I have is a preprocessed text file of sentences? So far I have applied LDA to classify the sentences, but instead of one broad topic per sentence, such as politics, crime, or sports, I get a set of topics for each sentence. Thanks!
The second column of AG News is just the article titles; we don't actually use these. Our scripts only process the first and third columns. They assume the topic labels are in the first column (and start at 1) and the text is in the third column. If you want to train on your own topic dataset with minimal changes, first set up new csv files in the same format as the AG News train and test csv files: topic label IDs in the first column, a second column that can be blank since we ignore it anyway, and the text in the third column.
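The three-column layout described above can be sketched in Python as follows. This is only an illustration, not part of the GeDi scripts: the topic names, the example sentences, and the helper `write_topic_csv` are hypothetical, and the choice to quote every field and omit a header row is an assumption about the AG News csv format.

```python
import csv
import io


def write_topic_csv(rows, fileobj):
    """Write (label_id, text) pairs as: label ID, blank title, text."""
    # Quoting every field is an assumption about the AG News csv style.
    writer = csv.writer(fileobj, quoting=csv.QUOTE_ALL)
    for label_id, text in rows:
        if label_id < 1:
            # The reply above says labels are assumed to start at 1.
            raise ValueError("topic label IDs must start at 1")
        # The second column (article title) is ignored, so leave it blank.
        writer.writerow([label_id, "", text])


# Hypothetical topics and sentences, for illustration only.
topics = ["politics", "crime", "sports"]
examples = [
    ("politics", "Parliament votes on the new budget."),
    ("crime", "Police arrest two suspects after a robbery."),
    ("sports", "The home team wins the final in overtime."),
]

# Map each topic name to a 1-based label ID.
label_ids = {name: index + 1 for index, name in enumerate(topics)}
rows = [(label_ids[topic], sentence) for topic, sentence in examples]

# An in-memory buffer keeps the sketch self-contained; pass a real file
# opened with newline="" to write train and test csv files instead.
buffer = io.StringIO()
write_topic_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Each output row carries the label ID first, an empty second field, and the sentence third, which matches the column positions the reply says the scripts read.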
Once you have replaced the AG News train and test csv files with your own, you can process them into a dataset suitable for GeDi with
As for your last question on how to use unlabeled data, that is something we haven't explored yet; all our experiments so far have used labeled datasets. I will mention that GeDi can often generate text on topics it hasn't seen during training. For instance, if you run our topic GeDi trained on AG News (which was trained on "world", "sports", "business" and "science") and give it a secondary code of "crime", then depending on the prompt it should sometimes be able to generate text relating to crime. Hope this helps!
@benkrause Thanks for your valuable response, it really helped me a lot.
@akhileshgotmare I was trying to generate negative sentiment text instead of the default. How can I do that?
Thanks!