Running the Models on Text Data #5

Closed
jcui1224 opened this issue Oct 11, 2020 · 2 comments

Comments

@jcui1224

Hello,
I can't find the code in the repo to reproduce the results on the NLP datasets. I followed your instructions to preprocess the data and extract the BERT embeddings, but the training command seems to be missing, and train_semisup_flowgmm_tabular.py throws an unexpected error.
I tried adapting the code and command for the tabular data to run AG News. It runs, but the accuracy is quite low (~0.3), so I suspect I've missed something important.

Could you please share the code or provide some guidance to reproduce the NLP data results?

I really appreciate your help and time!

@mfinzi
Collaborator

mfinzi commented Oct 12, 2020

Hi Jiali,
Thanks for taking an interest in FlowGMM! Sorry about the disorganized state of the code for the NLP datasets. In the next couple of days we'll host the preprocessed versions of the datasets online, so you can avoid the pitfalls and inconvenience of going through these steps manually.

The command for training FlowGMM on Yahoo Answers is:

python flowgmm_tabular_new.py --trainer_config "{'unlab_weight':.2}" --net_config "{'k':1024,'coupling_layers':7,'nperlayer':1}" --network RealNVPTabularWPrior --trainer SemiFlow --num_epochs 200 --dataset YAHOO --lr 3e-4 --train 800

Likewise, for AG News the command is:

python flowgmm_tabular_new.py --trainer_config "{'unlab_weight':.6}" --net_config "{'k':1024,'coupling_layers':7,'nperlayer':1}" --network RealNVPTabularWPrior --trainer SemiFlow --num_epochs 100 --dataset AG_News --lr 3e-4 --train 200
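If you'd rather script both runs, here is a minimal Python sketch (not part of the FlowGMM repo) that builds the two commands above with exactly the same flags and values. It passes arguments as a list, so the quoted dict-literal configs reach the script unchanged regardless of shell quoting. It assumes you run it from the directory containing flowgmm_tabular_new.py; the RUNS table, the build_command helper, and the --run switch are hypothetical conveniences. The ast.literal_eval check only confirms that the config strings are well-formed Python dict literals. It says nothing about how the training script itself parses them.

```python
from __future__ import annotations

import ast
import shlex
import subprocess
import sys

# Shared network config, copied verbatim from the commands above.
NET_CONFIG = "{'k':1024,'coupling_layers':7,'nperlayer':1}"

# Per-dataset settings copied from the commands above (hypothetical table;
# keys are the values passed to --dataset).
RUNS = {
    "YAHOO": {"trainer_config": "{'unlab_weight':.2}", "num_epochs": "200", "train": "800"},
    "AG_News": {"trainer_config": "{'unlab_weight':.6}", "num_epochs": "100", "train": "200"},
}


def build_command(dataset: str) -> list[str]:
    """Return the argv list for training FlowGMM on `dataset` (hypothetical helper)."""
    run = RUNS[dataset]
    # Sanity check: both config strings should parse as Python dict literals.
    for cfg in (run["trainer_config"], NET_CONFIG):
        if not isinstance(ast.literal_eval(cfg), dict):
            raise ValueError(f"config is not a dict literal: {cfg}")
    return [
        sys.executable, "flowgmm_tabular_new.py",  # current interpreter instead of bare "python"
        "--trainer_config", run["trainer_config"],
        "--net_config", NET_CONFIG,
        "--network", "RealNVPTabularWPrior",
        "--trainer", "SemiFlow",
        "--num_epochs", run["num_epochs"],
        "--dataset", dataset,
        "--lr", "3e-4",
        "--train", run["train"],
    ]


def main() -> None:
    # By default only print shell-safe versions of the commands;
    # pass --run to actually launch training for each dataset in turn.
    launch = "--run" in sys.argv[1:]
    for dataset in RUNS:
        cmd = build_command(dataset)
        print(shlex.join(cmd))
        if launch:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

Running the sketch with no arguments just prints the commands for inspection; adding --run trains on Yahoo Answers and then AG News, stopping if either run exits with an error.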

Stay tuned: in the next couple of days we'll push an update that downloads these datasets automatically, as is already done for the other two tabular datasets.
Cheers!

@mfinzi
Collaborator

mfinzi commented Oct 13, 2020

We've updated the repo, and you should now be able to run the models by following the instructions in the README. After pip installing, running the commands above will download the datasets automatically, and you should get numbers similar to those in the table.

mfinzi closed this as completed Oct 14, 2020