
Reduce search space for AllenNLP example. #1542

Merged
merged 9 commits into from
Jul 31, 2020
Conversation

himkt
Member

@himkt himkt commented Jul 21, 2020

Motivation

The current allennlp_simple.py may be computationally expensive, and it sometimes causes a timeout.
https://github.com/optuna/optuna/actions/runs/175854473

Description of the changes

In this PR, I reduce the hyperparameter search space for the model.
Additionally, I extracted the dataset reader from the script into its own module because I'll also use it in allennlp_jsonnet.py.

I tested this new example on GitHub Actions to confirm that it reduced the execution time for each trial.
https://github.com/himkt/optuna/runs/894041267

optimizer = torch.optim.SGD(model.parameters(), lr=lr)

 data_loader = torch.utils.data.DataLoader(
-    train_dataset, batch_size=64, collate_fn=allennlp.data.allennlp_collate
+    train_dataset, batch_size=16, collate_fn=allennlp.data.allennlp_collate
Member Author

I changed the batch size, as I found that a smaller batch size improved accuracy.

@@ -105,11 +100,11 @@ def objective(trial):
if DEVICE > -1:
model.to(torch.device("cuda:{}".format(DEVICE)))

-lr = trial.suggest_float("lr", 1e-5, 1e-1, log=True)
+lr = trial.suggest_float("lr", 1e-2, 1e-1, log=True)
Member Author

I increased the lower bound of learning rate, as I found that model training didn't succeed at all.
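For context, `log=True` makes `suggest_float` sample uniformly in log space, so raising the lower bound simply cuts off the region where training failed without biasing the rest of the range. A minimal standard-library sketch of log-uniform sampling (illustrative only; the `log_uniform` helper is hypothetical, not Optuna's sampler internals):

```python
import math
import random

def log_uniform(low, high, rng=random):
    # Sample uniformly in log space, the behavior of
    # trial.suggest_float(..., log=True).
    return math.exp(rng.uniform(math.log(low), math.log(high)))

rng = random.Random(0)
samples = [log_uniform(1e-3, 1e-1, rng) for _ in range(10_000)]
assert all(1e-3 <= s <= 1e-1 for s in samples)

# Roughly half the draws fall below the geometric midpoint 1e-2,
# i.e. each decade of the range gets equal probability mass.
below = sum(s < 1e-2 for s in samples)
print(f"{below / len(samples):.3f}")  # close to 0.5
```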

Member Author

74c8a23 1e-2 -> 1e-3

Member

Sounds good 👍

@@ -123,7 +118,7 @@ def objective(trial):
validation_data_loader=validation_data_loader,
validation_metric="+" + TARGET_METRIC,
patience=None, # `patience=None` since it could conflict with AllenNLPPruningCallback
-            num_epochs=50,
+            num_epochs=30,
Member Author

I decreased the number of epochs to reduce the execution time of each trial.

examples/allennlp/allennlp_simple.py (resolved)
@codecov-commenter

Codecov Report

Merging #1542 into master will not change coverage.
The diff coverage is n/a.


@@           Coverage Diff           @@
##           master    #1542   +/-   ##
=======================================
  Coverage   89.15%   89.15%           
=======================================
  Files         104      104           
  Lines        7891     7891           
=======================================
  Hits         7035     7035           
  Misses        856      856           

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 4f2fa9c...b35111f.

@toshihikoyanase toshihikoyanase self-assigned this Jul 22, 2020
Member

@toshihikoyanase toshihikoyanase left a comment

Thank you for your PR. I confirmed that the trials finished in ten minutes, and it basically looks good to me except for a typo in the class name.



@DatasetReader.register("subsample")
class SubsampledDatasetReader(allennlp.data.dataset_readers.TextClassificationJsonReader):
Member
This class is called SubsampleDatasetReader in allennlp_simple.py. Could you tell me which is correct?

Suggested change
-class SubsampledDatasetReader(allennlp.data.dataset_readers.TextClassificationJsonReader):
+class SubsampleDatasetReader(allennlp.data.dataset_readers.TextClassificationJsonReader):

Member Author

Thank you for the review.
Initially, I created SubSampledDataset, which describes a reader for a subsampled dataset.
But I renamed it to SubsampleDataset since I can describe this class as a sub-sample dataset.
(And I thought it would be better to use only present-tense words in the filenames if possible.)

I'm not confident about this convention, so I'd like to get your feedback.
Do you think SubsampledDatasetReader is better?

Member

I think either one is OK, because I found use cases of both.

I prefer SubsampleDatasetReader for its simplicity, but it is not a strong opinion.

Member Author

@himkt himkt Jul 22, 2020

OK, so let me leave the dataset reader name as is.

islice ... It can be useful if the dataset is too large to fit in memory.
Thank you for pointing that out, you're right!
I gave islice a try in 4437f48 and found that the dataset is quite balanced.

train: Counter({'0': 1003, '1': 997})
validation: Counter({'1': 489, '0': 511})

I also ran HPO on a GPU to confirm that training went well.

python allennlp_simple.py
....
[I 2020-07-22 13:38:58,896] Trial 21 finished with value: 0.704 and parameters: {'dropout': 0.18415084463564746, 'embedding_dim': 33, 'output_dim': 69, 'max_filter_size': 4, 'num_filters': 55, 'lr': 0.09093981266227305}. Best is trial 12 with value: 0.725.
Number of finished trials:  22
Best trial:
  Value:  0.725
  Params:
    dropout: 0.31246648830863727
    embedding_dim: 22
    output_dim: 98
    max_filter_size: 4
    num_filters: 38
    lr: 0.09838055114180111

So I decided to remove train_test_split and simply use islice.
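The islice approach above can be sketched without AllenNLP; this is a toy stand-in (the `subsample` helper and the alternating label stream are hypothetical, chosen so the balance check mirrors the Counter output reported earlier):

```python
from collections import Counter
from itertools import islice

def subsample(instances, count):
    # Take only the first `count` instances, as with itertools.islice in
    # the PR; works on any iterable, so the full dataset never needs to
    # fit in memory — no train_test_split pass over all examples.
    return list(islice(instances, count))

# Toy stand-in for the dataset reader's label stream (hypothetical data:
# labels alternate, so the subsample is exactly balanced).
labels = ("0" if i % 2 == 0 else "1" for i in range(100_000))
train = subsample(labels, 2000)

print(Counter(train))  # exactly balanced for this toy stream
```

On the real IMDB stream the counts are only roughly balanced, matching the `Counter` numbers reported above.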

Member

@HideakiImamura HideakiImamura left a comment

Thanks for the PR! The changes look basically good to me. I have a minor comment.

examples/allennlp/subsample_dataset_reader.py (resolved)
Member

@HideakiImamura HideakiImamura left a comment

LGTM!

Member

@toshihikoyanase toshihikoyanase left a comment

LGTM! Thank you for your contribution.

@toshihikoyanase toshihikoyanase added this to the v2.1.0 milestone Jul 31, 2020
@toshihikoyanase toshihikoyanase merged commit 02651af into optuna:master Jul 31, 2020
@himkt himkt deleted the patch/reduce-allennlp-simple-search-space branch July 31, 2020 11:28
4 participants