-
-
Notifications
You must be signed in to change notification settings - Fork 985
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat - add index study_id column on trials table #4449
feat - add index study_id column on trials table #4449
Conversation
Thank you for your pull request!
|
Okay, I'll try it and share it with you. |
7d64fa7
to
3660246
Compare
Codecov Report
📣 This organization is not using Codecov’s GitHub App Integration. We recommend you install it so Codecov can continue to function properly for your repositories. Learn more @@ Coverage Diff @@
## master #4449 +/- ##
==========================================
+ Coverage 89.68% 90.33% +0.65%
==========================================
Files 178 184 +6
Lines 13974 14099 +125
==========================================
+ Hits 12532 12736 +204
+ Misses 1442 1363 -79
... and 24 files with indirect coverage changes 📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more |
If I were to run a large number of experiments per HPO, I would expect the |
This pull request has not seen any recent activity. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the late review. I'll review this PR today. Let me leave an early feedback for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the quick update! I left a minor suggestion.
Regarding the migration script, it looks good to me. The backward migration does not work in MySQL, but is acceptable since Optuna does not actually provide an API for backward migration to users.
I am checking for performance gains with this change using the following script.
https://gist.github.com/c-bata/c08fb89a583adbcdc3eddcf8cf192c1a
- In MySQL, the performance improvement could not be found because the index already exists due to foreign key constraints.
- In SQLite3, as you can see the following comment, no significant difference seems to exist with 50 studies (100 trials per each).
https://gist.github.com/c-bata/c08fb89a583adbcdc3eddcf8cf192c1a?permalink_comment_id=4500763#gistcomment-4500763
I will verify the performance on PostgreSQL tomorrow with more study and trial records.
I have confirmed with PostgreSQL that this change improves performance. I will approve this PR after my suggestion is reflected 👍 Benchmarking on PostgreSQLHere is a benchmark script and its result. Benchmark script: https://gist.github.com/c-bata/98532a60609a8a5f9e1e4dd162d45886 Before (master):
After (This PR):
According to the slow queries, a composite index of optuna-e2e sciriptsFor the second reviewer, let me share the optuna-e2e branch I used to check the migration script.
|
@Alnusjaponica Could you review this PR, please? |
Co-authored-by: Masashi Shibata <c-bata@users.noreply.github.com>
3385564
to
f551a89
Compare
f551a89
to
352a88d
Compare
@Ilevk Thank you for your contribution. As we discussed offline, @c-bata found out that this change might not have large affect on the performance (he'll share the data presently) and wonders what made this change affect the performance drastically in your environment. Could you provide us some information about what kind of job you're running or any reproducible codes? |
@Alnusjaponica We train about 5,000 models every day, and we run 5-7 HPOs per model. We have about 100 training instances. If you have a lot of connections happening at the same time, you seem to have the same problem as me. |
@Ilevk Thank you for your swift response. Could you also share which sampler you used and the number of trials per study? |
@c-bata we use a TPESampler & 5 ~ 7 trials each. almost 5 trials. |
There is one more bottleneck in our environment. Currently, RDBStorage creates engines internally with create_engine, and when accessed by many instances at the same time, the previously created engines & connections are not cleaned up and delay the next experiment. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM! I could see a clear performance improvement in the following scenario.
https://gist.github.com/c-bata/98532a60609a8a5f9e1e4dd162d45886
Before
optuna=# \d trials;
Table "public.trials"
Column | Type | Collation | Nullable | Default
-------------------+-----------------------------+-----------+----------+------------------------------------------
trial_id | integer | | not null | nextval('trials_trial_id_seq'::regclass)
number | integer | | |
study_id | integer | | |
state | trialstate | | not null |
datetime_start | timestamp without time zone | | |
datetime_complete | timestamp without time zone | | |
Indexes:
"trials_pkey" PRIMARY KEY, btree (trial_id)
Foreign-key constraints:
"trials_study_id_fkey" FOREIGN KEY (study_id) REFERENCES studies(study_id)
Referenced by:
TABLE "trial_heartbeats" CONSTRAINT "trial_heartbeats_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
TABLE "trial_intermediate_values" CONSTRAINT "trial_intermediate_values_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
TABLE "trial_params" CONSTRAINT "trial_params_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
TABLE "trial_system_attributes" CONSTRAINT "trial_system_attributes_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
TABLE "trial_user_attributes" CONSTRAINT "trial_user_attributes_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
TABLE "trial_values" CONSTRAINT "trial_values_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
# python profiler.py
Elapsed: 157.2042s (n_trials=500 n_params=10)
Sort by Total:
Total Time(s) Query Count Statement
43.5951 11000 SELECT trials.trial_id AS trials_trial_id
FROM trials
WHERE trials.study_id = %(study_id_1)s
40.9885 10000 SELECT trial_params.param_id AS trial_params_param_id, trial_params.trial_id AS trial_params_trial_id, trial_params.param_name AS trial_params_param_name, trial_params.param_value AS trial_params_param_value, trial_params.distribution_json AS trial_params_distribution_json
FROM trial_params JOIN trials ON trials.trial_id = trial_params.trial_id
WHERE trials.study_id = %(study_id_1)s AND trial_params.param_name = %(param_name_1)s
LIMIT %(param_1)s
4.7336 21000 SELECT trials.trial_id AS trials_trial_id, trials.number AS trials_number, trials.study_id AS trials_study_id, trials.state AS trials_state, trials.datetime_start AS trials_datetime_start, trials.datetime_complete AS trials_datetime_complete
FROM trials
WHERE trials.trial_id = %(trial_id_1)s
4.5883 1000 SELECT count(trials.trial_id) AS count_1
FROM trials
WHERE trials.study_id = %(study_id_1)s AND trials.trial_id < %(trial_id_1)s
3.5560 11100 SELECT studies.study_id AS studies_study_id, studies.study_name AS studies_study_name
FROM studies
WHERE studies.study_id = %(study_id_1)s
After
optuna=# create index ix_trials_study_id on trials(study_id);
CREATE INDEX
optuna=# \d trials;
Table "public.trials"
Column | Type | Collation | Nullable | Default
-------------------+-----------------------------+-----------+----------+------------------------------------------
trial_id | integer | | not null | nextval('trials_trial_id_seq'::regclass)
number | integer | | |
study_id | integer | | |
state | trialstate | | not null |
datetime_start | timestamp without time zone | | |
datetime_complete | timestamp without time zone | | |
Indexes:
"trials_pkey" PRIMARY KEY, btree (trial_id)
"ix_trials_study_id" btree (study_id)
Foreign-key constraints:
"trials_study_id_fkey" FOREIGN KEY (study_id) REFERENCES studies(study_id)
Referenced by:
TABLE "trial_heartbeats" CONSTRAINT "trial_heartbeats_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
TABLE "trial_intermediate_values" CONSTRAINT "trial_intermediate_values_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
TABLE "trial_params" CONSTRAINT "trial_params_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
TABLE "trial_system_attributes" CONSTRAINT "trial_system_attributes_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
TABLE "trial_user_attributes" CONSTRAINT "trial_user_attributes_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
TABLE "trial_values" CONSTRAINT "trial_values_trial_id_fkey" FOREIGN KEY (trial_id) REFERENCES trials(trial_id)
# python profiler.py
Elapsed: 66.0051s (n_trials=500 n_params=10)
Sort by Total:
Total Time(s) Query Count Statement
4.3927 21000 SELECT trials.trial_id AS trials_trial_id, trials.number AS trials_number, trials.study_id AS trials_study_id, trials.state AS trials_state, trials.datetime_start AS trials_datetime_start, trials.datetime_complete AS trials_datetime_complete
FROM trials
WHERE trials.trial_id = %(trial_id_1)s
3.0978 10000 SELECT trial_params.param_id AS trial_params_param_id, trial_params.trial_id AS trial_params_trial_id, trial_params.param_name AS trial_params_param_name, trial_params.param_value AS trial_params_param_value, trial_params.distribution_json AS trial_params_distribution_json
FROM trial_params JOIN trials ON trials.trial_id = trial_params.trial_id
WHERE trials.study_id = %(study_id_1)s AND trial_params.param_name = %(param_name_1)s
LIMIT %(param_1)s
2.9455 11100 SELECT studies.study_id AS studies_study_id, studies.study_name AS studies_study_name
FROM studies
WHERE studies.study_id = %(study_id_1)s
2.3931 10000 INSERT INTO trial_params (trial_id, param_name, param_value, distribution_json) VALUES (%(trial_id)s, %(param_name)s, %(param_value)s, %(distribution_json)s) RETURNING trial_params.param_id
1.8892 11000 SELECT trials.trial_id AS trials_trial_id, trials.number AS trials_number, trials.study_id AS trials_study_id, trials.state AS trials_state, trials.datetime_start AS trials_datetime_start, trials.datetime_complete AS trials_datetime_complete
FROM trials
WHERE trials.trial_id IN (NULL) AND (1 != 1) AND trials.study_id = %(study_id_1)s ORDER BY trials.trial_id
@Ilevk Thank you for sharing. The connection objects are basically cleaned up when the reference count of RDBStorage reaches zero, but if there are any connections left, they can be explicitly cleaned up as follows. storage = optuna.storages.RDBStorage(storage_url)
study = optuna.create_study(storage=storage)
study.optimize(objective, ...)
# Explicitly clean up connections
storage.engine.dispose() If you find any problems with the handling of connection objects in Optuna, please report them to us. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my delayed reply. I also run optimizations in the same scenario with two different versions and confirmed that it is the newer one is about twice as fast as the older one. LGTM.
Motivation
I have experienced bottleneck in using rdbstorage with access more than 100 sessions simultaneously.
contribute #4444
Description of the changes
study_id
on trials table