New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Error on GCP cluster using sqlite (sqlite3.OperationalError: too many SQL variables) #1457
Comments
Thanks for reporting this issue. I guess this could happen for any dialect and not just SQLite and should be fixed by Optuna. Digging into the cause for SQLite, it's likely related to the following, also explaining why you observe this at around 1000 trials.
https://www.sqlite.org/limits.html Until it's properly fixed by Optuna. There are a couple of workaround you could try.
|
For the record, #1426 should properly fix this issue but as noted in the PR, it has some issues. |
I somehow fail to reproduce your error. Can you provide more details on the failure such as your environment including the version of sqlite3, sqlalchemy, optuna (I assume it's v1.5.0) and maybe your script for running the code. I'd imagine that a hot-fix could look something like the following, #1467. Do you happen to be able to test it? |
Thanks I will try the solutions posted above (upgrade sqlite, mysql, cached storage). My current setup is quite computationally expensive (each objective function requires 15 min on 96 cores), but I can try to set up a simpler example to try out the hot-fix. Here's some info on the environment: This is in CentOS 7 virtual machines with a controller node where script runs and jobs are submitted; jobs run on compute nodes with NFS shared home folder with controller. Python 3.6.8
the script is not in a public repo, but the optuna calls are here: https://github.com/Neurosim-lab/netpyne/blob/optuna/netpyne/batch/optuna_parallel.py . I use a bash script to submit multiple instances with screen:
|
FYI, I am unable to reproduce the error on my laptop (Mac OS) using the same setup (50 processes and >1000 trials) but with fast objective func. So might be due to the HPC setup (CentOS, etc) or maybe because it's using NFS, ... I did reproduce the same issue twice now on the HPC after 1000 trials. |
Switched to using mysql (after installing required libraries) but getting the error below... any idea how to fix? thx
|
I'm trying your hot fix on the HPC with the full model. |
The hot fix seems to be working as it's already on trial 4490... thanks! |
Thanks for the detailed report, and great, maybe we can proceed with the hot fix. I also reproduced the issue/verified the fix by building sqlite3 from source with a lower
Note: |
Closing this issue as the fix #1467 has been merged. @salvadord, feel free to reopen as needed. |
I'm actually still getting this issue after around trial 1000; same setup as above (CentOS), but with optuna updated to v2.1: In [4]: sqlite3.version In [5]: sqlite3.sqlite_version In [10]: sqlalchemy.version In [14]: optuna.version Error: @hvy can you please confirm the hot fix is working ok in the main branch? thanks |
@hvy I tried reopening this issue, but don't think I have permissions to do this. |
Wondering why it's not caught in https://github.com/optuna/optuna/blob/master/optuna/storages/_rdb/storage.py#L979? Could you give repro steps, more context and possibly the full stack trace? |
The setup is the same described above (#1457 (comment)). I'll try to reproduce with simpler steps if possible. thx |
I was able to reproduce the issue with simpler code and figured out it was due to having the following line (one of the solutions suggested above a while ago): |
Thanks for the update. Also, I was mistaking your previous log as if an error was raised to the user code but taking a proper look I now see that the error is caught inside |
I am running optuna to optimize large-scale detailed models of cortical circuits on Google Cloud supercomputers. I integrated optuna within our modeling tool NetPyNE: https://github.com/Neurosim-lab/netpyne/blob/optuna/netpyne/batch/optuna_parallel.py
I was running 50 parallel processes on the controller node of a Google Cloud Platform (GCP) Slurm-based cluster. Each process is an optuna instance (run via 'screen') where the objective function submits a Slurm job to run a cortical simulation on 96-core compute job. Optuna was set up so the 50 processes exchange info via an sqlite db file:
At around trial 1000 I got the error
sqlite3.OperationalError: too many SQL variables
on all processes (see detailed error below).I know that the doc suggests using PostgreSQL or MySQL for large distributed optimization, particularly if using NFS drives (as do the compute nodes in the cluster). Unfortunately, I have no idea how to replace SQLite with PostgresSQL or MySQL. My question is, do you have any examples of how to set that up? Or have any other suggestions on how to fix this error?
Thanks!
Full error:
`[I 2020-06-30 00:53:40,502] Finished trial#996 with value: 409.0361137631617 with parameters: {'EEGain': 1.2190444451070857, 'EIGain': 1.810497756045793, "('IELayerGain', '1-3')": 1.087901880175065, "('IELayerGain', '4')": 1.7309720923711636, "('IELayerGain', '5')": 0.5960648412842655, "('IELayerGain', '6')": 0.7978043150339824, "('IILayerGain', '1-3')": 1.1688903007202631, "('IILayerGain', '4')": 0.672447638702098, "('IILayerGain', '5')": 0.8073360248421282, "('IILayerGain', '6')": 1.8855740649714237, 'thalamoCorticalGain': 1.08285373691784, 'intraThalamicGain': 1.7758770967297166, 'corticoThalamicGain': 1.6566204337609691}. Best is trial#375 with value: 294.2097451307643.
[W 2020-06-30 00:53:40,703] Setting status of trial#1046 as TrialState.FAIL because of the following error: OperationalError('(sqlite3.OperationalError) too many SQL variables',)
Traceback (most recent call last):
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1278, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
sqlite3.OperationalError: too many SQL variables
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/home/ext_salvadordura_gmail_com/.local/lib/python3.6/site-packages/optuna/study.py", line 734, in _run_trial
result = func(trial)
File "/home/ext_salvadordura_gmail_com/netpyne/netpyne/batch/optuna_parallel.py", line 387, in
study.optimize(lambda trial: objective(trial, args), n_trials=args['maxiters'], timeout=args['maxtime'])
File "/home/ext_salvadordura_gmail_com/netpyne/netpyne/batch/optuna_parallel.py", line 130, in objective
candidate.append(trial.suggest_uniform(str(paramLabel), minVal, maxVal))
File "/home/ext_salvadordura_gmail_com/.local/lib/python3.6/site-packages/optuna/trial/_trial.py", line 221, in suggest_uniform
return self._suggest(name, distribution)
File "/home/ext_salvadordura_gmail_com/.local/lib/python3.6/site-packages/optuna/trial/_trial.py", line 650, in _suggest
param_value = self.study.sampler.sample_independent(study, trial, name, distribution)
File "/home/ext_salvadordura_gmail_com/.local/lib/python3.6/site-packages/optuna/samplers/tpe/sampler.py", line 174, in sample_independent
values, scores = _get_observation_pairs(study, param_name, trial)
File "/home/ext_salvadordura_gmail_com/.local/lib/python3.6/site-packages/optuna/samplers/tpe/sampler.py", line 618, in _get_observation_pairs
for trial in study.get_trials(deepcopy=False):
File "/home/ext_salvadordura_gmail_com/.local/lib/python3.6/site-packages/optuna/study.py", line 145, in get_trials
return self._storage.get_all_trials(self._study_id, deepcopy=deepcopy)
File "/home/ext_salvadordura_gmail_com/.local/lib/python3.6/site-packages/optuna/storages/cached_storage.py", line 363, in get_all_trials
study_id, excluded_trial_ids=study.owned_or_finished_trial_ids
File "/home/ext_salvadordura_gmail_com/.local/lib/python3.6/site-packages/optuna/storages/rdb/storage.py", line 951, in _get_trials
models.TrialModel.study_id == study_id,
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3341, in all
return list(self)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3503, in iter
return self._execute_and_instances(context)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/orm/query.py", line 3528, in _execute_and_instances
result = conn.execute(querycontext.statement, self._params)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1014, in execute
return meth(self, multiparams, params)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/sql/elements.py", line 298, in _execute_on_connection
return connection._execute_clauseelement(self, multiparams, params)
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1133, in _execute_clauseelement
distilled_params,
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1318, in execute_context
e, statement, parameters, cursor, context
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1512, in handle_dbapi_exception
sqlalchemy_exception, with_traceback=exc_info[2], from=e
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/util/compat.py", line 178, in raise
raise exception
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/base.py", line 1278, in _execute_context
cursor, statement, parameters, context
File "/usr/local/lib64/python3.6/site-packages/sqlalchemy/engine/default.py", line 593, in do_execute
cursor.execute(statement, parameters)
sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) too many SQL variables
[SQL: SELECT trials.trial_id AS trials_trial_id, trials.number AS trials_number, trials.study_id AS trials_study_id, trials.state AS trials_state, trials.value AS trials_value, trials.datetime_start AS trials_datetime_start, trials.datetime_complete AS trials_datetime_complete
FROM trials
WHERE trials.trial_id NOT IN (?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?, ?) AND trials.study_id = ?]
[parameters: (1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90, 91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 122, 123, 124, 125, 126, 127, 128, 129, 130, 131, 132, 133, 134, 135, 136, 137, 138, 139, 140, 141, 142, 143, 144, 145, 146, 147, 148, 149, 150, 151, 152, 153, 154, 155, 156, 157, 158, 159, 160, 161, 162, 163, 164, 165, 166, 167, 168, 169, 170, 171, 172, 173, 174, 175, 176, 177, 178, 179, 180, 181, 182, 183, 184, 185, 186, 187, 188, 189, 190, 191, 192, 193, 194, 195, 196, 197, 198, 199, 200, 201, 202, 203, 204, 205, 206, 207, 208, 209, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219, 220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231, 232, 233, 234, 235, 236, 237, 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250, 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263, 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277, 278, 279, 280, 281, 282, 283, 284, 285, 286, 287, 288, 289, 290, 291, 292, 293, 294, 295, 296, 297, 298, 299, 300, 301, 302, 303, 304, 305, 306, 307, 308, 309, 310, 311, 312, 313, 314, 315, 316, 317, 318, 319, 320, 321, 322, 323, 324, 325, 326, 327, 328, 329, 330, 331, 332, 333, 334, 335, 336, 337, 338, 339, 340, 341, 342, 343, 344, 345, 346, 347, 348, 349, 350, 351, 352, 353, 354, 355, 356, 357, 358, 359, 360, 361, 362, 363, 364, 365, 366, 367, 368, 369, 370, 371, 372, 373, 374, 375, 376, 377, 378, 379, 380, 381, 382, 383, 384, 385, 386, 387, 388, 389, 390, 391, 392, 393, 394, 395, 396, 397, 398, 399, 400, 401, 402, 403, 404, 405, 406, 407, 408, 409, 410, 411, 412, 413, 414, 415, 416, 417, 418, 419, 420, 421, 422, 423, 424, 425, 426, 427, 428, 429, 430, 431, 432, 433, 434, 435, 436, 437, 438, 439, 440, 441, 442, 443, 444, 445, 446, 447, 448, 449, 450, 451, 452, 453, 454, 455, 456, 457, 458, 459, 460, 461, 462, 463, 464, 465, 466, 467, 468, 469, 470, 471, 472, 473, 474, 475, 476, 477, 478, 479, 480, 481, 482, 483, 484, 485, 486, 487, 488, 489, 490, 491, 492, 493, 494, 495, 496, 497, 498, 499, 500, 501, 502, 503, 504, 505, 506, 507, 508, 509, 510, 511, 512, 513, 514, 515, 516, 517, 518, 519, 520, 521, 522, 523, 524, 525, 526, 527, 528, 529, 530, 531, 532, 533, 534, 535, 536, 537, 538, 539, 540, 541, 542, 543, 544, 545, 546, 547, 548, 549, 550, 551, 552, 553, 554, 555, 556, 557, 558, 559, 560, 561, 562, 563, 564, 565, 566, 567, 568, 569, 570, 571, 572, 573, 574, 575, 576, 577, 578, 579, 580, 581, 582, 583, 584, 585, 586, 587, 588, 589, 590, 591, 592, 593, 594, 595, 596, 597, 598, 599, 600, 601, 602, 603, 604, 605, 606, 607, 608, 609, 610, 611, 612, 613, 614, 615, 616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 626, 627, 628, 629, 630, 631, 632, 633, 634, 635, 636, 637, 638, 639, 640, 641, 642, 643, 644, 645, 646, 647, 648, 649, 650, 651, 652, 653, 654, 655, 656, 657, 658, 659, 660, 661, 662, 663, 664, 665, 666, 667, 668, 669, 670, 671, 672, 673, 674, 675, 676, 677, 678, 679, 680, 681, 682, 683, 684, 685, 686, 687, 688, 689, 690, 691, 692, 693, 694, 695, 696, 697, 698, 699, 700, 701, 702, 703, 704, 705, 706, 707, 708, 709, 710, 711, 712, 713, 714, 715, 716, 717, 718, 719, 720, 721, 722, 723, 724, 725, 726, 727, 728, 729, 730, 731, 732, 733, 734, 735, 736, 737, 738, 739, 740, 741, 742, 743, 744, 745, 746, 747, 748, 749, 750, 751, 752, 753, 754, 755`
The text was updated successfully, but these errors were encountered: