Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v2.4.0 DB migration script corrupts saved studies over a certain size #2216

Closed
blaney83 opened this issue Jan 17, 2021 · 8 comments
Closed
Labels
bug Issue/PR about behavior that is broken. Not for typos/examples/CI/test but for Optuna itself.

Comments

@blaney83
Copy link

After update to v2.4.0, the DB migration script for migrating table schema from study DB's <2.4.0 ($ optuna storage upgrade --storage $STORAGE_URL) corrupts sqlite DB's over a certain size (number of trials).

Expected behavior

The migration/update to the table schema works fine for smaller tables, so I would expect it to work on larger tables as well. If not, perhaps a warning before it corrupts the db file.

Environment

  • Optuna version: 2.3.0 -> 2.4.0
  • Python version: 3.8
  • OS: Ubuntu 20.04.1, Linux Kernel v5.4.0-56
  • (Optional) Other libraries and their versions:

Error messages, stack traces, or logs

Traceback (most recent call last):
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 1284, in fetchall
    l = self.process_rows(self._fetchall_impl())
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 1230, in _fetchall_impl
    return self.cursor.fetchall()
sqlite3.DatabaseError: database disk image is malformed

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/optuna/storages/_rdb/storage.py", line 51, in _create_scoped_session
    yield session
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/optuna/storages/_rdb/storage.py", line 980, in _get_trials
    session.query(models.TrialModel)
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/orm/query.py", line 3373, in all
    return list(self)
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 100, in instances
    cursor.close()
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 88, in instances
    post_load.invoke(context, path)
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 908, in invoke
    loader(context, path, states, self.load_keys, *arg, **kw)
  File "<string>", line 1, in <lambda>
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/orm/strategies.py", line 2411, in _load_for_path
    self._load_via_parent(our_states, query_info, q, context)
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/orm/strategies.py", line 2469, in _load_via_parent
    for k, v in itertools.groupby(
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 100, in instances
    cursor.close()
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/util/langhelpers.py", line 68, in __exit__
    compat.raise_(
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/orm/loading.py", line 76, in instances
    fetch = cursor.fetchall()
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 1288, in fetchall
    self.connection._handle_dbapi_exception(
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1510, in _handle_dbapi_exception
    util.raise_(
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 182, in raise_
    raise exception
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 1284, in fetchall
    l = self.process_rows(self._fetchall_impl())
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/sqlalchemy/engine/result.py", line 1230, in _fetchall_impl
    return self.cursor.fetchall()
sqlalchemy.exc.DatabaseError: (sqlite3.DatabaseError) database disk image is malformed
(Background on this error at: http://sqlalche.me/e/13/4xp6)

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "optuna_vis.py", line 12, in <module>
    x = optuna.visualization.plot_slice(study)
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/optuna/visualization/_slice.py", line 80, in plot_slice
    return _get_slice_plot(study, params, target, target_name)
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/optuna/visualization/_slice.py", line 92, in _get_slice_plot
    trials = [trial for trial in study.trials if trial.state == TrialState.COMPLETE]
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/optuna/study.py", line 150, in trials
    return self.get_trials(deepcopy=True, states=None)
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/optuna/study.py", line 199, in get_trials
    self._storage.read_trials_from_remote_storage(self._study_id)
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/optuna/storages/_cached_storage.py", line 397, in read_trials_from_remote_storage
    trials = self._backend._get_trials(
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/optuna/storages/_rdb/storage.py", line 1015, in _get_trials
    trials = [self._build_frozen_trial_from_trial_model(trial) for trial in trial_models]
  File "/home/<user.name>/anaconda3/lib/python3.8/contextlib.py", line 131, in __exit__
    self.gen.throw(type, value, traceback)
  File "/home/<user.name>/anaconda3/lib/python3.8/site-packages/optuna/storages/_rdb/storage.py", line 69, in _create_scoped_session
    raise optuna.exceptions.StorageInternalError(message) from e
optuna.exceptions.StorageInternalError: An exception is raised during the commit. This typically happens due to invalid data in the commit, e.g. exceeding max length. 

Steps to reproduce

  1. Run a study with optuna v2.3.0, saving the study to a local sqlite db file
  2. Update optuna to v2.4.0
  3. Use $ optuna storage upgrade --storage $STORAGE_URL to update the study .db file schema
  4. Attempt to load the study using optuna.load_study(...)
  5. Over a certain size, the loading fails with the above exception. (I don't know the exact size, but I have one study with ~600 trials which loads fine and one with ~800 trials that doesn't. I imagine it depends on the number of parameters as well, so this likely depends on the HPO being performed)
@blaney83 blaney83 added the bug Issue/PR about behavior that is broken. Not for typos/examples/CI/test but for Optuna itself. label Jan 17, 2021
@HideakiImamura
Copy link
Member

HideakiImamura commented Jan 18, 2021

I test the DB migration using the following script.

import optuna

N_PARAMS = 30
N_TRIALS = 1500


def objective(trial):
    for i in range(N_PARAMS):
        trial.suggest_float(f"x{i}", 0, 1)
    return 1.


sampler = optuna.samplers.RandomSampler()
study = optuna.create_study(storage="sqlite:///test.db", study_name="test", sampler=sampler)
study.optimize(objective, n_trials=N_TRIALS)

I didn't have any errors after the upgrade and load_study. Could you share the reproducible code?

@blaney83
Copy link
Author

I cannot share the code directly, but your snippet accomplishes essentially the same thing. You ran the study with v2.3.0 and then successfully updated the study using the optuna CLI for v2.4.0? If so, perhaps it's not related to total number of trials but rather related to the size of the DB file created. I ran sqlite3_analyzer on the file and have included the output below. I couldn't find any values that violate any documented limits, but perhaps there is an internal limitation with the optuna update script? If not, perhaps this is an issue in one of the library dependencies.

/** Disk-Space Utilization Report For study.db

Page size in bytes................................ 4096      
Pages in the whole file (measured)................ 1173      
Pages in the whole file (calculated).............. 1173      
Pages that store data............................. 1173       100.0% 
Pages on the freelist (per header)................ 0            0.0% 
Pages on the freelist (calculated)................ 0            0.0% 
Pages of auto-vacuum overhead..................... 0            0.0% 
Number of tables in the database.................. 11        
Number of indices................................. 8         
Number of defined indices......................... 1         
Number of implied indices......................... 7         
Size of the file in bytes......................... 4804608   
Bytes of user payload stored...................... 3369300     70.1% 

*** Page counts for all tables with their indices *****************************

TRIAL_PARAMS...................................... 1114        95.0% 
TRIAL_USER_ATTRIBUTES............................. 24           2.0% 
TRIALS............................................ 21           1.8% 
ALEMBIC_VERSION................................... 2            0.17% 
STUDIES........................................... 2            0.17% 
STUDY_SYSTEM_ATTRIBUTES........................... 2            0.17% 
STUDY_USER_ATTRIBUTES............................. 2            0.17% 
TRIAL_SYSTEM_ATTRIBUTES........................... 2            0.17% 
TRIAL_VALUES...................................... 2            0.17% 
SQLITE_MASTER..................................... 1            0.085% 
VERSION_INFO...................................... 1            0.085% 

*** Page counts for all tables and indices separately *************************

TRIAL_PARAMS...................................... 852         72.6% 
SQLITE_AUTOINDEX_TRIAL_PARAMS_1................... 262         22.3% 
TRIALS............................................ 21           1.8% 
TRIAL_USER_ATTRIBUTES............................. 14           1.2% 
SQLITE_AUTOINDEX_TRIAL_USER_ATTRIBUTES_1.......... 10           0.85% 
ALEMBIC_VERSION................................... 1            0.085% 
IX_STUDIES_STUDY_NAME............................. 1            0.085% 
SQLITE_AUTOINDEX_ALEMBIC_VERSION_1................ 1            0.085% 
SQLITE_AUTOINDEX_STUDY_SYSTEM_ATTRIBUTES_1........ 1            0.085% 
SQLITE_AUTOINDEX_STUDY_USER_ATTRIBUTES_1.......... 1            0.085% 
SQLITE_AUTOINDEX_TRIAL_SYSTEM_ATTRIBUTES_1........ 1            0.085% 
SQLITE_AUTOINDEX_TRIAL_VALUES_1................... 1            0.085% 
SQLITE_MASTER..................................... 1            0.085% 
STUDIES........................................... 1            0.085% 
STUDY_SYSTEM_ATTRIBUTES........................... 1            0.085% 
STUDY_USER_ATTRIBUTES............................. 1            0.085% 
TRIAL_SYSTEM_ATTRIBUTES........................... 1            0.085% 
TRIAL_VALUES...................................... 1            0.085% 
VERSION_INFO...................................... 1            0.085% 

*** All tables and indices ****************************************************

Percentage of total database...................... 100.0%    
Number of entries................................. 62132     
Bytes of storage consumed......................... 4804608   
Bytes of payload.................................. 4248839     88.4% 
Bytes of metadata................................. 279574       5.8% 
Average payload per entry......................... 68.38     
Average unused bytes per entry.................... 4.45      
Average metadata per entry........................ 4.50      
Average fanout.................................... 116.00    
Maximum payload per entry......................... 390       
Entries that use overflow......................... 0            0.0% 
Index pages used.................................. 10        
Primary pages used................................ 1163      
Overflow pages used............................... 0         
Total pages used.................................. 1173      
Unused bytes on index pages....................... 24056       58.7% 
Unused bytes on primary pages..................... 252139       5.3% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 276195       5.7% 

*** All tables ****************************************************************

Percentage of total database......................  76.3%    
Number of entries................................. 31496     
Bytes of storage consumed......................... 3665920   
Bytes of payload.................................. 3372821     92.0% 
Bytes of metadata................................. 184362       5.0% 
Average payload per entry......................... 107.09    
Average unused bytes per entry.................... 3.45      
Average metadata per entry........................ 5.85      
Average fanout.................................... 176.00    
Maximum payload per entry......................... 390       
Entries that use overflow......................... 0            0.0% 
Index pages used.................................. 5         
Primary pages used................................ 890       
Overflow pages used............................... 0         
Total pages used.................................. 895       
Unused bytes on index pages....................... 13023       63.6% 
Unused bytes on primary pages..................... 95714        2.6% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 108737       3.0% 

*** All indices ***************************************************************

Percentage of total database......................  23.7%    
Number of entries................................. 30636     
Bytes of storage consumed......................... 1138688   
Bytes of payload.................................. 876018      76.9% 
Bytes of metadata................................. 95212        8.4% 
Average payload per entry......................... 28.59     
Average unused bytes per entry.................... 5.47      
Average metadata per entry........................ 3.11      
Average fanout.................................... 55.00     
Maximum payload per entry......................... 41        
Entries that use overflow......................... 0            0.0% 
Index pages used.................................. 5         
Primary pages used................................ 273       
Overflow pages used............................... 0         
Total pages used.................................. 278       
Unused bytes on index pages....................... 11033       53.9% 
Unused bytes on primary pages..................... 156425      14.0% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 167458      14.7% 

*** Table ALEMBIC_VERSION and all its indices *********************************

Percentage of total database......................   0.17%   
Number of entries................................. 2         
Bytes of storage consumed......................... 8192      
Bytes of payload.................................. 21           0.26% 
Bytes of metadata................................. 23           0.28% 
Average payload per entry......................... 10.50     
Average unused bytes per entry.................... 4074.00   
Average metadata per entry........................ 11.50     
Maximum payload per entry......................... 11        
Entries that use overflow......................... 0            0.0% 
Primary pages used................................ 2         
Overflow pages used............................... 0         
Total pages used.................................. 2         
Unused bytes on primary pages..................... 8148        99.46% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 8148        99.46% 

*** Table ALEMBIC_VERSION w/o any indices *************************************

Percentage of total database......................   0.085%  
Number of entries................................. 1         
Bytes of storage consumed......................... 4096      
Bytes of payload.................................. 10           0.24% 
Bytes of metadata................................. 12           0.29% 
B-tree depth...................................... 1         
Average payload per entry......................... 10.00     
Average unused bytes per entry.................... 4074.00   
Average metadata per entry........................ 12.00     
Maximum payload per entry......................... 10        
Entries that use overflow......................... 0            0.0% 
Primary pages used................................ 1         
Overflow pages used............................... 0         
Total pages used.................................. 1         
Unused bytes on primary pages..................... 4074        99.46% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 4074        99.46% 

*** Index SQLITE_AUTOINDEX_ALEMBIC_VERSION_1 of table ALEMBIC_VERSION *********

Percentage of total database......................   0.085%  
Number of entries................................. 1         
Bytes of storage consumed......................... 4096      
Bytes of payload.................................. 11           0.27% 
Bytes of metadata................................. 11           0.27% 
B-tree depth...................................... 1         
Average payload per entry......................... 11.00     
Average unused bytes per entry.................... 4074.00   
Average metadata per entry........................ 11.00     
Maximum payload per entry......................... 11        
Entries that use overflow......................... 0            0.0% 
Primary pages used................................ 1         
Overflow pages used............................... 0         
Total pages used.................................. 1         
Unused bytes on primary pages..................... 4074        99.46% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 4074        99.46% 

*** Table SQLITE_MASTER *******************************************************

Percentage of total database......................   0.085%  
Number of entries................................. 18        
Bytes of storage consumed......................... 4096      
Bytes of payload.................................. 3521        86.0% 
Bytes of metadata................................. 190          4.6% 
B-tree depth...................................... 1         
Average payload per entry......................... 195.61    
Average unused bytes per entry.................... 21.39     
Average metadata per entry........................ 10.56     
Maximum payload per entry......................... 390       
Entries that use overflow......................... 0            0.0% 
Primary pages used................................ 1         
Overflow pages used............................... 0         
Total pages used.................................. 1         
Unused bytes on primary pages..................... 385          9.4% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 385          9.4% 

*** Table STUDIES and all its indices *****************************************

Percentage of total database......................   0.17%   
Number of entries................................. 2         
Bytes of storage consumed......................... 8192      
Bytes of payload.................................. 81           0.99% 
Bytes of metadata................................. 23           0.28% 
Average payload per entry......................... 40.50     
Average unused bytes per entry.................... 4044.00   
Average metadata per entry........................ 11.50     
Maximum payload per entry......................... 45        
Entries that use overflow......................... 0            0.0% 
Primary pages used................................ 2         
Overflow pages used............................... 0         
Total pages used.................................. 2         
Unused bytes on primary pages..................... 8088        98.7% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 8088        98.7% 

*** Table STUDIES w/o any indices *********************************************

Percentage of total database......................   0.085%  
Number of entries................................. 1         
Bytes of storage consumed......................... 4096      
Bytes of payload.................................. 45           1.1% 
Bytes of metadata................................. 12           0.29% 
B-tree depth...................................... 1         
Average payload per entry......................... 45.00     
Average unused bytes per entry.................... 4039.00   
Average metadata per entry........................ 12.00     
Maximum payload per entry......................... 45        
Entries that use overflow......................... 0            0.0% 
Primary pages used................................ 1         
Overflow pages used............................... 0         
Total pages used.................................. 1         
Unused bytes on primary pages..................... 4039        98.6% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 4039        98.6% 

*** Index IX_STUDIES_STUDY_NAME of table STUDIES ******************************

Percentage of total database......................   0.085%  
Number of entries................................. 1         
Bytes of storage consumed......................... 4096      
Bytes of payload.................................. 36           0.88% 
Bytes of metadata................................. 11           0.27% 
B-tree depth...................................... 1         
Average payload per entry......................... 36.00     
Average unused bytes per entry.................... 4049.00   
Average metadata per entry........................ 11.00     
Maximum payload per entry......................... 36        
Entries that use overflow......................... 0            0.0% 
Primary pages used................................ 1         
Overflow pages used............................... 0         
Total pages used.................................. 1         
Unused bytes on primary pages..................... 4049        98.9% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 4049        98.9% 

*** Table STUDY_SYSTEM_ATTRIBUTES and all its indices *************************

Percentage of total database......................   0.17%   
Number of entries................................. 0         
Bytes of storage consumed......................... 8192      
Bytes of payload.................................. 0            0.0% 
Bytes of metadata................................. 16           0.20% 
Average payload per entry......................... 0.0       
Average unused bytes per entry.................... 0.0       
Average metadata per entry........................ 0.0       
Maximum payload per entry......................... 0         
Entries that use overflow......................... 0         
Primary pages used................................ 2         
Overflow pages used............................... 0         
Total pages used.................................. 2         
Unused bytes on primary pages..................... 8176        99.80% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 8176        99.80% 

*** Table STUDY_SYSTEM_ATTRIBUTES w/o any indices *****************************

Percentage of total database......................   0.085%  
Number of entries................................. 0         
Bytes of storage consumed......................... 4096      
Bytes of payload.................................. 0            0.0% 
Bytes of metadata................................. 8            0.20% 
B-tree depth...................................... 1         
Average payload per entry......................... 0.0       
Average unused bytes per entry.................... 0.0       
Average metadata per entry........................ 0.0       
Maximum payload per entry......................... 0         
Entries that use overflow......................... 0         
Primary pages used................................ 1         
Overflow pages used............................... 0         
Total pages used.................................. 1         
Unused bytes on primary pages..................... 4088        99.80% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 4088        99.80% 

*** Index SQLITE_AUTOINDEX_STUDY_SYSTEM_ATTRIBUTES_1 of table STUDY_SYSTEM_ATTRIBUTES 

Percentage of total database......................   0.085%  
Number of entries................................. 0         
Bytes of storage consumed......................... 4096      
Bytes of payload.................................. 0            0.0% 
Bytes of metadata................................. 8            0.20% 
B-tree depth...................................... 1         
Average payload per entry......................... 0.0       
Average unused bytes per entry.................... 0.0       
Average metadata per entry........................ 0.0       
Maximum payload per entry......................... 0         
Entries that use overflow......................... 0         
Primary pages used................................ 1         
Overflow pages used............................... 0         
Total pages used.................................. 1         
Unused bytes on primary pages..................... 4088        99.80% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 4088        99.80% 

*** Table STUDY_USER_ATTRIBUTES and all its indices ***************************

Percentage of total database......................   0.17%   
Number of entries................................. 0         
Bytes of storage consumed......................... 8192      
Bytes of payload.................................. 0            0.0% 
Bytes of metadata................................. 16           0.20% 
Average payload per entry......................... 0.0       
Average unused bytes per entry.................... 0.0       
Average metadata per entry........................ 0.0       
Maximum payload per entry......................... 0         
Entries that use overflow......................... 0         
Primary pages used................................ 2         
Overflow pages used............................... 0         
Total pages used.................................. 2         
Unused bytes on primary pages..................... 8176        99.80% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 8176        99.80% 

*** Table STUDY_USER_ATTRIBUTES w/o any indices *******************************

Percentage of total database......................   0.085%  
Number of entries................................. 0         
Bytes of storage consumed......................... 4096      
Bytes of payload.................................. 0            0.0% 
Bytes of metadata................................. 8            0.20% 
B-tree depth...................................... 1         
Average payload per entry......................... 0.0       
Average unused bytes per entry.................... 0.0       
Average metadata per entry........................ 0.0       
Maximum payload per entry......................... 0         
Entries that use overflow......................... 0         
Primary pages used................................ 1         
Overflow pages used............................... 0         
Total pages used.................................. 1         
Unused bytes on primary pages..................... 4088        99.80% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 4088        99.80% 

*** Index SQLITE_AUTOINDEX_STUDY_USER_ATTRIBUTES_1 of table STUDY_USER_ATTRIBUTES 

Percentage of total database......................   0.085%  
Number of entries................................. 0         
Bytes of storage consumed......................... 4096      
Bytes of payload.................................. 0            0.0% 
Bytes of metadata................................. 8            0.20% 
B-tree depth...................................... 1         
Average payload per entry......................... 0.0       
Average unused bytes per entry.................... 0.0       
Average metadata per entry........................ 0.0       
Maximum payload per entry......................... 0         
Entries that use overflow......................... 0         
Primary pages used................................ 1         
Overflow pages used............................... 0         
Total pages used.................................. 1         
Unused bytes on primary pages..................... 4088        99.80% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 4088        99.80% 

*** Table TRIAL_PARAMS and all its indices ************************************

Percentage of total database......................  95.0%    
Number of entries................................. 57960     
Bytes of storage consumed......................... 4562944   
Bytes of payload.................................. 4111573     90.1% 
Bytes of metadata................................. 261429       5.7% 
Average payload per entry......................... 70.94     
Average unused bytes per entry.................... 3.28      
Average metadata per entry........................ 4.51      
Average fanout.................................... 159.00    
Maximum payload per entry......................... 124       
Entries that use overflow......................... 0            0.0% 
Index pages used.................................. 7         
Primary pages used................................ 1107      
Overflow pages used............................... 0         
Total pages used.................................. 1114      
Unused bytes on index pages....................... 12258       42.8% 
Unused bytes on primary pages..................... 177684       3.9% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 189942       4.2% 

*** Table TRIAL_PARAMS w/o any indices ****************************************

Percentage of total database......................  72.6%    
Number of entries................................. 28980     
Bytes of storage consumed......................... 3489792   
Bytes of payload.................................. 3262513     93.5% 
Bytes of metadata................................. 171349       4.9% 
B-tree depth...................................... 3         
Average payload per entry......................... 112.58    
Average unused bytes per entry.................... 1.93      
Average metadata per entry........................ 5.91      
Average fanout.................................... 283.00    
Non-sequential pages.............................. 286         33.6% 
Maximum payload per entry......................... 124       
Entries that use overflow......................... 0            0.0% 
Index pages used.................................. 3         
Primary pages used................................ 849       
Overflow pages used............................... 0         
Total pages used.................................. 852       
Unused bytes on index pages....................... 5101        41.5% 
Unused bytes on primary pages..................... 50829        1.5% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 55930        1.6% 

*** Index SQLITE_AUTOINDEX_TRIAL_PARAMS_1 of table TRIAL_PARAMS ***************

Percentage of total database......................  22.3%    
Number of entries................................. 28980     
Bytes of storage consumed......................... 1073152   
Bytes of payload.................................. 849060      79.1% 
Bytes of metadata................................. 90080        8.4% 
B-tree depth...................................... 3         
Average payload per entry......................... 29.30     
Average unused bytes per entry.................... 4.62      
Average metadata per entry........................ 3.11      
Average fanout.................................... 65.00     
Non-sequential pages.............................. 257         98.5% 
Maximum payload per entry......................... 41        
Entries that use overflow......................... 0            0.0% 
Index pages used.................................. 4         
Primary pages used................................ 258       
Overflow pages used............................... 0         
Total pages used.................................. 262       
Unused bytes on index pages....................... 7157        43.7% 
Unused bytes on primary pages..................... 126855      12.0% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 134012      12.5% 

*** Table TRIAL_SYSTEM_ATTRIBUTES and all its indices *************************

Percentage of total database......................   0.17%   
Number of entries................................. 0         
Bytes of storage consumed......................... 8192      
Bytes of payload.................................. 0            0.0% 
Bytes of metadata................................. 16           0.20% 
Average payload per entry......................... 0.0       
Average unused bytes per entry.................... 0.0       
Average metadata per entry........................ 0.0       
Maximum payload per entry......................... 0         
Entries that use overflow......................... 0         
Primary pages used................................ 2         
Overflow pages used............................... 0         
Total pages used.................................. 2         
Unused bytes on primary pages..................... 8176        99.80% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 8176        99.80% 

*** Table TRIAL_SYSTEM_ATTRIBUTES w/o any indices *****************************

Percentage of total database......................   0.085%  
Number of entries................................. 0         
Bytes of storage consumed......................... 4096      
Bytes of payload.................................. 0            0.0% 
Bytes of metadata................................. 8            0.20% 
B-tree depth...................................... 1         
Average payload per entry......................... 0.0       
Average unused bytes per entry.................... 0.0       
Average metadata per entry........................ 0.0       
Maximum payload per entry......................... 0         
Entries that use overflow......................... 0         
Primary pages used................................ 1         
Overflow pages used............................... 0         
Total pages used.................................. 1         
Unused bytes on primary pages..................... 4088        99.80% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 4088        99.80% 

*** Index SQLITE_AUTOINDEX_TRIAL_SYSTEM_ATTRIBUTES_1 of table TRIAL_SYSTEM_ATTRIBUTES 

Percentage of total database......................   0.085%  
Number of entries................................. 0         
Bytes of storage consumed......................... 4096      
Bytes of payload.................................. 0            0.0% 
Bytes of metadata................................. 8            0.20% 
B-tree depth...................................... 1         
Average payload per entry......................... 0.0       
Average unused bytes per entry.................... 0.0       
Average metadata per entry........................ 0.0       
Maximum payload per entry......................... 0         
Entries that use overflow......................... 0         
Primary pages used................................ 1         
Overflow pages used............................... 0         
Total pages used.................................. 1         
Unused bytes on primary pages..................... 4088        99.80% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 4088        99.80% 

*** Table TRIAL_USER_ATTRIBUTES and all its indices ***************************

Percentage of total database......................   2.0%    
Number of entries................................. 3308      
Bytes of storage consumed......................... 98304     
Bytes of payload.................................. 68886       70.1% 
Bytes of metadata................................. 13433       13.7% 
Average payload per entry......................... 20.82     
Average unused bytes per entry.................... 4.83      
Average metadata per entry........................ 4.06      
Average fanout.................................... 11.00     
Maximum payload per entry......................... 30        
Entries that use overflow......................... 0            0.0% 
Index pages used.................................. 2         
Primary pages used................................ 22        
Overflow pages used............................... 0         
Total pages used.................................. 24        
Unused bytes on index pages....................... 7864        96.0% 
Unused bytes on primary pages..................... 8121         9.0% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 15985       16.3% 

*** Table TRIAL_USER_ATTRIBUTES w/o any indices *******************************

Percentage of total database......................   1.2%    
Number of entries................................. 1654      
Bytes of storage consumed......................... 57344     
Bytes of payload.................................. 41975       73.2% 
Bytes of metadata................................. 8355        14.6% 
B-tree depth...................................... 2         
Average payload per entry......................... 25.38     
Average unused bytes per entry.................... 4.24      
Average metadata per entry........................ 5.05      
Average fanout.................................... 13.00     
Non-sequential pages.............................. 12          92.3% 
Maximum payload per entry......................... 30        
Entries that use overflow......................... 0            0.0% 
Index pages used.................................. 1         
Primary pages used................................ 13        
Overflow pages used............................... 0         
Total pages used.................................. 14        
Unused bytes on index pages....................... 3988        97.4% 
Unused bytes on primary pages..................... 3026         5.7% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 7014        12.2% 

*** Index SQLITE_AUTOINDEX_TRIAL_USER_ATTRIBUTES_1 of table TRIAL_USER_ATTRIBUTES 

Percentage of total database......................   0.85%   
Number of entries................................. 1654      
Bytes of storage consumed......................... 40960     
Bytes of payload.................................. 26911       65.7% 
Bytes of metadata................................. 5078        12.4% 
B-tree depth...................................... 2         
Average payload per entry......................... 16.27     
Average unused bytes per entry.................... 5.42      
Average metadata per entry........................ 3.07      
Average fanout.................................... 10.00     
Non-sequential pages.............................. 8           88.9% 
Maximum payload per entry......................... 20        
Entries that use overflow......................... 0            0.0% 
Index pages used.................................. 1         
Primary pages used................................ 9         
Overflow pages used............................... 0         
Total pages used.................................. 10        
Unused bytes on index pages....................... 3876        94.6% 
Unused bytes on primary pages..................... 5095        13.8% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 8971        21.9% 

*** Table TRIAL_VALUES and all its indices ************************************

Percentage of total database......................   0.17%   
Number of entries................................. 0         
Bytes of storage consumed......................... 8192      
Bytes of payload.................................. 0            0.0% 
Bytes of metadata................................. 16           0.20% 
Average payload per entry......................... 0.0       
Average unused bytes per entry.................... 0.0       
Average metadata per entry........................ 0.0       
Maximum payload per entry......................... 0         
Entries that use overflow......................... 0         
Primary pages used................................ 2         
Overflow pages used............................... 0         
Total pages used.................................. 2         
Unused bytes on primary pages..................... 8176        99.80% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 8176        99.80% 

*** Table TRIAL_VALUES w/o any indices ****************************************

Percentage of total database......................   0.085%  
Number of entries................................. 0         
Bytes of storage consumed......................... 4096      
Bytes of payload.................................. 0            0.0% 
Bytes of metadata................................. 8            0.20% 
B-tree depth...................................... 1         
Average payload per entry......................... 0.0       
Average unused bytes per entry.................... 0.0       
Average metadata per entry........................ 0.0       
Maximum payload per entry......................... 0         
Entries that use overflow......................... 0         
Primary pages used................................ 1         
Overflow pages used............................... 0         
Total pages used.................................. 1         
Unused bytes on primary pages..................... 4088        99.80% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 4088        99.80% 

*** Index SQLITE_AUTOINDEX_TRIAL_VALUES_1 of table TRIAL_VALUES ***************

Percentage of total database......................   0.085%  
Number of entries................................. 0         
Bytes of storage consumed......................... 4096      
Bytes of payload.................................. 0            0.0% 
Bytes of metadata................................. 8            0.20% 
B-tree depth...................................... 1         
Average payload per entry......................... 0.0       
Average unused bytes per entry.................... 0.0       
Average metadata per entry........................ 0.0       
Maximum payload per entry......................... 0         
Entries that use overflow......................... 0         
Primary pages used................................ 1         
Overflow pages used............................... 0         
Total pages used.................................. 1         
Unused bytes on primary pages..................... 4088        99.80% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 4088        99.80% 

*** Table TRIALS **************************************************************

Percentage of total database......................   1.8%    
Number of entries................................. 841       
Bytes of storage consumed......................... 86016     
Bytes of payload.................................. 64747       75.3% 
Bytes of metadata................................. 4400         5.1% 
B-tree depth...................................... 2         
Average payload per entry......................... 76.99     
Average unused bytes per entry.................... 20.06     
Average metadata per entry........................ 5.23      
Average fanout.................................... 20.00     
Non-sequential pages.............................. 19          95.0% 
Maximum payload per entry......................... 78        
Entries that use overflow......................... 0            0.0% 
Index pages used.................................. 1         
Primary pages used................................ 20        
Overflow pages used............................... 0         
Total pages used.................................. 21        
Unused bytes on index pages....................... 3934        96.0% 
Unused bytes on primary pages..................... 12935       15.8% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 16869       19.6% 

*** Table VERSION_INFO ********************************************************

Percentage of total database......................   0.085%  
Number of entries................................. 1         
Bytes of storage consumed......................... 4096      
Bytes of payload.................................. 10           0.24% 
Bytes of metadata................................. 12           0.29% 
B-tree depth...................................... 1         
Average payload per entry......................... 10.00     
Average unused bytes per entry.................... 4074.00   
Average metadata per entry........................ 12.00     
Maximum payload per entry......................... 10        
Entries that use overflow......................... 0            0.0% 
Primary pages used................................ 1         
Overflow pages used............................... 0         
Total pages used.................................. 1         
Unused bytes on primary pages..................... 4074        99.46% 
Unused bytes on overflow pages.................... 0         
Unused bytes on all pages......................... 4074        99.46% 

@HideakiImamura
Copy link
Member

You ran the study with v2.3.0 and then successfully updated the study using the optuna CLI for v2.4.0?

Yes, I did it. Do you have no error if you use my code snippet? If so, I suspect that it is an issue of your local environment. It may be one of the library you use. Is not so, it should be an issue of the DB migration script.

For the reproducibility, please tell me the number of trials and the number of parameters. From the analysis results shared with you, it seems that trial parameters are dominant in the DB.

@blaney83
Copy link
Author

blaney83 commented Jan 19, 2021

Thank you for your response. I ran your script with v2.3.0 and then updated the db using the v2.4.0 cli (reproducing the original conditions) and the migration was successful:

optuna storage upgrade --storage sqlite:///test.db
[I 2021-01-19 12:46:11,005] Upgrading the storage schema to the latest version.
[I 2021-01-19 12:46:11,101] Completed to upgrade the storage.

The number of parameter was 34.

The number of trials defined in the creation of the study was 1000. The optimization was running in a distributed environment with a remote file system providing the db for managing the trials. The study itself continued beyond 1000 trials (which was fine) and I stopped it manually at 1086. I was running multiple studies in parallel too (with a different study db, but same logic). The other study db, which was set for 1000 trials was stopped at number 946. This db file and the migration were successful.

This could be related. I should have some time later to run a test which exceeds the initially specified number of trials and see if that is indeed the cause of the issue.

@HideakiImamura
Copy link
Member

I ran your script with v2.3.0 and then updated the db using the v2.4.0 cli (reproducing the original conditions) and the migration was successful:

Thank you. Then the failure of the DB migration may be due to the script.

I should have some time later to run a test which exceeds the initially specified number of trials and see if that is indeed the cause of the issue.

That's reasonable. I look forward to the results of the investigation.

@blaney83
Copy link
Author

After trying to reproduce locally with a modified version of your sample study script, I failed to be able to reproduce the issue.

Tests I conducted:

  • studies with db file sizes 10X the size of the two corrupted study files I initially noticed the issue with
  • overrunning the initially specified number of trials
  • running studies in parallel
  • defining user attributes during trials
  • rw of the study over a network connection

Because of this, I'm currently of the opinion that the issue is likely specific given my particular distributed setup and my abuse of the sqlite db: NFS, running in parallel, read/write over a network connection, etc.- many places for the introduction of db conflicts. Additionally, I don't foresee any issues moving forward. I have migrated all of my optimization scripts over to 2.4.0 and for distributed systems, switched over to a more robust RDB cloud backend.

My only suggestion might be a more intuitive log message with a link to the docu suggestions for the RDB configuration, but otherwise, I think this can be closed unless you deem further investigation necessary.

@HideakiImamura
Copy link
Member

Thank you for your careful and insightful reply.

Additionally, I don't foresee any issues moving forward.

Great! If you encounter any problems, please feel free to raise an issue.

My only suggestion might be a more intuitive log message with a link to the docu suggestions for the RDB configuration

Sounds good to me, but I don't have any concrete ideas to improve the log message. Do you have any idea? If you have one, I welcome you to send us a PR!

@HideakiImamura
Copy link
Member

Let me close this issue since this is addressed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Issue/PR about behavior that is broken. Not for typos/examples/CI/test but for Optuna itself.
Projects
None yet
Development

No branches or pull requests

2 participants