New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Apply EAV data model to system attributes. #162
Conversation
toshihikoyanase
commented
Aug 23, 2018
- System attributes are converted to a JSON-dumped string, and stored as an entry of user attributes.
- If users uses RDB backend, errors occur when dumped system attributes exceed max length of an entry of user attributes.
- This PR apply EAV data model to system attributes to avoid such errors.
- Schema version will be updated from 5 to 7 because version 6 will be assigned to this change about study names.
|
@@ -24,7 +23,7 @@ | |||
'json_serializable': {'baseline_score': 0.001, 'tags': ['image', 'classification']}, | |||
} | |||
|
|||
EXAMPLE_USER_ATTRS = dict(EXAMPLE_SYSTEM_ATTRS, **{SYSTEM_ATTRS_KEY: {}}) # type: Dict[str, Any] | |||
EXAMPLE_USER_ATTRS = EXAMPLE_SYSTEM_ATTRS # type: Dict[str, Any] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now, EXAMPLE_USER_ATTRS
and EXAMPLE_SYSTEM_ATTRS
have the same structure. I think we can use a common variable name such as: EXAMPLE_ATTRS
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your suggestion. Those two variables are merged into EXAMPLE_ATTRS
.
pfnopt/storages/rdb/storage.py
Outdated
|
||
study = models.StudyModel.find_or_raise_by_id(study_id, session) | ||
system_attr = models.StudySystemAttributeModel.find_by_study_and_key(study, key, session) | ||
# TODO(Yanase): KeyError may be inconsistent with ValueError raised by missing study_id. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for raising attention for this matter. As far as referring Python document, KeyError
is supposed to be used in a dictionary. Taking this also in consideration, ValueError
seems more appropriate here.
Raised when a mapping (dictionary) key is not found in the set of existing keys.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for your answer to my question. This KeyError
was replaced with ValueError
. Corresponding method in in_memory.py is also fixed to keep the consistency.
pfnopt/storages/rdb/storage.py
Outdated
else: | ||
attribute.value_json = json.dumps(value) | ||
|
||
session.commit() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We have better take care of IntegrityError
; otherwise, this line may fail on multi-worker environment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with you. The set methods for both study
and trial
need this fix. As discussed offline, I will resolve this issue in another PR because it requires a certain amount of changes.
pfnopt/storages/rdb/storage.py
Outdated
system_attr = models.StudySystemAttributeModel.find_by_study_and_key(study, key, session) | ||
if system_attr is None: | ||
raise ValueError( | ||
'System attribute {} does not exist in Study {}.'.format(key, study_id)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
study_id
should not be taken care of by a typical user. We may wanna display study name instead. (..or I think it's also OK not to display the information, such as "the study", to avoid complicated merge dependency.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. study_id
should not be visible to ordinal users, and we need to use study_uuid
or study_name
for this purpose. At this time, I just use "the study" to avoid merge dependency as you mentioned. Thanks.
pfnopt/storages/rdb/models.py
Outdated
@@ -18,7 +18,7 @@ | |||
from pfnopt.structs import StudyTask | |||
from pfnopt.structs import TrialState | |||
|
|||
SCHEMA_VERSION = 6 | |||
SCHEMA_VERSION = 8 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you may notice, let's take care of version number before merging.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Except SCHEMA_VERSION
, LGTM!
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. My concern is that there is some inconsistency between system attrs and user attrs (e.g., get_trial_user_attrs
and get_trial_system_attr
), but this should be addressed in later PRs.