Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Serialize only options that have changed from the some default #648

Merged
merged 10 commits into from
Oct 31, 2023

Conversation

mrambacher
Copy link
Contributor

This PR extends #619 and adds a "compare_to" field to the ConfigOptions. If this value is set, when a configurable value is serialized (e.g. Configurable::ToString), only values that do not match the values in "compare_to" are serialized.

This feature has a couple of potential uses:

  • When writing an Options file, we could serialize only the values that do not match the default. In other words, only serialize values that were explicitly set
  • When writing an Options file, only serialize values that do not match some "good" value. For this use case, we could only serialize ColumnFamilyOptions that are different from the default column family (for example).
  • Only write values to the Log file that are not the default. If/when the Options code is used to generate the values in used via Dump(), then we can restrict the output of Dump to only the values that have been changed from the default (same as with the Options file).

Converts an object to its serialized name-value properties.
Adds a "compare_to" to the ConfigOptions.  If set, the serialize methods will only serialize values that do not match this value.
@udi-speedb udi-speedb requested review from ofriedma and removed request for udi-speedb September 24, 2023 02:18
Copy link
Contributor

@ofriedma ofriedma left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will it be backward compatible ? if on version 10 the default for config z will be 1 and the user sets the value so it will not be written to the options and then the use upgrades to version 10.1 and there the value z = 2, the user will get z = 2 although the user insert z = 1, is it true?

Why is it important for us to minimize options file? it is not a big file anyway

About the options printing in the logs, I am not sure, does it print the default options of the current binary, so the user would be able to compare what have changed (printing it a single time is not that much space)?

@mrambacher
Copy link
Contributor Author

Will it be backward compatible ? if on version 10 the default for config z will be 1 and the user sets the value so it will not be written to the options and then the use upgrades to version 10.1 and there the value z = 2, the user will get z = 2 although the user insert z = 1, is it true?

If the default value of an option changed between releases and the option was not written to the Options file, then the new default would be used. For Options files, this could be circumvented by writing out all of the Options for the Default column family and only those that are different than the Default for the remaining column families.

Why is it important for us to minimize options file? it is not a big file anyway

When a Column Family is added or options are changed, there is a great deal of time spent processing the options file (most of this is spent in validating the options file). By dumping fewer changes, the process can be sped up.

About the options printing in the logs, I am not sure, does it print the default options of the current binary, so the user would be able to compare what have changed (printing it a single time is not that much space)?

When written to the Logger file, the Options are currently truncated -- we do not print more than 10 Column Families. This is because the Options take up so much space in the options file. One way to truncate that is to only print the options that are not the default.

There are a few other uses for this feature as well. When invoking "GetColumnFamiliesFromString/Map, one of the first things that happens is the current ColumnFamily options are serialized. This is necessary to restore the options if the process fails. But there is no reason to serialize any option that is not being set and is not the default. For example, if the column family has all default options and the method was only updating one option (e.g. "disable_auto_compactions=true"), we currently convert ALL of the options in the ColumnFamily into a string so that we can restore that one option if necessary. It would be better to only store the options that were changed (from default) in case we needed to do the restore.

Note that all of these use cases are outside of this PR. If we decide to implement any of them, they will be done separately. This PR provides the infrastructure to enable those use cases to be implemented in the future.

@ofriedma
Copy link
Contributor

If the default value of an option changed between releases and the option was not written to the Options file, then the new default would be used. For Options files, this could be circumvented by writing out all of the Options for the Default column family and only those that are different than the Default for the remaining column families.

I am not sure it will solve the backward/forward compatibility, what happens if the options changed on another CF other than default?

When written to the Logger file, the Options are currently truncated -- we do not print more than 10 Column Families. This is because the Options take up so much space in the options file. One way to truncate that is to only print the options that are not the default.

What I have meant is to put the default values of the db and then print the changes foreach column family, so the user will know the current options.

@mrambacher
Copy link
Contributor Author

If the default value of an option changed between releases and the option was not written to the Options file, then the new default would be used. For Options files, this could be circumvented by writing out all of the Options for the Default column family and only those that are different than the Default for the remaining column families.

I am not sure it will solve the backward/forward compatibility, what happens if the options changed on another CF other than default?

(This discussion is all theoretical, since this PR does not change how Options files are written). My assumption is that, if we chose to write less output to the Options file, we would always write out the default column family as-is (dump all of the options). For any other CF, we only write out those that are different than the default. This would mean that the default CF will not matter if the default options change. For any other CF, we start with the value of "Default" (rather than an uninitialized CF) as the basis over which we apply any deltas. Again, this is all theoretical and would need more information written into the Options files to tell that we are using that means.

When written to the Logger file, the Options are currently truncated -- we do not print more than 10 Column Families. This is because the Options take up so much space in the options file. One way to truncate that is to only print the options that are not the default.

What I have meant is to put the default values of the db and then print the changes foreach column family, so the user will know the current options.

We could possibly do that (I have not looked in that much detail at the Dump/Log code). But that is outside of the scope of this PR -- this is only the infrastructure that would allow us to do something like that.

@ofriedma ofriedma self-requested a review October 24, 2023 11:25
ofriedma
ofriedma previously approved these changes Oct 24, 2023
@udi-speedb
Copy link
Contributor

udi-speedb commented Oct 29, 2023

@mrambacher, @ofriedma - Is anything left to do here or is this may be merged?
If it is ready to be merged, please move to "Ready for merge" status.
Thanks

@ofriedma
Copy link
Contributor

@Yuval-Ariel ready to be merge

Copy link
Contributor

@udi-speedb udi-speedb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@mrambacher, @ofriedma - I have re-approved the PR to trigger the CI and verify that it passes before merging.

@udi-speedb udi-speedb self-requested a review October 29, 2023 14:48
udi-speedb
udi-speedb previously approved these changes Oct 29, 2023
@udi-speedb
Copy link
Contributor

@mrambacher - License check fails. Please fix. Thanks

@mrambacher mrambacher dismissed stale reviews from udi-speedb and ofriedma via 6df249d October 30, 2023 11:57
@udi-speedb
Copy link
Contributor

@mrambacher, @ofriedma - Ready for merge?

@mrambacher
Copy link
Contributor Author

@mrambacher, @ofriedma - I have re-approved the PR to trigger the CI and verify that it passes before merging.

@udi-speedb I believe it is ready for merge.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants