
logical_count: improve memory usage #1341

Merged (12 commits) on Apr 5, 2022.

HashidaTKS (contributor) commented on Mar 29, 2022:

Overall, I have changed logical_count to unref shards and close temporary objects as soon as they are no longer referenced, as logical_range_filter already does.
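The effect of releasing objects early can be illustrated with a minimal plain-Ruby sketch. It does not use the Groonga API: Tracker and TemporaryResult are hypothetical stand-ins for shards and temporary tables, and "live objects" is only a proxy for memory. The point is that the peak number of live temporary objects drops from one per shard to one.

```ruby
# Hypothetical stand-in, NOT Groonga's API: counts how many temporary
# objects are alive at once, as a proxy for memory usage.
class Tracker
  attr_reader :live, :peak

  def initialize
    @live = 0
    @peak = 0
  end

  def opened
    @live += 1
    @peak = [@peak, @live].max
  end

  def closed
    @live -= 1
  end
end

# Hypothetical per-shard temporary result (a filtered table in the plugin).
class TemporaryResult
  attr_reader :size

  def initialize(tracker, size)
    @tracker = tracker
    @size = size
    @tracker.opened
  end

  def close
    @tracker.closed
  end
end

# Before: every shard's temporary result stays alive until all shards are done.
def count_keeping_all(shard_sizes, tracker)
  results = shard_sizes.map { |size| TemporaryResult.new(tracker, size) }
  results.sum(&:size)
ensure
  results&.each(&:close)
end

# After: each shard's temporary result is released right after it is counted.
def count_releasing_early(shard_sizes, tracker)
  shard_sizes.sum do |size|
    result = TemporaryResult.new(tracker, size)
    begin
      result.size
    ensure
      result.close
    end
  end
end

before = Tracker.new
after = Tracker.new
puts count_keeping_all([3, 5, 2], before)     # prints 10
puts count_releasing_early([3, 5, 2], after)  # prints 10
puts "peak live objects: before=#{before.peak}, after=#{after.peak}" # before=3, after=1
```

Both versions return the same count; only how long each temporary object stays alive differs.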

TODO:

  • Refactor
    • Reconsider the names of methods, classes, and so on
    • Extract methods shared with logical_range_filter into a common class
      • e.g. detect_time_classify_types
      • If they don't have much in common, I will skip this.
  • More tests
    • Especially performance tests

load --table Logs_20150709
[
{"timestamp": "2015-02-05 13:49:00"}
]

HashidaTKS (author):

I added this record because the expected log "#|d| [logical_count][select] <Logs_20150709>: no range index" is not written when Logs_20150709 is empty.

Currently, an empty table is skipped before that message is logged.
https://github.com/groonga/groonga/pull/1341/files#diff-d3fb57a047ca92c9fc19d900aa1a906a31bc3519eb7c26927516b229705c27a4R290

load --table Logs_20150709
[
{"timestamp": "2015-02-05 13:49:00"}
]

HashidaTKS (author):

I added this record because the expected log "#|d| [logical_count][range-index] <Logs_20150709>: range index is available" is not written when Logs_20150709 is empty.

Currently, an empty table is skipped before that message is logged.
https://github.com/groonga/groonga/pull/1341/files#diff-d3fb57a047ca92c9fc19d900aa1a906a31bc3519eb7c26927516b229705c27a4R290

load --table Logs_20150709
[
{"timestamp": "2015-02-05 13:49:00"}
]

HashidaTKS (author):

I added this record because the expected log "#|d| [logical_count][select] <Logs_20150709>: covered" is not written when Logs_20150709 is empty.

Currently, an empty table is skipped before that message is logged.
https://github.com/groonga/groonga/pull/1341/files#diff-d3fb57a047ca92c9fc19d900aa1a906a31bc3519eb7c26927516b229705c27a4R290

HashidaTKS force-pushed the improve_logical_count_memory_usage branch from 20279e0 to 86f788e on March 29, 2022 07:18.
raise InvalidArgument, message
end
counter = Counter.new(context)
count_result = counter.count

Member:

Why did you rename this from total?
It seems that count_result isn't better than total.

HashidaTKS (author):

I have fixed it.

Three more review threads on plugins/sharding/logical_count.rb were resolved (now outdated).
HashidaTKS (author):

I am still testing performance, but the implementation is ready for review.
I will post the test results in this PR.

HashidaTKS marked this pull request as ready for review on March 31, 2022 03:03.
HashidaTKS requested a review from kou on March 31, 2022 03:03.
@range_index = range_index
@table = shard.table
class ShardCountContext < StreamExecuteContext
attr_reader :order

Member:

It seems that this should be defined in StreamExecuteContext.

How about passing order to StreamExecuteContext#initialize?
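A minimal sketch of what that suggestion could look like, assuming hypothetical StreamExecuteContextSketch and CountContextSketch classes (not the plugin's real classes) and an arbitrary :ascending order value: the base class takes order in initialize and owns the reader, so subclasses only call super.

```ruby
# Hypothetical base class: order is accepted in initialize, so subclasses
# no longer need their own attr_reader :order.
class StreamExecuteContextSketch
  attr_reader :order

  def initialize(order)
    @order = order
  end
end

# Hypothetical logical_count-side context built on the base class.
class CountContextSketch < StreamExecuteContextSketch
  attr_reader :range_index

  def initialize(order, range_index)
    super(order)
    @range_index = range_index
  end
end

context = CountContextSketch.new(:ascending, nil)
puts context.order # prints "ascending"
```

Because logical_range_filter's context would inherit the same reader, a change like this touches both commands, which is why it fits a separate PR.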

HashidaTKS (author):

Please let me work on this in another PR, because it also affects logical_range_filter.

HashidaTKS (author):

#1347 is merged

One more review thread on plugins/sharding/logical_count.rb was resolved.
ensure_filtered

if @range_index
return count_n_records_in_range
end

Member:

Why is this done before "return 0 if @filtered_result_sets.empty?"?
Can we return 0 here when we have no result sets?

HashidaTKS (author):

This is related to #1341 (comment).
We don't have any result set when @range_index is defined.

HashidaTKS (author), Mar 31, 2022:

> We don't have any result set when @range_index is defined.

In logical_count, @range_index is set only when no filter needs to be executed; in that case we simply use index_cursor.count (https://github.com/groonga/groonga/pull/1341/files#diff-d3fb57a047ca92c9fc19d900aa1a906a31bc3519eb7c26927516b229705c27a4R191).
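The ordering question can be illustrated with a small plain-Ruby sketch. FakeIndexCursor and ShardCountSketch are hypothetical names, not the plugin's code: when a range index is usable nothing is filtered, so the result-set list is empty, and checking emptiness first would wrongly return 0 instead of the index count.

```ruby
# Hypothetical stand-in for an index cursor over a range index.
class FakeIndexCursor
  attr_reader :count

  def initialize(count)
    @count = count
  end
end

# Hypothetical sketch of the per-shard count order; not the plugin's code.
class ShardCountSketch
  def initialize(range_index_cursor: nil, filtered_result_sets: [])
    @range_index_cursor = range_index_cursor
    @filtered_result_sets = filtered_result_sets
  end

  def count
    # With a usable range index nothing is filtered, so there are no
    # result sets; this path must run before the empty check below.
    return @range_index_cursor.count if @range_index_cursor

    return 0 if @filtered_result_sets.empty?

    @filtered_result_sets.sum(&:size)
  end
end

puts ShardCountSketch.new(range_index_cursor: FakeIndexCursor.new(42)).count # prints 42
puts ShardCountSketch.new.count                                              # prints 0
```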

if @post_filter and @dynamic_columns.have_filtered?
filtered_table = context.table.select_all
def execute_filter(range_index)
return if range_index

Member:

Umm... "execute_filter does nothing if there is a range index" will confuse developers...
Can we use another approach?

HashidaTKS (author), Mar 31, 2022:

I have fixed it. How about the current approach?

  • Removed the "return if range_index" guard from execute_filter.
  • Changed ShardCountContext.execute to check @range_index before filtering.
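The revised control flow can be sketched in plain Ruby as follows. ShardCountContextFlowSketch and its return symbols are hypothetical placeholders, not the plugin's code: the decision about whether to filter lives in execute, and execute_filter always filters when it is called.

```ruby
# Hypothetical sketch of the refactored control flow; the names mirror the
# discussion, but the bodies are placeholders, not the plugin's code.
class ShardCountContextFlowSketch
  attr_reader :filter_calls

  def initialize(range_index)
    @range_index = range_index
    @filter_calls = 0
  end

  def execute
    # The caller decides whether filtering is needed at all.
    if @range_index
      :counted_with_range_index
    else
      execute_filter
      :counted_with_filter
    end
  end

  private

  # No "return if range_index" guard here: calling it always filters.
  def execute_filter
    @filter_calls += 1
  end
end

context = ShardCountContextFlowSketch.new(nil)
p context.execute      # prints :counted_with_filter
p context.filter_calls # prints 1
```

Keeping the guard out of execute_filter means the method name matches what it does, which addresses the confusion raised in the review.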

kou pushed a commit that referenced this pull request Mar 31, 2022
HashidaTKS force-pushed the improve_logical_count_memory_usage branch from cbb42a4 to edb70ab on March 31, 2022 06:04.

HashidaTKS (author) commented on Mar 31, 2022:

Performance test results.

Conditions:

  • 10M records per shard
  • 30 shards searched
  • Filtering on indexed columns

Sorry for the lack of details, but I cannot share detailed logs, the schema, and so on because they contain private information.

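Case: GRN_ENABLE_REFERENCE_COUNT=yes

Execution time:

| Condition         | Execution time |
|-------------------|----------------|
| After the change  | 47 sec         |
| Before the change | 54 sec         |

Memory usage: (graph image not available)

Case: GRN_ENABLE_REFERENCE_COUNT=no

Execution time:

| Condition         | Execution time |
|-------------------|----------------|
| After the change  | 42 sec         |
| Before the change | 42 sec         |

Memory usage: (graph image not available)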

Comment on lines 95 to 98
unless @post_filter.nil?
result_set = apply_post_filter(result_set)
@temporary_tables << result_set
end

Member:

Can we move this to StreamShardExecutor?
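One way to read this suggestion, as a plain-Ruby sketch: the base executor owns post-filter handling and the list of temporary tables, and the count executor just calls it. StreamShardExecutorSketch, ShardCountExecutorSketch, and the lambda used as a post filter are all hypothetical; the real plugin works with Groonga tables and filter expressions, and would close each temporary table rather than clearing an array.

```ruby
# Hypothetical base executor owning post-filter handling and cleanup.
class StreamShardExecutorSketch
  attr_reader :temporary_tables

  def initialize(post_filter)
    @post_filter = post_filter # a lambda here; a filter expression in the plugin
    @temporary_tables = []
  end

  def apply_post_filter_if_needed(result_set)
    return result_set if @post_filter.nil?

    filtered = result_set.select(&@post_filter)
    # Remember the temporary result so it can be released later.
    @temporary_tables << filtered
    filtered
  end

  # The plugin would close each temporary table; clearing stands in for that.
  def close_temporary_tables
    @temporary_tables.clear
  end
end

# Hypothetical count executor reusing the shared post-filter logic.
class ShardCountExecutorSketch < StreamShardExecutorSketch
  def count(result_set)
    apply_post_filter_if_needed(result_set).size
  end
end

executor = ShardCountExecutorSketch.new(->(n) { n.even? })
puts executor.count([1, 2, 3, 4]) # prints 2
executor.close_temporary_tables
```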

HashidaTKS (author):

I have addressed this in another PR.
#1348

HashidaTKS (author):

#1348 is merged

kou pushed a commit that referenced this pull request Apr 1, 2022
kou pushed a commit that referenced this pull request Apr 1, 2022
kou added a commit that referenced this pull request Apr 5, 2022
This is for #1341.

Co-authored-by: Sutou Kouhei <kou@cozmixng.org>
HashidaTKS force-pushed the improve_logical_count_memory_usage branch from edb70ab to 508aff1 on April 5, 2022 05:50.
HashidaTKS requested a review from kou on April 5, 2022 05:59.
end
@contexts << ShardCountContext.new(shard, cover_type, range_index)
class ShardCountExecutor < StreamShardExecutor
include Loggable

Member:

This should be in StreamShardExecutor because logger is used in it.
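The reasoning behind this comment is ordinary Ruby mixin inheritance: if the base class includes the module, every subclass gets its methods too. A minimal sketch, assuming a hypothetical LoggableSketch module in place of the plugin's Loggable (whose methods are not shown in this thread) and made-up log messages:

```ruby
# Hypothetical mixin standing in for the plugin's Loggable.
module LoggableSketch
  def logs
    @logs ||= []
  end

  def log_debug(message)
    logs << message
  end
end

class StreamShardExecutorBase
  # Included once here, because the base class itself logs.
  include LoggableSketch

  def execute
    log_debug("[stream] executing") # hypothetical message
  end
end

class ShardCountExecutorLogging < StreamShardExecutorBase
  # No separate include needed: the mixin is inherited from the base class.
  def execute
    super
    log_debug("[logical_count] counted") # hypothetical message
  end
end

executor = ShardCountExecutorLogging.new
executor.execute
p executor.logs # prints ["[stream] executing", "[logical_count] counted"]
```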

2 participants