Feat/Object count metrics #1712

Merged
merged 6 commits into from
Aug 25, 2022
carpawell
Member

@carpawell carpawell commented Aug 19, 2022

Related to #1658

# HELP neofs_node_object_counter Objects counters per shards
# TYPE neofs_node_object_counter gauge
neofs_node_object_counter{shard="A1AJ1cgEdGGYC2kGTGQxQJ"} 5
neofs_node_object_counter{shard="S9b69efduvCprPU4FWKirH"} 3

The metric does not include the node's total object count: at the shard level objects are easy to track, but at the storage engine (SE) level it is not always clear whether a shard contains an object or not:

  1. GC works at the shard level.
  2. SE's `Delete` always calls the shard's `Inhume`, not `Delete`.

It is possible to solve this (e.g. with an SE-level atomic counter passed to every shard), but IMO it is easier to calculate the total on the metric collector side.
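Summing per-shard values on the collector side is a single aggregation in Prometheus (e.g. `sum(neofs_node_object_counter)`, or `sum by (instance) (neofs_node_object_counter)` to keep nodes apart). For a collector that reads the text exposition format directly, a minimal Go sketch might look like the following. `sumObjectCounters` is a hypothetical helper, and its parser is deliberately simplified: one sample per line, the value in the last field, no timestamps.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// sumObjectCounters adds up every sample of the given metric found in
// Prometheus text exposition output. Hypothetical and simplified: one sample
// per line, the value is the last space-separated field, no timestamps.
func sumObjectCounters(exposition, metric string) (float64, error) {
	var total float64
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		// Skip blank lines and comments such as "# HELP" and "# TYPE".
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// Accept only the exact metric name, with or without labels.
		if !strings.HasPrefix(line, metric+"{") && !strings.HasPrefix(line, metric+" ") {
			continue
		}
		fields := strings.Fields(line)
		v, err := strconv.ParseFloat(fields[len(fields)-1], 64)
		if err != nil {
			return 0, fmt.Errorf("parse %q: %w", line, err)
		}
		total += v
	}
	return total, sc.Err()
}

func main() {
	const sample = `# HELP neofs_node_object_counter Objects counters per shards
# TYPE neofs_node_object_counter gauge
neofs_node_object_counter{shard="A1AJ1cgEdGGYC2kGTGQxQJ"} 5
neofs_node_object_counter{shard="S9b69efduvCprPU4FWKirH"} 3
`
	total, err := sumObjectCounters(sample, "neofs_node_object_counter")
	if err != nil {
		panic(err)
	}
	fmt.Println(total) // 8
}
```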

@carpawell carpawell added the neofs-storage Storage node application issues label Aug 19, 2022
@carpawell carpawell self-assigned this Aug 19, 2022
@codecov

codecov bot commented Aug 19, 2022

Codecov Report

Merging #1712 (584297a) into master (61f0d85) will increase coverage by 0.09%.
The diff coverage is 53.38%.

❗ Current head 584297a differs from pull request most recent head 9c4c7b9. Consider uploading reports for the commit 9c4c7b9 to get more accurate results

@@            Coverage Diff             @@
##           master    #1712      +/-   ##
==========================================
+ Coverage   32.50%   32.60%   +0.09%     
==========================================
  Files         337      338       +1     
  Lines       22701    22820     +119     
==========================================
+ Hits         7380     7441      +61     
- Misses      14708    14754      +46     
- Partials      613      625      +12     
Impacted Files Coverage Δ
pkg/local_object_storage/engine/metrics.go 0.00% <ø> (ø)
pkg/local_object_storage/engine/shards.go 66.29% <0.00%> (-13.44%) ⬇️
pkg/local_object_storage/shard/shard.go 61.46% <16.66%> (-8.87%) ⬇️
pkg/local_object_storage/metabase/iterators.go 67.08% <46.66%> (-4.79%) ⬇️
pkg/local_object_storage/metabase/control.go 77.94% <50.00%> (-1.75%) ⬇️
pkg/local_object_storage/metabase/put.go 74.00% <50.00%> (-0.36%) ⬇️
pkg/local_object_storage/metabase/delete.go 76.92% <70.37%> (-0.55%) ⬇️
pkg/local_object_storage/metabase/counter.go 73.91% <73.91%> (ø)
pkg/local_object_storage/shard/control.go 79.06% <100.00%> (+0.16%) ⬆️
pkg/local_object_storage/shard/delete.go 69.04% <100.00%> (+0.75%) ⬆️
... and 2 more

@carpawell carpawell marked this pull request as ready for review August 19, 2022 17:53
@realloc

realloc commented Aug 20, 2022

Agreed, the node's total object count can be handled externally.

Is it possible to add a separate per shard counter for logically available objects? For example, some objects may be inhumed, but not physically removed from the graveyard yet, so we could see the difference in the monitoring system.

@carpawell
Member Author

carpawell commented Aug 22, 2022

Is it possible to add a separate per shard counter for logically available objects?

@realloc, this PR contains metrics for physically stored objects (no split objects; tombstoned objects are counted until the shard deletes them). Do you mean adding another counter with the same meaning that excludes objects that have a tombstone (TS) or are marked for GC?

@realloc

realloc commented Aug 22, 2022

Yes: there should be one counter for all types of logically available objects and one for all types of physically stored objects.
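A minimal sketch of the two per-shard counters, assuming that a put increases both, inhuming (tombstone or GC mark) decreases only the logical counter, and physical removal decreases the physical counter (and the logical one as well if the object was never inhumed). The `shardCounters` type and its per-object state map are hypothetical illustrations of these semantics, not the neofs-node implementation, which keeps its counters in the metabase.

```go
package main

import "fmt"

// objState is a hypothetical per-object state used only in this sketch.
type objState int

const (
	available objState = iota // stored and logically available
	inhumed                   // tombstoned or marked for GC, still on disk
)

// shardCounters keeps a physical and a logical object count for one shard.
type shardCounters struct {
	objects  map[string]objState
	physical uint64 // objects stored on disk, including inhumed ones
	logical  uint64 // objects stored and not inhumed
}

func newShardCounters() *shardCounters {
	return &shardCounters{objects: make(map[string]objState)}
}

// Put stores a new object: it is both physically stored and available.
func (c *shardCounters) Put(addr string) {
	if _, ok := c.objects[addr]; ok {
		return // already stored, counters unchanged
	}
	c.objects[addr] = available
	c.physical++
	c.logical++
}

// Inhume removes an object logically; its data stays on disk.
func (c *shardCounters) Inhume(addr string) {
	if st, ok := c.objects[addr]; ok && st == available {
		c.objects[addr] = inhumed
		c.logical--
	}
}

// Delete removes an object physically (e.g. by GC).
func (c *shardCounters) Delete(addr string) {
	st, ok := c.objects[addr]
	if !ok {
		return
	}
	delete(c.objects, addr)
	c.physical--
	if st == available {
		c.logical-- // deleted without a prior inhume
	}
}

func main() {
	c := newShardCounters()
	c.Put("obj1")
	c.Put("obj2")
	c.Inhume("obj1")                   // in the graveyard, not removed yet
	fmt.Println(c.physical, c.logical) // 2 1
	c.Delete("obj1")
	fmt.Println(c.physical, c.logical) // 1 1
}
```

Under these assumptions, `physical - logical` is the number of inhumed objects still waiting for GC, which is the difference the monitoring system would show.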

// tracked since it was opened and initialized.
//
// Returns only the errors that do not allow reading counter
// in bbolt database.
Contributor

Bolt?

return
}

// updateCounter updates the object counter. Must be called NOT for read-only
Contributor

Suggested change
// updateCounter updates the object counter. Must be called NOT for read-only
// updateCounter updates the object counter. Tx MUST be writable.

// If inc == `true`, increases the counter, decreases otherwise.
func (db *DB) updateCounter(tx *bbolt.Tx, delta uint64, inc bool) error {
b := tx.Bucket(shardInfoBucket)
if b != nil {
Contributor

Return immediately on nil to keep the `if` smaller?

b := tx.Bucket(shardInfoBucket)
if b != nil {
var counter uint64
newCounter := make([]byte, 8)
Contributor

Alloc on miss only?

counter = binary.LittleEndian.Uint64(data)
case 0:
default:
return nil
Contributor

Is this expected? Maybe panic?

Member Author

well... ok, added panic

@fyrchik, ping

if inc {
counter += delta
} else {
if counter <= delta {
Contributor

Merge with else?

Member Author

Merged, but I usually prefer not to use `else if`.
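For reference, the merged form of the increment/decrement logic. `applyDelta` is a hypothetical helper extracted for illustration, and the clamp-to-zero body of the `counter <= delta` branch is an assumption, because that body is not included in the snippet above.

```go
package main

import "fmt"

// applyDelta increases or decreases counter by delta. On decrease, the
// result is clamped at zero (an assumption about the elided branch body)
// so that a uint64 counter never wraps around.
func applyDelta(counter, delta uint64, inc bool) uint64 {
	if inc {
		counter += delta
	} else if counter <= delta {
		counter = 0
	} else {
		counter -= delta
	}
	return counter
}

func main() {
	fmt.Println(applyDelta(10, 3, true))  // 13
	fmt.Println(applyDelta(10, 3, false)) // 7
	fmt.Println(applyDelta(2, 3, false))  // 0
}
```

Clamping keeps the unsigned counter from wrapping around to a huge value if more removals than additions are ever reported.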

}
}

err := db.updateCounter(tx, rawDeleted, false)
if err != nil {
return 0, fmt.Errorf("could not decrease object counter: %w", err)
Contributor

Maybe make it a best-effort feature that does not fail the original operation?

Same for PUT.

Member Author

Agreed: removed the return and added an error log.

Comment on lines 44 to 48
switch len(data) {
case 8:
counter = binary.LittleEndian.Uint64(data)
case 0:
default:
panic(fmt.Errorf(
"unexpected len of object counter value: %d", len(data)),
)
}
Contributor

I think just checking len(data) == 8 would be enough.
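A sketch of the simpler check suggested here, assuming that any value that is not exactly 8 bytes (including a missing key) should be read as a zero counter; `decodeCounter` is a hypothetical helper. The trade-off: unlike the `switch` with `panic` above, a corrupted value is silently treated as zero instead of surfacing.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// decodeCounter reads a little-endian uint64 counter value. Anything that
// is not exactly 8 bytes long (including nil for a missing key) is treated
// as a zero counter; this is an assumption about the intended behavior.
func decodeCounter(data []byte) uint64 {
	if len(data) == 8 {
		return binary.LittleEndian.Uint64(data)
	}
	return 0
}

func main() {
	buf := make([]byte, 8)
	binary.LittleEndian.PutUint64(buf, 42)
	fmt.Println(decodeCounter(buf)) // 42
	fmt.Println(decodeCounter(nil)) // 0
}
```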

Comment on lines 78 to 79
db.log.Error("could not decrease object counter",
zap.Error(err))
Contributor

No, let's not log anything inside of a transaction. It is potentially a blocking call.

Member Author

Missed that it is inside a Bolt TX.

Restored the error return: no logging, the error goes directly to the caller.

Comment on lines 148 to 150
if err != nil {
db.log.Error("could not increase object counter: %w",
zap.Error(err))
Contributor

Here too. I think we can (and should) return the error to the caller; updateCounter has no logical errors.

@@ -254,3 +270,15 @@ func (m objectServiceMetrics) AddPutPayload(ln int) {
func (m objectServiceMetrics) AddGetPayload(ln int) {
m.getPayload.Add(float64(ln))
}

func (m objectServiceMetrics) ChangeBy(shardID string, delta int) {
Contributor

Change what? Could you add ObjectCounter to the name?

Member Author

Removed. Only `AddToObjectCounter(shardID string, delta int)` and `SetObjectCounter(shardID string, v uint64)` are kept on the metrics side; the wrappers still have `Inc*` and `Dec*` methods.
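A stdlib-only sketch of the resulting split: two methods on the metrics side and thin `Inc`/`Dec` wrappers that forward to `AddToObjectCounter`. `objectCounterMetrics` and `shardWrapper` are hypothetical; `objectCounterMetrics` is an in-memory stand-in for the Prometheus gauge vector. With `client_golang`, the two methods would presumably call `Add` and `Set` on the labelled gauge returned by `With(prometheus.Labels{...})`, the same way the snippet below calls `Inc()` and `Dec()`.

```go
package main

import (
	"fmt"
	"sync"
)

// objectCounterMetrics is a hypothetical in-memory stand-in for a gauge
// vector labelled by shard ID.
type objectCounterMetrics struct {
	mu     sync.Mutex
	values map[string]float64
}

func newObjectCounterMetrics() *objectCounterMetrics {
	return &objectCounterMetrics{values: make(map[string]float64)}
}

// AddToObjectCounter adds delta (possibly negative) to a shard's gauge.
func (m *objectCounterMetrics) AddToObjectCounter(shardID string, delta int) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.values[shardID] += float64(delta)
}

// SetObjectCounter overwrites a shard's gauge, e.g. (assumed) with the value
// read from the metabase when the shard starts.
func (m *objectCounterMetrics) SetObjectCounter(shardID string, v uint64) {
	m.mu.Lock()
	defer m.mu.Unlock()
	m.values[shardID] = float64(v)
}

func (m *objectCounterMetrics) value(shardID string) float64 {
	m.mu.Lock()
	defer m.mu.Unlock()
	return m.values[shardID]
}

// shardWrapper is a hypothetical per-shard wrapper keeping Inc/Dec helpers
// that forward to AddToObjectCounter.
type shardWrapper struct {
	id string
	m  *objectCounterMetrics
}

func (w shardWrapper) IncObjectCounter() { w.m.AddToObjectCounter(w.id, 1) }
func (w shardWrapper) DecObjectCounter() { w.m.AddToObjectCounter(w.id, -1) }

func main() {
	m := newObjectCounterMetrics()
	sh := shardWrapper{id: "A1AJ1cgEdGGYC2kGTGQxQJ", m: m}
	m.SetObjectCounter(sh.id, 5) // initial value
	sh.IncObjectCounter()
	sh.DecObjectCounter()
	sh.DecObjectCounter()
	fmt.Println(m.value(sh.id)) // 4
}
```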

Comment on lines 278 to 280
func (m objectServiceMetrics) IncObjectCounter(shardID string) {
m.shardMetrics.With(prometheus.Labels{shardIDLabelKey: shardID}).Inc()
}

func (m objectServiceMetrics) DecObjectCounter(shardID string) {
m.shardMetrics.With(prometheus.Labels{shardIDLabelKey: shardID}).Dec()
}
Contributor

Why did you decide to keep three methods instead of a single AddToObjectCounter that covers both increment and decrement?

Member Author

kept only two: Set* and Add*

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Increment shard's object counter on successful `Put` calls and decrement on
`Delete`.

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
Includes:
1. Renaming the counter key to distinguish logical and physical objects
2. Dropping the version update, since the changes can be made in a compatible way

Signed-off-by: Pavel Karpy <carpawell@nspcc.ru>
@fyrchik fyrchik merged commit c7c1c25 into nspcc-dev:master Aug 25, 2022
@carpawell carpawell deleted the feat/object-count-metrics branch August 25, 2022 16:21
4 participants