Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Feature/remote writes status page (#7971) #8218

Closed

Conversation

strideynet
Copy link

@strideynet strideynet commented Nov 23, 2020

Purely a rough draft at the mo. Well aware tests are currently missing.

A few things I'm a bit unsure on is how we should best extract the pending samples, and last send, last send duration and last error status for each shard. My current thought is introducing a map to the shards of shard id to some kind of ShardStatus struct that contains each of these values and then to update this "status" struct from each shards goroutine.

@strideynet strideynet force-pushed the feature/remote-writes-status-page branch 2 times, most recently from 2f64326 to a5a5398 Compare November 23, 2020 23:12
Copy link
Contributor

@brian-brazil brian-brazil left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a very quick high level glance.

web/api/v1/api.go Outdated Show resolved Hide resolved
shards *shards
numShards int
shards *shards
numShards int
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This all needs locking

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I was just roughly drafting out on the structs to get a feel. Need to go back and make sure appropriate mutexes are added.

Signed-off-by: Noah Stride <noah@noahstride.co.uk>
@strideynet strideynet force-pushed the feature/remote-writes-status-page branch from d466a19 to 3277706 Compare November 24, 2020 00:01
)

if samplesOutRate <= 0 {
var c ShardingCalculations
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can't tell if this version using the struct is too much uglier, and if it would make more sense to just copy the values in at the end?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I slightly prefer copying the stats at the end, but not a big deal.

Signed-off-by: Noah Stride <noah@noahstride.co.uk>
Copy link
Member

@csmarchbanks csmarchbanks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work so far!

A few things I'm a bit unsure on is how we should best extract the pending samples, and last send, last send duration and last error status for each shard. My current thought is introducing a map to the shards of shard id to some kind of ShardStatus struct that contains each of these values and then to update this "status" struct from each shards goroutine.

That seems reasonable to me, an alternative would be a new shard struct that would then have a run method on it in place of runShard. The nice part of that would be each shard would then be independent of each other and not need to know about a shared state map. shards would then have to have a []*shard, but that makes sense from a dependency perspective anyway.

)

if samplesOutRate <= 0 {
var c ShardingCalculations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I slightly prefer copying the stats at the end, but not a big deal.

Queues: make([]*RemoteWriteQueue, 0, len(queues)),
}

var wrn []error
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this ever used?

}

// RemoteWriteShardingCalculations provides insight into the values that are used to calculate the required shard num
type RemoteWriteShardingCalculations struct {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rather than creating a new struct, could this just use remote.ShardingCalculations?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I wasn't too sure if we tried to keep all API objects defined in api/v1, I did consider just adding the json tags to the remote.ShardingCalculations

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've a mix currently. We've had problems in cases where the actual struct didn't match what we wanted on the wire, but if that's not the case this should be okay to reuse. This API will be marked experimental one way or the other.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks for clearing that up @brian-brazil

web/api/v1/api_remote_write.go Outdated Show resolved Hide resolved
web/ui/react-app/src/App.tsx Outdated Show resolved Hide resolved
@@ -203,6 +203,19 @@ func (rws *WriteStorage) Close() error {
return nil
}

// Queues returns all of the queues that have been configured
func (rws *WriteStorage) Queues() []*QueueManager {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wonder if rather than returning the actual queues if we should just return stats instead with a Stats() method? Not a strong feeling either way, but I am not sure if we need to get the full queues.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not too sure on how I feel on this one yet. Probably will build out the rest and then see if my opinion changes on this. It definitely feels odd to return the entire queues as you've said.

@strideynet
Copy link
Author

Awesome work so far!

A few things I'm a bit unsure on is how we should best extract the pending samples, and last send, last send duration and last error status for each shard. My current thought is introducing a map to the shards of shard id to some kind of ShardStatus struct that contains each of these values and then to update this "status" struct from each shards goroutine.

That seems reasonable to me, an alternative would be a new shard struct that would then have a run method on it in place of runShard. The nice part of that would be each shard would then be independent of each other and not need to know about a shared state map. shards would then have to have a []*shard, but that makes sense from a dependency perspective anyway.

I think that alternative is definitely what I would prefer potentially, I was just cautious of making significant changes to the working code. That being said, I think I will go with your suggestion, since it seems cleaner imho.

Signed-off-by: Noah Stride <noah@noahstride.co.uk>
Signed-off-by: Noah Stride <noah@noahstride.co.uk>
Signed-off-by: Noah Stride <noah@noahstride.co.uk>
Signed-off-by: Noah Stride <noah@noahstride.co.uk>
@stale stale bot added the stale label Jan 26, 2021
Base automatically changed from master to main February 23, 2021 19:36
@stale stale bot removed the stale label Feb 23, 2021
@roidelapluie
Copy link
Member

hello @strideynet , are you still up for this? :)

@strideynet
Copy link
Author

Hey @roidelapluie, had a lot of stuff going on with switching jobs but am a bit more free now. I'm happy to come back to this weekend and work out where I was, and pick it up again, unless someone else is offering :)

@roidelapluie
Copy link
Member

If you are in, this is still an awesome feature :)

@strideynet
Copy link
Author

Too busy with $PRIMARY_EMPLOYER to finish this ticket, apologies.

@strideynet strideynet closed this Apr 7, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants