
Truncate deploymentStatus metrics after reporting stats #4857

Merged
t-kikuc merged 6 commits into master from delete-old-deployment-status-metrics on Apr 5, 2024


@t-kikuc (Member) commented Apr 3, 2024

What this PR does / why we need it:

Delete `deploymentStatus` metrics after reporting stats, to avoid gRPC errors caused by the message size exceeding the limit as `deploymentStatus` records accumulate.

Which issue(s) this PR fixes:

Fixes #4786

Does this PR introduce a user-facing change?: No

  • How are users affected by this change: N/A
  • Is this breaking change: No
  • How to migrate (if breaking change): N/A

…sage size

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>
@t-kikuc (Member, Author) commented Apr 3, 2024

I will remove the WIP tag after we discuss the solution for issue #4786.


codecov bot commented Apr 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 28.89%. Comparing base (7b1103d) to head (d586946).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #4857      +/-   ##
==========================================
- Coverage   28.90%   28.89%   -0.01%     
==========================================
  Files         317      317              
  Lines       40369    40369              
==========================================
- Hits        11668    11664       -4     
- Misses      27773    27776       +3     
- Partials      928      929       +1     


Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>
@khanhtc1202 (Member) previously approved these changes Apr 4, 2024

Nice catch, thank you 🚀

@ffjlabo (Member) commented Apr 4, 2024

@t-kikuc
[ASK] Is it OK for the Control Plane?
Just to be sure, I think it would be nice to check whether this change has any effect on the Control Plane as well.

@ffjlabo (Member) commented Apr 4, 2024

The key question is how the sent data is used in the Control Plane 👀

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>
@t-kikuc (Member, Author) commented Apr 5, 2024

`deployment_status` is used in the Grafana dashboard, in the query below. I'm still investigating the effect of this change.

"expr": "sum(floor(increase(deployment_status{pipecd_component=\"piped\",project=~\"$project\",status=~\"DEPLOYMENT_CANCELLED|DEPLOYMENT_SUCCESS|DEPLOYMENT_FAILURE\"}[24h])))",
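For readers unfamiliar with the dashboard query above: `increase(x[24h])` estimates how much each selected `deployment_status` series rose over the last 24 hours, and `sum(floor(...))` floors each per-series result and adds them up. The sketch below computes a simplified increase over one series' samples, treating any drop in value as a reset to zero, as Prometheus does for counters. It deliberately omits the extrapolation to the window edges that Prometheus applies (which is what can produce fractional values and makes `floor` matter), so its numbers can differ from the real function. The names `simpleIncrease` and `sumFloorIncrease` are hypothetical, and the thread does not say whether `deployment_status` is exported as a counter or a gauge.

```go
package main

import (
	"fmt"
	"math"
)

// simpleIncrease returns how much a counter-like series grew across its
// samples. A drop in value is treated as a reset to zero, so the post-reset
// value counts in full. Unlike Prometheus's increase(), it does not
// extrapolate to the edges of the range window.
func simpleIncrease(samples []float64) float64 {
	var total float64
	for i := 1; i < len(samples); i++ {
		delta := samples[i] - samples[i-1]
		if delta < 0 {
			// Value dropped: assume the series restarted from zero.
			delta = samples[i]
		}
		total += delta
	}
	return total
}

// sumFloorIncrease mirrors the shape of sum(floor(increase(...))): floor each
// series' increase, then add the results across series.
func sumFloorIncrease(series [][]float64) float64 {
	var sum float64
	for _, s := range series {
		sum += math.Floor(simpleIncrease(s))
	}
	return sum
}

func main() {
	fmt.Println(simpleIncrease([]float64{0, 1, 1}))                       // 1
	fmt.Println(simpleIncrease([]float64{3, 5, 1, 2}))                    // 4 (reset between 5 and 1)
	fmt.Println(sumFloorIncrease([][]float64{{0, 1}, {0, 0, 1}, {2, 2}})) // 2
}
```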

@t-kikuc (Member, Author) commented Apr 5, 2024

Regardless of this change, when a piped restarts, its stats in the Control Plane are overwritten so that they only cover deployments made after the restart.
That's because (1) a piped's stats are reset when the piped restarts, and (2) the statsreporter overwrites the piped's entire stats in the Control Plane (see below).

if err := a.pipedStatCache.Put(pipedID, val); err != nil {

If this matters, we need to change the data structure or algorithm used to store stats in the Control Plane's Redis.
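The overwrite behavior described in this comment can be illustrated with a hypothetical cache keyed by piped ID. The names `statCache`, `Put`, `Merge`, and `deployments` are illustrative, not the Control Plane's actual cache API. `Put` replaces a piped's whole stats entry, as the quoted `pipedStatCache.Put(pipedID, val)` call does, so after a restart only deployments reported since the restart remain. `Merge` is only an assumption about what a different storing algorithm might look like: it keeps earlier entries and updates them with newer ones.

```go
package main

import (
	"fmt"
	"sort"
)

// stats maps a deployment ID to its latest status (hypothetical shape).
type stats map[string]string

// statCache is a hypothetical stand-in for a per-piped stats cache.
type statCache struct {
	byPiped map[string]stats
}

func newStatCache() *statCache { return &statCache{byPiped: make(map[string]stats)} }

// Put overwrites everything stored for pipedID with a copy of val.
func (c *statCache) Put(pipedID string, val stats) {
	copied := make(stats, len(val))
	for k, v := range val {
		copied[k] = v
	}
	c.byPiped[pipedID] = copied
}

// Merge keeps previously stored entries and updates them with val.
func (c *statCache) Merge(pipedID string, val stats) {
	cur, ok := c.byPiped[pipedID]
	if !ok {
		cur = make(stats, len(val))
		c.byPiped[pipedID] = cur
	}
	for k, v := range val {
		cur[k] = v
	}
}

// deployments returns the sorted deployment IDs stored for pipedID.
func (c *statCache) deployments(pipedID string) []string {
	ids := make([]string, 0, len(c.byPiped[pipedID]))
	for id := range c.byPiped[pipedID] {
		ids = append(ids, id)
	}
	sort.Strings(ids)
	return ids
}

func main() {
	beforeRestart := stats{"d1": "DEPLOYMENT_SUCCESS", "d2": "DEPLOYMENT_FAILURE"}
	afterRestart := stats{"d3": "DEPLOYMENT_SUCCESS"} // piped's in-memory stats were reset

	overwrite, merge := newStatCache(), newStatCache()
	overwrite.Put("piped-1", beforeRestart)
	overwrite.Put("piped-1", afterRestart)
	merge.Merge("piped-1", beforeRestart)
	merge.Merge("piped-1", afterRestart)

	fmt.Println(overwrite.deployments("piped-1")) // [d3]
	fmt.Println(merge.deployments("piped-1"))     // [d1 d2 d3]
}
```

Note that an unbounded merge would simply move the accumulation problem fixed by this PR from piped into the Control Plane, so a merging design would also need some expiry policy.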

@ffjlabo (Member) previously approved these changes Apr 5, 2024

@t-kikuc
Thank you for the fix and the difficult investigation 🚀

It would be nice to write up the results of the investigation and open a PR to update the documentation later.

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>
@khanhtc1202 (Member) left a comment

Big step forward, thank you 🙌

@t-kikuc t-kikuc merged commit c853313 into master Apr 5, 2024
13 of 14 checks passed
@t-kikuc t-kikuc deleted the delete-old-deployment-status-metrics branch April 5, 2024 05:27
@t-kikuc (Member, Author) commented Apr 5, 2024

I'll open a documentation PR later!

@github-actions github-actions bot mentioned this pull request Apr 8, 2024
t-kikuc added a commit that referenced this pull request Apr 8, 2024
* Truncate deploymentStatus metrics after reporting to avoid excess message size

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>

* Rename func to Flush() for clarity

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>

* Add comment of what's included in statsreporter's body

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>

* Fix indent in the comment

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>

* Copy change of metrics.go to pipedv1

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>

* Copy change of reporter.go to pipedv1

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>

---------

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>
t-kikuc added a commit that referenced this pull request Apr 8, 2024
t-kikuc added a commit that referenced this pull request Apr 8, 2024
* BUGFIX: Added cancel method to fix context leak (#4767)

Signed-off-by: fazledyn-or <ataf@openrefactory.com>

* Define piped pluggin api (#4815)

Signed-off-by: khanhtc1202 <khanhtc1202@gmail.com>

* Update BuldPlan API for piped pluggin (#4821)

Signed-off-by: khanhtc1202 <khanhtc1202@gmail.com>

* Relocate plugin proto (#4826)

Signed-off-by: khanhtc1202 <khanhtc1202@gmail.com>

* Update controller to use new planner logic (#4825)

* Update controller to use new planner logic

Signed-off-by: khanhtc1202 <khanhtc1202@gmail.com>

* Update proto path

Signed-off-by: khanhtc1202 <khanhtc1202@gmail.com>

* Fix typo

Signed-off-by: khanhtc1202 <khanhtc1202@gmail.com>

* Fix typo

Signed-off-by: khanhtc1202 <khanhtc1202@gmail.com>

* Update planner logic to call proto instead of self executing

Signed-off-by: khanhtc1202 <khanhtc1202@gmail.com>

---------

Signed-off-by: khanhtc1202 <khanhtc1202@gmail.com>

* Update plugin proto for ExecutorService and add piped pluginservice (#4834)

* Add plugin planner for k8s (#4819)

* [WIP] Add planner

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Not to use out.Version

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Use last_successful_commit_hash and last_successful_config_file_name

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Use in.WorkingDir

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Use in.PipedConfig

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Create git client

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Create secret encryptor

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Add startup server implementation

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Fix for relocation of proto api

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Add roughly implementation for planner plugin

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Rename pkg name

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Add licence

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Comment out for the testing code

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

---------

Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>

* Truncate `deploymentStatus` metrics after reporting stats (#4857)

* Truncate deploymentStatus metrics after reporting to avoid excess message size

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>

* Rename func to Flush() for clarity

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>

* Add comment of what's included in statsreporter's body

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>

* Fix indent in the comment

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>

* Copy change of metrics.go to pipedv1

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>

* Copy change of reporter.go to pipedv1

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>

---------

Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>

---------

Signed-off-by: fazledyn-or <ataf@openrefactory.com>
Signed-off-by: khanhtc1202 <khanhtc1202@gmail.com>
Signed-off-by: Yoshiki Fujikane <ffjlabo@gmail.com>
Signed-off-by: t-kikuc <tkikuchi07f@gmail.com>
Co-authored-by: Ataf Fazledin Ahamed <ataf@openrefactory.com>
Co-authored-by: Khanh Tran <32532742+khanhtc1202@users.noreply.github.com>
Co-authored-by: Yoshiki Fujikane <40124947+ffjlabo@users.noreply.github.com>
sZma5a pushed a commit to sZma5a/pipecd that referenced this pull request Apr 18, 2024
Issues this pull request may close:
Piped restart due to failed when report stats to controlplane

3 participants