
Setup a budget and budget alerts #1375

Closed
spiffxp opened this issue Oct 29, 2020 · 24 comments
Assignees
Labels
  • lifecycle/frozen: Indicates that an issue or PR should not be auto-closed due to staleness.
  • priority/important-longterm: Important over the long term, but may not be staffed and/or may need multiple releases to complete.
  • sig/k8s-infra: Categorizes an issue or PR as relevant to SIG K8s Infra.

Comments

@spiffxp
Member

spiffxp commented Oct 29, 2020

ref: https://cloud.google.com/billing/docs/how-to/budgets

Currently we review our billing reports at each meeting, which means we'll notice abnormalities within a 14-day window. As our utilization increases, it would be wise for us to use a budget and alerts to catch things sooner.

I tried experimenting with my account and didn't have sufficient privileges. We should start there.

/priority important-longterm
/wg k8s-infra

@k8s-ci-robot k8s-ci-robot added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. wg/k8s-infra labels Oct 29, 2020
@spiffxp spiffxp added this to Needs Triage in sig-k8s-infra Jan 21, 2021
@fejta-bot

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.

If this issue is safe to close now please do so with /close.

Send feedback to sig-testing, kubernetes/test-infra and/or fejta.
/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jan 27, 2021
@spiffxp
Member Author

spiffxp commented Feb 8, 2021

/remove-lifecycle stale
/assign @thockin
I'm assigning you to get your input on whether you think this is worth investing time in.

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 8, 2021
ameukam added a commit to ameukam/k8s.io that referenced this issue Apr 26, 2021
Add a defined budget to `k8s-infra-ii-sandbox` but also use the project
to experiment GCP budgets.

Ref: kubernetes#1375

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
ameukam added a commit to ameukam/k8s.io that referenced this issue Apr 26, 2021
Add a defined budget to `k8s-infra-ii-sandbox` but also use the project
to experiment GCP budgets.

Ref: kubernetes#1375

Signed-off-by: Arnaud Meukam <ameukam@gmail.com>
@ameukam ameukam added this to the v1.22 milestone May 4, 2021
@thockin
Member

thockin commented Jun 7, 2021

I think it is long-term valuable but not near-term

@spiffxp
Member Author

spiffxp commented Sep 29, 2021

/remove-priority important-longterm
/priority critical-urgent
/milestone v1.23

We discussed at the last meeting that our spend looks like it's going to put us very near the threshold this year.

It's time to come up with a plan for how to make sure we don't cross it, and how to detect if we are about to. Maybe it's not worth implementing technically with cloud budgets, but we should then at least know what number over what period is a flashing danger sign, and have some kind of framework / guidance for what to do next once we see it.

@k8s-ci-robot k8s-ci-robot added priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. and removed priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. labels Sep 29, 2021
@k8s-ci-robot k8s-ci-robot modified the milestones: v1.24, v1.23 Sep 29, 2021
@spiffxp spiffxp moved this from Needs Triage to Backlog (existing infra) in sig-k8s-infra Sep 29, 2021
@k8s-ci-robot k8s-ci-robot added sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. and removed wg/k8s-infra labels Sep 29, 2021
@spiffxp
Member Author

spiffxp commented Oct 14, 2021

#2940 - Adds a monthly budget for k8s-infra as a whole. We'll get e-mail alerts if we hit 90% (225K) for the month (which we have been crossing continually since August, but with no alerts set up) and 100% (which we crossed once in August accidentally, due to 5k node clusters hanging around for too long).
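The arithmetic behind those alert levels is trivial, but worth pinning down. A minimal sketch; the 250K/mo amount is taken from this comment, the function name is just illustrative:

```python
# Monthly budget and alert thresholds, as described above (USD).
MONTHLY_BUDGET = 250_000  # from the comment: 90% = 225K

def alert_thresholds(budget, percents=(0.9, 1.0)):
    """Return the spend levels at which each alert fires."""
    return {p: budget * p for p in percents}

thresholds = alert_thresholds(MONTHLY_BUDGET)
print(thresholds)  # {0.9: 225000.0, 1.0: 250000.0}
```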

@spiffxp
Member Author

spiffxp commented Oct 18, 2021

> What bin is egress bandwidth going into?

Egress is charged to the project hosting the artifacts being transferred, so regardless of which SKU it's billed against, it all goes against the k8s-artifacts-prod project

From https://datastudio.google.com/c/u/0/reporting/14UWSuqD5ef9E4LnsCD9uJWTPv8MHOA3e/page/bPVn
[screenshot: billing report breakdown from the Data Studio link above]

> Would it be possible to get the artifacts broken out in terms of size in bytes instead of $/months?

```sql
select
    sum(cost) as total_cost,
    sku.description as sku,
    sum(usage.amount_in_pricing_units) as amount,
    usage.pricing_unit as pricing_unit,
    invoice.month
from
    `kubernetes-public.kubernetes_public_billing.gcp_billing_export_v1_018801_93540E_22A20E`
where
    billing_account_id = "018801-93540E-22A20E"
    and project.name = 'k8s-artifacts-prod'
    and usage.pricing_unit = 'gibibyte'
group by
    invoice.month,
    sku,
    pricing_unit
order by
    invoice.month desc, total_cost desc
```

The units here are GiB (the query filters on pricing_unit = 'gibibyte')
[screenshot: query results, per-SKU GiB and cost by invoice month]
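For readers without BigQuery access, the grouping that query performs can be mimicked over exported rows in a few lines. The rows below are made up for illustration, not real billing data:

```python
from collections import defaultdict

# Aggregate billing-export rows the same way the SQL above does:
# group by (invoice month, SKU), summing cost and usage amount.
# These rows are illustrative, not real billing data.
rows = [
    {"month": "202110", "sku": "Download Worldwide", "cost": 10.0, "amount": 120.0},
    {"month": "202110", "sku": "Download Worldwide", "cost": 5.0,  "amount": 60.0},
    {"month": "202110", "sku": "Download APAC",      "cost": 8.0,  "amount": 40.0},
]

totals = defaultdict(lambda: {"cost": 0.0, "amount": 0.0})
for r in rows:
    key = (r["month"], r["sku"])
    totals[key]["cost"] += r["cost"]
    totals[key]["amount"] += r["amount"]

# Order by month desc, then cost desc, like the SQL's ORDER BY.
for (month, sku), t in sorted(totals.items(), key=lambda kv: (kv[0][0], -kv[1]["cost"])):
    print(month, sku, t["cost"], t["amount"])
```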

From https://console.cloud.google.com/monitoring/metrics-explorer?pageState=%7B%22xyChart%22:%7B%22dataSets%22:%5B%7B%22timeSeriesFilter%22:%7B%22filter%22:%22metric.type%3D%5C%22storage.googleapis.com%2Fnetwork%2Fsent_bytes_count%5C%22%20resource.type%3D%5C%22gcs_bucket%5C%22%22,%22minAlignmentPeriod%22:%2260s%22,%22aggregations%22:%5B%7B%22perSeriesAligner%22:%22ALIGN_RATE%22,%22crossSeriesReducer%22:%22REDUCE_SUM%22,%22alignmentPeriod%22:%2260s%22,%22groupByFields%22:%5B%22resource.label.%5C%22bucket_name%5C%22%22%5D%7D,%7B%22perSeriesAligner%22:%22ALIGN_NONE%22,%22crossSeriesReducer%22:%22REDUCE_NONE%22,%22alignmentPeriod%22:%2260s%22,%22groupByFields%22:%5B%5D%7D%5D,%22pickTimeSeriesFilter%22:%7B%22rankingMethod%22:%22METHOD_MAX%22,%22numTimeSeries%22:%225%22,%22direction%22:%22TOP%22%7D%7D,%22targetAxis%22:%22Y1%22,%22plotType%22:%22LINE%22,%22legendTemplate%22:%22$%7Bresource.labels.bucket_name%7D%22%7D%5D,%22options%22:%7B%22mode%22:%22COLOR%22%7D,%22constantLines%22:%5B%5D,%22timeshiftDuration%22:%220s%22,%22y1Axis%22:%7B%22label%22:%22y1Axis%22,%22scale%22:%22LINEAR%22%7D%7D,%22isAutoRefresh%22:true,%22timeSelection%22:%7B%22timeRange%22:%226w%22%7D%7D&project=kubernetes-public

Bytes sent, top 5 by max value over the last 6 weeks (I don't think our Cloud Monitoring retention goes further back than that)
[screenshot: bytes sent per second, top 5 GCS buckets]
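For back-of-the-envelope checks, a sustained bytes-per-second rate from that chart converts to monthly volume like this (the example rate is hypothetical, not read off the chart):

```python
# Convert a sustained egress rate (bytes/s) into monthly volume in GiB.
# The example rate below is hypothetical.
def monthly_gib(bytes_per_second, days=30):
    seconds = days * 24 * 3600
    return bytes_per_second * seconds / 2**30  # bytes -> GiB

# About 2414 GiB/month at a sustained 1 MB/s.
print(round(monthly_gib(1_000_000), 1))
```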

I would defer to @BobyMCbobs and @Riaankl to provide a report on which specific artifacts are how large, and how often they're being transferred. That said, I think this is a problem of volume and not specific artifacts.

@spiffxp
Member Author

spiffxp commented Oct 18, 2021

#1834 (comment) is our umbrella issue for mitigating artifact hosting costs by use of mirrors, which would allow us to mitigate costs due to large consumers by having them pull from mirrors located closer to them or on their own infra. The comment I'm linking posits that if we could use something like Cloud CDN we could also lower the cost of hosting regardless of where requests are coming from.

It is unclear whether this is possible for container images hosted at k8s.gcr.io, which account for the vast majority of bytes transferred. They live in a subdomain of gcr.io that I'm not sure we can take ownership of (i.e. replace the endpoint); my understanding is it was provided to us internally.

@Riaankl
Contributor

Riaankl commented Oct 19, 2021

@jhoblitt we have a report on artifact traffic. This data runs from 9 April to September 2021.
There are several graphs and tables. Here are tables that might answer some of your questions:
[image: artifact traffic report tables]

@jhoblitt

@spiffxp Thanks for doing that extra analysis. I agree that this sounds more like a pure popularity problem rather than bloated artifacts. I'm not sure what a fitted slope works out to, but I'm going to guess that transfers will grow faster than GCP bandwidth prices will decrease in the near term, and will eventually exceed the total cost envelope. Has there been any discussion of moving away from gcr.io? I would easily believe it will take > 3 years to shift the majority of pulls over to a k8s project registry.
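The "fitted slope" mentioned above is just an ordinary least-squares fit over monthly transfer totals. A self-contained sketch; the monthly numbers are made up, not from the billing data:

```python
# Ordinary least-squares slope over a series of monthly totals,
# i.e. average growth per month. Pure Python, no dependencies.
def fitted_slope(ys):
    n = len(ys)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

monthly_tib = [400, 420, 445, 470, 500]  # hypothetical TiB per month
print(fitted_slope(monthly_tib))  # 25.0 -> growing ~25 TiB/month
```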

@jhoblitt

@Riaankl I was wondering if there were large artifacts that could be put on a diet, but nothing is showing up in the top 10.

@Riaankl
Contributor

Riaankl commented Oct 19, 2021

With #1834 we aim to get 2-3 redirector PoCs up, whereby cloud providers could host local copies of the artifacts, and routing is decided by the redirector based on the requesting IP's ASN information. That way the load is spread across all providers. Ideally the complete set of artifacts would be hosted by the participants.
80% of the traffic is related to <30 images.
[image: traffic share by image]
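The "80% of traffic from <30 images" figure comes from a cumulative-share calculation. A sketch with a made-up popularity distribution (pull counts are illustrative, not the real ones):

```python
# How many images account for a given share of traffic?
# The pull counts below are made up for illustration.
def images_for_share(pulls, share=0.8):
    """Smallest number of top images whose pulls reach `share` of the total."""
    total = sum(pulls)
    running = 0
    for i, p in enumerate(sorted(pulls, reverse=True), start=1):
        running += p
        if running / total >= share:
            return i
    return len(pulls)

pulls = [500, 300, 100, 50, 30, 10, 5, 3, 1, 1]
print(images_for_share(pulls))  # 2 -> the top 2 images cover >= 80%
```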

@ameukam
Member

ameukam commented Dec 6, 2021

/milestone v1.24

@k8s-ci-robot k8s-ci-robot modified the milestones: v1.23, v1.24 Dec 6, 2021
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 6, 2022
@thockin thockin removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Mar 6, 2022
@ameukam
Member

ameukam commented Mar 7, 2022

/remove-lifecycle stale

@ameukam ameukam added priority/important-longterm Important over the long term, but may not be staffed and/or may need multiple releases to complete. and removed priority/critical-urgent Highest priority. Must be actively worked on as someone's top priority right now. labels Mar 22, 2022
@ameukam ameukam moved this from Backlog (existing infra) to In Progress in sig-k8s-infra Apr 25, 2022
@ameukam
Member

ameukam commented May 12, 2022

/milestone clear

@k8s-ci-robot k8s-ci-robot removed this from the v1.24 milestone May 12, 2022
@k8s-triage-robot

The Kubernetes project currently lacks enough contributors to adequately respond to all issues and PRs.

This bot triages issues and PRs according to the following rules:

  • After 90d of inactivity, lifecycle/stale is applied
  • After 30d of inactivity since lifecycle/stale was applied, lifecycle/rotten is applied
  • After 30d of inactivity since lifecycle/rotten was applied, the issue is closed

You can:

  • Mark this issue or PR as fresh with /remove-lifecycle stale
  • Mark this issue or PR as rotten with /lifecycle rotten
  • Close this issue or PR with /close
  • Offer to help out with Issue Triage

Please send feedback to sig-contributor-experience at kubernetes/community.

/lifecycle stale

@k8s-ci-robot k8s-ci-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Aug 10, 2022
@ameukam
Member

ameukam commented Aug 19, 2022

/remove-lifecycle stale
/lifecycle frozen

@k8s-ci-robot k8s-ci-robot added lifecycle/frozen Indicates that an issue or PR should not be auto-closed due to staleness. and removed lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. labels Aug 19, 2022
@BenTheElder
Member

What exactly do we see as outstanding here?

@spiffxp
Member Author

spiffxp commented Nov 11, 2022

> spend breakdown: https://datastudio.google.com/c/u/0/reporting/14UWSuqD5ef9E4LnsCD9uJWTPv8MHOA3e1

FWIW I can't access this

> What exactly do we see as outstanding here?

I agree with capping this off as the first pass. I think we'll want to revisit how we track our budget in the new year, and that should probably be a separate issue.

Things you might want to consider before capping this off:

  • The current budget alerts at 90% and 100% of 250K/mo (3M/yr). Since we're running over that rate, the alerts are going to be noise for those watching "are we out of credits for the year". Disable it and set up a new budget that tracks our remaining spend for the year?
  • The alerts currently go out to k8s-infra leads; consider adding a wider audience?
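The "remaining spend for the year" idea in the first bullet boils down to simple arithmetic. A sketch using the 3M/yr cap from above; the spend-to-date figure is hypothetical:

```python
# Track remaining yearly credits and the monthly rate that stays within them.
# The 3M/yr cap is from the discussion above; spend-to-date is hypothetical.
def remaining_budget(yearly_cap, spent_to_date):
    return yearly_cap - spent_to_date

def sustainable_monthly_rate(yearly_cap, spent_to_date, months_left):
    """Average monthly spend that stays within the yearly cap."""
    return remaining_budget(yearly_cap, spent_to_date) / months_left

cap = 3_000_000
spent = 2_600_000  # hypothetical spend through October
print(sustainable_monthly_rate(cap, spent, months_left=2))  # 200000.0
```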

I'll leave it to @ameukam or others to close if you're fine with this as-is.


@ameukam
Member

ameukam commented Nov 12, 2022

I think it's OK to close this. Let's revisit budget tracking for next year in a separate issue. The various attempts to move workloads to other cloud providers will hopefully impact the overall 2023 budget.

/close

@k8s-ci-robot
Contributor

@ameukam: Closing this issue.

In response to this:

> I think it's OK to close this. Let's revisit budget tracking for next year in a separate issue. The various attempts to move workloads to other cloud providers will hopefully impact the overall 2023 budget.
>
> /close

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

sig-k8s-infra automation moved this from In Progress to Done Nov 12, 2022