[ci.jenkins.io] Azure billing shows huge cloud cost due to outbound bandwidth #3485

Closed
dduportal opened this issue Mar 31, 2023 · 4 comments

Comments

@dduportal
Contributor

dduportal commented Mar 31, 2023

While checking our cloud billing on Azure, we were able to pinpoint that ci.jenkins.io cost us about $1,300 in March for outbound bandwidth alone!

The screenshot below is:

  • Scoped to the resource group of the ci.jenkins.io controller only (1 VM, 2 disks, 1 public IP and 1 NIC in this resource group)
  • For the period from 1 March 2023 to 28 March 2023
  • The service "Rtn Preference: MGN" corresponds to "Internet Egress (routed via Microsoft Premium Global Network)" in https://azure.microsoft.com/en-us/pricing/details/bandwidth/

Screenshot 2023-03-29 at 09:57:09

It means that multiple terabytes of data were sent from ci.jenkins.io to destinations outside the Azure cloud (cross-region traffic only accounts for around $5, as we use both US East and US East 2 for the infrastructure).

  • Worst case for price per GB: South America destination, $0.181 per GB means ~7,182 GB
  • Worst case for amount of data: EU/US destination, means ~14,900 GB
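
The two bounds above come from dividing the bill by a per-GB rate. Here is a minimal sketch of that arithmetic, assuming the whole ~$1,300 is egress billed at one flat rate. The $0.181 rate is the one quoted above; the $0.087 EU/US rate is an assumption inferred from the ~14,900 GB figure, not read from the bill.

```python
# Rough bounds on the egress volume implied by the March bill.
MONTHLY_COST_USD = 1300.0  # approximate total from the billing screenshot

RATES_USD_PER_GB = {
    "South America (highest rate)": 0.181,  # quoted in this issue
    "EU/US (lowest rate)": 0.087,  # assumption: inferred from ~14,900 GB
}


def implied_volume_gb(cost_usd: float, rate_usd_per_gb: float) -> float:
    """Return the data volume (in GB) that a flat per-GB rate would imply."""
    return cost_usd / rate_usd_per_gb


for destination, rate in RATES_USD_PER_GB.items():
    volume = implied_volume_gb(MONTHLY_COST_USD, rate)
    print(f"{destination}: ~{volume:,.0f} GB")
```

The real volume should sit somewhere between these two figures, depending on where the traffic actually goes.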

=> we need to check and understand how to control this cost.

@dduportal
Copy link
Contributor Author

A few elements to analyse after discussing and brainstorming (not exhaustive, but a great start):

  • We have an Apache server in front of ci.jenkins.io: its logs are important to check.

    • todo: check if the Datadog agent collects these logs (should be the case) and use Datadog to check the amount of data Apache reports for outbound network
  • todo: check the "outbound network bandwidth" in Datadog for the VM, to see if it reports the same amount

  • As discussed with @MarkEWaite, it could be in the controller <-> agents communication area as we launch agents in AWS and DigitalOcean for ci.jenkins.io

    • The unstash pipeline step could be a great candidate for outbound bandwidth: with a mega war at (optimistic evaluation) 100 MB, with ~200 parallel PCT steps, once a day, it is already 620 GB of outbound data per month to AWS/DigitalOcean (100 MB × 200 × 31 days)
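
The 620 GB estimate can be reproduced with a quick back-of-the-envelope calculation. This is only a sketch: the 31-day month and decimal units (1 GB = 1,000 MB) are assumptions, chosen because they match the figure quoted above.

```python
# Estimate of monthly outbound traffic caused by unstashing the mega war.
WAR_SIZE_MB = 100          # optimistic size of the mega war
PARALLEL_PCT_STEPS = 200   # approximate number of parallel PCT steps
RUNS_PER_DAY = 1           # the build runs once a day
DAYS_IN_MONTH = 31         # assumption: a 31-day month such as March


def monthly_unstash_gb(war_mb: int, steps: int, runs_per_day: int, days: int) -> float:
    """Return the monthly unstash traffic in GB (decimal: 1 GB = 1,000 MB)."""
    total_mb = war_mb * steps * runs_per_day * days
    return total_mb / 1000


estimate = monthly_unstash_gb(WAR_SIZE_MB, PARALLEL_PCT_STEPS, RUNS_PER_DAY, DAYS_IN_MONTH)
print(f"~{estimate:.0f} GB of unstash traffic per month")
```

Each unstash downloads one copy of the war from the controller to an agent, so the volume scales linearly with the number of parallel steps.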

@dduportal
Contributor Author

Proposal for the stash/unstash: using https://plugins.jenkins.io/artifact-manager-s3/ could help.

@dduportal
Contributor Author

Screenshot 2023-04-24 at 12:23:53

  • Next step: create a dashboard in Datadog (enabled by #3514) to measure the outbound bandwidth
  • Check Apache optimizations (gzip? WebSockets for agents? etc.)

@dduportal
Contributor Author

It seems the unusual consumption is now manageable (thanks to the huge work by the maintainers of bom, along with the S3 artifact management) and is back to "normal" again:

Screenshots 2023-05-15 at 19:15:11, 19:21:07 and 19:21:38
