oci-proxy should redirect per-AWS region #39

Closed
hh opened this issue Mar 31, 2022 · 17 comments · Fixed by #47
Labels
priority/backlog Higher priority than priority/awaiting-more-evidence. sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra.

Comments

@hh
Member

hh commented Mar 31, 2022

Here are the regions (and ASNs) we need oci-proxy to actively redirect.

@ameukam is creating the AWS infra in kubernetes/k8s.io#3568.

If we treat Amazon ASNs not in ip-ranges.json as a type of region, then it’s #4 at 11.12% of Amazon’s total:

  • 21.5% : us-west-2
  • 16.7% : us-west-1
  • 13% : us-east-1
  • 11.1% : Other Amazon ASNs (not in ip-ranges.json, but collected in k8s.io/meta/asns/amazon.yaml)
  • 11.1% : eu-central-1
  • 9.2% : eu-central-1
  • 6.39% : us-east-2
  • 5.32% : ap-southeast-1
  • 2.99% : us-west-1
  • 2.6% : ap-northeast-1
  • 2.12% : ap-south-1
    =====^^Roughly ~80%^^=====

image

@hh hh changed the title oci-proxy should detect per region and 302 to buckets based on src IP using ip-ranges.json oci-proxy should detect per region (80% of total traffic) Mar 31, 2022
@ameukam
Member

ameukam commented Mar 31, 2022

/assign @BenTheElder @jaypipes

/sig k8s-infra
/priority backlog
/milestone v1.24

@k8s-ci-robot k8s-ci-robot added sig/k8s-infra Categorizes an issue or PR as relevant to SIG K8s Infra. priority/backlog Higher priority than priority/awaiting-more-evidence. labels Mar 31, 2022
@ameukam
Member

ameukam commented Mar 31, 2022

11.1% : eu-central-1
9.2% : eu-central-1

2.99% : us-west-1
16.7% : us-west-1

Why 2 percentages for the same region ?

@riaankleinhans

image

@riaankleinhans

11.1% : eu-central-1
9.2% : eu-central-1

2.99% : us-west-1
16.7% : us-west-1

Why 2 percentages for the same region ?

@hh typo.
We got the graph now ^^

@BenTheElder
Member

BenTheElder commented Mar 31, 2022

how did we go from 6.5% to 11.12% or 20%? #20 (comment)

@hh
Member Author

hh commented Mar 31, 2022

It's the denominator that changed:
6.5% of the total across our entire spend
11.12% of only our Amazon spend

@riaankleinhans

The 80/20 rule works out almost exactly.
6 of the 27 regions account for 80% of the traffic, and 6/27 = 22.2%.
Roughly 20% of the regions account for 80% of the traffic.
Vilfredo Pareto would be so proud that it still works that way!

@aojea
Member

aojea commented Mar 31, 2022

lol

@jaypipes

@hh @Riaankl @BenTheElder OK, my advice is just disregard for now the IPs that are assigned with an ASN known to be an Amazon ASN that do not appear in the ip-ranges.json file. I don't know where those are coming from but let's just deal with the ones in ip-ranges.json that we see more than, say, 2% of total container image pulls coming from.

So, disregarding those non-ip-ranges.json IPs, we get this:

21.5% : us-west-2
16.7% : us-west-1
13% : us-east-1
11.1% : eu-central-1
9.2% : eu-central-1
6.39% : us-east-2
5.32% : ap-southeast-1
2.99% : us-west-1
2.6% : ap-northeast-1
2.12% : ap-south-1

which adds up to 90.92% by my count, which is more than acceptable to start with IMHO.
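
As a quick check of that total, here is a minimal Go sketch that sums the ten percentages listed above. The values are copied in the order they appear, stored as hundredths of a percent so the arithmetic stays exact; the names `shares` and `total` are illustrative only and are not part of oci-proxy.

```go
package main

import "fmt"

// shares holds the ten listed percentages in hundredths of a percent
// (21.5% -> 2150), in the same order as the list in the comment.
var shares = []int{2150, 1670, 1300, 1110, 920, 639, 532, 299, 260, 212}

// total adds up the shares; integer math avoids floating-point rounding.
func total(xs []int) int {
	sum := 0
	for _, x := range xs {
		sum += x
	}
	return sum
}

func main() {
	t := total(shares)
	fmt.Printf("%d.%02d%%\n", t/100, t%100) // prints "90.92%"
}
```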

@hh
Member Author

hh commented Mar 31, 2022

I don't know where those are coming from

This AWS ASN Data is generated via https://github.com/kubernetes/k8s.io/blob/main/images/public-log-asn-matcher/README.md

Primary data sources include https://bgp.potaroo.net/cidr/autnums.html
and weekly snapshots of the Routeviews data from 2005-2015, accessible here: https://doi.org/10.4121/uuid:d4d23b8e-2077-4592-8b47-cb476ad16e12.

This is the data that helped us generate usage by company, resulting in k8s.io/meta/asns/amazon.yaml

If those aren't Amazon ASNs let's fix that.

@riaankleinhans

I agree @jaypipes
The ASN-matched IPs account for 11.12% of all AWS data and <6% of the total data set for all IPs.
If we get all the IPs matched via the ip-ranges.json file to redirect, it's a great win.
In the Pareto chart I only included IPs with regions detected via ip-ranges.json.
image

@hh
Member Author

hh commented Apr 1, 2022

Initial sources for ASN research kubernetes/k8s.io#1834 (comment)

@BenTheElder BenTheElder added the lifecycle/active Indicates that an issue or PR is actively being worked on by a contributor. label Apr 1, 2022
@BenTheElder
Member

[I have been actively continuing development on this]

@BenTheElder BenTheElder changed the title oci-proxy should detect per region (80% of total traffic) oci-proxy should redirect per-AWS region Apr 15, 2022
@BenTheElder
Member

Please, let's not editorialize this.

To be clear:

  1. We can match approximately 89% of AWS traffic to a region using ip-ranges.json data per the analysis here
  2. AWS Traffic is something like 50%+ of total traffic to the registry, which would put this at say > 45% of total traffic.
  3. 80% of AWS traffic (not total traffic) is an estimate of how much we will handle with a specific set of regional buckets if we only route to buckets in the exact same region. However we are not ultimately committed either in the set of buckets we choose to stand up or in choosing not to route other regions to the nearest region in which we have a bucket. Through either of these options we can handle ~89% of AWS traffic, assuming the analysis is accurate.
  4. It is less than useful to attempt to route AWS clients for which we do not have a region: not only will the cost be totally unpredictable, but latency may be very bad versus the status quo of GCP routing clients to the nearest GCR copy.

@dims
Member

dims commented Apr 15, 2022

@BenTheElder Agreed!

@hh @calebamiles @Riaankl Please see above

@BenTheElder
Member

#42 implements a library to ~efficiently perform "given an IP, determine the AWS region"

#47 (WIP) implements the rest:

  • identify blob requests
  • given an HTTP request, determine the client-ip for a local client in development or when behind GCLB in cloud run
  • given the client IP of a blob request, get the AWS region
  • given a blob request hash + region, redirect it to a bucket copy, or else to the primary registry

@BenTheElder
Member

We should close this after rolling out #47 to the sandbox, and follow up with standing up buckets for #38 in kubernetes/k8s.io#3568 + kubernetes/k8s.io#3620 + kubernetes-sigs/promo-tools#533
