New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AWS China region support #390

Merged
merged 27 commits into from Mar 15, 2017

Conversation

Projects
None yet
5 participants
@camilb
Collaborator

camilb commented Mar 7, 2017

Hi @mumoshu, my colleague @mpucholblasco and I were working to adapt kube-aws for AWS China region and we need community feedback.

There were several problems we encountered and tried to fix:

  • there is no KMS support in cn-north-1region. We preferred to disable this entirely for this region only.We think later using https://github.com/mozilla/sops to encrypt the certificates fields.

  • China region has a different S3 endpoint. For now we are passing a flag "--s3-region=cn-north-1" when starting the cluster but we are working to get the region automatically from cluster.yaml

  • there are different ARNs. We automatically detect and change them for cn-north-1

  • most of the docker or rkt images are not pulled from dockerhub or quay and gcr.io is completely blocked in China. On large images, like hyperkube, you get many timeouts from quay.
    We added an option to specify your own images, hosted on a registry in China.
    We are using Alibaba Cloud registry, which is very fast. We also added an option for rkt to pull images from docker repos.
    A special case is with amd64-pause image that kubernetes is using it and can not change its name or download it. In this case we added a simple service that pulls the image from our registry in China and tag it as gcr.io/google_containers/pause-amd64:3.0

  • CoreOS AMIs in cn-north-1 are outdated, so we're using a simple method to clone a AMI from Singapore region.
    #ap-south-1
    mkfs.ext4 /dev/xvdb
    mkdir /ami
    mount /dev/xvdb /ami
    cd /ami
    dd if=/dev/xvda of=root.img bs=1M
    scp -i core.pem /ami/root.img core@cn-north-1_ip_address:/home/core
    #cn-north-1
    dd if=/home/core/root.img of=/dev/xvdb bs=1M oflag=direct
    mkdir -p /ami
    mount /dev/xvdb9 /ami
    rm /ami/home/core/.ssh/authorized_keys
    rm /ami/home/core/.ssh/authorized_keys.d/*
    umount /ami
    Create a snapshot of the volume.
    Create an AMI from the snapshot.

After all of these changes we can successfully deploy kubernetes with kube-aws in cn-north-1 as fast as for other regions and without errors or additional configurations.

We are waiting for suggestions from the community before clean the commits and rebase.

camilb and others added some commits Mar 2, 2017

Merge remote-tracking branch 'coreos/master'
* coreos/master:
  Emit a warning message when `t2.nano` or `t2.micro` is set for `*instanceType` closes #258
  Drop deprecated `hostedZone`(not `hostedZoneId`) in cluster.yaml closes #287
  fix(e2e): The second and the subsequent testinfra stacks fail to be created closes #367
  fix(e2e): node pool "Fleet2" fails to be created closes #362
  feat: Managed HA etcd cluster
  Add additional EBS volumes to worker nodes.
  Update ROADMAP.md
  Fix customSettings unavailability for node pools
  Make tests which depends on `clusterName` to pass on Travis
  Fix the inconsistent S3 object prefix issue
  Add a test to reproduce the inconsistent S3 object prefix issue
  Fix missing Name and ControllerHost in `status` and `up` commands
  Add support for ALBs Target Groups
  Fix typo in provisioner.go
  Fix typo in comment
  credentials: Add /O=systems:master for kube-admin
  Add experimental support for pod security policies
  README: remove the design decisions
  README: make the initial instructions self-contained
  README: remove the survey
Merge remote-tracking branch 'coreos/master'
* coreos/master:
  Emit a warning message when `t2.nano` or `t2.micro` is set for `*instanceType` closes #258
  Drop deprecated `hostedZone`(not `hostedZoneId`) in cluster.yaml closes #287
  fix(e2e): The second and the subsequent testinfra stacks fail to be created closes #367
  fix(e2e): node pool "Fleet2" fails to be created closes #362
  feat: Managed HA etcd cluster
  Add additional EBS volumes to worker nodes.
  Update ROADMAP.md
  Fix customSettings unavailability for node pools
  Make tests which depends on `clusterName` to pass on Travis
  Fix the inconsistent S3 object prefix issue
  Add a test to reproduce the inconsistent S3 object prefix issue
  Fix missing Name and ControllerHost in `status` and `up` commands
  Add support for ALBs Target Groups
  Fix typo in provisioner.go
  Fix typo in comment
  credentials: Add /O=systems:master for kube-admin
  Add experimental support for pod security policies
  README: remove the design decisions
  README: make the initial instructions self-contained
  README: remove the survey
Merge pull request #2 from mpucholblasco/test
Option for rkt to pull from Docker repos.
Show outdated Hide outdated cfnstack/assets.go
Show outdated Hide outdated cfnstack/assets.go
@mumoshu

This comment has been minimized.

Show comment
Hide comment
@mumoshu

mumoshu Mar 8, 2017

Collaborator

Thanks @camilb @mpucholblasco, I'm really impressed to see your great work!
I've left comments about cosmetics but the changes LGTM overall.

there is no KMS support in cn-north-1region. We preferred to disable this entirely for this region only.

Sounds good at this stage 👍

China region has a different S3 endpoint. For now we are passing a flag "--s3-region=cn-north-1" when starting the cluster but we are working to get the region automatically from cluster.yaml

👍

there are different ARNs. We automatically detect and change them for cn-north-1

Although I'd prefer adding ifs in golang code rather than templates, the functionality sounds great 👍

most of the docker or rkt images are not pulled from dockerhub or quay and gcr.io is completely blocked in China. On large images, like hyperkube, you get many timeouts from quay.
We added an option to specify your own images, hosted on a registry in China.

👍
// Btw I had imagined of hosting our own images even outside of the China region for faster deployments.

#cn-north-1
dd if=/home/core/root.img of=/dev/xvdb bs=1M oflag=direct
mkdir -p /ami
mount /dev/xvdb9 /ami
rm /ami/home/core/.ssh/authorized_keys
rm /ami/home/core/.ssh/authorized_keys.d/*
umount /ami

Sorry for being very nit picky but perhaps you're missing something like aws ec2 attach-volume --device xvdb9 ... in the beginning of this snippet? 😃

Anyway, this guidance of cloning AMIs among regions would help everyone trying kube-aws within the China region in the future.

Collaborator

mumoshu commented Mar 8, 2017

Thanks @camilb @mpucholblasco, I'm really impressed to see your great work!
I've left comments about cosmetics but the changes LGTM overall.

there is no KMS support in cn-north-1region. We preferred to disable this entirely for this region only.

Sounds good at this stage 👍

China region has a different S3 endpoint. For now we are passing a flag "--s3-region=cn-north-1" when starting the cluster but we are working to get the region automatically from cluster.yaml

👍

there are different ARNs. We automatically detect and change them for cn-north-1

Although I'd prefer adding ifs in golang code rather than templates, the functionality sounds great 👍

most of the docker or rkt images are not pulled from dockerhub or quay and gcr.io is completely blocked in China. On large images, like hyperkube, you get many timeouts from quay.
We added an option to specify your own images, hosted on a registry in China.

👍
// Btw I had imagined of hosting our own images even outside of the China region for faster deployments.

#cn-north-1
dd if=/home/core/root.img of=/dev/xvdb bs=1M oflag=direct
mkdir -p /ami
mount /dev/xvdb9 /ami
rm /ami/home/core/.ssh/authorized_keys
rm /ami/home/core/.ssh/authorized_keys.d/*
umount /ami

Sorry for being very nit picky but perhaps you're missing something like aws ec2 attach-volume --device xvdb9 ... in the beginning of this snippet? 😃

Anyway, this guidance of cloning AMIs among regions would help everyone trying kube-aws within the China region in the future.

@camilb

This comment has been minimized.

Show comment
Hide comment
@camilb

camilb Mar 9, 2017

Collaborator

@mumoshu thanks for your feedback. We agree with all of your suggestions and we'll make the changes these days.

Collaborator

camilb commented Mar 9, 2017

@mumoshu thanks for your feedback. We agree with all of your suggestions and we'll make the changes these days.

@codecov-io

This comment has been minimized.

Show comment
Hide comment
@codecov-io

codecov-io Mar 12, 2017

Codecov Report

Merging #390 into master will decrease coverage by 0.18%.
The diff coverage is 46.51%.

@@            Coverage Diff             @@
##           master     #390      +/-   ##
==========================================
- Coverage   38.71%   38.53%   -0.19%     
==========================================
  Files          29       30       +1     
  Lines        2278     2351      +73     
==========================================
+ Hits          882      906      +24     
- Misses       1277     1324      +47     
- Partials      119      121       +2
Impacted Files Coverage Δ
model/region.go 0% <0%> (ø)
model/derived/etcd_node.go 0% <0%> (ø)
model/image.go 0% <0%> (ø)
model/derived/etcd_cluster.go 80% <100%> (ø)
cfnstack/provisioner.go 11.21% <100%> (+0.43%)
cfnstack/assets.go 40.84% <66.66%> (+1.13%)
core/controlplane/cluster/cluster.go 43.26% <66.66%> (+0.23%)
core/nodepool/cluster/cluster.go 38.01% <66.66%> (ø)
core/controlplane/config/tls_config.go 71.37% <71.42%> (-0.57%)
core/controlplane/config/config.go 65.42% <79.16%> (+0.32%)
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3c963bd...2655dcc. Read the comment docs.

codecov-io commented Mar 12, 2017

Codecov Report

Merging #390 into master will decrease coverage by 0.18%.
The diff coverage is 46.51%.

@@            Coverage Diff             @@
##           master     #390      +/-   ##
==========================================
- Coverage   38.71%   38.53%   -0.19%     
==========================================
  Files          29       30       +1     
  Lines        2278     2351      +73     
==========================================
+ Hits          882      906      +24     
- Misses       1277     1324      +47     
- Partials      119      121       +2
Impacted Files Coverage Δ
model/region.go 0% <0%> (ø)
model/derived/etcd_node.go 0% <0%> (ø)
model/image.go 0% <0%> (ø)
model/derived/etcd_cluster.go 80% <100%> (ø)
cfnstack/provisioner.go 11.21% <100%> (+0.43%)
cfnstack/assets.go 40.84% <66.66%> (+1.13%)
core/controlplane/cluster/cluster.go 43.26% <66.66%> (+0.23%)
core/nodepool/cluster/cluster.go 38.01% <66.66%> (ø)
core/controlplane/config/tls_config.go 71.37% <71.42%> (-0.57%)
core/controlplane/config/config.go 65.42% <79.16%> (+0.32%)
... and 1 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 3c963bd...2655dcc. Read the comment docs.

c.HyperkubeImage.Tag = c.K8sVer
c.AWSCliImage.MergeIfEmpty(main.AWSCliImage)
c.CalicoCtlImage.MergeIfEmpty(main.CalicoCtlImage)
c.PauseImage.MergeIfEmpty(main.PauseImage)

This comment has been minimized.

@mumoshu

mumoshu Mar 12, 2017

Collaborator

👍

@mumoshu

mumoshu Mar 12, 2017

Collaborator

👍

@mumoshu

This comment has been minimized.

Show comment
Hide comment
@mumoshu

mumoshu Mar 12, 2017

Collaborator

Thanks for your effort @camilb 🙇
I've added 3 nits but LGTM overall.
Could you hopefully fix these nits and rebase?
As this is already LGTM overall, if you got short on time, I can merge this and fix it later 😃

Collaborator

mumoshu commented Mar 12, 2017

Thanks for your effort @camilb 🙇
I've added 3 nits but LGTM overall.
Could you hopefully fix these nits and rebase?
As this is already LGTM overall, if you got short on time, I can merge this and fix it later 😃

@mpucholblasco

This comment has been minimized.

Show comment
Hide comment
@mpucholblasco

mpucholblasco Mar 13, 2017

Hello @mumoshu ,
we just pushed those changes you requested. We tried to rebase, but we have seen some last changes on master and merged with our code to ensure everything works.

After reviewing last changes, if you consider it, please, can you merge and rebase?

Thanks,
@camilb & me.

mpucholblasco commented Mar 13, 2017

Hello @mumoshu ,
we just pushed those changes you requested. We tried to rebase, but we have seen some last changes on master and merged with our code to ensure everything works.

After reviewing last changes, if you consider it, please, can you merge and rebase?

Thanks,
@camilb & me.

@mumoshu

This comment has been minimized.

Show comment
Hide comment
@mumoshu

mumoshu Mar 15, 2017

Collaborator

Thanks @mpucholblasco, sure 👍
I'm working on it now

Collaborator

mumoshu commented Mar 15, 2017

Thanks @mpucholblasco, sure 👍
I'm working on it now

@mumoshu mumoshu changed the title from [WIP] AWS China region support to AWS China region support Mar 15, 2017

@mumoshu mumoshu merged commit cdf0ef7 into kubernetes-incubator:master Mar 15, 2017

1 check passed

continuous-integration/travis-ci/pr The Travis CI build passed
Details
@mumoshu

This comment has been minimized.

Show comment
Hide comment
@mumoshu

mumoshu Mar 15, 2017

Collaborator

LGTM. Great job, @camilb and @mpucholblasco!

Collaborator

mumoshu commented Mar 15, 2017

LGTM. Great job, @camilb and @mpucholblasco!

@mumoshu mumoshu added this to the v0.9.5-rc.4 milestone Mar 15, 2017

@mumoshu mumoshu referenced this pull request Mar 17, 2017

Merged

Automatic recovery from permanent failures of etcd3 nodes #417

0 of 4 tasks complete

c-knowles added a commit to c-knowles/kube-aws that referenced this pull request Mar 24, 2017

Rescheduler is off by default, can be enabled
Also align to work from
kubernetes-incubator#390 by allowing
image repo/tag to be overridden.

mumoshu added a commit that referenced this pull request Apr 2, 2017

Introduce the rescheduler (#441)
* Introduce the rescheduler

For #118.

- Currently using raw Pod as per [salt
setup](https://github.com/kubernetes/kubernetes/blob/master/cluster/salt
base/salt/rescheduler/rescheduler.manifest0
- Logging routed via standard streams
- gcr image used, not sure if hyperkube has the rescheduler

* Rescheduler is off by default, can be enabled

Also align to work from
#390 by allowing
image repo/tag to be overridden.

* Comment on experimental nature of rescheduler

* Remove resource todo

* Correct default to v0.2.2

v0.3.0 is not ready until k8s 1.6

* Change rescheduler to Deployment

Rescheduler currently crashes if there is more than one present

* Add validation tests

camilb added a commit to camilb/kube-aws that referenced this pull request Apr 5, 2017

Merge remote-tracking branch 'kubernetes-incubator/master'
* kubernetes-incubator/master: (29 commits)
  Emit errors when kube-aws sees unexpected keys in cluster.yaml Resolves kubernetes-incubator#404
  Tag controller nodes appropriately with `kubernetes.io/role`. Resolves kubernetes-incubator#370
  Make Container Linux AMI fetching a bit more reliable
  Stop locksmithd errors on etcd nodes
  Upgrade heapster to version 1.3.0 (kubernetes-incubator#420)
  Auth token file support (kubernetes-incubator#418)
  Update README.md
  Update README accordingly to the new git repo
  AWS China region support (kubernetes-incubator#390)
  Conform as a Kubernetes Incubator Project
  Fixed typo in template
  upgrade aws-sdk to latest version Fix kubernetes-incubator#388
  Upgrade Kubernetes version to v1.5.4
  Fix assumed public hostnames for EC2 instances in us-east-1
  Fix assumed public hostnames for EC2 instances in us-east-1
  typo
  fix: etcdDataVolumeEncrypted not creating encrypted volumes fixes kubernetes-incubator#383
  Allow disabling wait signals fixes kubernetes-incubator#371
  Update file paths in readme
  Fix an issue with glue security group documentation
  ...

@qqshfox qqshfox referenced this pull request Jul 9, 2017

Closed

support china region #2889

kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this pull request Mar 27, 2018

AWS China region support (kubernetes-incubator#390)
AWS China region support + Parameterized docker/rkt images

* Added S3 China
* Parameterized images
* Remove "decrypt" dependencies for China region.
* Fix ca cert for China region.
* Fix ARN names for China region.
* Adding logging
* Remove KMS from China region in nodepools.
* Add "amd64-pause" image service for China.
* Added testing and fixing
* Removed parameter s3-region from all commands
* Added image properties
* Used TLSAssetsEncryptionEnabled instead of asking for China Region
* Changed region string by region model

kylehodgetts pushed a commit to HotelsDotCom/kube-aws that referenced this pull request Mar 27, 2018

Introduce the rescheduler (kubernetes-incubator#441)
* Introduce the rescheduler

For kubernetes-incubator#118.

- Currently using raw Pod as per [salt
setup](https://github.com/kubernetes/kubernetes/blob/master/cluster/salt
base/salt/rescheduler/rescheduler.manifest0
- Logging routed via standard streams
- gcr image used, not sure if hyperkube has the rescheduler

* Rescheduler is off by default, can be enabled

Also align to work from
kubernetes-incubator#390 by allowing
image repo/tag to be overridden.

* Comment on experimental nature of rescheduler

* Remove resource todo

* Correct default to v0.2.2

v0.3.0 is not ready until k8s 1.6

* Change rescheduler to Deployment

Rescheduler currently crashes if there is more than one present

* Add validation tests
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment