Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch AWS staging hub to GitHub Actions #754

Merged
merged 24 commits into from
Oct 6, 2020

Conversation

salvis2
Copy link
Member

@salvis2 salvis2 commented Sep 24, 2020

closes #699

Current setup should only affect the AWS staging hub. Covers:

  • git-crypt secret decrypting
  • AWS image pushing
  • Helm setup
  • EKS Dynamic IP Whitelisting
  • AWS hub deployment

I've made an educated guess on what multi-line yaml is supposed to look like, Let me know if this is wrong.

To-Do:

  • Add GIT_CRYPT_KEY secret to this repo's secrets
  • Add AWS_IP_WHITELIST secret to this repo's secrets
  • Add AWS image building to trigger on PRs (will require different jobs for each job: build and deploy)
  • Remove AWS steps from CircleCI Actions
  • Filter on paths specific to the AWS deployment
  • Input the step to use --commit-range flag

@salvis2
Copy link
Member Author

salvis2 commented Sep 24, 2020

I put in the AWS_IP_WHITELIST secret in the following format, let me know if that seems incorrect:

x.x.x.x/x,x.x.x.x/x

The command that uses it is:

aws eks update-cluster-config --region us-west-2 --name pangeo --resources-vpc-config publicAccessCidrs=${AWS_IP_WHITELIST} > /dev/null

@salvis2
Copy link
Member Author

salvis2 commented Sep 30, 2020

I don't think we need to install some of the pieces that are in the CircleCI setup, hubploy's image gets some of them: https://github.com/yuvipanda/hubploy/blob/master/orb/orb.yml#L24

@salvis2
Copy link
Member Author

salvis2 commented Sep 30, 2020

My main confusion now is how I was getting everything installed on the hackweek hubs. Here is the OceanHackWeek GitHub Action: https://github.com/oceanhackweek/jupyterhub/blob/staging/.github/workflows/deploy.yaml

The above line in hubploy shows how git-crypt is installed, but I'm not installing awscli manually

Nevermind about awscli, it's installed in the hubploy image as well: https://github.com/yuvipanda/hubploy/blob/master/orb/orb.yml#L65-L74

@salvis2
Copy link
Member Author

salvis2 commented Oct 1, 2020

Looking into the commit range detection, I see in hubploy that there is a file called commitrange.py. It states that GitHub Actions is the only supported CI system for the functions in that file, which is probably why we had to write one for our CircleCI system.

Within the main method of hubploy, I think using the --commit-range flag is for manually specifying the commit range. In GitHub Actions, we may not need to specify this. If I understand the first for loop correctly, we need to not specify either of --commit-range or --check-registry.

I will take out the --check-registry flag from the AWS image build action.

@salvis2 salvis2 marked this pull request as ready for review October 1, 2020 20:25
@salvis2 salvis2 changed the title Switch to GitHub Actions Switch AWS staging hub to GitHub Actions Oct 1, 2020
@TomAugspurger
Copy link
Member

Seems OK to me, but I don't know github actions too well :)

Have you verified that this works on your fork? I think it won't run here till it's actually merged, right?

@salvis2
Copy link
Member Author

salvis2 commented Oct 1, 2020

Have you verified that this works on your fork?

Most of this is from the GitHub actions that I used for OHW and IceSat-2 hackweek, so there's a rough verification that this works. Definitely is not all the same code though. I have not tested this on my fork, but I could try to set that up.

I think it won't run here till it's actually merged, right?

I think that's right.

@salvis2
Copy link
Member Author

salvis2 commented Oct 1, 2020

I'll try to set up some actions on my fork to verify that things work.

runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v1
- uses: $HUBPLOY_IMAGE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i wonder what it would take to package hubploy as an action to use the standard pattern like -uses: hubploy/hubploy@v1 (https://docs.github.com/en/free-pro-team@latest/actions/creating-actions/creating-a-docker-container-action).

Copy link
Member Author

@salvis2 salvis2 Oct 2, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@yuvipanda , do you have interest in this? I'm wondering if that would make it easier to use / manage hubploy for people. Currently, some pieces of the action feel a bit hacky, see things like this from IP whitelisting:

with:
  entrypoint: /bin/bash
  args: >
    -c "code &&
    more code &&
    end of code"

@salvis2
Copy link
Member Author

salvis2 commented Oct 2, 2020

I will squash some commits when everything is ready. For now, I've gotten the build action to work, see here: https://github.com/salvis2/pangeo-cloud-federation/runs/1196338165?check_suite_focus=true

The deploy command is giving me some trouble. From the most recent log: https://github.com/salvis2/pangeo-cloud-federation/runs/1196343897?check_suite_focus=true

Added new context arn:aws:eks:us-west-2:783380859522:cluster/pangeo to /tmp/tmp2879psy2
Error: apiVersion 'v2' is not valid. The value must be "v1"
Traceback (most recent call last):
  File "/usr/local/bin/hubploy", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.8/site-packages/hubploy/__main__.py", line 128, in main
    helm.deploy(
  File "/usr/local/lib/python3.8/site-packages/hubploy/helm.py", line 176, in deploy
    helm_upgrade(
  File "/usr/local/lib/python3.8/site-packages/hubploy/helm.py", line 56, in helm_upgrade
    subprocess.check_call([
  File "/usr/local/lib/python3.8/subprocess.py", line 364, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['helm', 'dep', 'up']' returned non-zero exit status 1.

It gets into the deploy step, authenticates and make the temp kubeconfig file, then attempts to run helm dep up. I know the pangeo-deploy chart that we deploy is v2. I also specifically install helm 3 and get the expected output from helm version from the same action.

Is that error associated with anything besides running Helm 2?

@salvis2
Copy link
Member Author

salvis2 commented Oct 2, 2020

I inserted a helm version command to the IP assignment step: https://github.com/salvis2/pangeo-cloud-federation/runs/1199869433?check_suite_focus=true

Errored out with:

Client: &version.Version{SemVer:"v2.16.9", GitCommit:"8ad7037828e5a0fca1009dabe290130da6368e39", GitTreeState:"clean"}
Error: Get "http://localhost:8080/api/v1/namespaces/kube-system/pods?labelSelector=app%3Dhelm%2Cname%3Dtiller": dial tcp 127.0.0.1:8080: connect: connection refused

So I can see that it is using Helm 2, so now I know I need to figure out how to stop that.

@salvis2
Copy link
Member Author

salvis2 commented Oct 2, 2020

So the hubploy image that we pull overrides the helm executable at /usr/local/bin. I think this is why I saw that helm version indicated that Helm 3 was installed, but in a later step, helm version indicated it was Helm 2. hubploy pulls an environment variable - HELM_EXECUTABLE - as the binary to use [here] (https://github.com/yuvipanda/hubploy/blob/master/hubploy/helm.py#L34). I exposed this as an environment variable in the entire action and it worked like a charm. Cleanup time! Then I will get to suggestions from @scottyhq .

Working deploy action here: https://github.com/salvis2/pangeo-cloud-federation/runs/1200067421?check_suite_focus=true

@salvis2
Copy link
Member Author

salvis2 commented Oct 2, 2020

I'm seeing some funny business in the build action now. Since I did some git rebase stuff, it's having trouble looking at commits (I think). Might not be a problem for the merge. See https://github.com/salvis2/pangeo-cloud-federation/runs/1200134273?check_suite_focus=true for the logs.

@salvis2
Copy link
Member Author

salvis2 commented Oct 2, 2020

Also wondering if overall we can change some of the syntax from things like this in IP whitelisting:

with:
  entrypoint: /bin/bash
  args: >
    -c "code &&
    more code &&
    end of code"

to something more intuitive like how the helm command is:

run: |
  code
  more code
  end of code

I'll try it out and see if I really need the hubploy image setup that we've been using for many of these steps.

@salvis2
Copy link
Member Author

salvis2 commented Oct 2, 2020

So that immediately failed with

a step cannot have both the `uses` and `run` keys

So if we wanted to make the command more intuitive, we would need to install git-crypt manually like we do for CircleCI. Probably not worth it.

I was hoping that the paths_ignore of the image/ folder would override accepting anything in the icesat2/ folder, but that didn't work. Now I am more specific to exclude the image/ folder.
It got deleted in deleting the image build blocks
@salvis2
Copy link
Member Author

salvis2 commented Oct 2, 2020

I had a thought to make the git-crypt step separate from the pulling of the hubploy image. The Docker image pull is separate from the first time it is used, so we might have the packages from hubploy installed and not have to invoke it.

Turns out this is not the case and we need to either invoke hubploy, which has git-crypt in it, or manually install and use git-crypt. See the action. This is a nice rabbit hole that may be more relevant later if we want to restructure what is installed where for CI.

Anyhow, I think this is almost ready to actually be merged. deploy is running correctly and while build isn't, I think that's fine, see #754 (comment). @scottyhq @TomAugspurger let me know if there is anything else you want tested or modified. I will have one last commit to enable the action for AWS's prod hub (if you want) and to disable the action on this temp branch for the PR.

@scottyhq
Copy link
Member

scottyhq commented Oct 2, 2020

@scottyhq @TomAugspurger let me know if there is anything else you want tested or modified. I will have one last commit to enable the action for AWS's prod hub (if you want) and to disable the action on this temp branch for the PR.

thanks for detailing all this @salvis2! Let's plan to merge next week, this shouldn't impact the other hubs, so we can see how things work and migrate the others / iterate on the workflow in follow-up PRs

@TomAugspurger
Copy link
Member

TomAugspurger commented Oct 3, 2020 via email

@salvis2
Copy link
Member Author

salvis2 commented Oct 6, 2020

So the build step has been failing, saying

Error: -: not a valid git-crypt key file
If this key was created prior to git-crypt 0.4, you need to migrate it
by running 'git-crypt migrate-key /path/to/old_key /path/to/migrated_key'.

This step completes correctly in my fork.

I updated both secrets by inputting a value for the secret, copying the value, saving the first secret, then pasting the value in for the second secret, so I'm fairly confident that they are the same on both forks. Anyone have any ideas as to why this might happen?

@scottyhq
Copy link
Member

scottyhq commented Oct 6, 2020

Anyone have any ideas as to why this might happen?

@salvis2 - I'm pretty sure this is one of those confusing points of permissions and secrets access when GitHub Actions running from a PR https://stackoverflow.com/questions/58737785/github-actions-empty-env-secrets

The action running on your fork has access to your fork's secrets, you can see they are encrypted in the log (note asterisks):

  with:
    entrypoint: /bin/bash
    args: -c "echo ${GIT_CRYPT_KEY} | base64 -d | git crypt unlock - && git crypt status"
  env:
    GIT_CRYPT_KEY: ***

If you look at the log in this repo, GIT_CRYPT_KEY is blank suggesting the value couldn't be read

  with:
    entrypoint: /bin/bash
    args: -c "echo ${GIT_CRYPT_KEY} | base64 -d | git crypt unlock - && git crypt status"
  env:
    GIT_CRYPT_KEY: 

The secrets should be read properly once merged, so go ahead!

@salvis2
Copy link
Member Author

salvis2 commented Oct 6, 2020

Right, that makes sense. Thanks @scottyhq !

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Switch CI System to GitHub Actions?
3 participants