
export the credential file location env variable #738

Merged
merged 1 commit into pangeo-data:staging from aws-timeout-fix on Sep 18, 2020

Conversation

salvis2 (Member) commented Sep 17, 2020

So the CI system has been failing on the step that is supposed to update the IP addresses that are allowed to access the AWS cluster. Here is the code and the response:

#!/bin/bash -eo pipefail
AWS_SHARED_CREDENTIALS_FILE=./deployments/icesat2/secrets/aws-config.txt
RUNNERIP=`curl --silent https://checkip.amazonaws.com`
aws --version
aws eks update-cluster-config --region us-west-2 --name pangeo --resources-vpc-config publicAccessCidrs=${RUNNERIP}/32 > /dev/null
sleep 120

aws-cli/1.18.140 Python/3.7.2 Linux/4.15.0-1077-aws botocore/1.17.63
Unable to locate credentials. You can configure credentials by running "aws configure".

Exited with code exit status 255
CircleCI received exit code 255

@TomAugspurger do you think this commit will make a difference? I know there are differences between using export and not, so if the update-cluster-config command spawns a new process, it might not be getting the value of the env variable, hence the "Unable to locate credentials" error.
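
A minimal illustration of that export difference (a hypothetical shell session, not output from the CI run):

# A plain assignment sets a shell variable that child processes never see;
# export puts it into the environment of every subprocess, e.g. the aws CLI.
AWS_SHARED_CREDENTIALS_FILE=./deployments/icesat2/secrets/aws-config.txt
sh -c 'echo "child sees: ${AWS_SHARED_CREDENTIALS_FILE:-nothing}"'   # child sees: nothing

export AWS_SHARED_CREDENTIALS_FILE=./deployments/icesat2/secrets/aws-config.txt
sh -c 'echo "child sees: ${AWS_SHARED_CREDENTIALS_FILE:-nothing}"'   # child sees: ./deployments/icesat2/secrets/aws-config.txt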

My other thought is that somehow it might still be git-crypt'd, but that seems unlikely.

salvis2 (Member, Author) commented Sep 17, 2020

It just seems weird that hubploy could be at fault here. The main change affecting this CI step has been the hubploy commit, but hubploy shouldn't even be used in this step.

salvis2 marked this pull request as draft September 17, 2020 20:05
TomAugspurger (Member) commented:

Maybe... That could affect whether variables propagate through to subprocesses? I'm not sure.

OOI is also failing:

Deleting outdated charts
Traceback (most recent call last):
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 66, in helm_upgrade
    kubernetes.config.load_kube_config(config_file=kubeconfig)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 739, in load_kube_config
    persist_config=persist_config)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 695, in _get_kube_config_loader_for_yaml_file
    kcfg = KubeConfigMerger(filename)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 650, in __init__
    self.load_config(path)
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/kubernetes/config/kube_config.py", line 664, in load_config
    config_merged[item] = []
TypeError: 'NoneType' object does not support item assignment

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/circleci/repo/venv/bin/hubploy", line 11, in <module>
    load_entry_point('hubploy==0.1.1', 'console_scripts', 'hubploy')()
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/__main__.py", line 139, in main
    args.cleanup_on_fail,
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 187, in deploy
    cleanup_on_fail,
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/hubploy/helm.py", line 68, in helm_upgrade
    kubernetes.config.load_incluster_config()
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/kubernetes/config/incluster_config.py", line 94, in load_incluster_config
    cert_filename=SERVICE_CERT_FILENAME).load_and_set()
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/kubernetes/config/incluster_config.py", line 45, in load_and_set
    self._load_config()
  File "/home/circleci/repo/venv/lib/python3.7/site-packages/kubernetes/config/incluster_config.py", line 51, in _load_config
    raise ConfigException("Service host/port is not set.")
kubernetes.config.config_exception.ConfigException: Service host/port is not set.

For now I'm going to revert and see if we can get the OOI deployment to finish.

salvis2 (Member, Author) commented Sep 17, 2020

Might have found the culprit. A recent update to hubploy installed SOPS: https://github.com/mozilla/sops

I don't know where it is getting the information for what to encrypt, but I could definitely see it interfering with reading files. @consideRatio do you know offhand? I see you helped a lot with that update. I'm learning more about SOPS atm.

consideRatio (Member) commented Sep 17, 2020

@salvis2 hmmmm... The issue is

Unable to locate credentials. You can configure credentials by running "aws configure"

And these credentials are supposed to be located via

AWS_SHARED_CREDENTIALS_FILE=./deployments/icesat2/secrets/aws-config.txt

Hmm... and hubploy is supposed to have decrypted this file? Then I don't think it has, because of its .yaml and .json focus, as can be seen here.


Note that I believe sops won't be used at all unless you have .yaml or .json files encrypted by sops, which are identified by the sops keys found within the encrypted .yaml / .json.
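
As a rough illustration (not hubploy's actual code), a sops-encrypted file carries a top-level sops metadata key, which a plain credentials file like aws-config.txt will never have:

# Hypothetical check: only .yaml / .json files carrying sops metadata would
# ever be candidates for decryption; aws-config.txt is ignored either way.
for f in deployments/icesat2/secrets/*.yaml deployments/icesat2/secrets/*.json; do
  [ -e "$f" ] || continue
  if grep -q '^sops:' "$f" || grep -q '"sops":' "$f"; then
    echo "$f looks sops-encrypted"
  fi
done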

salvis2 (Member, Author) commented Sep 17, 2020

Gotcha. Thank you for taking a look! It was just a suspicion, so I'll keep digging to see what the issue is.

consideRatio (Member) commented:

@salvis2 this PR doesn't have a failing check, so where is the CI system running? I have not yet managed to get a proper overview of the pangeo-related repositories.

salvis2 (Member, Author) commented Sep 17, 2020

@consideRatio the CI system is on CircleCI. I think the deploy CI action only runs on commits to staging and prod. I'm referencing the failing CI result of 463f6b8, which has the same hubploy version that we have been trying to use.

salvis2 (Member, Author) commented Sep 17, 2020

From some of the recent commit testing, it seems like updating to the newer hubploy commit is breaking the step listed at the top of this PR. We try to pick up the CI machine's IP address and allow it access to the AWS cluster, but it can't find the credentials. With the old version of hubploy, it can do this successfully; see c7f801b. I'm very unsure why hubploy would be the problem, since this step does not involve a hubploy command.

salvis2 (Member, Author) commented Sep 18, 2020

Some local testing this morning:

Test 1

With my normal credentials set up in ~/.aws/credentials, I can list the pods of a cluster.

I run the following to point to a credentials file that does not have the proper credentials to connect to said cluster.

AWS_SHARED_CREDENTIALS_FILE=./deployments/icesat2/secrets/aws-config.txt
echo $AWS_SHARED_CREDENTIALS_FILE

Yielding

./deployments/icesat2/secrets/aws-config.txt

Running kubectl get pods -A succeeds.

Test 2 (from a fresh Terminal)

export AWS_SHARED_CREDENTIALS_FILE=./deployments/icesat2/secrets/aws-config.txt
echo $AWS_SHARED_CREDENTIALS_FILE

Yields

./deployments/icesat2/secrets/aws-config.txt

Now, running kubectl get pods -A gives me

Unable to locate credentials. You can configure credentials by running "aws configure".
Unable to connect to the server: getting credentials: exec: exit status 255

Conclusion

awscli doesn't see the environment variable AWS_SHARED_CREDENTIALS_FILE unless I export it. I will probably try to merge this to include the export part and the new version of hubploy to see if everything works. I'm not sure why this wasn't the case with the other hubploy version, or what else changes between here and there.
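
Presumably the CI step from the top of this PR then only needs the export keyword added, something like:

#!/bin/bash -eo pipefail
# Export so that the aws subprocesses below inherit the variable.
export AWS_SHARED_CREDENTIALS_FILE=./deployments/icesat2/secrets/aws-config.txt
RUNNERIP=`curl --silent https://checkip.amazonaws.com`
aws --version
aws eks update-cluster-config --region us-west-2 --name pangeo --resources-vpc-config publicAccessCidrs=${RUNNERIP}/32 > /dev/null
sleep 120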

Suspicion

Actually, I have another suspicion.

So the old version of hubploy used to manually put the AWS credential file in the expected location of ~/.aws/credentials. We ended up changing that because it would wipe out your existing credentials file, which sucks for those who deployed manually. More current versions of hubploy will set the AWS_SHARED_CREDENTIALS_FILE environment variable while the JupyterHub deployment is happening.

My guess is that the original command

AWS_SHARED_CREDENTIALS_FILE=./deployments/icesat2/secrets/aws-config.txt

never actually worked, but since the hubploy build command for the AWS hub ran before it, the credentials were already in ~/.aws/credentials, so awscli picked them up and was fine. With the new version of hubploy, that is no longer the case, and the command is revealed to need export.
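
As a side note, one way to check which credential source awscli is actually resolving is aws configure list (just a diagnostic suggestion, not something the CI runs today):

# Prints each setting with the source it was resolved from
# (env, shared-credentials-file, iam-role, ...).
aws configure list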

salvis2 (Member, Author) commented Sep 18, 2020

I will probably merge this first, once it's ready, as a control; then update the hubploy version and try that.

salvis2 marked this pull request as ready for review September 18, 2020 16:49
salvis2 merged commit a2bd13b into pangeo-data:staging Sep 18, 2020
salvis2 deleted the aws-timeout-fix branch September 18, 2020 16:53