pull-kubernetes-e2e-kops-aws failing #73444

Open
spiffxp opened this Issue Jan 29, 2019 · 43 comments

Comments

Member

spiffxp commented Jan 29, 2019

Which jobs are failing: pull-kubernetes-e2e-kops-aws

Which test(s) are failing: Up

Since when has it been failing:
Last pass @ 2019-01-28 05:43 PST https://gubernator.k8s.io/build/kubernetes-jenkins/pr-logs/pull/73295/pull-kubernetes-e2e-kops-aws/120805
First failure @ 2019-01-28 05:47 PST
https://gubernator.k8s.io/build/kubernetes-jenkins/pr-logs/pull/73318/pull-kubernetes-e2e-kops-aws/120806

Testgrid link: https://testgrid.k8s.io/presubmits-kubernetes-blocking#pull-kubernetes-e2e-kops-aws

Reason for failure: TBD

Anything else we need to know: I would have reported this sooner, but it was obscured by everything else failing due to kubernetes/test-infra#11001

Member Author

spiffxp commented Jan 29, 2019

/sig cluster-lifecycle
because kops
/sig aws
because this is running under a sig-aws managed account AFAIK

Member Author

spiffxp commented Jan 29, 2019

/priority critical-urgent
this is merge blocking

Member Author

spiffxp commented Jan 29, 2019

Member Author

spiffxp commented Jan 29, 2019

justinsb self-assigned this Jan 29, 2019

Member Author

spiffxp commented Jan 29, 2019

The CI jobs went from green to red without the underlying kubernetes commit or kubernetes test-infra commit changing, which makes me think this is an environment issue. I'm not sure whether the version of kops being used changed.

Member

justinsb commented Jan 29, 2019

I poked around the AWS account. We're getting this error message from the AWS ASGs:

Launching a new EC2 instance. Status Reason: This account is currently blocked and not recognized as a valid account. Please contact aws-verification@amazon.com if you have questions. Launching EC2 instance failed.

Member Author

spiffxp commented Jan 29, 2019

/assign @shyamjvs
I know you've been doing test-infra work related to presets for AWS credentials, maybe related?

Member

shyamjvs commented Jan 29, 2019

@BenTheElder Have we changed something around this recently? I remember this was fixed a few days ago when it broke.

Member

shyamjvs commented Jan 29, 2019

Sorry, also @krzyzacy

Member

justinsb commented Jan 29, 2019

To be clear though, kops is creating the ASGs, security groups & networking resources, so it isn't a normal "wrong password" problem. It looks like AWS has maybe decided our account can't create instances? Like our account has been flagged?

Member

shyamjvs commented Jan 29, 2019

Thanks for the lead. I'll try to follow up on this internally.

Member

BenTheElder commented Jan 29, 2019

I've not touched anything, I only reverted the naming of the presets in the config, and I don't have access to any of the AWS accounts. I don't believe @krzyzacy has either. Looks like an issue with the account.

Member

shyamjvs commented Jan 29, 2019

@justinsb Do you know who created the account, and how?

Member

justinsb commented Jan 29, 2019

It's a CNCF-owned/administered account - I think @idvoretskyi is the contact

Member

shyamjvs commented Jan 29, 2019

I've filed a support issue internally as a third party, since I don't have access credentials to that account. @idvoretskyi - could you grant me Admin access to the account?

Member

krzyzacy commented Jan 29, 2019

I think I can also poke, 1min

Member

shyamjvs commented Jan 29, 2019

I suggest temporarily making the presubmit non-blocking until this is fixed. WDYT?

Member

krzyzacy commented Jan 29, 2019

sgtm

Member

shyamjvs commented Jan 29, 2019

I'll send out a PR making the change.

Member

shyamjvs commented Jan 29, 2019

[Screenshot taken 2019-01-28 at 7:50:26 pm]

^ I think this is the reason for the block.

Member

zetaab commented Feb 2, 2019

Is there any news on when this issue will be solved?

Member

shyamjvs commented Feb 3, 2019

We're expecting to get this resolved early next week.

Member

zetaab commented Feb 6, 2019

@shyamjvs any news?

Member

dims commented Feb 6, 2019

@zetaab, I believe @idvoretskyi @arun-gupta are on point, we will get an ETA hopefully in a day or so.

Member

idvoretskyi commented Feb 6, 2019

/assign

Member

shyamjvs commented Feb 6, 2019

A request for $19K credit has been approved. This will be applied before this month automatically to the account. Seems like we can start using EC2 now.

/cc @arun-gupta - for confirmation

Member

shyamjvs commented Feb 6, 2019

I'm trying to debug issues with the presubmit under PR #73797. Currently I'm blocked on getting access to the account. I've asked @idvoretskyi for it and he said he'll respond within a day.

Member

idvoretskyi commented Feb 7, 2019

@shyamjvs Shyam, the account is still in a suspended status; I'm not able to make any changes yet. Updated the mail thread with @arun-gupta about this issue.

Member

idvoretskyi commented Feb 11, 2019

Status update: the billing issue is still in place, we are currently working on it with AWS.

Member

zetaab commented Feb 13, 2019

@idvoretskyi is there any possibility of solving this issue somehow? We have now been waiting on this for over 2 weeks. All kops development is on hold. In my opinion this is not a very good situation. Of course we could just skip the AWS e2e tests, but then there is a possibility that we end up with production problems. There are also things like CVE fixes that we should merge into releases but cannot test.

Member

idvoretskyi commented Feb 13, 2019

@zetaab Shyam @shyamjvs is able to provide more details on this issue.

Member

idvoretskyi commented Feb 13, 2019

Status update: we are actively working with AWS to resolve the issue. It is currently unsolved, but hopefully, we'll have more details later today.

Member

shyamjvs commented Feb 13, 2019

Sorry for the delay, I had to pull a bunch of strings internally. We discussed this in a meeting today and have clear action items, currently being worked on, to get the account activated and payments approved. The situation got complicated because multiple payments were missed and communication was neglected for multiple months.

Expect another update by tomorrow morning PST.

Member

idvoretskyi commented Feb 14, 2019

Thanks @shyamjvs

Member

shyamjvs commented Feb 14, 2019

Update as promised - The account has been unsuspended by the fraud/verification team and is no longer at risk of closure. It takes 24 hours to activate. Meanwhile, the support team is working on applying new credits retroactively to past pending payments. Once that's approved (ETA tomorrow), the account will be usable again. Will keep this issue updated.

Member

idvoretskyi commented Feb 14, 2019

Great, thanks @shyamjvs!

Member

shyamjvs commented Feb 14, 2019

Good news - Just got the kops-aws presubmit running (see #73797 (comment)). You can also see the kops CI jobs recovering: https://testgrid.k8s.io/google-aws#kops-aws. That said, the payments issue is being fixed in the background now.

There are still a couple of [sig-storage] e2e tests that are failing (seems like a regression that entered during this period). We'll need to fix those before turning on the presubmit, but that's a separate issue (let me file an issue for it).
