Teleport UI login error #20963

Closed
Erick-Reyes opened this issue Jan 30, 2023 · 7 comments
Labels
bug c-cd Internal Customer Reference c-ud Internal Customer Reference c-va Internal Customer Reference good-starter-issue Good starter issue to start contributing to Teleport ui

Comments

@Erick-Reyes
Contributor

Expected behavior:
Log in to your Teleport cluster through the UI without any issues (Okta SAML).

Current behavior:
When you log in to the Teleport cluster via the Web UI (Okta SAML), it sometimes greets you with the error below:
"Internal error - rpc error: code = Canceled desc = grpc: the client connection is closing"
The error goes away when you refresh the page.

Bug details:

  • Teleport version: 11.1.4 (tsh/tctl=11.2.1)
  • Recreation steps: Attempt to log in to your Teleport cluster through the Web UI with Okta SAML. On some occasions, the previously mentioned error will appear after logging in. If you refresh the page, the error goes away and you can log in successfully.
    Note: Video of the issue observed: Bug Video

Debug logs:
Auth Server logs:

Jan 24 18:40:18 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:18Z DEBU [DYNAMODB]  Got 5 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:18 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:18Z INFO [AUTH]      Node "ip-172-18-15-151.ec2.internal" [420238826295-i-0e0b1d5bd0b06d268] is trying to join with role: Instance. auth/join.go:104
Jan 24 18:40:18 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:18Z DEBU [AUTH]      Received Simplified Node Joining request for host "420238826295-i-0e0b1d5bd0b06d268" auth/join_ec2.go:331
Jan 24 18:40:19 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:19Z DEBU [DYNAMODB]  Got 2 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:19 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:19Z INFO [AUTH]      Node "ip-172-18-1-28.ec2.internal" [420238826295-i-08c4f9c6856647ffe] is trying to join with role: Instance. auth/join.go:104
Jan 24 18:40:19 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:19Z DEBU [AUTH]      Received Simplified Node Joining request for host "420238826295-i-08c4f9c6856647ffe" auth/join_ec2.go:331
Jan 24 18:40:20 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:20Z DEBU [DYNAMODB]  Got 6 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:21 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:21Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:21 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:21Z INFO [AUTH]      Node "ip-172-18-14-209.ec2.internal" [420238826295-i-08e6b9acd1a01b6fd] is trying to join with role: Instance. auth/join.go:104
Jan 24 18:40:21 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:21Z DEBU [AUTH]      Received Simplified Node Joining request for host "420238826295-i-08e6b9acd1a01b6fd" auth/join_ec2.go:331
Jan 24 18:40:21 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:21Z INFO [AUTH]      Node "ip-172-18-0-181.ec2.internal" [631293667815-i-0503e71ceea301c2b] is trying to join with role: Node. auth/join.go:104
Jan 24 18:40:21 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:21Z DEBU [AUTH]      Received Simplified Node Joining request for host "631293667815-i-0503e71ceea301c2b" auth/join_ec2.go:331
Jan 24 18:40:22 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:22Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:23 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:23Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:24 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:24Z DEBU [DYNAMODB]  Got 1 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:24 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:24Z DEBU             [SAML] SSO: https://example.okta.com/app/example_teleport_2/REDACTED/sso/saml services/saml.go:121
Jan 24 18:40:24 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:24Z DEBU             [SAML] Issuer: http://www.okta.com/REDACTED services/saml.go:122
Jan 24 18:40:24 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:24Z DEBU             [SAML] ACS: https://teleport.corp-prod1.exampleinternal.com/v1/webapi/saml/acs/okta services/saml.go:123
Jan 24 18:40:24 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:24Z INFO             No assertion_key_pair was detected. Falling back to signing key for all SAML operations. services/saml.go:213
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [DYNAMODB]  Got 4 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             [SAML] SSO: https://example.okta.com/app/example_teleport_2/REDACTED/sso/saml services/saml.go:121
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             [SAML] Issuer: http://www.okta.com/REDACTED services/saml.go:122
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             [SAML] ACS: https://teleport.corp-prod1.exampleinternal.com/v1/webapi/saml/acs/okta services/saml.go:123
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      Obtained SAML assertions for "user@example.com". auth/saml.go:535
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      SAML assertion warnings: &{OneTimeUse:false ProxyRestriction:<nil> NotInAudience:false InvalidTime:false}. auth/saml.go:536
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      SAML assertion: "username": ["user@example.com"]. auth/saml.go:545
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      SAML assertion: "groups": [REDACTED-GROUPS]. auth/saml.go:545
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      Applying 5 SAML attribute to roles mappings. auth/saml.go:557
.........
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      Generating dynamic SAML identity okta/user@example.com with roles: [engineering editor access auditor]. Dry run: false. auth/saml.go:248
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      Overwriting existing user "user@example.com" created with saml connector okta. auth/saml.go:303
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z INFO [AUDIT]     user.update cluster_name:teleport-cluster1 code:T1003I connector:okta ei:0 event:user.update expires:2023-01-25T04:40:25.404222905Z name:user@example.com roles:[engineering editor access auditor] time:2023-01-24T18:40:25.452Z uid:709d84e8-e4e7-4e53-b24b-f11b3943fa21 user:system events/emitter.go:263
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [KEYGEN]    generated user key for [user ec2-user ubuntu root -teleport-internal-join] with expiry on (1674621625) 2023-01-25 04:40:25.556222082 +0000 UTC native/native.go:249
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      Failed setting default kubernetes cluster for user login (user did not provide a cluster); leaving KubernetesCluster extension in the TLS certificate empty auth/auth.go:1433
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z INFO [CA]        Generating TLS certificate {0xa0c1f90 0xc00535edf0 1.3.9999.1.15=#13046e6f6e65,1.3.9999.1.7=#131174656c65706f72742d636c757374657231,CN=user@example.com,O=engineering+O=editor+O=access+O=auditor,POSTALCODE={\"groups\":[REDACTED-GROUPS\,\"username\":[\"user@example.com\"]},STREET=teleport-cluster1,L=user+L=ec2-user+L=ubuntu+L=root+L=-teleport-internal-join 2023-01-25 04:40:25.562485929 +0000 UTC [] [] 5 []}. common_name:user@example.com dns_names:[] locality:[user ec2-user ubuntu root -teleport-internal-join] not_after:2023-01-25 04:40:25.562485929 +0000 UTC org:[engineering editor access auditor] org_unit:[] tlsca/ca.go:935
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z INFO [AUDIT]     cert.create cert_type:user cluster_name:teleport-cluster1 code:TC000I ei:0 event:cert.create expires:2023-01-25T04:40:25.562485929Z logins:[user ec2-user ubuntu root -teleport-internal-join] prev_identity_expires:0001-01-01T00:00:00Z roles:[engineering editor access auditor] route_to_cluster:teleport-cluster1 teleport_cluster:teleport-cluster1 groups:[REDACTED-GROUPS] username:[user@example.com] user:user@example.com time:2023-01-24T18:40:25.57Z uid:17a04041-2c1b-49da-a332-6b5134953972 events/emitter.go:263
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z INFO [AUDIT]     user.login groups:[REDACTED-GROUPS] username:[user@example.com] cluster_name:teleport-cluster1 code:T1001I ei:0 event:user.login method:saml success:true time:2023-01-24T18:40:25.598Z uid:a231ebf1-a70c-4f13-b1f6-8647c90e9f99 user:user@example.com events/emitter.go:263
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      ClientCertPool -> cert(teleport-cluster1 issued by teleport-cluster1:117937359651789549920656640410007004161) auth/middleware.go:674
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH]      ClientCertPool -> cert(teleport-cluster1 issued by teleport-cluster1:84513399485945296195766192414998272519) auth/middleware.go:674
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU [AUTH:1]    Server certificate cert(9a957fc2-2e02-4cdc-a8e7-47a8e10a1ea0.teleport-cluster1 issued by teleport-cluster1:117937359651789549920656640410007004161). auth/middleware.go:311
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             [SAML] SSO: https://example.okta.com/app/example_teleport_2/REDACTED/sso/saml services/saml.go:121
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             [SAML] Issuer: http://www.okta.com/REDACTED services/saml.go:122
Jan 24 18:40:25 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:25Z DEBU             [SAML] ACS: https://teleport.corp-prod1.exampleinternal.com/v1/webapi/saml/acs/okta services/saml.go:123
Jan 24 18:40:26 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:26Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:26 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:26Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:26 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:26Z DEBU [AUTH]      ClientCertPool -> cert(teleport-cluster1 issued by teleport-cluster1:117937359651789549920656640410007004161) auth/middleware.go:674
Jan 24 18:40:26 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:26Z DEBU [AUTH]      ClientCertPool -> cert(teleport-cluster1 issued by teleport-cluster1:84513399485945296195766192414998272519) auth/middleware.go:674
Jan 24 18:40:26 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:26Z DEBU [AUTH:1]    Server certificate cert(9a957fc2-2e02-4cdc-a8e7-47a8e10a1ea0.teleport-cluster1 issued by teleport-cluster1:117937359651789549920656640410007004161). auth/middleware.go:311
Jan 24 18:40:26 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:26Z DEBU [DYNAMODB]  Got 4 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:26 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:26Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:27 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:27Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:28 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:28Z DEBU [DYNAMODB]  Got 7 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:28 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:28Z INFO [AUTH]      Node "ip-172-18-1-60.ec2.internal" [420238826295-i-06b62170969515475] is trying to join with role: Instance. auth/join.go:104
Jan 24 18:40:28 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:28Z DEBU [AUTH]      Received Simplified Node Joining request for host "420238826295-i-06b62170969515475" auth/join_ec2.go:331
Jan 24 18:40:29 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:29Z DEBU [DYNAMODB]  Got 4 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:30 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:30Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:31 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:31Z DEBU [DYNAMODB]  Got 4 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             [SAML] SSO: https://example.okta.com/app/example_teleport_2/REDACTED/sso/saml services/saml.go:121
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             [SAML] Issuer: http://www.okta.com/REDACTED services/saml.go:122
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             [SAML] ACS: https://teleport.corp-prod1.exampleinternal.com/v1/webapi/saml/acs/okta services/saml.go:123
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU [DYNAMODB]  Got 1 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z INFO [AUTH]      Node "ip-172-18-1-75.ec2.internal" [420238826295-i-08490c81d86f95b2c] is trying to join with role: Instance. auth/join.go:104
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU [AUTH]      Received Simplified Node Joining request for host "420238826295-i-08490c81d86f95b2c" auth/join_ec2.go:331
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU             Skipping login no-login-de9c65c5-7b6e-4c03-857e-1d4d63441e94, not a valid Unix login. services/role.go:265
Jan 24 18:40:32 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:32Z DEBU [AUTH]      GetServers(341->341) in 13.384451ms. elapsed_fetch:11.946782ms elapsed_filter:1.437669ms user:user@example.com auth/auth_with_roles.go:1250
Jan 24 18:40:33 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:33Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:34 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:34Z DEBU [DYNAMODB]  Got 2 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:35 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:35Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230
Jan 24 18:40:36 ip-10-194-2-173.ec2.internal teleport[30055]: 2023-01-24T18:40:36Z DEBU [DYNAMODB]  Got 3 new stream shard records. dynamo/shards.go:230

Okta-SAML config:

kind: saml
metadata:
  description: Okta connector
  name: okta
spec:
  acs: https://teleport.prod.example.com/v1/webapi/saml/acs/okta
  attributes_to_roles:
  - name: groups
    roles:
    - engineering
    value: org:engineering
  - name: groups
    roles:
    - integration_support
    value: org:integration_support
  - name: groups
    roles:
    - platform:core
    value: org:platform:core
  - name: groups
    roles:
    - platform:cloudeng
    value: org:platform:cloudeng
  - name: groups
    roles:
    - editor
    - access
    - auditor
    value: ac:teleport:admins
  audience: https://teleport.prod.example.com/v1/webapi/saml/acs/okta
  cert: ""
  display: ""
  entity_descriptor: |
    <?xml version="1.0" encoding="UTF-8"?>
    <md:EntityDescriptor xmlns:md="urn:oasis:names:tc:SAML:2.0:metadata" entityID="http://www.okta.com/REDACTED">
      <md:IDPSSODescriptor WantAuthnRequestsSigned="false" protocolSupportEnumeration="urn:oasis:names:tc:SAML:2.0:protocol">
          <md:KeyDescriptor use="signing">
            <ds:KeyInfo xmlns:ds="http://www.w3.org/2000/09/xmldsig#">
                <ds:X509Data>
                  <ds:X509Certificate>REDACTED</ds:X509Certificate>
                </ds:X509Data>
            </ds:KeyInfo>
          </md:KeyDescriptor>
          <md:NameIDFormat>urn:oasis:names:tc:SAML:1.1:nameid-format:emailAddress</md:NameIDFormat>
          <md:NameIDFormat>urn:oasis:names:tc:SAML:1.1:nameid-format:unspecified</md:NameIDFormat>
          <md:SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-POST" Location="https://example.okta.com/app/example_teleport_2/REDACTED/sso/saml" />
          <md:SingleSignOnService Binding="urn:oasis:names:tc:SAML:2.0:bindings:HTTP-Redirect" Location="https://example.okta.com/app/example_teleport_2/REDACTED/sso/saml" />
      </md:IDPSSODescriptor>
    </md:EntityDescriptor>
  entity_descriptor_url: ""
  issuer: http://www.okta.com/REDACTED
  service_provider_issuer: https://teleport.prod.example.com/v1/webapi/saml/acs/okta
  signing_key_pair:
    cert: |
      -----BEGIN CERTIFICATE-----
      REDACTED
      -----END CERTIFICATE-----
    private_key: ""
  sso: https://example.okta.com/app/example_teleport_2/REDACTED/sso/saml
version: v2

Screenshot of developer tools error on a check (shows "Internal Error").

We also have a HAR file available upon request.

Extras:

  • This happens in all browsers (Firefox, Safari, and Edge). No extensions on Safari.
@Erick-Reyes Erick-Reyes added bug ui c-cd Internal Customer Reference labels Jan 30, 2023
@pschisa pschisa added the c-va Internal Customer Reference label Jan 31, 2023
@travelton
Contributor

If using systemd, the signal sent to the Teleport process on restart is -HUP.

Forks a new Teleport daemon to serve new connections and initiates the graceful shutdown of the existing process when there are no more clients connected to it.

Ref: https://goteleport.com/docs/reference/signals/

If performing an upgrade of Teleport while the Proxy is still in rotation and has not been fully drained of connections, the process may hang until all clients disconnect.

If the load balancer attempts to send connections to this Proxy, the following error message will be seen in the Web UI upon authentication.

"Internal error - rpc error: code = Canceled desc = grpc: the client connection is closing"

The proper way to upgrade is to remove the Proxy from the load balancer, drain all connections, and then upgrade the Proxy instance.
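
For illustration only, here is a minimal Go sketch of the graceful-drain pattern the -HUP signal triggers: stop accepting new connections, then wait for existing clients to disconnect. This is not Teleport's actual implementation; the listen address and drain timeout are placeholders.

package main

import (
	"context"
	"net/http"
	"os"
	"os/signal"
	"syscall"
	"time"
)

func main() {
	srv := &http.Server{Addr: ":3080"}

	// Serve connections until a graceful shutdown begins.
	go func() { _ = srv.ListenAndServe() }()

	// Block until systemd's restart sends -HUP to the process.
	hup := make(chan os.Signal, 1)
	signal.Notify(hup, syscall.SIGHUP)
	<-hup

	// Stop accepting new connections and wait (bounded here) for active
	// clients to disconnect. A Proxy still in LB rotation keeps receiving
	// new clients, so in practice this wait can hang until traffic stops.
	ctx, cancel := context.WithTimeout(context.Background(), 10*time.Minute)
	defer cancel()
	_ = srv.Shutdown(ctx)
}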

@russjones
Copy link
Contributor

Talked with @zmb3 about this. We're thinking about handling this in two ways.

  • Use /healthz and /readyz to inform the LB that connections should not be forwarded to this proxy anymore (see the sketch after this comment).
  • Improve the error message to say something like, "Connections from this proxy are being drained, please hit refresh and try again."

@travelton Because this is a nice-to-have, we're not scheduling it right now. We'll keep it in mind as a good starter issue for the future.
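
A hypothetical sketch of the first idea, assuming an in-process draining flag; the paths, port, and flag are illustrative, not Teleport's actual API. Once draining starts, /readyz returns 503 so the load balancer pulls the proxy out of rotation, while /healthz keeps reporting the process as alive.

package main

import (
	"net/http"
	"sync/atomic"
)

// draining would be set to true when the proxy begins its graceful shutdown.
var draining atomic.Bool

func main() {
	mux := http.NewServeMux()

	// Liveness: the process is still running.
	mux.HandleFunc("/healthz", func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(http.StatusOK)
	})

	// Readiness: fail while draining so the LB stops forwarding connections.
	mux.HandleFunc("/readyz", func(w http.ResponseWriter, r *http.Request) {
		if draining.Load() {
			http.Error(w, "draining", http.StatusServiceUnavailable)
			return
		}
		w.WriteHeader(http.StatusOK)
	})

	_ = http.ListenAndServe(":3080", mux)
}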

@russjones russjones added the good-starter-issue Good starter issue to start contributing to Teleport label Feb 1, 2023
@Erick-Reyes
Contributor Author

@russjones @zmb3 The customer checked and verified the systemd services and states that all the processes/services are fine; they are not attempting to restart.

@kelcya

kelcya commented Feb 2, 2023

A little more information. We originally upgraded from 9.x -> 10.x -> 11.x. The issue started happening in 10.x and 11.x. The proxy and auth servers are all new instances. The upgrade process is as follows:

  1. Reduce auth server to 1 instance
  2. Upgrade auth server (instance refresh with new AMI)
  3. Upgrade proxy servers (instance refresh with new AMI)

The autoscaling group removes and terminates the old instance first before launching the new instance.

@pschisa pschisa added the c-ud Internal Customer Reference label Feb 2, 2023
rosstimothy added a commit that referenced this issue Mar 28, 2023
Closes #23533 and #20963.

There was a race to create the `web.SessionContext` for a session when multiple Proxies are behind a load balancer. Only the Proxy that processes the login will have a `web.SessionContext` created for the session. Any subsequent requests to the other Proxies in the pool would create one if the request was authenticated. However, multiple requests in short succession could cause a single Proxy to create multiple `web.SessionContext` instances for a single session. When that happens, the most recently created `web.SessionContext` gets saved and the previous `web.SessionContext` gets closed. Closing causes the `auth.Client` to be closed, which causes any active requests for that client to return with a `grpc: client connection is closing` error. This manifests as a single request from the web UI failing and, depending on the request, a banner being displayed with the error. Refreshing the page or navigating to another page would resolve the problem because the most recent `web.SessionContext` would be used with the still-open `auth.Client`.

To prevent `web.Handler.AuthenticateRequest` from racing to create the `web.SessionContext`, a `singleflight.Group` was added to the `web.sessionCache`. When multiple requests come in for the same session, they will now all use the first `web.SessionContext` to be created instead of each creating their own.
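
To make the described fix concrete, here is a minimal sketch of the singleflight pattern; the types and method names are placeholders, not the actual `web` package code. Concurrent requests for the same session key share a single context instead of racing to build and close each other's.

package main

import (
	"fmt"
	"sync"

	"golang.org/x/sync/singleflight"
)

// sessionContext stands in for web.SessionContext.
type sessionContext struct{ id string }

// sessionCache stands in for web.sessionCache.
type sessionCache struct {
	group    singleflight.Group
	mu       sync.Mutex
	sessions map[string]*sessionContext
}

// getOrCreate ensures concurrent callers for the same session ID build the
// context only once instead of racing and closing each other's auth clients.
func (c *sessionCache) getOrCreate(sessionID string) (*sessionContext, error) {
	v, err, _ := c.group.Do(sessionID, func() (interface{}, error) {
		c.mu.Lock()
		defer c.mu.Unlock()
		if sc, ok := c.sessions[sessionID]; ok {
			return sc, nil
		}
		sc := &sessionContext{id: sessionID} // expensive setup (auth.Client, etc.) would happen here
		c.sessions[sessionID] = sc
		return sc, nil
	})
	if err != nil {
		return nil, err
	}
	return v.(*sessionContext), nil
}

func main() {
	cache := &sessionCache{sessions: make(map[string]*sessionContext)}
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			sc, _ := cache.getOrCreate("session-123")
			fmt.Printf("%p\n", sc) // all goroutines see the same instance
		}()
	}
	wg.Wait()
}
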
@r0mant
Collaborator

r0mant commented Mar 29, 2023

Fixed in #23691.

@r0mant r0mant closed this as completed Mar 29, 2023
@kelcya

kelcya commented Mar 30, 2023

Is it possible to know which versions will include the fix from #23691?

@zmb3
Collaborator

zmb3 commented Mar 30, 2023

@kelcya the next v10, v11, and v12 releases will contain the fix. Those will be v12.1.3, v11.3.10, and v10.3.15.
