Consul Ingress-Gateway doesn't pick the configuration from a federated cluster #9201

Closed
dalssaso opened this issue Nov 16, 2020 · 4 comments
Labels
  • theme/connect: Anything related to Consul Connect, Service Mesh, Side Car Proxies
  • theme/federation-usability: Anything related to Federation
  • theme/ingress-gw: Track ingress work

Comments

@dalssaso

Overview of the Issue

A Consul Connect ingress gateway running in a federated (secondary) datacenter doesn't pick up the ingress-gateway configuration applied for it.

I deployed two federated Consul clusters (normal federation, without the mesh-gateway features) and two ingress gateways, one in each datacenter.

Architecture

  • dc1 - primary_dc, federated cluster

    • 1 ingress gateway (working)
    • I can see the configuration (kind: ingress-gateway)
  • dc2 - federated cluster

    • 1 ingress-gateway (deployment running but without any configuration)
    • I can't see the configuration when running consul config list -kind ingress-gateway with CONSUL_HTTP_ADDR pointing to the dc2 cluster address

dc1 - agent configuration
{
  "bind_addr": "<redacted>",
  "bootstrap_expect": 3,
  "client_addr": "0.0.0.0",
  "config_entries": {
    "bootstrap": [
      {
        "config": {
          "protocol": "http"
        },
        "kind": "proxy-defaults",
        "name": "global"
      }
    ]
  },
  "connect": {
    "enabled": true
  },
  "data_dir": "/data/consul-server",
  "datacenter": "dc1",
  "dns_config": {
    "enable_truncate": false
  },
  "domain": "consul.",
  "enable_central_service_config": true,
  "http_config": {
    "response_headers": {
      "Access-Control-Allow-Headers": "*",
      "Access-Control-Allow-Methods": "*",
      "Access-Control-Allow-Origin": "*"
    }
  },
  "node_name": "<redacted>",
  "primary_datacenter": "dc1",
  "retry_join": ["<consul_server_ips>"],
  "retry_join_wan": ["consul.service.dc2.consul"],
  "server": true,
  "translate_wan_addrs": true,
  "ui": true
}

dc1 - ingress-gateway configuration
{
  "Kind": "ingress-gateway",
  "Name": "ingress-dc1",
  "TLS": {
    "Enabled": false
  },
  "Listeners": [
    {
      "Port": 80,
      "Protocol": "http",
      "Services": [
        {
          "Name": "*"
        }
      ]
    }
  ]
}

dc2 - agent configuration
{
  "bind_addr": "<redacted>",
  "bootstrap_expect": 3,
  "client_addr": "0.0.0.0",
  "config_entries": {
    "bootstrap": [
      {
        "config": {
          "protocol": "http"
        },
        "kind": "proxy-defaults",
        "name": "global"
      }
    ]
  },
  "connect": {
    "enabled": true
  },
  "data_dir": "/data/consul-server",
  "datacenter": "dc2",
  "dns_config": {
    "enable_truncate": false
  },
  "domain": "consul.",
  "enable_central_service_config": true,
  "http_config": {
    "response_headers": {
      "Access-Control-Allow-Headers": "*",
      "Access-Control-Allow-Methods": "*",
      "Access-Control-Allow-Origin": "*"
    }
  },
  "node_name": "<redacted>",
  "primary_datacenter": "dc1",
  "retry_join": ["<consul_internal_ips>"],
  "retry_join_wan": ["consul.service.dc1.consul"],
  "server": true,
  "translate_wan_addrs": true,
  "ui": true
}

dc2 - ingress-gateway configuration
{
  "Kind": "ingress-gateway",
  "Name": "ingress-dc2",
  "TLS": {
    "Enabled": false
  },
  "Listeners": [
    {
      "Port": 80,
      "Protocol": "http",
      "Services": [
        {
          "Name": "*"
        }
      ]
    }
  ]
}

dc2 - consul server logs

This is the only log line I see that references the ingress gateway. All the other logs on the Consul servers are related to the snapshots that I take.


2020-11-16T14:24:10.271Z [WARN]  agent.server.catalog: no terminating-gateway or ingress-gateway associated with this gateway: gateway=ingress-dc2
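
To pick this warning out of a larger server log, a small parser can help. This is a minimal sketch: the field layout (timestamp, level, module, message, trailing key=value pairs) is inferred from the line above rather than from a documented log format, and the function names are made up for illustration.

```python
import re

# Layout inferred from the sample line: "<timestamp> [LEVEL]  <module>: <message>"
LOG_LINE = re.compile(r"^(?P<ts>\S+) \[(?P<level>\w+)\]\s+(?P<module>[\w.]+): (?P<msg>.*)$")
FIELD = re.compile(r"(\w+)=(\S+)")


def parse_log_line(line):
    """Split one log line into ts, level, module, msg and a dict of key=value fields."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    record = match.groupdict()
    record["fields"] = dict(FIELD.findall(record["msg"]))
    return record


def unconfigured_gateways(lines):
    """Sorted gateway names for which the 'associated with this gateway' warning was logged."""
    names = set()
    for line in lines:
        record = parse_log_line(line)
        if record is None or record["level"] != "WARN":
            continue
        if "associated with this gateway" in record["msg"] and "gateway" in record["fields"]:
            names.add(record["fields"]["gateway"])
    return sorted(names)
```

Feeding the dc2 server log through unconfigured_gateways would list every gateway name the catalog complained about, which makes it easier to see whether only ingress-dc2 is affected.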

Consul config list results pointing to both consul clusters
export CONSUL_HTTP_ADDR=https://consul.service.dc1.consul:8501 # primary datacenter

consul config list -kind ingress-gateway
ingress-dc1
ingress-dc2


export CONSUL_HTTP_ADDR=https://consul.service.dc2.consul:8501

consul config list -kind ingress-gateway
<empty_result>
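
The same comparison can be scripted against the HTTP API. This is a minimal sketch, assuming the config listing endpoint (GET /v1/config/ingress-gateway) is reachable at the addresses used in the exports above. The helper names are hypothetical, and TLS/CA handling for port 8501 is left out.

```python
import json
import urllib.request


def list_entry_names(base_url, kind="ingress-gateway"):
    """Return the names of all config entries of `kind` that this cluster reports."""
    # Assumes the endpoint answers with a JSON array of entries carrying a "Name" key.
    with urllib.request.urlopen(f"{base_url}/v1/config/{kind}") as resp:
        return [entry["Name"] for entry in json.load(resp)]


def missing_in_secondary(primary_names, secondary_names):
    """Entries visible in the primary but absent from the secondary, sorted by name."""
    return sorted(set(primary_names) - set(secondary_names))
```

Calling list_entry_names once per datacenter address and passing both results to missing_in_secondary would, for the output shown above, report both ingress-dc1 and ingress-dc2 as absent from dc2.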

Reproduction Steps

Steps to reproduce this issue:

  1. Create two clusters, each with 5 server nodes and Connect enabled
  2. Federate them and set the first cluster as the primary_datacenter
  3. Deploy two ingress gateways, one in each datacenter, on different machines (we use Kubernetes for our ingress deployments)
  4. Apply the ingress-gateway configurations shown above
  5. Check the dc2 server logs for the same warning as above
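
Step 4 can be sketched as follows. The report does not say whether the entries were applied with the CLI or the HTTP API, so this is only one possible way, assuming the PUT /v1/config endpoint. The function names and the 127.0.0.1:8500 address in the usage note are illustrative, not taken from the report.

```python
import json
import urllib.request


def ingress_gateway_entry(name, port=80, protocol="http"):
    """Build an ingress-gateway config entry shaped like the ones in this report."""
    return {
        "Kind": "ingress-gateway",
        "Name": name,
        "TLS": {"Enabled": False},
        "Listeners": [
            {"Port": port, "Protocol": protocol, "Services": [{"Name": "*"}]}
        ],
    }


def put_request(base_url, entry):
    """Prepare, but do not send, the PUT /v1/config request for one entry."""
    return urllib.request.Request(
        f"{base_url}/v1/config",
        data=json.dumps(entry).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
```

Sending it would be urllib.request.urlopen(put_request("http://127.0.0.1:8500", ingress_gateway_entry("ingress-dc2"))). Keeping the request construction separate from the send makes it easy to inspect the payload before anything is written.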

Consul info for both Client and Server

Client info - dc1
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 23
        services = 24
build:
        prerelease =
        revision = 12b16df3
        version = 1.8.4
consul:
        acl = disabled
        known_servers = 5
        server = false
runtime:
        arch = amd64
        cpu_count = 8
        goroutines = 123
        max_procs = 8
        os = linux
        version = go1.14.6
serf_lan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 728
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 11
        member_time = 40674
        members = 167
        query_queue = 0
        query_time = 1

Server info - dc1
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 3
        services = 4
build:
        prerelease =
        revision = 1e03567d
        version = 1.8.5
consul:
        acl = disabled
        bootstrap = false
        known_datacenters = 8
        leader = false
        leader_addr = <redacted>
        server = true
raft:
        applied_index = 112444010
        commit_index = 112444010
        fsm_pending = 0
        last_contact = 6.075656ms
        last_log_index = 112444011
        last_log_term = 101
        last_snapshot_index = 112438812
        last_snapshot_term = 101
        latest_configuration = [{Suffrage:Voter ID:7779e685-74ce-72af-9980-d623fd085e6a Address:<redacted>} {Suffrage:Voter ID:0a67ec51-aa87-5f15-b936-b64b4f6e34dc Address:<redacted>} {Suffrage:Voter ID:a6fb2d74-f27e-4554-3ff4-3d7e73123355 Address:<redacted>} {Suffrage:Voter ID:b06613b5-3f09-1d7f-076c-26bf178f14df Address:<redacted>} {Suffrage:Voter ID:d83bf8df-ba8b-8b5a-65bd-9c1419401e14 Address:<redacted>}]
        latest_configuration_index = 0
        num_peers = 4
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Follower
        term = 101
runtime:
        arch = amd64
        cpu_count = 4
        goroutines = 1617
        max_procs = 4
        os = linux
        version = go1.14.9
serf_lan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 728
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 11
        member_time = 40676
        members = 167
        query_queue = 0
        query_time = 1
serf_wan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 24735
        members = 30
        query_queue = 0
        query_time = 1

Client info - dc2
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 38
        services = 38
build:
        prerelease =
        revision = 12b16df3
        version = 1.8.4
consul:
        acl = disabled
        known_servers = 5
        server = false
runtime:
        arch = amd64
        cpu_count = 8
        goroutines = 139
        max_procs = 8
        os = linux
        version = go1.14.6
serf_lan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 619
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 2
        member_time = 27462
        members = 153
        query_queue = 0
        query_time = 9

Server info - dc2
agent:
        check_monitors = 0
        check_ttls = 0
        checks = 3
        services = 4
build:
        prerelease =
        revision = 1e03567d
        version = 1.8.5
consul:
        acl = disabled
        bootstrap = false
        known_datacenters = 8
        leader = true
        leader_addr = <redacted>
        server = true
raft:
        applied_index = 1446006913
        commit_index = 1446006913
        fsm_pending = 0
        last_contact = 0
        last_log_index = 1446006913
        last_log_term = 4363
        last_snapshot_index = 1446004069
        last_snapshot_term = 4363
        latest_configuration = [{Suffrage:Voter ID:8de95d84-a685-449d-8161-a04befe02201 Address:<redacted>} {Suffrage:Voter ID:984dc359-69fc-222f-8eef-df40dce97c45 Address:<redacted>} {Suffrage:Voter ID:50cae96d-4a8b-c892-354f-2b0d557281e0 Address:<redacted>} {Suffrage:Voter ID:1f24d1e1-5489-71bc-20a7-1c4805b61157 Address:<redacted>} {Suffrage:Voter ID:75edcea3-2e50-335f-8f8c-cce225b88f01 Address:<redacted>}]
        latest_configuration_index = 0
        num_peers = 4
        protocol_version = 3
        protocol_version_max = 3
        protocol_version_min = 0
        snapshot_version_max = 1
        snapshot_version_min = 0
        state = Leader
        term = 4363
runtime:
        arch = amd64
        cpu_count = 4
        goroutines = 1418
        max_procs = 4
        os = linux
        version = go1.14.9
serf_lan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 619
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 2
        member_time = 27462
        members = 153
        query_queue = 0
        query_time = 9
serf_wan:
        coordinate_resets = 0
        encrypted = false
        event_queue = 0
        event_time = 1
        failed = 0
        health_score = 0
        intent_queue = 0
        left = 0
        member_time = 24735
        members = 30
        query_queue = 0
        query_time = 1
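
The four consul info dumps above are easier to compare once parsed. This is a minimal sketch, assuming the "section:" / indented "key = value" layout seen in this report. The function names are made up, and values are kept as strings.

```python
def parse_consul_info(text):
    """Parse `consul info` output into {section: {key: value}} with string values."""
    sections = {}
    current = None
    for raw in text.splitlines():
        if not raw.strip():
            continue
        if not raw[0].isspace() and raw.rstrip().endswith(":"):
            # Unindented line ending in ":" starts a new section, e.g. "build:".
            current = sections.setdefault(raw.strip()[:-1], {})
        elif current is not None and raw[0].isspace() and "=" in raw:
            # Indented "key = value"; the value may be empty (e.g. "prerelease =").
            key, _, value = raw.partition("=")
            current[key.strip()] = value.strip()
    return sections


def build_versions(infos):
    """Map a label such as 'client dc1' to the build.version found in its parsed info."""
    return {label: info.get("build", {}).get("version") for label, info in infos.items()}
```

Applied to the dumps above, build_versions would show the clients on 1.8.4 and the servers on 1.8.5. Whether that difference matters for this issue is not established in the report.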


Operating system and Environment details

Servers

dc1
  • Consul version: 1.8.5
  • OS: #24~18.04.1-Ubuntu SMP
dc2
  • Consul version: 1.8.5
  • OS: #25~18.04.1-Ubuntu SMP

Agents

dc1
  • Consul version: 1.8.4
  • OS: COS GKE
  • Kubernetes Version: v1.15.12-gke.20
dc2
  • Consul version: 1.8.4
  • OS: COS GKE
  • Kubernetes Version: v1.15.12-gke.20

Ingress Gateways

dc1
  • Consul Version: 1.8.5
  • Envoy Version: 923c4111bb48405ac96ef050c4f59ebbad3d7761/1.14.4/Clean/RELEASE/BoringSSL
dc2
  • Consul Version: 1.8.5
  • Envoy Version: 923c4111bb48405ac96ef050c4f59ebbad3d7761/1.14.4/Clean/RELEASE/BoringSSL

Log Fragments

Server logs

dc1

2020-11-16T15:39:26.675Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?index=60431920&stale=&wait=30000ms from=10.1.218.47:59256 latency=31.834840092s
2020-11-16T15:39:33.802Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?stale= from=10.5.128.46:49704 latency=252.533µs
2020-11-16T15:39:57.246Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?index=60431920&stale=&wait=30000ms from=10.1.218.47:59256 latency=30.503123602s
2020-11-16T15:40:28.380Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?index=60431920&stale=&wait=30000ms from=10.1.218.47:59256 latency=31.065783041s
2020-11-16T15:40:33.819Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1-metrics?stale= from=10.5.128.46:49704 latency=368.767µs
2020-11-16T15:40:59.042Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?index=60431920&stale=&wait=30000ms from=10.1.218.47:59256 latency=30.593500854s
2020-11-16T15:41:03.778Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?stale= from=10.5.128.46:49704 latency=277.3µs
2020-11-16T15:41:27.613Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?index=60431920&stale=&wait=30000ms from=10.1.36.76:37074 latency=30.359345347s
2020-11-16T15:41:30.891Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?index=60431920&stale=&wait=30000ms from=10.1.218.47:59256 latency=31.781314198s
2020-11-16T15:41:33.577Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?stale= from=10.5.128.46:49704 latency=322.639µs
2020-11-16T15:41:58.599Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?index=60431920&stale=&wait=30000ms from=10.1.36.76:37074 latency=30.960145614s
2020-11-16T15:42:01.607Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?index=60431920&stale=&wait=30000ms from=10.1.218.47:59256 latency=30.647781923s
2020-11-16T15:42:03.692Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?stale= from=10.5.128.46:49704 latency=270.218µs
2020-11-16T15:42:29.884Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?index=60431920&stale=&wait=30000ms from=10.1.36.76:37074 latency=31.25928856s
2020-11-16T15:42:32.719Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?index=60431920&stale=&wait=30000ms from=10.1.218.47:59256 latency=31.043901346s
2020-11-16T15:42:33.666Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?stale= from=10.5.128.46:49704 latency=300.49µs
2020-11-16T15:43:01.028Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?index=60431920&stale=&wait=30000ms from=10.1.36.76:37074 latency=31.118413652s
2020-11-16T15:43:03.843Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1-metrics?stale= from=10.5.128.46:49704 latency=296.78µs
2020-11-16T15:43:04.536Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc1?index=60431920&stale=&wait=30000ms from=10.1.218.47:59256 latency=31.749158071s

dc2

2020-11-16T15:40:22.263Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc2?stale= from=10.5.0.201:47818 latency=265.351µs
2020-11-16T15:40:52.064Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc2?stale= from=10.5.0.201:47820 latency=229.061µs
2020-11-16T15:41:22.377Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc2?stale= from=10.5.0.201:47820 latency=272.421µs
2020-11-16T15:41:52.015Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc2?stale= from=10.5.0.201:47818 latency=316.425µs
2020-11-16T15:42:22.390Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc2?stale= from=10.5.0.201:47818 latency=266.684µs
2020-11-16T15:42:52.547Z [DEBUG] agent.http: Request finished: method=GET url=/v1/health/service/ingress-dc2?stale= from=10.5.0.201:47818 latency=284.381µs
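
In the fragments above, the dc1 lines include requests with index and wait parameters (blocking queries), while the dc2 lines only show plain stale reads. This is a minimal sketch for tallying that difference from a larger log, assuming the agent.http line layout shown here; the names are illustrative.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

REQUEST = re.compile(
    r"method=(?P<method>\S+) url=(?P<url>\S+) from=(?P<src>\S+) latency=(?P<latency>\S+)"
)


def classify_request(line):
    """Return (path, 'blocking' or 'non-blocking') for a request log line, else None."""
    match = REQUEST.search(line)
    if match is None:
        return None
    url = urlsplit(match.group("url"))
    query = parse_qs(url.query, keep_blank_values=True)
    # A request that carries an index parameter is treated as a blocking query.
    kind = "blocking" if "index" in query else "non-blocking"
    return url.path, kind


def summarize(lines):
    """Count request log lines per (path, kind), ignoring lines that do not match."""
    return Counter(c for c in map(classify_request, lines) if c is not None)
```

The tally only describes what the servers were asked. It does not by itself show which component issued the requests or why dc2 receives no blocking queries.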

@jsosulska added the theme/connect, theme/ingress-gw, and theme/federation-usability labels Nov 17, 2020
@woz5999
Contributor

woz5999 commented Dec 31, 2020

I think the improvement in #9320 would provide greater visibility into what I'd guess is actually a replication issue.

@blake
Member

blake commented Feb 19, 2021

Hi @dalssaso, this does sound like a replication issue, and might be related to #9319. Does that sound like the issue you're experiencing?

@dalssaso
Author

It does seem to be the same issue, @blake.
We're starting work on ingress gateways and Consul again this week, so I'll have more updates.

We haven't upgraded to 1.9.x yet, but we're going to do that as well.

@dalssaso
Author

As I'm not using Consul with this configuration anymore, I can't say whether it's fixed, and I can't test it.

I'm closing the issue.

@dalssaso closed this as not planned Apr 26, 2024