
Add support for transparent connect proxies #10628

Closed
apollo13 opened this issue May 20, 2021 · 24 comments · Fixed by #20175

@apollo13
Contributor

Hi, I know that https://www.consul.io/docs/connect/transparent-proxy is still beta, but it would be sooooo great if Nomad could support that in the next releases :)

@tgross
Member

tgross commented May 20, 2021

Hi @apollo13! Much of what transparent proxy provides is for filling gaps in k8s (because it's a bit more "kit of parts" than combining Nomad and Consul). There's a conflict here in the security model between the two: Consul transparent proxy assumes and requires 1 service per network namespace, whereas Nomad supports n services per network namespace. We already provide most of what transparent proxies support, with the major exception of preventing service mesh circumvention.

So we're still brainstorming what exactly transparent proxy support would look like in Nomad. It's likely to be implemented as a set of features that together make up all the same things as the Consul tproxy, rather than a standalone "support Consul tproxy" feature. Some ideas we've considered, not all of which are compatible with each other:

  • Implement a network.mode = "transparent" that updates the routing table in the allocation's network NS to prevent circumvention.
  • Expose Consul DNS to allocations.
  • Create a separate net NS for each service. This could complicate a lot of use cases for task groups with multiple tasks that communicate over localhost rather than Unix Domain Sockets.

In any case, just a heads up that this isn't on our very near-term roadmap.

@tgross tgross added the theme/consul/connect Consul Connect integration label May 20, 2021
@apollo13
Contributor Author

Ok, my main goal is to get rid of the manual declaration of all the connect services that I need. If there are ideas on how to do that without manually specifying them all, then it doesn't have to be the transparent proxy. Thank you for the clarifications!

@shoenig
Member

shoenig commented May 20, 2021

@apollo13 if you're talking about no longer needing to define upstreams, then transparent proxy only helps in that upstreams are inherited from intentions, so you only need to declare them once. (Unless intentions are disabled, in which case the default behavior is a free-for-all.) Making services connect-native is another way to eliminate the need for upstreams, but of course that only helps if you own the code. (And you'd still need to declare intentions.)

@apollo13
Contributor Author

apollo13 commented May 20, 2021 via email

@komapa

komapa commented Sep 27, 2022

We would like to see this implemented for Nomad as well. Mainly echoing @apollo13's comment from above, 100%.

@tgross tgross self-assigned this Dec 14, 2023
@tgross
Member

tgross commented Dec 15, 2023

I'll be picking this work up after the new year. The rough plan of action is going to be:

  1. Implement CNI_ARGS support in Nomad, which we'll need for better support of third-party service meshes anyway.
  2. Put together a hacky first pass at figuring out what updates we'd need to add to consul-cni (probably allowing the cni-proxy-config annotation to be passed in as a CNI arg, which would cause it to bypass the bits that make k8s API calls).
  3. Use that hacked CNI plugin to prototype out some ideas of what configuration we want to expose to job authors.
  4. Work up a design doc to share with our colleagues over on Consul to get their buy in on our approach.
  5. Finalize that implementation.

@apollo13
Contributor Author

That sounds great!

Work up a design doc to share with our colleagues over on Consul to get their buy in on our approach.

Assuming this is not ending up as an enterprise only feature, do you mind sharing the design doc publicly (once you have written it ;))? Maybe the community can spot some errors etc in there?

@tgross
Member

tgross commented Dec 15, 2023

Assuming this is not ending up as an enterprise only feature, do you mind sharing the design doc publicly (once you have written it ;))? Maybe the community can spot some errors etc in there?

This is planned for Nomad CE. We don't make a practice of sharing the design docs directly because they usually have a bunch of private background info in there like specific customer requests or internal business requirements. But I'd love to extract what I can and share it ahead of implementation. That'd be a great habit for us to get into!

tgross added a commit that referenced this issue Feb 13, 2024
In order to provide a DNS address and port to Connect tasks configured for
transparent proxy, we need to fingerprint the Consul DNS address and port. The
client will pass this address/port to the iptables configuration provided to the
`consul-cni` plugin.

Ref: #10628
@tgross
Member

tgross commented Feb 13, 2024

As mentioned above, I wanted to surface our internal design doc ("RFC") here. Note that this is an excerpt with internal discussions removed. Please note this is a design document that we use to build rough consensus within the team and with our sibling teams (like the Consul folks). It may not 100% match up with the final implementation as I progress through it. No promises here! So if you're reading this in the future after the feature has shipped, please instead refer to the documentation! 😀


Background

On Linux, both Kubernetes and Nomad typically run workloads in network namespaces. Both orchestrators use Container Network Interface (CNI) plugins to configure the network namespace by creating network bridges, iptables rules, etc. In Kubernetes, containers within a pod share the same network namespace. In Nomad, tasks within an allocation share the same network namespace. This arrangement is what allows Consul Connect to run an Envoy proxy in one task while the user's application runs in another task in the same allocation.

Consul Connect provides for secure communication between workloads in the service mesh. However, it does not by default enforce that workloads only communicate over the mesh. Transparent proxy mode directs all inbound and outbound traffic for a workload through the Envoy sidecar via iptables. This forces workloads to use only the service mesh, with the option to configure exceptions for specific CIDRs, ports, and non-mesh destinations (or to enforce mesh destinations only).

The primary benefit of transparent proxy is the ability for applications to use Consul DNS URLs to access upstreams (e.g. http://foo.virtual.consul) without having to manually configure upstreams and local listener ports (e.g. http://localhost:1234).

The diagram below shows a typical Connect allocation with two tasks. The group has a network.port mapping configured for port 8000 only.

[diagram omitted]

Nomad clients run a series of "allocrunner hooks" for the allocation as a whole before running a task runner for each task within the allocation, and the task runner has its own set of per-task "taskrunner hooks". Since Nomad 0.10, network configuration is set up in one of the allocrunner hooks. Nomad network blocks can have one of four modes: none, host, bridge, and cni. Only bridge networking is supported for Connect workloads, even though the cni mode is configured in mostly the same way as bridge networking (so in theory it could support Connect, but users would have to provide the correct CNI config themselves).

Nomad clients have a hard-coded CNI configuration template that includes calls to the loopback, bridge, firewall, and portmap plugins, for use in bridge networking mode.
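
For illustration, here is a trimmed sketch of roughly what that hard-coded bridge-mode CNI config list looks like; the plugin options and subnet shown are illustrative placeholders rather than the exact template Nomad ships:

{
  "cniVersion": "0.4.0",
  "name": "nomad",
  "plugins": [
    { "type": "loopback" },
    {
      "type": "bridge",
      "bridge": "nomad",
      "isGateway": true,
      "ipMasq": true,
      "ipam": {
        "type": "host-local",
        "ranges": [[{ "subnet": "172.26.64.0/20" }]],
        "routes": [{ "dst": "0.0.0.0/0" }]
      }
    },
    { "type": "firewall" },
    { "type": "portmap", "capabilities": { "portMappings": true }, "snat": true }
  ]
}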

When an allocation with bridge networking starts, the network_hook executes the following steps:

  • Call a platform-specific network manager's CreateNetwork method. On Linux this creates the network namespace and returns the path to the file that represents that namespace.
  • Call a platform-specific network configurator's Setup method. On Linux this:
    • Idempotently creates required iptables chains.
    • Renders the hard-coded CNI configuration template with the subnet and bridge name. (With mode="cni/*", the network configurator uses the named CNI configuration from disk.)
    • Creates an argument for the CNI portmap plugin to configure the port mappings specified in the allocation's group.network block.
    • Invokes CNI plugins with the rendered CNI plugin configs against the network namespace.

On success, the resulting structs.AllocNetworkStatus is optionally used by task drivers to set labels on the container, etc.

Note that Connect is not involved in this existing networking workflow at any point. When the job is submitted, the server automatically adds an Envoy proxy sidecar task to each group that requires one for Connect. The taskrunner for that task invokes Consul's envoy bootstrap command to configure the sidecar, and the Envoy task is in the same allocation as the user's application so they share the same network namespace. To support tproxy, Nomad networking configuration will need to be aware of this Connect-specific configuration.

Under Kubernetes, the Consul CNI plugin receives its configuration from the k8s API. The consul-k8s control plane annotates pods with the expected iptables configuration. When the CNI plugin is run, it makes requests to the k8s API to determine how to configure iptables rules for the pod, and to update the k8s control plane with the status of that work. The configuration is a JSON-encoded blob that deserializes into the Consul SDK’s iptables.Config struct.

Proposal

Nomad will support Connect transparent proxy mode by invoking Consul CNI during network setup when requested by the user. This will require changes to both Nomad and the Consul CNI plugin.

Updates to Nomad Job Spec

The Nomad job’s service.connect.proxy block will be updated to add a new transparent_proxy block. As much as possible, Nomad will configure a sensible iptables configuration for tproxy without additional job spec changes. But the tproxy block will also allow further customization.

A minimum transparent_proxy block configuration will be as follows. This configuration will direct all inbound and outbound communication through the Envoy sidecar.

proxy {
  transparent_proxy {}
}

Applications may need to expose additional ports for health checks or other external traffic. A transparent_proxy.exclude_inbound_ports field will allow for a user-supplied list of ports. Nomad will automatically add the ports for the existing expose.path block, and the port of any health check where check.expose=true (see Health Checks below). The exclude_inbound_ports field will not configure the Envoy proxy itself, only the iptables rules used for exclusion. A complete transparent_proxy configuration might look like the following (only the transparent_proxy section is new). This example configuration allows inbound access to the ports labeled http and metrics without going through the Envoy proxy.

proxy {
  transparent_proxy {
    uid                    = 101   # default, see iptables.Config
    outbound_port          = 15001 # default
    exclude_inbound_ports  = []    # default, can be set with a name
                                   # that matches a network.port
                                   # label or a port number.
    exclude_outbound_ports = []    # default
    exclude_outbound_cidrs = []    # default
    exclude_uids           = []    # default
  }

  expose {

    path {
      path            = "/metrics"
      protocol        = "http"
      local_path_port = 9001

      # Any expose.path.listener_port will be automatically
      # added to the exclude_inbound_ports set.
      listener_port   = "metrics"
    }
  }

  # Note that when using tproxy, upstreams blocks are no longer
  # required. But a user might want to have both while migrating
  # their services to use tproxy. Nomad does not automatically create 
  # Consul intentions from the upstream blocks.
  upstreams {
    destination_name = "count-api"
    local_bind_port  = 8080
  }

}

Any task group that includes a transparent_proxy block for any service will have an implicit constraint added during job registration that the node has the fingerprinted attribute plugins.cni.version.consul-cni, which indicates that the node has the Consul CNI plugin available.
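
Expressed in the job API's JSON representation, that implicit constraint would look roughly like the sketch below; the exact attribute path and operand Nomad generates are assumptions based on the description above:

{
  "Constraints": [
    {
      "LTarget": "${attr.plugins.cni.version.consul-cni}",
      "RTarget": "",
      "Operand": "is_set"
    }
  ]
}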

Updates to Consul CNI

The Consul CNI plugin currently accepts K8S_POD_NAMESPACE and K8S_POD_INFRA_CONTAINER_ID arguments from the CNI command line arguments. It uses these values to make API calls to k8s to determine which CNI configuration to read from disk, and to report the status of that work back to k8s. This workflow poses two problems in Nomad:

  • First, Nomad doesn't have a facility to provide API access to CNI plugins. Although Workload Identity tokens are available at the time we run the networking hook, we'd need to sign a token with a policy specific to this use case. Additionally, the Task API socket is created by the task runner which is specific to a single task (not the whole allocation), and happens much later. This would mean the CNI plugin would need to communicate with the Nomad HTTP server, which is likely protected with mTLS.
  • Second, Nomad doesn't currently have an API that the CNI plugin could use to derive the iptables configuration. In k8s, the iptables configuration is added as a JSON blob to a pod annotation.

Nomad has no need for "pod annotations" from the CNI plugin, so the workflow where the CNI plugin updates its status is unnecessary.

This allows for a much simpler implementation for Nomad and the Consul CNI plugin. The Nomad client will provide the JSON-encoded iptables.Config as a CNI command-line argument. The Consul CNI plugin will need to be updated as follows:

  • Look for an IPTABLES_CONFIG argument in the CNI_ARGS. If present:
    • Decode the iptables configuration from that value
    • Skip over the remaining k8s API calls
    • Set up iptables in the network namespace as usual

Updates to Nomad bridge networking and CNI configuration

The network_hook will now have two hard-coded CNI configuration templates; the new configuration will be identical to the original but with the addition of consul-cni to the list of plugins. When an allocation with bridge networking starts on Linux, the network_hook will execute the following steps:

  • Call the platform-specific network manager's CreateNetwork method to create the network namespace, just as before.
  • Call the platform-specific network configurator's Setup method. This will:
    • Idempotently create the required iptables chains.
    • Determine if any service.connect.proxy has tproxy enabled, to select which of the CNI configuration templates to use. Render that template.
    • Create an argument for the CNI portmap plugin to configure the port mappings specified in the allocation's group.network block, just as before.
    • Create an IPTABLES_CONFIG argument for the Consul CNI plugin (see iptables.Config below).
    • Invoke the CNI plugins with the rendered CNI plugin configs against the network namespace.
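
The only structural change from the existing bridge template is the extra entry appended to the plugins list. A hedged sketch of the new template's plugin list is below; the consul-cni option names shown are assumptions, not the plugin's exact configuration keys:

  "plugins": [
    { "type": "loopback" },
    { "type": "bridge", ...snip... },
    { "type": "firewall" },
    { "type": "portmap", ...snip... },
    { "type": "consul-cni", "log_level": "info" }
  ]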

iptables.Config

The IPTABLES_CONFIG argument passed to the Consul CNI plugin will be a JSON-encoded version of the Consul SDK's iptables.Config struct. The fields of this will be set as follows:

  • ConsulDNSIP: Set by Nomad from the fingerprinted Consul agent. See the DNS section below.
  • ConsulDNSPort: Set by Nomad from the fingerprinted Consul DNS port.
  • ProxyUserID: defaults to 101, which is the UID that the Envoy container image runs as (in the Envoy container’s user namespace, if any). Can be set in the job spec by tproxy.uid.
  • ProxyInboundPort: set to the dynamic port set by the Nomad client.
  • ProxyOutboundPort: defaults to 15001, the hard-coded value copied from k8s connect-inject, which matches what the Envoy bootstrap command sets. Can be set in the job spec by the tproxy.outbound_port field.
  • ExcludeInboundPorts: derived from combining the transparent_proxy.exclude_inbound_ports field and the expose.path block as described above.
  • ExcludeOutboundPorts: defaults to empty, user-configurable via tproxy.exclude_outbound_ports block
  • ExcludeOutboundCIDRs: defaults to empty, user-configurable via tproxy.exclude_outbound_cidrs block
  • ExcludeUIDs: defaults to empty, user-configurable via tproxy.exclude_uids block
  • NetNS: set from the drivers.NetworkIsolationSpec.Path field, which is the network namespace path that'll be used for the CNI configuration; Nomad will also set the CNI_NETNS environment variable.
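
Putting those fields together, the IPTABLES_CONFIG value for an allocation like the earlier two-task example might look roughly like the following. All values here are illustrative, and the JSON field names and types mirror the Consul SDK's Go struct as listed above, so treat the exact encoding as an assumption:

{
  "ConsulDNSIP": "192.168.1.117",
  "ConsulDNSPort": 8600,
  "ProxyUserID": "101",
  "ProxyInboundPort": 25678,
  "ProxyOutboundPort": 15001,
  "ExcludeInboundPorts": ["8000", "9001"],
  "ExcludeOutboundPorts": [],
  "ExcludeOutboundCIDRs": [],
  "ExcludeUIDs": [],
  "NetNS": "/var/run/netns/<alloc network namespace>"
}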

DNS

As noted above, tasks using transparent proxy should be able to use Consul DNS URLs to access upstreams. In the http://foo.virtual.consul example, Consul will resolve this hostname to the Consul-issued virtual IP of the foo service. When the request gets made using that IP, it is redirected via the iptables config to the Envoy proxy. The Envoy proxy matches on the request IP to direct the request to the foo service.

In order for the application container to resolve foo.virtual.consul the DNS lookup must be made to the local Consul client (or Consul server but that’s not applicable for Nomad). The local Consul client is listening on port 8600 and is accessible to the application container through the host IP (assuming the Consul agent’s bind_addr is set to a host IP and not localhost). Nomad can write a new nameserver to /etc/resolv.conf but resolv.conf does not support ports. Therefore we must use a combination of iptables rules and resolv.conf rules to get the DNS lookup directed to the local Consul client.

  1. Nomad will fingerprint the address that the Consul agent has bound on, and write an additional nameserver with that address to the top of /etc/resolv.conf. This will ensure DNS lookups are first made to <Consul agent IP>:53. We must add an additional nameserver so that if the DNS lookup fails (e.g. it wasn’t to a Consul service), the next nameserver in the list will be used. Consul does support doing its own recursive lookup using the recursors config, but that would require users to configure it, which is not wanted.
  2. Consul’s iptables SDK will handle writing the iptables rule to redirect UDP and TCP requests to <Consul agent IP>:53 to the configured ConsulDNSIP and ConsulDNSPort, which will be the address of the node-local Consul client from within the application container and Consul’s DNS port, respectively. The DNS port will be determined via fingerprinting, as it is exposed in the /agent/self endpoint.

Note: if users enable DialedDirectly they will also be able to use regular Consul DNS hostnames, e.g. foo.service.consul., but mesh functionality is limited when using these URLs.

Health Checks

Consul health checks of type grpc, tcp, and http require that the service being checked can be reached from the Consul agent running on the Nomad node, and that it can be reached without going through the Envoy proxy, because the Consul agent doesn’t have the required TLS.

Nomad has an existing check.expose field that will automatically generate an expose.path block for http and grpc check types at the time of job submission. We’ll extend this to allow check.expose to be set on tcp checks, but only when a transparent_proxy block is in use. For these task groups, the same job mutating hook will add the check port to exclude_inbound_ports.

@tgross
Copy link
Member

tgross commented Feb 13, 2024

my rough implementation checklist:

@apollo13
Contributor Author

Hi tgross, this all sounds very exciting. A few small questions/remarks:

Only bridge networking is supported for Connect workloads, even though the cni mode is configured in mostly the same way as bridge networking (so in theory it could support Connect but users would have to provide the correct CNI config themselves).

Probably the only thing keeping theory from becoming reality here is that Nomad hardcodes and checks for bridge in a few places?

Nomad will fingerprint the address that the Consul agent has bound on, and write an additional nameserver with that address to the top of /etc/resolv.conf.

It would be great if it were possible to disable this behavior (or at least to have it for normal allocations as well?). As it currently stands, I am already pushing a DNS server that is aware of the .consul domain into my allocations so they can use Consul service lookups.

Can you expand a little bit on how the transparent proxy works (I guess I haven't understood it fully yet) and especially what the virtual addresses do? Am I correct that it works somewhat like this:

  • All application outbound traffic is redirected to the tproxy process (probably by checking the originating uid in the namespace?)
  • As far as I understand it, all the traffic is routed to a single port in the tproxy (namely OutboundListenerPort?)
  • How does the tproxy now determine which IP:PORT was requested?

Assuming I am using virtual services, where I guess each service gets its own IP: Can my application now access foo.virtual.consul:any_port and it ends up at the correct service (the virtual IP docs in Consul seem rather sparse)?

What would DialDirectly change? One would use the actual IPs from foo.service.consul (or rather foo.connect.consul or is that resolved implicitly?) and have the tproxy connect to that exact instance?

Sorry if the questions about tproxy itself are kinda out of scope, but I am trying hard to understand what is happening here :)

@tgross
Member

tgross commented Feb 13, 2024

It would be great if it would be possible to disable this behavior (or at least also have it for normal allocations as well?). As it currently stands I am already pushing a DNS server that is aware of the .consul domain into my allocations so they can use consul service lookups.

Good call. We discussed that a bit but didn't have a reasonable use case in mind, so having your example is valuable.

Can you expand a little bit on how the transparent proxy works (I guess I haven't understood it fully yet)
...
All application outbound traffic is redirected to the tproxy process (probably by checking the originating uid in the namespace?)

Right. From a high-level view, it's adding some iptables rules to the existing Connect implementation. Those rules are applied internal to the network namespace, so that outbound traffic from the task flows through the Envoy proxy we've configured for Connect, instead of just wherever the task wants.

How does the tproxy now determine which IP:PORT was requested?

The virtual IP address in this case is just Envoy load balancing between the real IP addresses that Nomad is advertising for the service to Consul. The DialDirectly would be for "I want to reach that allocation", but I'm not sure I've worked out how Consul exposes that via DNS.

@apollo13
Contributor Author

The virtual IP address in this case is just Envoy load balancing between the real IP addresses that Nomad is advertising for the service to Consul.

Understood, the missing link for me is if I request foo.virtual.consul and bar.virtual.consul the iptables rules redirect both requests to 127.0.0.1:tproxy_outbound_port (so far so good). Now how does the tproxy reconstruct the original target IP so it can figure out which service was requested?

@apollo13
Contributor Author

apollo13 commented Feb 13, 2024

Okay, learned something. There is SO_ORIGINAL_DST: https://www.envoyproxy.io/docs/envoy/latest/configuration/listeners/listener_filters/original_dst_filter

EDIT:// And even more interesting with the TPROXY target https://blog.cloudflare.com/how-we-built-spectrum

@blake
Member

blake commented Feb 14, 2024

The virtual IP address in this case is just Envoy load balancing between the real IP addresses that Nomad is advertising for the service to Consul. The DialDirectly would be for "I want to reach that allocation", but I'm not sure I've worked out how Consul exposes that via DNS.

When transparent proxy is enabled for a service, Consul allocates a virtual IP (VIP) for that service from the 240.0.0.0/4 address range.

The outbound listener on the downstream service's Envoy proxy has a series of filter chains that match destination VIP addresses to the corresponding Envoy cluster for the target upstream service. For example,

{
  "filter_chains": [
    {
      "filter_chain_match": {
        "prefix_ranges": [
          {
            # Consul assigned virtual IP for the service `fake-service`
            "address_prefix": "240.0.0.1",
            "prefix_len": 32
          }
        ]
      },
      "filters": [
        {
          "name": "envoy.filters.network.tcp_proxy",
          "typed_config": {
            "@type": "type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy",
            "stat_prefix": "upstream.fake-service.default.default.dc2",
            "cluster": "fake-service.default.dc2.internal.fabb7415-0d8e-5230-5455-d31c35d0ddd9.consul"
          }
        }
      ]
    }
  ]
}

The cluster contains a list of each endpoint / service instance for the logical service and their actual addresses in the cluster.

{
  "dynamic_endpoint_configs": [
    {
      "endpoint_config": {
        "@type": "type.googleapis.com/envoy.config.endpoint.v3.ClusterLoadAssignment",
        "cluster_name": "fake-service.default.dc2.internal.fabb7415-0d8e-5230-5455-d31c35d0ddd9.consul",
        "endpoints": [
          {
            "locality": {},
            "lb_endpoints": [
              {
                "endpoint": {
                  "address": {
                    "socket_address": {
                       # IP address of the upstream service instance
                      "address": "10.42.1.78",
                      "port_value": 20000
                    }
                  },
                  "health_check_config": {}
                },
                ...snip...

In this configuration, downstream applications need to use the <name>.virtual.consul hostname when addressing upstreams so that Envoy can correctly route the connection over the mesh to the upstream service. Traffic is load balanced across all healthy endpoints/instances in the upstream cluster.

When TransparentProxy.DialedDirectly is enabled on an upstream service, the filter chain on the downstream proxy is configured to match directly on the IP addresses of the upstream service instances instead of the virtual IP.

{
  "filter_chains": [
    {
      "filter_chain_match": {
        "prefix_ranges": [
          {
            # upstream instance 1
            "address_prefix": "10.42.1.78",
            "prefix_len": 32
          },
          {
            # upstream instance 2
            "address_prefix": "10.42.1.79",
            "prefix_len": 32
          }
        ]
      },
      "filters": [
        {
          "name": "envoy.filters.network.tcp_proxy",
          "typed_config": {
            "@type": "type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy",
            "stat_prefix": "upstream.fake-service.default.default.dc2",
            "cluster": "passthrough~fake-service.default.dc2.internal.fabb7415-0d8e-5230-5455-d31c35d0ddd9.consul"
          }
        }
      ]
      ...snip...

Instead of load balancing traffic across available upstream instances, Envoy is configured to send connections to a passthrough / original destination cluster that forwards the connection directly to the original destination IP and port.

In this configuration, downstream applications need to use the <name>.service.consul hostname when addressing the upstream.

@tgross
Copy link
Member

tgross commented Feb 14, 2024

Thanks for the assist @blake! 😀

tgross added a commit that referenced this issue Feb 15, 2024
While working on #10628 I discovered that the `expose` block was missing an
implementation of Diff, which means it doesn't show up correctly in `job plan`
output.

Also, fix field comparison in `ServiceCheck.Equal`.
This is a bug in the method but it doesn't look like it impacts production code.
tgross added a commit to hashicorp/consul-k8s that referenced this issue Mar 26, 2024
Nomad will implement support for Connect transparent proxy. Unlike in K8s, the
CNI plugin can't contact the Nomad API to read allocation metadata (pod labels)
to get the iptables configuration, and doesn't use the rest of the Consul-K8s
control plane to inject that metadata. Instead, Nomad will pass the iptables
configuration JSON-serialized in the CNI arguments.

This changeset implements the behavior switch by detecting the `IPTABLES_CONFIG`
argument in the CNI arguments. This hypothetically allows for non-Nomad
workflows to use the same code path, if desired.

Ref: hashicorp/nomad#10628
tgross added a commit that referenced this issue Mar 27, 2024
When `transparent_proxy` block is present and the network mode is `bridge`, use
a different CNI configuration that includes the `consul-cni` plugin. Before
invoking the CNI plugins, create a Consul SDK `iptables.Config` struct for the
allocation. This includes:

* Use all the `transparent_proxy` block fields
* The reserved ports are added to the inbound exclusion list so the alloc is
  reachable from outside the mesh
* The `expose` blocks and `check` blocks with `expose=true` are added to the
  inbound exclusion list so health checks work.

The `iptables.Config` is then passed as a CNI argument to the `consul-cni`
plugin.

Ref: #10628
tgross added a commit that referenced this issue Mar 27, 2024
Add a transparent proxy block to the existing Connect sidecar service proxy
block. This changeset is plumbing required to support transparent proxy
configuration on the client.

Ref: #10628
tgross added a commit to hashicorp/consul-k8s that referenced this issue Mar 28, 2024
Nomad will implement support for Connect transparent proxy. Unlike in K8s, the
CNI plugin can't contact the Nomad API to read allocation metadata (pod labels)
to get the iptables configuration, and doesn't use the rest of the Consul-K8s
control plane to inject that metadata. Instead, Nomad will pass the iptables
configuration JSON-serialized in the CNI arguments.

This changeset implements the behavior switch by detecting the
`CONSUL_IPTABLES_CONFIG` argument in the CNI arguments. This hypothetically
allows for non-Nomad workflows to use the same code path, if desired.

Ref: hashicorp/nomad#10628
tgross added a commit that referenced this issue Apr 10, 2024
Add support for Consul Connect transparent proxies

Fixes: #10628
@tgross
Member

tgross commented Apr 10, 2024

This has been merged to main and will ship in Nomad 1.8.0. I'm working on Tutorial updates now in the private repository for those.

@apollo13
Contributor Author

apollo13 commented Apr 10, 2024 via email

@suikast42
Contributor

Awesome

hc-github-team-consul-core added a commit to hashicorp/consul-k8s that referenced this issue May 21, 2024
…ycleShutdown… into release/1.4.x (#4007)

* Fix meshgw tests (#3532)

* Fix meshgw tests

* change protocol on mesh gw tests to tcp from mesh

* add nightly for rc branch (#3533)

* [NET-7243] Stub APIGateway Controller for v2 (#3507)

* stub api-gateway-controller

* Add setup to v2 controller

* Net 7376 Status struct on api gateway with required info from kubesig (#3530)

* add status structs

* update status

* updated script to point at RC version correctly (#3541)

* updated script to point at RC version correctly

* Mw/prepare main for 1.5 dev (#3535)

* bump versions to next version

* updated script to handle new Consul-k8s images

* [COMPLIANCE] Add Copyright and License Headers (#3499)

Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>

* Net 7279 consul k8s write failing acceptance test for tcp route (#3540)

* add status structs

* update status

* fixtures for v2

* checkpoint

* add hook to only run test when flag is enabled

* clean up reversions, delte extra files

* remove http listeners

* delete extra file

* revert accidental IDE changes

* clean up lint issues

* Add json tags to api-gateway types (#3550)

* reconcile consul-k8s with changes made in Consul (#3543)

* [NET-7656] Add GatewayClassConfig watch for MeshGateway controller (#3537)

* Add GatewayClass[Config] watches for MeshGateway controller

* Update merge logic for deployment + service

* Add test coverage for MergeDeployment

* Add test coverage for MergeService

* Copy over owner references to new Service + Deployment

* Ensure signals are passed to commands (#3548)

* Ensure signals are passed to commands

Change `/bin/sh -ec "<command>"` to
`/bin/sh -ec "exec <command>"`. Adding `exec` ensures that `<command>`
is not executed as a child process but replaces the `/bin/sh` process.
This ensure that `<command>` receives any signals.

Specifically this is an issue when attempting to trap SIGTERMs as part
of graceful pod shutdown. Without this change, we weren't receiving any
signals because they aren't passed down by `/bin/sh -c`.

* Fix broken bats tests and add changelog

Signed-off-by: Ashwin Venkatesh <ashwin.what@gmail.com>

---------

Signed-off-by: Ashwin Venkatesh <ashwin.what@gmail.com>
Co-authored-by: Ashwin Venkatesh <ashwin.what@gmail.com>

* [NET-7158] CRUD hooks for api gateway v2 (#3519)

* Add hooks for CRUD side effects for apigateway controller

* Added tests for controller

* [NET-6465] Respect connectInject.initContainer.resources for v1 API gateways (#3531)

* Respect connectInject.initContainer.resources for v1 API gateways

* Add changelog entry

* Add test coverage for init container resources on API gateway Pods

* Add NET_BIND_SERVICE to the security context in the deployment of Mesh Gateway (NET-6463) (#3549)

* Add NET_BIND_SERVICE to the security context in the deployment of Mesh Gateway

* [NET-7657,NET-6934] Define v2 GatewayClass + GatewayClassConfig locally (#3559)

* Define GatewayClass's spec model locally instead of consuming proto from Consul

* Update gateway resources job to use new types, constants

* Make description optional, regenerate CRD definitions

* Remove GatewayClass columns related to syncing into Consul

* [NET-7156] Gateways Controllers Reusability (#3574)

* make controller setup for gateway controllers generic and reusable, add
indices onto gateway resources in k8s for more efficient lookups

* cleanup from PR review

* Update control-plane/controllers/resources/gateway_controller_setup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/controllers/resources/gateway_indices.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/controllers/resources/gateway_controller_setup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* Update control-plane/controllers/resources/gateway_controller_setup.go

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* clean up from PR review

---------

Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>

* [NET-6465] Consider init container resources when determining if existing + desired deployments are equal (#3575)

* Consider init container resources when determining if existing + desired deployments are equal

* Add test coverage for compareDeployments

* Update control-plane/api-gateway/gatekeeper/deployment_test.go

* [NET-7657] Consume version of proto-public with GatewayClass[Config] removed (#3581)

[NET-7657] Consume version of proto-public with GatewayClass + GatewayClassConfig removed

* Update multicluster v2beta1 to v2 (#3560)



Co-authored-by: skpratt <sarah.pratt@hashicorp.com>

* [NET-7156] Generalize MeshGatewayBuilder to just GatewayBuilder (#3538)

* update gateway builder to be generic

* Add api gateway to gateway builder

* Updated service test for gateway listeners/ports

* update test names

* update listener functions

* remove check for listener name

* fix tests

* release: Update 10-util.sh to adjust formatting (#3588)

Update 10-util.sh

* use go 1.21.7 (#3591)

* add make target script (#3596)

add new make target for go mod tidy check

* v2tenancy: namespace mirroring acceptance tests (#3590)

* add linting back (#3603)

added linting back

* [COMPLIANCE] Add Copyright and License Headers (#3610)

Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>

* Datadog Integration (#3407)

* datadog-integration: updated consul-server agent telemetry-config.json with dd specific items as well as additional missing VM based options, unit tests, dd unix socket integration, dd agent acl token generation, deployment override failsafes

* datadog-integration: updated consul-server agent telemetry-config.json with dd specific items as well as additional missing VM based options, unit tests, dd unix socket integration, dd agent acl token generation | final initial-push

* changelog entry update

* datadog-integration: updated consul-server agent server.config (enable_debug) and telemetry.config update | enable_debug to server.config

* curt pr review changes (minus extraConfig templating verification changes)

* global.metrics.AgentMetrics -> global.metrics.enableAgentMetrics

* dogstatsd and otlp mutually exclusive verification checks

* breaking changes now incorporated into consul.validateExtraConfig helper template function as precheck

* extraConfig hash updates post merge conflict update

* fix helpers.tpl consul.extraConfig from merge --> /consul/tmp/extra-config/extra-from-values.json | add labels to rolebinding for datadog secrets

* update changelog .txt to match new PR number

* updated server-statefulset.yaml to correct ad.datadoghq.com/consul.logs annotation to valid single quote string

* fix helpers.tpl consul.extraConfig from merge --> /consul/tmp/extra-config/extra-from-values.json | add labels to rolebinding for datadog secrets

* fix helpers.tpl consul.extraConfig from merge --> /consul/tmp/extra-config/extra-from-values.json | add labels to rolebinding for datadog secrets

* update UDP dogstatsdPort behavior to exclude including a port value if using a kube service address (as determined by user overrides)

* update _helpers.tpl consul.ValidateDatadogConfiguration func to account for using 'https' as protocol => should fail

* update server-statefulset.yaml to exclude prometheus.io annotations if enabling datadog openmetrics method for consul server metrics scrape. conflict present with http vs https that breaks openemtrics scrape on consul

* update server-statefulset.yaml to exclude prometheus.io annotations if enabling datadog openmetrics method for consul server metrics scrape. conflict present with http vs https that breaks openemtrics scrape on consul

* correct otlp protocol helpers.tpl check to lower-case the protocol to match the open-telemetry-deployment.yaml behavior

* fix server-acl-init command_test.go for datadog token policy - datacenter should have been dc1

* add in server-statefulset bats test for extraConfig validation testing

* Net 7238 - consul k8s modify gateway resources job to create apigw gatewayclass and gatewayclassconfig (#3564)

* configmap update

* udpate chart to respect api-gateway-config

* fix typo

* added unit tests, added some stuff missed in initial pass

* added thorough unit tests for gateway-resources-configmap.yaml

* remove unneeded extra line

* additional debugging

* test

* test

* remove extra escapes

* final test

* test again

* one more test

* this should work

* fix spacing issue

* Fix logic on apigateway that ignores current annotations on services (#3597)

* [NET-7449] Generalize CRUD hooks for Gateways (#3576)

Generalize the crud hooks for gateways

* [NET-5932] chore: remove comment from closed ticket (#3636)

chore: remove comment from closed ticket

* [NET-2420] security: Upgrade helm containerd and several other dependencies (#3625)

* security: upgrade helm/v3 to 3.13.3

Addresses multiple CVEs:
- CVE-2023-25165
- CVE-2022-23524
- CVE-2022-23526
- CVE-2022-23525

* chore: upgrade k8s dependencies to match controller-runtime

* security: upgrade containerd to latest

Addresses GHSA-7ww5-4wqc-m92c (GO-2023-2412)

* security: upgrade docker/docker to latest

Addresses GHSA-jq35-85cj-fj4p

* security: upgrade docker/distribution to latest

Addresses CVE-2023-2253

* security: upgrade filepath-securejoin to latest patch

Addresses GHSA-6xv5-86q9-7xr8 (GO-2023-2048)

* chore: upgrade oras-go to fix docker incompatibility

* Add changelog

* build: Create arm64 packages as well (#3428)

During the CRT on-boarding, packaging for other Linux architectures (arm64) was
not enabled. This change adds packaging support for those architectures. I've
specifically opted not to include 32-bit.

See #1132.
Related to hashicorp/releng-support#178.

Other related updates:

 - To make future support a bit easier, I've enabled the build workflow from
   releng prefixed branches.
 - Using qemu emulation for testing package installs on other architectures,
   thus allowing us to validate the binaries work as intended
 - Minor alteration to the package install tests to use yum instead of rpm

Co-authored-by: David Yu <dyu@hashicorp.com>

* [NET-2420] security: re-enable security scan release block (#3628)

* security: upgrade helm/v3 to 3.13.3

Addresses multiple CVEs:
- CVE-2023-25165
- CVE-2022-23524
- CVE-2022-23526
- CVE-2022-23525

* chore: upgrade k8s dependencies to match controller-runtime

* security: upgrade containerd to latest

Addresses GHSA-7ww5-4wqc-m92c (GO-2023-2412)

* security: upgrade docker/docker to latest

Addresses GHSA-jq35-85cj-fj4p

* security: upgrade docker/distribution to latest

Addresses CVE-2023-2253

* security: upgrade filepath-securejoin to latest patch

Addresses GHSA-6xv5-86q9-7xr8 (GO-2023-2048)

* chore: upgrade oras-go to fix docker incompatibility

* Add changelog

* security: re-enable security scan release block

This was previously disabled due to an unresolved false-positive CVE.
Re-enabling both secrets and OSV + Go Modules scanning, which per our
current scan results should not be a blocker to future releases.

Also add security scans on PR and merge to protected branches to allow
proactive triage going forward.

See hashicorp/consul#19978 for similar change in that repo, adapted
here.

* [NET-8174] security: add scan triage for CVE-2024-25620 (helm/v3) (#3657)

security: add scan triage for CVE-2024-25620 (helm/v3)

Triage this scan result as `consul-k8s` should not be directly
impacted and it is medium severity. Follow-up ticket filed for
remediation.

Also improve formatting of scan config since this change will be
backported.

* Update main changelog for 1.1.10, 1.2.6 and 1.3.3 (#3662)

* Update main changelog for 1.1.10, 1.2.6 and 1.3.3
* include previous missed releases

* [COMPLIANCE] Add Copyright and License Headers (#3654)

Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>

* [NET-7450] setup crud hooks for APIGateway v2 (#3580)

* setup crud hooks for APIGateway v2

* update CRDS and reorganize code in api gateway type

* pass in gateway kind for annotations

* Fix tests

* Fix tests

* register all types needed for test

* values.yaml - tlsServerName docs (#3656)

* Update values.yaml

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

---------

Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>

* [NET-6741] make: Add target for updating dependencies across all modules (#3669)

make: Add target for updating dependencies across all modules

To enable more consistent and error-proof dependency management, add a
Make target that will set a dependency version across all submodules
that require it.

Also runs `go mod tidy`. This first ensures the dependency addition is
reverted if the module in question does not require it; it also ensures
that any additional cleanup needed in `go.mod`/`go.sum` is applied.

* build.yml: Add ECR images back (#3668)

* Update build.yml
* Create 3668.txt

* build.yml: typo on tags (#3681)

* bump kind to v0.22.0 and update k8s support (#3675)

* bump kind to v0.22.0 and update k8s support

* Create 3675.txt

* Update README.md

* [NET-8174] security: add scan triage for CVE-2024-26147 (helm/v3) (#3688)

security: add scan triage for CVE-2024-26147 (helm/v3)

* chore: upgrade Consul dependencies to latest (#3695)

* chore: upgrade Consul dependencies to latest

* chore: upgrade control-plane submodule dependencies to latest

* fix: update GatewayClass finalizer reference

* release: add \n to end of NOTE for releases  (#3700)

* Update 10-util.sh

* Update control-plane/build-support/functions/10-util.sh

Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>

---------

Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>

* chore: upgrade `consul/api` to latest (#3702)

chore: upgrade consul/api to latest

v1.28.0 was retracted due to double-publish.

* [NET-8174] security: add triage alias for GO-2024-2554 (#3705)

security: add triage alias for GO-2024-2554

This vulnerability was already triaged via its GHSA alias, but the
scanner is flagging it under this name, so adding an explicit entry.

* docs: update `CHANGELOG` for K8s 1.4.0 release (#3710)

docs: update CHANGELOG for K8s 1.4.0 release

* docs: update 1.4.0 Helm docs per Docs team feedback (#3714)

* [NET-8367] security: upgrade google.golang.org/protobuf to 1.33.0  (#3719)

* update protobuf lib

* add changelog

* NET-6878: Fix Flake API Gateway Acceptance (#3717)

* test upgraded library

* remove toolchain reference

* add toolchain

* NET-8391: fix cleanup script (#3725)

* NET-8391: fix cleanup script

* cleanup testing comments

* NET-8391: fix cleanup script - remove network interface(s) (#3730)

* cleanup network interfaces

* clean up test

* updates k8s version (#3731)

* fix(control-plane): acl tokens deleted while pods in graceful shutdown (#3736)

* NET-6878: Remove finalizers from CRDs during test resource cleanup (#3739)

* remove finalizers from crds

* add comments

* Upgrade to go 1.21.8 (#3741)

* Upgrade to use Go `1.21.8`. This resolves CVEs
[CVE-2024-24783](https://nvd.nist.gov/vuln/detail/CVE-2024-24783) (`crypto/x509`).
[CVE-2023-45290](https://nvd.nist.gov/vuln/detail/CVE-2023-45290) (`net/http`).
[CVE-2023-45289](https://nvd.nist.gov/vuln/detail/CVE-2023-45289) (`net/http`, `net/http/cookiejar`).
[CVE-2024-24785](https://nvd.nist.gov/vuln/detail/CVE-2024-24785) (`html/template`).
[CVE-2024-24784](https://nvd.nist.gov/vuln/detail/CVE-2024-24784) (`net/mail`).

Update the Consul Build Go base image to `alpine3.19`. This resolves CVEs
[CVE-2023-52425](https://nvd.nist.gov/vuln/detail/CVE-2023-52425)
[CVE-2023-52426⁠](https://nvd.nist.gov/vuln/detail/CVE-2023-52426)

* Add changelog

* Fix typo in values file for sync catalog test (#3760)

* upgraded helm v3 to address GHSA-jw44-4f3j-q396 (#3768)

* disable scan for "GHSA-jw44-4f3j-q396" until patch fix in helm v3

* addressed comments

* Net 6821 - Regenerate Terminating Gateway CRD with new field  (#3737)

* initial updates

* regen crds

* Add fixes for flaky-cni and failing cloud-nightly tests (#3764)

Add fixes for flaky-cni

* Catalog: Use EndpointSlice and propagate Kubernetes Topology information to synced consul service (#3693)

* Use EndpointSlice and propagate zone metadata to consul service

* Fix tests

* Add test for zone metadata

* Cleanup and changelog entry

* Fix clusterrole permissions and type on Informer

* Include region info for NodePort services

* Include topology region for all service types

* Update release note

* Fix tests

* fix sync-catalog-clusterrole and tests

* fix stash conflict

* adding endpoints permission back to sync catalog since it still uses it.

* Fix endpointslice map

* Fix topology region

* Remove region lookups, remove endpoints permissions, use pointers for endpointslice map

* Drop region test

---------

Co-authored-by: John Murret <john.murret@hashicorp.com>

* Increase timeout for running commands in acceptance test (#3784)

increase timeout for running commands

* Bugfix: Don't recreate servicemap for catalog sync (#3785)

* test: fix TestConnectInject_ProxyLifecycleShutdown (#3774)

* Removes Legacy API Gateway Stanza that was deprecated in Consul 1.16 (#3718)

* Removes Legacy API Gateway Stanza that was deprecated in Consul 1.16

* remove unit test for previously removed `consul-cni` validation (#3794)

In #1527, we added support for OpenShift and Multus, which meant that the
`consul-cni` plugin was no longer necessarily the final CNI plugin run. While
working on a patch to allow compatibility with Nomad transparent proxy, I
discovered we'd never removed a now-failing unit test of the plugin for the
validation step. It looks like the remaining unit tests still cover the
remaining validation, so we can safely remove this test.

Ref: #1527
Ref: hashicorp/nomad#10628

* [NET-8412] Fix order of APIGW ACL policy/role creation (#3779)

* Reorder gateway policy and role creation to avoid error messages in consul when policy/role already exists

* refactor for readability

* fix spacing

* Added changelog

* improve reliability of acceptance tests (#3800)

* improve reliability of acceptance tests

* remove update to timeout

* add output to error

* [net-8411] bug: fix premature token and service instance deletion due to pod fetch errors (#3758)

* API gateway metrics (#3811)

* First metrics pass

* Fix up build

* move to non-deprecated chart options

* Fix up charts and defaults

* Add changelog

* Fix bad merge

* Fix test

* fix linter error

* Fix extra yaml block from bad merge

* Switch == true check to use ParseBool

* Add support for Nomad transparent proxy (#3795)

Nomad will implement support for Connect transparent proxy. Unlike in K8s, the
CNI plugin can't contact the Nomad API to read allocation metadata (pod labels)
to get the iptables configuration, and doesn't use the rest of the Consul-K8s
control plane to inject that metadata. Instead, Nomad will pass the iptables
configuration JSON-serialized in the CNI arguments.

This changeset implements the behavior switch by detecting the
`CONSUL_IPTABLES_CONFIG` argument in the CNI arguments. This hypothetically
allows for non-Nomad workflows to use the same code path, if desired.

Ref: hashicorp/nomad#10628
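
A rough sketch of the detection this commit describes, assuming the standard CNI convention that a runtime's extra arguments arrive as a semicolon-delimited `KEY=VALUE` string; the helper name and the example payload are illustrative, not the actual consul-cni code:

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// parseCNIArgs splits the semicolon-delimited KEY=VALUE string that a CNI
// runtime passes to a plugin (the CNI_ARGS / CmdArgs.Args convention).
func parseCNIArgs(args string) map[string]string {
	kvs := map[string]string{}
	for _, pair := range strings.Split(args, ";") {
		if k, v, ok := strings.Cut(pair, "="); ok {
			kvs[k] = v
		}
	}
	return kvs
}

func main() {
	// Illustrative payload only: Nomad serializes the iptables configuration
	// as JSON under the CONSUL_IPTABLES_CONFIG key.
	raw := `IgnoreUnknown=true;CONSUL_IPTABLES_CONFIG={"ProxyInboundPort":15001}`

	kvs := parseCNIArgs(raw)
	if cfgJSON, ok := kvs["CONSUL_IPTABLES_CONFIG"]; ok {
		// Presence of the key switches the plugin onto the "no Kubernetes
		// API" path: decode the config directly instead of reading pod
		// annotations through the rest of the consul-k8s control plane.
		var cfg map[string]any
		if err := json.Unmarshal([]byte(cfgJSON), &cfg); err != nil {
			fmt.Println("invalid iptables config:", err)
			return
		}
		fmt.Println("iptables config from CNI args:", cfg)
	}
}
```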

* fix version output for `consul-cni` (#3829)

The `consul-cni` plugin emits "version unknown" because the CNI library's
`PluginMain` uses a global variable that isn't being set as part of our build
process. Import the `control-plane/version` package so that we have an identical
version in builds across both binaries.
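
For context, plugins built on the containernetworking/cni skeleton typically pass a version string as the final "about" argument to `skel.PluginMain`. The sketch below assumes that pattern and uses a hypothetical hard-coded version constant; it is not the actual consul-cni wiring.

```go
package main

import (
	"github.com/containernetworking/cni/pkg/skel"
	cniversion "github.com/containernetworking/cni/pkg/version"
)

// In the real fix this value would come from the shared control-plane/version
// package (populated at build time), not a hard-coded constant.
const humanVersion = "1.5.0-dev"

// No-op command stubs, just to make the sketch self-contained.
func cmdAdd(_ *skel.CmdArgs) error   { return nil }
func cmdCheck(_ *skel.CmdArgs) error { return nil }
func cmdDel(_ *skel.CmdArgs) error   { return nil }

func main() {
	// The final "about" argument is what the skeleton reports for version
	// requests; deriving it from a shared version package avoids relying on
	// a build-time global that defaults to an unknown version.
	skel.PluginMain(cmdAdd, cmdCheck, cmdDel, cniversion.All, "consul-cni "+humanVersion)
}
```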

* [NET-8601] Upgrade `vault/api` and `docker/docker` to resolve open CVEs (#3837)

* security: upgrade vault/api to remove go-jose.v2

* security: upgrade docker/docker to v25.0.5

* add changelog

* Remove anyuid SCC requirement for OpenShift (#3813)

Remove SCC requirement for anyuid for OpenShift

* Cleanup formatting to follow consul-k8s standard (#3852)

* Datadog Unix Socket Path Custom Path fix (#3635)

* Update dogstatsd hostPath rendering for Unix domain sockets -- override customizable and volumeMount/volume should align

* changelog update

* changelog: reviewer update to include datadog specific context

* readd dev image tags for fips ubi (#3881)

* readd dev image tags for fips ubi

* fix up bad copy paste

* [net-7710] don't overwrite prometheus path annotation if it's already been specified (#3846)

don't overwrite prometheus path annotation if it's already been specified

* feat: Add startup-grace-period-seconds and graceful-startup-path (#3878)

* feat: Add startup-grace-period-seconds and graceful-startup-path

* Add changelog

---------

Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>

* NET-8594: Disable TestSyncCatalog (#3815)

* [NET-8946 NET-8947 NET-8948] security: bump go, x/net and envoy versions (#3893)

security: bump go and x/net

* NET-8594: Disable TestSyncCatalogIngress (#3904)

* Helm: support sync-lb-services-endpoints for sync catalog (#3905)

* Helm: support sync-lb-services-endpoints for sync catalog

* add test

* fix template tag order

---------

Co-authored-by: jukie <10012479+Jukie@users.noreply.github.com>

* Datadog Integration Acceptance Tests / Bug fixes (#3685)

* datadog: acceptance tests - initial commit (not fully working yet)
* server-statefulset: update logic for prometheus annotations (only enabled if using dogstatsd, otherwise disabled)
* datadog: acceptance test working with dd-client api and operator deployment framework
* datadog-acceptance: main branch rebase merge conflict cherry-pick
* datadog: acceptance testing update to metric name matching using regex
* datadog: acceptance testing helper update for backoff retry
* datadog: acceptance testing working timeseries query verification udp + uds
* datadog: update helpers for /v1/query
* server-statefulset.yaml: update to correct release name prepend to consul-server URL
* datadog: acceptance testing consul integration checks working
* server-statefulset: yaml and bats updates for datadog openmetrics and consul integration check URLs to use consul.fullname-server
* PR3685: changelog update
* datadog: openmetrics acceptance test update
* datadog: added OTEL_EXPORTER_OTLP_ENDPOINT to consul telemetry collector deployment for dd-agent ingestion (passes tag info to DD)
* otlp: datadog otlp acceptance test updates for telemetry-collector (grpc => http prefix) | staged otlp acceptance test
* datadog-acceptance: fake-intake fixture addition
* datadog-acceptance: update _helpers.tpl for consul version sanitization (truncate to <64)
* datadog-acceptance: update base fixture for fake-intake
* datadog-acceptance: add DogstatsD stats enablement (required for curling agent local endpoint)
* datadog-acceptance: add DogstatsD stats enablement (required for curling agent local endpoint)
* datadog-acceptance: first-round fake-intake testing - works but is inaccurate
* datadog-acceptance: datadog framework - remove dd client agent requirement (fake-intake)
* datadog-acceptance: update flags to not require API and APP key (fake-intake)
* datadog-acceptance: go mod updates for uuid downgrade
* acceptance-test: remove otlp acceptance test -- no fake-intake or agent endpoint to verify
* datadog-acceptance: acceptance test lint fixes
* acceptance-test: update control-plane/cni/main.go l:272 comment with period for lint testing.
* acceptance-test: retry lint fixes
* acceptance-test: correct telemetry collector URL from grpc:// to http://

* [NET-8412] Fix APIGW policy creation ordering for upgrade path (#3918)

* fix policy creation for upgrading

* Added changelog

* Add post-release  changelogs (#3867)

Add changelogs

* GH-3406 - Only error for config entries from different datacenters when the config entries are different (#3873)

* GH-3406 - Only error for config entries from different datacenters when the config entries are different

* add changelog

* fixing tests and logic

* refactoring code to make tests pass, use a switch statement for readability, and remove the intermediate requireMigration state flag from a long iterative section of code.

* add missing license file (#3921)

* add missing license file

* missed copying the license file to workdir

* make up missing value and remove redundant directory creation

* [COMPLIANCE] Add Copyright and License Headers (#3936)

Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>

* Net 9069/xw add license file to all bin (#3942)

* debug: missing LICENSE

* use abs path

* [NET-6466] Remove secrets from termgw role (#3928)

* remove unnecessary permissions for terminating gateways

* add changelog

* Net 9069/fix local brokerage (#3948)

* make copy of license file into control plane

* remove redundant copy in gh workflow

* use env instead of arg

* [NET-8091] Use file-system-certificate in Consul instead of inline-certificate (#3767)

* Use file-system-certificate in Consul instead of inline-certificate

* Actually update correctly from merges

* Adds changelog

* Updates go.mod in acceptance tests with latest consul api, updates the acceptance gateway lifecycle test

* Small updates

* Update comment

---------

Co-authored-by: Melisa Griffin <melisa.griffin@hashicorp.com>

* chore: remove workstream from JIRA sync (#3960)

* NET-9154: Update Kubernetes version (#3958)

Update Kubernetes version

* chore: fix JIRA workflow (#3965)

* [NET-9097, NET-8174] Upgrade controller-runtime (#3935)

* Consume controller-runtime v0.16.3

This is the version required by gateway-api v1.0.0, which will be consumed in a future PR

* Reconcile breaking changes in controller-runtime

* Fix linter errors

* gofmt

* Update controller tests to handle new fake client requirements

* Update test assertion to handle changes in controller-runtime

* Restore incorrectly-removed flags

* Use a proper delete on the fake client since DeletionTimestamp is immutable

* Update enterprise tests to specify status subresources

* Update controller-runtime dependency for acceptance tests

* Explicitly inject decoder into webhooks

* Appease the linter

* Use SetupWithManager pattern from controllers for webhook setup

* Consume consistent version of k8s.io/client-go everywhere

* Upgrade related dependencies for CLI, including helm/v3

* Consume latest release of helm/v3

* changelog

* Inline function calls for testing

* Consume controller-runtime v0.16.5

---------

Co-authored-by: Ronald Ekambi <ronekambi@gmail.com>

* Fix a panic in connect-inject when the provided upstreams list is malformed (#3956)

* Check if an upstream is malformed, if so ignore it.

* support multiple upstreams separator (<space>, <comma>) add tests

* add /n as a separator

* add changelog

* added log when upstream is skipped
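
A simplified sketch of the parsing behavior these bullets describe, assuming upstream entries of the general form "name:port"; the function name and log wording are illustrative, not the actual connect-inject code:

```go
package main

import (
	"fmt"
	"strings"
)

// parseUpstreams accepts spaces, commas, or newlines as separators and skips
// (rather than panics on) entries that lack at least a name and a port.
func parseUpstreams(raw string) []string {
	var upstreams []string
	entries := strings.FieldsFunc(raw, func(r rune) bool {
		return r == ' ' || r == ',' || r == '\n'
	})
	for _, entry := range entries {
		if len(strings.Split(entry, ":")) < 2 {
			// Log and ignore malformed entries instead of failing.
			fmt.Printf("skipping malformed upstream %q\n", entry)
			continue
		}
		upstreams = append(upstreams, entry)
	}
	return upstreams
}

func main() {
	fmt.Println(parseUpstreams("web:8080, api:9090\nbroken"))
}
```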

* [NET-9152] CRD for service registration (#3943)

* service is registering

* add all the fields

* health checks working

* handle finalizers to clean up

* Add status to registration CRD

* Added initial unit test for reconcile

* success paths for registration and deregistration

* added failure tests, moved finalizer removal logic so it occurs after
service is successfully deregistered

* first test for to catalog registration type

* maximal registration to catalog test

* test all the things

* deregistration tests

* update some comments and fields, re-run generators

* Added changelog

* linting all the things

* fixing test setup for new controller runtime

* Handle errors for parsing duration

* Add ReadOnlyRootFilesystem to Security Context (#2909)

* Add readOnlyRootFilesystem to security context (#2771)

* readOnlyRootFilesystem

* Add mount for /tmp

* Add /tmp mountpoint

* Update ingress-gateways-deployment.yaml

* Update terminating-gateways-deployment.yaml

* Update helm unit tests

* Create 2781.txt

* rename changelog file

* rename changelog file

* Mount /tmp to volume for snapshots

* rename changelog

* changelog

---------

Co-authored-by: mr-miles <miles.waller@gmail.com>
Co-authored-by: Paul Glass <pglass@hashicorp.com>
Co-authored-by: Sarah Alsmiller <sarah.alsmiller@hashicorp.com>

* activate tproxy mode even when a cluster IP is not assigned to pod (#3974)

* activate tproxy mode even when a cluster IP is not assigned to pod.

* add changelog

* fix failing tests

* security: Upgrade Go to 1.21.10 (#3980)

* NET-9178-Consul-api-gateway-not-starting-after-restart (#3978)

* don't error if role already exists on restart

* changelog

* lint

* [NET-9153] Handle Terminating Gateway ACL Setup  (#3975)

* first pass at creating write policy for service and updating term gw acl
role

* handle deregistering, update tests for registering with acls

* existing deregister tests passing

* failures with term gw role not existing

* clean up

* reorg code

* Move to own package

* watch for terminating gateways

* move files back, handle multiple terminating gateways

* handle errors and ensure finalizer is set

* Add tests for finalizers

* remove unused file

* fix import naming

* linting

* fix comment, extract constant

* [NET-9201] Validating webhook for registrations (#3990)

* Add validating webhook for registrations

* cleaned up registration webhook setup

* fix setup for webhook, updated docs

* fix typo, remove debugging log, rename variables for readability

* Updating GitHub action versions to the latest TSCCR approved version (#3979)

* test: fix PeeringGateway acceptance (#3992)

* Adds ability to set the imagePullPolicy for all Consul images (consul… (#3991)

* Adds ability to set the imagePullPolicy for all Consul images (consul, consul-dataplane, consul-k8s, consul-telemetry-collector)

* [NET-9155] Cache resources for Registrations (#3993)

* Add set for adding and removing services

* remove service add

* first pass at populating cache

* cache is working, need to fix how statuses are handled

* move to new directory, fix up the status conditions (still todos on this), handle results

* updated tests

* unexport methods that don't need to be exported

* handle consul deregistrations

* clean up before code review

* show ACLUpdate as false if consul deregistered service

* fix issue with updating acl status on consul deregistration

* fix linting errors

* FLAKEY_TEST: Add retry to outbound request for ProxyLifecycleShutdownTest

* increase retry count for TestAPIGateway_GatewayClassConfig test

* backport of commit b7ecab4

* backport of commit 2fcccd2

---------

Signed-off-by: Ashwin Venkatesh <ashwin.what@gmail.com>
Co-authored-by: John Maguire <john.maguire@hashicorp.com>
Co-authored-by: Michael Wilkerson <62034708+wilkermichael@users.noreply.github.com>
Co-authored-by: sarahalsmiller <100602640+sarahalsmiller@users.noreply.github.com>
Co-authored-by: hashicorp-copywrite[bot] <110428419+hashicorp-copywrite[bot]@users.noreply.github.com>
Co-authored-by: Anita Akaeze <anita.akaeze@hashicorp.com>
Co-authored-by: Nathan Coleman <nathan.coleman@hashicorp.com>
Co-authored-by: Luke Kysow <1034429+lkysow@users.noreply.github.com>
Co-authored-by: Ashwin Venkatesh <ashwin.what@gmail.com>
Co-authored-by: Melisa Griffin <missylbytes@users.noreply.github.com>
Co-authored-by: Chris S. Kim <ckim@hashicorp.com>
Co-authored-by: skpratt <sarah.pratt@hashicorp.com>
Co-authored-by: David Yu <dyu@hashicorp.com>
Co-authored-by: Semir Patel <semir.patel@hashicorp.com>
Co-authored-by: natemollica-dev <57850649+natemollica-nm@users.noreply.github.com>
Co-authored-by: Michael Zalimeni <michael.zalimeni@hashicorp.com>
Co-authored-by: Daniel Kimsey <90741+dekimsey@users.noreply.github.com>
Co-authored-by: Curt Bushko <cbushko@gmail.com>
Co-authored-by: Jeff Boruszak <104028618+boruszak@users.noreply.github.com>
Co-authored-by: NicoletaPopoviciu <87660255+NicoletaPopoviciu@users.noreply.github.com>
Co-authored-by: Dan Stough <dan.stough@hashicorp.com>
Co-authored-by: Ashwin Venkatesh <ashwin@hashicorp.com>
Co-authored-by: Isaac Wilson <10012479+jukie@users.noreply.github.com>
Co-authored-by: John Murret <john.murret@hashicorp.com>
Co-authored-by: Tim Gross <tgross@hashicorp.com>
Co-authored-by: Nitya Dhanushkodi <nitya@hashicorp.com>
Co-authored-by: Andrew Stucki <andrew.stucki@hashicorp.com>
Co-authored-by: Alvin Huang <17609145+alvin-huang@users.noreply.github.com>
Co-authored-by: Andrea Scarpino <andrea@scarpino.dev>
Co-authored-by: Deniz Onur Duzgun <59659739+dduzgun-security@users.noreply.github.com>
Co-authored-by: wangxinyi7 <xinyi.wang@hashicorp.com>
Co-authored-by: Melisa Griffin <melisa.griffin@hashicorp.com>
Co-authored-by: Ronald Ekambi <ronekambi@gmail.com>
Co-authored-by: Dhia Ayachi <dhia@hashicorp.com>
Co-authored-by: mr-miles <miles.waller@gmail.com>
Co-authored-by: Paul Glass <pglass@hashicorp.com>
Co-authored-by: Sarah Alsmiller <sarah.alsmiller@hashicorp.com>
@koder29406
Copy link

> This has been merged to main and will ship in Nomad 1.8.0. I'm working on Tutorial updates now in the private repository for those.

Hello, are there any plans for detailed examples on how to use the transparent proxy in a Nomad job?
A full example would be good, in addition to the minimal one here: https://developer.hashicorp.com/nomad/docs/job-specification/transparent_proxy#minimal-example

@tgross
Copy link
Member

tgross commented May 30, 2024

@koder29406 the Service Mesh tutorial https://developer.hashicorp.com/nomad/tutorials/integrate-consul/consul-service-mesh has been updated to use transparent proxy now. (Small warning that there are a few minor issues in the setup in that Tutorial, which I've got a PR up to fix... that should land later today.)

@rahadiangg
Copy link

Hi @tgross, I'm using Nomad 1.8.0 and trying to use transparent_proxy{}. Based on the Nomad documentation you provided, Nomad requires consul-cni 1.5.4 or above, but currently the latest consul-cni release is 1.4.3.

I also tried to validate my job, and it reports that consul-cni greater than 1.4.2 is required. So I think the documentation needs to be updated?


@tgross
Copy link
Member

tgross commented Jun 7, 2024

Hi @rahadiangg! By documentation you're talking about this page: https://developer.hashicorp.com/nomad/tutorials/integrate-consul/consul-service-mesh#verify-nomad-client-consul-configuration, right? That's definitely just a typo, which I'll fix. But is there another place in the docs I'm missing?
