Implement a "proxy" device type (udp/tcp/unix-socket/unix-abstract between host and container) #2504
Comments
|
You can bind-mount individual files using the disk device type:
lxc config device add <container> my-socket disk path=/container/path source=/hostpath
Until some remapping filesystem comes along, you'll still need your relay on the host to get around the ownership issue though. |
|
Having native support in LXD for that would definitely be nice but it'd likely be pretty tricky to do due to how Unix sockets work. The main issue being that LXD itself can restart at any time. And since Unix sockets aren't reconnectable, that'd mean breaking connectivity, requiring anything using the socket in the container to be restarted whenever LXD restarts. |
|
The one way I can think of to deal with this would be to spawn a separate process for each of those socket proxies. |
stgraber changed the title from "support forwarding UNIX sockets to an LXC/LXD container" to "Implement a "proxy" device type (udp/tcp/unix-socket/unix-abstract between host and container)" on Oct 20, 2016
stgraber added the Feature, Documentation and API labels on Oct 20, 2016
stgraber added this to the later milestone on Oct 20, 2016
|
Since a Unix socket is just an open file descriptor, you could pass all open files to a temporary process and, after the restart is complete, pass them back. I'm not sure this is possible with Go, but in C you would call sendmsg() with SCM_RIGHTS.
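For illustration, a minimal Go sketch of that fd-passing approach (the syscall package's UnixRights/Sendmsg are the Go counterparts of sendmsg() with SCM_RIGHTS; the helper and its arguments are hypothetical, not anything LXD actually does):

package fdpass

import (
    "net"
    "syscall"
)

// sendListenerFD hands an already-bound listener's file descriptor to another
// process over a connected Unix socket, using SCM_RIGHTS ancillary data.
func sendListenerFD(conn *net.UnixConn, listener *net.TCPListener) error {
    lf, err := listener.File()
    if err != nil {
        return err
    }
    defer lf.Close()

    cf, err := conn.File()
    if err != nil {
        return err
    }
    defer cf.Close()

    // One byte of regular data plus the listener fd encoded as SCM_RIGHTS.
    rights := syscall.UnixRights(int(lf.Fd()))
    return syscall.Sendmsg(int(cf.Fd()), []byte{0}, rights, nil, 0)
}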
|
|
You'd hit the same problem as when bind-mounting a unix socket though. If the socket is reset, its inode number changes, invalidating anything that's tied to it, including mounts and fds. The only way to avoid this is to have a middleman that will always be running and which will reconnect as needed based on the path rather than a handle. |
|
For anyone interested, the currently planned design for this would be:
Property-wise, I expect it to be something like:
The process would be multi-threaded, allowing for multiple simultaneous connections, and it would also reconnect on connection failure. |
|
What about performance when all of the traffic goes through another process? |
|
@srkunze you'd certainly take a bit of a CPU hit with this approach but would benefit from more flexibility (cross-protocol, no need for network connectivity in the container, host doesn't need to be the gateway, ...). I also wouldn't be opposed to allowing for optimizations in that device type, say adding a "proxytype" property (similar to nictype for the nic device type) which could take "relay" or "forward" as values. "relay" would use the intermediate process whereas "forward" would use iptables and would be restricted to the same protocol. |
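For the "forward" case, the kernel would do the redirection directly; a hedged example of the kind of iptables rule involved (the port and the container address are placeholders):

iptables -t nat -A PREROUTING -p tcp --dport 1234 -j DNAT --to-destination 10.0.3.100:80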
|
How does it work with port forwarding?
Now what? I need to access a container port from a client. Which command should I execute to get to it? |
stgraber referenced this issue on Jul 12, 2017: Network profile for mapping open ports to localhost #3532 (closed)
|
So, is there any progress on this issue? |
|
Just to clarify the overall design/logic for this feature at a very high level: LXD is the daemon that will be spawning the multiple processes handling the port forwarding. Whenever a command is issued to set up this forwarding for a container, the daemon will spawn a "proxy" process to handle it and add its metadata to a new "proxy" device type data structure. It will also write this data to a config file to keep track of the processes handling the forwarding. In case of a reset, the daemon will read this file to handle the reconnect.
The logic for the processes that will be spawned: the first thing to do would be binding the specified socket in the host namespace. Then you would switch namespaces to the correct container namespace and bind the specified socket in that namespace.
A few things that I am confused about:
1. How does the "bind, switch, bind" logic work for the "proxy" processes? Is it just binding, accepting, and listening in the host namespace and redirecting the data to the socket in the container namespace? Would this "middleman" process even need to do anything after it sets up this connection? I obviously am not an expert in sockets. I apologize if this is way off base.
2. What exactly are you supposed to do when LXD resets to reconnect? I am thinking that the "proxy" processes are the "middleman" and should not be affected by an LXD reset. |
|
Correct, we're going to have one subprocess running for every proxy device that's set up. In general there will only be three times where those can be spawned or stopped:
When LXD gets reset, nothing would need to happen to the running proxy processes for any of the containers. It just needs to be able to track them down should it need to stop them, so keeping a pidfile on disk is likely easiest.
As for the namespace logic, in general you'll end up with one side of the proxy that's listening on a socket, possibly getting multiple connections to it. On the other side, the proxy is going to be creating a new connection every time it receives one.
As we don't want to flip-flop the proxy between namespaces for every connection, the easiest way to do this is to first have the proxy attach to whichever side it needs to listen on (either host or container), bind that socket, getting an fd back from the kernel, then switch namespace to the connecting side, at which point it can start accepting connections on the socket it bound earlier and open new connections to the target from the right namespace.
For every new connection on the bound socket, the proxy will need to establish a new connection to the target and then copy data back and forth between the two. |
|
One thing to keep in mind here as we're doing all this in Go is that you CANNOT use setns() after the Go runtime has started. That's because setns() applies to your current thread but Go has an internal scheduler which schedules goroutines among a set of threads. So calling setns() from Go may affect other goroutines and subsequent calls from your context may end up in another thread which wouldn't be in the right namespace anymore. So you're going to need a trick to do the initial bind() and setns() piece of the logic above at a time where Go is guaranteed to be using only a single thread. Once that's done, you can start using normal goroutines to handle connections, establish new outside connections and to mirror data since no more namespace switch will be needed at that point. |
|
I have a couple of follow-up questions after doing some research into switching namespaces in Go. To switch namespaces in Go safely, we found a trick called the CGO constructor trick. The trick basically runs C code before the Go runtime starts up, allowing you to switch namespaces safely before the Go runtime starts. There is a file in this repo under /lxd/main_exec.go that seems to use this trick. For some more information about this method, you can check out this repo: CGO Trick.
The only problem with this trick is that you have to fork and exec in order to switch namespaces. That would mean that you would end up with a process in the host namespace and one in the container namespace. Through this method, the FD that was bound in one namespace under one process would not map to the same socket in another process in a different namespace. Because the FD is per-process, it seems like this method would not work. Does this problem make sense, or is our understanding of file descriptors wrong?
One way that could work would be to have two processes that send the data between each other, but this seems like a horribly inefficient way to do this. Another way could be to bind the socket and get the FD back in the CGO constructor code before the Go runtime has started; then that process would have the FD from the correct namespace while running in a different one. With this approach though, would the FD work inside a different namespace? Basically, we are very confused about how file descriptors would work across namespaces. |
|
I meant to play with this a bit tonight but the 2.20 release took too much time; I'll try to find some time for this tomorrow. My current thought is that we should actually be able to do all this without requiring the use of much C code at all. The only bit we'll need to do from a C constructor is the setns itself, but everything else can be done in Go. I'll try to write a small test binary which simply binds port 1234 on the host and has that forward to port 80 in the container. My thought on how to do this with the minimum amount of C is:
If we want to do the reverse, binding a port in the container and having that result in connections coming out of the host namespace, we'd have the code do:
That effectively limits the use of the C constructor in the proxy to just basic argument parsing and calling setns. Everything else can be done in Go, limiting the amount of duplication needed between C and Go. |
|
Going to spend a bit of time on the proof of concept this evening; that should give you a good base on top of which to implement all the other bits that are mentioned in this issue. My current plan is to write a tiny piece of Go called "proxy" which will take the following arguments:
To bind all addresses on port 1234 on the host and forward to 127.0.0.1 on port 80 of the container, one would use:
To bind 127.0.0.1:1234 in the container and have that connect to 1.2.3.4 on port 80 from the host, you'd do:
This should make for pretty clean code and then allow hooking up support for udp sockets, unix sockets and abstract unix sockets. Once you've done that part, we'll look into hooking this up to LXD itself. |
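For illustration, based on the argument order used in the proof of concept below (listen PID, listen address, connect PID, connect address), the two invocations described above might look like this; the PIDs are placeholders for a process in the host namespace and the container's init process respectively:

./proxy <host-pid> tcp:0.0.0.0:1234 <container-pid> tcp:127.0.0.1:80
./proxy <container-pid> tcp:127.0.0.1:1234 <host-pid> tcp:1.2.3.4:80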
|
main.go:

package main

import (
    "fmt"
    "io"
    "net"
    "os"
    "strings"
    "syscall"

    "github.com/lxc/lxd/shared"
)

func main() {
    err := run()
    if err != nil {
        fmt.Fprintf(os.Stderr, "error: %v\n", err)
        os.Exit(1)
    }

    os.Exit(0)
}

func run() error {
    if len(os.Args) != 5 {
        return fmt.Errorf("Invalid arguments")
    }

    // Get all our arguments
    listenPid := os.Args[1]
    listenAddr := os.Args[2]
    connectPid := os.Args[3]
    connectAddr := os.Args[4]

    // Check where we are in initialization
    if !shared.PathExists("/proc/self/fd/100") {
        fmt.Printf("Listening on %s in %s, forwarding to %s from %s\n", listenAddr, listenPid, connectAddr, connectPid)

        fmt.Printf("Setting up the listener\n")
        fields := strings.SplitN(listenAddr, ":", 2)

        addr, err := net.ResolveTCPAddr(fields[0], fields[1])
        if err != nil {
            return fmt.Errorf("failed to resolve listener address: %v", err)
        }

        listener, err := net.ListenTCP(fields[0], addr)
        if err != nil {
            return fmt.Errorf("failed to setup listener: %v", err)
        }

        file, err := listener.File()
        if err != nil {
            return fmt.Errorf("failed to extract fd from listener: %v", err)
        }
        defer file.Close()

        fd := file.Fd()
        err = syscall.Dup3(int(fd), 100, 0)
        if err != nil {
            return fmt.Errorf("failed to duplicate the listener fd: %v", err)
        }

        fmt.Printf("Re-executing ourselves\n")
        err = syscall.Exec("/proc/self/exe", os.Args, []string{})
        if err != nil {
            return fmt.Errorf("failed to re-exec: %v", err)
        }
    }

    // Re-create listener from fd
    listenFile := os.NewFile(100, "listener")
    listener, err := net.FileListener(listenFile)
    if err != nil {
        return fmt.Errorf("failed to re-assemble listener: %v", err)
    }

    fmt.Printf("Starting to proxy\n")
    for {
        // Accept a new client
        srcConn, err := listener.Accept()
        if err != nil {
            fmt.Fprintf(os.Stderr, "error: Failed to accept new connection: %v\n", err)
            continue
        }

        // Connect to the target
        fields := strings.SplitN(connectAddr, ":", 2)
        dstConn, err := net.Dial("tcp", fields[1])
        if err != nil {
            fmt.Fprintf(os.Stderr, "error: Failed to connect to target: %v\n", err)
            srcConn.Close()
            continue
        }

        go io.Copy(srcConn, dstConn)
        go io.Copy(dstConn, srcConn)
    }

    return nil
}

nsexec.go:

package main
/*
#define _GNU_SOURCE
#include <linux/limits.h>
#include <fcntl.h>
#include <unistd.h>
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
#include <sched.h>
#include <errno.h>

#define CMDLINE_SIZE (8 * PATH_MAX)

#define ADVANCE_ARG_REQUIRED() \
    do { \
        while (*cur != 0) \
            cur++; \
        cur++; \
        if (size <= cur - buf) { \
            return; \
        } \
    } while(0)

int dosetns(int pid, char *nstype) {
    int mntns;
    char buf[PATH_MAX];

    sprintf(buf, "/proc/%d/ns/%s", pid, nstype);
    mntns = open(buf, O_RDONLY);
    if (mntns < 0) {
        return -1;
    }

    if (setns(mntns, 0) < 0) {
        close(mntns);
        return -1;
    }
    close(mntns);

    return 0;
}

__attribute__((constructor)) void init(void) {
    int cmdline, listen_pid, connect_pid;
    char buf[CMDLINE_SIZE];
    ssize_t size;
    char *cur;

    // Read the arguments
    cmdline = open("/proc/self/cmdline", O_RDONLY);
    if (cmdline < 0) {
        _exit(1);
    }

    memset(buf, 0, sizeof(buf));
    if ((size = read(cmdline, buf, sizeof(buf)-1)) < 0) {
        close(cmdline);
        _exit(1);
    }
    close(cmdline);

    cur = buf;

    // Get the arguments
    ADVANCE_ARG_REQUIRED();
    listen_pid = atoi(cur);

    ADVANCE_ARG_REQUIRED();
    ADVANCE_ARG_REQUIRED();
    connect_pid = atoi(cur);

    ADVANCE_ARG_REQUIRED();

    // Join the listener ns if not already setup
    if (access("/proc/self/fd/100", F_OK) < 0) {
        // Attach to the network namespace of the listener
        if (dosetns(listen_pid, "net") < 0) {
            fprintf(stderr, "Failed setns to listener network namespace: %s\n", strerror(errno));
            _exit(1);
        }
    } else {
        // Join the connector ns now
        if (dosetns(connect_pid, "net") < 0) {
            fprintf(stderr, "Failed setns to connector network namespace: %s\n", strerror(errno));
            _exit(1);
        }
    }

    // We're done, jump back to Go
}
*/
import "C"

This gives me:

stgraber@castiana:~/Desktop/proxy$ sudo ./proxy 7148 tcp:127.0.0.1:80 10513 tcp:127.0.0.1:1234
Listening on tcp:127.0.0.1:80 in 7148, forwarding to tcp:127.0.0.1:1234 from 10513
Setting up the listener
Re-executing ourselves
Starting to proxy
|
|
And forwarding does work, though the connection handling isn't quite there: closing the connection in the container will not terminate the connection with the target. There's certainly some more cleanup to do there, and it needs to be expanded to all the other protocols, but as a proof of concept this should work. |
katiewasnothere
commented
Nov 19, 2017
|
What is the desired syntax for this command? You mentioned earlier using lxc device add ...., but we want to confirm that this is still what is wanted. |
|
Yeah, it's going to be a new device type, called "proxy".
Which would set up a new listener on the host, binding 1.2.3.4:80 and forwarding any connection to that over to the container at 127.0.0.1:80. To get the reverse effect, you can do:
Which would have the proxy listen on 127.0.0.1:1234 in the container and forward any connection on that to 127.0.0.1:80 on the host. I'm open to suggestions on better naming for that "bind" property as it may sound a bit obscure. For the initial implementation we'd have a new device type called "proxy" with 3 properties:
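For illustration, commands using the listen/connect/bind properties discussed in this comment might look like the following (the exact property names and syntax are an assumption, not the final implementation):

lxc config device add CONTAINER port80 proxy listen=tcp:1.2.3.4:80 connect=tcp:127.0.0.1:80 bind=host
lxc config device add CONTAINER port1234 proxy listen=tcp:127.0.0.1:1234 connect=tcp:127.0.0.1:80 bind=container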
|
|
Suggestion:
I wonder if |
fwyzard
commented
Nov 20, 2017
|
hi @stgraber,
why the |
|
That's the identifier (device name) for that device in that container's config. |
|
Could it be made optional? I mean autogenerated with an option to set it explicitly. |
|
No, the list of devices attached to a container is a map so it's got to have a key. |
|
We have a few questions about the behavior of the proxy process with regard to LXD restarting, clients removing the proxy device, and connection failures.
1. If there is a connection error, do we want the proxy process to re-exec itself or just be killed? Our understanding is that we only want to remove the proxy device from the container's device list if the client issues the remove device command.
2. When LXD restarts, do we want it to restart proxy processes that have died or just let them be dead? If we let them stay dead, then the client would have to manually restart the proxy with a new command.
3. Which container status codes should mean killing the proxy process(es) for the container? Should we only kill them when the container is Stopped?
Also, in our current design, we are going to create new files for each proxy device. The file will have the following naming convention: |
|
If the proxy fails to start (bind), then it should exit non-zero, which will have LXD return an error to the user. If the proxy fails to connect to its target upon receiving a client connection, then I'd expect it to log an error and disconnect the client, but not exit, as there's technically nothing wrong with the proxy and re-execing wouldn't fix anything.
I also wouldn't expect the LXD daemon restarting to cause any interaction with the proxy processes. Those processes should be tied to the container lifecycle instead, so be spawned when the container starts and be killed when the container stops. If they somehow crash while running, then I'm fine with the user having to restart the container to get them back up.
So I don't think we actually need to store the args that were used; instead we only really need one file per proxy under /var/lib/lxd/devices/CONTAINER-NAME, which should be named after the name of the device entry in LXD and contain the PID. For example, for container "blah" with a proxy device called "http", I'd expect to see:
Containing the PID of that particular proxy process. |
kianaalcala
commented
Nov 30, 2017
|
We are struggling to re-exec the proxy process. LXD prints out a usage error whenever we try to run syscall.Exec in the proxy process - it seems like it wants us to use the run command, but we can't do that because it will create a new process and we still want to use the same fd. We just want to re-exec; is there a way to do this, or are we probably doing something wrong? We are currently doing: |
|
So you're doing that re-exec from a sub-process of the main LXD right? |
kianaalcala
commented
Nov 30, 2017
|
Yes. |
|
I suspect the problem may be as simple as args.Params not including arg[0]. When calling syscall.Exec you should pass the exec path as first argument, the command name ("lxd") as second argument and then all actual arguments after that. |
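For illustration, a minimal Go sketch of building the argument slice for the re-exec; the "forkproxy" subcommand name and the proxyArgs parameter are hypothetical:

package main

import (
    "os"
    "syscall"
)

// reexecProxy re-executes the current binary, making sure the command name
// ("lxd") is included as the first element of the argument slice.
func reexecProxy(proxyArgs []string) error {
    argv := append([]string{"lxd", "forkproxy"}, proxyArgs...)
    // First parameter is the path to exec; argv (including argv[0]) comes second.
    return syscall.Exec("/proc/self/exe", argv, os.Environ())
}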
|
I have a few questions about the proxy process:
You mentioned earlier that if the proxy fails to connect to its target upon receiving a client connection, we should log an error and disconnect the client, but not exit the process since nothing is wrong. Why would we not want to exit if there is a connection error on the target side? I would think that there would be no harm in exiting if we cannot connect to the target.
Currently, we are handling TCP and Unix connections, but not UDP connections for the proxy process, since there is no way to get a listener for a UDP connection. In Go, net.ListenUDP() returns a connection and not a listener that we can turn into a file with a file descriptor. It seems like the only way to handle UDP, then, would be to keep re-execing and switching namespaces back and forth every time we get a connection. This seems very inefficient though. Is there a better way to handle UDP? |
|
I'll have to look into the UDP issue. At the kernel level, you sure can get a listener fd for udp so we just need to find the right way to have Go do that for us :) |
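For what it's worth, one way to get a plain fd for a bound UDP socket from Go is to call File() on the *net.UDPConn returned by net.ListenUDP; a hedged sketch (pinning the fd at 100 just mirrors the TCP proof of concept above and is not the final design):

package main

import (
    "net"
    "syscall"
)

// udpListenerFD binds a UDP address and pins the underlying file descriptor
// at a known number so it can survive a re-exec, as done for TCP above.
func udpListenerFD(addr string) (int, error) {
    laddr, err := net.ResolveUDPAddr("udp", addr)
    if err != nil {
        return -1, err
    }

    conn, err := net.ListenUDP("udp", laddr)
    if err != nil {
        return -1, err
    }

    file, err := conn.File() // duplicates the socket's fd
    if err != nil {
        return -1, err
    }

    if err := syscall.Dup3(int(file.Fd()), 100, 0); err != nil {
        return -1, err
    }

    return 100, nil
}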
|
For the other question. Say you have the proxy forward from 127.0.0.1:1234 in the container to www.google.com:80. That means it's got a listening socket inside the container binding port 1234 (tcp) on 127.0.0.1.
Now a client connects to 127.0.0.1:1234 but the proxy fails to connect to www.google.com:80 because of a network glitch or because your laptop just doesn't have internet access at the time. There is nothing wrong going on with the proxy, it just can't connect to the outside at that time. Having it exit and be respawned isn't going to make any difference, and having it die completely (not respawned) would mean that the proxy will be broken even if host connectivity is restored, and you'd then need to restart the container to fix it.
Instead I'd just expect an error to be logged and the client to be disconnected. It would keep behaving in that way whenever it fails to connect. Once it can connect again, then things work as usual. |
|
I have a question regarding the structure/organization of the proxy process. We currently have TCP implemented completely and are making good progress on Unix sockets. While working on Unix sockets, we realized that we do not need to switch network namespaces for its implementation, since Unix sockets are implemented using socket files, independent of the network namespace. Also, the Unix proxy needs the container name so that it can access its socket file in /var/lib/lxd/containers/CONTAINER_NAME/rootfs. We are proposing to have 3 separate types of proxy processes based on the connection type necessary. The reasons we are proposing this split:
Do you think it would be best to separate it into 3 separate processes started by different internal subcommands or should we just have a single process that would call 3 separate functions to handle each connection type? |
|
I think different functions for the different socket types is probably best. While you indeed don't need to attach to the container's network namespace to get a unix socket going, you should attach to the container's user namespace and mount namespace so that the socket ownership is correct and so that you use the container's mount table. |
|
Also note that we may want to support proxying from tcp to unix socket or vice-versa, so having different functions for the different socket types that we can mix and match based on what's requested is certainly best. |
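As a rough sketch of that mix-and-match structure (the function names and the address format are illustrative, not the actual LXD code):

package main

import (
    "fmt"
    "strings"
)

// Per-family handlers; the real implementations would hold the TCP/UDP/Unix
// specific listen-and-relay logic discussed in this thread.
var (
    proxyTCP  func(listen, connect string) error
    proxyUDP  func(listen, connect string) error
    proxyUnix func(listen, connect string) error
)

// startProxy dispatches on the "tcp:"/"udp:"/"unix:" prefix of the listen
// address so the different socket types can be combined freely.
func startProxy(listenAddr string, connectAddr string) error {
    switch strings.SplitN(listenAddr, ":", 2)[0] {
    case "tcp":
        return proxyTCP(listenAddr, connectAddr)
    case "udp":
        return proxyUDP(listenAddr, connectAddr)
    case "unix":
        return proxyUnix(listenAddr, connectAddr)
    default:
        return fmt.Errorf("unsupported protocol in %q", listenAddr)
    }
}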
|
I also looked into the UDP problem. The main issue there is that it requires a pretty different design than tcp and unix sockets as it's connection-less. That effectively means that the fd you're getting should be loaded as a FileConn, then rather than doing accept() calls (which don't exist with udp), you should listen for new messages, and for every message grab the peer address, which should give you the IP and port of whoever sent you that message. Then based on that you'll have to maintain a pool of UDP connections to forward things to the target. |
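To make that concrete, a rough Go sketch of such a relay loop (the fd number, buffer size and names are illustrative, not LXD's actual code):

package main

import (
    "net"
    "os"
)

// relayUDP re-creates the bound UDP socket from fd 100 and keeps one outbound
// UDP socket per client peer so replies can be routed back to the right sender.
func relayUDP(target string) error {
    conn, err := net.FileConn(os.NewFile(100, "udp-listener"))
    if err != nil {
        return err
    }
    listener := conn.(*net.UDPConn)

    targetAddr, err := net.ResolveUDPAddr("udp", target)
    if err != nil {
        return err
    }

    peers := map[string]*net.UDPConn{}
    buf := make([]byte, 65507)
    for {
        n, peer, err := listener.ReadFromUDP(buf)
        if err != nil {
            continue
        }

        out, ok := peers[peer.String()]
        if !ok {
            out, err = net.DialUDP("udp", nil, targetAddr)
            if err != nil {
                continue
            }
            peers[peer.String()] = out

            // Copy replies from the target back to this particular peer.
            go func(peer *net.UDPAddr, out *net.UDPConn) {
                reply := make([]byte, 65507)
                for {
                    n, err := out.Read(reply)
                    if err != nil {
                        return
                    }
                    listener.WriteToUDP(reply[:n], peer)
                }
            }(peer, out)
        }

        out.Write(buf[:n])
    }
}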
katiewasnothere
commented
Dec 10, 2017
|
We plan on making separate PRs for TCP, Unix, and UDP since we're creating separate functions for each. TCP should be pretty much ready for review, minus testing. Currently we've been testing manually; what would be the best way to write automated tests and ensure we haven't broken any other functionality? |
|
For this kind of stuff, integration tests under tests/suite/... are usually best. I'd just make sure that adding a proxy device and then starting the container works and communication is functional. Then remove it from a running container, confirm that the proxy is gone, and then add it back and confirm that adding to a live container works too. This may be made slightly trickier than it sounds by the fact that we use a very minimal busybox container in our tests, which may lack some of the commands you'd typically use for this. |
|
So I have spent a lot of time (10+ hours) trying to test the proxy device using busybox with no luck. I am testing using nc while redirecting the output to a file so I can check the output against the data that I sent through the proxy. The issue is that it seems like there is no way to background nc in busybox while redirecting its output to a file.
I spent some time to see if I could get it working with the Ubuntu:16.04 image, and after a lot of trial and error, I got a command that would work. In Ubuntu, I used the following command:
For Busybox, I have tried every conceivable permutation of the following command:
I have a version of the test done that uses an Ubuntu image, but it's obviously much slower and not ideal for testing. Is there any other way to test the proxy using busybox without using nc that I am missing? |
|
Hmm, busybox's netcat really doesn't seem to like being backgrounded... that's frustrating. One option is to run nc from the host inside the container's network namespace:
nsenter -n -t $(lxc query /1.0/containers/proxyTest/state | jq .pid) -- nc -6 -l 1234 |
fwyzard
commented
Oct 15, 2016
For some use cases it may be useful to forward a UNIX socket from the host to the container.
One example is to share the SSH_AUTH_SOCK socket used by ssh-agent. Currently I'm doing
which relies on directly accessing the container's rootfs (and on the fact that my UID and GID are the same on the host and container).
I would propose support for this directly in lxd, e.g. via some syntax like