Containernet.addDockerHost hang at buffered read #12

Closed
Baldomo opened this issue Dec 29, 2021 · 12 comments
@Baldomo
Contributor

Baldomo commented Dec 29, 2021

I'm creating a Docker host with the following code (a small excerpt from the whole topology):

net = Containernet(
    controller=Controller, ipBase="10.0.0.0/8", link=TCLink
)
net.addController("c0")

switch_core = net.addSwitch("s1", ip="10.80.95.0/24")

epc = net.addDockerHost(
    "srsepc",
    dcmd="srsepc --mme.mme_bind_addr=10.80.95.10 --spgw.gtpu_bind_addr=10.80.95.10",
    dimage="srsran",
    docker_args={
        "devices": ["/dev/net/tun"],
        "cap_add": ["SYS_NICE", "NET_ADMIN"]
    },
)
net.addLink(switch_core, epc, intf=TCIntf, ip="10.80.95.10", bw=1000, delay="1ms")

The code hangs indefinitely at addDockerHost. After sending a SIGINT, the stack trace is as follows:


Traceback (most recent call last):
  File "src/4g.py", line 170, in <module>
    net = Simple4G(daemon=False)
  File "src/4g.py", line 73, in __init__
    self._net.addDockerHost(
  File "/home/vagrant/comnetsemu/comnetsemu/net.py", line 60, in addDockerHost
    return self.addHost(name, cls=DockerHost, **params)
  File "/usr/local/lib/python3.8/dist-packages/mininet/net.py", line 232, in addHost
    h = cls( name, **defaults )
  File "/home/vagrant/comnetsemu/comnetsemu/node.py", line 123, in __init__
    super(DockerHost, self).__init__(name, **kwargs)
  File "/usr/local/lib/python3.8/dist-packages/mininet/node.py", line 123, in __init__
    self.startShell()
  File "/home/vagrant/comnetsemu/comnetsemu/node.py", line 174, in startShell
    data = self.read(1024)
  File "/usr/local/lib/python3.8/dist-packages/mininet/node.py", line 249, in read
    data = os.read( self.stdout.fileno(), size - count )
KeyboardInterrupt
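
The traceback ends in a blocking os.read() on the shell's stdout, which never returns if the attached shell stays silent. As a debugging aid only, a read can be guarded with select so that a silent descriptor shows up as a timeout instead of a hang. This is a minimal sketch: read_with_timeout is a hypothetical helper, not part of Mininet or ComNetsEmu.

```python
import os
import select


def read_with_timeout(fd, size=1024, timeout=5.0):
    """Read up to `size` bytes from `fd`, or return None if nothing arrives in time.

    Hypothetical helper for illustration; it is not a Mininet or ComNetsEmu API.
    """
    # select() returns the descriptors that are ready for reading,
    # or empty lists once `timeout` seconds have passed.
    readable, _, _ = select.select([fd], [], [], timeout)
    if not readable:
        return None  # the writer stayed silent (or is hung)
    return os.read(fd, size)


if __name__ == "__main__":
    # Demonstrate with a pipe: first nothing to read, then one byte available.
    r, w = os.pipe()
    print(read_with_timeout(r, timeout=0.1))
    os.write(w, b"\x7f")
    print(read_with_timeout(r, timeout=0.1))
    os.close(r)
    os.close(w)
```

A timeout like this only makes the stall visible; it does not explain why the shell produces no output.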

Using setLogLevel("debug"), the logs stop with these last lines:

Command:  srsepc --mme.mme_bind_addr=10.80.95.10 --spgw.gtpu_bind_addr=10.80.95.10
Created docker container object srsepc
image: srsran
Before creating the container
Docker container srsepc started. ID:c7415be20351fed52820e079a99c3ad49e86278d7c00944c83fa4e861749951a
*** errRun: ['which', 'mnexec'] 
/usr/bin/mnexec
  0*** errRun: ['which', 'ifconfig'] 
/usr/sbin/ifconfig
  0Insert interactive shell bin and argsDocker host master:12, slave:13
_popen ['docker', 'exec', '-it', 'srsepc', 'env', 'PS1=\x7f', 'bash', '--norc', '-is', 'mininet:srsepc'] 42829 
^^^ (hangs here) ^^^
Full debug logs:

*** errRun: ['grep', '-c', 'processor', '/proc/cpuinfo'] 
4
  0*** Setting resource limits
*** errRun: ['which', 'mnexec'] 
/usr/bin/mnexec
  0*** errRun: ['which', 'ifconfig'] 
/usr/sbin/ifconfig
  0_popen ['mnexec', '-cd', 'env', 'PS1=\x7f', 'bash', '--norc', '--noediting', '-is', 'mininet:c0'] 42725*** c0 : ('unset HISTFILE; stty -echo; set +m',)
unset HISTFILE; stty -echo; set +m
*** errRun: ['which', 'telnet'] 
/usr/bin/telnet
  0*** c0 : ('echo A | telnet -e A 127.0.0.1 6653',)
Telnet escape character is 'A'.
Trying 127.0.0.1...
telnet: Unable to connect to remote host: Connection refused
*** errRun: ['which', 'ovs-vsctl'] 
/usr/bin/ovs-vsctl
  0*** errRun: ['ovs-vsctl', '-t', '1', 'show'] 
37247a23-d0d6-4a19-ad09-b8e880b10579
    ovs_version: "2.13.3"
  0*** errRun: ['ovs-vsctl', '--version'] 
ovs-vsctl (Open vSwitch) 2.13.3
DB Schema 8.2.0
  0_popen ['mnexec', '-cd', 'env', 'PS1=\x7f', 'bash', '--norc', '--noediting', '-is', 'mininet:s1'] 42733*** s1 : ('unset HISTFILE; stty -echo; set +m',)
unset HISTFILE; stty -echo; set +m

added intf lo (0) to node s1
*** s1 : ('ifconfig', 'lo', 'up')
_popen ['mnexec', '-cd', 'env', 'PS1=\x7f', 'bash', '--norc', '--noediting', '-is', 'mininet:s2'] 42736*** s2 : ('unset HISTFILE; stty -echo; set +m',)
unset HISTFILE; stty -echo; set +m

added intf lo (0) to node s2
*** s2 : ('ifconfig', 'lo', 'up')

Command:  srsepc --mme.mme_bind_addr=10.80.95.10 --spgw.gtpu_bind_addr=10.80.95.10
Created docker container object srsepc
image: srsran
Before creating the container
Docker container srsepc started. ID:c7415be20351fed52820e079a99c3ad49e86278d7c00944c83fa4e861749951a
*** errRun: ['which', 'mnexec'] 
/usr/bin/mnexec
  0*** errRun: ['which', 'ifconfig'] 
/usr/sbin/ifconfig
  0Insert interactive shell bin and argsDocker host master:12, slave:13
_popen ['docker', 'exec', '-it', 'srsepc', 'env', 'PS1=\x7f', 'bash', '--norc', '-is', 'mininet:srsepc'] 42829

I tried replicating the topology in docker-compose and everything works (only relevant lines):

services:
  srsepc:
    container_name: srsepc
    image: srsran
    networks:
      core:
        ipv4_address: 10.80.95.10
    cap_add:
      - NET_ADMIN
      - SYS_NICE
    devices:
      - /dev/net/tun
    command:
      - srsepc
      - --mme.mme_bind_addr=10.80.95.10
      - --spgw.gtpu_bind_addr=10.80.95.10

And running the single Docker CLI command in the same way comnetsemu does still works fine, with docker exec not hanging and successfully attaching to the running container.

$ docker run --privileged -itd \
    --network bridge \
    --name srsepc -v $(pwd)/config:/etc/srsran:ro \
    --device /dev/net/tun \
    srsran \
    srsepc --mme.mme_bind_addr=127.0.0.1 --spgw.gtpu_bind_addr=127.0.0.1

$ docker exec -it srsepc bash --norc -is mininet:srsepc

Notes:

  • I'm running everything inside the stock Vagrant VM, with just the addition of my files and docker-compose v2 installed as a standalone binary (not through apt)
  • I cannot test other containers in the topology because they depend on this first one
@stevelorenz stevelorenz self-assigned this Dec 30, 2021
@stevelorenz stevelorenz added the bug Something isn't working label Dec 30, 2021
@stevelorenz
Owner

Thanks for the report on this issue. I haven't encountered this issue before and will need some time to delve into it. Once I figure out the possible problem (blocking) in the addDockerHost method, I'll add my response here.

@Baldomo
Contributor Author

Baldomo commented Dec 30, 2021

Thank you for the swift response. I have been doing some testing on my own too, and running some other blocking process instead of srsepc doesn't seem to break things. For example, running an empty bash shell:

net.addDockerHost(
    "srsepc",
    dcmd="bash",
    dimage="srsran",
    docker_args={
        "devices": ["/dev/net/tun"],
        "cap_add": ["SYS_NICE", "NET_ADMIN"]
    },
)

The _popen ['docker', 'exec'] etc. works fine, the container stays running and the code doesn't hang. I also tried the following:

@stevelorenz
Owner

Sorry, I can't easily reproduce this blocking with, e.g., the ping command (using an infinite ping command in the addDockerHost method does not hang it). I cannot easily determine the problem from the stack trace and debug logs you shared (by the way, thanks so much for the detailed feedback). Would you mind sharing the Dockerfile you used to build the srsepc container, so I can test this hang on my side? Maybe the current addDockerHost method blocks while waiting for some signals/network signals when executing srsepc --mme.mme_bind_addr=127.0.0.1 --spgw.gtpu_bind_addr=127.0.0.1. Thanks for the help.

@Baldomo
Contributor Author

Baldomo commented Dec 31, 2021

Yeah that could make sense actually. As for the Dockerfile, the whole repo is hosted at comnetsemu-srsran. You can use the make.sh script to build everything

@Baldomo
Contributor Author

Baldomo commented Jan 6, 2022

I think I may be onto something. srsEPC (as well as the other srsRAN programs) binds to an arbitrary IP address (the one in the command line flags). That would normally resolve to an attached network interface, but since net.addLink hasn't been called yet, the interface(s) haven't been set up, and that somehow causes the program to hang, so Docker also hangs, etc.
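
A quick way to check the address-availability part of this theory from inside a container is to try binding a socket to the address. This is a minimal sketch under the assumption that a plain UDP bind is a fair stand-in; can_bind is a hypothetical helper, and it does not reproduce the srsEPC hang itself.

```python
import socket


def can_bind(ip, port=0):
    """Return True if a UDP socket can bind to `ip` on this host right now.

    Hypothetical helper for illustration. Port 0 lets the OS pick a free port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        try:
            sock.bind((ip, port))
        except OSError:
            # Typically raised when no local interface owns `ip`.
            return False
    return True


if __name__ == "__main__":
    print(can_bind("127.0.0.1"))
    print(can_bind("10.80.95.10"))
```

On a default Linux setup, binding to an address that no interface owns fails, so running this before and after net.addLink and net.start() should show whether the address has become available.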

Now this is quite the problem. Is it possible to initialize interfaces for a DockerHost before it is started? Another host also needs to be connected to two switches with two separate interfaces and separate IPs. The fact that interfaces aren't initialized at container creation will most surely break things. I'll snoop around comnetsemu a bit and report back I guess.

As a sidenote, I see you've been quite busy fixing things here and there (especially the tooling, which I was gonna open a pull request for, at some point), nice job and congrats for v0.3.0!

@stevelorenz
Owner

stevelorenz commented Jan 6, 2022

I think I may be onto something. srsEPC (as well as the other srsRAN programs) binds to an arbitrary IP address (the one in the command line flags). That would normally resolve to an attached network interface, but since net.addLink hasn't been called yet, the interface(s) haven't been set up, and that somehow causes the program to hang, so Docker also hangs, etc.

Aha! Makes sense! Thanks for the explanation. When the addDockerHost method is called, the host does not yet know which links will be added later via, e.g., the net.addLink method. The Docker container only has the default network interface (the one generated when you run docker run without special network configuration) connecting it to the Docker daemon. This interface is NOT used for the traffic in the virtual network (which is managed by TC for delay and loss configuration, etc.). The reason is that DockerHost is a node that emulates a physical machine in the real world. When a machine is booted, it only starts the kernel and a basic shell. Then it is connected to a switch/router, and only then does it run the application that uses the network. So I typically use something like this in the script:

from comnetsemu.net import Containernet
from mininet.link import TCLink
from mininet.node import Controller

net = Containernet(controller=Controller, link=TCLink)
# Give no special dcmd (or use "bash") so the Docker container starts without running the actual application
core = net.addDockerHost("5gc", dimage="my_fancy_5g_core", ip="192.168.0.111/24")
s1 = net.addSwitch("s1")
net.addLink(core, s1, bw=1000, delay="1ms", intfName1="core1-s1", intfName2="s1-core1")
# Start the network; the interfaces are now created and available to processes running in the core container
net.start()
# Now use the cmd() method to run the actual program, so it can bind to the available interfaces
ret = core.cmd("./start_my_fancy_5g_core.sh")
if not ret:
    raise RuntimeError("Error starting the 5G core")

So DockerHost is currently not an application container, but rather a system container like LXC, because a Host node in Mininet emulates a physical machine in the real world. ComNetsEmu also has another type called APPContainer, which emulates application containers as found in, e.g., the K8s domain. There is a dockerindocker.py example for this.

I agree that this may not be the best or most elegant approach. But emulating a nested virtual environment on a single laptop was an open question for me, so I decided to use this Docker-in-Docker design to avoid a nested-VM approach (e.g. running KVM inside VirtualBox).

Now this is quite the problem. Is it possible to initialize interfaces for a DockerHost before it is started? Another host also needs to be connected to two switches with two separate interfaces and separate IPs. The fact that interfaces aren't initialized at container creation will most surely break things. I'll snoop around comnetsemu a bit and report back I guess.

Based on my explanation above, do you think the "workaround" of starting srsEPC after calling net.start() could be a solution for now? If you have ideas/suggestions for a better design and implementation, please share them with me or create pull requests. I'm happy to implement or review them. The current implementation was just my decision in 2019 to meet the deadline 😉.

As a sidenote, I see you've been quite busy fixing things here and there (especially the tooling, which I was gonna open a pull request for, at some point), nice job and congrats for v0.3.0!

Thanks so much! Sorry for the delayed reply. My PhD contract ends this month and I've been busy writing my PhD thesis 🙈 . I was in a hurry in 2019 to meet the deadline, so the tooling and some of the code were a little bit messy... I'm trying to clean them up before the end of my PhD journey. If you have any good ideas or feature suggestions, feel free to open issues. Have a nice day.

@Baldomo
Contributor Author

Baldomo commented Jan 6, 2022

Based on my explanation above, do you think the "workaround" of starting srsEPC after calling net.start() could be a solution for now?

Good idea. It was the first thing I tried to implement, but now some other problems have popped up (although the programs now successfully bind to the right IPs/interfaces). Since calling .cmd("srsepc ... etc") also started hanging, I used sendCmd, which supposedly just runs a command without waiting on stdout. Please skim over this log from one of the containers, since I'm not quite sure what's going on:

::: Running cmd in container (srsepc): srsepc --mme.mme_bind_addr=10.80.95.10 --spgw.gtpu_bind_addr=10.80.95.10
::: Running cmd in container (srsenb): srsenb --enb.mme_addr=10.80.95.10 --enb.gtp_bind_addr=10.80.95.11 --enb.s1c_bind_addr=10.80.95.11 --rf.device_name=zmq '--rf.device_args='"'"'id=enb,fail_on_disconnect=true,tx_port=tcp://*:2000,rx_port=tcp://10.80.97.12:2001,base_srate=23.04e6'"'"'' --enb_files.sib_config=/etc/srsran/sib.conf
::: Running cmd in container (srsue): srsue --rf.device_name=zmq '--rf.device_args='"'"'id=ue,fail_on_disconnect=true,tx_port=tcp://*:2001,rx_port=tcp://10.80.97.11:2000,base_srate=23.04e6'"'"''
::: Waiting for containers
*** Starting CLI:
stopping srsepc 
sendInt: writing chr(3)

Built in Release mode using 21.10.0.


---  Software Radio Systems EPC  ---

Couldn't open , trying /root/.config/srsran/epc.conf
Couldn't open /root/.config/srsran/epc.conf either, trying /etc/srsran/epc.conf
Reading configuration file /etc/srsran/epc.conf...
Couldn't open user_db.csv, trying /root/.config/srsran/user_db.csv
Couldn't open /root/.config/srsran/user_db.csv either, trying /etc/srsran/user_db.csv
HSS Initialized.
MME S11 Initialized
MME GTP-C Initialized
MME Initialized. MCC: 0xf001, MNC: 0xff01
SPGW GTP-U Initialized.
SPGW S11 Initialized.
SP-GW Initialized.
Stopping ..

The logs also go on with these lines, one after the other:

stopping srsenb 
sendInt: writing chr(3)

Couldn't stop after 5s. Forcing exit.
stopping srsue 
sendInt: writing chr(3)

Why are the containers being killed? It looks like they're being treated as if they're hanging or something, but they're working as intended. Is mininet not intended to run long-lasting processes? (Which seems kinda weird but I haven't been using it long enough to know for sure).

I guess I'll try the docker-in-docker/APPContainer thing. I'm just kinda bummed since I have to present this project in a couple weeks and I'm getting stuck on these input/output issues.

@stevelorenz
Owner

Based on my explanation above, do you think the "workaround" of starting srsEPC after calling net.start() could be a solution for now?

Good idea. It was the first thing I tried to implement, but now some other problems have popped up (although the programs now successfully bind to the right IPs/interfaces). Since calling .cmd("srsepc ... etc") also started hanging, I used sendCmd, which supposedly just runs a command without waiting on stdout. Please skim over this log from one of the containers, since I'm not quite sure what's going on:

Did you run srsepc in the background with &, something like srsepc --blabla > /var/log/srsepc.log 2>&1 & ? ComNetsEmu's DockerHost follows Mininet's style, so the command is actually run by a long-running shell (bash by default) that is attached to the Docker container after it is created. cmd() therefore waits for the command to return (and for its output, if any). If the command never returns, cmd() hangs/blocks waiting for it. So it's common to run long-running commands in the background with & or to use daemon mode (if the program supports it). It's like running ping localhost > /dev/null 2>&1 in a shell: the shell is blocked until you stop the ping, unless you run ping in the background.
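
The backgrounding pattern described above can be wrapped in a small helper so every long-running program is started the same way. This is a minimal sketch: background_cmd is a hypothetical name, and the default log path is an assumption for illustration.

```python
import shlex


def background_cmd(command, log_path="/dev/null"):
    """Build a shell line that runs `command` in the background.

    Hypothetical helper for illustration. stdout and stderr are redirected
    to `log_path`, so the shell gets its prompt back immediately.
    """
    # shlex.quote protects log paths that contain spaces or shell metacharacters.
    return f"{command} > {shlex.quote(log_path)} 2>&1 &"


if __name__ == "__main__":
    print(background_cmd(
        "srsepc --mme.mme_bind_addr=10.80.95.10 --spgw.gtpu_bind_addr=10.80.95.10",
        "/var/log/srsepc.log",
    ))
```

The resulting string would then be passed to the host's cmd() method after net.start(), for example epc.cmd(background_cmd("srsepc", "/var/log/srsepc.log")), where epc is the host object from the topology script.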

@Baldomo
Contributor Author

Baldomo commented Jan 8, 2022

You're right, thanks for the heads up! Now the containers all start happily. However, yet another problem popped up: one of the containers cannot bind to one of its IPs. I might as well ask here, even though it's kinda off topic. I need to have a container with two separate IPs:

  • one for an Internet-like network (e.g. 10.80.95.11)
  • one for an internal network which moves raw radio packets (e.g. 10.80.97.11) strictly for radio emulation

The container fails to bind to the first IP with a connection timeout. I'm pretty sure this is an issue with hardcoding link IPs with addLink(ip=...), but I need to have the actual IP somehow and pass it to srsenb via the command line. How would I go about implementing such a thing? I've been searching around for a couple of hours, but apparently nobody has ever done this (except for maybe this GitHub discussion, which was solved by specifying the right IP).

Hastily made network diagram:

        switch_core       switch_rf

           │  │             │   │
┌────────┐ │  │ ┌────────┐  │   │ ┌────────┐
│        │ │  │ │        │  │   │ │        │
│ srsEPC ├─┘  └─┤ srsENB ├──┘   └─┤ srsUE  │
│        │      │        │        │        │
└────────┘      └────────┘        └────────┘

And the connections for srsENB:

 switch_core
      │
      │

 10.80.95.11

      │
┌─────┴─────┐
│           │
│  srsENB   │
│           │
└─────┬─────┘
      │

 10.80.97.11

      │
      │
  switch_rf
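
One way to keep the addresses in a single place is to define them once in the topology script and build the srsenb command line from them. This is a minimal sketch: srsenb_cmd is a hypothetical helper, and it only covers the three address flags that appear in the log earlier in this thread.

```python
import shlex


def srsenb_cmd(mme_addr, core_ip):
    """Build an srsenb command line from the core-network addresses.

    Hypothetical helper for illustration; only the address-related flags
    seen earlier in this thread are included.
    """
    args = [
        "srsenb",
        f"--enb.mme_addr={mme_addr}",
        f"--enb.gtp_bind_addr={core_ip}",
        f"--enb.s1c_bind_addr={core_ip}",
    ]
    # shlex.join (Python 3.8+) quotes any argument that needs it.
    return shlex.join(args)


if __name__ == "__main__":
    print(srsenb_cmd("10.80.95.10", "10.80.95.11"))
```

The same constants can then be reused when adding links, so the address given to the link and the address given to the program cannot drift apart.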

@stevelorenz
Owner

You're right, thanks for the heads up! Now the containers all start happily. However, yet another problem popped up: one of the containers cannot bind to one of its IPs. I might as well ask here, even though it's kinda off topic. I need to have a container with two separate IPs:

  • one for an Internet-like network (e.g. 10.80.95.11)
  • one for an internal network which moves raw radio packets (e.g. 10.80.97.11) strictly for radio emulation

Congrats that all containers are working without blocking. Unfortunately, Mininet's API is not designed for the scenario where a node has multiple network interfaces... So I currently use a "hard-coding" workaround:

# Helper function provided by ComNetsEmu. When the link is created, the interface inside the host is named "h1-s1" and the interface in the root namespace is named "s1-h1". The FIRST link attached to the host gets the host's IP.
net.addLinkNamedIfce("h1", "s1")
# The "h1-s2" interface is created without any IP address.
net.addLinkNamedIfce("h1", "s2")
net.start()
# The IP address can be assigned after starting the network
h1.cmd("ip addr add fancy_ip dev h1-s2")

I have not yet implemented an API or mechanism for better management of multiple interfaces (for clearer separation of management and dataplane traffic) for DockerHost and APPContainer. Maybe it's a nice TODO for 0.4.0 🙈
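
The placeholder fancy_ip in the workaround above can be filled in by a small helper that validates the address before it reaches the shell. This is a minimal sketch: ip_addr_add_cmd is a hypothetical name, and the /24 prefix in the usage example is an assumption.

```python
import ipaddress


def ip_addr_add_cmd(address, dev):
    """Return an `ip addr add` command for `address` (CIDR form) on `dev`.

    Hypothetical helper for illustration. Raises ValueError if `address`
    is not a valid IP address or interface address.
    """
    iface = ipaddress.ip_interface(address)
    return f"ip addr add {iface.with_prefixlen} dev {dev}"


if __name__ == "__main__":
    # "/24" is an assumed prefix length for the second (RF) network.
    print(ip_addr_add_cmd("10.80.97.11/24", "h1-s2"))
```

The returned string would be passed to the host's cmd() method after net.start(), in place of the hard-coded command.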

@Baldomo
Contributor Author

Baldomo commented Jan 10, 2022

Thank you for the workaround, I almost got it to work (!) but then I just gave up and rewrote the whole network topology. Now both RF data and standard network traffic go through a single switch and everything works!

[Image: comnetsemu]

Turns out hardcoding IPs is fine as long as the host has a single IP and it's defined in the addHost method. This is good enough for the purpose of the assignment so that's that. I can't thank you enough for the support and the time you spent on this issue.

@Baldomo Baldomo closed this as completed Jan 10, 2022
@stevelorenz
Owner

Thanks for the feedback! I'm very happy that everything works. Yes, addHost currently accepts only one IP address. This is the default behavior of Mininet and I have not yet extended it. Best wishes for your project, and feel free to open new issues if there are any other problems.

By the way, if you have time at some point, could you add some bullet points (a very brief description is fine) about "nice to have" features or improvements for ComNetsEmu? This would help me a lot 😉 Have a nice day 🎉
