
restart_container

chenyo-17 edited this page Jun 10, 2024 · 9 revisions

The restart procedure differs between the master branch and the latest development branch, commnet24. Please refer to the section that matches your branch.

commnet24

This branch is being actively developed, hence it is not thoroughly tested!

To restart a single container, run the setup/restart_container.sh script with arguments that depend on the container type. The script must be run from the platform/ directory and with sudo. The container to be restarted may be stopped or still running, but it must appear in the output of docker ps -a!
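
The precondition above can be checked before calling the script. The following is a minimal sketch, not part of the platform: container_known is a hypothetical helper name, and it relies only on the standard docker ps -a --format option.

```shell
# Hypothetical helper (not part of the platform): succeed only if the named
# container is listed by `docker ps -a`, whether it is running or stopped.
container_known() (
    name=$1
    # --format '{{.Names}}' prints one container name per line;
    # grep -Fxq matches the whole line literally and prints nothing.
    docker ps -a --format '{{.Names}}' | grep -Fxq -- "$name"
)

# Example (run from the platform/ directory):
#   container_known 3_CAPErouter && sudo setup/restart_container.sh router 3 CAPE
```

If the check fails, the container has been removed rather than stopped, and the restart script cannot be used on it.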

The script then reconnects the links between the given container and each of its neighbors, e.g., a router, a switch, a host or a service container. If a neighbor is stopped, the script first reboots it with the docker restart command, but this may not fully reconfigure the neighbor. One therefore also needs to restart every stopped neighbor with the script.

After a container is restarted, one may still need to wait several seconds before all network functionalities, e.g., ping and traceroute, work again.
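
Instead of waiting a fixed amount of time, one can poll until connectivity is back. The sketch below is an illustration only: wait_for_ping is a hypothetical helper, and it assumes that ping is installed inside the container and supports the -c and -W options.

```shell
# Hypothetical helper: retry a ping from inside a container until it succeeds
# or the retry budget is used up. Assumes `ping` exists in the container.
wait_for_ping() (
    container=$1    # e.g. 3_CAPEhost
    target=$2       # address to ping from inside the container
    tries=${3:-30}  # number of attempts, roughly one per second
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -c 1: send a single probe; -W 1: wait at most one second for a reply
        if docker exec "$container" ping -c 1 -W 1 "$target" >/dev/null 2>&1; then
            echo "reachable after $i retries"
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    echo "still unreachable after $tries attempts" >&2
    return 1
)
```

A call such as wait_for_ping 3_CAPEhost <gateway-address> 20 returns as soon as the first probe is answered, where <gateway-address> stands for whatever address the host is expected to reach.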

Restart an L3 host

To restart an L3 host, one runs sudo setup/restart_container.sh l3-host <AS> <REGION> <host|host[0-9]>:

  • If the host belongs to a student AS, i.e., one not controlled by TAs, the last argument is always host. For example, if AS 3 is a student AS, then to restart the host 3_CAPEhost, one runs sudo setup/restart_container.sh l3-host 3 CAPE host.

  • If the host belongs to a TA-controlled AS, i.e., a Tier-1 AS or a stub AS, there is a digit suffix after host, which can be read off the container name. For example, to restart the host 1_CAIRhost0, one runs sudo setup/restart_container.sh l3-host 1 CAIR host0.

If the <AS> is tagged with the Config flag in config/AS_config.txt, the script also reconfigures the host, i.e., its IPv4 interface and gateway.
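
The mapping from an L3 host container name to the script arguments can be sketched as a small helper. This is only an illustration: l3_host_restart_cmd is a hypothetical name, and the parsing assumes container names of the form <AS>_<REGION>host or <AS>_<REGION>host<digit>, as in the two examples above. It prints the command rather than running it.

```shell
# Hypothetical helper: print the restart command for an L3 host container.
# Assumes names look like <AS>_<REGION>host or <AS>_<REGION>host<digit>.
l3_host_restart_cmd() (
    name=$1
    as=${name%%_*}   # text before the first underscore: the AS number
    rest=${name#*_}  # text after it: <REGION>host or <REGION>host<digit>
    case $as in
        ''|*[!0-9]*) echo "no numeric AS prefix in: $name" >&2; return 1 ;;
    esac
    case $rest in
        *host)      region=${rest%host};      suffix=host ;;
        *host[0-9]) region=${rest%host[0-9]}; suffix=host${rest##*host} ;;
        *) echo "not an L3 host container name: $name" >&2; return 1 ;;
    esac
    [ -n "$region" ] || { echo "empty region in: $name" >&2; return 1; }
    echo "sudo setup/restart_container.sh l3-host $as $region $suffix"
)

l3_host_restart_cmd 3_CAPEhost   # prints: sudo setup/restart_container.sh l3-host 3 CAPE host
l3_host_restart_cmd 1_CAIRhost0  # prints: sudo setup/restart_container.sh l3-host 1 CAIR host0
```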

Restart a router

To restart a normal router (i.e., not an IXP), one runs sudo setup/restart_container.sh router <AS> <REGION>. For example, to restart the router 3_CAPErouter, one runs sudo setup/restart_container.sh router 3 CAPE.

If the <AS> is tagged with the Config flag in config/AS_config.txt, the script also reconfigures all L3 hosts connected to the router. This is because once a router is stopped, the related configuration on its connected hosts, i.e., the interface and the gateway, is also removed. If the <REGION> is also listed in config/l2_tunnel.txt, the script additionally reconfigures the 6in4 tunnel in the router.

Once the router is restarted, the FRR configuration file should be reloaded automatically and all previous state, e.g., BGP routes, should be restored. One can run sh run in the router to confirm this.

If the BGP routes are still not fully restored, one can enter the router container and reset them manually with the following commands:

docker exec -it 3_CAPErouter vtysh  # go to the container
clear ip bgp *  # reset BGP routes
# if the router is also configured with RPKI, also execute the following commands
conf t  
rpki
rpki reset  # reset RPKI
exit
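
The BGP reset can also be issued without an interactive session, which is convenient when several routers need it. The sketch below is an assumption-laden illustration: reset_bgp is a hypothetical helper, and it relies on the vtysh -c option, which runs a single command and exits. It covers only the clear ip bgp * step; the RPKI reset is left to the interactive procedure above.

```shell
# Hypothetical helper: send `clear ip bgp *` to a router without opening an
# interactive vtysh session. `vtysh -c` runs one command and then exits.
reset_bgp() (
    router=$1  # container name, e.g. 3_CAPErouter
    docker exec "$router" vtysh -c 'clear ip bgp *'
)

# Example:
#   reset_bgp 3_CAPErouter
```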

Restart an L2 host

To restart an L2 host, one runs sudo setup/restart_container.sh l2-host <AS> <hostname>. For example, to restart the L2 host 3_L2_L2S_S_AU, one runs sudo setup/restart_container.sh l2-host 3 S_AU.

If the <AS> is tagged with the Config flag in config/AS_config.txt, the script also reconfigures the host, i.e., its IPv4/IPv6 interfaces and gateways.

Restart a switch

To restart a switch, one runs sudo setup/restart_container.sh switch <AS> <switch>. For example, to restart the switch 3_L2_L2S_S2, one runs sudo setup/restart_container.sh switch 3 S2.

If the <AS> is tagged with the Config flag in config/AS_config.txt, the script also reconfigures all L2 hosts connected to the switch. This is because once a switch is stopped, the related configuration on its connected hosts, i.e., the interfaces and the gateways, is also removed.

Once the switch is restarted, all previous configuration, e.g., VLANs, should be reloaded automatically. One can run ovs-vsctl show in the switch to confirm this.

Restart an IXP

To restart an IXP, one runs sudo setup/restart_container.sh ixp <AS>. For example, to restart the IXP 142_IXP, one runs sudo setup/restart_container.sh ixp 142.

The IXP container is always reconfigured. Note that the reconfiguration takes about 5 minutes, much longer than restarting any other container. When one manually restarts an IXP, the IXP-facing interfaces on other routers, e.g., ixp_142, may take minutes to be cleared, so the script waits 5 minutes for all IXP interfaces to be removed before reconfiguring them.

If one encounters a File exists error after running the script, one can either:

  1. (Recommended) increase the waiting time in the function restart_one_ixp() (search for sleep 300), or

  2. (May need multiple attempts) enable set -x to find out which interface reconfiguration command causes the error, and manually clear that interface with sudo ip link delete <intf>. For example, if one encounters the following output:

    + ip link add 15bb5416c7595_a type veth peer name 15bb5416c7595_b
    + ip link set 15bb5416c7595_a netns 2766998
    + ip netns exec 2766998 ip link set dev 15bb5416c7595_a name ixp_142
    RTNETLINK answers: File exists
    + delete_netns_symlink

One can delete the virtual interface 15bb5416c7595_a with sudo ip link delete 15bb5416c7595_a, and then rerun the script. Note that one may need to try multiple times before success.
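
Finding the offending interface in a long trace can be automated. The following is a minimal sketch under stated assumptions: failed_intf is a hypothetical helper, and it assumes that the failing command is the traced line directly before the error message and names the interface after the word dev, as in the output above.

```shell
# Hypothetical helper: read a `set -x` trace on stdin and print the interface
# named after `dev` in the command that immediately precedes each
# "RTNETLINK answers: File exists" line.
failed_intf() {
    awk '
        /RTNETLINK answers: File exists/ {
            # prev holds the traced command that triggered the error
            n = split(prev, words, " ")
            for (i = 1; i < n; i++)
                if (words[i] == "dev") print words[i + 1]
        }
        { prev = $0 }
    '
}

# Example, assuming the trace was saved to a hypothetical file trace.log:
#   failed_intf < trace.log
```

For the output shown above, the helper prints 15bb5416c7595_a, which is the name to pass to sudo ip link delete.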

Restart an SSH container

To restart an SSH container, one runs sudo setup/restart_container.sh ssh <AS>. For example, to restart the 3_ssh container, one runs sudo setup/restart_container.sh ssh 3.

Restart a service container

To restart a service container, one runs sudo setup/restart_container.sh <service>. For example, to restart the DNS container, one runs sudo setup/restart_container.sh dns. There are currently 4 service names: dns, web, measurement, and matrix.

master

Restart a container

A container may crash while the mini-Internet is running. Restarting it and manually reconnecting it to the other containers according to the topology is quite a hassle. Therefore, the mini-Internet automatically generates the script restart_container.sh during startup. This script reconnects a container to the other containers automatically.

For instance, if the container CONTAINER_NAME has crashed or has a problem, just run the following commands:

docker kill CONTAINER_NAME
docker start CONTAINER_NAME
./groups/restart_container.sh CONTAINER_NAME     # needs sudo!

ℹ️ This script can take a few minutes.

ℹ️ Sometimes the MAC addresses on some interfaces must follow a particular scheme (for instance the ones connected to the MATRIX container). Configuring these MAC addresses must be done manually.

Restarting the SSH proxy container

An SSH container can fail if a student starts more than 100 parallel processes in it. We configured this limit to bound the overall load on the server. When this problem occurs, you can no longer access the docker container, and you need to restart it following the procedure described above. Besides restarting the docker container, you also need to re-enable the SSH port forwarding for that particular SSH proxy container. You can do that from the host server with the following command:

ssh -i groups/id_rsa -o UserKnownHostsFile=/dev/null -o "StrictHostKeyChecking no" -f -N -L 0.0.0.0:[2000+X]:157.0.0.[X+10]:22 root@157.0.0.[X+10]

where X is the group number.
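
As a worked example of the substitution, group 3 uses port 2003 and address 157.0.0.13. The sketch below fills in both placeholders for a given group number and prints the resulting command instead of running it; ssh_forward_cmd is a hypothetical helper name, and the group number is assumed to be written without leading zeros.

```shell
# Hypothetical helper: print the port-forwarding command for group X,
# filling in the 2000+X port and the 157.0.0.(X+10) address.
# Assumes X is a non-negative integer without leading zeros.
ssh_forward_cmd() (
    x=$1
    case $x in
        ''|*[!0-9]*) echo "group number must be a non-negative integer" >&2; return 1 ;;
    esac
    port=$((2000 + x))
    ip="157.0.0.$((x + 10))"
    echo "ssh -i groups/id_rsa -o UserKnownHostsFile=/dev/null -o \"StrictHostKeyChecking no\" -f -N -L 0.0.0.0:${port}:${ip}:22 root@${ip}"
)

ssh_forward_cmd 3
```

Printing the command first makes it easy to check the port and address before executing it on the host server.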
