debugging Mesos and net-modules #90

Open · unclejack opened this issue Jan 30, 2016 · 12 comments

@unclejack

The logging from the Mesos slave, net-modules and Marathon doesn't provide many hints about why an application fails to start.

A setup which uses Marathon 0.14.0, Mesos 0.26 and the latest net-modules master exhibits the following behaviour:

  • when the isolator/ipam is missing, the application (which uses groups in the same manner as the stars isolated example from the demo) starts and runs as expected
  • when a dummy isolator is active, the application starts and stops continuously, but never stays in the running state

Observations, things I've tried so far and things I've investigated:

  • replacing the application made no difference; it runs without net-modules but not with it, and the app makes no use of the network in any way
  • the dummy ipam/isolator receives requests and returns the right responses (no errors, no silent failures, no hangs, mesos-slave receives them as expected)
  • there doesn't seem to be any kind of failure logged in the mesos-slave logs, neither from net-modules nor from mesos-slave itself
  • the tasks fail to run in this setup and Marathon doesn't log errors (unlike when the ipam/isolator are broken on purpose, which confirms errors are passed on)
  • the mesos-slave starts the process, creates the network namespace and the isolator is able to set it up
  • whether or not network interfaces are set up in the container makes no difference; the tasks fail to start either way
  • the group configuration for the app made no difference (one task with a single public group, multiple tasks with a shared group, and a single task with a public group)

Given the issue described above, I have a few questions:

  • how should this be debugged?
  • are there any side effects of enabling net-modules and using an isolator (e.g. does it change something in how Mesos runs the tasks)?
  • does the Mesos and net-modules setup require a real network to be set up by the isolator (e.g. does it perform some kind of specific sanity check after Isolate is done)?
  • net-modules is using Marathon version 0.14.0-RC1; is there a known problem with the final version, or should 0.14.0 also be OK?

I've read all the documentation I could find and I've also read some of the code to try to figure out whether I'm missing something obvious.

Please let me know if I should also provide some logs.

@karya0 (Contributor) commented Jan 31, 2016

@unclejack: Can you provide the slave logs corresponding to the failed tasks? If possible, I'd like to take a look at the sequence of events in there. We can then talk about adding more logging to the module and see what would help streamline debugging even further.

karya0 self-assigned this Jan 31, 2016

@unclejack (Author)

@karya0 Of course, you'll have the logs, the full JSON config of the application and all the relevant information in a few hours.

@unclejack (Author)

The full log of the mesos-slave can be found below:

I0201 11:44:00.499222 18350 main.cpp:190] Build: 2016-01-20 09:54:18 by root
I0201 11:44:00.499286 18350 main.cpp:192] Version: 0.26.0
I0201 11:44:00.499372 18350 containerizer.cpp:142] Using isolation: com_mesosphere_mesos_NetworkIsolator,filesystem/posix
I0201 11:44:00.499851 18350 network_isolator.cpp:431] Loading Network Isolator module
I0201 11:44:00.530853 18350 linux_launcher.cpp:103] Using /sys/fs/cgroup/freezer as the freezer hierarchy for the Linux launcher
I0201 11:44:00.537849 18350 main.cpp:277] Starting Mesos slave
I0201 11:44:00.538010 18350 slave.cpp:191] Slave started on 1)@10.0.2.15:5051
I0201 11:44:00.538019 18350 slave.cpp:192] Flags at startup: --appc_store_dir="/tmp/mesos/store/appc" --authenticatee="crammd5" --cgroups_cpu_enable_pids_and_tids_count="false" --cgroups_enable_cfs="false" --cgroups_hierarchy="/sys/fs/cgroup" --cgroups_limit_swap="false" --cgroups_root="mesos" --container_disk_watch_interval="15secs" --containerizers="mesos" --default_role="*" --disk_watch_interval="1mins" --docker="docker" --docker_auth_server="auth.docker.io" --docker_auth_server_port="443" --docker_kill_orphans="true" --docker_local_archives_dir="/tmp/mesos/images/docker" --docker_puller="local" --docker_puller_timeout="60" --docker_registry="registry-1.docker.io" --docker_registry_port="443" --docker_remove_delay="6hrs" --docker_socket="/var/run/docker.sock" --docker_stop_timeout="0ns" --docker_store_dir="/tmp/mesos/store/docker" --enforce_container_disk_quota="false" --executor_registration_timeout="1mins" --executor_shutdown_grace_period="5secs" --fetcher_cache_dir="/tmp/mesos/fetch" --fetcher_cache_size="2GB" --frameworks_home="" --gc_delay="1weeks" --gc_disk_headroom="0.1" --hadoop_home="" --help="false" --hooks="com_mesosphere_mesos_NetworkHook" --hostname_lookup="true" --image_provisioner_backend="copy" --initialize_driver_logging="true" --isolation="com_mesosphere_mesos_NetworkIsolator" --launcher_dir="/mesos/build/src" --logbufsecs="0" --logging_level="INFO" --master="zk://localhost:2181/mesos" --modules="libraries {
  file: "/usr/local/lib/mesos/libmesos_network_isolator.so"
  modules {
    name: "com_mesosphere_mesos_NetworkIsolator"
    parameters {
      key: "isolator_command"
      value: "/calico/calico_mesos"
    }
    parameters {
      key: "ipam_command"
      value: "/calico/calico_mesos"
    }
  }
  modules {
    name: "com_mesosphere_mesos_NetworkHook"
  }
}
" --oversubscribed_resources_interval="15secs" --perf_duration="10secs" --perf_interval="1mins" --port="5051" --qos_correction_interval_min="0ns" --quiet="false" --recover="reconnect" --recovery_timeout="15mins" --registration_backoff_factor="1secs" --revocable_cpu_low_priority="true" --sandbox_directory="/mnt/mesos/sandbox" --strict="true" --switch_user="true" --systemd_runtime_directory="/run/systemd/system" --version="false" --work_dir="/tmp/mesos"
I0201 11:44:00.538449 18350 slave.cpp:392] Slave resources: cpus(*):1; mem(*):1985; disk(*):35164; ports(*):[31000-32000]
I0201 11:44:00.538480 18350 slave.cpp:400] Slave attributes: [  ]
I0201 11:44:00.538485 18350 slave.cpp:405] Slave hostname: vagrant-ubuntu-trusty-64
I0201 11:44:00.538487 18350 slave.cpp:410] Slave checkpoint: true
2016-02-01 11:44:00,539:18350(0x7febf9f72700):ZOO_INFO@log_env@712: Client environment:zookeeper.version=zookeeper C client 3.4.5
2016-02-01 11:44:00,539:18350(0x7febf9f72700):ZOO_INFO@log_env@716: Client environment:host.name=vagrant-ubuntu-trusty-64
2016-02-01 11:44:00,539:18350(0x7febf9f72700):ZOO_INFO@log_env@723: Client environment:os.name=Linux
2016-02-01 11:44:00,539:18350(0x7febf9f72700):ZOO_INFO@log_env@724: Client environment:os.arch=3.13.0-76-generic
2016-02-01 11:44:00,539:18350(0x7febf9f72700):ZOO_INFO@log_env@725: Client environment:os.version=#120-Ubuntu SMP Mon Jan 18 15:59:10 UTC 2016
2016-02-01 11:44:00,539:18350(0x7febf9f72700):ZOO_INFO@log_env@733: Client environment:user.name=vagrant
2016-02-01 11:44:00,539:18350(0x7febf9f72700):ZOO_INFO@log_env@741: Client environment:user.home=/root
2016-02-01 11:44:00,539:18350(0x7febf9f72700):ZOO_INFO@log_env@753: Client environment:user.dir=/root/net-modules
2016-02-01 11:44:00,539:18350(0x7febf9f72700):ZOO_INFO@zookeeper_init@786: Initiating client connection, host=localhost:2181 sessionTimeout=10000 watcher=0x7fec03589b10 sessionId=0 sessionPasswd=<null> context=0x7febf0003d50 flags=0
I0201 11:44:00.544509 18414 state.cpp:54] Recovering state from '/tmp/mesos/meta'
2016-02-01 11:44:00,581:18350(0x7febf572e700):ZOO_INFO@check_events@1703: initiated connection to server [127.0.0.1:2181]
I0201 11:44:00.581164 18417 status_update_manager.cpp:202] Recovering status update manager
I0201 11:44:00.581164 18417 containerizer.cpp:384] Recovering containerizer
I0201 11:44:00.581691 18417 slave.cpp:4230] Finished recovery
2016-02-01 11:44:00,601:18350(0x7febf572e700):ZOO_INFO@check_events@1750: session establishment complete on server [127.0.0.1:2181], sessionId=0x1529ca3f0700004, negotiated timeout=10000
I0201 11:44:00.601882 18417 group.cpp:331] Group process (group(1)@10.0.2.15:5051) connected to ZooKeeper
I0201 11:44:00.601898 18417 group.cpp:805] Syncing group operations: queue size (joins, cancels, datas) = (0, 0, 0)
I0201 11:44:00.601904 18417 group.cpp:403] Trying to create path '/mesos' in ZooKeeper
I0201 11:44:00.629608 18419 detector.cpp:156] Detected a new leader: (id='0')
I0201 11:44:00.629708 18419 group.cpp:674] Trying to get '/mesos/json.info_0000000000' in ZooKeeper
I0201 11:44:00.635550 18419 detector.cpp:482] A new leading master (UPID=master@10.0.2.15:5050) is detected
I0201 11:44:00.635598 18419 slave.cpp:729] New master detected at master@10.0.2.15:5050
I0201 11:44:00.635702 18419 slave.cpp:754] No credentials provided. Attempting to register without authentication
I0201 11:44:00.635721 18419 slave.cpp:765] Detecting new master
I0201 11:44:00.635737 18419 status_update_manager.cpp:176] Pausing sending status updates
I0201 11:44:01.947412 18419 slave.cpp:904] Registered with master master@10.0.2.15:5050; given slave ID 16fa4ecd-2025-48b0-9031-74f08c512b2d-S0
I0201 11:44:01.947597 18403 status_update_manager.cpp:183] Resuming sending status updates
I0201 11:44:01.947721 18419 slave.cpp:963] Forwarding total oversubscribed resources 
I0201 11:45:00.540904 18403 slave.cpp:4039] Current disk usage 13.53%. Max allowed age: 5.352910639842731days
I0201 11:46:00.378325 18403 slave.cpp:1294] Got assigned task testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe for framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000
I0201 11:46:00.380084 18403 slave.cpp:1410] Launching task testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe for framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000
I0201 11:46:00.380933 18403 paths.cpp:436] Trying to chown '/tmp/mesos/slaves/16fa4ecd-2025-48b0-9031-74f08c512b2d-S0/frameworks/16fa4ecd-2025-48b0-9031-74f08c512b2d-0000/executors/testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe/runs/b16d1bb2-dc6e-4acc-a520-074957065147' to user 'root'
I0201 11:46:00.385494 18403 slave.cpp:4999] Launching executor testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe of framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000 with resources cpus(*):0.1; mem(*):32 in work directory '/tmp/mesos/slaves/16fa4ecd-2025-48b0-9031-74f08c512b2d-S0/frameworks/16fa4ecd-2025-48b0-9031-74f08c512b2d-0000/executors/testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe/runs/b16d1bb2-dc6e-4acc-a520-074957065147'
I0201 11:46:00.385627 18403 slave.cpp:1628] Queuing task 'testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe' for executor 'testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe' of framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000
I0201 11:46:00.385772 18403 containerizer.cpp:618] Starting container 'b16d1bb2-dc6e-4acc-a520-074957065147' for executor 'testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe' of framework '16fa4ecd-2025-48b0-9031-74f08c512b2d-0000'
I0201 11:46:00.386060 18403 network_isolator.cpp:211] NetworkIsolator::prepare for container: b16d1bb2-dc6e-4acc-a520-074957065147
I0201 11:46:00.386616 18403 network_isolator.cpp:314] Sending IP request command to IPAM
I0201 11:46:00.387308 18403 network_isolator.cpp:104] Sending command to /calico/calico_mesos: {"args":{"hostname":"vagrant-ubuntu-trusty-64","netgroups":["public"],"num_ipv4":1,"num_ipv6":0,"uid":"38b00d50-9459-4a38-8b3c-f20d73c2a2c7"},"command":"allocate"}
I0201 11:46:00.401631 18403 network_isolator.cpp:125] Got response from /calico/calico_mesos: {"ipv4":"192.168.11.1"}
I0201 11:46:00.401752 18403 network_isolator.cpp:329] IP(s) 192.168.11.1 allocated with IPAM.
I0201 11:46:00.403147 18403 linux_launcher.cpp:365] Cloning child process with flags = CLONE_NEWNET
I0201 11:46:00.407737 18403 containerizer.cpp:851] Checkpointing executor's forked pid 19018 to '/tmp/mesos/meta/slaves/16fa4ecd-2025-48b0-9031-74f08c512b2d-S0/frameworks/16fa4ecd-2025-48b0-9031-74f08c512b2d-0000/executors/testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe/runs/b16d1bb2-dc6e-4acc-a520-074957065147/pids/forked.pid'
I0201 11:46:00.408087 18416 network_isolator.cpp:378] Sending isolate command to Isolator
I0201 11:46:00.408524 18416 network_isolator.cpp:104] Sending command to /calico/calico_mesos: {"args":{"container_id":"b16d1bb2-dc6e-4acc-a520-074957065147","hostname":"vagrant-ubuntu-trusty-64","ipv4_addrs":["192.168.11.1"],"netgroups":["public"],"pid":19018},"command":"isolate"}
I0201 11:46:00.485188 18416 network_isolator.cpp:125] Got response from /calico/calico_mesos: {}
I0201 11:46:00.541923 18416 slave.cpp:4039] Current disk usage 13.49%. Max allowed age: 5.356046553481574days
I0201 11:46:05.731503 18417 containerizer.cpp:1257] Executor for container 'b16d1bb2-dc6e-4acc-a520-074957065147' has exited
I0201 11:46:05.731542 18417 containerizer.cpp:1074] Destroying container 'b16d1bb2-dc6e-4acc-a520-074957065147'
I0201 11:46:05.733041 18417 cgroups.cpp:2429] Freezing cgroup /sys/fs/cgroup/freezer/mesos/b16d1bb2-dc6e-4acc-a520-074957065147
I0201 11:46:05.733933 18417 cgroups.cpp:1411] Successfully froze cgroup /sys/fs/cgroup/freezer/mesos/b16d1bb2-dc6e-4acc-a520-074957065147 after 868096ns
I0201 11:46:05.738904 18413 cgroups.cpp:2447] Thawing cgroup /sys/fs/cgroup/freezer/mesos/b16d1bb2-dc6e-4acc-a520-074957065147
I0201 11:46:05.739802 18413 cgroups.cpp:1440] Successfullly thawed cgroup /sys/fs/cgroup/freezer/mesos/b16d1bb2-dc6e-4acc-a520-074957065147 after 878080ns
I0201 11:46:05.745223 18419 network_isolator.cpp:407] Requesting IPAM to release IPs: 192.168.11.1 
I0201 11:46:05.745654 18419 network_isolator.cpp:104] Sending command to /calico/calico_mesos: {"args":{"ips":["192.168.11.1"]},"command":"release"}
I0201 11:46:05.758720 18419 network_isolator.cpp:125] Got response from /calico/calico_mesos: {}
I0201 11:46:05.759049 18419 network_isolator.cpp:104] Sending command to /calico/calico_mesos: {"args":{"container_id":"b16d1bb2-dc6e-4acc-a520-074957065147","hostname":"vagrant-ubuntu-trusty-64"},"command":"cleanup"}
I0201 11:46:05.770232 18419 network_isolator.cpp:125] Got response from /calico/calico_mesos: {}
I0201 11:46:05.770485 18419 slave.cpp:3553] Executor 'testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe' of framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000 terminated with signal Killed
I0201 11:46:05.771064 18419 slave.cpp:2762] Handling status update TASK_FAILED (UUID: 9ba6e7dc-22d1-477c-80ff-dc2f17b1bb2a) for task testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe of framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000 from @0.0.0.0:0
I0201 11:46:05.771103 18419 network_isolator.cpp:480] NetworkHook::task status label decorator
I0201 11:46:05.771112 18419 network_isolator.cpp:512] NetworkHook:: added ip address(es) 192.168.11.1 
W0201 11:46:05.771200 18419 containerizer.cpp:966] Ignoring update for unknown container: b16d1bb2-dc6e-4acc-a520-074957065147
I0201 11:46:05.771255 18419 status_update_manager.cpp:322] Received status update TASK_FAILED (UUID: 9ba6e7dc-22d1-477c-80ff-dc2f17b1bb2a) for task testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe of framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000
I0201 11:46:05.771409 18419 status_update_manager.cpp:826] Checkpointing UPDATE for status update TASK_FAILED (UUID: 9ba6e7dc-22d1-477c-80ff-dc2f17b1bb2a) for task testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe of framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000
I0201 11:46:05.772874 18415 slave.cpp:3087] Forwarding the update TASK_FAILED (UUID: 9ba6e7dc-22d1-477c-80ff-dc2f17b1bb2a) for task testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe of framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000 to master@10.0.2.15:5050
I0201 11:46:05.835114 18418 status_update_manager.cpp:394] Received status update acknowledgement (UUID: 9ba6e7dc-22d1-477c-80ff-dc2f17b1bb2a) for task testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe of framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000
I0201 11:46:05.835155 18418 status_update_manager.cpp:826] Checkpointing ACK for status update TASK_FAILED (UUID: 9ba6e7dc-22d1-477c-80ff-dc2f17b1bb2a) for task testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe of framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000
I0201 11:46:05.836791 18418 slave.cpp:3657] Cleaning up executor 'testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe' of framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000
I0201 11:46:05.836951 18418 slave.cpp:3745] Cleaning up framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000
I0201 11:46:05.836983 18418 gc.cpp:56] Scheduling '/tmp/mesos/slaves/16fa4ecd-2025-48b0-9031-74f08c512b2d-S0/frameworks/16fa4ecd-2025-48b0-9031-74f08c512b2d-0000/executors/testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe/runs/b16d1bb2-dc6e-4acc-a520-074957065147' for gc 6.9999903137037days in the future
I0201 11:46:05.837008 18418 gc.cpp:56] Scheduling '/tmp/mesos/slaves/16fa4ecd-2025-48b0-9031-74f08c512b2d-S0/frameworks/16fa4ecd-2025-48b0-9031-74f08c512b2d-0000/executors/testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe' for gc 6.99999031342518days in the future
I0201 11:46:05.837018 18418 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/16fa4ecd-2025-48b0-9031-74f08c512b2d-S0/frameworks/16fa4ecd-2025-48b0-9031-74f08c512b2d-0000/executors/testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe/runs/b16d1bb2-dc6e-4acc-a520-074957065147' for gc 6.99999031328889days in the future
I0201 11:46:05.837025 18418 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/16fa4ecd-2025-48b0-9031-74f08c512b2d-S0/frameworks/16fa4ecd-2025-48b0-9031-74f08c512b2d-0000/executors/testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe' for gc 6.99999031319407days in the future
I0201 11:46:05.837034 18418 gc.cpp:56] Scheduling '/tmp/mesos/slaves/16fa4ecd-2025-48b0-9031-74f08c512b2d-S0/frameworks/16fa4ecd-2025-48b0-9031-74f08c512b2d-0000' for gc 6.99999031284741days in the future
I0201 11:46:05.837041 18418 gc.cpp:56] Scheduling '/tmp/mesos/meta/slaves/16fa4ecd-2025-48b0-9031-74f08c512b2d-S0/frameworks/16fa4ecd-2025-48b0-9031-74f08c512b2d-0000' for gc 6.99999031276741days in the future
I0201 11:46:05.837047 18418 status_update_manager.cpp:284] Closing status update streams for framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000
I0201 11:47:00.386456 18415 slave.cpp:3947] Framework 16fa4ecd-2025-48b0-9031-74f08c512b2d-0000 seems to have exited. Ignoring registration timeout for executor 'testfoo_probe.5d441717-c8d9-11e5-983a-024293d08dbe'
I0201 11:47:00.543108 18418 slave.cpp:4039] Current disk usage 13.08%. Max allowed age: 5.384249413155556days
I0201 11:48:00.545049 18413 slave.cpp:4039] Current disk usage 13.08%. Max allowed age: 5.384248734386365days
I0201 11:49:00.546118 18414 slave.cpp:4039] Current disk usage 13.08%. Max allowed age: 5.384237874079386days
I0201 11:50:00.546681 18403 slave.cpp:4039] Current disk usage 13.08%. Max allowed age: 5.384237195310209days

The JSON configuration I was using is:

{
   "apps" : [
      {
         "backoffSeconds" : 10000,
         "mem" : 32,
         "id" : "probe",
         "uris" : [
            "file:///vagrant/foo"
         ],
         "cpus" : 0.1,
         "cmd" : "./foo",
         "ipAddress" : {
            "groups" : [
               "public"
            ]
         }
      }
   ],
   "id" : "/testfoo"
}

I've used backoffSeconds to keep the app from spamming the logs with too many restart attempts.

Mesos: 0.26
Marathon: 0.14.0 (using the official Docker image from mesosphere)
net-modules: current master

Please let me know if there are any other bits of information I should provide.

Update: /calico/calico_mesos isn't the actual Calico isolator; it's just a dummy isolator used for testing. Changing that path didn't seem important while testing with a dummy isolator.
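
For reference, a dummy isolator/IPAM like this can be as small as the following sketch. It assumes net-modules hands the command one JSON request (read here from stdin) and reads one JSON response back from stdout; the request/response shapes mirror the slave log above, and the error shape is a guess:

#!/usr/bin/env python
# Minimal dummy isolator/IPAM sketch for testing net-modules.
# Assumptions: one JSON request arrives on stdin and one JSON response
# is expected on stdout; the request/response shapes mirror the slave
# log above; the error shape is a guess.
import json
import sys

def main():
    request = json.load(sys.stdin)
    command = request.get("command")
    args = request.get("args", {})

    if command == "allocate":
        # Hand back one fixed address when IPv4 is requested; a real
        # IPAM would track leases per uid/netgroup.
        response = {"ipv4": "192.168.11.1"} if args.get("num_ipv4") else {}
    elif command in ("isolate", "release", "cleanup"):
        # A real isolator would plumb the container's network namespace
        # here (veth pair, address, routes) before replying.
        response = {}
    else:
        response = {"error": "unknown command: %r" % command}

    json.dump(response, sys.stdout)

if __name__ == "__main__":
    main()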

unclejack reopened this Feb 1, 2016

@unclejack (Author)

@karya0 Have you had a chance to take a look at this yet?

@karya0 (Contributor) commented Feb 10, 2016

@unclejack: Sorry for the delay in getting back to you. Can you tell me a bit more about your dummy isolator implementation? Is it going to create veth pairs for the container? Without that, the executor won't be able to bind to the given IP address, and hence it would fail to connect to the agent and eventually exit.

@unclejack (Author)

@karya0 My dummy isolator doesn't set up veth pairs, but this requirement isn't documented in the net-modules API. Could you tell me more about this and any other requirements around these side effects, please?

@karya0 (Contributor) commented Feb 10, 2016

Okay, so once the isolator is active, it asks the Mesos agent to create a network namespace for the container being launched. Thus, someone needs to set up the network for the new container. In the case of Calico, the calico binary does all of that as part of the "isolate" command. It also creates the appropriate routes to make it possible for the executor to talk to the agent and any other nodes as needed.
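
Roughly, the isolate step has to do something like the following with the pid and ipv4_addrs fields it receives. This is an illustrative sketch using iproute2/nsenter from Python, not the actual calico implementation; the interface names, the /32 routing scheme and the proxy-ARP knob are assumptions:

#!/usr/bin/env python
# Illustrative "isolate" plumbing: a veth pair with one end in the
# container's network namespace, the allocated address assigned, and
# routes in both directions. Not the calico implementation; names and
# the /32 routing scheme are assumptions.
import subprocess

def run(cmd):
    # Fail loudly so a broken step surfaces in the isolator's exit status.
    subprocess.check_call(cmd)

def isolate(container_id, pid, ipv4):
    host_if = "veth" + container_id[:8]   # hypothetical naming scheme
    tmp_if = "tmp" + container_id[:8]
    ns = ["nsenter", "--target", str(pid), "--net"]

    # Create a veth pair and move one end into the container's netns,
    # renaming it to eth0 once it's inside.
    run(["ip", "link", "add", host_if, "type", "veth", "peer", "name", tmp_if])
    run(["ip", "link", "set", tmp_if, "netns", str(pid)])
    run(ns + ["ip", "link", "set", tmp_if, "name", "eth0"])
    run(["ip", "link", "set", host_if, "up"])

    # Inside the namespace: links up, allocated IP assigned, and a
    # default route so the executor can reach the agent.
    run(ns + ["ip", "link", "set", "lo", "up"])
    run(ns + ["ip", "link", "set", "eth0", "up"])
    run(ns + ["ip", "addr", "add", ipv4 + "/32", "dev", "eth0"])
    run(ns + ["ip", "route", "add", "default", "dev", "eth0"])

    # Host side: a return route to the container's address, plus proxy
    # ARP so the host answers ARP queries coming out of the container.
    run(["ip", "route", "add", ipv4 + "/32", "dev", host_if])
    run(["sysctl", "-qw", "net.ipv4.conf.%s.proxy_arp=1" % host_if])

With only the namespace created but none of this plumbing in place, the executor has no usable interface or route back to the agent, which matches the executor exiting a few seconds after launch in the log above.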

@karya0 (Contributor) commented Feb 10, 2016

We can work together on getting a README.md that's better geared towards writing a new isolation-service provider. Does that sound reasonable?

@ozdanborne (Contributor)

There's some info on it here, but it's a tad outdated: https://github.com/mesosphere/net-modules/blob/master/docs/api.md#network-virtualizer-api

@unclejack (Author)

@karya0 That sounds good. I'll write my implementation and send some PRs to update the documentation.

@djosborne It's unfortunate that the API documentation is a bit outdated. I've been using that documentation to implement an IPAM and isolator; up-to-date documentation would have saved me many hours of debugging.

@karya0 (Contributor) commented Feb 10, 2016

SGTM.

@sanfern commented Jul 2, 2016

I am facing a similar issue. Can you point me to the latest docs?
