This repository has been archived by the owner. It is now read-only.
Commits on Apr 27, 2018
  1. readme: Deprecate the project

    kklin committed Apr 27, 2018
Commits on Mar 28, 2018
  1. js/install: Fix path of the base infrastructure script

    kklin committed Mar 28, 2018
    `kelda base-infrastructure` shells out to `kelda-base-infra-init.js`,
    but we installed the script as `kelda-base-infra.js` instead. This
    caused the command to fail since it couldn't find the script.
  2. changelog: Prepare for release 0.13.0

    kklin committed Mar 27, 2018
Commits on Mar 26, 2018
  1. supervisor: Properly restart containers when they exit

    kklin committed Mar 19, 2018
    Before, if a system container exited, the supervisor would attempt to start a
    new one in its place, but the creation would fail because Docker doesn't
    allow having two containers with the same name. This commit gets around
    this by renaming the old container so that the new container can use the
    friendly name.
  2. google: Upgrade Ubuntu to latest version

    kklin committed Mar 20, 2018
Commits on Mar 20, 2018
  1. foreman: Don't update machine row when it's stopping

    kklin committed Mar 20, 2018
    Before, if the foreman tried to connect to a machine while it was
    stopping, it would overwrite the machine's Stopping status with a status
    generated from the connection state. This was confusing because it
    looked like the machine wasn't stopping anymore. Furthermore, the
    information isn't useful since the machine is going to be stopped soon
    anyway.
  2. ovs: Upgrade to 2.9.0

    kklin committed Mar 20, 2018
    Version 2.8.1 didn't support the 4.13 kernel. We'll be upgrading to 4.13
    when we update the Ubuntu images.
  3. openflow: Trigger on connection table

    kklin committed Mar 15, 2018
    This way, the OpenFlow rules are updated as soon as connections
    change, rather than at the next tick of the trigger.
  4. network: Decrease chance of public connections race condition

    kklin committed Mar 15, 2018
    Right now, the code that boots containers is completely disjoint from
    the code that sets up inbound/outbound public network connections. The
    flow of data is completely different:  the kubelet boots containers when
    it reads the container from the Kubernetes-managed partition of etcd,
    while the minion sets up public connections when it reads the
    connections and containers from the Kelda-managed partition of etcd.
    However, configuring the public connections requires the OVS ports to
    have been created, so it must happen after the kubelet boots the
    containers.
    
    While this commit does not remove the possibility of this race
    condition, it polls more frequently to shorten the window during which
    the container's public connections are unconfigured. An
    upcoming change to our OVN architecture will make it possible to fully
    fix this lack of synchronization between container boot and network
    configuration.
  5. ci: Log GET failure timestamp

    kklin committed Mar 15, 2018
    It's helpful to see when the GET request failed relative to the other
    requests (e.g. whether it was the first request that failed, and all
    subsequent requests succeeded).
Commits on Mar 19, 2018
  1. amazon: Log more helpful error when floating IP is missing

    kklin committed Mar 14, 2018
    Before, if a user tried to boot a machine with an IP that wasn't
    reserved, we logged the very unhelpful error:
    
    MissingParameter: Either public IP or allocation id must be specified
    
    This commit makes the error much clearer:
    
    unknown floating IP 8.8.8.8. Has the IP been reserved for the region us-west-1?
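
    A minimal sketch of this kind of up-front check, assuming the set of
    reserved addresses is already known. The `checkFloatingIP` helper and its
    parameters are hypothetical, not the provider code itself; only the
    message text comes from this commit.

```go
package main

import "fmt"

// checkFloatingIP returns a descriptive error when ip is not among the
// addresses reserved in region. The reserved set is passed in directly;
// how it is fetched from the cloud provider is out of scope for this sketch.
func checkFloatingIP(ip, region string, reserved map[string]bool) error {
	// An empty IP means no floating IP was requested, so there is nothing to check.
	if ip == "" || reserved[ip] {
		return nil
	}
	return fmt.Errorf("unknown floating IP %s. Has the IP been reserved for the region %s?",
		ip, region)
}

func main() {
	reserved := map[string]bool{"1.2.3.4": true} // made-up reservation
	fmt.Println(checkFloatingIP("8.8.8.8", "us-west-1", reserved))
}
```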
  2. ci: Increase the permissible delay in build-dockerfile test

    kklin committed Mar 8, 2018
    Before, the test failed if any containers were started more than 15
    seconds apart. While uncommon, it was possible for the containers to
    take longer to boot because of unrelated issues, such as network latency.
    
    This commit increases the maximum duration to account for such
    issues, while still asserting that the images were built in parallel.
  3. ci: Use human readable time output in build-dockerfile test

    kklin committed Mar 8, 2018
    The duration was printed in milliseconds before. This commit makes it so
    that the easiest-to-read unit is used (e.g. 2s, or 3m15s).
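
    For illustration only (this is not the test's code), Go's
    `time.Duration` produces this format when converted to a string. The
    `humanize` helper and the millisecond values are made up.

```go
package main

import (
	"fmt"
	"time"
)

// humanize converts a millisecond count into Go's default duration format,
// which picks the largest sensible units automatically.
func humanize(ms int64) string {
	return (time.Duration(ms) * time.Millisecond).String()
}

func main() {
	fmt.Println(humanize(2000))   // 2s
	fmt.Println(humanize(195000)) // 3m15s
}
```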
  4. cloud: Poll less aggressively once machines are booted

    kklin committed Mar 15, 2018
    Before, if there were any machines running in a region, we polled it
    once every five seconds. This resulted in a lot of unnecessary API
    requests to the cloud provider since once the machines are booted, they
    won't change unless they die.
    
    This commit modifies the poll logic so that it polls once every 30
    seconds after the machines have booted to pick up machines that have
    terminated. It also tweaks the other poll intervals to take advantage of
    the ability to return a specific poll interval (e.g. retrying faster
    when a provider that's used by the blueprint fails to initialize).
  5. supervisor: Run worker join regularly

    kklin committed Mar 18, 2018
    Before, the supervisor join that booted the system containers on workers
    ran only in response to database triggers. Therefore, if booting a system
    container failed (or if a system container died), the container wouldn't
    get booted again until the next trigger on the Minion or Etcd table.
    
    This commit makes it so the join runs every 30 seconds, in addition to
    the database triggers.
Commits on Mar 17, 2018
  1. foreman: Write machine role directly to database

    kklin committed Feb 25, 2018
    Before, the foreman stored the status information for minions (their
    connection state, and their role) in the foreman package, and required
    the cloud join to query the roles every time it ran its join.
    
    This commit makes it so that the foreman writes the status information
    directly to the database, and makes the cloud join operate on just the
    information in the database. This way, the cloud join will trigger on
    changes to the role, just like any other database attribute.
Commits on Mar 15, 2018
  1. connection: Improve error message when unix socket already exists

    kklin committed Mar 15, 2018
    It's somewhat common for users to accidentally run two daemons on the
    same machine. Because this error is especially common with
    first-time users, this commit updates the error message to have
    instructions for fixing the error.
  2. network: Allow repeated hostnames in connections

    kklin committed Mar 4, 2018
    Before, if `allowTraffic` was called with repeated hostnames in the
    `From` or `To` set (e.g. `kelda.allowTraffic([blue, blue], [red], 80)`),
    the minion would fail to set up the connection. This was because OVSDB
    requires that address sets contain no duplicate items, so it would reject
    the call to create the address set for the connection.
    
    This commit makes it so duplicates are filtered before being passed to
    OVSDB.
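
    The filtering step might look something like the sketch below. `dedupe`
    is a hypothetical helper, not the minion's actual function, and the
    hostnames are examples.

```go
package main

import "fmt"

// dedupe returns names with repeated entries removed, keeping the order in
// which each name first appears.
func dedupe(names []string) []string {
	seen := make(map[string]struct{}, len(names))
	var out []string
	for _, n := range names {
		if _, ok := seen[n]; ok {
			continue // already in the result; skip the duplicate
		}
		seen[n] = struct{}{}
		out = append(out, n)
	}
	return out
}

func main() {
	fmt.Println(dedupe([]string{"blue", "blue"})) // [blue]
}
```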
  3. ci: Increase container boot timeout

    kklin committed Mar 14, 2018
    The Kibana container can take around 15 minutes to build on
    DigitalOcean, so this commit increases the boot timeout for all
    containers to 20 minutes.
    
    The container takes so long to build because downloading the plugin can
    take a long time (I saw 7 minutes when testing) since the plugin is
    hosted in an S3 bucket in us-east, but the DigitalOcean VM is running in
    SFO.
    
    Additionally, the "optimization" step of the plugin installation takes
    multiple minutes due to a Kibana implementation detail. This is true on
    all of the cloud providers. See
    https://discuss.elastic.co/t/x-pack-not-installed-for-kibana-6-1-0/111840
    for more information.
Commits on Mar 12, 2018
  1. vendor: Remove unused dependencies

    kklin committed Mar 9, 2018
  2. kelda: Replace scheduler with Kubernetes

    kklin committed Jan 6, 2018
    Before, we had a custom scheduler for ensuring the desired containers
    were running. While it is not too difficult to implement a basic
    scheduler, moving to Kubernetes will allow us to iterate more quickly on
    features such as volumes since Kubernetes already implements much of the
    logic for actually creating and managing volumes -- we will be able to
    just focus on how to express volumes in the blueprint language.
    
    Furthermore, this change may allow us, in the future, to run on top of
    an existing Kubernetes cluster. This will be helpful given the general
    movement in industry towards using Kubernetes.
    
    At a high level, this commit encompasses four major changes:
    1) It modifies the deployment engine to boot a Kubernetes cluster on the
    Kelda machines.
    2) It replaces the custom scheduler with code that translates the
    containers specified by the user into deployments to be enforced by the
    Kubernetes cluster.
    3) It replaces the Docker network plugin with a CNI network plugin
    compatible with Kubernetes.
    4) It replaces Vault with the built-in Kubernetes solution for
    secret management. It's simpler to just use the Kubernetes solution, and
    we weren't making use of any of the more advanced Vault features.
  3. minion: Use blueprint row in database

    kklin committed Mar 5, 2018
    This commit makes it so the minion stores the user's blueprint in the
    Blueprint database table rather than as a field in the Minion table.
    Before, the Blueprint database table was only used by `kelda daemon`.
    
    This commit also makes it so every time the blueprint is updated, the
    minion's dependent rows are also updated. This way, on the leader
    minion, there's never a case where containers in the database reference
    volumes in the blueprint that haven't been committed to the database
    yet. While this won't solve the current log test failure because it's
    still possible for the blueprint to be inconsistent on the workers
    (since the workers don't derive their containers from the blueprint,
    they read them from etcd), this consistency is necessary for the
    Kubernetes patch.
Commits on Mar 8, 2018
  1. engine: Don't modify placement rules if blueprint is reprocessed

    kklin committed Feb 27, 2018
    Before, if the leader reprocessed the same blueprint (as happens when
    the daemon restarts) the engine would sometimes modify the placement
    rules in the database. This was because the `portPlacements` function
    made use of the fact that the order of TargetContainer and
    OtherContainer doesn't matter:  if there was an exclusive placement, it
    would create one rule, using the first referenced container in the given
    connections as the TargetContainer. Therefore, if the order of
    connections passed to `portPlacements` changed, the resulting placement
    rules might have swapped TargetContainers and OtherContainers. Note that
    it's expected for the order to change since the connections are read
    from the database, which returns rows in a non-deterministic order.
    
    This commit addresses this problem by creating two placement rules for
    each exclusive placement:  one rule for each affected container, with
    that container as the TargetContainer.
    
    Although this placement swapping didn't result in incorrect behavior in
    the current scheduler implementation, it will be problematic in the
    Kubernetes implementation. This is because the placement information is
    part of the deployment spec we pass to Kubernetes, so if the placement
    rules move from one container to another, their deployment specs will
    change, and Kubernetes will restart the changed deployments.
    
    Furthermore, defining placement rules for both affected containers will
    make the Kubernetes deployment specs easier to debug. Looking at the
    placement rules for a given deployment will be sufficient to understand
    its placement constraints:  there's no possibility that another
    deployment has a placement rule that references the container.
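
    To illustrate why two rules make the result independent of input order,
    here is a simplified sketch. The `placement` struct and `exclusivePair`
    function are hypothetical stand-ins, not the engine's real types, and
    the sort is added here only so the two outputs can be compared directly.

```go
package main

import (
	"fmt"
	"sort"
)

// placement is a simplified, hypothetical stand-in for a placement rule row.
type placement struct {
	TargetContainer string
	OtherContainer  string
	Exclusive       bool
}

// exclusivePair emits one rule per affected container, each with that
// container as the target, so the result does not depend on argument order.
func exclusivePair(a, b string) []placement {
	rules := []placement{
		{TargetContainer: a, OtherContainer: b, Exclusive: true},
		{TargetContainer: b, OtherContainer: a, Exclusive: true},
	}
	// Sort so callers see the same slice regardless of which name came first.
	sort.Slice(rules, func(i, j int) bool {
		return rules[i].TargetContainer < rules[j].TargetContainer
	})
	return rules
}

func main() {
	fmt.Println(exclusivePair("red", "blue"))
	fmt.Println(exclusivePair("blue", "red")) // same rules as the line above
}
```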
Commits on Mar 2, 2018
  1. changelog: Prepare for release 0.12.0

    kklin committed Mar 2, 2018
  2. ci: Add a test for FUSE mounts

    kklin committed Mar 1, 2018
    This commit adds an integration test that checks that if the
    `privileged` flag is enabled, containers can create FUSE mounts.
  3. ci: Add test for hostPath volumes

    kklin committed Mar 1, 2018
    This commit adds an integration test for mounting files from the host.
  4. kelda: Add support for hostPath volumes

    kklin committed Mar 1, 2018
    This commit adds the `Volume` and `VolumeMount` classes, which allow
    containers to specify volumes. Currently, only `hostPath` volumes are
    supported.
    
    hostPath volumes are necessary for containers to access the Docker run
    socket on the host.
  5. bindings: Encapsulate unique name logic in a class

    kklin committed Mar 1, 2018
    An upcoming commit will need to make use of the same logic for volume
    names. It won't be able to just reuse the `uniqueHostname` function
    because volume names and network hostnames can overlap.
  6. ci: Add integration test for privileged mode

    kklin committed Feb 28, 2018
    This commit adds an integration test that ensures containers in
    privileged mode can list the `/dev/fuse` device.
  7. kelda: Support running containers in privileged mode

    kklin committed Feb 28, 2018
    This commit adds the `Privileged` field to the `Container` class to
    support running containers in privileged mode. The effect is the same as
    running `docker run --privileged IMAGE` on the command line.
    
    This feature is necessary to create FUSE mounts, since the
    container needs access to the host's /dev/fuse device.
Commits on Feb 25, 2018
  1. changelog: Prepare for release 0.11.0

    kklin committed Feb 24, 2018
Commits on Feb 24, 2018
  1. readme: Remove trailing whitespace

    kklin committed Feb 23, 2018