This repository has been archived by the owner. It is now read-only.
Commits on Apr 27, 2018
  1. readme: Deprecate the project

    kklin committed Apr 27, 2018
Commits on Apr 3, 2018
  1. network: Replace custom OpenFlow with localport OVN configuration

    aegamesi authored and kklin committed Mar 6, 2018
    This removes the custom OpenFlow code used to transport packets
    out of the OVN private network destined for the public internet
    (via IPTables) and the minion (for DNS). The same functionality
    is instead implemented purely with OVN config using the new
    'localport' feature. This should be significantly easier to
    understand and maintain than the custom OpenFlow.
Commits on Mar 28, 2018
  1. js/install: Fix path of the base infrastructure script

    kklin committed Mar 28, 2018
    `kelda base-infrastructure` shells out to `kelda-base-infra-init.js`,
    but we installed the script as `kelda-base-infra.js` instead. This
    caused the command to fail since it couldn't find the script.
  2. changelog: Prepare for release 0.13.0

    kklin committed Mar 27, 2018
Commits on Mar 27, 2018
  1. docs: Add documentation for base-infrastructure and configure-provider

    luise committed Mar 20, 2018
    This commit updates the docs to reflect the separation of `kelda init`
    into `kelda base-infrastructure` and `kelda configure-provider`.
  2. cli: Add command for provider configuration

    luise committed Mar 20, 2018
    This commit adds the `kelda configure-provider` command which will help
    users format their cloud provider credentials and put them in the
    correct location. With this command and the recently added `kelda
    base-infrastructure` command, the old `kelda init` command is redundant
    and can therefore be removed.
  3. cli: Make a separate base-infrastructure command

    luise committed Mar 19, 2018
    We are splitting up `kelda init` to have separate `kelda` commands for
    configuring a base infrastructure and configuring cloud provider
    credentials. As a first step, this commit creates a new command called
    `kelda base-infrastructure` that will only create an infrastructure, and
    not set up provider credentials. `kelda init` stays the same as before.
Commits on Mar 26, 2018
  1. supervisor: Properly restart containers when they exit

    kklin committed Mar 19, 2018
    Before, if a system container exited, the supervisor would attempt to start a
    new one in its place, but the creation would fail because Docker doesn't
    allow having two containers with the same name. This commit gets around
    this by renaming the old container so that the new container can use the
    friendly name.
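    A minimal sketch of choosing a free name to move the exited container to, so that its friendly name can be reused. The commit message does not say what the old container is renamed to; the `-old-N` suffix and the `retiredName` helper are hypothetical. The rename itself would go through the Docker API, which is not shown here.

```go
package main

import "fmt"

// retiredName is a hypothetical helper that picks a new name for an exited
// container so its friendly name is free for the replacement. Docker rejects
// duplicate container names, so the candidate must not already be in use.
func retiredName(name string, inUse map[string]bool) string {
	for i := 1; ; i++ {
		candidate := fmt.Sprintf("%s-old-%d", name, i)
		if !inUse[candidate] {
			return candidate
		}
	}
}
```

    For example, with `etcd` and `etcd-old-1` already taken, the exited `etcd` container would be moved to `etcd-old-2`.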
  2. google: Upgrade Ubuntu to latest version

    kklin committed Mar 20, 2018
Commits on Mar 20, 2018
  1. foreman: Don't update machine row when it's stopping

    kklin committed Mar 20, 2018
    Before, if the foreman tried to connect to a machine while it was
    stopping, it would overwrite the machine's Stopping status with a status
    generated from the connection state. This was confusing because it
    looked like the machine wasn't stopping anymore. Furthermore, the
    information isn't useful since the machine is going to be stopped soon
    anyway.
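    A minimal sketch of the guard described above, assuming a simplified machine row with a string status. The `Machine` type, the `"stopping"` value, and `updateStatus` are hypothetical stand-ins for the real foreman and database code.

```go
package main

// Machine is a hypothetical, simplified machine row.
type Machine struct {
	ID     string
	Status string
}

const statusStopping = "stopping"

// updateStatus records the status derived from the connection state, unless
// the machine is already stopping. In that case the row is left untouched so
// the Stopping status stays visible until the machine is gone.
func updateStatus(m *Machine, connStatus string) {
	if m.Status == statusStopping {
		return
	}
	m.Status = connStatus
}
```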
  2. ovs: Upgrade to 2.9.0

    kklin committed Mar 20, 2018
    Version 2.8.1 didn't support the 4.13 kernel. We'll be upgrading to 4.13
    when we update the Ubuntu images.
  3. openflow: Trigger on connection table

    kklin committed Mar 15, 2018
    This way, the openflow will be updated as soon as connections are
    changed, rather than at the next tick of the trigger.
  4. network: Decrease chance of public connections race condition

    kklin committed Mar 15, 2018
    Right now, the code that boots containers is completely disjoint from
    the code that sets up inbound/outbound public network connections. The
    flow of data is completely different:  the kubelet boots containers when
    it reads the container from the Kubernetes-managed partition of etcd,
    while the minion sets up public connections when it reads the
    connections and containers from the Kelda-managed partition of etcd.
    However, configuring the public connections requires the OVS ports to
    have been created, so it must happen after the kubelet boots the
    containers.
    
    While this commit does not remove the possibility of this race
    condition, it increases the polling interval to decrease the possible
    time that the container's public connections will be unconfigured. An
    upcoming change to our OVN architecture will make it possible to fully
    fix this lack of synchronization between container boot and network
    configuration.
  5. ci: Log GET failure timestamp

    kklin committed Mar 15, 2018
    It's helpful to see when the GET request failed relative to the other
    requests (e.g. whether it was the first request that failed, and all
    subsequent requests succeeded).
Commits on Mar 19, 2018
  1. amazon: Log more helpful error when floating IP is missing

    kklin committed Mar 14, 2018
    Before, if a user tried to boot a machine with an IP that wasn't
    reserved, we logged the very unhelpful error:
    
    MissingParameter: Either public IP or allocation id must be specified
    
    This commit makes the error much clearer:
    
    unknown floating IP 8.8.8.8. Has the IP been reserved for the region us-west-1?
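    A minimal sketch of producing the clearer error before the request reaches the provider, assuming the floating IPs reserved in the region have already been listed into a set. `checkFloatingIP` and the `reserved` map are hypothetical; only the message text comes from the commit.

```go
package main

import "fmt"

// checkFloatingIP returns a descriptive error if ip is not among the floating
// IPs reserved in region, instead of letting the provider reject the request
// with a generic MissingParameter error.
func checkFloatingIP(ip, region string, reserved map[string]bool) error {
	if reserved[ip] {
		return nil
	}
	return fmt.Errorf("unknown floating IP %s. Has the IP been reserved for the region %s?", ip, region)
}
```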
  2. ci: Increase the permissible delay in build-dockerfile test

    kklin committed Mar 8, 2018
    Before, the test failed if any containers were started more than 15
    seconds apart. While uncommon, it was possible for the containers to
    take longer to boot because of unrelated issues, such as network latency.
    
    This commit increases the maximum duration to account for such
    issues, while still asserting that the images were built in parallel.
  3. ci: Use human readable time output in build-dockerfile test

    kklin committed Mar 8, 2018
    The duration was printed in milliseconds before. This commit makes it so
    that the easiest-to-read unit is used (e.g. 2s or 3m15s).
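    Assuming the test is written in Go, the standard library already produces this format: `time.Duration`'s `String` method prints values such as `2s` and `3m15s`. The `humanDuration` wrapper below is hypothetical and rounds to whole seconds so sub-second noise is not printed.

```go
package main

import "time"

// humanDuration converts a millisecond count into a readable duration string,
// rounded to the nearest second, e.g. 195000 -> "3m15s".
func humanDuration(ms int64) string {
	return (time.Duration(ms) * time.Millisecond).Round(time.Second).String()
}
```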
  4. cloud: Poll less aggressively once machines are booted

    kklin committed Mar 15, 2018
    Before, if there were any machines running in a region, we polled it
    once every five seconds. This resulted in a lot of unnecessary API
    requests to the cloud provider since once the machines are booted, they
    won't change unless they die.
    
    This commit modifies the poll logic so that it polls once every 30
    seconds after the machines have booted to pick up machines that have
    terminated. It also tweaks the other poll intervals to take advantage of
    the ability to return a specific poll interval (e.g. retrying faster
    when a provider that's used by the blueprint fails to initialize).
  5. supervisor: Run worker join regularly

    kklin committed Mar 18, 2018
    Before, the supervisor join that booted the system containers on workers
    only ran based on database triggers. Therefore, if booting a system container
    failed (or if a system container died), the container wouldn't get
    booted again until the next time the database triggered the Minion or
    Etcd table.
    
    This commit makes it so the join runs every 30 seconds, in addition to
    the database triggers.
Commits on Mar 17, 2018
  1. foreman: Write machine role directly to database

    kklin committed Feb 25, 2018
    Before, the foreman stored the status information for minions (their
    connection state, and their role) in the foreman package, and required
    the cloud join to query the roles every time it ran its join.
    
    This commit makes it so that the foreman writes the status information
    directly to the database, and makes the cloud join operate on just the
    information in the database. This way, the cloud join will trigger on
    changes to the role, just like any other database attribute.
Commits on Mar 15, 2018
  1. connection: Improve error message when unix socket already exists

    kklin committed Mar 15, 2018
    It's somewhat common for users to accidentally run two daemons on the
    same machine. Because this error is especially common with
    first-time users, this commit updates the error message to have
    instructions for fixing the error.
  2. network: Allow repeated hostnames in connections

    kklin committed Mar 4, 2018
    Before, if `allowTraffic` was called with repeated hostnames in the
    `From` or `To` set (e.g. `kelda.allowTraffic([blue, blue], [red], 80)`),
    the minion would fail to set up the connection. This was because OVSDB
    requires address sets have no duplicate items, so it would reject the
    call to create the address set for the connection.
    
    This commit makes it so duplicates are filtered before being passed to
    OVSDB.
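    A minimal sketch of the filtering step, assuming the `From`/`To` hostnames arrive as a string slice. `uniqueHostnames` is a hypothetical helper, not the function from the commit; it keeps first-seen order so the result is stable.

```go
package main

// uniqueHostnames returns the hostnames in the order they first appear,
// dropping repeats so the slice can be used as an OVSDB address set, which
// rejects duplicate items.
func uniqueHostnames(hostnames []string) []string {
	seen := make(map[string]struct{}, len(hostnames))
	var unique []string
	for _, h := range hostnames {
		if _, ok := seen[h]; ok {
			continue
		}
		seen[h] = struct{}{}
		unique = append(unique, h)
	}
	return unique
}
```

    For `kelda.allowTraffic([blue, blue], [red], 80)`, the `From` set `[blue, blue]` becomes `[blue]` before the address set is created.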
  3. ci: Increase container boot timeout

    kklin committed Mar 14, 2018
    The Kibana container can take around 15 minutes to build on
    DigitalOcean, so this commit increases the boot timeout for all
    containers to 20 minutes.
    
    The container takes so long to build because downloading the plugin can
    take a long time (I saw 7 minutes when testing) since the plugin is
    hosted in an S3 bucket in us-east, but the DigitalOcean VM is running in
    SFO.
    
    Additionally, the "optimization" step of the plugin installation takes
    multiple minutes due to a Kibana implementation detail. This is true on
    all of the cloud providers. See
    https://discuss.elastic.co/t/x-pack-not-installed-for-kibana-6-1-0/111840
    for more information.
Commits on Mar 12, 2018
  1. vendor: Remove unused dependencies

    kklin committed Mar 9, 2018
  2. kelda: Replace scheduler with Kubernetes

    kklin committed Jan 6, 2018
    Before, we had a custom scheduler for ensuring the desired containers
    were running. While it is not too difficult to implement a basic
    scheduler, moving to Kubernetes will allow us to iterate more quickly on
    features such as volumes since Kubernetes already implements much of the
    logic for actually creating and managing volumes -- we will be able to
    just focus on how to express volumes in the blueprint language.
    
    Furthermore, this change could allow us, in the future, to run on
    top of an existing Kubernetes cluster. This will be helpful given the
    general movement in industry towards using Kubernetes.
    
    At a high level, this commit encompasses four major changes:
    1) It modifies the deployment engine to boot a Kubernetes cluster on the
    Kelda machines.
    2) It replaces the custom scheduler with code that translates the
    containers specified by the user into deployments to be enforced by the
    Kubernetes cluster.
    3) It replaces the Docker network plugin with a CNI network plugin
    compatible with Kubernetes.
    4) It replaces Vault with the built-in Kubernetes solution for
    secret management. It's simpler to just use the Kubernetes solution, and
    we weren't making use of any of the more advanced Vault features.
  3. minion: Use blueprint row in database

    kklin committed Mar 5, 2018
    This commit makes it so the minion stores the user's blueprint in the
    Blueprint database table rather than as a field in the Minion table.
    Before, the Blueprint database table was only used by `kelda daemon`.
    
    This commit also makes it so every time the blueprint is updated, the
    minion's dependent rows are also updated. This way, on the leader
    minion, there's never a case where containers in the database reference
    volumes in the blueprint that haven't been committed to the database
    yet. While this won't solve the current log test failure because it's
    still possible for the blueprint to be inconsistent on the workers
    (since the workers don't derive their containers from the blueprint,
    they read them from etcd), this consistency is necessary for the
    Kubernetes patch.
  4. cloud: Prefer to leave a floating IP in place when possible

    ejj committed Mar 8, 2018
    Before this patch, the cloud preferred to join machines with known
    roles even if that meant reassigning a floating IP away from a machine
    with an unknown role to a different worker with a known role.  This
    meant that on boot (or namespace change), a perfectly healthy machine
    could have its floating IP taken away from it, simply because one of
    its peers declared its role to the daemon before it got a chance to.
    
    There's really no particular reason for this preference, so this patch
    fixes the issue by reversing it.  All things being equal, we will
    prefer to match a machine with the correct floating IP, even if that
    means its role is unknown.
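    A minimal sketch of the reversed preference, expressed as a score for pairing a running machine with a desired one. The real cloud join's matching logic is not shown in the commit; `matchScore` and its weights are hypothetical, chosen only so that a floating IP match outweighs a role match.

```go
package main

// matchScore rates how well a running machine fits a desired machine.
// Keeping the floating IP in place is worth more than a matching role, so a
// healthy machine keeps its IP even if its role is still unknown.
func matchScore(ipMatches, roleMatches bool) int {
	score := 0
	if ipMatches {
		score += 2
	}
	if roleMatches {
		score++
	}
	return score
}
```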
Commits on Mar 9, 2018
  1. README: Add logo

    luise committed Mar 9, 2018
Commits on Mar 8, 2018
  1. engine: Don't modify placement rules if blueprint is reprocessed

    kklin committed Feb 27, 2018
    Before, if the leader reprocessed the same blueprint (as happens when
    the daemon restarts) the engine would sometimes modify the placement
    rules in the database. This was because the `portPlacements` function
    made use of the fact that the order of TargetContainer and
    OtherContainer doesn't matter:  if there was an exclusive placement, it
    would create one rule, using the first referenced container in the given
    connections as the TargetContainer. Therefore, if the order of
    connections passed to `portPlacements` changed, the resulting placement
    rules might have swapped TargetContainers and OtherContainers. Note that
    it's expected for the order to change since the connections are read
    from the database, which returns rows in a non-deterministic order.
    
    This commit addresses this problem by creating two placement rules for
    each exclusive placement:  one rule for each affected container, with
    it as the TargetContainer.
    
    Although this placement swapping didn't result in incorrect behavior in
    the current scheduler implementation, it will be problematic in the
    Kubernetes implementation. This is because the placement information is
    part of the deployment spec we pass to Kubernetes, so if the placement
    rules move from one container to another, their deployment specs will
    change, and Kubernetes will restart the changed deployments.
    
    Furthermore, defining placement rules for both affected containers will
    make the Kubernetes deployment specs easier to debug. Looking at the
    placement rules for a given deployment will be sufficient to understand
    its placement constraints:  there's no possibility that another
    deployment has a placement rule that references the container.
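    A minimal sketch of emitting one rule per affected container, assuming a simplified rule type. `PlacementRule` and `exclusivePlacementRules` are hypothetical stand-ins for the engine's database rows and `portPlacements`; the pair is ordered first so the output is identical no matter which order the connections were read in.

```go
package main

// PlacementRule is a hypothetical, simplified placement row.
type PlacementRule struct {
	TargetContainer string
	OtherContainer  string
	Exclusive       bool
}

// exclusivePlacementRules returns two rules for an exclusive placement, one
// with each container as the TargetContainer. Sorting the pair first makes
// the result independent of argument order, so reprocessing the same
// blueprint never swaps targets.
func exclusivePlacementRules(a, b string) []PlacementRule {
	if b < a {
		a, b = b, a
	}
	return []PlacementRule{
		{TargetContainer: a, OtherContainer: b, Exclusive: true},
		{TargetContainer: b, OtherContainer: a, Exclusive: true},
	}
}
```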
Commits on Mar 7, 2018
  1. docs: Fix dockerfile language tag

    luise committed Mar 6, 2018
    Highlight.js uses "dockerfile" rather than "docker".
  2. bindings: Make getConnectableName() private

    luise committed Mar 6, 2018
    These functions are only used internally, so having them in the
    documentation is probably more confusing than helpful.
  3. docs: Add class properties in JSDoc

    luise committed Mar 6, 2018
    Users might sometimes want to interact with certain properties of the
    created JavaScript objects, but we previously didn't document what
    properties were available. With this commit we document the properties
    that users might want to access or edit. To avoid confusing users, we
    leave out the properties they shouldn't have to interact with.