ovn: Design and Schema changes for Container integration.
The design was come up after inputs and discussions with multiple
people, including (in alphabetical order) Aaron Rosen, Ben Pfaff,
Ganesan Chandrashekhar, Justin Pettit, Russell Bryant and Somik Behera.

Signed-off-by: Gurucharan Shetty <gshetty@nicira.com>
Acked-by: Russell Bryant <rbryant@redhat.com>
Acked-by: Ben Pfaff <blp@nicira.com>
shettyg committed Mar 30, 2015
1 parent a416ff2 commit 9fb4636
Showing 7 changed files with 387 additions and 43 deletions.
121 changes: 121 additions & 0 deletions ovn/CONTAINERS.OpenStack.md
@@ -0,0 +1,121 @@
Integration of Containers with OVN and OpenStack
------------------------------------------------

Isolation between containers is weaker than isolation between VMs, so
some environments deploy containers for different tenants in separate
VMs as an additional security measure. This document describes the creation
of containers inside VMs and how they can securely be made part of logical
networks. The created logical network can include VMs, containers and
physical machines as endpoints. To better understand the proposed integration
of containers with OVN and OpenStack, this document describes the end to end
workflow with an example.

* An OpenStack tenant creates a VM (say VM-A) with a single network interface
that belongs to a management logical network. The VM is meant to host
containers. OpenStack Nova chooses the hypervisor on which VM-A is created.

* A Neutron port may have been created in advance and passed in to Nova
with the request to create a new VM. If not, Nova will issue a request
to Neutron to create a new port. The ID of the logical port from
Neutron will also be used as the vif-id for the virtual network
interface (VIF) of VM-A.
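
A sketch of this step using the 2015-era CLIs (the network, image, and flavor
names are illustrative, not taken from this commit):

```
# Create the port first; $PORT_ID is the ID from port-create's output.
% neutron port-create --name vm-a-port management-net
% nova boot --image ubuntu-14.04 --flavor m1.small \
  --nic port-id=$PORT_ID VM-A
```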

* When VM-A is created on a hypervisor, its VIF gets added to the
Open vSwitch integration bridge. This creates a row in the Interface table
of the Open_vSwitch database. As explained in the [IntegrationGuide.md],
the vif-id associated with the VM network interface gets added in the
external_ids:iface-id column of the newly created row in the Interface table.
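
For illustration, the equivalent manual step looks something like this (the
port name is invented for the sketch):

```
# Attach the VIF and record the Neutron port ID as its iface-id.
% ovs-vsctl add-port br-int tap-vm-a -- \
  set Interface tap-vm-a external_ids:iface-id=$NEUTRON_PORT_ID
```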

* Since VM-A belongs to a logical network, it gets an IP address. This IP
address is used to spawn containers (either manually or through container
orchestration systems) inside that VM and to monitor the health of the
created containers.

* The vif-id associated with the VM's network interface can be obtained by
making a call to Neutron using tenant credentials.
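
For example, assuming the neutron CLI's generic field filters, the port (and
hence the vif-id) can be looked up by the VM's instance ID:

```
# $VM_A_INSTANCE_ID is the Nova instance UUID (illustrative).
% neutron port-list -- --device_id=$VM_A_INSTANCE_ID
```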

* This flow assumes a component called a "container network plugin".
If you take Docker as an example for containers, you could envision
the plugin to be either a wrapper around Docker or a feature of Docker itself
that understands how to perform part of this workflow to get a container
connected to a logical network managed by Neutron. The rest of the flow
refers to this logical component that does not yet exist as the
"container network plugin".

* All the calls to Neutron will need tenant credentials. These calls can
either be made from inside the tenant VM as part of a container network plugin
or from outside the tenant VM (if the tenant is not comfortable using temporary
Keystone tokens from inside the tenant VMs). For simplicity, this document
explains the work flow using the former method.

* The container hosting VM will need Open vSwitch installed in it. The only
work for Open vSwitch inside the VM is to tag network traffic coming from
containers.
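
In the simplest case, that amounts to one local bridge inside VM-A (the
bridge name is illustrative):

```
% ovs-vsctl add-br br-int
```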

* When a container needs to be created inside the VM with a container network
interface that is expected to be attached to a particular logical switch, the
network plugin in that VM chooses any unused VLAN (this VLAN tag only needs to
be unique inside that VM; this limits the number of container interfaces to
4096 inside a single VM). This VLAN tag is stripped out in the hypervisor
by OVN and is only useful as a context (or metadata) for OVN.

* The container network plugin then makes a call to Neutron to create a
logical port. In addition to all the inputs currently needed for a Neutron
port-creation call, it sends the vif-id and the VLAN tag as inputs.

* Neutron in turn will verify that the vif-id belongs to the tenant in question
and then use the OVN-specific plugin to create a new row in the Logical_Port
table of the OVN Northbound Database. Neutron responds with an
IP address and MAC address for that network interface. Neutron thus becomes
the IPAM system, providing unique IP and MAC addresses across VMs and
containers in the same logical network.
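
Conceptually, the plugin's write to the OVN Northbound database amounts to a
transaction like the following (a hand-rolled sketch using ovsdb-client with
placeholder IDs; a real deployment would go through the Neutron OVN plugin):

```
% ovsdb-client transact '["OVN_Northbound",
    {"op": "insert",
     "table": "Logical_Port",
     "row": {"lswitch": ["uuid", "'$LSWITCH_UUID'"],
             "name": "'$CONTAINER_PORT_ID'",
             "parent_name": "'$VIF_ID'",
             "tag": 42}}]'
```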

* The Neutron API call above to create a logical port for the container
could add a relatively significant amount of time to container creation.
However, an optimization is possible here. Logical ports could be
created in advance and reused by the container system doing container
orchestration. Additional Neutron API calls would only be needed if the
port needs to be attached to a different logical network.

* When a container is eventually deleted, the network plugin in that VM
may make a call to Neutron to delete that port. Neutron in turn will
delete the entry in the Logical_Port table of the OVN Northbound Database.
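
With the plain neutron CLI, that deletion is simply:

```
% neutron port-delete $CONTAINER_PORT_ID
```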

As an example, consider Docker containers. Since Docker currently does not
have a network plugin feature, this example uses a hypothetical wrapper
around Docker to make calls to Neutron.

* Create a logical switch, e.g.:

```
% ovn-docker --cred=cca86bd13a564ac2a63ddf14bf45d37f create network LS1
```

The above command will make a call to Neutron with the credentials to create
a logical switch. This step is optional if the logical switch has already
been created from outside the VM.

* List networks available to the tenant.

```
% ovn-docker --cred=cca86bd13a564ac2a63ddf14bf45d37f list networks
```

* Create a container and attach an interface to the previously created switch
as a logical port.

```
% ovn-docker --cred=cca86bd13a564ac2a63ddf14bf45d37f --vif-id=$VIF_ID \
--network=LS1 run -d --net=none ubuntu:14.04 /bin/sh -c \
"while true; do echo hello world; sleep 1; done"
```

The above command will make a call to Neutron with all the inputs it currently
needs to create a logical port. In addition, it passes the $VIF_ID and an
unused VLAN. Neutron will add that information in OVN and return
a MAC address and IP address for that interface. ovn-docker will then create
a veth pair, insert one end inside the container as 'eth0' and attach the
other end to a local OVS bridge as an access port on the chosen VLAN.
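
A sketch of that plumbing step as it might run inside VM-A (interface names,
the bridge name, and the netns handle are illustrative):

```
# Create the veth pair and move one end into the container as eth0.
% ip link add veth-c1 type veth peer name veth-c1-in
% ip link set veth-c1-in netns $CONTAINER_NETNS name eth0
% ip link set veth-c1 up
# Attach the host end to the local bridge as an access port on the VLAN.
% ovs-vsctl add-port br-int veth-c1 tag=$VLAN
```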

[IntegrationGuide.md]:IntegrationGuide.md
4 changes: 3 additions & 1 deletion ovn/automake.mk
@@ -74,7 +74,9 @@ SUFFIXES += .xml
$(AM_V_GEN)$(run_python) $(srcdir)/build-aux/xml2nroff \
--version=$(VERSION) $< > $@.tmp && mv $@.tmp $@

EXTRA_DIST += ovn/TODO
EXTRA_DIST += \
ovn/TODO \
ovn/CONTAINERS.OpenStack.md

# ovn IDL
OVSIDL_BUILT += \
186 changes: 165 additions & 21 deletions ovn/ovn-architecture.7.xml
@@ -149,8 +149,9 @@
<li>
<code>ovn-controller</code>(8) is OVN's agent on each hypervisor and
software gateway. Northbound, it connects to the OVN Database to learn
about OVN configuration and status and to populate the PN table and the
<code>chassis</code> column in the <code>Bindings</code> table with the
hypervisor's status. Southbound, it connects to
<code>ovs-vswitchd</code>(8) as an OpenFlow controller, for control over
network traffic, and to the local <code>ovsdb-server</code>(1) to allow
it to monitor and control Open vSwitch configuration.
@@ -258,6 +259,12 @@
understand. Here's an example.
</p>

<p>
A VIF on a hypervisor is a virtual network interface attached either
to a VM or to a container running directly on that hypervisor (this is
different from the interface of a container running inside a VM).
</p>

<p>
The steps in this example refer often to details of the OVN and OVN
Northbound database schemas. Please see <code>ovn</code>(5) and
@@ -288,7 +295,10 @@
rows to the OVN database <code>Pipeline</code> table to reflect the new
port, e.g. add a flow to recognize that packets destined to the new
port's MAC address should be delivered to it, and update the flow that
delivers broadcast and multicast packets to include the new port. It
also creates a record in the <code>Bindings</code> table and populates
all its columns except the column that identifies the
<code>chassis</code>.
</li>

<li>
@@ -316,24 +326,25 @@
notices <code>external-ids</code>:<code>iface-id</code> in the new
Interface. In response, it updates the local hypervisor's OpenFlow
tables so that packets to and from the VIF are properly handled.
Afterward, in the OVN DB, it updates the <code>Bindings</code> table's
<code>chassis</code> column for the row that links the logical port
from <code>external-ids</code>:<code>iface-id</code> to the hypervisor.
</li>

<li>
Some CMS systems, including OpenStack, fully start a VM only when its
networking is ready. To support this, <code>ovn-nbd</code> notices the
updated <code>chassis</code> column for the row in the <code>Bindings</code>
table and pushes this upward by updating the <ref column="up"
table="Logical_Port" db="OVN_NB"/> column in the OVN Northbound
database's <ref table="Logical_Port" db="OVN_NB"/> table to indicate
that the VIF is now up. The CMS, if it uses this feature, can then
react by allowing the VM's execution to proceed.
</li>

<li>
On every hypervisor but the one where the VIF resides,
<code>ovn-controller</code> notices the completely populated row in the
<code>Bindings</code> table. This provides <code>ovn-controller</code>
the physical location of the logical port, so each instance updates the
OpenFlow tables of its switch (based on logical datapath flows in the OVN
@@ -350,16 +361,16 @@
<li>
On the hypervisor where the VM was powered on,
<code>ovn-controller</code> notices that the VIF was deleted. In
response, it removes the <code>chassis</code> column content in the
<code>Bindings</code> table for the logical port.
</li>

<li>
On every hypervisor, <code>ovn-controller</code> notices the empty
<code>chassis</code> column in the <code>Bindings</code> table's row
for the logical port. This means that <code>ovn-controller</code> no
longer knows the physical location of the logical port, so each instance
updates its OpenFlow table to reflect that.
</li>

<li>
@@ -376,8 +387,8 @@
<li>
<code>ovn-nbd</code> receives the OVN Northbound update and in turn
updates the OVN database accordingly, by removing or updating the
rows from the OVN database <code>Pipeline</code> table and
<code>Bindings</code> table that were related to the now-destroyed VIF.
</li>

<li>
@@ -390,4 +401,137 @@
</li>
</ol>

<h2>Life Cycle of a container interface inside a VM</h2>

<p>
OVN provides virtual network abstractions by converting information
written in the OVN_NB database to OpenFlow flows in each hypervisor. Secure
virtual networking for multiple tenants can only be provided if
<code>ovn-controller</code> is the only entity that can modify flows in
Open vSwitch. When the Open vSwitch integration bridge resides in the
hypervisor, it is a fair assumption that tenant workloads running inside
VMs cannot make any changes to Open vSwitch flows.
</p>

<p>
If the infrastructure provider trusts the applications inside the
containers not to break out and modify the Open vSwitch flows, then
containers can be run directly on hypervisors. This is also the case when
containers are run inside VMs and the Open vSwitch integration bridge,
with flows added by <code>ovn-controller</code>, resides in the same VM.
For both of the above cases, the workflow is the same as explained with an
example in the previous section ("Life Cycle of a VIF").
</p>

<p>
This section talks about the life cycle of a container interface (CIF)
when containers are created in the VMs and the Open vSwitch integration
bridge resides inside the hypervisor. In this case, even if a container
application breaks out, other tenants are not affected because the
containers running inside the VMs cannot modify the flows in the
Open vSwitch integration bridge.
</p>

<p>
When multiple containers are created inside a VM, there are multiple
CIFs associated with them. The network traffic associated with these
CIFs needs to reach the Open vSwitch integration bridge running in the
hypervisor for OVN to support virtual network abstractions. OVN should
also be able to distinguish network traffic coming from different CIFs.
There are two ways to distinguish network traffic of CIFs.
</p>

<p>
One way is to provide one VIF for every CIF (1:1 model). This means that
there could be a lot of network devices in the hypervisor. This would slow
down OVS because of all the additional CPU cycles needed for the management
of all the VIFs. It would also mean that the entity creating the
containers in a VM should also be able to create the corresponding VIFs in
the hypervisor.
</p>

<p>
The second way is to provide a single VIF for all the CIFs (1:many model).
OVN could then distinguish network traffic coming from different CIFs via
a tag written in every packet. OVN uses this model, with VLAN as
the tagging mechanism.
</p>

<ol>
<li>
A CIF's life cycle begins when a container is spawned inside a VM,
either by the same CMS that created the VM, by a tenant that owns that VM,
or even by a container orchestration system different from the CMS that
initially created the VM. Whichever entity it is, it will need to know
the <var>vif-id</var> associated with the network interface of the VM
through which the container interface's network traffic is expected to go.
The entity that creates the container interface will also need to choose
an unused VLAN inside that VM.
</li>

<li>
The container spawning entity (either directly or through the CMS that
manages the underlying infrastructure) updates the OVN Northbound
database to include the new CIF, by adding a row to the
<code>Logical_Port</code> table. In the new row, <code>name</code> is
any unique identifier, <code>parent_name</code> is the <var>vif-id</var>
of the VM through which the CIF's network traffic is expected to go,
and <code>tag</code> is the VLAN tag that identifies the
network traffic of that CIF.
</li>

<li>
<code>ovn-nbd</code> receives the OVN Northbound database update. In
turn, it makes the corresponding updates to the OVN database, by adding
rows to the OVN database's <code>Pipeline</code> table to reflect the new
port and also by creating a new row in the <code>Bindings</code> table
and populating all its columns except the column that identifies the
<code>chassis</code>.
</li>

<li>
On every hypervisor, <code>ovn-controller</code> subscribes to the
changes in the <code>Bindings</code> table. When a new row is created
by <code>ovn-nbd</code> that includes a value in the
<code>parent_port</code> column of the <code>Bindings</code> table, the
<code>ovn-controller</code> on the hypervisor whose OVN integration bridge
has that same <var>vif-id</var> in
<code>external-ids</code>:<code>iface-id</code> updates the local
hypervisor's OpenFlow tables so that packets to and from the VIF with the
particular VLAN <code>tag</code> are properly handled. Afterward, it
updates the <code>chassis</code> column of the <code>Bindings</code> table
row to reflect the physical location.
</li>

<li>
One can only start the application inside the container after the
underlying network is ready. To support this, <code>ovn-nbd</code>
notices the updated <code>chassis</code> column in the <code>Bindings</code>
table and updates the <ref column="up" table="Logical_Port"
db="OVN_NB"/> column in the OVN Northbound database's
<ref table="Logical_Port" db="OVN_NB"/> table to indicate that the
CIF is now up. The entity responsible for starting the container application
queries this value and starts the application.
</li>

<li>
Eventually, the entity that created and started the container stops it.
The entity, through the CMS (or directly), deletes its row in the
<code>Logical_Port</code> table.
</li>

<li>
<code>ovn-nbd</code> receives the OVN Northbound update and in turn
updates the OVN database accordingly, by removing or updating the
rows from the OVN database <code>Pipeline</code> table that were related
to the now-destroyed CIF. It also deletes the row in the
<code>Bindings</code> table for that CIF.
</li>

<li>
On every hypervisor, <code>ovn-controller</code> receives the
<code>Pipeline</code> table updates that <code>ovn-nbd</code> made in the
previous step. <code>ovn-controller</code> updates OpenFlow tables to
reflect the update.
</li>
</ol>
</manpage>
6 changes: 6 additions & 0 deletions ovn/ovn-nb.ovsschema
@@ -17,6 +17,12 @@
"refTable": "Logical_Switch",
"refType": "strong"}}},
"name": {"type": "string"},
"parent_name": {"type": {"key": "string", "min": 0, "max": 1}},
"tag": {
"type": {"key": {"type": "integer",
"minInteger": 0,
"maxInteger": 4095},
"min": 0, "max": 1}},
"macs": {"type": {"key": "string",
"min": 0,
"max": "unlimited"}},
