
hub module creates loops upon link failure / recovery #26

Open
colin-scott opened this Issue Jan 20, 2014 · 30 comments


Hey,

We found what appears to be a bug in pyretic's hub module (proactive0 mode) while running some experiments.

We discovered a loop in the network while fuzz testing pyretic's pyretic.modules.hub module using STS on a 3-switch mesh topology.

After minimizing the trace generated from fuzz testing, we found that the events that triggered the loop were a link failure followed by a link recovery.

We took a brief look at pyretic's code, and we believe the root cause is related to the invocation at line 450 of pyretic/core/network.py:

self.reconcile_attributes(topology) # Root Cause?

Our understanding of this bug is that reconcile_attributes neglects to filter out down links, which ultimately causes the MST to be computed improperly. For example, on a 3-switch mesh, suppose pyretic initially computes the MST s1 <-> s2 <-> s3. Then link s1 <-> s2 goes down, so pyretic recomputes a new MST s1 <-> s3 <-> s2. Then the link comes back up. reconcile_attributes neglects to account for the failed/recovered link, so we end up with flow entries in the network s1 <-> s3 <-> s2 <-> s1, which forms a loop. The loop persists for some time, until other events in the network cause pyretic to correct its mistake, or until pyretic periodically flushes all flow entries.
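To make the suspected fix concrete, here is a minimal sketch in plain Python. None of this is pyretic's actual code; the function and variable names are invented. The point is just that the spanning-tree computation should consider only links that are currently up, so a failure/recovery cycle always produces a fresh, consistent tree:

```python
# Hypothetical sketch (invented names, not pyretic's code): a union-find
# spanning tree that simply skips links marked down, so a stale edge can
# never survive a failure/recovery cycle.
def spanning_tree(switches, links, down):
    """Spanning-tree edges over undirected `links`, ignoring `down` ones."""
    parent = {s: s for s in switches}

    def find(x):
        # Union-find root lookup with path halving.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    tree = []
    for a, b in links:
        if (a, b) in down or (b, a) in down:
            continue  # the proposed fix: filter out failed links
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[ra] = rb
            tree.append((a, b))
    return tree
```

On the 3-switch mesh, spanning_tree([1, 2, 3], [(1, 2), (2, 3), (1, 3)], {(1, 2)}) yields the two up edges, and recomputing with an empty down set after recovery yields a fresh two-edge tree instead of the three-edge union that forms the loop.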

Can you verify that this is indeed a bug?

Steps for reproducing:

First, we need to fix a minor issue. Unfortunately, STS depends on a different version of POX than pyretic, both of which will be in PYTHONPATH. In pyretic.py, hardcode poxpath rather than searching PYTHONPATH, e.g.:

171         poxpath = '/home/mininet/pox'
172         #for p in output.split(':'):
173         #     if re.match('.*pox/?$',p):
174         #         poxpath = os.path.abspath(p)
175         #         break

Then clone STS and replay the minimized trace:

$ git clone git://github.com/ucb-sts/sts.git
$ cd sts
$ git clone -b frenetic_test git://github.com/ucb-sts/pox.git
$ (git submodule init && git submodule update && cd sts/hassel/hsa-python && source setup.sh)
$ git clone git://github.com/ucb-sts/experiments.git
# Assumes pyretic is at ../pyretic/
$ ./simulator.py -c experiments/new_pyretic_loop_mcs/replay_config.py

Throughout the replay you can see both STS's and pyretic's console output (pyretic's is in orange). You can also view the console output offline at experiments/new_pyretic_loop_mcs_replay/simulator.out

At the end of the replay you can examine the flow entries to see the loop:

STS [next] >show_flows 1
--------------------------------------------------------------------------------------------------------------------------------
|  Prio | in_port | dl_type | dl_src | dl_dst | nw_proto | nw_src | nw_dst | tp_src | tp_dst |                         actions |
--------------------------------------------------------------------------------------------------------------------------------
...
| 59995 |       2 |    None |   None |   None |     None |   None |   None |   None |   None |   output(1), IN_PORT, output(3) |
--------------------------------------------------------------------------------------------------------------------------------
STS [next] >show_flows 2
--------------------------------------------------------------------------------------------------------------------------------
|  Prio | in_port | dl_type | dl_src | dl_dst | nw_proto | nw_src | nw_dst | tp_src | tp_dst |                         actions |
--------------------------------------------------------------------------------------------------------------------------------
...
| 59999 |       1 |    None |   None |   None |     None |   None |   None |   None |   None |            output(2), output(3) |
--------------------------------------------------------------------------------------------------------------------------------
STS [next] >show_flows 3
---------------------------------------------------------------------------------------------------------------------
|  Prio | in_port | dl_type | dl_src | dl_dst | nw_proto | nw_src | nw_dst | tp_src | tp_dst |              actions |
---------------------------------------------------------------------------------------------------------------------
| 59995 |    None |    None |   None |   None |     None |   None |   None |   None |   None | output(1), output(3) |
---------------------------------------------------------------------------------------------------------------------

You can also examine the network topology:

STS [next] > topology.network_links
[(1:1) -> (3:2), (1:2) -> (2:2), (3:2) -> (1:1), (2:2) -> (1:2), (2:1) -> (3:1), (3:1) -> (2:1)]

And verify that the invariant violation is there:

STS [next] > inv loops

Thanks!

Member

joshreich commented Jan 20, 2014

Hi Colin,

Thanks so much for reporting this! It is rare we get a bug report that is so clearly and completely documented (but I guess that's the whole point of STS ;-). I'm traveling today and tomorrow but will try to look into it by Wed. evening, though it's possible another member of the team will have time before I do.

Cheers,
-Josh

I also just noticed this on one of delta debugging's iterations for the original trace:

[c1]   File "/home/mininet/pyretic/of_client/pox_client.py", line 512, in _handle_ConnectionUp
[c1]     assert event.dpid not in self.switches
[c1] AssertionError

Any chance you could verify that this is a bug before Friday? I'm guessing you're probably working on your own submission, so I understand if you don't have time, but hopefully this shouldn't take too long.

Thanks!
-Colin

Member

joshreich commented Jan 28, 2014

Hi Colin,

I thought @nkatta's response had been recorded here, but I see that it hasn't. I'm inlining it below; it verifies that you have detected at least one bug in our topology detection code. I've also assigned this issue to @nkatta and @SiGe, who will hopefully look into the assertion error above (you are right, I'm pushing for an end-of-the-week deadline myself), though I'm pretty sure it's a genuine bug as well.


Hi guys,

Omid and I were looking at the origins of the bug and the following seems to be the case :

Pyretic is basically installing rules that forward packets on non-existent links: even before the network topology is discovered, we install rules that flood packets on all the ports of a switch. This ends up creating a loop in the installed policy. Eliminating such loops during the bootstrap phase seems particularly tricky until the entire topology has been discovered. So one possible solution is to wait (with the help of a timer, as a best-effort measure) until we discover the topology by sending LLDP packets.

-Naga
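For illustration only, Naga's timer idea could be sketched like this (all names invented; this is not pyretic code, just a best-effort gate that delays rule installation until LLDP discovery has been quiet for a settle window):

```python
import time

# Hypothetical best-effort bootstrap gate: hold off installing flood
# rules until no new link has been discovered for `settle_seconds`.
class BootstrapGate:
    def __init__(self, settle_seconds=5.0):
        self.settle_seconds = settle_seconds
        self.last_discovery = time.time()

    def link_discovered(self):
        # Call whenever an LLDP probe reveals a new link; this restarts
        # the quiet window.
        self.last_discovery = time.time()

    def safe_to_install(self):
        # True once discovery has been quiet for a full settle window.
        # Best effort only: a slow link may still be found later.
        return time.time() - self.last_discovery >= self.settle_seconds
```

As Naga notes, this is best effort: if discovery outlasts the timer, the gate opens too early and the same loop can still appear.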


@ghost ghost assigned nkatta and SiGe Jan 28, 2014

Awesome, thanks guys!

Member

joshreich commented Jan 28, 2014

very welcome and good luck! (it looks like you are doing really neat stuff :-)

Member

nkatta commented Jan 28, 2014

Hi Colin,

My take on this is that there is a bug (in the Header Space Analysis sense): Pyretic installs a policy that can make packets loop when topology changes occur, so we should make sure we do not install such rules. A "strict" solution would be to treat ports that are not linked to another port (by topology discovery) as "down" ports and never forward packets on them. But that would mean we cannot forward packets to end hosts (because hosts do not respond to LLDP packets), which we do not want. A "non-strict" alternative is to treat the topology bootstrap process differently: set a timer, wait until the entire topology is discovered, and only then produce a policy to install on the network. However, this might not solve the problem entirely, because the topology may not be fully discovered before the timer expires.

The question is: can we do away with this bug entirely? The answer seems to be "no". There is always a timer that decides when you consider the topology fully discovered via LLDP packets, after which you install your policy on the switches. Discovery may not complete before the timer expires, so one can always produce a packet that either ends up in a loop (the "non-strict" way) or isolates end hosts entirely (the "strict" way). My guess is that every other controller out there has a similar issue, because all of them must depend on timers of some sort for topology discovery.

Thanks
Naga

Makes sense. But I'm not sure that the distinction between "learning the topology" and "topology has been learned" is meaningful, simply because the topology can change at any time (i.e. links can go down or come up at any point), and there is a delay between those changes and when the controller learns about them. In other words, the controller is always bootstrapping its knowledge -- it never really stops.

If we place a priori restrictions on which ports hosts can be attached to, would you be able to avoid the loops?

Member

SiGe commented Jan 29, 2014

Colin, I have a question regarding the show_flows command.

We always install higher-priority rules first (Wireshark confirms this as well), but in the show_flows output some of the lower-priority rules are installed while the higher-priority ones are missing. I am not quite sure how this might happen. Do you have any insights that might help?


Member

nkatta commented Jan 29, 2014

@colin-scott I sort of abused the terminology there -- it is indeed not meaningful to say that I have learned the topology at some stage (as the topology does evolve all the time). What I meant was that there should be some way to recognize where the hosts are connected as you were suggesting. That should help avoid loops.

If you run sts in verbose mode (pass -v on the command line) it should print the flow table of each switch whenever a flow_mod arrives. You could trace through the console output to see when the higher priority flow_mods are installed or uninstalled.


Member

princedpw commented Jan 29, 2014

We might sit down and try to think of a principled solution to this problem in an abstract setting. Try to give precise definitions of "host port", "internal port", and "inactive port" based on packets seen arriving at such ports (or by external fiat). Then enforce constraints on policy or rule installation based on those definitions.

Dave
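As a hypothetical illustration of this proposal (all names here are invented, not pyretic API), the three port classes and the resulting installation constraint might look like:

```python
# "internal": LLDP discovered a peer; "host": declared by external fiat;
# "inactive": neither, so (under the strict policy) never flood there.
INTERNAL, HOST, INACTIVE = "internal", "host", "inactive"

def classify(port, lldp_peers, host_ports):
    if port in lldp_peers:
        return INTERNAL
    if port in host_ports:
        return HOST
    return INACTIVE

def flood_ports(ports, lldp_peers, host_ports):
    """Ports a flood rule may legally use under the strict policy."""
    return [p for p in ports
            if classify(p, lldp_peers, host_ports) != INACTIVE]
```

Declaring host ports by fiat sidesteps the problem that hosts never answer LLDP, at the cost of requiring that a priori knowledge.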


AghOmid commented Jan 29, 2014

Hello Colin,

It seems like the order in which the OpenFlow rules get installed is not the same as the order in which we sent them to the switch. Is there anything on the STS side that might cause this?

Adding a timer in between rule installations seems to fix the loop invariant violation. Does STS reorder the FlowMods in some way? [Wireshark shows the same ordering, while STS shows a different ordering when rules are installed.]

Wireshark: the first flow mod that was sent: 60000
STS: the first flow mod that was installed: 59996


Member

princedpw commented Jan 29, 2014

We need consistent updates.


Owner

jnfoster commented Jan 29, 2014

"We might sit down and try to think of a principled solution to this
problem"

"we need consistent updates."

Indeed. Party like it's January 2012.

-N


Member

SiGe commented Jan 29, 2014

@princedpw @jnfoster Haha, I guess this brings back memories.

@princedpw We need consistent updates, but I feel like this problem goes a bit beyond that. It seems like either STS or the software switch is not preserving the order in which flow mods are installed. Doesn't TCP already guarantee that packets are received in order?

Collaborator

reitblatt commented Jan 29, 2014

Messages are guaranteed to be delivered in order via TCP, but OF does not require that messages be processed in order. This is to allow switches to optimally reschedule messages depending upon their HW limitations. For example, a switch may not be able to process two FLOW_MODs that match on IP simultaneously, but it can install an ethernet FLOW_MOD and an IP FLOW_MOD simultaneously (because they hit different HW tables). If message ordering matters, then you have to use a BARRIER.
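A toy model may help to see why barriers matter here. This is not a real OF agent, just an illustration: FLOW_MODs queued between barriers may complete in any order (modeled, for determinism, as ascending priority), while a barrier forces everything queued so far to complete first:

```python
class ToySwitch:
    def __init__(self):
        self.pending = []   # FLOW_MODs sent but not yet completed
        self.table = []     # rules in the order they completed

    def flow_mod(self, prio):
        self.pending.append(prio)

    def barrier(self):
        # Everything sent before the barrier must complete now.  Within
        # one barrier epoch the switch is free to reorder; we model the
        # worst case by completing in ascending priority order.
        self.table.extend(sorted(self.pending))
        self.pending.clear()

# Without an intervening barrier, priority 60000 sent first can still
# complete after 59996; with a barrier between them, order is preserved.
```

This matches the observation above: Wireshark sees 60000 sent first, yet the switch may install 59996 first unless a BARRIER separates them.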


Member

SiGe commented Jan 29, 2014

@reitblatt, I tried to send barrier messages in between install_rules. Rules still get installed in the wrong order. The only explanation that I could come up with is that something is changing the order that OF messages are getting sent.

Collaborator

reitblatt commented Jan 29, 2014

Must be a bug. If you send a BARRIER between each FLOW_MOD, then there should be no reordering. I noticed in experiments a while ago that wireshark missed some OF messages (in that case BARRIER_REQ). I never tracked down that bug.


@jnfoster Isn't it the case that consistent updates are intended only for planned changes? (whereas link failures are unplanned?)

@SiGe By default STS' switches should always install flow_mods in-order, immediately upon receiving them. There is an option to have them reorder flow_mods (between barrier_in's), but I can't think of why that would be enabled. You're sure that with verbose mode the flow_mods shown on the console show up in the wrong order?

You can also view the order in which the flow_mods were installed in the original trace with:

./tools/pretty_print_event_trace.py experiments/new_pyretic_loop_mcs/mcs.trace

Member

SiGe commented Jan 29, 2014

Colin,

Going back home now. I'll send you log files of sts/wireshark sometime tonight.

I'll also check the trace file.

Owner

jnfoster commented Jan 29, 2014

@jnfoster Isn't it the case that consistent updates are intended only for planned changes? (whereas link failures are unplanned)

Not at all! Semantically, a consistent update is just a sequence of instructions that possesses certain properties. There are many possible implementations. Some implementations, like two-phase update, probably work best with planned changes. But you could also implement a consistent update using other features, like fast-failover groups, that work well with unexpected situations like link failures.

The point is, with any consistent update, you would be sure that no loops are introduced as long as the old and new policies are loop-free.

-N
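For readers following along, here is a schematic of the two-phase variant (invented names, not a real controller API): internal rules are tagged with a policy version, ingress stamping flips atomically per switch, and old rules are retired only afterwards, so every packet sees exactly one policy end to end:

```python
class VersionedSwitch:
    """Minimal stand-in for a switch holding version-tagged rule sets."""
    def __init__(self, dpid):
        self.dpid = dpid
        self.rules = {}          # version -> list of rules
        self.ingress_version = 0

    def install(self, version, rules):
        self.rules[version] = rules

    def set_ingress_version(self, version):
        self.ingress_version = version

    def remove_versions_below(self, version):
        self.rules = {v: r for v, r in self.rules.items() if v >= version}

def two_phase_update(switches, new_version, new_rules):
    # Phase 1: add the new version's rules alongside the old ones.
    for sw in switches:
        sw.install(new_version, new_rules[sw.dpid])
    # Phase 2: stamp incoming packets with the new version (atomic per
    # switch), then garbage-collect old versions once no old-version
    # packets remain in flight.
    for sw in switches:
        sw.set_ingress_version(new_version)
    for sw in switches:
        sw.remove_versions_below(new_version)
```

If both the old and new policies are loop-free, no transient mixture of the two is ever reachable, which is exactly the guarantee needed for the MST recomputation case reported above.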

Gotcha, makes sense

mcanini commented Jan 29, 2014

What switch are you running on?
Certain OF agents are known to not implement barriers correctly.

-Marco


mcanini commented Jan 29, 2014

That is reminiscent of how STP works in a LAN. Each port goes through the learning phase before it is considered safe to forward traffic on it while guaranteeing loop freedom.

-Marco


We built our own. The latest version has been verified as OpenFlow 1.0-compliant by oftest.

STS buffers messages between the controllers and the switches in order to maintain causality during replay. By default, during replay STS should only allow messages that are functionally equivalent to messages in the original run. I'm guessing that what @SiGe is seeing is that some packets are being sent, but not immediately allowed through to the switches since they weren't in the original run. If you want to allow new messages through, you can do so by adding "allow_unexpected_messages=True" as a parameter to Replayer in experiments/new_pyretic_loop_mcs/replay_config.py

Member

SiGe commented Jan 29, 2014

@reitblatt Thanks Mark, that makes sense. Maybe what @colin-scott described explains why the barrier messages didn't "work".

@mcanini I am on 1.10.2. I believe this is the same version that @jnfoster was running.

@SiGe Incidentally, if you want to replay the trace exactly as it occurred originally, invoking

./simulator.py -c experiments/new_pyretic_loop_mcs/interactive_replay_config.py

will rerun the execution "headless", without pyretic, to show the state transitions the network went through.

Member

SiGe commented Jan 29, 2014

@colin-scott ,

Here's a summary of what I have so far.

With allow_unexpected_messages and @reitblatt's suggestion about adding a barrier after every flow_mod, the loop invariant violation error I was getting a few hours ago is gone.

However, I have started to see non-deterministic behavior. On subsequent runs, sometimes I get an error about a loop invariant violation (not the same as the first one, although with the same signature), and sometimes I don't. As @princedpw and @jnfoster suggested, this will probably be fixed by consistent updates.

I also think that @nkatta and @mcanini are correct in that we need to come up with a solution for identifying port status. Maybe until we "discover" the topology we should not install new rules.

I'll try to find an explanation tomorrow for why the violation is still happening.

P.S. You can check my simulation output here: https://github.com/SiGe/sts_output. It is a run with no loop invariant violation, but as I said, I got lucky in this run.
