New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

next: Changes queued for 2017.11 release #1250

Merged
merged 171 commits into from Dec 12, 2017

Conversation

Projects
None yet
8 participants
@lukego
Member

lukego commented Nov 17, 2017

This release will follow quickly on the heels of the much-delayed v2017.09 release.

alexandergall and others added some commits Jun 29, 2017

intel_mp: move shm initialization to constructor
Initialisation of the shm frame needs to be done in the constructor
method to avoid being overwritten by the frame created in core.app.

Add a copy of the mtu to the device-specific stats table, add a
counter to record the speed of the interface.
Fix MAC address registration in VMDq mode
Avoid using index sets and just query the registers since it's
hard to share this state among the apps. Also do the MAC registration
in the right place.
Remove RSS bit check for setting RETA
RSS is always enabled, so there shouldn't be a case in which
this is a no-op anyway. Also the check is complicated by the fact
that RSS can be enabled in many ways (with DCB, VMDq, etc.).
Move MAC pool enable code and make it cross-NIC
Also add code to unset MAC on stop()
Add code for enabling VLAN filtering/tagging
This code is mostly copied from the intel10g.lua driver.
Also add a new VLAN test and adjust old tests as needed.
Refactor vmdq test script
The previous iteration was unreliable for unknown reasons. This
refactoring should be clearer and run more reliably.
Enable VLAN tag stripping in VMDq mode
Also add new test for VLAN stripping
Set registers needed to make Tx test work
 * PFVFTE to enable Tx from appropriate VFs
 * RTTD1TC for bandwidth allocation algorithm for Tx
Fix 1q vmdq test and testrecv.lua
The former wasn't updated for an API change and the latter
was missing a module declaration.
Adjust VLAN test to avoid testing tag insertion
The tag insertion part of this test didn't work because the
MAC didn't match when the app tried to resend the packet. It's
not necessary since the transmit test covers this anyway.
@lukego

This comment has been minimized.

Show comment
Hide comment
@lukego

lukego Nov 21, 2017

Member

Huzzah! The new intel_mp driver is now adopted as the default and the intel10g driver (which represents some of the first lines of code ever written in Snabb) becomes a legacy backup option.

Great hacking to all the many intel_mp hackers !!! 👍

Member

lukego commented Nov 21, 2017

Huzzah! The new intel_mp driver is now adopted as the default and the intel10g driver (which represents some of the first lines of code ever written in Snabb) becomes a legacy backup option.

Great hacking to all the many intel_mp hackers !!! 👍

Reperator and others added some commits Nov 22, 2017

Fix file mode
`mode` was not passed to `io.open`.
Merge pull request #1228 from dpino/fix-get_mempolicy
Set mask.size equals to MAX_NUMNODES if lower
@lukego

This comment has been minimized.

Show comment
Hide comment
@lukego

lukego Dec 5, 2017

Member

Sorry about the wait. I wanted to recommend this for release now but I see a performance regression that needs to be resolved. I will first check if this is related to overuse of trace barriers in #1242.

Member

lukego commented Dec 5, 2017

Sorry about the wait. I wanted to recommend this for release now but I see a performance regression that needs to be resolved. I will first check if this is related to overuse of trace barriers in #1242.

@lukego

This comment has been minimized.

Show comment
Hide comment
@lukego

lukego Dec 6, 2017

Member

Confirmed that the performance regression is due to the JIT barrier. The lukego-optimize performance with that change reverted matched master again. I will test using fewer barriers (on entry or exit to an app, but not both) and see if that is better. Otherwise the JIT barrier might not be suitable (too expensive) for app/engine transitions.

Member

lukego commented Dec 6, 2017

Confirmed that the performance regression is due to the JIT barrier. The lukego-optimize performance with that change reverted matched master again. I will test using fewer barriers (on entry or exit to an app, but not both) and see if that is better. Otherwise the JIT barrier might not be suitable (too expensive) for app/engine transitions.

@lukego

This comment has been minimized.

Show comment
Hide comment
@lukego

lukego Dec 8, 2017

Member

I have reverted the calls to jit.tracebarrier() around app callbacks. These seem to be a bit of a performance drag overall and their benefit is not really established.

So the jit.tracebarrier() primitive still exists but the engine doesn't currently use it.

Let's wait for the standard Hydra tests to complete now and then we should be 👍 for the release.

Member

lukego commented Dec 8, 2017

I have reverted the calls to jit.tracebarrier() around app callbacks. These seem to be a bit of a performance drag overall and their benefit is not really established.

So the jit.tracebarrier() primitive still exists but the engine doesn't currently use it.

Let's wait for the standard Hydra tests to complete now and then we should be 👍 for the release.

@lukego

This comment has been minimized.

Show comment
Hide comment
@lukego

lukego Dec 11, 2017

Member

@eugeneia There is a performance regression on the iperf benchmark but I propose that we ship this now anyway.

I have been combing through recent CI results with @wingo on Slack and it seems like the problem is caused by voodoo. I have another branch with almost identical contents that does not show the issue. The only difference is whether the jit.tracebarrier() primitive exists in the C code. Having that code present seems to provoke the problem even though it is never called.

I am reluctant to make a "nonsense" change to "solve" this problem. I would prefer to accept it for now and focus on finding the root cause of why we see variance in the iperf benchmark. I see the new RaptorJIT tooling as the way to do this and so I want to spend my time now on integrating that with Snabb. Hence my willingness to accept this symptom of the root problem (wider variance on the iperf benchmark) for the moment.

Hypothetically if the problem is something obscure, like whether two Lua loop bytecode addresses hash into the same JIT hot counter, then there are probably very many different ways that it could be provoked (e.g. choice of C compiler version) and so I am not really confident that nailing down one such issue in the test environment would translate into a real world benefit. This would need to be solved more thoroughly in the JIT after seeing exactly what is really going on.

WDYT?

Member

lukego commented Dec 11, 2017

@eugeneia There is a performance regression on the iperf benchmark but I propose that we ship this now anyway.

I have been combing through recent CI results with @wingo on Slack and it seems like the problem is caused by voodoo. I have another branch with almost identical contents that does not show the issue. The only difference is whether the jit.tracebarrier() primitive exists in the C code. Having that code present seems to provoke the problem even though it is never called.

I am reluctant to make a "nonsense" change to "solve" this problem. I would prefer to accept it for now and focus on finding the root cause of why we see variance in the iperf benchmark. I see the new RaptorJIT tooling as the way to do this and so I want to spend my time now on integrating that with Snabb. Hence my willingness to accept this symptom of the root problem (wider variance on the iperf benchmark) for the moment.

Hypothetically if the problem is something obscure, like whether two Lua loop bytecode addresses hash into the same JIT hot counter, then there are probably very many different ways that it could be provoked (e.g. choice of C compiler version) and so I am not really confident that nailing down one such issue in the test environment would translate into a real world benefit. This would need to be solved more thoroughly in the JIT after seeing exactly what is really going on.

WDYT?

@eugeneia

This comment has been minimized.

Show comment
Hide comment
@eugeneia

eugeneia Dec 11, 2017

Member

Totally agree. I think #1244 is not included in this release, or is it? Maybe, as a last resort I would test to see if it resolves this regression like it did mine. This doesn’t block the release from my point of view though. Ready when you are!

Member

eugeneia commented Dec 11, 2017

Totally agree. I think #1244 is not included in this release, or is it? Maybe, as a last resort I would test to see if it resolves this regression like it did mine. This doesn’t block the release from my point of view though. Ready when you are!

@lukego

This comment has been minimized.

Show comment
Hide comment
@lukego

lukego Dec 11, 2017

Member

@eugeneia Glad to hear that this change has helped you. I pushed it here and let's see what Hydra says tomorrow.

Member

lukego commented Dec 11, 2017

@eugeneia Glad to hear that this change has helped you. I pushed it here and let's see what Hydra says tomorrow.

@lukego

This comment has been minimized.

Show comment
Hide comment
@lukego

lukego Dec 12, 2017

Member

@eugeneia This was worth a try but the new benchmark results show the same issue: https://hydra.snabb.co/build/2822290/download/2/report.html#iperf.

Member

lukego commented Dec 12, 2017

@eugeneia This was worth a try but the new benchmark results show the same issue: https://hydra.snabb.co/build/2822290/download/2/report.html#iperf.

@wingo

This comment has been minimized.

Show comment
Hide comment
@wingo

wingo Dec 12, 2017

Contributor

In a way that's good to know that the results are the same. I was unsure whether to blame a voodoo change to the software itself or some problem with the statistics!

Contributor

wingo commented Dec 12, 2017

In a way that's good to know that the results are the same. I was unsure whether to blame a voodoo change to the software itself or some problem with the statistics!

@eugeneia eugeneia merged commit 6c0f065 into master Dec 12, 2017

1 check passed

SnabbDoc Documentation as single HTML file
Details

eugeneia added a commit that referenced this pull request Dec 12, 2017

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment