Solarflare #404

hanshuebner · 2015-03-14T12:49:32Z

It may be better to collapse this stuff into one commit, do you agree?

eugeneia · 2015-03-15T13:27:21Z

I see you figured it out, awesome. Everyone should always do a git pull --rebase <snabb/snabbswitch> master before pushing a PR. E.g.rebase your fork's master onto the main repo's master and then rebase your branch onto the up-to-date master. I do see a bit of a manual merge here (e.g. f24fc79) that's superfluous if you had rebased.

For instance:

At some point I forked snabbco/snabbswitch on GitHub and then I
clone it locally:
git clone https://github.com/eugeneia/snabbswitch.git

Now I want to stay in sync with the mainline so I add a remote:
git remote add snabbco https://github.com/SnabbCo/snabbswitch.git

When ever I want to sync (always before opening a PR) I can just:
(I do this whenever a PR gets merged really, why fall behind?)
git checkout master
git pull --rebase snabbco master
git checkout my-working-branch
git rebase master
git push -f

Along the way I will solve all possible merge conflicts.

I suggest you go back to where you first opened the PR and do the rebase-fu. I suspect most test failures will disappear and generally you will get a cleaner history. You can do this all while the PR is open. A push -f will rewrite history, and SnabbBot will reevaluate.

One conflict I suspect you will see is regarding the lib.pci.open_usable_devices function. I removed that function not too long ago, and this PR reintroduces it without any explanation as far as I can tell. On rebase conflict-resolution you will probably see its gone and can than decide if you really want it (I suspect not).

lukego · 2015-03-15T14:08:35Z

My own views on rebasing are evolving. Broadly speaking, I used to be for rebasing and now I am against it.

Linus said it best: clean history should be (a) clean and (b) history.

If we rebase routinely then we create a couple of problems:

We have a fictionalized history that does not show when work was done, what version(s) it was tested against, and so on.
We can't pull from each others' trees without making a mess and so we end up with a centralized workflow instead of a distributed one.

I suspect that we should stop rebasing branches after we publish them on github/PR. (To keep history clean we should consider rebasing them immediately before publishing them, to make sure all the changes make sense and have excellent commit messages.) If something is really screwed up then we can create a new branch and rebase onto that, but this should be the exception. If the changes are just messy then, well, that is history, and we should either live with it or adjust our development habits.

It seems to me like the reason that rebasing is solving test problems is that we have had regressions on our default branch that people base changes on. I see a couple of potential solutions to this. First to be more careful to avoid letting in regressions, which we are already doing. Second we could perhaps use a separate branch for integrating and debugging changes and keep the default/master branch more stable.

(End braindump.)

eugeneia · 2015-03-15T17:32:45Z

It seems to me like the reason that rebasing is solving test problems is that we have had regressions on our default branch that people base changes on. I see a couple of potential solutions to this. First to be more careful to avoid letting in regressions, which we are already doing.

In this specific case I think its the worst of all worlds to have multiple fixes of the same regressions, each months apart, without any context. I think rebasing open PR's is a very sensible choice because the history will be clean and accurately historic once it's merged. Compare the history of e.g. Hans' branch vs the history of Snabb mainline.

This is not rebasing the Snabb master, but instead editing the point of a branch being merged. Merging with the current master is in any way superior than merging with an ancient version of master. Not rebasing is just repeating already resolved merges and giving a lot of opportunity to mistakes. (E.g. reintroducing previously purged functions).

Btw rebasing leaves the timestamps intact so again, this is when was it written vs when was it merged.

plajjan · 2015-03-15T18:40:27Z

+1 on @eugeneia
One should never mess around with history on Snabb master. Published branches should be avoided if possible but if it's short-lived, like for the duration of a PR, I think it's perfectly okay to squash on to get it to a clean state.

lukego · 2015-03-16T04:42:54Z

Let me see if I understand the options. (Git workflows are complex with long-lived branches.)

Squashing. This makes things easy to review by compressing all the history into one "Added Solarflare driver" commit. It also discards information that only exists in history, for example code that drives the Solarflare scatter-gather I/O interface that was only used in pre-straightline ancestors of the current code.
Rebasing: This eliminates the merge commits so the history will only contain code directly related to adding the Solarflare driver. The history could still be available for browsing in git log but you could not check it out and run it anymore because the older versions of the driver really do depend on older versions of master (e.g. pre-straightline, pre-program).
Keeping: If we keep the history as it is now then it is more work to review and merge, and it will look more convoluted in git log, but you could check out the older versions of the driver and they would compile and run the same way as when they were developed.

Is that a reasonable overview of how these things work with git?

Generally we need to work out some more details of our suggested git workflow:

When should I pull from other branches?
When should I rebase?
When should I send a pull request and to which branch?

I want to have a truly distributed workflow like the Linux kernel has. I want master to be "just another branch" and not a global bottleneck for all review and merge work. This stuff is hard :).

plajjan · 2015-03-16T10:34:59Z

No, squashing doesn't mean you compress everything into one commit - you can squash an arbitrary number of commits together and rewrite the commit message to reflect what is going on or do multiple squashes of different commits.

I produce a new driver with a few commits:

Initial commit with standard scaffolding
Complete TX side of driver
Complete RX side of drive
Whopsi-dhopsi, fixed something for TX side
Update documentation
Initial work on straightline support
Rewrite of RX for straightline
Rewrite of TX for straightline
Improved RX by switching to khash

Now you can take commit 4 and squash it into commit 2. Since 4 seems like just a fix, you might not want to update the commit message as you still achieve the same end goal, ie a complete TX side driver but it is now fixed, perhaps as a consequence of some feedback on a PR. There is a git action for this called "fixup".

During the time this driver was developed and before it was merged, Snabb has merged straightline, so is it really relevant to keep the pre-straightline version of the driver? Most likely not, so we can squash 7 into 3 and 8 into 2 but perhaps this time we rewrite the messages somewhat to reflect that we are inline with the straigtline concepts.

At the end, we are down to

Initial commit with standard scaffolding
Complete TX side of driver
Complete RX side of driver
Update documentation
Improved RX by switching to khash

We can certainly discuss what kind of history we want to keep but I would like to highlight that there is a spectra on how squashing/fixups can be applied - it's not about squashing everything into one single commit.

EDIT: funny how I picked pre vs post straightline for example. I just reread your comment and saw you might want to keep some stuff pre-straightline for history :) I don't want to get into the specifics of what to keep or not - just outline that it's not about flattening an entire branch - it's about selectively squashing commits to achieve a clean history.

lukego · 2015-03-16T10:46:48Z

@plajjan I see what you mean!

So we could squash together several sub-series of commits to shorten the history into fewer better-defined points and still expect them to behave the same as they always did.

Question then is which series to squash? And do we have to be careful with the merge commits somehow or can we squash those at will too?

Would a reasonable starting point be to plan on squashing the commits between merges from upstream? Then those could be split up a little if it makes logical sense and if we are lucky we could squash away some unnecessary merges (e.g. when there are two consecutive merge commits)?

lukego · 2015-03-16T12:21:06Z

@plajjan @eugeneia Further thought: we are all saying "rebase" but we are talking about different things, right? ("Rebase" is a very versatile verb.. like to "emacs" a branch it can mean almost anything.)

Kristian is talking about reducing the number of commits on this branch by coalescing them. The effect of this is a shorter history, which mostly affects how things will look in git log.

Max is talking about eliminating the merges so that the commits on this branch all target the same version of master. The effect of this is to make it look like the code was all written for the current version of master, which will make the older commits look nonsensical e.g. code that uses APIs that don't exist anymore. If we take this route then it seems like the history will be a negative thing to have around ("how the heck did this ever work?" - it didn't) and we would be better off squashing the history and discarding the code that would never have existed in our new historical narrative.

I am sure I am missing multiple important things. Improvements to my world view always welcome :).

lukego · 2015-03-16T12:23:38Z

(Incidentally my favourite source of history in the world is novels by Robert Graves and Gore Vidal so I do appreciate the value of a spiced up historical narrative with no more of the truth than is needed :-))

plajjan · 2015-03-16T12:24:58Z

I tried to consistently use "squash" when talking about.. well, squashing commits. The command to do that is still "git rebase ..." but it differs very much from other rebase, like when you are just replaying changes over a new branch.

Should we have this discussion on this PR? While somewhat on-topic, it is getting rather long and has a wider effect than on this PR. ML or open dummy issue to discuss?

eugeneia · 2015-03-16T14:09:51Z

src/lib/hardware/pci.lua

+                    program=port.Port.loopback_test,
+                    report=true}
+   port.selftest(options)
+end


This function was intentionally removed in a previous PR #382 . It shouldn't be reintroduced here. AFAIK its not used anywhere.

eugeneia · 2015-03-16T14:52:38Z

We went way off topic here which sucks because I am really talking about two specific issues (see inline comments above) I see in this PR that were caused by the branch not being rebased onto SnabbCo/snabbswitch:master and instead (I assume) mistakenly merged. The rebase vs merge is not really important here, what's important is that the mainline master must always precede your branch. For instance if you get a conflict where mainline deletes code and your branch changes a line of that code, you must throw away your changes and accept mainline's deletion. Otherwise you undid an intentional change in mainline unrelated to your branch.

It took me ~30 minutes just to track down how these changes slipped in, which proves my point being that the history will be harder to understand without the rebase described by me which should be done at 4f0d42a.

Can we please rewind in time, forget the whole git philosophy discussion, and focus on the fact that the way this branch is currently merged is problematic. In the end I don't care how we get a clean merge of this, I can guarantee you that rebasing is the smartest and easiest way to do it, and I am happy to explain why in another thread, but please... stop derailing this.

(I don't think I should try to answer any of @lukego or @plajjan comments here because this would just worsen the situation for @hanshuebner really.)

lukego · 2015-03-16T15:25:42Z

Let us merge this change in two steps:

Hans -> Max -> Luke

Max has already clearly said what is needed to get into his tree.

I will accept squashed, rebased, or fixed-in-a-new-merge-commit.

eugeneia · 2015-03-16T19:45:09Z

@lukego Ok. I ran the rebase I proposed just out of curiosity this afternoon and it indeed needed a ton of manual conflict resolution. So its no surprise a few small things slipped through.

@hanshuebner I threw out ~20 commits because they didn't apply to the current master anymore. I also adjusted a few documentation changes that were still targeting the pre-christmas tree.

I wonder if we can remove doc/device-interface.md? As far as I can tell its obsoleted by the current src/README.md (App chapter).

Also for simplicities sake I might want to leave out a few bench_env related changes that might benefit from their own PR.

I will post you guys with a rebased version of the branch for review once I polished it a bit further.

hanshuebner · 2015-03-16T19:55:59Z

@eugeneia Thanks for taking care of the rebase! I don't think that we need doc/device-interface.md - I wrote it mostly as a reminder as to how things work.

With respect to bench_env changes: The logging changes have been obsoleted anyway, and I'm not married to any other changes as long as the SolarFlare driver is still supported.

eugeneia · 2015-03-16T20:30:34Z

@hanshuebner How do I test the solarflare driver? I hear there is a solarflare NIC in chur, which device is it?

eugeneia · 2015-03-16T20:37:06Z

@lukego Can't get rid of this using make clean && make, any hints?

selftest: ./snabb binary portability
Scanning for symbols requiring GLIBC > 2.7
GLIBC_2.14 memcpy
^^^ Error ^^^
(You might just need to 'make clean; make' at the top level.)
EXITCODE: 1

hanshuebner · 2015-03-16T20:44:16Z

The solarflare NIC is in grindelwald (0000:86:00.0/1). It passes the nfv selftest and the tests that "make test" runs. There also is programs/solarflare which is a simple tx/rx test which I used for basic validation - That directory should probably not stay in the mainline once the driver is integrated, I just find it useful as a quick confidence test.

eugeneia · 2015-03-16T21:04:38Z

That directory should probably not stay in the mainline once the driver is integrated, I just find it useful as a quick confidence test.

A standalone driver test would be important to keep, can we turn it into a selftest similar to make apps.intel.intel_app? The simplest way to add it as a unit test is to add an executable src/program/solarflare/selftest.sh shell script that invokes the test and exits with status 1 on failure, status 0 on success and status 43 if skipped (for instance if the hypothetical environment variables SNABB_TEST_SOLARFLARE_PCIDEV[A|B] are not set).

apps/solarflare/solarflare.lua:23: ef_vi library does not have the correct version identified, need 201502, got djr_b1c7f30a84a0

What is this about? Is this something external or did I miss a submodule update along the rebase?

hanshuebner · 2015-03-16T21:10:32Z

2015-03-16 22:04 GMT+01:00 Max Rottenkolber notifications@github.com:

That directory should probably not stay in the mainline once the driver is
integrated, I just find it useful as a quick confidence test.

A standalone driver test would be important to keep, can we turn it into a
selftest similar to make apps.intel.intel_app? The simplest way add it as
a unit test is to create an executable src/program/solarflare/selftest.sh
shell script that invokes the test and exits with status 1 on failure,
status 0 on success and status 43 if skipped (for instance if the
hypothetical environment variablels SNABB_TEST_SOLARFLARE_PCIDEV[A|B] are
not set).

Yes, that should be the right way moving forward.

apps/solarflare/solarflare.lua:23: ef_vi library does not have the
correct version identified, need 201502, got djr_b1c7f30a84a0

What is this about? Is this something external or did I miss a submodule
update along the rebase?

You are using the wrong version of liciul.so, which is strange because I
did update the version that is installed on grindelwald (/lib/libciul.so).
Are you using grindelwald? Are is there some other version of libciul.so
on your LD_LIBRARY_PATH?

The SolarFlare driver needs a specific version of libciul.so because we're
including a hand-edited copy of the SolarFlare header files in our source
base. The reason for that is that the LuaJIT FFI cannot deal with the
original header files directly. This is painful, but we discussed our
options and concluded that this would be the way to implement it. The
version probe that is in the code makes sure that we're not trying to use a
shared library that is incompatible with our hand crafted header file.

eugeneia · 2015-03-16T21:15:10Z

Are you using grindelwald?

Not yet. ;) I just asked because I get that GLIBC > 2.7 error too and thought I maybe they are related and I broke something along my rebase.

lukego · 2015-03-17T05:38:39Z

The basic solarflare test/benchmark could live in programs/snabbmark and be called from a selftest.sh.

I like the idea of the snabbmark growing a collection of benchmarks that we could track and check for regressions. (This is the successor to the bench/ scripts directory that we had previously.)

lukego · 2015-03-17T07:29:13Z

@eugeneia here is the story on memcpy:

GCC will compile memcpy to a dependency on GLIBC >= 2.14 by default.
__asm__(".symver memcpy,memcpy@GLIBC_2.2.5"); declaration forces a more conservative dependency.
This declaration exists in snabbswitch/gcc-preinclude.h.
GCC can be forced to use this with gcc -include gcc-preinclude.h ....
This should always be done when we compile C code.

Could be that you need to make clean at the top-level e.g. if your libluajit.a is compiled with the old memcpy?

eugeneia · 2015-03-17T11:24:56Z

Could be that you need to make clean at the top-level e.g.

I did that and it didn't fix things.

eugeneia · 2015-03-17T19:55:36Z

@lukego Got it was a mistake when merging on my side.

eugeneia · 2015-03-17T21:25:44Z

@lukego @hanshuebner Requesting review on #409 (Solarflare branch Rebased)!

I have tested this on grindelwald to full success. I have to say I am very impressed that the NFV selftest works flawlessly with the solarflare NIC, I know its supposed to be that way but anyways: Great work Hans! 🍻

lukego · 2015-03-18T06:46:08Z

Superceded by #409.

As discussed on IRC, merging without waiting for checks since it's only doc changes.

hanshuebner added 30 commits June 16, 2014 10:42

initial attempt to make a IB header file that ffi.cdefs can consume

e92f7b2

commit mellanox related work

7d5ad65

add solarflare stuff

10f684b

sending and receiving now

fba1c52

try some tweaks to prevent rx problems

a1cb67a

remove unneeded case

b08e069

work on receiver reliability / problem determination

865ba2c

add device-interface.md describing the internal device API

6818bbd

tune number of buffers, validate data when it is received

c829307

improve error analysis

6b0c017

limit number of receive error details reported

42f2d5c

receiver changes that seem to break things reliably

6c8e7cb

run.sh changes that seem to break things

efff6ce

send.c changes that seem to break things

603a084

clean up a little before turning to work on luajit stuff

3a9c3cf

clean out test programs, create solarflare lua stuff

805b395

reindent

4b8ed5f

can open driver and ef_vi interface

83d6e7d

polish and SolarFlareNic make into proper class

33c572a

add memory region related stuff

4e347ef

invoke test function

7d71458

refactor for short lines

93fc411

begin memory management related work

c063055

callback to drivers for memory registration

cf367e1

set up filter

a4704e7

can receive packets (and discard them)

5825036

can breathe a bit

d972937

compress stats to one line

85217ac

use link names matching configuration

d457531

can sink packets

0269a74

Explicit ef_vi version check

086e409

eugeneia reviewed Mar 16, 2015
View reviewed changes

lukego closed this Mar 18, 2015

dpino pushed a commit to dpino/snabb that referenced this pull request Aug 31, 2016

Remove conflicts from the counters doc. (snabbco#404)

92d6f52

As discussed on IRC, merging without waiting for checks since it's only doc changes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Solarflare #404

Solarflare #404

hanshuebner commented Mar 14, 2015

eugeneia commented Mar 15, 2015

lukego commented Mar 15, 2015

eugeneia commented Mar 15, 2015

plajjan commented Mar 15, 2015

lukego commented Mar 16, 2015

plajjan commented Mar 16, 2015

lukego commented Mar 16, 2015

lukego commented Mar 16, 2015

lukego commented Mar 16, 2015

plajjan commented Mar 16, 2015

eugeneia Mar 16, 2015

eugeneia commented Mar 16, 2015

lukego commented Mar 16, 2015

eugeneia commented Mar 16, 2015

hanshuebner commented Mar 16, 2015

eugeneia commented Mar 16, 2015

eugeneia commented Mar 16, 2015

hanshuebner commented Mar 16, 2015 via email

eugeneia commented Mar 16, 2015

hanshuebner commented Mar 16, 2015

eugeneia commented Mar 16, 2015

lukego commented Mar 17, 2015

lukego commented Mar 17, 2015

eugeneia commented Mar 17, 2015

eugeneia commented Mar 17, 2015

eugeneia commented Mar 17, 2015

lukego commented Mar 18, 2015

Solarflare #404

Solarflare #404

Conversation

hanshuebner commented Mar 14, 2015

eugeneia commented Mar 15, 2015

lukego commented Mar 15, 2015

eugeneia commented Mar 15, 2015

plajjan commented Mar 15, 2015

lukego commented Mar 16, 2015

plajjan commented Mar 16, 2015

lukego commented Mar 16, 2015

lukego commented Mar 16, 2015

lukego commented Mar 16, 2015

plajjan commented Mar 16, 2015

eugeneia Mar 16, 2015

Choose a reason for hiding this comment

eugeneia commented Mar 16, 2015

lukego commented Mar 16, 2015

eugeneia commented Mar 16, 2015

hanshuebner commented Mar 16, 2015

eugeneia commented Mar 16, 2015

eugeneia commented Mar 16, 2015

hanshuebner commented Mar 16, 2015 via email

eugeneia commented Mar 16, 2015

hanshuebner commented Mar 16, 2015

eugeneia commented Mar 16, 2015

lukego commented Mar 17, 2015

lukego commented Mar 17, 2015

eugeneia commented Mar 17, 2015

eugeneia commented Mar 17, 2015

eugeneia commented Mar 17, 2015

lukego commented Mar 18, 2015