Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Solarflare #404

Closed
wants to merge 117 commits into from
Closed

Solarflare #404

wants to merge 117 commits into from

Conversation

hanshuebner
Copy link
Contributor

It may be better to collapse this stuff into one commit, do you agree?

@eugeneia
Copy link
Member

I see you figured it out, awesome. Everyone should always do a git pull --rebase <snabb/snabbswitch> master before pushing a PR. E.g.rebase your fork's master onto the main repo's master and then rebase your branch onto the up-to-date master. I do see a bit of a manual merge here (e.g. f24fc79) that's superfluous if you had rebased.

For instance:

At some point I forked snabbco/snabbswitch on GitHub and then I
clone it locally:
git clone https://github.com/eugeneia/snabbswitch.git

Now I want to stay in sync with the mainline so I add a remote:
git remote add snabbco https://github.com/SnabbCo/snabbswitch.git

When ever I want to sync (always before opening a PR) I can just:
(I do this whenever a PR gets merged really, why fall behind?)
git checkout master
git pull --rebase snabbco master
git checkout my-working-branch
git rebase master
git push -f

Along the way I will solve all possible merge conflicts.

I suggest you go back to where you first opened the PR and do the rebase-fu. I suspect most test failures will disappear and generally you will get a cleaner history. You can do this all while the PR is open. A push -f will rewrite history, and SnabbBot will reevaluate.

One conflict I suspect you will see is regarding the lib.pci.open_usable_devices function. I removed that function not too long ago, and this PR reintroduces it without any explanation as far as I can tell. On rebase conflict-resolution you will probably see its gone and can than decide if you really want it (I suspect not).

@lukego
Copy link
Member

lukego commented Mar 15, 2015

My own views on rebasing are evolving. Broadly speaking, I used to be for rebasing and now I am against it.

Linus said it best: clean history should be (a) clean and (b) history.

If we rebase routinely then we create a couple of problems:

  1. We have a fictionalized history that does not show when work was done, what version(s) it was tested against, and so on.
  2. We can't pull from each others' trees without making a mess and so we end up with a centralized workflow instead of a distributed one.

I suspect that we should stop rebasing branches after we publish them on github/PR. (To keep history clean we should consider rebasing them immediately before publishing them, to make sure all the changes make sense and have excellent commit messages.) If something is really screwed up then we can create a new branch and rebase onto that, but this should be the exception. If the changes are just messy then, well, that is history, and we should either live with it or adjust our development habits.

It seems to me like the reason that rebasing is solving test problems is that we have had regressions on our default branch that people base changes on. I see a couple of potential solutions to this. First to be more careful to avoid letting in regressions, which we are already doing. Second we could perhaps use a separate branch for integrating and debugging changes and keep the default/master branch more stable.

(End braindump.)

@eugeneia
Copy link
Member

It seems to me like the reason that rebasing is solving test problems is that we have had regressions on our default branch that people base changes on. I see a couple of potential solutions to this. First to be more careful to avoid letting in regressions, which we are already doing.

In this specific case I think its the worst of all worlds to have multiple fixes of the same regressions, each months apart, without any context. I think rebasing open PR's is a very sensible choice because the history will be clean and accurately historic once it's merged. Compare the history of e.g. Hans' branch vs the history of Snabb mainline.

This is not rebasing the Snabb master, but instead editing the point of a branch being merged. Merging with the current master is in any way superior than merging with an ancient version of master. Not rebasing is just repeating already resolved merges and giving a lot of opportunity to mistakes. (E.g. reintroducing previously purged functions).

Btw rebasing leaves the timestamps intact so again, this is when was it written vs when was it merged.

@plajjan
Copy link
Contributor

plajjan commented Mar 15, 2015

+1 on @eugeneia
One should never mess around with history on Snabb master. Published branches should be avoided if possible but if it's short-lived, like for the duration of a PR, I think it's perfectly okay to squash on to get it to a clean state.

@lukego
Copy link
Member

lukego commented Mar 16, 2015

Let me see if I understand the options. (Git workflows are complex with long-lived branches.)

  • Squashing. This makes things easy to review by compressing all the history into one "Added Solarflare driver" commit. It also discards information that only exists in history, for example code that drives the Solarflare scatter-gather I/O interface that was only used in pre-straightline ancestors of the current code.
  • Rebasing: This eliminates the merge commits so the history will only contain code directly related to adding the Solarflare driver. The history could still be available for browsing in git log but you could not check it out and run it anymore because the older versions of the driver really do depend on older versions of master (e.g. pre-straightline, pre-program).
  • Keeping: If we keep the history as it is now then it is more work to review and merge, and it will look more convoluted in git log, but you could check out the older versions of the driver and they would compile and run the same way as when they were developed.

Is that a reasonable overview of how these things work with git?

Generally we need to work out some more details of our suggested git workflow:

  • When should I pull from other branches?
  • When should I rebase?
  • When should I send a pull request and to which branch?

I want to have a truly distributed workflow like the Linux kernel has. I want master to be "just another branch" and not a global bottleneck for all review and merge work. This stuff is hard :).

@plajjan
Copy link
Contributor

plajjan commented Mar 16, 2015

No, squashing doesn't mean you compress everything into one commit - you can squash an arbitrary number of commits together and rewrite the commit message to reflect what is going on or do multiple squashes of different commits.

I produce a new driver with a few commits:

  1. Initial commit with standard scaffolding
  2. Complete TX side of driver
  3. Complete RX side of drive
  4. Whopsi-dhopsi, fixed something for TX side
  5. Update documentation
  6. Initial work on straightline support
  7. Rewrite of RX for straightline
  8. Rewrite of TX for straightline
  9. Improved RX by switching to khash

Now you can take commit 4 and squash it into commit 2. Since 4 seems like just a fix, you might not want to update the commit message as you still achieve the same end goal, ie a complete TX side driver but it is now fixed, perhaps as a consequence of some feedback on a PR. There is a git action for this called "fixup".

During the time this driver was developed and before it was merged, Snabb has merged straightline, so is it really relevant to keep the pre-straightline version of the driver? Most likely not, so we can squash 7 into 3 and 8 into 2 but perhaps this time we rewrite the messages somewhat to reflect that we are inline with the straigtline concepts.

At the end, we are down to

  1. Initial commit with standard scaffolding
  2. Complete TX side of driver
  3. Complete RX side of driver
  4. Update documentation
  5. Improved RX by switching to khash

We can certainly discuss what kind of history we want to keep but I would like to highlight that there is a spectra on how squashing/fixups can be applied - it's not about squashing everything into one single commit.

EDIT: funny how I picked pre vs post straightline for example. I just reread your comment and saw you might want to keep some stuff pre-straightline for history :) I don't want to get into the specifics of what to keep or not - just outline that it's not about flattening an entire branch - it's about selectively squashing commits to achieve a clean history.

@lukego
Copy link
Member

lukego commented Mar 16, 2015

@plajjan I see what you mean!

So we could squash together several sub-series of commits to shorten the history into fewer better-defined points and still expect them to behave the same as they always did.

Question then is which series to squash? And do we have to be careful with the merge commits somehow or can we squash those at will too?

Would a reasonable starting point be to plan on squashing the commits between merges from upstream? Then those could be split up a little if it makes logical sense and if we are lucky we could squash away some unnecessary merges (e.g. when there are two consecutive merge commits)?

@lukego
Copy link
Member

lukego commented Mar 16, 2015

@plajjan @eugeneia Further thought: we are all saying "rebase" but we are talking about different things, right? ("Rebase" is a very versatile verb.. like to "emacs" a branch it can mean almost anything.)

Kristian is talking about reducing the number of commits on this branch by coalescing them. The effect of this is a shorter history, which mostly affects how things will look in git log.

Max is talking about eliminating the merges so that the commits on this branch all target the same version of master. The effect of this is to make it look like the code was all written for the current version of master, which will make the older commits look nonsensical e.g. code that uses APIs that don't exist anymore. If we take this route then it seems like the history will be a negative thing to have around ("how the heck did this ever work?" - it didn't) and we would be better off squashing the history and discarding the code that would never have existed in our new historical narrative.

I am sure I am missing multiple important things. Improvements to my world view always welcome :).

@lukego
Copy link
Member

lukego commented Mar 16, 2015

(Incidentally my favourite source of history in the world is novels by Robert Graves and Gore Vidal so I do appreciate the value of a spiced up historical narrative with no more of the truth than is needed :-))

@plajjan
Copy link
Contributor

plajjan commented Mar 16, 2015

I tried to consistently use "squash" when talking about.. well, squashing commits. The command to do that is still "git rebase ..." but it differs very much from other rebase, like when you are just replaying changes over a new branch.

Should we have this discussion on this PR? While somewhat on-topic, it is getting rather long and has a wider effect than on this PR. ML or open dummy issue to discuss?

program=port.Port.loopback_test,
report=true}
port.selftest(options)
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function was intentionally removed in a previous PR #382 . It shouldn't be reintroduced here. AFAIK its not used anywhere.

@eugeneia
Copy link
Member

We went way off topic here which sucks because I am really talking about two specific issues (see inline comments above) I see in this PR that were caused by the branch not being rebased onto SnabbCo/snabbswitch:master and instead (I assume) mistakenly merged. The rebase vs merge is not really important here, what's important is that the mainline master must always precede your branch. For instance if you get a conflict where mainline deletes code and your branch changes a line of that code, you must throw away your changes and accept mainline's deletion. Otherwise you undid an intentional change in mainline unrelated to your branch.

It took me ~30 minutes just to track down how these changes slipped in, which proves my point being that the history will be harder to understand without the rebase described by me which should be done at 4f0d42a.

Can we please rewind in time, forget the whole git philosophy discussion, and focus on the fact that the way this branch is currently merged is problematic. In the end I don't care how we get a clean merge of this, I can guarantee you that rebasing is the smartest and easiest way to do it, and I am happy to explain why in another thread, but please... stop derailing this.

(I don't think I should try to answer any of @lukego or @plajjan comments here because this would just worsen the situation for @hanshuebner really.)

@lukego
Copy link
Member

lukego commented Mar 16, 2015

Let us merge this change in two steps:

Hans -> Max -> Luke

Max has already clearly said what is needed to get into his tree.

I will accept squashed, rebased, or fixed-in-a-new-merge-commit.

@eugeneia
Copy link
Member

@lukego Ok. I ran the rebase I proposed just out of curiosity this afternoon and it indeed needed a ton of manual conflict resolution. So its no surprise a few small things slipped through.

@hanshuebner I threw out ~20 commits because they didn't apply to the current master anymore. I also adjusted a few documentation changes that were still targeting the pre-christmas tree.

I wonder if we can remove doc/device-interface.md? As far as I can tell its obsoleted by the current src/README.md (App chapter).

Also for simplicities sake I might want to leave out a few bench_env related changes that might benefit from their own PR.

I will post you guys with a rebased version of the branch for review once I polished it a bit further.

@hanshuebner
Copy link
Contributor Author

@eugeneia Thanks for taking care of the rebase! I don't think that we need doc/device-interface.md - I wrote it mostly as a reminder as to how things work.

With respect to bench_env changes: The logging changes have been obsoleted anyway, and I'm not married to any other changes as long as the SolarFlare driver is still supported.

@eugeneia
Copy link
Member

@hanshuebner How do I test the solarflare driver? I hear there is a solarflare NIC in chur, which device is it?

@eugeneia
Copy link
Member

@lukego Can't get rid of this using make clean && make, any hints?

selftest: ./snabb binary portability
Scanning for symbols requiring GLIBC > 2.7
GLIBC_2.14 memcpy
^^^ Error ^^^
(You might just need to 'make clean; make' at the top level.)
EXITCODE: 1

@hanshuebner
Copy link
Contributor Author

hanshuebner commented Mar 16, 2015 via email

@eugeneia
Copy link
Member

That directory should probably not stay in the mainline once the driver is integrated, I just find it useful as a quick confidence test.

A standalone driver test would be important to keep, can we turn it into a selftest similar to make apps.intel.intel_app? The simplest way to add it as a unit test is to add an executable src/program/solarflare/selftest.sh shell script that invokes the test and exits with status 1 on failure, status 0 on success and status 43 if skipped (for instance if the hypothetical environment variables SNABB_TEST_SOLARFLARE_PCIDEV[A|B] are not set).

apps/solarflare/solarflare.lua:23: ef_vi library does not have the correct version identified, need 201502, got djr_b1c7f30a84a0

What is this about? Is this something external or did I miss a submodule update along the rebase?

@hanshuebner
Copy link
Contributor Author

2015-03-16 22:04 GMT+01:00 Max Rottenkolber notifications@github.com:

That directory should probably not stay in the mainline once the driver is
integrated, I just find it useful as a quick confidence test.

A standalone driver test would be important to keep, can we turn it into a
selftest similar to make apps.intel.intel_app? The simplest way add it as
a unit test is to create an executable src/program/solarflare/selftest.sh
shell script that invokes the test and exits with status 1 on failure,
status 0 on success and status 43 if skipped (for instance if the
hypothetical environment variablels SNABB_TEST_SOLARFLARE_PCIDEV[A|B] are
not set).

Yes, that should be the right way moving forward.

apps/solarflare/solarflare.lua:23: ef_vi library does not have the
correct version identified, need 201502, got djr_b1c7f30a84a0

What is this about? Is this something external or did I miss a submodule
update along the rebase?

You are using the wrong version of liciul.so, which is strange because I
did update the version that is installed on grindelwald (/lib/libciul.so).
Are you using grindelwald? Are is there some other version of libciul.so
on your LD_LIBRARY_PATH?

The SolarFlare driver needs a specific version of libciul.so because we're
including a hand-edited copy of the SolarFlare header files in our source
base. The reason for that is that the LuaJIT FFI cannot deal with the
original header files directly. This is painful, but we discussed our
options and concluded that this would be the way to implement it. The
version probe that is in the code makes sure that we're not trying to use a
shared library that is incompatible with our hand crafted header file.

@eugeneia
Copy link
Member

Are you using grindelwald?

Not yet. ;) I just asked because I get that GLIBC > 2.7 error too and thought I maybe they are related and I broke something along my rebase.

@lukego
Copy link
Member

lukego commented Mar 17, 2015

The basic solarflare test/benchmark could live in programs/snabbmark and be called from a selftest.sh.

I like the idea of the snabbmark growing a collection of benchmarks that we could track and check for regressions. (This is the successor to the bench/ scripts directory that we had previously.)

@lukego
Copy link
Member

lukego commented Mar 17, 2015

@eugeneia here is the story on memcpy:

  • GCC will compile memcpy to a dependency on GLIBC >= 2.14 by default.
  • __asm__(".symver memcpy,memcpy@GLIBC_2.2.5"); declaration forces a more conservative dependency.
  • This declaration exists in snabbswitch/gcc-preinclude.h.
  • GCC can be forced to use this with gcc -include gcc-preinclude.h ....
  • This should always be done when we compile C code.

Could be that you need to make clean at the top-level e.g. if your libluajit.a is compiled with the old memcpy?

@eugeneia
Copy link
Member

Could be that you need to make clean at the top-level e.g.

I did that and it didn't fix things.

@eugeneia
Copy link
Member

@lukego Got it was a mistake when merging on my side.

@eugeneia
Copy link
Member

@lukego @hanshuebner Requesting review on #409 (Solarflare branch Rebased)!

I have tested this on grindelwald to full success. I have to say I am very impressed that the NFV selftest works flawlessly with the solarflare NIC, I know its supposed to be that way but anyways: Great work Hans! 🍻

@lukego
Copy link
Member

lukego commented Mar 18, 2015

Superceded by #409.

@lukego lukego closed this Mar 18, 2015
dpino pushed a commit to dpino/snabb that referenced this pull request Aug 31, 2016
As discussed on IRC, merging without waiting for checks since it's only doc changes.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants