Commits on Jul 20, 2016
  1. repack: warn when "-l" is not used with alternates

    Failing to use "-l" means that we will copy objects from the
    source repository, nullifying the usefulness of "-s". We
    don't want to make this an error, though, since "git repack
    -a" is used to intentionally break the dependency.
    committed Aug 17, 2009
  2. diff: turn on rename detection progress reporting

    Since all of the progress happens before we generate any
    output, this looks OK, even when output goes to a pager.
    We support the usual --progress/--no-progress options and
    check isatty(2) to enable the feature.
    
    The argument parsing is a little ad-hoc, but we currently
    have no parse-options infrastructure here at all.  However,
    it should be safe to parse like this, because the prior
    call to setup_revisions will have removed any options that
    take an argument, and our parsing removes --progress from
    argv for later parsers. The one exception is diff_no_index,
    which may get called early, and needs to learn to ignore
    --progress.
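    
    Below is a minimal sketch of that ad-hoc parsing. It assumes
    setup_revisions() has already consumed any options that take
    an argument. The helper name parse_progress_option() is
    hypothetical. It drops --progress and --no-progress from argv,
    with the last one given winning. If neither is given, it falls
    back to isatty(2):
    
    ```c
    #include <string.h>
    #include <unistd.h>
    
    /*
     * Hypothetical helper: strip --progress / --no-progress from argv
     * (so later parsers never see them) and decide whether to show
     * progress. *argc is updated and argv stays NULL-terminated.
     */
    static int parse_progress_option(int *argc, const char **argv)
    {
        int progress = -1; /* -1: not given on the command line */
        int i, j;
    
        for (i = 1, j = 1; i < *argc; i++) {
            if (!strcmp(argv[i], "--progress"))
                progress = 1;
            else if (!strcmp(argv[i], "--no-progress"))
                progress = 0;
            else
                argv[j++] = argv[i]; /* keep other args, in order */
        }
        argv[j] = NULL;
        *argc = j;
    
        /* Default: only when stderr is a terminal. */
        if (progress < 0)
            progress = isatty(2);
        return progress;
    }
    ```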
    committed Mar 24, 2011
  3. shortlog: change "author" variables to "ident"

    This is in preparation for shortlog counting more things
    than just authors. Breaking it out into a separate patch
    keeps the noise down when the real changes come.
    committed Nov 3, 2015
  4. show: turn on rename detection progress reporting

    For large commits, it is nice to have some eye candy for the
    rename detection.
    
    However, because show can display multiple commits, we have
    to be careful not to clutter existing output. We show the
    progress report only before we have generated any actual
    output; once we have sent output to the terminal or pager,
    we turn off progress reporting.
    
    This also makes it safe to use with "git log", though it
    will only be useful if the first commit is the slow one.
    So this patch actually enables it for all of the
    log/whatchanged/show/reflog family.
    
    We also handle the usual --{no-}progress option and check
    that stderr goes to a terminal before turning on progress.
    committed Mar 23, 2011
  5. progress: use pager's original_stderr if available

    If we are outputting to a pager, stderr is redirected to the
    pager. However, progress messages should not be part of that
    stream, as they are time-sensitive and should end up being
    hidden once we actually have output.
    committed Mar 23, 2011
  6. pager: save the original stderr when redirecting to pager

    When we redirect stdout to the pager, we also redirect
    stderr (if it would otherwise go to the terminal) so that
    error messages do not get overwritten by the pager.
    
    However, some stderr output may still want to go to the
    terminal, because it is time-sensitive (like progress
    reports) and should be overwritten by the pager.
    
    This patch stashes away the original stderr descriptor and
    creates a new stdio buffer for it.
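    
    Roughly, in plain POSIX terms, it looks like the sketch below.
    This is not the actual pager code: the names are hypothetical,
    and the sketch skips the check for whether stderr was a
    terminal in the first place.
    
    ```c
    #include <stdio.h>
    #include <unistd.h>
    
    /* Hypothetical: the terminal's stderr, saved before redirecting. */
    static FILE *original_stderr;
    
    /*
     * Point fd 2 at the pager's pipe, but first stash a duplicate of
     * the old stderr in its own stdio buffer, so time-sensitive output
     * (like progress) can still go straight to the terminal.
     * Returns 0 on success, -1 on failure.
     */
    static int redirect_stderr_to_pager(int pager_fd)
    {
        int saved = dup(2);
    
        if (saved < 0)
            return -1;
        original_stderr = fdopen(saved, "w");
        if (!original_stderr) {
            close(saved);
            return -1;
        }
        fflush(stderr); /* don't let buffered output cross over */
        if (dup2(pager_fd, 2) < 0)
            return -1;
        return 0;
    }
    ```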
    committed Mar 23, 2011
  7. receive-pack: send keepalives during quiet periods

    After a client has sent us the complete pack, we may spend
    some time processing the data and running hooks. If the
    client asked us to be quiet, receive-pack won't send any
    progress data during the index-pack or connectivity-check
    steps. And hooks may or may not produce their own progress
    output. In these cases, the network connection is totally
    silent from both ends.
    
    Git itself doesn't care about this (it will wait forever),
    but other parts of the system (e.g., firewalls,
    load-balancers, etc.) might hang up the connection. So we'd
    like to send some sort of keepalive to let the network and
    the client side know that we're still alive and processing.
    
    We can use the same trick we did in 05e9515 (upload-pack: send
    keepalive packets during pack computation, 2013-09-08).
    Namely, we will send an empty sideband data packet every `N`
    seconds that we do not relay any stderr data over the
    sideband channel. As with 05e9515, this means that we won't
    bother sending keepalives when there's actual progress data,
    but will kick in when there isn't (or if there is a lull in
    the progress data).
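    
    Here is a sketch of how such a packet is framed. It assumes the
    usual pkt-line layout: a four-hex-digit length that counts
    itself, then one band byte (1 for data, 2 for progress, 3 for
    errors). The helper name is hypothetical. An empty band-1
    packet comes out as "0005\1":
    
    ```c
    #include <string.h>
    
    /*
     * Hypothetical helper: frame `len` bytes of `data` as a pkt-line
     * carrying a sideband packet on `band`. The length covers the
     * 4-byte header, the band byte, and the payload. Returns the
     * number of bytes written to `out`, or 0 if it does not fit.
     */
    static size_t format_sideband_packet(char *out, size_t outsz, int band,
                                         const char *data, size_t len)
    {
        static const char hex[] = "0123456789abcdef";
        size_t total = 4 + 1 + len;
        int i;
    
        if (total > 0xffff || total > outsz)
            return 0;
        for (i = 0; i < 4; i++)
            out[i] = hex[(total >> (12 - 4 * i)) & 0xf];
        out[4] = (char)band;
        memcpy(out + 5, data, len);
        return total;
    }
    ```
    
    A keepalive is then format_sideband_packet(buf, sizeof(buf),
    1, "", 0).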
    
    The concept is simple, but the details are subtle enough
    that they need discussing here.
    
    Before the client sends us the pack, we don't want to do any
    keepalives. We'll have sent our ref advertisement, and we're
    waiting for them to send us the pack (and tell us that they
    support sidebands at all).
    
    While we're receiving the pack from the client (or waiting
    for it to start), there's no need for keepalives; it's up to
    them to keep the connection active by sending data[1].
    Moreover, it would be wrong for us to do so. When we are the
    server in the smart-http protocol, we must treat our
    connection as half-duplex. So any keepalives we send while
    receiving the pack would potentially be buffered by the
    webserver. Not only does this make them useless (since they
    would not be delivered in a timely manner), but it could
    actually cause a deadlock if we fill up the buffer with
    keepalives. (It wouldn't be wrong to send keepalives in this
    phase for a full-duplex connection like ssh; it's simply
    pointless, as it is the client's responsibility to speak).
    
    As soon as we've gotten all of the pack data, then the
    client is waiting for us to speak, and we should start
    keepalives immediately. From here until the end of the
    connection, we send one any time we are not otherwise
    sending data.
    
    But there's a catch. The moment we've gotten all the data is
    not known to receive-pack; it's only known to index-pack,
    which may then spend a lot of time resolving deltas.
    
    To make this work, we instruct the sideband muxer to enable
    keepalives in three phases:
    
      1. In the beginning, not at all.
    
      2. While reading from index-pack, wait for a signal
         indicating end-of-input, and then start them.
    
      3. Afterwards, always.
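    
    One way to model those phases is shown below, as a sketch with
    hypothetical names. The muxer uses a finite poll() timeout only
    once keepalives are active. A timeout then means nothing has
    been relayed for N seconds, so it is time to send one.
    
    ```c
    /* Hypothetical names for the three phases listed above. */
    enum keepalive_phase {
        KEEPALIVE_NEVER,     /* 1. before the pack arrives */
        KEEPALIVE_AFTER_NUL, /* 2. reading index-pack's stderr */
        KEEPALIVE_ALWAYS     /* 3. everything afterwards */
    };
    
    /* Phase 2 moves to phase 3 once the end-of-input signal is seen. */
    static enum keepalive_phase next_phase(enum keepalive_phase phase,
                                           int saw_signal)
    {
        if (phase == KEEPALIVE_AFTER_NUL && saw_signal)
            return KEEPALIVE_ALWAYS;
        return phase;
    }
    
    /*
     * Timeout in milliseconds for the muxer's poll() on the child's
     * stderr: -1 (block indefinitely) unless keepalives are active.
     */
    static int keepalive_poll_timeout(enum keepalive_phase phase,
                                      int interval_sec)
    {
        if (phase != KEEPALIVE_ALWAYS || interval_sec <= 0)
            return -1;
        return interval_sec * 1000;
    }
    ```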
    
    The signal from index-pack in phase 2 has to come over the
    stderr channel the muxer is reading from. We can't use an
    extra pipe because the portable run-command interface only
    gives us stderr and stdout.
    
    Stdout is already used to pass the .keep filename back to
    receive-pack. We could also send a signal there, but then we
    would find out about it in the main thread. And the
    keepalive needs to be done by the async thread (since it's
    the one writing sideband data back to the client). And we
    can't reliably signal the async thread from the main thread,
    because sometimes we use threads, and sometimes we use
    separate processes for the async code.
    
    Therefore the signal must come over the stderr channel,
    where it may be interspersed with other random
    human-readable messages. This patch makes the signal a
    single NUL byte.  This is easy to parse, should not appear
    in any normal stderr output, and we don't have to worry
    about any timing issues (like seeing half the signal bytes
    in one read(), and half in a subsequent one).
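    
    Finding the signal in a chunk read from index-pack's stderr is
    then a single memchr(). The sketch below uses a hypothetical
    helper name. It also reports the text on either side of the
    NUL, so that text can still be relayed to the client:
    
    ```c
    #include <string.h>
    
    /*
     * Hypothetical helper: if `buf` contains the NUL signal, return 1
     * and describe the human-readable bytes before and after it;
     * otherwise return 0 and leave the outputs untouched.
     */
    static int find_keepalive_signal(const char *buf, size_t len,
                                     size_t *before_len,
                                     const char **after, size_t *after_len)
    {
        const char *nul = memchr(buf, '\0', len);
    
        if (!nul)
            return 0;
        *before_len = (size_t)(nul - buf);
        *after = nul + 1;
        *after_len = len - *before_len - 1;
        return 1;
    }
    ```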
    
    This is a bit ugly, but it's simple to code and should work
    reliably.
    
    Another option would be to stop using an async thread for
    muxing entirely, and just poll() both stderr and stdout of
    index-pack from the main thread. This would work for
    index-pack (because we aren't doing anything useful in the
    main thread while it runs). But it would make the
    connectivity check and the hook muxers much more
    complicated, as they need to simultaneously feed the
    sub-programs while reading their stderr.
    
    The index-pack phase is the only one that needs this
    signaling, so it could simply behave differently than the
    other two. That would mean having two separate
    implementations of copy_to_sideband (and the keepalive
    code), though.
    
    One final note: this signaling trick is only done with
    index-pack, not with unpack-objects. There's no point in
    doing it for the latter, because by definition it only kicks
    in for a small number of objects, where keepalives are not
    as useful (and this conveniently lets us avoid duplicating
    the implementation).
    committed Mar 9, 2016
  8. receive-pack: turn on connectivity progress

    When we receive a large push, the server side of the
    connection may spend a lot of time (30s or more for a full
    push of linux.git) walking the object graph without
    producing any output. Let's give the user some indication
    that we're actually working.
    committed Jul 15, 2016
  9. receive-pack: relay connectivity errors to sideband

    If the connectivity check encounters a problem when
    receiving a push, the error output goes to receive-pack's
    stderr, whose destination depends on the protocol used
    (ssh tends to send it to the user, though without a "remote"
    prefix; http will generally eat it in the server's error
    log).
    
    The information should consistently go back to the user, as
    there is a reasonable chance their client is buggy and
    generating a bad pack.
    
    We can do so by muxing it over the sideband as we do with
    other sub-process stderr.
    committed May 18, 2015
  10. receive-pack: turn on index-pack resolving progress

    When we receive a large push, the server side may have to
    spend a lot of CPU processing the incoming packfile.
    
    During the "receiving" phase, we are typically network
    bound, and the client is writing its own progress to the
    user. But during the delta resolution phase, we may spend
    minutes (e.g., for a full push of linux.git) without
    making any indication to the user that the connection has
    not hung.
    
    Let's ask index-pack to produce progress output for this
    phase (unless the client asked us to be quiet, of course).
    committed Jul 15, 2016
  11. index-pack: add flag for showing delta-resolution progress

    The index-pack command has two progress meters: one for
    "receiving objects", and one for "resolving deltas". You get
    neither by default, or both with "-v".
    
    But for a push through receive-pack, we would want only the
    "resolving deltas" phase, _not_ the "receiving objects"
    progress. There are two reasons for this.
    
    One is simply that existing clients are already printing
    "writing objects" progress at the same time.  Arguably
    "receiving" from the far end is more useful, because it
    tells you what has actually gotten there, as opposed to what
    might be stuck in a buffer somewhere between the client and
    server. But that would require a protocol extension to tell
    clients not to print their progress. Possible, but it adds
    complexity for little gain.
    
    The second reason is much more important. In a full-duplex
    connection like git-over-ssh, we can print progress while
    the pack is incoming, and it will immediately get to the
    client. But for a half-duplex connection like git-over-http,
    we should not say anything until we have received the full
    request.  Anything we write is subject to being stuck in a
    buffer by the webserver.  Worse, we can end up in a deadlock
    if that buffer fills up.
    
    So our best bet is to avoid writing anything that isn't a
    small fixed size until we've received the full pack.
    committed Jul 15, 2016
  12. clone: use a real progress meter for connectivity check

    Because the initial connectivity check for a cloned
    repository can be slow, 0781aa4 (clone: let the user know
    when check_everything_connected is run, 2013-05-03) added a
    "fake" progress meter; we simply say "Checking connectivity"
    when it starts, and "done" at the end, with nothing between.
    
    Since check_connected() now knows how to do a real progress
    meter, we can drop our fake one and use that one instead.
    committed May 18, 2015
  13. check_connected: add progress flag

    Connectivity checks have to traverse the entire object graph
    in the worst case (e.g., a full clone or a full push). For
    large repositories like linux.git, this can take 30-60
    seconds, during which time git may produce little or no
    output.
    
    Let's add the option of showing progress, which is taken
    care of by rev-list.
    committed Jul 15, 2016
  14. check_connected: relay errors to alternate descriptor

    Unless the "quiet" flag is given, check_connected sends any
    errors to the stderr of the caller (because the child
    rev-list inherits that descriptor). However, server-side
    callers (like receive-pack) may want to send these over a
    sideband channel instead. Let's make that possible.
    committed May 18, 2015
  15. check_everything_connected: use a struct with named options

    The number of variants of check_everything_connected has
    grown over the years, so that the "real" function takes
    several possibly-zero, possibly-NULL arguments. We hid the
    complexity behind some wrapper functions, but this doesn't
    scale well when we want to add new options.
    
    If we add more wrapper variants to handle the new options,
    then we can get a combinatorial explosion when those options
    might be used together (right now nobody wants to use both
    "shallow" and "transport" together, so we get by with just a
    few wrappers).
    
    If instead we add new parameters to each function, each of
    which can have a default value, then callers who want the
    defaults end up with confusing invocations like:
    
      check_everything_connected(fn, 0, data, -1, 0, NULL);
    
    where it is unclear which parameter is which (and every
    caller needs to be updated when we add new options).
    
    Instead, let's add a struct to hold all of the optional
    parameters. This is a little more verbose for the callers
    (who have to declare the struct and fill it in), but it
    makes their code much easier to follow, because every option
    is named as it is set (and unused options do not have to be
    mentioned at all).
    
    Note that we could also stick the iteration function and its
    callback data into the option struct, too. But since those
    are required for each call, by avoiding doing so, we can let
    very simple callers just pass "NULL" for the options and not
    worry about the struct at all.
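    
    Here is a sketch of that calling convention. The struct and its
    fields are hypothetical stand-ins for the real option struct,
    and the stub only counts the tips it would hand to rev-list.
    The point is that callers name only what they set, and NULL
    means "all defaults". connected_error_fd() shows how the
    defaults resolve.
    
    ```c
    #include <stddef.h>
    
    /* Hypothetical iterator: stores the next tip in *tip; 0 when done. */
    typedef int (*tip_iterate_fn)(void *cb_data, const char **tip);
    
    /* Hypothetical optional parameters; all-zero means all defaults. */
    struct connected_options {
        int quiet;                /* send rev-list's stderr to /dev/null */
        int progress;             /* ask rev-list to show progress */
        int err_fd;               /* non-zero: relay errors to this fd */
        const char *shallow_file; /* NULL: repository default */
    };
    
    /* Where the child's errors would go; -1 stands for /dev/null. */
    static int connected_error_fd(const struct connected_options *opt)
    {
        if (opt->quiet)
            return -1;
        return opt->err_fd ? opt->err_fd : 2;
    }
    
    /*
     * Required arguments stay positional; optional ones live in `opt`,
     * which may be NULL. Instead of running rev-list, this stub only
     * counts the tips it would feed to it.
     */
    static int check_connected_sketch(tip_iterate_fn fn, void *cb_data,
                                      const struct connected_options *opt)
    {
        static const struct connected_options defaults = { 0 };
        const char *tip;
        int n = 0;
    
        if (!opt)
            opt = &defaults;
        while (fn(cb_data, &tip))
            n++;
        return n;
    }
    
    /* Example iterator over a NULL-terminated array of tip names. */
    static int next_from_array(void *cb_data, const char **tip)
    {
        const char ***pos = cb_data;
    
        if (!**pos)
            return 0;
        *tip = *(*pos)++;
        return 1;
    }
    ```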
    
    While we're touching each site, let's also rename the
    function to check_connected(). The existing name was quite
    long, and not all of the wrappers even used the full name.
    committed Jul 15, 2016
  16. check_everything_connected: convert to argv_array

    This avoids the magic "9" array-size which we must avoid
    overflowing, making further patches simpler.
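    
    The idea behind argv_array is shown below as a minimal sketch
    with hypothetical names (struct argv_vec, argv_vec_push). The
    vector grows on demand and keeps argv[argc] NULL, so there is
    no fixed size to overflow:
    
    ```c
    #include <stdlib.h>
    #include <string.h>
    
    /* Hypothetical growable, NULL-terminated argument vector. */
    struct argv_vec {
        const char **argv;
        size_t argc;
        size_t alloc;
    };
    
    #define ARGV_VEC_INIT { NULL, 0, 0 }
    
    /* Append a copy of `arg`; argv[argc] is always NULL afterwards. */
    static int argv_vec_push(struct argv_vec *v, const char *arg)
    {
        size_t len = strlen(arg) + 1;
        char *copy;
    
        if (v->argc + 2 > v->alloc) {
            size_t new_alloc = v->alloc ? 2 * v->alloc : 8;
            const char **grown = realloc(v->argv, new_alloc * sizeof(*grown));
            if (!grown)
                return -1;
            v->argv = grown;
            v->alloc = new_alloc;
        }
        copy = malloc(len);
        if (!copy)
            return -1;
        memcpy(copy, arg, len);
        v->argv[v->argc++] = copy;
        v->argv[v->argc] = NULL;
        return 0;
    }
    
    /* Free every string and the array, returning to the empty state. */
    static void argv_vec_clear(struct argv_vec *v)
    {
        size_t i;
    
        for (i = 0; i < v->argc; i++)
            free((char *)v->argv[i]);
        free(v->argv);
        v->argv = NULL;
        v->argc = v->alloc = 0;
    }
    ```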
    committed May 18, 2015
  17. rev-list: add optional progress reporting

    It's easy to ask rev-list to do a traversal that may take
    many seconds (e.g., by calling "--objects --all"). In theory
    you can monitor its progress by the output you get to
    stdout, but this isn't always easy.
    
    Some operations, like "--count", don't make any output until
    the end.
    
    And some callers, like check_everything_connected(), are
    using it just for the error-checking of the traversal, and
    throw away stdout entirely.
    
    This patch adds a "--progress" option which can be used to
    give some eye-candy for a user waiting for a long traversal.
    This is just a rev-list option and not a regular traversal
    option, because it needs cooperation from the callbacks in
    builtin/rev-list.c to do the actual count.
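    
    Here is a sketch of what that cooperation might look like, with
    hypothetical names. Each show-callback ticks a shared counter,
    and the meter is redrawn every `every` objects. The fixed
    interval is an assumption made for simplicity; the sketch only
    shows where the count has to live.
    
    ```c
    #include <stdio.h>
    
    /* Hypothetical progress state shared with the show-callbacks. */
    struct traversal_progress {
        const char *title;
        unsigned long count;
        unsigned long every; /* redraw after this many objects */
        FILE *out;           /* NULL: progress disabled */
    };
    
    /*
     * Call from each commit/object callback. Counting always happens;
     * drawing only when enabled. Returns 1 if the meter was redrawn.
     */
    static int progress_tick(struct traversal_progress *p)
    {
        p->count++;
        if (!p->out || !p->every || p->count % p->every)
            return 0;
        fprintf(p->out, "\r%s: %lu", p->title, p->count);
        fflush(p->out);
        return 1;
    }
    
    /* Final line once the traversal finishes. */
    static void progress_done(struct traversal_progress *p)
    {
        if (p->out)
            fprintf(p->out, "\r%s: %lu, done.\n", p->title, p->count);
    }
    ```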
    committed May 18, 2015
  18. check_everything_connected: always pass --quiet to rev-list

    The check_everything_connected function takes a "quiet"
    parameter which does two things if non-zero:
    
      1. redirect rev-list's stderr to /dev/null to avoid
         showing errors to the user
    
      2. pass "--quiet" to rev-list
    
    Item (1) is obviously useful. But item (2) is
    surprisingly not. For rev-list, "--quiet" does not have
    anything to do with chattiness on stderr; it tells rev-list
    not to bother writing the list of traversed objects to
    stdout, for efficiency.  And since we always redirect
    rev-list's stdout to /dev/null in this function, there is no
    point in asking it to ever write anything to stdout.
    
    The efficiency gains are modest; a best-of-five run of "git
    rev-list --objects --all" on linux.git dropped from 32.013s
    to 30.502s when adding "--quiet". That's only about 5%, but
    given how easy it is, it's worth doing.
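    
    The percentage follows from the two timings:
    
    ```latex
    \frac{32.013\,\mathrm{s} - 30.502\,\mathrm{s}}{32.013\,\mathrm{s}} = \frac{1.511}{32.013} \approx 0.047 = 4.7\%
    ```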
    committed Jul 15, 2016
  19. show git tag output in pager

    On Thu, Sep 29, 2011 at 11:37:49AM +0200, Michal Vyskocil wrote:
    
    > On Tue, Sep 27, 2011 at 04:19:39PM +0200, Matthieu Moy wrote:
    > > The commit message should explain why this is needed, and in particular
    > > why you prefer this to setting pager.tag in your ~/.gitconfig.
    >
    > Opps! I read a documentation, but I did not realize this works for all
    > commands and not only for them calling setup_pager(). Then sorry, no
    > change is needed.
    
    I don't think you want to set pager.tag. It will invoke the pager for
    all tag subcommands, including tag creation and deletion. It's not a
    huge deal if your pager exits immediately when the input is less than a
    page (which I think our default LESS settings will do). But I wouldn't
    be surprised if it ends up confusing some program at some point.
    
    I think instead, you want some way for commands to say "OK, I'm in a
    subcommand that might or might not want a pager now".
    
    Something like the (thoroughly not tested) patch below, which you can
    use like:
    
      git config pager.tag.list
    committed Sep 30, 2011
  20. pager_in_use: make sure output is still going to pager

    When we start a pager, we set GIT_PAGER_IN_USE=1 in the
    environment. This lets sub-processes know that even though
    isatty(1) is not true, it is because it is connected to a
    pager (and we should still turn on human-readable niceties
    like auto-color).
    
    Unfortunately, this is too inclusive for scripts which
    invoke git sub-programs with their stdout going somewhere
    else. For example, if you run "git -p pull --rebase", git-pull
    will invoke "git rebase", which invokes:
    
      git format-patch ... >rebased-patches
    
    This format-patch process knows that its stdout is not a
    tty, but because of GIT_PAGER_IN_USE it assumes this is
    because stdout is going to a pager. As a result, it writes
    colorized output, and the matching "git am" invocation
    chokes on it, causing the rebase to fail.
    
    We could work around this by passing "--no-color" to
    format-patch, or by removing GIT_PAGER_IN_USE from the
    environment. But we should not have to do so; format-patch
    should be able to realize that even though GIT_PAGER_IN_USE
    is set, its stdout is not actually going to that pager.
    
    For this simple case, format-patch could see that its output
    is not even a pipe. But that would not catch a case like:
    
      git format-patch | some-program >rebased-patches
    
    where it cannot distinguish between the pipe to the pager
    and the pipe to some-program.
    
    This patch solves it by actually noting the inode of the
    pipe to the pager in the environment, which readers of
    GIT_PAGER_IN_USE can check against their stdout. This
    technically makes GIT_PAGER_IN_USE redundant (we can just
    check the new GIT_PAGER_PIPE_ID), but we keep using both
    variables for compatibility with external scripts:
    
      - scripts which check GIT_PAGER_IN_USE can continue to do
        so, and will just ignore the new pipe-id variable.
        Meaning they may accidentally turn on colors if their
        output is redirected to a file, but that is the same as
        today and we cannot fix that. We do not actively break
        them from showing colors when their stdout _does_ go to
        the pager.
    
      - scripts which set GIT_PAGER_IN_USE but not
        GIT_PAGER_PIPE_ID will continue to turn on colorization
        for git sub-commands (again, they do not benefit from
        the new code, but we are not making anything worse).
    
    The inode-retrieval code itself is abstracted into compat/,
    as different platforms may represent the pipe id
    differently. These ids do not need to be portable across
    systems, only within processes on the same system.
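    
    Here is a sketch of the check. It assumes a POSIX system where
    the id is the pipe's inode and GIT_PAGER_PIPE_ID carries it as
    a decimal number. Both the helper names and that encoding are
    illustrative.
    
    ```c
    #include <stdlib.h>
    #include <sys/stat.h>
    
    /* Hypothetical compat helper: the pipe's inode, or -1 if not a FIFO. */
    static int get_pipe_id(int fd, unsigned long long *id)
    {
        struct stat st;
    
        if (fstat(fd, &st) < 0 || !S_ISFIFO(st.st_mode))
            return -1;
        *id = (unsigned long long)st.st_ino;
        return 0;
    }
    
    /*
     * Does `fd` still go to the pager? If only GIT_PAGER_IN_USE is set
     * (an older script), trust it as before; if the pipe id is also
     * set, require fd to be that very pipe.
     */
    static int output_goes_to_pager(int fd)
    {
        const char *in_use = getenv("GIT_PAGER_IN_USE");
        const char *pipe_env = getenv("GIT_PAGER_PIPE_ID");
        unsigned long long id;
    
        if (!in_use || !*in_use)
            return 0;
        if (!pipe_env)
            return 1; /* historical behavior */
        if (get_pipe_id(fd, &id) < 0)
            return 0; /* not a pipe at all, e.g. a regular file */
        return strtoull(pipe_env, NULL, 10) == id;
    }
    ```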
    
    Note that there is an existing test in t7006 which tests for
    the exact _opposite_ of what we are trying to achieve
    (namely, that GIT_PAGER_IN_USE does _not_ cause us to write
    colors to a random file). This test comes from a battery of
    tests added by 60b6e22 (tests: Add tests for automatic use
    of pager, 2010-02-19), and I think is simply misguided, as
    evidenced by the real "git pull" bug above. If you want to
    ensure colors in a file, you do it with "--color", not by
    pretending you have a pager.
    
    Rather than delete the test, though, we simply re-title it
    here. It actually makes a good check of the "scripts which
    set GIT_PAGER_IN_USE but not GIT_PAGER_PIPE_ID" historical
    compatibility mentioned above.
    committed Aug 10, 2015
  21. support pager.* for aliases

    Until this patch, doing something like:
    
      git config alias.foo log
      git config pager.foo /some/specific/pager
    
    would not respect pager.foo at all. With this patch, we
    will use pager.foo for the "foo" alias.  We will also
    fall back to pager.log if "foo" is a non-shell alias that
    uses the "log" command (but any pager.foo overrides
    pager.log).
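    
    The lookup order is sketched below against a hypothetical
    in-memory config table; config_get() and pager_for() are not git
    functions. A shell alias has no underlying command, so only
    pager.<alias> applies to it:
    
    ```c
    #include <stdio.h>
    #include <string.h>
    
    /* Hypothetical stand-in for git's config. */
    struct config_entry {
        const char *key;
        const char *value;
    };
    
    static const char *config_get(const struct config_entry *cfg, size_t n,
                                  const char *key)
    {
        size_t i;
    
        for (i = 0; i < n; i++)
            if (!strcmp(cfg[i].key, key))
                return cfg[i].value;
        return NULL;
    }
    
    /*
     * pager.<alias> wins; otherwise fall back to pager.<cmd>, where
     * `cmd` is the command a non-shell alias expands to (NULL for a
     * shell alias).
     */
    static const char *pager_for(const struct config_entry *cfg, size_t n,
                                 const char *alias, const char *cmd)
    {
        char key[256];
        const char *v;
    
        snprintf(key, sizeof(key), "pager.%s", alias);
        v = config_get(cfg, n, key);
        if (v || !cmd)
            return v;
        snprintf(key, sizeof(key), "pager.%s", cmd);
        return config_get(cfg, n, key);
    }
    ```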
    committed Aug 18, 2011
  22. parse_options: allocate a new array when concatenating

    In exactly one caller (builtin/revert.c), we build up the
    options list dynamically from multiple arrays. We do so by
    manually inserting "filler" entries into one array, and then
    copying the other array into the allocated space.
    
    This is tedious and error-prone, as you have to adjust the
    filler any time the second array is modified (although we do
    at least check and die() when the counts do not match up).
    
    Instead, let's just allocate a new array.
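    
    Here is a minimal sketch of the approach. It uses a
    hypothetical pared-down option type terminated by an OPTION_END
    entry. The result is freshly allocated, so neither input needs
    filler slots:
    
    ```c
    #include <stdlib.h>
    #include <string.h>
    
    /* Hypothetical option entry; OPTION_END terminates an array. */
    enum opt_type { OPTION_END = 0, OPTION_BOOL, OPTION_STRING };
    
    struct opt_entry {
        enum opt_type type;
        const char *long_name;
    };
    
    static size_t count_options(const struct opt_entry *o)
    {
        size_t n = 0;
    
        while (o[n].type != OPTION_END)
            n++;
        return n;
    }
    
    /* New array: a's entries, then b's, then one terminator. */
    static struct opt_entry *concat_options(const struct opt_entry *a,
                                            const struct opt_entry *b)
    {
        size_t na = count_options(a), nb = count_options(b);
        struct opt_entry *ret = malloc((na + nb + 1) * sizeof(*ret));
    
        if (!ret)
            return NULL;
        memcpy(ret, a, na * sizeof(*a));
        memcpy(ret + na, b, nb * sizeof(*b));
        ret[na + nb].type = OPTION_END;
        ret[na + nb].long_name = NULL;
        return ret; /* caller frees */
    }
    ```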
    committed Jul 5, 2016
  23. pager_in_use: use git_env_bool

    This function basically reimplements git_env_bool (because
    it predates it). Let's reuse that helper, which is shorter
    and avoids repeating a string literal.
    committed Aug 9, 2015
  24. Makefile: use VCSSVN_LIB to refer to svn library

    We have an abstracted variable; let's use it consistently.
    committed Jul 1, 2016
  25. receive-pack: respect receive.advertiseAlternates config

    Usually receive-pack advertises ref tips from alternates
    repositories so that clients can sometimes avoid sending
    objects that are already upstream.
    
    However, if you have a very large alternates network, then
    the number of ".have" refs can get cumbersome, and you spend
    more time advertising refs that the client doesn't care
    about than you are saving in the optimization.
    
    This patch adds a config variable to drop .have lines
    entirely.
    
    An alternative approach would be to restrict the .have lines
    to a smaller portion of the namespace that is known to be
    "interesting" to most clients. This patch is much simpler;
    if the loss of the .have optimization turns out to be too
    much, we can try something more complex.
    committed Jan 4, 2012
  26. commit: give a hint when a commit message has been abandoned

    If we launch an editor for the user to create a commit
    message, they may put significant work into doing so.
    Typically we try to check common mistakes that could cause
    the commit to fail early, so that we die before the user
    goes to the trouble.
    
    We may still experience some errors afterwards, though; in
    this case, the user is given no hint that their commit
    message has been saved. Let's tell them where it is.
    
    Signed-off-by: Jeff King <peff@peff.net>
    committed Jul 23, 2012
  27. Makefile: drop extra dependencies for test helpers

    A few test-helpers have Makefile dependencies on specific
    object files. But since these files are part of libgit.a
    (which all of the helpers link against), the inclusion is
    simply redundant.
    
    These were once necessary, but became redundant due to
    5c5ba73 (Makefile: Use generic rule to build test programs,
    2007-05-31), which added the $(GITLIBS) dependency (but
    didn't prune the extra dependency lines). Later commits then
    cargo-culted the practice (e.g., b4285c7).
    
    Note that we _do_ need to leave the dependencies on the svn
    library, as that is not part of the usual link command.
    committed Jul 1, 2016
  28. print an error when remote helpers die during capabilities

    The transport-helper code generally relies on the
    remote-helper to provide an informative message to the user
    when it encounters an error. In the rare cases where the
    helper does not do so, the output can be quite confusing.
    E.g.:
    
      $ git clone https://example.com/foo.git
      Cloning into 'foo'...
      $ echo $?
      128
      $ ls foo
      /bin/ls: cannot access foo: No such file or directory
    
    We tried to address this with 81d340d (transport-helper:
    report errors properly, 2013-04-10).
    
    But that makes the common case much more confusing. The
    remote helper protocol's method for signaling normal errors
    is to simply hang up. So when the helper does encounter a
    routine error and prints something to stderr, the extra
    error message is redundant and misleading. So we dropped it
    again in 266f1fd (transport-helper: be quiet on read errors
    from helpers, 2013-06-21).
    
    This puts the uncommon case right back where it started. We
    may be able to do a little better, though. It is common for
    the helper to die during a "real" command, like fetching the
    list of remote refs. It is not common for it to die during
    the initial "capabilities" negotiation, right after we
    start. Reporting failure here is likely to catch fundamental
    problems that prevent the helper from running (and reporting
    errors) at all. Anything after that is the responsibility of
    the helper itself to report.
    committed Jan 17, 2014
  29. docs/filter-branch: clean up newsubdir example

    Over the years, this simple example has ended up quite hard
    to read because of the number of special cases that must be
    handled. Let's simplify it a bit:
    
      1. Use the new "index-info --clear" to avoid the need for
         a temporary index.
    
      2. Use "-z" and "perl -0" to avoid dealing with quoting
         issues. As a bonus, using perl means that "\t" will
         work consistently in regexps (the previous example
         using sed was reported to fail on OS X).
    
      3. Change the indentation to keep one logical unit per
         line and avoid extra backslash-escaping.
    committed Apr 16, 2012
  30. update-index: add --clear option

    This just discards existing entries from the index, which
    can be useful if you are rewriting all entries with
    "--index-info" or similar.
    committed Apr 16, 2012
  31. fix?

    committed Aug 21, 2015
  32. setup: don't choose tracked bare repos

    If you have something that looks like a bare repo, but is
    actually tracked inside another non-bare repo (i.e., it is
    in the index), you most likely still want git commands to go
    to the outer repo (e.g., if you had git test fixtures inside
    another repository). This patch tweaks the auto-discovery to
    skip over such bare repos. You can still get to them
    explicitly with GIT_DIR, of course.
    
    This implementation is horrible and inefficient, but should
    at least show whether the idea is sane.
    committed Sep 10, 2015
  33. zeromq dumpstat implementation

    This tries to do as little as possible on top of zeromq,
    which might not work. It sends an "id" field with each
    message to keep some state. Probably this should be
    inherited through the environment.
    committed May 3, 2012
  34. combine-diff: zero memory used for callback filepairs

    In commit 25e5e2b, the combined-diff code learned how to
    make a multi-sourced diff_filepair to pass to a diff
    callback. When we create each filepair, we do not bother to
    fill in many of the fields, because they would make no sense
    (e.g., there can be no rename score or broken_pair flag
    because we do not go through the diffcore filters). However,
    we did not even bother to zero them, leading to random
    values. Let's make sure everything is blank with xcalloc,
    just as the regular diff code does.
    
    We would potentially want to set the "status" flag to
    something non-zero, but it is not clear to what. Possibly a
    new DIFF_STATUS_COMBINED would make sense, as this is not
    strictly a modification, nor does it fit any other category.
    
    Since it is not yet clear what callers would want, this
    patch simply leaves it as "0", the same empty flag that is
    seen when diffcore_std is not used at all.
    committed Nov 6, 2012
  35. t9350-fast-export: Add failing test for symlink-to-directory

    git fast-export | git fast-import fails to preserve a commit that replaces
    a symlink with a directory.  Add a failing test case demonstrating this
    bug.
    
    The fast-export output for the commit in question looks like
    
      commit refs/heads/master
      mark :4
      author …
      committer …
      data 4
      two
      M 100644 :1 foo/world
      D foo
    
    fast-import deletes the symlink foo and ignores foo/world.  Swapping the M
    line with the D line would give the correct result.
    
    Signed-off-by: Anders Kaseorg <andersk@mit.edu>
    andersk committed with Aug 19, 2015