Commits on Sep 1, 2015
  1. @peff @gitster

    pkt-line: show packets in async processes as "sideband"

    peff committed with gitster
    If you run "GIT_TRACE_PACKET=1 git push", you may get
    confusing output like (line prefixes omitted for clarity):
    
       packet:      push< \1000eunpack ok0019ok refs/heads/master0000
       packet:      push< unpack ok
       packet:      push< ok refs/heads/master
       packet:      push< 0000
       packet:      push< 0000
    
    Why do we see the data twice, once apparently wrapped inside
    another pkt-line, and once unwrapped? Why do we get two
    flush packets?
    
    The answer is that we start an async process to demux the
    sideband data. The first entry comes from the sideband
    process reading the data, and the second from push itself.
    Likewise, the first flush is inside the demuxed packet, and
    the second is an actual sideband flush.
    
    We can make this a bit more clear by marking the sideband
    demuxer explicitly as "sideband" rather than "push". The
    most elegant way to do this would be to simply call
    packet_trace_identity() inside the sideband demuxer. But we
    can't do that reliably, because it relies on a global
    variable, which might be shared if pthreads are in use.
    
    What we really need is thread-local storage for
    packet_trace_identity. But the async code does not provide
    an interface for that, and it would be messy to add it here
    (we'd have to care about pthreads, initializing our
    pthread_key_t ahead of time, etc).
    
    So instead, let us just assume that any async process is
    handling sideband data. That's always true now, and is
    likely to remain so in the future.
    
    The output looks like:
    
       packet:  sideband< \1000eunpack ok0019ok refs/heads/master0000
       packet:      push< unpack ok
       packet:      push< ok refs/heads/master
       packet:      push< 0000
       packet:  sideband< 0000
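
The prefix choice described above can be sketched as follows. This is a minimal illustration, not the actual patch: `inside_async`, `set_trace_identity`, `trace_prefix`, and `format_trace_header` are hypothetical names, and the field width of 9 is chosen only to reproduce the abbreviated listings in this message.

```c
#include <stdio.h>

/* Hypothetical stand-ins for state the real async and trace code would own. */
static const char *trace_identity = "git";
static int inside_async;	/* nonzero while running inside an async helper */

static void set_trace_identity(const char *prog)
{
	trace_identity = prog;
}

/* Assume any async helper is the sideband demuxer, as the message proposes. */
static const char *trace_prefix(void)
{
	return inside_async ? "sideband" : trace_identity;
}

/* Build a header such as "packet:      push< " ('<' for read, '>' for write). */
static int format_trace_header(char *out, size_t outlen, int is_write)
{
	return snprintf(out, outlen, "packet: %9s%c ",
			trace_prefix(), is_write ? '>' : '<');
}
```

Because the prefix is derived at trace time instead of being stored in a shared global, the helper does not need thread-local storage, which is the trade-off the message argues for.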
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Jun 16, 2015
  1. @peff @gitster

    pkt-line: support tracing verbatim pack contents

    peff committed with gitster
    When debugging the pack protocol, it is sometimes useful to
    store the verbatim pack that we sent or received on the
    wire. Looking at the on-disk result is often not helpful for
    a few reasons:
    
      1. If the operation is a clone, we destroy the repo on
         failure, leaving nothing on disk.
    
      2. If the pack is small, we unpack it immediately, and the
         full pack never hits the disk.
    
      3. If we feed the pack to "index-pack --fix-thin", the
         resulting pack has the extra delta bases added to it.
    
    We already have a GIT_TRACE_PACKET mechanism for tracing
    packets. Let's extend it with GIT_TRACE_PACKFILE to dump the
    verbatim packfile.
    
    There are a few other positive fallouts that come from
    rearranging this code:
    
     - We currently disable the packet trace after seeing the
       PACK header, even though we may get human-readable lines
       on other sidebands; now we include them in the trace.
    
     - We currently try to print "PACK ..." in the trace to
       indicate that the packfile has started. But because we
       disable packet tracing, we never print this line. We
       will now do so.
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Jun 15, 2015
  1. @peff @gitster

    pkt-line: tighten sideband PACK check when tracing

    peff committed with gitster
    To find the start of the pack data, we accept the word PACK
    at the beginning of any sideband channel, even though what
    we really want is to find the pack data on channel 1. In
    practice this doesn't matter, as sideband-2 messages tend to
    start with "error:" or similar, but it is a good idea to be
    explicit (especially as we add more code in this area, we
    will rely on this assumption).
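
As a rough sketch of the difference, assuming the sideband payload starts with a one-byte channel number followed by the data (the helper names `pack_on_any_band` and `pack_on_band_1` are illustrative, not taken from the patch):

```c
#include <string.h>

/* Looser check: "PACK" following any one-byte channel prefix. */
static int pack_on_any_band(const char *buf, size_t len)
{
	return len >= 5 && !memcmp(buf + 1, "PACK", 4);
}

/* Tighter check: the channel byte must be 1, where pack data travels. */
static int pack_on_band_1(const char *buf, size_t len)
{
	return len >= 5 && buf[0] == 1 && !memcmp(buf + 1, "PACK", 4);
}
```

A message on channel 2 that happened to begin with "PACK" passes the first check but not the second.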
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. @peff @gitster

    pkt-line: simplify starts_with checks in packet tracing

    peff committed with gitster
    We carefully check that our pkt buffer has enough characters
    before seeing if it starts with "PACK". The intent is to
    avoid reading random memory if we get a short buffer like
    "PAC".
    
    However, we know that the traced packets are always
    NUL-terminated. They come from one of these sources:
    
      1. A string literal.
    
      2. `format_packet`, which uses a strbuf.
    
      3. `packet_read`, which defensively NUL-terminates what we
         read.
    
    We can therefore drop the length checks, as we know we will
    hit the trailing NUL if we have a short input.
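
To illustrate why the length check is redundant for NUL-terminated input, here is a minimal prefix test written for this note (a sketch of the technique; `looks_like_pack` is a hypothetical helper). The loop stops at the first mismatch, and the terminating NUL of a short input such as "PAC" is itself a mismatch against 'K', so nothing past the NUL is read.

```c
/* Return 1 if str begins with prefix; both must be NUL-terminated. */
static int starts_with(const char *str, const char *prefix)
{
	for (; ; str++, prefix++) {
		if (!*prefix)
			return 1;	/* consumed the whole prefix */
		if (*str != *prefix)
			return 0;	/* mismatch, including str's own NUL */
	}
}

/* Hypothetical helper: does this traced packet begin the pack data? */
static int looks_like_pack(const char *buf)
{
	return starts_with(buf, "PACK");
}
```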
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Dec 10, 2014
  1. @peff @gitster

    pkt-line: allow writing of LARGE_PACKET_MAX buffers

    peff committed with gitster
    When we send out pkt-lines with refnames, we use a static
    1000-byte buffer. This means that the maximum size of a ref
    over the git protocol is around 950 bytes (the exact size
    depends on the protocol line being written, but figure on a sha1
    plus some boilerplate).
    
    This is enough for any sane workflow, but occasionally odd
    things happen (e.g., a bug may create a ref "foo/foo/foo/..."
    accidentally).  With the current code, you cannot even use
    "push" to delete such a ref from a remote.
    
    Let's switch to using a strbuf, with a hard-limit of
    LARGE_PACKET_MAX (which is specified by the protocol).  This
    matches the size of the readers, as of 74543a0 (pkt-line:
    provide a LARGE_PACKET_MAX static buffer, 2013-02-20).
    Versions of git older than that will complain about our
    large packets, but it's really no worse than the current
    behavior. Right now the sender barfs with "impossibly long
    line" trying to send the packet, and afterwards the reader
    will barf with "protocol error: bad line length %d", which
    is arguably better anyway.
    
    Note that we're not really _solving_ the problem here, just
    raising the limit. In theory, the length of a ref is
    unbounded, while pkt-line can only represent sizes up to
    65531 bytes. But hopefully 64K should be enough for anyone.
    
    As a bonus, by using a strbuf for the formatting we can
    eliminate an unnecessary copy in format_buf_write.
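
For reference, a pkt-line is four hex digits giving the total length (header included) followed by the payload. The sketch below shows the framing and the hard limit; it writes into a caller-supplied buffer rather than a strbuf, and `format_pkt_line` is a name made up for this illustration.

```c
#include <stdio.h>
#include <string.h>

#define LARGE_PACKET_MAX 65520

/*
 * Frame n payload bytes as one pkt-line in out.
 * Returns the total number of bytes written, or -1 if the packet would
 * exceed LARGE_PACKET_MAX or the output buffer.
 */
static int format_pkt_line(char *out, size_t outlen,
			   const char *payload, size_t n)
{
	char hdr[5];
	size_t total = n + 4;	/* the length field counts its own 4 bytes */

	if (total > LARGE_PACKET_MAX || total > outlen)
		return -1;
	snprintf(hdr, sizeof(hdr), "%04x", (unsigned)total);
	memcpy(out, hdr, 4);
	memcpy(out + 4, payload, n);
	return (int)total;
}
```

Framing the 10-byte payload "unpack ok\n" this way yields the 14-byte packet "000eunpack ok\n", the form seen in GIT_TRACE_PACKET output.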
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Feb 24, 2013
  1. @peff @gitster

    pkt-line: share buffer/descriptor reading implementation

    peff committed with gitster
    The packet_read function reads from a descriptor. The
    packet_get_line function is similar, but reads from an
    in-memory buffer, and uses a completely separate
    implementation. This patch teaches the generic packet_read
    function to accept either source, and we can do away with
    packet_get_line's implementation.
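
One way to picture a reader that accepts either source is sketched below. The `struct pkt_source` type and `source_read` function are assumptions made for this example and do not reflect the real function signatures; the point is only that the packet parser on top can stay identical for both sources.

```c
#include <string.h>
#include <unistd.h>

/* Hypothetical source: use the in-memory buffer when buf is non-NULL, else fd. */
struct pkt_source {
	int fd;
	const char *buf;
	size_t len;
};

/*
 * Copy exactly n bytes from the source into dst.
 * Returns 0 on success, -1 on short input or read error
 * (for brevity, EINTR is treated as an error here).
 */
static int source_read(struct pkt_source *src, char *dst, size_t n)
{
	if (src->buf) {
		if (src->len < n)
			return -1;
		memcpy(dst, src->buf, n);
		src->buf += n;
		src->len -= n;
		return 0;
	}
	while (n) {
		ssize_t got = read(src->fd, dst, n);
		if (got <= 0)
			return -1;
		dst += got;
		n -= (size_t)got;
	}
	return 0;
}
```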
    
    There are two other differences to account for between the
    old and new functions. The first is that we used to read
    into a strbuf, but now read into a fixed size buffer. The
    only two callers are fine with that, and in fact it
    simplifies their code, since they can use the same
    static-buffer interface as the rest of the packet_read_line
    callers (and we provide a similar convenience wrapper for
    reading from a buffer rather than a descriptor).
    
    This is technically an externally-visible behavior change in
    that we used to accept arbitrary sized packets up to 65532
    bytes, and now cap out at LARGE_PACKET_MAX, 65520. In
    practice this doesn't matter, as we use it only for parsing
    smart-http headers (of which there is exactly one defined,
    and it is small and fixed-size). And any extension headers
    would be breaking the protocol to go over LARGE_PACKET_MAX
    anyway.
    
    The other difference is that packet_get_line would return
    on error rather than dying. However, both callers of
    packet_get_line are actually improved by dying.
    
    The first caller does its own error checking, but we can
    drop that; as a result, we'll actually get more specific
    reporting about protocol breakage when packet_read dies
    internally. The only downside is that packet_read will not
    print the smart-http URL that failed, but that's not a big
    deal; anybody not debugging can already see the remote's URL,
    and anybody debugging would want to run with
    GIT_CURL_VERBOSE anyway to see way more information.
    
    The second caller, which is just trying to skip past any
    extra smart-http headers (of which there are none defined,
    but which we allow to keep room for future expansion), did
    not error check at all. As a result, it would treat an error
    just like a flush packet. The resulting mess would generally
    cause an error later in get_remote_heads, but now we get
    error reporting much closer to the source of the problem.
    
    Brown-paper-bag-fixes-by: Ramsay Jones <ramsay@ramsay1.demon.co.uk>
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Feb 20, 2013
  1. @peff @gitster

    pkt-line: provide a LARGE_PACKET_MAX static buffer

    peff committed with gitster
    Most of the callers of packet_read_line just read into a
    static 1000-byte buffer (callers which handle arbitrary
    binary data already use LARGE_PACKET_MAX). This works fine
    in practice, because:
    
      1. The only variable-sized data in these lines is a ref
         name, and refs tend to be a lot shorter than 1000
         characters.
    
      2. When sending ref lines, git-core always limits itself
         to 1000 byte packets.
    
    However, the only limit given in the protocol specification
    in Documentation/technical/protocol-common.txt is
    LARGE_PACKET_MAX; the 1000 byte limit is mentioned only in
    pack-protocol.txt, and then only describing what we write,
    not as a specific limit for readers.
    
    This patch lets us bump the 1000-byte limit to
    LARGE_PACKET_MAX. Even though git-core will never write a
    packet where this makes a difference, there are two good
    reasons to do this:
    
      1. Other git implementations may have followed
         protocol-common.txt and used a larger maximum size. We
         don't bump into it in practice because it would involve
         very long ref names.
    
      2. We may want to increase the 1000-byte limit one day.
         Since packets are transferred before any capabilities,
         it's difficult to do this in a backwards-compatible
         way. But if we bump the size of the buffer the readers can
         handle, eventually older versions of git will be
         obsolete enough that we can justify bumping the
         writers, as well. We don't have plans to do this
         anytime soon, but there is no reason not to start the
         clock ticking now.
    
    Just bumping all of the reading bufs to LARGE_PACKET_MAX
    would waste memory. Instead, since most readers just read
    into a temporary buffer anyway, let's provide a single
    static buffer that all callers can use. We can further wrap
    this detail away by having the packet_read_line wrapper just
    use the buffer transparently and return a pointer to the
    static storage.  That covers most of the cases, and the
    remaining ones already read into their own LARGE_PACKET_MAX
    buffers.
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  2. @peff @gitster

    pkt-line: teach packet_read_line to chomp newlines

    peff committed with gitster
    The packets sent during ref negotiation are all terminated
    by newline; even though the code to chomp these newlines is
    short, we end up doing it in a lot of places.
    
    This patch teaches packet_read_line to auto-chomp the
    trailing newline; this lets us get rid of a lot of inline
    chomping code.
    
    As a result, some call-sites which are not reading
    line-oriented data (e.g., when reading chunks of packfiles
    alongside sideband) transition away from packet_read_line to
    the generic packet_read interface. This patch converts all
    of the existing callsites.
    
    Since the function signature of packet_read_line does not
    change (but its behavior does), there is a possibility of
    new callsites being introduced in later commits, silently
    introducing an incompatibility.  However, since a later
    patch in this series will change the signature, such a
    commit would have to be merged directly into this commit,
    not to the tip of the series; we can therefore ignore the
    issue.
    
    This is an internal cleanup and should produce no change of
    behavior in the normal case. However, there is one corner
    case to note. Callers of packet_read_line have never been
    able to tell the difference between a flush packet ("0000")
    and an empty packet ("0004"), as both cause packet_read_line
    to return a length of 0. Readers treat them identically,
    even though Documentation/technical/protocol-common.txt says
    we must not; it also says that implementations should not
    send an empty pkt-line.
    
    By stripping out the newline before the result gets to the
    caller, we will now treat the newline-only packet ("0005\n")
    the same as an empty packet, which in turn gets treated like
    a flush packet. In practice this doesn't matter, as neither
    empty nor newline-only packets are part of git's protocols
    (at least not for the line-oriented bits, and readers who
    are not expecting line-oriented packets will be calling
    packet_read directly, anyway). But even if we do decide to
    care about the distinction later, it is orthogonal to this
    patch.  The right place to tighten would be to stop treating
    empty packets as flush packets, and this change does not
    make doing so any harder.
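
The corner case can be reproduced with a small stand-alone reader. This is a sketch that parses from an in-memory buffer; `pkt_header_len` and `read_line_chomped` are hypothetical names, not the functions touched by this patch. All of "0000", "0004", and "0005\n" come back as length 0.

```c
#include <string.h>

/* Decode the 4 hex digits of a pkt-line header; -1 if malformed. */
static int pkt_header_len(const char *hdr)
{
	int len = 0, i;

	for (i = 0; i < 4; i++) {
		char c = hdr[i];
		int v;

		if (c >= '0' && c <= '9')
			v = c - '0';
		else if (c >= 'a' && c <= 'f')
			v = c - 'a' + 10;
		else if (c >= 'A' && c <= 'F')
			v = c - 'A' + 10;
		else
			return -1;
		len = (len << 4) | v;
	}
	return len;
}

/*
 * Read one packet from pkt (avail bytes) into line, chomping a single
 * trailing newline. Returns the payload length, 0 for a flush packet,
 * or -1 on malformed or oversized input.
 */
static int read_line_chomped(const char *pkt, size_t avail,
			     char *line, size_t linelen)
{
	int len;

	if (avail < 4 || linelen == 0)
		return -1;
	len = pkt_header_len(pkt);
	if (len == 0) {		/* flush packet "0000" */
		line[0] = '\0';
		return 0;
	}
	if (len < 4 || (size_t)len > avail)
		return -1;
	len -= 4;
	if ((size_t)len >= linelen)
		return -1;
	memcpy(line, pkt + 4, (size_t)len);
	if (len > 0 && line[len - 1] == '\n')
		len--;		/* the auto-chomp */
	line[len] = '\0';
	return len;
}
```

Distinguishing the three cases would need a separate out-parameter or return code for flush, which is the tightening the message defers.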
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  3. @peff @gitster

    pkt-line: provide a generic reading function with options

    peff committed with gitster
    Originally we had a single function for reading packetized
    data: packet_read_line. Commit 46284dd grew a more "gentle"
    form, packet_read, that returns an error instead of dying
    upon reading a truncated input stream. However, it is not
    clear from the names which should be called, or what the
    difference is.
    
    Let's instead make packet_read be a generic public interface
    that can take option flags, and update the single callsite
    that uses it. This is less code, more clear, and paves the
    way for introducing more options into the generic interface
    later. The function signature is changed, so there should be
    no hidden conflicts with topics in flight.
    
    While we're at it, we'll document how error conditions are
    handled based on the options, and rename the confusing
    "return_line_fail" option to "gentle_on_eof".  While we are
    cleaning up the names, we can drop the "return_line_fail"
    checks in packet_read_internal entirely.  They look like
    this:
    
      ret = safe_read(..., return_line_fail);
      if (return_line_fail && ret < 0)
    	  ...
    
    The check for return_line_fail is a no-op; safe_read will
    only ever return an error value if return_line_fail was true
    in the first place.
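
A minimal sketch of the flag-driven behavior, reading from memory so that it is easy to try out. The names `PKT_GENTLE_ON_EOF`, `struct mem_src`, `get_bytes`, and `read_header` are invented for this illustration, as are the error text and exit code. Because the low-level reader only returns a negative value when the gentle flag is set, the caller can test the return value alone.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define PKT_GENTLE_ON_EOF (1u << 0)	/* illustrative option flag */

struct mem_src {
	const char *buf;
	size_t len;
};

/* Copy n bytes; on short input either return -1 (gentle) or die. */
static int get_bytes(struct mem_src *src, char *dst, size_t n,
		     unsigned options)
{
	if (src->len < n) {
		if (options & PKT_GENTLE_ON_EOF)
			return -1;
		fprintf(stderr, "fatal: unexpected end of input\n");
		exit(128);
	}
	memcpy(dst, src->buf, n);
	src->buf += n;
	src->len -= n;
	return (int)n;
}

/* Read the 4-byte length header, passing the options straight through. */
static int read_header(struct mem_src *src, char hdr[4], unsigned options)
{
	int ret = get_bytes(src, hdr, 4, options);

	if (ret < 0)	/* only possible when the gentle flag was set */
		return ret;
	return 0;
}
```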
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  4. @peff @gitster

    pkt-line: drop safe_write function

    peff committed with gitster
    This is just write_or_die by another name. The one
    distinction is that write_or_die will treat EPIPE specially
    by suppressing error messages. That's fine, as we die by
    SIGPIPE anyway (and in the off chance that it is disabled,
    write_or_die will simulate it).
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
  5. @peff @gitster

    pkt-line: move a misplaced comment

    peff committed with gitster
    The comment describing the packet writing interface was
    originally written above packet_write, but migrated to be
    above safe_write in f3a3214, probably because it is meant to
    generally describe the packet writing interface and not a
    single function. Let's move it into the header file, where
    users of the interface are more likely to see it.
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>
Commits on Mar 8, 2011
  1. @peff @gitster

    add packet tracing debug code

    peff committed with gitster
    This shows a trace of all packets coming in or out of a given
    program. This can help with debugging object negotiation or
    other protocol issues.
    
    To keep the code changes simple, we operate at the lowest
    level, meaning we don't necessarily understand what's in the
    packets. The one exception is a packet starting with "PACK",
    which causes us to skip that packet and turn off tracing
    (since the gigantic pack data will not be interesting to
    read, at least not in the trace format).
    
    We show both written and read packets. In the local case,
    this may mean you will see packets twice (written by the
    sender and read by the receiver). However, for cases where
    the other end is remote, this allows you to see the full
    conversation.
    
    Packet tracing can be enabled with GIT_TRACE_PACKET=<foo>,
    where <foo> takes the same arguments as GIT_TRACE.
    
    Signed-off-by: Jeff King <peff@peff.net>
    Signed-off-by: Junio C Hamano <gitster@pobox.com>