Commits on Mar 21, 2016
  1. Update README.md

    dagrh committed Mar 21, 2016
  2. colo-proxy: Turn off nagling on comparison socket

    dagrh committed Mar 18, 2016
    We need the comparison packets to get to the destination ASAP; no
    delay!
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
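Turning off Nagle's algorithm is done with the standard TCP_NODELAY socket option. The commit itself changes C code in QEMU; the following is only a minimal Python sketch of the same option, and the helper name `make_low_latency_socket` is made up for illustration.

```python
import socket


def make_low_latency_socket() -> socket.socket:
    """Create a TCP socket with Nagle's algorithm disabled (illustrative helper)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY makes small writes go out immediately instead of being
    # held back and coalesced with later data.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock
```

The trade-off is more, smaller segments on the wire in exchange for lower latency per comparison packet.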
  3. colo-proxy: Trigger checkpoints on old packets

    dagrh committed Mar 17, 2016
    If the primary starts generating a series of packets on a connection
    but the secondary never does anything, then a checkpoint won't currently
    happen - the packets from the primary won't get sent until the end of the
    COLO checkpoint unless something else triggers them.
    
    Add a regular check of the packet queues for any old packets; if we find
    any, trigger a checkpoint.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
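The periodic check described above might look roughly like the sketch below. It is not the QEMU code: the `QueuedPacket` type, the 0.5 second threshold and the `trigger_checkpoint` callback are all assumptions chosen for illustration.

```python
from collections import deque
from dataclasses import dataclass

MAX_PACKET_AGE = 0.5  # seconds; hypothetical threshold, not taken from the commit


@dataclass
class QueuedPacket:
    payload: bytes
    enqueued_at: float  # time (in seconds) at which the packet was queued


def has_old_packet(queue, now, max_age=MAX_PACKET_AGE):
    """Return True if the oldest packet in the queue has waited longer than max_age."""
    # Packets are appended in arrival order, so only the head needs checking.
    return bool(queue) and (now - queue[0].enqueued_at) > max_age


def periodic_check(queues, now, trigger_checkpoint):
    """Call trigger_checkpoint() if any queue holds an old packet; return whether it did."""
    if any(has_old_packet(q, now) for q in queues):
        trigger_checkpoint()
        return True
    return False
```

A timer would call `periodic_check` at a regular interval with the current time and the per-connection queues.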
Commits on Mar 4, 2016
  1. Create README.md

    dagrh committed Jan 21, 2016
Commits on Mar 3, 2016
  1. Measurement only: Make colo improvements switchable

    dagrh committed Mar 3, 2016
    Make the parallel ram flushing and locked reset switchable
    using migration capabilities to make it easy to measure their effect.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Commits on Mar 2, 2016
  1. colo-proxy: Continue updating sequence numbers after failover

    dagrh committed Mar 2, 2016
    After failover to the secondary, any connections that are still
    'established' - i.e. opened in the current checkpoint, need to carry
    on getting their sequence numbers adjusted since some of the packets
    might have reached the outside world before the primary failed.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Commits on Mar 1, 2016
  1. colo-proxy: Allow miscomparison of fragmentation id

    dagrh committed Mar 1, 2016
    IP's 'identification' field is *very* random; for don't-fragment
    packets I'm going to cheat and allow a mismatch;  the hope is that
    they'll recover quickly with a timeout.
    
    The long-term fix for this is to pass some state from the primary
    proxy to the secondary to sync the IDs.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
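A relaxed comparison of this kind can be sketched as follows. This is an illustration, not the proxy's code: it assumes plain 20-byte IPv4 headers without options, and the function names are invented. Note that the header checksum covers the identification field, so it has to be ignored as well whenever the IDs are allowed to differ.

```python
import struct

DF_FLAG = 0x4000  # don't-fragment bit in the 16-bit flags/fragment-offset field


def build_header(ident, flags, checksum=0, ttl=64):
    """Build a 20-byte IPv4 header with fixed example addresses (test helper)."""
    return struct.pack("!BBHHHBBH4s4s", 0x45, 0, 40, ident, flags, ttl, 6,
                       checksum, bytes([10, 0, 0, 1]), bytes([10, 0, 0, 2]))


def _masked(header):
    # Zero the identification (bytes 4-5) and the header checksum (bytes 10-11).
    return header[:4] + b"\0\0" + header[6:10] + b"\0\0" + header[12:]


def ipv4_headers_match(a, b):
    """Compare two IPv4 headers, tolerating an ID mismatch when both have DF set."""
    if a == b:
        return True
    flags_a, = struct.unpack_from("!H", a, 6)
    flags_b, = struct.unpack_from("!H", b, 6)
    if not (flags_a & DF_FLAG and flags_b & DF_FLAG):
        return False  # fragmentable packets still need matching IDs
    return _masked(a) == _masked(b)
```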
  2. COLO: Lock memory map around reset/load

    dagrh committed Mar 1, 2016
    Changing the memory map appears to be expensive; we see this
    particularly when, on loading a checkpoint, we:
       a) reset the devices
          This causes the PCI BARs to be reset.
       b) load the device states
          This causes the PCI BARs to be reloaded.
    
    Turning this all into a single memory_region_transaction saves
     ~10ms/checkpoint.
    
    TBD: What happens if the device code accesses the RAM during loading
    the checkpoint?
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Commits on Feb 29, 2016
  1. COLO: Allow colo->colo runstate transition

    dagrh committed Feb 26, 2016
    These are happening during failover; I've not quite figured out
    all the paths yet, but they seem to be harmless. Needs more
    investigation.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  2. Flush colo ram in parallel

    dagrh committed Feb 24, 2016
    Flush the colo ram cache in parallel; use the same number
    of threads as CPU cores.
    
    On a VM with 4 cores, and 4GB RAM, I've seen a reduction from
    ~20ms to ~16ms using this, which is helpful but not as much
    as I hoped;   I guess one problem might be that all the changes
    could be concentrated in one area of RAM?  Perhaps another approach
    would be to have one thread searching the bitmap and other threads
    doing the copy with some type of work queue.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
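The static split described above (one contiguous slice of RAM per thread) can be sketched as below. This is only an illustration in Python, where threads do not give real CPU parallelism; the page size, the list-based dirty bitmap and the function names are assumptions, not the QEMU implementation.

```python
import os
from concurrent.futures import ThreadPoolExecutor

PAGE_SIZE = 4096  # assumed page size for the sketch


def flush_range(cache, ram, dirty, start, end):
    """Copy every dirty page in [start, end) from cache to ram; return pages copied."""
    copied = 0
    for page in range(start, end):
        if dirty[page]:
            off = page * PAGE_SIZE
            ram[off:off + PAGE_SIZE] = cache[off:off + PAGE_SIZE]
            dirty[page] = False
            copied += 1
    return copied


def flush_parallel(cache, ram, dirty, workers=None):
    """Split the page range evenly across workers (default: one per CPU core)."""
    workers = workers or os.cpu_count() or 1
    npages = len(dirty)
    if npages == 0:
        return 0
    chunk = -(-npages // workers)  # ceiling division
    ranges = [(s, min(s + chunk, npages)) for s in range(0, npages, chunk)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda r: flush_range(cache, ram, dirty, *r), ranges))
```

With this split, a workload whose dirty pages all fall in one slice keeps a single worker busy while the others idle, which is the concern raised in the message.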
Commits on Feb 26, 2016
  1. Change state handler/flush

    dagrh committed Feb 19, 2016
    Packets that are sent to the filter queue are usually
    sent onwards straight away, but if they can't be
    (e.g. the guest is stopped) they sit in the queue until
    something causes them to be sent.  Flush the queue
    when the guest restarts.
    
    TODO: I think in a lot of these cases the packets that the
    queue system stops from being sent should be allowed to be sent
    while the guest is stopped.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  2. debug

    dagrh committed Feb 18, 2016
    Just debug enable/disable etc
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Commits on Feb 23, 2016
  1. mutex the packet lists

    dagrh committed Feb 18, 2016
    The packet lists are accessed by:
       a) The main thread for incoming packets on the socket and network
       b) the compare thread
       c) The migration/colo thread (when it calls flush)
    
    Mutex them to stop multiple threads changing the lists at the same
    time.
    
    These probably need to turn into RCU lists that would avoid the locks?
    
    TODO: Hmm some deadlock potential between the big lock when taken for
    the monitor commands and the packet lists as we read them in info
    colo-proxy; anywhere else?
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  2. colo-proxy: Fixup sequence numbers on secondary

    dagrh committed Feb 15, 2016
    Based on the technique in the earlier kernel colo-proxy.
    This only handles inbound connections;  when it sees
    the guest send the syn/ack and then the outside world
    respond with the ack, it uses the number in the ack
    to figure out the sequence number the primary was using.
    Then we have an offset that gets applied to all packets
    until the next checkpoint comes in.
    This is also applied to incoming ACKs on these connections
    so that the secondary guest sees a consistent view.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    
    More packet stuff; has ack rework
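The offset arithmetic can be sketched as follows. This is one reading of the message above rather than the actual proxy code: the outside world's ACK acknowledges the primary's SYN/ACK, so its ack number minus one is taken to be the primary's initial sequence number. Function names are invented, and all arithmetic wraps at 32 bits as TCP sequence numbers do.

```python
SEQ_MOD = 1 << 32  # TCP sequence numbers wrap at 2**32


def compute_offset(secondary_synack_seq, client_ack):
    """Offset from the secondary guest's sequence space to the primary's."""
    # The client's ACK acknowledges the primary's SYN, which consumes one
    # sequence number, so the primary's ISN is ack - 1.
    primary_isn = (client_ack - 1) % SEQ_MOD
    return (primary_isn - secondary_synack_seq) % SEQ_MOD


def fix_outgoing_seq(seq, offset):
    """Rewrite a sequence number sent by the secondary guest."""
    return (seq + offset) % SEQ_MOD


def fix_incoming_ack(ack, offset):
    """Rewrite an ACK from the outside world so the secondary guest recognises it."""
    return (ack - offset) % SEQ_MOD
```

For example, with a secondary ISN of 0xFFFFFFF0 and a client ack of 0x11, the offset is 0x20, and the secondary's next sequence number 0xFFFFFFF1 is rewritten to 0x11.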
  3. colo: Shorten proxy thread name

    dagrh committed Feb 23, 2016
    Thread names are limited to about 14 chars; longer names are
    ignored.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  4. Ignore sequence mismatches on primary for syn/ack

    dagrh committed Feb 10, 2016
    On an incoming connection ignore the sequence numbers
    on the syn/ack when doing the compare; the secondary
    will fix up all future packets to match the primary
    after it receives the 'ack' from the outside world.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  5. colo: Add checkpoint numbering to packets

    dagrh committed Feb 16, 2016
    Add a checkpoint index before each packet sent over the socket
    between the VMs.  This is mostly as a sanity check, but it
    allows the primary to discard packets that are from the previous
    checkpoint.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
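A minimal sketch of the idea, assuming a 4-byte big-endian index placed directly in front of the payload; the real wire format is not given in the message, and the function names are hypothetical.

```python
import struct


def frame_packet(checkpoint_index, payload):
    """Prefix a packet with the checkpoint index it belongs to."""
    return struct.pack("!I", checkpoint_index) + payload


def accept_packet(frame, current_checkpoint):
    """Return the payload, or None if the frame is from a different checkpoint."""
    index, = struct.unpack_from("!I", frame)
    if index != current_checkpoint:
        return None  # stale (or unexpected) packet: discard it
    return frame[4:]
```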
  6. info colo-proxy

    dagrh committed Feb 9, 2016
    Add an info command to dump out the state of the proxy; for debug.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  7. Endian fixes for ports

    dagrh committed Feb 18, 2016
    Packets read off the wire are big endian; we don't actually
    need them to be the correct endianness internally, but
    getting it right makes debug a lot easier.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
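As an illustration of why the conversion helps when debugging, the sketch below reads the two port fields from the start of a TCP header in network (big-endian) byte order. It is not taken from the commit, and `parse_ports` is an invented name.

```python
import struct


def parse_ports(tcp_header):
    """Return (source_port, dest_port) from the first four bytes of a TCP header."""
    # "!" selects network byte order, so the values come out in host order.
    src, dst = struct.unpack_from("!HH", tcp_header)
    return src, dst
```

On a little-endian host, reading the same bytes without conversion would show port 8080 (0x1F90) as 36895 (0x901F), which is what makes unconverted debug output hard to read.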
  8. colo+RDMA: Remap in pin-all mode when switching to COLO

    dagrh committed Feb 1, 2016
    When switching to COLO mode force a remap of all RAM using the new
    host addresses.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  9. md5 check for sync

    dagrh committed Jan 28, 2016
  10. COLO: Add fine grained stats (secondary side)

    dagrh committed Jan 26, 2016
    Add fine-grained stats so we can see where the time goes.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  11. COLO: Add fine grain stats (primary side)

    dagrh committed Jan 26, 2016
    Add finer grained stats to the primary side to see where the time
    goes in each checkpoint.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  12. COLO: Add stats

    dagrh committed Jan 21, 2016
    Used the TimedAverage type to hold min/max/avg sizes for colo stats
    and present them through info migrate.
    
    Based partially on zhanghailiang's earlier stats patches.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  13. COLO: Hybrid mode

    dagrh committed Jan 20, 2016
    Automatically switch into a passive checkpoint mode when checkpoints are
    repeatedly short.  This saves CPU time on the SVM (since it's not running)
    and the network traffic and PVM CPU time for the comparison processing.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  14. COLO: Minimum checkpoint time

    dagrh committed Jan 14, 2016
    If the rate of miscompares is very high we don't make much forward
    progress;  start off by limiting the minimum checkpoint period
    by adding a delay after we tell the secondary we're going to want
    a checkpoint.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  15. use condition variable to signal miscompare

    dagrh committed Jan 14, 2016
    Instead of polling colo_proxy's miscompare variable, use a
    condition variable and signal it; with a timed wait, the waiting
    is free. (Note we haven't got a wrapper for pthread_cond_timedwait,
    possibly because it's not portable?)
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
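The pattern of replacing a polled flag with a condition variable and a timed wait can be sketched like this. It uses Python's threading.Condition rather than the pthread calls the commit refers to, and the `MiscompareSignal` class is hypothetical.

```python
import threading


class MiscompareSignal:
    """A flag that waiters can sleep on instead of polling (illustrative only)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._miscompare = False

    def signal(self):
        """Called by the compare thread when a miscompare is detected."""
        with self._cond:
            self._miscompare = True
            self._cond.notify()

    def wait(self, timeout):
        """Sleep until signalled or until timeout seconds pass; return and clear the flag."""
        with self._cond:
            # wait_for re-checks the predicate, so spurious wakeups are harmless.
            self._cond.wait_for(lambda: self._miscompare, timeout=timeout)
            seen = self._miscompare
            self._miscompare = False
            return seen
```

The checkpoint thread would call `wait()` with the checkpoint period as the timeout: it wakes early on a miscompare and otherwise sleeps without consuming CPU.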
  16. Disable packet dumping

    dagrh committed Jan 12, 2016
  17. Migration: Emit event at start of pass

    dagrh committed Dec 16, 2015
    Emit an event each time we sync the dirty bitmap on the source;
    this helps libvirt use postcopy by giving it a kick when it
    might be a good idea to start the postcopy.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Reviewed-by: Juan Quintela <quintela@redhat.com>
    Reviewed-by: Eric Blake <eblake@redhat.com>
    Reviewed-by: Amit Shah <amit.shah@redhat.com>
    Message-Id: <1450266458-3178-5-git-send-email-dgilbert@redhat.com>
    Signed-off-by: Amit Shah <amit.shah@redhat.com>
  18. Postcopy: Send events/change state on incoming side

    dagrh committed Dec 16, 2015
    I missed the calls to send migration events on the destination side
    as we enter postcopy.
    Take care when adding them not to do it after state has been freed.
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    Reviewed-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Reviewed-by: Juan Quintela <quintela@redhat.com>
    Reviewed-by: Amit Shah <amit.shah@redhat.com>
    Message-Id: <1450266458-3178-4-git-send-email-dgilbert@redhat.com>
    Signed-off-by: Amit Shah <amit.shah@redhat.com>
  19. Hack colo-proxy into colo module

    dagrh committed Jan 11, 2016
    This is part of Li Zhijian's 'integrate colo proxy to colo module'
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  20. Remove netfilter related code

    dagrh committed Jan 19, 2016
    Minimal version of Li Zhijian's 'remove netfilter related code'
    
    Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
  21. start kvm dirty log

    zhijianli88 authored and dagrh committed Nov 27, 2015
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
  22. make secondary running

    zhijianli88 authored and dagrh committed Nov 27, 2015
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>