Commits

colo-v1.5-deve…

Commits on Jul 30, 2015

  1. COLO: Add some statistics for number of pages transferred

    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    colo-ft committed Jul 30, 2015
    48ea94a
  2. COLO: Save part of dirty pages to slave during the wait time of checkpoint

    We can send part of the dirty pages to the slave during the wait time
    before a checkpoint, where previously we just slept. This reduces the VM's
    pause time when a checkpoint is taken.
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    colo-ft committed Jul 30, 2015
    e72be88
  3. COLO: Move the position of ram begin process of saving/loading

    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    colo-ft committed Jul 30, 2015
    ee81058
  4. arch_init: Change the return value of ram_save_complete

    Let ram_save_complete return the number of pages that have been sent,
    just as ram_save_iterate does.
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    colo-ft committed Jul 30, 2015
    10161cb
  5. COLO: Separate ram and device save/load process

    We separate the process of saving/loading RAM and device state when doing a
    checkpoint, and add new helpers for saving/loading RAM and devices. With this
    change, we can transfer RAM directly from master to slave without using
    QEMUSizedBuffer as an intermediary, which also reduces the extra memory used
    during a checkpoint.
    
    We also move the colo_flush_ram_cache call to its proper position following
    the above change.
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    colo-ft committed Jul 30, 2015
    b84f57f
  6. savevm: Split load vm state function qemu_loadvm_state

    qemu_loadvm_state is too long; we can simplify it by splitting it into
    three helper functions.
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    colo-ft committed Jul 30, 2015
    c488537
  7. COLO: Expose statistics information of checkpoint to user

    You can get checkpoint statistics through a QMP command or the HMP command
    'info migrate'.
    
    Cc: Luiz Capitulino <lcapitulino@redhat.com>
    Cc: Eric Blake <eblake@redhat.com>
    Cc: Markus Armbruster <armbru@redhat.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    colo-ft committed Jul 30, 2015
    22f0e37
  8. COLO: Add some statistics information for checkpoint

    The statistics include the total checkpoint count, the count of checkpoints
    triggered by inconsistent proxy net packets, the periodic checkpoint count,
    and the VM's downtime during checkpoints.
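
As an illustration only, the counters listed above could be kept in a structure like the one below. The struct, enum, and function names are hypothetical and are not taken from the QEMU source; the maximum-downtime field is an extra assumption added for the sketch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical counters mirroring the statistics named in the commit message. */
typedef struct {
    uint64_t total_checkpoints;    /* every checkpoint taken */
    uint64_t proxy_checkpoints;    /* triggered by inconsistent net packets */
    uint64_t periodic_checkpoints; /* triggered by the periodic timer */
    uint64_t total_downtime_ms;    /* accumulated VM pause time */
    uint64_t max_downtime_ms;      /* longest single pause (assumed extra field) */
} ColoStatsSketch;

typedef enum { CKPT_PROXY_SKETCH, CKPT_PERIODIC_SKETCH } CkptReasonSketch;

/* Record one finished checkpoint and how long the VM was paused for it. */
void colo_stats_record(ColoStatsSketch *s, CkptReasonSketch why,
                       uint64_t downtime_ms)
{
    s->total_checkpoints++;
    if (why == CKPT_PROXY_SKETCH) {
        s->proxy_checkpoints++;
    } else {
        s->periodic_checkpoints++;
    }
    s->total_downtime_ms += downtime_ms;
    if (downtime_ms > s->max_downtime_ms) {
        s->max_downtime_ms = downtime_ms;
    }
}
```

Dividing total_downtime_ms by total_checkpoints would give an average pause per checkpoint.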
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    colo-ft committed Jul 30, 2015
    9059fcb
  9. COLO: Add block replication into colo process

    Make sure the master starts block replication only after the slave's block
    replication has started.
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Wen Congyang <wency@cn.fujitsu.com>
    Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    9f71e6d
  10. COLO: Implement shutdown checkpoint

    For the Secondary VM, we forbid it from shutting down directly while in COLO
    mode. For the Primary VM's shutdown, we must do some work to keep the PVM and
    SVM consistent.
    
    Cc: Paolo Bonzini <pbonzini@redhat.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    ccf8bca
  11. COLO NIC: Implement NIC checkpoint and failover

    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    f915df5
  12. COLO: Add colo-set-checkpoint-period command

    With this command, we can control the checkpoint period used when
    there is no comparison of net packets.
    
    Cc: Luiz Capitulino <lcapitulino@redhat.com>
    Cc: Eric Blake <eblake@redhat.com>
    Cc: Markus Armbruster <armbru@redhat.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    77d7e85
  13. COLO: Improve checkpoint efficiency by do additional periodic checkpoint

    Besides the normal checkpoint, which is triggered by the result of net packet
    comparison, we do additional checkpoints periodically. This reduces the number
    of dirty pages per checkpoint when no checkpoint has happened for a long
    time (a special case where the net packets are always consistent).
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    4bb6a2c
  14. COLO: Do checkpoint according to the result of packets comparation

    Only do a checkpoint when the PVM's and SVM's output net packets are
    inconsistent. We also enforce a minimum time between two consecutive
    checkpoints, to give the VM a chance to run.
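
The decision described here, combined with the periodic checkpoint from the commit listed just above, can be sketched as a small predicate. This is a sketch under stated assumptions: the policy struct, the function name, and the millisecond units are hypothetical, and the commit messages do not give actual interval values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tunables; the commit messages do not state actual values. */
typedef struct {
    int64_t min_interval_ms; /* minimum gap between two checkpoints */
    int64_t period_ms;       /* forced checkpoint if nothing else triggered one */
} CkptPolicySketch;

/* Decide whether a checkpoint should start at time now_ms. */
bool should_checkpoint(const CkptPolicySketch *p, int64_t now_ms,
                       int64_t last_ckpt_ms, bool packets_differ)
{
    int64_t elapsed = now_ms - last_ckpt_ms;

    if (elapsed < p->min_interval_ms) {
        return false;               /* too soon: give the VM a chance to run */
    }
    if (packets_differ) {
        return true;                /* PVM and SVM output diverged */
    }
    return elapsed >= p->period_ms; /* periodic checkpoint */
}
```

With a policy of {100, 10000}, a packet mismatch 50 ms after the last checkpoint is deferred, while the same mismatch at 150 ms triggers a checkpoint.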
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    d36d6ec
  15. COLO: Handle nfnetlink message from proxy module

    The proxy module sends messages to qemu through nfnetlink.
    For now, a message only contains the result of packet comparison.
    
    We use a global variable 'packet_compare_different' to store the result.
    This variable should be accessed through atomic helpers
    such as 'atomic_set' and 'atomic_xchg'.
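
A minimal sketch of this access pattern follows. It uses C11 <stdatomic.h> as a stand-in for QEMU's own 'atomic_set' and 'atomic_xchg' helpers, and the two function names are hypothetical; only the variable name comes from the commit message.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Variable name from the commit message; zero means "no difference seen". */
static atomic_int packet_compare_different;

/* Hypothetical: called by the netlink message handler when the proxy
 * reports that PVM and SVM output packets differ. */
void proxy_report_difference(void)
{
    atomic_store(&packet_compare_different, 1);
}

/* Hypothetical: called by the checkpoint thread. Reads and clears the flag
 * in one atomic step so a report arriving in between is not lost. */
bool checkpoint_take_difference(void)
{
    return atomic_exchange(&packet_compare_different, 0) != 0;
}
```

The exchange is the important part: a separate read followed by a write could drop a difference reported between the two operations.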
    
    Cc: Stefan Hajnoczi <stefanha@redhat.com>
    Cc: Jason Wang <jasowang@redhat.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    4e50ead
  16. COLO NIC: Some init work related with proxy module

    Implement the communication protocol with the proxy module using
    nfnetlink, which requires the libnfnetlink library.
    
    Tell the proxy module to do its initialization work and ask the
    kernel to acknowledge the request. This is necessary the first
    time because Netlink is not a reliable protocol.
    
    Cc: Stefan Hajnoczi <stefanha@redhat.com>
    Cc: Jason Wang <jasowang@redhat.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    f942f8a
  17. COLO NIC: Implement colo nic init/destroy function

    When in colo mode, call colo nic init/destroy function.
    
    Cc: Stefan Hajnoczi <stefanha@redhat.com>
    Cc: Jason Wang <jasowang@redhat.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    f047dfe
  18. colo-nic: Handle secondary VM's original net device configure

    For the secondary VM, we need to reconfigure its original net devices.
    Before entering COLO mode, we detach its original net devices (here, tap)
    from their default configuration (here, a bridge) and
    attach them to the forward bridge.
    When exiting COLO mode, we restore the original configuration.
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    colo-ft committed Jul 30, 2015
    3cf1d59
  19. COLO NIC: Implement colo nic device interface configure()

    Implement the colo nic device interface configure() and
    add a script to configure nic devices:
    ${QEMU_SCRIPT_DIR}/colo-proxy-script.sh
    
    Cc: Stefan Hajnoczi <stefanha@redhat.com>
    Cc: Jason Wang <jasowang@redhat.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    20ca051
  20. tap: Make launch_script() public

    We also change the parameters of launch_script().
    
    Cc: Stefan Hajnoczi <stefanha@redhat.com>
    Cc: Jason Wang <jasowang@redhat.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    1d1a66a
  21. COLO NIC: Init/remove colo nic devices when add/cleanup tap devices

    When entering COLO mode, we need to do some init work for all of the VM's
    NICs. Here we use a list to record these NICs, and for now we only support
    the 'tap' NIC backend.
    
    Cc: Stefan Hajnoczi <stefanha@redhat.com>
    Cc: Jason Wang <jasowang@redhat.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    90023ff
  22. COLO: Add new command parameter 'forward_nic' 'colo_script' for net

    The 'forward_nic' parameter should be set to a network interface name,
    for example 'eth2'. It is passed as a parameter to 'colo_script'.
    The 'colo_script' parameter should be set to a script path.
    
    We parse these parameters in tap.
    
    Cc: Stefan Hajnoczi <stefanha@redhat.com>
    Cc: Jason Wang <jasowang@redhat.com>
    Cc: Eric Blake <eblake@redhat.com>
    Cc: Markus Armbruster <armbru@redhat.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    b289c07
  23. COLO failover: Don't do failover during loading VM's state

    We should not do failover work while the main thread is loading the
    VM's state, otherwise it will break the consistency of the VM's memory and
    device state.

    Here we add a new failover status 'RELAUNCH', which means we should
    relaunch the failover process.
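
One way to picture the deferral is the single-threaded sketch below. It assumes both entry points run under the same lock, so no atomics are shown. All type and function names are hypothetical; only the 'RELAUNCH' status name comes from the commit message.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical status values; only RELAUNCH is named in the commit message. */
typedef enum {
    FO_NONE_SKETCH,
    FO_ACTIVE_SKETCH,
    FO_RELAUNCH_SKETCH
} FoStatusSketch;

typedef struct {
    FoStatusSketch status;
    bool vmstate_loading; /* true while the VM's state is being loaded */
    int failovers_run;    /* counts how often failover actually started */
} FoCtxSketch;

static void do_failover_sketch(FoCtxSketch *c)
{
    c->status = FO_ACTIVE_SKETCH;
    c->failovers_run++;
}

/* A failover request during loading is only recorded, not executed. */
void failover_request_sketch(FoCtxSketch *c)
{
    if (c->vmstate_loading) {
        c->status = FO_RELAUNCH_SKETCH;
        return;
    }
    do_failover_sketch(c);
}

void vmstate_load_begin_sketch(FoCtxSketch *c)
{
    c->vmstate_loading = true;
}

/* When loading finishes, a deferred request is relaunched. */
void vmstate_load_end_sketch(FoCtxSketch *c)
{
    c->vmstate_loading = false;
    if (c->status == FO_RELAUNCH_SKETCH) {
        do_failover_sketch(c);
    }
}
```

The point of the extra status is that a request arriving mid-load is neither dropped nor acted on early.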
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    348155e
  24. qmp event: Add event notification for COLO error

    If errors happen during the VM's COLO FT stage, it is important to notify
    users of this event. Together with 'colo_lost_heartbeat', this lets users
    intervene in COLO's failover work immediately.
    Even if users don't want to get involved in COLO's failover verdict,
    it is still necessary to notify them that we have exited COLO mode.
    
    Cc: Markus Armbruster <armbru@redhat.com>
    Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    72de29e
  25. COLO failover: Implement COLO primary/secondary vm failover work

    If errors happen, we give users (administrators) time to get involved in
    the failover verdict: they can decide which side should take over the work
    by using the 'colo_lost_heartbeat' command.

    Note: The default verdict is that the primary VM takes over while the
    secondary VM exits. So if users choose the secondary VM to take over, they
    must make sure that the Primary VM is dead, or there will be a 'split-brain'
    problem.
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    7792783
  26. COLO failover: Introduce state to record failover process

    When handling failover, we do different things depending on the stage of
    the failover process, so here we introduce a global atomic variable to
    record the failover status.

    We add four failover statuses to indicate the different stages of the
    failover process. You should use the helpers to get and set the value.
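
A sketch of what such helpers might look like is given below, using C11 <stdatomic.h>. The commit message does not name the four statuses, so the enum values are assumptions, as are the function names; the compare-and-swap style setter is one possible design, not necessarily the one in the commit.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical names for the four statuses; the commit does not list them. */
typedef enum {
    FAILOVER_NONE_SKETCH = 0,
    FAILOVER_REQUESTED_SKETCH,
    FAILOVER_HANDLING_SKETCH,
    FAILOVER_COMPLETED_SKETCH
} FailoverStatusSketch;

static atomic_int failover_state; /* starts at FAILOVER_NONE_SKETCH (0) */

/* Move from old_state to new_state only if the current state is old_state.
 * Returns the state observed before the attempt, so the caller can tell
 * whether the transition happened. */
int failover_set_state_sketch(int old_state, int new_state)
{
    int expected = old_state;

    atomic_compare_exchange_strong(&failover_state, &expected, new_state);
    return expected;
}

int failover_get_state_sketch(void)
{
    return atomic_load(&failover_state);
}
```

A caller that gets back a value different from old_state knows another thread changed the status first and can react to that stage instead.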
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    colo-ft committed Jul 30, 2015
    0691d3b
  27. COLO failover: Introduce a new command to trigger a failover

    We leave users free to use whatever heartbeat solution they want. If the
    heartbeat is lost, or they detect other errors, they can use the
    'colo_lost_heartbeat' command to tell COLO to do failover, and COLO will act
    accordingly.

    For example, if the command is sent to the PVM, the Primary exits COLO mode
    and takes over; if it is sent to the Secondary, the Secondary does the
    failover work and then takes over the service.
    
    Cc: Luiz Capitulino <lcapitulino@redhat.com>
    Cc: Eric Blake <eblake@redhat.com>
    Cc: Markus Armbruster <armbru@redhat.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    9082e28
  28. COLO RAM: Flush cached RAM into SVM's memory

    While the VMs are running, the PVM and SVM may dirty some pages. At the next
    checkpoint we transfer the PVM's dirty pages to the SVM and store them in the
    SVM's RAM cache, so the content of the SVM's RAM cache is always the same as
    the PVM's memory after a checkpoint.

    Instead of flushing the entire content of the SVM's RAM cache into the SVM's
    memory, we do this more efficiently: only flush pages dirtied by the PVM or
    the SVM since the last checkpoint. This ensures the SVM's memory is the same
    as the PVM's.

    Besides, we must flush the RAM cache before loading the device state.
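
The selective flush can be illustrated with two one-bit-per-page bitmaps. This is a simplified sketch, not the QEMU implementation: the page size, the flat memory layout, and every name below are assumptions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE_SKETCH 4096u /* assumed page size */

/* One bit per page; returns true if the bit for this page is set. */
static bool test_bit_sketch(const uint8_t *bitmap, size_t page)
{
    return ((bitmap[page / 8] >> (page % 8)) & 1u) != 0;
}

/* Copy from the RAM cache into SVM memory only those pages that either side
 * dirtied since the last checkpoint, then clear both bitmaps.
 * Returns the number of pages flushed. */
size_t flush_ram_cache_sketch(uint8_t *svm_ram, const uint8_t *ram_cache,
                              uint8_t *pvm_dirty, uint8_t *svm_dirty,
                              size_t npages)
{
    size_t flushed = 0;

    for (size_t page = 0; page < npages; page++) {
        if (test_bit_sketch(pvm_dirty, page) ||
            test_bit_sketch(svm_dirty, page)) {
            memcpy(svm_ram + page * PAGE_SIZE_SKETCH,
                   ram_cache + page * PAGE_SIZE_SKETCH, PAGE_SIZE_SKETCH);
            flushed++;
        }
    }
    memset(pvm_dirty, 0, (npages + 7) / 8);
    memset(svm_dirty, 0, (npages + 7) / 8);
    return flushed;
}
```

Pages dirtied only by the SVM must be flushed too, because the SVM's copy has diverged from the cache even though the PVM sent nothing new for them.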
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
    Signed-off-by: Gonglei <arei.gonglei@huawei.com>
    colo-ft committed Jul 30, 2015
    b3c1168
  29. arch_init: Start to trace dirty pages of SVM

    We will use this dirty bitmap together with the dirty bitmap of the VM's
    RAM cache to decide which pages in the cache should be flushed into the
    VM's RAM.
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    colo-ft committed Jul 30, 2015
    6788008
  30. COLO VMstate: Load VM state into qsb before restore it

    We should not destroy the state of the secondary until we have received the
    whole state from the primary, in case the primary fails in the middle of
    sending it. So here we cache the device state on the Secondary before
    restoring it.

    Besides, we should call qemu_system_reset() before loading the VM state,
    which ensures the data is intact.

    Note: If we drop the qemu_system_reset() call, odd errors occur.
    For example, qemu on the slave side crashes and reports:
    
    KVM: entry failed, hardware error 0x7
    EAX=00000000 EBX=0000e000 ECX=00009578 EDX=0000434f
    ESI=0000fc10 EDI=0000434f EBP=00000000 ESP=00001fca
    EIP=00009594 EFL=00010246 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
    ES =0040 00000400 0000ffff 00009300
    CS =f000 000f0000 0000ffff 00009b00
    SS =434f 000434f0 0000ffff 00009300
    DS =434f 000434f0 0000ffff 00009300
    FS =0000 00000000 0000ffff 00009300
    GS =0000 00000000 0000ffff 00009300
    LDT=0000 00000000 0000ffff 00008200
    TR =0000 00000000 0000ffff 00008b00
    GDT=     0002dcc8 00000047
    IDT=     00000000 0000ffff
    CR0=00000010 CR2=ffffffff CR3=00000000 CR4=00000000
    DR0=0000000000000000 DR1=0000000000000000 DR2=0000000000000000 DR3=0000000000000000
    DR6=00000000ffff0ff0 DR7=0000000000000400
    EFER=0000000000000000
    Code=c0 74 0f 66 b9 78 95 00 00 66 31 d2 66 31 c0 e9 47 e0 fb 90 <f3> 90 fa fc 66 c3 66 53 66 89 c3 66 e8 9d e8 ff ff 66 01 c3 66 89 d8 66 e8 40 e9 ff ff 66
    ERROR: invalid runstate transition: 'internal-error' -> 'colo'
    
    The reason is that some device state is skipped when saving device state to
    the slave if the corresponding data still holds its initial value, such as 0.
    But the corresponding device state on the slave may hold a different,
    already-initialized value, so after a round of checkpointing the device
    state values become inconsistent.
    This happens when the PVM reboots or when the SVM runs ahead of the PVM
    during startup.
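
The reasoning above can be reproduced with a toy model. The sketch below is not how QEMU encodes device state: the three-field device, the record format, and all names are assumptions made only to show why a stale value survives when the reset is skipped.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { NFIELDS_SKETCH = 3 };

/* Toy device: three fields whose initial (reset) value is 0. */
typedef struct { uint32_t field[NFIELDS_SKETCH]; } DevStateSketch;
typedef struct { uint8_t id; uint32_t value; } FieldRecSketch;

/* Save: emit only fields that differ from the initial value 0.
 * Returns the number of records written to out. */
size_t dev_save_sketch(const DevStateSketch *d, FieldRecSketch *out)
{
    size_t n = 0;

    for (uint8_t i = 0; i < NFIELDS_SKETCH; i++) {
        if (d->field[i] != 0) {
            out[n].id = i;
            out[n].value = d->field[i];
            n++;
        }
    }
    return n;
}

/* Reset: put every field back to its initial value. */
void dev_reset_sketch(DevStateSketch *d)
{
    memset(d, 0, sizeof *d);
}

/* Load: apply the received records; fields without a record are untouched. */
void dev_load_sketch(DevStateSketch *d, const FieldRecSketch *in, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        d->field[in[i].id] = in[i].value;
    }
}
```

If the primary holds {7, 0, 0} and the secondary holds {1, 9, 0}, loading without a reset leaves the secondary at {7, 9, 0}; resetting first yields {7, 0, 0}, matching the primary.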
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
    Signed-off-by: Gonglei <arei.gonglei@huawei.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    colo-ft committed Jul 30, 2015
    690f83b
  31. COLO RAM: Load PVM's dirty page into SVM's RAM cache temporarily

    The RAM cache is initially the same as the SVM's and PVM's memory.

    At a checkpoint, we cache the PVM's dirty RAM in the RAM cache on the slave
    (so that the RAM cache is always the same as the PVM's memory at every
    checkpoint). We flush the cached RAM to the SVM after we have received
    all of the PVM's vmstate (RAM/device).
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    Signed-off-by: Gonglei <arei.gonglei@huawei.com>
    colo-ft committed Jul 30, 2015
    5209d89
  32. COLO: Save VM state to slave when do checkpoint

    We should save the PVM's RAM/device state to the slave when needed.

    The VM state is cached on the slave, where we use a QEMUSizedBuffer
    to store the data. To do that we need to know the size of the VM state,
    so on the master we first store the VM state temporarily in a qsb and then
    migrate the data to the slave.
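
The need to know the size up front is the usual argument for length-prefixed framing. The sketch below shows that general technique with a hypothetical wire layout (8-byte big-endian length followed by the payload); it does not use QEMUSizedBuffer or QEMUFile and does not claim to match COLO's actual stream format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write an 8-byte big-endian length followed by the payload.
 * Returns the number of bytes written, or 0 if out is too small. */
size_t frame_encode_sketch(uint8_t *out, size_t out_cap,
                           const uint8_t *payload, uint64_t len)
{
    if (out_cap < 8 || out_cap - 8 < len) {
        return 0;
    }
    for (int i = 0; i < 8; i++) {
        out[i] = (uint8_t)(len >> (56 - 8 * i));
    }
    memcpy(out + 8, payload, (size_t)len);
    return 8 + (size_t)len;
}

/* If in holds a complete frame, point *payload at the data and return its
 * length. Return -1 while the frame is still incomplete, so the receiver
 * never acts on partial state. */
int64_t frame_decode_sketch(const uint8_t *in, size_t in_len,
                            const uint8_t **payload)
{
    uint64_t len = 0;

    if (in_len < 8) {
        return -1;
    }
    for (int i = 0; i < 8; i++) {
        len = (len << 8) | in[i];
    }
    if (in_len - 8 < len) {
        return -1;
    }
    *payload = in + 8;
    return (int64_t)len;
}
```

Buffering on the sender is what makes the length available before the first payload byte goes out.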
    
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
    Signed-off-by: Gonglei <arei.gonglei@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    colo-ft committed Jul 30, 2015
    66f20cf
  33. QEMUSizedBuffer: Introduce two help functions for qsb

    Introduce two new QEMUSizedBuffer APIs which will be used by COLO to buffer
    VM state:
    One is qsb_put_buffer(), which puts the content of a given QEMUSizedBuffer
    into a QEMUFile; this is used to send buffered VM state to the secondary.
    The other is qsb_fill_buffer(), which reads 'size' bytes of data from the
    file into a qsb; this is used to get VM state from a socket into a buffer.
    
    Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    colo-ft committed Jul 30, 2015
    5905fdd
  34. COLO: Add a new RunState RUN_STATE_COLO

    The guest enters this state when it is paused to save/restore VM state
    during a COLO checkpoint.
    
    Cc: Eric Blake <eblake@redhat.com>
    Cc: Markus Armbruster <armbru@redhat.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    Signed-off-by: Gonglei <arei.gonglei@huawei.com>
    Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
    colo-ft committed Jul 30, 2015
    3e2a61a
  35. COLO: Implement colo checkpoint protocol

    We need a user-defined communication protocol to control the checkpoint
    process.

    A new checkpoint request is started by the Primary VM, and the interaction
    proceeds as shown below.
    Checkpoint synchronizing points:
    
                      Primary                 Secondary
      NEW             @
                                              Suspend
      SUSPENDED                               @
                      Suspend&Save state
      SEND            @
                      Send state              Receive state
      RECEIVED                                @
                      Flush network           Load state
      LOADED                                  @
                      Resume                  Resume
    
                      Start Comparing
    NOTE:
     1) '@' marks who sends the message.
     2) Every sync-point is synchronized by the two sides with only
        one handshake (single direction) for low latency.
        If stricter synchronization is required, an opposite-direction
        sync-point should be added.
     3) Since sync-points are single direction, the remote side may
        have gone forward a lot by the time this side receives the sync-point.
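
The sync-points and their senders from the diagram can be written down as a small table in code. This is only a sketch for reading the diagram: the enum and function names are hypothetical, and the validity check covers message ordering only, not the work done between sync-points.

```c
#include <assert.h>
#include <stdbool.h>

/* Sync-points in the order shown in the diagram. */
typedef enum {
    CKPT_NEW_SKETCH,
    CKPT_SUSPENDED_SKETCH,
    CKPT_SEND_SKETCH,
    CKPT_RECEIVED_SKETCH,
    CKPT_LOADED_SKETCH,
    CKPT_COUNT_SKETCH
} CkptMsgSketch;

typedef enum { SIDE_PRIMARY_SKETCH, SIDE_SECONDARY_SKETCH } SideSketch;

/* Which side sends each message (the '@' column in the diagram). */
SideSketch ckpt_sender_sketch(CkptMsgSketch m)
{
    return (m == CKPT_NEW_SKETCH || m == CKPT_SEND_SKETCH)
               ? SIDE_PRIMARY_SKETCH
               : SIDE_SECONDARY_SKETCH;
}

/* True if msgs consists of whole checkpoint rounds, each in diagram order. */
bool ckpt_sequence_valid_sketch(const CkptMsgSketch *msgs, int n)
{
    for (int i = 0; i < n; i++) {
        if (msgs[i] != (CkptMsgSketch)(i % CKPT_COUNT_SKETCH)) {
            return false;
        }
    }
    return n % CKPT_COUNT_SKETCH == 0;
}
```

Only NEW and SEND originate on the Primary; the three acknowledgements (SUSPENDED, RECEIVED, LOADED) all come from the Secondary.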
    
    Signed-off-by: Yang Hongyang <yanghy@cn.fujitsu.com>
    Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
    Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
    Signed-off-by: Gonglei <arei.gonglei@huawei.com>
    colo-ft committed Jul 30, 2015
    e4dbb72