Commits on Nov 25, 2011
  1. adding systemd unit file

    committed Nov 25, 2011
Commits on Nov 24, 2011
  1. skip recovery when there is a pending recovery work

    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    kazum committed Nov 21, 2011
  2. process only the latest epoch recovery

    This improves the performance of recovery when multiple node failures
    occur.
    
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    kazum committed Nov 21, 2011
  3. reset retry_cnt before calling __fill_obj_list()

    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    kazum committed Nov 21, 2011
  4. collie: fix a typo in vdi object command output

    Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    liuy committed with kazum Nov 24, 2011
  5. sheep: don't exit when sheep calls leave_cluster()

    When an unrecoverable error happens, the sheep daemon will leave the cluster but stay
    as a gateway to redirect requests.
    
    For example, in the following case sheep hits an EIO:
    ...
    Nov 24 10:36:15 do_io_request(785) failed: 2, 2, 7c2b2500000000 , 1, 3
    Nov 24 10:36:15 io_op_done(147) leaving sheepdog cluster
    Nov 24 10:36:15 sd_leave_handler(1291) network partition bug: this sheep should have exited
    Nov 24 10:36:15 log_sigsegv(358) logger pid 8255 exiting abnormally
    ...
    
    This has nothing to do with network partition stuff.
    
    Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    liuy committed with kazum Nov 24, 2011
  6. collie: fix vdi_object() read size

    This fixes a bug in the command 'collie vdi object image -i x'.
    
    Since short reads are no longer supported, we have to pass the exact
    size for data objects, or the sheep daemon will return an error.
    
    Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    liuy committed with kazum Nov 24, 2011
Commits on Nov 22, 2011
  1. sheep: use do_process_work() to handle io request

    Since we already have a low level framework to handle requests, let's use
    it to handle io requests too.
    
    Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    liuy committed with kazum Nov 22, 2011
  2. sheep: refactor local and io request handling

    They don't share any code or logic, let's split 'em out.
     - add a new function to handle local request.
    
    other minor changes:
     - rename store/cluster_queue_request into do_io/cluster_request to conform to the naming in ops.c
    
    Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    liuy committed with kazum Nov 22, 2011
  3. sheep: unify cow object and regular object writing path

    This is necessary for further unification of sheep request handling.
    
    other small changes:
     - remove read_from_one and merge it, make it return sd result.
     - rename read_from_other_sheep into read_copy_from_cluster
    
    Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    liuy committed with kazum Nov 22, 2011
Commits on Nov 18, 2011
  1. sheep: return error when read/write cannot process full-length data

    Sheepdog block driver doesn't expect that SD_OP_READ/WRITE_OBJECT
    processes less data than requested, so we should return SD_RES_EIO in
    that case.
    
    With this patch, we can return the result code in read_object() and
    make the code more readable.
    
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    kazum committed Nov 18, 2011
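
The idea in the commit above (a read that cannot deliver the full requested length is reported as an I/O error) can be sketched roughly as follows. This is a minimal illustration, not Sheepdog's code: `read_full_or_eio` and the `EX_RES_*` codes are hypothetical names invented for this sketch, and it uses plain `read()` without offsets for brevity.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical result codes for this sketch; not Sheepdog's SD_RES_* values. */
enum { EX_RES_SUCCESS = 0, EX_RES_EIO = 1 };

/*
 * Read exactly len bytes from fd into buf.
 * A short read (end of file before len bytes) is treated as an error,
 * so callers never see partially filled buffers reported as success.
 */
int read_full_or_eio(int fd, void *buf, size_t len)
{
    char *p = buf;
    size_t done = 0;

    assert(buf != NULL || len == 0);

    while (done < len) {
        ssize_t n = read(fd, p + done, len - done);

        if (n < 0) {
            if (errno == EINTR)
                continue;          /* interrupted by a signal: retry */
            return EX_RES_EIO;     /* real I/O error */
        }
        if (n == 0)
            return EX_RES_EIO;     /* EOF before the full length arrived */
        done += (size_t)n;
    }
    return EX_RES_SUCCESS;
}
```

With a helper shaped like this, the caller only has to check a single result code instead of comparing returned lengths.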
  2. reduce the maximum size of vdi attributes from 4 MB to 64 KB

    This allows us to make simple_store_read()/write() fail when it cannot
    read/write full length data.
    
    This patch can also remove SD_FLAG_CMD_TRUNCATE.
    
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    kazum committed Nov 18, 2011
  3. sheep: abstract out store IO interface

    We need to abstract out the store IO interface to adapt it to other IO stores
    such as the coming 'Farm' store.
    
    The open/read/write/close interface is cumbersome for a common kv-store to work with,
    but it requires the smallest changes to the current sheep store code. It
    sucks but works as a kludge.
    
    Don't get me wrong: I am not claiming to write a universal interface that will work well
    with different kinds of data stores, say, SQL stores, non-SQL stores, unstructured stores,
    stores without local backing storage, etc.
    
    I am simply *not*, and I am always at a loss to foresee the future.
    
    This interface is stupid but simple enough that it costs me the smallest changes to existing code
    to let Sheepdog work with the current store implementation and the coming 'Farm' store.
    
    I think those kind people who try to squeeze other useful stores into Sheepdog are in a better
    position to cook up a more generic interface in the future.
    
    - Why include length and offset, which many kv stores don't need at all?
    
    Okay, we're trying to support huge data sizes, so we need these to do partial object reads/writes.
    
    - Why 'int fd' instead of a void *opaque for the store object handle?
    
    I suppose everything is a file in the UNIX philosophy, so an fd can name everything. I hate type
    conversions and frown when I can't cscope what something means within one second.
    
    And last, I am happy to see anybody prove me wrong and replace it with a more capable interface.
    
    Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    liuy committed with kazum Nov 18, 2011
  4. journal: move data committing out of jrnl_perform()

    Let jrnl_perform just concentrate on journaling, not intrude on store IO.
    This makes abstracting the store IO interface easier and cleaner.
    
    Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    liuy committed with kazum Nov 18, 2011
Commits on Nov 17, 2011
  1. sheep: use sys_stat_* helper to check status

    Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    liuy committed with kazum Nov 17, 2011
  2. logger: quiet gcc about write()

    Use xwrite() instead of write() to get rid of the warning below:
    
    logger.c:276: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result
    
    Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    liuy committed with kazum Nov 17, 2011
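
For readers unfamiliar with this kind of wrapper, here is a minimal sketch of a write helper whose return value is always checked and which retries on interruption. The name `write_fully` is hypothetical, and the actual `xwrite()` in the tree (borrowed from git, per the commit messages in this group) may behave differently, for instance by not looping on short writes.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/*
 * Write all len bytes of buf to fd.
 * Retries when interrupted by a signal and continues after short writes.
 * Returns len on success, -1 on error (errno is left set by write()).
 */
ssize_t write_fully(int fd, const void *buf, size_t len)
{
    const char *p = buf;
    size_t left = len;

    assert(buf != NULL || len == 0);

    while (left > 0) {
        ssize_t n = write(fd, p, left);

        if (n < 0) {
            if (errno == EINTR)
                continue;      /* interrupted before writing anything: retry */
            return -1;
        }
        p += n;
        left -= (size_t)n;
    }
    return (ssize_t)len;
}
```

Because the helper consumes the return value of `write()` itself, call sites no longer trigger the `warn_unused_result` diagnostic as long as they check the helper's result.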
  3. sheep: add string buf candy helpers

    This is almost taken from git. Thank git if you find it useful.
    
    Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    liuy committed with kazum Nov 17, 2011
  4. sheep: add hlist candy helpers

    Taken from Linux kernel.
    
    Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    liuy committed with kazum Nov 17, 2011
  5. sheep: add some candy helpers in util.c

    These are trivial helper wrappers around standard IO functions
    and an integer hash function, "stolen" from git and the Linux kernel.
    
    Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    liuy committed with kazum Nov 17, 2011
  6. sheep: modify Makefile.am for candy helpers.

    Signed-off-by: Liu Yuan <tailai.ly@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    liuy committed with kazum Nov 17, 2011
  7. sheep: fix uninitialized value in sd_join_handler()

    The value 'w' is unallocated when the join result is
    CJ_RES_MASTER_TRANSFER.
    
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    kazum committed Nov 16, 2011
Commits on Nov 15, 2011
  1. cluster: add accord cluster driver

    This adds initial support for the Accord cluster driver.
    
    Usage:
      $ sheep /store -c accord:[accord server address]
    
    TODO:
     - use asynchronous Accord APIs
     - use watch notification instead of loop and sleep
     - use transaction instead of global distributed lock
    
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    kazum committed Nov 14, 2011
Commits on Nov 14, 2011
  1. sdnet: tidy up queue_request

    Use a switch for the system status, and use a common done goto label for
    all cases that want to complete the request and return.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    Christoph Hellwig committed with kazum Nov 14, 2011
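
The control-flow pattern described in the commit above (a switch on system status, with every early-completion case jumping to one shared `done` label) might look roughly like this. All identifiers here (`ex_status`, `ex_result`, `ex_request`, `ex_queue_request`) are hypothetical and exist only for illustration; they are not the names used in sdnet.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical status and result values for this sketch. */
enum ex_status { EX_STAT_OK, EX_STAT_JOINING, EX_STAT_SHUTDOWN };
enum ex_result { EX_RESULT_QUEUED, EX_RESULT_STARTUP, EX_RESULT_SHUTDOWN };

struct ex_request {
    enum ex_result result;
    int completed;              /* 1 once the request has been finished */
};

void ex_queue_request(struct ex_request *req, enum ex_status status)
{
    assert(req != NULL);
    req->completed = 0;

    switch (status) {
    case EX_STAT_SHUTDOWN:
        req->result = EX_RESULT_SHUTDOWN;
        goto done;
    case EX_STAT_JOINING:
        req->result = EX_RESULT_STARTUP;
        goto done;
    case EX_STAT_OK:
        break;                  /* normal path: fall out of the switch */
    }

    /* Normal path: hand the request to a worker; it completes later. */
    req->result = EX_RESULT_QUEUED;
    return;

done:
    /* Single place where early-exit requests are completed. */
    req->completed = 1;
}
```

Keeping completion in one place means a new status case only needs to set its result and jump to `done`.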
  2. sdnet: split up __done

    Split the __done function into one helper per operation type given that
    there is no shared code between the different types.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    Christoph Hellwig committed with kazum Nov 14, 2011
  3. fix a compiler warning in forward_write_obj_req

    rlen is never used in the function, and recent gcc complains about
    this fact.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    Christoph Hellwig committed with kazum Nov 14, 2011
  4. tests: add qemu-io testcases.

    Signed-off-by: CHEN Baozi <chenbaozi.pt@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    CHEN Baozi committed with kazum Nov 14, 2011
Commits on Nov 11, 2011
  1. tests: add test_io method to support qemu-io test.

    Also fixed some Python bugs (missing "self." when referring to
    member variables in a Python class).
    
    Noticed that subprocess.PIPE in Python has a limited buffer size. I redirect
    it to None after the node has joined Sheepdog successfully; otherwise it would
    lead to a deadlock when the pipe becomes full.
    
    Signed-off-by: CHEN Baozi <chenbaozi.pt@taobao.com>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    CHEN Baozi committed with kazum Nov 11, 2011
  2. store: use fallocate when allocating new objects

    Writing zeroes into the last sector of an object is not going to
    preallocate it, but just allocates the last sector.  This leads
    to fairly nasty fragmentation.  Use fallocate on the whole object
    instead.  On my test setup with XFS this speeds up writes to an
    unallocated volume from ~73MB/s to ~80MB/s.
    
    If the filesystem does not support fallocate we fall back to the
    old code.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    Christoph Hellwig committed with kazum Nov 11, 2011
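
A rough sketch of the "preallocate the whole object, fall back if unsupported" approach follows. It deliberately uses the portable `posix_fallocate()` rather than the Linux-specific `fallocate()` named in the commit, and the fallback shown (writing a single zero byte at the end) is only a stand-in for whatever the old code did. `prealloc_object` is a hypothetical name.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Reserve size bytes of disk space for the file behind fd.
 * Returns 0 on success, -1 on failure.
 */
int prealloc_object(int fd, off_t size)
{
    int err;
    char zero = 0;

    assert(size > 0);

    /* posix_fallocate() returns an error number directly, not via errno. */
    err = posix_fallocate(fd, 0, size);
    if (err == 0)
        return 0;

    /* Anything other than "not supported here" is a real failure. */
    if (err != EOPNOTSUPP && err != EINVAL)
        return -1;

    /*
     * Fallback: extend the file by writing one zero byte at the end.
     * This sets the file size but may leave the earlier blocks
     * unallocated, which is the fragmentation problem described above.
     */
    if (pwrite(fd, &zero, 1, size - 1) != 1)
        return -1;
    return 0;
}
```

Note that some C libraries emulate `posix_fallocate()` by writing zeroes when the filesystem lacks native support, in which case the fallback branch is never reached.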
  3. store: split store_queue_request_local

    Split store_queue_request_local into one function for each command.  While
    this leads to a small amount of duplication it keeps the code nicely
    separated and helps with adding new commands.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    Christoph Hellwig committed with kazum Nov 11, 2011
  4. enable silent make

    Don't display the compiler command line by default, and let errors stick out
    more clearly.  If needed make V=1 shows the full command line again.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    hch@infradead.org committed with kazum Nov 11, 2011
  5. Revert "store: propagate open failure in store_queue_request_local"

    This reverts commit 5d513a0.
    
    Conflicts:
    
    	sheep/store.c
    
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    kazum committed Nov 11, 2011
  6. O_DIRECT is not a replacement for O_DSYNC

    Even if a file is opened with O_DIRECT we still need O_DSYNC / fdatasync
    to make sure all metadata required to find the data made it to disk.
    
    Also clean up the flags handling in ob_open a bit.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    Christoph Hellwig committed with kazum Nov 10, 2011
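
To make the point of the commit above concrete, here is a small sketch of building open flags so that `O_DSYNC` is always set and `O_DIRECT` is only an optional addition. The function name `object_open_flags` and its parameters are hypothetical; the real flag handling in `ob_open` may be organized differently.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* glibc exposes O_DIRECT only with this */
#endif
#include <assert.h>
#include <fcntl.h>

/*
 * Build open(2) flags for an object file.
 * O_DSYNC is unconditional: O_DIRECT bypasses the page cache but does not
 * guarantee that the metadata needed to find the data has reached disk.
 */
int object_open_flags(int want_direct, int create)
{
    int flags = O_RDWR | O_DSYNC;

    assert(want_direct == 0 || want_direct == 1);

#ifdef O_DIRECT
    if (want_direct)
        flags |= O_DIRECT;      /* an addition to O_DSYNC, never a replacement */
#else
    (void)want_direct;          /* platform without O_DIRECT */
#endif

    if (create)
        flags |= O_CREAT | O_TRUNC;   /* start from an empty file */

    return flags;
}
```

A caller would pass the result to `open()` together with a file mode, for example `open(path, object_open_flags(1, 1), 0600)`.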
  7. use O_DSYNC instead of O_SYNC

    Using O_DSYNC means we do not have to write out the inode if we are
    overwriting fully allocated blocks.  For sheepdog that is a fairly common
    use case when blocks in an image have already been allocated and the guest
    OS overwrites previously deleted blocks with new data.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    Christoph Hellwig committed with kazum Nov 10, 2011
  8. store: clean up store_queue_request_local a bit

    The SD_OP_WRITE_OBJ/SD_OP_READ_OBJ and SD_OP_CREATE_AND_WRITE_OBJ share
    no code, so split them apart.  Also use O_TRUNC instead of calling
    ftruncate to zero after opening for the SD_OP_CREATE_AND_WRITE_OBJ case.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    Christoph Hellwig committed with kazum Nov 10, 2011
  9. store: propagate open failure in store_queue_request_local

    Currently store_queue_request_local returns success when an open fails.
    Change this to SD_RES_EIO to indicate failure.  It might make sense to
    make the failure more specific, but this at least fixes the bug for now.
    
    Signed-off-by: Christoph Hellwig <hch@lst.de>
    Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
    Christoph Hellwig committed with kazum Nov 10, 2011