Commits on Dec 16, 2014
  1. Checking in changes prior to tagging of version 2.72.

    dormando committed Dec 16, 2014
    Changelog diff is:
    
    diff --git a/CHANGES b/CHANGES
    index a6b2872..441b328 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,29 @@
    +2014-12-15: Release version 2.72
    +
    +   * Work with DBD::SQLite's latest lock errors (dormando <dormando@rydia.net>)
    +
    +   * remove update_host_property (Eric Wong <e@80x24.org>)
    +
    +   * remove users of unreachable_fids table (Eric Wong <e@80x24.org>)
    +
    +   * monitor: batch MySQL device table updates (Eric Wong <normalperson@yhbt.net>)
    +
    +   * monitor: defer DB updates until all HTTP requests are done (Eric Wong <normalperson@yhbt.net>)
    +
    +   * connection/poolable: defer expiry of timed out connections (Eric Wong <e@80x24.org>)
    +
    +   * connection/poolable: disable watch_write before retrying write (Eric Wong <normalperson@yhbt.net>)
    +
    +   * connection/poolable: do not write before event_write (Eric Wong <normalperson@yhbt.net>)
    +
    +   * add conn_pool_size configuration option (Eric Wong <normalperson@yhbt.net>)
    +
    +   * enable TCP keepalives for iostat watcher sockets (Eric Wong <normalperson@yhbt.net>)
    +
    +   * host: add "readonly" state to override device "alive" state (Eric Wong <normalperson@yhbt.net>)
    +
    +   * add LICENSE file to distro (dormando <dormando@rydia.net>)
    +
     2013-08-18: Release version 2.70
    
        * This release features a very large rewrite to the Monitor worker to run
  2. Work with DBD::SQLite's latest lock errors

    dormando committed Dec 16, 2014
    "is not unique" => "UNIQUE constraint failed". String matching is lovely.
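    A minimal sketch of the kind of string matching described above,
    written in Python for illustration. The two message fragments are the
    ones quoted in the commit message; the function name and the choice of
    a case-insensitive match are assumptions, not the actual MogileFS code.

```python
import re

# Both wordings come from the commit message: the older and the newer
# DBD::SQLite text for a uniqueness violation.
_DUP_PATTERNS = re.compile(r"is not unique|UNIQUE constraint failed", re.IGNORECASE)


def is_duplicate_key_error(message: str) -> bool:
    """Return True if a driver error message looks like a uniqueness violation.

    Hypothetical helper: accepting both wordings keeps the check working
    across driver versions.
    """
    return bool(_DUP_PATTERNS.search(message))
```

    Matching both strings means the same check keeps working whichever
    driver version is installed.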
  3. remove update_host_property

    Eric Wong committed with dormando Nov 12, 2014
    No longer used since commit ebf8a5a
    ("Mass nuke unused code and fix most tests") in MogileFS 2.50
  4. remove users of unreachable_fids table

    Eric Wong committed with dormando Nov 12, 2014
    mark_fidid_unreachable has not been used since MogileFS 2.35
    commit 53528c7
    ("Wipe out old replication code.", r1432)
  5. monitor: batch MySQL device table updates

    Eric Wong committed with dormando Feb 6, 2014
    Issuing many UPDATE statements slows down monitoring on high latency
    connections between the monitor and DB.  Under MySQL, it is possible
    to do multiple UPDATEs in a single statement using CASE/WHEN
    syntax.
    
    We limit ourselves to 10000 devices per update for now; this should
    keep us comfortably under the max_allowed_packet size of most
    MySQL deployments (where the default is 1M).
    
    A compatibility function is provided for SQLite and Postgres users.
    SQLite users are not expected to run this over high-latency NFS, and
    interested Postgres users should submit their own implementation.
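    The batching idea can be sketched as follows, in Python for
    illustration. This only builds the statement text and bind
    parameters; the table name "device" and the columns "devid" and
    "mb_used" are assumptions made for the example, as is the function
    name. The 10000-device limit is the one stated above.

```python
from typing import Dict, Iterator, List, Tuple

MAX_DEVICES_PER_UPDATE = 10000  # limit stated in the commit message


def batched_device_updates(
    mb_used_by_devid: Dict[int, int],
    batch_size: int = MAX_DEVICES_PER_UPDATE,
) -> Iterator[Tuple[str, List[int]]]:
    """Yield (sql, params) pairs, one UPDATE per batch of devices.

    Each statement uses CASE/WHEN so that many rows receive different
    values in a single round trip to the database.
    """
    devids = sorted(mb_used_by_devid)
    for start in range(0, len(devids), batch_size):
        chunk = devids[start:start + batch_size]
        whens = " ".join("WHEN ? THEN ?" for _ in chunk)
        placeholders = ", ".join("?" for _ in chunk)
        sql = (
            f"UPDATE device SET mb_used = CASE devid {whens} "
            f"ELSE mb_used END WHERE devid IN ({placeholders})"
        )
        params: List[int] = []
        for devid in chunk:
            params.extend([devid, mb_used_by_devid[devid]])  # WHEN ? THEN ?
        params.extend(chunk)                                 # IN (?, ...)
        yield sql, params
```

    With three devices and a batch size of two, this yields two
    statements instead of three, and the saving grows with the number
    of devices per batch.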
  6. monitor: ping parent during deferred DB updates

    Eric Wong committed with dormando Feb 1, 2014
    With enough devices and high enough network latency to the DB,
    we bump into the watchdog timeout of 30s easily.
  7. monitor: defer DB updates until all HTTP requests are done

    Eric Wong committed with dormando Feb 1, 2014
    HTTP requests time out because we had to wait synchronously for DBI,
    this is very noticeable on a high-latency connection.  So avoid
    running synchronous code while asynchronous code (which is subject
    to timeouts) is running.
  8. connection/poolable: defer expiry of timed out connections

    Eric Wong committed with dormando Feb 1, 2014
    The timeout check may run on a socket before epoll_wait/kevent has
    a chance to run, giving the application no chance for any readiness
    callbacks to fire.
    
    This prevents timeouts in the monitor if the database is slow during
    synchronous UPDATE device calls (or there are just thousands of active
    connections).
  9. connection/poolable: disable watch_write before retrying write

    Eric Wong committed with dormando Nov 18, 2013
    Otherwise we'll end up constantly waking up when there's nothing
    to write.
  10. connection/poolable: do not write before event_write

    Eric Wong committed with dormando Nov 18, 2013
    Blindly attempting to write to a socket before a TCP connection can be
    established returns EAGAIN on Linux, but not on FreeBSD 8/9.  This
    causes Danga::Socket to error out, as it won't attempt to buffer on
    anything but EAGAIN on write() attempts.
    
    Now, we buffer writes explicitly after the initial socket creation and
    connect(), and only call Danga::Socket::write when we've established
    writability.  This works on Linux, too, and avoids an unnecessary
    syscall in most cases.
    
    Reported-by: Alex Yakovenko <aleksey.yakovenko@gmail.com>
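    A simplified sketch of the buffering approach, in Python for
    illustration. The class and method names are hypothetical and do not
    mirror Danga::Socket or MogileFS::Connection::Poolable; the
    RecordingSocket stand-in exists only so the example runs without a
    network, and partial writes are ignored.

```python
class RecordingSocket:
    """Stand-in for a real socket; records what would have been sent."""

    def __init__(self):
        self.sent = []

    def sendall(self, data: bytes) -> None:
        self.sent.append(data)


class BufferedConn:
    """Queue writes until the event loop reports the socket as writable."""

    def __init__(self, sock):
        self.sock = sock
        self.writable = False  # set once the first write-readiness event fires
        self.pending = []      # data queued while connect() may still be in progress

    def write(self, data: bytes) -> None:
        if not self.writable:
            self.pending.append(data)  # no syscall is attempted yet
            return
        self.sock.sendall(data)

    def on_writable(self) -> None:
        """Called by the event loop once the connection is established."""
        self.writable = True
        queued, self.pending = self.pending, []
        for chunk in queued:
            self.sock.sendall(chunk)
```

    Because nothing is written before writability is known, the
    behavior no longer depends on which errno a platform returns for a
    write on a socket that is still connecting.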
  11. add conn_pool_size configuration option

    Eric Wong committed with dormando Oct 14, 2013
    This defines the size of the HTTP connection pool.  This affects
    all workers at the moment, but is likely most interesting to the
    Monitor as it affects the number of devices the monitor may
    concurrently update.
    
    This defaults to 20 (the long-existing, hard-coded value).
    
    In the future, there may be a(n easy) way to specify this on a
    per-worker basis, but for now it affects all workers.
  12. enable TCP keepalives for iostat watcher sockets

    Eric Wong committed with dormando Sep 26, 2013
    This allows the monitor to eventually notice a client socket is
    totally gone if a machine death was not detected earlier.  We enable
    TCP keepalive everywhere else, too.
  13. host: add "readonly" state to override device "alive" state

    Eric Wong committed with dormando Sep 25, 2012
    Marking an entire host as "readonly" before a host maintenance
    window can be useful and easier than marking each device "readonly"
    and reduces the likelihood a device will be incorrectly marked
    as "alive" again when it is intended to stay down.
Commits on Dec 15, 2014
  1. add LICENSE file to distro

    dormando committed Dec 15, 2014
    Clarified by Brad Fitzpatrick
Commits on Aug 19, 2013
  1. Checking in changes prior to tagging of version 2.70.

    dormando committed Aug 19, 2013
    Changelog diff is:
    
    diff --git a/CHANGES b/CHANGES
    index b74f7f4..a6b2872 100644
    --- a/CHANGES
    +++ b/CHANGES
    @@ -1,3 +1,26 @@
    +2013-08-18: Release version 2.70
    +
    +   * This release features a very large rewrite to the Monitor worker to run
    +     checks in parallel. There are no DB schema changes.
    +
    +   * replicate: use persistent connection from pool if possible (Eric Wong <normalperson@yhbt.net>)
    +
    +   * replicate: enforce expected Content-Length in http_copy (Eric Wong <normalperson@yhbt.net>)
    +
    +   * create_open: parallelize directory vivification (Eric Wong <normalperson@yhbt.net>)
    +
    +   * device: reuse HTTP connections for MKCOL (Eric Wong <normalperson@yhbt.net>)
    +
    +   * delete worker uses persistent HTTP connections (Eric Wong <normalperson@yhbt.net>)
    +
    +   * httpfile: use HTTP connection pool for DELETE (Eric Wong <normalperson@yhbt.net>)
    +
    +   * httpfile: use Net::HTTP::NB, remove LWP::UserAgent (Eric Wong <normalperson@yhbt.net>)
    +
    +   * fsck: parallelize size checks for any given FID (Eric Wong <normalperson@yhbt.net>)
    +
    +   * monitor: refactor/rewrite to use new async API (Eric Wong <normalperson@yhbt.net>)
    +
     2013-08-07: Release version 2.68
    
        * optimize monitor worker for large installs (Eric Wong <normalperson@yhbt.net>)
Commits on Aug 10, 2013
  1. monitor: remove misleading error message for timeout

    Eric Wong committed Aug 2, 2013
    The timeout we're removing includes time spent in the queue waiting
    to even start, so reporting it in the syslog is confusing,
    especially since we already log the timeout via Connection::Poolable.
    
    This avoids a confusing sequence of error messages like the following:
    
    [monitor(666)] node_timeout: 2 (elapsed: 2.00099802017212): GET http://127.0.0.1:7500/dev666/usage
    [monitor(666)] Timeout contacting 127.0.0.1 dev 666 (http://127.0.0.1:7500/dev666/usage):  took 2.25 seconds out of 2 allowed
    
    Now, we only display the first message.
  2. ProcManager: SetAsChild drops inherited IPC sockets

    Eric Wong committed Sep 4, 2012
    Workers only need to inherit the minimum amount necessary from the
    parent ProcManager.  Keeping the socket of unrelated workers in each
    worker is wasteful and may contribute to premature resource
    exhaustion.
    
    Additionally, we will be using Danga::Socket in more (possibly all)
    workers, not just the Monitor and Reaper.  Resetting in workers that
    do not use Danga::Socket is harmless and will not allocate
    epoll/kqueue descriptors until the worker actually uses
    Danga::Socket.
  3. connection/poolable: stricter timeout key check

    Eric Wong committed Feb 23, 2013
    String representations of small floating point values may
    be in (scientific) E notation, so we must ensure the entire
    string is free of decimal digits before considering it a
    configuration key.
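    The pitfall can be illustrated in Python, which also renders small
    floats in E notation. The function names and the config dictionary
    are hypothetical; only the rule (a configuration key contains no
    decimal digits anywhere) comes from the text above.

```python
import re


def is_config_key(value: str) -> bool:
    """Treat the string as a configuration key only if it has no decimal digit."""
    return re.search(r"[0-9]", value) is None


def resolve_timeout(value: str, config: dict) -> float:
    """Look up named timeouts in config; parse everything else as a number."""
    if is_config_key(value):
        return float(config[value])
    return float(value)
```

    A small value such as 0.00001 stringifies as "1e-05" in Python. It
    contains the letter "e", so a looser test such as "contains a
    letter" would mistake it for a key, while the whole-string digit
    check classifies it as a number.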
  4. connection/{poolable,http}: common retry logic for timeouts

    Eric Wong committed Feb 22, 2013
    We will want similar logic for Mogstored sidechannel to avoid
    retrying on timeout.
  5. t/http.t: test error handling on non-running server

    Eric Wong committed Feb 22, 2013
    We need to ensure we don't blow up a worker process if a
    server is shut down and a connection is attempted before the
    monitor notices.
  6. ConnectionPool: improve reporting of socket creation errors

    Eric Wong committed Feb 21, 2013
    Send the entire error message (including intended host:port) so
    it is more informative when it propagates to
    Connection::HTTP::err_response.  We also do not need to log
    the error in ConnectionPool, as the error will be logged by
    the caller.
    
    While we're at it, fix the documentation and a spelling error in
    err_response, too.
  7. host: handle case where conn_get may return undef

    Eric Wong committed Feb 21, 2013
    MogileFS::ConnectionPool::conn_get may return undef on some
    errors, so we must account for that and not kill the replicate
    worker.
  8. replicate: use persistent connection from pool if possible

    Eric Wong committed Feb 4, 2013
    This should reduce the number of TIME-WAIT sockets and TCP
    handshakes when replicating, especially with small files.
    
    An attempt was previously made to use the Net::HTTP::NB API
    directly, but that resulted in complicated callback nesting
    and state management needed to throttle the reader if the
    sender socket were blocked in any way.
    
    There were many bugs in the early version of this code as
    a result of the complicated code.  Even after all the bugs
    got fixed, a small performance reduction due to the extra
    buffer copies was difficult to avoid.
    
    Thus I started using the synchronous version to keep the code
    simple and fast while preserving the ability to use persistent
    sockets to avoid excessive TIME-WAIT and handshaking for small
    file replication.
  9. replicate: enforce expected Content-Length in http_copy

    Eric Wong committed Sep 19, 2012
    There's no reason we should ever skip Content-Length validation
    if we know which FID we're replicating and have an FID object
    handy.
    
    Conflicts:
    	lib/MogileFS/Worker/Replicate.pm
  10. create_open: parallelize directory vivification

    Eric Wong committed Sep 20, 2012
    For setups stuck needing MKCOL, we can parallelize
    directory vivification for multi-destination uploads.
  11. device: reuse HTTP connections for MKCOL

    Eric Wong committed Sep 20, 2012
    This can reduce latency for folks still stuck with MKCOL.
    This creates no new sockets for replicate and monitor in
    all cases, as connections to the HTTP DAV server are already
    used in those workers.
    
    This only adds new persistent connections to the queryworker if
    GET-only HTTP ports are configured (queryworker already may call
    HTTPFile->size).
  12. delete worker uses persistent HTTP connections

    Eric Wong committed Sep 12, 2012
    This allows us to avoid running ourselves out of local ports
    when handling massive delete storms.
    
    Eventually, we can parallelize deletes in a manner similar
    to fsck size checking.
  13. httpfile: use HTTP connection pool for DELETE

    Eric Wong committed Sep 11, 2012
    This simplifies the delete subroutine and should reduce
    the number of sockets created during rebalance.
  14. httpfile: use Net::HTTP::NB, remove LWP::UserAgent

    Eric Wong committed Sep 11, 2012
    This allows us to use the same HTTP connections between
    digest and HTTP size checks, reducing the number of open
    connections we need in the Fsck worker.
  15. fsck: parallelize size checks for any given FID

    Eric Wong committed Sep 11, 2012
    This allows us to speed up fsck on high latency clusters
    by issuing parallel HEAD requests.
  16. httpfile: remove size check failure backoff handling

    Eric Wong committed Sep 11, 2012
    This backoff handling in HTTPFile is redundant for several reasons:
    
    * We rely on the monitor worker anyways to inform us of unreachable hosts
    
    * Monitor runs much faster nowadays, giving us a smaller window for
      out-of-date information about host reachability
    
    * HTTPFile->size no longer connects to the sidechannel port,
      only HTTP, so we waste fewer syscalls on failure if a
      host went down before the last monitor run.
  17. JobMaster: use Danga::Socket to schedule

    Eric Wong committed Sep 11, 2012
    In the future, this will allow JobMaster to write concurrently to
    ProcManager (or even individual workers) without blocking.
    
    (tweaked to accommodate "!want 0 job_master" support)
  18. monitor: switch to non-blocking HTTP device checks

    Eric Wong committed Sep 4, 2012
    Net::HTTP::NB is usable with Danga::Socket and may be used to
    make HTTP requests in parallel.
    
    The new connection pool supports persistent connection pooling
    similar to LWP::ConnCache.  Total connection capacity is
    enforced to prevent out-of-FD situations on the workers.
    
    Unlike LWP::ConnCache, MogileFS::ConnectionPool is designed for
    use with concurrent, active connections.  It also supports
    queueing (when any enforced capacity or system limits are
    reached) and relies on Danga::Socket for scheduling queued
    connections.
    
    In addition to total capacity limits, MogileFS::ConnectionPool
    also supports limiting concurrency on a per-destination basis to
    avoid potentially overloading a single destination.
    
    Currently, we limit ourselves to 20 connections from a single
    worker (matching the old LWP limit) and also limit ourselves
    to 20 connections to a single host (again matching our previous
    LWP behavior).
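    The combination of a total capacity, a per-destination limit, and
    queueing can be sketched as below, in Python for illustration. This
    is not MogileFS::ConnectionPool: the class and method names are
    hypothetical, scheduling is reduced to plain callbacks, and no
    sockets are involved. The defaults of 20 match the limits stated
    above.

```python
from collections import defaultdict, deque


class PoolLimiter:
    """Admit requests while under both limits; queue the rest."""

    def __init__(self, total_capacity: int = 20, dest_capacity: int = 20):
        self.total_capacity = total_capacity
        self.dest_capacity = dest_capacity
        self.active = defaultdict(int)  # in-flight requests per destination
        self.total_active = 0
        self.queue = deque()            # (dest, callback) pairs waiting for a slot

    def _can_start(self, dest) -> bool:
        return (self.total_active < self.total_capacity
                and self.active[dest] < self.dest_capacity)

    def _start(self, dest, callback) -> None:
        self.active[dest] += 1
        self.total_active += 1
        callback(dest)

    def request(self, dest, callback) -> None:
        """Start immediately if both limits allow it, otherwise queue."""
        if self._can_start(dest):
            self._start(dest, callback)
        else:
            self.queue.append((dest, callback))

    def release(self, dest) -> None:
        """Mark one request to dest as finished and start queued work that now fits."""
        self.active[dest] -= 1
        self.total_active -= 1
        still_waiting = deque()
        while self.queue:
            queued_dest, callback = self.queue.popleft()
            if self._can_start(queued_dest):
                self._start(queued_dest, callback)
            else:
                still_waiting.append((queued_dest, callback))
        self.queue = still_waiting
```

    On release, queued entries are scanned in order and any that fit
    are started, so a request blocked only by a busy destination does
    not hold up requests to other destinations.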
  19. monitor: refactor/rewrite to use new async API

    Eric Wong committed Sep 4, 2012
    In order to migrate to the upcoming Danga::Socket-based
    HTTP API, we'll first refactor monitor to use the new API
    (but preserve LWP usage behind-the-scenes).
    
    DEBUG=1 users will see the elapsed time for all device refreshes
    each time monitor runs.
    
    While we're at it, also guard against race conditions on the
    PUT/GET test by double-checking on failure.  (A long-standing
    TODO item)
    
    also squashed the following commit:
    
      use conn_timeout in monitor, node_timeout in other workers
    
      This matches the behavior in MogileFS::Server 2.65.
    
      It makes sense to use a different, lower timeout in monitor to
      quickly detect overloaded nodes and avoid propagating their
      liveness for a monitoring period.
    
      It also makes sense to use a higher value for node_timeout in
      other workers since other actions are less fault-tolerant.
    
      For example, a timed-out size check in create_close may cause a
      client to eventually reupload the file, creating even more load
      on the cluster.
  20. move Danga::Socket->Reset to ProcManager

    Eric Wong committed Sep 4, 2012
    We will be using Danga::Socket in more (possibly all) workers,
    not just the Monitor and Reaper.  Resetting in workers that do
    not use Danga::Socket is harmless and will not allocate
    epoll/kqueue descriptors until the worker actually uses
    Danga::Socket.