Commits on Oct 2, 2013
  1. Logplex Shard commits do not update URL cache

    ferd committed Oct 2, 2013
    The cache that can be obtained with logplex_shard:urls() is not updated
    when committing a replacement shard successfully. This can lead to
    confusing subsequent calls if nodes have been removed in ETS tables but
    the operator still uses urls/0 to create the new shard list.
    This patch makes sure that the url cache is updated using the
    ?CURRENT_WRITE_MAP table, which should be similar to the read pool given
    the current implementation.
Commits on Sep 23, 2013
  1. Upgrade path for v69.5->v69.6

    ferd committed Sep 23, 2013
Commits on Sep 19, 2013
  1. Merge pull request #58 from heroku/update-l2met-format

    ferd committed Sep 19, 2013
    stop using obsolete l2met format
  2. Merge pull request #55 from heroku/m_equal_logplex_stats

    ferd committed Sep 19, 2013
    Add m= in front of logplex_stats logging so it's easier to group in splunk
Commits on Sep 18, 2013
Commits on Sep 5, 2013
  1. Changing memory allocation settings for binaries

    ferd committed Sep 5, 2013
    Changing the allocator strategy for binary_alloc to do aobf rather than
    bf (+MBas aobf). Depending on how all of the binaries are allocated,
    this could make new allocations favor the same carrier. This will only
    add a small CPU overhead when allocating new binaries. If we are unlucky,
    it could worsen our utilization even more. In the expected case, however,
    it will make allocations faster and keep them in the same area, which
    should reduce fragmentation and untraceable leaks.
    We're also decreasing the size of our mbcs. Right now we have smbcs set
    to 256 kb and lmbcs at 5 MB (rounded up to 8MB as ERTS only allocates
    multiples of 2) and an average multi-block carrier size of 7.78 MB. We
    try to set +MBlmbcs 512 so that we get many more carriers and
    thus increase the chance that they can be returned to the OS.
    These two options have been recommended by members of the Erlang/OTP
    team to reduce the passive memory leaks due to allocation patterns
    compared to our peculiar use cases for log messages.
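
    The two emulator flags named above are `+MBas aobf` and `+MBlmbcs 512`
    (the latter is expressed in kilobytes). As a minimal sketch, assuming a
    running node started with those flags, the effective binary_alloc
    options can be inspected per allocator instance. The module and function
    names here are hypothetical and not part of logplex.

```erlang
-module(binary_alloc_check).
-export([options/0]).

%% Returns [{InstanceNo, Options}] for every binary_alloc instance, or []
%% if the allocator is disabled. With the flags from the commit message,
%% each Options list is expected to carry entries such as {as, aobf}.
options() ->
    case erlang:system_info({allocator, binary_alloc}) of
        false ->
            [];
        Instances ->
            [{N, proplists:get_value(options, Props, [])}
             || {instance, N, Props} <- Instances]
    end.
```

    Calling `binary_alloc_check:options()` from a remote shell is one way to
    confirm that a restart actually picked up the new settings.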
Commits on Sep 3, 2013
  1. Basic batchio inclusion, replaces lager macros

    ferd committed Sep 3, 2013
    batches io:format calls into a buffer process to reduce the number
    of calls required and also do overload protection automatically.
    Experimental material to see if it helps with performance, given that
    buffering and load shedding allow us to go asynchronous on log messages
    without losing too much data.
  2. update readme with redgrid redis url env var

    Tristan Sloughter committed Sep 3, 2013
Commits on Aug 29, 2013
  1. must read in redgrid redis url from keys file, it will not be in the …

    Tristan Sloughter committed Aug 29, 2013
    …os env
  2. add upgrade script for using new redgrid redis slot

    Tristan Sloughter committed Aug 29, 2013
  3. use logplex_redgrid_redis config

    Tristan Sloughter committed Aug 19, 2013
Commits on Aug 28, 2013
Commits on Aug 26, 2013
  1. v69.2 to v69.3 actually good upgrade path

    ferd committed Aug 26, 2013
    Was missing a path addition for backoff to be loaded, causing a node
Commits on Aug 23, 2013
  1. v69.2 to v69.2 upgrade path

    ferd committed Aug 23, 2013
  2. Upgrade to redgrid 1.0.3

    ferd committed Aug 23, 2013
    Redgrid 1.0.3 adds tolerance for DNS disconnections and configuration
  3. Raising send timeout to 5s on HTTP(s) drains

    ferd committed Aug 23, 2013
    When the network or drains temporarily misbehave, the low timeouts we
    currently have (1 second) end up killing connections and raising the
    retry count of frames. When massive losses are seen, it is difficult
    to tell whether the blame lies with logplex's speed at sending logs or
    with the drains' consumption (or the network).
    By raising the timeout a bit, we should reduce the reconnection rate and
    at the same time make it harder to blame logplex (as an individual node)
    for the problems.
    This should not have a super significant impact on the drop rate,
    however, but possibly a noticeable one.
  4. More reliable tests for log transmission

    ferd committed Aug 23, 2013
    Log messages can land in non-sequential order due to the receive/send
    routine. This fix reorders all the messages received before verifying
    them.
Commits on Aug 16, 2013
  1. log-token inspection function

    ferd committed Aug 16, 2013
    This escript fetches app ids (for now) based on a given log
    token, assuming access to the local host.
Commits on Aug 9, 2013
  1. Fixing deps and adding recon

    ferd committed Aug 9, 2013
    Mochiweb branches were broken for public and test rebar config. This
    comes from the migration from the mochi account to the internal heroku
    account. New branches were created but the account name wasn't switched.
    Recon is a library to help with devops tasks in production.
Commits on Aug 8, 2013
  1. configuring lager watermarks

    ferd committed Aug 8, 2013
  2. Moving to a sys.config configuration

    ferd committed Aug 7, 2013
    Rather than configuring specific apps in many places (bin/logplex,
    bin/devel_logplex, logplex_app.erl), configurations are moved to a
    sys.config file that can be loaded by adding `-config sys` to the `erl`
    executable, or loaded automatically when generating an OTP release.
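
    A sys.config file is a single Erlang term: a list of
    `{Application, [{Key, Value}]}` tuples terminated by a period. The
    fragment below is only a sketch of that shape. The `example_setting`
    key and both values are hypothetical, and the lager handler line assumes
    the handler syntax lager used at the time, not logplex's actual
    configuration.

```erlang
%% sys.config (illustrative shape only; keys and values are assumptions)
[
 {logplex, [
     {example_setting, 42}                       % hypothetical key
 ]},
 {lager, [
     {handlers, [{lager_console_backend, info}]} % assumed handler config
 ]}
].
```

    Values loaded this way are read back at run time with
    `application:get_env(logplex, example_setting)`, which returns
    `{ok, 42}` once the application environment has been loaded.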
  3. Moving logging to lager

    ferd committed Aug 7, 2013
    The current logplex version shows a point of contention for logs through
    using io:format/2. Although it is unlikely lager will help a lot with
    it given we don't log directly to disk (and this is where it shines in
    comparison to other logging engines), it's worth trying to see if things
    are improving with it.
    Custom log formats are used to make sure the production log format
    remains 100% identical to the former one. They will, however, be
    different during test runs because no specific care has been taken to
    make the lager config be compatible in test cases.
  4. Adding hibernation timeouts on drains

    ferd committed Aug 7, 2013
    An inactive drain or buffer (one that receives no requests from the
    outside world) should be sent to hibernation in order to trigger a
    full-sweep GC and compact the memory of the process. This reduces the
    overall load of the system and possibly the memory fragmentation of the
    VM, at the cost of slightly more CPU when it triggers.
    The timeout is implemented using the gen_fsm timeout option, which
    automatically resets timeout timers when a message is processed by the
    process. This should generally catch any kind of inactivity and
    force hibernation of the processes.
    Note: it is not yet known if the timeout value of 5 seconds or the
    number of timer setups/cancellations will have an impact of any
    significance on an active system or not. The values may need to be
    tweaked or the effort redirected towards manual GC if refc binaries keep
    on hogging the memory after this.
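
    The mechanism described above can be sketched with a small gen_fsm
    callback module: every callback returns the same inactivity timeout, and
    the `timeout` event answers with `hibernate`. This is a minimal sketch
    under assumptions, not the logplex drain code. The module name, the
    `connected` state, the `post/2` function and the list-based buffer are
    all hypothetical; only the 5 second value comes from the commit message.
    On recent OTP releases gen_fsm is deprecated in favour of gen_statem and
    compiles with warnings.

```erlang
-module(drain_hibernate_sketch).
-behaviour(gen_fsm).

-export([start_link/0, post/2]).
-export([init/1, connected/2, handle_event/3, handle_sync_event/4,
         handle_info/3, terminate/3, code_change/4]).

-define(HIBERNATE_TIMEOUT, 5000). % 5 seconds, as in the commit message

start_link() ->
    gen_fsm:start_link(?MODULE, [], []).

%% Hypothetical API: hand a message to the drain process.
post(Pid, Msg) ->
    gen_fsm:send_event(Pid, {post, Msg}).

init([]) ->
    {ok, connected, [], ?HIBERNATE_TIMEOUT}.

%% Any processed event re-arms the inactivity timer.
connected({post, Msg}, Buf) ->
    {next_state, connected, [Msg | Buf], ?HIBERNATE_TIMEOUT};
%% No message arrived within the timeout: hibernate, which triggers a
%% full-sweep GC and compacts the process heap.
connected(timeout, Buf) ->
    {next_state, connected, Buf, hibernate}.

handle_event(_Event, StateName, State) ->
    {next_state, StateName, State, ?HIBERNATE_TIMEOUT}.

handle_sync_event(_Event, _From, StateName, State) ->
    {reply, ok, StateName, State, ?HIBERNATE_TIMEOUT}.

handle_info(_Info, StateName, State) ->
    {next_state, StateName, State, ?HIBERNATE_TIMEOUT}.

terminate(_Reason, _StateName, _State) ->
    ok.

code_change(_OldVsn, StateName, State, _Extra) ->
    {ok, StateName, State}.
```

    A hibernated process wakes up on its next message and goes back through
    the same callbacks, so the timer is re-armed without any explicit timer
    management in the module.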
Commits on Aug 7, 2013
  1. Making counters explicit for buffer size

    ferd committed Aug 7, 2013
    The logplex_msg_buffer module is used extensively by drain processes
    that buffer requests and need to block as little as possible under
    heavy load. The current implementation would recalculate the entire
    queue length on every call, which became both time consuming and CPU
    intensive when the buffer was full, precisely when the length has to
    be checked most often.
    This patch makes it so that we have an explicit counter for the buffer
    so that we don't need to recalculate it all the time, lowering the
    contention for runtime for a given process.
    The module includes conversion clauses for all functions that are part
    of the API, so that the code can be hot-reloaded without stopping and
    simply adapts to the new format.
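
    The idea of an explicit counter can be sketched as follows. `queue:len/1`
    walks the whole queue, so the sketch keeps the element count next to the
    queue and updates it on every insertion. This is an illustration under
    assumptions: the module name, record layout, function names and the
    drop-oldest policy are hypothetical and do not reproduce the
    logplex_msg_buffer API.

```erlang
-module(counted_buffer_sketch).
-export([new/1, push/2, len/1, full/1]).

%% The count is stored explicitly instead of being derived with
%% queue:len/1, which is O(N) in the number of buffered messages.
-record(buf, {messages = queue:new(), count = 0, max}).

new(Max) when is_integer(Max), Max > 0 ->
    #buf{max = Max}.

%% Room left: enqueue and bump the counter.
push(Msg, #buf{messages = Q, count = N, max = Max} = B) when N < Max ->
    {inserted, B#buf{messages = queue:in(Msg, Q), count = N + 1}};
%% Buffer full: drop the oldest message, so the count stays unchanged.
push(Msg, #buf{messages = Q} = B) ->
    {displaced, B#buf{messages = queue:in(Msg, queue:drop(Q))}}.

%% Constant-time length and fullness checks.
len(#buf{count = N}) ->
    N.

full(#buf{count = N, max = Max}) ->
    N >= Max.
```

    With this layout, `len/1` and `full/1` cost the same whether the buffer
    holds one message or is at capacity.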
  2. Force hibernation on drain reconnect

    ferd committed Aug 7, 2013
    When there's a timer being set for a reconnection, we force hibernation
    in order to do a fullsweep GC of the drain processes.
    This might incur a certain cost for very busy-but-disconnected
    processes, forcing a short pause, but the backoff timers for
    reconnections will act as rate limiters on this.
Commits on Aug 6, 2013
  1. Support more transition versions

    ferd committed Aug 6, 2013
    R16B01 and R16B01-swfi are forks of v69 that were supported for
    a while and need to be able to upgrade too.
  2. Move web responses before IO logging calls

    ferd committed Aug 6, 2013
    With IO being blocking for individual processes due to Erlang's IO
    protocol and logplex using io:format/2 to log information, it is
    possible that a node that does a lot of logging has bad tail latencies
    on its API as reported by issues #49 and #51 on github.
    This quickfix, pending a rewrite of the logging system to be
    non-blocking and load-shedding, moves the logging outside of the
    critical path for part of the requests as a whole. Some requests, such
    as token creation for channels (POST /v2/channels/(\d+)/tokens) still
    contain logs in said critical path and will only see minor improvements.
  3. Update invalid log message for rfc5424

    ferd committed Aug 6, 2013
    The HTTP API used to not accept the STRUCTURED-DATA field of logplex
    messages, but this is no longer true as of May 2013. The documentation
    (last edited before then) didn't reflect the change.
Commits on Jul 19, 2013
  1. Merge pull request #48 from heroku/refc-leak-quickfix

    ferd committed Jul 19, 2013
    Quickfix for logplex refc binary leak
  2. Quickfix for logplex refc binary leak

    ferd committed Jul 18, 2013
    This fix is temporary. It garbage collects the node once it
    reaches too high of a memory threshold in an attempt to protect
    against failure due to OOMs following a refc binary leak.
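
    A threshold-triggered collection of this kind could look roughly like
    the sketch below. The commit message does not say which memory metric or
    threshold the quickfix uses, so the use of `erlang:memory(total)`, the
    module name and the `maybe_gc/1` function are assumptions made for
    illustration.

```erlang
-module(gc_guard_sketch).
-export([maybe_gc/1]).

%% Garbage collects every process on the node when total VM memory
%% exceeds ThresholdBytes. Collecting a process releases its references
%% to refc binaries that are no longer reachable from it.
maybe_gc(ThresholdBytes) when is_integer(ThresholdBytes) ->
    case erlang:memory(total) of
        Total when Total > ThresholdBytes ->
            _ = [erlang:garbage_collect(P) || P <- erlang:processes()],
            collected;
        _ ->
            skipped
    end.
```

    A periodic caller, for example a timer firing every few seconds, would
    invoke `gc_guard_sketch:maybe_gc/1` with a limit chosen below the point
    where the node risks being killed for running out of memory.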