Commits on Aug 9, 2013
  1. @ferd

    Fixing deps and adding recon

    Mochiweb branches were broken for public and test rebar config. This
    comes from the migration from the mochi account to the internal heroku
    account. New branches were created but the account name wasn't switched.
    
    Recon is a library to help with devops tasks in production.
    ferd committed Aug 9, 2013
Commits on Aug 8, 2013
  1. @ferd

    configuring lager watermarks

    ferd committed Aug 8, 2013
  2. @ferd
  3. @ferd

    Moving to a sys.config configuration

    Rather than configuring specific apps in many places (bin/logplex,
    bin/devel_logplex, logplex_app.erl), configurations are moved to a
    sys.config file that can be loaded by passing `-config sys` to the `erl`
    executable, or loaded automatically when generating an OTP release.
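
    For illustration only, a sys.config file is a single Erlang term: a
    list of {Application, Environment} pairs. The keys and values below are
    assumptions, not the settings this commit introduces; `example_setting`
    in particular is a hypothetical name.

```erlang
%% sys.config (illustrative sketch; every key and value here is an assumption)
[
 {logplex, [
   {example_setting, 42}            % hypothetical application env key
 ]},
 {lager, [
   %% lager 2.x style handler entry: backend module and minimum level
   {handlers, [{lager_console_backend, info}]}
 ]}
].
```

    With such a file in place, `erl -config sys` loads it at boot and a
    value can be read with application:get_env(logplex, example_setting).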
    ferd committed Aug 7, 2013
  4. @ferd

    Moving logging to lager

    The current logplex version shows a point of contention for logs through
    using io:format/2. Although it is unlikely lager will help a lot with
    it given we don't log directly to disk (and this is where it shines in
    comparison to other logging engines), it's worth trying to see if things
    are improving with it.
    
    Custom log formats are used to make sure the production log format
    remains 100% identical to the former one. They will, however, be
    different during test runs because no specific care has been taken to
    make the lager config be compatible in test cases.
    ferd committed Aug 7, 2013
  5. @ferd

    Adding hibernation timeouts on drains

    An inactive drain or buffer (one that receives no requests from the
    outside world) should be sent into hibernation. Hibernating triggers a
    full-sweep GC and compacts the memory of the process, which reduces the
    overall load on the system and possibly the memory fragmentation of the
    VM, at the cost of slightly more CPU when it triggers.
    
    The timeout is implemented using the gen_fsm timeout option, which
    automatically resets the timeout timer when a message is processed by
    the process. This should catch most kinds of inactivity and force
    hibernation of the processes.
    
    Note: it is not yet known whether the timeout value of 5 seconds or the
    number of timer setups and cancellations will have a significant impact
    on an active system. The values may need to be tweaked, or the effort
    redirected towards manual GC, if refc binaries keep hogging memory
    after this.
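
    A minimal sketch of the timeout-then-hibernate pattern, assuming a
    single-state gen_fsm. The module name `idle_fsm` and its `ping/1`
    helper are hypothetical and are not part of logplex; the idle timeout
    is passed in as an argument rather than fixed at 5 seconds.

```erlang
-module(idle_fsm).
-behaviour(gen_fsm).  % gen_fsm is deprecated in newer OTP in favour of gen_statem

-export([start_link/1, ping/1]).
-export([init/1, active/2, handle_event/3, handle_sync_event/4,
         handle_info/3, terminate/3, code_change/4]).

%% IdleTimeout is in milliseconds and doubles as the state data.
start_link(IdleTimeout) ->
    gen_fsm:start_link(?MODULE, IdleTimeout, []).

%% Any event counts as activity and re-arms the idle timer.
ping(Pid) ->
    gen_fsm:send_event(Pid, ping).

init(IdleTimeout) ->
    {ok, active, IdleTimeout, IdleTimeout}.

%% No message arrived within IdleTimeout: hibernate, which forces a
%% full-sweep GC and shrinks the process heap.
active(timeout, IdleTimeout) ->
    {next_state, active, IdleTimeout, hibernate};
%% Some activity: handle it and start a fresh timeout.
active(_Event, IdleTimeout) ->
    {next_state, active, IdleTimeout, IdleTimeout}.

handle_event(_Event, StateName, T) ->
    {next_state, StateName, T, T}.

handle_sync_event(_Event, _From, StateName, T) ->
    {reply, ok, StateName, T, T}.

handle_info(_Info, StateName, T) ->
    {next_state, StateName, T, T}.

terminate(_Reason, _StateName, _T) ->
    ok.

code_change(_OldVsn, StateName, T, _Extra) ->
    {ok, StateName, T}.
```

    Because every callback returns the timeout again, the timer is re-armed
    on each message, so only a genuinely quiet process reaches the
    `hibernate` clause.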
    ferd committed Aug 7, 2013
Commits on Aug 7, 2013
  1. @ferd

    Making counters explicit for buffer size

    The logplex_msg_buffer module is used extensively by drain processes
    that buffer requests and need to block as little as possible under
    heavy load. The current implementation recalculates the entire queue
    length on every call, which is both time-consuming and CPU-intensive
    when the buffer is full, which is also when lengths have to be counted
    most often.
    
    This patch gives the buffer an explicit counter so that the length no
    longer needs to be recalculated all the time, lowering the run time a
    given process spends on it.
    
    The module includes conversion clauses for all functions that are part
    of the API, so that the code can be hot-reloaded without stopping and
    simply adapts to the new format.
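
    A minimal sketch of the explicit-counter idea, not the actual
    logplex_msg_buffer code. The module name `counted_buffer`, the record
    layout, the drop-oldest policy and the old `{buf, Queue, Max}` format
    handled by `upgrade/1` are all assumptions made for illustration.

```erlang
-module(counted_buffer).
-export([new/1, push/2, pop/1, len/1, upgrade/1]).

-record(buf, {messages = queue:new(), size = 0, max_size}).

new(Max) when is_integer(Max), Max > 0 ->
    #buf{max_size = Max}.

%% O(1): read the stored counter instead of calling queue:len/1,
%% which walks the whole queue.
len(#buf{size = Size}) ->
    Size.

%% Room left: enqueue and bump the counter.
push(Msg, Buf = #buf{messages = Q, size = Size, max_size = Max})
  when Size < Max ->
    Buf#buf{messages = queue:in(Msg, Q), size = Size + 1};
%% Full: drop the oldest message, enqueue the new one, size unchanged.
push(Msg, Buf = #buf{messages = Q}) ->
    Buf#buf{messages = queue:in(Msg, queue:drop(Q))}.

pop(Buf = #buf{messages = Q, size = Size}) ->
    case queue:out(Q) of
        {{value, Msg}, Q1} ->
            {ok, Msg, Buf#buf{messages = Q1, size = Size - 1}};
        {empty, _} ->
            {empty, Buf}
    end.

%% Hypothetical conversion for hot code reloading: an old-format buffer
%% without a counter is measured once and carried over to the new record.
upgrade({buf, Q, Max}) ->
    #buf{messages = Q, size = queue:len(Q), max_size = Max};
upgrade(Buf = #buf{}) ->
    Buf.
```

    The counter costs one integer per buffer and one addition or
    subtraction per operation, in exchange for a constant-time length
    check.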
    ferd committed Aug 7, 2013
  2. @ferd

    Force hibernation on drain reconnect

    When there's a timer being set for a reconnection, we force hibernation
    in order to do a fullsweep GC of the drain processes.
    
    This might incur a certain cost for very busy-but-disconnected
    processes, forcing a short pause, but the backoff timers for
    reconnections will act as rate limiters on this.
    ferd committed Aug 7, 2013
Commits on Aug 6, 2013
  1. @ferd

    Support more transition versions

    R16B01 and R16B01-swfi are forks of v69 that were supported for
    a while and need to be able to upgrade too.
    ferd committed Aug 6, 2013
  2. @ferd
  3. @ferd

    Move web responses before IO logging calls

    With IO being blocking for individual processes due to Erlang's IO
    protocol and logplex using io:format/2 to log information, it is
    possible that a node that does a lot of logging has bad tail latencies
    on its API, as reported in issues #49 and #51 on GitHub.
    
    This quickfix, pending a rewrite of the logging system to be
    non-blocking and load-shedding, moves the logging outside of the
    critical path for some of the requests. Other requests, such as token
    creation for channels (POST /v2/channels/(\d+)/tokens), still contain
    logs in said critical path and will only see minor improvements.
    ferd committed Aug 6, 2013
  4. @ferd

    Update invalid log message for rfc5424

    The HTTP API previously did not accept the STRUCTURED-DATA field of
    logplex messages, but this is no longer true as of May 2013. The
    documentation (last edited before then) didn't reflect the change.
    ferd committed Aug 6, 2013
Commits on Jul 19, 2013
  1. @ferd

    Merge branch 'v69' into v69-R16B01

    ferd committed Jul 19, 2013
  2. @ferd

    Merge pull request #48 from heroku/refc-leak-quickfix

    Quickfix for logplex refc binary leak
    ferd committed Jul 19, 2013
  3. @ferd

    Quickfix for logplex refc binary leak

    This fix is temporary. It garbage collects the node once memory usage
    crosses a set threshold, in an attempt to protect against failures due
    to OOMs caused by refc binary leaks.
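
    A minimal sketch of a threshold-triggered node-wide GC, assuming the
    check is run periodically by some caller. The module name `gc_guard`,
    the function `maybe_gc/1` and the use of erlang:memory(total) as the
    metric are assumptions, not the code from this commit.

```erlang
-module(gc_guard).
-export([maybe_gc/1]).

%% Garbage collects every process on the node when total VM memory
%% exceeds ThresholdBytes; otherwise does nothing.
maybe_gc(ThresholdBytes) ->
    case erlang:memory(total) > ThresholdBytes of
        true ->
            %% Collecting a process drops its references to refc binaries
            %% that are no longer reachable from its heap.
            _ = [erlang:garbage_collect(Pid) || Pid <- erlang:processes()],
            collected;
        false ->
            skipped
    end.
```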
    ferd committed Jul 18, 2013
Commits on Jul 16, 2013
  1. @ferd

    Bump to R16B01

    ferd committed Jul 15, 2013
Commits on Jul 9, 2013
  1. @omarkj

    Update the live_upgrade script.

    omarkj committed Jul 9, 2013
Commits on Jun 28, 2013
  1. @ferd

    Getting chunked encoding working

    ferd committed Jun 28, 2013
Commits on Jun 27, 2013
  1. @omarkj

    Merge pull request #47 from heroku/canary_fetch

    Merge Canary fetch to v69
    omarkj committed Jun 27, 2013
  2. @omarkj

    Added some tests for `canary_fetch`. Since logplex

    doesn't use standard http to deliver the logs the
    body isn't tested in this test. That's something 
    I'd need to fix by writing a custom http client.
    omarkj committed Jun 27, 2013
  3. @ferd

    Moving from crypto:md5/1 -> crypto:hash/2

    crypto:md5(Data) is getting deprecated in favour of
    crypto:hash(md5,Data) in R16B02, and R16B01 will be generating warnings
    for it.
    
    This is preemptively future-proofing the code so we can keep using
    warnings-as-errors settings.
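
    For illustration, the replacement call looks like this. The module
    name `md5_compat` and the hex-encoding helper are hypothetical; only
    crypto:hash/2 comes from the commit message.

```erlang
-module(md5_compat).
-export([md5_hex/1]).

%% crypto:hash(md5, Data) returns the same 16-byte digest that the
%% deprecated crypto:md5(Data) did.
md5_hex(Data) ->
    Digest = crypto:hash(md5, Data),
    %% Render each byte as two lowercase, zero-padded hex digits.
    lists:flatten([io_lib:format("~2.16.0b", [Byte]) || <<Byte>> <= Digest]).
```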
    ferd committed Jun 27, 2013
Commits on Jun 26, 2013
  1. @omarkj
  2. @omarkj
  3. @omarkj
  4. @omarkj

    Added an upgrade script for v69 (shard

    replacements)
    omarkj committed Jun 26, 2013
  5. @ferd

    Merge branch 'v68'

    ferd committed Jun 26, 2013
  6. @omarkj

    Merge pull request #46 from heroku/shard_update

    Changes applied to the branch, so I'm merging it into master.
    omarkj committed Jun 25, 2013
Commits on Jun 25, 2013
  1. @omarkj
  2. @omarkj
  3. @omarkj

    Check for the `LOGPLEX_NODE_NAME` variable and

    connect to it rather than the local node if it is
    set. This is mostly used for testing.
    omarkj committed Jun 25, 2013
  4. @omarkj

    Removed the -s flag and added a -k flag to read

    from the file. Also did a bit of refactoring to
    be able to reuse more code.
    omarkj committed Jun 25, 2013
  5. @ferd

    Adding redgrid management escript

    The escript supports a few operations on redgrid:
    
    - get the redgrid process status
    - get a list of connected nodes according to redgrid
    - suspend redgrid (and unregister from it)
    - resume redgrid (and connect to it)
    
    The script connects as a hidden node.
    ferd committed Jun 25, 2013
Commits on Jun 21, 2013
  1. add canary fetch logs endpoint

    Tristan Sloughter committed Jun 21, 2013
  2. @omarkj

    Created an escript that updates shards in a running

    logplex node.
    omarkj committed Jun 20, 2013
  3. @omarkj

    Added a shorthand for updating the single node. It

    still uses the old functions but if the plan is to
    stop using them this should be refactored to not
    use rpc-style gen_server calls.
    omarkj committed Jun 20, 2013