Skip to content
/ fluff Public
forked from dimagi/fluff

Library for doing the map part of map-reduce in python and then syncing the results to couchdb

Notifications You must be signed in to change notification settings

esoergel/fluff

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Fluff

Build Status Test coverage

Fluff is a library that works with pillowtop and couchdbkit and lets you define a set of computations to do on all docs of a doc_type, i.e. perform a map over them, and then use couchdb to aggregate over the output.

The advantages of this over a normal map reduce are that you get to write your map in python with full access to the database.

This document describes the intended capabilities; what's currently here is very preliminary.

Example:

import fluff

class VisitCalculator(fluff.Calculator):

    @fluff.date_emitter
    def all_visits(self, case):
        for action in case.actions:
            yield action.date

    @fluff.date_emitter
    def bp_visits(self, case):
        for action in case.actions:
            if is_bp(case):
                yield action.date

    @fluff.custom_date_emitter('max')
    def temp_max(self, case):
        for action in case.actions:
            if is_temp(case):
                yield [action.date, action.temperature]


class MyIndicators(fluff.IndicatorDocument):
    document_class = CommCareCase
    group_by = [
        # this is the standard style of group_by
        'domain',
        # this is the more complicated style of group_by - redundant here, but useful for more complex things
        fluff.AttributeGetter('owner_id', getter_function=lambda item: item['owner_id']),
    ]
    domains = ('droberts', 'test', 'corpora')

    visits_week = VisitCalculator(window=timedelta(days=7))
    visits_month = VisitCalculator(window=timedelta(days=30))


# add this pillow to settings.PILLOWTOPS
MyIndicatorsPillow = MyIndicators.pillow()

By creating a simple setup of this sort, you'll get a bunch of stuff for free:

  • Whenever a doc changes, the corresponding indicator document will be updated

  • You can get aggregated results back straight from couch with a simple MyIndicators.get_result('visits_week', key=[domain, owner_id]), which will be correct for the current date/time with no real-time computation, and will return the data in the following format:

    {
        "all_visits": 26,
        "bp_visits": 15,
        "bp_max": 140
    }

Emitting custom values

Yield a list where the second value in the list is the value you want to be emitted.

This is useful if you want to do more than just count events. Options for aggregation are:

  • count
  • sum
  • max
  • min
  • sumsqr

In the example above the temp_max emitter emits action.temperature for each action. It also specifies that the final value should be the max of all the emitted values in the date range.

About

Library for doing the map part of map-reduce in python and then syncing the results to couchdb

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages

  • Python 95.0%
  • JavaScript 5.0%