# Functional Programming for Data Analysis

### Jim Pivarski

Second notebook: functional playground

C++ and Python are not functional languages.

Functional programming is a loosely defined style with no strict definition, but it generally means working with expressions rather than statements.

   * **Expression:** tree-like structure of nested function calls. Has a return value and can be used as an argument to a function. Examples: a FORTRAN formula, a diagrammed sentence, all of Lisp.
   * **Statement:** a command that either changes the computer's state or does nothing. Examples: Python's `for` and `if`, `move-robotic-arm`, all of assembly language.
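
To make the distinction concrete, here is a minimal sketch in plain Python (no helper library assumed; the names `values`, `squares_of_odds`, and `squares_of_odds_expr` are made up for illustration). The same calculation is written once as statements and once as a single expression.

```python
values = [1, 2, 3, 4, 5]

# Statement style: a loop and a conditional change `squares_of_odds` step by step.
squares_of_odds = []
for x in values:
    if x % 2 == 1:
        squares_of_odds.append(x**2)

# Expression style: one nested function call whose return value is the result.
squares_of_odds_expr = list(map(lambda x: x**2,
                                filter(lambda x: x % 2 == 1, values)))

print(squares_of_odds)        # [1, 9, 25]
print(squares_of_odds_expr)   # [1, 9, 25]
```

The expression version can be passed directly as an argument to another function; the statement version cannot.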

This notebook will add methods to Python lists to make them easier to use for functional programming.

The goal will be to analyze data without ever writing a `for` loop or `if` statement.
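
Plain Python does not allow new attributes on the built-in `list` type, and how `helpers.functional` gets around that is not shown here. As a rough idea of what such methods might look like, here is a sketch using a hypothetical subclass, `FList`, which is an assumption for illustration and not the notebook's implementation.

```python
class FList(list):
    """Hypothetical list subclass with chainable functional methods."""

    def map(self, f):
        # apply f to every element, returning a new FList
        return FList(f(x) for x in self)

    def filter(self, f):
        # keep only the elements for which f returns True
        return FList(x for x in self if f(x))

    def take(self, n):
        # first n elements
        return FList(self[:n])

    @property
    def size(self):
        # suffix-style length, so it reads well at the end of a chain
        return len(self)

numbers = FList([1, 2, 3, 4, 5])
squares = numbers.map(lambda x: x**2)
even_squares = squares.filter(lambda x: x % 2 == 0)

print(squares)             # [1, 4, 9, 16, 25]
print(even_squares.size)   # 2
```

Because every method returns another `FList`, calls can be chained left to right without intermediate variables.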

In [None]:
%matplotlib inline
import helpers.functional

In [None]:
[1, 2, 3, 4, 5].map(lambda x: x**2)

To make this more concrete, let's work with real data (from the CMS public dataset).

In [None]:
from helpers.functional import events

events.take(1)

Before trying to solve problems, we have to understand our toolset. Here are some of the methods that we've added to `list`:

In [None]:
# not a functional, just a plain old function, but useful to peel off a few events to play with
events.take(2)

In [None]:
# also not a functional, but using a suffix rather than "len" makes it easier to read chains
events.take(12).size

In [None]:
# aha! a real functional! but does it matter what order I put the "map" and the "take"?
events.map(lambda ev: ev.muons).take(5)
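
One way to explore the ordering question is to count how often the mapped function is called. The sketch below uses plain Python lists and made-up stand-in events (`events_demo` is hypothetical, not the CMS data), and it assumes an eager `map`. Whether `helpers.functional` is eager or lazy is not shown here.

```python
calls = []

def get_muons(ev):
    calls.append(ev)      # instrumentation only: record each call
    return ev["muons"]

# hypothetical stand-in for the real events: 100 dicts with one muon each
events_demo = [{"muons": ["mu%d" % i]} for i in range(100)]

# map first, then take: the function runs on all 100 events
map_then_take = list(map(get_muons, events_demo))[:5]
calls_map_first = len(calls)

del calls[:]              # reset the counter

# take first, then map: the function runs on only 5 events
take_then_map = list(map(get_muons, events_demo[:5]))
calls_take_first = len(calls)

print(map_then_take == take_then_map)      # True
print(calls_map_first, calls_take_first)   # 100 5
```

The result is the same either way, but with an eager `map` the amount of work is not.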

Filter is a very important functional in high energy physics: it is how event selections (cuts) are expressed.

In [None]:
events.take(100).filter(lambda ev: ev.muons.size >= 2)
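
For comparison, the same kind of selection can be sketched with Python's built-in `filter`. The events here are hypothetical dicts standing in for the real event objects.

```python
# hypothetical stand-in events with 0, 1, 2, 3, 2, 0 muons
events_demo = [{"muons": ["mu"] * n} for n in [0, 1, 2, 3, 2, 0]]

# keep only events with at least two muons
dimuon_events = list(filter(lambda ev: len(ev["muons"]) >= 2, events_demo))
num_selected = len(dimuon_events)

print(num_selected)   # 3
```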

Flatten turns pesky lists-of-lists into simple lists.

In [None]:
# before flattening: one list of muons per event (a list of lists)
events.map(lambda ev: ev.muons).take(10)
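
Flattening itself can be sketched in plain Python with `itertools.chain.from_iterable`. The muon labels below are made up for illustration.

```python
from itertools import chain

# hypothetical list of lists: muons grouped by event
muons_per_event = [["mu1", "mu2"], [], ["mu3"]]

# remove one level of nesting; empty events simply disappear
all_muons = list(chain.from_iterable(muons_per_event))

print(all_muons)   # ['mu1', 'mu2', 'mu3']
```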

"Flatmap" does "map" and "flatten" at the same time. It's more than a convenience: it has foundational importance (see [monadic bind](https://en.wikipedia.org/wiki/Monad_%28functional_programming%29)). For our purposes, we can think of it as a way of turning event ntuples into particle ntuples.

In [None]:
events.flatmap(lambda ev: ev.muons).take(10)   # now a muon ntuple
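
As a sketch of what "map, then flatten" means, here is a stand-alone `flatmap` function in plain Python. The function and the stand-in events are assumptions for illustration, not the helper library's code.

```python
from itertools import chain

def flatmap(f, items):
    # map f over items (each result is a list), then remove one level of nesting
    return list(chain.from_iterable(map(f, items)))

# hypothetical events, each holding a list of muon values
events_demo = [{"muons": [10.0, 20.0]}, {"muons": []}, {"muons": [30.0]}]

muon_ntuple = flatmap(lambda ev: ev["muons"], events_demo)

print(muon_ntuple)   # [10.0, 20.0, 30.0]
```

Three events went in and three muons came out, but only by coincidence: the output has one entry per muon, not one per event.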

"Reduce" is fundamentally different: it turns ntuples into aggregations (counts, sums, means, histograms...). All the other functionals we have seen so far turn ntuples into ntuples.

In [None]:
events.map(lambda ev: ev.numPrimaryVertices).take(1000) \
      .reduce(lambda x, y: x + y) / 1000.0
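
The same pattern works on an ordinary list with `functools.reduce`. This is a small sketch with made-up vertex counts.

```python
from functools import reduce

# hypothetical numbers of primary vertices for four events
num_vertices = [7, 12, 9, 12]

# reduce folds the list pairwise into a single value: ((7 + 12) + 9) + 12
total = reduce(lambda x, y: x + y, num_vertices)
mean_vertices = total / float(len(num_vertices))

print(total)           # 40
print(mean_vertices)   # 10.0
```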

In [None]:
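def weightAndPrimaryVertices(ev):
    # each event starts as a (weight, value) pair with unit weight
    return (1.0, ev.numPrimaryVertices)

def averageOnTheFly(x, y):
    # x and y are (weight, mean) pairs; unpack them without shadowing the arguments
    wx, meanx = x
    wy, meany = y
    # combined weight and weighted mean of the two partial results
    return (wx + wy), (wx*meanx + wy*meany)/(wx + wy)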
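
To see how these two pieces might fit together, here is a sketch that feeds (weight, value) pairs through `functools.reduce`. The combining function is repeated so that the block runs on its own, and the vertex counts are made up.

```python
from functools import reduce

def averageOnTheFly(x, y):
    # x and y are (weight, mean) pairs
    wx, meanx = x
    wy, meany = y
    return (wx + wy), (wx*meanx + wy*meany)/(wx + wy)

# hypothetical vertex counts, each turned into a (weight, value) pair
vertex_counts = [2, 4, 6]
pairs = list(map(lambda n: (1.0, n), vertex_counts))

# the running result is itself a (weight, mean) pair, so no final division is needed
total_weight, mean = reduce(averageOnTheFly, pairs)

print(total_weight)   # 3.0
print(mean)           # 4.0
```

Unlike the sum-then-divide version, the number of events never has to be known in advance: the weight carried along in each pair keeps track of it.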