
Commit a89264d

aggregates and more examples
1 parent 6cc4532 commit a89264d

File tree

2 files changed: +61 -2 lines changed


reduce.pdf (11 KB)
Binary file not shown.

reduce.tex

Lines changed: 61 additions & 2 deletions
@@ -187,7 +187,7 @@ \subsection{Reducers}
\]
\end{definition}

- In database terminology, a well-formed reducer defines an \emph{invertible distributive aggregate} (see Section~\ref{subsec:distributive-aggregates}): the fold can be computed over partitions independently (distributive), and individual values can be removed from the accumulated result (invertible).
+ In database terminology, a well-formed reducer defines an \emph{invertible distributive aggregate} (see Section~\ref{subsec:aggregate-classes}): the fold can be computed over partitions independently (distributive), and individual values can be removed from the accumulated result (invertible).

\begin{remark}[Remove-Add Commutativity]
For well-formed reducers where $\oplus$ and $\ominus$ arise from an abelian group action on $A$, the following property holds automatically:
@@ -198,7 +198,7 @@ \subsection{Reducers}
Practical reducers such as sum, count, and product over commutative groups satisfy this.
\end{remark}

- \subsection{Distributive Aggregates}\label{subsec:distributive-aggregates}
+ \subsection{Aggregate Classes}\label{subsec:aggregate-classes}

The database literature~\cite{viewmaintenance} classifies aggregates as \emph{distributive}, \emph{algebraic}, or \emph{holistic}.
In our setting, a pair $(\iota, \oplus)$ defines a \emph{distributive aggregate} when folding over a union of multisets can be decomposed into folds over the parts.
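
For instance, writing $\mathrm{fold}_{(\iota,\oplus)}(M)$ for the fold of a multiset $M$, SUM instantiates this with $\iota = 0$ and $\oplus = +$: for any multisets $A$ and $B$,
\[
\mathrm{fold}_{(0,+)}(A \uplus B) \;=\; \mathrm{fold}_{(0,+)}(A) + \mathrm{fold}_{(0,+)}(B),
\]
and taking $\ominus = -$ additionally makes the aggregate invertible, since removing a value $x$ from an accumulated result $a$ is just $a \ominus x = a - x$.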
@@ -219,6 +219,16 @@ \subsection{Distributive Aggregates}\label{subsec:distributive-aggregates}
In Skip, this corresponds to using a well-formed reducer with richer accumulator state (e.g., $(\mathit{sum}, \mathit{count})$ pairs) followed by a pointwise mapper to extract the final value.
We illustrate this pattern in Section~\ref{sec:examples}.
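
As a concrete sketch of this $(\mathit{sum}, \mathit{count})$ pattern (TypeScript; the object shape is illustrative, not Skip's actual reducer API):
\begin{verbatim}
// Illustrative sketch: AVG as an algebraic aggregate. The accumulator keeps
// richer state (sum and count); add and remove are both defined, and a
// pointwise mapper extracts the final average.
type AvgAcc = { sum: number; count: number };

const avgReducer = {
  initial: { sum: 0, count: 0 } as AvgAcc,        // iota
  add: (a: AvgAcc, v: number): AvgAcc =>          // oplus
    ({ sum: a.sum + v, count: a.count + 1 }),
  remove: (a: AvgAcc, v: number): AvgAcc =>       // ominus
    ({ sum: a.sum - v, count: a.count - 1 }),
};

// Applied pointwise after the reduce step.
const extractAvg = (a: AvgAcc): number | undefined =>
  a.count === 0 ? undefined : a.sum / a.count;
\end{verbatim}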

Finally, \emph{holistic} aggregates~\cite{viewmaintenance} cannot be computed from bounded intermediate state---they potentially require access to the entire multiset.
Examples include:
\begin{itemize}
\item \textbf{MEDIAN}: requires knowing the full distribution to find the middle value(s)
\item \textbf{QUANTILES/PERCENTILES}: similar to median, require global ordering information
\item \textbf{RANK}: depends on the position of a value within the full sorted dataset
\end{itemize}
For holistic aggregates, any exact incremental solution must maintain auxiliary state that grows with the data (e.g., the entire multiset or an order-statistic tree) in order to answer updates and queries.
Skip can of course support such analyses by using richer data structures or approximations (e.g., quantile sketches), but these fall outside the constant-space, purely algebraic reducer model we formalize in this paper.
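
To make the contrast concrete, the following sketch (TypeScript, outside the reducer model) maintains an exact MEDIAN incrementally; its auxiliary state is the entire sorted multiset, so memory grows with the data:
\begin{verbatim}
// Sketch: exact incremental median. Updates stay cheap, but the state is the
// whole sorted multiset; this unbounded state is what makes MEDIAN holistic.
// An order-statistic tree would give O(log n) updates, yet the state would
// still grow linearly with the data.
class IncrementalMedian {
  private sorted: number[] = [];

  // Index of the first element >= v (binary search).
  private lowerBound(v: number): number {
    let lo = 0, hi = this.sorted.length;
    while (lo < hi) {
      const mid = (lo + hi) >> 1;
      if (this.sorted[mid] < v) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  add(v: number): void {
    this.sorted.splice(this.lowerBound(v), 0, v);
  }

  remove(v: number): void {
    const i = this.lowerBound(v);
    if (this.sorted[i] === v) this.sorted.splice(i, 1);
  }

  median(): number | undefined {
    const n = this.sorted.length;
    if (n === 0) return undefined;
    return n % 2 === 1
      ? this.sorted[(n - 1) / 2]
      : (this.sorted[n / 2 - 1] + this.sorted[n / 2]) / 2;
  }
}
\end{verbatim}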

\subsection{Deltas}

We model updates to collections as deltas.
@@ -475,6 +485,55 @@ \subsection{Min Reducer (Partial)}
\item Alternatively, one can maintain richer state (e.g., a sorted multiset of all values), making the remove operation invertible on that richer state---but this is no longer a constant-space reducer
\end{itemize}

\subsection{Dead Code Elimination via Reachability}

We sketch a more complex example based on dead code elimination.
Assume source code is partitioned into files; each file is a key in a collection and contributes a partial directed graph of symbol references.

\paragraph{Inputs.}
For each file key $f$ we maintain (one possible encoding is sketched after this list):
\begin{itemize}
\item $\mathsf{edges}(f) \in \mathcal{M}(V \times V)$: multiset of directed edges $(u,v)$ from symbol $u$ to symbol $v$.
\item $\mathsf{nodes}(f) \in \mathcal{M}(V)$: multiset of declared symbols.
\item $\mathsf{roots}(f) \in \mathcal{M}(V)$: multiset of root symbols (entry points).
\end{itemize}
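
For concreteness, these per-file inputs could be represented as follows (a sketch; the type names are illustrative, and multisets are encoded as maps from an element to its multiplicity):
\begin{verbatim}
// Per-file contribution to the global symbol-reference graph.
type Sym = string;                       // symbol identifier
type Multiset<T> = Map<T, number>;       // element -> multiplicity

interface FileGraph {
  edges: Multiset<string>;  // directed edges, keyed e.g. as "u->v"
  nodes: Multiset<Sym>;     // declared symbols
  roots: Multiset<Sym>;     // entry points
}
\end{verbatim}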

\paragraph{Normalize to a global graph.}
A mapper rewrites each $(f, x)$ to a single global key $g$, preserving $x$.
Reduce with the \emph{union} reducer (add = multiset insert, remove = multiset delete) to obtain:
\[
\mathsf{edges}_g = \biguplus_f \mathsf{edges}(f),\quad
\mathsf{nodes}_g = \biguplus_f \mathsf{nodes}(f),\quad
\mathsf{roots}_g = \biguplus_f \mathsf{roots}(f).
\]
This reducer is well-formed (multiset union with deletion), so updates from individual files flow as small deltas to the global graph state.
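
A sketch of this union reducer on a counted-map representation of multisets (again illustrative rather than Skip's actual reducer API):
\begin{verbatim}
// Counted multiset: element -> multiplicity (repeated here so the sketch is
// self-contained).
type Multiset<T> = Map<T, number>;

// add (oplus): insert one occurrence; remove (ominus): delete one occurrence.
function msAdd<T>(m: Multiset<T>, x: T): Multiset<T> {
  m.set(x, (m.get(x) ?? 0) + 1);
  return m;
}
function msRemove<T>(m: Multiset<T>, x: T): Multiset<T> {
  const c = (m.get(x) ?? 0) - 1;
  if (c <= 0) m.delete(x); else m.set(x, c);
  return m;
}

// Both directions exist, so the reducer is well-formed and per-file deltas
// apply directly to the aggregated global multisets.
const unionReducer = {
  initial: (): Multiset<string> => new Map<string, number>(),
  add: (m: Multiset<string>, x: string) => msAdd(m, x),
  remove: (m: Multiset<string>, x: string) => msRemove(m, x),
};
\end{verbatim}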

\paragraph{Reachability view (incremental with deletions).}
Define a derived view $\mathsf{reachable}_g \subseteq \mathsf{nodes}_g$ as the set of nodes reachable from $\mathsf{roots}_g$ along $\mathsf{edges}_g$.
Implement this as a \emph{lazy compute} over the global key $g$: a user-defined computation that, given the current collections $\mathsf{edges}_g$, $\mathsf{roots}_g$ (and optionally $\mathsf{nodes}_g$), produces the current reachable set and maintains internal state to update it incrementally as upstream deltas arrive.
In Skip, \texttt{LazyCompute.make} takes a function of the form $(\mathsf{lazyCollection}, k, \mathit{context}, \mathit{params}) \mapsto \text{array of values}$; here $k$ would be the global key $g$, and the function would implement the dynamic reachability algorithm described below.
The lazy compute participates in the reactive pipeline like \texttt{reduce}: when upstream inputs change, it receives deltas and produces the updated reachable set for key $g$.
Internally, it maintains a dynamic reachability data structure:
\begin{itemize}
\item Maintain a spanning forest of reachable nodes and, for each node, a count of incoming edges from currently reachable predecessors.
\item When an edge $(u,v)$ is added and $u$ is reachable, increment the incoming count for $v$; if $v$ becomes newly reachable, enqueue its outgoing edges for propagation.
\item When an edge $(u,v)$ is removed, decrement the incoming count for $v$; if the count drops to zero and $v \notin \mathsf{roots}_g$, mark $v$ unreachable and propagate the removal to its outgoing edges.
\item When a root is added, treat it as initially reachable and propagate; when a root is removed, drop its reachability and propagate as above.
\end{itemize}
Fully dynamic reachability algorithms support these updates in time proportional to the size of the affected subgraph: for a change set $\Delta E$ of edges and $\Delta R$ of roots, the cost is $O(|\Delta E| + |\Delta R| + |A| + |E(A)|)$, where $A$ is the set of nodes whose reachability status changes (newly reached nodes plus nodes that lose their last reachable predecessor) and $E(A)$ is the set of edges incident to those nodes.
This avoids full recomputation: edits to a single file only touch the portion of the graph reachable from the edited nodes and their dependents.
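
The following sketch shows internal state such a lazy compute could maintain (TypeScript; the class and method names are illustrative rather than part of Skip's API, and deletions are handled with a delete-and-rederive pass, since the counting rule alone can leave an unreachable cycle looking reachable):
\begin{verbatim}
// Dynamic reachability state for the global key g (illustrative sketch).
type Sym = string;

class Reachability {
  private out = new Map<Sym, Map<Sym, number>>();  // u -> (v -> multiplicity)
  private inc = new Map<Sym, Map<Sym, number>>();  // v -> (u -> multiplicity)
  private roots = new Map<Sym, number>();          // root multiset
  readonly reachable = new Set<Sym>();

  private bump(m: Map<Sym, Map<Sym, number>>, a: Sym, b: Sym, d: number): void {
    let row = m.get(a);
    if (!row) { row = new Map(); m.set(a, row); }
    const c = (row.get(b) ?? 0) + d;
    if (c <= 0) row.delete(b); else row.set(b, c);
  }

  // Forward propagation: mark start and everything newly reachable from it.
  private propagate(start: Sym): void {
    const stack = [start];
    while (stack.length > 0) {
      const v = stack.pop()!;
      if (this.reachable.has(v)) continue;
      this.reachable.add(v);
      for (const w of this.out.get(v)?.keys() ?? []) stack.push(w);
    }
  }

  // Delete-and-rederive: retract reachability downstream of start, then
  // restore every suspect that still has a root or a live predecessor.
  private rederive(start: Sym): void {
    const suspects = new Set<Sym>();
    const stack = [start];
    while (stack.length > 0) {
      const v = stack.pop()!;
      if (!this.reachable.has(v) || suspects.has(v)) continue;
      suspects.add(v);
      this.reachable.delete(v);
      for (const w of this.out.get(v)?.keys() ?? []) stack.push(w);
    }
    for (const v of suspects) {
      if (this.reachable.has(v)) continue;
      const isRoot = (this.roots.get(v) ?? 0) > 0;
      const hasLiveParent =
        [...(this.inc.get(v)?.keys() ?? [])].some((u) => this.reachable.has(u));
      if (isRoot || hasLiveParent) this.propagate(v);
    }
  }

  addEdge(u: Sym, v: Sym): void {
    this.bump(this.out, u, v, +1);
    this.bump(this.inc, v, u, +1);
    if (this.reachable.has(u)) this.propagate(v);
  }

  removeEdge(u: Sym, v: Sym): void {
    this.bump(this.out, u, v, -1);
    this.bump(this.inc, v, u, -1);
    const stillThere = (this.out.get(u)?.get(v) ?? 0) > 0;
    if (!stillThere && this.reachable.has(u) && this.reachable.has(v)) {
      this.rederive(v);
    }
  }

  addRoot(r: Sym): void {
    this.roots.set(r, (this.roots.get(r) ?? 0) + 1);
    this.propagate(r);
  }

  removeRoot(r: Sym): void {
    const c = (this.roots.get(r) ?? 0) - 1;
    if (c <= 0) this.roots.delete(r); else this.roots.set(r, c);
    if (c <= 0 && this.reachable.has(r)) this.rederive(r);
  }
}
\end{verbatim}
Applying a file's delta then amounts to a sequence of \texttt{addEdge}/\texttt{removeEdge}/\texttt{addRoot}/\texttt{removeRoot} calls, after which \texttt{reachable} holds the updated view; the work is confined to the region whose status actually changes (plus, for deletions, the region tentatively retracted and re-derived).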

\paragraph{Unreachable (dead) nodes.}
Define the dead-code view as a set difference:
\[
\mathsf{dead}_g = \mathsf{nodes}_g \setminus \mathsf{reachable}_g.
\]
Because $\mathsf{nodes}_g$ and $\mathsf{reachable}_g$ are themselves maintained reactively, $\mathsf{dead}_g$ updates automatically when any file changes.
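
Given the reachability state above and the global node multiset, this difference is immediate (sketch):
\begin{verbatim}
// Dead symbols: declared nodes that the reachability state does not mark
// reachable. nodesG is the global multiset of declared symbols.
function deadSymbols(
  nodesG: Map<string, number>,
  reachable: Set<string>
): Set<string> {
  const dead = new Set<string>();
  for (const sym of nodesG.keys()) {
    if (!reachable.has(sym)) dead.add(sym);
  }
  return dead;
}
\end{verbatim}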

\paragraph{Savings.}
Changes in a single file produce small deltas to $\mathsf{edges}_g$, $\mathsf{nodes}_g$, and $\mathsf{roots}_g$, rather than rebuilding the whole graph.
If reachability is maintained incrementally for additions, the update work scales with the affected portion of the graph; even with recompute-on-delete semantics, recomputation is localized to the global graph key $g$.

\section{Complexity}

\begin{theorem}[Time Complexity]
