reduce.tex: 61 additions & 2 deletions
@@ -187,7 +187,7 @@ \subsection{Reducers}
\]
\end{definition}

-In database terminology, a well-formed reducer defines an \emph{invertible distributive aggregate} (see Section~\ref{subsec:distributive-aggregates}): the fold can be computed over partitions independently (distributive), and individual values can be removed from the accumulated result (invertible).
+In database terminology, a well-formed reducer defines an \emph{invertible distributive aggregate} (see Section~\ref{subsec:aggregate-classes}): the fold can be computed over partitions independently (distributive), and individual values can be removed from the accumulated result (invertible).

\begin{remark}[Remove-Add Commutativity]
For well-formed reducers where $\oplus$ and $\ominus$ arise from an abelian group action on $A$, the following property holds automatically:
@@ -198,7 +198,7 @@ \subsection{Reducers}
All practical reducers (sum, count, product over commutative groups) satisfy this.
The database literature~\cite{viewmaintenance} classifies aggregates as \emph{distributive}, \emph{algebraic}, or \emph{holistic}.
In our setting, a pair $(\iota, \oplus)$ defines a \emph{distributive aggregate} when folding over a union of multisets can be decomposed into folds over the parts.
In Skip, this corresponds to using a well-formed reducer with richer accumulator state (e.g., $(\mathit{sum}, \mathit{count})$ pairs) followed by a pointwise mapper to extract the final value.
We illustrate this pattern in Section~\ref{sec:examples}.
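
The pattern of a richer accumulator followed by a pointwise mapper can be sketched outside of Skip as follows. This is a minimal illustration, not Skip's API: the names `SumCount`, `add`, `remove`, and `extract_mean` are hypothetical, and the accumulator is assumed to hold a running total and a count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SumCount:
    """Accumulator state for an average: running total and element count."""
    total: float = 0.0
    count: int = 0


INITIAL = SumCount()


def add(acc: SumCount, x: float) -> SumCount:
    # The add operation: fold one value into the accumulator.
    return SumCount(acc.total + x, acc.count + 1)


def remove(acc: SumCount, x: float) -> SumCount:
    # The remove operation: undo a previous add of the same value.
    return SumCount(acc.total - x, acc.count - 1)


def extract_mean(acc: SumCount) -> Optional[float]:
    # Pointwise mapper applied after the reducer; undefined on empty input.
    return acc.total / acc.count if acc.count else None


acc = INITIAL
for value in [4.0, 8.0, 6.0]:
    acc = add(acc, value)
acc = remove(acc, 8.0)      # the accumulator now describes the multiset {4, 6}
mean = extract_mean(acc)
```

The accumulator stays constant-size, and the division happens only in the mapper, so the reducer itself remains invertible.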

+Finally, \emph{holistic} aggregates~\cite{viewmaintenance} cannot be computed from bounded intermediate state; in general they require access to the entire multiset.
+Examples include:
+\begin{itemize}
+\item \textbf{MEDIAN}: requires knowing the full distribution to find the middle value(s)
+\item \textbf{QUANTILES/PERCENTILES}: similar to median, require global ordering information
+\item \textbf{RANK}: depends on the position of a value within the full sorted dataset
+\end{itemize}
+For holistic aggregates, any exact incremental solution must maintain auxiliary state that grows with the data (e.g., the entire multiset or an order-statistic tree) in order to answer updates and queries.
+Skip can of course support such analyses by using richer data structures or approximations (e.g., quantile sketches), but these fall outside the constant-space, purely algebraic reducer model we formalize in this paper.
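
To make the space cost concrete, here is a minimal sketch of an exact incremental median that keeps the whole multiset in sorted order. The class `SortedMultiset` is a hypothetical helper built on Python's `bisect` module, not part of Skip; a sorted list stands in for the order-statistic tree mentioned above, so each update costs linear time rather than logarithmic time.

```python
import bisect


class SortedMultiset:
    """Keeps every value, so state grows linearly with the data."""

    def __init__(self):
        self._items = []

    def add(self, x):
        # Insert while keeping the list sorted.
        bisect.insort(self._items, x)

    def remove(self, x):
        # Delete one occurrence of x; fail loudly if it was never added.
        i = bisect.bisect_left(self._items, x)
        if i == len(self._items) or self._items[i] != x:
            raise KeyError(x)
        del self._items[i]

    def median(self):
        n = len(self._items)
        if n == 0:
            return None
        mid = n // 2
        if n % 2 == 1:
            return self._items[mid]
        # Even size: average the two middle values.
        return (self._items[mid - 1] + self._items[mid]) / 2


values = SortedMultiset()
for x in [5, 1, 9]:
    values.add(x)
median_of_three = values.median()   # middle of [1, 5, 9]
values.add(3)
median_of_four = values.median()    # average of 3 and 5
values.remove(9)
median_after_remove = values.median()
```

Removal is exact here only because every value is retained; no fixed-size summary of the inputs would allow the same.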
\item Alternatively, one can maintain richer state (e.g., a sorted multiset of all values), making the remove operation invertible on that richer state---but this is no longer a constant-space reducer
\end{itemize}

+\subsection{Dead Code Elimination via Reachability}
+
+We sketch a more complex example based on dead code elimination.
+Assume source code is partitioned into files; each file is a key in a collection and contributes a partial directed graph of symbol references.
+
+\paragraph{Inputs.}
+For each file key $f$ we maintain:
+\begin{itemize}
+\item $\mathsf{edges}(f) \in \mathcal{M}(V \times V)$: multiset of directed edges $(u,v)$ from symbol $u$ to symbol $v$.
+\item $\mathsf{nodes}(f) \in \mathcal{M}(V)$: multiset of declared symbols.
+\item $\mathsf{roots}(f) \in \mathcal{M}(V)$: multiset of root symbols (entry points).
+\end{itemize}
+
+\paragraph{Normalize to a global graph.}
+A mapper rewrites each entry $(f, x)$ to a single global key $g$, preserving $x$.
+Reduce with the \emph{union} reducer (add = multiset insert, remove = multiset delete) to obtain the global multisets $\mathsf{edges}_g$, $\mathsf{nodes}_g$, and $\mathsf{roots}_g$.
+This reducer is well-formed (multiset union with deletion), so updates from individual files flow as small deltas to the global graph state.
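
The union reducer can be sketched with a multiplicity map. This is an illustration under stated assumptions rather than Skip's implementation: the functions `add` and `remove` are hypothetical names, and the accumulator is modeled as a `collections.Counter` from elements to multiplicities.

```python
from collections import Counter


def add(acc: Counter, x) -> Counter:
    # Multiset insert: raise the multiplicity of x by one.
    out = acc.copy()
    out[x] += 1
    return out


def remove(acc: Counter, x) -> Counter:
    # Multiset delete: lower the multiplicity of x by one.
    if acc[x] <= 0:
        raise KeyError(x)
    out = acc.copy()
    out[x] -= 1
    if out[x] == 0:
        del out[x]          # drop zero entries so equal multisets compare equal
    return out


# Edges contributed by two files, all mapped to the single global key.
file_a = [("main", "parse"), ("parse", "lex")]
file_b = [("main", "parse"), ("main", "report")]

edges_g = Counter()
for edge in file_a + file_b:
    edges_g = add(edges_g, edge)

# Deleting file_b's edges leaves exactly file_a's contribution.
after_delete = edges_g
for edge in file_b:
    after_delete = remove(after_delete, edge)
```

Because multiplicities are tracked, an edge declared in two files survives the deletion of one of them, which is what makes remove an exact inverse of add.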
+
+\paragraph{Reachability view (incremental with deletions).}
+Define a derived view $\mathsf{reachable}_g \subseteq \mathsf{nodes}_g$ as the set of nodes reachable from $\mathsf{roots}_g$ along $\mathsf{edges}_g$.
+Implement this as a \emph{lazy compute} over the global key $g$: a user-defined computation that, given the current collections $\mathsf{edges}_g$, $\mathsf{roots}_g$ (and optionally $\mathsf{nodes}_g$), produces the current reachable set and maintains internal state to update it incrementally as upstream deltas arrive.
+In Skip, \texttt{LazyCompute.make} takes a function of the form $(\mathsf{lazyCollection}, k, \mathsf{context}, \mathsf{params}) \mapsto \text{array of values}$; here $k$ would be the global key $g$, and the function would implement the dynamic reachability algorithm described below.
+The lazy compute participates in the reactive pipeline like \texttt{reduce}: when upstream inputs change, it receives deltas and produces the updated reachable set for key $g$.
+Internally, it maintains a dynamic reachability data structure:
+\begin{itemize}
+\item Maintain a spanning forest of reachable nodes and, for each node, a count of incoming edges from currently reachable predecessors.
+\item When an edge $(u,v)$ is added and $u$ is reachable, increment the incoming count for $v$; if $v$ becomes newly reachable, enqueue its outgoing edges for propagation.
+\item When an edge $(u,v)$ is removed and $u$ is reachable, decrement the incoming count for $v$; if the count drops to zero and $v \notin \mathsf{roots}_g$, mark $v$ unreachable and propagate the removal to its outgoing edges. On cyclic graphs a zero-count test alone is not sufficient, since nodes on a cycle can keep each other's counts positive after the cycle is cut off from every root; such cases need a re-check from the roots.
+\item When a root is added, treat it as initially reachable and propagate; when a root is removed, drop its reachability and propagate as above.
+\end{itemize}
+Fully dynamic reachability algorithms aim to support these updates in time proportional to the size of the affected subgraph: for a change set $\Delta E$ of edges and $\Delta R$ of roots, the target cost is $O(|\Delta E| + |\Delta R| + |A|)$, where $A$ is the set of nodes whose reachability status changes (newly reached nodes plus nodes that lose their last reachable predecessor).
+This avoids full recomputation: edits to a single file only touch the portion of the graph reachable from the edited nodes and their dependents.
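
As a point of comparison, the following is a minimal sketch of a simpler maintenance strategy than the counting scheme above: additions are propagated incrementally, while any deletion that can affect the result triggers a recomputation from the roots (recompute-on-delete). The class `Reachability` and its method names are hypothetical and do not reflect Skip's `LazyCompute` API; the sketch trades the affected-subgraph bound on deletions for correctness on cyclic graphs.

```python
from collections import Counter, deque


class Reachability:
    """Reachable set of a multigraph; incremental on adds, recompute on deletes."""

    def __init__(self):
        self.succ = {}            # u -> Counter of successors (edge multiplicities)
        self.roots = Counter()    # multiset of root symbols
        self.reachable = set()

    def _propagate(self, start):
        # Breadth-first search that only visits nodes not yet marked reachable.
        queue = deque()
        for n in start:
            if n not in self.reachable:
                self.reachable.add(n)
                queue.append(n)
        while queue:
            u = queue.popleft()
            for v in self.succ.get(u, ()):
                if v not in self.reachable:
                    self.reachable.add(v)
                    queue.append(v)

    def _recompute(self):
        # Fallback for deletions: rebuild the reachable set from the roots.
        self.reachable = set()
        self._propagate(list(self.roots))

    def add_root(self, r):
        self.roots[r] += 1
        self._propagate([r])

    def add_edge(self, u, v):
        self.succ.setdefault(u, Counter())[v] += 1
        if u in self.reachable:
            self._propagate([v])

    def remove_edge(self, u, v):
        out = self.succ.get(u)
        if not out or out[v] <= 0:
            raise KeyError((u, v))
        out[v] -= 1
        if out[v] == 0:
            del out[v]
            # Only the last copy of an edge leaving a reachable node matters.
            if u in self.reachable:
                self._recompute()

    def remove_root(self, r):
        if self.roots[r] <= 0:
            raise KeyError(r)
        self.roots[r] -= 1
        if self.roots[r] == 0:
            del self.roots[r]
            self._recompute()


graph = Reachability()
graph.add_root("main")
for u, v in [("main", "a"), ("a", "b"), ("b", "a"), ("c", "d")]:
    graph.add_edge(u, v)
nodes = {"main", "a", "b", "c", "d"}
dead_before = nodes - graph.reachable    # c and d are never referenced from main
graph.remove_edge("main", "a")           # cuts the a <-> b cycle off from the root
dead_after = nodes - graph.reachable
```

In the example, `a` and `b` reference each other, so each still has a predecessor after `main` stops referencing `a`; recomputing from the roots is what classifies both as dead.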
+Because $\mathsf{nodes}_g$ and $\mathsf{reachable}_g$ are themselves maintained reactively, $\mathsf{dead}_g$ updates automatically when any file changes.
+
+\paragraph{Savings.}
+Changes in a single file produce small deltas to $\mathsf{edges}_g$, $\mathsf{nodes}_g$, and $\mathsf{roots}_g$, rather than rebuilding the whole graph.
+If reachability is maintained incrementally for additions, the update work scales with the affected portion of the graph; even with recompute-on-delete semantics, recomputation is localized to the global graph key $g$.