[MINOR] HDFSBackedStateStore, StateStore and StateStoreRDD
jaceklaskowski committed Aug 25, 2017
1 parent d12f767 commit b6a4a78
Showing 9 changed files with 28 additions and 17 deletions.
22 changes: 13 additions & 9 deletions SUMMARY.adoc
@@ -29,8 +29,8 @@
.. link:spark-sql-streaming-Dataset-groupBy.adoc[groupBy Operator -- Streaming Aggregation (with Implicit State Logic)]

. link:spark-sql-streaming-KeyValueGroupedDataset.adoc[KeyValueGroupedDataset -- Streaming Aggregation]
-.. link:spark-sql-streaming-KeyValueGroupedDataset-mapGroupsWithState.adoc[mapGroupsWithState Operator -- Stateful Stream Aggregation (with Explicit State Logic)]
-.. link:spark-sql-streaming-KeyValueGroupedDataset-flatMapGroupsWithState.adoc[flatMapGroupsWithState Operator -- Arbitrary Stateful Stream Aggregation (with Explicit State Logic)]
+.. link:spark-sql-streaming-KeyValueGroupedDataset-mapGroupsWithState.adoc[mapGroupsWithState Operator -- Stateful Streaming Aggregation (with Explicit State Logic)]
+.. link:spark-sql-streaming-KeyValueGroupedDataset-flatMapGroupsWithState.adoc[flatMapGroupsWithState Operator -- Arbitrary Stateful Streaming Aggregation (with Explicit State Logic)]

. link:spark-sql-streaming-DataStreamWriter.adoc[DataStreamWriter -- Writing Datasets To Streaming Data Sinks]
.. link:spark-sql-streaming-ForeachWriter.adoc[ForeachWriter]
@@ -58,6 +58,10 @@
.. link:spark-sql-streaming-QueryProgressEvent.adoc[QueryProgressEvent]
.. link:spark-sql-streaming-QueryTerminatedEvent.adoc[QueryTerminatedEvent]

+. link:spark-sql-streaming-webui.adoc[Web UI]
+
+. link:spark-sql-streaming-logging.adoc[Logging]
+
== Extending Structured Streaming

. link:spark-sql-streaming-StreamSourceProvider.adoc[StreamSourceProvider -- Streaming Data Source Provider]
@@ -119,25 +123,25 @@

. link:spark-sql-streaming-OffsetSeqMetadata.adoc[OffsetSeqMetadata]

-=== State in Stateful Stream Processing
+=== State in Stateful Streaming Aggregations

-. link:spark-sql-streaming-GroupState.adoc[GroupState -- Contract for State Per Group For Stateful Aggregation]
+. link:spark-sql-streaming-GroupState.adoc[GroupState -- State Per Group in Stateful Streaming Aggregation]
.. link:spark-sql-streaming-GroupStateImpl.adoc[GroupStateImpl]
.. link:spark-sql-streaming-GroupStateTimeout.adoc[GroupStateTimeout]

. link:spark-sql-streaming-StateStore.adoc[StateStore]
.. link:spark-sql-streaming-StateStoreOps.adoc[StateStoreOps -- Implicits Methods for Creating StateStoreRDD]
-.. link:spark-sql-streaming-StateStoreRDD.adoc[StateStoreRDD]
-.. link:spark-sql-streaming-StateStoreProvider.adoc[StateStoreProvider]
.. link:spark-sql-streaming-StateStoreUpdater.adoc[StateStoreUpdater]
.. link:spark-sql-streaming-StateStoreWriter.adoc[StateStoreWriter -- Recording Metrics For Writing to StateStore]
-.. link:spark-sql-streaming-StateStoreCoordinator.adoc[StateStoreCoordinator -- Tracking Locations of StateStores for StateStoreRDD]
-... link:spark-sql-streaming-StateStoreCoordinatorRef.adoc[StateStoreCoordinatorRef Interface for Communication with StateStoreCoordinator]
+.. link:spark-sql-streaming-StateStoreProvider.adoc[StateStoreProvider]

. link:spark-sql-streaming-HDFSBackedStateStore.adoc[HDFSBackedStateStore]
.. link:spark-sql-streaming-HDFSBackedStateStoreProvider.adoc[HDFSBackedStateStoreProvider]

+. link:spark-sql-streaming-StateStoreRDD.adoc[StateStoreRDD -- RDD for Updating State (in StateStore)]
+. link:spark-sql-streaming-StateStoreCoordinator.adoc[StateStoreCoordinator -- Tracking Locations of StateStores for StateStoreRDD]
+.. link:spark-sql-streaming-StateStoreCoordinatorRef.adoc[StateStoreCoordinatorRef Interface for Communication with StateStoreCoordinator]

== Varia

. link:spark-sql-streaming-StreamProgress.adoc[StreamProgress Custom Scala Map]
-. link:spark-sql-streaming-logging.adoc[Logging]
2 changes: 1 addition & 1 deletion spark-sql-streaming-GroupState.adoc
@@ -1,4 +1,4 @@
-== [[GroupState]] GroupState -- Contract for State Per Group For Stateful Aggregation
+== [[GroupState]] GroupState -- State Per Group in Stateful Streaming Aggregation

`GroupState` is the <<contract, contract>> for working with a state (of type `S`) per group for arbitrary stateful aggregation (using link:spark-sql-streaming-KeyValueGroupedDataset.adoc#mapGroupsWithState[mapGroupsWithState] or link:spark-sql-streaming-KeyValueGroupedDataset.adoc#flatMapGroupsWithState[flatMapGroupsWithState] operators).

2 changes: 1 addition & 1 deletion spark-sql-streaming-HDFSBackedStateStore.adoc
@@ -1,6 +1,6 @@
== [[HDFSBackedStateStore]] HDFSBackedStateStore

-`HDFSBackedStateStore` is a link:spark-sql-streaming-StateStore.adoc[StateStore] using HDFS-compatible file system for versioned state persistence.
+`HDFSBackedStateStore` is a link:spark-sql-streaming-StateStore.adoc[StateStore] that uses a HDFS-compatible file system for versioned state persistence.

`HDFSBackedStateStore` is created exclusively when `HDFSBackedStateStoreProvider` <<getStore, is requested for a state store>> (which is when...FIXME).
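
The two ideas in the description above (stores are created only by their provider, and state is persisted per version) can be pictured with a small toy model. This is a sketch under stated assumptions, not Spark's implementation: `ToyVersionedProvider` and `ToyStore` are hypothetical names, in-memory maps stand in for files on an HDFS-compatible file system, and keys and values are plain `String` and `Long`.

```scala
import scala.collection.mutable

// Hypothetical toy model (not Spark's classes): an in-memory map of
// snapshots stands in for files on an HDFS-compatible file system.
final class ToyVersionedProvider {
  // version -> committed snapshot; version 0 is the empty initial state
  private val snapshots =
    mutable.Map[Long, Map[String, Long]](0L -> Map.empty[String, Long])

  // The only place in this sketch where a store is created.
  def getStore(version: Long): ToyStore = {
    val base = snapshots.getOrElse(
      version,
      throw new IllegalArgumentException(s"Unknown version: $version"))
    new ToyStore(version, base)
  }

  final class ToyStore(val version: Long, base: Map[String, Long]) {
    // Working copy: changes stay local to this store until commit()
    private val working = mutable.Map(base.toSeq: _*)

    def get(key: String): Option[Long] = working.get(key)
    def put(key: String, value: Long): Unit = working.update(key, value)

    // Persists the working copy as the next version and returns that version
    def commit(): Long = {
      val newVersion = version + 1
      snapshots.update(newVersion, working.toMap)
      newVersion
    }
  }
}
```

In this sketch, committing a store loaded at version 0 yields version 1, while version 0 stays readable and unchanged.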

@@ -1,4 +1,4 @@
-== [[flatMapGroupsWithState]] flatMapGroupsWithState Operator -- Arbitrary Stateful Stream Aggregation (with Explicit State Logic)
+== [[flatMapGroupsWithState]] flatMapGroupsWithState Operator -- Arbitrary Stateful Streaming Aggregation (with Explicit State Logic)

[source, scala]
----
@@ -1,4 +1,4 @@
-== [[mapGroupsWithState]] mapGroupsWithState Operator -- Stateful Stream Aggregation (with Explicit State Logic)
+== [[mapGroupsWithState]] mapGroupsWithState Operator -- Stateful Streaming Aggregation (with Explicit State Logic)

[source, scala]
----
2 changes: 2 additions & 0 deletions spark-sql-streaming-StateStore.adoc
@@ -37,6 +37,8 @@ trait StateStore {
| [[get]] `get`
|

+Used exclusively when `StateStoreRDD` link:spark-sql-streaming-StateStoreRDD.adoc#compute[is executed].

| [[getRange]] `getRange`
|

2 changes: 1 addition & 1 deletion spark-sql-streaming-StateStoreProvider.adoc
@@ -14,4 +14,4 @@ CAUTION: FIXME

CAUTION: FIXME

-NOTE: `getStore` is used when `StateStore` is link:spark-sql-streaming-StateStore.adoc#get[requested for a state store] (given a `StateStoreProviderId`).
+NOTE: `getStore` is used exclusively when `StateStore` is link:spark-sql-streaming-StateStore.adoc#get[requested for a state store] (given a `StateStoreProviderId`).
8 changes: 5 additions & 3 deletions spark-sql-streaming-StateStoreRDD.adoc
@@ -1,8 +1,10 @@
-== [[StateStoreRDD]] StateStoreRDD
+== [[StateStoreRDD]] StateStoreRDD -- RDD for Updating State (in StateStore)

-`StateStoreRDD` is a specialized `RDD` for <<compute, computations>> using link:spark-sql-streaming-StateStore.adoc[StateStore].
+`StateStoreRDD` is an `RDD` for <<compute, executing storeUpdateFunction>> with link:spark-sql-streaming-StateStore.adoc[StateStore] (and data from partitions of a <<dataRDD, child RDD>>).

-`StateStoreRDD` is <<creating-instance, created>> as the result of link:spark-sql-streaming-StateStoreOps.adoc#mapPartitionsWithStateStore[executing physical operators] (i.e. `FlatMapGroupsWithStateExec`, `StateStoreRestoreExec`, `StateStoreSaveExec`, `StreamingDeduplicateExec`).
+`StateStoreRDD` is <<creating-instance, created>> when link:spark-sql-streaming-StateStoreOps.adoc#mapPartitionsWithStateStore[executing physical operators] that work with state (i.e. `FlatMapGroupsWithStateExec`, `StateStoreRestoreExec`, `StateStoreSaveExec`, `StreamingDeduplicateExec`).

+CAUTION: FIXME graffle with operator -> logical operator -> planner -> physical operator -> StateStoreRDD

`StateStoreRDD` uses `StateStoreCoordinator` for <<getPreferredLocations, preferred locations>> for job scheduling.
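
The revised description says `StateStoreRDD` executes `storeUpdateFunction` with a `StateStore` and the data from a partition of the child RDD. The sketch below illustrates that shape only. It is a simplified model, not Spark's code: `ToyStateStore`, `StateStoreRddSketch`, `computePartition` and `countWords` are hypothetical names, and the store holds plain `String` keys and `Long` values.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a state store (not Spark's StateStore trait).
final class ToyStateStore {
  private val state = mutable.Map.empty[String, Long]
  def get(key: String): Option[Long] = state.get(key)
  def put(key: String, value: Long): Unit = state.update(key, value)
  def snapshot: Map[String, Long] = state.toMap
}

object StateStoreRddSketch {
  // Models the per-partition step: pass the partition's store and its rows
  // to the update function and return whatever the function produces.
  def computePartition[T, U](store: ToyStateStore, rows: Iterator[T])(
      storeUpdateFunction: (ToyStateStore, Iterator[T]) => Iterator[U]): Iterator[U] =
    storeUpdateFunction(store, rows)

  // Example update function: a running count per word, kept in the store.
  // The returned iterator is lazy, so the store is updated as it is consumed.
  val countWords: (ToyStateStore, Iterator[String]) => Iterator[(String, Long)] =
    (store, words) =>
      words.map { word =>
        val updated = store.get(word).getOrElse(0L) + 1L
        store.put(word, updated)
        word -> updated
      }

  def main(args: Array[String]): Unit = {
    val store = new ToyStateStore
    val out = computePartition(store, Iterator("a", "b", "a"))(countWords).toList
    println(out)            // running counts in input order
    println(store.snapshot) // final state held by the store
  }
}
```

Keeping the update logic in a function argument is what would let different physical operators reuse one RDD type with their own state handling.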

3 changes: 3 additions & 0 deletions spark-sql-streaming-webui.adoc
@@ -0,0 +1,3 @@
+== Web UI
+
+Web UI...FIXME
