StreamingQueryManager is the management interface for continuous queries (per SparkSession).
StreamingQueryManager manages all continuous structured queries per SparkSession and is available using the SparkSession.streams operator.
val spark: SparkSession = ...
val queries = spark.streams
StreamingQueryManager is created when…FIXME
Name | Description |
---|---|
Registry of Used in active, get, startQuery and notifyQueryTermination. |
createQuery(
userSpecifiedName: Option[String],
userSpecifiedCheckpointLocation: Option[String],
df: DataFrame,
sink: Sink,
outputMode: OutputMode,
useTempCheckpointLocation: Boolean,
recoverFromCheckpointLocation: Boolean,
trigger: Trigger,
triggerClock: Clock): StreamingQueryWrapper
Caution
|
FIXME |
Note
|
userSpecifiedName corresponds to the queryName option (which can be defined using DataStreamWriter's queryName method), while userSpecifiedCheckpointLocation is the checkpointLocation option.
|
Note
|
createQuery is used exclusively when StreamingQueryManager starts executing a streaming query.
|
StreamingQueryManager manages the following instances:
- StateStoreCoordinatorRef (as stateStoreCoordinator)
- StreamingQueryListenerBus (as listenerBus)
- activeQueries, which is a mutable mapping between query names and StreamingQuery objects.
startQuery(
userSpecifiedName: Option[String],
userSpecifiedCheckpointLocation: Option[String],
df: DataFrame,
sink: Sink,
outputMode: OutputMode,
useTempCheckpointLocation: Boolean = false,
recoverFromCheckpointLocation: Boolean = true,
trigger: Trigger = ProcessingTime(0),
triggerClock: Clock = new SystemClock()): StreamingQuery
startQuery starts a streaming query.
Note
|
trigger defaults to 0 milliseconds (as ProcessingTime(0)).
|
Internally, startQuery first creates a streaming query, registers it in the activeQueries internal registry, and starts the query.
In the end, startQuery returns the query (as part of the fluent API so you can chain operators) or reports the exception that was thrown when starting the query.
startQuery reports an IllegalArgumentException when there is another query registered under name. startQuery looks it up in the activeQueries internal registry.
Cannot start query with name [name] as a query with that name is already active
startQuery reports an IllegalStateException when a query is started again from checkpoint. startQuery looks it up in the activeQueries internal registry.
Cannot start query with id [id] as another query with same id is already active.
Perhaps you are attempting to restart a query from checkpoint that is already active.
Note
|
startQuery is used exclusively when DataStreamWriter is started.
|
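The duplicate-name and duplicate-id checks that startQuery performs can be modeled with a small, Spark-free sketch. Everything here is hypothetical and for illustration only: SketchQuery and QueryRegistrySketch are not Spark classes, and keying the registry by query id is an assumption made for the sketch, not a statement about Spark's internals.

```scala
import scala.collection.mutable

// Hypothetical stand-in for a streaming query; not Spark's StreamingQuery.
final case class SketchQuery(id: String, name: Option[String])

// Simplified model of the checks described above; not Spark's implementation.
final class QueryRegistrySketch {
  // Assumption: the registry is keyed by query id in this sketch.
  private val activeQueries = mutable.Map.empty[String, SketchQuery]

  def startQuery(id: String, userSpecifiedName: Option[String]): SketchQuery = synchronized {
    // Reject a second query that uses an already active user-specified name.
    userSpecifiedName.foreach { name =>
      if (activeQueries.values.exists(_.name.contains(name))) {
        throw new IllegalArgumentException(
          s"Cannot start query with name $name as a query with that name is already active")
      }
    }
    // Reject a second query with an already active id
    // (for example, a restart from a checkpoint that is still in use).
    if (activeQueries.contains(id)) {
      throw new IllegalStateException(
        s"Cannot start query with id $id as another query with same id is already active.")
    }
    val query = SketchQuery(id, userSpecifiedName)
    activeQueries.put(id, query)
    query
  }

  // Snapshot of the currently registered queries.
  def active: Array[SketchQuery] = synchronized(activeQueries.values.toArray)
}
```

Running the check and the registration under one lock is a design choice of the sketch: it keeps two concurrent callers from both passing the check before either one registers.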
active: Array[StreamingQuery]
active method returns a collection of StreamingQuery instances for the current SQLContext.
get(name: String): StreamingQuery
get method returns a StreamingQuery by name.
It may throw an IllegalArgumentException when no StreamingQuery exists for the name.
java.lang.IllegalArgumentException: There is no active query with name hello
at org.apache.spark.sql.StreamingQueryManager$$anonfun$get$1.apply(StreamingQueryManager.scala:59)
at org.apache.spark.sql.StreamingQueryManager$$anonfun$get$1.apply(StreamingQueryManager.scala:59)
at scala.collection.MapLike$class.getOrElse(MapLike.scala:128)
at scala.collection.AbstractMap.getOrElse(Map.scala:59)
at org.apache.spark.sql.StreamingQueryManager.get(StreamingQueryManager.scala:58)
... 49 elided
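The stack trace above goes through Map.getOrElse, which suggests a lookup with a throwing default. A minimal sketch of that pattern follows, assuming a plain immutable Map with String values in place of StreamingQuery instances. GetSketch and its contents are hypothetical.

```scala
// Hypothetical model of a by-name lookup; not Spark's code.
object GetSketch {
  // Assumption: values would be StreamingQuery instances in Spark.
  private val activeQueries: Map[String, String] =
    Map("counts" -> "query handle for counts")

  // getOrElse evaluates its default lazily, so the exception is
  // thrown only when the name is missing from the registry.
  def get(name: String): String =
    activeQueries.getOrElse(
      name,
      throw new IllegalArgumentException(s"There is no active query with name $name"))
}
```

Callers that prefer not to handle an exception could wrap the call in scala.util.Try.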
- addListener(listener: StreamingQueryListener): Unit adds listener to the internal listenerBus.
- removeListener(listener: StreamingQueryListener): Unit removes listener from the internal listenerBus.
postListenerEvent(event: StreamingQueryListener.Event): Unit
postListenerEvent posts a StreamingQueryListener.Event to listenerBus.
Caution
|
FIXME |
StreamingQueryListener is an interface for listening to query life cycle events, i.e. query start, progress and termination events.
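The add, remove, and post operations on a listener bus can be illustrated with a small, Spark-free model. The types below (SketchEvent, SketchListener, ListenerBusSketch) are hypothetical. Spark's StreamingQueryListener has its own callback methods and event classes, and how Spark delivers events is not modeled here: this sketch delivers them synchronously on the posting thread.

```scala
import java.util.concurrent.CopyOnWriteArrayList

// Hypothetical life cycle events; not Spark's event classes.
sealed trait SketchEvent
final case class Started(name: String) extends SketchEvent
final case class Progress(name: String, rows: Long) extends SketchEvent
final case class Terminated(name: String) extends SketchEvent

// Hypothetical listener contract with a single callback.
trait SketchListener {
  def onEvent(event: SketchEvent): Unit
}

// Simplified synchronous bus; an assumption made for this sketch.
final class ListenerBusSketch {
  // Copy-on-write list, so listeners can be added or removed while posting.
  private val listeners = new CopyOnWriteArrayList[SketchListener]()

  def addListener(listener: SketchListener): Unit = { listeners.add(listener); () }

  def removeListener(listener: SketchListener): Unit = { listeners.remove(listener); () }

  // Delivers the event to every listener registered at the time of the call.
  def post(event: SketchEvent): Unit = {
    val it = listeners.iterator()
    while (it.hasNext) it.next().onEvent(event)
  }
}
```

A listener that has been removed receives no further events, which mirrors the addListener and removeListener pairing described above.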
Caution
|
FIXME Why is lastTerminatedQuery needed?
|
Used in:
- awaitAnyTermination
- awaitAnyTermination(timeoutMs: Long)
Both wait 10 millis before checking whether lastTerminatedQuery is non-null.
It is set in:
- resetTerminated() resets lastTerminatedQuery, i.e. sets it to null.
- notifyQueryTermination(terminatedQuery: StreamingQuery) sets lastTerminatedQuery to be terminatedQuery and notifies all the threads that wait on awaitTerminationLock.
It is called from StreamExecution.runBatches.
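The interplay between awaitAnyTermination, notifyQueryTermination, and resetTerminated can be sketched as a wait/notify handshake on a shared lock. This is a simplified, Spark-free model under stated assumptions: TerminationSketch is hypothetical, a String stands in for the terminated StreamingQuery, and only the timeout variant is modeled.

```scala
// Hypothetical model of the handshake described above; not Spark's code.
final class TerminationSketch {
  private val awaitTerminationLock = new Object
  // Assumption: a String stands in for the terminated StreamingQuery.
  private var lastTerminatedQuery: String = null

  // Returns true if some query terminated before the timeout elapsed.
  def awaitAnyTermination(timeoutMs: Long): Boolean = {
    val deadline = System.currentTimeMillis() + timeoutMs
    awaitTerminationLock.synchronized {
      while (lastTerminatedQuery == null && System.currentTimeMillis() < deadline) {
        // Wait up to 10 ms, then re-check lastTerminatedQuery.
        awaitTerminationLock.wait(10)
      }
      lastTerminatedQuery != null
    }
  }

  // Records the terminated query and wakes up every waiting thread.
  def notifyQueryTermination(terminatedQuery: String): Unit =
    awaitTerminationLock.synchronized {
      lastTerminatedQuery = terminatedQuery
      awaitTerminationLock.notifyAll()
    }

  // Forgets the last terminated query so that later waits block again.
  def resetTerminated(): Unit =
    awaitTerminationLock.synchronized { lastTerminatedQuery = null }
}
```

In this model, lastTerminatedQuery is what lets a waiter that arrives after the notification still observe the termination, until resetTerminated clears it.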
StreamingQueryManager takes the following when created:
StreamingQueryManager initializes the internal registries and counters.