KafkaSourceRDD et al
jaceklaskowski committed Mar 6, 2017
1 parent f9b09c4 commit c90f860
Showing 4 changed files with 49 additions and 3 deletions.
2 changes: 2 additions & 0 deletions SUMMARY.adoc
@@ -9,7 +9,9 @@
.. link:spark-sql-streaming-FileStreamSource.adoc[FileStreamSource]

.. link:spark-sql-streaming-KafkaSource.adoc[KafkaSource]
... link:spark-sql-streaming-KafkaRelation.adoc[KafkaRelation]
... link:spark-sql-streaming-KafkaSourceRDD.adoc[KafkaSourceRDD]
... link:spark-sql-streaming-CachedKafkaConsumer.adoc[CachedKafkaConsumer]

.. link:spark-sql-streaming-MemoryStream.adoc[MemoryStream]
.. link:spark-sql-streaming-TextSocketSource.adoc[TextSocketSource]
11 changes: 11 additions & 0 deletions spark-sql-streaming-CachedKafkaConsumer.adoc
@@ -0,0 +1,11 @@
== [[CachedKafkaConsumer]] CachedKafkaConsumer

CAUTION: FIXME

=== [[poll]] `poll` Internal Method

CAUTION: FIXME

=== [[fetchData]] `fetchData` Internal Method

CAUTION: FIXME
7 changes: 7 additions & 0 deletions spark-sql-streaming-KafkaRelation.adoc
@@ -0,0 +1,7 @@
== [[KafkaRelation]] KafkaRelation

CAUTION: FIXME

=== [[buildScan]] `buildScan` Method

CAUTION: FIXME
32 changes: 29 additions & 3 deletions spark-sql-streaming-KafkaSourceRDD.adoc
@@ -1,13 +1,39 @@
== [[KafkaSourceRDD]] KafkaSourceRDD

`KafkaSourceRDD` is an `RDD` of Kafka's https://kafka.apache.org/0102/javadoc/org/apache/kafka/clients/consumer/ConsumerRecords.html[ConsumerRecords] (with keys and values being byte arrays, i.e. `Array[Byte]`).
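The key/value pairing can be illustrated with a plain-Scala sketch (the `BinaryRecord` case class below is a simplified, hypothetical stand-in for Kafka's `ConsumerRecord[Array[Byte], Array[Byte]]`, not the actual Kafka API):

```scala
// Illustrative stand-in for Kafka's ConsumerRecord[Array[Byte], Array[Byte]]
// (a hypothetical case class, for demonstration only).
case class BinaryRecord(
  topic: String,
  partition: Int,
  offset: Long,
  key: Array[Byte],
  value: Array[Byte])

val record = BinaryRecord(
  topic = "topic1",
  partition = 0,
  offset = 42L,
  key = "k1".getBytes("UTF-8"),
  value = "hello".getBytes("UTF-8"))

// Keys and values arrive as raw bytes; deserializing them (e.g. to String)
// is left to the query that consumes the records.
println(new String(record.value, "UTF-8"))  // hello
```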

`KafkaSourceRDD` is created when:

* `KafkaRelation` is requested to link:spark-sql-streaming-KafkaRelation.adoc#buildScan[build a scan]
* `KafkaSource` is requested for a link:spark-sql-streaming-KafkaSource.adoc#getBatch[batch]

=== [[getPreferredLocations]] `getPreferredLocations` Method

CAUTION: FIXME

=== [[compute]] `compute` Method

CAUTION: FIXME

=== [[getPartitions]] `getPartitions` Method

CAUTION: FIXME

=== [[persist]] `persist` Method

CAUTION: FIXME

=== [[creating-instance]] Creating KafkaSourceRDD Instance

`KafkaSourceRDD` takes the following when created:

* [[sc]] `SparkContext`
* [[executorKafkaParams]] Collection of key-value settings for executors reading records from Kafka topics
* [[offsetRanges]] Collection of `KafkaSourceRDDOffsetRange` offsets
* [[pollTimeoutMs]] Timeout (in milliseconds) to poll data from Kafka
+
Used when `KafkaSourceRDD` <<compute, is requested for records>> (for given offsets) and in turn link:spark-sql-streaming-CachedKafkaConsumer.adoc#poll[requests `CachedKafkaConsumer` to poll for Kafka's `ConsumerRecords`].
* [[failOnDataLoss]] Flag to...FIXME
* [[reuseKafkaConsumer]] Flag to...FIXME

`KafkaSourceRDD` initializes the <<internal-registries, internal registries and counters>>.
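How the <<offsetRanges, offset ranges>> could map to RDD partitions can be sketched in plain Scala (the `OffsetRange` class and `toPartitionIndices` method below are simplified, hypothetical stand-ins for the internal `KafkaSourceRDDOffsetRange` and <<getPartitions, getPartitions>>, not the actual implementation):

```scala
// Simplified stand-in for the internal KafkaSourceRDDOffsetRange
// (a hypothetical shape, for illustration only).
case class OffsetRange(
  topic: String,
  partition: Int,
  fromOffset: Long,
  untilOffset: Long) {

  // Number of records the range covers.
  def size: Long = untilOffset - fromOffset
}

// One RDD partition per offset range, indexed by position --
// a sketch of what computing the partitions could look like.
def toPartitionIndices(ranges: Seq[OffsetRange]): Seq[(Int, OffsetRange)] =
  ranges.zipWithIndex.map { case (r, i) => (i, r) }

val ranges = Seq(
  OffsetRange("topic1", 0, 0L, 100L),
  OffsetRange("topic1", 1, 50L, 80L))

toPartitionIndices(ranges).foreach { case (i, r) =>
  println(s"partition $i -> ${r.topic}-${r.partition}, ${r.size} records")
}
```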

NOTE: `KafkaSourceRDD` is created when `KafkaRelation` builds a scan and when `KafkaSource` generates a `DataFrame` for a streaming batch.
