Skip to content

Latest commit

 

History

History
47 lines (30 loc) · 1.47 KB

spark-sql-streaming-BatchCommitLog.adoc

File metadata and controls

47 lines (30 loc) · 1.47 KB

BatchCommitLog — HDFSMetadataLog for Batch Completion Log

BatchCommitLog is a HDFSMetadataLog with metadata as regular text (i.e. String).

Note
HDFSMetadataLog is a MetadataLog that uses Hadoop HDFS for a reliable storage.

BatchCommitLog is created exclusively for batch commit log in StreamExecution.

serialize Method

serialize(metadata: String, out: OutputStream): Unit
Note
serialize is a part of HDFSMetadataLog Contract to write a metadata in serialized format.

serialize writes out the version prefixed with v on a single line (e.g. v1) followed by the empty JSON (i.e. {}).

Note
The version in Spark 2.2 is 1 with the charset being UTF-8.
Note
serialize always writes an empty JSON as the name of the files gives the meaning.
$ ls -tr [checkpoint-directory]/commits
0 1 2 3 4 5 6 7 8 9

$ cat [checkpoint-directory]/commits/8
v1
{}

deserialize Method

deserialize(in: InputStream): String
Caution
FIXME

Creating BatchCommitLog Instance

BatchCommitLog takes the following when created:

  • SparkSession

  • Path of the metadata log directory