Updating documentation
mmolimar committed Sep 13, 2020
1 parent e27dda2 commit 4e0bae0
Showing 4 changed files with 220 additions and 209 deletions.
276 changes: 138 additions & 138 deletions docs/source/config_options.rst
@@ -244,20 +244,6 @@ File readers
Some file readers have custom properties to define and others don't. So, depending on the configuration you'll have
to take into account their properties.

.. _config_options-filereaders-avro:

Avro
--------------------------------------------

In order to configure custom properties for this reader, the name you must use is ``avro``.

``file_reader.avro.schema``
Avro schema in JSON format to use when reading a file.
If not specified, the reader will use the schema defined in the file.

* Type: string
* Importance: medium

.. _config_options-filereaders-parquet:

Parquet
@@ -277,6 +263,20 @@ In order to configure custom properties for this reader, the name you must use i
* Type: string
* Importance: medium

.. _config_options-filereaders-avro:

Avro
--------------------------------------------

In order to configure custom properties for this reader, the name you must use is ``avro``.

``file_reader.avro.schema``
Avro schema in JSON format to use when reading a file.
If not specified, the reader will use the schema defined in the file.

* Type: string
* Importance: medium
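
As an illustration, a schema could be passed inline as a single-line JSON string. This is only a sketch: the record name ``User`` and its fields are hypothetical, and it assumes the connector is configured through a Java properties file.

.. code-block:: properties

   # Hypothetical schema; replace it with the one that matches your Avro files
   file_reader.avro.schema={"type":"record","name":"User","fields":[{"name":"id","type":"long"},{"name":"name","type":"string"}]}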

.. _config_options-filereaders-orc:

ORC
@@ -552,130 +552,6 @@ In order to configure custom properties for this reader, the name you must use i
* Default: ````
* Importance: low

.. _config_options-filereaders-json:

JSON
--------------------------------------------

To configure custom properties for this reader, the name you must use is ``json``.

``file_reader.json.record_per_line``
If enabled, the reader will read each line as a record. Otherwise, the reader will read the full
content of the file as a record.

* Type: boolean
* Default: ``true``
* Importance: medium

``file_reader.json.deserialization.<deserialization_feature>``
Deserialization feature to use when reading a JSON file. You can add as many as you like
based on the ones defined `here. <https://fasterxml.github.io/jackson-databind/javadoc/2.10/com/fasterxml/jackson/databind/DeserializationFeature.html#enum.constant.summary>`__

* Type: boolean
* Importance: medium

``file_reader.json.encoding``
Encoding to use for reading a file. If not specified, the reader will use the default encoding.

* Type: string
* Default: based on the locale and charset of the underlying operating system.
* Importance: medium

``file_reader.json.compression.type``
Compression type to use when reading a file.

* Type: enum (available values ``bzip2``, ``gzip`` and ``none``)
* Default: ``none``
* Importance: medium

``file_reader.json.compression.concatenated``
Flag to specify if the decompression of the reader will finish at the end of the file or after
the first compressed stream.

* Type: boolean
* Default: ``true``
* Importance: low

.. _config_options-filereaders-xml:

XML
--------------------------------------------

To configure custom properties for this reader, the name you must use is ``xml``.

``file_reader.xml.record_per_line``
If enabled, the reader will read each line as a record. Otherwise, the reader will read the full
content of the file as a record.

* Type: boolean
* Default: ``true``
* Importance: medium

``file_reader.xml.deserialization.<deserialization_feature>``
Deserialization feature to use when reading an XML file. You can add as many as you like
based on the ones defined `here. <https://fasterxml.github.io/jackson-databind/javadoc/2.10/com/fasterxml/jackson/databind/DeserializationFeature.html#enum.constant.summary>`__

* Type: boolean
* Importance: medium

``file_reader.xml.encoding``
Encoding to use for reading a file. If not specified, the reader will use the default encoding.

* Type: string
* Default: based on the locale and charset of the underlying operating system.
* Importance: medium

``file_reader.xml.compression.type``
Compression type to use when reading a file.

* Type: enum (available values ``bzip2``, ``gzip`` and ``none``)
* Default: ``none``
* Importance: medium

``file_reader.xml.compression.concatenated``
Flag to specify if the decompression of the reader will finish at the end of the file or after
the first compressed stream.

* Type: boolean
* Default: ``true``
* Importance: low

.. _config_options-filereaders-yaml:

YAML
--------------------------------------------

To configure custom properties for this reader, the name you must use is ``yaml``.

``file_reader.yaml.deserialization.<deserialization_feature>``
Deserialization feature to use when reading a YAML file. You can add as many as you like
based on the ones defined `here. <https://fasterxml.github.io/jackson-databind/javadoc/2.10/com/fasterxml/jackson/databind/DeserializationFeature.html#enum.constant.summary>`__

* Type: boolean
* Importance: medium

``file_reader.yaml.encoding``
Encoding to use for reading a file. If not specified, the reader will use the default encoding.

* Type: string
* Default: based on the locale and charset of the underlying operating system.
* Importance: medium

``file_reader.yaml.compression.type``
Compression type to use when reading a file.

* Type: enum (available values ``bzip2``, ``gzip`` and ``none``)
* Default: ``none``
* Importance: medium

``file_reader.yaml.compression.concatenated``
Flag to specify if the decompression of the reader will finish at the end of the file or after
the first compressed stream.

* Type: boolean
* Default: ``true``
* Importance: low

.. _config_options-filereaders-csv:

CSV
@@ -1171,6 +1047,130 @@ To configure custom properties for this reader, the name you must use is ``delim
* Default: ``true``
* Importance: low

.. _config_options-filereaders-json:

JSON
--------------------------------------------

To configure custom properties for this reader, the name you must use is ``json``.

``file_reader.json.record_per_line``
If enabled, the reader will read each line as a record. Otherwise, the reader will read the full
content of the file as a record.

* Type: boolean
* Default: ``true``
* Importance: medium

``file_reader.json.deserialization.<deserialization_feature>``
Deserialization feature to use when reading a JSON file. You can add as many as you like
based on the ones defined `here. <https://fasterxml.github.io/jackson-databind/javadoc/2.10/com/fasterxml/jackson/databind/DeserializationFeature.html#enum.constant.summary>`__

* Type: boolean
* Importance: medium

``file_reader.json.encoding``
Encoding to use for reading a file. If not specified, the reader will use the default encoding.

* Type: string
* Default: based on the locale and charset of the underlying operating system.
* Importance: medium

``file_reader.json.compression.type``
Compression type to use when reading a file.

* Type: enum (available values ``bzip2``, ``gzip`` and ``none``)
* Default: ``none``
* Importance: medium

``file_reader.json.compression.concatenated``
Flag to specify if the decompression of the reader will finish at the end of the file or after
the first compressed stream.

* Type: boolean
* Default: ``true``
* Importance: low
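
Taken together, the JSON reader options above might be combined as in the following sketch. It assumes a Java properties file and that the ``<deserialization_feature>`` suffix is written as the Jackson ``DeserializationFeature`` enum constant; the values are illustrative, not recommendations.

.. code-block:: properties

   # One JSON document per line (the documented default)
   file_reader.json.record_per_line=true
   # Assumed form: the suffix is a Jackson DeserializationFeature constant
   file_reader.json.deserialization.ACCEPT_SINGLE_VALUE_AS_ARRAY=true
   # Set the charset explicitly instead of relying on the operating system default
   file_reader.json.encoding=UTF-8
   # Read gzip-compressed files
   file_reader.json.compression.type=gzip
   file_reader.json.compression.concatenated=true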

.. _config_options-filereaders-xml:

XML
--------------------------------------------

To configure custom properties for this reader, the name you must use is ``xml``.

``file_reader.xml.record_per_line``
If enabled, the reader will read each line as a record. Otherwise, the reader will read the full
content of the file as a record.

* Type: boolean
* Default: ``true``
* Importance: medium

``file_reader.xml.deserialization.<deserialization_feature>``
Deserialization feature to use when reading an XML file. You can add as many as you like
based on the ones defined `here. <https://fasterxml.github.io/jackson-databind/javadoc/2.10/com/fasterxml/jackson/databind/DeserializationFeature.html#enum.constant.summary>`__

* Type: boolean
* Importance: medium

``file_reader.xml.encoding``
Encoding to use for reading a file. If not specified, the reader will use the default encoding.

* Type: string
* Default: based on the locale and charset of the underlying operating system.
* Importance: medium

``file_reader.xml.compression.type``
Compression type to use when reading a file.

* Type: enum (available values ``bzip2``, ``gzip`` and ``none``)
* Default: ``none``
* Importance: medium

``file_reader.xml.compression.concatenated``
Flag to specify if the decompression of the reader will finish at the end of the file or after
the first compressed stream.

* Type: boolean
* Default: ``true``
* Importance: low

.. _config_options-filereaders-yaml:

YAML
--------------------------------------------

To configure custom properties for this reader, the name you must use is ``yaml``.

``file_reader.yaml.deserialization.<deserialization_feature>``
Deserialization feature to use when reading a YAML file. You can add as many as you like
based on the ones defined `here. <https://fasterxml.github.io/jackson-databind/javadoc/2.10/com/fasterxml/jackson/databind/DeserializationFeature.html#enum.constant.summary>`__

* Type: boolean
* Importance: medium

``file_reader.yaml.encoding``
Encoding to use for reading a file. If not specified, the reader will use the default encoding.

* Type: string
* Default: based on the locale and charset of the underlying operating system.
* Importance: medium

``file_reader.yaml.compression.type``
Compression type to use when reading a file.

* Type: enum (available values ``bzip2``, ``gzip`` and ``none``)
* Default: ``none``
* Importance: medium

``file_reader.yaml.compression.concatenated``
Flag to specify if the decompression of the reader will finish at the end of the file or after
the first compressed stream.

* Type: boolean
* Default: ``true``
* Importance: low

.. _config_options-filereaders-text:

Text
26 changes: 16 additions & 10 deletions docs/source/connector.rst
@@ -4,6 +4,9 @@
Connector
********************************************

Kafka Connect FileSystem Connector is a source connector for reading records from different sorts of file
formats and from different file system types and loading them into Kafka.

The connector takes advantage of the abstraction provided from `Hadoop Common <http://hadoop.apache.org/>`__
using the implementation of the ``org.apache.hadoop.fs.FileSystem`` class. So, it's possible to use a
wide variety of FS or if your FS is not included in the Hadoop Common API you can implement an extension
@@ -20,14 +23,17 @@ Among others, these are some file systems it supports:
* Local File System.
* Hadoop Archive File System.

On the other hand, the following file types are supported: Parquet, Avro, ORC, SequenceFile, Cobol / EBCDIC,
CSV, TSV, Fixed-width, JSON, XML, YAML and Text.

Getting started
============================================

Prerequisites
--------------------------------------------

- Apache Kafka 2.6.0
- Java 8
- Apache Kafka 2.6.0.
- Java 8.
- Confluent Schema Registry (recommended).

Building from source
@@ -118,11 +124,11 @@ Policies

In order to ingest data from the FS(s), the connector needs a **policy** to define the rules to do it.

Basically, the policy tries to connect to each FS included in ``fs.uris`` connector property, lists files
Basically, the policy tries to connect to each FS included in the ``fs.uris`` connector property, lists files
(and filters them using the regular expression provided in the ``policy.regexp`` property) and enables
a file reader to read records from them.
a file reader to read records.

The policy to be used by the connector is defined in ``policy.class`` connector property.
The policy to be used by the connector is defined in the ``policy.class`` connector property.
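
A minimal sketch of the policy-related properties might look as follows. The directory in ``fs.uris`` is hypothetical, and the ``SimplePolicy`` class name is an assumption that is not confirmed by this page; check the policies documentation for the classes actually available.

.. code-block:: properties

   # Hypothetical local directory to scan
   fs.uris=file:///data/input
   # Assumed policy class name; verify it against the policies documentation
   policy.class=com.github.mmolimar.kafka.connect.fs.policy.SimplePolicy
   # Match only files ending in .json ([.] avoids backslash escaping in properties files)
   policy.regexp=^.*[.]json$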

.. important:: When delivering records from the connector to Kafka, they contain their own file offset
so, if in the next eventual policy execution this file is processed again,
@@ -144,11 +150,11 @@ File readers

They read files and process each record from the FS. The **file reader** is needed by the policy to enable
the connector to process each record and includes in the implementation how to seek and iterate over the
records in the file.
records within the file.

The file reader to be used when processing files is defined in the ``file_reader.class`` connector property.
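
For instance, selecting a reader could be as short as the line below. The ``JsonFileReader`` class name is an assumption, inferred from the package of the ``FileReader`` interface, and should be checked against the file readers documentation.

.. code-block:: properties

   # Assumed class name, following the package of the FileReader interface
   file_reader.class=com.github.mmolimar.kafka.connect.fs.file.reader.JsonFileReader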

In the same way as the policies, the connector provides several sorts of readers to parse and read records
In the same way as policies, the connector provides several sorts of readers to parse and read records
for different file formats. If you don't have a file reader that fits your needs, just implement one
with the unique restriction that it must implement the interface
``com.github.mmolimar.kafka.connect.fs.file.reader.FileReader``.
@@ -157,15 +163,15 @@ The are several file readers included which can read the following file formats:

* Parquet.
* Avro.
* Cobol/EBCDIC.
* ORC.
* SequenceFile.
* Cobol / EBCDIC.
* CSV.
* TSV.
* Fixed-width.
* JSON.
* XML.
* YAML.
* ORC.
* SequenceFile.
* Text.

.. include:: filereaders.rst
