Commit

update(site): add entries into FAQ
fhussonnois committed May 26, 2020
1 parent 06e4a23 commit 9b5c265
Showing 3 changed files with 46 additions and 6 deletions.
6 changes: 6 additions & 0 deletions site/content/en/docs/Developer Guide/installation.md
@@ -34,6 +34,12 @@ curl -sSL https://github.com/streamthoughts/kafka-connect-file-pulse/releases/do

Extract it into one of the directories listed in the `plugin.path` worker configuration property.

You can also install it with the Confluent Hub CLI:

```bash
$ confluent-hub install --no-prompt streamthoughts-kafka-connect-file-pulse-$VERSION.zip
```

{{% alert title="Important" color="info" %}}
When you run Connect workers in **distributed mode**, the connector plugin must be installed **on each machine** running Kafka Connect.
{{% /alert %}}
2 changes: 1 addition & 1 deletion site/content/en/docs/Developer Guide/scanning-files.md
@@ -71,6 +71,6 @@ file.filter.regex.pattern="\\.log$"

The connector supports the following content types:

* **GZIP** : `application/x-gzip`
* **TAR** : `application/x-tar`
* **ZIP** : `application/x-zip-compressed` or `application/zip`
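
To see which files fall into these categories, here is a hypothetical helper (not part of FilePulse) that guesses the content type from the file name alone. The connector itself may determine content types differently, for instance by inspecting the file, so treat this only as an illustration of the list above.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of FilePulse: guesses which of the listed
# content types a file name corresponds to, based on its extension only.
content_type_for() {
  case "$1" in
    *.gz|*.tgz) echo "application/x-gzip" ;;  # includes .tar.gz: the outer layer is gzip
    *.tar)      echo "application/x-tar" ;;
    *.zip)      echo "application/zip" ;;     # may also be reported as application/x-zip-compressed
    *)          echo "other" ;;               # not an archive or compressed file
  esac
}

for f in access.log.gz backup.tar logs.zip app.log; do
  printf '%s -> %s\n' "$f" "$(content_type_for "$f")"
done
```
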
44 changes: 39 additions & 5 deletions site/content/en/docs/FAQ/_index.md
@@ -1,24 +1,58 @@
---
date: 2020-05-25
title: "FAQ"
linkTitle: "FAQ"
weight: 97
description: >
The most frequently asked questions.
---

## Can the FilePulse connector be deployed in distributed mode?

Connect File Pulse must run locally on the machine hosting the files to be ingested. It is recommended to deploy the connector in distributed mode: multiple Kafka Connect workers can be deployed on the same machine and participate in the same cluster. The configured input directory is scanned by the JVM running the SourceConnector, and all detected files are then scheduled among the tasks spread across your local cluster.
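
Running several workers on one host mostly comes down to giving each worker the same cluster identity (`group.id` and storage topics) and a distinct REST listener. The following is a minimal sketch using standard Kafka Connect distributed-worker properties; the broker address, port numbers, topic names and plugin directory are assumptions, not values prescribed by FilePulse.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed values: adjust the broker address and plugin directory to your setup.
BOOTSTRAP="${BOOTSTRAP:-localhost:9092}"
PLUGIN_DIR="${PLUGIN_DIR:-/usr/share/kafka/plugins}"
OUT_DIR="$(mktemp -d)"

# Generate two worker configs that join the same Connect cluster.
for i in 1 2; do
  cat > "$OUT_DIR/worker-$i.properties" <<EOF
bootstrap.servers=$BOOTSTRAP
# Same group.id and storage topics => same Connect cluster
group.id=connect-file-pulse-cluster
config.storage.topic=connect-file-pulse-configs
offset.storage.topic=connect-file-pulse-offsets
status.storage.topic=connect-file-pulse-status
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Each worker on the same host needs its own REST port (8083, 8084)
listeners=http://0.0.0.0:$((8082 + i))
plugin.path=$PLUGIN_DIR
EOF
done

ls "$OUT_DIR"
```

Each file could then be passed to Kafka's `connect-distributed.sh`, for example `connect-distributed.sh "$OUT_DIR/worker-1.properties"`. Against a single-broker development cluster, you would also need to lower `config.storage.replication.factor`, `offset.storage.replication.factor` and `status.storage.replication.factor` to 1.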

## Is the FilePulse connector fault-tolerant?

Connect File Pulse guarantees no data loss by leveraging Kafka Connect fault-tolerance capabilities.
Each task keeps track of the file offset of the last record written into Kafka. After a restart, tasks resume where they stopped before the crash.
Note that some duplicates may be written into Kafka.

## Can the FilePulse connector be used in place of other solutions like Logstash?

Connect File Pulse has some features that are similar to the ones provided by Logstash [codecs](https://www.elastic.co/guide/en/logstash/current/codec-plugins.html)/[filters](https://www.elastic.co/guide/en/logstash/current/filter-plugins.html). Filters like GrokFilter are strongly inspired by Logstash; for example, you can use GrokFilter to parse unstructured data such as application logs.

However, Connect File Pulse was not originally designed to collect dynamic application log files.

## Does the FilePulse connector support SASL/SSL authentication mechanisms, and can it be deployed on Confluent Cloud?

Yes, the FilePulse connector can be deployed on any Kafka cluster. However, the connector currently requires an internal topic
to synchronize the connector instance and the tasks that process files. To do this, the connector
creates both a producer and a consumer client, which you must configure when running the connector against a secured Kafka cluster.

To override the default configuration of the internal consumer and producer clients,
you can use the following override prefixes:

* `internal.kafka.reporter.consumer.<consumer_property>`
* `internal.kafka.reporter.producer.<producer_property>`
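
As an illustration, the sketch below generates SASL_SSL settings for both internal clients using the two prefixes above. The suffixes are standard Kafka client properties (`security.protocol`, `sasl.mechanism`, `sasl.jaas.config`); the `PLAIN` mechanism, the user name, the password and the temporary output file are assumptions to adapt to your cluster.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Assumed credentials: replace with your own (e.g. an API key/secret).
KAFKA_USER="${KAFKA_USER:-connect-user}"
KAFKA_PASSWORD="${KAFKA_PASSWORD:-changeme}"
JAAS="org.apache.kafka.common.security.plain.PlainLoginModule required username=\"$KAFKA_USER\" password=\"$KAFKA_PASSWORD\";"

CONF="$(mktemp)"
{
  # Apply the same client security settings to both internal clients.
  for client in producer consumer; do
    prefix="internal.kafka.reporter.$client"
    echo "$prefix.security.protocol=SASL_SSL"
    echo "$prefix.sasl.mechanism=PLAIN"
    echo "$prefix.sasl.jaas.config=$JAAS"
  done
} > "$CONF"

cat "$CONF"
```

The generated lines would be merged into the connector's configuration alongside its other properties. Confluent Cloud clusters typically use `SASL_SSL` with the `PLAIN` mechanism and an API key/secret as user name and password, but check the client settings provided for your cluster.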

## What are the differences between the FilePulse connector and other Kafka connectors for streaming files?

The following table shows a simple comparison between Connect File Pulse and other solutions: [Connect Spooldir](https://github.com/jcustenborder/kafka-connect-spooldir) and [Connect FileStreams](https://github.com/apache/kafka/tree/trunk/connect/file).

| | Connect FilePulse | Connect Spooldir | Connect FileStreams |
|:--- | :---: | :---: | :---: |
| **Connector Type** | source | source | source / sink |
| **License** |Apache License 2.0 |Apache License 2.0| Apache License 2.0 |
| **Available on Confluent Hub** | YES | YES | YES |
| **Docker image** | YES | NO | NO |
| **Delivery semantics** | At-least-once | At-least-once | At-most-once |
| **Usable in production** | YES | YES | NO |
| **Supported file formats (out of the box)** | Delimited, Binary, JSON, Avro, XML (limited) | Delimited, JSON | Text file |
| **Support recursive directory scan** | YES | NO | NO |
| **Support Archive and Compressed files** | YES (`GZIP`, `TAR`, `ZIP`) | NO | NO |
| **Source partitions** | Configurable (filename, path, filename+hash) | filename | filename |
| **Support for multi-tasks** | YES | YES | NO |
| **Support for worker distributed mode** | YES (requires a shared volume) | NO | NO |
| **Support for streaming log files** | YES | NO | YES |
| **Support for transformation** | Single Message Transforms <br /> Processing Filters (Grok, Append, JSON, etc.) | Single Message Transforms | Single Message Transforms |
| **Support for tracking processing progress of files** | YES (using an internal topic) | NO | NO |
