Issues in using the source connector with HDFS #5
Comments
Hi there, I think there is a misunderstanding here. You have the kafka-connect-hdfs connector in your classpath, NOT kafka-connect-fs. Try it again and let me know if you have any issues :-)
Hi. Anyway, this time I picked up a fresh zip from git and did not update the pom at all. I built it with "mvn clean package" and moved the target to $CONFLUENT_HOME. I exported the classpath as: When I echo $CLASSPATH, I do see all the jars present inside the target path this time. Now, I have set fs.uris to hdfs://abc.com:9000/test, where I have two CSV files. I am using the below properties file: name=FsSourceConnector4 The regex looks fine to me to filter only .csv files. But I get this: And then these last few lines keep repeating. Where is the problem now?
It gives the below warning in the middle. Does this have any effect? [2017-05-21 04:48:43,489] WARN could not create Dir using jarFile from url file:/home/puser/tmp/confluent-3.1.1/target/kafka-connect-fs-0.2-SNAPSHOT-package/share/java/kafka-connect-fs/fastutil-6.5.7.jar. skipping. (org.reflections.Reflections:104)
Tried with $CONFLUENT_HOME/etc/schema-registry/connect-avro-standalone.properties, but the nohup output did not change at all. In the getting started page, you mention the path of kafka-connect-fs.properties as config/kafka-connect-fs.properties. I don't see this path; I don't see any config directory at all. My kafka-connect-fs.properties is at target/kafka-connect-fs-0.2-SNAPSHOT-package/etc/kafka-connect-fs/kafka-connect-fs.properties. Where am I missing the link?
My Kafka version - 0.10.2.1. Downloaded the Confluent tar from https://www.confluent.io/download-center/
There are two different things here. HdfsFileWatcherPolicy just retrieves files when a document is updated or created. If these files were already in HDFS, you won't get any records. For that, I recommend using another policy such as SleepyPolicy or SimplePolicy. You can find more info about policies here. On the other hand, it looks like fastutil-6.5.7.jar is corrupted. It probably doesn't matter for the Reflections API, but if you want to fix this, you should clone the repo and build the source using
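For files that already exist, a polling policy setup might look like the following sketch (property names taken from the connector's documentation; `policy.sleepy.sleep` is assumed to be the sleep interval in milliseconds):

```properties
# Poll the FS repeatedly instead of watching for HDFS events
policy.class=com.github.mmolimar.kafka.connect.fs.policy.SleepyPolicy
# Assumed property name for the interval between scans, in ms
policy.sleepy.sleep=10000
```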
Tried SimplePolicy. No records produced on the Kafka topic. [2017-05-21 09:39:45,767] DEBUG About to send 0 records to Kafka (org.apache.kafka.connect.runtime.WorkerSourceTask:159)
No difference when using SleepyPolicy with 1000 ms.
Created a path /test/tmp on unix (not on HDFS) and changed file.uris to file:///test/tmp. There has to be a way to run this. Where is the setup broken?
The com.github.mmolimar.kafka.connect.fs.FsSourceConnector class is present in these jars: [puser@abc confluent-3.1.1]$ find . -name "*.jar" -exec grep -Hsli com.github.mmolimar.kafka.connect.fs.FsSourceConnector {} \;
private List toSend from org.apache.kafka.connect.runtime.WorkerSourceTask does not seem to get the list of files to process (neither from the local FS nor from HDFS). It remains null, and hence no file comes into the list to process. There must be a missing link that I am overlooking. Listing again the property files I am using to start the connector:
$CONFLUENT_HOME/etc/kafka/connect-standalone.properties
$CONFLUENT_HOME/target/kafka-connect-fs-0.2-SNAPSHOT-package/etc/kafka-connect-fs/kafka-connect-fs.properties
Command:
nohup $CONFLUENT_HOME/bin/connect-standalone $CONFLUENT_HOME/etc/kafka/connect-standalone.properties $CONFLUENT_HOME/target/kafka-connect-fs-0.2-SNAPSHOT-package/etc/kafka-connect-fs/kafka-connect-fs.properties &
Maybe the regexp does not match the filenames. Try this one: .*
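That is, assuming the filter property is named `policy.regexp` (as in the connector's docs), the catch-all setting would be:

```properties
# Match every file in the configured fs.uris
policy.regexp=.*
```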
Exactly, that occurred to me too, and I removed the regex part entirely. But the same message was produced many times. Now I have to get rid of this schema information, etc., and figure out why the same message was produced so many times.
Hi there, Thanks
And then:
That is very useful information, thanks a lot. I see on the consumer console the value produced as - But this gave console output as: All I want is plain data produced on Kafka. If I have 10,20,30 in the file, it should give 10,20,30.
You have to use DelimitedTextFileReader and set the corresponding token in the property
I tried all the combinations of key converter, value converter, key.schema.enable, and DelimitedTextFileReader. Using DelimitedTextFileReader gives poorer output than TextFileReader, and it does not produce anything with file_reader.delimited.header=false. I sincerely request you to give it a try yourself: just a simple CSV file on HDFS, say: 101,John,Smith,Computer And produce this on a Kafka topic as it is: And share the: It will be very beneficial. :)
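For what it's worth, if the goal is the raw line as the record value, one possible sketch (untested here) is TextFileReader plus Kafka Connect's ExtractField transform and StringConverter; the field name "value" is an assumption about the reader's output schema:

```properties
# Connector config (kafka-connect-fs.properties): emit whole lines and
# unwrap the assumed "value" field via a Single Message Transform
file_reader.class=com.github.mmolimar.kafka.connect.fs.file.reader.TextFileReader
transforms=extract
transforms.extract.type=org.apache.kafka.connect.transforms.ExtractField$Value
transforms.extract.field=value

# Worker config (connect-standalone.properties): write values as plain strings
value.converter=org.apache.kafka.connect.storage.StringConverter
```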
You're getting the columns you have in your CSV file and the default column name is 'column_N'. So, I think it's fine. |
If I include column names in the CSV and set the header flag to true, then instead of column_N it will use the columns from the CSV.
Because they're |
This is my use case: CSV files will be dumped on an HDFS path, and I have to produce these CSV files to a Kafka topic.
I downloaded the project in my Eclipse and built it using Maven. Inside the target directory, I got "kafka-connect-hdfs-0.10.2.0-package", which had etc/kafka-connect-hdfs/kafka-connect-fs.properties. This is what I updated in that file:
name=FsSourceConnector
connector.class=com.github.mmolimar.kafka.connect.fs.FsSourceConnector
tasks.max=2
fs.uris=hdfs://abc.com:9000/test
topic=mytopic
policy.class=com.github.mmolimar.kafka.connect.fs.policy.HdfsFileWatcherPolicy
file_reader.class=com.github.mmolimar.kafka.connect.fs.file.reader.DelimitedTextFileReader
file_reader.delimited.token=","
file_reader.delimited.header=true
I downloaded confluent-3.2.1 and updated etc/kafka/connect-standalone.properties as below:
bootstrap.servers=1.2.3.4:9092,1.2.3.5:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
key.converter.schemas.enable=false
value.converter.schemas.enable=false
internal.key.converter=org.apache.kafka.connect.json.JsonConverter
internal.value.converter=org.apache.kafka.connect.json.JsonConverter
internal.key.converter.schemas.enable=false
internal.value.converter.schemas.enable=false
offset.storage.file.filename=/tmp/connect.offsets
offset.flush.interval.ms=10000
I then moved the target directory from my Windows machine to the $CONFLUENT_HOME path on the server. I exported the classpath the same way as mentioned in your Getting started section:
export CLASSPATH="$(find target/ -type f -name '*.jar'| grep '-package' | tr '\n' ':')"
Used below command to start the connector:
nohup $CONFLUENT_HOME/bin/connect-standalone $CONFLUENT_HOME/etc/kafka/connect-standalone.properties $CONFLUENT_HOME/target/kafka-connect-hdfs-0.10.2.0-package/etc/kafka-connect-hdfs/kafka-connect-fs.properties &
Now, I get this error:
[2017-05-20 11:26:50,155] ERROR Failed to create job for /home/tmp/confluent-3.2.1/target/kafka-connect-hdfs-0.10.2.0-package/etc/kafka-connect-hdfs/kafka-connect-fs.properties (org.apache.kafka.connect.cli.ConnectStandalone:88)
[2017-05-20 11:26:50,156] ERROR Stopping after connector error (org.apache.kafka.connect.cli.ConnectStandalone:99)
java.util.concurrent.ExecutionException: org.apache.kafka.connect.errors.ConnectException: Failed to find any class that implements Connector and which name matches com.github.mmolimar.kafka.connect.fs.FsSourceConnector, available connectors are: io.confluent.connect.replicator.ReplicatorSourceConnector, org.apache.kafka.connect.tools.VerifiableSourceConnector, io.confluent.connect.s3.S3SinkConnector, org.apache.kafka.connect.tools.MockSourceConnector, org.apache.kafka.connect.tools.VerifiableSinkConnector, io.confluent.connect.storage.tools.SchemaSourceConnector, io.confluent.connect.jdbc.JdbcSourceConnector, org.apache.kafka.connect.tools.SchemaSourceConnector, org.apache.kafka.connect.sink.SinkConnector, io.confluent.connect.elasticsearch.ElasticsearchSinkConnector, org.apache.kafka.connect.tools.MockConnector, org.apache.kafka.connect.tools.MockSinkConnector, org.apache.kafka.connect.file.FileStreamSourceConnector, org.apache.kafka.connect.source.SourceConnector, io.confluent.connect.hdfs.HdfsSinkConnector, io.confluent.connect.hdfs.tools.SchemaSourceConnector, io.confluent.connect.jdbc.JdbcSinkConnector, org.apache.kafka.connect.file.FileStreamSinkConnector
at org.apache.kafka.connect.util.ConvertingFutureCallback.result(ConvertingFutureCallback.java:80)
at org.apache.kafka.connect.util.ConvertingFutureCallback.get(ConvertingFutureCallback.java:67)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:96)
Caused by: org.apache.kafka.connect.errors.ConnectException: Failed to find any class that implements Connector and which name matches com.github.mmolimar.kafka.connect.fs.FsSourceConnector, available connectors are: io.confluent.connect.replicator.ReplicatorSourceConnector, org.apache.kafka.connect.tools.VerifiableSourceConnector, io.confluent.connect.s3.S3SinkConnector, org.apache.kafka.connect.tools.MockSourceConnector, org.apache.kafka.connect.tools.VerifiableSinkConnector, io.confluent.connect.storage.tools.SchemaSourceConnector, io.confluent.connect.jdbc.JdbcSourceConnector, org.apache.kafka.connect.tools.SchemaSourceConnector, org.apache.kafka.connect.sink.SinkConnector, io.confluent.connect.elasticsearch.ElasticsearchSinkConnector, org.apache.kafka.connect.tools.MockConnector, org.apache.kafka.connect.tools.MockSinkConnector, org.apache.kafka.connect.file.FileStreamSourceConnector, org.apache.kafka.connect.source.SourceConnector, io.confluent.connect.hdfs.HdfsSinkConnector, io.confluent.connect.hdfs.tools.SchemaSourceConnector, io.confluent.connect.jdbc.JdbcSinkConnector, org.apache.kafka.connect.file.FileStreamSinkConnector
at org.apache.kafka.connect.runtime.ConnectorFactory.getConnectorClass(ConnectorFactory.java:84)
at org.apache.kafka.connect.runtime.ConnectorFactory.newConnector(ConnectorFactory.java:38)
at org.apache.kafka.connect.runtime.AbstractHerder.getConnector(AbstractHerder.java:336)
at org.apache.kafka.connect.runtime.AbstractHerder.validateConnectorConfig(AbstractHerder.java:235)
at org.apache.kafka.connect.runtime.standalone.StandaloneHerder.putConnectorConfig(StandaloneHerder.java:158)
at org.apache.kafka.connect.cli.ConnectStandalone.main(ConnectStandalone.java:93)
[2017-05-20 11:26:50,159] INFO Kafka Connect stopping (org.apache.kafka.connect.runtime.Connect:66)
[2017-05-20 11:26:50,159] INFO Stopping REST server (org.apache.kafka.connect.runtime.rest.RestServer:154)
[2017-05-20 11:26:50,159] DEBUG stopping org.eclipse.jetty.server.Server@1aa7ecca (org.eclipse.jetty.util.component.AbstractLifeCycle:194)
[2017-05-20 11:26:50,162] DEBUG Graceful shutdown org.eclipse.jetty.server.Server@1aa7ecca by (org.eclipse.jetty.server.Server:418)
[2017-05-20 11:26:50,162] DEBUG stopping ServerConnector@42a48628{HTTP/1.1}{0.0.0.0:8083} (org.eclipse.jetty.util.component.AbstractLifeCycle:194)
[2017-05-20 11:26:50,162] DEBUG stopping org.eclipse.jetty.server.ServerConnector$ServerConnectorManager@3c19aaa5 (org.eclipse.jetty.util.component.AbstractLifeCycle:194)
[2017-05-20 11:26:50,163] DEBUG stopping org.eclipse.jetty.io.SelectorManager$ManagedSelector@4204541c keys=0 selected=0 (org.eclipse.jetty.util.component.AbstractLifeCycle:194)
[2017-05-20 11:26:50,163] DEBUG Stopping org.eclipse.jetty.io.SelectorManager$ManagedSelector@4204541c keys=0 selected=0 (org.eclipse.jetty.io.SelectorManager:432)
[2017-05-20 11:26:50,164] DEBUG Queued change org.eclipse.jetty.io.SelectorManager$ManagedSelector$Stop@2a20ba2c (org.eclipse.jetty.io.SelectorManager:480)
[2017-05-20 11:26:50,167] DEBUG Selector loop woken up from select, 0/0 selected (org.eclipse.jetty.io.SelectorManager:602)
[2017-05-20 11:26:50,168] DEBUG Running change org.eclipse.jetty.io.SelectorManager$ManagedSelector$Stop@2a20ba2c (org.eclipse.jetty.io.SelectorManager:525)
[2017-05-20 11:26:50,168] DEBUG Stopped org.eclipse.jetty.io.SelectorManager$ManagedSelector@4204541c keys=-1 selected=-1 (org.eclipse.jetty.io.SelectorManager:437)
[2017-05-20 11:26:50,168] DEBUG STOPPED org.eclipse.jetty.io.SelectorManager$ManagedSelector@4204541c keys=-1 selected=-1 (org.eclipse.jetty.util.component.AbstractLifeCycle:204)
What am I doing wrong here? Is my deployment not OK?
I also tried putting the share/java/kafka-connect-hdfs jars from the target path into Confluent's actual share/java/kafka-connect-hdfs, but that did not help.
Maybe you can throw some light here on how to run this connector end to end. I am in dire need to get this running.