
Problems running a producer / consumer setup using cppkafka #103

Closed
mcraveiro opened this issue Nov 16, 2020 · 17 comments


mcraveiro commented Nov 16, 2020

Hi Redpanda developers,

Thanks very much for your amazing contribution to FOSS. Following on from the HN announcement [1], I am trying to play around with Redpanda. First, I apologise in advance: I am not an experienced Kafka user, so I am certain I am doing something wrong, but I'm not quite sure what. I am using Debian Testing and I've installed your DEB packages as well as the distro-supplied librdkafka [2] (version 1.51 on Debian). I then checked out and compiled cppkafka [3]. The Redpanda logs appear OK to the untrained eye:

Nov 16 18:12:56 lovelace rpk[497754]: INFO  2020-11-16 18:12:56,362 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:614 - Recovered, log offsets: {start_offset:-9223372036854775808, committed_offset:-9223372036854775808, committed_offset_term:-9223372036854775808, dirty_offset:-9223372036854775808, dirty_offset_term:-9223372036854775808, last_term_start_offset:-9223372036854775808}, term:0
Nov 16 18:12:56 lovelace rpk[497754]: INFO  2020-11-16 18:12:56,373 [shard 0] storage - segment.cc:594 - Creating new segment /var/lib/redpanda/data/redpanda/kvstore/0_0/0-0-v1.log
Nov 16 18:12:56 lovelace rpk[497754]: INFO  2020-11-16 18:12:56,393 [shard 0] cluster - state_machine.cc:20 - Starting state machine
Nov 16 18:12:56 lovelace rpk[497754]: INFO  2020-11-16 18:12:56,393 [shard 0] cluster - members_manager.cc:46 - starting cluster::members_manager...
Nov 16 18:12:56 lovelace rpk[497754]: INFO  2020-11-16 18:12:56,405 [shard 0] raft - [group_id:0, {redpanda/controller/0}] vote_stm.cc:225 - became the leader term:1
Nov 16 18:12:56 lovelace rpk[497754]: INFO  2020-11-16 18:12:56,405 [shard 0] storage - segment.cc:594 - Creating new segment /var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log
Nov 16 18:12:56 lovelace rpk[497754]: INFO  2020-11-16 18:12:56,406 [shard 0] redpanda::main - application.cc:468 - Started RPC server listening at {host: 0.0.0.0, port: 33145}
Nov 16 18:12:56 lovelace rpk[497754]: INFO  2020-11-16 18:12:56,406 [shard 0] redpanda::main - application.cc:506 - Started Kafka API server listening at {host: 0.0.0.0, port: 9092}
Nov 16 18:12:56 lovelace rpk[497754]: INFO  2020-11-16 18:12:56,406 [shard 0] redpanda::main - application.cc:508 - Successfully started Redpanda!
Nov 16 18:12:56 lovelace systemd[1]: Started Redpanda, the fastest queue in the West..
Nov 16 18:37:03 lovelace rpk[497754]: INFO  2020-11-16 18:37:03,765 [shard 1] raft - [group_id:1, {kafka_internal/group/0}] consensus.cc:586 - Starting
Nov 16 18:37:03 lovelace rpk[497754]: INFO  2020-11-16 18:37:03,765 [shard 1] raft - [group_id:1, {kafka_internal/group/0}] consensus.cc:614 - Recovered, log offsets: {start_offset:-9223372036854775808, committed_offset:-9223372036854775808, committed_offset_term:-9223372036854775808, dirty_offset:-9223372036854775808, dirty_offset_term:-9223372036854775808, last_term_start_offset:-9223372036854775808}, term:0
Nov 16 18:37:03 lovelace rpk[497754]: INFO  2020-11-16 18:37:03,775 [shard 1] storage - segment.cc:594 - Creating new segment /var/lib/redpanda/data/redpanda/kvstore/1_0/0-0-v1.log
Nov 16 18:37:03 lovelace rpk[497754]: INFO  2020-11-16 18:37:03,811 [shard 1] raft - [group_id:1, {kafka_internal/group/0}] vote_stm.cc:225 - became the leader term:1
Nov 16 18:37:03 lovelace rpk[497754]: INFO  2020-11-16 18:37:03,811 [shard 1] storage - segment.cc:594 - Creating new segment /var/lib/redpanda/data/kafka_internal/group/0_2/0-1-v1.log

I then started the producer. It errored first but then succeeded:

$ ./examples/producer -b localhost -t test
Producing messages into topic test
%3|1605551753.243|FAIL|rdkafka#producer-1| [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connect to ipv6#[::1]:9092 failed: Connection refused (after 0ms in state CONNECT)
%3|1605551753.243|ERROR|rdkafka#producer-1| [thrd:localhost:9092/bootstrap]: 1/1 brokers are down
interrupt
$ ./examples/producer -b localhost -t test
Producing messages into topic test

However, when I try to start the consumer I get an error:

$ ./consumer -b localhost -t test -g 0
Consuming messages from topic test
%3|1605565901.190|FAIL|rdkafka#consumer-1| [thrd:localhost:9092/bootstrap]: localhost:9092/bootstrap: Connect to ipv6#[::1]:9092 failed: Connection refused (after 0ms in state CONNECT)
%3|1605565901.190|ERROR|rdkafka#consumer-1| [thrd:localhost:9092/bootstrap]: 1/1 brokers are down
%3|1605565901.190|ERROR|rdkafka#consumer-1| [thrd:app]: rdkafka#consumer-1: localhost:9092/bootstrap: Connect to ipv6#[::1]:9092 failed: Connection refused (after 0ms in state CONNECT)
%3|1605565902.191|FAIL|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: Failed to resolve ':0': Name or service not known (after 1ms in state CONNECT)
%3|1605565902.191|ERROR|rdkafka#consumer-1| [thrd:app]: rdkafka#consumer-1: GroupCoordinator: Failed to resolve ':0': Name or service not known (after 1ms in state CONNECT)
%3|1605565903.191|FAIL|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: Failed to resolve ':0': Name or service not known (after 0ms in state CONNECT, 1 identical error(s) suppressed)
%3|1605565903.191|ERROR|rdkafka#consumer-1| [thrd:app]: rdkafka#consumer-1: GroupCoordinator: Failed to resolve ':0': Name or service not known (after 0ms in state CONNECT, 1 identical error(s) suppressed)

I tried a few variations such as using 127.0.0.1 instead of localhost because I thought it could be an IPv6 issue, but to no avail:

$ ./consumer -b 127.0.0.1 -t test -g 0
Consuming messages from topic test
%3|1605565953.343|FAIL|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: Failed to resolve ':0': Name or service not known (after 0ms in state CONNECT)
%3|1605565953.343|ERROR|rdkafka#consumer-1| [thrd:app]: rdkafka#consumer-1: GroupCoordinator: Failed to resolve ':0': Name or service not known (after 0ms in state CONNECT)
%3|1605565954.343|FAIL|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: Failed to resolve ':0': Name or service not known (after 0ms in state CONNECT, 1 identical error(s) suppressed)
%3|1605565954.343|ERROR|rdkafka#consumer-1| [thrd:app]: rdkafka#consumer-1: GroupCoordinator: Failed to resolve ':0': Name or service not known (after 0ms in state CONNECT, 1 identical error(s) suppressed)
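As an aside, the resolution order for localhost can be checked directly: librdkafka tries broker addresses in the order the resolver returns them, so if ::1 comes first on a dual-stack box, the client attempts IPv6 before falling back to 127.0.0.1, which matches the earlier ipv6#[::1]:9092 connection failures. A minimal sketch (not part of the original report):

```python
import socket

# List the addresses "localhost" resolves to, in resolver order.
# On a dual-stack system ::1 may be returned before 127.0.0.1,
# so a client trying candidates in order hits IPv6 first.
for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
        "localhost", 9092, proto=socket.IPPROTO_TCP):
    label = "ipv6" if family == socket.AF_INET6 else "ipv4"
    print(label, sockaddr[0])
```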

I also tried variations of groups such as 1, etc. I'll keep playing with it, but any pointers as to what I am doing wrong would be greatly appreciated.

Cheers

Marco

[1] https://news.ycombinator.com/item?id=25075739
[2] https://github.com/edenhill/librdkafka
[3] https://github.com/mfontanini/cppkafka

dotnwat (Member) commented Nov 16, 2020

hey @mcraveiro, thanks for the detailed report. This does look like an issue with DNS / IPv6, and there are definitely some lingering usability issues here. It looks like an IPv6 address may be getting cached somewhere. Could you post your redpanda config, and are you using a single-node setup? Also, please try using a new group and topic when switching over to the explicit 127.0.0.1; it might be something IPv6-ish getting cached in the metadata.

emaxerrno (Contributor) commented Nov 17, 2020

looks related to #99

emaxerrno (Contributor) commented Nov 17, 2020

@mcraveiro - can you try this tomorrow? We are hoping to update the rpk container in a few hours, or early tomorrow.

%3|1605565954.343|FAIL|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: Failed to resolve ':0'

The ':0' comes from a default-initialization bug.

Alternatively, you can remove these entries from all the configurations:

  advertised_kafka_api:
    address: ""
    port: 0
  advertised_rpc_api:
    address: ""
    port: 0
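Alternatively, for a single-node localhost setup, these entries can be filled in explicitly instead of being left as empty defaults. A sketch, assuming the default listeners from the packaged config (Kafka API on 9092, RPC on 33145):

```yaml
  advertised_kafka_api:
    address: 127.0.0.1
    port: 9092
  advertised_rpc_api:
    address: 127.0.0.1
    port: 33145
```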

emaxerrno (Contributor)

@mcraveiro FYI, #100 just landed. It will be in the packages later this week.

mcraveiro (Author) commented Nov 17, 2020

Thanks very much for your prompt reply. I'll go through the different suggestions.

Config

My config is unchanged from the package:

$ cat /etc/redpanda/redpanda.yaml
cluster_id: ""
config_file: /etc/redpanda/redpanda.yaml
license_key: ""
node_uuid: QSF1Q5xV9NRYF3twbjz6GqSa6NzMkUwtx5xKWXz1JjYcAvdbf
organization: ""
redpanda:
  admin:
    address: 0.0.0.0
    port: 9644
  advertised_kafka_api:
    address: ""
    port: 0
  advertised_rpc_api:
    address: ""
    port: 0
  auto_create_topics_enabled: true
  data_directory: /var/lib/redpanda/data
  developer_mode: false
  kafka_api:
    address: 0.0.0.0
    port: 9092
  kafka_api_tls:
    cert_file: ""
    enabled: false
    key_file: ""
    truststore_file: ""
  node_id: 1
  rpc_server:
    address: 0.0.0.0
    port: 33145
  seed_servers: []
rpk:
  coredump_dir: /var/lib/redpanda/coredump
  enable_memory_locking: false
  enable_usage_stats: true
  overprovisioned: false
  tls:
    cert_file: ""
    key_file: ""
    truststore_file: ""
  tune_aio_events: false
  tune_clocksource: false
  tune_coredump: false
  tune_cpu: false
  tune_disk_irq: false
  tune_disk_nomerges: false
  tune_disk_scheduler: false
  tune_fstrim: false
  tune_network: false
  tune_swappiness: false
  tune_transparent_hugepages: false

Suggestion 1: Different topic and different group

This does not seem to have fixed the issue. Producer:

$ ./examples/producer -b 127.0.0.1 -t test4
Producing messages into topic test4

Consumer:

$ ./consumer -b 127.0.0.1 -t test4 -g 6
Consuming messages from topic test4
%3|1605602381.396|FAIL|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: Failed to resolve ':0': Name or service not known (after 0ms in state CONNECT)
%3|1605602381.396|ERROR|rdkafka#consumer-1| [thrd:app]: rdkafka#consumer-1: GroupCoordinator: Failed to resolve ':0': Name or service not known (after 0ms in state CONNECT)
%3|1605602382.396|FAIL|rdkafka#consumer-1| [thrd:GroupCoordinator]: GroupCoordinator: Failed to resolve ':0': Name or service not known (after 0ms in state CONNECT, 1 identical error(s) suppressed)
%3|1605602382.396|ERROR|rdkafka#consumer-1| [thrd:app]: rdkafka#consumer-1: GroupCoordinator: Failed to resolve ':0': Name or service not known (after 0ms in state CONNECT, 1 identical error(s) suppressed)

Broker: No new messages in the broker.

Suggestion 2: Remove API entries from config

Updates to config:

root@lovelace:/etc/redpanda# diff -u redpanda.yaml redpanda.yaml.bak 
--- redpanda.yaml.bak	2020-11-17 08:28:16.591878991 +0000
+++ redpanda.yaml	2020-11-17 08:28:59.083592953 +0000
@@ -10,15 +10,15 @@
   advertised_kafka_api:
     address: ""
     port: 0
-  advertised_rpc_api:
-    address: ""
-    port: 0
+#  advertised_rpc_api:
+#    address: ""
+#    port: 0
   auto_create_topics_enabled: true
   data_directory: /var/lib/redpanda/data
   developer_mode: false
-  kafka_api:
-    address: 0.0.0.0
-    port: 9092
+#  kafka_api:
+#    address: 0.0.0.0
+#    port: 9092
   kafka_api_tls:
     cert_file: ""
     enabled: false

I stopped and restarted the server, but it failed to come back up:

# systemctl stop redpanda
# systemctl start redpanda
Job for redpanda.service failed because the control process exited with error code.
See "systemctl status redpanda.service" and "journalctl -xe" for details.

Server logs:

Nov 17 08:29:50 lovelace systemd[1]: Stopping Redpanda, the fastest queue in the West....
Nov 17 08:29:50 lovelace rpk[625347]: Missing license key. Please get a new one at https://vectorized.io/download-trial/
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,205 [shard 0] redpanda::main - application.cc:73 - Stopping...
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,207 [shard 0] rpc - server.cc:126 - kafka rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,207 [shard 0] rpc - server.cc:135 - kafka rpc protocol - Shutting down 1 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,207 [shard 2] rpc - server.cc:126 - kafka rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,207 [shard 5] rpc - server.cc:126 - kafka rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,207 [shard 6] rpc - server.cc:126 - kafka rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,207 [shard 7] rpc - server.cc:126 - kafka rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,207 [shard 4] rpc - server.cc:126 - kafka rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,208 [shard 3] rpc - server.cc:126 - kafka rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,210 [shard 1] rpc - server.cc:126 - kafka rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,227 [shard 1] rpc - server.cc:135 - kafka rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,227 [shard 3] rpc - server.cc:135 - kafka rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,227 [shard 4] rpc - server.cc:135 - kafka rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,227 [shard 6] rpc - server.cc:135 - kafka rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,227 [shard 7] rpc - server.cc:135 - kafka rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,227 [shard 5] rpc - server.cc:135 - kafka rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,227 [shard 2] rpc - server.cc:135 - kafka rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,230 [shard 0] rpc - server.cc:126 - vectorized internal rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,230 [shard 0] rpc - server.cc:135 - vectorized internal rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,230 [shard 1] rpc - server.cc:126 - vectorized internal rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,230 [shard 3] rpc - server.cc:126 - vectorized internal rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,230 [shard 6] rpc - server.cc:126 - vectorized internal rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,230 [shard 3] rpc - server.cc:135 - vectorized internal rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,230 [shard 1] rpc - server.cc:135 - vectorized internal rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,230 [shard 6] rpc - server.cc:135 - vectorized internal rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,230 [shard 5] rpc - server.cc:126 - vectorized internal rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,230 [shard 4] rpc - server.cc:126 - vectorized internal rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,230 [shard 5] rpc - server.cc:135 - vectorized internal rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,230 [shard 2] rpc - server.cc:126 - vectorized internal rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,231 [shard 2] rpc - server.cc:135 - vectorized internal rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,230 [shard 4] rpc - server.cc:135 - vectorized internal rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,231 [shard 7] rpc - server.cc:126 - vectorized internal rpc protocol - Stopping 1 listeners
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,231 [shard 7] rpc - server.cc:135 - vectorized internal rpc protocol - Shutting down 0 connections
Nov 17 08:29:50 lovelace rpk[497754]: ERROR 2020-11-17 08:29:50,239 [shard 1] cluster - controller_backend.cc:84 - Error while reconciling topics - seastar::abort_requested_exception (abort requested)
Nov 17 08:29:50 lovelace rpk[497754]: ERROR 2020-11-17 08:29:50,239 [shard 3] cluster - controller_backend.cc:84 - Error while reconciling topics - seastar::abort_requested_exception (abort requested)
Nov 17 08:29:50 lovelace rpk[497754]: ERROR 2020-11-17 08:29:50,239 [shard 5] cluster - controller_backend.cc:84 - Error while reconciling topics - seastar::abort_requested_exception (abort requested)
Nov 17 08:29:50 lovelace rpk[497754]: ERROR 2020-11-17 08:29:50,239 [shard 4] cluster - controller_backend.cc:84 - Error while reconciling topics - seastar::abort_requested_exception (abort requested)
Nov 17 08:29:50 lovelace rpk[497754]: ERROR 2020-11-17 08:29:50,239 [shard 2] cluster - controller_backend.cc:84 - Error while reconciling topics - seastar::abort_requested_exception (abort requested)
Nov 17 08:29:50 lovelace rpk[497754]: ERROR 2020-11-17 08:29:50,241 [shard 0] cluster - controller_backend.cc:84 - Error while reconciling topics - seastar::abort_requested_exception (abort requested)
Nov 17 08:29:50 lovelace rpk[497754]: ERROR 2020-11-17 08:29:50,239 [shard 7] cluster - controller_backend.cc:84 - Error while reconciling topics - seastar::abort_requested_exception (abort requested)
Nov 17 08:29:50 lovelace rpk[497754]: ERROR 2020-11-17 08:29:50,239 [shard 6] cluster - controller_backend.cc:84 - Error while reconciling topics - seastar::abort_requested_exception (abort requested)
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,245 [shard 0] cluster - state_machine.cc:113 - State machine handles raft::offset_monitor::wait_aborted (offset monitor wait aborted)
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,246 [shard 0] cluster - members_manager.cc:154 - stopping cluster::members_manager...
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,247 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:117 - Stopping
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,250 [shard 1] raft - [group_id:1, {kafka_internal/group/0}] consensus.cc:117 - Stopping
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,252 [shard 6] kvstore - kvstore.cc:93 - Stopping kvstore: dir /var/lib/redpanda/data/redpanda/kvstore/6_0
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,252 [shard 3] kvstore - kvstore.cc:93 - Stopping kvstore: dir /var/lib/redpanda/data/redpanda/kvstore/3_0
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,252 [shard 2] kvstore - kvstore.cc:93 - Stopping kvstore: dir /var/lib/redpanda/data/redpanda/kvstore/2_0
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,252 [shard 4] kvstore - kvstore.cc:93 - Stopping kvstore: dir /var/lib/redpanda/data/redpanda/kvstore/4_0
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,252 [shard 5] kvstore - kvstore.cc:93 - Stopping kvstore: dir /var/lib/redpanda/data/redpanda/kvstore/5_0
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,252 [shard 7] kvstore - kvstore.cc:93 - Stopping kvstore: dir /var/lib/redpanda/data/redpanda/kvstore/7_0
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,272 [shard 0] kvstore - kvstore.cc:93 - Stopping kvstore: dir /var/lib/redpanda/data/redpanda/kvstore/0_0
Nov 17 08:29:50 lovelace rpk[497754]: INFO  2020-11-17 08:29:50,289 [shard 1] kvstore - kvstore.cc:93 - Stopping kvstore: dir /var/lib/redpanda/data/redpanda/kvstore/1_0
Nov 17 08:29:50 lovelace systemd[1]: redpanda.service: Succeeded.
Nov 17 08:29:50 lovelace systemd[1]: Stopped Redpanda, the fastest queue in the West..
Nov 17 08:30:01 lovelace systemd[1]: Starting Redpanda, the fastest queue in the West....
Nov 17 08:30:01 lovelace rpk[625372]: Missing license key. Please get a new one at https://vectorized.io/download-trial/
Nov 17 08:30:01 lovelace rpk[625372]: Detecting the current cloud vendor and VM
Nov 17 08:30:01 lovelace rpk[625372]: Could not detect the current cloud vendor
Nov 17 08:30:01 lovelace rpk[625372]: System check 'NIC usb0 IRQ affinity set' failed with non-fatal error 'err=signal: segmentation fault, stderr='
Nov 17 08:30:05 lovelace rpk[625372]: Error: System check 'Config file valid' failed. Required: true, Current false
Nov 17 08:30:05 lovelace systemd[1]: redpanda.service: Main process exited, code=exited, status=1/FAILURE
Nov 17 08:30:05 lovelace systemd[1]: redpanda.service: Failed with result 'exit-code'.
Nov 17 08:30:05 lovelace systemd[1]: Failed to start Redpanda, the fastest queue in the West..

I tried a couple of restarts, but it seems to fail with the same error. Reverting the changes resulted in a happy broker again, but the consumer still gives the same error.

Many thanks for your time.

0x5d (Contributor) commented Nov 17, 2020

The advertised fields bug was fixed in #100

mcraveiro (Author)

Thanks @0x5d, I will wait for this to hit the DEBs and then try again.

emaxerrno (Contributor)

@mcraveiro - yeah, the configuration for IPs must be correct before we bootstrap the raft quorum; once it is bootstrapped, you'd have to perform a metadata update, etc. If it's just for testing, remove /var/lib/redpanda/data/*.

We also just released fixes for this: https://github.com/vectorizedio/redpanda/releases/latest

Can you test again and close the issue if it fixes your problem?

emaxerrno (Contributor)

@mcraveiro if you already have it installed on your computer you can just do apt-get update redpanda (after removing the /var/lib/redpanda/data directory)

mcraveiro (Author) commented Nov 17, 2020

Thanks very much for all your tips. So, I trashed my data directory as mentioned:

# cd /var/lib/redpanda/data
# rm -rf *

I did an upgrade - couldn't quite get the command @senior7515 supplied to work, but the following incantation did the trick:

# apt-get update
# apt-get --only-upgrade install redpanda
<snip>
The following packages will be upgraded:
  redpanda
1 upgraded, 0 newly installed, 0 to remove and 95 not upgraded.
Need to get 20.4 MB of archives.
After this operation, 379 kB disk space will be freed.
Get:1 https://packagecloud.io/vectorizedio/v/debian bullseye/main amd64 redpanda amd64 20.11.4-1-e6f825d [20.4 MB]
Fetched 20.4 MB in 9s (2,368 kB/s)                                                                                                                                   
Reading changelogs... Done
(Reading database ... 784015 files and directories currently installed.)
Preparing to unpack .../redpanda_20.11.4-1-e6f825d_amd64.deb ...
Unpacking redpanda (20.11.4-1-e6f825d) over (20.11.1-1-6f1a62c) ...
Setting up redpanda (20.11.4-1-e6f825d) ...

Configuration file '/etc/redpanda/redpanda.yaml'
 ==> Modified (by you or by a script) since installation.
 ==> Package distributor has shipped an updated version.
   What would you like to do about it ?  Your options are:
    Y or I  : install the package maintainer's version
    N or O  : keep your currently-installed version
      D     : show the differences between the versions
      Z     : start a shell to examine the situation
 The default action is to keep your current version.
*** redpanda.yaml (Y/I/N/O/D/Z) [default=N] ? Y
Installing new version of config file /etc/redpanda/redpanda.yaml ...
redpanda:x:125:133::/var/lib/redpanda:/usr/sbin/nologin
redpanda:x:133:
kernel.sched_tunable_scaling = 0
kernel.sched_min_granularity_ns = 500000
kernel.sched_wakeup_granularity_ns = 450000
kernel.sched_latency_ns = 1000000
kernel.sched_autogroup_enabled = 0
kernel.numa_balancing = 0
fs.aio-max-nr = 1048576

Redpanda is set to run in development mode. To get the most out of the
fastest queue in the west, enable production mode by running the following:

    sudo rpk mode production

followed by:

    sudo rpk tune all
    sudo systemctl start redpanda

This will autotune your system to give you the best performance from Redpanda.
You can get more information on the tunning parameters here:
https://vectorized.io/documentation/autotune/

I then restarted the producer:

$ ./examples/producer -b 127.0.0.1 -t test
Producing messages into topic test

And the consumer does look happier, though I don't think it's quite all sorted yet:

[marco@lovelace examples]$ ./consumer -b 127.0.0.1 -t test -g 1
Consuming messages from topic test
[+] Received error notification: Broker: Unknown topic or partition

I tried a few groups and different topics but always with the same result. The server logs:

# journalctl -u redpanda.service -f
-- Logs begin at Mon 2020-08-17 09:21:34 BST. --
Nov 17 20:53:35 lovelace rpk[676995]: INFO  2020-11-17 20:53:35,942 [shard 0] raft - [group_id:0, {redpanda/controller/0}] consensus.cc:623 - Recovered, log offsets: {start_offset:-9223372036854775808, committed_offset:-9223372036854775808, committed_offset_term:-9223372036854775808, dirty_offset:-9223372036854775808, dirty_offset_term:-9223372036854775808, last_term_start_offset:-9223372036854775808}, term:0
Nov 17 20:53:35 lovelace rpk[676995]: INFO  2020-11-17 20:53:35,952 [shard 0] storage - segment.cc:603 - Creating new segment /var/lib/redpanda/data/redpanda/kvstore/0_0/0-0-v1.log
Nov 17 20:53:35 lovelace rpk[676995]: INFO  2020-11-17 20:53:35,974 [shard 0] cluster - state_machine.cc:29 - Starting state machine
Nov 17 20:53:35 lovelace rpk[676995]: INFO  2020-11-17 20:53:35,974 [shard 0] cluster - members_manager.cc:55 - starting cluster::members_manager...
Nov 17 20:53:35 lovelace rpk[676995]: INFO  2020-11-17 20:53:35,987 [shard 0] raft - [group_id:0, {redpanda/controller/0}] vote_stm.cc:234 - became the leader term:1
Nov 17 20:53:35 lovelace rpk[676995]: INFO  2020-11-17 20:53:35,987 [shard 0] storage - segment.cc:603 - Creating new segment /var/lib/redpanda/data/redpanda/controller/0_0/0-1-v1.log
Nov 17 20:53:35 lovelace rpk[676995]: INFO  2020-11-17 20:53:35,988 [shard 0] redpanda::main - application.cc:479 - Started RPC server listening at {host: 0.0.0.0, port: 33145}
Nov 17 20:53:35 lovelace rpk[676995]: INFO  2020-11-17 20:53:35,988 [shard 0] redpanda::main - application.cc:517 - Started Kafka API server listening at {host: 0.0.0.0, port: 9092}
Nov 17 20:53:35 lovelace rpk[676995]: INFO  2020-11-17 20:53:35,988 [shard 0] redpanda::main - application.cc:519 - Successfully started Redpanda!
Nov 17 20:53:35 lovelace systemd[1]: Started Redpanda, the fastest queue in the West..
Nov 17 20:54:16 lovelace rpk[676995]: INFO  2020-11-17 20:54:16,701 [shard 1] raft - [group_id:1, {kafka_internal/group/0}] consensus.cc:595 - Starting
Nov 17 20:54:16 lovelace rpk[676995]: INFO  2020-11-17 20:54:16,701 [shard 1] raft - [group_id:1, {kafka_internal/group/0}] consensus.cc:623 - Recovered, log offsets: {start_offset:-9223372036854775808, committed_offset:-9223372036854775808, committed_offset_term:-9223372036854775808, dirty_offset:-9223372036854775808, dirty_offset_term:-9223372036854775808, last_term_start_offset:-9223372036854775808}, term:0
Nov 17 20:54:16 lovelace rpk[676995]: INFO  2020-11-17 20:54:16,711 [shard 1] storage - segment.cc:603 - Creating new segment /var/lib/redpanda/data/redpanda/kvstore/1_0/0-0-v1.log
Nov 17 20:54:16 lovelace rpk[676995]: INFO  2020-11-17 20:54:16,743 [shard 1] raft - [group_id:1, {kafka_internal/group/0}] vote_stm.cc:234 - became the leader term:1
Nov 17 20:54:16 lovelace rpk[676995]: INFO  2020-11-17 20:54:16,743 [shard 1] storage - segment.cc:603 - Creating new segment /var/lib/redpanda/data/kafka_internal/group/0_2/0-1-v1.log

emaxerrno (Contributor)

@mcraveiro did you create the topic, or enable auto-create?

https://vectorized.io/documentation/advanced-configuration/

See auto_create_topics_enabled: true; by default it is false.
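For reference, the setting lives under the redpanda section of /etc/redpanda/redpanda.yaml, as in the config pasted earlier in this thread:

```yaml
redpanda:
  auto_create_topics_enabled: true
```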

mcraveiro (Author) commented Nov 17, 2020

It appears the config that comes with the DEB sets it to true:

# cd /etc/redpanda
# grep -nHe auto redpanda.yaml
redpanda.yaml:10:  auto_create_topics_enabled: true

mcraveiro (Author)

But now that you say that, it seems this is precisely the problem:

$  rpk api topic describe test
Missing license key. Please get a new one at https://vectorized.io/download-trial/
Error: topic 'test' not found
$  rpk api topic create test
Missing license key. Please get a new one at https://vectorized.io/download-trial/
Created topic 'test'. Partitions: 1, replicas: 1, configuration:
'cleanup.policy':'delete'

Producer:

$ ./examples/producer -b 127.0.0.1 -t test
Producing messages into topic test

Consumer:

$ ./consumer -b 127.0.0.1 -t test -g 1
Consuming messages from topic test
Got assigned: [ test[0:#] ]

So that looks good but then it goes quiet. Now, I would have expected a lot of data to start flooding in :-) but perhaps these test applications are not very verbose. I shall look into the source code and see if it should indeed be this quiet. At any rate, thanks very much, it seems I am almost there now.

mcraveiro (Author) commented Nov 17, 2020

Ah, silly me, it listens on stdin :-) It's all good. Producer:

$ ./examples/producer -b 127.0.0.1 -t test
Producing messages into topic test
test
this is a test

Consumer:

$ ./consumer -b 127.0.0.1 -t test -g 1
Consuming messages from topic test
Got assigned: [ test[0:#] ]
test
this is a test

So I guess the only remaining question then is the automatic topic creation.

dotnwat (Member) commented Nov 17, 2020

@mcraveiro thanks for the update, this is looking good.

One reason topic auto-creation may not be working is that the client is setting allow_auto_topic_creation to false in its internal metadata request. This may be something you need to check on your side in the client / application, perhaps in a config file. Happy to help here where possible, perhaps by looking at any underlying open-source client for tips, if relevant.

Of course, we may also have a bug related to topic creation. Checking your client is probably the easiest first step before digging further.
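For librdkafka-based clients such as cppkafka, the corresponding knob is the allow.auto.create.topics consumer property, available in newer librdkafka releases; this is a sketch of the property, not something confirmed in this thread for the librdkafka version in use:

```properties
# librdkafka consumer property (newer librdkafka versions); with
# cppkafka it is passed through as a Configuration key/value pair.
allow.auto.create.topics=true
```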

emaxerrno (Contributor)

@mcraveiro resolving the issue. If the metadata behaviour turns out to be a bug, let's open another one. Thank you.

mcraveiro (Author)

Thanks a lot guys, I'll look at the client for that option.
