Kafka offset deletion causes reprocessing #180

Open
alok87 opened this issue Apr 7, 2021 · 6 comments
Labels
bug Something isn't working p3 not urgent, intermittent issue

Comments

@alok87
Contributor

alok87 commented Apr 7, 2021

Batcher data in S3 is getting rotated/archived/deleted sooner than the expected 14 days. We need to find the root cause and fix it.

A recreate is required every time this happens.

pq: Mandatory url is not present in manifest file.

@alok87 alok87 added the bug Something isn't working label Apr 7, 2021
@alok87
Contributor Author

alok87 commented Apr 14, 2021

Archival has been paused to reproduce this and to make sure it is not caused by a code issue.

@alok87 alok87 added the p3 not urgent, intermittent issue label Apr 15, 2021
@alok87
Contributor Author

alok87 commented Apr 23, 2021

Found the root cause:
The issue happens for topics that have not received updates for many weeks. When a topic goes that long without updates, Kafka deletes the consumer group's committed offsets. The next time the consumer group starts, it begins processing from -2 instead of the last offset.

lastOffset 1

I0423 09:37:33.888793 manager.go:213] topic:loader-ts.inventory.customers, partition:0, lastOffset:1 (kafka lastoffset)

initialOffset was expected to be 1 but is -2

I0423 09:37:37.379926 loader_handler.go:115] loader-ts.inventory.customers: consumeClaim started, initalOffset:-2

Doing #20 will help, as we would no longer depend on Kafka to store the consumer group's last offsets.
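
A minimal sketch of the fallback idea, assuming the -2 in the log is the "oldest" sentinel that Go clients such as sarama use when no committed offset exists (sarama defines `OffsetNewest = -1` and `OffsetOldest = -2`). The function name `resolveStartOffset` and the notion of an externally stored offset are hypothetical and only for illustration; the actual design of #20 is not described here.

```go
package main

import "fmt"

// Sentinels a client may report when the group has no committed offset.
// The values mirror sarama's OffsetNewest / OffsetOldest (assumption: the
// loader's initialOffset follows the same convention).
const (
	offsetNewest int64 = -1
	offsetOldest int64 = -2
)

// resolveStartOffset decides where to resume consuming.
// initial is what Kafka reported for the claim; stored is an offset the
// application saved on its own (hypothetical external store); hasStored is
// false when nothing was saved.
func resolveStartOffset(initial, stored int64, hasStored bool) (int64, error) {
	if initial >= 0 {
		// Kafka still has the committed offset, so trust it.
		return initial, nil
	}
	if hasStored {
		// Kafka lost the offset; fall back to our own record.
		return stored, nil
	}
	return 0, fmt.Errorf("no committed offset (got %d) and no stored offset", initial)
}

func main() {
	// Values taken from the logs above: Kafka reports -2, last known offset is 1.
	offset, err := resolveStartOffset(offsetOldest, 1, true)
	fmt.Println(offset, err)
}
```

With a fallback like this, a lost consumer group offset would lead to resuming at the saved position rather than reprocessing the topic from the beginning.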

@alok87 alok87 added p1 priority 1, do it ASAP and removed p3 not urgent, intermittent issue labels Apr 23, 2021
@alok87
Contributor Author

alok87 commented Apr 23, 2021

Workaround:

  1. Find the topics that have not received any updates for many hours (the 6h window is a magic number):
    sort_desc(rate(kafka_consumergroup_current_offset{topic=~"ts.inventory.*"}[6h])) == 0
  2. Stop the loader so that the consumer group becomes inactive.
  3. Look up the last offset for those topics using:
    kafka_topic_partition_current_offset{topic=~"loader-ts.inventory.customers"}
    Then reset the consumer group offset to latest using:
    ./bin/kafka-consumer-groups.sh --command-config ./bin/client-ssl-auth.properties --bootstrap-server=XXX --group=ts-redshiftsink-latest-invetory-loader --topic=loader-ts.inventory.customers --reset-offsets --to-latest --execute

@alok87 alok87 changed the title Loader unable to load when it is suspended or is stuck due to Crashloop for few hours Mandatory URL is missing; load failures; 0 throughput tables Apr 30, 2021
@alok87
Contributor Author

alok87 commented May 5, 2021

Solution could be very simple:

The operator watches for topics that have received no new input for some time and stops the batcher and loaders for those topics. Doing this through metrics would bring in a Prometheus dependency, but it can also be done without one.

@alok87
Contributor Author

alok87 commented May 12, 2021

@alok87 alok87 changed the title Mandatory URL is missing; load failures; 0 throughput tables Kafka offset retention deletion causes reprocessing May 12, 2021
@alok87 alok87 changed the title Kafka offset retention deletion causes reprocessing Kafka offset info deletion causes reprocessing May 12, 2021
@alok87
Contributor Author

alok87 commented May 12, 2021

Increased offsets.retention.minutes to a large value as a temporary fix.

@alok87 alok87 changed the title Kafka offset info deletion causes reprocessing Kafka offset deletion causes reprocessing May 12, 2021
@alok87 alok87 added p3 not urgent, intermittent issue and removed p1 priority 1, do it ASAP labels May 12, 2021