Arakoon services in start/fail/start loop and dumping many debug logs to /opt/OpenvStorage #194

Open
jake9050 opened this issue Aug 9, 2017 · 4 comments

@jake9050
commented Aug 9, 2017

After updating pocops to the latest Fargo release, the Arakoon services get stuck in a loop where they constantly restart. The ovs home folder gets populated with files called console:.debug.TIMESTAMP.xxxxxx that contain messages like these:

1502284724: main debug: 7679026 => store
1502284724: main debug: Store.incr_i old_i:Some ("7679025") -> new_i:7679026
1502284724: main debug: 7679027 => store
1502284724: main debug: Store.incr_i old_i:Some ("7679026") -> new_i:7679027
1502284724: main debug: 7679028 => store
1502284724: main debug: Store.incr_i old_i:Some ("7679027") -> new_i:7679028
1502284724: main debug: after_block
1502284724: main debug: _fold_blocks
1502284724: main info: Completed replay of 1535.tlx, took 0.502017 seconds, 1 to go
1502284724: main info: Replaying tlog file: 1536.tlog [7679030,...] (2/2)
1502284724: tlog_map debug: fold_read extension=.tlog => index':Some {filename="/mnt/ssd1/arakoon/flash-10-nsm_15/tlogs/1536.tlog";mapping=}
1502284724: main debug: U.fold 7679029 Some ("7679092") ~index:Some {filename="/mnt/ssd1/arakoon/flash-10-nsm_15/tlogs/1536.tlog";mapping=}
1502284724: main debug: maybe_fast_forward 7679029 with Some {filename="/mnt/ssd1/arakoon/flash-10-nsm_15/tlogs/1536.tlog";mapping=}
1502284724: main debug: 7679029 => store
1502284724: main debug: Store.incr_i old_i:Some ("7679028") -> new_i:7679029
1502284724: main debug: 7679030 => skip
1502284724: main debug: 7679029 => store
1502284724: tlog_map debug: filename:/mnt/ssd1/arakoon/flash-10-nsm_15/tlogs/1536.tlog(Failure "update 7679029, store @ 7679029 don't fit")
1502284724: main fatal: going down(Failure "update 7679029, store @ 7679029 don't fit")
1502284724: main fatal: after pick
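One possible reading of the log above is that replay expects every update to carry the index directly after the store's current one, and that update 7679029 is presented a second time once the store has already reached 7679029. The sketch below only illustrates that assumed invariant. `ToyStore`, `ReplayError` and `replay` are hypothetical names and this is not Arakoon's actual code; the `=> skip` path seen in the log is not modelled.

```python
from typing import Iterable, Optional


class ReplayError(Exception):
    """Raised when an update index does not follow the store's index."""


class ToyStore:
    """Hypothetical stand-in for a store that tracks the last applied index."""

    def __init__(self, i: Optional[int] = None) -> None:
        self.i = i  # index of the last applied update, None if empty

    def apply(self, update_i: int) -> None:
        # Assumed invariant: the next update must be exactly store index + 1.
        expected = 0 if self.i is None else self.i + 1
        if update_i != expected:
            raise ReplayError(f"update {update_i}, store @ {self.i} don't fit")
        self.i = update_i


def replay(store: ToyStore, indices: Iterable[int]) -> Optional[int]:
    """Apply updates in order and return the store's final index."""
    for i in indices:
        store.apply(i)
    return store.i
```

Under this assumption, replaying 7679028 and 7679029 succeeds, and offering 7679029 again fails with the same message that appears in the fatal log lines.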

The debug files eventually fill the disk, causing further problems.
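To see how much space these dumps take before the disk fills up, something like the following could be used. It is a minimal sketch: the helper name `debug_dump_usage` is hypothetical, and the glob pattern is an assumption derived from the `console:.debug.TIMESTAMP.xxxxxx` file names mentioned above.

```python
from pathlib import Path
from typing import Tuple


def debug_dump_usage(home: str, pattern: str = "console:.debug.*") -> Tuple[int, int]:
    """Return (file count, total size in bytes) of debug dumps directly in `home`."""
    # Only regular files in the top level of `home` are counted.
    files = [p for p in Path(home).glob(pattern) if p.is_file()]
    total = sum(p.stat().st_size for p in files)
    return len(files), total
```

For the setup in this issue the call would be `debug_dump_usage("/opt/OpenvStorage")`, taking the directory from the issue title.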

System info
os: Ubuntu 16.04.3 LTS

OVS components

`ii alba 1.3.14 amd64 the ALternative BAckend
ii arakoon 1.9.17 amd64 Simple consistent distributed key/value store
ii openvstorage 2.8.2-1 amd64 openvStorage
ii openvstorage-backend 1.8.1-1 amd64 openvStorage Backend plugin
ii openvstorage-backend-core 1.8.1-1 amd64 openvStorage Backend plugin core
ii openvstorage-backend-webapps 1.8.1-1 amd64 openvStorage Backend plugin Web Applications
ii openvstorage-core 2.8.2-1 amd64 openvStorage core
ii openvstorage-hc 1.8.1-1 amd64 openvStorage Backend plugin HyperConverged
ii openvstorage-health-check 3.2.0-fargo.3-1 amd64 Open vStorage HealthCheck
ii openvstorage-sdm 1.7.1-1 amd64 Open vStorage Backend ASD Manager
ii openvstorage-webapps 2.8.2-1 amd64 openvStorage Web Applications

`

@wimpers

commented Aug 17, 2017

@jake9050 any idea why Arakoon was stuck in a start/fail/start loop?

@wimpers

commented Aug 23, 2017

@jtorreke any idea why Arakoon acted up?

@jtorreke

Member

commented Aug 23, 2017

It was lagging too far behind and could no longer catch up from the other cluster members. Throwing out the local data and starting a new copy was the solution.

@wimpers removed the state_question label Aug 23, 2017

@wimpers

commented Sep 4, 2017

@jtorreke was the root cause of not being able to catch up that the messages were too big?
