@ruichuan
Collaborator

This branch stabilizes the Riak start script by adding retries and by reporting an error when the cluster fails to start.

@ruichuan ruichuan requested review from iakkus and manuelstein July 14, 2020 13:41
@ruichuan ruichuan linked an issue Jul 14, 2020 that may be closed by this pull request
@iakkus
Member

iakkus commented Jul 14, 2020

Should we also add some reachability checks?
The 'curl' statement currently fails silently, and I think someone has run into this issue. Maybe we can also retry on that?

@ruichuan
Collaborator Author

> Should we also add some reachability checks?
> The 'curl' statement currently fails silently, and I think someone has run into this issue. Maybe we can also retry on that?

A retry for the reachability check has been added.
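
For readers following the thread, a retry around the reachability check could look roughly like the sketch below. This is not the code merged in this pull request: the helper name `retry`, the attempt count, the delay, and the host and port in the commented usage line are all assumptions made for illustration.

```shell
#!/bin/sh
# retry MAX_ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND until it succeeds or MAX_ATTEMPTS is
# reached, and reports an error on stderr instead of failing silently.
retry() {
    max_attempts=$1
    delay=$2
    shift 2
    attempt=1
    while ! "$@"; do
        if [ "$attempt" -ge "$max_attempts" ]; then
            echo "ERROR: '$*' failed after $attempt attempts" >&2
            return 1
        fi
        attempt=$((attempt + 1))
        sleep "$delay"
    done
    return 0
}

# Possible usage (host and port are placeholders, not taken from the PR):
#   retry 5 2 curl --fail --silent --show-error --output /dev/null \
#       "http://${RIAK_HOST}:8098/ping" || exit 1
```

With `--fail`, curl returns a non-zero exit status on HTTP errors, so the wrapper can tell an unreachable or unhealthy node apart from a successful response.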

Member

@iakkus iakkus left a comment

Looks good, thanks!

Collaborator

@manuelstein manuelstein left a comment

Quick question: when the join succeeds, the script immediately tests whether the node is listed as "joining" in the cluster status. Could that go wrong? For example, when the ring is busy with other work, could it take a second "cluster status" call before the node shows up as "joining"? On one deployment I saw the third of 3 nodes marked "joining".
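
One way to guard against the timing concern raised here would be to poll the status a few times rather than checking once. The sketch below is an illustration only, not the script from this pull request: the function name `wait_for_joining`, the node name in the usage comment, and the assumption that the status output has one line per node containing both the node name and its state are all hypothetical.

```shell
#!/bin/sh
# wait_for_joining NODE MAX_ATTEMPTS DELAY_SECONDS STATUS_COMMAND [ARGS...]
# Hypothetical helper: re-runs STATUS_COMMAND until a line mentioning NODE
# also contains "joining", or until MAX_ATTEMPTS checks have been made.
wait_for_joining() {
    node=$1
    max_attempts=$2
    delay=$3
    shift 3
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Assumed output format: one line per node with its name and state.
        if "$@" 2>/dev/null | grep -F -- "$node" | grep -q "joining"; then
            return 0
        fi
        # Only wait if another check will follow.
        if [ "$attempt" -lt "$max_attempts" ]; then
            sleep "$delay"
        fi
        attempt=$((attempt + 1))
    done
    echo "ERROR: $node not reported as joining after $max_attempts checks" >&2
    return 1
}

# Possible usage (node name is a placeholder):
#   wait_for_joining "riak@10.0.0.3" 5 2 riak-admin cluster status || exit 1
```

Passing the status command as an argument keeps the helper independent of the exact command used, so it can be exercised with a stub before running it against a real cluster.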

@ruichuan
Collaborator Author

ruichuan commented Jul 15, 2020 via email

@iakkus iakkus merged commit 4e67cfa into develop Jul 16, 2020
@iakkus iakkus deleted the riak_start_script branch July 17, 2020 09:19

Development

Successfully merging this pull request may close these issues.

Faulty setup appears as running, but fails to create first user

4 participants