Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use random:uniform instead of os:pid when constructing node name in nodetool #868

Merged
merged 3 commits into from Aug 25, 2021

Conversation

hmmr
Copy link
Contributor

@hmmr hmmr commented May 11, 2021

Borrowing from https://github.com/basho/node_package/blob/4.0/priv/base/nodetool#L195, this is to help reduce the risk of hitting the atom table limit, as was reported by one of our customers who was calling riak-admin continuously and frequently enough to trigger the atom table overflow.

…in append_node_suffix

Borrowing from https://github.com/basho/node_package/blob/4.0/priv/base/nodetool#L195, this will help reduce the risk of hitting the atom table limit, as was reported by one of our customers who was calling riak-admin continuously and frequently enough to trigger the atom table overflow.
@tsloughter
Copy link
Member

This looks good, thanks. I'll probably merge soon.

But I wanted to note that in OTP 23+ nodetool is no longer used and this issue does not exist. Obviously still worth it to be fixed for those using pre-23, just wanted to mention it :)

@hmmr
Copy link
Contributor Author

hmmr commented May 11, 2021

@tsloughter Indeed, I read that note and slightly pondered if it's worth while bothering. But, on reflection, it seems it still does :)

@tsloughter
Copy link
Member

Hm, the shelltestrunner tests and the tests on windows fail.

Copy link

@Bob-The-Marauder Bob-The-Marauder left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should random be replaced by rand as random is deprecated?

@ferd
Copy link
Collaborator

ferd commented May 12, 2021

Yes, definitely should replace with the newer stuff where available.

hmmr added a commit to hmmr/relx that referenced this pull request May 24, 2021
Instances of nodetool generate random node name suffxes to facilitate running multiple simultaneous calls in parallel.  However, each time nodetool connects to the target node, a new atom is created on the latter.  If this happens frequently and/or long enough, it will eventually crash the node as it hits the atom table limit.  As a workaround, if the caller can guarantee calls are serialized and isolated in time, defining an env variable $NODETOOL_NODE_PREFIX will create identical atoms for node name prefix, thus avoiding generation of new atoms.

The proposed change is complimentary to erlware#868, aiming to address the issue, reported by one of our customers, in which a riak node hit the atom table limit (yes, all of 1M+ entries) and crashed. A postmortem showed the table filled with `maint1a2b3c4d-riak@127.0.0.1`, accumulated over a period of time resulting from calls to `riak admin status` every 5 min.

Note that I did not attempt to do any changes that may need to be done, to the same effect, in extended_bin_windows, as it's not straightforward for me which they would be (my knowledge of scripting in Windows is some 30 year old).
@hmmr
Copy link
Contributor Author

hmmr commented Jul 15, 2021

@tsloughter After the approval, what is the current state of this PR? Is there anything I can do to help?

@tsloughter
Copy link
Member

Hey, sorry about that. I don't know what the hell is going on with CI... there is at least 1 other PR that should be passing CI but isn't that I also want to merge and cut a release with.

@tsloughter
Copy link
Member

Could you repush so it kicks of CI again? There isn't even a "rerun" option anywhere like there usually is...

@hmmr
Copy link
Contributor Author

hmmr commented Jul 16, 2021

Once it's in, there's a more substantial #871.

@tsloughter tsloughter merged commit e6c3ae4 into erlware:master Aug 25, 2021
@hmmr hmmr deleted the patch-1 branch August 27, 2021 13:14
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants