Bulk user creation API #33845

Closed
bczifra opened this issue Sep 19, 2018 · 2 comments
Labels
>feature :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc)

Comments

@bczifra
Member

bczifra commented Sep 19, 2018

Describe the feature:
It would be helpful to have an API method that supports creating multiple users at the same time. This would be particularly useful in situations where hundreds or thousands of users may need to be created, such as when a cluster is initially set up.

@cbuescher cbuescher added the :Security/Authentication Logging in, Usernames/passwords, Realms (Native/LDAP/AD/SAML/PKI/etc) label Sep 19, 2018
@elasticmachine
Collaborator

Pinging @elastic/es-security

@tvernum
Contributor

tvernum commented Nov 16, 2018

We don't plan to introduce a Bulk Users API.
On balance we think that the number of genuine use cases for this API wouldn't justify the ongoing maintenance cost of having to support another API.

Bulk APIs introduce complications around error handling and reporting, and do not necessarily improve performance much unless the API is primarily bound by network latency.

However, for those who need to create many users at once, we have some recommendations:

There are 2 parts to the existing user creation API that have the biggest impact on throughput.

  1. "refresh"
  2. password hashing

For a long time the API has supported different options for these, but we've added more testing and docs around them in #34729 (see #35242 and #35574).

Refresh

By default the PutUser API does the equivalent of refresh=true. This is a divergence from other Elasticsearch APIs where the default is refresh=false.
If you are creating multiple users, performing a refresh after every user is quite costly. Including refresh=false as a query parameter can noticeably improve throughput. Even with default password hashing (see below) it's a 10% improvement.
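
As a rough illustration of the query parameter, here is a minimal Python sketch that builds (but does not send) a Put User request with refresh=false. The `/_xpack/security/user/<username>` path is an assumption based on the 6.x endpoint layout, and the host, username, password and role name are placeholders; authentication is left out for brevity.

```python
import json
import urllib.parse
import urllib.request


def build_put_user_request(base_url, username, password, roles, refresh="false"):
    """Build (but do not send) a Put User request with an explicit refresh policy."""
    # Assumed 6.x endpoint path; adjust it to match your Elasticsearch version.
    path = "/_xpack/security/user/" + urllib.parse.quote(username, safe="")
    query = urllib.parse.urlencode({"refresh": refresh})
    body = json.dumps({"password": password, "roles": roles}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + path + "?" + query,
        data=body,
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Placeholder values, for illustration only.
req = build_put_user_request(
    "http://localhost:9200", "jdoe", "s3cret-pass", ["example_role"]
)
print(req.get_method(), req.full_url)
```

Passing the resulting object to `urllib.request.urlopen` (with credentials added) would send it; a final call with refresh=true, or an explicit refresh, may be wanted once the batch is done so the new users become visible.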

Password Hashing

This is a bigger improvement (spoiler: up to 200%).
Password hashing is intentionally slow. The idea is that if your data is leaked somehow, then you want to make it expensive for attackers to crack your stored passwords.
The default password hasher in ES 6.x is bcrypt10, and by design it makes heavy use of the node's CPU.
There are 2 options for offloading some of that work:

  1. Use a cheaper (but less secure) hasher
  2. Perform the hashing on the client.

Cheap Password Hashing

The cheapest hasher we support for password storage is bcrypt4.
That's not very secure, and we definitely would not recommend that you use that in a production environment. But if you are regularly spinning up new test environments where your password security is less important to you, then you can configure the xpack.security.authc.password_hashing.algorithm setting to be bcrypt4, and get about a 200% improvement in throughput.
This does not require any changes to your API calls, just 1 change to your elasticsearch.yml.
But, to stress, we do not recommend this in production environments.

Pre Hashed Passwords

The other option is to hash the password before it is sent to the API.
If you are automating the setup of users from scripts, then this may actually be a big security improvement for you, as your scripts will not need to contain the plaintext password.
You will need to hash the passwords in the same format as the server's default hasher, but our hash formats are fairly standard (bcrypt or PBKDF2), and there are compatible libraries for most languages and many automation tools.
If you have faster (or more) CPUs on your client machine, or can pre-hash the passwords before you need to load them, then this can also get a 200% improvement in throughput, without needing to sacrifice hash strength.
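
A minimal Python sketch of what the request body might look like with a pre-hashed password. It assumes the hash was produced elsewhere by a bcrypt library and is sent in the `password_hash` field in place of `password`. The helper names, the role name and the cost check are illustrative assumptions, and the hash string is a placeholder rather than a real hash.

```python
import json


def bcrypt_cost(hashed):
    """Return the cost factor encoded in a bcrypt string such as '$2a$10$...'."""
    parts = hashed.split("$")
    if len(parts) != 4 or not parts[1].startswith("2"):
        raise ValueError("not a bcrypt hash in modular crypt format")
    return int(parts[2])


def put_user_body(password_hash, roles, expected_cost=10):
    """Build a Put User JSON body that carries a hash instead of a plaintext password."""
    # Sanity check (an assumption of this sketch): the cost should match the
    # server's configured hasher, bcrypt10 by default in 6.x.
    if bcrypt_cost(password_hash) != expected_cost:
        raise ValueError("hash cost does not match the server's configured hasher")
    # 'password_hash' replaces 'password'; the two are not sent together.
    return json.dumps({"password_hash": password_hash, "roles": roles})


# Placeholder only: this is NOT a real hash of any password.
placeholder_hash = "$2a$10$" + "x" * 53
print(put_user_body(placeholder_hash, ["example_role"]))
```

Because the hashes can be computed ahead of time and in parallel on the client, the server only has to store them, which is where the throughput gain comes from.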

Conclusion

I wrote a fairly naive, single-threaded Python script that creates users in a serial fashion and ran it on my 2 year old laptop (with Elasticsearch also running on that laptop).

For the default options (server side hashing with bcrypt10, default refresh=true) I can create about 8.5 users per second.

With client-side hashing and refresh=false, I can create about 37 users per second.
Even if you have 5000 users to create, that should only take about 2.5 minutes.
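
For a quick back-of-the-envelope comparison, this small Python sketch turns the two measured rates above into total time for 5000 users. The rates come from the laptop measurements quoted here; the function name is just illustrative.

```python
def seconds_to_create(user_count, users_per_second):
    """Estimated wall-clock seconds for serial creation at a steady rate."""
    return user_count / users_per_second


# Rates taken from the measurements quoted above.
for label, rate in [("defaults", 8.5), ("client-side hashing, refresh=false", 37)]:
    secs = seconds_to_create(5000, rate)
    print(f"{label}: {secs:.0f} s ({secs / 60:.1f} min)")
```

At the default settings the same 5000 users would take closer to ten minutes, versus a little over two with client-side hashing and refresh=false.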

Unless there's a really compelling reason to revisit this, we feel that the existing features provide the right set of options for the Put User API, and those throughput numbers are reasonable.
