Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

security flaw in authentication scheme or odd api description #2746

Closed
hadmut opened this issue Jul 29, 2021 · 9 comments
Closed

security flaw in authentication scheme or odd api description #2746

hadmut opened this issue Jul 29, 2021 · 9 comments

Comments

@hadmut
Copy link

hadmut commented Jul 29, 2021

Hi,

I had filed a bug against gemstash, because it stores credentials in plaintext in its database instead of hashing them, what's a standard practice:

rubygems/gemstash#288 (comment)

Their response was that the authorization scheme is not a particular property of gemstash, but a standard defined by the gem api:
https://guides.rubygems.org/rubygems-org-api/#api-authorization

which looks as some misunderstandiing of the HTTP authorization header, see
RFC7617
https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Authorization
https://en.wikipedia.org/wiki/Basic_access_authentication

which uses the Authorization-Header like

  Authorization: Basic dGVzdDoxMjPCow==

where the base64 encoded string is a username and a password, joined with a colon. This allows to used the username as an index into a password database and to find the hash belonging to the username, then check it. gemstash just uses a random string and needs to store it in plaintext in order to find it.

I'm not sure, whether this is just a problem of gemstash, or the gem api in common, but even in the former case you should clarify in https://guides.rubygems.org/rubygems-org-api/#api-authorization

Using just a random API KEY makes it hard to store the key hashed in a database, since there is no index then and you need to check all entries.

@deivid-rodriguez deivid-rodriguez transferred this issue from rubygems/rubygems Jul 29, 2021
@deivid-rodriguez
Copy link
Member

I transferred this to the rubygems.org repo since it seems like a better fit for the issue, although I'm not fully sure.

@simi
Copy link
Member

simi commented Jul 29, 2021

RubyGems.org doesn't support Basic access authentication via HTTP headers. The only way to access the API is using API key.


Using just a random API KEY makes it hard to store the key hashed in a database, since there is no index then and you need to check all entries.

I'm not sure I do understand this part well, anyway here is implementation how we lookup for API key in database.

def authenticate_with_api_key
params_key = request.headers["Authorization"] || ""
hashed_key = Digest::SHA256.hexdigest(params_key)
@api_key = ApiKey.find_by_hashed_key(hashed_key)
render_unauthorized unless @api_key
end

We do store hashed key and we do have index over this column.

t.string "hashed_key", null: false

t.index ["hashed_key"], name: "index_api_keys_on_hashed_key", unique: true

@hadmut
Copy link
Author

hadmut commented Jul 30, 2021

Does RubyGems.org store the API keys in plaintext as well?

If so: shame on them.
If not: How do they find the hash then, if they do not have a username as an index? It is possible, but would require some deterministic hash function, i.e. without a salt or similar things.

@rubyFeedback
Copy link

Their response was that the authorization scheme is not a particular property of gemstash, but a standard defined by
the gem api

This sounds super-strange to me. That explanation is basically "we don't care but delegate to someone else" when it comes to storing any user data. They may have to reconsider their approach if this has indeed been their statement.

@hadmut
Copy link
Author

hadmut commented Jul 30, 2021

Yes, strange to me too.

However, the API doc is not exactly good in this aspect. Time to check all docs and components for proper definitions and implementation.

@sonalkr132
Copy link
Member

Does RubyGems.org store the API keys in plaintext as well?

Hi @hadmut , please help us understand why does following quote from simi not answer your question:

We do store hashed key and we do have index over this column.

How do they find the hash

We hash the API key before lookup in DB. ie index is on hashed values of the API keys.

@hadmut
Copy link
Author

hadmut commented Jul 31, 2021

Well, it's somewhat mediocre, much better than nothing, but not yet as good as common practise. But you need to step a little deeper into cryptography in order to understand.

In general, the main reason for doing all that hashing stuff, is to protect against the case that the server's credential database falls into atacker's hands. Some sort of read access, e.g. a backup file or a compromised account.

Obviously, if the database contains plaintext credentials, you can easily steal them. e.g. in gemstash, which stores the credentials as plaintext, you can simply query them from the database:

sqlite3 gemstash.db 'select * from authorizations;'

There's another problem with that: If you automatically install such systems with tools like puppet or ansible, you don't always have a source for reliable secret files or a server which is fully protected against read only attacks, e.g. because the installation server is used by a full company. Therefore, credentials not be stored as plaintext unless needed (e.g. cryptographic secret or private keys).

So
hashed_key = Digest::SHA256.hexdigest(params_key)
is a huge improvement over that, but not yet good. It's still naive and decades behind the state of the art.

What's the problem?

The problem is, it's deterministic. The same credential (=password) always results in the same hash. Which makes it weak.

For example, if John and Jane for some reason have the same password, even if you don't know it, you can recognize that it is the same.

Even worse:

There is lists of most common used passwords, and lists of stolen passwords from broken servers. Once you have these lists, you can easily prepare their hashes.

E.g. very common passwords are 'password' and '123456'. Silly, but good to explain the problem here. Once you have a list of such common or ever used passwords, it's no deal to prepare a list of their hashes

Digest::SHA256.hexdigest('password') => "5e884898da28047151d0e56f8dc6292773603d0d6aabbdd62a11ef721d1542d8"
Digest::SHA256.hexdigest('123456') => "8d969eef6ecad3c29a3a629280e686cf0c3f5d5a86aff3ca12020c923adc6c92"

and thus hack these passwords by a simple reverse lookup. Because on every server for every account on world, the digest for a given password is always the same.

Therefore, for those passwords that are vulnerable for brute force or password list attacks, they are not any more secure than plaintext, because once you see the hash, you now the password. It's not a one-way thing. It's not a trapdoor.

To avoid that, state of the art is to use a 'salted' hash. This means, that for every stored credential/password a unique random string is created, and the hash is generated over (I do simplify things here for explaining, do not directly implement that) the concatenation, i.e. Digest::SHA256.hexdigest(credential + randomstring). And both the randomstring and the digest are stored together in the database. The randomstring is not really confidential, because it's not important to keep it secret, but to have it randomly and uniquely created (i.e. a fresh one for every single act of storing a hashed credential).

Whenever you verify a password for user authentication, you have to fetch the random string from the database and calculate the digest.

The consequence is: You cannot build a codebook, a list of common passwords and their hashes. Because even if you choose a common stupid password like 123456, the hash looks different every time.

Check the /etc/passwords of your Linux distribution. It works like that. If you set the same password again or for a different account, the entry looks completely different.

E.g.:

require 'bcrypt'
BCrypt::Password.create('123456') => "$2a$12$PvkC6oJSl6zSnrBu9jet7OjuByVuEIxhkqm/.Cs5E6u0NggihfJ1a"
BCrypt::Password.create('123456') => "$2a$12$GSDLOHDZVpxQlR.kdk9.Q.wIajpFHxvwUGyk1OnLTxSAJAiFt1fQi"
BCrypt::Password.create('123456') => "$2a$12$YM7FzBAbEh7DB/2as2dZwerl5as6izrNW8v/nWVnoD1z.tVpB2LZW"

Each single one of these would verify with '123456', but you need to recalculate the hash, you cannot keep it in a database for reverse lookup.

So if an attacker gets access to the database, he cannot simply do a revery lookup against his database of the ten millions of most lovely passwords. He cannot easily check whether there is a hash in the database he knows the plaintext for. He needs to try every single password of his list against every single entry in the database. Which is much, much more work.

You can improve this if you don't just use a single hash, but a number of nested repeated hash calls Hash(Hash(Hash(Hash(...)))), which would take e.g. a full second to calculate on latest desktop CPUs.

Compare hacking a database with 100 entries and a single weak password contained in such a password list:

A simple sha digest could be solved by a reverse lookup, so the attacker would know in milliseconds the plaintext password.

Now, he is still able to find the weak password, but checking all his 3 millions known passwords against the 100 entries with 1 second calculation time each would cost him 3e61001 = 3e8 seconds of CPU time. Which is about 9.5 years. On the average, after half the time he would be successful, ie. after 4.75 years. Compare this to the milliseconds it would take him for a reverse lookup.

So even if there is a weak password, or one that is used for different accounts, the door is not yet open as wide as with such as simple scheme. BTW, this is state of the art of security for about 20, 30 years. I think it was invented for /etc/passwords end of the eighties or early nineties.

Now the downside for you:

You cannot index these hashes by knowning the plaintext password.

Why not? Because that's exactly what this is intended to spoil, because that's an attack. The idea is that even if you know the plaintext password, let it be '123456', you cannot easily find that in the database. Because that's exactly what this is intended to make impossible.

You first need to find the hashed string, then you can take the salt from it and try to check, whether the given password matches, i.e. authentication succeeds. You cannot easily look whether there is an entry for a given password, you have to check every single entry whether it matches the password. If you have 1000 entries, it takes you up to 1000 seconds, average 500 seconds of cpu time.

That's why you need the username as an index.

Imagine you have 50,000 password entries. It's unfeasible to effectively verify my poor password 'submarine' against every single on of these.

But once I login as
username: hadmut
password: submarine
you can use the username hadmut as an index to find the hash entry I intend to authenticate against, which might look like
"$2a$12$5wTnUtlXJRWCIyCUuUr/xuWXLFGyQ.NvSMUPP44MZBriUcxyKOEYS"

and then check the password against that entry to find it matching.

This is, in simplified words, why your hash scheme is weak, even if it uses sha256, and not uses a username. Because if you can use the password itself as an index, it is weak. If you can use the secret as an index, it shines through.

Furthermore, your API description is poor. It was misunderstood by the people writing gemstash, who made it even worse, because they misunderstood it to store the credentials in plaintext.

Another hint:

Since it is difficult to keep your passwords confidential these days, even the good ones, because there's so many sorts of malware, it is a good idea to use some second factor thing, e.g. u2f / fido2. Since these do work differently, you cannot use the credential as an index anyway, and need a username as an index.

Some non-cryptographic hint:

In general, it is not a good idea to always store credentials in a local app-specific database, because you can't reliably maintain it, e.g. disable accounts of users who left the organisation or who's notebook has been compromised.

If you want to use common practices like connecting to an LDAP server (even AD), you also need some index value, i.e. the username.

@sonalkr132
Copy link
Member

The problem is, it's deterministic. The same credential (=password) always results in the same hash.

You seem to be assuming that we are using user password as API key. This is not the case, our API keys are generated with SecureRandom.hex(24) which is different from user passwords. It is almost impossible to have a collision for number of users we have.
Please feel free to submit a report to https://hackerone.com/rubygems if you feel our implementation has a security flaw. If possible submit a POC as well.
I am closing this since I don't see any valid concern here regarding rubygems.org API keys.

@hadmut
Copy link
Author

hadmut commented Jul 31, 2021

  1. What's the point in asking me the question then?
  2. In what way does this help with the gemstash problem?
  3. It's the API definition.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants