Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Illegal passwords list #44

Open
aurbjo opened this issue Oct 21, 2016 · 26 comments
Open

Illegal passwords list #44

aurbjo opened this issue Oct 21, 2016 · 26 comments
Assignees
Milestone

Comments

@aurbjo
Copy link

aurbjo commented Oct 21, 2016

It would be great to be able to specify a list of illegal passwords in the config file, or even illegal words in password.

@r2evans
Copy link

r2evans commented Oct 21, 2016

(Caveat: I'm not a dev with LTB.) Are you talking about a simple blacklist? It sounds like you mean substring matches, meaning !password123! should fail, since either password or (at least) password123 should not be a substring.

On a cursory search, there are API-based subscription services (such as https://www.passwordrbl.com). I think this is different than what you are suggesting (and considerably different to implement).

I assume you mean a "static" blacklist ("static" in that its updates are either manual (by the sysadmin) or manual/versioned (by the maintainers of LTB-SSP). There are several known lists (such as http://web.archive.org/web/20130605082350/http://www.isdpodcast.com:80/resources/62k-common-passwords/, among many), so finding a reliable and reasonable source should just take a little time to research. I don't have much experience looking for them, so that link is just an example; I can't speak to it.

Some discussion questions:

  • should this be case-insensitive?
  • should it look for (simple) 1337-speak transliteration?
  • what about looking for reversals? (321wrodssap)
  • if substring matching is done, is there a point at which leading/trailing characters make it alright?
    (difference between aaPASSWORD123aa and &13as!&8sd@@PASSWORD123@@f7y*sd)

Obviously the combinations of all this needs to be considered for performance: a great password-matching blacklist does nothing if it takes too long to process and the page times out.

I suggest a first cut on this would be literal string matching, with later extensions to include substrings (with padding limits) and reversals.

What do you think, @coudot, do you find utility in having a blacklist? (I am confident that a lot of my authenticating-users will curse at me for enabling this :-)

@aurbjo, if he likes the idea (and doesn't have a starting point), it might help if you found an acceptable starting point for the list. Think: freely distributed, possibly updated (not antiquated), small enough, and easily parsed (perhaps just a literal text file, one-per-line).

@coudot coudot self-assigned this Oct 22, 2016
@coudot
Copy link
Member

coudot commented Oct 22, 2016

Yes, we could indeed use a simple text file as a blacklist, this file could then be updated by any means outside SSP. We can provide a simple blacklist at installation.

@coudot coudot added this to the 1.1 milestone Oct 22, 2016
@r2evans
Copy link

r2evans commented Oct 22, 2016

As a thought, would it be a bit robust to use something like cracklib for checking more than just a dictionary? It supports more than just blacklists, though I admit I don't know how to configure its various checks.

@aurbjo
Copy link
Author

aurbjo commented Oct 22, 2016

In my case i am just thinking about a simple list like @coudot is suggesting, this would deny users setting passwords like "qwerty12345", "password123" and so on. I also see users that have their company name and the current year in their password and it's not a good practice.

@coudot
Copy link
Member

coudot commented Oct 22, 2016

Cracklib usage can be another feature.

By the way, the support of cracklib in PHP seems experimental: http://php.net/manual/en/ref.crack.php

@r2evans
Copy link

r2evans commented Oct 22, 2016

(Essentially the same link I posted earlier.) Yes, and furthermore it was removed from base-php (around php-5.0) and pushed to be an extension, so that says that it is not something everyone will need but it has been in the ecosystem for many php versions.

I'll counter with this: I recommend against reinventing the wheel. Even if it has more features than originally suggested, php-crack has fast lookups and attempts to address the performance issues that large lists will bring. As a bonus, it can comment on password-strength (admittedly a contentious and subjective metric). The biggest detriment in my mind is that it isn't a flat text file (it needs to be compiled into the files, as commented on the link you provided), so in that vein it has a little more required investment on the part of the user.

If you think php-crack is too much (for now), I suggest finding another already-existing php package that permits fast lookups in a text file. There's no need to reinvent "lookups" (and therefore less work for coudot to test and maintain).

@coudot
Copy link
Member

coudot commented Oct 22, 2016

Indeed, sorry, I did not see the previous link.

If some of you want to propose some code, I'll be happy to review it.

@r2evans
Copy link

r2evans commented Oct 23, 2016

@coudot, do you have a sample LDAP image (e.g., docker) for testing? It's much easier when testing on something other than a production environment.

@r2evans
Copy link

r2evans commented Oct 24, 2016

Hehe, @coudot, I guess I didn't read all of the page I originally provided ... last "update" to php-crack was in 2005. Doesn't really inspire confidence ...

So perhaps the wheel can be reinvented a little here, simplistically.

Some thoughts, let me know if you have reservations:

  • Using $out = system("grep -q ...", $retval). My primary motivation is efficiency, since I know of no methods that will be able to do yes/no pattern "a match exists" as quickly as grep and family. Using -q means that it can short-circuit: as soon as a match is found, it returns with no other searching, perhaps a slight efficiency. (This also means we need to look at $retval instead of $out.)
  • Considering allowing compressed lists, thereby requiring zgrep and bzgrep; this can be determined by filename and runtime.
  • In order to protect against code-injection and/or command-injection when using system(), I'm thinking it may be safer to write the new password into a temp file and use grep -q -F -f ptnfile dictfile (the -F is to ensure it is used as a literal string so that regex chars aren't mistakenly taken as patterns).

@plewin
Copy link
Member

plewin commented Oct 24, 2016

If the goal is to implement a blacklist of passwords (and not a blacklist of words) in efficient way I would suggest filtering out the lines not conforming to the password policy, save the list sorted in a cache file, then read the cache file and find if the password is present using a binary search.

For example banning all passwords of this list of "one million of the most common passwords" ~ 9 Mio uncompressed

  1. load the whole file with file() (< 70 Mio RAM usage in php 7)
  2. filter out all passwords not conforming to password policy
  3. sort the array (can be already sorted)
  4. serialize() (serialized string < 24 Mio RAM in php 7 if nothing is filtered out)
  5. dump to file blacklistcache.txt

a) read and unserialize blacklistcache.txt (I'm not sure if file_get_contents + unserialize is better than a file() EDIT: no it's not, see benchmark below);
b) perform the binary search with a function adapted/similar to this one

edit: renamed steps
edit2: added "no it's not, see benchmark below"

@r2evans
Copy link

r2evans commented Oct 24, 2016

@plewin, you suggest doing all of that for each password change? I haven't benchmarked it, but that seems a bit extraneous. Perhaps the first five steps could be avoided since the program is already checking if the new password meets the password policy (otherwise no dictionary lookup is required).

I understand the suggestion to use a binary search; assuming that the file is sorted (which, given your first five, would be assured), this would most certainly be faster. However, are you sure even this will be faster than using an external grep?

Back to your first point: the OP did request blacklisting passwords; I wonder if it makes sense to be that literal, since defeating it is as simple as adding one character to the password (from password123 to password123! might work).

Thanks! I'll see if I can muster up some benchmarking between external grep and an in-memory binary search in PHP.

@plewin
Copy link
Member

plewin commented Oct 24, 2016

@r2evans Steps 1, 2, 3, 4, 5 are done only one time to compile the cache. It will require a script that can be launched manually or automatically if we detect that the cache is older than the dictionnary. Steps a) b) must be done each time there is a password change and only after the new password is valid according to the password policy.

With very basic tests on my side it looks like the binary search idea is not a such good idea after all. It's better only if the dictionnary is already in ram or the call of the file() to load the dictionnary in php is faster than spawning grep with exec.

Is it okay to add a dependency on a external command ? exec isn't portable.

IMHO it's okay if "password123!" passes the blacklist with "password123". An attacker would use the dictionnary, if "password123!" isn't in a dictionnary of the N most common passwords it should be ok to use it.

@r2evans
Copy link

r2evans commented Oct 24, 2016

I understand your description of when things are done, and it's certainly feasible to "invalidate the cache" and regenerate it as needed. I hope it isn't a time-intensive thing, though this can be mitigated several ways. I'm not worried about this part of it.

Your question of external dependencies is appropriate, but I think having grep available is not too much to ask. Even on "alpine", {,e,f}grep is provided by busybox, and that supports all of the options I was considering (-q, -f, and -F). The only problem is on a windows server, in which case there needs to be a note that something like Git-for-Windows or GnuWin32 is needed to get this extension to work. (Note my emphasis: this is an extension to SSP, not a core function, though some may feel it is important enough to be considered "core".)

Your last comment, though correct, is based on the assumption that the attackers are using the same dictionary. This is great justification for something like cracklib, where the strength of the password is tested, including presence of dictionary words. This is a different tact than a password blacklist, and I think the two are very different mindsets (not to mention implementations). So for a "blacklist", you are right, it should be accepted or explicitly added to the blacklist. (Anybody volunteer to maintain php-crack, an extension that hasn't been touched in 11 years?) For this discussion, I'll lean towards "whole-line filtering", a literal blacklist instead of a quasi-pattern blacklist.

@plewin
Copy link
Member

plewin commented Oct 24, 2016

I made a little benchmark comparing several possibilities on my ubuntu linux, php 7, i5-2430M CPU @ 2.40GHz + ssd.

file + in_array is the fastest until around 100 000 passwords
At 1 000 000 passwords, grep is faster than file + in_array in a factor of ~4
At ~3 700 000 grep is faster in a factor of ~ 10 ?

Traversing the file of 3 700 000 passwords took grep 37 ms (with no options), file + in_array 381 ms, stream_get_line 773 ms.
Problem with in_array : its consumes at lot of memory, for the last file php peak memory usage was 276 Mio.
stream_get_line is the best alternative to grep and file+in_array but there is a drawback : line endings must be fixed.

Results
Code

@r2evans
Copy link

r2evans commented Oct 24, 2016

Not sure you mean, "line endings must be fixed"?

Nice benchmarking, thanks for that. The response times (for other than grep) are better than I expected, I'm assuming that difference is due to a fork overhead (?). I agree that, though not likely to trigger an OOM error, 276 Mio is a bit onerous.

@coudot
Copy link
Member

coudot commented Oct 24, 2016

Thanks a lot for this work. Seems we can choose to use grep with escapeshell functions in order to have some security when invoking the command. And this feature will be disabled by default.

We will then add documentation to show how download a password list and use it in SSP.

@plewin
Copy link
Member

plewin commented Oct 24, 2016

@r2evans Sorry for the confusing words. stream_get_line requires that all line endings are the same and coherent in the file. You must be sure of lines endings (\n or \r\n) and give it as an argument of stream_get_line overwise trimming will be required and there will be a performance penalty like in the fgets vs fgets_rtrim benchmark.

@r2evans
Copy link

r2evans commented Oct 24, 2016

@plewin, thanks, that's what I suspected.

@coudot: absolutely, disabled by default. Any feedback on a testing image/environment?

@coudot
Copy link
Member

coudot commented Oct 24, 2016

@r2evans there is no docker image but it should be easy to build one. Anyway you can use any virtualization tool and install SSP with packages. You need a simple LDAP directory with a user account to test the password change.

@plewin
Copy link
Member

plewin commented Oct 24, 2016

@r2evans if you need a ldap directory for development you can use the embedded ldap server in Apache Directory Studio like in this tutorial http://www.stefan-seelmann.de/blog/setting-up-an-ldap-server-for-your-development-environment

@r2evans
Copy link

r2evans commented Oct 24, 2016

Both are good ideas. I also found rroemhild/docker-test-openldap which provides half of what I was looking for, and I think I can use one of the docker hub php images (partly because it comes prepopulated with data, that much simpler). Between your suggestions and this, I think I can get something working.

@coudot
Copy link
Member

coudot commented Oct 26, 2016

Someone told me about passwqc, and here is an PHP implementation: https://github.com/helver/PHP_passwdqc_check/blob/master/PHP_passwdqc_check/PasswordStrengthTest.inc

They provide a 4k words list and use PHP array methods to check it. Not sure it's the best way but it is interesting to read their code.

@coudot
Copy link
Member

coudot commented Oct 28, 2016

See also https://github.com/bjeavons/zxcvbn-php

@coudot coudot modified the milestones: Future, 1.1 Mar 24, 2017
@SyBernot
Copy link

SyBernot commented Mar 1, 2018

You can get a list of about a half billion known compromised passwords from here:
https://haveibeenpwned.com/Passwords
or you could just hook into his API, personally I don't trust anyone so doing it locally seems more secure even if you are only sending a hash.

@plewin
Copy link
Member

plewin commented Mar 1, 2018

I like this repository of passwords, it is from OWASP

https://github.com/danielmiessler/SecLists/tree/master/Passwords

Note for my 2.0 proposal:
I have implemented the old password strength checker + a dictionary checker with grep + Zxcvbn.
Multiple can be used in the same time and one can add a class to implement a new one.
https://github.com/plewin/self-service-password/tree/575c884d8bcd0022ed0f304da2307c509ea03c7a/src/PasswordStrengthChecker

@r2evans
Copy link

r2evans commented Mar 16, 2018

I second @SyBernot's mention of Have I Been Pwned, both as an API and as an optional local-copy. Though I don't think I would download it (at least not initially), I can definitely see utility in supporting both mechanisms. NextCloud just adopted its "range" API function, in which the caller sends only the first 5 chars of the SHA1 hash, and the API sends back all matching hashes and their respective counts. I'm not entirely certain how much security it adds, but the frequency of use and bandwidth overhead don't seem to be too high.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants