feat(training): add some new features #19

kofny · 2020-07-14T04:07:14Z

l33t: l33t is now used in the structure parser
multiword: use DFS to find all possible multiword splits of
a password and pick the one with the largest probability
context: add some context-sensitive strings
digits and others: use multiwords too
comments E, W tag
add requirements
add Monte Carlo method. password_scorer cannot give us the correct
probability, because a password can be generated in several ways.
For example, 1q2w3e4r may be treated as K8 or K7A1.
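
The Monte Carlo point can be illustrated with a small sketch. It assumes a sample-and-weight estimator (draw passwords from the model, weight each sample by 1 / (n * p), and sum the weights of all samples more probable than the target); the PR does not say which estimator it actually uses. The toy model and the function names `build_estimator` and `estimate_guess_number` are hypothetical.

```python
import bisect
import random
from itertools import accumulate

# Toy model (hypothetical): each password with its total probability.
MODEL = {"123456": 0.4, "password": 0.3, "1q2w3e4r": 0.2, "qwerty12": 0.1}


def build_estimator(model, n_samples, rng):
    """Sample from the model and precompute cumulative weights 1 / (n * p)."""
    words = list(model)
    probs = [model[w] for w in words]
    sample = rng.choices(words, weights=probs, k=n_samples)
    # Sort the sampled probabilities from most to least likely.
    sampled_probs = sorted((model[w] for w in sample), reverse=True)
    cumulative = list(accumulate(1.0 / (n_samples * p) for p in sampled_probs))
    # Negate so the list is ascending and bisect can search it.
    neg_probs = [-p for p in sampled_probs]
    return neg_probs, cumulative


def estimate_guess_number(prob, neg_probs, cumulative):
    """Estimate how many passwords the model ranks above probability `prob`."""
    # Number of samples whose probability is strictly greater than `prob`.
    idx = bisect.bisect_left(neg_probs, -prob)
    return cumulative[idx - 1] if idx > 0 else 0.0


if __name__ == "__main__":
    neg, cum = build_estimator(MODEL, 20000, random.Random(0))
    print(estimate_guess_number(MODEL["qwerty12"], neg, cum))
```

The estimate is only as good as the probability passed in. Since 1q2w3e4r can be parsed as K8 or K7A1, the value of `prob` depends on which parse is scored, which is why the PR looks for the largest probability over the structures it tries.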

fixes: #18


kofny · 2020-07-24T05:06:36Z

feat(training): l33t, multiwords, context sensitive

I'm still working on my English. Sorry for my poor presentation and documentation.

Note that I didn't change any files of your repo except for .gitignore. What I did is to add new files into the repo.

You can reject the Pull Request I submitted last time.

The usage of the newly added features is similar to that of the original ones. The difference is that sections_list will not be changed in place. You will get a new instance of sections_list.

Usages

MyMultiwordDetector: interfaces are the same.
AsciiL33tDetector: call the init_l33t method after obtaining an instance of PCFG_Parser.
Note that AsciiL33tDetector depends on MyMultiwordDetector.
my_context_detection: the same as original one.
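
The dependency of AsciiL33tDetector on MyMultiwordDetector can be pictured with a rough sketch: un-leeted candidates are generated by DFS, and a word check decides which ones to keep. This is not the code from the PR. The substitution table, `unleet_candidates`, `find_unleeted`, and the `is_word` callback (standing in for the multiword detector) are all hypothetical.

```python
# Hypothetical substitution table: l33t character -> letters it may stand for.
L33T_MAP = {"4": "a", "@": "a", "3": "e", "1": "il", "0": "o", "5": "s", "$": "s", "7": "t"}


def unleet_candidates(token):
    """Yield every string obtained by keeping or replacing each l33t character."""
    chars = []

    def dfs(pos):
        if pos == len(token):
            yield "".join(chars)
            return
        ch = token[pos]
        # First option keeps the character, the others substitute a letter.
        for option in ch + L33T_MAP.get(ch, ""):
            chars.append(option)
            yield from dfs(pos + 1)
            chars.pop()

    yield from dfs(0)


def find_unleeted(token, is_word):
    """Return candidates that differ from `token` and are accepted by `is_word`."""
    lowered = token.lower()
    return [c for c in unleet_candidates(lowered) if c != lowered and is_word(c)]


if __name__ == "__main__":
    vocabulary = {"password"}  # stand-in for a trained word detector
    print(find_unleeted("p4ssw0rd", vocabulary.__contains__))
```

The number of candidates grows exponentially with the number of l33t characters, so some form of early accept or reject is needed on long inputs.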

Changes

l33t: we can now use DFS to find all possible un-leeted forms. We can
also use l33t.ignore / l33t.found to accept or reject candidates early,
which speeds up the search for l33t words.
multiwords: digits and others can now be split into multiwords too.
DFS finds all possible multiword compositions and picks the one
with the largest probability.
context: add some fixed collocations to the context detector.
monte carlo: add a Monte Carlo method. Note that a password may be given
several probabilities, so I try as many structures as possible
to find the largest probability of a given password. The
result may not be accurate. However, in my tests with up to 10^10 guesses,
the error between using Monte Carlo and actually generating 10^10
candidate passwords is less than 0.1% (5 datasets, nearly 100 million).
docs: add requirements.txt
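
The multiword item above can be sketched as follows. This is a minimal illustration, assuming each word has an independent probability and a split is scored by the product of its word probabilities; the PR does not spell out its scoring. `WORD_PROB`, `all_segmentations`, and `best_segmentation` are hypothetical names, and the probabilities are made up.

```python
# Hypothetical word probabilities; a real detector would learn these from training data.
WORD_PROB = {"i": 0.05, "ice": 0.02, "cream": 0.01, "icecream": 0.0001}


def all_segmentations(text, vocab, start=0):
    """Depth-first enumeration of every way to split `text` into vocabulary words."""
    if start == len(text):
        yield []
        return
    for end in range(start + 1, len(text) + 1):
        word = text[start:end]
        if word in vocab:
            for rest in all_segmentations(text, vocab, end):
                yield [word] + rest


def best_segmentation(text, word_prob):
    """Return (words, probability) for the most probable split, or None if there is none."""
    best = None
    for words in all_segmentations(text, word_prob):
        prob = 1.0
        for word in words:
            prob *= word_prob[word]
        if best is None or prob > best[1]:
            best = (words, prob)
    return best


if __name__ == "__main__":
    print(best_segmentation("icecream", WORD_PROB))
```

The same routine would apply to digit or symbol runs if the vocabulary held digit or symbol strings instead of words.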

fixes: #18


kofny · 2020-07-30T03:07:45Z

Some Known Bugs Fixed

The l33t-related code sorted the password incorrectly because I changed a return value but didn't update the code that consumes it. This bug is now fixed.
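
This kind of bug is easy to reproduce in isolation. The tuple shapes below are hypothetical (the actual return value of the l33t extraction is not shown in this thread); the sketch only shows how a sort key that indexes into a tuple silently changes meaning when the tuple gains a field.

```python
# Hypothetical old return shape: (matched_text, start_index).
old_items = [("4pple", 4), ("z3r0", 0)]

# Hypothetical new return shape: (matched_text, unleeted_text, start_index).
new_items = [("4pple", "apple", 4), ("z3r0", "zero", 0)]

# With the old shape, x[1] is the start index, so items come out in password order.
by_position_old = sorted(old_items, key=lambda x: x[1])

# With the new shape, the same key compares the unleeted text instead.
stale_key = sorted(new_items, key=lambda x: x[1])

# Updating the key to the new position of the start index restores the order.
fixed_key = sorted(new_items, key=lambda x: x[2])

print([item[0] for item in by_position_old])
print([item[0] for item in stale_key])
print([item[0] for item in fixed_key])
```

No exception is raised in the stale case, which is why a test case is the most reliable way to catch it.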

context_sensitive_detection had a bug. I fixed it and added a corresponding test case to your code. This is the only change I made to your code. All other changes are in new files and won't affect your original code.

Segmntr

Segmntr parses the passwords in a test set so that we can see the structure of each password.
Its output also gives us an intuitive way to check whether the trainer has any bugs.
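
As a rough picture of what "the structure of a password" means, the sketch below splits a password into runs of letters, digits, and other characters and tags each run with its length. It is a simplified stand-in, not Segmntr itself: the real parser also handles keyboard walks, l33t, multiwords, and context-sensitive strings. The tags A, D, and O and the names `char_class` and `base_structure` are assumptions made for this example.

```python
from itertools import groupby


def char_class(ch):
    """Map a character to a coarse class tag: A (letter), D (digit), O (other)."""
    if ch.isalpha():
        return "A"
    if ch.isdigit():
        return "D"
    return "O"


def base_structure(password):
    """Return the structure string and the list of (tag, text) segments."""
    segments = []
    for tag, run in groupby(password, key=char_class):
        text = "".join(run)
        segments.append((f"{tag}{len(text)}", text))
    return "".join(tag for tag, _ in segments), segments


if __name__ == "__main__":
    print(base_structure("pass123!"))
```

Printing the segments next to each test password makes it easy to spot a run that was tagged with the wrong class or length.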

fixes: #18

lakiw · 2020-07-30T13:17:18Z

I'm really impressed by these changes and I apologize as I had some things pop up in my personal life that have limited my ability to focus on this. That being said, I really would like to get these changes integrated by next Thursday in time for Defcon, as well as the Crack Me If You Can competition. Also your English is great so no need to apologize for that. I appreciate the work you have put into this, as there are some features here I am very excited about. As a heads up, I will likely accept your pull request, but then push another commit that temporarily disables some of your additions. Then I can slowly add those features back in as I have more time to test and understand them. I'll make sure I give you credit for these features, but since other people download and use this toolset I want to ensure I have a good handle on them and that they won't break any of the other tools such as the guesser and prince-ling.

1) Modified the unittests to support changes to the context_sensitive wordlist
2) Found a legacy bug in website_detection that had nothing to do with this pull request.
3) Moved entries around in the .gitignore to group the different types of ignore rules
4) Removed the version requirements from chardet in requirements.txt

lakiw · 2020-07-31T03:57:59Z

I apologize once again, as when I started really going through your code I realized you had already done most of what I was thinking about with moving many of the changes into new programs for people to run. I really like what I see, and by doing things like running the unit-tests again I found some errors I wasn't aware of in my own code (nothing to do with your additions).

kofny force-pushed the master branch from 6d68729 to ae6190f on July 24, 2020 04:23

docs(comments): add some comments to codes

d779ed8

- the code logic is unchanged

kofny added 3 commits July 30, 2020 09:55

fix(l33t): fix a bug of passing l33t_list in parse(password: str)

b268b8d

- the cause is that I changed the return value of extract_l33t but forgot to update the corresponding code in parse(password: str). As a result, sorted(l33t_list, key=lambda x: x[1]) did not give the correct result.

feat(segmntr): add a file to segment test sets.

2fee774

- following operation of v4.1 and v4.1-with-l33t.
- intuitive.

fix(cs): fix known bugs in context_sensitive detection

8854cbf

- fix a bug in cs detection
- add corresponding test case
- add cli to segmntr

Updated the version number

064a073

lakiw merged commit 9b5affe into lakiw:master Jul 31, 2020
