feat(training): add some new features #19

Merged
merged 7 commits into from Jul 31, 2020

kofny (Contributor) commented Jul 14, 2020

  • l33t: l33t detection is now used in the structure parser
  • multiword: use DFS to find all possible multiword splits of
    a password and pick the one with the largest probability
  • context: add some context-sensitive strings
  • digits and others: also split using multiwords
  • comments E, W tag
  • add requirements
  • add Monte Carlo method. password_scorer cannot give us the correct
    probability, because a password can be generated in several ways.
    For example, 1q2w3e4r may be parsed as K8 or K7A1.
fixes: #18

- l33t: we now use DFS to find all possible un-leeted forms. We can
  also use l33t.ignore / l33t.found to accept or reject early, which
  speeds up the search for l33t strings.
- multiwords: digits and others can now be split into multiwords too.
  DFS finds all possible multiword compositions and picks the one
  with the largest probability.
- context: add some fixed collocations to the context detector.
- monte carlo: add a Monte Carlo method. Note that a password may be given
  several probabilities, so I try as many structures as possible
  to find the largest probability of a given password. The
  result may not be accurate. However, in my tests, under 10^10 guesses the
  error between the Monte Carlo estimate and actually generating
  10^10 candidate passwords is less than 0.1% (5 datasets, nearly 100 million).
- the logic of the existing code is not changed
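
The multiword search described above can be sketched as a memoized DFS over split points. This is only an illustration under stated assumptions: `WORD_PROB` and `best_split` are hypothetical names, the probabilities are invented, and the actual detector in this PR may score splits differently.

```python
from functools import lru_cache

# Hypothetical word probabilities; a real detector would learn these in training.
WORD_PROB = {
    "i": 0.05, "love": 0.02, "you": 0.03,
    "lovey": 0.0001, "ou": 0.0002, "il": 0.0001,
}


def best_split(text, word_prob=WORD_PROB):
    """Return (probability, words) for the most probable full split of text.

    Returns (0.0, []) when text cannot be covered by known words.
    """

    @lru_cache(maxsize=None)
    def dfs(start):
        if start == len(text):
            return 1.0, ()           # empty suffix: one way, probability 1
        best = (0.0, ())
        for end in range(start + 1, len(text) + 1):
            p_word = word_prob.get(text[start:end])
            if p_word is None:
                continue             # not a known word, prune this branch
            p_rest, rest = dfs(end)
            if p_rest == 0.0:
                continue             # the remainder cannot be split
            candidate = p_word * p_rest
            if candidate > best[0]:
                best = (candidate, (text[start:end],) + rest)
        return best

    prob, words = dfs(0)
    return prob, list(words)


print(best_split("iloveyou"))
```

Memoizing on the start index keeps the search from re-exploring the same suffix, so the exhaustive enumeration stays cheap for password-length inputs.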
kofny (Contributor, Author) commented Jul 24, 2020

feat(training): l33t, multiwords, context sensitive

I'm still working on my English. Sorry for my poor presentation and documentation.

Note that I didn't change any files in your repo except for .gitignore. What I did was add new files to the repo.

You can reject the Pull Request I submitted last time.

The usage of the newly added features is similar to the original ones. The difference is that sections_list is not changed in place; you get a new instance of sections_list instead.

Usages

  • MyMultiwordDetector: the interface is the same.
  • AsciiL33tDetector: call the init_l33t method after getting an instance of PCFG_Parser.
    Note that AsciiL33tDetector depends on MyMultiwordDetector.
  • my_context_detection: the same as the original one.
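
The l33t handling in this PR is described as a DFS over possible un-leeted forms, with l33t.ignore / l33t.found used to reject or accept early. A minimal sketch of that idea follows. Everything in it is an assumption for illustration: the substitution table, the function names, and modelling ignore/found as an in-memory set and dict. It does not reproduce the AsciiL33tDetector interface.

```python
# Hypothetical substitution table: each l33t character maps to candidate letters.
L33T_MAP = {"4": "a", "@": "a", "3": "e", "1": "il", "0": "o",
            "5": "s", "$": "s", "7": "t"}


def unleet_candidates(token, l33t_map=L33T_MAP):
    """Yield every string made by replacing each l33t character with one of
    its candidate letters (DFS over character positions)."""

    def dfs(pos, prefix):
        if pos == len(token):
            yield "".join(prefix)
            return
        ch = token[pos]
        # Non-l33t characters have exactly one "candidate": themselves.
        for repl in l33t_map.get(ch, ch):
            prefix.append(repl)
            yield from dfs(pos + 1, prefix)
            prefix.pop()

    yield from dfs(0, [])


def detect_l33t(token, dictionary, ignore=frozenset(), found=None):
    """Return the un-leeted dictionary word for token, or None."""
    found = {} if found is None else found
    lowered = token.lower()
    if lowered in ignore:   # early reject: known false positive
        return None
    if lowered in found:    # early accept: already resolved
        return found[lowered]
    if not any(ch in L33T_MAP for ch in lowered):
        return None         # nothing to un-leet
    for candidate in unleet_candidates(lowered):
        if candidate in dictionary:
            found[lowered] = candidate  # remember for next time
            return candidate
    return None


print(detect_l33t("p4ssw0rd", {"password"}))
```

The two lookups before the search are what make repeated tokens cheap: the DFS only runs for strings that have not been seen before.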

Changes

  • l33t: we now use DFS to find all possible un-leeted forms. We can
    also use l33t.ignore / l33t.found to accept or reject early, which
    speeds up the search for l33t strings.
  • multiwords: digits and others can now be split into multiwords too.
    DFS finds all possible multiword compositions and picks the one
    with the largest probability.
  • context: add some fixed collocations to the context detector.
  • monte carlo: add a Monte Carlo method. Note that a password may be given
    several probabilities, so I try as many structures as possible
    to find the largest probability of a given password. The
    result may not be accurate. However, in my tests, under 10^10 guesses the
    error between the Monte Carlo estimate and actually generating
    10^10 candidate passwords is less than 0.1% (5 datasets, nearly 100 million).
  • docs: add requirements.txt
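
For readers unfamiliar with Monte Carlo guess-number estimation, here is a sketch of one common sampling estimator: draw n passwords from the model, then estimate the rank of a password with probability p as the sum of 1 / (n * p_i) over sampled passwords with p_i > p. Whether this PR uses exactly this estimator is an assumption; `build_guess_estimator` and the toy model below are hypothetical.

```python
import bisect
import random


def build_guess_estimator(sample_probs):
    """Build guess_number(p) from the probabilities of passwords sampled
    from the model (sampled according to those same probabilities)."""
    n = len(sample_probs)
    ascending = sorted(sample_probs)
    # cumulative[k] = sum of 1 / (n * p_i) over the k most probable samples
    cumulative = [0.0]
    for p in reversed(ascending):
        cumulative.append(cumulative[-1] + 1.0 / (n * p))

    def guess_number(p):
        # Number of samples strictly more probable than p.
        stronger = n - bisect.bisect_right(ascending, p)
        return cumulative[stronger]

    return guess_number


# Toy model (assumption): 10,000 passwords with Zipf-like probabilities,
# so the true rank of model[1000] is known to be 1000.
rng = random.Random(0)
weights = [1.0 / (i + 1) for i in range(10_000)]
total = sum(weights)
model = [w / total for w in weights]

# Sampling a password and recording its probability in one step.
sample = rng.choices(model, weights=model, k=20_000)
estimate = build_guess_estimator(sample)(model[1000])
print(f"estimated guess number: {estimate:.0f} (true: 1000)")
```

The estimate needs only a sorted sample and a binary search per password, which is what makes it practical at guess counts where enumerating every candidate would be too slow.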

fixes: #18

- the cause is that I changed the return value of extract_l33t but
  forgot to change the corresponding code in parse(password: str).
  Therefore, sorted(l33t_list, key=lambda x: x[1]) did not give the
  correct result.
- following operation of v4.1 and v4.1-with-l33t.
- intuitive.
- fix a bug in cs detection
- add corresponding test case
- add cli to segmntr
kofny (Contributor, Author) commented Jul 30, 2020

Some Known Bugs Fixed

The l33t-related code re-sorted the password incorrectly because I changed a return value but did not update the code that uses it. This bug is now fixed.
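
This class of bug is easy to reproduce. The sketch below uses an invented tuple layout, since the real return value of extract_l33t is not shown in this thread; `L33tMatch` and its fields are hypothetical. It shows how a positional sort key silently goes stale when a field is added, and how a named field avoids that.

```python
from typing import NamedTuple


class L33tMatch(NamedTuple):
    """Hypothetical new return shape: a field was inserted before `start`."""
    text: str
    unleeted: str
    start: int


# Matches for the password "z3r0c00l", deliberately out of order.
matches = [L33tMatch("c00l", "cool", 4), L33tMatch("z3r0", "zero", 0)]

# Stale key: x[1] used to be the start index, but now it is the un-leeted
# word, so this sorts alphabetically instead of by position. No error is raised.
by_stale_index = sorted(matches, key=lambda x: x[1])

# Robust key: refer to the field by name.
by_start = sorted(matches, key=lambda m: m.start)

print([m.text for m in by_stale_index])
print([m.text for m in by_start])
```

Because both keys are valid sort keys, only a test that checks section order would catch the regression.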

context_sensitive_detection had a bug. I fixed it and added a corresponding test case to your code. This is the only change I made to your code. All other changes are in new files and do not affect your original code.

Segmntr

Segmntr parses the passwords in a test set and shows the structure of each password.
Its output also gives us an intuitive way to check whether the trainer has any bugs.

fixes: #18

lakiw (Owner) commented Jul 30, 2020

I'm really impressed by these changes, and I apologize, as I had some things pop up in my personal life that have limited my ability to focus on this. That being said, I really would like to get these changes integrated by next Thursday, in time for Defcon as well as the Crack Me If You Can competition. Also, your English is great, so no need to apologize for that. I appreciate the work you have put into this, as there are some features here I am very excited about. As a heads up, I will likely accept your pull request, but then push another commit that temporarily disables some of your additions. Then I can slowly add those features back in as I have more time to test and understand them. I'll make sure I give you credit for these features, but since other people download and use this toolset, I want to ensure I have a good handle on it and that it won't break any of the other tools, such as the guesser and prince-ling.

1) Modified the unittests to support changes to the context_sensitive wordlist
2) Found a legacy bug in website_detection that had nothing to do with this pull request.
3) Moved entries around in the .gitignore to group the different types of ignore rules
4) Removed the version requirements from chardet in requirements.txt

lakiw (Owner) commented Jul 31, 2020

I apologize once again, as when I started really going through your code I realized you had already done most of what I was thinking about with moving many of the changes into new programs for people to run. I really like what I see, and by doing things like running the unit-tests again I found some errors I wasn't aware of in my own code (nothing to do with your additions).

@lakiw lakiw merged commit 9b5affe into lakiw:master Jul 31, 2020