Improve rule stacking #4014

Open
magnumripper opened this issue May 30, 2019 · 3 comments

@magnumripper
Member

Current rule stacking actually runs the preprocessor as well as the rule engine twice, which can't help performance. I'm thinking we could instead concatenate the two rules before PP and only run things once, although that will involve some less trivial shuffling (e.g. if the second rule has reject flags, they must be moved to just after the first rule's reject flags, if any, and/or duplicates of them must be removed).

Example, rule1 ^[A-Z] and rule2 Az"[0-9][0-9]"

Current flow (using the first entry the PP produces for each rule):

"word" -> apply ^A -> "Aword" -> apply Az"00" -> "Aword00"

New flow, single rule, PP ^[A-Z] Az"[0-9][0-9]"

"word" -> apply ^A Az"00" -> "Aword00"

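The two flows above can be sketched with a toy rule engine. This is only an illustration, not John the Ripper's code: `apply_rule` is a hypothetical helper that understands just the two commands used in the example (`^X` to prepend a character and `Az"STR"` to append a string) and assumes commands are separated by spaces.

```python
def apply_rule(rule, word):
    """Toy rule engine (hypothetical): supports only ^X and Az"STR"."""
    i = 0
    while i < len(rule):
        cmd = rule[i]
        if cmd == ' ':
            # Spaces between commands are ignored.
            i += 1
        elif cmd == '^':
            # ^X: prepend character X to the word.
            word = rule[i + 1] + word
            i += 2
        elif cmd == 'A' and rule[i + 1] == 'z':
            # Az"STR": append STR; the delimiter is the character after 'z'.
            delim = rule[i + 2]
            end = rule.index(delim, i + 3)
            word = word + rule[i + 3:end]
            i = end + 1
        else:
            raise ValueError("unsupported command in toy engine: " + cmd)
    return word


# Current flow: two separate passes through the engine.
stacked = apply_rule('Az"00"', apply_rule('^A', 'word'))

# Proposed flow: one pass over the concatenated rule.
combined = apply_rule('^A Az"00"', 'word')

print(stacked, combined)
```

Both calls yield the same candidate for this simple pair of rules, which is the equivalence the proposal relies on.
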
That was easy. Now something less trivial:

Rule1 -p -c (?a 2 (?a c 1 [cl] and rule2 -c /?v V Q

Current flow (single mode, 2 words)

"john"+"smith" -> -p -c (?a 2 (?a c 1 c -> "JohnSmith" -> -c /?v V Q -> "JoHNSMiTH"

New flow: concatenate -p -c (?a 2 (?a c 1 [cl] with + M and then -c /?v V Q (M is always added between the two rules, and in this case + as well, because the first rule uses 1 and/or 2). Dropping the second, duplicate -c then results in -p -c (?a 2 (?a c 1 [cl] + M /?v V Q

"john"+"smith" -> -p -c (?a 2 (?a c 1 c + M /?v V Q -> "JoHNSMiTH"

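The concatenation steps described above could look roughly like this. It is a sketch under loud assumptions: `split_flags` and `concat_rules` are hypothetical names, commands are assumed to be whitespace-separated as they are written in this issue (real rules are often unspaced), and any leading token starting with `-` is treated as a reject flag.

```python
def split_flags(rule):
    """Split a rule into (leading reject flags, remaining commands).

    Assumption: commands are whitespace-separated and reject flags are
    the leading tokens that start with '-'.
    """
    tokens = rule.split()
    n = 0
    while n < len(tokens) and tokens[n].startswith('-'):
        n += 1
    return tokens[:n], tokens[n:]


def concat_rules(rule1, rule2):
    """Concatenate two rules before preprocessing (hypothetical helper)."""
    flags1, body1 = split_flags(rule1)
    flags2, body2 = split_flags(rule2)
    # Move the second rule's reject flags up front, dropping duplicates.
    flags = flags1 + [f for f in flags2 if f not in flags1]
    # M is always placed between the two rules; + is added first when
    # the first rule uses the single-mode word selectors 1 and/or 2.
    glue = ['M']
    if '1' in body1 or '2' in body1:
        glue.insert(0, '+')
    return ' '.join(flags + body1 + glue + body2)


print(concat_rules('-p -c (?a 2 (?a c 1 [cl]', '-c /?v V Q'))
```

For the two rules in this example, the printed result is the concatenated rule given above.
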
A drawback with doing/fixing this is that the current --rules-stack runs the first word through all rules before going on to the next word (normal rules do the opposite). This behavior is sometimes wanted, and concatenating the rules would lose it.
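
The ordering difference can be shown with two loop nests. This is a simplified sketch, not the actual implementation: `rule_major` and `word_major` are hypothetical names, and plain Python string methods stand in for rules.

```python
def rule_major(words, rules):
    """Normal rules as described above: each rule is applied to every
    word before moving on to the next rule."""
    return [rule(word) for rule in rules for word in words]


def word_major(words, rules):
    """Stacked rules as described above: each word goes through every
    rule before moving on to the next word."""
    return [rule(word) for word in words for rule in rules]


words = ['john', 'smith']
rules = [str.capitalize, str.upper]  # stand-ins for real rules

print(rule_major(words, rules))  # ['John', 'Smith', 'JOHN', 'SMITH']
print(word_major(words, rules))  # ['John', 'JOHN', 'Smith', 'SMITH']
```

The same candidates are produced either way; only the order in which they are tried changes.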

@solardiz
Member

> Current rule stacking actually runs the preprocessor as well as the rule engine twice, which can't help performance. I'm thinking we could instead concatenate the two rules before PP and only run things once

Wouldn't that one run be on twice longer rules, and thus of roughly the same total processing cost?

Overall, what you suggest here sounds to me like introducing complexity for no gain. But I could be wrong, especially given that I'm not very familiar with the current rule stacking.
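
To make the cost question concrete, here is a rough sketch that counts preprocessor output for the first example in this issue. `expand` and `expand_class` are a hypothetical toy preprocessor that handles only plain `[...]` groups with `a-b` ranges (no escapes or other preprocessor features), so the numbers illustrate the arithmetic only and say nothing about real run time.

```python
import itertools
import re


def expand_class(body):
    """Expand the inside of a [...] group, e.g. 'A-Z' or 'cl'."""
    out = []
    i = 0
    while i < len(body):
        if i + 2 < len(body) and body[i + 1] == '-':
            # Range such as A-Z or 0-9, inclusive at both ends.
            out.extend(chr(c) for c in range(ord(body[i]), ord(body[i + 2]) + 1))
            i += 3
        else:
            out.append(body[i])
            i += 1
    return out


def expand(rule):
    """Toy preprocessor: one output rule per combination of choices."""
    parts = re.split(r'(\[[^\]]*\])', rule)
    choices = []
    for index, part in enumerate(parts):
        if index % 2 == 1:
            # Odd indices are the captured [...] groups.
            choices.append(expand_class(part[1:-1]))
        else:
            choices.append([part])
    return [''.join(combo) for combo in itertools.product(*choices)]


rule1 = '^[A-Z]'
rule2 = 'Az"[0-9][0-9]"'

separate = (len(expand(rule1)), len(expand(rule2)))
combined = expand(rule1 + ' ' + rule2)

print(separate)       # (26, 100)
print(len(combined))  # 2600
print(combined[0])    # ^A Az"00"
```

In this toy model the concatenated rule expands to as many entries as there are pairs of stacked rules, each roughly the length of both originals together, which is consistent with the point that total work may end up about the same.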

@magnumripper
Member Author

> Wouldn't that one run be on twice longer rules, and thus of roughly the same total processing cost?

Maybe, I'm not sure yet.

> Overall, what you suggest here sounds to me like introducing complexity for no gain. But I could be wrong, especially given that I'm not very familiar with the current rule stacking.

Hopefully you are right. I just want to look at it when I get the time.

@solardiz
Member

BTW, don't we still run the preprocessor + rule engine across all rules an extra time just to pre-check the rules' syntax? I certainly do that in core. If we care about speeding things like this up, we should add a way to (partially) skip this checking - e.g., a john.conf setting limiting this checking to only the first N ruleset lines (pre-pp) and/or rules (post-pp) - e.g., 1 million by default. This would speed up startup, but would allow for postponed failure. Speaking of which, we might then also have a setting to make such failure non-fatal (optionally only when the failure is a postponed one? make the setting a tri-state? this gets tricky).
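
A limited pre-check along these lines might be sketched as follows. Everything here is hypothetical: `precheck`, its `limit` parameter and the `balanced` validity test are illustrative stand-ins, not an existing john.conf setting or John the Ripper function.

```python
from itertools import islice


def balanced(rule):
    """Stand-in syntax check (assumption): brackets must pair up."""
    return rule.count('[') == rule.count(']')


def precheck(rules, is_valid, limit=1_000_000):
    """Check only the first `limit` rules.

    Returns the 1-based position of the first invalid rule, or None if
    none was found within the limit. A bad rule beyond the limit is not
    caught here, which is the postponed failure mentioned above.
    """
    for position, rule in enumerate(islice(rules, limit), start=1):
        if not is_valid(rule):
            return position
    return None


rules = ['^A', '$[0-9', 'l']

print(precheck(rules, balanced))           # 2
print(precheck(rules, balanced, limit=1))  # None
```

With `limit=1` the broken second rule slips past startup, so the caller would still need to decide whether hitting it later is fatal.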
