
ztex: ability for --fork #4027

Merged
merged 1 commit into openwall:bleeding-jumbo on Jun 11, 2019

Conversation

@Apingis (Collaborator) commented Jun 11, 2019

  • added --fork option (see the example invocation below)
  • improved bitstream upload: after upload, checks DONE bit, retries if upload was unsuccessful
  • small fixes
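
For illustration, an invocation using the new option (this command line is taken from the testing reported later in this thread; a fork count of 4 matches a 4-board setup and is just an example):

./john --fork=4 -mask='?l?l?l?l' -format=bcrypt-ztex -verb=1 pw-fake-len4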

@solardiz (Member) commented Jun 11, 2019

Thanks!

> improved bitstream upload: after upload, checks DONE bit, retries if upload was unsuccessful

Is this going to fix a particular issue someone reported to us?

@solardiz (Member) left a comment

I haven't yet tested this and I'm not entirely happy with hard-coded usleep in two places, but this is desired functionality and I'll test it after merging. Thanks!

@solardiz merged commit 18c3c44 into openwall:bleeding-jumbo on Jun 11, 2019
7 of 8 checks passed
@Apingis (Collaborator, Author) commented Jun 11, 2019

> Is this going to fix a particular issue someone reported to us?

Not exactly. Failed bitstream uploads do happen from time to time; that was reported and I've seen it myself, but it was never a major issue.

@solardiz (Member) commented Jun 14, 2019

I am testing this with:

./john -mask='?l?l?l?l' -format=bcrypt-ztex -verb=1 pw-fake-len4

like I did in my john-users postings, where this is expected to crack all 239 hashes. They're bcrypt cost 5.

On 4 boards without --fork, I am getting 471k c/s before the run gets to its final non-full batch of buffered candidates (then speed drops), and this cracks all 239. No errors, no timeouts.

With --fork=4, I am initially getting 4x 119k+ c/s, so 476k+ c/s total, which is about 1% higher than the speed without --fork. However, I start getting frequent errors (no timeouts, but errors; all of them were error: pkt_comm_status=0x04, debug=0x0000, on different boards and FPGAs), the resulting speed is lower, and, what's worse, the run cracked only 217 out of 239. Once it terminated, trying to use the boards without --fork reproducibly gave me Self test failed (cmp_one(1)) - I tried 5 times, getting this exact error each time, and ended up power-cycling the boards, after which they worked fine again.

With --fork=4, but clock rate reduced from 150 to 141 MHz, I got many errors again (mostly pkt_comm_status=0x05, debug=0x0000, but also some error: pkt_comm_status=0x57, debug=0x7fff), yet this run cracked all 239 and didn't result in any issues running a further bcrypt-ztex attack immediately afterwards (no self-test failures).

With --fork=2 and back at 150 MHz, I got speeds of 2x237.5k = 475k c/s, only 2 instances of error: pkt_comm_status=0x04, debug=0x0000 (down from 10 or so in the --fork=4 runs), and all 239 cracked.

So the functionality added here appears to be working, but it appears to expose other issues somewhere else. Per my earlier testing with manual concurrent runs using the different boards, things might be better for other hash types, costs, or attacks.

@solardiz (Member) commented Jun 14, 2019

Outputting the same mask's candidates into a text file and using that as a wordlist, I am getting 463k c/s on 4 boards without --fork. This is worse than the speed with --mask during the run, but better than the average speed with --mask for the entire run (including the final non-full batch). The run's duration is worse than it was with --mask, though, perhaps because we're testing all candidates in the wordlist against each salt first, whereas our mask's order of characters was smart.
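
For illustration, one way to produce such a wordlist from the same mask and rerun the attack against it (the wordlist file name here is made up):

./john --stdout -mask='?l?l?l?l' > mask-len4.lst
./john --wordlist=mask-len4.lst -format=bcrypt-ztex -verb=1 pw-fake-len4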

Using the same wordlist with --fork=4, I am getting 4x118k+ = ~473k.

No errors and all 239 cracked either way.

Can there be a problem specific to on-device mask processing that --fork exposes or creates, @Apingis? Can you try to reproduce this, please? You can recreate the file using:

egrep '^([^:]*:){4}[a-z]{4}:' pw-fake-unix > pw-fake-len4

where pw-fake-unix can be downloaded at https://openwall.info/wiki/john/sample-hashes#Sample-password-hash-files

@solardiz (Member) commented Jun 17, 2019

After this PR, JtR no longer prints the FPGA clock rates. Previously, these were printed by jtr_device_list_print at default verbosity or higher, but that function is now dead code. jtr_device_list_print_count is dead code now as well. I think this is a bug, @Apingis.

@solardiz (Member) commented Jun 17, 2019

We should document the ability to use "--fork" with "-ztex" formats in README-ZTEX, along with when to use it and when not to.

@solardiz (Member) commented Jun 22, 2019

@Apingis I expect your reply on my comments above, and fixes for at least some of the issues. Thanks!

@Apingis (Collaborator, Author) commented Jul 2, 2019

The following subjects will be addressed in the next PR:

  • print FPGA clock settings
  • document --fork in README-ZTEX

Regarding more frequent errors with --fork: per my testing, errors do indeed seem to be more frequent (but still below the threshold where they substantially affect c/s).

I was not able to reproduce the specific case you mentioned. I'm running (with 2 boards)

./john -mask='?l?l?l?l' -format=bcrypt-ztex -verb=1 pw-fake-len4 --fork=2

many runs one after another; every time it finds all 239 guesses, with roughly 1 error per 3-5 runs.
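
For illustration, such repeated runs could be scripted along these lines (a sketch; the per-run pot file is an assumption, used so that each run starts with nothing cracked yet):

for i in 1 2 3 4 5; do
    ./john --fork=2 -mask='?l?l?l?l' -format=bcrypt-ztex -verb=1 \
        --pot=fork-test-$i.pot pw-fake-len4
done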

@solardiz (Member) commented Jul 2, 2019

Thanks @Apingis. Maybe mention in README-ZTEX that --fork might result in more frequent communication(?) errors, especially at high forked process counts.

@roycewilliams commented Aug 3, 2020

Searching for something else, I was surprised to discover this interesting --fork support work for ZTEX! (There's no mention of this in README-ZTEX at this writing.)

I did a quick test and it does provide performance gains for my cluster. I will note this in my post of the test results.

@solardiz (Member) commented Aug 3, 2020

@roycewilliams Cool. FWIW, I did mention it to you in #3807 (comment)

Looks like @Apingis never sent that planned "next PR". Royce, maybe you can take care of it - "print FPGA clock settings", "document --fork in README-ZTEX"? Or if you're uncomfortable with code changes, then just the documentation. It'd be nice to have PRs from you. ;-) Thanks!

Edit: we also need to mention this new feature in NEWS. Can be part of the same PR.
