feature: show time spent - setup, guessing, total (human and machine readable) #1209

Open
roycewilliams opened this Issue Mar 26, 2017 · 2 comments

Contributor

roycewilliams commented Mar 26, 2017

A recent forum post was interested in benchmarking wordlist mode.

I wanted to show that hashcat's guessing speed is so fast that it can often spend more time caching a large dictionary than guessing. To show this, I calculated the relative times by hand.

I then realized that it might be useful to show this all the time. This would support similar analysis on an ongoing basis.

For example (using crackstation for its size, not for its quality), hashcat could add more timing information in both human- and machine-readable forms. The output below uses the real times that I calculated, with the new information in bold:

$ echo -n 'hashcathashcat' | md5sum | awk '{print $1}'
38f63c3a621ad108457843529feec46a

$ ./hashcat -m 0 -a 0 -w 4 38f63c3a621ad108457843529feec46a crackstation.txt
hashcat (v3.40-74-g368f8b39) starting...

[snip showing interim progress]

Dictionary cache building crackstation.txt: 12822195310 bytes (81.69%)

[snip]

Dictionary cache built:

  • Filename..: crackstation.txt
  • Passwords.: 1212356398
  • Bytes.....: 15696118781
  • Keyspace..: 1196843344
  • Time......: 143s (2 mins, 23 secs)

[s]tatus [p]ause [r]esume [b]ypass [c]heckpoint [q]uit =>

Session..........: hashcat
Status...........: Exhausted
Hash.Type........: MD5
Hash.Target......: 38f63c3a621ad108457843529feec46a
Time.Started.....: Sun Mar 26 06:38:05 2017 (1 min, 6 secs)
Time.Estimated...: Sun Mar 26 06:39:11 2017 (0 secs)

[snip]

Started: Sun Mar 26 06:34:53 2017 (epoch: 1490510093)
Stopped: Sun Mar 26 06:39:14 2017 (epoch: 1490510354)

Dictionary caching: 143s (54.79% - 2 mins, 23 secs)
Other setup: 52s (19.92% - 52 secs)
Total setup: 195s (74.71% - 3 mins, 15 secs)
Guessing: 66s (25.29% - 1 min, 6 secs)
Total time: 261s (4 mins, 21 secs)

For a simple implementation, a plain difference in wall-clock time could be used. The "Time.Started" status line already uses a routine that converts elapsed time into hours, minutes, seconds, etc. This routine could be reused.
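
To illustrate the kind of output proposed above, here is a minimal Python sketch of the formatting step. It is not hashcat's existing routine; `human_duration` and `timing_line` are hypothetical names made up for this example, and the sketch assumes whole-second durations.

```python
def human_duration(seconds):
    """Render whole seconds as e.g. '4 mins, 21 secs' (hypothetical helper)."""
    units = (("day", 86400), ("hour", 3600), ("min", 60), ("sec", 1))
    parts = []
    remaining = int(seconds)
    for name, size in units:
        count, remaining = divmod(remaining, size)
        if count:
            # Singular for exactly 1, plural otherwise.
            parts.append(f"{count} {name}{'' if count == 1 else 's'}")
    return ", ".join(parts) if parts else "0 secs"


def timing_line(label, seconds, total=None):
    """Build one summary line; the percentage is shown only when a total is given."""
    text = f"{label}: {seconds}s ("
    if total:
        text += f"{100 * seconds / total:.2f}% - "
    return text + human_duration(seconds) + ")"


# Figures taken from the example run in this issue.
print(timing_line("Dictionary caching", 143, 261))
print(timing_line("Guessing", 66, 261))
print(timing_line("Total time", 261))
```

The raw seconds stay first on each line so that scripts can parse them, with the percentage and the human-readable form in parentheses.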

As a nice-to-have, if the user pauses, or quits and restores, the time spent stopped would need to be deducted.
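
A rough Python sketch of the pause deduction, assuming a monotonic clock and a hypothetical `PausableTimer` class (nothing here is taken from hashcat's code). It only covers pause and resume within one process; handling quit and restore would additionally require saving the accumulated active time somewhere, which this sketch does not attempt.

```python
import time


class PausableTimer:
    """Tracks elapsed time while excluding paused intervals (hypothetical sketch)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock          # injectable so the timer can be tested
        self._started = clock()
        self._paused_at = None       # set while a pause is in progress
        self._paused_total = 0.0     # sum of all completed pauses

    def pause(self):
        if self._paused_at is None:
            self._paused_at = self._clock()

    def resume(self):
        if self._paused_at is not None:
            self._paused_total += self._clock() - self._paused_at
            self._paused_at = None

    def elapsed(self):
        """Wall-clock time since start, minus all time spent paused."""
        now = self._clock()
        paused = self._paused_total
        if self._paused_at is not None:
            paused += now - self._paused_at
        return now - self._started - paused
```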

Showing the epoch is also a nice-to-have, to help people quickly calculate times relative to other activities outside of this specific run. But I'd be happy to skip that.

The fields could be rearranged, or the times could be reordered, if there is a better way.

@jsteube jsteube added the new feature label Mar 28, 2017

jsteube May 23, 2017

Member

While I was trying to implement this, I ran into two major problems that need to be discussed first.

  • The first problem is that hashcat is also available as a library. That means that after hashcat_init() is executed (only once), an unknown number of calls to hashcat_session_init() follow. The reason is that hashcat_init() initializes, for example, the OpenCL subsystem, which doesn't change while the calling process lives but can take some time to initialize, depending on the devices and runtimes. Most of the time is of course spent in hashcat_session_init(), but I wanted to show that there are different layer types that can each take some time to execute, and the given example output does not reflect that.

  • The second problem is that hashcat (both binary and library) supports queues, which are used whenever the user chooses wordlist folders, mask files, the -i switch, etc. This adds another 4 layers to the system (init, outer loop, inner loop 1, inner loop 2). Each of those layers can call time-consuming functions. For example, the outer loop initializes the hashes themselves and everything that depends on the hashes, like potfile comparisons and bitmaps, OpenCL kernels and the weak hash check. Inner loop 2 runs the autotune and the final cracking loop. The given example output does not reflect that either.
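
One way to picture per-layer accounting is the Python sketch below. It is not hashcat's code: `LayerTimes` is a hypothetical helper, and the layer names are only borrowed from the description above. Each layer accumulates its own total across repeated calls, and an outer layer's total includes the time of the layers nested inside it.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class LayerTimes:
    """Accumulates inclusive wall-clock time per named layer (hypothetical sketch)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self.totals = defaultdict(float)  # layer name -> total seconds
        self.calls = defaultdict(int)     # layer name -> number of entries

    @contextmanager
    def layer(self, name):
        start = self._clock()
        try:
            yield
        finally:
            # Recorded even if the layer's body raises.
            self.totals[name] += self._clock() - start
            self.calls[name] += 1


times = LayerTimes()
with times.layer("init"):                 # done once per process
    pass
for _ in range(2):                        # e.g. two queued wordlists
    with times.layer("outer loop"):
        with times.layer("inner loop 2"):
            pass                          # autotune and cracking would run here
print(dict(times.calls))
```

With totals kept per layer, a summary could report one-time initialization separately from per-session and per-queue-entry costs instead of a single "setup" figure.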

And then there's another minor question: Do we want to export the values to hashcat_status_t or just to status_ctx_t? The hashcat_status_t is the "normalized" structure that can be used from the user API but it really creates more work.

@roycewilliams The original request needs to be rewritten with respect to the different layers. Any idea?

roycewilliams May 23, 2017

Contributor

@jsteube - First of all, thanks for considering this!

OK, I can't pretend to understand the underlying complexity. :) I'm fine with you changing the format from the original request into whatever works better. If we can skip some complexity of the layers in this pass, I would say go for it.

Not knowing any better, I assume that we can easily know these things:

  • Literal start and stop time of hashcat itself
  • Literal start and stop time of dictionary caching
  • Literal start and stop time of guessing

Naively, from these, I would think that we can calculate all of the values I was curious about:

Dictionary caching = dict stop - dict start
Other setup = total setup - dictionary caching
Total setup = guessing start - literal start
Guessing = guessing stop - guessing start
Total time = literal stop - literal start
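
As a quick check of these formulas, here is a small Python sketch that takes total setup as guessing start minus literal start, so the value is positive. The `Timestamps` class and `breakdown` function are hypothetical names. The run start and stop epochs are the ones from the example output in this issue; the dictionary and guessing timestamps were not recorded there, so the offsets used below are illustrative assumptions chosen to reproduce the hand-calculated figures.

```python
from dataclasses import dataclass


@dataclass
class Timestamps:
    """Epoch seconds for the three start/stop pairs (hypothetical structure)."""
    run_start: int
    run_stop: int
    dict_start: int
    dict_stop: int
    guess_start: int
    guess_stop: int


def breakdown(ts):
    """Derive the five reported durations from the three start/stop pairs."""
    dict_caching = ts.dict_stop - ts.dict_start
    total_setup = ts.guess_start - ts.run_start
    return {
        "dictionary_caching": dict_caching,
        "other_setup": total_setup - dict_caching,
        "total_setup": total_setup,
        "guessing": ts.guess_stop - ts.guess_start,
        "total_time": ts.run_stop - ts.run_start,
    }


run_start, run_stop = 1490510093, 1490510354  # epochs from the example run
example = Timestamps(
    run_start=run_start,
    run_stop=run_stop,
    dict_start=run_start + 52,    # assumed offset, not a measured value
    dict_stop=run_start + 195,    # assumed offset, not a measured value
    guess_start=run_start + 195,  # assumed offset, not a measured value
    guess_stop=run_stop,          # assumes guessing ends when the run stops
)
print(breakdown(example))
```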

As to how to export the values, is the additional work for using hashcat_status_t complex work, or just boring work? If it's boring work, I would expect that it would be useful to have this info from the API, and it would be worth the cost.

I trust your judgment on the right balance between functionality and effort.
