Shared mutable state is not thread safe. #663

Closed

ioquatix opened this issue May 4, 2020 · 12 comments

@ioquatix

ioquatix commented May 4, 2020

DISPATCHER = Hash.new {|h, k| h[k] = "convert_#{k}" }
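
To make the hazard concrete, a minimal sketch (not from the issue itself; the thread count and key names are invented for illustration). Several threads can trigger the default block at once, so the hash is mutated while other threads are reading it:

threads = 8.times.map do |i|
  Thread.new do
    # Each miss runs the default block and writes to the shared DISPATCHER
    # hash while other threads may be reading or resizing it.
    1_000.times {|j| DISPATCHER[:"el_#{i}_#{j}"] }
  end
end
threads.each(&:join)
# CRuby's GVL masks most interleavings; on JRuby or TruffleRuby concurrent
# mutation of a plain Hash is unsupported and can raise.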

gettalong self-assigned this May 4, 2020
@gettalong
Owner

I'm not that well versed in multi-threading, but can it be a problem in this case? The worst case is that the key is set twice (or multiple times) to the same value, isn't it?

@ioquatix
Author

ioquatix commented May 5, 2020

I did a live stream about this: https://www.youtube.com/watch?v=Wu9LRNOc5pQ

Yes, it can be a problem.

I would suggest:

  1. Benchmark with and without the cache: does it provide a statistically significant improvement?
  2. If the cache is valuable, pre-generate the hash table and freeze it (a sketch follows below).
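
For suggestion 2, a minimal sketch, assuming the set of element types is known up front (the TYPES list below is invented for illustration; kramdown's real set is larger):

TYPES = %i[root p text header codespan codeblock blockquote]
# Fully populated and frozen before any thread uses it, so lookups never
# mutate shared state.
DISPATCHER = TYPES.to_h {|type| [type, :"convert_#{type}"] }.freeze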

@gettalong
Owner

Thanks for the live stream - you make it very clear 👍

There is a significant difference in performance, probably mostly due to fewer objects being created and less garbage collection.

Here is a YAML file for benchmark-driver:

prelude: |
  hash = {:root=>1, :hr=>1, :p=>315, :text=>631, :typographic_sym=>21, :html_element=>1, :blank=>387, :header=>45, :strong=>5, :smart_quote=>24, :ul=>21, :li=>74, :a=>59, :em=>15, :dl=>4, :dt=>14, :dd=>14, :codespan=>88, :codeblock=>94, :blockquote=>29, :math=>1}
  dispatcher = Hash.new {|h, k| h[k] = "convert_#{k}" }
  mutex = Mutex.new
  mutex_dispatcher = Hash.new {|h, k| mutex.synchronize { h[k] = "convert_#{k}" } }


benchmark:
  direct: |
    hash.each {|name, calls| calls.times { "convert_#{name}" } }

  hash: |
    hash.each {|name, calls| calls.times { dispatcher[name] } }

  hash with mutex: |
    hash.each {|name, calls| calls.times { mutex_dispatcher[name] } }
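
(Assuming the file is saved as, say, dispatcher.yml, benchmark-driver runs it with: benchmark-driver dispatcher.yml)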

Here is the result on Ruby 2.6.5:

Warming up --------------------------------------
              direct     3.569k i/s -      3.894k times in 1.090981s (280.17μs/i)
                hash    12.532k i/s -     13.772k times in 1.098990s (79.80μs/i)
     hash with mutex    12.308k i/s -     12.710k times in 1.032700s (81.25μs/i)
Calculating -------------------------------------
              direct     3.523k i/s -     10.707k times in 3.039080s (283.84μs/i)
                hash    12.879k i/s -     37.594k times in 2.918966s (77.64μs/i)
     hash with mutex    12.080k i/s -     36.922k times in 3.056406s (82.78μs/i)

Comparison:
                hash:     12879.2 i/s
     hash with mutex:     12080.2 i/s - 1.07x  slower
              direct:      3523.1 i/s - 3.66x  slower

The contents of the hash variable have been extracted from a kramdown run on doc/syntax.page and show the number of calls for each element type of the AST.

The mutex_dispatcher variant should avoid the thread safety problems and still be fast.

@ioquatix
Author

ioquatix commented May 6, 2020

I like that benchmark, but can you also try it in the context of parsing a real document?

There is one more point about global state: it's never garbage collected. If someone processes a kramdown document, that state will now exist for the lifetime of the process, not the lifetime of the document.

If you used a per-document local instance instead, you wouldn't need a mutex, and the cache would only exist for the lifetime of the document. At least worth considering.
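
A rough sketch of the per-document variant (class and method names are invented for illustration; kramdown's real converter API differs):

class Converter
  def initialize
    # One cache per converter instance: nothing is shared across threads
    # (as long as an instance itself isn't shared between threads), and the
    # cache is garbage collected together with the instance.
    @dispatcher = Hash.new {|h, k| h[k] = "convert_#{k}" }
  end

  def dispatch(el, indent)
    send(@dispatcher[el.type], el, indent)
  end
end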

Again, I'd opt for simple designs and only build more elaborate ones (with caches like this) when the cost of the implementation is outweighed by the value of the performance improvement. That's obviously subjective, but just for fun, why don't you repeat the benchmark with a real document, e.g. a README from the project, and see if it actually makes more than a few points of difference?

@gettalong
Owner

Thanks for the feedback!

I have added an additional test where the hash is initialized each time:
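
Presumably the new entry looked roughly like this (a reconstruction, not the exact code used):

  hash each time: |
    fresh = Hash.new {|h, k| h[k] = "convert_#{k}" }
    hash.each {|name, calls| calls.times { fresh[name] } }

The results: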

Warming up --------------------------------------
              direct     3.556k i/s -      3.905k times in 1.098093s (281.20μs/i)
                hash    12.940k i/s -     13.860k times in 1.071086s (77.28μs/i)
      hash each time    11.203k i/s -     11.770k times in 1.050587s (89.26μs/i)
     hash with mutex    11.400k i/s -     12.419k times in 1.089369s (87.72μs/i)
Calculating -------------------------------------
              direct     3.604k i/s -     10.668k times in 2.959820s (277.45μs/i)
                hash    12.770k i/s -     38.820k times in 3.039882s (78.31μs/i)
      hash each time    11.837k i/s -     33.609k times in 2.839294s (84.48μs/i)
     hash with mutex    11.291k i/s -     34.200k times in 3.029001s (88.57μs/i)

Comparison:
                hash:     12770.2 i/s
      hash each time:     11837.1 i/s - 1.08x  slower
     hash with mutex:     11290.9 i/s - 1.13x  slower
              direct:      3604.3 i/s - 3.54x  slower

So putting the dispatcher hash into the converter instance will create a few more objects and be a bit slower, but not by much.

Running the benchmark by parsing doc/syntax.page in the prelude part and then running doc.to_html in the benchmark part:
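
Presumably along these lines (a reconstruction; Kramdown::Document and #to_html are kramdown's public API, everything else is assumed):

prelude: |
  require 'kramdown'
  doc = Kramdown::Document.new(File.read('doc/syntax.page'))

benchmark:
  convert: 'doc.to_html'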

Without dispatcher:

Warming up --------------------------------------
             convert          299.481 i/s -     315.000 times in 1.051819s (3.34ms/i)
Calculating -------------------------------------
             convert          284.929 i/s -     898.000 times in 3.151662s (3.51ms/i)

With dispatcher:

Warming up --------------------------------------
             convert          319.037 i/s -     330.000 times in 1.034361s (3.13ms/i)
Calculating -------------------------------------
             convert          314.643 i/s -     957.000 times in 3.041544s (3.18ms/i)

So in this case it is also clearly faster with the dispatcher.

I have also done another test using benchmark/benchmark.sh, once with the current version and once with a version where the dispatch hash was removed. The document used was doc/syntax.page (ignore the mention of mdbasics.text):

[Graph: benchmark.sh results on Ruby 2.7.1, current version vs. version with the dispatch hash removed]

The version with the dispatch hash removed is clearly slower for larger documents.

To sum up: the version with the dispatcher is faster, and the variant with a local (per-instance) dispatcher is the way to go to avoid the thread safety issue.

@eregon

eregon commented May 8, 2020

Just reading this and noticed the solution:

@dispatcher = Hash.new {|h, k| h[k] = "convert_#{k}" }

Looks nice. One thought: it might be interesting to use Symbols there, like:

@dispatcher = Hash.new {|h, k| h[k] = :"convert_#{k}" }

as that would deduplicate the various convert_* internally.
That deduplication costs an internal lookup though, so it's not clear if it would be a gain or loss overall.
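
For illustration, the deduplication can be seen directly; dynamically created Symbols with the same name are interned to one object, while equal Strings are usually distinct objects:

:"convert_#{'text'}".equal?(:"convert_#{'text'}")  # => true, one interned Symbol
"convert_#{'text'}".equal?("convert_#{'text'}")    # => false, two String objects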

@gettalong
Owner

Ah, right, since #send expects a Symbol, we could just store the symbol and avoid the conversion on the line

send(@dispatcher[el.type], el, indent)

@eregon

eregon commented May 8, 2020

Right, send accepts both, but it probably needs an additional conversion to Symbol/ID, at least on MRI.

@gettalong
Owner

A small benchmark for confirmation:

prelude: |
  def code
  end

  symbol = :code
  string = 'code'

benchmark:
  string: 'send(string)'
  symbol: 'send(symbol)'

Result on Ruby 2.6.5:

Warming up --------------------------------------
              string    10.657M i/s -     10.783M times in 1.011837s (93.84ns/i, 260clocks/i)
              symbol    19.148M i/s -     19.548M times in 1.020899s (52.23ns/i, 186clocks/i)
Calculating -------------------------------------
              string    11.974M i/s -     31.970M times in 2.669933s (83.51ns/i, 294clocks/i)
              symbol    25.945M i/s -     57.444M times in 2.214046s (38.54ns/i, 135clocks/i)

Comparison:
              symbol:  25945093.1 i/s
              string:  11973921.4 i/s - 2.17x  slower

So, yes, the symbol is about twice as fast; I will change the code. Thanks @eregon!

Side note: much of kramdown was started in the Ruby 1.8 era, so there are probably many other places where the code could be updated to newer standards...

gettalong reopened this May 8, 2020
@gettalong
Owner

Done.

@eregon

eregon commented May 9, 2020

@gettalong I don't see the change on master, maybe just a missing git push?

@gettalong
Owner

Ah, yes, now it's there!
