Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make AtomTable consurrency safe #1980

Merged
merged 11 commits into from Sep 5, 2023
Merged

make AtomTable consurrency safe #1980

merged 11 commits into from Sep 5, 2023

Conversation

Skgland
Copy link
Contributor

@Skgland Skgland commented Aug 27, 2023

This makes it possible to use multiple machines at the same time and removes one obstacle for moving a Machine to a different thread.

Note: currently searching for the reason that all integration test appear deadlock immediatly, unittest already work.
Update: Found the deadlock, now investigating the slow down, though its probably just a bunch of RwLock locking that can't be easily remedied.

@Skgland
Copy link
Contributor Author

Skgland commented Aug 27, 2023

Ah, the joys of yaml, bviously 1.70 is the string "1.7" the zero is definitly unimportant 🙄.

src/atom_table.rs Outdated Show resolved Hide resolved
@Skgland
Copy link
Contributor Author

Skgland commented Aug 27, 2023

All build-test CI jobs appear to take ~7 Minutes longer with these changes.
I am currently running flamegraph to see if there is some hotspot that can be solved easily,
though I am not optimistic that there is.

@triska
Copy link
Contributor

triska commented Aug 27, 2023

Incredible contribution, thank you a lot for working on this!

A remaining problematic aspect is currently a severe slowdown also in other applications (in addition to the CI tests). For instance, with these changes applied, just launching Scryer already takes a few seconds on my machine:

$ time scryer-prolog -g halt

real	0m5.283s
user	0m5.268s
sys	0m6.425s

That's a slowdown of roughly 2 orders of magnitude compared to master, where this finishes within fractions of a second. Other examples, such as Sudoku, even incur a 400-fold performance penalty with these changes applied. It would be awesome to get all these improvements while retaining at least roughly the same performance. Especially with only a single thread running, the synchronization overhead should ideally be negligible.

@mthom
Copy link
Owner

mthom commented Aug 27, 2023

This paper may be helpful in improving performance: https://arxiv.org/pdf/1608.00989.pdf

@Skgland
Copy link
Contributor Author

Skgland commented Aug 27, 2023

This paper may be helpful in improving performance: https://arxiv.org/pdf/1608.00989.pdf

Yeah, I think something like that will be needed, I was hoping that adding the lock wouldn't impackt performance too much especially in the singel threaded case, but that appears to be not the case.

I don't think I will get to that durring the week, so maybe next weekend.

Flamegraph (cargo flamegraph --test scryer) is currently still running and processing the 31.3 GB of the perf.data it generated from running all test. Maybe it will finish till I wake up tomorrow.

@Skgland
Copy link
Contributor Author

Skgland commented Aug 28, 2023

@mthom @triska Is there a reason why every type containing a RawBlock deallocates calls RawBlock::deallocate manually in its Drop impl rather than RawBlock implementing Drop and doing that there?

@Skgland
Copy link
Contributor Author

Skgland commented Aug 28, 2023

Also I think F64Table is currently also not concurrency safe as it also uses a global in the same way AtomTable does/did

@mthom
Copy link
Owner

mthom commented Aug 28, 2023

Is there a reason why every type containing a RawBlock deallocates calls RawBlock::deallocate manually in its Drop impl rather
than RawBlock implementing Drop and doing that there?

It was to allow the Drop instances of the stack to manually call the HeapCellValue destructor on the HeapCellValue stack registers back when that was needed. You can remove it. RawBlock is deprecated within the GC development branch I'm current working on.

@Skgland
Copy link
Contributor Author

Skgland commented Aug 28, 2023

With my latest changes I am back to more reasonable times

$ time ./target/release/scryer-prolog -g halt
./target/release/scryer-prolog -g halt  0,09s user 0,00s system 101% cpu 0,084 total

though as I mentioned I think F64Table looks like it also needs to be fixed to support concurrency.

I think I saw another global somewhere, that should maybe also be checked if it is correct under concurrency.

@triska
Copy link
Contributor

triska commented Aug 29, 2023

Nice, the speed looks a lot better now!

I measured between 20% and 40% slowdown in several benchmarks. I think that's quite acceptable, if in return we get the ability to run multiple threads which is very exciting! It would still be nice if this could be sped up further eventually, especially if only a single thread is running.

@Skgland
Copy link
Contributor Author

Skgland commented Sep 2, 2023

Ok, I think I am done now.
AtomTable and F64Table should now be safe for concurrent use.

F64Ptr looks a bit dangerous with the Deref and DerefMut impl it appears rather easy to produce alisasing references. Though as that was already the case I will leave it as is.

The remainig statics appear mostly fine, though some could probably be const instead of static.
The only exception are the statics in Machine::run_cleaners I am not sure that caching them between different Machines is necessarily valid and I don't knwo how I would check that. So I will leave this for someone else to check/verify and potentially fix.

@Skgland Skgland marked this pull request as ready for review September 2, 2023 20:38
@Skgland
Copy link
Contributor Author

Skgland commented Sep 2, 2023

rebased to resolve conflicts

@Skgland
Copy link
Contributor Author

Skgland commented Sep 2, 2023

PR vs. master

CI looks mixed compared to latest master ci times.
windows, mac and logtalk have improved slightly (½-1 minute).
ubuntu has one neutral and four that are slower by (1½-2 minutes).

@triska
Copy link
Contributor

triska commented Sep 3, 2023

The most important aspect is that this now even works correctly at all! As long as the speed is even somewhat acceptable, this is an awesome new feature, thank you a lot! I hope you plan to attend the Scryer meetup in November? Please also consider talking about these improvements there, they are very interesting!

@Skgland
Copy link
Contributor Author

Skgland commented Sep 3, 2023

I won't be able to attend as I have an important event at work that week (Tuesday to Friday) for which I am indisposable.

@Skgland
Copy link
Contributor Author

Skgland commented Sep 5, 2023

rebased again to resolve conflicts
apparently I did rush the rebase too much

there are technically still two locks
- the Mutex to serialize consurrent updates to the AtomTable
- the RwLock as part of GLOBAL_ATOM_TABLE

the former is the age lock of RESIZE ATOM TABLE in <https://arxiv.org/pdf/1608.00989.pdf>, though we use it for the whole update as we don't match the whole structure and can't reserve an atom slot as is, so we need to also lock concurrent updates without resizing

the later should be uncontendet as we only write AtomTable::new iff the current value is a dangling Weak
@Skgland
Copy link
Contributor Author

Skgland commented Sep 5, 2023

rebased again to fix the & that went missing in the rebase

@mthom mthom merged commit a9aef2b into mthom:master Sep 5, 2023
9 checks passed
@Skgland Skgland deleted the atomtable branch September 5, 2023 20:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants