Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Domain.join implementation #130

Merged
merged 7 commits into from Jun 8, 2017
Merged

Conversation

@edwintorok
Copy link
Contributor

@edwintorok edwintorok commented May 16, 2017

Domain.spawn now returns the pthread_t wrapped in a custom block.
Domain.join simply calls pthread_join.

pause had to be removed to avoid the thread hanging indefinetely.

edwintorok added 3 commits May 16, 2017
Changes the type signature of 'Domain.spawn'

Signed-off-by: Edwin Török <edwin@etorok.net>
Signed-off-by: Edwin Török <edwin@etorok.net>
Fixes #89

Signed-off-by: Edwin Török <edwin@etorok.net>
@stedolan
Copy link
Contributor

@stedolan stedolan commented May 16, 2017

Looks good! The call to pause is from a time when domains referred to each other's thread local variables (!), a poor design now fixed, so it can go away.

You do need some means of handling Domain.join being called multiple times, or in parallel by different domains - pthread_join has undefined behaviour if called twice. I'm not sure whether subsequent calls to Domain.join should be no-ops or raise an exception, but it should do something well-defined.

if (!th)
caml_invalid_argument("No thread");
caml_gc_log("Domain joining");
err = pthread_join(*th, NULL);

This comment has been minimized.

@stedolan

stedolan May 16, 2017
Contributor

This is a slow operation, so it needs to be in a blocking section (surrounded by caml_enter_blocking_section and caml_leave_blocking_section). Because *th can move, the dereference must be outside the blocking section. (The easiest way to do this is to write pthread_t th = *(...) instead of pthread_t *th = (...))

Raise an exception if Domain.join is attempted multiple times.
Concurrent joins would be undefined otherwise.

Store the mutex in a non-movable C allocation, because copying mutexes is
undefined.
Also wrap pthread_join with blocking section.

Signed-off-by: Edwin Török <edwin@etorok.net>
@edwintorok
Copy link
Contributor Author

@edwintorok edwintorok commented May 17, 2017

Thanks, I have added a mutex protected flag to detect multiple joins and exit early by raising an exception. Since I'm now storing a mutex inside the custom block I added another allocation (and a finalizer), I'm worried that OCaml's GC could move the custom block with the mutex in it which wouldn't be well defined.

Not sure whether I should just use caml_plat_lock/unlock instead of With_mutex, the mutex is only held for a short period of time to check a flag.

@stedolan
Copy link
Contributor

@stedolan stedolan commented May 17, 2017

Thanks, I have added a mutex protected flag to detect multiple joins and exit early by raising an exception. Since I'm now storing a mutex inside the custom block I added another allocation (and a finalizer), I'm worried that OCaml's GC could move the custom block with the mutex in it which wouldn't be well defined.

You're right, you do need a means to prevent the mutex moving. The standard idiom here is to allocate the struct domain_thread yourself, using caml_stat_alloc, and then only store a pointer to that structure in the custom block. That way, the GC can move the custom block but won't move the structure.

Not sure whether I should just use caml_plat_lock/unlock instead of With_mutex, the mutex is only held for a short period of time to check a flag.

caml_plat_lock/unlock is fine. With_mutex includes some magic to ensure that the lock is released if an exception is thrown while the mutex is held, which is necessary for some hairy code in io.c. It's not needed here.

Use caml_stat_{alloc,free} and caml_plat_{lock,unlock}.

Signed-off-by: Edwin Török <edwin@etorok.net>
@edwintorok
Copy link
Contributor Author

@edwintorok edwintorok commented May 17, 2017

Thanks, pushed a fixup commit.


caml_plat_event_init(&p.ev);

p.callback = caml_create_root(caml_promote(&domain_self->state, callback));

err = pthread_create(&th, 0, domain_thread_func, (void*)&p);
err = pthread_create(&d->th, 0, domain_thread_func, (void*)&p);

This comment has been minimized.

@stedolan

stedolan May 18, 2017
Contributor

If this returns an error, then d->th has not been initialised. The finaliser will run, and since joinable=true, it will try to pthread_detach(d->th).

I think joinable should be initialised to 0, and only set to 1 if pthread_create succeeds.

This comment has been minimized.

@edwintorok

edwintorok May 19, 2017
Author Contributor

Good point, pushed a fixup for that.

joinable should be 0 if pthread_create fails

Signed-off-by: Edwin Török <edwin@etorok.net>
Assert(dp);
d = *dp;
Assert(d);
th = d->th;

This comment has been minimized.

@stedolan

stedolan May 20, 2017
Contributor

Here, d->th is accessed without holding the mutex d->m. It's not wrong, because d->th is readonly except in the finaliser, with which no code can race. However, the locking setup isn't obvious.

You should do one of:

  • Move this line after caml_plat_lock below so that d->m protects all accesses to d, or
  • Add a comment in the definition of struct domain_thread explaining that the mutex protects only joinable but not th, and that th is immutable once initialised.

This comment has been minimized.

@edwintorok

edwintorok May 20, 2017
Author Contributor

I've added a comment, I think it is clearer that way. Otherwise it could be confusing why copying the pthread_t needed the lock, but operating on it (with pthread_join) didn't.

Add comment about locking setup.

Signed-off-by: Edwin Török <edwin@etorok.net>
@stedolan
Copy link
Contributor

@stedolan stedolan commented Jun 8, 2017

Sorry for the delay, I've been quite busy the last while. This version looks good, thanks!

@stedolan stedolan merged commit b6182f7 into ocaml-multicore:master Jun 8, 2017
1 check passed
1 check passed
continuous-integration/travis-ci/pr The Travis CI build passed
Details
@edwintorok edwintorok deleted the edwintorok:domainjoin branch Jun 8, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked issues

Successfully merging this pull request may close these issues.

None yet

2 participants
You can’t perform that action at this time.