Rhizome parallelization #68

Open
wants to merge 30 commits into
from

Conversation

2 participants
@rom1v
Contributor

rom1v commented Aug 11, 2013

I recently found that mdp packets and Rhizome stuff were executed on the same thread. Consequently, audio communications were broken during Rhizome synchronization.

I propose some code that runs all the Rhizome stuff in a separate POSIX thread (only in overlay mode; it is not necessary elsewhere).

With this code, I successfully exchanged Rhizome bundles without impacting audio communications.

Implementation

Here are the principles of my implementation (for more information, read the messages of the first commits).

The main idea is to use two fdqueue instances: one "main" and one "rhizome". In overlay mode, a new thread is started, which calls fd_poll() on the rhizome fdqueue.

An alarm (sched_ent) now has an fdqueue field, specifying on which fdqueue the alarm must be scheduled/watched. Thus, the main thread can schedule an alarm to be executed on the rhizome thread (and vice versa).

Of course, the alarm data (its context) cannot be stack-stored (there is one stack per thread), so these alarms must be allocated dynamically, which implies quite a lot of changes. Commits with a message beginning with "Alarm" create a wrapper to be scheduled on an fdqueue. Commits with a message beginning with "Schedule" allocate parameters (if needed) and schedule an alarm to be executed on the other thread.
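To make the pattern concrete, here is a minimal sketch of one such pair (all names and fields are illustrative, not the actual code of the branch; error handling omitted): the "Schedule" side heap-allocates the argument structure and posts an "Alarm" wrapper on the rhizome fdqueue; the wrapper unpacks the context, does the work, then frees everything.

    /* Illustrative sketch only; the real names in the branch differ. */
    struct send_block_args {
      struct subscriber *dest;
      uint64_t offset;
    };

    /* "Alarm" wrapper, executed on the rhizome thread. */
    static void rhizome_send_block_alarm(struct sched_ent *alarm)
    {
      struct send_block_args *args = alarm->context;
      rhizome_mdp_send_block(args->dest, args->offset); /* the real work */
      free(args);
      free(alarm); /* both were heap-allocated by the scheduling side */
    }

    /* "Schedule" side, executed on the main thread: the context cannot live
     * on this thread's stack, so it is copied to the heap before hand-over. */
    static void schedule_send_block(struct subscriber *dest, uint64_t offset)
    {
      struct send_block_args *args = malloc(sizeof *args);
      args->dest = dest;
      args->offset = offset;
      schedule_alarm(&rhizome_fdqueue, rhizome_send_block_alarm, args);
    }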

Tests

All tests pass, except FileTransferUnreliableBigMDP.
This test sometimes fails even without my changes. If I change the file size from 1MB to 500kB (in setup_bugfile_common() in tests/rhizomeprotocol), it passes.

rom1v added some commits Aug 11, 2013

Parallel tools
Implement rhizome thread function (which consumes its own fdqueue).

Add a util function that schedules a new (allocated) alarm from only 3 parameters:
 - the function to execute;
 - its parameter (void *);
 - the fdqueue to use.

Of course, the parameter of the function must not be stored on the call stack (otherwise it will no longer exist when the other thread uses it).
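A rough sketch of what this helper could look like (the helper's name and the sched_ent fields in the branch may differ):

    /* Allocate a sched_ent, attach the caller's function and argument,
     * and schedule it on the given fdqueue for immediate execution. */
    struct sched_ent *schedule_alarm(struct fdqueue *fdq,
                                     void (*function)(struct sched_ent *),
                                     void *arg)
    {
      struct sched_ent *alarm = calloc(1, sizeof *alarm);
      if (!alarm)
        return NULL;
      alarm->function = function;
      alarm->context = arg;          /* must NOT point into the caller's stack */
      alarm->fdqueue = fdq;          /* the new field selecting the target thread */
      alarm->alarm = gettime_ms();   /* run as soon as possible */
      alarm->deadline = alarm->alarm;
      schedule(alarm);
      return alarm;
    }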
Interruptible poll()
When poll() is waiting for events on watched file descriptors, it should
not block any thread wanting to (un)schedule an alarm.

But the mutex cannot be released while executing poll(), because the
watched file descriptors must not be modified (or closed!) by another
thread.

Therefore, poll() now waits for one more file descriptor, a pipe, to
become ready.

Before acquiring a mutex, fdqueue functions write 1 byte to the pipe.
Once the mutex is acquired, they read 1 byte. Thus, poll() is guaranteed
to be non-blocking when a thread waits for the mutex.

If the pipe is ready for I/O, then fd_poll() releases the mutex for 1 ms.
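A condensed sketch of this locking scheme (error handling omitted, names approximate):

    /* Waking side: called by fdqueue functions before taking the lock. */
    static void fdqueue_lock(struct fdqueue *fdq)
    {
      uint8_t b = 1;
      write(fdq->wake_pipe[1], &b, 1);  /* makes poll() return immediately */
      pthread_mutex_lock(&fdq->mutex);
      read(fdq->wake_pipe[0], &b, 1);   /* drain our own wake-up byte */
    }

    /* Polling side, inside fd_poll(), with fdq->mutex held: */
    if (fds[wake_index].revents & POLLIN) {
      /* another thread wants to (un)schedule an alarm: let it in */
      pthread_mutex_unlock(&fdq->mutex);
      usleep(1000);                     /* the 1 ms window mentioned above */
      pthread_mutex_lock(&fdq->mutex);
    }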
Multiple fdqueues support
The purpose is to execute rhizome actions on a thread different from
that handling routing and MDP stuff, so that a long rhizome action does
not jam the communications.

This first step adds a "struct fdqueue" type with all the fields
specific to an fdqueue. Two instances are created: main_fdqueue and
rhizome_fdqueue.

All public fdqueue functions now take an fdqueue argument. They are
thread-safe.

fd_poll() uses synchronization primitives for waiting until the next event, so that the wait can be interrupted to execute a new action earlier.
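Roughly, the resulting type could look as follows (field list approximate; the real struct gathers the former globals of fdqueue.c, plus the performance_timing state mentioned in a later commit):

    struct fdqueue {
      pthread_mutex_t mutex;              /* protects everything below */
      int wake_pipe[2];                   /* self-pipe used to interrupt poll() */
      struct pollfd fds[MAX_WATCHED_FDS];
      struct sched_ent *fd_callbacks[MAX_WATCHED_FDS];
      int fdcount;
      struct sched_ent *next_alarm;       /* time-ordered alarm list */
      struct sched_ent *next_deadline;
      /* ... per-queue performance_timing state ... */
    };

    struct fdqueue main_fdqueue;
    struct fdqueue rhizome_fdqueue;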
Log fd workaround
When running tests, if no trace is logged before fdqueues_init(), then fd 0 will not be taken for the logfile.

This commit adds a dummy trace.
Schedule mdp_send_block in service_rhizomerequest
Schedule rhizome_mdp_send_block in overlay_mdp_service_rhizomerequest.
Performance_timing multithreading support
As performance_timing.c was designed to be single-threaded, its global variables have been moved into the fdqueue structure, and an fdqueue parameter has been added to all its functions.
Schedule mdp_dispatch in mdp_requestblocks
Schedule overlay_mdp_dispatch_alarm in rhizome_fetch_mdp_requestblocks.
Schedule payload_enqueue in rhizome_advertise
Schedule overlay_payload_enqueue_alarm in overlay_rhizome_advertise.
Alarm for overlay_mdp_dispatch
Define overlay_mdp_dispatch_alarm, a wrapper for overlay_mdp_dispatch to
be scheduled.
Alarm for overlay_rhizome_saw_advertisements
Define overlay_rhizome_saw_advertisements_alarm, a wrapper for
overlay_rhizome_saw_advertisements to be scheduled.
Multithread flag
Multithreading is needed only in overlay mode.

Add a flag enabling use of the Rhizome fdqueue. If disabled, always use
the main fdqueue.
Enable multithreading for overlay mode
Enable multithreading and start a Rhizome thread in overlay mode.
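For illustration, the start-up could be as simple as this (function names hypothetical):

    /* Thread body: drain the rhizome fdqueue until shutdown. */
    static void *rhizome_thread_main(void *arg)
    {
      struct fdqueue *fdq = arg;
      while (!servalShutdown)
        fd_poll(fdq);
      return NULL;
    }

    /* In overlay mode only, after fdqueues_init(): */
    pthread_t rhizome_thread;
    pthread_create(&rhizome_thread, NULL, rhizome_thread_main, &rhizome_fdqueue);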
Schedule mdp_dispatch in rhizome_sync_send_req
Schedule overlay_mdp_dispatch_alarm in rhizome_sync_send_requests.
Alarm for overlay_payload_enqueue
Define overlay_payload_enqueue_alarm, a wrapper for
overlay_payload_enqueue to be scheduled.
Schedule saw_adv in process_incoming_frame
Schedule rhizome_saw_advertisements_alarm in process_incoming_frame.
Schedule mdp_dispatch in mdp_send_block
Schedule overlay_mdp_dispatch_alarm in rhizome_mdp_send_block.
Schedule receive_content in service_rhizome_resp
Schedule rhizome_receive_content_alarm in
overlay_mdp_service_rhizome_response.
Alarm for rhizome_received_content
Define rhizome_received_content_alarm, a wrapper for
rhizome_received_content to be scheduled.
Schedule mdp_dispatch in sync_send_response
Schedule overlay_mdp_dispatch in sync_send_response.
Schedule mdp_dispatch in rhizome_saw_adv
Schedule overlay_mdp_dispatch_alarm in overlay_rhizome_saw_advertisements.
Alarm for rhizome_mdp_send_block
Define rhizome_mdp_send_block_alarm, a wrapper for
rhizome_mdp_send_block to be scheduled.
Schedule mdp_dispatch in rhizome_sync_request
Schedule overlay_mdp_dispatch_alarm in rhizome_sync_request.
Schedule sync_send_req in service_rhizome_sync
Schedule rhizome_sync_send_requests_alarm in
overlay_mdp_service_rhizome_sync.
Schedule rhizome_retr_adv in service_manifest_resp
Schedule rhizome_retrieve_and_advertise_manifest_alarm in
overlay_mdp_service_manifest_response.
Alarm for rhizome_advertise_manifest
Define rhizome_advertise_manifest_alarm, a wrapper for
rhizome_advertise_manifest to be scheduled.
Alarm for rhizome_sync_send_requests
Define rhizome_sync_send_requests_alarm, a wrapper for
rhizome_sync_send_requests to be scheduled.
Alarm for rhizome_retrieve_and_advertise_manifest
Define rhizome_retrieve_and_advertise_manifest_alarm, a wrapper for
rhizome_retrieve_and_advertise_manifest to be scheduled.
Extract retrieve+advertise manifest function
Code refactor paving the way to schedule retrieve+advertise manifest.
@lakeman


lakeman Aug 12, 2013

We're going to need a better multi-threaded fix for this.
There's a chicken-and-egg problem with loading config, logging config errors, and using config to start writing to a log file. Plus with config affecting log levels, and timed log rotation, there are a number of things that could go wrong.
For now it might be safest to disable logging completely in the rhizome thread if no log file has been opened.


@lakeman


lakeman Aug 12, 2013

overlay_mdp_dispatch always duplicates the source packet and creates an overlay_frame. It would be simpler to run overlay_mdp_dispatch in either thread, and call overlay_payload_enqueue_alarm if required, passing the new frame to be queued: "struct overlay_frame *frame = alarm->context;"


@rom1v

rom1v Aug 12, 2013

Owner

overlay_mdp_dispatch always duplicates the source packet and creates an overlay_frame. It would be simpler to run overlay_mdp_dispatch in either thread.

Some code in overlay_mdp_dispatch needs to be executed in the main thread, because it reads/writes some global variables without synchronization: calls to find_subscriber(), overlay_mdp_check_binding(), overlay_saw_mdp_frame().

Keeping the routing and MDP stuff serialized (i.e. handled by only one thread) is, in my opinion, very important for simplicity, reliability and probably performance (it avoids useless synchronization).


@lakeman

lakeman Aug 12, 2013

I'm not happy in general with the way we handle "struct overlay_mdp_frame" vs "struct overlay_frame". IMHO "struct overlay_mdp_frame" should only be used for interacting with mdp clients, and perhaps not even then. Internal services don't need a continuous memory buffer with all incoming packet information. We throw away information to put packets into this structure that we often end up rebuilding again. Personally I consider "struct overlay_mdp_frame" to be deprecated and would like to move away from using it. But I haven't had the time, or a pressing enough need, to revisit this part of the application yet.

Perhaps now is a good time to separate payload encryption / decryption from dependence on "struct overlay_mdp_frame". Then any future services which require more processing time can easily be shifted to a background thread or other processing queue, simply by passing whole "struct overlay_frame"'s around. Then parsing, processing and replying can be done in the background thread without needing to create a new alarm function and context data structure for each use case.

Packet encryption / signing could be done in a new function before calling overlay_payload_enqueue in the main thread. decryption / signature verification could be done in process_incoming_frame. I don't see any need for an internal service to send packets to another internal service. While we still need to reply to packets from mdp clients, these services could remain in the main thread for now.

We have internally discussed the idea of splitting rhizome into a completely separate process. Having both the daemon and command line api access the same sqlite database causes locking contention that we would also like to avoid. We would also like to support accessing more than one rhizome store at the same time, including copying content between stores when they are available.

We've considered turning rhizome into an mdp client. Extending the mdp client api to allow discovery and communication with other peers without requiring all packets to be sent via a single daemon. But we haven't had time to invest in this yet.


@lakeman


lakeman Aug 12, 2013

Looks like the old code was doing this, but it's wrong. We can't depend on both machines having the same byte order. It should be calling write_uint64.
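For reference, a write_uint64-style helper serializes the value byte by byte in a fixed order, independent of the host CPU; something like this sketch (shown big-endian; the project's helper may use the opposite order):

    #include <stdint.h>

    static void put_uint64_be(unsigned char *buf, uint64_t v)
    {
      int i;
      for (i = 7; i >= 0; i--) {
        buf[i] = (unsigned char)(v & 0xff);
        v >>= 8;
      }
    }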


@rom1v


rom1v Aug 13, 2013

Contributor

Perhaps now is a good time to separate payload encryption / decryption from dependence on "struct overlay_mdp_frame". Then any future services which require more processing time can easily be shifted to a background thread or other processing queue, simply by passing whole "struct overlay_frame"'s around. Then parsing, processing and replying can be done in the background thread without needing to create a new alarm function and context data structure for each use case.

I only added the glue to make the current code multithreaded, without changing the logic.
I agree, the missing part of the work is to rewrite some algorithms, when necessary, with multithreading in mind.

We have internally discussed the idea of splitting rhizome into a completely separate process.

Paul also said (https://groups.google.com/d/msg/serval-project-developers/kynmhNjv_To/S6cewSRXLYcJ):

I think in the longer term we want to put it in a separate process to avoid residual inter-thread locks

I read somewhere that you wanted to avoid threads, and use only single-threaded processes. The reasons are unclear to me. Is avoiding residual inter-thread locks the main reason? Could you explain?
Both have advantages and disadvantages (random link about it: http://stackoverflow.com/questions/617787/why-should-i-use-a-thread-vs-using-a-process), but I don't know why you want Serval to always avoid threads a priori.


@lakeman


lakeman Aug 15, 2013

Member

On Tue, Aug 13, 2013 at 7:27 PM, ®om notifications@github.com wrote:

Perhaps now is a good time to separate payload encryption / decryption from dependence on "struct overlay_mdp_frame". Then any future services which require more processing time can easily be shifted to a background thread or other processing queue, simply by passing whole "struct overlay_frame"'s around. Then parsing, processing and replying can be done in the background thread without needing to create a new alarm function and context data structure for each use case.

I only added the glue to make the current code multithreaded, without changing the logic.

Yes I can tell. It's a very tidy series of patches that does nothing more than it claims. Though you have left out the question of logging, another part of the application which uses global variables and would need to be resolved before merging.

I've split overlay_mdp_dispatch such that internal services that only send packets over the network, and not to local mdp clients, can be rewritten to call overlay_send_frame instead. All of the rhizome related network packets should meet that criteria. Then we just need a single method for passing overlay_frames to the main thread to call this function.

Instead of allocating a new alarm per frame, we can probably build some kind of producer -> consumer pipe using the existing next / prev list pointers in this structure. Or only pass one frame from the background thread at a time, blocking until the main thread has consumed and queued the previous packet. It's a background thread for long running processes, we probably don't care if it stalls when the main thread is busy with latency sensitive tasks. But I wouldn't call myself an expert on writing good multi-threaded C code.

On that point, should we rename the "rhizome" thread to the "background" thread? Just because we're shifting rhizome operations there now, doesn't mean we won't shift other stuff off the main thread later.
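For illustration, a minimal sketch of that producer -> consumer idea, reusing the next / prev pointers under a mutex (all names hypothetical, not actual serval-dna code):

    static pthread_mutex_t frame_queue_lock = PTHREAD_MUTEX_INITIALIZER;
    static struct overlay_frame *frame_queue_head, *frame_queue_tail;

    /* Background (rhizome) thread: append a frame for the main thread. */
    void frame_queue_push(struct overlay_frame *frame)
    {
      frame->next = NULL;
      pthread_mutex_lock(&frame_queue_lock);
      if (frame_queue_tail)
        frame_queue_tail->next = frame;
      else
        frame_queue_head = frame;
      frame->prev = frame_queue_tail;
      frame_queue_tail = frame;
      pthread_mutex_unlock(&frame_queue_lock);
      /* then wake the main fdqueue, e.g. via the self-pipe described above */
    }

    /* Main thread: pop one frame and hand it to overlay_payload_enqueue(). */
    struct overlay_frame *frame_queue_pop(void)
    {
      pthread_mutex_lock(&frame_queue_lock);
      struct overlay_frame *frame = frame_queue_head;
      if (frame) {
        frame_queue_head = frame->next;
        if (frame_queue_head)
          frame_queue_head->prev = NULL;
        else
          frame_queue_tail = NULL;
        frame->next = frame->prev = NULL;
      }
      pthread_mutex_unlock(&frame_queue_lock);
      return frame;
    }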



@rom1v


rom1v Aug 19, 2013

Contributor

Though you have left out the question of logging, another part of the application which uses global variables and would need to be resolved before merging.

Ah? Where are these variables?

I've split overlay_mdp_dispatch such that internal services that only send packets over the network, and not to local mdp clients, can be rewritten to call overlay_send_frame instead.

Which services send packets to "local" mdp clients? And what are these "local" mdp clients?

Instead of allocating a new alarm per frame, we can probably build some kind of producer -> consumer pipe using the existing next / prev list pointers in this structure.

Maybe.

But, that way, it could only apply to passing frames from one thread to another.

My idea was to pass "runnables" (a generic function+argument to post whatever action you want). In practice, the alarms I scheduled do not always post frames (see the *_arg structures in parallel.h). I don't know if all these action arguments could be reduced to "frame data".

Or only pass one frame from the background thread at a time, blocking until the main thread has consumed and queued the previous packet. It's a background thread for long running processes, we probably don't care if it stalls when the main thread is busy with latency sensitive tasks.

I don't know if the overhead of these malloc/free calls is significant or not.

The way I've implemented it uses the same mechanism for both the main thread and the rhizome thread. As a consequence, if the rhizome thread blocks waiting for the main thread to be idle, then the main thread will also block waiting for the rhizome thread to be idle. The situation where the main thread needs to post a runnable on the rhizome thread occurs (1 2 3), but maybe it can be avoided…

On that point, should we rename the "rhizome" thread to the "background" thread? Just because we're shifting rhizome operations there now, doesn't mean we wont shift other stuff off the main thread later.

I've considered this background thread to be rhizome-specific: another service would have its own thread too… Although, even a single service could have several threads.

We've considered turning rhizome into an mdp client.

I think it is a good idea.

Ideally, I think Rhizome could simply work as any other service on top of MDP: it would open an MDP socket on a predefined port and exchange with other peers, without any lower-level knowledge and, above all, without being referenced by any lower-level code.

This would remove the need for the "internal services" hack: each service would use its own port dynamically (like with TCP or UDP).

In that case, Rhizome would create its own thread to handle its stuff separately, without impacting overlay* code.

What do you think?

