
Multithreading


Thread strategy

Each system is attached to a thread. Physics and graphics share the same thread, called the engine thread. Other systems may share a thread or have their own thread, depending on the implementation choice.

At each frame, a thread that has finished its work grabs the current time point from the scheduler. Systems sharing the same thread share the same time point. This time point marks the validity of the thread's data: if any game data was updated or removed in the previous frame by any system, the current data set reflects it. It is the last known time point.

At any given moment, there is no guarantee that two threads hold the same time point, because a thread can stay busy on a piece of work for a long time and miss some frames.

Since threads do not share the same frame time point, they cannot exchange data without great care about what they are referring to: for example, entity #23 in thread A's context can differ from entity #23 in thread B's context, because the entity may have changed and one thread may hold outdated data.

We will call contextualized data any data that is valid only for a specific frame. This kind of data must be properly synchronized. A list of components is a typical example, since some components may have been removed between the time the data was built and the time the consumer takes it into account.
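A minimal sketch of the idea, using hypothetical types that are not part of the engine: contextualized data only makes sense together with the frame time point it was built for.

```cpp
#include <cstdint>
#include <map>

using frame_tp = uint64_t;   // hypothetical frame time point type
using entity_id = uint32_t;

struct Position { float x, y, z; };

// Data valid only for the frame identified by `frame`.
// A consumer holding an older or newer time point must not mix
// this snapshot with its own data without synchronization.
struct PositionSnapshot {
    frame_tp frame;                          // the time point this snapshot belongs to
    std::map<entity_id, Position> positions; // state as of that frame
};
```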

Asynchronous distribution - 1 to N

A system that wants to give access to its contextualized data cannot do it synchronously; in other words, another thread cannot simply call a get() function on it. Instead, data updates are published at each frame and can be accessed from other threads. Consumers receive the delta between each synchronization.

The class RewindableMap contains tools to publish data. It works like a git repository: you have a work space where you can insert, update and remove data synchronously, plus a Commit() function that freezes the data and publishes it. Subscribers can get the published data by calling Pull() at each frame. Pull() takes care of the case where the contexts differ: if the caller thread is behind the publisher thread, the publisher rewinds the data to the point where the caller is.
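The following is an illustrative sketch of the Commit()/Pull() idea only, not the engine's RewindableMap: member names and signatures are assumptions based on this page, and it publishes full snapshots rather than deltas for brevity.

```cpp
#include <cstdint>
#include <map>
#include <mutex>

using frame_tp = uint64_t;
using entity_id = uint32_t;

template <typename T>
class FrameVersionedMap {
public:
    // Work space operations: called synchronously by the owning system only.
    void Insert(entity_id id, const T& value) { workspace_[id] = value; }
    void Remove(entity_id id) { workspace_.erase(id); }

    // Freeze the work space and publish it under the given frame time point.
    void Commit(frame_tp frame) {
        std::lock_guard<std::mutex> lock(mutex_);
        published_[frame] = workspace_;
    }

    // Called from consumer threads: return the newest snapshot that is not
    // ahead of the caller's frame, so a lagging thread never sees the future.
    std::map<entity_id, T> Pull(frame_tp caller_frame) const {
        std::lock_guard<std::mutex> lock(mutex_);
        std::map<entity_id, T> result;
        for (const auto& [frame, snapshot] : published_) {
            if (frame > caller_frame) break;   // stop before the caller's future
            result = snapshot;
        }
        return result;
    }

private:
    std::map<entity_id, T> workspace_;                      // touched by the publisher only
    std::map<frame_tp, std::map<entity_id, T>> published_;  // frozen, shared snapshots
    mutable std::mutex mutex_;
};
```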

Note that the Pull() function is the only function callable from another thread, and it is the only way to get data from a system running in another thread.

The RewindableMap is suited to all data that must be regularly updated and broadcast to consumers with some latency, even over the network.

Asynchronous call - N to 1

To send data that has no frame context, for example an order that must be executed as soon as possible, the receiver can use the AtomicQueue or the AtomicMap storage. These are regular containers with thread-safe access.

A typical use of the atomic containers is keyboard and mouse input: the movements fill a buffer that is polled at each frame by the engine thread.
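Below is a minimal stand-in for such a container, not the engine's AtomicQueue: the real class may be lock-free, and the Push()/PopAll() names are assumptions used only to show the "fill from one thread, poll each frame from another" pattern.

```cpp
#include <mutex>
#include <vector>

struct InputEvent { int key; bool pressed; };

class SimpleAtomicQueue {
public:
    // Called by the input thread whenever an event arrives.
    void Push(const InputEvent& ev) {
        std::lock_guard<std::mutex> lock(mutex_);
        buffer_.push_back(ev);
    }

    // Called once per frame by the engine thread: drain everything queued so far.
    std::vector<InputEvent> PopAll() {
        std::lock_guard<std::mutex> lock(mutex_);
        std::vector<InputEvent> drained;
        drained.swap(buffer_);
        return drained;
    }

private:
    std::vector<InputEvent> buffer_;
    std::mutex mutex_;
};
```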

The network also uses atomic containers to store packets for the systems, which poll them at each frame.

Atomic containers can be used between systems to start a job, while the job result is published so it becomes contextualized.

And now?

Properly synchronizing data access reduces the checks that are usually needed when making synchronous calls. For example, there is a guarantee that an entity will be deleted at a specific frame in all systems, even if the systems do not actually remove the entity at the same real time. Only systems that are ahead of this time point will see the entity deleted; systems behind the time point will still see it in their own data and in the data of other systems.

It also removes the conditional tests needed to check that a data set is still valid while it is being iterated: a data set built for a specific frame is guaranteed to be valid for the whole frame. Since we use the same time point for all the data handled in a given frame, we can safely iterate over it and assume that it is valid and has not changed.
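A small sketch of that guarantee, with hypothetical names and the same illustrative types as above: a snapshot pulled at the start of the frame is iterated without any "is this entity still alive?" test, because it cannot change during the frame.

```cpp
#include <cstdint>
#include <map>

using entity_id = uint32_t;
struct Position { float x, y, z; };

// Hypothetical per-frame pass over a frozen snapshot.
float AverageHeight(const std::map<entity_id, Position>& frame_positions) {
    if (frame_positions.empty()) return 0.0f;
    float total = 0.0f;
    for (const auto& entry : frame_positions) {
        total += entry.second.y;   // no validity check: the snapshot is frozen
    }
    return total / frame_positions.size();
}
```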

Conclusion

Since data synchronization is not easy, it is better to put the code near the data: if a system needs to work heavily on contextualized data that it does not own, it may be better to move that code to the owning system and work synchronously on the data there. Typically, if you reach the point where you are rebuilding the data container from the published deltas, it is time to think about migrating the code next to the original container.

Also remember to call only the 'const' functions of other systems. Even if some data is accessible synchronously, there is no certainty that it is synchronized with yours. Bugs in a multithreaded environment are very difficult to chase.