Join GitHub today
GitHub is home to over 40 million developers working together to host and review code, manage projects, and build software together.Sign up
demonstrators of different threading technologies for CMS framework
Fetching latest commit…
Cannot retrieve the latest commit at this time.
|Type||Name||Latest commit message||Commit time|
|Failed to load latest commit information.|
This project is a simple demonstration of using different threading technologies as the basis of a HEP multi-threaded framemework -------------------------------------- COMPILING 1) You'll need to have available a version of boost which is used to parse the configuration files http://www.boost.org 2) To use the Intel Thread Building Block (TBB) framework, you'll need a version of tbb https://www.threadingbuildingblocks.org 3) create a subdirectory named build to hold the build products 4) in the build directory, run the command cmake -DCMAKE_BUILD_TYPE=Release -DBOOST_ROOT=<boost install directory> -DTBB_ROOT_DIR=<tbb install directory> .. 5) Then just compile make You will wind up with four executables BusyWaitCalibration/BusyWaitCalibrate: Used to calibrate how long it takes to run a simple integration function. This is used for tests where you want a module to actually do work (as apposed to just sleep). TBBProcessingDemo/TBBDemo: Processes configuration files to run trivial modules via TBB in a way analogous to a full HEP framework. Useful for testing scalability of the system. OpenMPProcessingDemo/OpenMPDemo: Processes configuration files to run trivial modules via OpenMP in a way analogous to a full HEP framework. Useful for testing scalability of the system. SingleThreadedDemo: Similar to the other *Demos exception only uses one thread and is implemented in the most trivial (and fastest) way possible. Used for timing comparisons with the same configurations passed to *Demos. NOTE: The DispatchProcessingDemo which made use of Apple's Grand Central Dispatch system is no longer supported. It is possible to build it on Linux using https://www.heily.com/trac/libdispatch which need libkqueue http://mark.heily.com/book/export/html/52 -------------------------------------- TESTING To see that the program works at all, you can run DispatchDemo using the simple test configuration TBBProcessingDemo/TBBDemo ../TBBProcessingDemo/test_config.json This should generate an output like demo::PassFilter first 0x5ca9a0 demo::PassFilter second 0x5c6910 demo::PassFilter third 0x5c5fe0 demo::ThreadSaferGetterFilter fourth 0x5c6190 demo::BusyWaitProducer alpha 0x5c6470 demo::BusyWaitProducer beta 0x5c66e0 # threads:24 # simultaneous events:4 total # events:3000 cpu time:182.333 real time:26.2779 events/sec:114.164 -------------------------------------- CALIBRATING In order to simulate the effect of actually doing HEP type work, the code does a simple integration to eat up 100% of a CPU. However, how long a given loop takes needs to be calibrated for the machine being used. This is done by running 'BusyWaitCalibrate' Running this program gives an output similar to the following BusyWaitCalibration/BusyWaitCalibrate 43.3487 100000 0.004535 2.20507e+07 195.367 1000000 0.042997 2.32574e+07 199.928 10000000 0.441726 2.26385e+07 6.4598 100000000 5.10827 1.95761e+07 The grouping of numbers is the results from running 4 different iteration values, each differing by a factor of 10. The lone number on a line can be ignored. The the numbers on one line are - the number of iterations used - the length of time it took to do that many iterations - the ratio of (# of iterations)/(length of time) From experimentation, I've found that the first 3 iterations tend to be converging to a reasonable common number but the longest iteration tends to be slower, probably because the operating system is taking some time from the job in order to do some other work. Therefore I tend to take the largest value of the ration from the first three values. The ratio is then used to set the value in the various configuration files. The value is process.options.busyWaitScaleFactor The easy way to change the value is to just search for 'busyWaitScaleFactor' in the configuration file since it only appears once. -------------------------------------- CONFIGURATIONS The configuration files are JSON format files which contain the information such as - What is the busy wait scale factor to use (process.options.busyWaitScaleFactor) - How many events should the job 'process' (process.source.iterations) - How many events should be processed simultaneously (process.options.nSimultaneousEvents) - What 'modules' to run and how the various 'modules' are related The demo framework uses standard CMS concepts: -Tasks are encapsulated in 'modules' where 'modules' only communicate by adding object to and reading objects from the Event -Modules which only read data from the Event and decide if we should continue processing an Event are called Filters -Filters are placed in sequence into a Path -The job allows multiple Paths, where each Path is considered independent and can be run in parallel -Modules which add data to an event are called Producers. Producers can depend on data in the Event. A Producer will be run the first time a data item is requested from a given Event. -Data in the event is identified by the 'label' which was assigned to the Producer which makes the data. The list of Filters is defined in the 'process.filters' value. The list of Producers is defined in the 'process.producers' value. The list of Paths is defined in the 'process.paths' value. Filters and Producers are configured with the mandatory parameters '@label': the label used to uniquely defined the module (and the data the module makes) '@type': what C++ class to instantiate when creating the module Each C++ class type can then have its own parameters, although most accept 'eventTimes': how long the module runs for a given event. Specifying time for each event allows one to measure effects of correlations of times between modules. The units are seconds. 'threadType': used by TBBDemo and OpenMPDemo to decide how thread safe to treat the module. The allowed values are ThreadSafeBetweenInstances: the same instance of a module can be passed multiple events simultaneously ThreadSafeBetweenModules: the same instance of a module can only handle one event at a time but it does not interact with any other module ThreadUnsafe: the module can not be run at the same time as other 'ThreadUnsafe' modules 'toGet': A list of the data to acquire from the event. The list is pair of 'label' (which is the label of the Producer) and a 'product' which is not used now. -------------------------------------- EXPERIMENTS The experimental configurations are found in cms-data: reco_minbias_busywait.config: This simulates doing event Reconstruction on a common and quick to process event type where each module is using real CPU time. The job has one Filter which is simulating the behavior of the output module which would write a ROOT file. This Filter is set to be 'ThreadUnsafe' to simulate present ROOT behavior. The source is set to be ThreadSafeBetweenModules so only one request is processed at a time in the module. All other modules are configured with ThreadSafeBetweenInstances so are assumed to be perfectly thread safe. reco_minbias_sleeping.config Similar to reco_minbias_busywait.config but instead of having modules use real CPU time, they 'sleep' instead. The libdispatch is smart enough to see when a thread is sleeping and then assign more work to the system. This configuration allows one to push the nSimultaneousEvents to much larger levels than the actual number of CPUs on the machine. I was able to run 700 nSimultaneousEvents on a 2 core machine. reco_minbias_sleeping_1000.config The same as reco_minbias_sleeping.config, however instead of each module having specified timing for 100 events, this has timing for 1000 events. This is only here for historic reasons. reco_minbias_sleeping_betweenmodules.config The same as reco_minbias_sleeping.config except the Producers have been set to only be able to process one event at a time (ThreadSafeBetweenModules). This allows scaling test for what happens if modules are not fully thread safe. reco_minbias_sleeping_perfectIO.config The same as reco_minbias_sleeping.config except both the source and output (i.e. the Filter) are set to completely thread safe (ThreadSafeBetweenInstances). This tests the scaling limits of the framework itself. reco_multijet_sleeping.config The same as reco_minbias_sleeping.config except the events take much longer to process. reco_multijet_sleeping_1000.config The same as reco_multijet_sleeping.config exception each module specifies 1000 event times rather than just 100. This is only here for historic reasons. So the suggested way to proceed with a experimenting is 1) Run reco_minbias_busywait.config while changing both 'nSimultaneousEvents' and 'iterations' (trying to keep the total job time about constant) until you see the events/sec results plateau 2) Run reco_minbias_sleeping.config while changing both 'nSimultaneousEvents' and 'iterations' (trying to keep the total job time about constant) until you see the events/sec results plateau. You should be able to go to much higher nSimultaneousEvents. Also, results from 1) with the same 'nSimultaneousEvents' should be the same as 2) 3) Same as 2) but using reco_minbias_sleeping_betweenmodules.config 4) Same as 2) but using reco_minbias_sleeping_perfectIO.config