Closed
In the new module fast_exp.c there was a line to include ieee754.h, as a crude attempt to test for the compliance with that standard which the FastExp function requires (see issue lime-rt#49). However, it seems that Mac OS X does not provide this header file, so I have simply deleted the include for now.
The single main loop over grid cells is now split into three. This is in preparation for the parallelisation. No change in run time or output was detected.
The reason for this is to separate the pointers which the parallel threads need to write to from the remainder of the molData attributes, which are read-only within the crucial code blocks. I put three of the attributes in question, 'jbar', 'phot' and 'vfac', into a separate structure, which is now declared, allocated, written to and freed within the lines of code which will eventually sit inside the parallel block. The attribute 'ds' does not depend on species, so it is simply declared as a pointer to double outside of any struct. I also renamed it to halfFirstDs to be a bit more descriptive. The run time and output values appear to be unaffected.
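To illustrate the split described above, here is a minimal sketch, not the actual LIME code. Only the field names jbar, phot and vfac come from the commit message; the type names molDataSketch and gridPointScratch, the helper functions and the array sizes are hypothetical.

```c
#include <stdlib.h>

/* Read-only inside the hot loop (hypothetical, heavily simplified). */
typedef struct {
  int nline;            /* number of spectral lines for this species */
} molDataSketch;

/* Scratch space that each thread writes to. The field names follow the
   commit message; the sizes chosen below are assumptions. */
typedef struct {
  double *jbar;         /* one entry per line */
  double *phot;         /* one entry per line per photon */
  double *vfac;         /* one entry per photon */
} gridPointScratch;

/* Allocate scratch for one species. Returns 0 on success, -1 on failure. */
int scratch_alloc(gridPointScratch *s, const molDataSketch *m, int nphot)
{
  s->jbar = malloc(sizeof(*s->jbar) * (size_t)m->nline);
  s->phot = malloc(sizeof(*s->phot) * (size_t)m->nline * (size_t)nphot);
  s->vfac = malloc(sizeof(*s->vfac) * (size_t)nphot);
  if (s->jbar == NULL || s->phot == NULL || s->vfac == NULL) {
    free(s->jbar);
    free(s->phot);
    free(s->vfac);
    s->jbar = s->phot = s->vfac = NULL;
    return -1;
  }
  return 0;
}

/* Release the scratch and reset the pointers. */
void scratch_free(gridPointScratch *s)
{
  free(s->jbar);
  free(s->phot);
  free(s->vfac);
  s->jbar = s->phot = s->vfac = NULL;
}
```

Because the scratch struct is allocated and freed by whoever owns it, each thread can later hold its own copy while sharing the read-only molecular data.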
This is done to make it easier later on to corral the thread-private stuff in a parallel block.
Again, this is done to prepare for the parallelisation, when we will need separate RNGs for the separate threads. Note that we will still get repeatable results when the TEST flag in the makefile is set. The output is now slightly different from that produced by the previous commit, because in effect different random seeds are in use, both in the random selection of initial photon directions and in the random dither of starting rays in the raytracing section. The present output is however expected to be identical to that produced by single-thread running of the parallelised code.
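The idea of one generator per thread, with repeatable seeding, can be sketched as follows. This is only an illustration: the small splitmix64-style generator is a stand-in for whatever RNG library the code actually uses, and the names rngSketch, rng_next, rng_uniform and rng_make_per_thread are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

/* Stand-in generator state; the real code uses its own RNG library. */
typedef struct {
  uint64_t state;
} rngSketch;

/* splitmix64 step: advances the state and returns 64 pseudo-random bits. */
uint64_t rng_next(rngSketch *r)
{
  uint64_t z = (r->state += 0x9E3779B97F4A7C15ULL);
  z = (z ^ (z >> 30)) * 0xBF58476D1CE4E5B9ULL;
  z = (z ^ (z >> 27)) * 0x94D049BB133111EBULL;
  return z ^ (z >> 31);
}

/* Uniform double in [0, 1), built from the top 53 bits. */
double rng_uniform(rngSketch *r)
{
  return (double)(rng_next(r) >> 11) * (1.0 / 9007199254740992.0);
}

/* One generator per thread, each seeded from baseSeed plus the thread index.
   A fixed baseSeed (as in a test build) therefore gives repeatable streams.
   Returns NULL if the allocation fails; the caller frees the array. */
rngSketch *rng_make_per_thread(int nThreads, uint64_t baseSeed)
{
  rngSketch *rngs = malloc(sizeof(*rngs) * (size_t)nThreads);
  if (rngs == NULL)
    return NULL;
  for (int i = 0; i < nThreads; i++)
    rngs[i].state = baseSeed + (uint64_t)i;
  return rngs;
}
```

Each thread would then draw only from rngs[threadIndex], so no generator state is shared between threads.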
Pointer expTau in photon() is now freed after use.
I had called it molDataPrivate; it is now gridPointData.
There were many pointers which were malloced via a statement of the form <name> = malloc(sizeof(<data type of pointer>) * <number of entries>); This is not very conservative, since the size expression silently goes stale if the declared type of the pointer is ever changed. Better is <name> = malloc(sizeof(*<name>) * <number of entries>); I've changed this in numerous instances.
In molinit(), in the case where the opacities file contains no entries, bail_out() was called without a following call to exit(). This is now fixed.
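A minimal sketch of the two allocation forms, using a hypothetical struct cellSketch and helper alloc_cells() that are not part of the code base:

```c
#include <stdlib.h>

/* Hypothetical element type, used only for this illustration. */
typedef struct {
  double x[3];
  double dens;
} cellSketch;

/* Allocate n cells. Returns NULL on failure; the caller frees the result. */
cellSketch *alloc_cells(size_t n)
{
  cellSketch *cells;

  /* Fragile form: the type is spelled out a second time, so a line like
       cells = malloc(sizeof(double) * n);
     still compiles but under-allocates once the element type is a struct. */

  /* Conservative form: the size is taken from the pointer itself, so it
     stays correct if the declaration above is ever changed. */
  cells = malloc(sizeof(*cells) * n);
  return cells;
}
```

Note that sizeof(*cells) does not dereference the pointer; the operand is not evaluated, so this is safe even though cells is uninitialised at that point.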
Also defined defaultNThreads in lime.h. par.nThreads is set to this if the user leaves it unset. (We really need to separate user-settable parameters from task configuration information!)
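The fallback could look something like the sketch below. The names defaultNThreads and nThreads come from the commit message, but everything else is an assumption: the default value of 1, the use of a non-positive value as the "unset" marker, and the names DEFAULT_NTHREADS_SKETCH, inputParsSketch and apply_thread_default() are all hypothetical.

```c
/* Assumed default; the actual value of defaultNThreads in lime.h is not
   stated in the commit message. */
#define DEFAULT_NTHREADS_SKETCH 1

/* Hypothetical, cut-down stand-in for the user parameter struct. */
typedef struct {
  int nThreads;   /* assumed: <= 0 means the user left it unset */
} inputParsSketch;

/* Replace an unset thread count by the default; leave a user value alone. */
void apply_thread_default(inputParsSketch *par)
{
  if (par->nThreads <= 0)
    par->nThreads = DEFAULT_NTHREADS_SKETCH;
}
```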
In f07953a I introduced arrangements to provide separate random number generators when we add the facility to run several threads in parallel. This scheme however gave rise to a data race condition on the pointer 'ran'. Although this seemed to be harmless in practice, in the interests of conservative programming, an alternative scheme is now used which should avoid the data race.
Detailed changes:
- Changed the definition of CC in the Makefile to gcc -fopenmp;
- Added some tests on _OPENMP to lime.h;
- Added the necessary OMP pragmas in aux.c, stateq.c and raytrace.c;
- Added a new function greetings_parallel() in curses.c (not yet used).

Running with 4 threads was observed to decrease run time by about a factor of 2.5. The output shows small differences, which are probably due to the data race condition on the values of some grid-point-specific quantities. This does not appear to be significant.
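The general shape of such a parallel block can be sketched as below. This is an illustration of the OpenMP pattern only, not the pragmas actually added to aux.c, stateq.c or raytrace.c; solve_cell(), solve_all_cells() and NSCRATCH are hypothetical. Compiled without -fopenmp the pragmas are ignored and the loop simply runs serially.

```c
#define NSCRATCH 4   /* hypothetical size of the per-thread work array */

/* Hypothetical per-cell work: fills the scratch array and returns its sum,
   which is id * (1 + 2 + 3 + 4) = 10 * id. */
double solve_cell(int id, double *scratch)
{
  double sum = 0.0;
  for (int k = 0; k < NSCRATCH; k++) {
    scratch[k] = (double)id * (double)(k + 1);
    sum += scratch[k];
  }
  return sum;
}

/* Divide the loop over cells between nThreads threads (nThreads >= 1). */
void solve_all_cells(double *result, int nCells, int nThreads)
{
  (void)nThreads;   /* silences the unused warning in a non-OpenMP build */

  #pragma omp parallel num_threads(nThreads)
  {
    /* Declared inside the parallel block, so each thread gets its own copy. */
    double scratch[NSCRATCH];

    #pragma omp for schedule(dynamic)
    for (int id = 0; id < nCells; id++)
      result[id] = solve_cell(id, scratch);
  }
}
```

In this sketch each iteration writes only to its own result[id] and to thread-private scratch, so the threads never write to the same memory location.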
Superseded by #56.
There are a lot of commits here, but most of them are rearrangements to prepare the way for the final one, which allows the user to divide the bottleneck processing between several threads (via a new optional user-settable parameter 'nThreads').
The task now makes use of the OpenMP API provided by the gomp library to divide two sections of code between threads: (i) the solution of the radiative transport equations; (ii) the raytracing to make the final image.