Fix #125 and #149 by tlunttil · Pull Request #153 · lime-rt/lime

tlunttil · 2016-08-08T09:30:17Z

This PR tries to fix issues #125 and #149. I had to do some rather major changes to various parts of photon.c; most importantly, I changed the whole line profile/velocity integration.

This PR also includes some optimisations that had a small (but not insignificant) effect on running time.

There are still some things about blended lines that are not done quite correctly.

This has changed the format of binary output population files, because previously value or norm and norminv was saved in the files. Also removed unused function sourceFunc from sourcefunc.c.

…fixes

tlunttil · 2016-08-08T09:38:09Z

Apparently the old gcc version (at least with its default settings) used by travis-ci doesn't like the inlining.

Fixed path length projection, half-index offset and doubling of the effect of the local cell. The fixes required redoing the whole velospline thing, which is now replaced with a piecewise linear approximation and erf lookup. The new lineshape calculation also does the velocity projection correctly.

Optimisations to critical loops and inlining of sourceFunc_cont and _line and FastExp.

tlunttil · 2016-08-11T11:08:44Z

I'm going to make some further changes to this, probably including fix to #154.

Fixing issue #154 and bugfixes to earlier commits.

tlunttil · 2016-08-15T11:07:45Z

I've updated this PR to include fix to #154 (and some stupid bugs in earlier commits...). It would now be very good if someone(s) had a look - I'd be surprised if I got everything right the first time. Some very preliminary benchmarking (#146 and others) indicates that this PR significantly reduces the disagreement between LIME and another code that I've been using.

This PR includes some optimisations, including inlining of some functions. The speed gain is not huge but not negligible either, maybe 10 to 15 percent. However, inlining needs a compiler with C99 support (which is why the PR fails travis-ci checks).

allegroLeiden · 2016-08-16T08:24:11Z

I'll try to look at it tomorrow.

Previous commit had some non-working test code that wasn't meant to be committed.

We now use a more recent version of GCC that supports inlining.

smaret · 2016-08-22T14:47:06Z

Apparently the old gcc version (at least with its default settings) used by travis-ci doesn't like the inlining.

This should be fixed with tlunttil/lime#1

Removed inlining of FastExp and sourceFunc_line and _cont. This slowed down the example model runtime from 1m35s to 1m55s on my laptop (with -f -n -p 8).

Fix travis build

allegroLeiden · 2016-08-29T12:13:07Z

I'm back from my vacation, should I look this over now?

tlunttil · 2016-08-29T12:20:07Z

Yes, please. There may well be all sorts of strange things going on there. I meant to do further testing/thinking last week, but all sorts of things happened and I didn't have time for that in the end.

I've removed the inlining for now, but we'll need to make some decision about what C language features are allowed in the code.

allegroLeiden · 2016-08-30T10:07:51Z

we'll need to make some decision about what C language features are allowed in the code.

For the time being I am afraid I vote pretty firmly for ANSI only. The reason for this is that my primary project or task involving Lime is to add facilities to allow it to be called from CASA. The latter does not seem to ship with its own version of gcc (thank heavens), but for various reasons we run CASA locally on machines with RHEL6. The gcc version on these RHEL6 machines is 4.4.7, which does not seem to support C99 fully, at least by default. Until I clarify the requirements for doing what I am tasked with doing, I would say, let's wait with C99. As soon as I can get a green board with this, then by all means, let's use these features; but I am just not at this point yet.

tlunttil · 2016-08-30T10:24:37Z

Ok. We can always revert to my earlier commit with inlining when the time is ripe.

There are some things in the code (at least variable length arrays in a couple of places and erf-function to construct the look-up table) that weren't officially added until in C99, but were supported by gcc by default even in old(ish) versioins. I guess these are fine?

allegroLeiden · 2016-08-31T08:05:54Z

So long as it compiles on my RH6 machines, I don't mind.

allegroLeiden · 2016-09-01T14:52:51Z

The changes fall into two groups, and I have different opinions about them.

All the corrections in photon(), namely

Correct projection of distance along the photon path.
Breaking up the path through each Voronoi cell into 'in' and 'out' sections.
Removing the double accounting of the first half-cell.
Use of proper vector velocities along the links between cells rather than speeds in the direction of the link.

Changes to the calculation of the average lineshape.

I fully support 1 but only partially support 2. My reasoning in more detail is as follows. 2 breaks into 2 sub-parts:

2a) The way the projected speed v as a function of distance is approximated.

2b) The numerical integration scheme employed.

For 2a you have gone from the old polynomial interpolation to piecewise linear; for 2b you have gone from rectangle-rule to error function. I like the change rectangle->erf but (at least for the present) I question whether we should move away from the polynomial scheme. The advantage of the latter was perhaps not so much the high order approximation of v, as the adaptive (in a crude fashion) generation of sample intervals. The variation of velocity (and thus line shape) between grid points seems to me to be at present the major bottleneck of approximation in Lime and thus I think we need to tread carefully before we make it any cruder. Indeed I think some movement towards more generic expression of various routines, controlled by global variables/macros, e.g. the polynomial order, even though this will probably slow things down, would be a good thing, because by playing with the various quantities (polynomial order, fineness of sampling etc) we can find out which are important and which not. There is I think a deal of such refinement left to do with Lime and we should be cautious about hard-wiring routines until we know which values are optimal.

Thus what I would suggest would be a re-write of velospline.c:getVelocities() to bring it back similar to the present (1.5) scheme, but with (i) a flexible number of coefficients, ruled by a macro NUM_VEL_COEFFS, and (ii) storage of a polynomial for each cartesian coordinate; then recast photon.c:calcLineAmpPWLin() (what does 'PW' refer to, by the way?) so that it performs the erf integrations as now, but preserves the old adaptive sampling scheme, and obtains its linear approximations for v(x) from the polynomial scheme. I can PR this after the acceptance of your PR as-is if you like.

Detailed comments are as follows.

lime.h:87: why not just declare

    extern const double oneOver_i[FAST_EXP_MAX_TAYLOR];

and loop to FAST_EXP_MAX_TAYLOR at main.c:32 instead of hard-wiring the value 9.

I'm a bit uneasy about the hard-wiring of 6145 for the size of ERF_TABLE[].
In popsin.c, popsout.c: shouldn't we read and write the former .norm and .norminv into dummies for now until we decide to break backward compatibility?
sourcefunc.c: valuable innovations in sourceFunc_line() and sourceFunc_cont(), but why not have gm->pops etc instead of (*gm).pops? It doesn't really matter, just personal preference.
photon.c:geterf(): I guess every coder has their personal bugbears, and hard-wired numbers and repeated code are two of mine, so you may guess I would have written this a bit differently. :) Comes down to coding style in the end, so not worth changing perhaps. We can fight it out in medieval armour with battleaxes some day, that would be the proper method. ;)
photon.c:calcLineAmpPWLin() - I am not sure this integral is done correctly. The function should estimate

             ds
         1   /       (-[deltav-v{x}]^2 )
    I = ---- | dx exp(-----------------)
         ds  /       (       B         )
            0

where ds is the (projected) length of the 'link' between grid cells, x is the physical distance along this link, v(x) the velocity component in the direction of the photon as a function of x, and B is the Doppler width (which I've here assumed to be constant for simplicity). Your function breaks this into

             __4   x1_j
         1   \    /       (-[deltav-v{x}]^2 )
    I = ----  >   | dx exp(-----------------)
         ds  /_   /       (       B         )
             j=0  x0_j

where x0_j = ds*j/5 and x1_j = x0_{j+1}. For any given j you approximate v by a linear function

    v(x) = v0 + A*(x - x0)

where

         v1 - v0
    A = ---------.
         x1 - x0

Making the change of variable

    t = (A*[x-x0] - deltav + v0)/sqrt(B)

yields

                     __4          (v1_j - deltav)/sqrt(B)
         sqrt(pi*B)  \        2     /
    I = ------------  >   --------- | dt exp(-t^2)
           2*ds*A    /_    sqrt(pi) /
                     j=0           (v0_j - deltav)/sqrt(B)

                     __4
         sqrt(pi*B)  \
    I = ------------  > (erf[{v1_j-deltav}/sqrt{B}] - erf[{v0_j-deltav}/sqrt{B}]).
           2*ds*A    /_
                     j=0

I don't guarantee that there are no mistakes in my working here, but this is the correct line to pursue. (This is the single item which it is necessary to fix before the PR can be accepted - all the other suggestions are optional or simply discussion items for further work.)

photon.c:139 - there is no need for this line.
photon.c:193,199 - I'm wondering why you made this change? Is it faster, if so why?
photon.c:260 - It is easy to verify that the factor of 4.3 was chosen because exp(-[4.3/2]^2) ~ 0.01. This may indeed be too coarse. However I would like to see some change to the effect:

    lime.h:
      #define MIN_REL_LINE_AMP 0.01 // or whatever
      configInfo{
        double maxFWDeltaV;
    main.c:
      par->maxFWDeltaV = 2.0*sqrt(-ln(MIN_REL_LINE_AMP));
    photon.c:
      deltav = par->maxFWDeltaV*segment etc

photon.c:279,298 - could have used veloproject().

allegroLeiden · 2016-09-01T14:58:39Z

what does 'PW' refer to, by the way?

piecewise, duh

tlunttil · 2016-09-01T15:49:31Z

I'll need to look again at what I did with integration...

Regarding the polynomial scheme: isn't the adaptiveness in some sense false accuracy, because the interpolating polynomial is fixed? There's no reason to think that sampling the polynomial more densely provides a better estimate of the true velocity.

Answers to your other questions (in the same order):

Mainly because of laziness ;) Also I doubt there's ever any reason to use a very high degree.
There was some reason for choosing 6145, but I'm no longer sure what it was...
I didn't think that backward compatibility (for these files) is important. We can just use the code in Untangling - starting with the dust/knu thing #152, which uses dummy variables.
Yeah, I think I'll change that to ->
I expected that you may not like hardwired numbers and it could certainly be changes. Some repetition is a bit hard to avoid here at least without sacrificing speed.
I think that line 139 (and 122) are there because without them compiler complained about possibly unset variables (although clearly that isn't the case). But it's true that the lines don't do anything.
I think (didn't check the assembler, though) that the old version of lines 193, 199 compiles as a run-time division, while the new one does division at compile time (and divisions are a lot slower than multiplications, so it's better to avoid them in time-critical parts).
Sure, although maybe it's better to rethink the whole frequency sampling thing.
I think I forgot here momentarily that there's such a function as veloproject()

allegroLeiden · 2016-09-01T16:22:04Z

Regarding the polynomial scheme: isn't the adaptiveness in some sense false accuracy, because the interpolating polynomial is fixed? There's no reason to think that sampling the polynomial more densely provides a better estimate of the true velocity.

True, but that is not really the point of the adaptive scheme. The problem is that where the velocity varies quickly (regardless of how accurately we track its precise changes), so will the line amplitude, in a highly nonlinear fashion. The adapative scheme doesn't really care about the accuracy, it just adapts to rapid changes in v by choosing finer intervals.

I am advocating for retaining the polynomial scheme with (in the medium term) an easily-settable number of orders, but I would actually be surprised if it turned out that terms greater than order 2 or so made much difference. I do think we ought to find this out, though.

allegroLeiden · 2016-09-01T16:24:09Z

I think I forgot here momentarily that there's such a function as veloproject()

Memory is terrible - I am ashamed to admit that I forgot that one could use continue and break in C. Too used to thinking of them as python-specific luxuries perhaps.

allegroLeiden · 2016-10-05T18:15:36Z

Do we want to keep also the old 'spline' method for line shapes?

I would vote 'no'.

tlunttil · 2016-10-05T18:24:21Z

So would I, if for no other reason than easier merging...

…fixes

Fix issue #176 and some related bugs in the blended lines part of photon() and getjbar().

tlunttil · 2016-10-15T17:27:08Z

I have now merged the latest master branch and included a fix to #176. Both of these provided ample opportunities for mistakes, so another set of eyes looking at this would be good (I'm mainly thinking of @allegroLeiden and @ParfenovS ). I banished the spline/interpolation stuff, so the new erf-method is the only one remaining. Therefore the results are not identical to previous ones even with -DTEST, but at least the code compiles and the results in example model don't look hugely strange.

As for #176, there seemed to be a strange case of 'premature integration' of all blended lines separately. I'm not sure what the idea here was - I think the old version was clearly wrong. But somehow I never noticed it until @ParfenovS pointed it out.

This version is maybe ~20% slower than the previous one, mainly because of in/out splitting. There are many things that could be optimised, but I think it's better to leave all performance fine-tuning to a separate PR after the functional PRs are done.

ParfenovS · 2016-10-16T10:27:27Z

jnu=0.;
alpha=0.;

at lines 403 and 404 in photon.c (getjbar function) should be removed.

tlunttil · 2016-10-16T10:36:17Z

Right, they were leftovers from the old version. I'll fix that and any other bugs once you have gone through the changes.

ParfenovS · 2016-10-19T10:46:18Z

expTau[iline] and tau[iline] are not increasing in photon.c
Thus, the following code:
mp[molI].phot[lineI+iphot*md[molI].nline]+=expTau[iline]*remnantSnu;
will give wrong result.
One should add expTau[iline]*=expDTau and tau[iline]+=dtau;

One can move the following line:
mp[molI].phot[lineI+iphot*md[molI].nline]+=expTau[iline]*remnantSnu;
after if(tau[iline] < -30.){... check

tlunttil · 2016-10-19T12:41:36Z

Oops. I was very careless in copy-pasting code from one place to another.

We wouldn't even necessarily need the tau[] array anymore, because everything except maser check is done with expTau and it would be easy to change that, too.

ParfenovS · 2016-10-19T15:52:10Z

getjbar uses both mp[molI].vfac_loc[iphot] and mp[molI].vfac[iphot]
Is this really needed? Maybe one can use only mp[molI].vfac_loc[iphot]

tlunttil · 2016-10-19T16:04:37Z

vfac_loc comes from a pure Gaussian (that is, the local profile at the grid point), while vfac includes the effect of velocity gradient in the Voronoi cell containing the grid point (which is needed for calculating the change in intensity inside that cell). So I think it's needed, though the effect is probably very small in realistic cases. Though I think that the test at the beginning of the outermost loop in getjbar should be against, vfac_loc, not vfac.

ParfenovS · 2016-10-19T16:15:38Z

Ok. I see.

ParfenovS · 2016-10-19T16:17:12Z

I didn't find any other problems.

tlunttil · 2016-10-19T16:21:56Z

Thanks for your very detailed work @ParfenovS . I'll fix the bugs you found and maybe add a few more comments in the code to explain what is being done and why.

Fixed bugs. Changed large struct parameters to sourceFunc_line to pointers. Removed unneeded 'tau' array from photon().

tlunttil · 2016-10-22T10:20:18Z

I pushed a new version with bugs fixed (I hope). I also made the code a bit more readable by replacing magic numbers with things defined in the header file and removed the 'tau' array from photon() that wasn't really necessary.

Although I wrote that I wouldn't do any optimisations in this PR, I decided to do one thing: calls to sourceFunc_line passed big structures (in particular molData). A lot of time seemed to be spent on copying that data, so I changed to using pointers in calls. This made the example model run ~20% faster.

This PR should take care of issues #124, #125, #149, #154, and #176.

allegroLeiden · 2016-10-25T15:36:22Z

It's funny, because I thought the sourceFunc_line call had already been fixed the way you describe. Maybe it got reverted in one of the frantic merges. Really looking forward to getting past the present rapids in Lime.

tlunttil · 2016-10-25T15:40:09Z

I had done it in an earlier commit, but then it got reverted when I merged in #152.

allegroLeiden · 2016-10-27T10:58:06Z

Ok, I've had a look through and could not see anything that did not look kosher. Image looks ok. I'll merge it.

BTW I am thinking we could probably use the erfs also in raytrace in place of calcLineAmpSample().

tlunttil · 2016-10-27T13:51:29Z

Yeah, I thought about the raytrace, too. It's perhaps not something to go for in 1.7, I think there are several more serious issues to take care of first.

tlunttil added 2 commits July 23, 2016 16:30

Removed scaling factors norm/norminv that seemed to serve no purpose.

8b429ae

This has changed the format of binary output population files, because previously value or norm and norminv was saved in the files. Also removed unused function sourceFunc from sourcefunc.c.

Merge branch 'master' of https://github.com/lime-rt/lime into photon_…

6f1388d

…fixes

tlunttil added 3 commits August 8, 2016 12:50

Some optimisations.

0456a96

Optimisations to critical loops and inlining of sourceFunc_cont and _line and FastExp.

Cleaned code, added interpolation to geterf.

41d3e8d

allegroLeiden self-assigned this Aug 9, 2016

allegroLeiden added bug enhancement awaiting update labels Aug 12, 2016

Fix to #154

e9686d9

Fixing issue #154 and bugfixes to earlier commits.

tlunttil and others added 2 commits August 17, 2016 17:47

Fix previous commit

6aa3653

Previous commit had some non-working test code that wasn't meant to be committed.

Fix travis build

535e6f4

We now use a more recent version of GCC that supports inlining.

tlunttil and others added 2 commits August 23, 2016 11:19

Removed inlining

f204104

Removed inlining of FastExp and sourceFunc_line and _cont. This slowed down the example model runtime from 1m35s to 1m55s on my laptop (with -f -n -p 8).

Merge pull request #1 from lime-rt/tlunttil-photon_fixes

5c9d1dc

Fix travis build

tlunttil mentioned this pull request Aug 29, 2016

Untangling - starting with the dust/knu thing #152

Merged

tlunttil mentioned this pull request Oct 13, 2016

Add IR pumping read from an input file #177

Merged

tlunttil added 3 commits October 14, 2016 15:26

Fixes suggested by review

381b9cb

Merge branch 'master' of https://github.com/lime-rt/lime into photon_…

506d311

…fixes

Fix #176

e06bda0

Fix issue #176 and some related bugs in the blended lines part of photon() and getjbar().

Fix problems from merge

2906724

Fixed bugs. Changed large struct parameters to sourceFunc_line to pointers. Removed unneeded 'tau' array from photon().

tlunttil closed this Oct 22, 2016

tlunttil reopened this Oct 22, 2016

allegroLeiden merged commit f0df50c into lime-rt:master Oct 27, 2016

tlunttil mentioned this pull request Oct 27, 2016

Wrong dopb in deltav calculation. #154

Closed

allegroLeiden removed the awaiting update label Oct 27, 2016

tlunttil deleted the photon_fixes branch November 11, 2016 15:14

This was referenced Jan 31, 2017

Implemented 2nd-order interpolation of velocity during raytracing. #209

Merged

Doubling the local radiation field? #149

Closed

Conversation

tlunttil commented Aug 8, 2016

Uh oh!

tlunttil commented Aug 8, 2016

Uh oh!

tlunttil commented Aug 11, 2016

Uh oh!

tlunttil commented Aug 15, 2016

Uh oh!

allegroLeiden commented Aug 16, 2016

Uh oh!

smaret commented Aug 22, 2016

Uh oh!

allegroLeiden commented Aug 29, 2016

Uh oh!

tlunttil commented Aug 29, 2016

Uh oh!

allegroLeiden commented Aug 30, 2016

Uh oh!

tlunttil commented Aug 30, 2016

Uh oh!

allegroLeiden commented Aug 31, 2016

Uh oh!

allegroLeiden commented Sep 1, 2016

Uh oh!

allegroLeiden commented Sep 1, 2016

Uh oh!

tlunttil commented Sep 1, 2016 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

allegroLeiden commented Sep 1, 2016

Uh oh!

allegroLeiden commented Sep 1, 2016

Uh oh!

allegroLeiden commented Oct 5, 2016

Uh oh!

tlunttil commented Oct 5, 2016

Uh oh!

tlunttil commented Oct 15, 2016

Uh oh!

ParfenovS commented Oct 16, 2016

Uh oh!

tlunttil commented Oct 16, 2016

Uh oh!

ParfenovS commented Oct 19, 2016

Uh oh!

tlunttil commented Oct 19, 2016

Uh oh!

ParfenovS commented Oct 19, 2016

Uh oh!

tlunttil commented Oct 19, 2016

Uh oh!

ParfenovS commented Oct 19, 2016

Uh oh!

ParfenovS commented Oct 19, 2016

Uh oh!

tlunttil commented Oct 19, 2016

Uh oh!

tlunttil commented Oct 22, 2016

Uh oh!

allegroLeiden commented Oct 25, 2016

Uh oh!

tlunttil commented Oct 25, 2016

Uh oh!

allegroLeiden commented Oct 27, 2016

Uh oh!

tlunttil commented Oct 27, 2016

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

4 participants

tlunttil commented Sep 1, 2016 •

edited

Loading