Skip to content

Fix #125 and #149#153

Merged
allegroLeiden merged 14 commits intolime-rt:masterfrom
tlunttil:photon_fixes
Oct 27, 2016
Merged

Fix #125 and #149#153
allegroLeiden merged 14 commits intolime-rt:masterfrom
tlunttil:photon_fixes

Conversation

@tlunttil
Copy link
Copy Markdown
Contributor

@tlunttil tlunttil commented Aug 8, 2016

This PR tries to fix issues #125 and #149. I had to do some rather major changes to various parts of photon.c; most importantly, I changed the whole line profile/velocity integration.

This PR also includes some optimisations that had a small (but not insignificant) effect on running time.

There are still some things about blended lines that are not done quite correctly.

tlunttil added 2 commits July 23, 2016 16:30
This has changed the format of binary output population files, because
previously value or norm and norminv was saved in the files. Also removed
unused function sourceFunc from sourcefunc.c.
@tlunttil
Copy link
Copy Markdown
Contributor Author

tlunttil commented Aug 8, 2016

Apparently the old gcc version (at least with its default settings) used by travis-ci doesn't like the inlining.

tlunttil added 3 commits August 8, 2016 12:50
Fixed path length projection, half-index offset and doubling of the
effect of the local cell.

The fixes required redoing the whole velospline thing, which is now
replaced with a piecewise linear approximation and erf lookup. The
new lineshape calculation also does the velocity projection correctly.
Optimisations to critical loops and inlining of sourceFunc_cont
and _line and FastExp.
@allegroLeiden allegroLeiden self-assigned this Aug 9, 2016
@tlunttil
Copy link
Copy Markdown
Contributor Author

I'm going to make some further changes to this, probably including fix to #154.

Fixing issue #154 and bugfixes to earlier commits.
@tlunttil
Copy link
Copy Markdown
Contributor Author

I've updated this PR to include fix to #154 (and some stupid bugs in earlier commits...). It would now be very good if someone(s) had a look - I'd be surprised if I got everything right the first time. Some very preliminary benchmarking (#146 and others) indicates that this PR significantly reduces the disagreement between LIME and another code that I've been using.

This PR includes some optimisations, including inlining of some functions. The speed gain is not huge but not negligible either, maybe 10 to 15 percent. However, inlining needs a compiler with C99 support (which is why the PR fails travis-ci checks).

@allegroLeiden
Copy link
Copy Markdown
Contributor

I'll try to look at it tomorrow.

tlunttil and others added 2 commits August 17, 2016 17:47
Previous commit had some non-working test code that wasn't meant
to be committed.
We now use a more recent version of GCC that supports inlining.
@smaret
Copy link
Copy Markdown
Contributor

smaret commented Aug 22, 2016

Apparently the old gcc version (at least with its default settings) used by travis-ci doesn't like the inlining.

This should be fixed with tlunttil/lime#1

tlunttil and others added 2 commits August 23, 2016 11:19
Removed inlining of FastExp and sourceFunc_line and _cont.
This slowed down the example model runtime from 1m35s to
1m55s on my laptop (with -f -n -p 8).
@allegroLeiden
Copy link
Copy Markdown
Contributor

I'm back from my vacation, should I look this over now?

@tlunttil
Copy link
Copy Markdown
Contributor Author

Yes, please. There may well be all sorts of strange things going on there. I meant to do further testing/thinking last week, but all sorts of things happened and I didn't have time for that in the end.

I've removed the inlining for now, but we'll need to make some decision about what C language features are allowed in the code.

@allegroLeiden
Copy link
Copy Markdown
Contributor

we'll need to make some decision about what C language features are allowed in the code.

For the time being I am afraid I vote pretty firmly for ANSI only. The reason for this is that my primary project or task involving Lime is to add facilities to allow it to be called from CASA. The latter does not seem to ship with its own version of gcc (thank heavens), but for various reasons we run CASA locally on machines with RHEL6. The gcc version on these RHEL6 machines is 4.4.7, which does not seem to support C99 fully, at least by default. Until I clarify the requirements for doing what I am tasked with doing, I would say, let's wait with C99. As soon as I can get a green board with this, then by all means, let's use these features; but I am just not at this point yet.

@tlunttil
Copy link
Copy Markdown
Contributor Author

Ok. We can always revert to my earlier commit with inlining when the time is ripe.

There are some things in the code (at least variable length arrays in a couple of places and erf-function to construct the look-up table) that weren't officially added until in C99, but were supported by gcc by default even in old(ish) versioins. I guess these are fine?

@allegroLeiden
Copy link
Copy Markdown
Contributor

So long as it compiles on my RH6 machines, I don't mind.

@allegroLeiden
Copy link
Copy Markdown
Contributor

The changes fall into two groups, and I have different opinions about them.

  1. All the corrections in photon(), namely
  • Correct projection of distance along the photon path.
  • Breaking up the path through each Voronoi cell into 'in' and 'out' sections.
  • Removing the double accounting of the first half-cell.
  • Use of proper vector velocities along the links between cells rather than speeds in the direction of the link.
  1. Changes to the calculation of the average lineshape.

I fully support 1 but only partially support 2. My reasoning in more detail is as follows. 2 breaks into 2 sub-parts:

2a) The way the projected speed v as a function of distance is approximated.

2b) The numerical integration scheme employed.

For 2a you have gone from the old polynomial interpolation to piecewise linear; for 2b you have gone from rectangle-rule to error function. I like the change rectangle->erf but (at least for the present) I question whether we should move away from the polynomial scheme. The advantage of the latter was perhaps not so much the high order approximation of v, as the adaptive (in a crude fashion) generation of sample intervals. The variation of velocity (and thus line shape) between grid points seems to me to be at present the major bottleneck of approximation in Lime and thus I think we need to tread carefully before we make it any cruder. Indeed I think some movement towards more generic expression of various routines, controlled by global variables/macros, e.g. the polynomial order, even though this will probably slow things down, would be a good thing, because by playing with the various quantities (polynomial order, fineness of sampling etc) we can find out which are important and which not. There is I think a deal of such refinement left to do with Lime and we should be cautious about hard-wiring routines until we know which values are optimal.

Thus what I would suggest would be a re-write of velospline.c:getVelocities() to bring it back similar to the present (1.5) scheme, but with (i) a flexible number of coefficients, ruled by a macro NUM_VEL_COEFFS, and (ii) storage of a polynomial for each cartesian coordinate; then recast photon.c:calcLineAmpPWLin() (what does 'PW' refer to, by the way?) so that it performs the erf integrations as now, but preserves the old adaptive sampling scheme, and obtains its linear approximations for v(x) from the polynomial scheme. I can PR this after the acceptance of your PR as-is if you like.

Detailed comments are as follows.

  • lime.h:87: why not just declare
    extern const double oneOver_i[FAST_EXP_MAX_TAYLOR];

and loop to FAST_EXP_MAX_TAYLOR at main.c:32 instead of hard-wiring the value 9.

  • I'm a bit uneasy about the hard-wiring of 6145 for the size of ERF_TABLE[].
  • In popsin.c, popsout.c: shouldn't we read and write the former .norm and .norminv into dummies for now until we decide to break backward compatibility?
  • sourcefunc.c: valuable innovations in sourceFunc_line() and sourceFunc_cont(), but why not have gm->pops etc instead of (*gm).pops? It doesn't really matter, just personal preference.
  • photon.c:geterf(): I guess every coder has their personal bugbears, and hard-wired numbers and repeated code are two of mine, so you may guess I would have written this a bit differently. :) Comes down to coding style in the end, so not worth changing perhaps. We can fight it out in medieval armour with battleaxes some day, that would be the proper method. ;)
  • photon.c:calcLineAmpPWLin() - I am not sure this integral is done correctly. The function should estimate
             ds
         1   /       (-[deltav-v{x}]^2 )
    I = ---- | dx exp(-----------------)
         ds  /       (       B         )
            0

where ds is the (projected) length of the 'link' between grid cells, x is the physical distance along this link, v(x) the velocity component in the direction of the photon as a function of x, and B is the Doppler width (which I've here assumed to be constant for simplicity). Your function breaks this into

             __4   x1_j
         1   \    /       (-[deltav-v{x}]^2 )
    I = ----  >   | dx exp(-----------------)
         ds  /_   /       (       B         )
             j=0  x0_j

where x0_j = ds*j/5 and x1_j = x0_{j+1}. For any given j you approximate v by a linear function

    v(x) = v0 + A*(x - x0)

where

         v1 - v0
    A = ---------.
         x1 - x0

Making the change of variable

    t = (A*[x-x0] - deltav + v0)/sqrt(B)

yields

                     __4          (v1_j - deltav)/sqrt(B)
         sqrt(pi*B)  \        2     /
    I = ------------  >   --------- | dt exp(-t^2)
           2*ds*A    /_    sqrt(pi) /
                     j=0           (v0_j - deltav)/sqrt(B)

                     __4
         sqrt(pi*B)  \
    I = ------------  > (erf[{v1_j-deltav}/sqrt{B}] - erf[{v0_j-deltav}/sqrt{B}]).
           2*ds*A    /_
                     j=0

I don't guarantee that there are no mistakes in my working here, but this is the correct line to pursue. (This is the single item which it is necessary to fix before the PR can be accepted - all the other suggestions are optional or simply discussion items for further work.)

  • photon.c:139 - there is no need for this line.
  • photon.c:193,199 - I'm wondering why you made this change? Is it faster, if so why?
  • photon.c:260 - It is easy to verify that the factor of 4.3 was chosen because exp(-[4.3/2]^2) ~ 0.01. This may indeed be too coarse. However I would like to see some change to the effect:
    lime.h:
      #define MIN_REL_LINE_AMP 0.01 // or whatever
      configInfo{
        double maxFWDeltaV;
    main.c:
      par->maxFWDeltaV = 2.0*sqrt(-ln(MIN_REL_LINE_AMP));
    photon.c:
      deltav = par->maxFWDeltaV*segment etc
  • photon.c:279,298 - could have used veloproject().

@allegroLeiden
Copy link
Copy Markdown
Contributor

what does 'PW' refer to, by the way?

piecewise, duh

@tlunttil
Copy link
Copy Markdown
Contributor Author

tlunttil commented Sep 1, 2016

I'll need to look again at what I did with integration...

Regarding the polynomial scheme: isn't the adaptiveness in some sense false accuracy, because the interpolating polynomial is fixed? There's no reason to think that sampling the polynomial more densely provides a better estimate of the true velocity.

Answers to your other questions (in the same order):

  • Mainly because of laziness ;) Also I doubt there's ever any reason to use a very high degree.
  • There was some reason for choosing 6145, but I'm no longer sure what it was...
  • I didn't think that backward compatibility (for these files) is important. We can just use the code in Untangling - starting with the dust/knu thing #152, which uses dummy variables.
  • Yeah, I think I'll change that to ->
  • I expected that you may not like hardwired numbers and it could certainly be changes. Some repetition is a bit hard to avoid here at least without sacrificing speed.
  • I think that line 139 (and 122) are there because without them compiler complained about possibly unset variables (although clearly that isn't the case). But it's true that the lines don't do anything.
  • I think (didn't check the assembler, though) that the old version of lines 193, 199 compiles as a run-time division, while the new one does division at compile time (and divisions are a lot slower than multiplications, so it's better to avoid them in time-critical parts).
  • Sure, although maybe it's better to rethink the whole frequency sampling thing.
  • I think I forgot here momentarily that there's such a function as veloproject()

@allegroLeiden
Copy link
Copy Markdown
Contributor

Regarding the polynomial scheme: isn't the adaptiveness in some sense false accuracy, because the interpolating polynomial is fixed? There's no reason to think that sampling the polynomial more densely provides a better estimate of the true velocity.

True, but that is not really the point of the adaptive scheme. The problem is that where the velocity varies quickly (regardless of how accurately we track its precise changes), so will the line amplitude, in a highly nonlinear fashion. The adapative scheme doesn't really care about the accuracy, it just adapts to rapid changes in v by choosing finer intervals.

I am advocating for retaining the polynomial scheme with (in the medium term) an easily-settable number of orders, but I would actually be surprised if it turned out that terms greater than order 2 or so made much difference. I do think we ought to find this out, though.

@allegroLeiden
Copy link
Copy Markdown
Contributor

I think I forgot here momentarily that there's such a function as veloproject()

Memory is terrible - I am ashamed to admit that I forgot that one could use continue and break in C. Too used to thinking of them as python-specific luxuries perhaps.

@allegroLeiden
Copy link
Copy Markdown
Contributor

Do we want to keep also the old 'spline' method for line shapes?

I would vote 'no'.

@tlunttil
Copy link
Copy Markdown
Contributor Author

tlunttil commented Oct 5, 2016

So would I, if for no other reason than easier merging...

tlunttil added 3 commits October 14, 2016 15:26
Fix issue #176 and some related bugs in the blended lines part
of photon() and getjbar().
@tlunttil
Copy link
Copy Markdown
Contributor Author

I have now merged the latest master branch and included a fix to #176. Both of these provided ample opportunities for mistakes, so another set of eyes looking at this would be good (I'm mainly thinking of @allegroLeiden and @ParfenovS ). I banished the spline/interpolation stuff, so the new erf-method is the only one remaining. Therefore the results are not identical to previous ones even with -DTEST, but at least the code compiles and the results in example model don't look hugely strange.

As for #176, there seemed to be a strange case of 'premature integration' of all blended lines separately. I'm not sure what the idea here was - I think the old version was clearly wrong. But somehow I never noticed it until @ParfenovS pointed it out.

This version is maybe ~20% slower than the previous one, mainly because of in/out splitting. There are many things that could be optimised, but I think it's better to leave all performance fine-tuning to a separate PR after the functional PRs are done.

@ParfenovS
Copy link
Copy Markdown
Contributor

jnu=0.;
alpha=0.;

at lines 403 and 404 in photon.c (getjbar function) should be removed.

@tlunttil
Copy link
Copy Markdown
Contributor Author

Right, they were leftovers from the old version. I'll fix that and any other bugs once you have gone through the changes.

@ParfenovS
Copy link
Copy Markdown
Contributor

expTau[iline] and tau[iline] are not increasing in photon.c
Thus, the following code:
mp[molI].phot[lineI+iphot*md[molI].nline]+=expTau[iline]*remnantSnu;
will give wrong result.
One should add expTau[iline]*=expDTau and tau[iline]+=dtau;

One can move the following line:
mp[molI].phot[lineI+iphot*md[molI].nline]+=expTau[iline]*remnantSnu;
after if(tau[iline] < -30.){... check

@tlunttil
Copy link
Copy Markdown
Contributor Author

Oops. I was very careless in copy-pasting code from one place to another.

We wouldn't even necessarily need the tau[] array anymore, because everything except maser check is done with expTau and it would be easy to change that, too.

@ParfenovS
Copy link
Copy Markdown
Contributor

getjbar uses both mp[molI].vfac_loc[iphot] and mp[molI].vfac[iphot]
Is this really needed? Maybe one can use only mp[molI].vfac_loc[iphot]

@tlunttil
Copy link
Copy Markdown
Contributor Author

vfac_loc comes from a pure Gaussian (that is, the local profile at the grid point), while vfac includes the effect of velocity gradient in the Voronoi cell containing the grid point (which is needed for calculating the change in intensity inside that cell). So I think it's needed, though the effect is probably very small in realistic cases. Though I think that the test at the beginning of the outermost loop in getjbar should be against, vfac_loc, not vfac.

@ParfenovS
Copy link
Copy Markdown
Contributor

Ok. I see.

@ParfenovS
Copy link
Copy Markdown
Contributor

I didn't find any other problems.

@tlunttil
Copy link
Copy Markdown
Contributor Author

Thanks for your very detailed work @ParfenovS . I'll fix the bugs you found and maybe add a few more comments in the code to explain what is being done and why.

Fixed bugs. Changed large struct parameters to sourceFunc_line to
pointers. Removed unneeded 'tau' array from photon().
@tlunttil tlunttil closed this Oct 22, 2016
@tlunttil tlunttil reopened this Oct 22, 2016
@tlunttil
Copy link
Copy Markdown
Contributor Author

I pushed a new version with bugs fixed (I hope). I also made the code a bit more readable by replacing magic numbers with things defined in the header file and removed the 'tau' array from photon() that wasn't really necessary.

Although I wrote that I wouldn't do any optimisations in this PR, I decided to do one thing: calls to sourceFunc_line passed big structures (in particular molData). A lot of time seemed to be spent on copying that data, so I changed to using pointers in calls. This made the example model run ~20% faster.

This PR should take care of issues #124, #125, #149, #154, and #176.

@allegroLeiden
Copy link
Copy Markdown
Contributor

It's funny, because I thought the sourceFunc_line call had already been fixed the way you describe. Maybe it got reverted in one of the frantic merges. Really looking forward to getting past the present rapids in Lime.

@tlunttil
Copy link
Copy Markdown
Contributor Author

I had done it in an earlier commit, but then it got reverted when I merged in #152.

@allegroLeiden
Copy link
Copy Markdown
Contributor

Ok, I've had a look through and could not see anything that did not look kosher. Image looks ok. I'll merge it.

BTW I am thinking we could probably use the erfs also in raytrace in place of calcLineAmpSample().

@allegroLeiden allegroLeiden merged commit f0df50c into lime-rt:master Oct 27, 2016
@tlunttil
Copy link
Copy Markdown
Contributor Author

Yeah, I thought about the raytrace, too. It's perhaps not something to go for in 1.7, I think there are several more serious issues to take care of first.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants