Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement per-axis periodicity and box size #276

Merged
merged 20 commits into from
Dec 7, 2022
Merged

Conversation

lgarrison
Copy link
Collaborator

@lgarrison lgarrison commented May 24, 2022

This PR allows the user to specify a periodic box length in each dimension, rather than assuming a cube. Internally, Corrfunc already supports this, so the only necessary change was to add support for passing a Python tuple as the box size instead of a scalar.

This change is API and ABI backward-compatible (EDIT: no longer ABI backwards compatible, see #276 (comment)). Currently, I've only implemented it for theory.DD, but all the other modules continue to work without modification.

@johannesulf, would you be willing to test this PR? I've tested that boxsize=L and boxsize=(L,L,L) get the same answer, but I haven't actually verified the code gets the right answer for Lx!=Ly!=Lz!

Will close #247.

@pep8speaks
Copy link

pep8speaks commented May 24, 2022

Hello @lgarrison! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:

There are currently no PEP 8 issues detected in this Pull Request. Cheers! 🍻

Comment last updated at 2022-07-19 15:10:00 UTC

@lgarrison lgarrison requested a review from manodeep May 24, 2022 16:00
@johannesulf
Copy link

Fantastic! I'll check this PR against halotools in the next few days and report back.

@lgarrison lgarrison added this to the v2.5.0 milestone May 24, 2022
@johannesulf
Copy link

johannesulf commented May 25, 2022

I currently don't find agreement with halotools. Take the following code that counts pairs in spheres with radius of 10 in a uniformly sampled, non-cubic box.

import Corrfunc
from Corrfunc.theory.DD import DD
from halotools.mock_observables import tpcf

N = 100000
boxsize = (100.0, 100.0, 50.0)
nthreads = 4
autocorr = 1
seed = 42
np.random.seed(seed)
X = np.random.uniform(0, boxsize[0], N)
Y = np.random.uniform(0, boxsize[1], N)
Z = np.random.uniform(0, boxsize[2], N)
r_bins = np.array([1e-10, 10])
weights = np.ones_like(X)
results = DD(autocorr, nthreads, r_bins, X, Y, Z, weights1=weights,
             weight_type='pair_product', boxsize=boxsize, periodic=True)

pos = np.vstack([X, Y, Z]).T
xi = tpcf(pos, r_bins, period=boxsize)
print('Number of pairs...')
print('halotools: {:.0f}'.format(
    (xi + 1) * 4 * np.pi / 3 * r_bins[-1]**3 / np.prod(boxsize) * N**2))
print('Corrfunc: {:.0f}'.format(results['npairs'][0]))
print('Expectation: {:.0f}'.format(
    N**2 * 4 * np.pi / 3 * r_bins[-1]**3 / np.prod(boxsize)))

I get agreement with halotools and the expectation value (within error) for cubic boundary conditions. For the above code with a non-cubic box, Corrfunc currently does not agree with halotools. However, if I only consider points with Z<40, there is agreement with halotools. So it seems that the periodicity along the z-axis might not be taken into account by Corrfunc in the above code.

@johannesulf
Copy link

johannesulf commented May 25, 2022

Awesome! Just tested the updated version and everything looks good.

@johannesulf
Copy link

The current PR doesn't implement non-cubic boundary conditions for Corrfunc.theory functions other than DD, right? Is it possible to also add it for the other pair counters, especially DDsmu?

@lgarrison
Copy link
Collaborator Author

Great! And yes, I haven't ported this feature to the other modules yet. It's not too bad, just involves echoing the _countpairs.c change to each module in that file (and updating the docstrings), and then updating each theory/*/*impl.c.src with the snippet from theory/DD/countpairs_impl.c.src. The last bit is updating the Corrfunc/theory/*.py interfaces (and docstrings).

@manodeep, can you review the change to DD before we proceed to the other modules?

utils/defs.h Outdated Show resolved Hide resolved
@manodeep
Copy link
Owner

I took one quick look and noted some comments, but I need to look more closely. Different boxsizes per dimension might impact the min-separation + periodic features, particularly in the calculation of the cell-pairs.

@lgarrison
Copy link
Collaborator Author

I agree, if you could check those areas of the code, that would be great. Fortunately, there were few places in the code that used boxsize—most functions take xdiff, ydiff, and zdiff separately. And Corrfunc has always supported non-cubic periodic boxes whenever boxsize=0 was used, although that code path was probably less tested. It's also reassuring that non-cubic DD agrees with halotools exactly!

@lgarrison
Copy link
Collaborator Author

I've implemented the semantics from #276 (comment). Specifically, a boxsize of -1 along any dimension now means that dimension is not periodic. The surface area of the change was pretty minimal; it just affected the generation of the cell pairs.

I've tested that turning off periodicity per-dimension works by comparing to the periodic answer where the boxsize along that dimension has been extended beyond the particle distribution, such that it is effectively non-periodic. The answers agree exactly.

@manodeep, can you take a look?

@lgarrison lgarrison requested a review from manodeep May 31, 2022 19:20
Copy link
Owner

@manodeep manodeep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright I have gone through and really have a couple of major comments

  • the x/y/zdiff calculation and in get_binsize might need to be switched back. This is in part due to not-so-great naming choices - x/y/zdiff in the*impl.c.src refers to periodic wrapping lengths, while x/y/zdiff in gridlink* refers to an actual diff = (xmax-xmin).

  • would highly recommend against changing parameter semantics while keeping the function signature the same

  • think calls to generate_cell_pairs should invoke separate periodic_x/y/z rather than the replicated options->periodic

CHANGES.rst Outdated Show resolved Hide resolved
Corrfunc/theory/DD.py Show resolved Hide resolved
theory/DD/countpairs_impl.c.src Outdated Show resolved Hide resolved
Comment on lines 248 to 250
const DOUBLE xdiff = (options->periodic && options->boxsize_x > 0) ? options->boxsize_x : (xmax-xmin);
const DOUBLE ydiff = (options->periodic && boxsize_y > 0) ? boxsize_y : (ymax-ymin);
const DOUBLE zdiff = (options->periodic && boxsize_z > 0) ? boxsize_z : (zmax-zmin);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure I follow the logic here. It also seems that we might want to initialise to 0.0 for the cases where -1 has been passed as the box size.

Separately, this might be a good opportunity to clean up the naming for x/y/zdiff -> x/y/z_wrap or something like that

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've renamed these to xwrap now. These values are never used if a dimension is not periodic. We could set them to 0 in that case, but I'm not sure how to do that without adding another conditional, which I'm afraid would make the logic even harder to follow.

theory/DD/countpairs_impl.c.src Outdated Show resolved Hide resolved
utils/gridlink_mocks_impl.c.src Outdated Show resolved Hide resolved
utils/gridlink_impl.h.src Outdated Show resolved Hide resolved
utils/gridlink_impl.c.src Show resolved Hide resolved
utils/gridlink_utils.c.src Outdated Show resolved Hide resolved
utils/gridlink_utils.h.src Show resolved Hide resolved
@lgarrison lgarrison changed the title Implement non-cubic periodic box Implement per-axis periodicity and box size Jun 3, 2022
@lgarrison
Copy link
Collaborator Author

Thanks for the review; I think we've uncovered an unintended behavior in the binsize calculation predating this PR. Can you take a look at #276 (comment) and see if you agree?

@lgarrison
Copy link
Collaborator Author

Hey @manodeep, can you take a look at this when you get the chance?

utils/gridlink_impl.c.src Show resolved Hide resolved
utils/gridlink_impl.c.src Outdated Show resolved Hide resolved
@manodeep
Copy link
Owner

I had looked at the comment and it seems fine. (Realised just now that I never "submitted" the comment - so it was pending ...)

@lgarrison
Copy link
Collaborator Author

Great! The code now uses the spatial extent instead of the boxsize, and all the tests are passing, so I think we're ready to port to the other modules. Will work on that next week.

@lgarrison
Copy link
Collaborator Author

@manodeep: xi and wp take a double boxsize argument in their C API, so they actually ignore options->boxsize. Do you think we should leave xi and wp alone in this PR, or should they also support non-cubic (periodic) boxes? If so, do you think we should add double boxsize_y, double boxsize_z to the C function, or remove double boxsize and only use options->boxsize?

@johannesulf
Copy link

I did some more in-depth tests and now I get some errors but only sometimes. Let me see whether I can find out what exactly triggers it.

@johannesulf
Copy link

@lgarrison I get errors when the number of tracers is very small. For example, if I take my code above and set N=2, it'll crash for me. I'm not sure whether this is supposed to happen. In principle, with a maximum search radius of 10 and box dimensions of 100, 100 and 50, no tracer should be counted twice.

../../utils/gridlink_utils_double.c> ERROR: nlattice = 2 is so small that with periodic wrapping the same cells will be counted twice ....exiting
../../utils/gridlink_utils_double.c> Please reduce Rmax = 10.000000 to be a smaller fraction of the particle distribution region = 13.333546
../../utils/gridlink_utils_double.c> ERROR: nlattice = 1 is so small that with periodic wrapping the same cells will be counted twice ....exiting
../../utils/gridlink_utils_double.c> Please reduce Rmax = 10.000000 to be a smaller fraction of the particle distribution region = 0.001206
Received xstatus = 0 ystatus = 1 zstatus = 1. Error

@manodeep
Copy link
Owner

@johannesulf Corrfunc requires a minimum of 3 cells per dimension and so will crash based on the calculated number of cells -- check is here. It looks like the actual particle extent is significantly smaller than 100 (or 50) - e.g., the two tracers are separated by ~13 Mpc in Y (-> only 2 cells) and ~1e-3 in Z (-> only 1 cell).

You can avoid this crash by explicitly passing the boxsize.

@lgarrison
Copy link
Collaborator Author

lgarrison commented Jun 24, 2022

I just checked, the crash still happens if the boxsize is passed (as Johannes' above example does). But this makes sense, because we changed the gridding to only span the particle extent, whereas before it was using the boxsize. So now, regardless of the boxsize, we may have few cells if the particle extent is narrow.

I think the fix is actually pretty simple: only apply the 3 cell minimum if any pairs can be counted across the wrap; that is, if xdiff + Rmax >= xwrap. This is actually a little conservative, but I think will cover almost all practical cases.

@johannesulf
Copy link

Works much better. But I'm getting an out-of-bounds error when calculating an auto-correlation function with e.g. Corrfunc.theory.DD using just 1 particle, even if the particle is within the bounds.

@johannesulf
Copy link

Wonderful! Everything looks good now.

@lgarrison
Copy link
Collaborator Author

Great! @manodeep do you want to check my last two commits?

@lgarrison lgarrison requested a review from manodeep July 5, 2022 18:50
Copy link
Owner

@manodeep manodeep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that we have to make some changes to the grid-cell assignment (with setting inv_x/y/zbinsize to 0, perhaps it would be better to use the entire boxsize for cases where x/y/zdiff is 0 (and print a warning). I know I am going back on what I said before, but I hadn't realised this edge-case of particles all lying on a plane/line etc

utils/gridlink_utils.c.src Outdated Show resolved Hide resolved
utils/gridlink_utils.h.src Show resolved Hide resolved
@lgarrison
Copy link
Collaborator Author

Thanks @manodeep, I've added the warning and the parentheses as you suggested.

I'm not sure I follow the suggestion about the inverse xdiff. Wouldn't using the boxsize when xdiff is zero just create a lot of empty cells to process? And if the user requested automatic detection of the particle extent, then we don't even have a sensible non-zero value of the boxsize to use.

I think the code handles the planar case fine: once all the particles are gridded, it no longer matters that they were in a plane, or even what the size of their containing cell is. The binsize calculation will still raise an error if the user tries to apply periodicity in that dimension that could cause repeat counts.

@manodeep
Copy link
Owner

Thanks @manodeep, I've added the warning and the parentheses as you suggested.

I'm not sure I follow the suggestion about the inverse xdiff. Wouldn't using the boxsize when xdiff is zero just create a lot of empty cells to process? And if the user requested automatic detection of the particle extent, then we don't even have a sensible non-zero value of the boxsize to use.

Agree that we need a single cell (i.e. cell index == 0) and the easiest way to do that is by setting inv_xyzdiff to 0. However, that might be confusing to read - may be add some lines of comments to show why inv_xyzdiff is being set to 0. That way, other users and later developers will not be confused.

Yup - agree that when the particles are co-planar (or worse), we have a meaningful boxsize only in the periodic cases and not in the non-periodic ones.

I think the code handles the planar case fine: once all the particles are gridded, it no longer matters that they were in a plane, or even what the size of their containing cell is. The binsize calculation will still raise an error if the user tries to apply periodicity in that dimension that could cause repeat counts.

If there is only a single cell, then I am not sure what conditions needs to be checked - (xdiff + rmax) >= xwrap or (xdiff + rmax) >= 0.5*xwrap. Did you have a simple unit-test added that can check?

@lgarrison
Copy link
Collaborator Author

If there is only a single cell, then I am not sure what conditions needs to be checked - (xdiff + rmax) >= xwrap or (xdiff + rmax) >= 0.5*xwrap. Did you have a simple unit-test added that can check?

I'm thinking of the condition as: "is this problem periodic?" (this previous condition was literally options->periodic). Which means that one edge of the particle extent can touch the other across the wrap. For that to happen, they need to be within Rmax of each other, or xwrap - xdiff >= rmax, which is equivalent to your first condition.

There's a unit test added here to make sure the code gives the right answer in the single-cell case (I've updated it slightly in the last commit): https://github.com/manodeep/Corrfunc/pull/276/files#diff-c7d4836bf94aa48cd9d4e4410ddab869be6ec08d83dfc0999ee5eac26cd8d30cR100

If we revisit #210, we might be able to relax this condition to allow a greater fraction of the boxsize. But I think this version is the most literal translation of the existing code to this PR.

lgarrison and others added 3 commits October 7, 2022 11:52
* Attempt to enable Rmax comparable to half boxsize by removing (unnecessary?) duplicate cell checks

* Comment out unused var

* Add test implementing Manodeep's example

* Restore duplicate cell pair check, but only count a pair as a duplicate if the wrap value is identical.

* Add pragmas for CI (will revisit)

* Update GNU assembler bug detection (#278)

* Update GNU assembler bug detection

* Cosmetic enhancement to suppress spurious warnings during GAS bug test

* Fix test error code

* Add another test of large Rmax, comparing against brute-force

* Add const qualifiers

* Apply @manodeep's fix for large Rmax test against brute-force, and require nmesh>=2 to avoid duplicate cell pairs

* Make boxsize non-trivial in test. Remove extra print statement.

* Add comments on array broadcasting to test

* Changed variable name for clarity

Co-authored-by: Manodeep Sinha <manodeep@gmail.com>
…ze arg types, and tests for those arg types. Remove old large Rmax test.
@lgarrison
Copy link
Collaborator Author

I merged the large-rmax changes into this branch; it was pretty clean. That branch removed the minimum cells-per-dimension requirement, so now the only requirement is that the user-specified Rmax is less than half the boxsize (in each dimension). This is expressed here:

if(xwrap > 0 && rmax >= xwrap/2){

The only other thing was to add tests that simultaneously checked the large Rmax and non-cubic boxsize features, which is in the last commit. Everything looks fine.

@manodeep Ready to merge? The only check failure is the unrelated CircleCI.

Copy link
Owner

@manodeep manodeep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be good to add the python tests for the DDrppi and DDsmu as well. Otherwise, seems to be ready to be merged in

Corrfunc/tests/test_theory.py Show resolved Hide resolved
Comment on lines 275 to 277
const DOUBLE xwrap = (options->periodic && options->boxsize_x > 0) ? options->boxsize_x : (xmax-xmin);
const DOUBLE ywrap = (options->periodic && boxsize_y > 0) ? boxsize_y : (ymax-ymin);
const DOUBLE zwrap = (options->periodic && boxsize_z > 0) ? boxsize_z : (zmax-zmin);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Think I asked this before as well - shouldn't x/y/zwrap be set to 0 when the relevant boxsize_x/y/z is < 0?

Also, if we are only testing for a negative number for setting non-periodic, then the python docstrings need to be updated. Currently, the docstrings say to set to -1.0 for non-periodic along any axis

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So I thought it didn't matter, but indeed we do end up passing xwrap to get_binsize_DOUBLE, so this had the side effect of disallowing Rmax > L/2 for non-periodic boxes, when really Rmax should be unrestricted. The expanded brute-force test caught this. So now we do set xwrap = 0 for non-periodic.

Copy link
Collaborator Author

@lgarrison lgarrison Nov 3, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We actually use BOXSIZE_NOTGIVEN = -2. as a special value in the C API, so the docstrings are correct that -1 is the right value to pass.

boxsize > 0 is really testing boxsize != 0 && boxsize != -1. I will add a comment to that effect.

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me know if you want to add the comment - otherwise, I will merge this in

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the new comment; ready for merge, I think!

" the maximum difference within each dimension of the X/Y/Z arrays.\n\n"
" If the boxsize in a dimension is 0., then\n"
" then that dimension's wrap is done based on the extent of the particle\n"
" distribution. If the boxsize in a dimension is -1., then periodicity\n"
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this might need to be "If the boxsize in a dimension is negative, then ..."

@lgarrison
Copy link
Collaborator Author

I've significantly expanded the brute-force test; it now tests almost all combinations of options for DD, DDrppi, and DDsmu. I actually think that if we wanted to reduce the test runtime, we could just use this instead of the gals_Mr19 tests. But that can be a PR for another day.

The expanded tests did catch one or two regressions in supported configurations, namely Rmax > L/2 for non-periodic, and boxsize=None in the Python API. Nothing special popped up related to DDrppi or DDsmu.

@manodeep
Copy link
Owner

@lgarrison Is this ready to merge? And then (after the readme), release v2.5 with this per-axis periodicity and R ~ Lbox/2 features?

@lgarrison
Copy link
Collaborator Author

Yes, ready for merge. I think it's a good time for v2.5 too.

Related to v2.5, I see the milestone has some issues (#192, #210) that are related to gridlink/Rmax and can likely now be closed. I will try to test them this week and close them if appropriate.

@manodeep manodeep merged commit 34667fd into master Dec 7, 2022
@manodeep manodeep deleted the boxsize-per-dim branch May 12, 2023 10:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Box size for each dimension in Corrfunc.theory
4 participants