Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tickets/dm 2557 #23

Merged
merged 16 commits into from Apr 29, 2015
Merged

Tickets/dm 2557 #23

merged 16 commits into from Apr 29, 2015

Conversation

danielsf
Copy link
Contributor

This ticket adds methods to perform findDetectors on numpy arrays of points. It is necessary to speed up the CatSim methods that read it catalogs of objects and assign them to individual detectors.

@return a list of lists; each list contains the names of all detectors which contain the
corresponding point
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We must be very careful in afw to disambiguate between afw.coord and afw.cameraGeom. Thus in cameraGeom we are careful to use the terms "camera coordinate system" (the word "camera" is always necessary) "point" or "camera point" instead of "coord" or "coordinate". We reserve the plain terms "coord", "coordinate", "coordinate system" for the coord package.

Also our standard is to use "List" for ordered collection, both in variable names (Array and the simple plural are discouraged) and in documentation.

From this perspective this API (and perhaps others you added) need some work. Here are some suggestions:

  • Please don't pass x and y as separate arrays. It would be much safer and cleaner to pass them as a single array of Point2D or some other object. This saves any worry that they may not be the same length or may not be in the same order.
  • Change the name "coord" to pointList (or perhaps "cameraPointList", but does "camera" add useful information?). Note that our coding standards discourage simple plurals for collections, so please also change the code accordingly.
  • For coordSys: in the description change "coordinate system" to "camera coordinate system"
  • Please remove the word "is" from your @param commands; just use a pair of spaces (read the generated documentation to see if you agree)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're going to have to start living with some interfaces that accept separate arrays of x and y instead of arrays of points, because arrays of points just aren't NumPy-friendly.

That said, this method apparently uses a vector of Points under the hood (which it populates in a loop in Python), so I can't see how it'd be any more efficient than the interfaces we have now. If you want to speed it up, I'd think you'd need a C++ interface that takes an ndarray::Array for each of x and y, and then loops over them in C++. Otherwise you're just substituting one loop over points in Python for another loop over the same points in Python.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's less that this makes camerGeom any faster than that it facilitates a faster interface with what is going on the catalog simulations code.

The catalog simulations code already reads object positions in from a databases and stores them as numpy arrays of Ra, Dec, which it converts quickly into numpy arrays of pupil coordinates (or whatever other coordinates we need at the time). In order to speed up the conversion from Ra, Dec, we needed an interface that would allow us to pass in those numpy arrays of coordinates and get out a numpy array of corresponding chipNames.

So: I'm sure what Jim is saying is correct, but the existing catalog simulations code had forced us into writing something like this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For a similar reason, I'm going to push back on Russell's comment, since, in CatSim, we already have numpy arrays of x and y. Converting them into Point2D in CatSim (which is all python), is what was slowing us down before.

(Wait... I think I see what you're talking about; let me try to pass in a list of cameraPoints)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is just an impedance match, why does it have to be on the afw side rather than the sims side?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The first thing this routine does is [afwGeom.Point2D(x,y) for x,y in zip(xCoord, yCoord)]. All I am asking you to do is move that out of this method and accept a collection of Point2D. It cannot affect speed and it makes for a much cleaner interface.

As to Jim's comment: I would be much happier with a numpy structured array with x and y fields than two separate arrays. Perhaps we can find some way to efficiently convert between this and the bits of C++ that presently use vectors of Point2D. But I think that is for the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The real answer to Jim's question may be: I'm not 100% sure why it's faster, but it is. Running on a simulated catalog 5000 objects, using findDetectors for each takes a total of about 46 seconds. Using findDetectorsList takes 7 seconds.

I think it is related to the frequency of calls to coordinate transformations, either to the camera system or the detector systems.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please open an issue to sort this out later.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there are three different issues in this discussion and I fear they are getting conflated:

  1. The speedup. This is caused by using findDetectorsArray instead of findDetectors, and it makes sense because it uses the vectorized version of TransformMap.transform.

  2. findDetectorsArray x,y arrays vs a list of Point2D. The first thing findDetectorsArray does is turn the x and y array into a list of findDetectorsArray (in python). All I'm asking is that the caller do this, instead, so findDetectorsArray can have a cleaner and API that looks more like the API of TransformMap. This could not possibly affect speed.

  3. Eventually we will want a way of passing around lists of points and possibly coordinates (e.g. ICRS) as numpy arrays. One obvious representation of a list of points in numpy is a structured array with x and y fields. But how do we get that into C++ (e.g. for the vector version of TransformMap.transform)? Coordinates add additional complications. This issue could probably use a separate ticket; it certainly should not be solved on this one.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you all for the explanation on (1); just wanted to make sure we weren't just guessing this would go faster.

As for (2) and (3), I think it's at least a little bit relevant here because looking towards the future I don't think it's entirely obvious that we want to discourage interfaces that accept separate arrays of x and y. I agree we don't want to try to come up with a general solution here, but if that interface was provided in addition to one that takes a vector of Points, I don't think it's a problem.

SimonKrughoff and others added 16 commits April 29, 2015 11:51
In order to speed up findDetectors for lots of points, we need a version of findDetectors that takes
a list of points.  This method is necessary to make that work.
The points were being passed to the transform map as CameraPoints and they should have been Point2ds
testFindDetectors already tested findDetectors.  I piggyback on that method to also test findDetectorsArray.
(renamed variables so that plurality is denoted by
the suffix 'List')
@danielsf danielsf merged commit aca12d7 into master Apr 29, 2015
@ktlim ktlim deleted the tickets/DM-2557 branch August 25, 2018 06:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

6 participants