Overall design plan #1 (Open)
msimet opened this issue Apr 25, 2014 · 28 comments

@msimet (Owner) commented Apr 25, 2014

This thread is a place to debate some overall design decisions for Stile (or whatever we decide to call it, if we want to change the name).

As I've been thinking about this, I've been aiming to maximize the possible user base, which drives a couple of design decisions: (1) the interface with the data should be as separate as possible from the test functions, and (2) we should include enough code that you can run all of the tests without knowing Python, via config files/command-line options. Point (2) is low on the list of priorities at the moment, but I thought it would be helpful to keep in mind as we're working out the structure.

Brief overview of the architecture, as I've conceived of it so far:

  • Tests. We can slice and dice the tests in several ways, but I think the best is by what they're doing internally. We have correlation-function-type tests (e.g. shear around random points, autocorrelation of star shapes); statistical-quantity tests for both scalar quantities (mean/median, variance) and combinations of scalar quantities (Pearson coefficients); histograms; whisker plots; etc. So I think it makes sense to have a generic Test object, then more specific CorrelationFunctionTest, WhiskerTest, etc. objects with some of the helper functions & output functions for that kind of test, and then specific implementations for the various tests we decide to include (see the sketch after this list). (I have a partially-written CorrelationFunctionTest, for example, and then a specific correlation test calls the corr2 wrapper from the parent class with the type of correlation function & the specific quantities we need.) The tests should know what kind of data they need to run. [There are at least three flavors: a raw data type (catalog, image); a time domain (single image, multiepoch
  • Data handling. We need a data handler class which can deliver the base unit of data that a lensing test needs, given the constraints of any imposed binning functions. Things it should be able to do include: return the types of data it contains; list all the data sets that meet given parameters; read in those data sets, bin them given (a) binning function(s), and turn them into data types the rest of the program understands; give the location of the file on disk (if it exists as such and if no bins are defined) so that external programs like corr2 can ingest them without us having to read & rewrite an identical file. It also needs to be able to serve multiple data sets which should be analyzed together (a lens sample and a source sample; for multiepoch tests, all the individual exposures that went into one coadd; etc.). I think the output files should also be handled by a data handler (with a default implementation available) so users who want to insert the results into a framework they already have can do that at the same time. In the short term, we're essentially writing a less-flexible wrapper for the HSC butler.
  • Binning. I think we should include a binning structure within the code natively, as systematics can show up in some regimes of the data and not others, and I don't think we want to define whole new data handler classes just to limit which data is actually produced. This can also simplify some of our own coding--e.g. you might want to do star/galaxy separation tests as a function of Galactic latitude, and it would be nice to just reuse the same code & add an object that will do latitude bins.
  • Corr2. A lot of these tests involve correlation functions, and Mike Jarvis has a nice, fast, straightforward-for-the-user correlation function code, corr2 (http://code.google.com/p/mjarvis/), so we should make use of that. A lot of the coding I've already done involves the corr2 interface, in fact, though it's not very well tested right now.
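
To make the bullets above concrete, here is a minimal sketch of how the Test hierarchy, the data handler, and a binning object might fit together. Everything below is a hypothetical illustration, not the actual Stile interface; the class and method names are made up.

# Hypothetical sketch of the proposed architecture (not the real Stile API).

class DataHandler(object):
    """Serves data sets to tests, hiding the details of where they live."""
    def list_data(self, **constraints):
        """Return identifiers for all data sets matching the given constraints."""
        raise NotImplementedError
    def get_data(self, ident, bins=None):
        """Read a data set, apply any requested bins, and return it in a standard form."""
        raise NotImplementedError
    def get_file_path(self, ident):
        """Return the on-disk path, if one exists, so corr2 can read the file directly."""
        raise NotImplementedError

class BinObject(object):
    """Splits a data set into bins, e.g. by Galactic latitude."""
    def __init__(self, field, edges):
        self.field = field
        self.edges = edges
    def __call__(self, data):
        """Yield ((low, high), subset) pairs, one per bin."""
        for lo, hi in zip(self.edges[:-1], self.edges[1:]):
            yield (lo, hi), [row for row in data if lo <= row[self.field] < hi]

class Test(object):
    """Generic systematics test: declares the data it needs and produces results."""
    required_fields = []
    def __call__(self, data, **kwargs):
        raise NotImplementedError

class CorrelationFunctionTest(Test):
    """Shared machinery for tests that call out to corr2."""
    def run_corr2(self, data, data2=None, **corr2_options):
        # Write a corr2 config, invoke corr2, and read back the results (omitted here).
        raise NotImplementedError

class GalaxyShearTest(CorrelationFunctionTest):
    """Example concrete test: tangential shear around a foreground sample."""
    required_fields = ['ra', 'dec', 'g1', 'g2']
    def __call__(self, lenses, sources, **kwargs):
        return self.run_corr2(lenses, data2=sources, correlation_type='ng', **kwargs)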

Things I have put less thought into:

Exactly what kind of outputs we want to generate for each of the tests, and how we go about doing it. The displayQA framework produces some really nice outputs, but it requires running a web server; I know that the machine where I plan to run the HSC pipeline won't let me do that due to security concerns. I also worry a little about defining a hard pass/fail metric for a lot of these tests. On the other hand, as I said, very nice outputs, and less code to write...

Code that exists:

I just put up some simple file I/O (copied from another project) and the corr2 interface, which is mostly untested but I think has a lot of what we need. Over the next week or so I'll clean up and post:

  • a file tests.py including some abstract interfaces & documentation for the systematics tests, plus a mostly-complete implementation of the CorrelationFunctionTest template & a sketchy implementation of a test that uses it (signals around objects).
  • some pieces for the binning structure--I already have similar code for this, so I think I can make it line up with what we need on a short time scale.
  • some first ideas for a DataHandler class (documentation of what it should do, without implementation)

Next comment: some notes on coding conventions.

@msimet (Owner) commented Apr 25, 2014

Notes on coding conventions:

  • Pure Python for the code we write ourselves.
  • Using the GalSim conventions for coding style (section 1 of https://github.com/GalSim-developers/GalSim/blob/master/devel/credo.txt). I think this makes sense as most of us working on this project have also worked on GalSim. [Note: I don't guarantee that all of the code currently in this repo conforms to that style--I just looked and noticed that the files should have been CamelCase in name, for example. Although we could revisit this, since we're not including the SBProfile classes which used that style...]
  • Write comments that can be parsed by Doxygen.
  • Since a large fraction of the code will be an interface with Mike Jarvis's corr2 code, we should use the same variable names internally as the corr2 program uses, for easier coding & debugging. These quantities are the dict keys of corr2_utils.options, and are mostly self-explanatory. In particular, I think the ones we'll want to pull out for ourselves are the ones with corr2_utils.options[key]['status'] = 'captured'.
  • To keep the user base as broad as possible, I think we want the user to be able to run a maximal number of tests with a minimal number of dependencies; ideally almost nothing should be required except Python 2.6+ and NumPy (versions TBD), possibly SciPy and/or astropy as well, although we'd have to flag which tests couldn't be run without e.g. corr2. Any other dependencies (pyfits/astropy, galsim, etc.) should not be imported until everything downstream of the import requires their use, so that users without those packages can continue to run the rest of the tests (see the sketch after this list). If this starts to cause major problems we can revisit it. Actually, now that I write this out, I'm having second thoughts about this, or at least about advertising it...thoughts?
  • I think we should try to make it compatible with Python 3, which, based on my current understanding, will come at the expense of compatibility with Python <=2.5, at least without a lot of extra coding. (Right now I'm just writing for Python 2.7 as that's what I'm familiar with, but I don't think the changes will be too onerous.)
  • Memory requirements: don't load in the data until you have to start manipulating it. If you're doing something that calls corr2 and there are no BinObjects, don't load in the data at all; just pass it to corr2, if possible. (Possibly this should be a parameter in the DataHandler?)
  • Decision to make: how to refer to the tests. Options are long names (but these could get really long), short names (kind of opaque), numbers following an index (very opaque but possibly not more opaque than short names?).
  • I have a tendency to just import numpy rather than import numpy as np--is this weird enough that it'll be a significant source of confusion/bugs for other coders?
  • License: anybody object to BSD?
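
Following up on the optional-dependency bullet above, here is a minimal sketch of the deferred-import pattern I have in mind; the function name and error message are made up for illustration, not real Stile code.

# Sketch of deferring an optional import until a test actually needs it.

def make_whisker_plot(data, filename):
    try:
        import matplotlib
        matplotlib.use('Agg')  # batch-friendly backend; no display required
        import matplotlib.pyplot as plt
    except ImportError:
        raise ImportError("matplotlib is required for plotting tests; "
                          "tests without plots can still be run without it.")
    fig, ax = plt.subplots()
    ax.quiver([d['x'] for d in data], [d['y'] for d in data],
              [d['g1'] for d in data], [d['g2'] for d in data])
    fig.savefig(filename)

With this pattern, a user without matplotlib only hits the error if they actually ask for a plot; everything else keeps working.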

And I think we should use the GalSim branching system as well, which has worked well for us so far. (For people not involved with that project: when you want to add something to the code, you open a new issue for it and describe what you'd like to do. Others can comment if they have thoughts. Then you start a branch--e.g., if I were to write code based on this issue, I would create a branch called '#1'--which makes pull requests easy. You can also tag your commits with #1 and then they will show up on the issue page, which can be handy if you run into problems midway through.) And as you can see from this issue, the issue pages can also act as a forum if we need such functionality! :)

@rmandelb (Collaborator)

A few random comments:

  • Dependencies: matplotlib?
  • numpy rather than bumpy as np (github keeps autocorrecting that for me even after I change it back so I'm leaving it wrong): I don't care.
  • BSD, yes please.
  • What you're thinking about different kinds of tests is what I was thinking too. We shouldn't write code for 4 tests that do histograms and have code in each one to do the histograms--we should write code for 4 tests, each of which makes an appropriate HistogramTest object with the appropriate quantities as arguments (sketched below).
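
A rough sketch of that reuse pattern, with invented class names, fields, and quantities (the real tests would differ):

# Hypothetical sketch: one HistogramTest class, reused by several specific tests.

class HistogramTest(object):
    def __init__(self, field, bins, xlabel):
        self.field = field
        self.bins = bins
        self.xlabel = xlabel
    def __call__(self, data, filename):
        import matplotlib.pyplot as plt
        fig, ax = plt.subplots()
        ax.hist([row[self.field] for row in data], bins=self.bins)
        ax.set_xlabel(self.xlabel)
        fig.savefig(filename)

# Different tests just instantiate it with different quantities:
star_size_hist = HistogramTest('size', 50, 'star size')
psf_e1_hist = HistogramTest('e1', 50, 'PSF e1')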

@msimet (Owner) commented Apr 25, 2014

Ah, matplotlib: that actually brings up another question. One of the problems with matplotlib is that it is very slow (and this is why DisplayQA only generates plots upon request). However, it's not the only plotting package in the world; there are others that will generate faster but less pretty plots. Should we look into those and maybe pick one for generating results on the fly if users have it installed, and then have something like a use_matplotlib=True argument to generate publication-quality plots with matplotlib instead?

In any case--there's an option to make matplotlib required, or we could just have the code write out a simple ASCII table you could plot with gnuplot/supermongo/etc if it doesn't find a plotting package. (We may want to do that anyway so users can use the outputs directly for papers if they don't like matplotlib.)

@rmandelb (Collaborator)

It is going to get complicated if we want to be able to talk to multiple plotting programs.

My thinking had been that people would use Stile in a way that lets them pick and choose tests. If they’ve chosen a test, they definitely want it run and want to see results. In contrast, I thought that with pipeQA it runs the whole suite automatically, which means that it totally makes sense for it to not write out all plots automatically. Am I thinking about this differently from how you are?

How about this for the near term:
(1) Use matplotlib, but have a make_plot kwarg and a make_textfile kwarg that people can set to True/False to specify what types of output they want. Users can choose one or both (see the sketch below).

(2) As a goal for later, once we have something that is stable and can do a few simple tests, we can consider the option of how to interface with other plotting software.
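
A rough sketch of what that kwarg interface could look like; make_plot and make_textfile come from the comment above, while everything else (the function name and column layout) is invented:

# Sketch only: writes a text table and/or a plot depending on the kwargs.
import numpy

def write_outputs(results, basename, make_plot=True, make_textfile=True):
    """results is assumed to be an (N, 3) array of (r, signal, error)."""
    if make_textfile:
        # Plain ASCII table that can be re-plotted with gnuplot, supermongo, etc.
        numpy.savetxt(basename + '.dat', results, header='r  signal  error')
    if make_plot:
        import matplotlib.pyplot as plt  # deferred so text-only users don't need it
        fig, ax = plt.subplots()
        ax.errorbar(results[:, 0], results[:, 1], yerr=results[:, 2], fmt='o')
        ax.set_xlabel('r')
        ax.set_ylabel('signal')
        fig.savefig(basename + '.png')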

@TallJimbo (Collaborator)

I'll comment on the rest when I catch up with reading it, but re matplotlib: I'd say use matplotlib until you've noticed a clear bottleneck in performance, and then worry about replacing it for that particular kind of test (and changing the API for that kind of test to not rely on e.g. matplotlib-style keywords if necessary). I think the fact that it's what most (all?) potential developers are most familiar with, and is by far the Python plotting package with the most market share, outweighs performance concerns at this point. And by organizing the code so that everything that wants to make e.g. a scatter plot or a histogram plot has to go through the same class, you've really walled off how much you have to change if you put in a new plotting backend for one of those.
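
A minimal sketch of that "single class" idea; the class and method names here are invented for illustration, not part of any existing code:

# Sketch: all tests make plots through one backend class, so swapping in a
# different plotting package later only touches this one place.

class PlotBackend(object):
    def histogram(self, values, bins, xlabel, filename):
        import matplotlib.pyplot as plt
        fig, ax = plt.subplots()
        ax.hist(values, bins=bins)
        ax.set_xlabel(xlabel)
        fig.savefig(filename)

    def scatter(self, x, y, xlabel, ylabel, filename):
        import matplotlib.pyplot as plt
        fig, ax = plt.subplots()
        ax.scatter(x, y)
        ax.set_xlabel(xlabel)
        ax.set_ylabel(ylabel)
        fig.savefig(filename)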

@msimet (Owner) commented Apr 25, 2014

Okay, that all seems persuasive to me. Leave matplotlib for now, add others later if we need to. Thanks, guys!

@HironaoMiyatake (Collaborator)

I think matplotlib is a good option for the near term.

I think displaying the results will be as important as the data handling and the tests, because we have to check a lot of systematics-test results. For example, we might need the following complicated but useful feature, which is one of my favorite PipeQA functionalities: PipeQA lets us first check the overall quality of an exposure by looking at the entire FOV, and if we click on a particular CCD that seems problematic, we can look at the detailed QA for that CCD. This is very handy for checking things. Do we want to think seriously about displaying results?

@TallJimbo (Collaborator)

I'll start with some meta-design ideas:

  • Don't overdesign early; one of the most important things I've learned about code design is that you can't design all the interfaces up front, because you don't yet really know what they need to do. Instead make yourself quickly write a horribly ugly prototype, and then redesign and refactor it when you have covered the main use cases (for instance, one of each qualitatively different kind of plot or input data). This might be especially true for LSST/HSC stack abstraction: don't bother trying to hide from it in the first pass, and learn what you need to abstract away by using it. Of course, you have to guard against letting the interfaces in that initial prototype stick around longer than they should because they're all you have that works.
  • Testing the tests: think about test data and how one might unit-test plotting code up-front. As any piece of code gets more complex, continuous testing gets more and more important. This might have to have a human element to it - pop up a window with a few plots, add some "eye-guides" to the plots, ask the user if it looks reasonable. That's just an idea, though. I really don't have any good ideas on how to test plotting code, but I do bet it will be important down the road.

I think one of the key features (as Hironao has noted) is the need to be able to "drill down" from overview plots to related plots that can help explain what's going on. I think one of the early mistakes of PipeQA (which has now been addressed) was to generate all the drill-down plots up front (that was too slow). So I think we want those to be done on-the-fly, even if we do have the capability to run a lot of overview plots in bulk up front. I think the other key feature here is that we want an overview plot to know about what drill-down plots might be of interest (and vice versa), and provide a way to move easily from one to the other without having to re-specify the data involved. That said, we don't want to get sucked into writing a big fancy user interface for this, when the focus should be on making the right kind of plots. But we definitely want the design to be aware of (at least) these two modes of operation (see the sketch after this list):

  • we're running in bulk, and saving the plots for later
  • we're looking at plots we made up-front, and making some new plots on-the-fly
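
One hypothetical way to express that relationship in code, so a bulk run and an interactive session can share the same plot definitions (all names below are invented):

# Sketch: an overview plot keeps references to its drill-down plots, so we can
# run just the overviews in bulk and descend to the details on the fly later.

class PlotNode(object):
    def __init__(self, name, make_plot):
        self.name = name
        self.make_plot = make_plot   # callable that takes a data set
        self.drill_downs = []        # finer-grained plots sharing the same data

    def add_drill_down(self, other):
        self.drill_downs.append(other)
        return other

    def run(self, data, on_the_fly=False):
        outputs = [self.make_plot(data)]
        if on_the_fly:               # interactive mode: follow the drill-downs too
            for child in self.drill_downs:
                outputs.extend(child.run(data, on_the_fly=True))
        return outputs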

I recommend Python 2.7 as a minimum. While 2.6 is what you get in the OS version on big clusters, almost everyone using Python for science has some version of 2.7 available. I also think it will be very hard to support Python 3 at the same time as 2.6.

@msimet (Owner) commented Apr 25, 2014

Hironao: Yes, we need to think carefully about the presentation and plotting. As I said up above, the displayQA stuff looks really nice, but I worry about people being able to run it on the same machines where they'll be running their tests. We might want to think about an interactive post-processing code in Python (using Tkinter? I haven't used it for much so don't know what its capabilities are). Any thoughts you have would be very helpful.

@msimet (Owner) commented Apr 25, 2014

Jim:

Don't overdesign early; one of the most important things I've learned about code design is that you can't design all the interfaces up front, because you don't yet really know what they need to do.

That's a good point, thanks.

I think the other key feature here is that we want an overview plot to know about what drill-down plots might be of interest (and vice versa), and provide a way to move easily from one to the other without having to re-specify the data involved.

This, I think, also means that the data has to know what other data it's a subset or superset of (which in principle isn't necessary). I think that's a useful feature, though, & will keep it in mind.

Testing the tests: think about test data and how one might unit-test plotting code up-front. As any piece of code gets more complex, continuous testing gets more and more important. This might have to have a human element to it - pop up a window with a few plots, add some "eye-guides" to the plots, ask the user if it looks reasonable. That's just an idea, though. I really don't have any good ideas on how to test plotting code, but I do bet it will be important down the road.

Hmm, an interesting point. The simplest thing might be to have a set of pre-generated "expected" images and a set of new images joined together in some way (side by side?) and ask the user to verify that each pair is qualitatively similar and contains the same labels, legends, etc. Something to think about, in any case; one possible sketch is below.
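
A rough sketch of that side-by-side comparison; the directory layout and function name are made up, and a real version would need more care about matching files:

# Sketch only: paste each expected/new plot pair side by side for a human to inspect.
import os

def compare_plots(expected_dir, new_dir, combined_dir):
    import matplotlib.pyplot as plt
    import matplotlib.image as mpimg
    for fname in sorted(os.listdir(expected_dir)):
        if not fname.endswith('.png'):
            continue
        fig, (ax_old, ax_new) = plt.subplots(1, 2, figsize=(10, 5))
        ax_old.imshow(mpimg.imread(os.path.join(expected_dir, fname)))
        ax_old.set_title('expected')
        ax_new.imshow(mpimg.imread(os.path.join(new_dir, fname)))
        ax_new.set_title('new')
        for ax in (ax_old, ax_new):
            ax.axis('off')
        fig.savefig(os.path.join(combined_dir, fname))
        plt.close(fig)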

I recommend Python 2.7 as a minimum.

That's fine with me. I had glanced through some things and knew that 2.5 and 3 really wouldn't be compatible, but hadn't explored 2.6 too carefully--I'm happy to take your word that it's difficult!

@HironaoMiyatake (Collaborator)

Melanie: I am not sure using Tkinter is a good idea. I think it would be nice if one could share plots with others. For that purpose it might be better to use a web browser (this is what PipeQA does), which makes sharing easy: we can just send a link. Otherwise one might have to send a command line (or something rather more complicated) to reproduce the result, which sometimes leads to mistakes. However, I am not sure this is the best solution; I am not very familiar with this kind of thing either… I probably need input from others!

@msimet (Owner) commented May 7, 2014

Hey folks, I think there is now a semi-working example in the test/ directory. If you pull the latest version, cd into test/, and type python test_run.py--and you have matplotlib and numpy installed and can invoke corr2 by just typing corr2--it will produce some galaxy-galaxy shear measurements for a simple example (a single 1e14 object at z=0.2, ra=0, dec=0) in the test directory. (You can also replace the hard-coded call to "corr2" in stile/tests.py if you need to.) I should say, this works for me, but it still needs a lot of testing and validation, so it might not work for you!

You'll get the corr2 outputs as realshear (all data) and realshear[01]-[01], which are binned into quadrants around ra=0 and dec=0, plus simple .pngs displaying those. The matplotlib calls are just in test_run.py, not as any method of the test function yet.

The next step is probably to split off into issues (test, validate and repair what's already here; add unit tests for what's already here; add an HSC data handler; add plotting functions to the correlation function tests; add other correlation function tests, which should be easy once the plotting's done; add other kinds of tests, e.g. histograms, single-point statistics, etc.) and tackle things individually. But hopefully this is enough of a framework to start things going.

@msimet (Owner) commented May 7, 2014

Okay, there was a formatting snafu above--might be better to read on github. Also, when I said python test_run.py, you'd need to be in the test/ directory first!

@msimet (Owner) commented May 7, 2014

And to address Hironao's point, sorry--we might want to allow interactive results both via something like Tkinter, for folks like me who are on machines where they won't be able to do web servers, and via DisplayQA for folks who can. But I would say if we do do something like Tkinter, that's far down the road--we have much more important stuff to do first...

@HironaoMiyatake (Collaborator)

I couldn't run the test… It seems that there is no tests.py in ../stile/. Am I missing something?

bash-3.2$ pwd
/Users/miyatake/work/Stile/test
bash-3.2$ python test_run.py
Traceback (most recent call last):
  File "test_run.py", line 7, in <module>
    import stile
  File "../stile/__init__.py", line 7, in <module>
    from tests import TestXShear
ImportError: No module named tests

@HironaoMiyatake (Collaborator)

Yes, I agree that we have much more important stuff to do first. Maybe we just want to use DisplayQA first (or something easier, like plain matplotlib!).

@msimet (Owner) commented May 7, 2014

Sorry, you're right! The file is there now.

@HironaoMiyatake (Collaborator)

Great! It works now.

@rmandelb (Collaborator)

I very belatedly decided to try python test_run.py but I'm getting

Traceback (most recent call last):
  File "test_run.py", line 89, in <module>
    main()
  File "test_run.py", line 35, in main
    results = test(stile_args,dh,data,data2)
  File "../stile/tests.py", line 103, in __call__
    **corr2_options)
  File "../stile/tests.py", line 89, in get_correlation_function
    return_value  = stile.read_corr2_results_file(output_file)
  File "../stile/corr2_utils.py", line 469, in read_corr2_results_file
    output = file_io.read_ascii_table(file_name,comment='R',startline=2)
  File "../stile/file_io.py", line 66, in read_ascii_table
    f=open(file_name,'r')
IOError: [Errno 2] No such file or directory: '././realshear'

Is something missing...?

@msimet (Owner) commented May 16, 2014

Hmm...is that the only output you see?

@rmandelb (Collaborator)

Yes.

I can dig into it more myself if this is non-obvious to you, I just wondered if something was obviously missing...

@msimet (Owner) commented May 16, 2014

I should clarify: that means corr2 didn't write its output file (realshear is the output file requested from corr2). So if there had been something else--I see there isn't--that could be a sign that corr2 had failed first. I'm happy to come over & help if you like.

@rmandelb (Collaborator)

Sure, whenever you’re ready.

@rmandelb (Collaborator)

Hi Melanie (and anyone else who might be following along): I confirm that if I install the latest corr2 this does work just fine. For those who might be wondering, the problem is that recently Mike changed the filenames for shear correlations so that all the e's became g's. I had an older version that was expecting ne_file instead of ng_file. And if you use the wrong keyword, corr2 helpfully runs without producing output or complaining. :) So it ran, produced no output but gave no failure message, and then test_run.py was sad when it couldn't find output.

As for when this happened, Melanie, I looked at Mike's commits:
https://code.google.com/p/mjarvis/source/list
At the bottom of the page you see a few commits with references like "change E->G" that cover things like this. The changes in filenames happened before the commit that says "tag v2.5", so I would conclude you should put a note that we require corr2 version 2.5+ (2.6 is the latest).

@rmandelb (Collaborator)

And just for posterity, I wasn't able to compile corr2 with clang, and had to find an old GalSim issue where I was complaining about this (since in OSX 10.9, the compiler that comes with Xcode is clang, not g++). The comment where Mike responded to my complaint and we iterated for a while on how to compile is here:
GalSim-developers/GalSim#387 (comment)

Could be useful if this comes up again. (Basically, I had to set some compiler flags differently and change one line of code.)

@rmandelb (Collaborator)

Sorry, screwed up formatting, please view on github.

@msimet (Owner) commented May 23, 2014

Preserving for the record that we decided on a telecon today to use Sphinx for documentation rather than Doxygen...

@msimet (Owner) commented May 23, 2014

(and thanks to @TallJimbo for the suggestion)
