add summary table #805

Closed
bqpd opened this issue Aug 6, 2016 · 38 comments

bqpd (Contributor) commented Aug 6, 2016

No description provided.

bqpd added this to the Next release milestone Aug 6, 2016
whoburg (Collaborator) commented Aug 8, 2016

@pgkirsch, apologies for the sudden change to printing behavior.

As we've been scaling to larger models, it turns out the table printout gets unwieldy (as you know from the aircraft model) -- e.g. the printout sometimes overwhelms the terminal buffer, such that you can't scroll up.

You'll recall the early days of GPkit, when table printing was not automatic (see http://gpkit.readthedocs.io/en/v0.1.0/examples.html#a-trivial-gp for example).

We're constantly trying to balance not making user-facing changes on one hand against what makes sense long term on the other. I think it's clear that printing solution tables doesn't scale to large models.

Here are a couple questions in the spirit of having the most user-friendly design long-term:

Should we have a SolutionArray.summary() method that prints key sensitivities and top-level model variables, along the lines of pandas' DataFrame.describe()?

Should small models print their table or summary automatically, or is it better to be consistent across all models?

Should we allow user-specific printing options in gpkit/env/settings?
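
(For concreteness, a rough sketch of what such a method might do. This is purely hypothetical, not gpkit's API: the function name, the nsens cutoff, and the assumption of scalar, dict-like sensitivities are all just for illustration.)

def summary(sol, freevars=(), nsens=5):
    """Hypothetical sketch of a SolutionArray summary."""
    lines = ["Cost: %s" % sol["cost"]]
    for name in freevars:              # user-selected top-level variables
        lines.append("%s = %s" % (name, sol(name)))
    senss = sol["sensitivities"]["constants"]
    # rank constants by absolute sensitivity (assumes scalar entries)
    ranked = sorted(senss.items(), key=lambda kv: -abs(kv[1]))
    lines.append("Most sensitive constants:")
    for var, s in ranked[:nsens]:
        lines.append("  %s : %+.2g%%/1%%" % (var, s))
    return "\n".join(lines)
    # e.g. print(summary(sol, freevars=["S"], nsens=3))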

mayork (Contributor) commented Aug 8, 2016

  • I really like the idea of the summary table

pgkirsch (Contributor) commented Aug 9, 2016

@whoburg no worries, I appreciate it's a fine balance to strike. I do think that up-to-date documentation should be treated just as importantly as unit tests though, at least for official releases.

I agree that a summary table would be nice!

Being consistent is probably a safer bet.

I would vote against user-specific settings to keep things simple.

whoburg (Collaborator) commented Aug 9, 2016

In case anyone is wondering how to get the previously-printed table, you can use

print sol.table()

where sol is a gpkit.SolutionArray, i.e. the output of Model.solve() or Model.localsolve().
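
For example, with a trivial model (a minimal sketch; the model itself is just illustrative):

from gpkit import Variable, Model

x = Variable("x")
m = Model(x, [x >= 1])      # minimize x subject to x >= 1
sol = m.solve(verbosity=0)
print(sol.table())          # full solution table, as before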

bqpd changed the title from "document not printing tables by default" to "tables miscellany" Sep 8, 2016
pgkirsch (Contributor) commented Sep 8, 2016

The new title for this issue is begging for more vaguely related requests, so here is one. In addition to the option of having a summary table (or selecting which variables print in the solution table), it would also be nice if you could choose to group the solution table (e.g. by subsystem, or by scope) and to give these groups some sort of subtitles.

One quick idea would be to use another kwarg in the Variable init. That said - and this is getting very tangential - it's already pretty difficult to keep Variable declarations under 80 characters (particularly when creating Models (capital M), where you start with 8 characters of leading whitespace), and multi-line variable declarations are kind of a bummer. Anyway, I'm rambling and should probably open a separate ticket for this last thing.

bqpd mentioned this issue Sep 9, 2016
bqpd (Contributor, Author) commented Sep 9, 2016

@pgkirsch, I opened that separate ticket for ya :p

pgkirsch (Contributor) commented Sep 9, 2016

Too kind

bqpd (Contributor, Author) commented Sep 10, 2016

proposed summary table:

Cost
----
 0.06533 [1/day] 

Prominent Free Variables
------------------------
Fuel, Cruise, Mission |                                                                                                  
           W_{finish} : 4844        [lbf]   weight at end of flight segment
           W_{finish} : 1.644e+04   [lbf]   weight at end of flight segment
          W_{fuel-fs} : 98.67       [lbf]   flight segment fuel weight           
          W_{fuel-fs} : 135.3       [lbf]   flight segment fuel weight           
            W_{start} : 4943        [lbf]   weight at beginning of flight segment
            W_{start} : 1.657e+04   [lbf]   weight at beginning of flight segment

Sensitive Constants
-------------------
Structures |                                                                                       
\rho_{cap} : 1.76     [g/cm**3]    +0.21%/1%      Density of CF cap
          ...

Insensitive Constants
---------------------
Structures |                      
   E_{cap} : 2e+07    [psi]        -1e-3%/1%      Youngs modulus of CF cap
          ...

This would print as the default string of the SolutionArray, such that:

print sol
# summary table
sol
# dictionary view

"prominent free variables" would be marked in the Variable init or as an argument to .summary. If there's less than 5 free variables it shows the "free variables" table instead.

"sensitive constants" would show the top 5 above 0.1 absolute sensitivity, if any

"insensitive constants" would show the bottom 5 below 0.01 absolute sensitivity, if any

Similarly, if there's fewer than 5 constants it shows all of them, sorted by absolute sensitivity.
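
(A rough sketch of those selection rules, with a hypothetical helper name and assuming scalar sensitivities:)

def split_constants(senss, n=5, hi=0.1, lo=0.01):
    # senss: {constant: sensitivity}; returns (sensitive, insensitive)
    ranked = sorted(senss.items(), key=lambda kv: abs(kv[1]))  # ascending
    if len(ranked) < n:
        # fewer than 5 constants: show all, sorted by absolute sensitivity
        return ranked[::-1], []
    sensitive = [kv for kv in ranked[::-1] if abs(kv[1]) > hi][:n]
    insensitive = [kv for kv in ranked if abs(kv[1]) < lo][:n]
    return sensitive, insensitive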

bqpd (Contributor, Author) commented Sep 10, 2016

Should we switch to printing model names to the right of the |?

bqpd (Contributor, Author) commented Sep 14, 2016

@whoburg @mayork @mjburton11 @pgkirsch comments on the above format?

mjburton11 commented

I like this a lot!

pgkirsch (Contributor) commented Sep 15, 2016

  1. What does the "+0.21%/1% " refer to?
  2. I don't know if prominent is the right word. Maybe "selected" or "key" or something idk.
  3. My feeling is that for sensitivities, I don't necessarily just want to know what I am most and least sensitive to; I want to know "how sensitive am I to X?" <-- a question I get a lot. So either sensitivities should be like free variables - you select which ones manually - or they should be something you turn on/off. I often don't want to see the sensitivities; I just want to see that my model solves and what the key variable values are, and when I do want to see sensitivities I want to see all of them.
  4. I also prefer the term fixed variables to constants, but that's just me. I think there are both constants (e.g. g, rho, etc.) and fixed variables (e.g. safety factor, seat width, etc.), so being able to distinguish between the two in a results table might be nice.
  5. Further distinction by subsystem could also be nice for large models. Maybe I want all my aerodynamic variables to be grouped, and all of my structural variables to be grouped. Same with sensitivities?

bqpd (Contributor, Author) commented Sep 15, 2016

  1. Sensitivities: the objective increases by 0.21% for every 1% increase in this parameter.
  2. haha, yeah I'm not a huge fan. I like selected. Key seems confusing...
  3. hmm, sure thing. selected constants. I think there should also be 3 large/small ones?
  4. hmm, noted. we could do better with this.
  5. great plan. that's not for the summary table, right?

pgkirsch (Contributor) commented

  1. Ewww, I much preferred the old way. So 1.76 refers to the actual value of the constant? Seems like that table needs some headers.
  2. You're right, key would be confusing.
  3. Eh.. but why 3? It seems too arbitrary to me. I really feel like sensitivities should be all or none. But that's just me.
  4. No you're right, that could be a different solution table feature altogether.

mjburton11 commented

I do think it is useful to see the sensitive and insensitive constants for debugging purposes. I ran into problems with the 82 code that I solved only because I noticed I was very insensitive to something that I should have been sensitive to. But I also agree with @pgkirsch in that when I'm showing someone sensitivities I want to show them the ones they care about.

This may be a terrible idea, but a solution table with a debugging flag that prints out useful stuff for debugging purposes could be a solution.

bqpd modified the milestones: Middle Release, Next release Sep 15, 2016
whoburg (Collaborator) commented Sep 20, 2016

I'm skeptical of the hard cutoffs on when sensitivities are and are not interesting. I lean toward printing all or nothing, or perhaps a sorted list, or a list of the top N?

Also, I don't think a Variable kwarg is enough to specify prominent vars. What about modular models? Surely for an electric motor design, the salient variables (winding diameter, wire diameter, max torque, etc) are not ones that would be reported as salient in an aircraft design model that uses the electric motor as a subcomponent. So which electric motor vars would I give the prominent kwarg to?

My main message is that I'm worried about a bunch of specialized functions that depend on magic numbers like sensitivity cutoffs. I say keep it as simple and robust as possible. Note: pandas' .describe() method takes only 3 kwargs.

bqpd modified the milestones: Next release, Middle Release Nov 30, 2016
bqpd changed the title from "tables miscellany" to "add summary table" Nov 30, 2016
bqpd (Contributor, Author) commented Nov 30, 2016

Proposal for two new tables (the top three most and least sensitive constants), to be printed by default when table() is called with no arguments:

Most Sensitive Constants
------------------------
     | Aircraft/Wing
   S : 190     [ft**2]     surface area
     + 0.6831
\rho : 1       [lbf/ft**2] areal density
     + 0.4816

     | Mission/FlightSegment/AircraftP/WingAero
   e : [ 0.9       0.9       0.9       0.9      ]              Oswald efficiency
     - 0.36 (sum) 

Instead of abs(sum(...)), can use max(abs(...)), if it's bigger.

Least Sensitive Constants
-------------------------
     | Mission/FlightSegment/FlightState
\rho : [ 0.74      0.74      0.74      0.74     ]  [kg/m**3]   air density
     - 1.01e-2 (sum)
 \mu : [ 1.63e-05  1.63e-05  1.63e-05  1.63e-05 ]  [N*s/m**2]  dynamic viscosity
     + 4.32e-3 (min, at index (1, 2))

Instead of min(abs(...)), can use sum(...), if it's smaller.

Would also remove minval in the sensitivities table, so that if that's explicitly printed it shows all of them.
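
(A sketch of those two aggregation rules for vector constants; the function names are made up, and "sum" is read as abs(sum(...)) for ranking purposes:)

import numpy as np

def most_sensitive_score(s):
    # rank by abs(sum(s)), or by max(abs(s)) if that is bigger
    s = np.atleast_1d(s)
    return max(abs(s.sum()), np.abs(s).max())

def least_sensitive_score(s):
    # rank by min(abs(s)), or by abs(sum(s)) if that is smaller
    s = np.atleast_1d(s)
    return min(np.abs(s).min(), abs(s.sum()))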

bqpd (Contributor, Author) commented Nov 30, 2016

Looking for ways to indicate that the rows below each variable are their sensitivities.

bqpd (Contributor, Author) commented Dec 1, 2016

@mayork, @1ozturkbe , @pgkirsch, @mjburton11, please comment on the above proposal so I don't implement it blindly :p

1ozturkbe (Contributor) commented Dec 1, 2016

This is a gnarly topic... I agree with most people here that it is generally important to have all or nothing in terms of sensitivities in the table. I also think that the least sensitive constants are 99% of the time completely useless. With respect to @mjburton11 saying that the low-sensitivity constants are useful for debugging: if you were trying to debug, you would look at all of the sensitivities anyway. If it were up to me, I'd only have the 5-8 most sensitive variables, as well as the ability to see specific sensitivities that the user may have tagged.
Additionally, I like what you have done with the summing of the sensitivities of vector variables. It makes it easier to interpret.

mjburton11 commented

Is there a reason why we don't want the sensitivities in line with everything else?

Variable: value [units] sensitivity label

or something like that.

With larger models it becomes very difficult to parse through all the sensitivities, so I still think that printing out the low sensitivities can be useful.

I also think that the table should print out all the sensitivities. But it would be great to have a separate function that prints out the highest and/or lowest sensitivities for a sensitivity threshold that the user specifies.

mayork (Contributor) commented Dec 1, 2016

Printing the sensitivity in the table seems like an interesting idea. I don't like the idea of printing out only the x most sensitive or least sensitive variables; this seems like a good way to potentially miss some valuable information.

bqpd (Contributor, Author) commented Dec 1, 2016

I suggested a format for that and no one said they liked it. The main disadvantage is that it makes the whole table wider...

bqpd (Contributor, Author) commented Dec 1, 2016

@mjburton11, @mayork I encourage you to edit one of the tables above to create your desired format.

mayork (Contributor) commented Dec 1, 2016

In large models, when there are tons of model names tagged to variables, the table is already very wide. Any chance people's thoughts on this have changed?

bqpd (Contributor, Author) commented Dec 1, 2016

We've narrowed the table considerably in a recent PR (#965).

bqpd (Contributor, Author) commented Dec 1, 2016

@mjburton11, you can actually already print a sensitivities table with a custom minimum value! The problem is to a large extent one of discoverability... how do we suggest analysis processes to users? Examples? And what should we emphasize by default?

mjburton11 commented

how do you do that?

bqpd (Contributor, Author) commented Dec 1, 2016

from gpkit.solution_array import results_table
print results_table(sol["sensitivities"]["constants"], "Sensitivities", minval=1000, printunits=False)

pgkirsch (Contributor) commented Dec 4, 2016

A bit late to the party, sorry. As a user, I think that I am often looking for specific sensitivities, so I still think the default should be an alphabetical-order list of fixed variables.

I don't particularly like the sample output above. It seems cluttered/unclear, and alternating one kind of data (fixed values) with another (sensitivities) seems a bit dubious in a plain text table. I really prefer separating the values from the sensitivities, but I feel like I'm the odd one out here.

Also, what does the sum of sensitivities mean?

pgkirsch (Contributor) commented Dec 4, 2016

I would prefer putting sensitivities on the same line as the fixed variables, rather than below. Surely that's possible in under 80 characters?

bqpd (Contributor, Author) commented Dec 4, 2016

@pgkirsch, it really depends on the description and (vector) value length, and they've been observed to be quite long. You're welcome to suggest a specific implementation using the example data!

pgkirsch (Contributor) commented Dec 4, 2016

I agree with @1ozturkbe that most of the time you don't care about insensitive fixed vars.

I also agree with @mayork that only picking the N largest sensitivities is a good way to miss useful information. I think that it isn't the absolute largest sensitivities that are always the most interesting, but the sensitivities that are larger than you expect.

bqpd (Contributor, Author) commented Dec 4, 2016

re: expectations, good point!

whoburg (Collaborator) commented Dec 18, 2016

The summary table would sure be useful for a talk I'm giving Tuesday morning! What I need is a large model (JHO or solar preferred, since I'm talking about those a lot) that I can copy/paste into an IPython notebook and demonstrate cool things on (e.g. printing results, showing top sensitivities, getting LaTeX).

lochieferrier (Contributor) commented Dec 18, 2016 via email

bqpd (Contributor, Author) commented Dec 20, 2016

@pgkirsch, I think I'm going to implement the summary table pretty much as above and make it the default string representation of a SolutionArray (so that it's what shows up when you do print sol, but print sol.table() will be unchanged for now).

The idea of finding answers that don't match expectations is interesting though. Do you think it'd be feasible to tag variables with "expected" sensitivities?
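
(To make that concrete, a purely hypothetical sketch; gpkit has no such tag, and the names here are made up:)

def surprising_sensitivities(senss, expected, tol=0.1):
    # flag constants whose actual sensitivity differs from the
    # user-supplied expectation by more than tol (absolute)
    return {var: (expected[var], s) for var, s in senss.items()
            if var in expected and abs(s - expected[var]) > tol}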

pgkirsch (Contributor) commented

Sounds good.

And no, I think that's asking too much from a user and doesn't add much value, in my opinion. I would hate to have to change said tag every time I make a significant change to a model.

bqpd modified the milestones: Next release, v0.5.2 Jan 16, 2017