
Submit BEST #1517

Closed · wants to merge 15 commits into from

Conversation

@ericmjl (Member) commented Nov 10, 2016

Referencing #1502, here's the PR for it. I've provided:

  • BEST object with .fit() and .plot_posterior() functions
  • Jupyter notebook showing how to use it.

@ColCarroll (Member)

Really cool! Looks good to me.

@@ -0,0 +1,141 @@
from ..distributions import StudentT, Exponential, Uniform, HalfCauchy
Member

Why call the dir modules and not models?

@ericmjl (Member Author) Nov 10, 2016

That's because there's a models.py present in the pymc3/ directory. I'm not sure what a better name would be, though. Do you have a suggestion?

@ericmjl (Member Author)

Well... it looks like I'm blind when coding at 3 a.m. (in Vienna). I just realized it's model.py, not models.py, so I'll change the directory name to models.

self.data['indices'] = self.data[self.sample_col].apply(
    lambda x: sample_names[x])

# print(self.data)
Member

stray debug

@ericmjl (Member Author)

Fixed.

@twiecki (Member) commented Nov 10, 2016

This is a great start. It warrants thinking deeply about the API, choice of inference, etc., but since we might not be able to fully specify those at this point, we can move ahead here.

One key trick is to use summary statistics for the test_vals to help with initialization. See e.g. https://github.com/quantopian/pyfolio/blob/master/pyfolio/bayesian.py#L194
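The summary-statistics trick can be sketched with pandas alone; the group means and standard deviations computed below are what would seed the model's test values (e.g. `testval=mean_test` on the group-mean priors). Column names and values here are illustrative, not from the PR:

```python
import pandas as pd

# Tidy data: one row per observation (illustrative values).
data = pd.DataFrame({
    'sample': ['a'] * 3 + ['b'] * 3,
    'output': [1.0, 2.0, 3.0, 10.0, 11.0, 12.0],
})

# Empirical per-group summary statistics, mirroring the PR's
# groupby computation over the output column.
mean_test = data.groupby('sample')['output'].mean().values
sd_test = data.groupby('sample')['output'].std().values

# These arrays would then initialize the model, e.g.
#   pm.Normal('group_mean', ..., shape=2, testval=mean_test)
print(mean_test, sd_test)
```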

@ericmjl (Member Author) commented Nov 10, 2016

@twiecki thanks for the feedback!

On the API design: would it be worthwhile inserting a comment block at the top of the module to clarify how the API is currently designed and its rationale, and leaving a note for future contributors that they'd be free to modify the API if they see a better fit? Not sure what PyMC3's current practice is for this, so I'd love to get feedback.

@ericmjl (Member Author) commented Nov 11, 2016

ping @twiecki: any comment on leaving a note for future contributors?

@springcoil (Contributor)

Hi Eric,
Leaving a note is a good idea :)


@ericmjl (Member Author) commented Nov 11, 2016

@springcoil all done. Thanks for the feedback! The final commit was just adding a docstring at the top of the BEST file, so all tests should pass.

@twiecki (Member) commented Nov 12, 2016

@ericmjl This looks great. The main thing is that I think this should use NUTS, not ADVI. I have a PR that I think would fit well here; can you hold tight a bit?

@twiecki (Member) commented Nov 12, 2016

@ericmjl I think #1523 could make inference here more concise.

@ferrine (Member) commented Nov 12, 2016

I want to implement some GLM models too, and we surely need a base class to share essential functionality; see #1524.

@ericmjl (Member Author) commented Nov 12, 2016

@ferrine I put a note outlining the API design at the top of the best.py file. Could you comment on what you think of it, please?

To repeat it here briefly, I thought first about how the data should be structured in a Pandas dataframe in order to make the code most concise, and then designed it around that. From my limited experience, statisticians like it best when data are provided structured and cleaned, with one row representing one "observation". Maybe that'd be a starting point?

Actually, I'll move this to the other thread, we can discuss there.
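The "one row per observation" layout described above is the tidy-data format; a wide table can be reshaped into it with pandas. A small sketch (column names invented for illustration):

```python
import pandas as pd

# Wide layout: one column per sample group.
wide = pd.DataFrame({
    'drug': [101.0, 100.0, 102.0],
    'placebo': [99.0, 101.0, 100.0],
})

# Tidy layout: one row per observation, which is the shape a
# sample_col / output_col interface like the PR's expects.
tidy = wide.melt(var_name='sample', value_name='output')
print(tidy)
```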

@ferrine (Member) left a comment

I see that there is some work left. I tried to do my best in reviewing.

mean_test = self.data.groupby('indices').mean()[self.output_col].values
sd_test = self.data.groupby('indices').std()[self.output_col].values

with Model() as model:
Member

This adds limitations to usage: it will not be possible to use this model within another model. Am I right, @twiecki?

self.params = params
self.model = model

def plot_posterior(self, rotate_xticks=False):
Member

It should also return the axes for further user modification.

@ericmjl (Member Author)

Great point, I'll update the PR with a fix for this.
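Returning the axes could look roughly like this (a sketch: the function name matches the PR, but the plotting internals are simplified and the input format is invented):

```python
import matplotlib
matplotlib.use('Agg')  # non-interactive backend for this sketch
import matplotlib.pyplot as plt


def plot_posterior(means, sds, rotate_xticks=False):
    """Plot posterior estimates and return the Axes for further tweaking."""
    fig, ax = plt.subplots()
    ax.errorbar(x=range(len(means)), y=means, yerr=sds, fmt='o')
    if rotate_xticks:
        plt.setp(ax.get_xticklabels(), rotation=90)
    return ax  # caller can now set titles, labels, limits, ...


ax = plot_posterior([2.0, 11.0], [0.5, 0.6])
ax.set_title('Posterior means')
```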

self.data['indices'] = self.data[self.sample_col].apply(
    lambda x: sample_names[x])

def fit(self, n_steps=500000):
Member

It is not possible to customize the ADVI fit :(

@ericmjl (Member Author)

I plan to change this to use @twiecki's sample_init function when it's merged in. I'll then wrap the kwargs.

Contributor

I think that's a good idea; it is, after all, what @twiecki's sample_init function is designed for.
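One way to make the fit customizable in the meantime is to forward keyword arguments to the inference call; a minimal sketch (the class and attribute names are hypothetical):

```python
class BESTSketch:
    """Hypothetical stand-in showing the **kwargs pass-through idea."""

    def __init__(self):
        self.fit_options = None

    def fit(self, n_steps=500000, **kwargs):
        # Record and forward extra options to the inference routine,
        # e.g. pm.sample(n_steps, **kwargs) once sample_init lands.
        self.fit_options = dict(n_steps=n_steps, **kwargs)
        return self


best = BESTSketch().fit(n_steps=1000, tune=500)
print(best.fit_options)
```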


self._convert_to_indices()

def _convert_to_indices(self):
Member

I think it is better to write a static function without side effects. By the way, it seems to modify the original dataframe, which is bad for the user:

import pandas as pd

class O:
    def __init__(self, data):
        self.data = data  # keeps a reference, not a copy

    def meth(self):
        self.data['a'] = 1  # mutates the caller's dataframe

data = pd.DataFrame()  # empty
o = O(data)
o.meth()
data  # no longer empty: it now has a column 'a'

Contributor

Agreed, the side effects here concern me.
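A defensive copy in the constructor avoids the mutation demonstrated above; a minimal sketch of the fix (class and method names hypothetical):

```python
import pandas as pd


class BESTNoSideEffects:
    """Hypothetical sketch: copy the input so the caller's frame is safe."""

    def __init__(self, data):
        # .copy() ensures later column additions never leak back
        # into the caller's dataframe.
        self.data = data.copy()

    def convert_to_indices(self):
        self.data['indices'] = range(len(self.data))


data = pd.DataFrame({'output': [1.0, 2.0]})
model = BESTNoSideEffects(data)
model.convert_to_indices()
print('indices' in data.columns)  # False: the caller's frame is untouched
```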

import pandas as pd


class BEST(object):
Member

As for the base class, we should consider using Lasagne's base ideas. I'll do a sketch for that soon.

sample_names = dict()
for i, name in enumerate(
        list(np.unique(self.data[self.sample_col].values))):
    print('Sample name {0} has the index {1}'.format(name, i))
Member

I suggest using the logging facility for this so that it can be turned off (by default) and on as needed.
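The logging alternative might look like this (a sketch; the logger name is illustrative, not from the PR):

```python
import logging

logger = logging.getLogger('pymc3.models.best')  # hypothetical name


def report_sample_indices(sample_names):
    # DEBUG-level messages are silent by default and can be enabled
    # with logging.basicConfig(level=logging.DEBUG) when needed.
    for name, index in sample_names.items():
        logger.debug('Sample name %s has the index %s', name, index)


report_sample_indices({'treatment': 0, 'control': 1})  # silent by default
```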

@ferrine (Member) commented Nov 12, 2016

@ericmjl I added a branch user_model in my fork; shall we try to work as a pair on implementing BEST, GLM, and a base class?

@ferrine ferrine mentioned this pull request Nov 12, 2016
@fonnesbeck (Member) commented Nov 12, 2016

As slick as the sklearn-like API is, it is not consistent with the other fitting methods in PyMC3, which are functions (sample and advi) rather than methods. Why can't BEST simply go ahead and fit the model, as advi does? I don't see the advantage of having a BEST object kicking around. In any case, I think we should maintain consistency here.

@ericmjl (Member Author) commented Nov 13, 2016

@ferrine: let's see how the discussion pans out here; I can see the reasons behind @fonnesbeck's concerns. If the reception here looks good, I'm happy to work on it; otherwise, let's respect the leads and work on the idea in a separate repo instead. How does that sound?

@twiecki (Member) commented Nov 14, 2016

The sklearn API suggestion was only an idea, feel free to abandon it.

@fonnesbeck (Member)

It's not a bad idea at all, but my inclination is to keep the interface consistent across models unless there is a good reason to deviate from it. If the sklearn API is something we like, then we should look for ways of moving to it across the package, and not just for new fitting methods.

@springcoil (Contributor)

I agree with @fonnesbeck about consistency across the API, in terms of the interface across models. I'm a huge fan of the sklearn API, but would consider this to be too different.

@burnpanck (Contributor)

I am just starting to join the discussion, so please feel free to tell me if I'm stating obvious or irrelevant things. I was looking for ways to build models in a modular fashion from simpler components (see #1553), and I do like the approach here. In particular, with respect to @fonnesbeck's comment, I see value in the object-oriented ("lasagne") approach. I do not care that much where the fitting function is found; for consistency, that could very well be the usual PyMC3 functions. But I do like the concept of an object that encapsulates the related variables of a particular (sub-)model, which can be fitted to data and linked to other models. I imagine that such a specialized model class is a stand-in for a set of related random variables, while instances describe a particular binding of those random variables to actual data and possibly to external random variables. The object provides convenient access to these variables and possibly offers additional related functionality, like domain-specific insight in picking start values.

@twiecki (Member) commented Nov 28, 2016

@ericmjl Now that we merged #1525 (into the 3.1 branch) I think it's worthwhile to try and port BEST to it to make use of the new API. Maybe @ferrine could help if he wants.

@twiecki (Member) commented Dec 5, 2016

@ericmjl Just wondering if you are interested in refactoring this PR.

@ericmjl (Member Author) commented Dec 5, 2016

@twiecki yep, still interested, just lacking time right now. If you need to clean up PRs that aren't active, I'm cool with closing it for now.

@twiecki (Member) commented Dec 5, 2016

No, that's fine; I was just wondering if it was still active.

@springcoil (Contributor)

Wondering @ericmjl if you've time to get this over the line?

@ericmjl (Member Author) commented Apr 7, 2017

@springcoil I think it's low on my priority list right now - thesis defence, job hunt, prep for PyCon. I'll go ahead and close.

@ericmjl ericmjl closed this Apr 7, 2017

7 participants