
Measuring feature importance for additional regressors #361

Open
zhitkovk opened this issue Nov 16, 2017 · 19 comments

Comments

@zhitkovk

zhitkovk commented Nov 16, 2017

Hi,

In one of the recent releases you added support for external regressors. Is there any way to estimate their impact on the target variable other than training models with and without them and comparing prediction accuracy on a holdout set?

Thanks.

@zhitkovk zhitkovk changed the title Regressors impact Additional regressors impact Nov 16, 2017
@bletham
Contributor

bletham commented Nov 17, 2017

That certainly sounds like a good approach to me. You could fit with and without, and then use the cross_validate function to get an estimate of the prediction errors under each model.

Besides that... if you are doing MCMC, you could check whether the predictive distribution of the component for the additional regressor contains 0. The _lower and _upper bounds of that interval will be columns in the output of predict. If the interval does contain 0, that is evidence the regressor is not useful. But note that a regressor could still be useless while being a non-zero component of the model, if it adds something that could instead have been captured by another component. For instance, an additional regressor that is a straight line could be used in a significant way while still being redundant to the trend model, so removing it wouldn't affect model performance.
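A minimal sketch of that interval check, assuming a model fit with mcmc_samples > 0 and an extra regressor hypothetically named 'temp' (the helper below is not part of the library):

```python
import pandas as pd

def interval_contains_zero(forecast: pd.DataFrame, name: str) -> pd.Series:
    """True at each row where the [lower, upper] interval for the
    named component spans 0, i.e. where its effect is not
    distinguishable from zero."""
    lower = forecast[f'{name}_lower']
    upper = forecast[f'{name}_upper']
    return (lower <= 0) & (upper >= 0)

# forecast = m.predict(future)
# share = interval_contains_zero(forecast, 'temp').mean()
# A share near 1.0 suggests the regressor's effect is indistinguishable from 0.
```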

If you're not doing MCMC, then it would be cheaper to just fit the model twice. Is there a particular reason for not wanting to remove the regressor for estimating its importance?

@bletham bletham changed the title Additional regressors impact Measuring feature importance for additional regressors Nov 17, 2017
@zhitkovk
Author

Thanks for the quick reply. I got your points about CV and MCMC usage. I was interested in something like interpreting the dependence between the target and the regressors. I'm not talking about an OLS-style analysis with exact effects, but something like a positive/negative impact would be cool, or, as you mentioned, some importance scale like in the caret package. (I suppose my lack of understanding of GAM models may play a role in my desiring impossible features, but still.)

@bletham
Contributor

bletham commented Nov 17, 2017

That makes sense. In that case, looking at that component in the output of predict will show what portion of yhat is coming from that regressor. With MCMC you can also see the uncertainty in it; otherwise you still get a point estimate, which can be qualitatively useful.
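For the point-estimate case, a small helper along these lines could compute the portion of yhat coming from one component (the column naming follows the predict output described above; nothing here is part of the library):

```python
import pandas as pd

def component_share(forecast: pd.DataFrame, name: str) -> pd.Series:
    """Fraction of yhat attributable to one additive component;
    the predict output has one column per extra regressor."""
    return forecast[name] / forecast['yhat']

# forecast = m.predict(future)
# component_share(forecast, 'temp')  # 'temp' is a hypothetical regressor name
```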

@nsriram13

This is a very interesting discussion. I am cross-posting this question from the thread where I originally posted it. Is there a way to use a spike-and-slab prior for variable selection with Prophet, similar to the Bayesian structural time series approach (paper), to get a parsimonious model?

@bletham: The MCMC approach that you have suggested above seems to follow a similar approach. On reading the docs, it seems like a normal prior is used by default for regressors. But would using a prior that is designed to make a model parsimonious (like a spike and slab) be better? And if so, is it easy to code in? Would love to hear your thoughts.

@bletham
Contributor

bletham commented Nov 22, 2017

The bulk of the linear component of the model (the X*beta piece) handles seasonalities. Seasonalities are modeled using Fourier series: each column of X is a frequency in the Fourier series, and the betas are the coefficients. A sparse prior on this component wouldn't be appropriate, since it would correspond to the seasonality having a sparse frequency representation. There's no reason to believe that would be the case, and I'd expect it to hurt performance.

However, for the columns of X that do correspond to extra regressors, a sparse prior on their betas like spike and slab could make sense. If we had a bunch of extra regressors and believed that the relevant signal could be captured from just a few, then I'd expect this to work really well. I don't have much of a feeling for how most people are using extra regressors, but if this does sound useful to anyone I'd love to hear it. It would be pretty easy to test out since the prior is contained entirely in the stan code. Basically the prior here:
https://github.com/facebook/prophet/blob/master/R/inst/stan/prophet_linear_growth.stan#L36
would be swapped with a sparse prior, but only for the columns corresponding to extra regressors.

@nsriram13

In our specific use case, we have several time series that roll up into hierarchies. We believe the outcome of one time series can affect the others. To capture some of the interaction effects across them, we want to test several versions of lagged variables from the different time series against the target one. Additionally, we have some regressors for each of the time series, which we want to test to see whether they interact with the other time series. This results in a lot of variables, and we believe that only a few regressors should really trigger for each time series.

I want to add that vector autoregression or a hierarchical time series approach also seemed promising, but a simple time series approach with regressors seemed worth a shot as well (especially given the robust implementations available for the latter :)).

@timvink

timvink commented Mar 29, 2018

@bletham : If we had a bunch of extra regressors and believed that the relevant signal could be captured from just a few, then I'd expect this to work really well. I don't have much of a feeling for how most people are using extra regressors, but if this does sound useful to anyone I'd love to hear it.

I'm working on a marketing project that tries to identify which of many activities (represented as separate time series) have a significant effect on sales. I'm looking to de-trend y, see which extra regressors have a significant effect, and then use their coefficients to tweak the marketing budget a bit.

Feature importance for regressors, e.g. via regressor coefficients, would be a great addition to Prophet :)

@roblisy

roblisy commented Apr 25, 2018

I'm doing the exact same thing as @timvink; I previously used the brms approach in R: https://cran.r-project.org/web/packages/brms/brms.pdf

@nithints

Can someone suggest how I can do vector autoregression with Prophet?

@bletham
Contributor

bletham commented May 30, 2018

@nithints Prophet isn't an autoregressive model, so vector autoregression would not be possible. If you mean more generally just fitting a multivariate time series, this also isn't currently supported but there is an open issue for it in #49 you could follow along in.

@nhernandez05

nhernandez05 commented Nov 16, 2018

Hello, I've read through multiple threads on this topic and gained some great insights. How can I observe the actual beta coefficients? I am working in R, and [insert model name here]$params$beta provides the beta coefficients, but there are no names, just column numbers [1-n]. Is there a way to see which name corresponds to each beta coefficient? Thanks!

@bletham
Contributor

bletham commented Nov 21, 2018

@nhernandez05 I think m$train.component.cols should give you what you want. It is a dataframe: each row is an entry in beta, and each column is the name of a component. A 1 indicates that the beta is involved in that component, a 0 that it is not. For example, by default weekly seasonality uses 6 coefficients, so there will be 6 rows with a 1 in the 'weekly' column, and those rows are the weekly coefficients.

@ghost

ghost commented Jul 10, 2020

@bletham Would you be able to provide what the Python syntax would be to find these column names for the beta dataframe?

@bletham
Contributor

bletham commented Jul 14, 2020

@MaxBirdChemEng it's in m.train_component_cols. It's a pandas dataframe with columns the name of each seasonal component and rows the corresponding entries in m.params['beta']. Row i column j of m.train_component_cols is 1 if beta_i is used in seasonality component j.
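A small helper along these lines could attach the component names to each beta entry (it takes m.train_component_cols and one row of m.params['beta'] as plain inputs; the helper itself is not part of the library):

```python
import numpy as np
import pandas as pd

def label_betas(component_cols: pd.DataFrame, beta: np.ndarray) -> pd.Series:
    """Map each beta entry to the component names it contributes to.

    component_cols is m.train_component_cols (rows = beta entries,
    columns = component names, values 0/1); beta is one row of
    m.params['beta']."""
    labels = component_cols.apply(
        lambda row: ','.join(c for c in component_cols.columns if row[c] == 1),
        axis=1,
    )
    return pd.Series(np.asarray(beta), index=labels, name='beta')

# label_betas(m.train_component_cols, m.params['beta'][0])
```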

@srilamaiti

@bletham : Why does weekly seasonality have 6 coefficients?

@srilamaiti

@bletham : Also, how do you handle an outcome variable with high fluctuations?

@michaelsabramson

michaelsabramson commented Feb 3, 2021

@bletham : Why does weekly seasonality have 6 coefficients?

When adjusting for seasonality you always need a baseline in the season. For example, in a weekly seasonality adjustment you should only adjust for six days because your seventh is your baseline level. This is easier to grasp using dummy seasonality variables in a normal linear regression. If you are adding seasonality for months, you should only include eleven months of dummies as the twelfth is excluded to create a baseline.
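In Prophet specifically, the count comes from the Fourier representation described earlier in the thread: weekly seasonality defaults to Fourier order 3, and each order contributes a sine and a cosine term, giving 6 coefficients. A sketch of how such features are built (this mirrors, rather than calls, Prophet's internal code):

```python
import numpy as np

def fourier_features(t_days: np.ndarray, period: float, order: int) -> np.ndarray:
    """Fourier series features: one sin and one cos column per order,
    so 2 * order columns, each getting its own beta coefficient."""
    cols = []
    for k in range(1, order + 1):
        angle = 2 * np.pi * k * t_days / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)

# Weekly seasonality: period 7 days, default order 3 -> 6 columns.
X = fourier_features(np.arange(14, dtype=float), period=7.0, order=3)
assert X.shape == (14, 6)
```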

@elif-tr

elif-tr commented Dec 20, 2021

@timvink hey Tim, I read your approach above on the impact of marketing promos on sales and was wondering how you ended up using Prophet for that, if at all? I am currently working on a similar project, though the main goal is to understand the coefficient impact of the regressors on yhat, which is still not clear to me since none of the calculations add up to what I have in the data frame. Could you share a bit more about your approach?

@timvink

timvink commented Dec 23, 2021

Cool to get a reply 4 years later :) I did not use Prophet for that project in the end, but ended up writing my own R package (and innersourced it at my previous company, so I can't share). If you search 'marketing mix modelling' (MMM) you should find enough to get started. IMHO, MMM is a bit of a pseudoscience, but taken with a big pinch of salt it can help tweak marketing budgets slightly.
