
Measuring feature importance for additional regressors #361

Open
zhitkovk opened this issue Nov 16, 2017 · 19 comments

Comments

@zhitkovk

zhitkovk commented Nov 16, 2017

Hi,

In one of the recent releases you added support for external regressors. Is there any way to estimate their impact on the target variable other than training models with and without them and comparing prediction accuracy on a holdout set?

Thanks.

@zhitkovk zhitkovk changed the title Regressors impact Additional regressors impact Nov 16, 2017
@bletham
Contributor

bletham commented Nov 17, 2017

That certainly sounds like a good approach to me. You could fit with and without, and then use the cross_validate function to get an estimate of the prediction errors under each model.

Besides that... if you are doing MCMC, you could check whether the predictive distribution of the component for the additional regressor contains 0. The _lower and _upper bounds of that interval will be columns in the output of predict. If the interval does contain 0, that is evidence the regressor is not useful. But note that a regressor could still be useless while being a non-zero component of the model, if it adds something that could instead have been captured by another component. For instance, an additional regressor that is a straight line could be used in a significant way while still being redundant to the trend model, so removing it wouldn't affect model performance.
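A minimal sketch of that interval check, assuming a model fit with mcmc_samples > 0 and an extra regressor hypothetically named 'temp' (the helper below is not part of the library):

```python
import pandas as pd

def interval_contains_zero(forecast: pd.DataFrame, name: str) -> pd.Series:
    """True at each row where the [lower, upper] interval for the
    named component spans 0, i.e. where its effect is not
    distinguishable from zero."""
    lower = forecast[f'{name}_lower']
    upper = forecast[f'{name}_upper']
    return (lower <= 0) & (upper >= 0)

# forecast = m.predict(future)
# share = interval_contains_zero(forecast, 'temp').mean()
# A share near 1.0 suggests the regressor's effect is indistinguishable from 0.
```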

If you're not doing MCMC, then it would be cheaper to just fit the model twice. Is there a particular reason for not wanting to remove the regressor for estimating its importance?

@bletham bletham changed the title Additional regressors impact Measuring feature importance for additional regressors Nov 17, 2017
@zhitkovk
Author

Thanks for the quick reply. I got your points about CV and MCMC usage. I was interested in something like interpreting the dependence between the target and the regressors. I'm not talking about an OLS-style analysis with exact effects, but something like a positive/negative impact would be cool, or, as you mentioned, some importance scale like in the caret package. (I suppose my lack of understanding of GAM models may play a role in my desiring impossible features, but still.)

@bletham
Contributor

bletham commented Nov 17, 2017

That makes sense. In that case, looking at that component in the output of predict will show what portion of yhat is coming from that regressor. With MCMC you can also see the uncertainty in it; otherwise you still get a point estimate, which can be qualitatively useful.
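For the point-estimate case, a small helper along these lines could compute the portion of yhat coming from one component (the column naming follows the predict output described above; nothing here is part of the library):

```python
import pandas as pd

def component_share(forecast: pd.DataFrame, name: str) -> pd.Series:
    """Fraction of yhat attributable to one additive component;
    the predict output has one column per extra regressor."""
    return forecast[name] / forecast['yhat']

# forecast = m.predict(future)
# component_share(forecast, 'temp')  # 'temp' is a hypothetical regressor name
```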

@nsriram13

This is a very interesting discussion. I am cross-posting this question from the thread where I originally posted it. Is there a way to use a spike-and-slab prior for variable selection with Prophet, similar to the Bayesian structural time series approach (paper), to get a parsimonious model?

@bletham: The MCMC approach that you have suggested above seems to follow a similar approach. On reading the docs, it seems like a normal prior is used by default for regressors. But would using a prior that is designed to make a model parsimonious (like a spike and slab) be better? And if so, is it easy to code in? Would love to hear your thoughts.

@bletham
Contributor

bletham commented Nov 22, 2017

The bulk of the linear component of the model (the X*beta piece) handles seasonalities. Seasonalities are modeled using Fourier series: each column of X is a frequency in the Fourier series, and the betas are the coefficients. A sparse prior on this component wouldn't be appropriate, since it would correspond to the seasonality having a sparse frequency representation. There's no reason to believe that would be the case, and I'd expect it to hurt performance.

However, for the columns of X that do correspond to extra regressors, a sparse prior on their betas like spike and slab could make sense. If we had a bunch of extra regressors and believed that the relevant signal could be captured from just a few, then I'd expect this to work really well. I don't have much of a feeling for how most people are using extra regressors, but if this does sound useful to anyone I'd love to hear it. It would be pretty easy to test out since the prior is contained entirely in the stan code. Basically the prior here:
https://github.com/facebook/prophet/blob/master/R/inst/stan/prophet_linear_growth.stan#L36
would be swapped with a sparse prior, but only for the columns corresponding to extra regressors.

@nsriram13

In our specific use case, we have several time series that roll up into hierarchies. We believe the outcome of one time series can affect the others. To capture some of the interaction effects across them, we want to test several versions of lagged variables from the different time series against the target one. Additionally, we have some regressors for each of the time series, which we want to test to see whether they interact with the other time series. This results in a lot of variables, and we believe that only a few regressors should really trigger for each time series.

I want to add that vector autoregression or a hierarchical time series approach also seemed promising, but a simple time series approach with regressors seemed worth a shot as well (especially given the robust implementations available for the latter :)).

@timvink

timvink commented Mar 29, 2018

@bletham : If we had a bunch of extra regressors and believed that the relevant signal could be captured from just a few, then I'd expect this to work really well. I don't have much of a feeling for how most people are using extra regressors, but if this does sound useful to anyone I'd love to hear it.

I'm working on a marketing project that tries to identify which of many activities (represented as separate time series) have a significant effect on sales. I'm looking to de-trend y, see which extra regressors have a significant effect, and then use their coefficients to tweak the marketing budget a bit.

Feature importance for regressors, e.g. via regressor coefficients, would be a great addition to Prophet :)

@roblisy

roblisy commented Apr 25, 2018

I'm doing the exact same thing as @timvink; I previously used the brms approach in R: https://cran.r-project.org/web/packages/brms/brms.pdf

@nithints

Can someone suggest how I can do vector autoregression with Prophet?

@bletham
Contributor

bletham commented May 30, 2018

@nithints Prophet isn't an autoregressive model, so vector autoregression would not be possible. If you mean more generally just fitting a multivariate time series, this also isn't currently supported but there is an open issue for it in #49 you could follow along in.

@nhernandez05

nhernandez05 commented Nov 16, 2018

Hello, I've read through multiple threads on this topic and gained some great insights. How can I observe the actual beta coefficients? I am working in R, and [insert model name here]$params$beta provides the beta coefficients, but there are no names, just column numbers [1-n]. Is there a way to see which name corresponds to each beta coefficient? Thanks!

@bletham
Contributor

bletham commented Nov 21, 2018

@nhernandez05 I think m$train.component.cols should give you what you want. It is a dataframe: each row is an entry in beta, and each column is the name of a component. A 1 indicates that the beta is involved in that component, a 0 that it is not. For example, by default weekly seasonality uses 6 coefficients, so there will be 6 rows with a 1 in the 'weekly' column, and those rows are the weekly coefficients.

@ghost

ghost commented Jul 10, 2020

@bletham Would you be able to provide what the Python syntax would be to find these column names for the beta dataframe?

@bletham
Contributor

bletham commented Jul 14, 2020

@MaxBirdChemEng it's in m.train_component_cols. It's a pandas dataframe with columns the name of each seasonal component and rows the corresponding entries in m.params['beta']. Row i column j of m.train_component_cols is 1 if beta_i is used in seasonality component j.
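A small helper along these lines could attach the component names to each beta entry (it takes m.train_component_cols and one row of m.params['beta'] as plain inputs; the helper itself is not part of the library):

```python
import numpy as np
import pandas as pd

def label_betas(component_cols: pd.DataFrame, beta: np.ndarray) -> pd.Series:
    """Map each beta entry to the component names it contributes to.

    component_cols is m.train_component_cols (rows = beta entries,
    columns = component names, values 0/1); beta is one row of
    m.params['beta']."""
    labels = component_cols.apply(
        lambda row: ','.join(c for c in component_cols.columns if row[c] == 1),
        axis=1,
    )
    return pd.Series(np.asarray(beta), index=labels, name='beta')

# label_betas(m.train_component_cols, m.params['beta'][0])
```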

@srilamaiti

@bletham : Why does weekly seasonality have 6 coefficients?

@srilamaiti

@bletham : Also, how do you handle an outcome variable with high fluctuations?

@michaelsabramson

michaelsabramson commented Feb 3, 2021

@bletham : Why does weekly seasonality have 6 coefficients?

When adjusting for seasonality you always need a baseline in the season. For example, in a weekly seasonality adjustment you should only adjust for six days because your seventh is your baseline level. This is easier to grasp using dummy seasonality variables in a normal linear regression. If you are adding seasonality for months, you should only include eleven months of dummies as the twelfth is excluded to create a baseline.
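In Prophet specifically, the count comes from the Fourier representation described earlier in the thread: weekly seasonality defaults to Fourier order 3, and each order contributes a sine and a cosine term, giving 6 coefficients. A sketch of how such features are built (this mirrors, rather than calls, Prophet's internal code):

```python
import numpy as np

def fourier_features(t_days: np.ndarray, period: float, order: int) -> np.ndarray:
    """Fourier series features: one sin and one cos column per order,
    so 2 * order columns, each getting its own beta coefficient."""
    cols = []
    for k in range(1, order + 1):
        angle = 2 * np.pi * k * t_days / period
        cols.append(np.sin(angle))
        cols.append(np.cos(angle))
    return np.column_stack(cols)

# Weekly seasonality: period 7 days, default order 3 -> 6 columns.
X = fourier_features(np.arange(14, dtype=float), period=7.0, order=3)
assert X.shape == (14, 6)
```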

@elif-tr

elif-tr commented Dec 20, 2021

@timvink hey Tim, I read your approach above on the impact of marketing promos on sales and was wondering how you ended up using Prophet for that, if at all? I am currently working on a similar project, though the main goal is to understand the coefficient impact of the regressors on yhat, which is still not clear to me since none of the calculations add up to what I have in the data frame. Could you share a bit more about your approach?

@timvink

timvink commented Dec 23, 2021

Cool to get a reply 4 years later :) I did not use Prophet for that project in the end, but ended up writing my own R package (and innersourced it at my previous company, so I can't share). If you search 'marketing mix modelling' (MMM) you should find enough to get started. IMHO, MMM is a bit of a pseudoscience, but taken with a big pinch of salt it can help tweak marketing budgets slightly.
